MA 575 Linear Models: Cedric E. Ginestet, Boston University Bootstrap for Regression Week 9, Lecture 1

1 The General Bootstrap

The bootstrap is a computer-intensive resampling algorithm for estimating the distribution of a random variable $X$ from a set of observations $x = \{x_1, \ldots, x_n\}$, via the empirical distribution function (EDF) of the data. This technique therefore makes it possible to obtain empirical estimates, without making any assumption about the distribution of $X$, when analytical estimates are not available. The bootstrap was first introduced in 1979 as an algorithm for estimating standard errors (Efron and Tibshirani, 1993). According to legend, Baron Munchausen saved himself from drowning in quicksand by pulling himself up using only his bootstraps. The statistical bootstrap, which uses re-sampling from a given set of data to mimic the variability that produced the data in the first place, has a rather more dependable theoretical basis, and can be a highly effective procedure for estimating error quantities in statistical problems.

1.1 Motivations for Using the Bootstrap

When performing regression analysis, the distributional assumptions on the behavior of the error terms may not be satisfied. In such cases, it may be difficult to identify the distribution of the regression coefficients, and to compute a test statistic on that basis. Thus far, we have invoked the central limit theorem (CLT) to justify our use of normal assumptions. For small sample sizes, however, such theoretical justifications do not apply. Hence, we need a non-parametric toolkit, which does not make any assumption about the distribution of the data.

1.2 The Plug-in Principle

Consider a general scenario in which we have drawn realizations from an unknown population distribution $F$, such that $y_1, \ldots, y_n \sim F$. The sample mean of these realizations is
$$\bar{y}_n = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
What is the standard error of the statistic $\bar{y}_n$? Invoking the central limit theorem, we know that for moderately large $n$, we would obtain, approximately,
$$\bar{y}_n \sim N(\mu_F, \sigma^2_F/n),$$

where $\mu_F$ and $\sigma^2_F$ are the mean and variance of the unknown $F$-distributed random variable. Using this result, we can define the standard error of $\bar{y}_n$ as follows,
$$\operatorname{se}_F(\bar{Y}_n) = \big(\operatorname{Var}[\bar{Y}_n]\big)^{1/2} = \left(\frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}[Y_i]\right)^{1/2} = \frac{\sigma_F}{\sqrt{n}},$$
where we have emphasized the dependence of this quantity on $F$ through the use of a subscript. Here, the population variance can be estimated using the sample estimate,
$$\hat{\sigma}^2_F = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y}_n)^2.$$
Alternatively, we can use the plug-in principle, which proposes to replace the unknown distribution $F$ by a sample estimate $\hat{F}$, such that we may use $\operatorname{se}_{\hat{F}}(\bar{Y}_n)$, where $\hat{F}$ is obtained by bootstrapping the sampled data.

1.3 Sampling with Replacement

We first introduce the notion of a bootstrap sample, denoted $y^*_b$. Each such bootstrap sample is drawn from the empirical distribution function (EDF), constructed using the original sample $y_1, \ldots, y_n$, such that
$$\hat{F}_n(t; y) := \frac{1}{n}\sum_{i=1}^{n} I\{y_i \leq t\},$$
where $I\{\cdot\}$ is the indicator function, defined as follows,
$$I\{y_i \leq t\} := \begin{cases} 1 & \text{if } y_i \leq t, \\ 0 & \text{otherwise.} \end{cases}$$
The EDF, $\hat{F}_n(y)$, is therefore obtained by assigning an equal probability $1/n$ to each element of $y$, each labelled by an index $i = 1, 2, \ldots, n$. We can then sample with replacement from the EDF by drawing $n$ values from the distribution of the indexes. That is, drawing samples from the EDF,
$$y^*_j \sim \hat{F}_n, \qquad j = 1, \ldots, n;$$
is equivalent to drawing indexes from a uniform distribution on the indexes between $1$ and $n$,
$$i_j \sim \operatorname{Unif}(1, \ldots, n), \qquad j = 1, \ldots, n.$$
The resulting bootstrap sample consists of the following sequence of elements,
$$\{y^*_1 = y_{i_1},\, y^*_2 = y_{i_2},\, \ldots,\, y^*_n = y_{i_n}\},$$
forming an $n$-dimensional bootstrap sample. This procedure is repeated $B$ times in order to produce $b = 1, \ldots, B$ samples of the form,
$$y^*_b := [y^*_{1b}, \ldots, y^*_{nb}]^T.$$
Such bootstrap samples are best conceived as a resampling or a randomization of the original data. Sampling with replacement ensures that distinct bootstrap samples are probabilistically independent,
$$\operatorname{E}[y^*_j (y^*_k)^T] = \operatorname{E}[y^*_j]\operatorname{E}[y^*_k]^T, \qquad j \neq k, \quad j, k = 1, \ldots, B;$$
where we are here treating each $y^*_j$ as a random vector. It is common practice to draw about $B = 1000$ bootstrap samples. However, Efron and Tibshirani (1993) originally advocated that anything between 25 and 200 samples was sufficient for most inferential purposes.
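As a concrete illustration of this resampling scheme, the following is a minimal Python sketch; the data-generating distribution, the seed, and the sample size are hypothetical choices for illustration only. It draws $B$ bootstrap samples by generating uniform indexes, and then computes the bootstrapped standard error of the sample mean, anticipating the next subsection.

```python
import numpy as np

rng = np.random.default_rng(575)           # arbitrary seed for reproducibility

# Hypothetical data: n observations from an unknown distribution F.
y = rng.exponential(scale=2.0, size=50)
n = len(y)
B = 1000                                   # number of bootstrap samples

# Draw indexes i_1, ..., i_n uniformly with replacement, B times;
# each row of y[idx] is then a sample of size n from the EDF F_n.
idx = rng.integers(low=0, high=n, size=(B, n))
y_star = y[idx]                            # B x n array of bootstrap samples

# Bootstrap replicates of the statistic of interest (here, the sample mean),
# and their standard deviation, i.e. the bootstrapped standard error of
# Section 1.4 below.
theta_star = y_star.mean(axis=1)
se_boot = theta_star.std(ddof=1)

# Compare with the plug-in/CLT estimate sigma_hat / sqrt(n).
se_clt = y.std(ddof=1) / np.sqrt(n)
print(f"bootstrap SE: {se_boot:.4f}   CLT SE: {se_clt:.4f}")
```

Drawing all $B \times n$ indexes at once is equivalent to repeating the single-sample construction above $B$ times.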

1.4 Bootstrapped Standard Error

Continuing the previous example, we may be interested in estimating the standard error of the statistic $\bar{y}_n$ using the bootstrap. Such an estimate can be obtained by computing the statistic of interest (here, the sample mean of the data) for each bootstrap sample,
$$\hat{\theta}^*_b := \frac{1}{n}\sum_{i=1}^{n} y^*_{ib}. \qquad (1)$$
Once this is obtained, it suffices to compute the standard error of this distribution of bootstrapped sample means,
$$\widehat{\operatorname{se}}_{\hat{F}}(\bar{y}) := \left(\frac{1}{B-1}\sum_{b=1}^{B}\big(\hat{\theta}^*_b - \bar{\theta}^*\big)^2\right)^{1/2},$$
where the bootstrap mean of the bootstrapped sample means is given by
$$\bar{\theta}^* := \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^*_b.$$
The quantity $\widehat{\operatorname{se}}_{\hat{F}}(\bar{y})$ is then referred to as the bootstrapped standard error. Of course, this procedure can be repeated for any statistic $\hat{\theta} := s(y)$, since we are only using the fact that the quantity of interest is a function of the data. In such cases, the bootstrap estimates in equation (1) would be computed using the bootstrap samples, such that $\hat{\theta}^*_b := s(y^*_b)$.

The central advantage of using the bootstrap is that we can control the accuracy of the bootstrap estimate through our choice of $B$. A larger value of $B$ will yield a better estimate of the ideal bootstrap estimate, which would be based on all possible resamples of the data vector $y$. Because the number of such resamples grows combinatorially with $n$, we adopt a Monte Carlo method for estimating this quantity. Since the bootstrap does not make any assumption about the distribution of the data, it should be regarded as a non-parametric procedure.

2 Bootstrap for Regression

A key assumption made when conducting simple or multiple regression is that the error terms are normally distributed. In many practical situations, such an assumption may be untenable, or difficult to verify. When this occurs, one can resort to a bootstrap estimation of the standard errors in the model of interest. There exist two different methods for applying the bootstrap to regression. One can either re-sample the pairs of predictors and observed values, or directly re-sample the residuals, once we have fitted the model.

2.1 Bootstrapping Cases

Firstly, a naive approach to bootstrap estimation in regression analysis is to re-sample cases. With this approach, we form bootstrap samples of the form
$$b^*_b := \{(y^*_{i_1 b}, x^*_{i_1 b}), \ldots, (y^*_{i_n b}, x^*_{i_n b})\},$$
for every $b = 1, \ldots, B$. For each vector of bootstrap replicates, we compute $\hat{\beta}^*_b$, which is obtained by minimizing the RSS based on each bootstrap sample $b^*_b$, such that
$$\hat{\beta}^*_b := \underset{\beta \in \mathbb{R}^p}{\operatorname{argmin}} \sum_{i=1}^{n}\big(y^*_{ib} - x^{*T}_{ib}\beta\big)^2.$$
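The following sketch illustrates bootstrapping cases on simulated data; the design matrix, the heavy-tailed errors, and all variable names are hypothetical choices. Each bootstrap sample resamples whole cases $(y_i, x_i)$ with replacement and refits OLS; the standard error of each coefficient, defined next, is then simply the standard deviation of the $B$ replicates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data set: n cases (y_i, x_i), with x_i including an intercept.
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.standard_t(df=3, size=n)   # deliberately non-normal errors

B = 1000
beta_star = np.empty((B, p))
for b in range(B):
    # Resample whole cases (rows) with replacement.
    idx = rng.integers(0, n, size=n)
    Xb, yb = X[idx], y[idx]
    # Refit OLS on the bootstrap sample: beta*_b minimizes the RSS.
    beta_star[b] = np.linalg.lstsq(Xb, yb, rcond=None)[0]

# Bootstrap standard error of each coefficient (see the formula that follows).
se_beta = beta_star.std(axis=0, ddof=1)
print(se_beta)
```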

The bootstrap estimate of the standard error of a coefficient in our model, say $\hat{\beta}_l$ for instance, with $l = 1, \ldots, p$, is then given by
$$\widehat{\operatorname{se}}(\hat{\beta}_l) = \left(\frac{1}{B-1}\sum_{b=1}^{B}\big(\hat{\beta}^*_{lb} - \bar{\beta}^*_l\big)^2\right)^{1/2},$$
where the bootstrap mean is $\bar{\beta}^*_l := \frac{1}{B}\sum_{b=1}^{B}\hat{\beta}^*_{lb}$.

2.2 Bootstrapping Residuals

Alternatively, one can sample with replacement from the residuals of a fitted model based on the OLS estimator $\hat{\beta}$. This produces the following bootstrap sample, based on the fitted values $\hat{y}_i$'s,
$$b^*_b := \{(x_1^T\hat{\beta} + \hat{e}^*_{i_1 b},\; x_1), \ldots, (x_n^T\hat{\beta} + \hat{e}^*_{i_n b},\; x_n)\},$$
where, for every $j = 1, \ldots, n$, we could also have defined $y^*_j := x_j^T\hat{\beta} + \hat{e}_{i_j}$. Note that the vector of predictors $x_j^T$ does not have the same index as the residual $\hat{e}_{i_j}$. The latter quantity is sampled with replacement from the EDF of the residuals under the OLS estimator $\hat{\beta}$,
$$\{\hat{e}_1 = y_1 - \hat{y}_1, \ldots, \hat{e}_n = y_n - \hat{y}_n\}.$$
That is, in this procedure, we first fit our standard model to derive the OLS estimate, $\hat{\beta}$. This, in turn, allows us to resample the residuals, given that particular estimate.

This second strategy is less statistically robust than bootstrapping cases, as it assumes that homoscedasticity holds. That is, since we are breaking the dependence of the residuals on the vectors of predictors, $x_i$, we are implicitly assuming that the variance of the residuals does not depend on the values of $x_i$. When this assumption is unlikely to hold, it is preferable to bootstrap cases, which is more robust than bootstrapping the residuals.
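A companion sketch of the residual bootstrap just described, again on hypothetical simulated data: the model is fitted once, the residuals are resampled with replacement, and new responses are formed by adding the resampled residuals to the fitted values while keeping the design matrix fixed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data, reusing the setup of the previous sketch.
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.7, size=n)

# Fit the model once to obtain the OLS estimate, fitted values, and residuals.
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
fitted = X @ beta_hat
resid = y - fitted

B = 1000
beta_star = np.empty((B, p))
for b in range(B):
    # Resample residuals with replacement and attach them to the ORIGINAL x_i,
    # holding the design fixed (this is where homoscedasticity is assumed).
    e_star = rng.choice(resid, size=n, replace=True)
    y_star = fitted + e_star
    beta_star[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]

se_beta = beta_star.std(axis=0, ddof=1)
print(se_beta)
```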

3 Theory of the Bootstrap

3.1 Consistency of the EDF

For any set of random variables $\{Y_1, \ldots, Y_n\}$, from some unknown cumulative distribution function (CDF) denoted $F$, the empirical distribution function (EDF), $\hat{F}_n$, is defined for any $t \in \mathbb{R}$ by
$$\hat{F}_n(t; Y) := \frac{1}{n}\sum_{i=1}^{n} I\{Y_i \leq t\},$$
where we have emphasized the fact that $\hat{F}_n$ is a random quantity, which depends on the full $n$-dimensional random vector, $Y$. The EDF has two desirable properties: it is both (i) unbiased and (ii) consistent, with respect to $F$.

To show that $\hat{F}_n$ is unbiased with respect to the target CDF, $F$, it suffices to take the expectation for some $t \in \mathbb{R}$ and any $n \in \mathbb{N}$,
$$\operatorname{E}[\hat{F}_n(t; Y)] = \int_{\mathbb{R}^n} \hat{F}_n(t; Y)\, dF(Y_1)\cdots dF(Y_n) = \frac{1}{n}\sum_{i=1}^{n}\int_{\mathbb{R}} I\{Y_i \leq t\}\, dF(Y_i) = \frac{1}{n}\sum_{i=1}^{n}\operatorname{P}[Y_i \leq t] = \operatorname{P}[Y \leq t] = F(t),$$
where the penultimate step follows from the fact that $Y_i \sim F$, for every $i = 1, \ldots, n$.

Secondly, $\hat{F}_n$ can also be shown to be consistent, in the sense that, as $n \to \infty$, the estimate $\hat{F}_n(t; Y_n)$ converges to $F(t)$, for every $t \in \mathbb{R}$. That is, for every $t \in \mathbb{R}$, we have the following pointwise convergence,
$$\operatorname{P}\Big[\lim_{n\to\infty} \hat{F}_n(t; Y_n) = F(t)\Big] = 1.$$
This is simply the strong law of large numbers, which states that, for a sequence of i.i.d. random variables with finite mean, $\bar{X}_n \xrightarrow{a.s.} \operatorname{E}[X]$. In this case, the sequence of random variables consists of the indicators $I\{Y_i \leq t\}$, whose sample mean is $\hat{F}_n(t; Y)$.

3.2 Unbiasedness vs. Consistency

Observe that the unbiasedness and consistency of an estimator are two different criteria.

i. Unbiasedness refers to the average behavior of an estimator: what is its expectation?

ii. Consistency captures the long-range behavior of an estimator, and is generally based on one of the laws of large numbers.

Observe that these two criteria are independent. An estimator can be unbiased and inconsistent: take any sequence of sample means with expectation $\theta$, such that $\bar{X}_n \xrightarrow{a.s.} \theta$, and some non-degenerate random variable, $Y$, centered at $0$; we then have
$$\operatorname{E}[\bar{X}_n + Y] = \theta, \qquad \text{and} \qquad \lim_{n\to\infty}\big(\bar{X}_n + Y\big) \neq \theta, \quad \text{a.s.}$$
Conversely, we may also have an estimator which is consistent, yet biased, such as $\bar{X}_n + 1/n$, which is biased for every $n$, but nonetheless consistent. That is,
$$\operatorname{E}\Big[\bar{X}_n + \frac{1}{n}\Big] = \theta + \frac{1}{n}, \qquad \text{and} \qquad \lim_{n\to\infty}\Big(\bar{X}_n + \frac{1}{n}\Big) = \theta, \quad \text{a.s.}$$
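A small Monte Carlo sketch contrasting the two estimators above; the Gaussian population, $\theta = 5$, and the replication counts are arbitrary choices for illustration. The spread of $\bar{X}_n + Y$ does not shrink as $n$ grows (unbiased but inconsistent), while $\bar{X}_n + 1/n$ concentrates around $\theta$ despite its bias (biased but consistent).

```python
import numpy as np

rng = np.random.default_rng(2)
theta, reps = 5.0, 1000

for n in (10, 100, 10_000):
    # 'reps' independent realizations of the sample mean of n observations.
    xbar = rng.normal(loc=theta, scale=1.0, size=(reps, n)).mean(axis=1)
    Y = rng.normal(size=reps)             # non-degenerate noise, centered at 0

    unbiased_inconsistent = xbar + Y      # E = theta, but spread does not vanish
    biased_consistent = xbar + 1.0 / n    # E = theta + 1/n, yet converges to theta

    print(f"n={n:6d}  "
          f"mean(Xbar+Y)={unbiased_inconsistent.mean():.3f}, sd={unbiased_inconsistent.std():.3f}  "
          f"mean(Xbar+1/n)={biased_consistent.mean():.3f}, sd={biased_consistent.std():.3f}")
```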

3.3 Rates of Convergence

Taken together, these results show that the good performance of the bootstrap relies on the rate of convergence of the EDF, $\hat{F}_n$, to the population distribution, $F$. We have therefore replaced a distributional assumption on the random variables of interest with an appeal to the strong law of large numbers. Since the strong law of large numbers converges at a rate $O(1/n)$, it follows that we gain in accuracy over a reliance on the central limit theorem, whose convergence rate is only of order $O(1/\sqrt{n})$. Roughly, for any sequence of random variables $X_1, \ldots, X_n$, with mean $\operatorname{E}[X_i] = \mu$, the sum $S_n$ converges as follows,
$$\frac{S_n}{n} \xrightarrow{a.s.} \mu.$$
The strong law captures the first-order approximation of the sample mean. If, in addition, we know that $\operatorname{Var}[X_i] = \sigma^2$, for every $i = 1, \ldots, n$, we then have
$$\frac{S_n - n\mu}{\sqrt{n}} \xrightarrow{d} N(0, \sigma^2),$$
which represents a second-order approximation of the mean $\mu$. When using the bootstrap, we are exploiting the fact that the strong law of large numbers has a better rate of convergence than the central limit theorem.

References

Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall, London.