Bootstrap inference. Outline: Preliminaries, The bootstrap, Bias reduction, Hypothesis tests, Regression, Confidence intervals, Time series, Final remark.

1 / 171 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2013

2 / 171 Unpaid advertisement Graduate program in Statistics: Masters and PhD Graduate Program in Statistics at Federal University of Pernambuco: http://www.ufpe.br/ppge CAPES: 5 Research areas: asymptotic theory, econometrics, game theory, multivariate analysis, probability theory, regression analysis, signal processing, time series.

3 / 171 Figure 1: Boa Viagem beach.

4 / 171 Figure 2: Boa Viagem beach (at night).

5 / 171 Figure 3: Recife ("the Brazilian Venice").

6 / 171 Figure 4: Porto de Galinhas beach (near Recife).

7 / 171 In a world in which the price of calculation continues to decrease rapidly, but the price of theorem proving continues to hold steady or increase, elementary economics indicates that we ought to spend a larger fraction of our time on calculation. John W. Tukey, 1986

8 / 171 Figure 5: This is the man.

9 / 171 Some references (general): 1. Chernick, M.R. (1999). Bootstrap Methods: A Practitioner's Guide. New York: Wiley. 2. Davison, A.C. & Hinkley, D.V. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge University Press. 3. Efron, B. (1982). The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics. 4. Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. New York: Chapman & Hall.

10 / 171 Some references (general, cont.): 5. Godfrey, L. (2009). Bootstrap Tests for Regression Models. New York: Palgrave Macmillan. 6. Hall, P. (1992). The Bootstrap and Edgeworth Expansion. New York: Springer-Verlag. 7. Shao, J. & Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer.

11 / 171 Some references (specific): 1. Booth, J.G. & Hall, P. (1994). Monte Carlo approximation and the iterated bootstrap. Biometrika, 81, 331–340. 2. Cribari-Neto, F. & Zarkos, S.G. (1999). Bootstrap methods for heteroskedastic regression models: evidence on estimation and testing. Econometric Reviews, 18, 465–476. 3. Cribari-Neto, F. & Zarkos, S.G. (2001). Heteroskedasticity-consistent covariance matrix estimation: White's estimator and the bootstrap. Journal of Statistical Computation and Simulation, 68, 391–411.

12 / 171 Some references (specific, cont.): 4. Cribari-Neto, F. (2004). Asymptotic inference under heteroskedasticity of unknown form. Computational Statistics and Data Analysis, 45, 215–233. 5. Cribari-Neto, F.; Frery, A.C. & Silva, M.F. (2002). Improved estimation of clutter properties in speckled imagery. Computational Statistics and Data Analysis, 40, 801–824. 6. Davidson, R. & MacKinnon, J.G. (2000). Bootstrap tests: how many bootstraps? Econometric Reviews, 19, 55–68.

13 / 171 Some references (specific, cont.): 7. Ferrari, S.L.P. & Cribari-Neto, F. (1997). On bootstrap and analytical bias corrections. Economics Letters, 58, 7–15. 8. Ferrari, S.L.P. & Cribari-Neto, F. (1999). On the robustness of analytical and bootstrap corrections to score tests in regression models. Journal of Statistical Computation and Simulation, 64, 177–191.

14 / 171 Some references (specific, cont.): 9. Lemonte, A.J.; Simas, A.B. & Cribari-Neto, F. (2008). Bootstrap-based improved estimators for the two-parameter Birnbaum-Saunders distribution. Journal of Statistical Computation and Simulation, 78, 37–49. 10. MacKinnon, J.G. & Smith, Jr., A.A. (1998). Approximate bias correction in econometrics. Journal of Econometrics, 85, 205–230. 11. Wu, C.F.J. (1986). Jackknife, bootstrap and other resampling methods in regression analysis (with discussion). Annals of Statistics, 14, 1261–1295.

15 / 171 A fundamental equation. Figure 6: C.R. Rao. C.R. Rao: "uncertain knowledge + knowledge about the uncertainty = useful knowledge".

16 / 171 As Anthony Davison and David Hinkley remind us... "The explicit recognition of uncertainty is central to statistical sciences. Notions such as prior information, probability models, likelihood, standard errors and confidence limits are all intended to formalize uncertainty and thereby make allowance for it." Davison & Hinkley

17 / 171 The big picture (the grand scheme of things). Diagram: POPULATION → (sampling) → DATA, with model = f(parameters).

18 / 171 What is the bootstrap? The bootstrap is a computer-based method for assessing the accuracy of statistical estimates and tests. It was first proposed by Bradley Efron in a 1979 Annals of Statistics paper. Main idea: Treat the data as if they were the (true, unknown) population, and draw samples (with replacement) from the data as if you were sampling from the population. Repeat the procedure a large number of times (say, B), each time computing the quantity of interest. Then, use the B values of the quantity of interest to estimate its unknown distribution.

19 / 171 In a nutshell... population → sample (the real world) → bootstrap samples (the virtual world).

20 / 171 Does it work? Question: Does it work well? Answer: Yes (most of the time). In the simplest nonparametric problems we do literally sample from the data, and a common initial reaction is that this is a fraud. In fact it is not. Davison and Hinkley, 1997

21 / 171 Asymptotic refinement Question: When does the bootstrap provide an asymptotic refinement? The quantity being bootstrapped must be asymptotically pivotal. That is: It must have a limiting distribution free of unknown parameters.

22 / 171 Point estimation in a nutshell. Suppose that the model that represents the population is indexed by the parameter $\theta = (\theta_1, \ldots, \theta_p) \in \Theta$, where $\Theta$ is the parameter space. Estimator: a statistic used to estimate $\theta$. The estimator, say $\hat\theta$, is typically obtained by minimizing some undesirable quantity (e.g., the sum of squared errors) or by maximizing some desirable quantity (e.g., the likelihood).

23 / 171 Point estimation in a nutshell (cont.) Some of the most important properties an estimator can enjoy are: Unbiasedness: $\mathrm{E}(\hat\theta) = \theta$ for all $\theta \in \Theta$; Consistency: $\hat\theta \xrightarrow{p} \theta$; Asymptotic normality: when $n$ is large, $\hat\theta$ is approximately normally distributed; Efficiency (more generally, optimality in some class; e.g., the Gauss-Markov theorem).

24 / 171 Setup. $Y_1, \ldots, Y_n$ i.i.d. $F_0(\theta)$, where $\theta \in \Theta \subset \mathbb{R}^p$. We can write the unknown parameter $\theta$ as a functional of $F_0$: $\theta = \theta(F_0)$. We can denote an estimator of $\theta$ (say, the MLE) as $\hat\theta$, which can be written as the functional $\hat\theta = \theta(\hat F)$, where $\hat F$ is the empirical c.d.f. of $Y_1, \ldots, Y_n$.

25 / 171 Plug-in principle. $Y_1, \ldots, Y_n$ i.i.d. $F_0$. Plug-in: write the parameter as $\theta = \theta(F_0)$; estimator: $\hat\theta = \theta(\hat F)$. Example (mean): parameter $\theta(F_0) = \int y \, dF_0 = \mathrm{E}(Y)$; estimator $\hat\theta = \int y \, d\hat F = n^{-1}\sum_{i=1}^n y_i = \bar y$.

26 / 171 Main idea: Plug-in principle. Example: Let $\bar Y = n^{-1}\sum_{i=1}^n Y_i$. We know that if $Y_i \sim (\mu, \sigma^2)$, then $\bar Y \sim (\mu, n^{-1}\sigma^2)$, so that $n^{-1}\sigma^2$ gives us an indication of the accuracy of the estimate $\bar Y$. In particular, the standard error of the estimate can be obtained as $$\widehat{\mathrm{s.e.}}(\bar Y) = \sqrt{\hat\sigma^2/n}, \qquad \hat\sigma^2 = \frac{1}{n-1}\sum_{i=1}^n (y_i - \bar y)^2, \qquad (\ast)$$ for a given observed sample $Y_1 = y_1, \ldots, Y_n = y_n$. Bootstrap approach: write $\sigma^2 = \sigma^2(F_0)$ and replace $F_0$ by $\hat F$ to obtain $$\widehat{\mathrm{b.s.e.}}(\bar Y) = \sqrt{\tilde\sigma^2/n}, \qquad \tilde\sigma^2 = \sigma^2(\hat F) = \frac{1}{n}\sum_{i=1}^n (y_i - \bar y)^2.$$ This is the bootstrap estimate.
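To make the plug-in idea concrete, here is a minimal numerical sketch (Python, with hypothetical data) contrasting the usual standard error of the mean, which uses the $1/(n-1)$ variance estimate in $(\ast)$, with the bootstrap (plug-in) version that uses $1/n$:

```python
import numpy as np

y = np.array([2.1, 3.4, 1.9, 4.2, 2.8, 3.7, 2.5])  # hypothetical sample
n = len(y)

# Usual estimate: variance with denominator n - 1, as in (*)
se_classical = np.sqrt(y.var(ddof=1) / n)

# Plug-in (bootstrap) estimate: sigma^2(F-hat) with denominator n
se_plugin = np.sqrt(y.var(ddof=0) / n)

print(se_classical, se_plugin)  # they differ by a factor sqrt((n-1)/n)
```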

27 / 171 Noteworthy. Note: (i) the difference between the two estimates is minor and vanishes as $n \to \infty$; (ii) $\hat F$ places probability mass $1/n$ on $y_1, \ldots, y_n$.

28 / 171 Problem: We are usually interested in estimates more complicated than the sample mean, and for such statistics we may not have a directly available formula like $(\ast)$. [E.g., we may be interested in the correlation coefficient, the median, a given quantile, the coefficients of a quantile regression, etc.] Solution: The bootstrap approach allows us to numerically evaluate $\hat\sigma^2 = \sigma^2(\hat F)$.

29 / 171 Bootstrap standard error Question: How can we use bootstrap resampling to obtain a standard error for a given estimate?

30 / 171 What's in a number?

31 / 171 The basic bootstrap algorithm. Suppose we wish to obtain a standard error for $\hat\theta = \theta(\hat F)$, an estimate of $\theta = \theta(F_0)$, from an i.i.d. sample of size $n$. Here is how we proceed: 1. Compute $\hat\theta$ for our sample. 2. Sample from the data with replacement to construct a new sample of size $n$, say $y^* = (y^*_1, \ldots, y^*_n)$. 3. Compute $\hat\theta^*$ for the bootstrap sample obtained in step 2. 4. Repeat steps 2 and 3 $B$ times. 5. Use the $B$ realizations of $\hat\theta^*$ to obtain an estimate of the standard error of $\hat\theta$.

32 / 171 The basic bootstrap algorithm. That is, $$\widehat{\mathrm{b.s.e.}}(\hat\theta) = \left[\frac{1}{B-1}\sum_{b=1}^{B}\{\hat\theta^*_b - \bar\theta^*(\cdot)\}^2\right]^{1/2}, \qquad \text{where } \bar\theta^*(\cdot) = \frac{1}{B}\sum_{b=1}^{B}\hat\theta^*_b.$$
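The five steps above translate directly into a short loop. The sketch below (Python; the simulated data and the choice of the median as the statistic are illustrative, not from the slides) computes a nonparametric bootstrap standard error with B = 2000 resamples:

```python
import numpy as np

rng = np.random.default_rng(123)
y = rng.exponential(scale=2.0, size=50)   # hypothetical observed sample
n, B = len(y), 2000

def statistic(sample):
    return np.median(sample)              # the estimate we want a s.e. for

theta_hat = statistic(y)                  # step 1
theta_star = np.empty(B)
for b in range(B):                        # steps 2-4
    idx = rng.integers(0, n, size=n)      # sample indices with replacement
    theta_star[b] = statistic(y[idx])     # step 3

bse = theta_star.std(ddof=1)              # step 5: bootstrap standard error
print(theta_hat, bse)
```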

33 / 171 It is important to notice that... Note that the bootstrap generalizes the jackknife in the sense that resampling is carried out in a random fashion, and not in a deterministic and systematic way ("leave one out").

34 / 171 Parametric versus nonparametric bootstrap. The bootstrap may be performed parametrically or nonparametrically. Nonparametric bootstrap: resampling from $\hat F$; that is, sample from the data (with replacement). Parametric bootstrap: sample from $F(\hat\theta)$. The nonparametric bootstrap is more robust against distributional assumptions, whereas the parametric bootstrap is expected to be more efficient when the parametric assumptions are true.
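As a concrete illustration (a sketch, not from the slides), suppose the data are modeled as exponential with unknown mean. The two resampling schemes then differ only in how each bootstrap sample is drawn:

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.exponential(scale=3.0, size=40)   # hypothetical data
n, B = len(y), 2000
mu_hat = y.mean()                         # MLE of the exponential mean

np_means = np.empty(B)                    # nonparametric bootstrap replicates
pm_means = np.empty(B)                    # parametric bootstrap replicates
for b in range(B):
    y_np = rng.choice(y, size=n, replace=True)     # sample from F-hat
    y_pm = rng.exponential(scale=mu_hat, size=n)   # sample from F(theta-hat)
    np_means[b] = y_np.mean()
    pm_means[b] = y_pm.mean()

print(np_means.std(ddof=1), pm_means.std(ddof=1))  # two s.e. estimates for the mean
```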

35 / 171 Nonparametric bootstrap sampling. EMPIRICAL DISTRIBUTION: puts equal probability $n^{-1}$ at each sample value $y_i$. EMPIRICAL DISTRIBUTION FUNCTION (EDF): $\hat F(y) = \#\{y_i \le y\}/n$. Notice that the values of the EDF are fixed: $(0, 1/n, 2/n, \ldots, n/n)$. Hence, the EDF is equivalent to its points of increase: $y_{(1)} \le \cdots \le y_{(n)}$ (the ordered sample values).

36 / 171 Nonparametric bootstrap sampling (cont.) Since the EDF puts equal probabilities on the data values $y_1, \ldots, y_n$, each bootstrap observation $Y^*$ is independently sampled at random from those data values. Hence, the bootstrap sample is a random sample taken with replacement from the original data.

37 / 171 Smoothed bootstrap. The empirical distribution function $\hat F$ is discrete, and sampling from it boils down to sampling from the data with replacement. An interesting idea: sample from a smoothed distribution function instead. We replace $\hat F$ by a smooth distribution based on, e.g., a kernel density estimate of $f$ (the derivative of $F$ with respect to $y$). An example using the correlation coefficient can be found in Efron's 1982 monograph: Efron, B. (1982). The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: SIAM.
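With a Gaussian kernel, sampling from the kernel density estimate amounts to resampling an observation with replacement and then perturbing it with normal noise whose standard deviation equals the kernel bandwidth. A minimal sketch (the rule-of-thumb bandwidth is an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(size=60)                       # hypothetical data
n = len(y)
h = 1.06 * y.std(ddof=1) * n ** (-1 / 5)      # rule-of-thumb bandwidth (assumption)

def smoothed_bootstrap_sample(y, h, rng):
    # resample from the data, then add Gaussian kernel noise
    base = rng.choice(y, size=len(y), replace=True)
    return base + rng.normal(scale=h, size=len(y))

y_star = smoothed_bootstrap_sample(y, h, rng)
```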

38 / 171 Bayesian bootstrap. Suppose $y_1, \ldots, y_n$ are i.i.d. realizations of $Y$, which has distribution function $F(\theta)$, where $\theta$ is scalar. Let $\hat\theta$ be an estimator of $\theta$. We know that the bootstrap can be used to construct an estimate of the distribution of such an estimator. Instead of sampling from the data with replacement (i.e., sampling each $y_i$ with probability $1/n$), the Bayesian bootstrap uses a posterior probability distribution for $y_i$. The posterior probability distribution is centered at $1/n$ but varies for each $y_i$. How is that done?

39 / 171 Bayesian bootstrap. Draw a random sample of size $n-1$ from the standard uniform distribution. Order the sampled values: $u_{(1)}, \ldots, u_{(n-1)}$. Let $u_{(0)} = 0$ and $u_{(n)} = 1$. Compute $g_i = u_{(i)} - u_{(i-1)}$, $i = 1, \ldots, n$. The $g_i$'s are called the gaps between the uniform order statistics. The vector $g = (g_1, \ldots, g_n)$ is used to assign probabilities in the Bayesian bootstrap: sample $y_i$ with probability $g_i$ (not $1/n$). Note that we obtain a different $g$ in each bootstrap replication.
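The gap construction takes only a few lines of code. The sketch below draws one set of Bayesian-bootstrap weights and uses it to compute a weighted mean (the statistic is just an illustration); repeating this B times gives draws from the approximate posterior of the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=30)   # hypothetical data
n = len(y)

def bayesian_bootstrap_weights(n, rng):
    u = np.sort(rng.uniform(size=n - 1))      # n-1 ordered uniforms
    u = np.concatenate(([0.0], u, [1.0]))     # add u_(0) = 0 and u_(n) = 1
    return np.diff(u)                         # gaps g_1, ..., g_n (they sum to 1)

B = 2000
posterior_means = np.array([
    np.sum(bayesian_bootstrap_weights(n, rng) * y) for _ in range(B)
])
print(posterior_means.mean(), posterior_means.std(ddof=1))
```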

40 / 171 Bayesian bootstrap. ADVANTAGE: It can be used to make Bayesian inference on $\theta$ based on the estimated posterior distribution for $\theta$. The bootstrap distribution of $\hat\theta$ and the Bayesian bootstrap posterior distribution for $\theta$ will be similar in many applications. Reference: Rubin, D.B. (1981). The Bayesian bootstrap. Annals of Statistics, 9, 130–134.

41 / 171 Revisiting the big picture. Diagram: POPULATION → (sampling) → DATA, with model = f(parameters).

42 / 171 Software. Programming bootstrap resampling is easy: (i) PARAMETRIC: sample from $F(\hat\theta)$; (ii) NONPARAMETRIC: sample from $\hat F$ (the empirical distribution function). Sampling from $\hat F$: 1) Obtain a standard uniform draw, i.e., obtain $u$ from $U(0,1)$. 2) Generate a random integer (say, $i^*$) from $\{1, \ldots, n\}$ as $i^* = \lfloor u \cdot n \rfloor + 1$. The corresponding observation in the bootstrap sample is the $i^*$-th observation of the original sample.
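The uniform-draw trick above is all that is needed to resample from $\hat F$ by hand; in practice one can equivalently call an integer-sampling routine directly. A small sketch of both:

```python
import numpy as np

rng = np.random.default_rng(1)
y = np.array([4.0, 7.0, 1.0, 9.0, 3.0])       # hypothetical sample
n = len(y)

# By hand: floor(u * n) + 1 gives an index in {1, ..., n} (1-based)
u = rng.uniform(size=n)
idx_1based = np.floor(u * n).astype(int) + 1
boot_by_hand = y[idx_1based - 1]              # shift to 0-based indexing

# Equivalent: draw the integer indices directly
boot_direct = y[rng.integers(0, n, size=n)]
```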

43 / 171 Software R package boot: functions and datasets for bootstrapping from the book Bootstrap Methods and Their Applications by A.C. Davison and D.V. Hinkley (1997, Cambridge University Press).

44 / 171 Bootstrap bias correction. Suppose that $\hat\theta$ is biased (although consistent) for $\theta$, and that we would like to obtain a new estimate which is bias-corrected up to some order of accuracy. The bias of $\hat\theta$ is $\mathrm{E}(\hat\theta) - \theta$ (the systematic error). Ideally, we would like to use $\hat\theta - \mathrm{bias}$, but this is not feasible (since the bias depends on $\theta$). Define the bias-corrected estimate as $\tilde\theta = \hat\theta - \widehat{\mathrm{bias}}$. We then take $\widehat{\mathrm{bias}}$ to be $\widehat{\mathrm{bias}}_B = \bar\theta^*(\cdot) - \hat\theta$, which implies that $$\tilde\theta = \hat\theta - \{\bar\theta^*(\cdot) - \hat\theta\} = 2\hat\theta - \bar\theta^*(\cdot).$$
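This constant bias correction is easy to code: estimate the bias by the average bootstrap replicate minus the original estimate, then subtract it. A sketch, using the (downward-biased) plug-in variance as the statistic being corrected (an illustrative choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(scale=2.0, size=25)            # hypothetical data
n, B = len(y), 4000

def statistic(sample):
    return sample.var(ddof=0)                 # plug-in variance (biased for sigma^2)

theta_hat = statistic(y)
theta_star = np.array([statistic(y[rng.integers(0, n, size=n)]) for _ in range(B)])

bias_hat = theta_star.mean() - theta_hat      # bootstrap bias estimate
theta_bc1 = 2 * theta_hat - theta_star.mean() # BC1: theta_hat - bias_hat
print(theta_hat, theta_bc1)
```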

45 / 171 We shall call the above bias correction BC1. NOTE: $\bar\theta^*(\cdot)$ is not itself the bootstrap bias-corrected estimate. Let's look into that. (For further details see, e.g., MacKinnon & Smith, Journal of Econometrics, 1998.) Assuming that $\mathrm{E}(\hat\theta)$ exists, write $$\hat\theta = \theta_0 + B(\theta_0, n) + R(\theta_0, n),$$ where $B(\theta_0, n) = \mathrm{E}(\hat\theta) - \theta_0$ (i.e., $B(\cdot,\cdot)$ is the bias function) and $R(\theta_0, n)$ is defined so that the above equation holds. Assume we know the distribution of $Y_i$ up to the unknown parameter $\theta$ (so that we can use the parametric bootstrap).

46 / 171 Noteworthy. If $\hat\theta$ is $\sqrt{n}$-consistent and asymptotically normal, the bias will typically be $O(n^{-1})$. (Otherwise, $\sqrt{n}(\hat\theta - \theta_0)$ would not have mean zero asymptotically.)

47 / 171 Suppose that $B(\theta, n) = B(n)$ for all $\theta$; that is, suppose the bias function is flat. Here it does not matter at which value of $\theta$ we evaluate the bias function, since it is flat. An obvious candidate, however, is $\hat\theta$, the MLE. And what we get here is exactly our bias correction BC1: $$\tilde\theta = \hat\theta - \hat B(\hat\theta, n) = \hat\theta - \{\bar\theta^*(\cdot) - \hat\theta\} = 2\hat\theta - \bar\theta^*(\cdot).$$ NOTE: In many (most?) cases, however, the bias function is not flat.

48 / 171 Suppose now that the bias function is linear in $\theta$, that is, $$B(\theta, n) = \alpha_0 + \alpha_1 \theta. \qquad (\ast\ast)$$ The main idea is to evaluate $(\ast\ast)$ at two points and then solve for $\alpha_0$ and $\alpha_1$. (Note that this will require two sets of simulations!) Obvious choices for the two points at which we evaluate the DGP are $\hat\theta$ and $\tilde\theta$. With bootstrap bias estimates $\hat B$ at $\hat\theta$ and $\tilde B$ at $\tilde\theta$, solving $\hat B = \alpha_0 + \alpha_1\hat\theta$ and $\tilde B = \alpha_0 + \alpha_1\tilde\theta$ gives $$\hat\alpha_1 = \frac{\tilde B - \hat B}{\tilde\theta - \hat\theta}, \qquad \hat\alpha_0 = \hat B - \hat\alpha_1\hat\theta.$$ (NOTE: Here we are using a shorthand notation.) The estimated $\alpha$'s will converge to the true ones as the number of bootstrap replications increases.

49 / 171 The bias-corrected estimator can then be defined implicitly as $\check\theta = \hat\theta - \hat\alpha_0 - \hat\alpha_1\check\theta$. (Here we are evaluating the bias function at $\check\theta$ itself.) The solution, therefore, is $$\check\theta = \frac{1}{1 + \hat\alpha_1}\,(\hat\theta - \hat\alpha_0).$$ We can call the above bias correction BC2.
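A parametric-bootstrap sketch of BC2 (the exponential model, the sample size, and the choice of the second evaluation point are illustrative assumptions, not from the slides). The exponential-rate MLE $\hat\lambda = 1/\bar y$ has bias $\lambda/(n-1)$, which is linear in $\lambda$, so estimating the bias function at two points and solving the linear system works well:

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.exponential(scale=1 / 2.0, size=20)        # hypothetical data, true rate 2
n, B = len(y), 4000

def mle(sample):
    return 1.0 / sample.mean()                     # exponential-rate MLE

def boot_bias(lam, n, B, rng):
    # parametric bootstrap estimate of B(lam, n)
    reps = np.array([mle(rng.exponential(scale=1 / lam, size=n)) for _ in range(B)])
    return reps.mean() - lam

lam_hat = mle(y)
b_hat = boot_bias(lam_hat, n, B, rng)              # bias estimate at theta-hat
lam_tilde = lam_hat - b_hat                        # second evaluation point (assumption)
b_tilde = boot_bias(lam_tilde, n, B, rng)          # bias estimate at the second point

a1 = (b_tilde - b_hat) / (lam_tilde - lam_hat)     # slope of the linear bias function
a0 = b_hat - a1 * lam_hat                          # intercept
lam_bc2 = (lam_hat - a0) / (1 + a1)                # BC2 estimate
print(lam_hat, lam_bc2)
```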

50 / 171 What if the bias function is nonlinear? In that case, we define a bias-corrected estimator as $\check\theta = \hat\theta - B(\check\theta, n)$. One way to implement this is as follows. Start with $\hat B$ obtained as in BC1, i.e., $\hat B = \bar\theta^*(\cdot) - \hat\theta$. Now compute, sequentially, $$\check\theta^{(j)} = (1 - \lambda)\,\check\theta^{(j-1)} + \lambda\,\big(\hat\theta - \hat B(\check\theta^{(j-1)}, n)\big),$$ where $\check\theta^{(0)} = \hat\theta$ and $0 < \lambda \le 1$. Stop when $|\check\theta^{(j)} - \check\theta^{(j-1)}| < \epsilon$ for a sufficiently small $\epsilon$. Suggestion: start with $\lambda = 1$; if the procedure does not converge, try smaller values of $\lambda$.
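The damped fixed-point iteration can reuse a parametric-bootstrap bias function like the `boot_bias(theta, n, B, rng)` helper sketched above for BC2; that helper is an assumption here, not part of the slides. A minimal sketch of the iteration itself:

```python
def nonlinear_bias_correct(theta_hat, boot_bias, n, B, rng,
                           lam=1.0, eps=1e-4, max_iter=50):
    """Iterate theta_j = (1 - lam)*theta_{j-1} + lam*(theta_hat - B(theta_{j-1}, n))."""
    theta_prev = theta_hat                     # theta^(0) = theta_hat
    for _ in range(max_iter):
        theta_next = (1 - lam) * theta_prev + lam * (
            theta_hat - boot_bias(theta_prev, n, B, rng)
        )
        if abs(theta_next - theta_prev) < eps:
            return theta_next
        theta_prev = theta_next
    return theta_prev                          # may not have converged; try a smaller lam
```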

51 / 171 An alternative bootstrap bias estimate was introduced by Efron (1990). It is carried out nonparametrically and uses an auxiliary $(n \times 1)$ resampling vector, whose elements are the proportions of the observations in the original sample $y = (y_1, \ldots, y_n)$ that were included in the bootstrap sample. Let $P^* = (P^*_1, P^*_2, \ldots, P^*_n)$ be the resampling vector. Its $j$-th element ($j = 1, 2, \ldots, n$), $P^*_j$, is defined with respect to a given bootstrap sample $y^* = (y^*_1, \ldots, y^*_n)$ as $$P^*_j = n^{-1}\,\#\{y^*_k = y_j\}.$$ It is important to note that the vector $P^0 = (1/n, 1/n, \ldots, 1/n)$ corresponds to the original sample.

52 / 171 Also, any bootstrap replicate $\hat\theta^*$ can be defined as a function of the resampling vector. For example, if $\hat\theta = s(y) = \bar y = n^{-1}\sum_{i=1}^n y_i$, then $$\hat\theta^* = \frac{y^*_1 + y^*_2 + \cdots + y^*_n}{n} = \frac{(nP^*_1)y_1 + \cdots + (nP^*_n)y_n}{n} = \frac{\#\{y^*_k = y_1\}\,y_1 + \cdots + \#\{y^*_k = y_n\}\,y_n}{n} = P^{*\top} y.$$

53 / 171 Suppose we can write the estimate of interest, obtained from the original sample $y$, as $G(P^0)$. It is then possible to obtain bootstrap estimates $\hat\theta^*_b$ using the resampling vectors $P^*_b$, $b = 1, 2, \ldots, R$, as $G(P^*_b)$. Efron's (1990) bootstrap bias estimate, $\bar B_{\hat F}(\hat\theta, \theta)$, is defined as $$\bar B_{\hat F}(\hat\theta, \theta) = \bar\theta^*(\cdot) - G(\bar P^*), \qquad \text{where } \bar P^* = \frac{1}{R}\sum_{b=1}^{R} P^*_b,$$ which differs from $\hat B_{\hat F}(\hat\theta, \theta)$, since $\hat B_{\hat F}(\hat\theta, \theta) = \bar\theta^*(\cdot) - G(P^0)$. Notice that this bias estimate uses additional information, namely the proportions of the $n$ observations that were selected in each nonparametric resample.

54 / 171 After obtaining an estimate for the bias, it is easy to obtain a bias-adjusted estimator: $$\tilde\theta = s(y) - \bar B_{\hat F}(\hat\theta, \theta) = \hat\theta - \bar\theta^*(\cdot) + G(\bar P^*).$$
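For a statistic that can be written directly as a function of the resampling vector, such as the sample mean with $G(P) = P^{\top}y$, Efron's estimate only requires keeping track of how often each observation was drawn. A sketch (data and resample count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.gamma(shape=2.0, scale=1.5, size=30)        # hypothetical data
n, R = len(y), 3000

def G(P, y):
    return P @ y                                     # statistic as a function of P

P0 = np.full(n, 1.0 / n)                             # resampling vector of the original sample
theta_hat = G(P0, y)

P_star = np.empty((R, n))
theta_star = np.empty(R)
for b in range(R):
    idx = rng.integers(0, n, size=n)
    P_star[b] = np.bincount(idx, minlength=n) / n    # P*_j = #{y*_k = y_j} / n
    theta_star[b] = G(P_star[b], y)

bias_usual = theta_star.mean() - G(P0, y)            # ordinary bootstrap bias estimate
bias_efron = theta_star.mean() - G(P_star.mean(axis=0), y)   # Efron (1990) version
theta_adj = theta_hat - bias_efron                   # bias-adjusted estimate
print(bias_usual, bias_efron, theta_adj)
```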

55 / 171 It is important to note that the bias estimation procedure proposed by Efron (1990) requires the estimator $\hat\theta$ to have a closed form. However, oftentimes the maximum likelihood estimator of $\theta$, the parameter that indexes the model used to represent the population, does not have a closed form. Rather, it needs to be obtained by numerically maximizing the log-likelihood function using a nonlinear optimization algorithm, such as a Newton or quasi-Newton algorithm. Cribari-Neto, Frery and Silva (2002) proposed an adaptation of Efron's method that can be used with estimators that cannot be written in closed form.

56 / 171 They use the resampling vector to modify the log-likelihood function and then maximize the modified log-likelihood. The main idea is to write the log-likelihood function in terms of $P^0$, replace this vector by $\bar P^*$, and then maximize the resulting (modified) log-likelihood function. The maximizer of such a function is a bias-corrected maximum likelihood estimator.
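A minimal sketch of that idea, under the assumption that the log-likelihood of an i.i.d. sample can be written as $\ell(\theta) = n\sum_j P_j \log f(y_j;\theta)$ with $P = P^0$ (which reduces to the ordinary log-likelihood); replacing $P^0$ by $\bar P^*$ and maximizing numerically gives the modified-likelihood estimate. The gamma shape parameter with known scale, and the optimizer settings, are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

rng = np.random.default_rng(9)
y = rng.gamma(shape=3.0, scale=1.0, size=25)          # hypothetical data, known scale = 1
n, R = len(y), 1000

def neg_weighted_loglik(shape, weights):
    # modified log-likelihood: n * sum_j P_j * log f(y_j; shape)
    return -n * np.sum(weights * gamma.logpdf(y, a=shape, scale=1.0))

def mle_from_weights(weights):
    res = minimize_scalar(neg_weighted_loglik, args=(weights,),
                          bounds=(1e-3, 50.0), method="bounded")
    return res.x

P0 = np.full(n, 1.0 / n)
shape_hat = mle_from_weights(P0)                      # ordinary MLE (P = P0)

P_star = np.empty((R, n))
for b in range(R):
    idx = rng.integers(0, n, size=n)
    P_star[b] = np.bincount(idx, minlength=n) / n     # resampling vector of resample b

shape_mod = mle_from_weights(P_star.mean(axis=0))     # maximize with P0 replaced by P-bar*
print(shape_hat, shape_mod)
```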
