Bootstrap (Part 3) Christof Seiler. Stanford University, Spring 2016, Stats 205


Overview

So far we have used three different bootstraps:
- Nonparametric bootstrap on the rows (e.g. regression, PCA with random rows and columns)
- Nonparametric bootstrap on the residuals (e.g. regression)
- Parametric bootstrap (e.g. PCA with fixed rows and columns)

Today, we will look at a trick to improve the bootstrap for confidence intervals: the studentized bootstrap.

Introduction

A statistic is (asymptotically) pivotal if its limiting distribution does not depend on unknown quantities. For example, with observations X_1, \ldots, X_n from a normal distribution with unknown mean and variance, a pivotal quantity is

T(X_1, \ldots, X_n) = \sqrt{n} \, (\hat{\theta} - \theta) / \hat{\sigma}

with the unbiased estimates of the mean and variance

\hat{\theta} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \hat{\theta})^2

Then T(X_1, \ldots, X_n) is a pivot following Student's t-distribution with \nu = n - 1 degrees of freedom, because the distribution of T(X_1, \ldots, X_n) does not depend on \mu or \sigma^2.
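The pivotal property can be checked by simulation; a minimal sketch (the mean 5, standard deviation 2, and replication count are arbitrary choices, precisely because the pivot's distribution does not depend on them):

```r
set.seed(1)
n <- 10
reps <- 10000
# Draw repeated samples from N(5, 2^2) and compute the pivot T for each
tvals <- replicate(reps, {
  x <- rnorm(n, mean = 5, sd = 2)
  sqrt(n) * (mean(x) - 5) / sd(x)
})
# Empirical quantiles of T match Student's t with n - 1 degrees of freedom
quantile(tvals, .975)
qt(.975, df = n - 1)
```

Repeating with a different mean and variance gives the same quantiles, which is exactly what "pivotal" means.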

Introduction

The bootstrap is better at estimating the distribution of a pivotal statistic than that of a nonpivotal statistic. We will see an asymptotic argument using Edgeworth expansions. But first, let us look at an example.

Motivation

Take n = 20 random exponential variables with mean 3:

x = rexp(n, rate = 1/3)

Generate B = 1000 bootstrap samples of x, and calculate the mean for each bootstrap sample:

s = numeric(B)
for (j in 1:B) {
  boot = sample(n, replace = TRUE)
  s[j] = mean(x[boot])
}

Form a confidence interval from the bootstrap samples using quantiles (\alpha = .025):

simple.ci = quantile(s, c(.025, .975))

Repeat this process 100 times and check how often the intervals actually contain the true mean.
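The repetition step described above can be sketched as a small coverage experiment; the seed and the loop structure are my choices, not from the slide:

```r
set.seed(2016)
n <- 20; B <- 1000; true.mean <- 3
covered <- 0
for (rep in 1:100) {
  x <- rexp(n, rate = 1/3)
  # Simple percentile bootstrap interval for the mean
  s <- numeric(B)
  for (j in 1:B) {
    boot <- sample(n, replace = TRUE)
    s[j] <- mean(x[boot])
  }
  ci <- quantile(s, c(.025, .975))
  if (ci[1] <= true.mean && true.mean <= ci[2]) covered <- covered + 1
}
covered  # count out of 100; tends to fall short of the nominal 95 here
```

With a skewed population and only n = 20 observations, the simple percentile interval typically undercovers, which is what motivates the pivotal construction on the next slide.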

Motivation

[Figure: the 100 simple bootstrap confidence intervals (y-axis: repetition 0-100) plotted against the mean (x-axis: 0-8).]

Motivation

Another way is to calculate a pivotal quantity as the bootstrapped statistic. Calculate the mean and standard deviation:

x = rexp(n, rate = 1/3)
mean.x = mean(x)
sd.x = sd(x)

For each bootstrap sample, calculate:

z = numeric(B)
for (j in 1:B) {
  boot = sample(n, replace = TRUE)
  z[j] = (mean.x - mean(x[boot])) / sd(x[boot])
}

Form a confidence interval like this:

pivot.ci = mean.x + sd.x * quantile(z, c(.025, .975))

Motivation

[Figure: the 100 pivotal bootstrap confidence intervals (y-axis: repetition 0-100) plotted against the mean (x-axis: 0-8).]

Studentized Bootstrap

Consider X_1, \ldots, X_n from F. Let \hat{\theta} be an estimate of some parameter \theta, and let \hat{\sigma} be a standard error for \hat{\theta} estimated using the bootstrap. Most of the time, as n grows,

(\hat{\theta} - \theta) / \hat{\sigma} \,\dot\sim\, N(0, 1)

Let z^{(\alpha)} be the 100 \cdot \alpha th percentile of N(0, 1). Then a standard confidence interval with coverage probability 1 - 2\alpha is

\hat{\theta} \pm z^{(1-\alpha)} \hat{\sigma}

As n \to \infty, the bootstrap and standard intervals converge.

Studentized Bootstrap

How can we improve the standard confidence interval? These intervals are valid under the assumption that

Z = (\hat{\theta} - \theta) / \hat{\sigma} \,\dot\sim\, N(0, 1)

But this only holds exactly as n \to \infty, and is approximate for finite n. When \hat{\theta} is the sample mean, a better approximation is

Z = (\hat{\theta} - \theta) / \hat{\sigma} \,\dot\sim\, t_{n-1}

where t_{n-1} is the Student's t distribution with n - 1 degrees of freedom.

Studentized Bootstrap

With this new approximation, we have

\hat{\theta} \pm t^{(1-\alpha)}_{n-1} \hat{\sigma}

As n grows, the t distribution converges to the normal distribution. Intuitively, the t interval widens the standard interval to account for the unknown standard error. But, for instance, it does not account for skewness in the underlying population, which matters when \hat{\theta} is not the sample mean. The studentized bootstrap can adjust for such errors.

Studentized Bootstrap

We estimate the distribution of

Z = (\hat{\theta} - \theta) / \hat{\sigma}

(whose distribution is unknown) by generating B bootstrap samples X^*_1, X^*_2, \ldots, X^*_B and computing

Z^*_b = (\hat{\theta}^*_b - \hat{\theta}) / \hat{\sigma}^*_b

Then the \alpha th percentile of Z^*_b is estimated by the value \hat{t}^{(\alpha)} such that

\#\{Z^*_b \le \hat{t}^{(\alpha)}\} / B = \alpha

which yields the studentized bootstrap interval

(\hat{\theta} - \hat{t}^{(1-\alpha)} \hat{\sigma}, \; \hat{\theta} - \hat{t}^{(\alpha)} \hat{\sigma})
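The interval above can be computed directly for the sample mean; a minimal sketch, assuming the same exponential setting as the motivation example (data, seed, and B are my choices):

```r
set.seed(205)
x <- rexp(20, rate = 1/3)            # skewed sample
n <- length(x); B <- 2000; alpha <- .025
theta.hat <- mean(x)
sigma.hat <- sd(x) / sqrt(n)         # standard error of the mean
# Compute Z*_b = (theta*_b - theta.hat) / sigma*_b for each bootstrap sample
zstar <- numeric(B)
for (b in 1:B) {
  xb <- sample(x, replace = TRUE)
  zstar[b] <- (mean(xb) - theta.hat) / (sd(xb) / sqrt(n))
}
t.lo <- quantile(zstar, alpha)       # \hat{t}^{(alpha)}
t.hi <- quantile(zstar, 1 - alpha)   # \hat{t}^{(1-alpha)}
# Studentized interval: (theta - t.hi * sigma, theta - t.lo * sigma)
ci <- c(theta.hat - t.hi * sigma.hat, theta.hat - t.lo * sigma.hat)
ci
```

Note the interval is generally asymmetric around \hat{\theta}: the bootstrap t-table reflects the skewness of this particular sample.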

Asymptotic Argument in Favor of Pivoting

Consider a parameter \theta estimated by \hat{\theta} with variance \frac{1}{n} \sigma^2. Take the pivotal statistic

S = \sqrt{n} \, (\hat{\theta} - \theta) / \hat{\sigma}

with estimate \hat{\theta} and asymptotic variance estimate \hat{\sigma}^2. Then we can use the Edgeworth expansion

P(S \le x) = \Phi(x) + \frac{1}{\sqrt{n}} q(x) \phi(x) + O(n^{-1})

with \Phi the standard normal distribution function, \phi the standard normal density, and q an even polynomial of degree 2.

Asymptotic Argument in Favor of Pivoting

The bootstrap estimate is

S^* = \sqrt{n} \, (\hat{\theta}^* - \hat{\theta}) / \hat{\sigma}^*

Then we can use the Edgeworth expansion

P(S^* \le x \mid X_1, \ldots, X_n) = \Phi(x) + \frac{1}{\sqrt{n}} \hat{q}(x) \phi(x) + O(n^{-1})

where \hat{q} is obtained by replacing the unknowns in q with bootstrap estimates. Asymptotically, we further have

\hat{q} - q = O(n^{-1/2})

Asymptotic Argument in Favor of Pivoting

Then the bootstrap approximation error for the distribution of S is

P(S \le x) - P(S^* \le x \mid X_1, \ldots, X_n)
= \left( \Phi(x) + \frac{1}{\sqrt{n}} q(x) \phi(x) + O(n^{-1}) \right) - \left( \Phi(x) + \frac{1}{\sqrt{n}} \hat{q}(x) \phi(x) + O(n^{-1}) \right)
= O(n^{-1})

Compare this to the O(n^{-1/2}) error of the normal approximation, which is the same as the error when using the standard (nonpivotal) bootstrap (as can be shown with the same argument).

Studentized Bootstrap

These pivotal intervals are more accurate in large samples than standard intervals and t intervals. Accuracy comes at the cost of generality:
- standard normal tables apply to all samples and all sample sizes
- t tables apply to all samples of a fixed n
- studentized bootstrap tables apply only to the given sample

The studentized bootstrap interval can be asymmetric. It can be used for simple statistics, like the mean, median, trimmed mean, and sample percentiles. But for more general statistics, like the correlation coefficient, there are some problems:
- the interval can fall outside of the allowable range
- computational cost, if both the parameter and its standard error have to be bootstrapped

Studentized Bootstrap

The studentized bootstrap works better for variance-stabilized parameters. Consider a random variable X with mean \theta and a standard deviation s(\theta) that varies as a function of \theta. Using the delta method and solving an ordinary differential equation, we can show that

g(x) = \int^x \frac{1}{s(u)} \, du

will make the variance of g(X) constant. Usually s(u) is unknown, so we need to estimate s(u) = se(\hat{\theta} \mid \theta = u) using the bootstrap.

Studentized Bootstrap

1. First-level bootstrap \hat{\theta}^*; second-level bootstrap \hat{se}(\hat{\theta}^*) from each \hat{\theta}^*
2. Fit a curve through the points (\hat{\theta}^*_1, \hat{se}(\hat{\theta}^*_1)), \ldots, (\hat{\theta}^*_B, \hat{se}(\hat{\theta}^*_B))
3. Obtain the variance stabilization g(\hat{\theta}) by numerical integration
4. Studentized bootstrap using g(\hat{\theta}^*) - g(\hat{\theta}) (no denominator, since the variance is now approximately one)
5. Map back through the inverse transformation g^{-1}

Source: Efron and Tibshirani (1994)
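The five steps above can be sketched in R. This is a rough, minimal implementation: the data, the use of lowess for the curve fit, and the trapezoid rule for the integration are my choices, not prescribed by the lecture:

```r
set.seed(1)
x <- rexp(30, rate = 1/3)          # skewed sample; theta = mean
B1 <- 400; B2 <- 50                # outer and inner bootstrap sizes (small, for speed)
theta.hat <- mean(x)
# Step 1: outer bootstrap of theta, inner bootstrap of its standard error
theta.star <- se.star <- numeric(B1)
for (b in 1:B1) {
  xb <- sample(x, replace = TRUE)
  theta.star[b] <- mean(xb)
  inner <- replicate(B2, mean(sample(xb, replace = TRUE)))
  se.star[b] <- sd(inner)
}
# Step 2: fit a smooth curve s(u) through the points (theta.star, se.star)
fit <- lowess(theta.star, se.star)
# Step 3: variance stabilization g by numerical integration of 1/s(u)
u <- fit$x                          # lowess returns sorted x values
s <- pmax(fit$y, 1e-8)              # guard against non-positive fitted values
g <- cumsum(c(0, diff(u) * (1 / s[-1] + 1 / s[-length(s)]) / 2))  # trapezoid rule
g.fun <- approxfun(u, g, rule = 2)
g.inv <- approxfun(g, u, rule = 2)  # g is increasing, so this inverts it
# Step 4: studentized bootstrap on the stabilized scale (no denominator)
z <- g.fun(theta.star) - g.fun(theta.hat)
ci.g <- g.fun(theta.hat) - quantile(z, c(.975, .025))
# Step 5: map back through g^{-1}
ci <- g.inv(ci.g)
ci
```

The inner bootstrap makes this roughly B1 * B2 times as expensive as a plain bootstrap, which is the computational cost mentioned earlier.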

Studentized Bootstrap in R

library(boot)
mean.fun = function(d, i) {
  m = mean(d$hours[i])
  n = length(i)
  v = (n-1)*var(d$hours[i])/n^2
  c(m, v)
}
air.boot <- boot(aircondit, mean.fun, R = 999)
results = boot.ci(air.boot, type = c("basic", "stud"))

Studentized Bootstrap in R

results
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 999 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = air.boot, type = c("basic", "stud"))
##
## Intervals :
## Level      Basic              Studentized
## 95%   ( 22.2, 171.2 )    ( 49.0, 303.0 )
## Calculations and Intervals on Original Scale

References

- Efron (1987). Better Bootstrap Confidence Intervals
- Hall (1992). The Bootstrap and Edgeworth Expansion
- Efron and Tibshirani (1994). An Introduction to the Bootstrap
- Love (2010). Bootstrap-t Confidence Intervals (blog entry)