Lecture 13: Subsampling vs Bootstrap. Dimitris N. Politis, Joseph P. Romano, Michael Wolf


Lecture 13, 2011

Bootstrap

The root is $R_n(X_n, \theta(P)) = \tau_n(\hat\theta_n - \theta(P))$.

Examples: $\hat\theta_n = \bar X_n$, $\tau_n = \sqrt n$, $\theta(P) = EX = \mu(P)$; or $\hat\theta_n = \min(X_1, \dots, X_n)$, $\tau_n = n$, $\theta(P) = \sup\{x : F(x) = 0\}$.

Define $J_n(P)$, the distribution of $\tau_n(\hat\theta_n - \theta(P))$ under $P$. For real-valued $\hat\theta_n$,
$$J_n(x, P) = \mathrm{Prob}_P\left\{\tau_n(\hat\theta_n - \theta(P)) \le x\right\}.$$
Since $P$ is unknown, $\theta(P)$ is unknown, and $J_n(x, P)$ is also unknown.

The bootstrap estimates $J_n(x, P)$ by $J_n(x, \hat P_n)$, where $\hat P_n$ is a consistent estimate of $P$ in some sense. For example, take the empirical distribution $\hat P_n(x) = \frac1n\sum_{i=1}^n 1(X_i \le x)$, so that $\sup_x |\hat P_n(x) - P(x)| \to 0$ a.s. Similarly, estimate the $(1-\alpha)$th quantile of $J_n(x, P)$ by that of $J_n(x, \hat P_n)$: i.e., estimate $J_n^{-1}(1-\alpha, P)$ by $J_n^{-1}(1-\alpha, \hat P_n)$. Usually $J_n(x, \hat P_n)$ cannot be calculated explicitly, so use Monte Carlo:
$$J_n(x, \hat P_n) \approx \frac1B\sum_{i=1}^B 1\left\{\tau_n(\hat\theta^*_{n,i} - \hat\theta_n) \le x\right\}$$
for $\hat\theta^*_{n,i} = \hat\theta(X^*_{1,i}, \dots, X^*_{n,i})$, with each $(X^*_{1,i}, \dots, X^*_{n,i})$ an i.i.d. sample from $\hat P_n$.
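To make the Monte Carlo step concrete, here is a minimal numpy sketch (the function name `bootstrap_distribution` and its defaults are ours, not from the lecture):

```python
import numpy as np

def bootstrap_distribution(x, theta_hat, tau, B=1000, rng=None):
    """Monte Carlo approximation of J_n(., P_hat_n): resample n points from
    the empirical distribution, recompute the estimator, and collect the B
    roots tau_n * (theta*_{n,i} - theta_hat_n)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    th_n = theta_hat(x)
    roots = np.empty(B)
    for i in range(B):
        x_star = rng.choice(x, size=n, replace=True)  # iid draws from P_hat_n
        roots[i] = tau(n) * (theta_hat(x_star) - th_n)
    return roots

# Example: theta = mean, tau_n = sqrt(n); the 0.95 empirical quantile of the
# roots approximates J_n^{-1}(0.95, P_hat_n).
x = np.random.default_rng(0).exponential(size=200)
print(np.quantile(bootstrap_distribution(x, np.mean, np.sqrt, B=2000), 0.95))
```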

When the bootstrap works, for each $x$,
$$J_n(x, \hat P_n) - J_n(x, P) \xrightarrow{p} 0 \;\Longrightarrow\; J_n^{-1}(1-\alpha, \hat P_n) - J_n^{-1}(1-\alpha, P) \xrightarrow{p} 0.$$
When should the bootstrap work? We need local uniformity in the weak convergence. Usually $J_n(x, P) \to J(x, P)$, and usually $\hat P_n \to P$ a.s. in some sense, say $\sup_x |\hat P_n(x) - P(x)| \xrightarrow{a.s.} 0$. Suppose that for each sequence $P_n$ with $P_n \to P$, say $\sup_x |P_n - P| \to 0$, it is also true that $J_n(x, P_n) \to J(x, P)$; then it must be true that
$$J_n(x, \hat P_n) \xrightarrow{a.s.} J(x, P).$$
So it comes down to showing that $P_n \to P$ implies $J_n(x, P_n) \to J(x, P)$, using a triangular-array formulation.

Case When the Bootstrap Works

Sample mean with finite variance: $\sup_x |\hat F_n(x) - F(x)| \xrightarrow{a.s.} 0$, $\theta(\hat F_n) = \frac1n\sum_{i=1}^n X_i \xrightarrow{a.s.} \theta(F) = E(X)$, and $\sigma^2(\hat F_n) = \frac1n\sum_{i=1}^n (X_i - \bar X)^2 \xrightarrow{a.s.} \sigma^2(F) = \mathrm{Var}(X)$. Apply the Lindeberg–Feller CLT to the triangular array generated by any deterministic sequence $P_n$ such that (1) $\sup_x |P_n(x) - P(x)| \to 0$; (2) $\theta(P_n) \to \theta(P)$; (3) $\sigma^2(P_n) \to \sigma^2(P)$. It can be shown that
$$\sqrt n\,(\bar X_n - \theta(P_n)) \xrightarrow{d} N(0, \sigma^2) \quad\text{under } P_n.$$
Since $\hat P_n$ satisfies (1)–(3) a.s., it follows that $J_n(x, \hat P_n) \xrightarrow{a.s.} J(x, P)$. So local uniformity of the weak convergence holds here.
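A quick simulation (our construction, not from the slides) illustrating this consistency: the bootstrap quantile of $\sqrt n(\bar X^*_n - \bar X_n)$ should be close to the corresponding $N(0, \hat\sigma^2)$ quantile.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 500, 4000
x = rng.exponential(size=n)                       # sigma^2(F) = 1
draws = rng.choice(x, size=(B, n), replace=True)  # B bootstrap samples
roots = np.sqrt(n) * (draws.mean(axis=1) - x.mean())
# Bootstrap 0.95 quantile vs the N(0, sigma_hat^2) quantile 1.6449*sigma_hat:
print(np.quantile(roots, 0.95), 1.6449 * x.std(ddof=1))
```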

Cases When the Bootstrap Fails

Order statistics: $F = U(0, \theta)$, and $X_{(1)}, \dots, X_{(n)}$ are the order statistics of the sample, so $X_{(n)}$ is the maximum:
$$P\left(n\,\frac{\theta - X_{(n)}}{\theta} > x\right) = P\left(X_{(n)} < \theta - \frac{\theta x}{n}\right) = \left[P\left(X_i < \theta - \frac{\theta x}{n}\right)\right]^n = \left(\frac{\theta - \theta x/n}{\theta}\right)^n = \left(1 - \frac xn\right)^n \to e^{-x}.$$
The bootstrap version puts an atom at zero:
$$P^*\left(n(X_{(n)} - X^*_{(n)}) = 0\right) = 1 - \left(1 - \frac1n\right)^n \to 1 - e^{-1} \approx 0.63,$$
whereas the limit law $e^{-x}$ is continuous, so the bootstrap fails.
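A small simulation (ours) makes the atom visible; with $n = 1000$, roughly 63% of the bootstrap roots are exactly zero:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B, theta = 1000, 5000, 1.0
x = rng.uniform(0, theta, size=n)
boot_max = rng.choice(x, size=(B, n), replace=True).max(axis=1)
# Fraction of bootstrap roots n*(X_(n) - X*_(n)) equal to 0: ~ 1 - 1/e = 0.63,
# while the true limit law Exp(1) has no atom at all.
print(np.mean(n * (x.max() - boot_max) == 0))
```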

Degenerate U-statistics: take $w(x, y) = xy$, so $\theta(F) = \iint w(x, y)\,dF(x)\,dF(y) = \mu(F)^2$. If $\mu(F) \ne 0$, it is known that for
$$\hat\theta_n = \theta(\hat F_n) = \frac{1}{n(n-1)}\sum_{i \ne j} X_i X_j, \qquad S(x) = \int w(x, y)\,dF(y) = x\,\mu(F),$$
$$\sqrt n\,(\hat\theta_n - \theta) \xrightarrow{d} N\left(0, 4\,\mathrm{Var}(S(X))\right) = N\left(0, 4\mu^2(EX^2 - \mu^4)\right).$$
The bootstrap works.

But if $\mu(F) = 0$, then $\theta(F) = 0$ and the limit above is degenerate. Now
$$\theta(\hat F_n) = \frac{1}{n(n-1)}\sum_{i \ne j} X_i X_j = \bar X_n^2 - \frac1n S_n^2, \qquad S_n^2 = \frac{1}{n-1}\sum_i (X_i - \bar X_n)^2,$$
and with the faster rate $\tau_n = n$,
$$n\left(\theta(\hat F_n) - \theta(F)\right) = n\bar X_n^2 - S_n^2 \xrightarrow{d} N(0, \sigma^2)^2 - \sigma^2.$$
However, the bootstrap version of $n\left(\theta(\hat F^*_n) - \theta(\hat F_n)\right)$ is
$$n\left(\left[\bar X_n^{*2} - \tfrac1n S_n^{*2}\right] - \left[\bar X_n^2 - \tfrac1n S_n^2\right]\right) = n\bar X_n^{*2} - S_n^{*2} - n\bar X_n^2 + S_n^2 = \left[\sqrt n(\bar X^*_n - \bar X_n)\right]^2 + 2\sqrt n(\bar X^*_n - \bar X_n)\,\sqrt n\,\bar X_n - \left(S_n^{*2} - S_n^2\right) \xrightarrow{d} N(0, \sigma^2)^2 + 2\,N(0, \sigma^2)\,\sqrt n\,\bar X_n.$$
The extra term $2\,N(0, \sigma^2)\,\sqrt n\,\bar X_n$ does not vanish: $\sqrt n\,\bar X_n$ is $O_p(1)$ but random, so the bootstrap distribution does not converge to the correct limit. The bootstrap fails.
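A simulation sketch (our construction) that makes the failure visible: because of the non-vanishing random term, the bootstrap quantiles keep fluctuating across independent data sets instead of stabilizing.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 2000, 2000

def boot_quantile_95(x):
    """0.95 quantile of the bootstrap roots n*(theta_hat* - theta_hat)."""
    th = x.mean()**2 - x.var(ddof=1) / n
    draws = rng.choice(x, size=(B, n), replace=True)
    th_star = draws.mean(axis=1)**2 - draws.var(axis=1, ddof=1) / n
    return np.quantile(n * (th_star - th), 0.95)

# Three independent N(0,1) samples: the bootstrap quantiles disagree markedly,
# reflecting the non-vanishing 2*N(0,s^2)*sqrt(n)*Xbar_n term.
print([round(boot_quantile_95(rng.normal(size=n)), 2) for _ in range(3)])
```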

Subsampling (iid case)

Let $Y_i$ be a block of size $b$ from $(X_1, \dots, X_n)$, $i = 1, \dots, q$, with $q = \binom{n}{b}$, and let $\hat\theta_{n,b,i} = \hat\theta(Y_i)$ be the estimate calculated from the $i$th block of data. Use the empirical distribution of $\tau_b(\hat\theta_{n,b,i} - \hat\theta_n)$ over the $q$ pseudo-estimates to approximate the distribution of $\tau_n(\hat\theta_n - \theta)$: approximate
$$J_n(x, P) = P\left(\tau_n(\hat\theta_n - \theta) \le x\right) \quad\text{by}\quad L_{n,b}(x) = q^{-1}\sum_{i=1}^q 1\left\{\tau_b(\hat\theta_{n,b,i} - \hat\theta_n) \le x\right\}.$$
Claim: if $b \to \infty$, $b/n \to 0$, and $\tau_b/\tau_n \to 0$, then as long as $\tau_n(\hat\theta_n - \theta)$ converges in distribution to some limit,
$$J_n(x, P) - L_{n,b}(x) \xrightarrow{p} 0.$$
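A minimal sketch of this estimator (names ours; since $q = \binom{n}{b}$ is astronomically large, we evaluate a random subset of blocks, a standard stochastic approximation):

```python
import numpy as np

def subsampling_distribution(x, theta_hat, tau, b, n_blocks=2000, rng=None):
    """Empirical subsampling distribution L_{n,b}: recompute the estimator
    on size-b subsets and collect tau_b * (theta_hat_{n,b,i} - theta_hat_n).
    We sample n_blocks of the q = C(n, b) blocks at random."""
    rng = np.random.default_rng(rng)
    th_n = theta_hat(x)
    return np.array([
        tau(b) * (theta_hat(rng.choice(x, size=b, replace=False)) - th_n)
        for _ in range(n_blocks)
    ])

# Works where the bootstrap failed: maximum of U(0, theta) with tau_b = b.
# The atom at 0 has mass ~ b/n -> 0 instead of the bootstrap's ~ 0.63.
x = np.random.default_rng(1).uniform(0, 1, size=1000)
roots = subsampling_distribution(x, np.max, lambda b: b, b=50)
print(np.mean(roots == 0))
```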

Different Motivations for Subsampling vs. Bootstrap

Subsampling: each subset of size $b$ comes from the TRUE model. Since $\tau_n(\hat\theta_n - \theta) \xrightarrow{d} J(x, P)$, as long as $b \to \infty$,
$$\tau_b(\hat\theta_b - \theta) \xrightarrow{d} J(x, P),$$
so the distributions of $\tau_n(\hat\theta_n - \theta)$ and $\tau_b(\hat\theta_b - \theta)$ should be close. But $\tau_b(\hat\theta_b - \theta) = \tau_b(\hat\theta_b - \hat\theta_n) + \tau_b(\hat\theta_n - \theta)$, and since
$$\tau_b(\hat\theta_n - \theta) = O_p\left(\frac{\tau_b}{\tau_n}\right) = o_p(1),$$
the distributions of $\tau_b(\hat\theta_b - \theta)$ and $\tau_b(\hat\theta_b - \hat\theta_n)$ should also be close. The distribution of $\tau_b(\hat\theta_b - \hat\theta_n)$ is estimated by the empirical distribution over the $q = \binom{n}{b}$ pseudo-estimates.

Bootstrap: recalculate the statistic from the ESTIMATED model $\hat P_n$. Given that $\hat P_n$ is close to $P$, hopefully $J_n(x, \hat P_n)$ is close to $J_n(x, P)$ (or to $J(x, P)$, the limit distribution). But when the bootstrap fails,
$$\hat P_n \to P \quad\text{yet}\quad J_n(x, \hat P_n) \not\to J(x, P).$$

Formal Proof of Consistency of Subsampling

Assumptions: $\tau_n(\hat\theta_n - \theta) \xrightarrow{d} J(x, P)$; $b \to \infty$; $b/n \to 0$; $\tau_b/\tau_n \to 0$. Need to show: $L_{n,b}(x) - J(x, P) \xrightarrow{p} 0$. Since $\tau_b(\hat\theta_n - \theta) \xrightarrow{p} 0$, it is enough to show that
$$U_{n,b}(x) = q^{-1}\sum_{i=1}^q 1\left\{\tau_b(\hat\theta_{n,b,i} - \theta) \le x\right\} \xrightarrow{p} J(x, P).$$
$U_{n,b}(x)$ is a $b$th-order U-statistic whose kernel, an indicator, is bounded by 1. Writing
$$U_{n,b}(x) - J(x, P) = \left[U_{n,b}(x) - EU_{n,b}(x)\right] + \left[EU_{n,b}(x) - J(x, P)\right],$$
it is enough to show $U_{n,b}(x) - EU_{n,b}(x) \xrightarrow{p} 0$ and $EU_{n,b}(x) - J(x, P) \to 0$.

But $EU_{n,b}(x) = J_b(x, P) \to J(x, P)$. For the first term, use Hoeffding's exponential inequality for U-statistics (Serfling 1980, Thm A, p. 201): since the kernel lies in $[0, 1]$,
$$P\left(\left|U_{n,b}(x) - J_b(x, P)\right| \ge \epsilon\right) \le 2\exp\left(-2\lfloor n/b\rfloor\,\epsilon^2\right) \to 0 \quad\text{as } n/b \to \infty.$$
So
$$L_{n,b}(x) - J(x, P) = \left[L_{n,b}(x) - U_{n,b}(x)\right] + \left[U_{n,b}(x) - J_b(x, P)\right] + \left[J_b(x, P) - J(x, P)\right] \xrightarrow{p} 0. \qquad\text{Q.E.D.}$$

Time Series

Respect the ordering of the data to preserve the correlation structure: use only contiguous blocks, $\hat\theta_{n,b,t} = \hat\theta_b(X_t, \dots, X_{t+b-1})$, with $q = T - b + 1$, and
$$L_{n,b}(x) = \frac1q\sum_{t=1}^q 1\left\{\tau_b(\hat\theta_{n,b,t} - \hat\theta_n) \le x\right\}.$$
Assumptions: $\tau_n(\hat\theta_n - \theta) \xrightarrow{d} J(x, P)$; $b \to \infty$; $b/n \to 0$; $\tau_b/\tau_n \to 0$; and strong mixing, $\alpha(m) \to 0$. Result: $L_{n,b}(x) - J(x, P) \xrightarrow{p} 0$. The most difficult part is showing $\tau_n(\hat\theta_n - \theta) \xrightarrow{d} J(x, P)$ in the first place.
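A sketch of the time-series variant (names ours); the only change from the iid sketch above is that the blocks are contiguous and ordered:

```python
import numpy as np

def subsampling_ts(x, theta_hat, tau, b):
    """Time-series subsampling: only the q = T - b + 1 contiguous blocks
    X_t, ..., X_{t+b-1} are used, preserving the dependence structure."""
    th_n = theta_hat(x)
    return np.array([tau(b) * (theta_hat(x[t:t + b]) - th_n)
                     for t in range(len(x) - b + 1)])
```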

One can treat iid data as a time series, or even use $k = \lfloor n/b\rfloor$ non-overlapping blocks, but using all $\binom{n}{b}$ blocks is more efficient. For example, if, with $R_{n,b,j}$ denoting the estimate computed on the $j$th non-overlapping block,
$$\bar U_n(x) = k^{-1}\sum_{j=1}^k 1\left\{\tau_b\left[R_{n,b,j} - \theta(P)\right] \le x\right\},$$
then
$$U_{n,b}(x) = E\left[\bar U_n(x) \mid \mathbf X_{(n)}\right] = E\left[1\left\{\tau_b\left[R_{n,b,j} - \theta(P)\right] \le x\right\} \mid \mathbf X_{(n)}\right]$$
for the order statistics $\mathbf X_{(n)} = (X_{(1)}, \dots, X_{(n)})$. By Rao–Blackwell, $U_{n,b}(x)$ is better than $\bar U_n(x)$, since $\mathbf X_{(n)}$ is a sufficient statistic for iid data.

Hypothesis Testing

Let $T_n = \tau_n\,t_n(X_1, \dots, X_n)$, with
$$G_n(x, P) = \mathrm{Prob}_P\left\{\tau_n\,t_n \le x\right\} \to G(x, P) \quad\text{under } P \in \mathbf P_0,$$
$$\hat G_{n,b}(x) = q^{-1}\sum_{i=1}^q 1\left\{T_{n,b,i} \le x\right\} = q^{-1}\sum_{i=1}^q 1\left\{\tau_b\,t_{n,b,i} \le x\right\}.$$
As long as $b \to \infty$ and $b/n \to 0$, then under $P \in \mathbf P_0$: $\hat G_{n,b}(x) \to G(x, P)$. If under $P \in \mathbf P_1$ we have $T_n \to \infty$, then for all $x$, $\hat G_{n,b}(x) \to 0$: the critical value grows only at rate $\tau_b \ll \tau_n$, so the test still rejects. Key difference with confidence intervals: we do not need $\tau_b/\tau_n \to 0$, because there is no need to estimate $\theta_0$; it is assumed known under the null hypothesis.
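A minimal sketch of the resulting test (the names and the particular test statistic are ours):

```python
import numpy as np

def subsampling_test(x, t_stat, tau, b, alpha=0.05, n_blocks=2000, rng=None):
    """Reject H0 when T_n = tau_n * t_n(X) exceeds the (1 - alpha) quantile
    of G_hat_{n,b}. Note: subsampled statistics are recomputed as-is, with
    no centering at theta_hat_n, since theta_0 is fixed by the null."""
    rng = np.random.default_rng(rng)
    T_n = tau(len(x)) * t_stat(x)
    T_b = np.array([tau(b) * t_stat(rng.choice(x, size=b, replace=False))
                    for _ in range(n_blocks)])
    return T_n > np.quantile(T_b, 1 - alpha)

# Example: H0: E X = 0 with t_n = |Xbar_n|, tau_n = sqrt(n); true mean 0.3.
x = np.random.default_rng(2).normal(0.3, 1.0, size=500)
print(subsampling_test(x, lambda z: abs(z.mean()), np.sqrt, b=50))  # True
```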

Estimating the Unknown Rate of Convergence

Assume that $\tau_n = n^\beta$ for some unknown $\beta > 0$. Estimate $\beta$ using the subsampling distribution at different block sizes. Key idea: compare the shape of the empirical distributions of $\hat\theta_b - \hat\theta_n$ for different values of $b$ to infer the value of $\beta$. Let $q = \binom{n}{b}$ for iid data, or $q = T - b + 1$ for time series data, and define
$$L_{n,b}(x \mid \tau_b) = q^{-1}\sum_{a=1}^q 1\left\{\tau_b(\hat\theta_{n,b,a} - \hat\theta_n) \le x\right\}, \qquad L_{n,b}(x \mid 1) = q^{-1}\sum_{a=1}^q 1\left\{(\hat\theta_{n,b,a} - \hat\theta_n) \le x\right\}.$$
This implies $L_{n,b}(x \mid \tau_b) = L_{n,b}(\tau_b^{-1}x \mid 1)$.

Setting $t = L_{n,b}(x \mid \tau_b) = L_{n,b}(\tau_b^{-1}x \mid 1)$ and inverting, $\tau_b^{-1}x = L^{-1}_{n,b}(t \mid 1)$, i.e.
$$L^{-1}_{n,b}(t \mid \tau_b) = x = \tau_b\,L^{-1}_{n,b}(t \mid 1).$$
Since $L_{n,b}(x \mid \tau_b) \xrightarrow{p} J(x, P)$, if $J(x, P)$ is continuous and increasing it can be inferred that
$$L^{-1}_{n,b}(t \mid \tau_b) = J^{-1}(t, P) + o_p(1),$$
which is the same as
$$\tau_b\,L^{-1}_{n,b}(t \mid 1) = J^{-1}(t, P) + o_p(1), \qquad\text{i.e.}\qquad b^\beta L^{-1}_{n,b}(t \mid 1) = J^{-1}(t, P) + o_p(1).$$
Assuming $J^{-1}(t, P) > 0$, i.e. $t > J(0, P)$, take logs.

For two different block sizes $b_1$ and $b_2$, this becomes
$$\beta\log b_1 + \log L^{-1}_{n,b_1}(t \mid 1) = \log J^{-1}(t, P) + o_p(1),$$
$$\beta\log b_2 + \log L^{-1}_{n,b_2}(t \mid 1) = \log J^{-1}(t, P) + o_p(1).$$
Difference out the fixed effect $\log J^{-1}(t, P)$:
$$\beta(\log b_1 - \log b_2) = \log L^{-1}_{n,b_2}(t \mid 1) - \log L^{-1}_{n,b_1}(t \mid 1) + o_p(1).$$
So estimate $\beta$ by
$$\hat\beta = (\log b_1 - \log b_2)^{-1}\left(\log L^{-1}_{n,b_2}(t \mid 1) - \log L^{-1}_{n,b_1}(t \mid 1)\right) = \beta + (\log b_1 - \log b_2)^{-1}\,o_p(1).$$
Taking $b_1 = n^{\gamma_1}$ and $b_2 = n^{\gamma_2}$ with $1 > \gamma_1 > \gamma_2 > 0$,
$$\hat\beta - \beta = (\gamma_1 - \gamma_2)^{-1}(\log n)^{-1}\,o_p(1) = o_p\left((\log n)^{-1}\right).$$
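A sketch of this $\hat\beta$ (the names, the choice $t = 0.75$, and the random-block approximation are ours):

```python
import numpy as np

def estimate_beta(x, theta_hat, b1, b2, t=0.75, n_blocks=2000, rng=None):
    """Estimate beta in tau_n = n^beta from the unnormalized subsampling
    quantiles L^{-1}_{n,b}(t | 1) at block sizes b1 > b2, differencing out
    the common log J^{-1}(t, P). Assumes t > J(0, P), i.e. a positive
    t-quantile."""
    rng = np.random.default_rng(rng)
    th_n = theta_hat(x)

    def log_qt(b):
        roots = [theta_hat(rng.choice(x, size=b, replace=False)) - th_n
                 for _ in range(n_blocks)]
        return np.log(np.quantile(roots, t))

    return (log_qt(b2) - log_qt(b1)) / (np.log(b1) - np.log(b2))

# Example: sample mean, true beta = 1/2; b_i = n^gamma_i, gamma_1 > gamma_2.
x = np.random.default_rng(3).normal(size=4000)
print(estimate_beta(x, np.mean, b1=int(4000**0.7), b2=int(4000**0.4)))
```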

How do we know $t > J(0, P)$? Since $L_{n,b}(0 \mid \tau_b) = L_{n,b}(0 \mid 1) = J(0, P) + o_p(1)$ (the normalization does not matter at $x = 0$), estimating $J(0, P)$ is not a problem. Alternatively, take $t_2 \in (0.5, 1)$ and $t_1 \in (0, 0.5)$:
$$b^\beta\left(L^{-1}_{n,b}(t_2 \mid 1) - L^{-1}_{n,b}(t_1 \mid 1)\right) = J^{-1}(t_2, P) - J^{-1}(t_1, P) + o_p(1),$$
$$\beta\log b + \log\left(L^{-1}_{n,b}(t_2 \mid 1) - L^{-1}_{n,b}(t_1 \mid 1)\right) = \log\left(J^{-1}(t_2, P) - J^{-1}(t_1, P)\right) + o_p(1),$$
$$\hat\beta = (\log b_1 - \log b_2)^{-1}\left[\log\left(L^{-1}_{n,b_2}(t_2 \mid 1) - L^{-1}_{n,b_2}(t_1 \mid 1)\right) - \log\left(L^{-1}_{n,b_1}(t_2 \mid 1) - L^{-1}_{n,b_1}(t_1 \mid 1)\right)\right].$$
Taking $b_1 = n^{\gamma_1}$, $b_2 = n^{\gamma_2}$ ($1 > \gamma_1 > \gamma_2 > 0$) again gives $\hat\beta - \beta = o_p\left((\log n)^{-1}\right)$.

Two-Step Subsampling

Set $\hat\tau_n = n^{\hat\beta}$ and
$$L_{n,b}(x \mid \hat\tau_b) = q^{-1}\sum_{a=1}^q 1\left\{\hat\tau_b(\hat\theta_{n,b,a} - \hat\theta_n) \le x\right\}.$$
One can show that $\sup_x |L_{n,b}(x \mid \hat\tau_b) - J(x, P)| \xrightarrow{p} 0$. Problem: this is imprecise in small samples. For comparison: in variance estimation, the best choice of $b$ gives an $O(n^{-1/3})$ error rate; parameter estimates, if the model is true, give an $O(n^{-1/2})$ error rate; and bootstrapping a pivotal statistic, when applicable, gives an error rate even better than $O(n^{-1/2})$.
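A sketch of the two-step procedure under these definitions (names ours; `beta_hat` would come from the `estimate_beta` sketch above):

```python
import numpy as np

def two_step_subsampling(x, theta_hat, beta_hat, b, n_blocks=2000, rng=None):
    """Subsampling distribution with the estimated rate tau_hat_b = b**beta_hat
    plugged in place of the unknown tau_b."""
    rng = np.random.default_rng(rng)
    th_n = theta_hat(x)
    return np.array([b**beta_hat *
                     (theta_hat(rng.choice(x, size=b, replace=False)) - th_n)
                     for _ in range(n_blocks)])
```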