Resampling Methods

January 1, 2019

Motivation

We have seen many estimators with the property

    $\sqrt{n}(\hat\theta - \theta) \to_d N(0, \sigma^2)$.

We can also write $\hat\theta \sim_a N(\theta, \sigma^2/n)$, where $\sim_a$ means "approximately distributed as". Once we have a consistent estimator $\hat\sigma^2$ of $\sigma^2$, the standard error is defined to be $se = \sqrt{\hat\sigma^2/n}$. A confidence interval with approximate 95% coverage probability is $[\hat\theta \pm 1.96\,se]$. Our strategy for estimating $\sigma^2$ was based on the analogue/plug-in principle, i.e., replace population moments and other unknown quantities by their sample moments and estimates. This requires knowledge of the expression (formula) for $\sigma^2$. There are two computation-intensive resampling approaches that do the estimation without requiring knowledge of the expression for $\sigma^2$.

Similarly, suppose we have some test statistic $W$ and we need to know its distribution under the null hypothesis in order to calculate its quantiles. The approach we took was to find the asymptotic distribution of $W$, which was always standard normal or $\chi^2$. The quantiles of the asymptotic distribution can be found easily since they do not depend on any unknown quantity or parameter, and we use them as approximations to the true quantiles of $W$. Later we will see that there is another approach to approximating the true distribution of $W$.

Resampling methods are now core to modern econometrics. There are at least three motivations behind their popularity.

1. Standard errors are hard to get. Suppose $X_1, \dots, X_n$ is an iid random sample with mean $\mu$ and variance $\sigma^2$. Then the standard error of the sample mean $\hat\mu = n^{-1}\sum_{i=1}^n X_i$ is $se = \sqrt{\hat\sigma^2/n}$, where $\hat\sigma^2 = n^{-1}\sum_{i=1}^n (X_i - \hat\mu)^2$. But now suppose that $X_i$ is continuous with density $f_X$, and assume for simplicity that its CDF $F_X$ is strictly increasing. The population median is $m = F_X^{-1}(1/2)$, i.e., $\Pr(X_i \le m) = 1/2$. We order the data, $X_{(1)} \le X_{(2)} \le \dots \le X_{(n)}$, and define the sample median

    $\hat m = \mathrm{median}\{X_1, \dots, X_n\} = \begin{cases} (X_{(n/2)} + X_{(n/2+1)})/2 & \text{if } n \text{ is even} \\ X_{((n+1)/2)} & \text{if } n \text{ is odd.} \end{cases}$

It is known that $\sqrt{n}(\hat m - m) \to_d N(0, [4 f_X(m)^2]^{-1})$. Constructing a plug-in estimator of the asymptotic variance $[4 f_X(m)^2]^{-1}$ requires tools from nonparametric econometrics, since we need to estimate the density function $f_X$ at the point $m$; there are also some subtle technical issues with this approach. For this problem, resampling methods come to the rescue.
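As a sanity check on the asymptotic variance of the sample median, here is a minimal Monte Carlo sketch (not part of the notes; the standard-normal design, sample size, and replication count are illustrative). For $X \sim N(0,1)$ we have $m = 0$ and $f_X(m) = 1/\sqrt{2\pi}$, so $[4 f_X(m)^2]^{-1}/n = \pi/(2n)$.

```python
import numpy as np

# Monte Carlo check of sqrt(n)*(m_hat - m) ->_d N(0, 1/(4 f(m)^2)).
# For X ~ N(0,1): m = 0, f(m) = 1/sqrt(2*pi), so Var(m_hat) ~ pi/(2n).
rng = np.random.default_rng(0)
n, reps = 200, 5000

medians = np.array([np.median(rng.normal(size=n)) for _ in range(reps)])
mc_var = medians.var()        # simulated variance of the sample median
asy_var = np.pi / (2 * n)     # asymptotic approximation [4 f(0)^2 n]^{-1}

print(mc_var, asy_var)
```

The two numbers should agree up to Monte Carlo noise; in practice the plug-in route would instead require a nonparametric estimate of $f_X(\hat m)$.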
2. Almost nothing else we can do. Suppose $X_1, \dots, X_n$ is an iid random sample. We want to test $H_0$: $X$ is normally distributed, i.e., for some $\mu$ and $\sigma^2$, $X_i \sim N(\mu, \sigma^2)$. Remember that the empirical distribution function $\hat F(x) = n^{-1}\sum_{i=1}^n 1(X_i \le x)$ is consistent for $F_X$. Indeed, we have a much stronger result:

    $\sup_{x \in \mathbb{R}} |\hat F(x) - F_X(x)| \to_p 0$   (Glivenko-Cantelli theorem).

Let $\Phi_{\mu,\sigma}$ be the CDF of $N(\mu, \sigma^2)$. The Kolmogorov-Smirnov test uses the statistic

    $KS = \sqrt{n}\, \sup_{x \in \mathbb{R}} |\hat F(x) - \Phi_{\hat\mu, \hat\sigma}(x)|$,

where $\hat\mu = n^{-1}\sum_{i=1}^n X_i$ and $\hat\sigma^2 = n^{-1}\sum_{i=1}^n (X_i - \hat\mu)^2$. If $H_0$ is true, both $\hat F$ and $\Phi_{\hat\mu,\hat\sigma}$ are consistent for $F_X$, and the statistic $KS$ should be small. So a large observed $KS$ is regarded as evidence against $H_0$, and we reject $H_0$ if $KS > c$. We know that $KS \to_d B$ for some random variable $B$ with a very complicated distribution that depends on unknown parameters, so it is not practically possible to choose $c$ such that $\Pr(B \le c) = 1 - \alpha$. Again, resampling methods come to the rescue.

3. Better coverage accuracy. For the traditional confidence interval $[\hat\theta \pm 1.96\,se]$, we know that $\Pr(\theta \in [\hat\theta \pm 1.96\,se]) \to 95\%$ as $n \to \infty$. Actually, in many cases we can show that $\Pr(\theta \in [\hat\theta \pm 1.96\,se]) = 95\% + O(n^{-1})$, i.e., the error $\Pr(\theta \in [\hat\theta \pm 1.96\,se]) - 95\%$ goes to zero at the rate $n^{-1}$. Some resampling-based confidence intervals $[\hat\theta + t^*_{2.5\%}\,se,\ \hat\theta + t^*_{97.5\%}\,se]$, with new critical values $t^*_{2.5\%}$ and $t^*_{97.5\%}$, have the property $\Pr(\theta \in [\hat\theta + t^*_{2.5\%}\,se,\ \hat\theta + t^*_{97.5\%}\,se]) = 95\% + O(n^{-3/2})$. The error is smaller, so the coverage accuracy of the resampling-based confidence interval is much better.

Jackknife

The jackknife is probably the first-generation resampling method. Suppose $X_1, \dots, X_n$ is an iid random sample; for simplicity, assume $X_i$ is scalar. An estimator $\hat\theta$ can be written as $\hat\theta = \varphi_n(X_1, \dots, X_n)$, e.g., $\varphi_n(z_1, \dots, z_n) = n^{-1}\sum_{i=1}^n z_i$. Suppose we know $\sqrt{n}(\hat\theta - \theta) \to_d N(0, \sigma^2)$ and we want to estimate $\sigma^2$. Denote

    $\hat\theta_{(j)} = \varphi_{n-1}(X_1, \dots, X_{j-1}, X_{j+1}, \dots, X_n)$,

i.e., $\hat\theta_{(j)}$ is the estimator obtained by removing the $j$-th observation from the sample. The variation in $\{\hat\theta_{(j)} : j = 1, \dots, n\}$ should be informative about the population variance of $\hat\theta$; more precisely, it is informative about the population variance of $\hat\theta_{(1)}$. Note that $\hat\theta_{(1)} \sim_a N(\theta, \sigma^2/(n-1))$. Denote $\bar\theta = n^{-1}\sum_{j=1}^n \hat\theta_{(j)}$. It now seems reasonable to think of $\sum_{j=1}^n (\hat\theta_{(j)} - \bar\theta)^2$ as an estimate of $\sigma^2/(n-1)$, and of $(n-1)\sum_{j=1}^n (\hat\theta_{(j)} - \bar\theta)^2$ as an estimate of $\sigma^2$. Indeed, in many cases one can show

    $(n-1) \sum_{j=1}^n (\hat\theta_{(j)} - \bar\theta)^2 \to_p \sigma^2$.   (1)

The jackknife standard error is

    $se_{JK} = \sqrt{\frac{n-1}{n} \sum_{j=1}^n (\hat\theta_{(j)} - \bar\theta)^2}$,

and a jackknife 95% confidence interval is $[\hat\theta \pm 1.96\,se_{JK}]$. If (1) is true, we say that the jackknife is consistent.
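The delete-one construction above can be sketched in a few lines (a minimal sketch; the function name and the test data are illustrative, not from the notes). For the sample mean, $se_{JK}$ equals $s/\sqrt{n}$ exactly, where $s^2$ is the unbiased sample variance.

```python
import numpy as np

def jackknife_se(x, estimator):
    """Jackknife standard error: sqrt((n-1)/n * sum_j (theta_(j) - theta_bar)^2)."""
    x = np.asarray(x)
    n = len(x)
    # delete-one estimates theta_(j), j = 1, ..., n
    theta_j = np.array([estimator(np.delete(x, j)) for j in range(n)])
    theta_bar = theta_j.mean()
    return np.sqrt((n - 1) / n * np.sum((theta_j - theta_bar) ** 2))

rng = np.random.default_rng(1)
x = rng.normal(size=50)
se_jk = jackknife_se(x, np.mean)
se_exact = x.std(ddof=1) / np.sqrt(len(x))   # s / sqrt(n)
print(se_jk, se_exact)
```

The exact agreement for the mean is an algebraic identity; for a general smooth estimator the two only agree asymptotically, via (1).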
Consider the following simple example: for an iid random sample $X_1, \dots, X_n$, we use the sample average $\bar X$ as an estimator of $\mu = E[X_1]$. It is known that $\sqrt{n}(\bar X - \mu) \to_d N(0, \sigma^2)$, where $\sigma^2 = \mathrm{Var}(X_1)$. For this case,

    $\hat\theta_{(j)} = \frac{1}{n-1}(n\bar X - X_j)$, $\quad \bar\theta = \frac{1}{n}\sum_{j=1}^n \hat\theta_{(j)} = \bar X$,

and

    $\hat\theta_{(j)} - \bar\theta = \frac{1}{n-1}(n\bar X - X_j) - \bar X = \frac{1}{n-1}(\bar X - X_j)$.

We have

    $(n-1)\sum_{j=1}^n (\hat\theta_{(j)} - \bar\theta)^2 = \frac{1}{n-1}\sum_{j=1}^n (X_j - \bar X)^2$,

which is the sample variance, a consistent and unbiased estimator of $\sigma^2$. Note that unlike the plug-in approach, the jackknife does not even require knowledge of the expression for $\sigma^2$. The limitation of the jackknife is that (1) is not always true. For the median, (1) fails and the jackknife is inconsistent.

Bootstrap

The second-generation resampling method is the bootstrap. First, let us see how the bootstrap obtains the standard error for estimating the population median and constructs the confidence interval. For an iid random sample $X_1, \dots, X_n$, let $\hat m = \mathrm{median}\{X_1, \dots, X_n\}$. We first independently draw $n$ observations with replacement from $\{X_1, \dots, X_n\}$ and get a set of new observations $X_1^{*(1)}, \dots, X_n^{*(1)}$; the computer can handle this for us. We repeat this resampling procedure again and again, $B$ times, where $B$ is a very large integer (ideally, how big $B$ is depends solely on how powerful our computer is). What we have is $B$ bootstrap samples, and for each bootstrap sample we calculate its sample median:

    $X_1^{*(1)}, \dots, X_n^{*(1)} \;\Rightarrow\; \hat m^{*(1)} = \mathrm{median}\{X_1^{*(1)}, \dots, X_n^{*(1)}\}$
    $X_1^{*(2)}, \dots, X_n^{*(2)} \;\Rightarrow\; \hat m^{*(2)} = \mathrm{median}\{X_1^{*(2)}, \dots, X_n^{*(2)}\}$
    $\quad\vdots$
    $X_1^{*(B)}, \dots, X_n^{*(B)} \;\Rightarrow\; \hat m^{*(B)} = \mathrm{median}\{X_1^{*(B)}, \dots, X_n^{*(B)}\}$

We use the sample variance of $\hat m^{*(1)}, \dots, \hat m^{*(B)}$ as an estimate of the true variance of $\hat m$:

    $\widehat{\mathrm{Var}}_{BS}(\hat m) = \frac{1}{B}\sum_{b=1}^B \left( \hat m^{*(b)} - \frac{1}{B}\sum_{b'=1}^B \hat m^{*(b')} \right)^2$.

The bootstrap standard error is $se_{BS} = \sqrt{\widehat{\mathrm{Var}}_{BS}(\hat m)}$, and an approximate 95% confidence interval using the bootstrap standard error is $[\hat m \pm 1.96\,se_{BS}]$.
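The bootstrap standard error for the median can be sketched directly (a sketch; the data-generating design and $B$ are illustrative). For standard-normal data the answer should be close to $\sqrt{\pi/(2n)}$, without ever estimating $f_X$.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)     # observed sample
n, B = len(x), 2000

# B bootstrap samples: n draws with replacement from the observed data
boot_medians = np.array([
    np.median(rng.choice(x, size=n, replace=True)) for _ in range(B)
])

var_bs = boot_medians.var()          # Var_BS(m_hat)
se_bs = np.sqrt(var_bs)              # bootstrap standard error
ci = (np.median(x) - 1.96 * se_bs, np.median(x) + 1.96 * se_bs)
print(se_bs, ci)
```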
In fact, there is another, seemingly simpler, way to construct the confidence interval. We order the bootstrap sample medians: $\hat m^*_{(1)} \le \hat m^*_{(2)} \le \dots \le \hat m^*_{(B)}$. Suppose for simplicity that $B \cdot 2.5\%$ and $B \cdot 97.5\%$ are both integers. A bootstrap percentile confidence interval is simply $[\hat m^*_{(B \cdot 2.5\%)}, \hat m^*_{(B \cdot 97.5\%)}]$.

The bootstrap procedure we just described is called the nonparametric bootstrap or empirical bootstrap, invented by Professor Bradley Efron in 1979. The nonparametric bootstrap takes the sample as the population. A bootstrap sample is obtained by independently drawing observations from the observed sample with replacement. The bootstrap sample has the same number of observations as the original sample; however, some observations appear several times and others never.

Now we summarize the two procedures we introduced. Suppose we have an estimator which is asymptotically normal: $\sqrt{n}(\hat\theta - \theta) \to_d N(0, \sigma^2)$.

Bootstrap standard errors
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ with each of the bootstrap samples: $\hat\theta^{*(b)}$ for $b = 1, \dots, B$.
Step 3: Estimate the standard error by

    $se_{BS} = \sqrt{\frac{1}{B}\sum_{b=1}^B \left(\hat\theta^{*(b)} - \bar\theta^*\right)^2}$, where $\bar\theta^* = B^{-1}\sum_{b=1}^B \hat\theta^{*(b)}$.

Step 4: The bootstrap standard errors can be used to construct approximate confidence intervals; e.g., if the coverage probability is 95%, a 95% confidence interval is $[\hat\theta \pm 1.96\,se_{BS}]$.

Bootstrap percentile
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ with each of the bootstrap samples: $\hat\theta^{*(b)}$ for $b = 1, \dots, B$.
Step 3: Order the bootstrap replications such that $\hat\theta^*_{(1)} \le \dots \le \hat\theta^*_{(B)}$.
Step 4: The lower and upper confidence bounds are the $B\alpha/2$-th and $B(1-\alpha/2)$-th ordered elements. For $B = 1000$ and $\alpha = 5\%$, these are the 25th and 975th ordered elements. The estimated $1-\alpha$ confidence interval is $[\hat\theta^*_{(B\alpha/2)}, \hat\theta^*_{(B(1-\alpha/2))}]$.

What we did not discuss is whether the bootstrap is correct. We need to show, for bootstrap standard errors, that

    $\frac{se_{BS}^2}{\sigma^2/n} \to_p 1$,   (2)

and, for the bootstrap percentile confidence interval, that

    $\Pr\left(\theta \in [\hat\theta^*_{(B\alpha/2)}, \hat\theta^*_{(B(1-\alpha/2))}]\right) \to 1 - \alpha$   (3)

as $n \to \infty$.
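The two four-step procedures above can be sketched generically (a sketch; function names are illustrative, and the sample mean stands in for a generic estimator).

```python
import numpy as np

def bootstrap_replications(x, estimator, B=1000, seed=0):
    """Steps 1-2: B bootstrap samples and the estimate from each."""
    rng = np.random.default_rng(seed)
    n = len(x)
    return np.array([estimator(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])

def bootstrap_se(theta_star):
    """Step 3 of 'Bootstrap standard errors': sqrt((1/B) sum (theta*_b - mean)^2)."""
    return theta_star.std()

def percentile_ci(theta_star, alpha=0.05):
    """Step 4 of 'Bootstrap percentile': B*alpha/2-th and B*(1-alpha/2)-th ordered elements."""
    return (np.quantile(theta_star, alpha / 2),
            np.quantile(theta_star, 1 - alpha / 2))

rng = np.random.default_rng(3)
x = rng.normal(loc=1.0, size=400)
theta_star = bootstrap_replications(x, np.mean, B=1000)
se = bootstrap_se(theta_star)
lo, hi = percentile_ci(theta_star)
print(se, (lo, hi))
```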
This is a very difficult problem; below we provide some discussion of why the bootstrap works. Bootstrap percentile confidence intervals often have more accurate coverage probabilities (i.e., closer to the nominal coverage probability $1-\alpha$) than the usual confidence intervals based on standard normal quantiles and an estimated variance. The bootstrap percentile method is simple, but it should not be abused. Loosely, it works, in the sense that (3) is true, only if the estimator is asymptotically normal. For instance, suppose we observe a random sample $X_1, \dots, X_n$ from a uniform distribution on $[0, \theta]$, where $\theta > 0$ is unknown. Then $\hat\theta = \max\{X_1, \dots, X_n\}$ is a consistent estimator of $\theta$, and $n(\theta - \hat\theta)$ converges in distribution to an exponential distribution. For this case, (3) fails: the bootstrap percentile method fails to give an asymptotically valid confidence interval.

How/Why the Bootstrap Works

Suppose we have an iid random sample $X_1, \dots, X_n$ with CDF $F_X$, and suppose $S_n = \varphi_n(X_1, \dots, X_n)$ is a statistic. Its distribution depends on $F_X$:

    $F_{S_n}(x) = H_n(x, F_X) = \Pr(\varphi_n(X_1, \dots, X_n) \le x)$.

We know that the empirical CDF $\hat F_X$ is a step function that jumps at each of $X_1, \dots, X_n$ with size $1/n$. So $\hat F_X$ is the CDF of a discrete random variable $Z$ with $X_1, \dots, X_n$ being its possible realizations and $1/n$ being the probability of any of $X_1, \dots, X_n$ being selected:

    $\Pr(Z = X_k) = \frac{1}{n}$, for each $k = 1, 2, \dots, n$.

A random observation drawn from $\{X_1, \dots, X_n\}$ is just a random variable with the same distribution as $Z$, and $n$ observations randomly drawn with replacement from $\{X_1, \dots, X_n\}$ are just a random sample from the distribution $\hat F_X$. So each bootstrap sample is an iid random sample from $\hat F_X$. Note that the distribution here should be interpreted as the conditional distribution given $X_1, \dots, X_n$. Let $X_1^*, \dots, X_n^*$ be an iid random sample from $\hat F_X$ and let $S_n^* = \varphi_n(X_1^*, \dots, X_n^*)$. The conditional CDF of $S_n^*$ given $X_1, \dots, X_n$ is

    $H_n(x, \hat F_X) = \Pr(\varphi_n(X_1^*, \dots, X_n^*) \le x \mid X_1, \dots, X_n)$.

It seems reasonable to estimate $H_n(x, F_X)$ by $H_n(x, \hat F_X)$, since $\hat F_X$ is a very good estimate of $F_X$. Similarly, the variance $\mathrm{Var}(S_n)$ depends on $F_X$ as well, and it can be estimated by $\mathrm{Var}(S_n^* \mid X_1, \dots, X_n)$. The conditional distribution of $X_1^*, \dots, X_n^*$ given the data is known, so we can use computer simulations, known as Monte Carlo simulations, to compute $\mathrm{Var}(S_n^* \mid X_1, \dots, X_n)$.
The computer draws $B$ (very large) iid random samples from $\hat F_X$ for us:

    $X_1^{*(1)}, \dots, X_n^{*(1)} \stackrel{iid}{\sim} \hat F_X$
    $X_1^{*(2)}, \dots, X_n^{*(2)} \stackrel{iid}{\sim} \hat F_X$
    $\quad\vdots$
    $X_1^{*(B)}, \dots, X_n^{*(B)} \stackrel{iid}{\sim} \hat F_X$
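The identification of resampling-with-replacement with iid draws from $\hat F_X$ can be checked numerically (a sketch; data and draw counts are illustrative): the exact conditional moments of $Z \sim \hat F_X$ are the sample moments, and simulated draws match them.

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=30)                  # observed data X_1, ..., X_n
n = len(x)

# Exact conditional moments of Z ~ F_hat: each X_k has probability 1/n
ez = np.sum(x * (1 / n))                 # equals the sample mean
vz = np.sum((x - ez) ** 2 * (1 / n))     # equals (1/n) sum (X_i - X_bar)^2

# Drawing with replacement IS sampling from this discrete distribution
draws = rng.choice(x, size=200_000, replace=True)
print(ez, draws.mean(), vz, draws.var())
```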
These are just $B$ independent bootstrap samples. Then

    $\mathrm{Var}(S_n^* \mid X_1, \dots, X_n) \approx \frac{1}{B}\sum_{b=1}^B \left( \varphi_n(X_1^{*(b)}, \dots, X_n^{*(b)}) - \bar\varphi^* \right)^2$,   (4)

where $\bar\varphi^* = B^{-1}\sum_{b=1}^B \varphi_n(X_1^{*(b)}, \dots, X_n^{*(b)})$ is the bootstrap sample mean. Since $B$ can be arbitrarily large, by the WLLN the right-hand side of (4) should be very close to the left-hand side.

What we put forward is just the intuition for how and why the bootstrap works; the theoretical proofs, including the proofs of the key results (2) and (3), are very difficult. Here is some further intuition. Let $G_n(x) = \Pr(\hat\theta - \theta \le x)$ be the distribution function of $\hat\theta - \theta$. If we knew $G_n$, we could easily construct a confidence interval $[\hat\theta - t_{1-\alpha/2}, \hat\theta - t_{\alpha/2}]$, where $t_\alpha$ is the $\alpha$-quantile of $G_n$: $t_\alpha = G_n^{-1}(\alpha)$. In reality we do not know $G_n$, but we can often show that $G_n$ can be approximated by the distribution function of $N(0, \sigma^2/n)$. This normal approximation requires that $\sigma^2$ can be estimated consistently. What the bootstrap does is an alternative approximation: it suggests the conditional distribution

    $\hat G_n(x) = \Pr(\hat\theta^* - \hat\theta \le x \mid X_1, \dots, X_n)$,

where $\hat\theta^*$ is the bootstrap analogue of $\hat\theta$: it is computed from the bootstrap random sample $X_1^*, \dots, X_n^*$ using the same formula as $\hat\theta$. The bootstrap random sample $X_1^*, \dots, X_n^*$ is iid with CDF $\hat F_X$, and we can use the computer to generate as many samples as we want. $\hat G_n$ is known to us, since the distribution of the bootstrap sample is known, and $\hat G_n$ can be approximated by computer simulations. Indeed, in many cases, especially when $\hat\theta - \theta$ is asymptotically normal, we have

    $\sup_{x \in \mathbb{R}} |\hat G_n(x) - G_n(x)| \to_p 0$,

so the estimation is consistent. But there are exceptions.

Bootstrap Refinement

If we have a plug-in estimator $\hat\sigma$ of $\sigma$ and $\hat\sigma$ is consistent, we have

    $T = \frac{\sqrt{n}(\hat\theta - \theta)}{\hat\sigma} \to_d N(0, 1)$.

Note that $\hat\sigma$ can be written as a function of the data and we know its functional form, so for each bootstrap sample $b = 1, \dots, B$ we can calculate $\hat\sigma^{*(b)}$ using the bootstrap sample. For example, suppose $X_1, \dots, X_n$ is an iid random sample with mean $\mu$ and variance $\sigma^2$. Let $\hat\mu = n^{-1}\sum_{i=1}^n X_i$ and $\hat\sigma^2 = n^{-1}\sum_{i=1}^n (X_i - \hat\mu)^2$. We know $T = \sqrt{n}(\hat\mu - \mu)/\hat\sigma \to_d N(0, 1)$. We can compute $\hat\sigma^*$ as $\hat\sigma^{*2} = n^{-1}\sum_{i=1}^n (X_i^* - \hat\mu^*)^2$, with $\hat\mu^* = n^{-1}\sum_{i=1}^n X_i^*$.
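For the sample mean, the approximation $\hat G_n \approx G_n$ can be seen by comparing a simulated quantile of $\hat G_n$ with the normal-approximation quantile $1.96\,\hat\sigma/\sqrt{n}$ (a sketch; design, $n$, and $B$ are illustrative).

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=500)
n, B = len(x), 5000
theta_hat = x.mean()
sigma_hat = x.std()                      # sqrt of (1/n) sum (X_i - mu_hat)^2

# B draws of theta*_hat - theta_hat, i.e., draws from G_hat_n
deltas = np.array([
    rng.choice(x, size=n, replace=True).mean() - theta_hat for _ in range(B)
])

q_boot = np.quantile(deltas, 0.975)      # 97.5% quantile of G_hat_n
q_norm = 1.96 * sigma_hat / np.sqrt(n)   # normal-approximation counterpart
print(q_boot, q_norm)
```

The two quantiles should be close here; the point of the bootstrap is that $\hat G_n$ remains available even when no formula for $\sigma^2$ is at hand.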
Bootstrap-t
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ and $\sigma$ with each of the bootstrap samples, and compute the t-value for each bootstrap sample:

    $t^{*(b)} = \frac{\sqrt{n}\left(\hat\theta^{*(b)} - \hat\theta\right)}{\hat\sigma^{*(b)}}$, for $b = 1, \dots, B$.

Step 3: Order the bootstrap replications of $t^*$ such that $t^*_{(1)} \le \dots \le t^*_{(B)}$.
Step 4: The lower critical value $t^*_{\alpha/2}$ and the upper critical value $t^*_{1-\alpha/2}$ are the $B\alpha/2$-th and $B(1-\alpha/2)$-th ordered elements. For $B = 1000$ and $\alpha = 5\%$, these are the 25th and 975th ordered elements.

The bootstrap lower and upper critical values generally differ in absolute value. The bootstrap-t confidence interval is $[\hat\theta + t^*_{2.5\%}\,se,\ \hat\theta + t^*_{97.5\%}\,se]$, with $se = \hat\sigma/\sqrt{n}$. A striking result is

    $\Pr\left(\theta \in [\hat\theta + t^*_{2.5\%}\,se,\ \hat\theta + t^*_{97.5\%}\,se]\right) = 95\% + O(n^{-3/2})$,

compared with the confidence interval using the standard normal critical values:

    $\Pr\left(\theta \in [\hat\theta - 1.96\,se,\ \hat\theta + 1.96\,se]\right) = 95\% + O(n^{-1})$.

This is known as the asymptotic refinement of the bootstrap.
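The four bootstrap-t steps can be sketched for the mean (a sketch; all design choices are illustrative). Note the centering: each bootstrap t-value uses $\hat\theta$, the original estimate, in place of the unknown $\theta$.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(loc=2.0, size=300)
n, B, alpha = len(x), 1000, 0.05

theta_hat = x.mean()
sigma_hat = x.std()                       # plug-in sigma_hat

t_star = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    # Step 2: center at theta_hat (the original estimate), not at the truth
    t_star[b] = np.sqrt(n) * (xb.mean() - theta_hat) / xb.std()

# Steps 3-4: ordered replications give the critical values
t_lo = np.quantile(t_star, alpha / 2)       # t*_{2.5%}, typically negative
t_hi = np.quantile(t_star, 1 - alpha / 2)   # t*_{97.5%}

se = sigma_hat / np.sqrt(n)
ci = (theta_hat + t_lo * se, theta_hat + t_hi * se)
print(ci)
```

The resulting interval is generally asymmetric around $\hat\theta$, since $|t^*_{2.5\%}| \ne |t^*_{97.5\%}|$ in finite samples.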
Residual Bootstrap and Wild Bootstrap

Consider the context of linear regression. Our observed data are $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$, and we are interested in the regression coefficients:

    $Y_i = \alpha + \beta X_i + e_i$.

In this case the nonparametric/empirical bootstrap we introduced works well, in the sense that the bootstrap standard errors are consistent and the bootstrap percentile confidence intervals have asymptotically correct coverage probabilities. The empirical bootstrap treats the pair $(X, Y)$ as one object, and each bootstrap sample consists of $n$ independent observations drawn with replacement from $(X_1, Y_1), \dots, (X_n, Y_n)$. There are popular alternatives to the empirical bootstrap. Bootstrap standard errors, percentile confidence intervals, and bootstrap-t are carried out by following the same steps; the only thing that changes is how we resample to get the bootstrap samples.

Let $\hat e_i = Y_i - \hat\alpha - \hat\beta X_i$, where $(\hat\alpha, \hat\beta)$ is the LS estimator. We draw fitted residuals independently with replacement from $\hat e_1, \dots, \hat e_n$. In other words, the bootstrap residuals $\hat e_1^*, \dots, \hat e_n^*$ are an iid random sample with, for each $i = 1, \dots, n$,

    $\Pr(\hat e_i^* = \hat e_k) = \frac{1}{n}$, for each $k = 1, 2, \dots, n$.

Now for each $i = 1, 2, \dots, n$, let $X_i^* = X_i$ and $Y_i^* = \hat\alpha + \hat\beta X_i^* + \hat e_i^*$. Note that the independent variables are the same in all bootstrap samples. This is known as the residual bootstrap.

For the wild bootstrap, let $V_1, \dots, V_n$ be computer-generated independent random variables with mean zero that are also independent of the data. Now for each $i = 1, 2, \dots, n$, let $\hat e_i^* = V_i \hat e_i$, $X_i^* = X_i$, and $Y_i^* = \hat\alpha + \hat\beta X_i^* + \hat e_i^*$. The most popular distribution for the $V$'s is the following two-point "golden ratio" distribution:

    $V_i = \begin{cases} -(\sqrt{5}-1)/2 & \text{with probability } (\sqrt{5}+1)/(2\sqrt{5}) \\ (\sqrt{5}+1)/2 & \text{with probability } (\sqrt{5}-1)/(2\sqrt{5}). \end{cases}$

Its theoretical motivation was provided by Professor Enno Mammen in 1993.

Bootstrap Hypothesis Test

We now consider testing $H_0: \theta = \theta_0$. We can use any of the bootstrap-based confidence intervals and check whether $\theta_0$ is in the confidence interval; e.g., we simply reject $H_0$ if $\theta_0$ fails to be an element of the bootstrap percentile confidence interval. Since the t-statistic $T = \sqrt{n}(\hat\theta - \theta_0)/\hat\sigma \to_d N(0, 1)$ under $H_0$, we can use the standard normal distribution as an approximation to the true distribution of $T$ and define critical values based on standard normal quantiles. Alternatively, we can do the following bootstrap-t test.

Bootstrap-t test
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ and $\sigma$ with each of the bootstrap samples, and compute the t-value for each bootstrap sample:

    $t^{*(b)} = \frac{\sqrt{n}\left(\hat\theta^{*(b)} - \hat\theta\right)}{\hat\sigma^{*(b)}}$, for $b = 1, \dots, B$.

Step 3: Order the bootstrap replications of $t^*$ such that $t^*_{(1)} \le \dots \le t^*_{(B)}$.
Step 4: The lower critical value $t^*_{\alpha/2}$ and the upper critical value $t^*_{1-\alpha/2}$ are the $B\alpha/2$-th and $B(1-\alpha/2)$-th ordered elements. Reject $H_0$ if $T < t^*_{\alpha/2}$ or $T > t^*_{1-\alpha/2}$.

Caution: a common mistake is that in Step 2 one mistakenly computes $\sqrt{n}(\hat\theta^{*(b)} - \theta_0)/\hat\sigma^{*(b)}$. The test will have no power if we make this mistake.
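The bootstrap-t test can be sketched as follows (a sketch for the mean; the design is illustrative and chosen so that $H_0$ is false). The comment in the loop marks exactly the centering that the caution above is about.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=0.5, size=400)   # true mean 0.5, so H0 below is false
n, B, alpha, theta0 = len(x), 999, 0.05, 0.0

theta_hat, sigma_hat = x.mean(), x.std()
T = np.sqrt(n) * (theta_hat - theta0) / sigma_hat   # observed t-statistic

t_star = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    # Correct centering at theta_hat; centering at theta0 would kill the power
    t_star[b] = np.sqrt(n) * (xb.mean() - theta_hat) / xb.std()

t_lo, t_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
reject = (T < t_lo) or (T > t_hi)
print(T, (t_lo, t_hi), reject)
```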
The distribution of the t-statistic $T = \sqrt{n}(\hat\theta - \theta_0)/\hat\sigma$ under $H_1$ is different from that under $H_0$. Under $H_1$, $T$ is not centered:

    $T = \frac{\sqrt{n}(\hat\theta - \theta_0)}{\hat\sigma} = \frac{\sqrt{n}(\hat\theta - \theta)}{\hat\sigma} + \frac{\sqrt{n}(\theta - \theta_0)}{\hat\sigma}$.

An important guideline is that we should always approximate the distribution of $T$ under $H_0$, i.e., the distribution of $\sqrt{n}(\hat\theta - \theta)/\hat\sigma$.
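Finally, the residual- and wild-bootstrap resampling schemes described earlier can be sketched in code (a sketch; the data-generating numbers are illustrative). The assertions at the end check the exact moment identities of Mammen's two-point distribution: $E[V] = 0$, $E[V^2] = 1$, $E[V^3] = 1$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 200
X = rng.uniform(0, 1, size=n)
e = rng.normal(size=n) * (1 + X)          # heteroskedastic errors
Y = 1.0 + 2.0 * X + e                     # alpha = 1, beta = 2

# LS fit and fitted residuals e_hat_i = Y_i - a_hat - b_hat * X_i
b_hat = np.cov(X, Y, ddof=0)[0, 1] / X.var()
a_hat = Y.mean() - b_hat * X.mean()
e_hat = Y - a_hat - b_hat * X

# Residual bootstrap: resample residuals with replacement; X is held fixed
e_star = rng.choice(e_hat, size=n, replace=True)
Y_star_resid = a_hat + b_hat * X + e_star

# Wild bootstrap: e*_i = V_i * e_hat_i with Mammen's two-point distribution
s5 = np.sqrt(5)
v_vals = np.array([-(s5 - 1) / 2, (s5 + 1) / 2])
v_probs = np.array([(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)])
V = rng.choice(v_vals, size=n, p=v_probs)
Y_star_wild = a_hat + b_hat * X + V * e_hat

print(v_vals @ v_probs, (v_vals**2) @ v_probs, (v_vals**3) @ v_probs)
```

Because the wild bootstrap rescales each fitted residual in place, it preserves the heteroskedasticity pattern of the original errors, which the plain residual bootstrap destroys.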