January 1, 2019

Resampling Methods

Motivation

We have seen many estimators with the property
$$\sqrt{n}(\hat\theta - \theta) \to_d N(0, \sigma^2).$$
We can also write $\hat\theta \sim_a N(\theta, \sigma^2/n)$, where $\sim_a$ means "approximately distributed as". Once we have a consistent estimator $\hat\sigma^2$ of $\sigma^2$, the standard error is defined to be $se = \sqrt{\hat\sigma^2/n}$. A confidence interval with approximate 95% coverage probability is $[\hat\theta \pm 1.96\, se]$. Our strategy for estimating $\sigma^2$ was based on the analogue/plug-in principle, i.e., replace population moments/unknown quantities by their sample moments/estimates. This requires knowledge of the expression (formula) for $\sigma^2$. There are two computation-intensive resampling approaches that do the estimation without requiring knowledge of the expression for $\sigma^2$.

Suppose we have some test statistic $W$ and we need to know its distribution under the null hypothesis and calculate its quantiles. The approach we took was to find the asymptotic distribution of $W$, which was always standard normal or $\chi^2$. The quantiles of the asymptotic distribution can be found easily, since it does not depend on any unknown quantity/parameter, and we use them as approximations to the true quantiles of $W$. Later we will see that there is another approach to approximating the true distribution of $W$.

Resampling methods are now core to modern econometrics. There are at least three motivations behind the popularity of resampling methods.

Standard errors are hard to get. Suppose $X_1,\dots,X_n$ is an iid random sample with mean $\mu$ and variance $\sigma^2$. Then the standard error of the sample mean $\hat\mu = \frac{1}{n}\sum_{i=1}^n X_i$ is $se = \sqrt{\hat\sigma^2/n}$, where $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu)^2$. Now suppose instead that $X_i$ is continuous with density $f_X$, and assume for simplicity that its CDF $F_X$ is strictly increasing. The population median is $m = F_X^{-1}(1/2)$, i.e., $\Pr(X_i \le m) = 1/2$. We order the data, $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$, and define the sample median
$$\hat m = \mathrm{median}\{X_1,\dots,X_n\} = \begin{cases} \dfrac{X_{(n/2)} + X_{(n/2+1)}}{2} & \text{if } n \text{ is even} \\ X_{((n+1)/2)} & \text{if } n \text{ is odd.} \end{cases}$$
It is known that
$$\sqrt{n}(\hat m - m) \to_d N\big(0, [4 f_X(m)^2]^{-1}\big).$$
Constructing a plug-in estimator of the asymptotic variance $[4 f_X(m)^2]^{-1}$ requires knowledge of nonparametric econometrics, since we need to estimate the density function $f_X$ at the point $m$. There are also some subtle technical issues with this approach.
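For concreteness, here is what such a plug-in estimate of the median's standard error might look like. This is a minimal sketch of my own, not part of the notes; it assumes a Gaussian kernel density estimate from scipy with its default bandwidth, and the sample, seed, and size are arbitrary choices:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.normal(size=1000)                      # iid sample; true median m = 0

m_hat = np.median(x)
f_hat = gaussian_kde(x)(m_hat)[0]              # nonparametric estimate of f_X(m)
se = np.sqrt(1.0 / (4.0 * f_hat**2) / len(x))  # plug-in se of the sample median

ci = (m_hat - 1.96 * se, m_hat + 1.96 * se)
```

The bandwidth choice hidden inside `gaussian_kde` matters in practice; that is exactly the kind of nonparametric issue alluded to above.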
For this problem, resampling methods come to the rescue.

Almost nothing else we can do. Suppose $X_1,\dots,X_n$ is an iid random sample. We want to test $H_0$: $X$ is normally distributed, i.e., for some $\mu$ and $\sigma$, $X_i \sim N(\mu, \sigma^2)$. Recall that the empirical distribution function $\hat F(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}(X_i \le x)$ is consistent for $F_X$. Indeed, we have a much stronger result:
$$\sup_{x \in \mathbb{R}} \big|\hat F(x) - F_X(x)\big| \to_p 0 \qquad \text{(Glivenko--Cantelli theorem).}$$
Let $\Phi_{\mu,\sigma}$ be the CDF of $N(\mu, \sigma^2)$. The Kolmogorov--Smirnov test uses the statistic
$$KS = \sqrt{n}\, \sup_{x \in \mathbb{R}} \big|\hat F(x) - \Phi_{\hat\mu, \hat\sigma}(x)\big|,$$
where $\hat\mu = \frac{1}{n}\sum_{i=1}^n X_i$ and $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu)^2$. If $H_0$ is true, both $\hat F$ and $\Phi_{\hat\mu,\hat\sigma}$ are consistent for $F_X$, and the statistic $KS$ should be small. So a large observed $KS$ is regarded as evidence against $H_0$: we reject $H_0$ if $KS > c$. We know that $KS \to_d B$ for some random variable $B$ with a very complicated distribution that depends on unknown parameters, so it is not practically possible to choose $c$ such that $\Pr(B \le c) = 1 - \alpha$. Again, for this problem, resampling methods come to the rescue.

Better coverage accuracy. For the traditional confidence interval $[\hat\theta \pm 1.96\, se]$, we know that $\Pr(\theta \in [\hat\theta \pm 1.96\, se]) \to 95\%$ as $n \to \infty$. Actually, in many cases we can show that $\Pr(\theta \in [\hat\theta \pm 1.96\, se]) = 95\% + O(n^{-1})$, i.e., the error $\Pr(\theta \in [\hat\theta \pm 1.96\, se]) - 95\%$ goes to zero at the rate $n^{-1}$. Some resampling-based confidence intervals $[\hat\theta + t^*_{2.5\%}\, se,\ \hat\theta + t^*_{97.5\%}\, se]$, with new critical values $t^*_{2.5\%}$ and $t^*_{97.5\%}$, have the property
$$\Pr\big(\theta \in [\hat\theta + t^*_{2.5\%}\, se,\ \hat\theta + t^*_{97.5\%}\, se]\big) = 95\% + O(n^{-3/2}).$$
So the error is smaller, and the coverage accuracy of the resampling-based confidence interval is much better.

Jackknife

The jackknife is probably the first-generation resampling method. Suppose $X_1,\dots,X_n$ is an iid random sample; for simplicity, assume $X_i$ is scalar. An estimator $\hat\theta$ can be written as $\hat\theta = \varphi_n(X_1,\dots,X_n)$, e.g., $\varphi_n(z_1,\dots,z_n) = \frac{1}{n}\sum_{i=1}^n z_i$. Suppose we know $\sqrt{n}(\hat\theta - \theta) \to_d N(0, \sigma^2)$ and we want to estimate $\sigma^2$. Denote
$$\hat\theta_{(j)} = \varphi_{n-1}(X_1,\dots,X_{j-1},X_{j+1},\dots,X_n),$$
i.e., $\hat\theta_{(j)}$ is the estimator obtained by removing the $j$-th observation from the sample. The variation in $\{\hat\theta_{(j)} : j = 1,\dots,n\}$ should be informative about the population variance of $\hat\theta$. Actually, it is informative about the population variance of $\hat\theta_{(j)}$: note that $\hat\theta_{(j)} \sim_a N(\theta, \sigma^2/(n-1))$. Denote $\bar\theta = \frac{1}{n}\sum_{j=1}^n \hat\theta_{(j)}$. Now it seems reasonable to regard $\frac{1}{n}\sum_{j=1}^n (\hat\theta_{(j)} - \bar\theta)^2$ as an estimate of $\sigma^2/(n-1)^2$, and hence $(n-1)\sum_{j=1}^n (\hat\theta_{(j)} - \bar\theta)^2$ as an estimate of $\sigma^2$. Indeed, in many cases one can show
$$(n-1)\sum_{j=1}^n \big(\hat\theta_{(j)} - \bar\theta\big)^2 \to_p \sigma^2. \qquad (1)$$
The jackknife standard error is
$$se_{JK} = \sqrt{\frac{n-1}{n}\sum_{j=1}^n \big(\hat\theta_{(j)} - \bar\theta\big)^2},$$
and a jackknife 95% confidence interval is $[\hat\theta \pm 1.96\, se_{JK}]$. If (1) is true, we say that the jackknife is consistent.
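In code, the leave-one-out recipe can be sketched in a few lines (a minimal numpy sketch; the function name `jackknife_se` is mine, not part of the notes):

```python
import numpy as np

def jackknife_se(x, estimator):
    """Jackknife se: sqrt((n-1)/n * sum_j (theta_(j) - theta_bar)^2)."""
    n = len(x)
    # Leave-one-out estimates theta_hat_(j), j = 1, ..., n
    theta_loo = np.array([estimator(np.delete(x, j)) for j in range(n)])
    theta_bar = theta_loo.mean()
    return np.sqrt((n - 1) / n * np.sum((theta_loo - theta_bar) ** 2))

# For the sample mean, se_JK reproduces s / sqrt(n) exactly
x = np.arange(10.0)
se = jackknife_se(x, np.mean)
```

The only input besides the data is the estimator itself; no formula for $\sigma^2$ is ever supplied.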
Consider the following simple example: for an iid random sample $X_1,\dots,X_n$, we use the sample average $\bar X$ as an estimator of $\mu = E X_1$. It is known that $\sqrt{n}(\bar X - \mu) \to_d N(0, \sigma^2)$, where $\sigma^2 = \mathrm{Var}(X_1)$. For this case,
$$\hat\theta_{(j)} = \frac{1}{n-1}\big(n\bar X - X_j\big), \qquad \bar\theta = \frac{1}{n}\sum_{j=1}^n \frac{1}{n-1}\big(n\bar X - X_j\big) = \bar X,$$
and
$$\hat\theta_{(j)} - \bar\theta = \frac{1}{n-1}\big(n\bar X - X_j\big) - \bar X = \frac{1}{n-1}\big(\bar X - X_j\big).$$
We have
$$(n-1)\sum_{j=1}^n \big(\hat\theta_{(j)} - \bar\theta\big)^2 = \frac{1}{n-1}\sum_{j=1}^n \big(X_j - \bar X\big)^2,$$
which is the sample variance, a consistent and unbiased estimator of $\sigma^2$. Note that, unlike the plug-in approach, the jackknife does not even require knowledge of the expression for $\sigma^2$. The limitation of the jackknife is that (1) is not always true. For the case of the median, (1) fails and the jackknife is inconsistent.

Bootstrap

The second-generation resampling method is the bootstrap. First, let us see how the bootstrap gets the standard error for estimating the population median and constructs the confidence interval. For an iid random sample $X_1,\dots,X_n$, let $\hat m = \mathrm{median}\{X_1,\dots,X_n\}$. First we independently draw $n$ observations with replacement from $X_1,\dots,X_n$ and get a set of new observations $X_1^{*(1)},\dots,X_n^{*(1)}$; the computer can handle this for us. We repeat this resampling procedure again and again, $B$ times, where $B$ is a very large integer. Ideally, how large $B$ is depends solely on how powerful our computer is. What we have is $B$ bootstrap samples, and for each bootstrap sample we calculate its sample median:
$$X_1^{*(1)},\dots,X_n^{*(1)} \;\Longrightarrow\; \hat m^{*(1)} = \mathrm{median}\big\{X_1^{*(1)},\dots,X_n^{*(1)}\big\}$$
$$X_1^{*(2)},\dots,X_n^{*(2)} \;\Longrightarrow\; \hat m^{*(2)} = \mathrm{median}\big\{X_1^{*(2)},\dots,X_n^{*(2)}\big\}$$
$$\vdots$$
$$X_1^{*(B)},\dots,X_n^{*(B)} \;\Longrightarrow\; \hat m^{*(B)} = \mathrm{median}\big\{X_1^{*(B)},\dots,X_n^{*(B)}\big\}$$
We use the sample variance of $\hat m^{*(1)}, \hat m^{*(2)},\dots,\hat m^{*(B)}$ as an estimate of the true variance of $\hat m$:
$$\widehat{\mathrm{Var}}_{BS}(\hat m) = \frac{1}{B-1}\sum_{b=1}^B \Big(\hat m^{*(b)} - \frac{1}{B}\sum_{b'=1}^B \hat m^{*(b')}\Big)^2.$$
Then the bootstrap standard error is $se_{BS} = \sqrt{\widehat{\mathrm{Var}}_{BS}(\hat m)}$, and an approximate 95% confidence interval using the bootstrap standard error is $[\hat m \pm 1.96\, se_{BS}]$.
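In code, the computation of $se_{BS}$ for the median might look like this (a minimal numpy sketch; the sample, the seed, and $B = 2000$ are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)                  # observed sample
n, B = len(x), 2000

# B bootstrap medians: each bootstrap sample is drawn with replacement from x
m_star = np.array([np.median(rng.choice(x, size=n, replace=True))
                   for _ in range(B)])

se_bs = m_star.std(ddof=1)                # bootstrap standard error of the median
m_hat = np.median(x)
ci = (m_hat - 1.96 * se_bs, m_hat + 1.96 * se_bs)
```

No density estimation is needed here: the variability of the resampled medians does all the work.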
In fact, there is another, seemingly simpler, way to construct the confidence interval. We order the bootstrap sample medians: $\hat m^{*}_{(1)} \le \hat m^{*}_{(2)} \le \cdots \le \hat m^{*}_{(B)}$. Suppose for simplicity that $B \cdot 2.5\%$ and $B \cdot 97.5\%$ are both integers. A bootstrap percentile confidence interval is simply $[\hat m^{*}_{(B \cdot 2.5\%)},\ \hat m^{*}_{(B \cdot 97.5\%)}]$.

The bootstrap procedure we just described is called the nonparametric bootstrap, or empirical bootstrap, invented by Professor Bradley Efron in 1979. The nonparametric bootstrap takes the sample as the population. A bootstrap sample is obtained by independently drawing $n$ observations from the observed sample with replacement. The bootstrap sample has the same number of observations as the original sample; however, some observations appear several times and others never.

Now we summarize the two procedures we introduced. Suppose we have an estimator which is asymptotically normal: $\sqrt{n}(\hat\theta - \theta) \to_d N(0, \sigma^2)$.

Bootstrap standard errors
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ with each of the bootstrap samples, obtaining $\hat\theta^{*(b)}$ for $b = 1,\dots,B$.
Step 3: Estimate the standard error by
$$se_{BS} = \sqrt{\frac{1}{B-1}\sum_{b=1}^B \big(\hat\theta^{*(b)} - \bar\theta^{*}\big)^2}, \qquad \text{where } \bar\theta^{*} = B^{-1}\sum_{b=1}^B \hat\theta^{*(b)}.$$
Step 4: The bootstrap standard errors can be used to construct approximate confidence intervals; e.g., for 95% coverage probability, a 95% confidence interval is $[\hat\theta \pm 1.96\, se_{BS}]$.

Bootstrap percentile
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ with each of the bootstrap samples, obtaining $\hat\theta^{*(b)}$ for $b = 1,\dots,B$.
Step 3: Order the bootstrap replications such that $\hat\theta^{*}_{(1)} \le \cdots \le \hat\theta^{*}_{(B)}$.
Step 4: The lower and upper confidence bounds are the $B\alpha/2$-th and $B(1-\alpha/2)$-th ordered elements. For $B = 1000$ and $\alpha = 5\%$, these are the 25th and 975th ordered elements. The estimated $1 - \alpha$ confidence interval is $[\hat\theta^{*}_{(B\alpha/2)},\ \hat\theta^{*}_{(B(1-\alpha/2))}]$.

What we did not discuss is whether the bootstrap is correct. We need to show, for bootstrap standard errors, that
$$\frac{se_{BS}}{\sqrt{\sigma^2/n}} \to_p 1, \qquad (2)$$
and, for the bootstrap percentile confidence interval, that
$$\Pr\big(\theta \in [\hat\theta^{*}_{(B\alpha/2)},\ \hat\theta^{*}_{(B(1-\alpha/2))}]\big) \to 1 - \alpha \qquad (3)$$
as $n \to \infty$.
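The two four-step procedures can be wrapped into one function for a generic estimator (a minimal numpy sketch; the name `bootstrap` and the defaults are mine):

```python
import numpy as np

def bootstrap(x, stat, B=1000, alpha=0.05, seed=0):
    """Bootstrap standard error and percentile CI for the statistic stat(x)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Steps 1-2: draw B bootstrap samples and compute the replication for each
    theta_star = np.array([stat(rng.choice(x, size=n, replace=True))
                           for _ in range(B)])
    # Step 3 (standard errors): sample standard deviation of the replications
    se_bs = theta_star.std(ddof=1)
    # Steps 3-4 (percentile): the alpha/2 and 1-alpha/2 ordered elements
    lo, hi = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])
    return se_bs, (lo, hi)

x = np.random.default_rng(1).normal(size=300)
se_bs, (lo, hi) = bootstrap(x, np.mean)
```

Any statistic can be passed as `stat` (e.g., `np.median`); the resampling steps never change.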
Establishing this is a very difficult problem. Below we provide some discussion of why the bootstrap works. Bootstrap percentile confidence intervals often have more accurate coverage probabilities (i.e., closer to the nominal coverage probability $1-\alpha$) than the usual confidence intervals based on standard normal quantiles and an estimated variance.

The bootstrap percentile method is simple, but it should not be abused. Loosely, it works, in the sense that (3) is true, only if the estimator is asymptotically normal. Suppose we observe a random sample $X_1,\dots,X_n$ from a uniform distribution on $[0,\theta]$, where $\theta > 0$ is unknown. Then $\hat\theta = \max\{X_1,\dots,X_n\}$ is a consistent estimator for $\theta$, but $n(\theta - \hat\theta)$ converges in distribution to an exponential distribution. For this case, (3) fails: the bootstrap percentile method does not give an asymptotically valid confidence interval.

How/Why Bootstrap Works?

Suppose we have an iid random sample $X_1,\dots,X_n$ with CDF $F_X$, and suppose $S_n = \varphi_n(X_1,\dots,X_n)$ is a statistic. Its distribution depends on $F_X$:
$$F_{S_n}(x) = H_n(x, F_X) = \Pr\big(\varphi_n(X_1,\dots,X_n) \le x\big).$$
We know that the empirical CDF $\hat F_X$ is a step function that jumps at each of $X_1,\dots,X_n$ with size $1/n$. So $\hat F_X$ is the CDF of a discrete random variable $Z$ with $X_1,\dots,X_n$ as its possible realizations, each equally likely to be selected:
$$\Pr(Z = X_k) = \frac{1}{n}, \qquad \text{for each } k = 1,2,\dots,n.$$
A random observation from $X_1,\dots,X_n$ is just a random variable with the same distribution as $Z$, and $n$ observations randomly drawn with replacement from $X_1,\dots,X_n$ are just a random sample from the distribution $\hat F_X$. So each bootstrap sample is an iid random sample from $\hat F_X$. Note that the distribution here should be interpreted as the conditional distribution given $X_1,\dots,X_n$.

Let $X_1^*,\dots,X_n^*$ be an iid random sample from $\hat F_X$, and let $S_n^* = \varphi_n(X_1^*,\dots,X_n^*)$. The conditional CDF of $S_n^*$ given $X_1,\dots,X_n$ is
$$H_n(x, \hat F_X) = \Pr\big(\varphi_n(X_1^*,\dots,X_n^*) \le x \mid X_1,\dots,X_n\big).$$
It seems reasonable to estimate $H_n(x, F_X)$ by $H_n(x, \hat F_X)$, since $\hat F_X$ is a very good estimate of $F_X$. Similarly, the variance $\mathrm{Var}(S_n)$ depends on $F_X$ as well, and it can be estimated by $\mathrm{Var}(S_n^* \mid X_1,\dots,X_n)$. The true conditional distribution of $X_1^*,\dots,X_n^*$ is known, so we can use computer simulations, known as Monte Carlo simulations, to compute $\mathrm{Var}(S_n^* \mid X_1,\dots,X_n)$. The computer draws $B$ (very large) iid random samples from $\hat F_X$ for us:
$$X_1^{*(1)},\dots,X_n^{*(1)} \overset{iid}{\sim} \hat F_X$$
$$X_1^{*(2)},\dots,X_n^{*(2)} \overset{iid}{\sim} \hat F_X$$
$$\vdots$$
$$X_1^{*(B)},\dots,X_n^{*(B)} \overset{iid}{\sim} \hat F_X$$
These are just $B$ independent bootstrap samples. Then
$$\mathrm{Var}(S_n^* \mid X_1,\dots,X_n) \approx \frac{1}{B}\sum_{b=1}^B \Big(\varphi_n\big(X_1^{*(b)},\dots,X_n^{*(b)}\big) - \bar\varphi^*\Big)^2, \qquad (4)$$
where $\bar\varphi^* = B^{-1}\sum_{b=1}^B \varphi_n(X_1^{*(b)},\dots,X_n^{*(b)})$ is the bootstrap sample mean. Since $B$ can be arbitrarily large, by the WLLN the right-hand side of (4) should be very close to the left-hand side.

What we put forward is just the intuition for how and why the bootstrap works. The rigorous theory, including proofs of the key results (2) and (3), is very difficult.

Here is some further intuition. Let $G_n(x) = \Pr(\sqrt{n}(\hat\theta - \theta) \le x)$ be the distribution function of $\sqrt{n}(\hat\theta - \theta)$. If we knew $G_n$, we could easily construct a confidence interval $[\hat\theta - t_{1-\alpha/2}/\sqrt{n},\ \hat\theta - t_{\alpha/2}/\sqrt{n}]$, where $t_\alpha$ is the $\alpha$-quantile of $G_n$: $t_\alpha = G_n^{-1}(\alpha)$. In reality we do not know $G_n$, and we can often show that $G_n$ can be approximated by the distribution function of $N(0, \sigma^2)$. The normal approximation with $N(0, \hat\sigma^2)$ requires that $\sigma^2$ can be estimated consistently. What the bootstrap does is an alternative approximation. It suggests the conditional distribution
$$\hat G_n(x) = \Pr\big(\sqrt{n}(\hat\theta^* - \hat\theta) \le x \mid X_1,\dots,X_n\big),$$
where $\hat\theta^*$ is the bootstrap analogue of $\hat\theta$: $\hat\theta^*$ is computed from the bootstrap random sample $X_1^*,\dots,X_n^*$ using the same formula as $\hat\theta$. The bootstrap random sample $X_1^*,\dots,X_n^*$ is iid with CDF $\hat F_X$, and we can use the computer to generate as many samples as we want. $\hat G_n$ is known to us, since the distribution of the bootstrap sample is known, and $\hat G_n$ can be approximated by computer simulations. Indeed, in many cases, especially when $\sqrt{n}(\hat\theta - \theta)$ is asymptotically normal, we have
$$\sup_{x \in \mathbb{R}} \big|\hat G_n(x) - G_n(x)\big| \to_p 0.$$
So the estimation is consistent. But there are exceptions.

Bootstrap Refinement

If we have a plug-in estimator $\hat\sigma$ for $\sigma$ and the estimator $\hat\sigma$ is consistent, we have
$$T = \frac{\sqrt{n}(\hat\theta - \theta)}{\hat\sigma} \to_d N(0,1).$$
Note that here $\hat\sigma$ can be written as a function of the data, and we know its functional form. For each bootstrap sample $b = 1,\dots,B$, we can therefore calculate $\hat\sigma^{*(b)}$ using the bootstrap sample. For example, suppose $X_1,\dots,X_n$ is an iid random sample with mean $\mu$ and variance $\sigma^2$. Let $\hat\mu = \frac{1}{n}\sum_{i=1}^n X_i$ and $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \hat\mu)^2$. We know $T = \sqrt{n}(\hat\mu - \mu)/\hat\sigma \to_d N(0,1)$. We can compute $\hat\sigma^*$ as $\hat\sigma^{*2} = \frac{1}{n}\sum_{i=1}^n (X_i^* - \hat\mu^*)^2$, with $\hat\mu^* = \frac{1}{n}\sum_{i=1}^n X_i^*$.

Bootstrap-t
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ and $\sigma$ with each of the bootstrap samples, and compute the t-value for each bootstrap sample:
$$t^{*(b)} = \frac{\sqrt{n}\big(\hat\theta^{*(b)} - \hat\theta\big)}{\hat\sigma^{*(b)}}, \qquad b = 1,\dots,B.$$
Step 3: Order the bootstrap replications of $t^*$ such that $t^{*}_{(1)} \le \cdots \le t^{*}_{(B)}$.
Step 4: The lower critical value $t^{*}_{\alpha/2}$ and the upper critical value $t^{*}_{1-\alpha/2}$ are the $B\alpha/2$-th and $B(1-\alpha/2)$-th ordered elements. For $B = 1000$ and $\alpha = 5\%$, these are the 25th and 975th ordered elements.

The bootstrap lower and upper critical values generally differ in absolute value. The bootstrap-t confidence interval is $[\hat\theta + t^{*}_{2.5\%}\, se,\ \hat\theta + t^{*}_{97.5\%}\, se]$, where $se = \hat\sigma/\sqrt{n}$. A striking result is
$$\Pr\big(\theta \in [\hat\theta + t^{*}_{2.5\%}\, se,\ \hat\theta + t^{*}_{97.5\%}\, se]\big) = 95\% + O(n^{-3/2}),$$
compared with the confidence interval using the standard normal critical values:
$$\Pr\big(\theta \in [\hat\theta - 1.96\, se,\ \hat\theta + 1.96\, se]\big) = 95\% + O(n^{-1}).$$
This is known as the asymptotic refinement of the bootstrap.

Residual Bootstrap and Wild Bootstrap

Consider the context of linear regression. Our observed data are $(X_1,Y_1),(X_2,Y_2),\dots,(X_n,Y_n)$, and we are interested in the regression coefficients:
$$Y_i = \alpha + \beta X_i + e_i.$$
In this case, the nonparametric/empirical bootstrap we introduced works well, in the sense that the bootstrap standard errors are consistent and the bootstrap percentile confidence intervals have asymptotically correct coverage probabilities. The empirical bootstrap treats the pair $(X,Y)$ as one object, and each bootstrap sample consists of $n$ independent observations drawn with replacement from the observations $(X_1,Y_1),(X_2,Y_2),\dots,(X_n,Y_n)$.

There are popular alternatives to the empirical bootstrap. Bootstrap standard errors, percentile confidence intervals, and the bootstrap-t are carried out by following the same steps; the only thing that changes is how we resample to get the bootstrap samples. Let $\hat e_i = Y_i - \hat\alpha - \hat\beta X_i$, where $(\hat\alpha, \hat\beta)$ is the LS estimator. We draw fitted residuals independently with replacement from $\hat e_1,\dots,\hat e_n$. In other words, the bootstrap residuals form an iid random sample $\hat e_1^*,\dots,\hat e_n^*$, where for each $i = 1,\dots,n$,
$$\Pr(\hat e_i^* = \hat e_k) = \frac{1}{n}, \qquad \text{for each } k = 1,2,\dots,n.$$
Now for each $i = 1,2,\dots,n$, let $X_i^* = X_i$ and $Y_i^* = \hat\alpha + \hat\beta X_i + \hat e_i^*$. Note that the independent variables are the same in all bootstrap samples. This is known as the residual bootstrap.

For the wild bootstrap, let $V_1,\dots,V_n$ be computer-generated independent random variables with mean zero that are also independent of the data. Now for each $i = 1,2,\dots,n$, let $\hat e_i^* = V_i \hat e_i$, $X_i^* = X_i$, and $Y_i^* = \hat\alpha + \hat\beta X_i + \hat e_i^*$. The most popular distribution for the $V$'s is the following two-point "golden rule" distribution:
$$V_i = \begin{cases} -(\sqrt{5} - 1)/2 & \text{with probability } (\sqrt{5}+1)/(2\sqrt{5}) \\ (\sqrt{5} + 1)/2 & \text{with probability } (\sqrt{5}-1)/(2\sqrt{5}). \end{cases}$$
Its theoretical motivation was provided by Professor Enno Mammen in 1993.

Bootstrap Hypothesis Test

We now consider testing $H_0: \theta = \theta_0$. We can use any of the bootstrap-based confidence intervals and check whether $\theta_0$ lies in the interval; e.g., we simply reject $H_0$ if $\theta_0$ fails to be an element of the bootstrap percentile confidence interval. Since the t-statistic satisfies
$$T = \frac{\sqrt{n}(\hat\theta - \theta_0)}{\hat\sigma} \to_d N(0,1) \quad \text{under } H_0,$$
we can use the standard normal distribution as an approximation to the true distribution of $T$ and define critical values based on standard normal quantiles. Alternatively, we can do the following bootstrap-t test.

Bootstrap-t test
Step 1: Draw $B$ independent bootstrap samples. $B$ can be as large as possible; we can take $B = 1000$.
Step 2: Estimate $\theta$ and $\sigma$ with each of the bootstrap samples, and compute the t-value for each bootstrap sample:
$$t^{*(b)} = \frac{\sqrt{n}\big(\hat\theta^{*(b)} - \hat\theta\big)}{\hat\sigma^{*(b)}}, \qquad b = 1,\dots,B.$$
Step 3: Order the bootstrap replications of $t^*$ such that $t^{*}_{(1)} \le \cdots \le t^{*}_{(B)}$.
Step 4: The lower critical value $t^{*}_{\alpha/2}$ and the upper critical value $t^{*}_{1-\alpha/2}$ are the $B\alpha/2$-th and $B(1-\alpha/2)$-th ordered elements. Reject $H_0$ if $T < t^{*}_{\alpha/2}$ or $T > t^{*}_{1-\alpha/2}$.

Caution: a common mistake is that in Step 2 one mistakenly computes $\sqrt{n}(\hat\theta^{*(b)} - \theta_0)/\hat\sigma^{*(b)}$. The test will have no power if we make this mistake.
The distribution of the t-statistic $T = \sqrt{n}(\hat\theta - \theta_0)/\hat\sigma$ under $H_1$ is different from that under $H_0$. Under $H_1$, $T$ is not centered:
$$T = \frac{\sqrt{n}(\hat\theta - \theta_0)}{\hat\sigma} = \frac{\sqrt{n}(\hat\theta - \theta)}{\hat\sigma} + \frac{\sqrt{n}(\theta - \theta_0)}{\hat\sigma}.$$
An important guideline is that we should always approximate the distribution of $T$ under $H_0$, i.e., the distribution of $\sqrt{n}(\hat\theta - \theta)/\hat\sigma$.
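For the sample mean, the bootstrap-t test with the correct centering at $\hat\theta$ (not $\theta_0$) might be sketched as follows (a minimal numpy sketch; the function name, sample, and defaults are my own choices):

```python
import numpy as np

def bootstrap_t_test(x, theta0, B=1000, alpha=0.05, seed=0):
    """Bootstrap-t test of H0: E[X] = theta0. Returns True if H0 is rejected."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat, sigma_hat = x.mean(), x.std(ddof=1)
    T = np.sqrt(n) * (theta_hat - theta0) / sigma_hat   # observed t-statistic

    # Bootstrap t-values, centered at theta_hat -- NOT at theta0
    t_star = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        t_star[b] = np.sqrt(n) * (xb.mean() - theta_hat) / xb.std(ddof=1)

    # Critical values: the alpha/2 and 1-alpha/2 quantiles of the t-values
    t_lo, t_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    return bool(T < t_lo or T > t_hi)

x = np.random.default_rng(2).normal(size=200)           # true mean is 0
```

Replacing `theta_hat` with `theta0` inside the loop is exactly the no-power mistake cautioned against above: the bootstrap t-values would then drift with $T$ itself instead of mimicking its null distribution.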