Lecture 33: Bootstrap
- Hubert McCarthy
- 5 years ago
Motivation

To evaluate and compare different estimators, we need consistent estimators of variances or asymptotic variances of estimators. This is also important for hypothesis testing and confidence sets.

Let Var(θ̂) be the variance or asymptotic variance of an estimator θ̂.

Traditional approach to estimate Var(θ̂): derivation and substitution.
- First, we derive a theoretical formula.
- Approximation (asymptotic theory) is usually needed.
- The formula may depend on unknown quantities.
- We then substitute unknown quantities by estimators.

Example: the δ-method. Y_1, ..., Y_n are iid (k-dimensional), θ = g(µ) (e.g., a ratio of two components of µ), θ̂ = g(Ȳ), and
    Var(θ̂) ≈ [∇g(µ)]ᵀ Var(Ȳ) ∇g(µ).
An estimator of Var(θ̂) is [∇g(Ȳ)]ᵀ (S²/n) ∇g(Ȳ).
Is the derivative ∇g always easy to derive?

UW-Madison (Statistics), Stat 710, Lecture 33
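The δ-method estimator above can be written out concretely. The sketch below uses a hypothetical ratio g(µ) = µ₁/µ₂ and illustrative bivariate normal data (none of the numbers come from the lecture); it computes [∇g(Ȳ)]ᵀ(S²/n)∇g(Ȳ):

```python
import numpy as np

# Delta-method variance estimator for theta-hat = g(Y-bar) with
# g(mu) = mu_1 / mu_2 (a ratio of two components of mu):
# Var-hat(theta-hat) = [grad g(Y-bar)]^T (S^2 / n) [grad g(Y-bar)].
rng = np.random.default_rng(0)
n = 200
y = rng.multivariate_normal([2.0, 4.0], [[1.0, 0.3], [0.3, 2.0]], size=n)

ybar = y.mean(axis=0)
S2 = np.cov(y, rowvar=False)                               # sample covariance matrix
grad = np.array([1.0 / ybar[1], -ybar[0] / ybar[1] ** 2])  # gradient of mu1/mu2
var_hat = grad @ (S2 / n) @ grad

theta_hat = ybar[0] / ybar[1]
print(theta_hat, var_hat)
```

Here the gradient had to be derived by hand, which is exactly the step the bootstrap will let us avoid.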
An alternative?

Suppose we can independently obtain B copies of the data set X, say X_1, ..., X_B. Then we can calculate θ̂_b = θ̂(X_b), b = 1, ..., B.

The variance of θ̂ can be estimated as
    (1/B) ∑_{b=1}^B ( θ̂_b − (1/B) ∑_{l=1}^B θ̂_l )².

In fact, the cdf G(t) = P(θ̂ ≤ t) can be estimated as
    Ĝ(t) = (1/B) ∑_{b=1}^B I(θ̂_b ≤ t) = (# of b's such that θ̂_b ≤ t)/B,
where I(θ̂_b ≤ t) = 1 if θ̂_b ≤ t and 0 otherwise.

- No derivation is needed.
- These estimators are valid for large B (law of large numbers).
- But typically, we only have one dataset, X.
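When independent copies of the data really can be drawn, the estimators above are plain Monte Carlo averages. A minimal sketch, assuming a hypothetical Exponential(1) population from which B independent samples of size n are drawn, with θ̂ the sample mean (so the true variance of θ̂ is 1/n):

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 50, 2000

# Hypothetical setting where we CAN draw B independent copies of the data:
# each X_b is an iid sample of size n from an Exponential(1) population,
# and theta_hat_b is the sample mean of copy b.
theta_b = np.array([rng.exponential(1.0, size=n).mean() for _ in range(B)])

# Variance estimate: (1/B) sum_b (theta_b - (1/B) sum_l theta_l)^2
var_est = np.mean((theta_b - theta_b.mean()) ** 2)

# Estimate of G(t) = P(theta_hat <= t) at one point t
t = 1.0
G_hat = np.mean(theta_b <= t)

print(var_est, G_hat)
```

With B = 2000 the variance estimate is close to the true value 1/n = 0.02, up to Monte Carlo error; no formula for Var(θ̂) was derived.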
Bootstrap

Can we apply the same idea by creating pseudo-replicate datasets? This means X*_1, ..., X*_B are copies of X, but they are not independent of X (in fact, they are dependent on X).

Is
    (1/B) ∑_{b=1}^B ( θ̂*_b − (1/B) ∑_{l=1}^B θ̂*_l )²
still a valid estimator of Var(θ̂)? The answer to this question depends on
- how the sample X is taken,
- how X*_1, ..., X*_B are constructed,
- the type of the estimator θ̂.

A heuristic description for the bootstrap:
- P: the population producing the data X
- P̂: an estimate of the population based on the data X
- X*: the bootstrap data produced by P̂
A heuristic description for the bootstrap

    real world:      P  →  X   →  θ̂  = θ̂(X)
    bootstrap world: P̂  →  X*  →  θ̂* = θ̂(X*)

Var(θ̂) can be approximated by Var_*(θ̂*), the variance taken under the bootstrap sampling conditioned on X.

If P̂ is close to P, then
- the sampling properties of θ̂*, conditional on X, are close to those of θ̂;
- Var_*(θ̂*) is close to Var(θ̂);
- Ĝ(t) is close to G(t).

Note that Var_*(θ̂*) is a function of X and is an estimator. If it has an explicit form, then it can be directly used. If not, then we use the Monte Carlo approximation
    Var_*(θ̂*) ≈ (1/B) ∑_{b=1}^B ( θ̂*_b − (1/B) ∑_{l=1}^B θ̂*_l )²,
where θ̂*_b = θ̂(X*_b) and X*_1, ..., X*_B are iid bootstrap data sets (copies of X*).

How do we generate X* based on X?
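The Monte Carlo approximation of Var_*(θ̂*) can be sketched generically: one function draws a bootstrap data set from P̂, another computes θ̂*. The resampling choice below (the empirical distribution of illustrative data) is just one possible P̂; names and seeds are my own:

```python
import numpy as np

def mc_bootstrap_variance(sample_from_P_hat, stat, B=2000):
    """Monte Carlo approximation of Var_*(theta*): draw B iid bootstrap
    data sets X*_b from P-hat, compute theta*_b = stat(X*_b), and return
    (1/B) sum_b (theta*_b - (1/B) sum_l theta*_l)^2."""
    theta_star = np.array([stat(sample_from_P_hat()) for _ in range(B)])
    return np.mean((theta_star - theta_star.mean()) ** 2)

# Illustration: P-hat = empirical distribution of x (resample x with
# replacement), theta-hat = sample mean; data and seed are arbitrary.
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=40)
v = mc_bootstrap_variance(
    lambda: rng.choice(x, size=x.size, replace=True), np.mean
)
print(v)
```

Swapping in a different `sample_from_P_hat` gives the parametric or nonparametric versions discussed next.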
Parametric bootstrap

Let X_1, ..., X_n be iid with a cdf F_θ, where θ is an unknown parameter vector and F_θ is known when θ is known. Let θ̂ be an estimator of θ based on X = (X_1, ..., X_n). A parametric bootstrap data set X* = (X*_1, ..., X*_n) is obtained by generating iid X*_1, ..., X*_n from F_θ̂.

Example: location-scale problems

Let F_θ(x) = F_0((x − µ)/σ), where µ = E(X_1), σ² = Var(X_1), and F_0 is a known cdf. Let X̄ be the sample mean, S² be the sample variance, and
    T = √n (X̄ − µ)/S.
The distribution of T does not depend on any parameter. It is the t-distribution with n − 1 degrees of freedom if F_0 is normal; otherwise its explicit form is unknown.
Example (continued)

Let θ̂ = (X̄, S²). Generate iid X*_i, i = 1, ..., n, from F_θ̂. Then
    (X*_i − X̄)/S ~ F_0   and   T* = √n (X̄* − X̄)/S* ~ T,
where X̄* and S*² are the sample mean and variance of the bootstrap data. The parametric bootstrap is perfect here: Var_*(T*) = Var(T). If we calculate Var_*(T*) by Monte Carlo approximation, then the parametric bootstrap is exactly the same as the simulation approach.

In general, if there is a function τ such that
    Var_θ(θ̂) = τ(θ),   X_1, ..., X_n iid from F_θ,
then
    Var_*(θ̂*) = τ(θ̂),   X*_1, ..., X*_n iid from F_θ̂.
Hence, the parametric bootstrap is simply the substitution approach. If θ̂ is consistent and τ is continuous, then τ(θ̂) is consistent. If τ does not have a closed form, we apply the Monte Carlo approximation. In the location-scale example, τ is a constant and hence the bootstrap is perfect.
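The "perfect" location-scale case can be checked numerically. A sketch assuming F_0 is standard normal, so T has exactly the t-distribution with n − 1 degrees of freedom and variance (n − 1)/(n − 3); the data and seed are illustrative:

```python
import numpy as np

# Parametric bootstrap for the pivot T = sqrt(n)(X-bar - mu)/S, assuming
# F0 = standard normal (an assumption; then T ~ t_{n-1} exactly).
rng = np.random.default_rng(0)
n, B = 15, 4000
x = rng.normal(loc=10.0, scale=2.0, size=n)   # observed data
mu_hat, s_hat = x.mean(), x.std(ddof=1)

# Draw X* from F_{theta-hat} = N(mu_hat, s_hat^2) and form T* each time.
t_star = np.empty(B)
for b in range(B):
    xs = rng.normal(mu_hat, s_hat, size=n)
    t_star[b] = np.sqrt(n) * (xs.mean() - mu_hat) / xs.std(ddof=1)

var_T_star = t_star.var()
# T ~ t_{n-1}, whose variance is (n-1)/(n-3); Var_*(T*) should match it.
print(var_T_star, (n - 1) / (n - 3))
```

The bootstrap distribution of T* does not depend on the observed x at all, which is exactly the "τ = constant" statement above.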
Example

Let X_1, ..., X_n be iid from F_θ. Define µ = µ(θ) = E_θ(X_1) and µ_j = µ_j(θ) = E_θ(X_1 − µ)^j, j = 2, 3, 4. Consider the estimation of µ² by X̄². A direct calculation shows that
    Var_θ(X̄²) = 4[µ(θ)]² µ₂(θ)/n + 4 µ(θ) µ₃(θ)/n² + µ₄(θ)/n³.
Based on the previous discussion, the parametric bootstrap variance estimator is
    Var_*(X̄*²) = 4[µ(θ̂)]² µ₂(θ̂)/n + 4 µ(θ̂) µ₃(θ̂)/n² + µ₄(θ̂)/n³.
It is a consistent estimator if µ and µ_j, j = 2, 3, 4, are continuous functions. If we apply the asymptotic approach, then we estimate Var_θ(X̄²) by 4[µ(θ̂)]² µ₂(θ̂)/n.
Nonparametric bootstrap

Without any model, we can apply the simple nonparametric bootstrap. If X = (X_1, ..., X_n) with X_1, ..., X_n iid, then P is the cdf of X_1 and P̂ is the empirical cdf based on X_1, ..., X_n. Generating iid bootstrap data X*_1, ..., X*_n from P̂ is the same as taking a simple random sample with replacement from X.

Property of Var_*(θ̂*)

Consider first θ̂ = X̄, the sample mean; then θ̂* = X̄*, the sample mean of X*_1, ..., X*_n. We have
    E_*(X̄*) = (1/n) ∑_{i=1}^n E_*(X*_i) = X̄
and
    Var_*(X̄*) = (1/n²) ∑_{i=1}^n Var_*(X*_i) = (1/n²) ∑_{j=1}^n (X_j − X̄)² = [(n − 1)/n] · S²/n.
When n is small, we may make an adjustment by the factor n/(n − 1), which gives exactly S²/n.
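The closed form Var_*(X̄*) = (n − 1)S²/n² can be compared with the Monte Carlo approximation it makes unnecessary. A sketch with illustrative Gamma data (numbers and seed are my own):

```python
import numpy as np

# Check the closed form Var_*(X-bar*) = (n-1) S^2 / n^2 against a Monte
# Carlo resampling approximation.
rng = np.random.default_rng(0)
n, B = 30, 20000
x = rng.gamma(shape=2.0, scale=1.0, size=n)

exact = (n - 1) * x.var(ddof=1) / n**2   # = (1/n^2) * sum_j (x_j - xbar)^2
mc = np.var([rng.choice(x, n, replace=True).mean() for _ in range(B)])

print(exact, mc)   # the two agree up to Monte Carlo error
```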
Property of Var_*(θ̂*)

Consider next the estimation of g(µ), where µ = E(X_1) and g is a continuously differentiable function. Our estimator is θ̂ = g(X̄); the bootstrap analog is θ̂* = g(X̄*). When n is large,
    g(X̄*) ≈ g(X̄) + g′(X̄)(X̄* − X̄).
Hence,
    Var_*(θ̂*) = Var_*[g(X̄*)] ≈ [g′(X̄)]² Var_*(X̄* − X̄) = [g′(X̄)]² Var_*(X̄*) ≈ (1/n)[g′(X̄)]² S².
This result can be extended to multivariate X_i.
For θ̂ = g(X̄),
    Var_*(θ̂*) ≈ [∇g(X̄)]ᵀ S² ∇g(X̄)/n,
the delta-method variance estimator, where
    S² = (1/(n − 1)) ∑_{i=1}^n (X_i − X̄)(X_i − X̄)ᵀ
is the sample covariance matrix.

Example

Let X_1, ..., X_n be iid from F. Define µ = E(X_1) and µ_j = E(X_1 − µ)^j, j = 2, 3, 4. Consider the estimation of µ² by X̄². We still have
    Var(X̄²) = 4µ² µ₂/n + 4µ µ₃/n² + µ₄/n³
and
    Var_*(X̄*²) = 4X̄² m₂/n + 4X̄ m₃/n² + m₄/n³,
Example (continued)

where m_j = (1/n) ∑_{i=1}^n (X_i − X̄)^j, j = 2, 3, 4. This is because the mean of the empirical cdf F̂ is X̄ and the jth central moment of F̂ is m_j.

In this case we have an explicit form for the bootstrap variance estimator Var_*(X̄*²), so no Monte Carlo is needed. This bootstrap variance estimator is consistent, since the sample moments m_j are consistent for the µ_j by the WLLN.

Since g′(x) = 2x when g(x) = x², the approximation derived earlier shows that
    Var_*(X̄*²) ≈ 4X̄² m₂/n,
which is also consistent, since the terms ignored are of the orders n⁻² and n⁻³. In fact, the delta-method produces the variance estimator [g′(X̄)]² S²/n = 4X̄² S²/n.
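The explicit estimator and its leading term can be compared with a Monte Carlo approximation. A sketch with illustrative exponential data; the small discrepancy is Monte Carlo error plus lower-order terms:

```python
import numpy as np

# Nonparametric bootstrap variance of X-bar^2 via sample central moments
# m_j, its leading term 4 X-bar^2 m_2 / n, and a Monte Carlo check.
rng = np.random.default_rng(0)
n, B = 40, 20000
x = rng.exponential(2.0, size=n)

xbar = x.mean()
m2, m3, m4 = (np.mean((x - xbar) ** j) for j in (2, 3, 4))

explicit = 4 * xbar**2 * m2 / n + 4 * xbar * m3 / n**2 + m4 / n**3
leading = 4 * xbar**2 * m2 / n
mc = np.var([rng.choice(x, n, replace=True).mean() ** 2 for _ in range(B)])

print(explicit, leading, mc)
```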
The sample median

Consider the sample median Q̂_{1/2} = F̂⁻¹(1/2), where F̂ is the empirical cdf. For simplicity, assume that n = 2m − 1 for an integer m; then Q̂_{1/2} = X_(m), the mth order statistic. Let X_1, ..., X_n be iid from F. Then
    p*_k = P_*{X*_(m) = X_(k) | X_1, ..., X_n}
         = ∑_{j=0}^{m−1} C(n, j) [ ((k − 1)/n)^j ((n − k + 1)/n)^{n−j} − (k/n)^j ((n − k)/n)^{n−j} ].
This shows that the bootstrap variance estimator for the sample median is
    Var_*(X*_(m)) = ∑_{k=1}^n p*_k ( X_(k) − ∑_{j=1}^n p*_j X_(j) )².
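The probabilities p*_k can be computed exactly, with no resampling at all. A sketch assuming distinct data values (ties would change P_*(X*_i ≤ X_(k))); the data are illustrative:

```python
import numpy as np
from math import comb

def median_bootstrap_probs(n):
    """Exact bootstrap probabilities p*_k = P_*(X*_(m) = X_(k)) for the
    sample median when n = 2m - 1 (data values assumed distinct)."""
    assert n % 2 == 1
    m = (n + 1) // 2
    p = np.empty(n)
    for k in range(1, n + 1):
        p[k - 1] = sum(
            comb(n, j)
            * (((k - 1) / n) ** j * ((n - k + 1) / n) ** (n - j)
               - (k / n) ** j * ((n - k) / n) ** (n - j))
            for j in range(m)
        )
    return p

# Bootstrap variance of the sample median from the exact p*_k's.
rng = np.random.default_rng(0)
n = 11
x = np.sort(rng.normal(size=n))          # x[k-1] plays the role of X_(k)
p = median_bootstrap_probs(n)
var_star = np.sum(p * (x - np.sum(p * x)) ** 2)
print(p.sum(), var_star)                 # the p*_k sum to 1
```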
Discussion

In general, the expression Var_*(θ̂*) is complicated and not explicit, so Monte Carlo approximation is necessary. Indeed, the point of the bootstrap is not to derive its explicit form (which involves complex derivations): the bootstrap replaces theoretical derivations by repeated computations. The user does not need to do theoretical derivations; however, they should be told when the bootstrap produces correct variance estimators and how to carry it out. Research on bootstrap methodology still requires theoretical derivations.
Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.
Jauary 1, 2019 Resamplig Methods Motivatio We have so may estimators with the property θ θ d N 0, σ 2 We ca also write θ a N θ, σ 2 /, where a meas approximately distributed as Oce we have a cosistet estimator
More informationLecture 19: Convergence
Lecture 19: Covergece Asymptotic approach I statistical aalysis or iferece, a key to the success of fidig a good procedure is beig able to fid some momets ad/or distributios of various statistics. I may
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationDirection: This test is worth 150 points. You are required to complete this test within 55 minutes.
Term Test 3 (Part A) November 1, 004 Name Math 6 Studet Number Directio: This test is worth 10 poits. You are required to complete this test withi miutes. I order to receive full credit, aswer each problem
More informationDirection: This test is worth 250 points. You are required to complete this test within 50 minutes.
Term Test October 3, 003 Name Math 56 Studet Number Directio: This test is worth 50 poits. You are required to complete this test withi 50 miutes. I order to receive full credit, aswer each problem completely
More informationStatistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.
Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized
More informationLecture 20: Multivariate convergence and the Central Limit Theorem
Lecture 20: Multivariate covergece ad the Cetral Limit Theorem Covergece i distributio for radom vectors Let Z,Z 1,Z 2,... be radom vectors o R k. If the cdf of Z is cotiuous, the we ca defie covergece
More informationParameter, Statistic and Random Samples
Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,
More informationThe variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.
SAMPLE STATISTICS A radom sample x 1,x,,x from a distributio f(x) is a set of idepedetly ad idetically variables with x i f(x) for all i Their joit pdf is f(x 1,x,,x )=f(x 1 )f(x ) f(x )= f(x i ) The sample
More informationLecture 18: Sampling distributions
Lecture 18: Samplig distributios I may applicatios, the populatio is oe or several ormal distributios (or approximately). We ow study properties of some importat statistics based o a radom sample from
More informationLecture 8: Convergence of transformations and law of large numbers
Lecture 8: Covergece of trasformatios ad law of large umbers Trasformatio ad covergece Trasformatio is a importat tool i statistics. If X coverges to X i some sese, we ofte eed to check whether g(x ) coverges
More informationLecture 15: Density estimation
Lecture 15: Desity estimatio Why do we estimate a desity? Suppose that X 1,...,X are i.i.d. radom variables from F ad that F is ukow but has a Lebesgue p.d.f. f. Estimatio of F ca be doe by estimatig f.
More informationEstimation of the Mean and the ACVF
Chapter 5 Estimatio of the Mea ad the ACVF A statioary process {X t } is characterized by its mea ad its autocovariace fuctio γ ), ad so by the autocorrelatio fuctio ρ ) I this chapter we preset the estimators
More informationAsymptotic Results for the Linear Regression Model
Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is
More information4.1 Non-parametric computational estimation
Chapter 4 Resamplig Methods 4.1 No-parametric computatioal estimatio Let x 1,...,x be a realizatio of the i.i.d. r.vs X 1,...,X with a c.d.f. F. We are iterested i the precisio of estimatio of a populatio
More informationOutput Analysis and Run-Length Control
IEOR E4703: Mote Carlo Simulatio Columbia Uiversity c 2017 by Marti Haugh Output Aalysis ad Ru-Legth Cotrol I these otes we describe how the Cetral Limit Theorem ca be used to costruct approximate (1 α%
More information32 estimating the cumulative distribution function
32 estimatig the cumulative distributio fuctio 4.6 types of cofidece itervals/bads Let F be a class of distributio fuctios F ad let θ be some quatity of iterest, such as the mea of F or the whole fuctio
More informationAsymptotics. Hypothesis Testing UMP. Asymptotic Tests and p-values
of the secod half Biostatistics 6 - Statistical Iferece Lecture 6 Fial Exam & Practice Problems for the Fial Hyu Mi Kag Apil 3rd, 3 Hyu Mi Kag Biostatistics 6 - Lecture 6 Apil 3rd, 3 / 3 Rao-Blackwell
More informationRandom Variables, Sampling and Estimation
Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig
More informationSTATISTICAL INFERENCE
STATISTICAL INFERENCE POPULATION AND SAMPLE Populatio = all elemets of iterest Characterized by a distributio F with some parameter θ Sample = the data X 1,..., X, selected subset of the populatio = sample
More informationLarge Sample Theory. Convergence. Central Limit Theorems Asymptotic Distribution Delta Method. Convergence in Probability Convergence in Distribution
Large Sample Theory Covergece Covergece i Probability Covergece i Distributio Cetral Limit Theorems Asymptotic Distributio Delta Method Covergece i Probability A sequece of radom scalars {z } = (z 1,z,
More informationFirst Year Quantitative Comp Exam Spring, Part I - 203A. f X (x) = 0 otherwise
First Year Quatitative Comp Exam Sprig, 2012 Istructio: There are three parts. Aswer every questio i every part. Questio I-1 Part I - 203A A radom variable X is distributed with the margial desity: >
More informationExponential Families and Bayesian Inference
Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where
More informationEcon 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara
Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio
More informationIt should be unbiased, or approximately unbiased. Variance of the variance estimator should be small. That is, the variance estimator is stable.
Chapter 10 Variace Estimatio 10.1 Itroductio Variace estimatio is a importat practical problem i survey samplig. Variace estimates are used i two purposes. Oe is the aalytic purpose such as costructig
More informationDS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10
DS 00: Priciples ad Techiques of Data Sciece Date: April 3, 208 Name: Hypothesis Testig Discussio #0. Defie these terms below as they relate to hypothesis testig. a) Data Geeratio Model: Solutio: A set
More informationSDS 321: Introduction to Probability and Statistics
SDS 321: Itroductio to Probability ad Statistics Lecture 23: Cotiuous radom variables- Iequalities, CLT Puramrita Sarkar Departmet of Statistics ad Data Sciece The Uiversity of Texas at Austi www.cs.cmu.edu/
More informationLecture 16: UMVUE: conditioning on sufficient and complete statistics
Lecture 16: UMVUE: coditioig o sufficiet ad complete statistics The 2d method of derivig a UMVUE whe a sufficiet ad complete statistic is available Fid a ubiased estimator of ϑ, say U(X. Coditioig o a
More informationEstimation for Complete Data
Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of
More information17. Joint distributions of extreme order statistics Lehmann 5.1; Ferguson 15
17. Joit distributios of extreme order statistics Lehma 5.1; Ferguso 15 I Example 10., we derived the asymptotic distributio of the maximum from a radom sample from a uiform distributio. We did this usig
More informationDistribution of Random Samples & Limit theorems
STAT/MATH 395 A - PROBABILITY II UW Witer Quarter 2017 Néhémy Lim Distributio of Radom Samples & Limit theorems 1 Distributio of i.i.d. Samples Motivatig example. Assume that the goal of a study is to
More informationBayesian Methods: Introduction to Multi-parameter Models
Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested
More informationAn Introduction to Asymptotic Theory
A Itroductio to Asymptotic Theory Pig Yu School of Ecoomics ad Fiace The Uiversity of Hog Kog Pig Yu (HKU) Asymptotic Theory 1 / 20 Five Weapos i Asymptotic Theory Five Weapos i Asymptotic Theory Pig Yu
More informationFACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures
FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals
More informationLecture 24: Variable selection in linear models
Lecture 24: Variable selectio i liear models Cosider liear model X = Z β + ε, β R p ad Varε = σ 2 I. Like the LSE, the ridge regressio estimator does ot give 0 estimate to a compoet of β eve if that compoet
More informationEfficient GMM LECTURE 12 GMM II
DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet
More informationEcon 325/327 Notes on Sample Mean, Sample Proportion, Central Limit Theorem, Chi-square Distribution, Student s t distribution 1.
Eco 325/327 Notes o Sample Mea, Sample Proportio, Cetral Limit Theorem, Chi-square Distributio, Studet s t distributio 1 Sample Mea By Hiro Kasahara We cosider a radom sample from a populatio. Defiitio
More information1.010 Uncertainty in Engineering Fall 2008
MIT OpeCourseWare http://ocw.mit.edu.00 Ucertaity i Egieerig Fall 2008 For iformatio about citig these materials or our Terms of Use, visit: http://ocw.mit.edu.terms. .00 - Brief Notes # 9 Poit ad Iterval
More informationAMS570 Lecture Notes #2
AMS570 Lecture Notes # Review of Probability (cotiued) Probability distributios. () Biomial distributio Biomial Experimet: ) It cosists of trials ) Each trial results i of possible outcomes, S or F 3)
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week Lecture: Cocept Check Exercises Starred problems are optioal. Statistical Learig Theory. Suppose A = Y = R ad X is some other set. Furthermore, assume P X Y is a discrete
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 010 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a ukow mea µ = E(X) of a distributio by
More informationTopic 9: Sampling Distributions of Estimators
Topic 9: Samplig Distributios of Estimators Course 003, 2016 Page 0 Samplig distributios of estimators Sice our estimators are statistics (particular fuctios of radom variables), their distributio ca be
More informationLast Lecture. Wald Test
Last Lecture Biostatistics 602 - Statistical Iferece Lecture 22 Hyu Mi Kag April 9th, 2013 Is the exact distributio of LRT statistic typically easy to obtai? How about its asymptotic distributio? For testig
More informationPOWER COMPARISON OF EMPIRICAL LIKELIHOOD RATIO TESTS: SMALL SAMPLE PROPERTIES THROUGH MONTE CARLO STUDIES*
Kobe Uiversity Ecoomic Review 50(2004) 3 POWER COMPARISON OF EMPIRICAL LIKELIHOOD RATIO TESTS: SMALL SAMPLE PROPERTIES THROUGH MONTE CARLO STUDIES* By HISASHI TANIZAKI There are various kids of oparametric
More informationExam II Review. CEE 3710 November 15, /16/2017. EXAM II Friday, November 17, in class. Open book and open notes.
Exam II Review CEE 3710 November 15, 017 EXAM II Friday, November 17, i class. Ope book ad ope otes. Focus o material covered i Homeworks #5 #8, Note Packets #10 19 1 Exam II Topics **Will emphasize material
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 016 MODULE : Statistical Iferece Time allowed: Three hours Cadidates should aswer FIVE questios. All questios carry equal marks. The umber
More informationStatistical inference: example 1. Inferential Statistics
Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationModule 1 Fundamentals in statistics
Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationA Question. Output Analysis. Example. What Are We Doing Wrong? Result from throwing a die. Let X be the random variable
A Questio Output Aalysis Let X be the radom variable Result from throwig a die 5.. Questio: What is E (X? Would you throw just oce ad take the result as your aswer? Itroductio to Simulatio WS/ - L 7 /
More informationLecture 2: Monte Carlo Simulation
STAT/Q SCI 43: Itroductio to Resamplig ethods Sprig 27 Istructor: Ye-Chi Che Lecture 2: ote Carlo Simulatio 2 ote Carlo Itegratio Assume we wat to evaluate the followig itegratio: e x3 dx What ca we do?
More informationStudy the bias (due to the nite dimensional approximation) and variance of the estimators
2 Series Methods 2. Geeral Approach A model has parameters (; ) where is ite-dimesioal ad is oparametric. (Sometimes, there is o :) We will focus o regressio. The fuctio is approximated by a series a ite
More information5. Likelihood Ratio Tests
1 of 5 7/29/2009 3:16 PM Virtual Laboratories > 9. Hy pothesis Testig > 1 2 3 4 5 6 7 5. Likelihood Ratio Tests Prelimiaries As usual, our startig poit is a radom experimet with a uderlyig sample space,
More informationEconomics 326 Methods of Empirical Research in Economics. Lecture 18: The asymptotic variance of OLS and heteroskedasticity
Ecoomics 326 Methods of Empirical Research i Ecoomics Lecture 8: The asymptotic variace of OLS ad heteroskedasticity Hiro Kasahara Uiversity of British Columbia December 24, 204 Asymptotic ormality I I
More informationDiscrete Mathematics for CS Spring 2008 David Wagner Note 22
CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig
More informationSince X n /n P p, we know that X n (n. Xn (n X n ) Using the asymptotic result above to obtain an approximation for fixed n, we obtain
Assigmet 9 Exercise 5.5 Let X biomial, p, where p 0, 1 is ukow. Obtai cofidece itervals for p i two differet ways: a Sice X / p d N0, p1 p], the variace of the limitig distributio depeds oly o p. Use the
More informationQuestions and Answers on Maximum Likelihood
Questios ad Aswers o Maximum Likelihood L. Magee Fall, 2008 1. Give: a observatio-specific log likelihood fuctio l i (θ) = l f(y i x i, θ) the log likelihood fuctio l(θ y, X) = l i(θ) a data set (x i,
More informationMA Advanced Econometrics: Properties of Least Squares Estimators
MA Advaced Ecoometrics: Properties of Least Squares Estimators Karl Whela School of Ecoomics, UCD February 5, 20 Karl Whela UCD Least Squares Estimators February 5, 20 / 5 Part I Least Squares: Some Fiite-Sample
More informationof the matrix is =-85, so it is not positive definite. Thus, the first
BOSTON COLLEGE Departmet of Ecoomics EC771: Ecoometrics Sprig 4 Prof. Baum, Ms. Uysal Solutio Key for Problem Set 1 1. Are the followig quadratic forms positive for all values of x? (a) y = x 1 8x 1 x
More informationLecture 23: Minimal sufficiency
Lecture 23: Miimal sufficiecy Maximal reductio without loss of iformatio There are may sufficiet statistics for a give problem. I fact, X (the whole data set) is sufficiet. If T is a sufficiet statistic
More informationKernel density estimator
Jauary, 07 NONPARAMETRIC ERNEL DENSITY ESTIMATION I this lecture, we discuss kerel estimatio of probability desity fuctios PDF Noparametric desity estimatio is oe of the cetral problems i statistics I
More informationLecture 12: September 27
36-705: Itermediate Statistics Fall 207 Lecturer: Siva Balakrisha Lecture 2: September 27 Today we will discuss sufficiecy i more detail ad the begi to discuss some geeral strategies for costructig estimators.
More informationTAMS24: Notations and Formulas
TAMS4: Notatios ad Formulas Basic otatios ad defiitios X: radom variable stokastiska variabel Mea Vätevärde: µ = X = by Xiagfeg Yag kpx k, if X is discrete, xf Xxdx, if X is cotiuous Variace Varias: =
More informationChapter 13: Tests of Hypothesis Section 13.1 Introduction
Chapter 13: Tests of Hypothesis Sectio 13.1 Itroductio RECAP: Chapter 1 discussed the Likelihood Ratio Method as a geeral approach to fid good test procedures. Testig for the Normal Mea Example, discussed
More informationProbability 2 - Notes 10. Lemma. If X is a random variable and g(x) 0 for all x in the support of f X, then P(g(X) 1) E[g(X)].
Probability 2 - Notes 0 Some Useful Iequalities. Lemma. If X is a radom variable ad g(x 0 for all x i the support of f X, the P(g(X E[g(X]. Proof. (cotiuous case P(g(X Corollaries x:g(x f X (xdx x:g(x
More informationIntroductory statistics
CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key