Chapter 2: Resampling Maarten Jansen


Randomization tests

A randomized experiment assigns sample subjects to groups at random. Example: a medical experiment with a control group, n1 subjects for the true medicine and n2 subjects for the placebo. Suppose X1 and X2 measure quality of life (a score between 0 and 1, based on a questionnaire among the sample subjects), with group sizes n1 and n2. Define µ1 = E(X1) and µ2 = E(X2) and test

H0: µ1 = µ2 against H1: µ1 > µ2,

or equivalently, with δ := µ1 − µ2,

H0: δ = 0 against H1: δ > 0.

Test statistic. Sample means are always unbiased estimators of population means, so we take δ̂ = D = X̄1 − X̄2.

Assumptions. We know nothing about the distributions of X1 and X2, nothing about equality of variances, and so on. We know nothing about the applicability of the CLT or any other limiting approximation (n1 and n2 are far from ∞). We only assume that µ1 = E(X1) and µ2 = E(X2) exist.

Procedure. If H0 is true, then the randomization process (assigning individuals to the 2 groups) is the only factor. We then consider all possible randomizations: this is an exact test. There are C(n1+n2, n1) possible randomizations; in practice we take a random subset of N groupings (N large, for instance 1000):

Let X = (X1, X2). For i = 1, ..., N:
- find a random permutation of (1, ..., n1+n2) (in Matlab: randperm); denote it Perm_i;
- consider X*_i = Perm_i(X), with X*_{i,1} = (X*_{i;1}, ..., X*_{i;n1}) and X*_{i,2} = (X*_{i;n1+1}, ..., X*_{i;n1+n2});
- compute X̄*_{i,1}, X̄*_{i,2} and D*_i = X̄*_{i,1} − X̄*_{i,2}.

The estimated p-value is #{i : D*_i > D} / N.

c Maarten Jansen STAT-F-48 Comp. Stat. Chap. 2: Resampling p.1
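The procedure above can be sketched in Python; NumPy's permutation plays the role of Matlab's randperm, and the data below are invented for illustration, not taken from the course.

```python
import numpy as np

def randomization_test(x1, x2, N=1000, seed=None):
    """One-sided randomization test for H0: mu1 = mu2 against H1: mu1 > mu2.

    Pools the two groups, regroups them N times at random, and counts how
    often the regrouped mean difference D*_i exceeds the observed D.
    """
    rng = np.random.default_rng(seed)
    x = np.concatenate([x1, x2])
    n1 = len(x1)
    d_obs = np.mean(x1) - np.mean(x2)          # observed D = X1bar - X2bar
    count = 0
    for _ in range(N):
        perm = rng.permutation(x)              # random regrouping (randperm analogue)
        if perm[:n1].mean() - perm[n1:].mean() > d_obs:
            count += 1
    return count / N                           # estimated p-value

# Invented quality-of-life scores in [0, 1]; the medicine group scores higher
rng = np.random.default_rng(0)
medicine = rng.uniform(0.4, 1.0, size=20)
placebo = rng.uniform(0.2, 0.8, size=25)
p_value = randomization_test(medicine, placebo, N=2000, seed=1)
```

Because the medicine scores were generated higher on average, the estimated p-value comes out small here; with exchangeable groups it would be roughly uniform on [0, 1].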

Exercises

1. Find the p-value for the test above.
2. Construct power curves (operating characteristic curves) for the classical t-test comparing two groups with variances σ1² = 2² and σ2² = 3², for M values of δ = µ2 − µ1 between 0 and 1, for a one-sided test H0: δ = 0 against H1: δ > 0 at significance level α = 5%, and with n1 = 20 and n2 = 25. Compare with power curves for a randomization test. For both curves, you can simulate (with, say, 1000 simulation runs) and count the relative number of false negatives (missed rejections of H0).

Permutation test

An observational study (not a randomized experiment): the assignment of elements to groups is fixed and cannot be chosen in the sample (hence a danger of confounding factors; no causality conclusions are possible).

Example: discriminatory pay in a company: are men better paid than women? Monthly salaries of a sample of men (n1 = 20) and women (n2 = 25) in that company.

A permutation test follows the same principle as a randomization test, but the conclusions are weaker (no causality).

Exact tests, classical tests, distribution-free tests

1. Exact tests: no model, so no information is added to the observations, and no information is removed. E(estimated p-value) = p-value, whatever the distribution F_X(x). Heavy calculations are approximated numerically.
2. Classical tests: approximation by a model. Model assumptions (normality, ...) mean the model contains/represents information that is not present in the observations. Results can be interpreted in the framework of the model, but the model has to be validated.
3. Nonparametric/distribution-free tests (e.g. Wilcoxon): omit information, limited power.

Elements from the randomization/permutation tests: estimation of the parameter δ was no problem, but the inference on this parameter was, because the model for the observations was unclear (X ~ ?) and the sample size was too low for the CLT.
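A minimal simulation sketch for exercise 2, restricted to the classical test; using Welch's statistic with a fixed normal critical value (rather than an exact t quantile) is a simplification made here to keep the code self-contained.

```python
import numpy as np

def power_welch(delta, n1=20, n2=25, s1=2.0, s2=3.0, reps=2000, seed=None):
    """Monte Carlo power of the one-sided Welch-type test of
    H0: delta = 0 against H1: delta > 0 at (approximately) alpha = 5%.

    The normal critical value 1.6449 approximates the t critical value,
    a deliberate simplification for this sketch."""
    rng = np.random.default_rng(seed)
    z_alpha = 1.6449                            # upper 5% quantile of N(0, 1)
    x1 = rng.normal(0.0,   s1, size=(reps, n1))
    x2 = rng.normal(delta, s2, size=(reps, n2))
    d = x2.mean(axis=1) - x1.mean(axis=1)       # estimates delta = mu2 - mu1
    se = np.sqrt(x1.var(axis=1, ddof=1) / n1 + x2.var(axis=1, ddof=1) / n2)
    return np.mean(d / se > z_alpha)            # fraction of rejections

size_at_null = power_welch(0.0, seed=0)   # should stay close to 5%
power_at_one = power_welch(1.0, seed=0)   # clearly larger: the power at delta = 1
```

Sweeping `delta` over a grid of M values between 0 and 1 and plotting the rejection rate gives the requested power curve; the randomization-test curve is obtained the same way, with the test replaced accordingly.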

THE BOOTSTRAP

1. The bootstrap principle

Situation: suppose we want to do inference on a parameter θ from a sample X. Estimation is often fairly easy: θ̂ = t(X). For inference, we rely on either:
1. A model for X, which allows us to compute f_θ̂(t). Example: if we know that X_i ~ exp(1), then we know that µ̂_X = X̄ ~ Gamma(n, n).
2. An asymptotic result to approximate f_θ̂(t). Example: the central limit theorem (CLT), µ̂_X = X̄ ≈ N(µ, σ²/n). This requires n → ∞, i.e., n large.
3. The bootstrap: an alternative statistical approximation: use F̂_X(x) as an approximation of F_X(x).

The basic principle of bootstrapping: EDF → CDF

The empirical distribution function (EDF) is an unbiased and consistent estimator of the (theoretical) cumulative distribution function (CDF):

F̂(x) = #{i ∈ {1,...,n} : x_i ≤ x} / n,
E[F̂(x)] = P(X ≤ x) = F_X(x),
var[F̂(x)] = (1/n) F_X(x) (1 − F_X(x)).

The EDF contains all information from the sample (cf. permutation/randomization tests). Instead of using the exact but unknown CDF, which requires approximative calculations plus model assumptions (central limit theorem, normality, ...), we do exact calculations with the EDF (in principle; if the calculations are too heavy, then again we approximate).

Working with the empirical distribution (EDF)

Thus, we consider X* ~ F̂. Since we know the EDF, we can compute everything about X*:
- its expected value E(X*), which can serve as an approximation of E(X); this is a bootstrap estimator;
- the exact distribution of X̄* = (1/n) Σ X*_i, which allows us to construct an exact confidence interval (CI) for E(X*) with n finite, small; this CI can be used as an approximation of a CI for E(X).
There is no need to rely on the CLT (an asymptotic result) or on an approximative model for F_X(x).
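The EDF defined above is straightforward to compute directly; a small sketch (the helper name `edf` is ours):

```python
import numpy as np

def edf(sample):
    """Return the empirical distribution function of a sample:
    F_hat(x) = #{i : x_i <= x} / n."""
    data = np.sort(np.asarray(sample, dtype=float))
    n = len(data)
    def F_hat(x):
        # count observations <= x via binary search on the sorted data
        return np.searchsorted(data, x, side="right") / n
    return F_hat

F = edf([12, 17, 18, 19, 21])
# F is a step function, jumping by 1/5 at each observation:
vals = [F(11), F(12), F(17.5), F(21)]
```

Evaluating `F` just below, at, and above the observations shows the step behaviour: 0 before the smallest value, 1 at and beyond the largest.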

Further details: bootstrap for the distribution of µ̂ = X̄

Suppose X = (12, 17, 18, 19, 21). We are interested in F_µ̂(t) or f_µ̂(t) for µ̂ = (1/n) Σ X_i, with X a sample of size n (not necessarily the sample above). As we don't know F_X(x), we cannot know F_µ̂(t); therefore we work with F̂_X(x).

[Figure: the EDF F̂_X(x) of the sample, a step function with jumps of 1/5 at the observed values.]

Using resamples for numerical calculations

We consider X* ~ F̂_X(x) and compute the density/distribution of µ̂* = (1/n) Σ X*_i. The exact computation is difficult, so we work with resamples from F̂_X(x): we resample B times and compute the sample means µ̂*_b = X̄*_b of all resamples. The value of B is a parameter of the numerical approximation algorithm. This numerical algorithm uses statistical techniques (samples, sample means), but it is under full control of the user, so it is not a statistical method (see further, slide 19).

Kernel density estimation

All observations µ̂*_b allow a density estimation, whose result appears in a figure (omitted here). The plot uses a kernel density estimate

f̂_µ̂(t) = (1/(Bh)) Σ_{b=1}^{B} K((t − µ̂*_b)/h),

with K(u) a kernel function, which is a density, i.e. K(u) ≥ 0 and ∫ K(u) du = 1. Hence also (1/h) ∫ K((u − u0)/h) du = 1. In this expression, the bandwidth h is a smoothing parameter. Note that, strictly speaking, µ̂*_b can only take a finite number of values, so it is a discrete random variable, not a continuous one.

Conclusions about the bootstrap estimation of F_X(x)

1. In order to draw conclusions about the distribution of a sample mean X̄, a natural approach would be to consider many samples, each with its own sample mean. Instead, here we infer as much as we can from a single sample, without assuming any model behind the sample (normality, central limit theorem, ...): the individual observations within a single sample may reveal patterns for further samples.
2. Note that the approximation X → X* goes from a continuous to a discrete random variable, and the kernel density estimation goes from a discrete probability mass function back to a continuous density function.
3. All the approximations:
(a) statistical approximation: X → X*, F_X(x) → F̂_X(x);
(b) numerical approximation: simulation, resampling (parameter B), a numerical method that relies on statistical techniques;
(c) kernel density estimation: a statistical technique within a numerical method.
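The resampling and kernel density steps above can be sketched as follows; the Gaussian kernel and the bandwidth h = 1 are illustrative choices, not prescribed by the course.

```python
import numpy as np

x = np.array([12.0, 17.0, 18.0, 19.0, 21.0])
n, B, h = len(x), 5000, 1.0
rng = np.random.default_rng(42)

# B resamples of size n drawn from the EDF, i.e. with replacement from x
resamples = rng.choice(x, size=(B, n), replace=True)
mu_star = resamples.mean(axis=1)              # bootstrap sample means mu*_b

def kde(t, obs, h):
    """Gaussian kernel density estimate at the points t, bandwidth h."""
    u = (np.asarray(t) - obs[:, None]) / h    # shape (B, len(t))
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=0) / h

grid = np.linspace(mu_star.min(), mu_star.max(), 101)
density = kde(grid, mu_star, h)
```

The average of the `mu_star` values sits close to the original sample mean, while `density` is the smoothed (continuous) version of their discrete distribution.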

Two instances of bootstrap

Parametric bootstrap: we know the family to which the CDF belongs, F_X(x; θ); we can find good estimators θ̂ for the parameters, but the distribution of those estimators is hard to find or hard to analyse. Then we can take X* ~ F_X(x; θ̂). The parametric bootstrap relies on estimators, so its total accuracy is limited by the accuracy of those estimators.

Non-parametric bootstrap: we don't know the family of the CDF, or the parameters cannot be estimated in a reliable way. Then we start from the empirical distribution, X* ~ F̂_X(x) (see the example on slide 12).

Bootstrap in practice: resampling

Computations are often heavy, so we simulate from the sample. In the non-parametric bootstrap we use the EDF, which is a discrete random variable; hence we resample from the observed data. In the parametric bootstrap we resample from a probability density function or probability mass function with estimated parameters, possibly producing new, unobserved values. (See further, slide 19.)

2. Bootstrap estimators

Suppose F_X(x; θ) depends parametrically on θ, which we also denote as θ_{F_X}. The parameter can be found from the distribution function as

θ = T(..., ∫ G_i(x) dF_X(x), ...).

Then define the bootstrap estimator T* as

T* = T(..., ∫ G_i(x) dF̂_X(x), ...).

A bootstrap estimator is the empirical analogue of the parameter. We also denote T* as θ̂*. The estimator has two aspects:
- in the original problem, it is an estimator of θ_{F_X}, hence random;
- in the bootstrap setting, based on F̂, T* = θ_{F̂} is a parameter, hence fixed: we then write θ* without a hat.

Approximation using resampling

Let X*_b, b = 1, ..., B, be B observations from X*; then

T* ≈ T(..., (1/B) Σ_{b=1}^{B} G_i(X*_b), ...).

This is a Monte Carlo method. It is not really a statistical estimation, but rather a numerical approximation, since we have full control over B (it is not inference from a given sample; see the chapter on Monte Carlo). The same idea applies to sample parameters (quantiles in a CI, bias of an estimator):

θ = T(..., ∫ G_i(x) dF_X(x), ...),  T* = T(..., ∫ G_i(x) dF̂_X(x), ...),  T* ≈ T(..., (1/B) Σ_b G_i(X*_b), ...).
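The two levels of approximation can be sketched for the variance functional T(y1, y2) = y2 − y1²: the exact plug-in value of T* versus its Monte Carlo approximation with B draws from the EDF (the sample and B are illustrative).

```python
import numpy as np

x = np.array([12.0, 17.0, 18.0, 19.0, 21.0])
rng = np.random.default_rng(7)

# Exact plug-in value: T(int x dF_hat, int x^2 dF_hat) with T(y1, y2) = y2 - y1^2
T_exact = np.mean(x**2) - np.mean(x)**2

# Monte Carlo approximation: replace each integral int G_i(x) dF_hat(x)
# by an average (1/B) sum_b G_i(X*_b) over B draws from the EDF
B = 200_000
x_star = rng.choice(x, size=B, replace=True)
T_mc = np.mean(x_star**2) - np.mean(x_star)**2
```

The exact plug-in value needs no resampling at all; the Monte Carlo version approaches it as B grows, and B is entirely under the user's control.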

Example 1: moments (any distribution)

1. Mean: µ = ∫ x dF_X(x), i.e. T(y) = y and G1(x) = x, and thus (in the non-parametric bootstrap)

µ* = ∫ x dF̂_X(x) = (1/n) Σ x_i,

with x_i the observed values of X. Note: this summation is NOT an approximation using resampling, but a straightforward development of the formula (see slide 22).

2. Variance: σ² = ∫ x² dF_X(x) − (∫ x dF_X(x))², i.e. T(y1, y2) = y2 − y1² with G1(x) = x and G2(x) = x², and thus (in the non-parametric bootstrap)

σ²* = ∫ x² dF̂_X(x) − (∫ x dF̂_X(x))² = (1/n) Σ (x_i − x̄)².

So σ²* is not the unbiased variance estimator.

Example 2: exponential distribution

X ~ exp(λ), i.e. F_X(x) = 1 − e^{−λx} and f_X(x) = λ e^{−λx}. Then λ = 1/E(X) = 1/∫ x dF_X(x), so T(y) = 1/y and G1(x) = x. We have λ* = 1/∫ x dF̂_X(x). In the non-parametric bootstrap, this estimator is also the method-of-moments estimator and the maximum likelihood estimator: λ* = n / Σ X_i. This estimator is biased. In the parametric bootstrap, we have X* ~ exp(λ̂), with λ̂ an estimator given elsewhere (for instance, an unbiased one). Then λ* = 1/∫ x dF(x; λ̂) = λ̂.

The expected value of X* (EDF)

Using integration by parts and the order statistics x_(1) ≤ ... ≤ x_(n):

E(X*) = µ* = ∫ x dF̂_X(x)
       = lim_{ε→0+} ( [x F̂_X(x)]_{x_(1)−ε}^{x_(n)} − ∫_{x_(1)−ε}^{x_(n)} F̂_X(x) dx )
       = x_(n) − Σ_{i=1}^{n−1} (i/n) (x_(i+1) − x_(i))
       = (1/n) Σ_{i=1}^{n} x_i = x̄.

Remark: we write x and not X, because a bootstrap analysis proceeds on the empirical values, i.e., after the observations have taken place. The randomness of a sample mean is ignored. It would be strange to write that the expected value of a random variable, E(X*), equals the (random) sample mean of another random variable X; the equality can only hold if X* is defined after X has been observed.

3. Estimating sample distributions

The bootstrap is not really a method for point estimation: classical methods (maximum likelihood, method of moments) perform well enough (i.e., close enough to optimally). Bootstrap estimators are mainly used in their role as parameters in the bootstrap setting (see slide 18); for this role, one cannot possibly use other estimators. The bootstrap is useful for inference, based on the distribution of estimators, both bootstrap estimators and classical estimators (see also slide 15). If T is an estimator for θ, the bootstrap can be used to estimate F_T(t) and, from there, properties of T. Example 1: see slide 12.
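As an illustration of estimating F_T(t) by resampling, a sketch for the sample median, an estimator whose exact distribution is awkward to derive analytically; the data are simulated purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=20.0, size=25)      # one observed sample (illustrative)
B = 4000

# Resample the estimator: bootstrap approximation of F_T(t) for T = median
meds = np.array([np.median(rng.choice(x, size=x.size, replace=True))
                 for _ in range(B)])

center = meds.mean()                          # bootstrap location of the median
spread = meds.std(ddof=1)                     # bootstrap standard error of the median
```

The array `meds` is an empirical picture of the median's sampling distribution, from which a standard error or a confidence interval can be read off.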

Example 2 (1): bias of an estimator θ̂ for a parameter θ

Suppose we have an estimator θ̂ (bootstrap or classical) and we wish to know whether this estimator is biased: β = E(θ̂) − θ. β can be seen as a parameter, but it also depends on the estimator (it may depend on the sample size n, for instance). The density of θ̂ is unknown (because it depends on F_X(x), which is unknown), unless n → ∞ (central limit theorem); often n is too small to apply the CLT.

Example 2 (2): the bias in the bootstrap world

In order to get an idea of a possible bias, we set up an experiment: would there be a bias if we replaced F_X(x) by F̂_X(x)? We then compute the exact bias for a sample of n observations from F̂_X(x). We write

β = E_{F_X}(θ̂) − θ_{F_X},

where θ̂ = t(X), E_{F_X}(θ̂) = ∫ t(x) dF_X(x) (the integral running over samples x of size n, whose joint distribution factorizes as Π_{i=1}^{n} F_X(x_i)), and θ_{F_X} = T(..., ∫ G_i(x) dF_X(x), ...). Then the bootstrap estimator becomes

β* = E_{F̂_X}(θ̂*) − θ_{F̂_X},

with θ̂* = t(X*) and θ_{F̂_X} = θ* = T*. This result might suggest defining a new estimator for θ as θ̂2 = θ̂ − β*.

Example 2 (3): using the bootstrap bias in the original problem

Note that θ_{F̂_X} = θ* = T* is the bootstrap estimator. θ̂ may or may not be the bootstrap estimator, but in the expression for β*, we must use the bootstrap estimator; this illustrates the specific use of bootstrap estimators. θ̂ may also be the bootstrap estimator (or coincide with it): the bootstrap estimator is then used both in a specific and in a general setting. In that case

β* = E_{F̂_X}(T*) − T, and so θ̂2 = 2T − E_{F̂_X}(T*).

This is not necessarily a better estimator than T: we don't know the properties of β*. Is it unbiased? Does it have a large variance? (See further, slide 36, for answers to these questions.)

Example 3: estimating the distribution of the mean of X ~ exp

E.g. a waiting time X; we assume that X ~ exp(λ) with λ unknown. We want to estimate and test µ = 1/λ, with µ̂ = X̄. We have n = 8 observations: X̄ = 27.0, with sample standard deviation s. The true (unobserved) parameter in the simulation is λ = 1/23.

[Figure: the EDF of the 8 observations, a step function with jumps of 1/8, together with the true (unobserved) distribution function.]

We want to estimate the distribution of µ̂ = X̄.
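The bias experiment can be sketched for the plug-in variance of Example 1, where the bias under F̂ is even available in closed form (E_F̂(θ̂*) = ((n−1)/n) θ̂, so β* = −θ̂/n), which gives a check on the resampling answer; the sample below is simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
x = rng.normal(50.0, 10.0, size=15)           # illustrative sample
n = x.size

def var_plugin(s):
    """Plug-in variance: theta_hat = (1/n) sum (s_i - s_bar)^2."""
    return float(np.mean((s - s.mean())**2))

theta_hat = var_plugin(x)

# Bootstrap world: resample from F_hat and measure the bias there
B = 5000
theta_star = np.array([var_plugin(rng.choice(x, size=n, replace=True))
                       for _ in range(B)])
beta_star = theta_star.mean() - theta_hat     # estimated bias E*(theta*) - theta*
beta_exact = -theta_hat / n                   # closed form for this particular estimator
```

The resampled `beta_star` reproduces the known negative bias of the plug-in variance, up to Monte Carlo error controlled by B.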

Example 3: the true distribution of the estimator (sample mean)

In this case (as we are simulating), we know the true distribution of the estimator: µ̂ ~ Gamma(nλ, n), with µ = 1/λ. Indeed, E(X) = µ, var(X) = µ², and Σ_{i=1}^{n} X_i ~ Gamma(λ, n); if S ~ Gamma(λ, k), then S/n ~ Gamma(nλ, k). Since we know the distribution of µ̂ and this distribution is not very difficult, we could just estimate the parameter and from there compute confidence intervals without relying on the CLT, resampling or the nonparametric bootstrap. We use this case as a benchmark to evaluate several approximations and methods.

Example 3.1: estimation within the exact model

This is actually parametric bootstrapping (without the resampling). We assume that we know that µ̂ ~ Gamma(nλ, n), with λ unknown. We estimate λ̂ = 1/x̄.

[Figure: the true distribution versus the estimation within the exact model.]

The difference between the true model with the exact parameter and the model with the estimated parameter quantifies the quality of the estimator.

Example 3.2: CLT approximation; estimators in the exact model

This example represents cases where we know the exact model of the estimator's distribution, but using it would lead to too complicated computations. So we approximate by a normal model, but for the mean and variance of the normal model we rely on the expressions within the true model. Here: E(µ̂) = µ, estimated by 1/λ̂, and var(µ̂) = σ²/n, where σ² is estimated by 1/λ̂².

[Figure: the true distribution, the estimation within the exact model, and the CLT approximation of the exact model.]

We lose the asymmetry of the true model. Moreover, this situation is not realistic: in most cases, we do not know the exact model of an estimator µ̂; the model for X may be unknown, approximative, or too difficult to use for computing the model of µ̂.

Example 3.3: straightforward (model-free) application of the CLT

µ̂ ≈ N(µ, σ²/n), where we estimate µ̂ = x̄ = 27.0 and σ̂²/n = s²/n = s²/8.

[Figure: the true distribution, the estimation within the exact model, and the model-free CLT approximation.]

This procedure is always possible for the sample mean (if the sample distribution has a mean; cf. the Cauchy distribution), but it is approximative. The approximation does not depend on the quality of λ̂. The quality of the approximation is unclear for finite n.

Example 3.4: parametric bootstrap with resampling

This includes both the statistical approximation, as in 3.1, and the numerical approximation with parameter B. We generate X*_{b,i} ~ exp(λ̂) and compute from there µ̂*_b = (1/n) Σ_i X*_{b,i}, for b = 1, ..., B.

[Figure: the true distribution, the estimation within the exact model, and the parametric bootstrap.]

This is always possible if we assume a parametric observational model. The quality of the approximation depends on the quality of λ̂: the curve of the exact model with estimated parameter is the best possible.

Example 3.5: nonparametric bootstrap

We generate X*_{b,i} ~ F̂_X(x) and compute from there µ̂*_b = (1/n) Σ_i X*_{b,i}, for b = 1, ..., B.

[Figure: the true distribution, the estimation within the exact model, and the nonparametric bootstrap.]

This procedure is always possible. The quality of the approximation does not depend on any λ̂, so the curve of the exact model with estimated parameter is not the best possible. But not using the parametric form makes a good estimate more challenging.

Example 4: estimating the ratio of two means (1)

Suppose we observe paired values X_i and Y_i and we are interested in θ = µ_Y / µ_X. The observational model is thus

µ_Y = θ µ_X,  Y_i = µ_Y + η_i,  X_i = µ_X + ξ_i,

where η_i and ξ_i are observational errors. This is an errors-in-variables model.

1. A minimum least squares solution. We could estimate θ by least squares, θ̂ = argmin_θ Σ (Y_i − θ X_i)², but since X_i is random and not an exact observation of the covariate µ_X, the solution θ̂ = Σ Y_i X_i / Σ X_i² is biased, also asymptotically.

Example 4: estimating the ratio of two means (2)

2. Mean of ratios. The estimator θ̂ = (1/n) Σ Y_i/X_i is likely to have no expected value (e.g. a Cauchy-like distribution).

3. Ratio of means. The estimator θ̂ = Σ Y_i / Σ X_i = Ȳ / X̄ is asymptotically unbiased, but biased for finite sample sizes. This is also the bootstrap estimator: θ = µ_Y/µ_X = ∫ y dF_Y(y) / ∫ x dF_X(x), and similarly θ* = ∫ y dF̂_Y(y) / ∫ x dF̂_X(x) = Ȳ / X̄.
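The three candidate estimators of θ = µY/µX can be compared on simulated paired data; all data-generating choices below (sample size, true values, error distributions) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n, theta_true, mu_x = 200, 2.0, 10.0
x = mu_x + rng.normal(0.0, 1.0, size=n)                # X_i = mu_X + xi_i
y = theta_true * mu_x + rng.normal(0.0, 1.0, size=n)   # Y_i = mu_Y + eta_i

theta_ls   = np.sum(y * x) / np.sum(x**2)   # 1. least squares: biased, also asymptotically
theta_mofr = np.mean(y / x)                 # 2. mean of ratios: expected value may not exist
theta_rofm = y.mean() / x.mean()            # 3. ratio of means: the bootstrap estimator
```

With µX well away from zero, all three values land near θ; the differences in bias and stability show up when the noise in X grows or X can approach zero.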

4. Properties of bootstrap estimations

1. Errors. When we approximate a sample parameter such as the bias β = E_{F_X}(θ̂) − θ_{F_X} by a bootstrap estimated value β* = E_{F̂_X}(θ̂*) − θ_{F̂_X}, we make two sorts of errors:
1. statistical errors: F_X(x) → F̂_X(x);
2. simulation/computational/numerical errors (controlled by the bootstrap resample size B; see slide 19).

2. Improvement over asymptotic/classical estimators. Bootstrap estimators of quantiles and confidence bounds converge to the asymptotic values. In first order, there is no difference (no improvement, no deterioration) between bootstrap estimators and asymptotic results. There is a second-order improvement (typically proven via the concept of Edgeworth expansions): the bootstrap results converge faster in second order.

Note: an Edgeworth expansion writes a distribution as

F_X(x) = exp[ Σ_{m=1}^{∞} (κ_m − γ_m) ((−1)^m / m!) (d^m/dx^m) ] Φ(x),

where κ_m and γ_m are the coefficients in the expansions (of the characteristic functions)

f̂_X(t) = exp[ Σ_{m=1}^{∞} κ_m (it)^m / m! ]  and  φ̂(t) = exp[ Σ_{m=1}^{∞} γ_m (it)^m / m! ].

The Edgeworth expansion thus approximates a distribution F_X(x) in terms of the normal distribution Φ(x). (The formalism also holds if the central distribution Φ(x) is another known law.)

5. The use of bootstrap in statistical inference

1. Bias correction/reduction

Suppose θ̂ is an estimator for a parameter θ. We want to estimate β = E(θ̂) − θ. We have θ̂ = t(X), with X the sample, and θ̂ ~ F_θ̂(t). The distribution of θ̂, F_θ̂(t), is unknown and/or too hard to compute. We therefore define θ̂* = t(X*), where X* is a resample whose components are independent and distributed according to the empirical distribution of X: X*_i ~ F̂_X. Denote the distribution of θ̂*, defined above, as F*_θ̂(t). We estimate β by

β* = E_{F̂_X}(θ̂*) − θ_{F̂_X}.

In case θ̂ = T = θ* is the bootstrap estimator,

β* = E(T*) − T,

and a bias-corrected estimator is then θ̂2 = θ̂ − β* = 2T − E(T*).

2. Confidence intervals

Situation: θ̂ estimates θ (a point estimator), but F_θ̂(t) is unknown and/or hard to find.

General construction of a confidence interval:
- Pivot: find Q = q(X, θ) such that F_Q(q) does not depend on θ.
- Find quantiles of Q so that P(q_{1−α/2} ≤ Q ≤ q_{α/2}) = 1 − α, i.e. P(q_{1−α/2} ≤ q(X, θ) ≤ q_{α/2}) = 1 − α.
- By inverting the function q(X, θ), one can obtain an interval around θ̂.

Examples of pivots. For the mean: Q = X̄ − µ; we know that, asymptotically, Q ~ N(0, σ²/n). Or Q = (X̄ − µ)/(S/√n), which is Student-t distributed for normal observations (otherwise: asymptotically). For the variance: Q = (n−1)S²/σ², which is chi-squared distributed (if the data are normal).

Using the pivot
- Ideal case: the distribution of Q is known within a model for X, and q(X, θ) is easy to invert.
- Asymptotic model for Q.
- Bootstrap approximation for Q.

Bootstrap for the pivot distribution

Situation: we know that the distribution of Q does not depend on θ, but the distribution may still be unknown (if the model for the observations is unknown) or too difficult to compute. Define Q* = q(X*, θ*), with θ* = θ_{F̂_X} the bootstrap estimator of θ.

Bootstrap quantiles

The bootstrap allows us to compute (sometimes exactly, mostly by resampling) the empirical (upper!) quantiles q*_α, defined by P(Q* > q*_α) = α. The bootstrap approximation of the equations on slide 39 is P(q*_{1−α/2} ≤ q(X, θ) ≤ q*_{α/2}) ≈ 1 − α. The bootstrap approximation replaces the fixed values q_α by random (sample-dependent) values q*_α.

Construction of bootstrap confidence intervals

There are two ways to construct confidence intervals:
1. basic bootstrap confidence intervals;
2. percentile bootstrap confidence intervals.

1. Basic bootstrap confidence interval

Assume that Q = θ̂ − θ has a distribution that does not depend on θ; this is a typical example. We have Q* = θ̂* − θ*. The quantiles are then q*_α = t*_α − θ*, where t*_α are the (upper) quantiles of θ̂*. Note that, from the bootstrap perspective, θ* is a (known) constant. We have (original problem) P(q_{1−α/2} ≤ θ̂ − θ ≤ q_{α/2}) = 1 − α, and (bootstrap version) P(q*_{1−α/2} ≤ θ̂* − θ* ≤ q*_{α/2}) = 1 − α. Using the bootstrap quantiles for the original problem yields the approximation P(q*_{1−α/2} ≤ θ̂ − θ ≤ q*_{α/2}) ≈ 1 − α, which can be reworked as

P(θ̂ − q*_{α/2} ≤ θ ≤ θ̂ − q*_{1−α/2}) ≈ 1 − α.

If θ̂ itself is the bootstrap estimator of θ, then θ̂ = θ* = T and q*_α = t*_α − T. This leads to

P(2T − t*_{α/2} ≤ θ ≤ 2T − t*_{1−α/2}) ≈ 1 − α.

Conclusion: basic bootstrap CIs include an implicit bias correction.
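A sketch of the basic bootstrap interval for the mean of a small sample, with θ̂ = T = X̄, so that the reflected form [2T − t*_{α/2}, 2T − t*_{1−α/2}] applies; the data are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=23.0, size=8)       # illustrative small sample
T = x.mean()                                  # theta_hat, here also the bootstrap estimator

B, alpha = 10_000, 0.10
t_star = np.array([rng.choice(x, size=x.size, replace=True).mean()
                   for _ in range(B)])

# In the course's upper-quantile notation, t*_{alpha/2} is the high end of the
# resampled means and t*_{1-alpha/2} the low end; the basic interval reflects
# them around T: [2T - t*_{alpha/2}, 2T - t*_{1-alpha/2}]
t_low, t_high = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
ci_basic = (2 * T - t_high, 2 * T - t_low)
```

The reflection around T is exactly the implicit bias correction named in the conclusion above.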

2. Percentile bootstrap confidence interval: assumptions

We assume that θ̂ = T is the bootstrap estimator (assumption 1). We start again from Q = T − θ, and we assume that there exists a monotone transformation, T̃ = h(T) and θ̃ = h(θ), such that T̃ is symmetric around θ̃. This implies that θ̃ = t̃_{50%}, i.e., θ̃ is the median of the estimator T̃ (assumption 2). Denoting Q̃ = T̃ − θ̃, we can write P(T̃ − q̃_{α/2} ≤ θ̃ ≤ T̃ − q̃_{1−α/2}) = 1 − α. Because of symmetry, we have q̃_{α/2} = −q̃_{1−α/2}, and so P(T̃ + q̃_{1−α/2} ≤ θ̃ ≤ T̃ + q̃_{α/2}) = 1 − α. In the transformed domain, where T̃ is symmetric, this is the same CI. It is equivalent to P(T + q_{1−α/2} ≤ θ ≤ T + q_{α/2}) = 1 − α. This CI is different from the classical one (unless T is also symmetric), but it has the same α.

Percentile bootstrap confidence interval

The bootstrap approximation of the alternative CI becomes P(T + q*_{1−α/2} ≤ θ ≤ T + q*_{α/2}) ≈ 1 − α. We have Q = T − θ, and since T is the bootstrap estimator of θ, θ* = T, so Q* = T* − T. Hence the quantiles satisfy t*_α = q*_α + T. Filling this in into the bootstrap approximative CI:

P(t*_{1−α/2} ≤ θ ≤ t*_{α/2}) ≈ 1 − α.

Applying the monotone transformation h: P(t̃*_{1−α/2} ≤ θ̃ ≤ t̃*_{α/2}) ≈ 1 − α.

Both methods (basic and percentile bootstrap CIs) use t*_α. These empirical parameters can sometimes be computed exactly, and are most of the time found by resampling.

3. Bootstrap-t: problem

Starting from the basic bootstrap CI: θ̂ = Q + θ, or Q = θ̂ − θ. The empirical quantiles of Q*, via P(q*_{1−α/2} ≤ θ̂ − θ ≤ q*_{α/2}) ≈ 1 − α, lead to the basic bootstrap CI [θ̂ − q*_{α/2}, θ̂ − q*_{1−α/2}]. Now suppose that we can estimate the variance of θ̂ (or Q) fairly easily; then we define the studentized pivot

Q1 = (θ̂ − θ) / σ_θ̂.

The basic bootstrap resamples θ̂*, which in terms of Q1 means that we sample from Q1* = (θ̂* − θ*) / σ*_θ̂. We thus implicitly construct a bootstrap estimator for σ_θ̂.

Bootstrap-t

The implicit bootstrap estimator for σ_θ̂ may not be the best possible. Suppose we have another estimator S_θ̂ which we think is better: we would like to use it explicitly. The basic confidence interval above can be rewritten as

[θ̂ − q*_{1,α/2} σ_θ̂, θ̂ − q*_{1,1−α/2} σ_θ̂].

The bootstrap-t alternative is to define U = (θ̂ − θ) / S_θ̂, where S_θ̂ is the estimate of σ_θ̂. The expression for U resembles the definition of a Student-t variable, hence the name bootstrap-t. Now we can define the empirical equivalent U* = (θ̂* − θ*) / S*_θ̂. Every resample X* yields a value for θ̂* and for S*_θ̂.
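The percentile interval differs from the basic interval only in the last step: the central quantiles of θ̂* are used directly instead of being reflected around T. A sketch on simulated data:

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=23.0, size=8)       # illustrative small sample
T = x.mean()                                  # bootstrap estimator (assumption 1)

B, alpha = 10_000, 0.10
t_star = np.array([rng.choice(x, size=x.size, replace=True).mean()
                   for _ in range(B)])

# Percentile interval [t*_{1-alpha/2}, t*_{alpha/2}] (upper-quantile notation):
# the central quantiles of the resampled estimator, taken as they stand
ci_percentile = tuple(np.quantile(t_star, [alpha / 2, 1 - alpha / 2]))
```

For a right-skewed bootstrap distribution the percentile and basic intervals disagree visibly, which is exactly the point of the symmetry assumption above.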

Bootstrap-t: further details

We bootstrap over the estimator for σ_θ̂, rather than over the parameter σ_θ̂. Hence, there will be no implicit bootstrap estimation of the parameter. Bootstrapping over the estimator means that for every resample, the estimate is computed with the same expression as it was computed for the original sample. We can then construct a bootstrap confidence interval, using the estimator S_θ̂, whose definition does not depend on the empirical distribution:

[θ̂ − u*_{α/2} S_θ̂, θ̂ − u*_{1−α/2} S_θ̂].

This confidence interval depends on the resampling only through the empirical quantiles of the studentized variable U.

3. Hypothesis testing

4. Regression

Bootstrap in regression can proceed in two ways:
- bootstrap of covariate-response pairs (x_i, y_i);
- bootstrap of residuals.

We now develop the procedure when bootstrapping the residuals. The model is Y = Xβ + ε, where the errors have zero mean, but they are not normal (otherwise, the classical results for confidence intervals can be used). We further define µ = Xβ (the error-free response variables for the given covariate values in X) and µ0 = x0 β, where x0 is a row vector: µ0 is the error-free response corresponding to (possibly unobserved) covariate values.

Classical, least squares estimators:

β̂ = (XᵀX)⁻¹ Xᵀ Y,  µ̂ = X β̂ = X (XᵀX)⁻¹ Xᵀ Y,  µ̂0 = x0 (XᵀX)⁻¹ Xᵀ Y.

Regression for normal errors: if ε ~ NID(0, σ²), i.e. ε ~ N_n(0, σ² I_n), then the least squares estimator is the maximum likelihood estimator. The normal model allows us to construct confidence regions/intervals for β or for µ.
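A sketch of the bootstrap-t interval, taking S_θ̂ = s/√n as the explicit standard error estimator (an illustrative choice for the mean); note how every resample recomputes S with the same expression as for the original sample.

```python
import numpy as np

rng = np.random.default_rng(8)
x = rng.exponential(scale=23.0, size=8)       # illustrative small sample
n = x.size
theta_hat = x.mean()
S = x.std(ddof=1) / np.sqrt(n)                # explicit estimator S for sigma_theta

B, alpha = 10_000, 0.10
u_star = np.empty(B)
for b in range(B):
    xs = rng.choice(x, size=n, replace=True)
    S_star = xs.std(ddof=1) / np.sqrt(n)      # same expression, applied to the resample
    u_star[b] = (xs.mean() - theta_hat) / S_star

# [theta_hat - u*_{alpha/2} S, theta_hat - u*_{1-alpha/2} S] (upper quantiles u*_a)
u_low, u_high = np.quantile(u_star, [alpha / 2, 1 - alpha / 2])
ci_t = (theta_hat - u_high * S, theta_hat - u_low * S)
```

The studentized quantiles are typically asymmetric for skewed data, so the interval need not be centred on θ̂, unlike a classical t interval.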

Bootstrap for regression

For any error distribution with zero mean, β̂ is unbiased, so we can write (as a multiple integral)

β = ∫ (XᵀX)⁻¹ Xᵀ y f_Y(y) dy.

We now enter the bootstrap analogue: we replace f_Y by its empirical counterpart. Then

β* = (XᵀX)⁻¹ Xᵀ Y = β̂.

The least squares estimator is the bootstrap estimator. We take that estimator as the true value during the bootstrap resampling. So, we generate values according to the model

Y* = X β* + ε*.

As we do not resample X (it is possible, but it is not our option), and we have already found the parameter value β*, the only remaining term to be filled in is ε*.

Resampling the residuals

Since ε is not normal, and we normally don't know the model for the error distribution, we decide to adopt non-parametric bootstrapping. The residuals e = Y − X β̂ are the empirical equivalent of the errors. We assume that the errors are homoscedastic and have zero mean. We construct permutations of the observed residuals e_i and use those as bootstrap error terms ε*. This allows us to construct as many bootstrap values for β̂*, µ̂* and the predicted response at a new covariate row as we like, from which we can construct bootstrap confidence intervals for β, µ and the error-free response at new covariate values.
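The residual-bootstrap procedure can be sketched end to end; the design matrix, the uniform (zero-mean, non-normal) errors and the confidence level below are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
X = np.column_stack([np.ones(n), np.linspace(0.0, 10.0, n)])  # design with intercept
beta_true = np.array([1.0, 0.5])
eps = rng.uniform(-2.0, 2.0, size=n)          # zero-mean, non-normal errors
Y = X @ beta_true + eps

pinv = np.linalg.solve(X.T @ X, X.T)          # (X^T X)^{-1} X^T
beta_hat = pinv @ Y                           # least squares = bootstrap estimator
resid = Y - X @ beta_hat                      # residuals e, the empirical errors

B = 2000
beta_stars = np.empty((B, 2))
for b in range(B):
    eps_star = rng.permutation(resid)         # permuted residuals as error terms
    beta_stars[b] = pinv @ (X @ beta_hat + eps_star)   # refit on Y* = X beta* + eps*

ci_slope = np.quantile(beta_stars[:, 1], [0.05, 0.95])  # percentile CI for the slope
```

Each refit reuses the fixed design X and the fitted β̂ as the true value, exactly as described above; only the error terms change from resample to resample.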


More information

Mathematical statistics

Mathematical statistics October 18 th, 2018 Lecture 16: Midterm review Countdown to mid-term exam: 7 days Week 1 Chapter 1: Probability review Week 2 Week 4 Week 7 Chapter 6: Statistics Chapter 7: Point Estimation Chapter 8:

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Asymptotic distribution of the sample average value-at-risk

Asymptotic distribution of the sample average value-at-risk Asymptotic distribution of the sample average value-at-risk Stoyan V. Stoyanov Svetlozar T. Rachev September 3, 7 Abstract In this paper, we prove a result for the asymptotic distribution of the sample

More information

Nonparametric hypothesis tests and permutation tests

Nonparametric hypothesis tests and permutation tests Nonparametric hypothesis tests and permutation tests 1.7 & 2.3. Probability Generating Functions 3.8.3. Wilcoxon Signed Rank Test 3.8.2. Mann-Whitney Test Prof. Tesler Math 283 Fall 2018 Prof. Tesler Wilcoxon

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

STAT 512 sp 2018 Summary Sheet

STAT 512 sp 2018 Summary Sheet STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}

More information

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Monte Carlo Integration

Monte Carlo Integration Monte Carlo Integration SCX5005 Simulação de Sistemas Complexos II Marcelo S. Lauretto www.each.usp.br/lauretto Reference: Robert CP, Casella G. Introducing Monte Carlo Methods with R. Springer, 2010.

More information

Lecture 32: Asymptotic confidence sets and likelihoods

Lecture 32: Asymptotic confidence sets and likelihoods Lecture 32: Asymptotic confidence sets and likelihoods Asymptotic criterion In some problems, especially in nonparametric problems, it is difficult to find a reasonable confidence set with a given confidence

More information

Time Series and Forecasting Lecture 4 NonLinear Time Series

Time Series and Forecasting Lecture 4 NonLinear Time Series Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations

More information

1. Point Estimators, Review

1. Point Estimators, Review AMS571 Prof. Wei Zhu 1. Point Estimators, Review Example 1. Let be a random sample from. Please find a good point estimator for Solutions. There are the typical estimators for and. Both are unbiased estimators.

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

3 Continuous Random Variables

3 Continuous Random Variables Jinguo Lian Math437 Notes January 15, 016 3 Continuous Random Variables Remember that discrete random variables can take only a countable number of possible values. On the other hand, a continuous random

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

f (1 0.5)/n Z =

f (1 0.5)/n Z = Math 466/566 - Homework 4. We want to test a hypothesis involving a population proportion. The unknown population proportion is p. The null hypothesis is p = / and the alternative hypothesis is p > /.

More information

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018

Mathematics Ph.D. Qualifying Examination Stat Probability, January 2018 Mathematics Ph.D. Qualifying Examination Stat 52800 Probability, January 2018 NOTE: Answers all questions completely. Justify every step. Time allowed: 3 hours. 1. Let X 1,..., X n be a random sample from

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Stat 451 Lecture Notes Simulating Random Variables

Stat 451 Lecture Notes Simulating Random Variables Stat 451 Lecture Notes 05 12 Simulating Random Variables Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 6 in Givens & Hoeting, Chapter 22 in Lange, and Chapter 2 in Robert & Casella 2 Updated:

More information

Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets

Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets Chapter 7. Confidence Sets Lecture 30: Pivotal quantities and confidence sets Confidence sets X: a sample from a population P P. θ = θ(p): a functional from P to Θ R k for a fixed integer k. C(X): a confidence

More information

Math Review Sheet, Fall 2008

Math Review Sheet, Fall 2008 1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the

More information

Stat 704 Data Analysis I Probability Review

Stat 704 Data Analysis I Probability Review 1 / 39 Stat 704 Data Analysis I Probability Review Dr. Yen-Yi Ho Department of Statistics, University of South Carolina A.3 Random Variables 2 / 39 def n: A random variable is defined as a function that

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

Math 475. Jimin Ding. August 29, Department of Mathematics Washington University in St. Louis jmding/math475/index.

Math 475. Jimin Ding. August 29, Department of Mathematics Washington University in St. Louis   jmding/math475/index. istical A istic istics : istical Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html August 29, 2013 istical August 29, 2013 1 / 18 istical A istic

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

Some Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2

Some Assorted Formulae. Some confidence intervals: σ n. x ± z α/2. x ± t n 1;α/2 n. ˆp(1 ˆp) ˆp ± z α/2 n. χ 2 n 1;1 α/2. n 1;α/2 STA 248 H1S MIDTERM TEST February 26, 2008 SURNAME: SOLUTIONS GIVEN NAME: STUDENT NUMBER: INSTRUCTIONS: Time: 1 hour and 50 minutes Aids allowed: calculator Tables of the standard normal, t and chi-square

More information

2 Functions of random variables

2 Functions of random variables 2 Functions of random variables A basic statistical model for sample data is a collection of random variables X 1,..., X n. The data are summarised in terms of certain sample statistics, calculated as

More information

Bootstrap tests. Patrick Breheny. October 11. Bootstrap vs. permutation tests Testing for equality of location

Bootstrap tests. Patrick Breheny. October 11. Bootstrap vs. permutation tests Testing for equality of location Bootstrap tests Patrick Breheny October 11 Patrick Breheny STA 621: Nonparametric Statistics 1/14 Introduction Conditioning on the observed data to obtain permutation tests is certainly an important idea

More information

7 Influence Functions

7 Influence Functions 7 Influence Functions The influence function is used to approximate the standard error of a plug-in estimator. The formal definition is as follows. 7.1 Definition. The Gâteaux derivative of T at F in the

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

6.1 Moment Generating and Characteristic Functions

6.1 Moment Generating and Characteristic Functions Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,

More information

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Hypothesis Testing with the Bootstrap. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Hypothesis Testing with the Bootstrap Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Bootstrap Hypothesis Testing A bootstrap hypothesis test starts with a test statistic

More information

Review. December 4 th, Review

Review. December 4 th, Review December 4 th, 2017 Att. Final exam: Course evaluation Friday, 12/14/2018, 10:30am 12:30pm Gore Hall 115 Overview Week 2 Week 4 Week 7 Week 10 Week 12 Chapter 6: Statistics and Sampling Distributions Chapter

More information

Probability and Statistics

Probability and Statistics CHAPTER 5: PARAMETER ESTIMATION 5-0 Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 5: PARAMETER

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Regression and Statistical Inference

Regression and Statistical Inference Regression and Statistical Inference Walid Mnif wmnif@uwo.ca Department of Applied Mathematics The University of Western Ontario, London, Canada 1 Elements of Probability 2 Elements of Probability CDF&PDF

More information

Review: mostly probability and some statistics

Review: mostly probability and some statistics Review: mostly probability and some statistics C2 1 Content robability (should know already) Axioms and properties Conditional probability and independence Law of Total probability and Bayes theorem Random

More information

Terminology Suppose we have N observations {x(n)} N 1. Estimators as Random Variables. {x(n)} N 1

Terminology Suppose we have N observations {x(n)} N 1. Estimators as Random Variables. {x(n)} N 1 Estimation Theory Overview Properties Bias, Variance, and Mean Square Error Cramér-Rao lower bound Maximum likelihood Consistency Confidence intervals Properties of the mean estimator Properties of the

More information

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I

Nonparametric tests. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 704: Data Analysis I 1 / 16 Nonparametric tests Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I Nonparametric one and two-sample tests 2 / 16 If data do not come from a normal

More information

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods

Permutation Tests. Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods Permutation Tests Noa Haas Statistics M.Sc. Seminar, Spring 2017 Bootstrap and Resampling Methods The Two-Sample Problem We observe two independent random samples: F z = z 1, z 2,, z n independently of

More information

Sampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop

Sampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop Sampling Distributions of Statistics Corresponds to Chapter 5 of Tamhane and Dunlop Slides prepared by Elizabeth Newton (MIT), with some slides by Jacqueline Telford (Johns Hopkins University) 1 Sampling

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Chapter 4. Continuous Random Variables 4.1 PDF

Chapter 4. Continuous Random Variables 4.1 PDF Chapter 4 Continuous Random Variables In this chapter we study continuous random variables. The linkage between continuous and discrete random variables is the cumulative distribution (CDF) which we will

More information

Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

More information

Statistics. Statistics

Statistics. Statistics The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,

More information

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics

Chapter 6. Order Statistics and Quantiles. 6.1 Extreme Order Statistics Chapter 6 Order Statistics and Quantiles 61 Extreme Order Statistics Suppose we have a finite sample X 1,, X n Conditional on this sample, we define the values X 1),, X n) to be a permutation of X 1,,

More information

ECO Class 6 Nonparametric Econometrics

ECO Class 6 Nonparametric Econometrics ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................

More information

Negative Association, Ordering and Convergence of Resampling Methods

Negative Association, Ordering and Convergence of Resampling Methods Negative Association, Ordering and Convergence of Resampling Methods Nicolas Chopin ENSAE, Paristech (Joint work with Mathieu Gerber and Nick Whiteley, University of Bristol) Resampling schemes: Informal

More information

Statistics and Data Analysis

Statistics and Data Analysis Statistics and Data Analysis The Crash Course Physics 226, Fall 2013 "There are three kinds of lies: lies, damned lies, and statistics. Mark Twain, allegedly after Benjamin Disraeli Statistics and Data

More information

review session gov 2000 gov 2000 () review session 1 / 38

review session gov 2000 gov 2000 () review session 1 / 38 review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review

More information

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,

More information

ECNS 561 Multiple Regression Analysis

ECNS 561 Multiple Regression Analysis ECNS 561 Multiple Regression Analysis Model with Two Independent Variables Consider the following model Crime i = β 0 + β 1 Educ i + β 2 [what else would we like to control for?] + ε i Here, we are taking

More information

Lecture 26: Likelihood ratio tests

Lecture 26: Likelihood ratio tests Lecture 26: Likelihood ratio tests Likelihood ratio When both H 0 and H 1 are simple (i.e., Θ 0 = {θ 0 } and Θ 1 = {θ 1 }), Theorem 6.1 applies and a UMP test rejects H 0 when f θ1 (X) f θ0 (X) > c 0 for

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

Monte Carlo Integration I [RC] Chapter 3

Monte Carlo Integration I [RC] Chapter 3 Aula 3. Monte Carlo Integration I 0 Monte Carlo Integration I [RC] Chapter 3 Anatoli Iambartsev IME-USP Aula 3. Monte Carlo Integration I 1 There is no exact definition of the Monte Carlo methods. In the

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

interval forecasting

interval forecasting Interval Forecasting Based on Chapter 7 of the Time Series Forecasting by Chatfield Econometric Forecasting, January 2008 Outline 1 2 3 4 5 Terminology Interval Forecasts Density Forecast Fan Chart Most

More information

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama

Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Qualifying Exam CS 661: System Simulation Summer 2013 Prof. Marvin K. Nakayama Instructions This exam has 7 pages in total, numbered 1 to 7. Make sure your exam has all the pages. This exam will be 2 hours

More information

Brief Review on Estimation Theory

Brief Review on Estimation Theory Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods

Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods Chapter 4 Confidence Intervals in Ridge Regression using Jackknife and Bootstrap Methods 4.1 Introduction It is now explicable that ridge regression estimator (here we take ordinary ridge estimator (ORE)

More information

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego

Model-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships

More information

On robust and efficient estimation of the center of. Symmetry.

On robust and efficient estimation of the center of. Symmetry. On robust and efficient estimation of the center of symmetry Howard D. Bondell Department of Statistics, North Carolina State University Raleigh, NC 27695-8203, U.S.A (email: bondell@stat.ncsu.edu) Abstract

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests

z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests z and t tests for the mean of a normal distribution Confidence intervals for the mean Binomial tests Chapters 3.5.1 3.5.2, 3.3.2 Prof. Tesler Math 283 Fall 2018 Prof. Tesler z and t tests for mean Math

More information

Summarizing Measured Data

Summarizing Measured Data Performance Evaluation: Summarizing Measured Data Hongwei Zhang http://www.cs.wayne.edu/~hzhang The object of statistics is to discover methods of condensing information concerning large groups of allied

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

MAS1302 Computational Probability and Statistics

MAS1302 Computational Probability and Statistics MAS1302 Computational Probability and Statistics April 23, 2008 3. Simulating continuous random behaviour 3.1 The Continuous Uniform U(0,1) Distribution We have already used this random variable a great

More information

Lecture 30. DATA 8 Summer Regression Inference

Lecture 30. DATA 8 Summer Regression Inference DATA 8 Summer 2018 Lecture 30 Regression Inference Slides created by John DeNero (denero@berkeley.edu) and Ani Adhikari (adhikari@berkeley.edu) Contributions by Fahad Kamran (fhdkmrn@berkeley.edu) and

More information

1 Review of Probability and Distributions

1 Review of Probability and Distributions Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information