Lecture 33: Bootstrap

Motivation

To evaluate and compare different estimators, we need consistent estimators of variances or asymptotic variances of estimators. This is also important for hypothesis testing and confidence sets. Let $\mathrm{Var}(\hat\theta)$ be the variance or asymptotic variance of an estimator $\hat\theta$.

Traditional approach to estimate $\mathrm{Var}(\hat\theta)$: derivation and substitution
- First, we derive a theoretical formula
- Approximation (asymptotic theory) is usually needed
- The formula may depend on unknown quantities
- We then substitute unknown quantities by estimators

Example: the $\delta$-method. $Y_1, \dots, Y_n$ are iid ($k$-dimensional), $\theta = g(\mu)$ (e.g., a ratio of two components of $\mu$), and $\hat\theta = g(\bar Y)$, so
$$\mathrm{Var}(\hat\theta) \approx [\nabla g(\mu)]^T \mathrm{Var}(\bar Y)\, \nabla g(\mu).$$
An estimator of $\mathrm{Var}(\hat\theta)$ is $[\nabla g(\bar Y)]^T (S^2/n)\, \nabla g(\bar Y)$. Is the derivative $\nabla g$ always easy to derive?
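As a concrete instance of the derivation-and-substitution recipe, here is a minimal sketch of the $\delta$-method variance estimator for the ratio $g(\mu) = \mu_1/\mu_2$. The normal data, seed, and function names are illustrative, not part of the lecture.

```python
import numpy as np

def delta_method_var(Y):
    """grad g(Ybar)^T (S^2/n) grad g(Ybar) for g(mu) = mu[0]/mu[1]."""
    n = Y.shape[0]
    Ybar = Y.mean(axis=0)
    S2 = np.cov(Y, rowvar=False)                 # sample covariance matrix
    grad = np.array([1.0 / Ybar[1],              # d(mu1/mu2)/d mu1
                     -Ybar[0] / Ybar[1] ** 2])   # d(mu1/mu2)/d mu2
    return grad @ (S2 / n) @ grad

rng = np.random.default_rng(0)
Y = rng.normal(loc=[2.0, 5.0], scale=1.0, size=(200, 2))  # iid 2-dimensional data
print(delta_method_var(Y))
```

Here the derivative $\nabla g$ is easy to write down; the point of the bootstrap is to avoid this step when it is not.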

An alternative?

Suppose we can independently obtain $B$ copies of the data set $X$, say $X_1, \dots, X_B$. Then we can calculate $\hat\theta_b = \hat\theta(X_b)$, $b = 1, \dots, B$. The variance of $\hat\theta$ can be estimated as
$$\frac{1}{B}\sum_{b=1}^B \bigg(\hat\theta_b - \frac{1}{B}\sum_{l=1}^B \hat\theta_l\bigg)^2.$$
In fact, the cdf $G(t) = P(\hat\theta - \theta \le t)$ can be estimated as
$$\hat G(t) = \frac{1}{B}\sum_{b=1}^B I\big(\hat\theta_b - \hat\theta \le t\big) = \frac{\#\{b : \hat\theta_b - \hat\theta \le t\}}{B},$$
where $I(\hat\theta_b - \hat\theta \le t) = 1$ if $\hat\theta_b - \hat\theta \le t$ and $0$ otherwise. No derivation is needed, and these estimators are valid for large $B$ ($B \to \infty$, law of large numbers). But typically, we only have one dataset, $X$.
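A minimal sketch of this thought experiment, assuming we really could draw $B$ independent copies of the data set; the exponential data and the choice of the median as $\hat\theta$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 50, 1000

# B independent copies of the dataset, and theta-hat_b for each:
theta_b = np.array([np.median(rng.exponential(size=n)) for _ in range(B)])

# (1/B) sum_b (theta-hat_b - mean of the theta-hat_l)^2:
var_hat = np.mean((theta_b - theta_b.mean()) ** 2)
print(var_hat)   # valid for large B by the law of large numbers
```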

Bootstrap

Can we apply the same idea by creating pseudo-replicate datasets? This means $X^*_1, \dots, X^*_B$ are copies of $X$, but they are not independent of $X$ (in fact, they depend on $X$). Is
$$\frac{1}{B}\sum_{b=1}^B \bigg(\hat\theta^*_b - \frac{1}{B}\sum_{l=1}^B \hat\theta^*_l\bigg)^2$$
still a valid estimator of $\mathrm{Var}(\hat\theta)$? The answer to this question depends on
- how the sample $X$ is taken
- how $X^*_1, \dots, X^*_B$ are constructed
- the type of the estimator $\hat\theta$

A heuristic description for the bootstrap:
- $P$: the population producing data $X$
- $\hat P$: an estimate of the population based on data $X$
- $X^*$: the bootstrap data produced by $\hat P$

A heuristic description for the bootstrap

real world: $P \to X \to \hat\theta = \hat\theta(X)$
bootstrap: $\hat P \to X^* \to \hat\theta^* = \hat\theta(X^*)$

$\mathrm{Var}(\hat\theta)$ can be approximated by $\mathrm{Var}_*(\hat\theta^*)$, the variance taken under the bootstrap sampling conditioned on $X$. If $\hat P$ is close to $P$, then
- the sampling properties of $\hat\theta^*$, conditional on $X$, are close to those of $\hat\theta$
- $\mathrm{Var}_*(\hat\theta^*)$ is close to $\mathrm{Var}(\hat\theta)$
- $\hat G(t)$ is close to $G(t)$

Note that $\mathrm{Var}_*(\hat\theta^*)$ is a function of $X$ and is an estimator. If it has an explicit form, then it can be directly used. If not, then we use the Monte Carlo approximation
$$\mathrm{Var}_*(\hat\theta^*) \approx \frac{1}{B}\sum_{b=1}^B \bigg(\hat\theta^*_b - \frac{1}{B}\sum_{l=1}^B \hat\theta^*_l\bigg)^2,$$
where $\hat\theta^*_b = \hat\theta(X^*_b)$ and $X^*_1, \dots, X^*_B$ are iid bootstrap data sets (copies of $X^*$). How do we generate $X^*$ based on $X$?
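A generic sketch of this Monte Carlo approximation. `draw_from_Phat` is an illustrative placeholder for a draw of $X^*$ from whatever $\hat P$ is; the usage lines preview the nonparametric choice of $\hat P$ introduced below.

```python
import numpy as np

def bootstrap_var(theta, draw_from_Phat, B=2000):
    """Monte Carlo approximation of Var_*(theta-hat*)."""
    theta_star = np.array([theta(draw_from_Phat()) for _ in range(B)])
    return np.mean((theta_star - theta_star.mean()) ** 2)

rng = np.random.default_rng(2)
X = rng.normal(size=100)

# Nonparametric choice of P-hat: resample X itself with replacement.
draw = lambda: rng.choice(X, size=X.size, replace=True)
print(bootstrap_var(np.mean, draw))      # bootstrap variance of the sample mean
print(np.var(X, ddof=1) / X.size)        # compare with S^2/n
```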

Parametric bootstrap

Let $X_1, \dots, X_n$ be iid with a cdf $F_\theta$, where $\theta$ is an unknown parameter vector and $F_\theta$ is known when $\theta$ is known. Let $\hat\theta$ be an estimator of $\theta$ based on $X = (X_1, \dots, X_n)$. A parametric bootstrap data set $X^* = (X^*_1, \dots, X^*_n)$ is obtained by generating iid $X^*_1, \dots, X^*_n$ from $F_{\hat\theta}$.

Example: location-scale problems. Let $F_\theta(x) = F_0\big(\frac{x-\mu}{\sigma}\big)$, where $\mu = E(X_1)$, $\sigma^2 = \mathrm{Var}(X_1)$, and $F_0$ is a known cdf. Let $\bar X$ be the sample mean, $S^2$ be the sample variance, and
$$T = \frac{\sqrt{n}(\bar X - \mu)}{S}.$$
The distribution of $T$ does not depend on any parameter. It is the t-distribution with $n - 1$ degrees of freedom if $F_0$ is normal; otherwise its explicit form is unknown.

Example (continued)

Let $\hat\theta = (\bar X, S^2)$ and generate iid $X^*_i$, $i = 1, \dots, n$, from $F_{\hat\theta}$. Then $(X^*_i - \bar X)/S \sim F_0$ and
$$T^* = \frac{\sqrt{n}(\bar X^* - \bar X)}{S^*} \sim T,$$
where $\bar X^*$ and $S^*$ are the sample mean and standard deviation of the $X^*_i$; that is, conditional on $X$, $T^*$ has the same distribution as $T$. The parametric bootstrap is perfect: $\mathrm{Var}_*(T^*) = \mathrm{Var}(T)$. If we calculate $\mathrm{Var}_*(T^*)$ by Monte Carlo approximation, then the parametric bootstrap is exactly the same as the simulation approach.

In general, if there is a function $\tau$ such that
$$\mathrm{Var}_\theta(\hat\theta) = \tau(\theta) \quad \text{when } X_1, \dots, X_n \text{ are iid from } F_\theta,$$
then
$$\mathrm{Var}_{\hat\theta}(\hat\theta^*) = \tau(\hat\theta) \quad \text{when } X^*_1, \dots, X^*_n \text{ are iid from } F_{\hat\theta}.$$
Hence, the parametric bootstrap is simply the substitution approach. If $\hat\theta$ is consistent and $\tau$ is continuous, then $\mathrm{Var}_{\hat\theta}(\hat\theta^*)$ is consistent. If $\tau$ does not have a closed form, we apply the Monte Carlo approximation. In the location-scale example, $\tau$ is a constant, and hence the bootstrap is perfect.
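A sketch of this example, taking $F_0$ to be the standardized logistic cdf purely for illustration (any known cdf with mean 0 and variance 1 would do).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
c = np.pi / np.sqrt(3.0)                       # sd of the standard logistic

# Data from the location-scale family F_theta(x) = F_0((x - mu)/sigma):
X = 2.0 + 1.5 * rng.logistic(size=n) / c
Xbar, S = X.mean(), X.std(ddof=1)

B = 5000
T_star = np.empty(B)
for b in range(B):
    Xs = Xbar + S * rng.logistic(size=n) / c   # X*_i iid from F_theta-hat
    T_star[b] = np.sqrt(n) * (Xs.mean() - Xbar) / Xs.std(ddof=1)

# Var_*(T*) = Var(T) exactly; the Monte Carlo average only approximates it:
print(T_star.var())
```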

Example

Let $X_1, \dots, X_n$ be iid from $F_\theta$. Define $\mu = \mu(\theta) = E_\theta(X_1)$ and $\mu_j = \mu_j(\theta) = E_\theta(X_1 - \mu)^j$, $j = 2, 3, 4$. Consider the estimation of $\mu^2$ by $\bar X^2$. A direct calculation shows that
$$\mathrm{Var}_\theta(\bar X^2) = \frac{4[\mu(\theta)]^2\mu_2(\theta)}{n} + \frac{4\mu(\theta)\mu_3(\theta)}{n^2} + \frac{\mu_4(\theta)}{n^3}.$$
Based on the previous discussion, the parametric bootstrap variance estimator is
$$\mathrm{Var}_{\hat\theta}(\bar X^{*2}) = \frac{4[\mu(\hat\theta)]^2\mu_2(\hat\theta)}{n} + \frac{4\mu(\hat\theta)\mu_3(\hat\theta)}{n^2} + \frac{\mu_4(\hat\theta)}{n^3}.$$
It is a consistent estimator if $\mu$ and $\mu_j$, $j = 2, 3, 4$, are continuous functions. If we apply the asymptotic approach, then we estimate $\mathrm{Var}_\theta(\bar X^2)$ by $4[\mu(\hat\theta)]^2\mu_2(\hat\theta)/n$.
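To make the plug-in step concrete, here is a sketch assuming, for illustration only, that $F_\theta$ is the exponential distribution with mean $\theta$, for which $\mu = \theta$, $\mu_2 = \theta^2$, $\mu_3 = 2\theta^3$, $\mu_4 = 9\theta^4$, so the formula above gives $\tau(\theta) = 4\theta^4/n + 8\theta^4/n^2 + 9\theta^4/n^3$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
X = rng.exponential(scale=3.0, size=n)   # true theta = 3
theta_hat = X.mean()                     # MLE of theta under the exponential model

# Parametric bootstrap (plug-in) variance estimator tau(theta-hat):
tau_hat = (4 * theta_hat**4 / n
           + 8 * theta_hat**4 / n**2
           + 9 * theta_hat**4 / n**3)
print(tau_hat)

# Asymptotic (leading-term) estimator 4 [mu(theta-hat)]^2 mu_2(theta-hat) / n:
print(4 * theta_hat**2 * theta_hat**2 / n)
```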

Nonparametric bootstrap

Without any model, we can apply the simple nonparametric bootstrap. If $X = (X_1, \dots, X_n)$ with $X_1, \dots, X_n$ iid, then $P$ is the cdf of $X_1$ and $\hat P$ is the empirical cdf based on $X_1, \dots, X_n$. If we generate iid bootstrap data $X^*_1, \dots, X^*_n$ from $\hat P$, then it is the same as taking a simple random sample with replacement from $X$.

Property of $\mathrm{Var}_*(\hat\theta^*)$

Consider first $\hat\theta = \bar X$, the sample mean, and $\hat\theta^* = \bar X^*$, the sample mean of $X^*_1, \dots, X^*_n$. Under the empirical cdf, each $X^*_i$ has mean $\bar X$ and variance $\frac{1}{n}\sum_{j=1}^n (X_j - \bar X)^2$, so
$$E_*(\bar X^*) = \frac{1}{n}\sum_{i=1}^n E_*(X^*_i) = \bar X,$$
$$\mathrm{Var}_*(\bar X^*) = \frac{1}{n^2}\sum_{i=1}^n \mathrm{Var}_*(X^*_i) = \frac{1}{n^2}\sum_{j=1}^n (X_j - \bar X)^2 = \frac{n-1}{n}\cdot\frac{S^2}{n}.$$
When $n$ is small, we may make an adjustment, replacing $\frac{1}{n^2}$ by $\frac{1}{n(n-1)}$ so that $\frac{1}{n(n-1)}\sum_{j=1}^n (X_j - \bar X)^2 = S^2/n$ exactly.
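A quick numerical check of the identity $\mathrm{Var}_*(\bar X^*) = \frac{n-1}{n}\cdot\frac{S^2}{n}$, comparing the closed form with a Monte Carlo resampling approximation; the gamma data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 25
X = rng.gamma(shape=2.0, size=n)

B = 20000
xbar_star = np.array([rng.choice(X, size=n, replace=True).mean()
                      for _ in range(B)])

print(xbar_star.var())                       # Monte Carlo approximation
print((n - 1) / n * np.var(X, ddof=1) / n)   # exact Var_*(Xbar*)
```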

Property of $\mathrm{Var}_*(\hat\theta^*)$

Consider next the estimation of $g(\mu)$, where $\mu = E(X_1)$ and $g$ is a continuously differentiable function. Our estimator is $\hat\theta = g(\bar X)$, and the bootstrap analog is $\hat\theta^* = g(\bar X^*)$. When $n$ is large,
$$g(\bar X^*) = g(\bar X) + g'(\bar X)(\bar X^* - \bar X) + \cdots \approx g(\bar X) + g'(\bar X)(\bar X^* - \bar X).$$
Hence,
$$\mathrm{Var}_*(\hat\theta^*) = \mathrm{Var}_*[g(\bar X^*)] \approx [g'(\bar X)]^2\,\mathrm{Var}_*(\bar X^* - \bar X) = [g'(\bar X)]^2\,\mathrm{Var}_*(\bar X^*) \approx \frac{1}{n}[g'(\bar X)]^2 S^2.$$
This result can be extended to multivariate $X_i$.

For $\hat\theta = g(\bar X)$ with multivariate $X_i$,
$$\mathrm{Var}_*(\hat\theta^*) \approx [\nabla g(\bar X)]^T S^2\, \nabla g(\bar X)/n,$$
the delta-method variance estimator, where
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^T$$
is the sample covariance matrix.

Example

Let $X_1, \dots, X_n$ be iid from $F$. Define $\mu = E(X_1)$ and $\mu_j = E(X_1 - \mu)^j$, $j = 2, 3, 4$. Consider the estimation of $\mu^2$ by $\bar X^2$. We still have
$$\mathrm{Var}(\bar X^2) = \frac{4\mu^2\mu_2}{n} + \frac{4\mu\mu_3}{n^2} + \frac{\mu_4}{n^3}$$
and
$$\mathrm{Var}_*(\bar X^{*2}) = \frac{4\bar X^2 m_2}{n} + \frac{4\bar X m_3}{n^2} + \frac{m_4}{n^3},$$

Example (continued)

where $m_j = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^j$, $j = 2, 3, 4$. This is because the mean of the empirical cdf $\hat F$ is $\bar X$ and the $j$th central moment of $\hat F$ is $m_j$. In this case, we have an explicit form for the bootstrap variance estimator $\mathrm{Var}_*(\bar X^{*2})$, so no Monte Carlo is needed. This bootstrap variance estimator is consistent, since the sample moments $m_j$ are consistent for the $\mu_j$ by the WLLN.

Since $g'(x) = 2x$ when $g(x) = x^2$, the approximation derived earlier shows that
$$\mathrm{Var}_*(\bar X^{*2}) \approx \frac{4\bar X^2 m_2}{n},$$
which is also consistent, since the terms ignored are of the orders $n^{-2}$ and $n^{-3}$. In fact, the delta method produces the variance estimator
$$\frac{1}{n}[g'(\bar X)]^2 S^2 = \frac{4\bar X^2 S^2}{n}.$$
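A sketch computing this explicit bootstrap variance estimator from the sample central moments $m_j$ (no Monte Carlo needed) alongside the delta-method estimator $4\bar X^2 S^2/n$; the gamma data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
X = rng.gamma(shape=3.0, size=n)

Xbar = X.mean()
m2, m3, m4 = (np.mean((X - Xbar) ** j) for j in (2, 3, 4))  # central moments of F-hat

var_boot = 4 * Xbar**2 * m2 / n + 4 * Xbar * m3 / n**2 + m4 / n**3
var_delta = 4 * Xbar**2 * np.var(X, ddof=1) / n   # [g'(Xbar)]^2 S^2 / n

print(var_boot, var_delta)   # agree up to terms of order n^-2 and n^-3
```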

The sample median

Consider the sample median $\hat Q_{1/2} = \hat F^{-1}(1/2)$, where $\hat F$ is the empirical cdf. For simplicity, assume that $n = 2m - 1$ for an integer $m$. Then $\hat Q_{1/2} = X_{(m)}$, the $m$th order statistic. Let $X^*_1, \dots, X^*_n$ be iid from $\hat F$. Then
$$p_k = P_*\big\{X^*_{(m)} = X_{(k)} \mid X_1, \dots, X_n\big\} = \sum_{j=0}^{m-1}\binom{n}{j}\frac{(k-1)^j(n-k+1)^{n-j} - k^j(n-k)^{n-j}}{n^n}.$$
This shows that the bootstrap variance estimator for the sample median is
$$\mathrm{Var}_*\big(X^*_{(m)}\big) = \sum_{k=1}^n p_k\bigg(X_{(k)} - \sum_{j=1}^n p_j X_{(j)}\bigg)^2.$$
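A sketch that evaluates the $p_k$ formula exactly and checks the resulting $\mathrm{Var}_*(X^*_{(m)})$ against a Monte Carlo bootstrap; the normal data are illustrative.

```python
import numpy as np
from math import comb

rng = np.random.default_rng(7)
m = 8
n = 2 * m - 1                    # n = 15
X = np.sort(rng.normal(size=n))  # X[k-1] is the order statistic X_(k)

# p_k = P_*(X*_(m) = X_(k)), computed exactly in integer arithmetic:
p = np.array([
    sum(comb(n, j) * ((k - 1)**j * (n - k + 1)**(n - j)
                      - k**j * (n - k)**(n - j))
        for j in range(m)) / n**n
    for k in range(1, n + 1)
])
assert abs(p.sum() - 1.0) < 1e-12   # the p_k form a probability distribution

mean_star = np.sum(p * X)
var_exact = np.sum(p * (X - mean_star) ** 2)   # exact bootstrap variance

# Monte Carlo check by resampling with replacement:
med_star = np.array([np.median(rng.choice(X, size=n, replace=True))
                     for _ in range(20000)])
print(var_exact, med_star.var())   # should be close
```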

Discussion

In general, the expression for $\mathrm{Var}_*(\hat\theta^*)$ is complicated and not explicit, and Monte Carlo approximation is necessary. In fact, the idea of the bootstrap is not to derive its explicit form (since that involves complex derivations): the bootstrap replaces theoretical derivations by repeated computations. The user does not need to do theoretical derivations; however, they should be told when the bootstrap produces correct variance estimators and how to do the bootstrap. Research on bootstrap methodology still requires theoretical derivations.

UW-Madison (Statistics), Stat 710, Lecture 33, Jan 2018