Bootstrap Method


> # Purpose: understand how the bootstrap method works
> obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38)
> n=length(obs)
> mean(obs)
[1] 21.64625
> # estimate of lambda
> lambda = 1/mean(obs); lambda
[1] 0.04619738
> # The exponential distribution with rate=lambda has density f(x) = lambda*e^(-lambda*x)
> # draw a random sample of size n from the exponential(lambda) distribution, where lambda is estimated from the data
> x=rexp(n, lambda); x
[1]  4.451616 12.097513  6.302449 21.942872 37.191007 76.530816  9.458349 11.386464
> mean(x)
[1] 22.42014
> 1/mean(x)   # bootstrap estimate of lambda
[1] 0.04460276
> # do it again
> x=rexp(n, lambda); x
[1]  9.4759699  2.5089895  0.1630891  0.7994896 51.2508151 10.9096888  9.2945093  2.8122216
> mean(x)
[1] 10.90185
> 1/mean(x)   # bootstrap estimate of lambda
[1] 0.09172758
> # here we do a nonparametric bootstrap by replacing x=rexp(n, lambda) with x=sample(obs, n, replace=T)
> x=sample(obs, n, replace=T); x
[1] 11.10 22.38 67.40 31.50  5.03  7.73 11.10 22.38
> mean(x)
[1] 22.3275
> 1/mean(x)   # bootstrap estimate of lambda
[1] 0.04478782
> # do it again
> x=sample(obs, n, replace=T); x
[1]  5.03 11.10 11.96 16.07 11.96 31.50 16.07 11.96
> mean(x)
[1] 14.45625
> 1/mean(x)   # bootstrap estimate of lambda
[1] 0.06917423
> # repeat the procedure B times, saving the B sample means in xbar[] and sample SDs in ss[]
> B=200; xbar = rep(0, B); ss = rep(0, B)
> for(i in 1:B) { x=sample(obs, n, replace=T); xbar[i] = mean(x); ss[i] = sd(x) }
> # the B bootstrap estimates of lambda are
> lambda.boot = 1/xbar
> summary(lambda.boot)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
0.02127 0.04039 0.04856 0.05142 0.05994 0.09881
> # bootstrap estimate of standard error
> sd(lambda.boot)
[1] 0.01565485
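The nonparametric bootstrap in the transcript above can be sketched outside R as well. Below is a minimal Python version using only the standard library; the variable names and the seed are my own choices, and because resampling is random the exact numbers will differ from the R run.

```python
import random
import statistics

obs = [11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38]
n = len(obs)
lam = 1 / statistics.mean(obs)        # point estimate of lambda, about 0.0462

random.seed(1)                        # fixed seed so the sketch is repeatable
B = 200
# resample the observed data with replacement B times;
# each 1/mean(resample) is one bootstrap estimate of lambda
lam_boot = []
for _ in range(B):
    x = random.choices(obs, k=n)      # analogue of sample(obs, n, replace=T)
    lam_boot.append(1 / statistics.mean(x))

se_boot = statistics.stdev(lam_boot)  # bootstrap standard error of lambda-hat
print(round(lam, 4), round(se_boot, 4))
```

As in the R transcript, the spread of the 200 bootstrap estimates is itself the standard-error estimate.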

> # distribution of the bootstrap estimates
> stem(lambda.boot)
  The decimal point is 2 digit(s) to the left of the |
  2 | 1
  2 | 55789
  3 | 00112222333344444444
  3 | 55556666777888889999
  4 | 00000011111112222222233344444
  4 | 5555666666677778888888889999999
  5 | 00011112222233333333444444444
  5 | 6666667888889
  6 | 0000001222334
  6 | 566677789
  7 | 0000011222334
  7 | 57
  8 | 01124
  8 | 56667
  9 | 034
  9 | 89
> hist(lambda.boot)
> lambda.boot.sub = lambda.boot - mean(lambda.boot)
> hist(lambda.boot.sub)
[Figure: histograms of lambda.boot and of the centered values lambda.boot.sub]
> # 5th and 95th percentiles of lambda.boot - mean(lambda.boot)
> quantile(lambda.boot.sub, c(.05, .95))
         5%         95%
-0.01940520  0.03217117
> # 90% bootstrap CI
> c(lambda - quantile(lambda.boot.sub, .95), lambda - quantile(lambda.boot.sub, .05))
       95%         5%
0.01402620 0.06560257

7-2.5 Bootstrap Estimate of the Standard Error (CD Only)

There are situations in which the standard error of the point estimator is unknown. Usually, these are cases where the form of θ̂ is complicated, and the standard expectation and variance operators are difficult to apply. A computer-intensive technique called the bootstrap, developed in recent years, can be used for this problem.

Suppose that we are sampling from a population that can be modeled by the probability distribution f(x; θ). The random sample results in data values x1, x2, ..., xn, and we obtain θ̂ as the point estimate of θ. We would now use a computer to obtain bootstrap samples from the distribution f(x; θ̂), and for each of these samples we calculate the bootstrap estimate θ̂* of θ. This results in:

Bootstrap Sample    Observations            Bootstrap Estimate
1                   x1*, x2*, ..., xn*      θ̂1*
2                   x1*, x2*, ..., xn*      θ̂2*
...
B                   x1*, x2*, ..., xn*      θ̂B*

Usually B = 100 or 200 of these bootstrap samples are taken. Let θ̄* = (1/B) Σ_{i=1}^{B} θ̂i* be the sample mean of the bootstrap estimates. The bootstrap estimate of the standard error of θ̂ is just the sample standard deviation of the θ̂i*, or

    s_θ̂ = sqrt[ Σ_{i=1}^{B} (θ̂i* − θ̄*)² / (B − 1) ]        (S7-1)

In the bootstrap literature, B − 1 in Equation S7-1 is often replaced by B. However, for the large values usually employed for B, there is little difference in the estimate produced for s_θ̂.

EXAMPLE S7-1. The time to failure of an electronic module used in an automobile engine controller is tested at an elevated temperature in order to accelerate the failure mechanism. The time to failure is exponentially distributed with unknown parameter λ. Eight units are selected at random and tested, with the resulting failure times (in hours): x1 = 11.96, x2 = 5.03, x3 = 67.40, x4 = 16.07, x5 = 31.50, x6 = 7.73, x7 = 11.10, and x8 = 22.38. Now the mean of an exponential distribution is 1/λ, so E(X) = 1/λ, and the expected value of the sample average is E(X̄) = 1/λ. Therefore, a reasonable way to estimate λ is with λ̂ = 1/X̄. For our sample, x̄ = 21.65, so our estimate of λ is λ̂ = 1/21.65 = 0.0462.
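The parametric bootstrap just described (sampling from the fitted distribution rather than resampling the data) can be sketched as follows. This is a standard-library Python sketch, not the book's code; the seed and names are illustrative.

```python
import random
import statistics

n, B = 8, 200
lam_hat = 0.0462                 # point estimate from the failure-time data

random.seed(2)
boot_estimates = []
for _ in range(B):
    # draw a fresh sample of size n from exponential(rate = lam_hat)
    x = [random.expovariate(lam_hat) for _ in range(n)]
    boot_estimates.append(1 / statistics.mean(x))

# Equation S7-1: sample standard deviation of the bootstrap estimates (divisor B-1)
se_boot = statistics.stdev(boot_estimates)
print(round(se_boot, 4))
```

Each pass simulates a whole new experiment under the fitted model, so the scatter of the B estimates mimics the sampling variability of λ̂.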
To find the bootstrap standard error we would now obtain B = 200 (say) samples of n = 8 observations each from an exponential distribution with parameter λ = 0.0462. The following table shows some of these results:

Bootstrap Sample    Observations                                            Bootstrap Estimate
1                   8.01, 28.85, 14.14, 59.12, 3.11, 32.19, 5.26, 14.17     λ̂1* = 0.0485
2                   33.27, 2.10, 40.17, 32.43, 6.94, 30.66, 18.99, 5.61     λ̂2* = 0.0470
...
200                 40.26, 39.26, 19.59, 43.53, 9.55, 7.07, 6.03, 8.94      λ̂200* = 0.0459

The sample average of the λ̂i* (the bootstrap estimates) is 0.0513, and the standard deviation of these bootstrap estimates is 0.020. Therefore, the bootstrap standard error of λ̂ is 0.020. In this case, estimating the parameter λ of an exponential distribution, the variance of the estimator we used, λ̂, is known: when n is large, V(λ̂) ≈ λ²/n. Therefore the estimated standard error of λ̂ is sqrt(λ̂²/n) = sqrt((0.0462)²/8) = 0.016. Notice that this result agrees reasonably closely with the bootstrap standard error.

Sometimes we want to use the bootstrap in situations in which the form of the probability distribution is unknown. In these cases, we take the n observations in the sample as the population and select B random samples, each of size n, with replacement from this population. Then Equation S7-1 can be applied as described above. The book by Efron and Tibshirani (1993) is an excellent introduction to the bootstrap.

7-3.3 Bayesian Estimation of Parameters (CD Only)

This book uses methods of statistical inference based on the information in the sample data. In effect, these methods interpret probabilities as relative frequencies; sometimes we call probabilities interpreted in this manner objective probabilities. There is another approach to statistical inference, called the Bayesian approach, that combines sample information with other information that may be available prior to collecting the sample. In this section we briefly illustrate how this approach may be used in parameter estimation. Suppose that the random variable X has a probability distribution that is a function of one parameter θ. We will write this probability distribution as f(x | θ). This notation implies that the exact form of the distribution of X is conditional on the value assigned to θ. The classical approach to estimation would consist of taking a random sample of size n from this distribution and then substituting the sample values xi into the estimator for θ. This estimator could have been developed using the maximum likelihood approach, for example.
Suppose that we have some additional information about θ and that we can summarize that information in the form of a probability distribution for θ, say, f(θ). This probability distribution is often called the prior distribution for θ; suppose that the mean of the prior is μ0 and the variance is σ0². This is a very novel concept insofar as the rest of this book is concerned, because we are now viewing the parameter θ as a random variable. The probabilities associated with the prior distribution are often called subjective probabilities, in that they usually reflect the analyst's degree of belief regarding the true value of θ.

The Bayesian approach to estimation uses the prior distribution for θ, f(θ), and the joint probability distribution of the sample, say f(x1, x2, ..., xn | θ), to find a posterior distribution for θ, say, f(θ | x1, x2, ..., xn). This posterior distribution contains information both from the sample and from the prior distribution for θ. In a sense, it expresses our degree of belief regarding the true value of θ after observing the sample data. It is conceptually easy to find the posterior distribution. The joint probability distribution of the sample X1, X2, ..., Xn and the parameter θ (remember that θ is a random variable) is

    f(x1, x2, ..., xn, θ) = f(x1, x2, ..., xn | θ) f(θ)

and the marginal distribution of X1, X2, ..., Xn is

    f(x1, x2, ..., xn) = Σ_θ f(x1, x2, ..., xn, θ)      (θ discrete)
    f(x1, x2, ..., xn) = ∫ f(x1, x2, ..., xn, θ) dθ     (θ continuous)
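To make the posterior construction concrete, here is a small Python sketch for a discrete prior. The three-point prior on θ and the variable names are illustrative assumptions, not from the text; the data are the failure times from Example S7-1, modeled as exponential(θ).

```python
import math

# illustrative setup: X ~ exponential(theta), discrete three-point prior on theta
prior = {0.03: 0.3, 0.05: 0.4, 0.08: 0.3}
data = [11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38]

def likelihood(theta, xs):
    # joint density f(x1, ..., xn | theta) for i.i.d. exponential data
    return math.prod(theta * math.exp(-theta * x) for x in xs)

# joint f(x, theta) = f(x | theta) f(theta); normalize by the marginal f(x)
joint = {t: likelihood(t, data) * p for t, p in prior.items()}
marginal = sum(joint.values())
posterior = {t: j / marginal for t, j in joint.items()}

print({t: round(p, 3) for t, p in posterior.items()})
```

The same two steps (multiply by the prior, divide by the marginal) carry over to continuous priors, with the sum replaced by an integral.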

8-2.6 Bootstrap Confidence Intervals (CD Only)

In Section 7-2.5 we showed how a technique called the bootstrap could be used to estimate the standard error σ_θ̂, where θ̂ is an estimate of a parameter θ. We can also use the bootstrap to find a confidence interval on the parameter θ. To illustrate, consider the case where θ is the mean μ of a normal distribution with σ known. Now the estimator of μ is X̄. Notice that z_{α/2} σ/√n is the 100(1 − α/2) percentile of the distribution of X̄ − μ, and −z_{α/2} σ/√n is the 100(α/2) percentile of this distribution. Therefore, we can write the probability statement associated with the 100(1 − α)% confidence interval as

    P(100(α/2) percentile ≤ X̄ − μ ≤ 100(1 − α/2) percentile) = 1 − α

or

    P(X̄ − 100(1 − α/2) percentile ≤ μ ≤ X̄ − 100(α/2) percentile) = 1 − α

This last probability statement implies that the lower and upper 100(1 − α)% confidence limits for μ are

    L = X̄ − 100(1 − α/2) percentile of (X̄ − μ) = X̄ − z_{α/2} σ/√n
    U = X̄ − 100(α/2) percentile of (X̄ − μ) = X̄ + z_{α/2} σ/√n

We may generalize this to an arbitrary parameter θ. The 100(1 − α)% confidence limits for θ are

    L = θ̂ − 100(1 − α/2) percentile of (θ̂ − θ)
    U = θ̂ − 100(α/2) percentile of (θ̂ − θ)

Unfortunately, the percentiles of θ̂ − θ may not be as easy to find as in the case of the normal distribution mean. However, they can be estimated from bootstrap samples. Suppose we find B bootstrap samples, calculate θ̂1*, θ̂2*, ..., θ̂B* and their average θ̄*, and then calculate the differences θ̂i* − θ̄*. The required percentiles can be obtained directly from these differences. For example, if B = 200 and a 95% confidence interval on θ is desired, the fifth smallest and fifth largest of the differences θ̂i* − θ̄* are the estimates of the necessary percentiles.

We will illustrate this procedure using the situation first described in Example 7-3, involving the parameter λ of an exponential distribution. Following that example, a random sample of n = 8 engine controller modules was tested to failure, and the estimate of λ obtained was λ̂ = 0.0462, where λ̂ = 1/X̄ is a maximum likelihood estimator. We used 200 bootstrap samples to obtain an estimate of the standard error for λ̂. Figure S8-1(a) is a histogram of the 200 bootstrap estimates λ̂i*, i = 1, 2, ..., 200.
Notice that the histogram is not symmetrical and is skewed to the right, indicating that the sampling distribution of λ̂ also has this same shape. We subtracted the sample average of these bootstrap estimates, 0.0513, from each λ̂i*. The histogram of the differences λ̂i* − λ̄*, i = 1, 2, ..., 200, is shown in Figure S8-1(b). Suppose we wish to find a 90% confidence interval for λ. Now the fifth percentile of the differences λ̂i* − λ̄* is −0.0228 and the ninety-fifth percentile is 0.03135. Therefore the lower and upper 90% bootstrap confidence limits are

    L = λ̂ − (95th percentile of λ̂i* − λ̄*) = 0.0462 − 0.03135 = 0.0149
    U = λ̂ − (5th percentile of λ̂i* − λ̄*) = 0.0462 − (−0.0228) = 0.0690
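The percentile-difference recipe above is easy to code. The following Python sketch (standard library only; the seed and names are my own) walks the same steps: bootstrap estimates, centered differences, then L = λ̂ minus the 95th-percentile difference and U = λ̂ minus the 5th-percentile difference.

```python
import random
import statistics

obs = [11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38]
n = len(obs)
lam_hat = 1 / statistics.mean(obs)

random.seed(3)
B = 200
# nonparametric bootstrap estimates of lambda
boot = [1 / statistics.mean(random.choices(obs, k=n)) for _ in range(B)]
# differences from the bootstrap average, sorted
diffs = sorted(b - statistics.mean(boot) for b in boot)

# for B = 200, the 10th smallest / 10th largest differences
# estimate the 5th / 95th percentiles needed for a 90% interval
lo_diff, hi_diff = diffs[9], diffs[-10]
L = lam_hat - hi_diff          # lower 90% bootstrap confidence limit
U = lam_hat - lo_diff          # upper 90% bootstrap confidence limit
print(round(L, 4), round(U, 4))
```

Because the resampling is random, the endpoints vary from run to run; they should land near the (0.0149, 0.0690) interval computed in the text.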

Figure S8-1: Histograms of (a) the bootstrap estimates λ̂i* and (b) the differences λ̂i* − λ̄* used in finding the bootstrap confidence interval.

Therefore, our 90% bootstrap confidence interval for λ is 0.0149 ≤ λ ≤ 0.0690. There is an exact confidence interval for the parameter λ of an exponential distribution:

    χ²_{1−α/2, 2n} / (2 Σ xi) ≤ λ ≤ χ²_{α/2, 2n} / (2 Σ xi)

where χ²_{1−α/2, 2n} and χ²_{α/2, 2n} are the lower and upper α/2 percentage points of the chi-square distribution with 2n degrees of freedom (introduced briefly in Chapter 4 and discussed further in Section 8-4), and the xi are the n sample observations. For the engine controller failure data following Example 7-3, the exact 90% confidence interval for λ is 0.0230 ≤ λ ≤ 0.0759. Notice that the two confidence intervals are very similar. The length of the exact confidence interval is 0.0759 − 0.0230 = 0.0529, while the length of the bootstrap confidence interval is 0.0690 − 0.0149 = 0.0541, which is only slightly longer.

The percentile method for bootstrap confidence intervals works well when the estimator is unbiased and the standard error of θ̂ is approximately constant (as a function of θ). An improvement, known as the bias-corrected and accelerated method, adjusts the percentiles in more general cases. It could be applied in this example (because λ̂ is a biased estimator), but at the cost of additional complexity.

8-3.2 Development of the t-Distribution (CD Only)

We will give a formal development of the t-distribution using the techniques presented in Section 5-8. It will be helpful to review that material before reading this section. First consider the random variable

    T = (X̄ − μ) / (S/√n)

This quantity can be written as

    T = [(X̄ − μ) / (σ/√n)] / √(S²/σ²)        (S8-1)
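As a check on the exact interval quoted above, the chi-square percentage points can be approximated without a statistics library using the Wilson-Hilferty transformation (an approximation I am supplying here; it is not part of the text). For the failure-time data it reproduces the quoted 90% interval 0.0230 ≤ λ ≤ 0.0759 to the stated precision.

```python
import math
from statistics import NormalDist

def chi2_ppf_wh(p, k):
    # Wilson-Hilferty approximation to the chi-square p-quantile with k df
    z = NormalDist().inv_cdf(p)
    return k * (1 - 2 / (9 * k) + z * math.sqrt(2 / (9 * k))) ** 3

obs = [11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38]
n, total = len(obs), sum(obs)

# exact 90% CI for an exponential rate: chi-square quantiles with 2n df over 2*sum(x)
L = chi2_ppf_wh(0.05, 2 * n) / (2 * total)
U = chi2_ppf_wh(0.95, 2 * n) / (2 * total)
print(round(L, 4), round(U, 4))   # → 0.023 0.0759
```

The approximation error of Wilson-Hilferty is well below the four decimal places used here, which is why the endpoints match the book's exact interval.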