The exact bootstrap method shown on the example of the mean and variance estimation


Comput Stat (2013) 28
ORIGINAL PAPER

The exact bootstrap method shown on the example of the mean and variance estimation

Joanna Kisielinska

Received: 21 May 2011 / Accepted: 26 June 2012 / Published online: 21 July 2012
© The Author(s) 2012. This article is published with open access at Springerlink.com

Abstract  The bootstrap method is based on resampling of the original random sample drawn from a population with an unknown distribution. The article shows that, owing to progress in computer technology, resampling is actually unnecessary if the sample size is not too large: it is possible to generate all possible resamples automatically and to calculate all realizations of the required statistic. The obtained distribution can be used in point or interval estimation of population parameters or in testing hypotheses. We should stress that in the exact bootstrap method the entire space of resamples is used, so there is no additional bias resulting from resampling. The method was used to estimate the mean and variance. Comparison of the obtained distributions with the limit distributions confirmed the accuracy of the exact bootstrap method. To compare the exact bootstrap method with the basic method (with random sampling), the probability that 1,000 resamples would allow a parameter to be estimated with a given accuracy was calculated. There is little chance of obtaining the desired accuracy, which is an argument supporting the use of the exact method. Random sampling may be interpreted as discretization of a continuous variable.

Keywords  Bootstrap · Nonparametric estimation · Discrete random variables · Mean and variance estimation

J. Kisielinska (B)
Faculty of Economics, Warsaw University of Life Sciences, St. Nowoursynowska 166, Warsaw, Poland
e-mail: joanna_kisielinska@sggw.pl
URL: jkisielinska

1 Introduction

Consider random variable X with unknown distribution F. We are interested in a distribution parameter denoted by θ. If the parameter cannot be computed directly, it is

necessary to draw a random sample and select an appropriate estimator of parameter θ. The estimator is a statistic defined on a sample space. The random sample is denoted by X = (X_1, X_2, ..., X_n), its realization by x = (x_1, x_2, ..., x_n), and the estimator of parameter θ by θ̂ = t(X). Efron (1979) proposed what he called the bootstrap method. It is based on random selection of resamples (bootstrap samples) of size n from the obtained sample (the original sample) x. The selection is done with replacement, with identical probabilities, equal to 1/n, of selecting each of the values x_k, for k = 1, ..., n. Thus a distribution F̂, also known as the bootstrap distribution, is generated. The bootstrap sample is denoted by X* = (X*_1, X*_2, ..., X*_n), and an arbitrary realization of it by x* = (x*_1, x*_2, ..., x*_n). The estimator θ̂ evaluated on the bootstrap sample is denoted by θ̂* = t(X*). Approximating the distribution of statistic θ̂ by that of the bootstrap statistic θ̂* is the essence of the method. If a Monte Carlo approximation is used to construct the distribution of θ̂*, it is necessary to choose B, the number of randomly selected bootstrap samples. Using the bootstrap variance, Efron (1987) states that a small number of random samplings is sufficient to achieve adequate accuracy. Booth and Sarkar (1998) disagree with this statement. They used a distribution approximation of the relative bootstrap variance, which allowed B to be estimated for a given error level at an assumed confidence level. It turned out that achieving an error lower than 10 % at the 0.95 confidence level requires B to be around 800. Efron and Tibshirani (1993, p. 52) believe that estimating a standard error rarely requires more than 200 replications (repeated random samplings), while estimating a confidence interval requires 1,000 replications (Efron and Tibshirani 1993, p. 162).
Having B bootstrap samples x*_1, x*_2, ..., x*_B, we may estimate the unknown population parameter θ. Each of these samples yields a single realization of statistic θ̂*. For the given original sample, the realization of this statistic for resample b is:

θ̂*(b) = t(x*_b).  (1)

The bootstrap estimate of parameter θ is then:

θ̂*(·) = (1/B) Σ_{b=1}^B θ̂*(b),  (2)

and an estimate of the standard error is the standard deviation in the form:

s = √( (1/B) Σ_{b=1}^B (θ̂*(b) − θ̂*(·))² )  (3)
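Formulas (1)-(4) translate directly into a short Monte Carlo sketch. The sample values below and the choice of the mean as the statistic are hypothetical, and B = 1,000 follows the replication counts discussed above:

```python
import random
import statistics

def basic_bootstrap(x, stat, B=1000, seed=0):
    """Draw B resamples of size n with replacement from the original
    sample x and return the B realizations of the statistic (formula (1))."""
    rng = random.Random(seed)
    n = len(x)
    return [stat([x[rng.randrange(n)] for _ in range(n)]) for _ in range(B)]

x = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8]        # hypothetical original sample
reps = basic_bootstrap(x, statistics.mean)
theta_star = statistics.mean(reps)        # bootstrap estimate, formula (2)
se_star = statistics.stdev(reps)          # standard error estimate, formula (4)
```

The standard error uses the B − 1 divisor of formula (4), since the B resamples are only a subset of the resample space.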

or:

ŝ = √( (1/(B−1)) Σ_{b=1}^B (θ̂*(b) − θ̂*(·))² )  (4)

depending on whether B includes all possible samples (formula (3)) or only a subset of them (formula (4)). Considering the bootstrap method, one may ask whether random sampling of the bootstrap sample X* from the previously obtained original sample X is necessary at all. Random sampling is necessary if examining the entire population is impossible or too costly, and using a sample instead of the population has significant implications, which are the subject of mathematical statistics. Note, however, that the fundamental property of a sample is its finite size. The bootstrap distribution F̂ for this sample is a simple discrete distribution. The distribution of any statistic determined for n discrete random variables with a finite number of realizations does not have to be estimated, as it may simply be calculated. The only open question is how many calculations this approach requires, which is discussed below. Consider the case of the two-element sample (x_1, x_2). The possible resamples are: (x_1, x_1), (x_2, x_2), (x_1, x_2), (x_2, x_1). The probability of randomly selecting each of them is the same and equals 1/4. For the three-element sample (x_1, x_2, x_3) there are 3³ = 27 resamples: (x_1, x_1, x_1), (x_1, x_1, x_2), (x_1, x_1, x_3), (x_1, x_2, x_1), (x_1, x_2, x_2), (x_1, x_2, x_3), (x_1, x_3, x_1), (x_1, x_3, x_2), (x_1, x_3, x_3), (x_2, x_1, x_1), (x_2, x_1, x_2), (x_2, x_1, x_3), (x_2, x_2, x_1), (x_2, x_2, x_2), (x_2, x_2, x_3), (x_2, x_3, x_1), (x_2, x_3, x_2), (x_2, x_3, x_3), (x_3, x_1, x_1), (x_3, x_1, x_2), (x_3, x_1, x_3), (x_3, x_2, x_1), (x_3, x_2, x_2), (x_3, x_2, x_3), (x_3, x_3, x_1), (x_3, x_3, x_2) and (x_3, x_3, x_3). The probability of selecting each of these resamples is also the same and equals 1/27.
The fact that some resamples contain the same elements, merely permuted, has no significance, as each of them has a defined (identical) probability. If the original sample has n elements, the number of equally probable resamples equals BE = nⁿ, and the probability of randomly selecting each resample is 1/BE. It is necessary to stress that the space of resamples is a finite space of size BE, and such is the size of the exact (ideal) bootstrap sample. If BE is not too large, all realizations of the estimator may be calculated. These realizations may be interpreted as realizations of a discrete random variable. Since the number of realizations is finite, descriptive statistics tools can be used for their analysis (the estimation error is then calculated using formula (3)). If it is impossible to generate the entire space of resamples because nⁿ is too large, random sampling, that is the classical bootstrap, is necessary (the estimation error is then described by formula (4)). It is worth noting that if the estimator is the mean and the sample is large, then by the Central Limit Theorem its distribution is asymptotically normal. The bootstrap method which uses the entire space of resamples may be called the exact bootstrap method. The claim that the method is exact pertains only to resampling; with regard to the original sample, its adequacy in relation to the original variable X rests on the Glivenko-Cantelli Theorem.
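The enumeration just described is mechanical. A minimal sketch, assuming a hypothetical three-element sample:

```python
from itertools import product
from statistics import mean

def all_resamples(x):
    """All n**n resamples of an n-element sample; each has probability 1/n**n."""
    return list(product(x, repeat=len(x)))

x = (4.0, 7.0, 1.0)                       # hypothetical three-element sample
resamples = all_resamples(x)
assert len(resamples) == 3 ** 3           # the 27 resamples enumerated above
means = [mean(r) for r in resamples]      # exact distribution of the mean estimator
```

Averaging `means` over all 27 resamples reproduces the original sample mean exactly, which is the no-additional-bias property stressed in the text.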

2 The exact bootstrap method

Let us assume that from the population described by random variable X with an unknown probability distribution F, an n-element primary sample x = (x_1, x_2, ..., x_n) was drawn. Because for some i ≠ j it is possible that x_i = x_j, we may reduce¹ the random sample to its k distinct values. The probabilities p_i of obtaining realization x_i, for i = 1, 2, ..., k, then no longer have to be identical for all i (as is the case in the classical bootstrap). Let us introduce a discrete random sample variable, denoted by X^D, with probability distribution F^D described by the values p_i such that:

p_i^D = P(X^D = x_i) = p_i, for i = 1, 2, ..., k.  (5)

The distribution of random variable X^D is equivalent to the bootstrap distribution F̂. The resample is denoted by X^D = (X^D_1, X^D_2, ..., X^D_n), and the estimator θ̂ for the resample by θ̂^D = t(X^D). The distribution of θ̂ will be approximated by the distribution of statistic θ̂^D. Note that the problems of possible bias, consistency and efficiency of the estimator pertain to estimator θ̂. In the exact bootstrap method the realizations of estimator θ̂^D are calculated for the whole population of resamples rather than estimated from a sample drawn from that population. Therefore, the method does not introduce any additional bias. For a single realization b of the resample, x^D_b = (x^{Db}_1, x^{Db}_2, ..., x^{Db}_n), we should calculate θ̂^D(b) = t(x^{Db}_1, x^{Db}_2, ..., x^{Db}_n) as well as the probability of its random selection. The probabilities are no longer identical because of the reduction of the random sample to k distinct values. The probability of selecting sample b equals:

p^{Db} = P(θ̂^D = θ̂^D(b)) = ∏_{i=1}^n p_i^{Db},  (6)

where p_i^{Db} = P(X^D = x_i^{Db}). The number of possible realizations of resamples equals BE = kⁿ.
A correct algorithm should satisfy the condition Σ_{b=1}^{BE} p^{Db} = 1. Formula (6) describes the distribution of estimator θ̂^D, which is used to approximate the distribution of estimator θ̂. It is a discrete distribution with a finite number of realizations, although in most cases that number is very high. This distribution may be used for point or interval estimation of parameter θ or for testing hypotheses. Note that in essence the entire operation amounts to approximating an unknown continuous distribution of a certain random variable θ̂ by the discrete random variable θ̂^D, whose distribution may be generated from a sample. Through

¹ This reduction is not necessary but is recommended, as it reduces the dimension of the problem.

random sampling we in fact discretize a certain continuous phenomenon. We attempt to approximate the continuous random variable X by a sequence of its realizations x = (x_1, x_2, ..., x_n). Knowing the distribution of a discrete random variable, we may automatically calculate the distribution of any function of this variable. For continuous random variables there is no automatic method of calculating the distributions of such functions. When using bootstrap methods it is worth comparing BE (the number of all resamples) with the prescribed number of resamplings B in the classical bootstrap. For example, for k = 15 and n = 18 we obtain BE = 15¹⁸, a very large number, significantly greater than B = 1,000. Nowadays such a great number of repetitions can be generated. The pioneering work of Efron dates back to 1979. At that time, conducting such a great number of calculations within a reasonable time was impossible. Since it was impossible to examine the entire population represented by the original sample, it was necessary to draw resamples from the sample functioning as a population. In the bootstrap method the obtained values of the estimator are sorted from lowest to highest, which allows one, for example, to set confidence intervals using the percentile method (Efron and Tibshirani 1993). In theory, the exact bootstrap method also permits this. The number of possible realizations of statistic θ̂^D is very high; however, some realizations of the discrete statistic will certainly repeat, and it is advisable to group the results in a histogram. Creating a histogram is necessary for large problems, as the number of possible estimator realizations is very large; however, this may cause a loss of information.
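The exact distribution defined by (5) and (6) can be generated directly when kⁿ is manageable. A sketch with hypothetical values and unequal probabilities, using exact rational arithmetic so the condition Σ_b p^{Db} = 1 can be checked without rounding error:

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def exact_bootstrap(values, probs, stat, n):
    """Exact bootstrap distribution of `stat`: enumerate all k**n resamples
    of the k distinct values; each resample's probability is the product of
    its element probabilities, as in formula (6)."""
    dist = defaultdict(Fraction)
    for idx in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]
        dist[stat([values[i] for i in idx])] += p
    return dict(dist)

# hypothetical reduced sample: k = 3 distinct values with unequal probabilities
values = [1, 2, 4]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
dist = exact_bootstrap(values, probs, lambda s: Fraction(sum(s), len(s)), n=3)
assert sum(dist.values()) == 1            # the condition sum_b p^{Db} = 1
```

Because equal realizations of the statistic are accumulated into one entry, `dist` is already the grouped distribution of θ̂^D rather than a list of kⁿ raw resamples.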
We should also stress that, in spite of this, a very accurate estimation of confidence intervals may be achieved with the exact bootstrap method, as the interval widths in the histogram do not have to be identical. In ranges that require exact probabilities (or the cumulative distribution function), the interval width may be very small. Limited accuracy may only result from the density of the individual realizations of the estimator and the probabilities of their selection. The easiest method of generating all resamples for a discrete distribution is recursive drawing of sequential elements from the original sample of size n (or k if there are repetitions in the original sample). Such an algorithm belongs to the brute-force category, and algorithms of this type are considered inefficient. The number of generated realizations of the bootstrap samples may be reduced, as in resampling some values will be repeated (random sampling with replacement). Feller (1950, p. 38) presents a similar problem. Fisher and Hall (1991) presented an algorithm which allows all resamples to be generated. Both works pertain to the situation where the probability of drawing every element from the sample is the same. Every n-element secondary bootstrap sample, drawn from the k distinct values, may be written as:

x^D_b = (a_{1b} x_1, a_{2b} x_2, ..., a_{kb} x_k),  (7)

where each a_{jb} ≥ 0, for j = 1, 2, ..., k, is the number of occurrences of element x_j in sample b. These numbers must satisfy the following condition:

Σ_{j=1}^k a_{jb} = n,  (8)

whereby some of them may be equal to 0. If a_{jb} = 0 for some j, element x_j did not occur in resample b. The probability of drawing a single sample defined by (7) equals:

p^{Db} = ∏_{j=1}^k (p_j^D)^{a_{jb}}.  (9)

Two compiled programs written in C++ are posted on the author's website. The first one generates the exact bootstrap distribution of the mean estimator:

X̄ = (1/n) Σ_{i=1}^n X_i.  (10)

The second generates the distribution of the variance estimator:

Ŝ² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄)².  (11)

The first program provides all the possible bootstrap realizations of the mean estimator, while the second one generates a histogram, owing to the very large number of realizations.

3 The limit distribution of the bootstrap sample mean and variance estimator

The exact bootstrap method will be used to estimate the mean and variance. Verification of its accuracy is made possible by the limit distributions, which may be used when the sample is large (n ≥ 30). Consider an n-element random sample X = (X_1, X_2, ..., X_n). The variables X_i, for i = 1, 2, ..., n, have the same distribution F, with expected value μ and standard deviation σ. The limit distribution of the mean estimator X̄ defined by formula (10) is a normal distribution with parameters:

μ_X̄ = μ;  σ_X̄ = σ/√n.  (12)

Before we define the limit distribution of the unbiased variance estimator (11), we present the limit distribution of the biased estimator:

S² = (1/n) Σ_{i=1}^n (X_i − X̄)².  (13)
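Returning to the occupancy representation (7)-(9) of Sect. 2: the enumeration can be organized over occupancy vectors (a_1, ..., a_k) rather than over ordered resamples. In the sketch below, the multinomial coefficient n!/(a_1!···a_k!), which counts the orderings of each pattern, is our addition so that the pattern probabilities sum to 1 (formula (9) gives the probability of a single ordered resample); the sample values and probabilities are hypothetical:

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement
from math import comb, factorial

def occupancy_distribution(values, probs, stat, n):
    """Enumerate unordered occupancy patterns (a_1, ..., a_k) with
    sum a_j = n (formula (8)) instead of all k**n ordered resamples.
    Each pattern's probability is the number of its orderings times
    the per-ordering probability of formula (9)."""
    k = len(values)
    dist = {}
    for combo in combinations_with_replacement(range(k), n):
        counts = Counter(combo)
        orderings = factorial(n)
        for a in counts.values():
            orderings //= factorial(a)       # multinomial coefficient
        p = Fraction(orderings)
        for j, a in counts.items():
            p *= probs[j] ** a               # formula (9) per ordering
        t = stat([values[j] for j in combo])
        dist[t] = dist.get(t, Fraction(0)) + p
    return dist

values = [1, 2, 4]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
dist = occupancy_distribution(values, probs,
                              lambda s: Fraction(sum(s), len(s)), n=3)
# only C(n + k - 1, n) = C(5, 3) = 10 patterns instead of k**n = 27 resamples
assert len(list(combinations_with_replacement(range(3), 3))) == comb(5, 3)
assert sum(dist.values()) == 1
```

For statistics such as the mean and variance, which depend only on the multiset of drawn values, this reduces the work from kⁿ to C(n + k − 1, n) evaluations.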

The expected value of this estimator equals:

μ_{S²} = ((n−1)/n) σ².  (14)

If random variable X has distribution N(μ, σ), the distribution of the variance estimator (13) is asymptotically normal with mean given by (14) and standard deviation:

σ^{Norm}_{S²} = √(2σ⁴/n).  (15)

The limit distribution of the sample variance S², drawn from a population with any distribution with parameters μ and σ, will also be normal (the variance is the averaged square of deviations from the mean). The standard deviation of the limit distribution is described by the formula (Smirnow and Dunin-Barkowski 1973, p. 237):

σ_{S²} = √( (μ₄ − σ⁴)/n − 2(μ₄ − 2σ⁴)/n² + (μ₄ − 3σ⁴)/n³ ),  (16)

where μ₄ is the fourth central moment of variable X. To determine the limit distribution of the unbiased variance estimator Ŝ² described by (11), we should correct the parameters of the limit distributions, bearing in mind the relationship:

Ŝ² = (n/(n−1)) S².  (17)

The expected value and standard deviation of estimator Ŝ² are obtained by multiplying the parameters of the distribution of estimator S² by n/(n−1). In the exact bootstrap method, the distribution of random variable X is approximated by the distribution of the discrete random variable X^D with realization set (x_1, x_2, ..., x_k) and probability distribution given by the values p_i = P(X^D = x_i), for i = 1, ..., k, whereby Σ_{i=1}^k p_i = 1. The expected value μ^D, standard deviation σ^D and fourth central moment μ₄^D of variable X^D are, respectively:

μ^D = Σ_{i=1}^k x_i p_i,  σ^D = √( Σ_{i=1}^k (x_i − μ^D)² p_i ),  μ₄^D = Σ_{i=1}^k (x_i − μ^D)⁴ p_i.  (18)

The normal limit distribution of estimator X̄ will be denoted by GA:

GA: N(μ^D, σ^D/√n).  (19)
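The parameters in (12) can be checked against the exact bootstrap distribution: over all kⁿ resamples, the mean estimator has expectation μ^D and variance (σ^D)²/n exactly, not just asymptotically. A sketch with hypothetical values and exact rational arithmetic:

```python
from fractions import Fraction
from itertools import product

# hypothetical discrete variable X^D: k = 3 values with probabilities p_i
values = [Fraction(1), Fraction(2), Fraction(4)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
n = 4                                     # resample size (may differ from k)

mu_D = sum(v * p for v, p in zip(values, probs))
var_D = sum((v - mu_D) ** 2 * p for v, p in zip(values, probs))

# exact first two moments of the bootstrap mean over all k**n resamples
m1 = Fraction(0)
m2 = Fraction(0)
for idx in product(range(len(values)), repeat=n):
    p = Fraction(1)
    for i in idx:
        p *= probs[i]
    xbar = sum(values[i] for i in idx) / n
    m1 += p * xbar
    m2 += p * xbar ** 2

assert m1 == mu_D                  # mean of the mean estimator is mu^D
assert m2 - m1 ** 2 == var_D / n   # its variance is (sigma^D)**2 / n, as in (12)
```

Only the normality of GA is asymptotic; the two parameters of (12) hold exactly for every n.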

Table 1  Distribution of random variable X^D

x_i  p_i

The limit distribution of estimator Ŝ² will be denoted by GV (Smirnow and Dunin-Barkowski 1973, p. 237):

GV: N( (σ^D)², (n/(n−1)) √( (μ₄^D − (σ^D)⁴)/n − 2(μ₄^D − 2(σ^D)⁴)/n² + (μ₄^D − 3(σ^D)⁴)/n³ ) ).  (20)

4 A comparison of the exact and basic bootstrap

The basic bootstrap method is based on resampling the original sample, which may be interpreted as random selection of B realizations of an estimator of a given parameter. The arithmetic mean (formula (2)) is calculated from the randomly selected realizations. From all BE resamples, B samples may be selected in BE^B ways. Even if BE is not large, BE^B is a very large number, which makes it impossible to calculate the distribution of the mean directly. However, because B is large, a limit distribution may be used, namely the normal distribution:

GO: N(μ_BE, σ_BE/√B),  (21)

where μ_BE and σ_BE are the mean and standard deviation of the estimator of the parameter calculated using the exact bootstrap. This distribution allows one to calculate the probability that the basic-bootstrap estimate of the parameter falls within any given interval (in particular, a confidence interval).

5 Results

5.1 Example 1

Suppose an n-element sample drawn from an unspecified probability distribution F and represented by discrete random variable X^D is given. The probability distribution F̂ of X^D is presented in Table 1. The expected value and standard deviation of X^D equal μ^D = 5.174 and σ^D = , respectively. Two variants of the distributions of the mean and variance estimators calculated using the exact bootstrap method and the limit distributions are provided. The first one

Table 2  Confidence intervals of the mean computed using the exact bootstrap method (DBA) and the GA limit distribution

Sample size: n = 20, n = 30
Estimator distribution: DBA, GA
Mean of mean estimator
Standard deviation of mean estimator
Boundaries, confidence level 1 − α =
Width, confidence level 1 − α =
Boundaries, confidence level 1 − α =
Width, confidence level 1 − α =
Source: own calculations

assumes that n = 20 and the second n = 30. The samples were generated according to the algorithm proposed by Fisher and Hall (1991).

Mean estimation for n = 20 and n = 30

In Fig. 1 the mean estimator distribution calculated using the exact bootstrap method (DBA) and the limit distribution GA defined by (19) are shown in the interval within 4 standard deviations of the mean. The bootstrap distribution of the mean is given as probabilities of individual values; in the case of the limit distribution, it is the probability of assuming values in an interval (from the midpoint between neighbouring values on the left to the midpoint on the right). Both for n = 30 and n = 20 the diagrams are nearly identical; the probability values overlap to three decimal places. Table 2 shows the parameters (mean and standard deviation) of the distribution estimators of the mean, DBA and GA, and the confidence intervals calculated from them. The parameters of the distributions are nearly identical (to five decimal places for both sample sizes). In the case of the confidence intervals there are some differences: for n = 20 the interval boundaries of the DBA and GA distributions differ in the second decimal place, and for n = 30 in the third decimal place. We should stress that for a fixed sample size n it is impossible to achieve arbitrarily high accuracy of estimation, as the exact bootstrap distribution DBA is discrete.
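In the spirit of the percentile method mentioned earlier, a confidence interval can be read directly off the exact discrete distribution, with no resampling. This sketch uses one simple quantile convention (the smallest value whose tail probability reaches α/2), which is only one of several possible choices:

```python
def percentile_interval(dist, alpha):
    """Percentile (1 - alpha) interval read off an exact discrete
    distribution {value: probability}.  Because the distribution is
    discrete, the achieved coverage is at least 1 - alpha."""
    items = sorted(dist.items())
    cum = 0.0
    lower = items[0][0]
    for v, p in items:
        cum += p
        if cum >= alpha / 2:              # lower bound: CDF reaches alpha/2
            lower = v
            break
    cum = 0.0
    upper = items[-1][0]
    for v, p in reversed(items):
        cum += p
        if cum >= alpha / 2:              # upper bound: upper tail reaches alpha/2
            upper = v
            break
    return lower, upper

# toy discrete distribution: ten equally probable estimator realizations
toy = {v: 0.1 for v in range(1, 11)}
print(percentile_interval(toy, 0.5))      # (3, 8), covering probability 0.6
```

As the text notes, the discreteness of DBA is exactly why the nominal level cannot be hit precisely: the interval snaps to attainable realizations.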
A comparison of the basic bootstrap with the exact bootstrap is possible using the distribution given by formula (21). In the case of the mean, that distribution is N(μ^D, σ^D/√(nB)). Assuming B = 1,000, for n = 20 we obtain N(5.174, ), and for n = 30 the distribution is N(5.174, ). Knowing the distribution, we may calculate the probability that the mean is estimated with the desired accuracy based on 1,000 resamples. In Table 3 the probabilities for accuracies equal to 0.1, 0.01 and smaller are given. The probability of estimation equals 1 only for the 0.1 accuracy and is >0.5 for the 0.01

Fig. 1  Distributions of the GA and DBA mean estimators

Table 3  Probability that the mean calculated based on 1,000 bootstrap samples will be computed with the given accuracy

Accuracy  Intervals  n = 20  n = 30
0.1  (mean − 0.1; mean + 0.1)
0.01  (mean − 0.01; mean + 0.01)
0.001  (mean − 0.001; mean + 0.001)
(mean − ; mean + )
Source: own calculations

accuracy. We may note that as the required accuracy of estimation increases, the probability of achieving it decreases rapidly, despite the large number of resamples (B = 1,000). In such situations the exact bootstrap should be used, which guarantees no bias at the resampling stage.

Variance estimation for n = 20 and n = 30

In Table 4 the distributions of the variance estimator determined using the exact bootstrap method (DBV) and the limit distribution GV defined by formula (20) are shown. The number of distinct realizations of the variance estimator is significantly higher than

that of the mean estimator; therefore the probabilities are shown in the table for intervals rather than for individual values. If one uses them to calculate the parameters of the distributions, this is done in the same way as for grouped data, so the results may differ from the exact values. The method of selecting the width of the intervals also requires comment. For the limit (continuous) distributions, the probability that the random variable takes values in any arbitrarily small interval is greater than 0. For a discrete distribution, and such is the variance estimator distribution calculated using the exact bootstrap method, the situation is different: the smaller the interval width, the greater the number of intervals whose probability equals 0. Moreover, Table 4 presents the expected values of the variance estimator distributions and their standard deviations. These are exact values calculated from all the generated realizations. The expected values of limit distribution GV are equal to the sample variance. In the case of the DBV distribution the expected value was also equal to the variance, which attests to the accuracy of the applied algorithm. The exact bootstrap method does not introduce additional estimator bias (contrary to the bootstrap method with random sampling). Also note that the standard deviation of the DBV distribution is equal to the standard deviation of the GV distribution to five decimal places. Figure 2 shows the distributions of the variance estimators for n = 20 and n = 30. They show that the distribution GV constitutes a correct approximation of the distribution DBV for the random variable whose distribution is presented in Table 1. We should also note that the DBV distribution is slightly asymmetric in comparison with the limit distributions.
The difference is very small, and the consistency of the GV and DBV distributions was confirmed by Pearson's goodness-of-fit test, both for n = 20 and n = 30. Table 5 shows the confidence intervals of the variance obtained using the exact bootstrap method (the DBV distribution) and the limit distribution GV. Comparing the widths of the intervals, we can state that the more precise estimation was achieved with the exact bootstrap method (and it is an exact estimation) than with the limit distribution GV (with the exception of n = 20 and 1 − α = 0.99, for which the confidence interval of the GV distribution was narrower than that of DBV). The left shift of the intervals of the GV distribution in relation to DBV results from the asymmetry of the latter. The presented calculations indicate that the GV distribution is a good approximation of the DBV distribution. A comparison of the basic and exact bootstrap methods, as in the case of the mean, is possible using the distribution given by formula (21). The distribution of the variance estimator obtained from formula (21) will be:

N( (σ^D)², (1/√B) (n/(n−1)) √( (μ₄^D − (σ^D)⁴)/n − 2(μ₄^D − 2(σ^D)⁴)/n² + (μ₄^D − 3(σ^D)⁴)/n³ ) ).
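Probabilities of the kind reported in Tables 3 and 6 follow from formula (21) by a normal-interval calculation. A sketch with a hypothetical σ_BE (the values from the tables are not reproduced here):

```python
from math import erf, sqrt

def prob_within(eps, sigma_BE, B):
    """P(|average of B replications - mu_BE| < eps) under the normal
    approximation GO: N(mu_BE, sigma_BE / sqrt(B)) of formula (21)."""
    z = eps * sqrt(B) / sigma_BE
    return erf(z / sqrt(2))               # P(|Z| < z) for standard normal Z

# hypothetical sigma_BE = 0.5 with B = 1000 resamples: the probability of
# hitting the target accuracy collapses as the requirement tightens
for eps in (0.1, 0.01, 0.001):
    print(eps, round(prob_within(eps, 0.5, 1000), 3))
```

The √B in the numerator makes the pattern seen in the tables explicit: tightening the accuracy by a factor of 10 would require a 100-fold increase in B to keep the same probability.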

Fig. 2  Distributions of the GV and DBV variance estimator

If B = 1,000, for n = 20 we obtain the distribution N(3.9897, ), and for n = 30 the distribution is N(3.9897, ). In Table 6 there are probabilities of the variance being estimated based on 1,000 resamples with accuracies equal to 0.1, 0.01 and smaller. The probability of obtaining the 0.1 accuracy equals 1 for both sizes of the original sample. However, the 0.01 accuracy may be obtained with probability  for n = 20 and  for n = 30. For the smaller accuracies the probabilities are very small. This indicates the need to use the exact bootstrap method when estimating variance if the accuracy requirements are higher.

5.2 Example 2

The second simulation experiment concerns a small sample with the values {1, 2, 3, 4, 5}, assuming identical probabilities, equal to 0.2, of randomly selecting each element. This distribution is represented by discrete random variable X^D with expected value and variance equal to μ^D = 3 and (σ^D)² = 2, respectively.

Mean estimation for n = 5

The expected value of the mean estimator equals 3 and its standard deviation is 0.632 (according to formula (12)). From the 5 elements of the original sample we may draw 5⁵ = 3,125 resamples. The probability of drawing each of them is the same under the given conditions and equals 1/3,125. Since the mean estimator for many resamples

Table 4  Distributions of the variance estimator obtained using the exact bootstrap method (DBV) and the limit distribution GV

Parameters: n = 20 (DBV, GV), n = 30 (DBV, GV)
Mean of variance estimator
Standard deviation of variance estimator
Intervals: [0.00; 0.25), [0.25; 0.50), [0.50; 0.75), [0.75; 1.00), [1.00; 1.25), [1.25; 1.50), [1.50; 1.75), [1.75; 2.00), [2.00; 2.25), [2.25; 2.50), [2.50; 2.75), [2.75; 3.00), [3.00; 3.25), [3.25; 3.50), [3.50; 3.75), [3.75; 4.00), [4.00; 4.25), [4.25; 4.50), [4.50; 4.75), [4.75; 5.00), [5.00; 5.25), [5.25; 5.50), [5.50; 5.75), [5.75; 6.00), [6.00; 6.25), [6.25; 6.50), [6.50; 6.75), [6.75; 7.00), [7.00; 7.25), [7.25; 7.50), [7.50; +∞)
Source: own calculations

has the same values, the set of all its realizations includes only 21 values. Figure 3 shows the distribution of the estimator probabilities calculated using the exact bootstrap method (denoted DBA) and the limit distribution GA defined by formula (19). The

Table 5  Confidence intervals of the variance constructed using the exact bootstrap method (DBV) and the GV limit distribution

Sample size: n = 20, n = 30
Estimator distribution: DBV, GV
Boundaries, confidence level 1 − α =
Width, confidence level 1 − α =
Boundaries, confidence level 1 − α =
Width, confidence level 1 − α =
Source: own calculations

Table 6  Probability that the variance calculated based on 1,000 bootstrap samples will be computed with the given accuracy

Accuracy  Intervals  n = 20  n = 30
0.1  (variance − 0.1; variance + 0.1)
0.01  (variance − 0.01; variance + 0.01)
0.001  (variance − 0.001; variance + 0.001)
(variance − ; variance + )
Source: own calculations

Fig. 3  Distributions of the GA and DBA mean estimators for a small sample

agreement between the distributions DBA and GA is very good (the probabilities are equal to within two decimal places), even though the sample is small. The parameters of both distributions are identical, which attests to the accuracy of the algorithm used. Table 7 presents the probabilities that the mean based on 1,000 bootstrap samples will be calculated with a given accuracy. The probability was calculated from the limit distribution, which for B = 1,000 is N(3, ). Only for the 0.1 accuracy does the probability equal 1. In all other cases the use of the exact bootstrap method is recommended.
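The small-sample example lends itself to a complete check. A sketch enumerating all 5⁵ = 3,125 resamples with exact rational arithmetic; it confirms the 21 distinct mean realizations and the expected values 3 and 2 (the paper additionally reports 26 distinct variance realizations):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

sample = [1, 2, 3, 4, 5]                  # Example 2, probabilities all 1/5
n = len(sample)

mean_dist = defaultdict(Fraction)
var_dist = defaultdict(Fraction)
p = Fraction(1, n ** n)                   # each of the 5**5 = 3125 resamples
for r in product(sample, repeat=n):
    m = Fraction(sum(r), n)
    s2 = sum((x - m) ** 2 for x in r) / (n - 1)   # unbiased estimator (11)
    mean_dist[m] += p
    var_dist[s2] += p

assert len(mean_dist) == 21               # only 21 distinct mean realizations
assert sum(v * q for v, q in mean_dist.items()) == 3   # mu^D = 3
assert sum(v * q for v, q in var_dist.items()) == 2    # (sigma^D)**2 = 2
```

The two expectation checks are exact, illustrating the claim that the exact bootstrap introduces no bias at the resampling stage.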

Table 7  Probability that the mean computed based on 1,000 bootstrap small samples will be calculated with the given accuracy

Accuracy  Intervals  Probability
0.1  (mean − 0.1; mean + 0.1)
0.01  (mean − 0.01; mean + 0.01)
0.001  (mean − 0.001; mean + 0.001)
(mean − ; mean + )
Source: own calculations

Fig. 4  Distributions of the GV and DBV variance estimators for a small sample

Table 8  Probability that the variance calculated based on 1,000 bootstrap small samples will be computed with the given accuracy

Accuracy  Intervals  Probability
0.1  (variance − 0.1; variance + 0.1)
0.01  (variance − 0.01; variance + 0.01)
0.001  (variance − 0.001; variance + 0.001)
(variance − ; variance + )
Source: own calculations

Variance estimation for n = 5

The expected value of the unbiased variance estimator equals 2 and its standard deviation is 0.98 (according to formula (16), after correction by the factor 5/4). The variance estimator for the original sample has only 26 different values. Figure 4 presents the distribution of the unbiased variance estimator calculated using the exact bootstrap method (DBV); the normal limit distribution GV is included for comparison. The differences between the distributions are notable, and we should conclude that for a variance estimator from a small sample the limit distribution should not be used. Table 8 presents the probability that the variance calculated based on 1,000 bootstrap samples will be computed with the given accuracy. The probabilities were calculated

using the limit distribution N(2, ). As in the case of the mean, only for the 0.1 accuracy is the probability close to 1. In all other cases the use of the exact bootstrap method is recommended, since the probability of accurate variance estimation using the basic bootstrap method is small.

6 Conclusion

In this article the exact bootstrap method was discussed. The method can be used to estimate parameters of random variables with an unknown distribution. It allows one to determine the estimate of an arbitrary parameter, the error of this estimate, the distribution of the estimator, and confidence intervals. Traditionally, this problem is solved using the bootstrap method, which consists in resampling the original random sample. Random sampling is used in statistics when the entire population cannot be examined or the study would be too costly. Neither applies to a sample treated as a population: first, the original sample is finite, and second, its distribution is known (the empirical distribution). Instead of resampling, one can generate the entire space of resamples and determine all the realizations of the statistic which serves as the estimator of the unknown parameter. The method was used to estimate the mean and variance. It was shown that the expected values of the estimators are equal to the mean and variance of the sample. The method, therefore, does not introduce the bias that may result from resampling in the classical bootstrap. The estimator distributions calculated using the exact bootstrap method were compared with the limit distributions. The similarity between the distributions indicates that approximating the exact distribution by the limit distribution is possible if the sample is not too small.
In order to assess the effectiveness of the traditional bootstrap method, the limit distribution of the mean of B realizations of the estimator of a given parameter (every realization calculated from a single bootstrap sample) was used. This distribution allows for calculating the probability of reaching the assumed estimation accuracy. Although the number of bootstrap resamples was large (B = 1,000), for both the mean and the variance the probabilities decreased rapidly as the required accuracy increased. This supports the use of the exact method, which guarantees that no additional bias arises at the resampling stage. The conducted simulation experiments revealed that for small samples (n ≤ 15 and k = n) the time necessary to generate the entire space of resamples is short (<10 s on an average-quality computer), so there is no need for resampling. For larger samples the time is much longer and requires several hours of computation (for n = 20 and k = n the calculations lasted 5 h and 30 min); increasing the sample size lengthens the computation time significantly. On the other hand, the bootstrap method is used for small samples; for larger samples the limit distribution of the estimators may be used. Considering the progress in computer technology, however, the exact bootstrap method will also become usable for larger samples in the future.
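The rapid decay of the accuracy probabilities follows from the normal limit of the average of B replicates: if a single bootstrap replicate of the estimator has standard deviation sigma, the average of B replicates lies within plus or minus a of its expectation with probability approximately 2*Phi(a*sqrt(B)/sigma) - 1. A minimal sketch of this calculation (the sigma = 1 value is illustrative, not taken from the paper):

```python
from math import erf, sqrt

def accuracy_probability(sigma, accuracy, B=1000):
    """CLT approximation of P(|mean of B replicates - expectation| < accuracy),
    where sigma is the standard deviation of a single bootstrap replicate."""
    z = accuracy * sqrt(B) / sigma
    return erf(z / sqrt(2))  # identical to 2*Phi(z) - 1

# With an illustrative sigma = 1, the probability collapses as the
# required accuracy tightens, mirroring the pattern in Tables 7 and 8:
for a in (0.1, 0.01, 0.001, 0.0001):
    print(f"accuracy {a}: probability {accuracy_probability(1.0, a):.4f}")
```

Only the coarsest accuracy gives a probability near 1; each further tenfold tightening of the accuracy requires a hundredfold increase in B to compensate, which is the argument for computing the exact distribution instead.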

What are the consequences of being able to generate the complete information contained in the sample, as presented in this article? The fundamental one is much greater flexibility in constructing estimators, since no assumption about the form of the distribution is needed in order to determine the distribution of an estimator. One should still attempt to make the estimator unbiased, consistent and efficient, and the exact bootstrap method may be useful in this matter as well. Drawing a random sample may be seen as replacing a continuous random variable with an unknown distribution by a discrete variable with a known distribution: the bootstrap distribution. Transformations of discrete variables are easier than transformations of continuous variables, since the distribution of a statistic of discrete variables can be calculated automatically. In reality, due to the finite accuracy of all measurements, we can only observe discrete variables. We may suppose that with the increasing power of computers their role in statistics will also increase.

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.


More information

Bootstrap Confidence Intervals

Bootstrap Confidence Intervals Bootstrap Confidence Intervals Patrick Breheny September 18 Patrick Breheny STA 621: Nonparametric Statistics 1/22 Introduction Bootstrap confidence intervals So far, we have discussed the idea behind

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Interaction effects for continuous predictors in regression modeling

Interaction effects for continuous predictors in regression modeling Interaction effects for continuous predictors in regression modeling Testing for interactions The linear regression model is undoubtedly the most commonly-used statistical model, and has the advantage

More information

Analysis of incomplete data in presence of competing risks

Analysis of incomplete data in presence of competing risks Journal of Statistical Planning and Inference 87 (2000) 221 239 www.elsevier.com/locate/jspi Analysis of incomplete data in presence of competing risks Debasis Kundu a;, Sankarshan Basu b a Department

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Confidence intervals for kernel density estimation

Confidence intervals for kernel density estimation Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting

More information

Harvard University. Harvard University Biostatistics Working Paper Series

Harvard University. Harvard University Biostatistics Working Paper Series Harvard University Harvard University Biostatistics Working Paper Series Year 2008 Paper 94 The Highest Confidence Density Region and Its Usage for Inferences about the Survival Function with Censored

More information

Nonparametric Test on Process Capability

Nonparametric Test on Process Capability Nonparametric Test on Process Capability Stefano Bonnini Abstract The study of process capability is very important in designing a new product or service and in the definition of purchase agreements. In

More information

"ZERO-POINT" IN THE EVALUATION OF MARTENS HARDNESS UNCERTAINTY

ZERO-POINT IN THE EVALUATION OF MARTENS HARDNESS UNCERTAINTY "ZERO-POINT" IN THE EVALUATION OF MARTENS HARDNESS UNCERTAINTY Professor Giulio Barbato, PhD Student Gabriele Brondino, Researcher Maurizio Galetto, Professor Grazia Vicario Politecnico di Torino Abstract

More information

The comparative studies on reliability for Rayleigh models

The comparative studies on reliability for Rayleigh models Journal of the Korean Data & Information Science Society 018, 9, 533 545 http://dx.doi.org/10.7465/jkdi.018.9..533 한국데이터정보과학회지 The comparative studies on reliability for Rayleigh models Ji Eun Oh 1 Joong

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008 MIT OpenCourseWare http://ocw.mit.edu 2.830J / 6.780J / ESD.63J Control of Processes (SMA 6303) Spring 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

More information

STAT 830 Non-parametric Inference Basics

STAT 830 Non-parametric Inference Basics STAT 830 Non-parametric Inference Basics Richard Lockhart Simon Fraser University STAT 801=830 Fall 2012 Richard Lockhart (Simon Fraser University)STAT 830 Non-parametric Inference Basics STAT 801=830

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky

A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky A COMPARISON OF POISSON AND BINOMIAL EMPIRICAL LIKELIHOOD Mai Zhou and Hui Fang University of Kentucky Empirical likelihood with right censored data were studied by Thomas and Grunkmier (1975), Li (1995),

More information