The exact bootstrap method shown on the example of the mean and variance estimation
Comput Stat (2013) 28:1061–1077
ORIGINAL PAPER

The exact bootstrap method shown on the example of the mean and variance estimation

Joanna Kisielinska

Received: 21 May 2011 / Accepted: 26 June 2012 / Published online: 21 July 2012
© The Author(s) 2012. This article is published with open access at Springerlink.com

Abstract The bootstrap method is based on resampling of the original random sample drawn from a population with an unknown distribution. This article shows that, thanks to the progress in computer technology, resampling is actually unnecessary if the sample size is not too large: it is possible to generate all possible resamples automatically and to calculate all realizations of the required statistic. The obtained distribution can be used in point or interval estimation of population parameters or in testing hypotheses. We should stress that the exact bootstrap method uses the entire space of resamples, and therefore introduces no additional bias resulting from resampling. The method was used to estimate the mean and variance. The comparison of the obtained distributions with the limit distributions confirmed the accuracy of the exact bootstrap method. In order to compare the exact bootstrap method with the basic method (with random sampling), the probability that 1,000 resamples would allow for estimating a parameter with a given accuracy was calculated. There is little chance of obtaining the desired accuracy, which is an argument supporting the use of the exact method. Random sampling may be interpreted as discretization of a continuous variable.

Keywords Bootstrap · Nonparametric estimation · Discrete random variables · Mean and variance estimation

1 Introduction

Consider a random variable X with unknown distribution F. We are interested in a distribution parameter denoted by θ. If the parameter cannot be determined directly, it is

J. Kisielinska (B)
Faculty of Economics, Warsaw University of Life Sciences, St. Nowoursynowska 166, Warsaw, Poland
e-mail: joanna_kisielinska@sggw.pl
necessary to draw a random sample and select an appropriate estimator of parameter θ. The estimator is a statistic defined on a sample space. The random sample is denoted by X = (X_1, X_2, ..., X_n), its realization by x = (x_1, x_2, ..., x_n), and the estimator of parameter θ by θ̂ = t(x).

Efron (1979) proposed what he called the bootstrap method. It is based on a random selection of resamples (bootstrap samples) of size n from the obtained sample (original sample) x. The random selection is done with replacement, with identical probabilities equal to 1/n of selecting each of the values x_k, for k = 1, ..., n. Thus, distribution F̂, also known as the bootstrap distribution, is generated. The bootstrap sample is denoted by X* = (X*_1, X*_2, ..., X*_n), and its arbitrary realization by x* = (x*_1, x*_2, ..., x*_n). The estimator θ̂ for the bootstrap sample is denoted by θ̂* = t(X*). Approximating the distribution of statistic θ̂ by the bootstrap statistic θ̂* is the essence of this method.

If a Monte Carlo approximation is used to construct the distribution of θ̂*, it is necessary to determine the number B of randomly selected bootstrap samples. Using the bootstrap variance, Efron (1987) states that a small number of random samplings is sufficient to achieve adequate accuracy. Booth and Sarkar (1998) disagree with this statement. They used a distribution approximation of the relative bootstrap variance, which allowed B to be estimated for a given error level at an assumed confidence level. It turned out that achieving an error lower than 10 % at the 0.95 confidence level requires B to be around 800. Efron and Tibshirani (1993, p. 52) believe that the estimation of a standard error rarely requires more than 200 replications (repeated random samplings), while estimating a confidence interval requires 1,000 replications (Efron and Tibshirani 1993, p. 162).
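The Monte Carlo bootstrap described above can be sketched in a few lines. A minimal Python illustration (not the paper's implementation; the function name and defaults are our own):

```python
import random
import statistics

def basic_bootstrap(sample, stat, B=1000, seed=0):
    """Classical bootstrap: draw B resamples of size n with replacement,
    compute the statistic on each, and return the average of the B
    realizations together with their standard deviation as the
    standard-error estimate."""
    rng = random.Random(seed)
    n = len(sample)
    thetas = [stat([rng.choice(sample) for _ in range(n)]) for _ in range(B)]
    return sum(thetas) / B, statistics.pstdev(thetas)

# Bootstrap estimate of the mean and its standard error for a toy sample.
est, se = basic_bootstrap([2, 4, 4, 6, 9], lambda xs: sum(xs) / len(xs))
```

The fixed seed makes the sketch reproducible; in practice the result still varies with the choice of B and the random stream, which is precisely the issue the exact method removes.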
Having B bootstrap samples x*1, x*2, ..., x*B, we may estimate the unknown population parameter θ. Each of these samples allows for calculating a single realization of statistic θ̂*. For the given original sample, the realization of this statistic for resample b is:

$$\hat{\theta}^*(b) = t(x^{*b}). \tag{1}$$

The bootstrap estimate of parameter θ is then:

$$\hat{\theta}^*(\cdot) = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^*(b), \tag{2}$$

and an estimate of the standard error is the standard deviation of the form:

$$s = \sqrt{\frac{1}{B}\sum_{b=1}^{B}\left(\hat{\theta}^*(b) - \hat{\theta}^*(\cdot)\right)^2} \tag{3}$$
or:

$$\hat{s} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^*(b) - \hat{\theta}^*(\cdot)\right)^2}, \tag{4}$$

depending on whether B includes all possible samples (3) or only a subset of them (4).

Considering the bootstrap method, one may ask whether random sampling of the bootstrap sample X* from the previously obtained original sample X is necessary at all. Random sampling is necessary if examining the entire population is impossible or too costly, and using a sample instead of the population has significant implications that lie at the heart of mathematical statistics. Note, however, that the fundamental property of a sample is its finite size. The bootstrap distribution F̂ given for this sample is a simple discrete distribution. The distribution of any statistic determined for n discrete random variables with a finite number of realizations does not have to be estimated, as it may simply be calculated. The only open question is how many calculations this approach requires, which will be discussed below.

Consider the case of the two-element sample (x_1, x_2). The possible resamples are: (x_1, x_1), (x_2, x_2), (x_1, x_2), (x_2, x_1). The probability of randomly selecting each of them is the same and equals 1/4. For the three-element sample (x_1, x_2, x_3) there are 3^3 = 27 resamples: (x_1, x_1, x_1), (x_1, x_1, x_2), (x_1, x_1, x_3), (x_1, x_2, x_1), (x_1, x_2, x_2), (x_1, x_2, x_3), (x_1, x_3, x_1), (x_1, x_3, x_2), (x_1, x_3, x_3), (x_2, x_1, x_1), (x_2, x_1, x_2), (x_2, x_1, x_3), (x_2, x_2, x_1), (x_2, x_2, x_2), (x_2, x_2, x_3), (x_2, x_3, x_1), (x_2, x_3, x_2), (x_2, x_3, x_3), (x_3, x_1, x_1), (x_3, x_1, x_2), (x_3, x_1, x_3), (x_3, x_2, x_1), (x_3, x_2, x_2), (x_3, x_2, x_3), (x_3, x_3, x_1), (x_3, x_3, x_2) and (x_3, x_3, x_3). The probability of selecting each of these resamples is also the same and equals 1/27.
The fact that some resamples contain the same elements merely permuted has no significance, as each of them has a defined (identical) probability. If the original sample has n elements, then the number of equally probable resamples equals BE = n^n, and the probability of randomly selecting each resample is 1/BE. It should be stressed that the space of resamples is a finite space of size BE, and such is the size of the exact (ideal) bootstrap sample. If it is not too large, all realizations of the estimator may be calculated. These realizations may be interpreted as realizations of a discrete random variable. Since the number of realizations is finite, descriptive statistics tools suffice for their analysis (the estimation error is then calculated using formula (3)). If it is impossible to generate the entire space of resamples because n^n is too large, random sampling, that is, the classical bootstrap, becomes necessary (the estimation error is then described by formula (4)). It is worth noting that if the estimator is the mean and the sample is large, the Central Limit Theorem guarantees an asymptotically normal distribution. The bootstrap method which uses the entire space of resamples may be called the exact bootstrap method. The claim that the method is exact pertains only to resampling. With regard to the original sample, its adequacy in relation to the original variable X rests on the Glivenko–Cantelli Theorem.
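For small n, the entire space of n^n equally likely resamples can be generated directly. A Python sketch of this exact enumeration (our own illustration; the paper's programs are written in C++):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

def exact_bootstrap_distribution(sample, stat):
    """Enumerate all n**n equally likely resamples of the original sample
    and return the exact distribution {value: probability} of a statistic."""
    n = len(sample)
    p = Fraction(1, n ** n)          # each resample has probability 1/n^n
    dist = Counter()
    for resample in product(sample, repeat=n):
        dist[stat(resample)] += p
    return dict(dist)

# Two-element sample: 2**2 = 4 resamples, each with probability 1/4.
d = exact_bootstrap_distribution((1, 3), lambda xs: sum(xs) / len(xs))
# d == {1.0: 1/4, 2.0: 1/2, 3.0: 1/4}
```

Exact rational arithmetic (`Fraction`) keeps the probabilities summing to exactly 1, which mirrors the "no additional bias" property of the exact method.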
2 The exact bootstrap method

Let us assume that from the population described by random variable X with an unknown probability distribution F, an n-element primary sample x = (x_1, x_2, ..., x_n) was drawn. Because for some i ≠ j it is possible that x_i = x_j, we should reduce¹ the size of the random sample to k different values. The probabilities p_i of achieving the realization x_i, for i = 1, 2, ..., k, do not have to be identical for all i (as is the case in the classical bootstrap). Let us introduce the concept of a discrete sample random variable, denoted by X^D, with probability distribution F_D described by values p_i such that:

$$p^D_i = P\left(X^D = x_i\right) = p_i, \quad \text{for } i = 1, 2, \ldots, k. \tag{5}$$

The distribution of random variable X^D is equivalent to the bootstrap distribution F̂. The resample is denoted by X^D = (X^D_1, X^D_2, ..., X^D_n). The estimator θ̂ for the resample may be denoted by θ̂^D = t(X^D). The distribution of θ̂ will be approximated by the distribution of statistic θ̂^D. Note that the problems related to possible bias, consistency and efficiency of the estimator pertain to estimator θ̂. In the exact bootstrap method the realizations of estimator θ̂^D are calculated (for the whole population of resamples) rather than estimated based on a sample (drawn from the population of resamples). Therefore, the method does not introduce any additional bias. For a single realization b of the resample, x^{Db} = (x^{Db}_1, x^{Db}_2, ..., x^{Db}_n), we should calculate θ̂^D(b) = t(x^{Db}) = t(x^{Db}_1, x^{Db}_2, ..., x^{Db}_n) as well as the probability of its random selection. The probabilities are no longer identical, due to the reduction of the random sample to k different values. The probability of selecting resample b equals:

$$p^{Db} = P\left(\hat{\theta}^D = \hat{\theta}^D(b)\right) = \prod_{i=1}^{n} p^{Db}_i, \tag{6}$$

where p^{Db}_i = P(X^D = x^{Db}_i). The number of possible realizations of resamples equals BE = k^n.
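When the sample is reduced to k distinct values with probabilities p_i, the enumeration runs over the BE = k^n index tuples, and each resample's probability is the product of its elements' probabilities, as in formula (6). A Python sketch under these assumptions (function name is ours):

```python
from itertools import product
from fractions import Fraction
from collections import Counter

def exact_bootstrap_weighted(values, probs, n, stat):
    """Exact bootstrap over the BE = k**n resamples of size n drawn from
    k distinct values with probabilities p_i; each resample's probability
    is the product of its elements' probabilities (formula (6))."""
    dist = Counter()
    for idx in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in idx:
            p *= probs[i]
        dist[stat([values[i] for i in idx])] += p
    # Check: probabilities over all BE resamples must sum to 1.
    assert sum(dist.values()) == 1
    return dict(dist)
```

For example, `exact_bootstrap_weighted((0, 1), (Fraction(1, 3), Fraction(2, 3)), 2, sum)` enumerates the 2^2 = 4 resamples of size 2 and returns the exact distribution of their sum.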
A correct algorithm should satisfy the condition $\sum_{b=1}^{BE} p^{Db} = 1$. Formula (6) describes the distribution of estimator θ̂^D, which is used to approximate the distribution of estimator θ̂. It is a discrete distribution with a finite number of realizations, although in most cases that number is very high. This distribution may be used in point or interval estimation of parameter θ or in testing hypotheses. Note that, in essence, the entire operation amounts to approximating the unknown continuous distribution of a certain random variable θ̂ by the discrete random variable θ̂^D, whose distribution may be generated from a sample. Through

¹ This reduction is not necessary but recommended, as it reduces the dimension of the problem.
random sampling we actually conduct a discretization of a certain continuous phenomenon. We attempt to approximate the continuous random variable X by a sequence of its realizations x = (x_1, x_2, ..., x_n). Knowing the distribution of a discrete random variable, we may automatically calculate the distribution of a function of this variable. In the case of continuous random variables there is no automatic method which would allow for calculating the distribution of such functions.

When using the bootstrap methods it is worth comparing the value of BE (the number of all resamples) with the prescribed number of resamplings B in the classical bootstrap. For example, for k = 15 and n = 18 we obtain BE = 15^18, which is a very large number, significantly greater than the typical B = 1,000. Nowadays such a great number of repetitions can be generated. The pioneering work of Efron dates back to 1979. At that time, conducting such a great number of calculations within a reasonable time was impossible. Since it was impossible to examine the entire population represented by the original sample, it was necessary to draw resamples from the sample functioning as a population.

In the bootstrap method the sequence of the obtained values of the estimator is sorted from the lowest to the highest, which allows one, for example, to set confidence intervals using the percentile method (Efron and Tibshirani 1993). In theory, the exact bootstrap method also permits this. The number of possible realizations of statistic θ̂^D is very high. However, firstly, some realizations of the discrete statistic will certainly be repeated and, secondly, it is advisable to group the results in a histogram. Creating a histogram is necessary for large problems, as the number of possible estimator realizations is very large. However, this may cause a loss of information.
We should also stress that, in spite of this, a very accurate estimation of the confidence intervals may be achieved through the exact bootstrap method, as the widths of the intervals in the histogram do not have to be identical. In ranges that require exact probabilities (or an exact cumulative distribution function), the width of the interval may be very small. Limited accuracy may only result from the density of the individual realizations of the estimator and the probability of their selection.

The easiest method of generating all resamples for a discrete distribution is the recursive drawing of sequential elements from the original sample of size n (or k if there are repetitions in the original sample). Such an algorithm belongs to the brute-force category, and algorithms of this type are considered ineffective. The number of generated realizations of the bootstrap samples may be reduced, as in sampling with replacement some values will be repeated. Feller (1950, p. 38) presents a similar problem. Fisher and Hall (1991) presented an algorithm which allows for generating all resamples. Both works pertain to the situation when the probability of drawing every element from a sample is the same.

Every n-element secondary bootstrap sample, drawn from the k different values of the original sample, may be written as:

$$x^{Db} = (a_{1b} x_1, a_{2b} x_2, \ldots, a_{kb} x_k), \tag{7}$$

where every a_jb ≥ 0, for j = 1, 2, ..., k, is the number of occurrences of the j-th element of the sample in resample b. The numbers must satisfy the following condition:
$$\sum_{j=1}^{k} a_{jb} = n, \tag{8}$$

whereby some of them may be equal to 0. If a_jb = 0 for a given j, it indicates that x_j did not occur in resample b. The probability of drawing a single sample defined by (7) equals:

$$p^{Db} = \prod_{j=1}^{k} \left(p^D_j\right)^{a_{jb}}. \tag{9}$$

There are two compiled programs written in C++ posted on the following website: The first one generates the exact bootstrap distribution of the mean estimator:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i. \tag{10}$$

The second generates the distribution of the variance estimator:

$$\hat{S}^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2. \tag{11}$$

The first program provides all the possible bootstrap realizations of the mean estimator, while the second one generates a histogram, due to the very large number of realizations.

3 The limit distribution of the bootstrap sample mean and variance estimator

The exact bootstrap method will be used to estimate the mean and variance. The verification of accuracy will be made possible by the limit distributions, which may be used when the sample is large (n ≥ 30). Consider an n-element random sample X = (X_1, X_2, ..., X_n). The variables X_i, for i = 1, 2, ..., n, have the same distribution F, with expected value μ and standard deviation σ. The limit distribution of the mean estimator X̄ defined by formula (10) is a normal distribution with the following parameters:

$$\mu_{\bar{X}} = \mu; \qquad \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}. \tag{12}$$

Before we define the limit distribution of the unbiased variance estimator (11), we present the limit distribution of the biased estimator:

$$S^2 = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2. \tag{13}$$
The expected value of this estimator equals:

$$\mu_{S^2} = \frac{n-1}{n}\,\sigma^2. \tag{14}$$

If random variable X has distribution N(μ, σ), the distribution of the variance estimator (13) is asymptotically normal, with mean given by (14) and standard deviation:

$$\sigma^{Norm}_{S^2} = \sqrt{\frac{2}{n}\,\sigma^4}. \tag{15}$$

The limit distribution of the sample variance S² drawn from a population with any given distribution with parameters μ and σ will also be normal (the variance is the averaged square of deviations from the mean). The standard deviation of the limit distribution is described by the formula (Smirnow and Dunin-Barkowski 1973, p. 237):

$$\sigma_{S^2} = \sqrt{\frac{\mu_4 - \sigma^4}{n} - \frac{2\left(\mu_4 - 2\sigma^4\right)}{n^2} + \frac{\mu_4 - 3\sigma^4}{n^3}}, \tag{16}$$

where μ_4 is the fourth central moment of variable X. To determine the limit distribution of the unbiased variance estimator Ŝ² described by (11), we should correct the parameters of the limit distributions, bearing in mind the relationship:

$$\hat{S}^2 = \frac{n}{n-1}\,S^2. \tag{17}$$

The expected value and standard deviation of estimator Ŝ² are obtained by multiplying the parameters of the distribution of estimator S² by n/(n−1).

By using the exact bootstrap method, the distribution of random variable X is approximated by the distribution of the discrete random variable X^D with realization set (x_1, x_2, ..., x_k) and probability distribution given by the values p_i = P(X^D = x_i), for i = 1, ..., k, whereby Σ_{i=1}^{k} p_i = 1. The expected value μ^D, standard deviation σ^D and fourth central moment μ^D_4 of variable X^D are equal, respectively:

$$\mu^D = \sum_{i=1}^{k} x_i p_i, \qquad \sigma^D = \sqrt{\sum_{i=1}^{k}\left(x_i - \mu^D\right)^2 p_i}, \qquad \mu^D_4 = \sum_{i=1}^{k}\left(x_i - \mu^D\right)^4 p_i. \tag{18}$$

The normal limit distribution of estimator X̄ will be denoted by GA and is as follows:

$$GA: \; N\left(\mu^D, \; \sigma^D/\sqrt{n}\right). \tag{19}$$
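The moments in formula (18), and hence the parameters of the GA limit distribution in (19), can be computed directly from the discrete distribution. A small Python helper (our own sketch):

```python
from math import sqrt

def discrete_moments(values, probs):
    """Expected value, standard deviation and fourth central moment of a
    discrete random variable X^D (formula (18))."""
    mu = sum(x * p for x, p in zip(values, probs))
    sigma = sqrt(sum((x - mu) ** 2 * p for x, p in zip(values, probs)))
    mu4 = sum((x - mu) ** 4 * p for x, p in zip(values, probs))
    return mu, sigma, mu4

# Uniform distribution on {1,...,5}: mu^D = 3 and (sigma^D)^2 = 2,
# matching Example 2 later in the paper.
mu, sigma, mu4 = discrete_moments((1, 2, 3, 4, 5), (0.2,) * 5)
# GA limit distribution of the mean (19): N(mu, sigma / sqrt(n)).
```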
Table 1 Distribution of random variable X^D (values x_i with probabilities p_i)

The limit distribution of estimator Ŝ² will be denoted by GV (Smirnow and Dunin-Barkowski 1973, p. 237):

$$GV: \; N\left(\left(\sigma^D\right)^2, \; \frac{n}{n-1}\sqrt{\frac{\mu^D_4 - \left(\sigma^D\right)^4}{n} - \frac{2\left(\mu^D_4 - 2\left(\sigma^D\right)^4\right)}{n^2} + \frac{\mu^D_4 - 3\left(\sigma^D\right)^4}{n^3}}\right). \tag{20}$$

4 A comparison of the exact and basic bootstrap

The basic bootstrap method is based on resampling the original sample, which may be interpreted as random sampling of B realizations of an estimator of a given parameter. The arithmetic mean is calculated (according to formula (2)) from the randomly selected realizations. From all the BE resamples, B samples may be selected in BE^B ways. Even if BE is not large, BE^B will be a very large number, which makes it impossible to calculate the distribution of the mean directly. However, because B is large, the limit distribution may be used, which is the normal distribution:

$$GO: \; N\left(\mu_{BE}, \; \frac{\sigma_{BE}}{\sqrt{B}}\right), \tag{21}$$

where μ_BE and σ_BE are the mean and standard deviation of the estimator of the parameter calculated using the exact bootstrap. This distribution allows for calculating the probability that an estimate of the parameter obtained using the basic bootstrap falls within any given interval (in particular, this may be a confidence interval).

5 Results

5.1 Example 1

Suppose an n-element sample drawn from an unspecified probability distribution F and represented by discrete random variable X^D is given. The probability distribution F̂ of X^D is presented in Table 1. The expected value and standard deviation of X^D equal μ^D = 5.174 and σ^D = , respectively. Two variants of the distributions of the mean and variance estimators, calculated using the exact bootstrap method and the limit distributions, are provided. The first one
Table 2 Confidence intervals of the mean computed using the exact bootstrap method (DBA) and the GA limit distribution, for sample sizes n = 20 and n = 30: mean and standard deviation of the mean estimator, and boundaries and widths of the confidence intervals at confidence levels 1 − α = 0.95 and 1 − α = 0.99. Source: own calculations

assumes that n = 20 and the second that n = 30. The samples were generated according to the algorithm proposed by Fisher and Hall (1991).

Mean estimation for n = 20 and n = 30

Figure 1 shows the mean estimator distribution calculated using the exact bootstrap method (DBA) and the limit distribution GA defined by (19), in the interval within 4 standard deviations of the mean. The bootstrap distribution for the mean is given as the probability of the individual values. In the case of the limit distributions, however, it is the probability of assuming values in an interval (from the midpoint between the values on the left to the midpoint on the right). Both for n = 30 and n = 20 the diagrams are nearly identical; the probability values overlap to three decimal places.

Table 2 lists the parameters (mean and standard deviation) of the DBA and GA distributions of the mean estimator and the confidence intervals calculated from them. The parameters of the distributions are nearly identical (to five decimal places for both sample sizes). In the case of the confidence intervals there are some differences: for n = 20 the interval boundaries of the DBA and GA distributions differ in the second decimal place, and for n = 30 in the third decimal place. We should stress that for a set sample size n it is impossible to achieve an arbitrarily high accuracy of estimation, as the exact bootstrap distribution DBA is discrete.
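The probability that the basic bootstrap attains a given accuracy can be computed from the limit distribution (21) using the normal cumulative distribution function. A minimal Python sketch (our own helper functions; for the mean estimator one would pass sigma_be = sigma_D / sqrt(n)):

```python
from math import erf, sqrt

def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma)."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def prob_within(accuracy, sigma_be, B):
    """Probability that the average of B bootstrap realizations of an
    estimator lies within +/- accuracy of its expected value, under the
    limit distribution N(mu_BE, sigma_BE / sqrt(B)) of formula (21)."""
    s = sigma_be / sqrt(B)
    return norm_cdf(accuracy, 0.0, s) - norm_cdf(-accuracy, 0.0, s)
```

As the accuracy requirement tightens, this probability drops sharply, e.g. `prob_within(0.01, sigma_be, 1000)` is far below `prob_within(0.1, sigma_be, 1000)`, which is the effect quantified in the tables below.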
Comparing the basic bootstrap with the exact bootstrap relies on the distribution given by formula (21). In the case of the mean, this distribution will be $N\left(\mu^D, \frac{\sigma^D}{\sqrt{nB}}\right)$. Assuming B = 1,000, for n = 20 we obtain N(5.174, ), and for n = 30 the distribution is N(5.174, ). Knowing the distribution, we may calculate the probability that the mean is estimated with the desired accuracy based on 1,000 resamples. Table 3 lists the probabilities for accuracies equal to 0.1, 0.01, 0.001 and 0.0001. The probability of estimation equals 1 only for the 0.1 accuracy and is >0.5 for the 0.01
Fig. 1 Distributions of the GA and DBA mean estimators

Table 3 Probability that the mean calculated based on 1,000 bootstrap samples will be computed with the given accuracy, for n = 20 and n = 30, over the intervals (mean − 0.1; mean + 0.1), (mean − 0.01; mean + 0.01), (mean − 0.001; mean + 0.001) and (mean − 0.0001; mean + 0.0001). Source: own calculations

accuracy. We may note that if we increase the requirements concerning the accuracy of estimation, the probability of fulfilling the requirements rapidly decreases, despite the large number of resamples (1,000 elements). In such situations the exact bootstrap should be used, which guarantees no bias at the resampling stage.

Variance estimation for n = 20 and n = 30

Table 4 presents the distributions of the variance estimator determined using the exact bootstrap method (DBV) and the limit distribution GV defined by formula (20). The number of different realizations of the variance estimator is significantly higher than
that of the mean estimator; therefore the probabilities are shown in the table for intervals rather than for individual values. If one uses them to calculate the parameters of the distributions, this is done in the same way as for grouped data; thus, the results may differ from the exact values. The method of selecting the width of the intervals also requires some comment. For the limit (continuous) distributions, the probability that the random variable assumes a value in any arbitrarily small interval is >0. For a discrete distribution, and such is the variance estimator distribution calculated using the exact bootstrap method, the case is different: the smaller the interval width, the greater the number of intervals where the probability is equal to 0.

Moreover, Table 4 presents the expected values of the variance estimator distributions and their standard deviations. These are exact values calculated from all the generated realizations. The expected value of the limit distribution GV is equal to the sample variance. In the case of the DBV distribution the expected value was also equal to the variance, which attests to the accuracy of the applied algorithm. The exact bootstrap method does not introduce additional estimator bias (contrary to the bootstrap method with random sampling). Also, note that the standard deviation of the DBV distribution is equal to the standard deviation of the GV distribution to five decimal places.

Figure 2 shows the distributions of the variance estimators for n = 20 and n = 30. They prove that the distribution GV constitutes a correct approximation of the distribution DBV for the random variable whose distribution is presented in Table 1. We should also note that the DBV distribution is slightly asymmetric in comparison to the limit distributions.
The difference is very small, and the consistency of the GV and DBV distributions was confirmed by Pearson's goodness-of-fit test, both for n = 20 and n = 30. Table 5 lists the confidence intervals of the variance obtained using the exact bootstrap method (the DBV distribution) and the limit distribution GV. Comparing the widths of the intervals, we can state that the estimation using the exact bootstrap method was more precise (and it is an exact estimation) than that using the limit distribution GV (with the exception of n = 20 and 1 − α = 0.99, for which the confidence interval of the GV distribution was narrower than that of DBV). The left shift of the intervals for the GV distribution in relation to DBV results from the asymmetry of the latter. The presented calculations indicate that the GV distribution is a good approximation of the DBV distribution.

A comparison of the basic and exact bootstrap methods, as in the case of the mean, relies on the distribution given by formula (21). The distribution of the variance estimator obtained from formula (21) will be:

$$N\left(\left(\sigma^D\right)^2, \; \frac{n}{n-1}\sqrt{\frac{1}{B}\left[\frac{\mu^D_4 - \left(\sigma^D\right)^4}{n} - \frac{2\left(\mu^D_4 - 2\left(\sigma^D\right)^4\right)}{n^2} + \frac{\mu^D_4 - 3\left(\sigma^D\right)^4}{n^3}\right]}\right).$$
Fig. 2 Distributions of the GV and DBV variance estimators

If B = 1,000, for n = 20 we obtain the distribution N(3.9897, ), and for n = 30 the distribution is N(3.9897, ). Table 6 lists the probabilities of the variance being estimated based on 1,000 resamples with accuracy equal to 0.1, 0.01, 0.001 and 0.0001. The probability of obtaining the 0.1 accuracy equals 1 for both sizes of the original sample. However, the 0.01 accuracy may be obtained with probability for n = 20 and for n = 30. For the accuracies 0.001 and 0.0001 the probabilities are very small. This is an indication of the need to use the exact bootstrap method in the case of estimating variance if the accuracy requirements are higher.

5.2 Example 2

The second simulation experiment uses a small sample comprising the values {1, 2, 3, 4, 5}, assuming the same probability of randomly sampling each element, equal to 0.2. This distribution is represented by discrete random variable X^D with expected value and variance equal to μ^D = 3 and (σ^D)² = 2, respectively.

Mean estimation for n = 5

The expected value of the mean estimator equals 3 and its standard deviation is 0.6325 (according to formula (12)). From the 5 elements of the original sample we may draw 5^5 = 3,125 resamples. The probability of drawing each of them is the same under the given conditions and equals 1/3,125. Since the mean estimator for many resamples
Table 4 Distributions of the variance estimator obtained using the exact bootstrap method (DBV) and the limit distribution GV, for n = 20 and n = 30: mean and standard deviation of the variance estimator, and probabilities for the intervals [0.00; 0.25), [0.25; 0.50), ..., [7.25; 7.50) and [7.50; +∞). Source: own calculations

has the same value, the set of all its realizations includes only 21 values. Figure 3 shows the distribution of estimator probabilities calculated using the exact bootstrap method (denoted DBA), and the limit distribution GA defined by formula (19). The
Table 5 Confidence intervals of the variance constructed using the exact bootstrap method (DBV) and the GV limit distribution, for n = 20 and n = 30: boundaries and widths of the confidence intervals at confidence levels 1 − α = 0.95 and 1 − α = 0.99. Source: own calculations

Table 6 Probability that the variance calculated based on 1,000 bootstrap samples will be computed with the given accuracy, for n = 20 and n = 30, over the intervals (variance − 0.1; variance + 0.1), (variance − 0.01; variance + 0.01), (variance − 0.001; variance + 0.001) and (variance − 0.0001; variance + 0.0001). Source: own calculations

Fig. 3 Distributions of the GA and DBA mean estimators for a small sample

agreement between the DBA and GA distributions is very good (the probabilities are equal to two decimal places), even though the sample is small. The parameters of both distributions are identical, which confirms the accuracy of the algorithm used. Table 7 presents the probabilities that the mean based on 1,000 bootstrap samples will be calculated with a given accuracy. The probabilities were calculated based on the limit distribution, which for B = 1,000 is N(3, ). Only for the 0.1 accuracy does the probability equal 1. In all other cases the use of the exact bootstrap method is recommended.
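The size of the resample space in this example, and the number of distinct values taken by the mean estimator, can be verified by direct enumeration. A quick Python check (our own illustration):

```python
from itertools import product

sample = (1, 2, 3, 4, 5)
n = len(sample)

# All n**n = 3,125 equally likely resamples of the original sample.
means = [sum(r) / n for r in product(sample, repeat=n)]

# The resample sums range over 5, 6, ..., 25, so the mean estimator
# takes only 21 distinct values on the whole space.
distinct_means = sorted(set(means))
```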
Table 7 Probability that the mean computed based on 1,000 bootstrap small samples will be calculated with the given accuracy, over the intervals (mean − 0.1; mean + 0.1), (mean − 0.01; mean + 0.01), (mean − 0.001; mean + 0.001) and (mean − 0.0001; mean + 0.0001). Source: own calculations

Fig. 4 Distributions of the GV and DBV variance estimators for a small sample

Table 8 Probability that the variance calculated based on 1,000 bootstrap small samples will be computed with the given accuracy, over the intervals (variance − 0.1; variance + 0.1), (variance − 0.01; variance + 0.01), (variance − 0.001; variance + 0.001) and (variance − 0.0001; variance + 0.0001). Source: own calculations

Variance estimation for n = 5

The expected value of the unbiased variance estimator equals 2, and its standard deviation is (according to formula (16), after correction by the factor 5/4). The variance estimator for the original sample has only 26 different values. Figure 4 presents the distribution of the unbiased variance estimator calculated using the exact bootstrap method DBV; the normal limit distribution GV was included for comparison. The differences between the distributions are notable, and we should state that in the case of a variance estimator for a small sample the limit distribution should not be used. Table 8 presents the probability that the variance calculated based on 1,000 bootstrap samples will be computed with the given accuracy. The probabilities were calculated
using the limit distribution N(2, ). As was the case with the mean, only for the 0.1 accuracy is the probability close to 1. In all other cases the use of the exact bootstrap method is recommended, since the probability of an exact variance estimation using the basic bootstrap method is small.

6 Conclusion

In this article the exact bootstrap method was discussed. The method can be used to estimate the parameters of estimators of random variables with an unknown distribution. It allows for determining the estimate of an arbitrary parameter, the error of this estimation, the distribution of the estimator, and confidence intervals. Traditionally, this problem is solved using the bootstrap method, which consists in resampling the original random sample. Random sampling is used in statistics if the entire population cannot be examined or the study would be too costly. However, the original sample is, first of all, finite and, secondly, its distribution is known: it is the empirical distribution. Instead of resampling, one can generate the entire space of resamples and determine all realizations of the statistic which is the estimator of the unknown parameter.

The method was used to estimate the mean and variance. It was shown that the expected values of the estimators are equal to the mean and variance of the sample. The method, therefore, does not introduce the bias resulting from resampling that may occur in the classical bootstrap. The estimator distributions calculated using the exact bootstrap method were compared with the limit distributions. The similarity between the distributions indicates that the exact distribution may be approximated by the limit distribution if the sample is not too small.
In order to assess the effectiveness of the traditional bootstrap method, the limit distribution of the mean calculated from B realizations of the estimator of a given parameter (every realization calculated from a single bootstrap sample) was used. This distribution allows for calculating the probability of obtaining the assumed accuracy of estimation. Although the number of bootstrap resamples was large (B = 1,000), for both the mean and the variance the probabilities decreased rapidly as the required accuracy increased. This proves that it is worth using the exact method, which guarantees that there will be no additional bias at the resampling stage. The conducted simulation experiments revealed that in the case of small samples (n ≤ 15 and k = n) the time necessary to generate the entire space of resamples is short (<10 s using an average-quality computer). This means that there is no need for resampling. For larger samples the time is much longer and requires several hours of computation (for n = 20 and k = n the calculations lasted 5 h and 30 min). Increasing the sample size lengthens the computation time significantly. On the other hand, we should remember that the bootstrap method is used mainly for small samples; for larger samples the limit distribution of estimators may be used. However, considering the progress in computer technology, the exact bootstrap method will in the future also be usable for larger samples.
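The probability calculation behind this assessment can be sketched as follows. If the estimate obtained from B bootstrap resamples is approximately normal around the target value with standard deviation σ/√B, the probability of landing in the interval (value − ε; value + ε) is 2Φ(ε√B/σ) − 1. The value of σ below is an illustrative assumption, not one of the paper's estimates:

```python
import math

def accuracy_probability(eps, sigma, B=1000):
    """P(|estimate - target| < eps) when the estimate from B bootstrap
    resamples is approximately N(target, sigma**2 / B)."""
    z = eps * math.sqrt(B) / sigma
    return math.erf(z / math.sqrt(2))  # equals Phi(z) - Phi(-z)

# Illustrative standard deviation (assumed): sigma = 1
for eps in (0.1, 0.01, 0.001, 0.0001):
    print(f"accuracy {eps}: probability {accuracy_probability(eps, 1.0):.4f}")
```

This reproduces the qualitative pattern of Tables 7 and 8: the probability is close to 1 only for the coarsest accuracy and collapses quickly as ε shrinks, whatever the exact σ.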
What are the consequences of the possibility, presented in the article, of generating the complete information contained in the sample? The fundamental one is much greater flexibility in constructing estimators, since no assumption about the form of the distribution is needed in order to determine the distribution of the estimator. One should still attempt to make the estimator unbiased, consistent and efficient, and the exact bootstrap method may be useful in this matter as well. Drawing a random sample may be seen as the replacement of a continuous random variable with an unknown distribution by a discrete variable with a known distribution, namely the bootstrap distribution. Transformations of discrete variables are easier than transformations of continuous variables, since the distribution of a statistic of discrete variables can be calculated automatically. In reality, due to the finite accuracy of all measurements, we can only observe discrete variables. We may suppose that as the power of computers increases, their role in statistics will also increase.

Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
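The claim that distributions of statistics of discrete variables can be calculated automatically can be illustrated with a short sketch (names hypothetical): pushing a discrete distribution through an arbitrary transformation only requires summing probabilities over preimages.

```python
from fractions import Fraction
from collections import defaultdict

def transform_pmf(pmf, g):
    """Distribution of Y = g(X) for a discrete X: sum the probabilities
    of all x that map to the same value g(x)."""
    out = defaultdict(Fraction)  # Fraction() == 0, so this starts at zero
    for x, p in pmf.items():
        out[g(x)] += p
    return dict(out)

# Hypothetical discrete (e.g. bootstrap) distribution of X
pmf = {-1: Fraction(1, 4), 0: Fraction(1, 4), 1: Fraction(1, 2)}
squared = transform_pmf(pmf, lambda x: x * x)
# P(Y = 1) collects the mass of both x = -1 and x = 1
```

Applied to the bootstrap distribution of a sample, the same mechanism yields the exact distribution of any transformed statistic.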