Bootstrapping, Permutations, and Monte Carlo Testing


Problem: The population of interest is extremely rare spatially, and you are interested in using a 95% CI to estimate total abundance. The sampling design used is adaptive cluster sampling, which provides an estimator $\hat{\tau}_{hh}$ of the total abundance and its standard error. The problem is that we can't use the usual 95% CI, $\hat{\tau}_{hh} \pm t_{n-1,\,\alpha/2}\, SE(\hat{\tau}_{hh})$, because the estimator does not have a t-distribution.

ALS5932/FOR6934 Fall Mary C. Christman

Problem: You are interested in testing hypotheses about treatment effects (2 factors) in an experiment, but your data are non-standard.

[Figure: the response variable for each treatment combination, one panel per combination of level.activity and level.alteration.]

The assumption of normality will clearly fail here, and even an appeal to the central limit theorem with respect to the means may be problematic. Plus, one treatment has only one value for the response variable. How does one test hypotheses about the main effects and interaction?

Problem: An entomologist is interested in testing a hypothesis about the spatial distribution of a flock of butterflies within a confined space. Specifically, he hypothesizes that this new species will cluster into groups at night. Several nights are randomly selected for observation; on those nights the location of each butterfly on the walls, floors, and ceiling of the space is recorded.

H_0: the animals distribute themselves at random over the walls and ceiling.
H_A: the animals cluster on one or more walls or ceiling.

Dataset: Entries are the number of butterflies observed on that wall that day.

[Table: counts by DAY and WALL (B, F, L, R, RF).]

We do not want to reduce the data to simple presence/absence information, since that would lose too much about the clustering behavior. Also, a $\chi^2$ test would not be appropriate here, since it does not test clustering but instead whether day and wall are independent (and we couldn't do one anyway!).

Each of the above examples requires a method other than the traditional tests that have been used classically. We'll consider the simplest versions of three methods.

BOOTSTRAPPING

Basic Idea: You have a dataset that is a sample collected from an unknown probability distribution (population) according to some probabilistic sampling design. This sample, if collected correctly, should have the properties of the distribution from which it was sampled: a histogram of the data should mimic the shape of that distribution, the sample summaries (mean, variance, skew, kurtosis, median, etc.) should be close to the true population values, and so on. If all of these are reasonable assumptions and the sample size is sufficiently large, then the sample is a mini-universe for your true population. Hence, you could repeatedly sample from the original dataset in such a way as to mimic the original experiment and, as a result, obtain information you cannot obtain by the more traditional methods. The information of interest usually is the distribution of the sample statistic $\hat{\theta} = f(x_1, x_2, \ldots, x_n)$ that you are using to estimate some population quantity $\theta$.

Method Assuming Original Sample Was Obtained By Probabilistic Sampling With Replacement:

1) Take a bootstrap sample from the original dataset using the same sampling technique as was used to get the original sample. The bootstrap sample is the same size as the original dataset.
   e.g. assume simple random sampling with replacement
   original dataset = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
   bootstrap sample = {12, 2, 3, 5, 1, 2, 9, 10, 8, 2, 3, 4}

2) Calculate the sample quantity of interest (any function of the data such as mean, sd, median, Prob(X<4), t-statistic, etc.). Call the bootstrap sample quantity $\hat{\theta}_b$, where b identifies that it is from the bootstrap sample.
3) Store the bootstrap estimate $\hat{\theta}_b$.
4) Repeat steps 1-3 many times. Call the total number of times you repeat the bootstrap B. That is, b = 1, 2, ..., B.

You should now have B estimates of $\theta$ plus the value calculated from the original sample, $\hat{\theta}$. These B estimates should have the same distributional properties that the estimator $\hat{\theta}$ has, i.e. the shape of the frequency distribution, the mean, the variance, etc. should mimic the true unknown distribution of the estimator $\hat{\theta}$. We can use this information to do many things, including:
a) checking for the size or direction of bias in an estimator,
b) estimating the standard deviation of an estimator,
c) calculating confidence intervals directly from the probability distribution (and thus avoiding assumptions about shape), etc.

Example: Same simple example as before. Suppose it is of interest to estimate the standard deviation of the estimator of Pr(X<4), where our estimator is the observed proportion of the data less than 4. From the original sample, $\hat{\theta} = 3/12 = 0.25$. We will do simple non-parametric bootstrapping of random samples of size n = 12 with replacement. For each bootstrap sample, we'll calculate the observed proportion in the bootstrap sample and store it as $\hat{\theta}_b$, and we'll repeat this B = 1000 times.

R code:

    data1 <- 1:12
    B <- 1000
    theta <- matrix(0, nrow = B, ncol = 1)
    for (sim in 1:B) {
      # proportion of the bootstrap sample that is less than 4
      theta[sim, 1] <- sum(sample(data1, 12, replace = TRUE) < 4) / 12
    }
    hist(theta)
    quantile(theta, c(0.025, 0.975))
    mean(theta)
    var(theta)

Results: the 2.5% and 97.5% quantiles, the mean, and the variance of theta from the code above [values missing].

We actually didn't need the bootstrap here to get estimates of the variance or a confidence interval. The reason is that $\hat{\theta} = \dfrac{\#\text{ meeting the condition}}{\#\text{ sampled}}$, where the numerator is from the binomial distribution! So, how well did we do at getting the correct estimates? The mean of $\hat{\theta}$ is $\pi$, the true proportion, and the variance is $\pi(1-\pi)/n$.

The bootstrap estimator of $\pi$ is the mean of the bootstrap estimates,

$$\bar{\hat{\theta}}_b = \frac{1}{B}\sum_{b=1}^{B} \hat{\theta}_b .$$

The bootstrap estimate of the variance of $\hat{\theta}$ is

$$\hat{V}(\hat{\theta}) = \frac{\sum_{b=1}^{B} \left(\hat{\theta}_b - \bar{\hat{\theta}}_b\right)^2}{B - 1} .$$

If we instead used the parametric approach for the binomial distribution, the estimate of $\pi$ is $\hat{\theta} = 0.25$ and the estimate of the variance of $\hat{\theta}$ is

$$\frac{\hat{\theta}(1-\hat{\theta})}{n} = \frac{0.25 \times 0.75}{12} \approx 0.0156 .$$

So, bootstrapping is reasonable in that it behaves as expected when the distribution is unknown but does reproduce the correct behavior when the distribution is known (and bootstrapping isn't needed).

Example: the adaptive cluster sampling of the rare population. Here, the population is finite: it is composed of the counts in the 400 squares. Sampling of clusters was done without replacement. So, how do we bootstrap here?

Say a sample of n = 20 was taken using adaptive cluster sampling (ACS); ACS will result in an average sample size bigger than 20 units. We can't sample from this using the usual bootstrapping methods because of the random sample size and the without-replacement sampling. Instead we created a transformed sample, which is the sum of the counts within each sampled network. Now we have n = 20 sampled networks, but we still have the problem that we can't sample with replacement from these, since the original population was not sampled with replacement. So, one method is to instead create an artificial population by making copies of the sampled units (in this case, the networks) enough times to fill the grid. Keep in mind that the number of networks is not the number of original grid squares sampled, so filling the grid requires knowing how many original grid cells were sampled. We then bootstrap sample from this pseudopopulation B times.

Of interest in the case of the rare population was to estimate a 90% confidence interval for the true abundance. The simplest approach is the percentile bootstrap: the lower bound of a 100(1-α)% CI is the 100(α/2)th percentile of the bootstrap estimates and the upper bound is the 100(1-α/2)th percentile.
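The pseudopopulation idea and the percentile interval can be sketched in R as follows. This is only an illustration under loud assumptions, not the procedure used in the original analysis: the values in `y` are hypothetical, each network is treated as a single grid square (which the text warns is a simplification), and `tau.hat` is a simple expansion estimator standing in for $\hat{\tau}_{hh}$.

```r
set.seed(2024)

N <- 400   # grid squares in the population (from the example)
n <- 20    # number of sampled networks (from the example)
B <- 1000  # number of bootstrap replicates

# Hypothetical transformed sample: mostly empty networks, a few large counts.
y <- c(rep(0, 16), 3, 7, 12, 40)

# Pseudopopulation: copy each sampled value N/n times to fill the grid.
# ASSUMPTION: every network counts as one grid square.
pseudo.pop <- rep(y, each = N / n)

# ASSUMPTION: stand-in estimator of total abundance (not tau-hat_hh).
tau.hat <- function(x) N * mean(x)

# Resample WITHOUT replacement from the pseudopopulation, mimicking the design.
tau.boot <- replicate(B, tau.hat(sample(pseudo.pop, n, replace = FALSE)))

alpha <- 0.10
ci        <- quantile(tau.boot, c(alpha / 2, 1 - alpha / 2))  # percentile CI
se.boot   <- sd(tau.boot)                                     # bootstrap SE
bias.boot <- mean(tau.boot) - tau.hat(y)                      # bootstrap bias

ci
se.boot
bias.boot
```

The same three summaries (percentile interval, standard deviation, and bias) correspond to uses a) through c) listed for the bootstrap estimates; only the resampling step changes with the sampling design.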

PERMUTATION TESTS

Basic Idea: In some experiments a test of treatment effects may be of interest where the null hypothesis is that the different populations are actually from the same population. Example: ANOVA, where H_0 is that the treatment means are all equal. The assumptions that must be true are that each treatment has the same variance and the same shape. If, in fact, the null hypothesis is true, then the observations are not distinguishable by treatment but are instead from the same distribution (one shape, mean, and variance) and just happen to be randomly associated with a treatment.

[Table: Original dataset collected. Columns: Sample ID and one column per population; last row: Mean.]

[Table: Permuted Data. Columns: Sample ID and one column per population; last row: Mean.]

Permutation tests are based on this idea. If H_0 is true, then any set of values is just a random assignment among treatments.
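A single shuffle of the kind described above can be sketched in R. This is a minimal illustration rather than the author's code: the samples `pop1` and `pop2` are hypothetical values invented for the example, and the difference in means is just one convenient statistic.

```r
set.seed(1)

# Hypothetical observations from two treatments (not from the source).
pop1 <- c(12, 15, 9, 14, 11)
pop2 <- c(18, 16, 13, 20, 17)

pooled <- c(pop1, pop2)
labels <- rep(c("pop1", "pop2"), times = c(length(pop1), length(pop2)))

# Observed difference in treatment means.
obs.diff <- mean(pooled[labels == "pop2"]) - mean(pooled[labels == "pop1"])

# Under H0 the labels are arbitrary, so shuffle them and recompute.
shuffled  <- sample(labels)  # same labels in random order (no replacement)
perm.diff <- mean(pooled[shuffled == "pop2"]) - mean(pooled[shuffled == "pop1"])

obs.diff
perm.diff
```

Repeating the shuffle many times gives the reference distribution against which the observed statistic is compared.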

Method Under The Assumptions That The Distributions Are Identical Under H_0 And Sampling Is Random And With Replacement And Treatment Assignment Is Random:

1) Calculate the test statistic for the hypotheses for the original observed arrangement of data. This could be an F-stat or MS or some other statistic. Call it $\kappa_0$.
2) Now, randomly rearrange the data among the treatments (shuffle or permute the data) and calculate the test statistic for the new arrangement. Call it $\kappa_p$.
3) Store the permutation estimate $\kappa_p$.
4) Repeat steps 2-3 many times. Call the total number of times you repeat the permutations P. That is, p = 1, 2, ..., P.
5) Compare $\kappa_0$ to the distribution of the permutation estimates $\kappa_p$. The p-value for the test is

$$p\text{-value} = \frac{\#(\kappa_p > \kappa_0)}{P} .$$

Example: testing the effects of activity and alteration on counts.

Assuming normality and constant variance:

[Table: Analysis of Variance. Columns: Source, DF, Sum of Squares, Mean Square, F Ratio, Prob > F. Rows: Model, Error, C. Total.]

[Table: Effect Tests. Columns: Source, Nparm, DF, Sum of Squares, F Ratio, Prob > F. Rows: level.activity, level.alteration, level.activity*level.alteration.]

[Tables: Least Squares Means for level.activity and level.alteration (columns: Level, Least Sq Mean, Std Error, Mean) and for level.activity*level.alteration (columns: Level, Least Sq Mean, Std Error).]

1) Test interaction

To test for interaction between the two factors, we permute all observations over all possible arrangements of the values of the two factors and then do the calculations.

R code:

    # Testing Interaction
    numpermutes <- ...  # number of permutations [value missing]
    Fstat.interaction <- matrix(0, nrow = numpermutes, ncol = 1)

    # original data results
    # bird.data[,1] is the response variable
    # bird.data[,2] is the activity level
    # bird.data[,3] is the alteration level
    temp <- lm(bird.data[,1] ~ bird.data[,3] + bird.data[,2] +
                 bird.data[,3] * bird.data[,2])
    Fstat.interaction[1] <- anova(temp)[["F value"]][3]  # observed F, stored first

    # permutation results
    for (i in 2:numpermutes) {
      # shuffle the 152 responses over the fixed factor combinations
      permute.birds <- sample(bird.data[,1], 152, replace = FALSE)
      permuted.activity <- cbind(permute.birds, bird.data[,2], bird.data[,3])
      temp <- lm(permuted.activity[,1] ~ permuted.activity[,2] + permuted.activity[,3] +
                   permuted.activity[,2] * permuted.activity[,3])
      Fstat.interaction[i] <- anova(temp)[["F value"]][3]
    }
    pvalue.interaction <- sum(Fstat.interaction[2:numpermutes] > Fstat.interaction[1]) / numpermutes
    pvalue.interaction
    Fstat.interaction[1]

Results: pvalue.interaction: [value missing]; Fstat.interaction: [value missing]

2) Test activity

To test for the effect of activity, we permute all observations for activity within each level of alteration and then do the calculations.

R code:

    # Testing Activity
    numpermutes <- ...  # number of permutations [value missing]
    Fstat.activity <- matrix(0, nrow = numpermutes, ncol = 1)

    # original data results
    temp <- lm(bird.data[,1] ~ bird.data[,3] + bird.data[,2])
    Fstat.activity[1] <- anova(temp)[["F value"]][2]  # observed F for activity

    # permutation results
    for (i in 2:numpermutes) {
      # note: as written, this shuffles the activity labels across all 152 rows
      permuted.activity <- cbind(bird.data[,1], bird.data[,3],
                                 sample(bird.data[,2], 152, replace = FALSE))
      temp <- lm(permuted.activity[,1] ~ permuted.activity[,2] + permuted.activity[,3])
      Fstat.activity[i] <- anova(temp)[["F value"]][2]
    }
    pvalue.activity <- sum(Fstat.activity[2:numpermutes] > Fstat.activity[1]) / numpermutes
    pvalue.activity
    Fstat.activity[1]

Results: pvalue.activity: [value missing]; Fstat.activity: [value missing]

3) Test alterations

To test for the effect of alteration, we permute all observations for alteration within each level of activity and then do the calculations.

R code: very similar, except that we permute column 3 instead of column 2.

Results: pvalue.alteration: [value missing]; Fstat.alteration: [value missing]

How does one test for pairwise differences among means? Not so obvious. If the samples within the main effects are sufficiently large, one might bootstrap to get the SEs of the means. One could also do the permutation test of main effects for each pair of means by removing the treatment levels not being compared. In both cases, be sure to adjust for the multiple testing.
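One way the pairwise permutation comparisons with a multiple-testing adjustment might look is sketched below. It is an illustration under assumptions, not the author's procedure: the data frame `dat`, its columns `y` and `trt`, and the helper `perm.pvalue` are hypothetical, the statistic is the absolute difference in means, ties are counted as at least as extreme, and Holm's method (via `p.adjust`) is only one possible adjustment.

```r
set.seed(42)

# Hypothetical data: a count response and a three-level treatment.
dat <- data.frame(
  y   = c(rpois(10, 2), rpois(10, 2), rpois(10, 8)),
  trt = rep(c("A", "B", "C"), each = 10),
  stringsAsFactors = FALSE
)

# Permutation p-value for the absolute difference in means of two groups.
perm.pvalue <- function(y, g, P = 2000) {
  g <- as.character(g)  # avoid empty factor levels after subsetting
  stat <- function(g) abs(diff(tapply(y, g, mean)))
  k0 <- stat(g)                        # observed statistic
  kp <- replicate(P, stat(sample(g)))  # shuffle labels, recompute
  mean(kp >= k0)                       # share of shuffles at least as extreme
}

# All pairs of treatment levels; drop the levels not being compared.
pairs <- combn(unique(dat$trt), 2, simplify = FALSE)
raw.p <- sapply(pairs, function(pr) {
  keep <- dat$trt %in% pr
  perm.pvalue(dat$y[keep], dat$trt[keep])
})
names(raw.p) <- sapply(pairs, paste, collapse = " vs ")

# Adjust for the multiple comparisons.
adj.p <- p.adjust(raw.p, method = "holm")
cbind(raw.p, adj.p)
```

Holm's adjustment never makes a p-value smaller, so a pair is declared different only if it survives the correction; `method = "bonferroni"` is a simpler, more conservative alternative.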
