BOOTSTRAPPING WITH MODELS FOR COUNT DATA

Journal of Biopharmaceutical Statistics, 21: 1164-1176, 2011
Copyright Taylor & Francis Group, LLC
ISSN: 1054-3406 print/1520-5711 online
DOI: 10.1080/10543406.2011.607748

Bryan F. J. Manly
Western EcoSystems Technology, Inc., Laramie, Wyoming, USA

Received March 30, 2011; Accepted June 21, 2011
Address correspondence to Bryan F. J. Manly, Western EcoSystems Technology, Inc., 200b South 2nd Street, Laramie, WY 82070, USA; E-mail: bmanly@west-inc.com

Two methods of bootstrap resampling are discussed with log-linear models for count data. The first involves the resampling of observations and the second involves the resampling of Pearson residuals taking into account changes in the distribution of residuals associated with the expected values of counts. The use of both methods is illustrated on two data sets; one data set concerns the number of ear infections of swimmers related to whether they are frequent swimmers or not and three other variables, and the other data set concerns the number of visits to a doctor made in the last 2 weeks related to the age of subjects and 10 other variables. A third data set on the number of marine mammal interactions in different years and fishing areas is also used as an example. In this case only the second bootstrap method can be used because the nature of the data allows the bootstrap resampling of observations to produce sets of data that could not have occurred in practice. Simulation results indicate that the bootstrap results are slightly better than the results from a conventional analysis for the first data set, and much better than the results from a conventional analysis for the second data set, but a conventional analysis works well for the third data set while there are problems with bootstrap analyses.

Key Words: Bootstrap resampling; Computer-intensive methods; Generalized linear models; Log-linear models.

1. INTRODUCTION

Count data often occur in practice and the need to model the data through a generalized linear model is common. For example, suppose that $Y_i$, the annual number of deaths from a disease in part of a country, is recorded for a number of years, together with the estimated population size in each year and the values of certain variables $X_1, X_2, \ldots, X_p$ that are thought to possibly be related to the incidence of the disease. Then there might be interest in fitting a model of the form

$$E(Y_i) = N_i \exp(\beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip}) \qquad (1)$$

to the data, where $E(Y_i)$ is the expected number of deaths in year $i$, $N_i$ is the population size in year $i$, and $X_{ij}$ is the value for the $j$th variable in year $i$. This is an example of a generalized linear model (McCullagh and Nelder, 1989) where the expected count for a year is proportional to the population size multiplied by

an exponential function of the predictor variables. Models of this type are also called log-linear models, although for some sets of data there is no equivalent to the population size, so that the expected value of the count is just an exponential function of the predictor variables.

A common way to fit a model of the form of Eq. (1) involves assuming that the dependent variable has a Poisson distribution, in which case the model can be fitted by maximum likelihood using many statistical packages. However, as often happens, there may be more or less variation in the Y variable than expected from the Poisson distribution, for which the variance of the observation $Y_i$ is equal to its expected value from Eq. (1). In that case a model allowing for this underdispersion or overdispersion can be fitted by quasi-maximum likelihood, assuming that the variance of $Y_i$ is the expected value multiplied by a constant (McCullagh and Nelder, 1989). Other possibilities involve assuming a particular distribution for the errors in the model, such as a negative binomial distribution, a zero-inflated Poisson distribution, or a zero-inflated negative binomial distribution, as was done by Horton et al. (2007) when analyzing data on alcohol consumption from a randomized clinical trial, and fitting the model using maximum likelihood.

In this article it is suggested that, rather than search for an appropriate model for the error distribution in a log-linear model, it is simpler to fit the Poisson model with an allowance for underdispersion or overdispersion using quasi-maximum likelihood and to use bootstrap resampling to allow for a possibly non-Poisson error distribution. One bootstrap method that can then be used involves the resampling with replacement of the individual observations in the original data to get a bootstrap set of data that represents an alternative set of data that might have occurred instead of the observed data.
This is appropriate when the observations are effectively in a random order with no fixed structure in the X variables.

The second bootstrap method considered here is more complicated. It involves determining bootstrap sets of data by fixing the X variables at their observed values and determining new Y values by stratified resampling of the residuals from the original fitted model that takes into account how the distribution of the residuals may depend on the expected values of counts. With both methods of resampling many bootstrap sets of data are generated and analyzed just like the original data in order to determine standard errors and significance levels for the estimates from the real data, and confidence limits for true parameter values.

For the remainder of this article the estimation of regression parameters using quasi-maximum likelihood with an allowance for more or less variation in the counts than expected from the Poisson distribution is referred to as the conventional analysis. As well as the estimates of parameters, it provides standard errors for those estimates, which can be used with t-tests to determine whether a regression estimate is significantly different from zero or some other hypothetical value. The first bootstrap method, which will be called resampling of cases, then provides alternative estimates of the regression standard errors and the generated bootstrap distribution of t-statistics can be used to assess whether an observed t-statistic based on a bootstrap standard error is significantly different from zero. The second bootstrap method, which is called resampling of residuals, also provides alternative estimates of the regression standard errors and again the generated bootstrap distribution of t-statistics can be used to assess whether an observed t-statistic based on a bootstrap standard error is significantly different from zero.
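As a rough illustration of what the conventional analysis computes, the sketch below fits the log-linear model of Eq. (1) by Newton-Raphson and scales the Poisson standard errors by an estimated dispersion. It is a minimal pure-Python sketch under stated assumptions, not the statistical package used for the analyses in this article: the function names `solve` and `fit_loglinear` are hypothetical, the first column of `X` is assumed to be a constant, and the dispersion is assumed to be the residual deviance divided by the residual degrees of freedom.

```python
import math

def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for c in range(n - 1, -1, -1):
        s = sum(M[c][k] * x[k] for k in range(c + 1, n))
        x[c] = (M[c][n] - s) / M[c][c]
    return x

def fit_loglinear(X, y, N=None, max_iter=50, tol=1e-10):
    """Fit E(Y_i) = N_i exp(x_i' beta); return (beta, std_errors, dispersion)."""
    n, p = len(y), len(X[0])
    N = [1.0] * n if N is None else N
    # Start from the intercept-only fit (assumes column 0 of X is all ones
    # and that at least one count is positive).
    beta = [math.log(sum(y) / sum(N))] + [0.0] * (p - 1)

    def expected(b):
        return [N[i] * math.exp(sum(X[i][j] * b[j] for j in range(p)))
                for i in range(n)]

    def information(mu):
        return [[sum(X[i][j] * X[i][k] * mu[i] for i in range(n))
                 for k in range(p)] for j in range(p)]

    for _ in range(max_iter):
        mu = expected(beta)
        score = [sum(X[i][j] * (y[i] - mu[i]) for i in range(n)) for j in range(p)]
        step = solve(information(mu), score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break

    mu = expected(beta)
    # Residual deviance; the term y*log(y/mu) is taken as zero when y == 0.
    deviance = 2.0 * sum(
        (y[i] * math.log(y[i] / mu[i]) if y[i] > 0 else 0.0) - (y[i] - mu[i])
        for i in range(n))
    dispersion = deviance / (n - p)  # assumed estimator; requires n > p
    info = information(mu)
    # Diagonal of the inverse information matrix, one unit vector at a time.
    variances = [solve(info, [1.0 if k == j else 0.0 for k in range(p)])[j]
                 for j in range(p)]
    std_errors = [math.sqrt(dispersion * v) for v in variances]
    return beta, std_errors, dispersion
```

Dividing each estimate by its standard error gives the t-values used in the conventional analysis. An offset such as a population size enters through the optional argument `N`.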

The next section of this article describes three example sets of data. The two bootstrap methods are then explained and illustrated on these data sets, and finally some general conclusions are presented. Model selection questions are not addressed with these examples. Rather, the emphasis is on the properties of estimates of the parameters for an assumed model.

2. EXAMPLE SETS OF DATA

The first example set of data concerns ear infections in swimmers and comes from the 1990 Pilot Surf/Health Study of the New South Wales Water Board in Australia. The results of a survey of 287 young swimmers are available (StatSci.org, 2011). Each of the swimmers was asked how many ear infections they had experienced, which is the dependent count variable. The potential predictor variables recorded at the same time are: Swimmer, whether they were a frequent swimmer (1) or not (0); Beach, whether they usually swam at the beach (1) or not (0); Age, whether their age range was 15 to 19 (1), 20 to 24 (2), or 25 to 29 years (3); and Sex, whether they were female (0) or male (1). The question of interest was whether the frequency of ear infections is related to any of the other recorded variables. A summarized analysis provided with the data at the StatSci.org website suggests that the only important variables are Swimmer and Beach, with individuals tending to report fewer ear infections if they are frequent swimmers who usually swim at the beach.

The second example set of data is much larger, with 5190 observations and 12 predictor variables. These data were used as an example in chapter 3 of the book by Cameron and Trivedi (1998) and concern the number of visits to a medical doctor in the 2 weeks before individuals were interviewed for a 1977-1978 Australian Health Survey.
The dependent variable is the number of doctor visits and the potential predictor variables are: Sex, 1 for female and 0 for male; Age, in years divided by 100; Age2, Age squared; Income, in Australian dollars divided by 1000; LevyPlus, 1 if the person was covered by private health insurance or otherwise 0; FreePoor, 1 if covered by the government because of low income or otherwise 0; FreeRepa, 1 if covered by the government because of old age or a disability or otherwise 0; Illness, the number of illnesses in the previous 2 weeks with a maximum of 5; ActDays, the number of days of reduced activity in the previous 2 weeks because of illness or injury; HScore, a general Goldberg health questionnaire score with a high score for poor health; ChCond1, 1 with a chronic condition not limiting activity or otherwise 0; and ChCond2, 1 with a chronic condition limiting activity or otherwise 0. The website www.econ.ucdavis.edu/faculty/cameron/racd/racddata.html provides the data. The interest in this case is in how the number of doctor visits is related to the 12 predictor variables, with an initial analysis indicating a significant relationship for six of these variables.

The third set of data comes from a nonmedical area, and was chosen as an example of a set of data with a high proportion of zero counts. It concerns fisheries interactions with marine mammals for a long-line fishery in New Zealand. The dependent variable is the number of marine mammals observed to be caught in fishing nets by government observers (the fisheries bycatch) on fishing trips in different areas and different years. There are 34 observed counts, with the number of fishing days in an area and year varying from 1 to 192. The potential predictor

variables are: FMA, the Fisheries Management Area, of which there were five; and Year, the fishing year from 1997 to 2003. The data are available in Table 3.16 of Manly (2009) but for this example the results for different target species of fish have been combined. The interest with this example is whether the marine mammal bycatch rate per fishing day was particularly high in one or more of the fisheries management areas, or in one or more of the years. If so, the reasons for the high bycatch rates become of interest.

3. BOOTSTRAP RESAMPLING OF CASES

Bootstrap resampling of cases involves producing a bootstrap set of data by resampling the observations in the real data, with replacement, to get new sets of data that represent alternative sets of data that might have occurred instead of the data actually observed. For example, with the first set of data described earlier there are results for 287 young swimmers on the number of ear infections that they reported and four variables that the ear infections might be related to. Producing a bootstrap sample involves randomly selecting the results for one of the 287 swimmers to provide the first observation, randomly selecting one of the 287 swimmers to provide the second observation, and so on until 287 selections have been made. The bootstrap sample is then expected to contain some of the swimmers from the original sample no times, some one time, some two times, and so on. Repeating the bootstrap sampling process many times yields many bootstrap samples that are assumed to represent alternative samples that might have occurred instead of the observed sample, and these bootstrap samples can be used to estimate the properties of the estimation process, such as the standard errors of the estimated regression parameters.

4. BOOTSTRAP RESAMPLING OF RESIDUALS

An alternative to resampling the observations involves resampling the residuals from the fitted model for count data (Moulton and Zeger, 1991), with stratified sampling being used to allow for possible differences in the distribution of residuals with different expected counts (Davison and Hinkley, 1997, section 7.2). As described by Manly and Chotkowski (2006), this involves fitting the log-linear model of interest to the available data using the standard quasi-maximum likelihood approach with an allowance for overdispersion. The Pearson residuals are then calculated, with the residual for the $i$th observed count $Y_i$ being

$$R_i = \frac{Y_i - E(Y_i)}{\sqrt{E(Y_i)}} \qquad (2)$$

where $E(Y_i)$ is the expected value of $Y_i$ from the fitted model. The $n$ residuals are then put in order based on the values for $E(Y_i)$ and divided into $m$ groups with approximately $n/m$ residuals in each group, with the first group containing the residuals with the smallest values for $E(Y_i)$ and the last group containing the residuals with the largest values for $E(Y_i)$.

To generate a bootstrap value for the $i$th observation a residual is randomly selected from those in the group that includes the value of $E(Y_i)$. Assume that this

is $R_i^*$. This is then set equal to the residual for this observation in the bootstrap set of data so that

$$R_i^* = \frac{Y_i^* - E(Y_i)}{\sqrt{E(Y_i)}}$$

where $Y_i^*$ is the bootstrap value for the count. Rearranging this equation then gives the bootstrap count to be

$$Y_i^* = E(Y_i) + R_i^* \sqrt{E(Y_i)} \qquad (3)$$

To make this a count it is replaced by the maximum of zero and the integer part of $Y_i^* + 0.5$. Generating a count for all of the observations in the original data in this way results in a bootstrap set of data with the values of the predictor variables exactly the same as for the original data and only the count values changed.

A modification to this procedure is made for very low expected counts of 0.01 or less. For these a random number between zero and one is generated and the bootstrap count is set at one if the random number is less than the expected count, and at zero otherwise. This then gives the correct expected count.

5. RESULTS

5.1. The Ear Infection Data

The conventional analysis of the ear infection data results in the estimated equation

$$E(\text{Infections}) = \exp(1.023 - 0.612\,\text{Swimmer} - 0.535\,\text{Beach} - 0.374\,\text{Age2} - 0.190\,\text{Age3} - 0.090\,\text{Sex}) \qquad (4)$$

where E(Infections) is the expected number of infections, Age2 and Age3 are indicator variables for the second and third age ranges, and the other variables are as described earlier. The baseline case, with $E(\text{Infections}) = \exp(1.023) = 2.78$, is therefore a casual swimmer who usually does not swim at the beach, is aged 15 to 19 years, and is female. Equation (4) includes all of the variables in the available data set and here only the estimation of this full equation is considered, without the removal of nonsignificant variables. This is because the interest with this set of data is in the properties of estimates with and without the use of bootstrapping, rather than in variable selection.
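The two resampling schemes of Sections 3 and 4 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used for the analyses in this article: the function names are hypothetical, `mu` holds the fitted expected counts $E(Y_i)$ from some earlier fit, strata are formed by rank of expected count, and residuals belonging to very low expected counts are assumed to stay in their stratum's pool.

```python
import math
import random

def resample_cases(n, rng):
    """Section 3: draw n case indices with replacement from 0..n-1."""
    return [rng.randrange(n) for _ in range(n)]

def resample_residuals(y, mu, m, rng, low=0.01):
    """Section 4: bootstrap counts from Pearson residuals resampled within
    m strata defined by the fitted expected counts mu (all mu > 0, m <= n)."""
    n = len(y)
    resid = [(y[i] - mu[i]) / math.sqrt(mu[i]) for i in range(n)]  # Eq. (2)

    # Rank observations by expected count and cut the ranks into m groups
    # of roughly n/m observations each.
    order = sorted(range(n), key=lambda i: mu[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * m // n
    pools = [[] for _ in range(m)]
    for i in range(n):
        pools[group[i]].append(resid[i])

    y_star = []
    for i in range(n):
        if mu[i] <= low:
            # Very low expected count: one with probability mu, else zero.
            y_star.append(1 if rng.random() < mu[i] else 0)
        else:
            r_star = rng.choice(pools[group[i]])
            value = mu[i] + r_star * math.sqrt(mu[i])  # Eq. (3)
            # Maximum of zero and the integer part of value + 0.5.
            y_star.append(max(0, math.floor(value + 0.5)))
    return y_star
```

With resampling of cases the drawn indices select whole rows (count and predictors together) for refitting. With resampling of residuals the predictors stay fixed and only the counts are replaced before the model is refitted.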
Equation (4) was obtained with the GenStat statistical package (VSN International, 2010), using quasi-maximum likelihood estimation in the standard log-linear regression procedure for count data. A Poisson error model was assumed, with an estimated overdispersion parameter (the residual deviance divided by the residual degrees of freedom) of 2.75 to allow for the variance of counts being larger than expected from Poisson distributions.

With the stratified bootstrap sampling of residuals it is necessary to decide how many strata are needed to account for any changes in the distribution of residuals related to the expected values of the counts. Here the principle used is that

the number should be as small as possible while still taking into account how the residual distributions change with the expected values. A reasonable number in this respect can be determined from a plot of the observed residuals from the fit of Eq. (4) against the values for the expected counts, as shown in Fig. 1. A logarithmic scale is used for the expected count in the figure because this often shows the changes in the distribution of residuals more clearly than the use of a linear scale. In the present example the residual distribution seems fairly constant except that the standardized residuals can be slightly more negative for expected counts above 1 than they can be for expected counts below 1. As the residual distribution is quite constant and there are 287 observations it was decided for the resampling of residuals to stratify the residuals into five strata based on their expected count, with about 57 residuals in each of the strata.

Figure 1. Standardized residuals plotted against the expected number of ear infections.

Table 1 shows the estimated coefficients with their estimated standard errors, t-values (estimates divided by the standard errors) for testing whether the coefficients are significantly different from zero, and the significance of the t-values based on the t-distribution with 281 df from the conventional analysis. The table also shows the results from bootstrap resampling of the cases and bootstrap resampling of the residuals, with 5000 bootstrap samples used for both resampling methods. For the bootstrap analyses the table shows the bootstrap estimates of the standard errors of the coefficients, which are just the bootstrap standard deviations, and the significance of the t-values estimated as the proportion of bootstrap sets of data with absolute t-values as large as or larger than the observed absolute t-values.
The bootstrap t-distribution that was used to assess the significance of the t-values consisted of the values of (bootstrap estimate - bootstrap mean)/(bootstrap estimated standard error), to allow for the possibility of the bootstrap means of estimates differing to some extent from the estimates from the original data.

Comparing the bootstrap results with those from the conventional analysis it is seen that the bootstrap standard errors for the estimated coefficients are all larger than the standard errors from the conventional analysis. As a result of the larger standard deviations from bootstrapping, the t-values from the original data are also found to be less significant from the bootstrap analyses than from the conventional analysis, with the two bootstrap analyses giving fairly similar results in this respect.
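The bootstrap significance calculation described above can be sketched for a single coefficient as follows. The function name is hypothetical, and one reading of the text is assumed: the same bootstrap standard deviation is used as the standard error for both the observed t-value and the centred bootstrap t-values. If instead each bootstrap sample supplies its own standard error, the denominator inside the list comprehension would change accordingly.

```python
import statistics

def bootstrap_significance(estimate, boot_estimates):
    """Return (bootstrap SE, observed t, two-sided bootstrap significance)
    for one regression coefficient."""
    se = statistics.stdev(boot_estimates)      # bootstrap standard error
    centre = statistics.mean(boot_estimates)   # bootstrap mean of estimates
    t_obs = estimate / se
    # Centre on the bootstrap mean, not on the original estimate.
    t_boot = [(b - centre) / se for b in boot_estimates]
    extreme = sum(1 for t in t_boot if abs(t) >= abs(t_obs))
    return se, t_obs, extreme / len(boot_estimates)
```

In practice `boot_estimates` would hold the coefficient estimated from each of the bootstrap sets of data, for example the 5000 values obtained by refitting the model to each resample.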

Table 1. Results from fitting Eq. (4) using the conventional log-linear method with an allowance for overdispersion and the results obtained from 5000 bootstrap resamples of cases (Bootstrap 1) and 5000 resamples of residuals (Bootstrap 2)

                  Conventional analysis                  Bootstrap 1        Bootstrap 2
            Est      Std Err  t-value  t-dist Signif   Std Err  Signif   Std Err  Signif
Constant    1.023    0.207                             0.254             0.248
Swimmer    -0.612    0.172    -3.56    0.000           0.185    0.001    0.190    0.001
Beach      -0.535    0.174    -3.07    0.002           0.190    0.006    0.197    0.005
Age2       -0.374    0.210    -1.78    0.075           0.271    0.162    0.250    0.124
Age3       -0.190    0.213    -0.89    0.373           0.246    0.437    0.234    0.429
Sex        -0.090    0.184    -0.49    0.625           0.223    0.679    0.216    0.674

Note. Values shown are the regression coefficient estimates from the original data (Est), the estimated standard errors (Std Err) of the estimates from the conventional analysis and the estimated values from the two bootstrap resampling methods, and the significance (Signif) of the t-values based on the t-tables and the two bootstrap resampling methods.

A simulation study was conducted to check these differences in the results from the different analyses. In total, 1000 sets of data similar to the observed data were simulated with the expected values of counts given by Eq. (4) and with the Poisson error inflated by the factor 2.75 using the zero-inflated count model where an observed count is either the value zero with probability $p$ or a random value from a Poisson distribution with probability $1 - p$ (Cameron and Trivedi, 1998, section 4.7.2). It was found that the standard errors from the conventional analysis were on average about 5% too low, the standard errors from bootstrap resampling of cases were on average about 3% too high, and the standard errors from bootstrap resampling of residuals were on average about equal to the standard deviations of the 1000 simulated regression estimates.
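The article does not state how $p$ and the Poisson mean were chosen for the simulation. One possibility, assumed here for illustration only, is to match the first two moments. For a zero-inflated Poisson count with zero probability $p$ and Poisson mean $\lambda$, the mean is $(1 - p)\lambda$ and the variance is $(1 - p)\lambda(1 + p\lambda)$, so a target mean $\mu$ and variance $\phi\mu$ (with $\phi \ge 1$) give $\lambda = \mu + \phi - 1$ and $p = (\phi - 1)/\lambda$. The function names below are hypothetical.

```python
import math
import random

def zip_parameters(mu, phi):
    """Poisson mean lam and zero probability p giving a zero-inflated
    Poisson count with mean mu and variance phi * mu (requires phi >= 1)."""
    lam = mu + phi - 1.0
    p = (phi - 1.0) / lam
    return lam, p

def poisson_draw(lam, rng):
    """Poisson variate by Knuth's multiplication method; adequate for the
    small means that arise in this setting."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def zip_draw(mu, phi, rng):
    """One simulated count with mean mu and variance phi * mu."""
    lam, p = zip_parameters(mu, phi)
    return 0 if rng.random() < p else poisson_draw(lam, rng)
```

A simulated data set would then be built by calling `zip_draw` once per observation, with `mu` set to that observation's expected count from the fitted equation and `phi` set to the estimated dispersion.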
It seems, therefore, that for the ear infection data the conventional analysis may tend to slightly underestimate the standard errors of regression estimates, and bootstrap resampling of cases may tend to slightly overestimate the standard errors, but bootstrap resampling of residuals shows little bias. The results shown in Table 1 are therefore what is expected from the simulation.

The simulated sets of data were also used to estimate the percentage of times that 95% confidence intervals would include the true regression coefficients. All the analyses performed well in that respect, with the observed coverage being 94.2% for the conventional analysis, 95.8% for bootstrap resampling of cases, and 94.2% for bootstrap resampling of residuals.

5.2. The Doctor Visits Data

The conventional analysis of the doctor visits data results in the estimated equation

$$
\begin{aligned}
E(\text{Visits}) = \exp(&-2.224 + 0.157\,\text{Sex} + 1.056\,\text{Age} - 0.849\,\text{Age2} - 0.205\,\text{Income} + 0.123\,\text{LevyPlus} - 0.440\,\text{FreePoor} \\
&+ 0.080\,\text{FreeRepa} + 0.187\,\text{Illness} + 0.127\,\text{ActDays} + 0.030\,\text{HScore} + 0.114\,\text{ChCond1} + 0.141\,\text{ChCond2}) \qquad (5)
\end{aligned}
$$

where E(Visits) is the expected number of doctor visits and the other variables are as described earlier. Cameron and Trivedi (1998) discussed the estimation of an equation relating the number of doctor visits to all of the variables shown in Eq. (5) even though some of them are not significantly different from zero at the 5% level, and here all of the variables are considered because the properties of estimation methods are of interest rather than the selection of variables.

The conventional analysis estimates that the variance of counts is what is expected from the Poisson distribution multiplied by 0.73. If anything there is therefore less variation than expected from the Poisson distribution. This is not a problem with the quasi-maximum likelihood method but does mean that models like the negative binomial that allow for more variation than the Poisson distribution are not appropriate for these data.

For stratified resampling of standardized residuals it is necessary to decide on the number of strata to use for the 5190 residuals. Figure 2 shows that for these data there is considerable variation in the distribution of the residuals as the expected number of visits changes from about 0.07 to about 4.0. Given the relatively large number of observations and the large amount of variation in the residual distribution it was decided to use 20 strata, with about 260 residuals in each of these.

Table 2 has the same format as Table 1 and shows the estimated coefficients with their estimated standard errors, t-values for testing whether the coefficients are significantly different from zero, the significance of the t-values based on the t-distribution, the standard deviations of 5000 bootstrap estimates of the coefficients, and the significance of the t-values estimated from the bootstrap data.
Comparing the bootstrap results with those from the conventional analysis, it is seen that the bootstrap standard errors for the estimated coefficients are all larger than the standard errors from the conventional analysis, and the t-values are all less significant from the bootstrap analyses than from the conventional analysis, which is similar to the results that were obtained for the ear infection data.

Figure 2. Standardized residuals plotted against the expected number of doctor visits.

Table 2. Results from fitting Eq. (5) using the conventional log-linear method with an allowance for underdispersion and the results obtained from 5000 bootstrap resamples of cases (Bootstrap 1) and 5000 resamples of residuals (Bootstrap 2)

                  Conventional analysis                  Bootstrap 1        Bootstrap 2
            Est      Std Err  t-value  t-dist Signif   Std Err  Signif   Std Err  Signif
Constant   -2.224    0.175                             0.257             0.244
Sex         0.157    0.052     3.04    0.002           0.079    0.051    0.074    0.034
Age         1.056    0.920     1.15    0.251           1.386    0.445    1.288    0.423
Age2       -0.849    0.991    -0.86    0.392           1.486    0.570    1.394    0.555
Income     -0.205    0.081    -2.53    0.012           0.129    0.116    0.116    0.079
LevyPlus    0.123    0.066     1.87    0.062           0.096    0.196    0.094    0.182
FreePoor   -0.440    0.165    -2.66    0.008           0.291    0.105    0.265    0.062
FreeRepa    0.080    0.085     0.94    0.346           0.127    0.522    0.121    0.517
Illness     0.187    0.017    11.12    0.000           0.023    0.000    0.024    0.000
ActDays     0.127    0.005    27.40    0.000           0.008    0.000    0.008    0.000
HScore      0.030    0.009     3.24    0.001           0.014    0.032    0.014    0.043
ChCond1     0.114    0.061     1.86    0.063           0.089    0.198    0.086    0.180
ChCond2     0.141    0.076     1.85    0.065           0.121    0.241    0.112    0.207

Note. Values shown are the regression estimates (Est), the estimated standard errors (Std Err) of the estimates from the conventional analysis and the estimated standard errors from the two bootstrap resampling methods, and the significance (Signif) of the t-values based on the t-tables and the two bootstrap resampling methods.

To examine whether these results occur with simulated data, 250 sets of data similar to the observed data were generated and analyzed in the same way as the observed data. Only 250 sets were generated, because the size of the original data set made the simulation of a data set and the conventional and bootstrap analyses a relatively slow process. The counts for the simulated data sets had expected values given by Eq. (5) with Poisson distributions because, if anything, the observed counts show less variation than is expected from this distribution.
It was found that on average the regression standard errors from the conventional analysis were about 12% lower than the standard deviations of the simulated estimates while both bootstrap methods gave similar average estimates with little apparent bias. The simulation therefore indicates that the bootstrap analyses are more reliable than the conventional analysis in terms of the estimation of standard errors and the determination of the significance of regression coefficients. These results are also reflected in the observed coverage of 95% confidence intervals for the simulated data, which was only 88% for the conventional analysis, 97% for bootstrap resampling of cases, and 94% for bootstrap resampling of residuals.

5.3. The Marine Mammal Interaction Data

The conventional analysis of the marine mammal interaction data results in the estimated equation

$$
\begin{aligned}
E(\text{Interactions}) = \text{Days} \times \exp(&-1.665 - 3.065\,\text{FMA1} - 1.573\,\text{FMA3} - 0.269\,\text{FMA5} - 0.225\,\text{FMA6} + 0.257\,\text{Y1997} \\
&+ 0.182\,\text{Y1998} + 0.694\,\text{Y1999} + 0.275\,\text{Y2000} + 0.277\,\text{Y2001} + 0.101\,\text{Y2002}) \qquad (6)
\end{aligned}
$$

where E(Interactions) is the expected number of marine mammal interactions, Days is the number of fishing days, FMA1 is 1 for fishing in fisheries management area 1 or otherwise 0 (with similar definitions for FMA3, FMA5 and FMA6), and Y1997 is 1 for fishing in year 1997 or otherwise 0 (with similar definitions for Y1998 to Y2002). The baseline case, with $E(\text{Interactions}) = \text{Days} \times \exp(-1.665) = 0.189\,\text{Days}$, therefore applies to fishing in fisheries management area 7 in 2003, with the effects for the other fishing areas and years being estimated relative to the last fishing area and the last fishing year. The effect for fishing days was allowed for in the standard analysis by including the natural logarithm of Days as an offset in the argument for the exponential function. The conventional analysis estimates that the variances of the counts of the number of interactions are what is expected from the Poisson distribution multiplied by 3.00, so that there is considerable overdispersion in the data.

Figure 3. Standardized residuals plotted against the number of marine mammal interactions.

A problem arose when bootstrap resampling of the observations was attempted with this set of data because there are only 34 of these observations, with these being in five fisheries management areas and in seven years. Because of this the probability of a bootstrap set of data having no observations in one of the fisheries management areas or in one of the years is quite high. The estimation process will then fail because one of the predictor variables has no data. This happened immediately when bootstrap resampling of observations was attempted and hence no results with the resampling of observations are available. This problem did not occur with the first two sets of example data because of the much larger numbers of observations.
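The size of this problem can be gauged with a short calculation. If a factor level (an area or a year) is represented by $k$ of the $n$ observations, the chance that a resample of $n$ cases drawn with replacement contains none of them is $(1 - k/n)^n$. The sketch below computes this and also estimates by simulation the chance that at least one level of a factor is lost. The function names are hypothetical, and because the number of observations per area and year is not given in this article, any label list passed in is an assumption of the user.

```python
import random

def prob_level_missing(k, n):
    """Probability that none of the k cases at one factor level appears
    among n draws with replacement from n cases."""
    return (1.0 - k / n) ** n

def prob_any_level_missing(labels, n_boot, rng):
    """Monte Carlo estimate of the chance that a resample of cases loses
    at least one level of the factor recorded in labels."""
    n = len(labels)
    levels = set(labels)
    failures = 0
    for _ in range(n_boot):
        sample = {labels[rng.randrange(n)] for _ in range(n)}
        if sample != levels:
            failures += 1
    return failures / n_boot
```

For example, `prob_level_missing(1, 34)` gives the chance of losing a level that has a single observation among 34, and `prob_any_level_missing` can be applied to a list of 34 area or year labels to see how often a refit would fail.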
For stratified resampling of standardized residuals it is necessary to decide on the number of strata to use for the 34 residuals. Figure 3 shows that for these data there is some variation in the distribution of the residuals as the expected count changes from about 0.1 to about 60, but given the relatively small number of observations it was decided to use just two strata, with 17 residuals in each of these.

Table 3. Results from fitting Eq. (6) using the conventional log-linear method with an allowance for overdispersion and the results obtained from 5000 resamples of residuals (Bootstrap 2)

                  Conventional analysis                  Bootstrap 2
            Est      Std Err  t-value  t-dist Signif   Std Err  Signif
Constant   -1.665    0.242                             0.237
FMA1       -3.064    1.008    -3.04    0.005           2.180    0.045
FMA3       -1.573    0.538    -2.92    0.007           0.607    0.030
FMA5       -0.269    0.187    -1.44    0.160           0.183    0.285
FMA6       -0.224    0.873    -0.26    0.799           1.505    0.823
Y1997       0.257    0.336     0.76    0.451           0.332    0.570
Y1998       0.181    0.340     0.53    0.598           0.332    0.700
Y1999       0.694    0.286     2.42    0.022           0.284    0.080
Y2000       0.274    0.335     0.82    0.419           0.327    0.545
Y2001       0.276    0.350     0.79    0.436           0.346    0.556
Y2002       0.100    0.344     0.29    0.772           0.336    0.823

Note. Values shown are the regression estimates (Est), the estimated standard errors (Std Err) of the estimates from the conventional analysis and the second bootstrap resampling method, and the significance (Signif) of the t-values based on the t-tables and the bootstrap resampling.

Table 3 shows the results obtained from the conventional analysis of the data and the analysis using stratified bootstrap resampling. The bootstrap standard errors are similar to those from the conventional analysis except for the estimates of the coefficients of FMA1 and FMA6. For these two fisheries management areas the bootstrap standard errors are higher than the estimates from the conventional analysis. The reason for this is that, because of the small size of the data set and the relatively low expected values for these two fisheries management areas, there were some bootstrap sets of data with zero counts for all of the observations in one of these areas. An equation was still estimated in these cases but the coefficient for the fisheries management area with zero counts was a large negative number, so that estimated expected frequencies for the bootstrap data were very close to zero for all the observations in this area.
This is not unreasonable because the observed data suggest that zero counts for all observations in fisheries management area 1 or 6 could easily have occurred.

From the conventional analysis there are three coefficients in Eq. (6) that are significantly different from zero at the 5% level, for the variables FMA1, FMA3, and Y1999. The bootstrap analysis also gives significance at this level for FMA1 and FMA3, but with considerably less significance, while Y1999 is no longer significant.

In total, 1000 sets of data similar to the observed data were simulated with the expected values of counts given by Eq. (6) and with the Poisson error inflated by the factor 3.00 using the same zero-inflated count model that was used for the simulation of the ear infection data. This showed that the conventional analysis works well with data like the mammal data. If anything it tends to be slightly conservative, with nominal 95% confidence limits giving 96.7% coverage for the simulated data. The bootstrap method also gave reasonable results for the estimation of the regression coefficients other than those for FMA1, FMA3 and FMA6, with nominal 95% confidence intervals giving 95.8% coverage for the simulated data. However, the bootstrap resampling did not give good results for the

BOOTSTRAPPING WITH MODELS FOR COUNT DATA 1175 coefficients of FMA1, FMA3 and FMA6 because of the large number of simulated sets of data and the large number of bootstrap sets of data generated for the simulated data sets where the estimated probability of an interaction in one or more of these areas was zero, with a resulting large negative coefficient for the variable. For these three variables the nominal 95% bootstrap confidence intervals gave 84% cover for FMA1, 89% cover for FMA3, and 85% cover for FMA6. 6. DISCUSSION Three sets of count data have been analysed using the conventional method for fitting a log-linear model with variance inflation and using bootstrap analyses. With two of the data sets the results from bootstrap resampling of observations and stratified bootstrap resampling of Pearson residuals are available, while for the third set of data only stratified resampling of residuals could be used because of the small sample size. For the first data set, on ear infections, the two bootstrap methods gave similar results in terms of the estimated standard deviation of the estimated regression parameters, and the significance of t-values for testing whether the true regression parameters are zero. However, the bootstrap standard errors are larger than those from the conventional analysis, and the t-values are all less significant for the bootstrap analyses than they are for the conventional analysis. A simulation study indicates that these results are what is expected with these data but in terms of confidence limits for true parameter values the three methods of analysis all have about the same performance overall. For the second set of data, on the number of doctor visits, both bootstrap methods give similar results in terms of the estimated standard errors of estimated regression coefficients and the significance of the estimated coefficients from t-tests. 
For both bootstrap methods the estimated standard errors of estimated regression coefficients are higher and t-values are less significant than the conventional analysis suggests. A simulation study in this case indicates that the results from the conventional analysis have some problems with data like this and that either of the bootstrap methods gives a more reliable analysis for the data.

For the third set of data, on the number of fisheries interactions with marine mammals in different fisheries management areas in different years, bootstrap resampling of individual observations was not possible because the probability of getting no data for a fisheries management area or a year was quite high for a bootstrap set of data. Even if this were not the case, it can be argued that for the observed data there was at most one observation in a fisheries management area in a year, so that bootstrap sets of data with more than one observation in a fisheries management area in a year could not have occurred. In other words, bootstrap resampling of observations produces impossible data, and is therefore not a reasonable thing to do. In fact, any resampling of the data should maintain the structure of the observed data in terms of the sampling in fisheries management areas and years, and also keep the number of fishing days constant for an observation. This argument does not seem as strong for the examples on the number of ear infections and the number of doctor visits because for those data there was presumably no structure on the predictor variables because of the way that the data were collected.
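The structural argument can be illustrated with a minimal Python sketch that contrasts what the two resampling schemes do to the design. Everything specific in it is an assumption made for illustration and is not taken from the paper: the small data set `ROWS`, the fitted values `mu`, the use of two strata, and the rounding and truncation at zero used to rebuild counts from resampled Pearson residuals. Refitting the model to each bootstrap set is omitted.

```python
import math
import random

# Hypothetical data, NOT from the paper: one observation per (area, year)
# cell, with fishing effort in days, an observed count, and an assumed
# fitted expected count mu.
ROWS = [
    # area,  year, days, count, mu
    ("FMA1", 1997,  20, 0, 0.3),
    ("FMA1", 1998,  25, 1, 0.4),
    ("FMA3", 1997,  60, 1, 1.1),
    ("FMA3", 1998,  55, 0, 0.9),
    ("FMA3", 1999,  70, 3, 2.2),
    ("FMA5", 1997, 150, 4, 4.8),
    ("FMA5", 1998, 140, 7, 4.1),
    ("FMA5", 1999, 160, 9, 8.0),
]

def design(rows):
    """The part of each observation fixed by the sampling: area, year, days."""
    return [(area, year, days) for area, year, days, _, _ in rows]

def case_resample(rows, rng):
    """Resampling of observations: draw whole rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def residual_resample(rows, n_strata, rng):
    """Resampling of residuals: keep each row's design, rebuild only the count."""
    mu = [r[4] for r in rows]
    resid = [(r[3] - m) / math.sqrt(m) for r, m in zip(rows, mu)]
    # Form strata of roughly equal size by rank of the expected value.
    order = sorted(range(len(rows)), key=lambda i: mu[i])
    stratum = [0] * len(rows)
    for rank, i in enumerate(order):
        stratum[i] = rank * n_strata // len(rows)
    pools = {}
    for s, e in zip(stratum, resid):
        pools.setdefault(s, []).append(e)
    out = []
    for (area, year, days, _, m), s in zip(rows, stratum):
        e = rng.choice(pools[s])  # residual drawn from the same stratum
        # Assumption: round to an integer and truncate negative values at zero.
        count = max(0, round(m + e * math.sqrt(m)))
        out.append((area, year, days, count, m))
    return out

def case_resample_problems(rows, n_boot, rng):
    """Shares of case resamples that drop an area, and that alter the design at all."""
    areas = {r[0] for r in rows}
    lost_area = altered = 0
    for _ in range(n_boot):
        boot = case_resample(rows, rng)
        lost_area += {r[0] for r in boot} != areas
        altered += sorted(design(boot)) != sorted(design(rows))
    return lost_area / n_boot, altered / n_boot

rng = random.Random(2011)
lost, altered = case_resample_problems(ROWS, 2000, rng)
print("case resamples that lose an area:", lost)
print("case resamples with an altered design:", altered)
print("residual resample keeps the design:",
      design(residual_resample(ROWS, 2, rng)) == design(ROWS))
```

With one observation per cell, a case resample reproduces the observed design only when it happens to be a permutation of the rows, so nearly every case resample repeats some area and year combination. The residual resample changes only the counts.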

Bootstrap resampling of residuals keeps the structure of the observations constant in terms of the fisheries management areas, years sampled, and the fishing effort in days, and sets of data with no observations for years or fisheries management areas cannot occur. This bootstrap method could therefore be applied with the data on the counts of the number of marine mammal interactions with fishing. It was found that the results obtained by bootstrapping were very similar to those from the conventional analysis except for the coefficients of the variables FMA1 and FMA6, which indicate that fishing was in fisheries management areas 1 and 6. The bootstrap standard errors for these variables are much larger than the estimates from the conventional analysis because some bootstrap sets of data had zero counts in one or both of fisheries management areas 1 and 6, leading to large negative estimates for one or both of the coefficients of FMA1 and FMA6.

The simulation study for these data involved generating 1000 sets of data using Eq. (6) and analysing each set using the conventional analysis and the bootstrap resampling of residuals. The results were similar for seven of the variables in the equation, but the bootstrap results were poor for the three variables FMA1, FMA3, and FMA6 because so many of the bootstrap sets of data had all zero counts in these fishing areas. Therefore for this set of data there are problems with bootstrap analyses but the conventional analysis seems to have good properties.

Three sets of data were considered. For the first set the two bootstrap analyses appear to give slightly better results than a conventional analysis. For the second set of data the conventional analysis does not appear to give satisfactory results but the bootstrap methods seem to work well. For the third set of data there are problems with the bootstrap analyses but the conventional analysis appears to give reliable results.
The final conclusion suggested by these examples is therefore that a bootstrap analysis is more reliable than the conventional analysis for some sets of count data, but in some cases, particularly with small data sets, a bootstrap analysis may give worse results than a conventional analysis.

REFERENCES

Cameron, A. C., Trivedi, P. K. (1998). Regression Analysis of Count Data. Cambridge: Cambridge University Press.
Davison, A. C., Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge: Cambridge University Press.
Horton, N. J., Kim, E., Saitz, R. (2007). A cautionary note regarding count models of alcohol consumption in randomized clinical trials. BMC Medical Research Methodology 7:9.
Manly, B. F. J. (2009). Statistics for Environmental Science and Management. 2nd ed. Boca Raton, FL: Chapman and Hall/CRC.
Manly, B. F. J., Chotkowski, M. (2006). Two new methods for regime change analyses. Archives in Hydrobiology 167:593–607.
McCullagh, P., Nelder, J. A. (1989). Generalized Linear Models. 2nd ed. London: Chapman and Hall.
Moulton, L. H., Zeger, S. L. (1991). Bootstrapping generalized linear models. Computational Statistics and Data Analysis 11:53–63.
StatSci.Org. (2011). OzDASL: Ear infections in swimmers. Data available at www.statsci.org/data/oz/earinf.html
VSN International. (2010). GenStat 13th edition. Available at www.vsni.co.uk