A NOTE ON TESTING THE SHOULDER CONDITION IN LINE TRANSECT SAMPLING


Shunpu Zhang
Department of Mathematical Sciences, University of Alaska, Fairbanks, Alaska, U.S.A.

ABSTRACT

We propose a new method to examine the shoulder condition of the detection function in line transect sampling. This is an improvement on the method suggested in Mack (1998). We show that our method has a lower rejection rate (Type I error) when the data are in fact from a population satisfying the shoulder condition. We also show that our method is much more sensitive than Mack's method to departures from the shoulder condition, and therefore has higher power.

KEY WORDS: Line transect sampling; kernel density estimation; end point kernel; optimal bandwidth.

1. INTRODUCTION

Line transect sampling is an important means of estimating population density or size in population biology. In line transect sampling, nonoverlapping strips, each of width $2w$ and length $L$, are placed at random in the region. An observer traverses the middle path of the strip and records the perpendicular distance $X_i$ of each object sighted within the strip. Assume that a total of $n$ objects has been sighted from all the strips and that objects on the transect line are seen with probability 1. It then turns out (Seber (1982), Burnham and Anderson (1976)) that the population density $D$ satisfies the relationship

$$D = \frac{E(n)\, f(0)}{2tL}, \tag{1.1}$$

where $f(x)$, $0 \le x \le w$, is the conditional density of the line transect distances, given that the object is observed, and $t$ is the number of strips. In order to estimate $D$, one needs to estimate $f(0)$. With an appropriate estimate of $f(0)$, denoted by $\hat f(0)$, $D$ can then be estimated by

$$\hat D = \frac{n\, \hat f(0)}{2tL}. \tag{1.2}$$

Various methods have been proposed in the literature to estimate $f(0)$. Generally speaking, they can be divided into two categories: parametric and nonparametric. The parametric method specifies a parametric model for the detection function and uses maximum likelihood to estimate the parameters in the model. More details about the parametric method can be found in Burnham and Anderson (1976), Pollock (1978), Quinn and Gallucci (1980), Burnham et al. (1980) and Buckland (1985).

A popular nonparametric method is to approximate $f(x)$ using a Fourier series (Crain et al. (1979)). It is not necessary to specify a model when using the Fourier series method; the coefficients of the series are estimated from the data. The drawback of the Fourier series method is that one has to decide where the series should be truncated. In addition, the Fourier series method is known for its inability to adapt to local variations in the true curve. Since we only need to estimate $f(0)$, the Fourier series method is not a good choice.

In recent years, researchers have turned their attention to nonparametric methods with better local performance. A popular choice is the kernel method, which is increasingly applied in areas such as astronomy, ecology, econometrics and water science. Some initial efforts in applying the kernel method to wildlife sampling were made by Buckland (1992), Quang (1993) and Chen (1994). Mack and Quang (1998) were the first to present a rigorous unified study of the kernel estimation of wildlife abundance for both line and point transect sampling.

However, the methods discussed above are all derived under the shoulder condition. Recent studies have shown that the shoulder condition may not hold for many wildlife line transect data sets, such as those for whales, jack rabbits, capercaillies, cottontails and impalas (Buckland (1985)). In these situations, the above methods are not satisfactory because of the so-called boundary effects of the kernel method. Mack et al. (1999) noticed this problem and proposed using the end point kernel method (Zhang and Karunamuni (1998)) to correct the boundary effect. Mack et al. (1999) demonstrated that the end point kernel method greatly outperforms the conventional kernel method developed under the shoulder condition. Nevertheless, Zhang and Karunamuni (1998) pointed out that the end point kernel method has two intrinsic problems: 1) the end point kernel always brings a much larger variance than the conventional kernel method; 2) the end point kernel method may give negative estimates of the animal density, especially when the sample size is small. Zhang (2000) discussed how to correct these two problems in line transect sampling when using the kernel method.

The above discussion shows that it is necessary to know whether or not the shoulder condition holds before analyzing the data. Mack (1998) suggested testing the shoulder condition using the kernel method. This was the first formal discussion in the literature of testing the shoulder condition by a nonparametric method. In this paper, we discuss the possibility of improving the results of Mack (1998). We also clarify some points of confusion in Mack's work.
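As a small illustration of the plug-in estimate (1.2), the following sketch evaluates $\hat D = n \hat f(0) / (2tL)$. The function name and the survey numbers are hypothetical and chosen only to show the arithmetic; they do not come from any data set discussed in this paper.

```python
def density_estimate(n, f0_hat, t, L):
    """Plug-in density estimate of eq. (1.2): n * f0_hat / (2 * t * L).

    n      : total number of objects sighted on all strips
    f0_hat : an estimate of f(0), however it was obtained
    t      : number of strips
    L      : length of each strip
    """
    if t <= 0 or L <= 0:
        raise ValueError("t and L must be positive")
    return n * f0_hat / (2.0 * t * L)


# Hypothetical survey: 100 sightings on 5 strips of length 10,
# with an estimated f(0) of 0.02 per unit of distance.
d_hat = density_estimate(n=100, f0_hat=0.02, t=5, L=10.0)
```

The estimate is linear in $\hat f(0)$, which is why the quality of the estimator of $f(0)$ drives the quality of $\hat D$.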

2. TESTING THE SHOULDER CONDITION

2.1. What to test?

Mack (1998) suggested testing the hypothesis $H_0: f'(0^+) = 0$ vs $H_a: f'(0^+) \neq 0$. Since we assume that detectability decreases sharply as objects move away from the observer, it is enough to consider the one-sided alternative $H_a: f'(0^+) < 0$. Therefore, in this paper, we will only discuss how to test the following hypothesis:

$$H_0: f'(0^+) = 0 \quad \text{vs} \quad H_a: f'(0^+) < 0. \tag{2.1}$$

The advantage of restricting our attention to the one-sided alternative is that the power of the test will be higher than that of the two-sided alternative, as we shall see in Section 4. To test the hypothesis (2.1), we first need to discuss the estimation of $f'(0^+)$.

2.2. How to estimate $f'(0^+)$?

In order to test $f'(0^+) = 0$, we need a good estimator of $f'(0^+)$. The natural approach is to use the kernel method with an order $(1, 2)$ end point kernel $K_{1,2}$:

$$\hat f'(0^+) = \frac{1}{nh^2} \sum_{i=1}^{n} K_{1,2}\!\left(\frac{0 - x_i}{h}\right), \tag{2.2}$$

where $h$ is the bandwidth ($h \to 0$ as $n \to \infty$). An order $(v, k)$ end point kernel is defined as follows; more details on the end point kernel can be found in Zhang and Karunamuni (1998, 1999).

DEFINITION 2.1. An end point kernel $K_{v,k}$ is said to be of order $(v, k)$ if

$$\int K_{v,k}(t)\, t^j\, dt = \begin{cases} 0, & j = 0, \ldots, v-1, v+1, \ldots, k-1, \\ 1, & j = v, \\ B_{v,k}, & j = k, \end{cases} \tag{2.3}$$

where $B_{v,k} < \infty$.

Mack (1998) suggested the following definition for the kernel $K'$ used in estimating $f'(0^+)$.

DEFINITION 2.2. The derivative kernel $K'$ should satisfy the following conditions (let $K$ be the antiderivative of $K'$):

$$K \text{ is supported on } [\alpha, \beta], \quad -\infty < \alpha < \beta \le 0, \tag{2.4}$$

$$K(\alpha) = K(\beta) = 0, \tag{2.5}$$

$$\int_{\alpha}^{\beta} v^k K(v)\, dv = 1 \ \text{for } k = 0, \quad [\ldots] \ \text{for } k = 1, \tag{2.6}$$

$$\int_{\alpha}^{\beta} (K'(v))^2\, dv < \infty. \tag{2.7}$$

Equation (2.6) basically says that $K(v)$ is an order $(0, 1)$ end point kernel. For a general order $(0, 1)$ end point kernel $K$, differentiating $K$ does not necessarily yield an order $(1, 2)$ end point kernel for estimating $f'(0^+)$; condition (2.5) was cleverly added to guarantee that $K'$ is an order $(1, 2)$ end point kernel. Comparing the two definitions, Definition 2.1 is more appropriate in the sense that it is simpler and more formal. In fact, (2.5) in Definition 2.2 can be relaxed to $K(\alpha) = K(\beta)$. An example of $K$ satisfying Definition 2.2 with the relaxed (2.5) is $K(t) = a - 6t - 6t^2$, for any real $a \neq 0$.

For the estimator (2.2), standard arguments lead to

$$E \hat f'(0^+) = f'(0^+) + \frac{h}{2} f''(0^+) \int K_{1,2}(t)\, t^2\, dt + o(h) \tag{2.8}$$

and

$$\operatorname{Var} \hat f'(0^+) = \frac{f(0^+)}{nh^3} \int [K_{1,2}(t)]^2\, dt + o(1/nh^3). \tag{2.9}$$
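The "standard arguments" behind the variance formula (2.9) can be outlined as follows. This is a sketch rather than a full proof, and it assumes (beyond what is stated above) that the $x_i$ are independent with density $f$, that $f$ is right-continuous at 0, and that the estimator is normalized by $1/(nh^2)$ as in (2.2).

```latex
\begin{align*}
\operatorname{Var}\hat f'(0^+)
  &= \frac{1}{nh^4}\operatorname{Var} K_{1,2}\!\left(\frac{0-x_1}{h}\right)
   = \frac{1}{nh^4}\left\{\int_0^\infty K_{1,2}^2\!\left(\frac{-x}{h}\right) f(x)\,dx
     - \left[\int_0^\infty K_{1,2}\!\left(\frac{-x}{h}\right) f(x)\,dx\right]^2\right\}. \\
\intertext{Substituting $t=-x/h$ (so $dx = -h\,dt$ and $t$ ranges over $(-\infty,0]$):}
\int_0^\infty K_{1,2}^2\!\left(\frac{-x}{h}\right) f(x)\,dx
  &= h\int K_{1,2}^2(t)\, f(-ht)\,dt
   = h\, f(0^+)\int [K_{1,2}(t)]^2\,dt + o(h), \\
\left[\int_0^\infty K_{1,2}\!\left(\frac{-x}{h}\right) f(x)\,dx\right]^2
  &= \left[h\int K_{1,2}(t)\, f(-ht)\,dt\right]^2 = O(h^2). \\
\intertext{Hence}
\operatorname{Var}\hat f'(0^+)
  &= \frac{f(0^+)}{nh^3}\int [K_{1,2}(t)]^2\,dt + o\!\left(\frac{1}{nh^3}\right).
\end{align*}
```

The squared-mean term contributes only $O(1/(nh^2))$, which is why it is absorbed into the remainder.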

2.3. How to test the shoulder condition?

Equation (2.8) implies that $\hat f'(0^+)$ is an asymptotically unbiased estimator of $f'(0^+)$. Therefore, it is natural to define the statistic

$$T = \frac{\hat f'(0^+)}{\sqrt{\operatorname{Var} \hat f'(0^+)}} \tag{2.10}$$

to test the null hypothesis $H_0$. Plugging (2.9) into (2.10), we have

$$T = \frac{\hat f'(0^+)}{\sqrt{\dfrac{f(0^+)}{nh^3} \displaystyle\int [K_{1,2}(t)]^2\, dt}} = \frac{\sqrt{nh^3}\, \hat f'(0^+)}{\sqrt{f(0^+) \displaystyle\int [K_{1,2}(t)]^2\, dt}}. \tag{2.11}$$

Under $H_0$ and the assumption that $nh^5 \to 0$, it can be easily proved that

$$T \to N(0, 1), \quad \text{asymptotically}. \tag{2.12}$$

Note that $f(0^+)$ in (2.11) is unknown. In order to use $T$ as a test statistic, it is necessary to estimate $f(0^+)$. With Definition 2.1 of the end point kernel, we estimate $f(0^+)$ by

$$\hat f(0^+) = \frac{1}{nh_0} \sum_{i=1}^{n} K_{0,1}\!\left(\frac{0 - x_i}{h_0}\right), \tag{2.13}$$

where $K_{0,1}$ is an order $(0, 1)$ end point kernel. Two examples of order $(0, 1)$ end point kernels are the half normal kernel

$$K_{0,1}(t) = \sqrt{2/\pi}\; e^{-t^2/2}, \quad t \in (-\infty, 0],$$

and

$$K_{0,1}(t) = 1.5\,(1 - t^2), \quad t \in [-1, 0].$$
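The estimator (2.13) can be sketched in a few lines. The function names and the three sample distances below are hypothetical and serve only to make the arithmetic concrete; the kernel is the second example above, $1.5(1 - t^2)$ on $[-1, 0]$.

```python
def k01_quadratic(t):
    """Order (0, 1) end point kernel 1.5 * (1 - t^2), supported on [-1, 0]."""
    return 1.5 * (1.0 - t * t) if -1.0 <= t <= 0.0 else 0.0


def f0_hat(x, h0, kernel=k01_quadratic):
    """Estimate f(0+) as in (2.13): (1 / (n * h0)) * sum_i K((0 - x_i) / h0)."""
    n = len(x)
    return sum(kernel((0.0 - xi) / h0) for xi in x) / (n * h0)


# Hypothetical perpendicular distances; only points within h0 of the
# transect line contribute, because the kernel vanishes outside [-1, 0].
distances = [0.0, 0.5, 2.0]
estimate = f0_hat(distances, h0=1.0)
```

Because the kernel argument is $(0 - x_i)/h_0 \le 0$, only the left half-line support of the kernel is ever used, which is the point of the end point construction.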

With $f(0^+)$ in (2.11) replaced by (2.13), the test statistic becomes

$$T = \frac{\sqrt{nh^3}\, \hat f'(0^+)}{\sqrt{\hat f(0^+) \displaystyle\int K_{1,2}^2(t)\, dt}}. \tag{2.14}$$

Similar to (2.12), under $H_0$ and the assumption that $nh^5 \to 0$, we can prove that $T \to N(0, 1)$, asymptotically. The test statistic (2.14) is the same as equation (14) in Mack (1998) when $K_{1,2}$ is replaced by

$$K'(t) = \begin{cases} 1, & -2 \le t \le -1, \\ -1, & -1 < t \le 0, \\ 0, & \text{otherwise}. \end{cases} \tag{2.15}$$

It is easy to prove that $K'(t)$ is an order $(1, 2)$ kernel.

Mack (1998) reported simulation results for the test statistic $T$ with the kernel $K'(t)$. The results showed that $T$ worked satisfactorily for random data simulated from a half normal population. At the significance level $\gamma = 0.05$, the sample rejection rates are [...] for $n = 60$. Mack (1998) also applied the method to some real data, namely the stakes data and Hemingway's data, and reported that the shoulder condition could not be rejected for the stakes data, while it was rejected for Hemingway's data. Since it is generally believed that Hemingway's data satisfy the shoulder condition, the rejection of the shoulder condition by Mack's method came as a surprise. We redid the calculation under the same setup as Mack's and found two mistakes in Mack's calculation. The first is the bandwidth used to compute $\hat f'(0^+)$. The bandwidth Mack (1998) used was [...], which was wrong. According to the formula $b = \hat\sigma n^{-1/4}$ which Mack (1998) used to calculate the bandwidth, the correct bandwidth should be [...]. The second mistake is the value of $T$. The value of $T$ is [...] with the correct bandwidth [...]. Even with the incorrect bandwidth [...], the value of $T$ is [...] instead of [...] as reported in Mack (1998). Apparently, the decimal point was misplaced. With the correct value of $T$, it is obvious that the shoulder condition is not rejected.

3. IMPROVEMENT ON MACK'S METHOD

Although Mack's method seems to work well according to the discussion in Section 2.3, careful examination shows that the test has very low power. We carried out a simulation in which the random data were simulated from an exponential distribution. The rejection rate of $H_0$ is only 14%. This means that, with an 86% chance, Mack's method mistakenly classifies data from an exponential distribution as data from a distribution which satisfies the shoulder condition. In this section, we focus on how to improve the performance of Mack's method.

The first step in improving Mack's method is to narrow the scope of the alternatives from $H_a: f'(0^+) \neq 0$ to $H_a: f'(0^+) < 0$, as we mentioned in Section 2.1. Further, we suggest improving the performance of the test by using more appropriate kernels. Zhang (1999) and Zhang and Karunamuni (1999) showed that the best kernels for estimating $f(0^+)$ and $f'(0^+)$, in the sense of minimizing the mean squared error (MSE) for a certain class of kernels, are

$$K(t) = \tfrac{3}{2}(1 - t^2), \quad t \in [-1, 0], \tag{3.1}$$

and

$$K_1(t) = 12\,(1 + 4t + 3t^2), \quad t \in [-1, 0], \tag{3.2}$$

respectively.
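The moment conditions of Definition 2.1 and the kernel constants used later can be checked exactly for the polynomial kernels (3.1) and (3.2). The sketch below uses exact rational arithmetic; the helper names are hypothetical, and the coefficient lists are simply (3.1) and (3.2) as written above, expanded in powers of $t$.

```python
from fractions import Fraction


def integrate_poly(coeffs, a=-1, b=0):
    """Exact integral over [a, b] of sum_j coeffs[j] * t**j."""
    total = Fraction(0)
    for j, c in enumerate(coeffs):
        total += Fraction(c) * (Fraction(b) ** (j + 1) - Fraction(a) ** (j + 1)) / (j + 1)
    return total


def poly_mul(p, q):
    """Coefficients of the product of two polynomials."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += Fraction(pi) * Fraction(qj)
    return out


def moment(p, j):
    """Integral over [-1, 0] of t**j * p(t)."""
    return integrate_poly([0] * j + list(p))


K0 = [Fraction(3, 2), 0, Fraction(-3, 2)]  # (3.1): 1.5 * (1 - t^2)
K1 = [12, 48, 36]                          # (3.2): 12 * (1 + 4t + 3t^2)

k0_mass = moment(K0, 0)                       # zeroth moment of (3.1)
k0_second = moment(K0, 2)                     # integral of t^2 * K
k0_roughness = integrate_poly(poly_mul(K0, K0))
k1_mass = moment(K1, 0)                       # zeroth moment of (3.2)
k1_first = moment(K1, 1)                      # first moment of (3.2)
k1_roughness = integrate_poly(poly_mul(K1, K1))
```

With these coefficients, `k0_mass` equals 1 and `k1_mass` equals 0 with `k1_first` equal to 1, matching the $j = v$ and $j \neq v$ cases of (2.3), while `k1_roughness` equals 96/5, the constant that appears in the denominator of (4.1).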

It is well known that the optimal bandwidths for estimating $f(0^+)$ and $f'(0^+)$, under the assumption that $f''(0^+)$ exists, are both of order $O(n^{-1/5})$. However, the test statistic $T$ defined by (2.14) does not converge to $N(0, 1)$ with the optimal bandwidths. This is because the expected value of the numerator of $T$ is

$$E\big(\sqrt{nh^3}\, \hat f'(0^+)\big) = O\big(\sqrt{nh^5}\big) = O(1)$$

for $h = O(n^{-1/5})$. Therefore we have to require $nh^5 \to 0$ in (2.14) to guarantee that $T \to N(0, 1)$. Mack (1998) used a bandwidth of order $O(n^{-0.25})$. This bandwidth converges to zero faster than the optimal bandwidth and therefore allows the test statistic $T$ to have more variability. As a result, it makes the method more reluctant to reject the null hypothesis (i.e. the shoulder condition) and causes the low power of Mack's test.

Note that (2.8) can be written as

$$E\Big\{\hat f'(0^+) - \frac{h}{2} f''(0^+) \int K_{1,2}(t)\, t^2\, dt\Big\} = f'(0^+) + o(h). \tag{3.3}$$

If we can find a consistent estimator $\hat f''(0^+)$ of $f''(0^+)$, then it can be proved that

$$E\Big\{\hat f'(0^+) - \frac{h}{2} \hat f''(0^+) \int K_{1,2}(t)\, t^2\, dt\Big\} = f'(0^+) + o(h). \tag{3.4}$$

Define

$$\hat f'_{new}(0^+) = \hat f'(0^+) - \frac{h}{2} \hat f''(0^+) \int K_{1,2}(t)\, t^2\, dt. \tag{3.5}$$

Equations (2.8) and (3.4) show that $\hat f'_{new}(0^+)$ converges to $f'(0^+)$ faster than $\hat f'(0^+)$. Therefore, we can define the new test statistic as

$$T_{new} = \frac{\sqrt{nh^3}\, \hat f'_{new}(0^+)}{\sqrt{\hat f(0^+) \displaystyle\int [K_{1,2}(t)]^2\, dt}}. \tag{3.6}$$

Similar to (2.12), we can prove

$$T_{new} \to N(0, 1), \quad \text{asymptotically}, \tag{3.7}$$

when $h = O(n^{-1/5})$.

Unfortunately, a consistent estimator of $f''(0^+)$ with good performance is extremely difficult to find. We tried to use the boundary kernel method to estimate $f''(0^+)$. However, the resulting estimator is too volatile to be used in practice. A practical way to estimate $f''(0^+)$ is to use the reference density method (as often used in estimating bandwidths); more details about this method can be found in Silverman (1986). Since the null distribution is derived under $H_0$ (the shoulder condition), we suggest using the half normal density

$$g(t) = \sqrt{\frac{2}{\pi}}\, \frac{1}{\sigma}\, e^{-t^2/(2\sigma^2)} \tag{3.8}$$

as the reference density. Thus $f''(0^+)$ will be estimated by $-\sqrt{2/\pi}\, \hat\sigma^{-3}$, due to the fact that $g''(0^+) = -\sqrt{2/\pi}\, \sigma^{-3}$, where $\hat\sigma$ is the maximum likelihood estimator of $\sigma$,

$$\hat\sigma = \sqrt{T/n}, \tag{3.9}$$

with $T = \sum_{i=1}^{n} x_i^2$; see Buckland et al. (1994) for the derivation of $\hat\sigma$. Substituting $-\sqrt{2/\pi}\, \hat\sigma^{-3}$ for $\hat f''(0^+)$ in $\hat f'_{new}(0^+)$ defined by (3.5) and plugging $\hat f'_{new}(0^+)$ into (3.6), we obtain the statistic

$$T^* = \frac{\sqrt{nh^3}\, \Big[\hat f'(0^+) + \dfrac{h \sqrt{2/\pi}\, \hat\sigma^{-3}}{2} \displaystyle\int K_{1,2}(t)\, t^2\, dt\Big]}{\sqrt{\hat f(0^+) \displaystyle\int [K_{1,2}(t)]^2\, dt}}. \tag{3.10}$$

The estimator $-\sqrt{2/\pi}\, \hat\sigma^{-3}$ from the reference density method is not a consistent estimator of $f''(0^+)$ if the true density is different from the half normal density. Therefore, $T^*$ does not converge to $N(0, 1)$ for $h = O(n^{-1/5})$. However, the problem can be fixed by requiring $nh^5 \to 0$. This time we only require a much milder correction, say $h = O(n^{-0.21})$, compared to $h = O(n^{-0.25})$ in Mack (1998).
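The pieces of (3.9) and (3.10) can be assembled as in the following sketch. The function names are hypothetical; the estimates $\hat f'(0^+)$ and $\hat f(0^+)$ and the two kernel constants (`m2` for $\int K_{1,2}(t) t^2 dt$ and `r` for $\int [K_{1,2}(t)]^2 dt$) are passed in as plain numbers, so the code only illustrates how the terms combine, not how the inputs are obtained.

```python
import math


def sigma_hat(x):
    """MLE of the half normal scale, eq. (3.9): sqrt(sum(x_i^2) / n)."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def f2_reference(sigma):
    """Reference-density value g''(0+) = -sqrt(2/pi) / sigma^3 for (3.8)."""
    return -math.sqrt(2.0 / math.pi) / sigma ** 3


def t_star(f1_hat, f0_hat, h, n, sigma, m2, r):
    """Bias-corrected statistic of eq. (3.10).

    f1_hat : estimate of f'(0+) from (2.2)
    f0_hat : estimate of f(0+) from (2.13)
    m2     : integral of K_{1,2}(t) * t^2 dt
    r      : integral of K_{1,2}(t)^2 dt
    """
    # (3.5) with f''(0+) replaced by the half normal reference value.
    corrected = f1_hat - 0.5 * h * f2_reference(sigma) * m2
    return math.sqrt(n * h ** 3) * corrected / math.sqrt(f0_hat * r)


# Hypothetical inputs, used only to exercise the arithmetic.
s = sigma_hat([3.0, 4.0])
value = t_star(f1_hat=-1.0, f0_hat=1.0, h=1.0, n=4, sigma=1.0, m2=0.0, r=4.0)
```

Setting `m2=0.0` switches the correction off, in which case the function reduces to the uncorrected statistic (2.14).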

With the above discussion, it is easy to see that under the null hypothesis $H_0$,

$$T^* \to N(0, 1), \quad \text{asymptotically}, \tag{3.11}$$

when $h$ satisfies $nh^5 \to 0$. Hence the rejection rule for $H_0$ is: reject $H_0$ if $T^* < -z_\gamma$ at the significance level $\gamma$, where $z_\gamma$ is the $z$ critical value.

The last step in improving Mack's method is to use the optimal bandwidths when estimating $f(0^+)$ and $f'(0^+)$. A common measure of the performance of an estimator $\hat f(x)$ of $f(x)$ at the point $x$ is the mean squared error (MSE), defined by

$$MSE(\hat f(x)) = E(\hat f(x) - f(x))^2. \tag{3.12}$$

Silverman (1986) showed that

$$MSE(\hat f(x)) = (E \hat f(x) - f(x))^2 + \operatorname{Var}(\hat f(x)). \tag{3.13}$$

The quantity $E \hat f(x) - f(x)$ in the first term of (3.13) (without the square) is called the bias of the estimator, and the second term $\operatorname{Var}(\hat f(x))$ is called the variance of the estimator. Applying standard arguments, similar to those in the derivation of (2.8) and (2.9), to $\hat f(0^+)$, we obtain

$$E \hat f(0^+) - f(0^+) = \frac{h^2}{2} f''(0^+) \int K_{0,1}(t)\, t^2\, dt + o(h^2)$$

and

$$\operatorname{Var} \hat f(0^+) = \frac{f(0^+)}{nh} \int [K_{0,1}(t)]^2\, dt + o(1/nh).$$

Plugging the above two equations into (3.13), we have

$$MSE(\hat f(0^+)) = \frac{h^4}{4} [f''(0^+)]^2 \Big[\int K_{0,1}(t)\, t^2\, dt\Big]^2 + \frac{f(0^+)}{nh} \int [K_{0,1}(t)]^2\, dt. \tag{3.14}$$
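The trade-off in (3.14) can be seen numerically. The sketch below evaluates the right-hand side of (3.14) on a grid of bandwidths and compares the grid minimizer with the closed-form minimizer obtained by setting the derivative in $h$ to zero, which is the expression (3.15) that follows. The illustration values (a half normal truth with $\sigma = 3$, $n = 80$, and the kernel constants $6/5$ and $1/5$ of the kernel $1.5(1 - t^2)$) are assumptions made only for this example.

```python
import math


def amse_f0(h, n, f0, f2, r, m2):
    """Right-hand side of (3.14): squared bias plus variance at bandwidth h."""
    return 0.25 * h ** 4 * f2 ** 2 * m2 ** 2 + f0 * r / (n * h)


# Assumed illustration values (not results from the paper).
sigma, n = 3.0, 80
f0 = math.sqrt(2.0 / math.pi) / sigma        # half normal density at 0+
f2 = -math.sqrt(2.0 / math.pi) / sigma ** 3  # its second derivative at 0+
r, m2 = 6.0 / 5.0, 1.0 / 5.0                 # constants of 1.5 * (1 - t^2)

# Brute-force minimization over a bandwidth grid with step 0.001.
grid = [0.5 + 0.001 * i for i in range(6000)]
h_grid = min(grid, key=lambda h: amse_f0(h, n, f0, f2, r, m2))

# Closed-form minimizer from d(AMSE)/dh = 0.
h_closed = (r * f0 / (m2 ** 2 * f2 ** 2 * n)) ** 0.2
```

The squared bias grows like $h^4$ while the variance decays like $1/h$, so the minimizer is unique and scales as $n^{-1/5}$.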

Minimizing (3.14) with respect to $h$, the optimal bandwidth for estimating $f(0^+)$ is

$$h_0 = \left(\frac{\displaystyle\int [K_{0,1}(t)]^2\, dt \; f(0^+)}{\Big[\displaystyle\int K_{0,1}(t)\, t^2\, dt\Big]^2 [f''(0^+)]^2}\right)^{1/5} n^{-1/5}. \tag{3.15}$$

Similarly, we can obtain

$$MSE(\hat f'(0^+)) = \frac{h^2}{4} [f''(0^+)]^2 \Big[\int K_{1,2}(t)\, t^2\, dt\Big]^2 + \frac{f(0^+)}{nh^3} \int [K_{1,2}(t)]^2\, dt \tag{3.16}$$

and the optimal bandwidth for estimating $f'(0^+)$ is

$$h = \left(\frac{6 \displaystyle\int [K_{1,2}(t)]^2\, dt \; f(0^+)}{\Big[\displaystyle\int K_{1,2}(t)\, t^2\, dt\Big]^2 [f''(0^+)]^2}\right)^{1/5} n^{-1/5}. \tag{3.17}$$

The bandwidths $h_0$ and $h$ are not available since $f(0^+)$ and $f''(0^+)$ in (3.15) and (3.17) are unknown. A well-known method of estimating the bandwidth in density estimation is least squares cross-validation. However, it gives a global estimate, using the same bandwidth at each point of estimation. Since our intention is to estimate $f(0^+)$ and $f'(0^+)$ only, a local bandwidth selector would be more appropriate. Recently, Gerard and Schucany (1999) discussed the estimation of the local bandwidth for estimating $f(0^+)$ under the shoulder condition. Surprisingly, the reference density bandwidth selector using the half normal density as the reference density uniformly performs better than their proposed estimator. Therefore, we decide to use the reference density method to estimate $f(0^+)/[f''(0^+)]^2$ in (3.15) and (3.17), as we did to estimate $f''(0^+)$ in (3.10).

With the half normal density as the reference density, it is seen that

$$\frac{f(0^+)}{[f''(0^+)]^2} = \sqrt{\pi/2}\; \sigma^5. \tag{3.18}$$
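For completeness, the ratio in (3.18) follows from differentiating the half normal reference density (3.8) twice; the short derivation below uses only that density.

```latex
\begin{align*}
g(t)   &= \sqrt{\tfrac{2}{\pi}}\,\sigma^{-1} e^{-t^2/(2\sigma^2)}, &
g(0^+) &= \sqrt{\tfrac{2}{\pi}}\,\sigma^{-1}, \\
g'(t)  &= -\frac{t}{\sigma^2}\, g(t), &
g'(0^+) &= 0 \quad \text{(the shoulder condition)}, \\
g''(t) &= \left(\frac{t^2}{\sigma^4} - \frac{1}{\sigma^2}\right) g(t), &
g''(0^+) &= -\sqrt{\tfrac{2}{\pi}}\,\sigma^{-3}, \\
\frac{g(0^+)}{[g''(0^+)]^2}
       &= \frac{\sqrt{2/\pi}\,\sigma^{-1}}{(2/\pi)\,\sigma^{-6}}
        = \sqrt{\pi/2}\;\sigma^{5}.
\end{align*}
```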

Plugging (3.18) into (3.15) and (3.17), with $\sigma$ replaced by its maximum likelihood estimator $\hat\sigma$ defined by (3.9), we obtain the estimators of $h$ and $h_0$ as follows:

$$\hat h = \left(\frac{6 \sqrt{\pi} \displaystyle\int [K_{1,2}(t)]^2\, dt}{\sqrt{2}\, \Big[\displaystyle\int K_{1,2}(t)\, t^2\, dt\Big]^2}\right)^{1/5} \hat\sigma\, n^{-1/5} \tag{3.19}$$

and

$$\hat h_0 = \left(\frac{\sqrt{\pi} \displaystyle\int [K_{0,1}(t)]^2\, dt}{\sqrt{2}\, \Big[\displaystyle\int K_{0,1}(t)\, t^2\, dt\Big]^2}\right)^{1/5} \hat\sigma\, n^{-1/5}. \tag{3.20}$$

The bandwidth $\hat h_0$ defined above will be used in Section 4 to estimate $f(0^+)$ in the test statistic $T^*$ defined by (3.10). Since the bandwidth used to estimate $f'(0^+)$ should be of order $o(n^{-1/5})$ (see the discussion around (3.10)), we use the following modified bandwidth $\hat h_1$ to estimate $f'(0^+)$ in the test statistic $T^*$:

$$\hat h_1 = \hat h\, n^{-0.01}, \tag{3.21}$$

where $\hat h$ is the bandwidth defined by (3.19).

4. SIMULATIONS

In this section, we investigate the performance of the test statistic $T^*$ by simulations. The optimal end point kernels (3.1) and (3.2) are used to estimate $f(0^+)$ and $f'(0^+)$, respectively. With these two kernels, the test statistic $T^*$ becomes

$$T^* = \frac{\sqrt{n \hat h_1^3}\, \Big[\hat f'(0^+) + \dfrac{2 \sqrt{2/\pi}\; \hat h_1}{5\, \hat\sigma^{3}}\Big]}{\sqrt{96\, \hat f(0^+)/5}}, \tag{4.1}$$

where $\hat f'(0^+)$ and $\hat f(0^+)$ are defined by (2.2) and (2.13), and the bandwidths $h_0$ and $h_1$ are estimated by (3.20) and (3.21), respectively. For the kernels defined by (3.1) and (3.2), simple algebra shows that

$$\hat h_0 = (15 \sqrt{2\pi})^{1/5}\, \hat\sigma\, n^{-1/5} \tag{4.2}$$

and

$$\hat h_1 = (90 \sqrt{2\pi})^{1/5}\, \hat\sigma\, n^{-1.05/5}, \tag{4.3}$$

where $\hat\sigma$ is defined by (3.9).

Example 1. (Simulated data) Samples of size 40 and 80 were simulated from a half normal density

$$f(x) = \begin{cases} \sqrt{\dfrac{2}{\pi\sigma^2}}\; e^{-x^2/(2\sigma^2)}, & x > 0, \\ 0, & \text{otherwise}. \end{cases} \tag{4.4}$$

The value of $\sigma$ used in the simulation was 3. At the significance level $\alpha = 0.05$, we compared the performance of $T^*$ defined by (4.1) with that of Mack's test statistic $T$ (see (2.14) and (2.15) of this paper, or (14) of Mack (1998)). The bandwidths used in our method were determined by (4.2) and (4.3). The bandwidth used in Mack's method was $h = \hat\sigma n^{-1/5}$, as suggested in Mack (1998), where $\hat\sigma$ was defined to be the sample standard deviation. Note that only one bandwidth is required in Mack's method.

Table 1 below shows that the sample rejection rates of $T^*$ are lower than those of $T$, while those of $T$ are closer to $\alpha$. This shows that our method is more conservative than Mack's method when the data are truly from a half normal distribution. This is an advantage over Mack's method, because our purpose is to make fewer mistakes when the null hypothesis $H_0$ is true. The only remaining concern is whether our method is also more conservative than Mack's method when the null hypothesis $H_0$ is false. To clarify this issue, we generated 1000 samples of size 40 and 80 from the exponential density

$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0, \\ 0, & \text{otherwise}, \end{cases} \tag{4.5}$$

with $\lambda = 1/[\ldots]$.
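The bandwidth formulas (4.2) and (4.3) and the one-sided rejection rule can be sketched as follows. The function names and the toy sample are hypothetical; the normal critical value is taken from Python's standard library, and the sketch assumes the statistic has already been computed.

```python
import math
from statistics import NormalDist


def sigma_hat(x):
    """MLE of the half normal scale, eq. (3.9): sqrt(sum(x_i^2) / n)."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def bandwidths(x):
    """Return (h0_hat, h1_hat) from eqs. (4.2) and (4.3)."""
    n = len(x)
    s = sigma_hat(x)
    c = math.sqrt(2.0 * math.pi)
    h0 = (15.0 * c) ** 0.2 * s * n ** (-1.0 / 5.0)
    h1 = (90.0 * c) ** 0.2 * s * n ** (-1.05 / 5.0)
    return h0, h1


def reject_shoulder(t_value, gamma=0.05):
    """One-sided rule: reject H0 when the statistic falls below -z_gamma."""
    z_gamma = NormalDist().inv_cdf(1.0 - gamma)
    return t_value < -z_gamma


# Toy sample of 40 distances (hypothetical, for illustration only).
sample = [3.0, 4.0] * 20
h0_hat, h1_hat = bandwidths(sample)
```

For any sample the ratio `h1_hat / h0_hat` equals $6^{1/5} n^{-0.01}$, which makes explicit that the bandwidth for $\hat f'(0^+)$ shrinks only slightly faster than $n^{-1/5}$.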

Using the same method of choosing bandwidths as above, the power (rejection rate) of our suggested method was more than twice that of Mack's method for the sample size 40, and more than three times that of Mack's method for the sample size 80. This means that our method is much more sensitive than Mack's method to departures of the distribution from the half normal distribution. The results are also reported in Table 1 below.

Table 1 - A Comparison between Our Method and Mack's Method

| Sample size | Method        | Rejection rate at α = 0.05 | Power of the test |
| 40          | Our method    | [...]                      | [...]             |
| 40          | Mack's method | [...]                      | [...]             |
| 80          | Our method    | [...]                      | [...]             |
| 80          | Mack's method | [...]                      | [...]             |

Table 1 shows that our method is superior to Mack's method both in the rejection rate when $H_0$ is in fact true (Type I error) and in the power when $H_0$ is false. It can also be seen that both methods improve when the sample size increases, and it seems that our method improves faster than Mack's method.

To see how our method works in practice, we also applied it to two well-known data sets, the stakes data and Hemingway's data; a detailed description of the two data sets can be found in Buckland et al. (1988). For both data sets, our method shows the existence of a shoulder, which is consistent with other researchers' findings.

REFERENCES

Buckland, S. T. (1985). Perpendicular distance models for line transect sampling. Biometrics, 41.
Buckland, S. T. (1992). Fitting density functions using polynomials. Applied Statistics, 41.

Buckland, S. T., Anderson, D. R., Burnham, K. P. and Laake, J. L. (1993). Distance Sampling. London: Chapman and Hall.
Burnham, K. P. and Anderson, D. R. (1976). Mathematical models for nonparametric inferences from line transect data. Biometrics, 32.
Burnham, K. P., Anderson, D. R. and Laake, J. L. (1980). Estimation of density from line transect sampling of biological populations. Wildlife Monographs, No. 72.
Chen, S. X. (1996). Studying school size effects in line transect sampling using the kernel method. Biometrics, 52.
Crain, B. R., Burnham, K. P., Anderson, D. R. and Laake, J. L. (1979). Nonparametric estimation of population density for line transect sampling using Fourier series. Biom. J., 21.
Gerard, P. D. and Schucany, W. R. (1999). Local bandwidth selection for kernel estimation of population densities with line transect sampling. Technical report.
Mack, Y. P. (1998). Testing for the shoulder condition in line transect sampling. Comm. Statist. A, 27.
Mack, Y. P. and Quang, P. X. (1998). Kernel methods in line and point transect samplings. Biometrics, 50.
Mack, Y. P., Quang, P. X. and Zhang, S. (1999). Kernel estimation in transect sampling without the shoulder condition. Comm. Statist. A, 28.
Pollock, K. H. (1978). A family of density estimators for line-transect sampling. Biometrics, 34.
Quinn, T. J. and Gallucci, V. F. (1980). Parametric models for line-transect estimators of abundance. Ecology, 61.
Quang, P. X. (1993). Nonparametric estimators for variable circular plot surveys. Biometrics, 49.
Seber, G. A. F. (1982). The Estimation of Animal Abundance (2nd ed.). London: Griffin.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. London: Chapman and Hall.
Zhang, S. (1999). Some thoughts on estimating the density function under the shoulder condition. Technical report, Department of Mathematical Sciences, University of Alaska Fairbanks.
Zhang, S. (2000). Improvements on the kernel estimation in line transect sampling without the shoulder condition. Statistics and Probability Letters, 53.
Zhang, S. and Karunamuni, R. J. (1998). On kernel density estimation near endpoints. Journal of Statistical Planning and Inference, 70.
Zhang, S. and Karunamuni, R. J. (1999). On nonparametric density estimation at the boundary. Nonparametric Statistics, 12.
Zhang, S., Karunamuni, R. J. and Jones, M. C. (1999). An improved estimator of the density function at the boundary. J. Am. Statist. Ass., 94.
