Envelope tests for spatial point patterns with and without simulation


Thorsten Wiegand (1, 2), Pavel Grabarnik (3), and Dietrich Stoyan (4)

1 Department of Ecological Modelling, Helmholtz Centre for Environmental Research-UFZ, Permoserstr. 15, Leipzig, Germany
2 German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Deutscher Platz 5e, Leipzig, Germany
3 Institute of Physico-Chemical and Biological Problems in Soil Science, Laboratory of Ecosystem Modelling, The Russian Academy of Sciences, Pushchino, Russia
4 Institut für Stochastik, TU Bergakademie Freiberg, Freiberg, Germany

Citation: Wiegand, T., P. Grabarnik, and D. Stoyan. Envelope tests for spatial point patterns with and without simulation. Ecosphere 7(6):e /ecs

Abstract. Model testing is a central step of spatial point pattern analysis. It allows ecologists to judge whether their data agree with ecological hypotheses. We present a simple and elegant solution to a challenging problem: the construction of a goodness-of-fit envelope test with a prescribed significance level α. Our new Analytical Global Envelope (AGE) test is not restricted to the narrow frame of complete spatial randomness testing, and its envelopes can be determined by mathematical calculations. This allows us to investigate the influence of key settings of the AGE test on the width of the envelope strip. To circumvent some assumptions of the simulation-free AGE test, we present a corresponding Simulation-Based Global Envelope (SBGE) test. The envelope strip of the AGE and SBGE tests encircles the range of a summary function, such as the pair correlation function, under the null model. It has the desired property that the null hypothesis can be rejected with significance level α if the empirical summary function wanders outside the envelopes.

The AGE test can be applied under two mild conditions: the values of the summary function under the null model must be (approximately) normally distributed, and they must be (approximately) independent for different distance bins r_j. The SBGE test requires only the independence assumption. For a broad range of point processes, the width of the strip of the AGE envelopes scales with 1/n, where n is the number of points. This casts doubt on attempts at goodness-of-fit testing with low n (say < 100). The AGE and SBGE tests operate with wider envelope strips than the classical pointwise test, so the pointwise test must be considered too liberal. Furthermore, we show that the width of the AGE/SBGE strip increases approximately with ln(b), where b is the number of distance bins. For example, for b = 20 the AGE/SBGE envelopes are more than 50% wider than the corresponding pointwise envelopes. Our study opens up new avenues to the test problem in point pattern statistics. The new AGE and SBGE tests can be widely applied in ecology to improve the practice of null model testing.

Key words: deviation test; global envelopes; goodness-of-fit; Monte Carlo test; null model; pair correlation function; simulation envelope; spatial point pattern; type I error.

Received 2 September 2015; revised 27 January 2016; accepted 10 February. Corresponding Editor: D. P. C. Peters.

Copyright: 2016 Wiegand et al. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

E-mail: thorsten.wiegand@ufz.de

Introduction

Spatial point patterns are often studied in science and play an important role in modern ecological research (Illian et al. 2008, Wiegand and Moloney 2014). They comprise the locations of ecological objects (e.g., trees) within a given observation window (Fig. 1a). They may also include additional information, called marks, that characterizes the objects (e.g., tree size, or surviving vs. dead). Spatial point pattern analysis (Ripley 1981, Stoyan and Stoyan 1994, Diggle 2003, Illian et al. 2008) provides powerful techniques to analyze such data sets. During the last two decades, ecologists have increasingly adopted these techniques to derive hypotheses on the underlying processes or to test ecological theory (Stoyan and Penttinen 2000, Wiegand and Moloney 2004, Perry et al. 2006, Law et al. 2009, Szmyt 2014, Wiegand and Moloney 2014, Velázquez et al., in press).

A typical point pattern analysis in ecology consists of three major technical steps (Wiegand and Moloney 2014).

(1) First, the researcher estimates summary functions S(r) from the data (such as the pair correlation function; for explanations of terms and concepts see Table 1). These summarize key statistical properties of the observed pattern, usually as a function of the interpoint distance r (the case assumed throughout this paper). The basic interest is whether there is positive or negative spatial correlation between the ecological objects, and at which spatial scales. This information can allow inference on underlying processes such as competition or dispersal limitation (Law et al. 2009, Wiegand et al. 2009). For example, in the absence of environmental heterogeneity, values of the pair correlation function larger than one indicate aggregation (Fig. 1b), whereas values smaller than one indicate hyperdispersion (or regularity). However, the underlying point processes are stochastic. A value of the empirical pair correlation function above or below one therefore cannot safely be used to infer aggregation or hyperdispersion, respectively.

(2) The second step is therefore to implement an ecological (null) hypothesis as a stochastic null model.

(3) The third step, model testing, then reveals whether the empirical summary function is compatible with the null model, or at which spatial scales significant departures occur.

Fig. 1. (a) The spatial pattern of 625 juvenile individuals of the species Stylogyne turbacensis at the 50 ha BCI tropical forest plot. (b) Empirical pair correlation function ĝ(r) (closed circles), the theoretical g(r) for the complete spatial randomness (CSR) null model (bold gray line), the pointwise envelopes (black lines), and the approximation based on Eqs. 4 and 5 (red lines). The pointwise envelopes are the 5th lowest and highest values of the pair correlation function estimated from 199 simulations of the CSR null model. (c) Same as (b), but for the L-function. The pair correlation function was estimated with bandwidth h = 2.5 m.

Table 1. Terms and concepts of envelope tests.

| Term | Explanation |
|---|---|
| Summary function S(r) | Characteristic that quantifies statistical properties of spatial point patterns, usually as a function of distance r. Popular examples are Ripley's K-function K(r) and the pair correlation function g(r). The corresponding estimator is denoted by Ŝ(r) and the theoretical function by S(r). |
| Empirical summary function | Summary function S(r) estimated from the data. |
| Distance bin r_j | Particular distance at which S(r) is estimated. Usually, the r_j are equally spaced over the distance interval B of interest. The number b of distance bins is an important setting of envelope tests. |
| Pointwise envelope test | The traditional envelope test used in ecological applications. The pointwise envelopes are given by the kth lowest and highest values of Ŝ(r) taken from s simulations of the null model (Eq. 1). However, when pointwise envelopes are applied over a distance interval B (i.e., b > 1), they do not allow rejection of the null model with the significance level given in Eq. 1 if the empirical summary function wanders outside the envelope strip. |
| Global envelope test | In contrast with pointwise envelope tests, allows rejection of the null model with a prescribed significance level α. |
| Significance level β | Local significance level of the pointwise test required to yield the prescribed significance level α of the global envelope test (Eq. 8). |
| Significance level α | Prescribed significance level of the global envelope test. |
| Asymptotic normality assumption | Assumes that the values of an estimator Ŝ(r) of a summary function are approximately normally distributed for fixed r. This mild assumption is required for the AGE test but not for the SBGE test. |
| Independence assumption | Assumes that the values of the estimator Ŝ(r) evaluated at different distance bins r_j are (approximately) independent. This assumption excludes the use of cumulative summary functions but allows analytical determination of the critical value z_β via Eq. 8. |
| AGE test | Analytical Global Envelope test. Allows analytical determination of global envelopes under three assumptions: (1) asymptotic normality, (2) unbiasedness and knowledge of the variance, and (3) independence. |
| DCLF test | Diggle–Cressie–Loosmore–Ford test. Translates the deviations between S(r_j) and Ŝ_i(r_j) over all distance bins into the single test statistic given in Eq. 3. This test guarantees the prescribed significance level α. |
| MAD test | Maximum Absolute Deviation test. Translates the multiple tests at different distance bins r_j into a single test statistic: the maximum absolute value of S(r_j) − Ŝ_i(r_j) taken over all distance bins r_j (Eq. 2). This test guarantees the prescribed significance level α. |
| SBGE test | Simulation-Based Global Envelope test. Works exactly like the pointwise envelope test, but determines k or the number of simulations with Eq. 10 so that the prescribed significance level α is guaranteed. Requires more simulations of the null model than the pointwise envelope test. |

Note: References used in the table: Illian et al. (2008); Ripley (1977); Loosmore and Ford (2006); Baddeley et al. (2014); Myllymäki et al. (in press); this study; Myllymäki et al. (2015).
The tools for the first two steps are unproblematic. The test problem is more complicated, and until recently there was even confusion about the use of test methods and their interpretation (Loosmore and Ford 2006). Ecologists have traditionally used the simulation envelope approach for statistical inference. This approach goes back to Ripley (1977) and yields valuable information on the scales of deviation. In such tests, the empirical summary function is plotted as a function of distance r together with pointwise simulation envelopes. These envelopes are typically the 2.5th and 97.5th percentiles of the summary functions generated by Monte Carlo simulations of the null model (Fig. 1b, c). If the empirical summary function wanders outside the pointwise simulation envelopes for some distances (e.g., for distances < 8 m in Fig. 1b), this is taken as evidence for a departure from the null hypothesis. This graphical representation is especially attractive for ecological applications. It encircles the fluctuations of the summary function under the null model and points to distances where departures may occur.

However, for pointwise simulation envelopes based on a given number of simulations, it is not straightforward to determine the significance level α for rejecting the null hypothesis (Loosmore and Ford 2006). Pointwise envelopes yield the prescribed significance level α only for a single distance r, chosen before the test is conducted. When the envelopes are considered over a distance interval, multiple tests are essentially conducted. This raises the well-known problems of simultaneous inference (e.g., Ripley 1977, Diggle 2003, Loosmore and Ford 2006). This fact has caused confusion and some discussion on the validity of the tests (e.g., Loosmore and Ford 2006, Grabarnik et al. 2011, Baddeley et al. 2014, Myllymäki et al. 2015).

Further fundamental issues of practical importance related to envelope tests also require clarification and careful consideration to avoid inappropriate use of these tools. For example, how does the number n of points of the pattern influence the ability of the test to detect departures from the null model? Ecological point pattern studies often analyze patterns with very few points (say < 50) (see Velázquez et al., in press: fig. A1). Can one really distinguish patterns with such a low number of points from a pattern belonging to some null model?

Recent research has addressed the problem of constructing envelope tests with prescribed significance level α (Grabarnik et al. 2011). Based on the idea of studentization of summary functions (see Myllymäki et al. 2015), Myllymäki et al. (in press) present solutions that lead to satisfactory Monte Carlo tests. Here, we go one step further by breaking with the long tradition of simulation-based goodness-of-fit testing in point process statistics. Ripley (1988:46) already constructed tests of the CSR (complete spatial randomness) hypothesis that do not use simulations during the model testing phase. They rely only on some analysis of the empirical L-function (see also Illian et al. 2008:95–96). This, however, leads to envelopes of constant width.

The present paper introduces a goodness-of-fit envelope test with a prescribed global significance level α that does not use simulations. Like simulation-based tests, it is not restricted to the narrow frame of CSR testing. We call the new test the Analytical Global Envelope (AGE) test. The term "global" indicates that the significance level α of the test is valid for a whole given distance interval, not only for one distance r as in the pointwise test. We also present a corresponding Simulation-Based Global Envelope (SBGE) test. It relaxes the normality assumption of the AGE test and runs in the same way as the traditional pointwise simulation envelopes.

The simulation-free approach has two main advantages for practical applications. First, for large samples, time-consuming simulations can now be avoided. Second, it allows us to derive mathematical formulas for the width of the envelope strip. These formulas give ecologists precise information on the role of test characteristics such as the point number n, the size and shape of the window W, and other settings of the estimators of the summary functions. The price of applying the AGE or SBGE test is switching from the popular K- or L-function to the pair correlation function g(r). The K- and L-functions are still mostly used for model testing in ecology (e.g., Velázquez et al., in press: figs. 5d and A2d).
However, g(r) is increasingly used in ecology because it presents second-order variability information in an easier and more intuitive way than the conventional cumulative functions K and L (e.g., Wiegand and Moloney 2004, Perry et al. 2006, Law et al. 2009). There is also now standard software for estimating g(r).

The article is organized as follows. First, we remind the reader of the technicalities of the conventional pointwise simulation envelopes. Second, we discuss the fundamentals of constructing the Analytical Global Envelope (AGE) test. These are the analytical estimation of the variance of the estimator of the pair correlation function g(r), its asymptotic normality for fixed r, and the independence of estimators of g(r) for suitably spaced r. Third, we present the AGE test and the formulas that link the width of the envelope strip to the settings of the test. Fourth, we present a Simulation-Based Global Envelope (SBGE) test that makes fewer assumptions than the AGE test and can be applied with standard software. Finally, to be on the safe side, we check the quality of the AGE test by means of simulations. The present paper only uses the pair correlation function for testing univariate patterns. Its approach may, however, be extended to bivariate and marked patterns and to other summary functions of the nature of densities.

Materials and Methods

Example data

As an example, we use the spatial pattern of 627 juvenile individuals (diameter at breast height between 1 and 3 cm) of the species Stylogyne turbacensis, a small understory tree (Fig. 1a). The data are taken from the first (1982) census of the fully mapped 50 ha plot of tropical forest at Barro Colorado Island, Panamá (Hubbell et al. 2005).

Summary functions

We use two common second-order summary functions. The first is the L-function L(r) = √(K(r)/π) − r, a transformation of the K-function that is traditionally applied in ecology. The second is the pair correlation function g(r). See Illian et al. (2008) and Wiegand and Moloney (2014) for details on the interpretation and estimation of L(r) and g(r). The idea behind the AGE and SBGE tests can in principle also be applied to other summary functions, such as nearest-neighbor summary functions. Here, however, we focus on second-order summary functions. We occasionally use general language to show that the main methods are not limited to pair correlation and L-functions.

All simulation analyses presented here were conducted with the software Programita (Wiegand and Moloney 2014), which can be accessed at …. An R script with example data is provided in the Supplement.

On the conventional pointwise simulation envelopes

The method of the conventional pointwise simulation envelope was introduced by Ripley (1977). It was first applied to ecological questions by Sterner et al. (1986), Getis and Franklin (1987), and Kenkel (1988).
It quickly became the standard method of statistical testing for point pattern analysis in ecology. For example, Velázquez et al. (in press) found in their review that an overwhelming majority (93%) of the point pattern studies reviewed used Monte Carlo simulations and pointwise envelopes.

The basic aim of pointwise simulation envelopes is to estimate intervals that encircle the typical (e.g., 95%) range of values of the summary function Ŝ(r) under the null model. [The hat symbol denotes the estimator of the summary function, as opposed to its theoretical counterpart S(r).] For example, for 999 simulated values of Ŝ(r), the 2.5th and 97.5th percentiles of the underlying distribution are approximated by the 25th lowest and highest values of Ŝ(r). More generally, simulation envelopes can be constructed from the kth highest and lowest values of the summary function Ŝ(r) taken from s simulations of the null model. This leads to a significance level of

$$\beta = \frac{2k}{s+1}\ \text{if the test is two-sided}, \qquad \beta = \frac{k}{s+1}\ \text{if the test is one-sided}, \tag{1}$$

provided that the test is conducted for only one distance r. We use the symbol β instead of the perhaps expected α to avoid confusion. β is the local type I error of the pointwise simulation envelope. It is only valid for a single distance r and could more precisely be called β(r). In contrast, α is the prescribed significance level of the AGE test, which is conducted over a distance interval B. We therefore call α the global significance level to distinguish it from the local β.

To get an idea of the variability of Ŝ(r) over the whole range of distances r of interest, the simulation envelopes are usually plotted together with the empirical summary function. They are plotted over distance r for all r in an interval B of length b (e.g., Fig. 1b, c). (In the following, we assume that b is an integer and consider distances r_j for j = 1, 2, ..., b.) This provides a simple graphical assessment of the distance range and strength of potential departures from the null model. For example, the pair correlation function of the juvenile S. turbacensis trees lies outside the simulation envelopes, though not far outside, for all distances below 8.5 m. It lies inside them for all larger distances (Fig. 1b). A first diagnosis from this observation is that the juvenile S. turbacensis trees may show weak aggregation up to distances of 8.5 m.
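Eq. 1 and the construction of pointwise envelopes can be sketched in a few lines of Python. This is a minimal illustration, not the software used by the authors. Curves are assumed to be plain lists of summary-function values at common distance bins r_1, ..., r_b. No point patterns are simulated here. The function names `local_significance`, `pointwise_envelope` and `outside_bins` are hypothetical.

```python
"""Pointwise simulation envelopes (Eq. 1): a minimal sketch.

Each curve is a list of summary-function values S-hat(r_j) at the same
distance bins r_1, ..., r_b. Simulating the null model itself is out of scope.
"""


def local_significance(k: int, s: int, two_sided: bool = True) -> float:
    """Eq. 1: local type I error beta, valid only for one pre-chosen distance r."""
    return (2 * k if two_sided else k) / (s + 1)


def pointwise_envelope(simulated: list[list[float]], k: int) -> tuple[list[float], list[float]]:
    """Return the kth lowest and kth highest simulated value at every distance bin."""
    b = len(simulated[0])
    lower, upper = [], []
    for j in range(b):
        values = sorted(curve[j] for curve in simulated)
        lower.append(values[k - 1])  # kth lowest value at r_j
        upper.append(values[-k])     # kth highest value at r_j
    return lower, upper


def outside_bins(observed: list[float], lower: list[float], upper: list[float]) -> list[int]:
    """Indices j at which the empirical curve wanders outside the envelope strip."""
    return [j for j, (o, lo, hi) in enumerate(zip(observed, lower, upper))
            if o < lo or o > hi]


if __name__ == "__main__":
    print(local_significance(25, 999))  # 0.05 (25th values of 999 simulations)
    print(local_significance(5, 199))   # 0.05 (5th values of 199 simulations, as in Fig. 1b)
```

The β returned here is only the local level for a single distance. Checking `outside_bins` over many bins is exactly the simultaneous-inference situation discussed next.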

However, using these pointwise simulation envelopes in goodness-of-fit testing leads to trouble. The rejection rate of this test is generally larger than the β given in Eq. 1, unless the envelope test is conducted for only one distance r (Loosmore and Ford 2006). The reason is that the pointwise envelope conducts many tests simultaneously, one for each distance r_j (j = 1, ..., b) at which S(r) is evaluated. We therefore run the risk of type I error inflation, a phenomenon known as simultaneous inference. This is well known in the statistical literature (e.g., Conover 1999, Diggle 2003, Illian et al. 2008). For point process statistics, it is discussed in detail by Loosmore and Ford (2006), Grabarnik et al. (2011), Baddeley et al. (2014), and Loop and McClure (2015).

An important class of goodness-of-fit tests with a prescribed significance level α are the so-called deviation tests, introduced by Diggle (1979) and Ripley (1979). They convert the multiple tests (i.e., the information on the various distances r_j) into a single test. The test statistic is then a single number. It compares the expected summary function S(r_j) with the summary function Ŝ_i(r_j) estimated from the observed data (i = 0) or from the ith simulation of the null model, over all distances r_j of interest. Typically, it is the maximum absolute deviation

$$T_i = \max_{j=1,\dots,b} \left| S(r_j) - \hat{S}_i(r_j) \right| \tag{2}$$

or the corresponding sum of squared differences

$$u_i = \sum_{j=1}^{b} \left( S(r_j) - \hat{S}_i(r_j) \right)^2 . \tag{3}$$

The hypothesis is rejected if the empirical test statistic T_0 (or u_0) has an extreme position within the ordered series of all T_i (or u_i). Following Eq. 1 for the one-sided test, this is the case if T_0 (or u_0) is larger than the corresponding kth largest simulated value. Inverting Eq. 1 gives k = α(1 + s). The global significance level α and the number s of simulations must therefore be chosen so that k is (approximately) an integer.
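The deviation tests of Eqs. 2 and 3 can be sketched as follows. The sketch combines them with the one-sided rank rule k = α(1 + s). It is a minimal Python illustration under the assumption that curves are plain lists over the same distance bins. The helper names are hypothetical and do not refer to any particular package. For the MAD statistic, the returned critical value is also the half-width of the constant-width global envelope S(r) ± T_(k).

```python
"""Deviation tests (Eqs. 2 and 3) with the one-sided rank rule k = alpha * (s + 1)."""


def mad_statistic(theory, curve):
    """Eq. 2: maximum absolute deviation over all distance bins."""
    return max(abs(t - c) for t, c in zip(theory, curve))


def dclf_statistic(theory, curve):
    """Eq. 3: sum of squared deviations over all distance bins."""
    return sum((t - c) ** 2 for t, c in zip(theory, curve))


def deviation_test(theory, observed, simulated, alpha=0.05, statistic=mad_statistic):
    """Return (reject, critical value) for a Monte Carlo deviation test.

    theory    -- S(r_j), the expected summary function under the null model
    observed  -- S-hat_0(r_j), estimated from the data
    simulated -- list of s curves S-hat_i(r_j) from null-model simulations
    """
    s = len(simulated)
    k_real = alpha * (s + 1)
    k = round(k_real)
    if k < 1 or abs(k_real - k) > 1e-9:
        raise ValueError("choose alpha and s so that alpha * (s + 1) is a positive integer")
    t0 = statistic(theory, observed)
    # Simulated statistics in decreasing order; index k - 1 is the kth largest.
    t_sim = sorted((statistic(theory, c) for c in simulated), reverse=True)
    critical = t_sim[k - 1]
    return t0 > critical, critical
```

With α = 0.05 and s = 199, the rule picks the 10th largest simulated statistic. With s = 100, α(1 + s) = 5.05 is not an integer, and the sketch refuses to run rather than silently rounding.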
For example, for α = 0.05 and s = 199 simulations of the null model, k = 10. Tests of this type are described in detail in Baddeley et al. (2014) and in textbooks such as Diggle (2003) and Illian et al. (2008). The test using Eq. 2 is called the MAD (maximum absolute deviation) test. It corresponds to global envelopes of constant width centered on the expected summary function S(r), with a half-width equal to the kth largest of the T_i (i = 1, ..., s). However, these envelopes are not flexible enough to represent the behavior of Ŝ_i(r_j) at different distances if the distribution of Ŝ_i(r_j) is not the same for all r. The test using Eq. 3 is called the integral deviation test or Diggle–Cressie–Loosmore–Ford (DCLF) test (Baddeley et al. 2014), and it does not lead to envelopes. By construction, therefore, these tests do not offer envelopes that could help the ecologist find out which scales r are relevant.

Outline of the AGE test

The AGE test is based on three main ingredients. The first ingredient is the observation that estimators Ŝ(r) of summary functions such as the pair correlation function approximately follow a normal distribution for fixed distance r. This distribution has mean S(r) and standard deviation σ_S(r). If Ŝ(r) is normally distributed, we can estimate the pointwise lower and upper envelopes S_p^−(r) and S_p^+(r), respectively, as

$$S_p^-(r) = S(r) - z_\beta\,\sigma_S(r), \qquad S_p^+(r) = S(r) + z_\beta\,\sigma_S(r), \tag{4}$$

where z_β is the critical value for the (local) pointwise significance level β (e.g., z_β = 1.96 for β = 0.05). The width of the pointwise envelope strip is therefore entirely determined by σ_S(r) and z_β. The second ingredient is a formula that approximates the variance of Ŝ(r) as a function of the settings of the AGE test (Eq. 5). These settings include the area A and perimeter length U of the observation window and the number n of points. The formula requires the assumption that Ŝ(r) is unbiased.
The third ingredient is the independence of the values of Ŝ(r) for the different distance bins r = r_j (j = 1, 2, ..., b) that cover the distance interval B of interest. The independence property yields an analytical relationship between the prescribed significance level α of the AGE test and the pointwise significance level β. The pointwise test must use this β, for a given value of b, to achieve α. Thus, for a given value of b, we can determine the value of β (and therefore the z_β of Eq. 4) that corresponds to the prescribed global significance level α (Eq. 8). In summary, Eqs. 4, 5, and 8 allow us to estimate the width of the envelope strip directly, without any simulations. The width is expressed as a function of the settings of the AGE test, such as α, r, n, b, A, and U.

Facts on the variability of estimators of summary functions

Asymptotic normality of the summary functions. The AGE test is based on studies of the variability of estimators Ŝ(r) of the summary functions used in goodness-of-fit tests. A first result of such studies is that many such estimators approximately follow a normal distribution for fixed r. For example, Fig. 2 shows that the distributions of ĝ(r) and L(r) of a Thomas cluster process (Appendix S1) can be well approximated by normal distributions. Furthermore, Figs. S1–S3 in Appendix S1 show several examples for typical point processes relevant for ecological questions. The authors have often observed such behavior in their statistical work and believe it is somewhat intuitive knowledge in the statistical community. Fortunately, theoretical work on central limit theorems confirms these empirical findings. It explains them theoretically for spatially homogeneous (and isotropic) point processes (Heinrich and Klein 2014, Heinrich 2015).

Fig. 2. Variability of summary functions under a Thomas cluster process, estimated from 1000 simulations of the process. The patterns were generated in a … m observation window with parameters λ = 0.025, ρ = …, and σ = 6 m (for explanations see Appendix S1). (a) Distribution of the values of ĝ(4.5). The red bars indicate the 2.5th and 97.5th percentiles of the distribution [the 25th lowest and highest values of ĝ(4.5)], and the blue bar indicates the mean value. The red line is a fit by a normal distribution. (b) Same as (a), but for distance r = 10.5 m. (c) Comparison of the pointwise simulation envelopes (black lines; k = 25 and β = 0.05 in Eq. 1) with those predicted by Eq. 4 (red lines). We used a bandwidth h = 1.5 m. (d) Same as (a), but for the L-function at r = 5 m. (e) Same as (a), but for the L-function at r = 11 m. (f) Same as (c), but for the L-function. (g) Same as (a), but for the distribution function D(r) of the distances to the nearest neighbor, at r = 5 m. (h) Same as (a), but for D(r) at r = 11 m. (i) Same as (c), but for D(r).

Analytical formula for the variance of the estimator of the summary functions. A second result is (approximate) knowledge of the variance of Ŝ(r). Simple formulas for the standard deviations of ĝ(r) and L(r) exist for many null models; they are given and discussed below. If an estimator Ŝ(r) of the summary function S(r) is normally distributed with mean S(r) and standard deviation σ_S(r), we can construct the pointwise lower and upper envelopes S_p^−(r) and S_p^+(r) with Eq. 4, without simulation. The corresponding pointwise simulation envelopes will be close to those of Eq. 4 if β is chosen according to Eq. 1 (see Figs. 1b, c and 2c, f, i).

We now consider the standard deviations σ_g(r) and σ_L(r) of ĝ(r) and L(r), respectively. Stoyan et al. (1993) and Illian et al. (2008:234 and Eq. …) present a formula that approximates the standard deviation σ_g(r) of the estimator of the pair correlation function. It applies to point processes that are not too strongly clustered. It depends on the number n of points of the pattern, the distance r, the bandwidth h of the box kernel used for estimating g(r), and the geometry of the observation window. For the standard deviation needed in Eq. 4, this formula yields

$$\sigma_g(r) = \sqrt{\frac{g(r)\,A^2}{n^2\,(2\pi r h)\,\gamma_W(r)}} = \frac{\sqrt{A}}{n}\,\sqrt{\frac{g(r)}{2\pi h r}}\,\sqrt{\frac{A}{\gamma_W(r)}} , \tag{5}$$

where A is the area of the observation window W and γ_W(r) is the so-called isotropized set covariance (Illian et al. 2008:485). The term A/γ_W(r) is the edge-correction weight of the Ohser estimator of g(r) (Illian et al. 2008:230) and accounts for the influence of the shape of W. For a rectangular W with perimeter length U and area A, γ_W(r) ≈ A − rU/π + r²/π for r not too large (Illian et al. 2008:486). Figs. 1b, 3a, c and Fig. S2 in Appendix S2 show the high quality of the approximation by Eq. 5.

Together, Eqs. 4 and 5 show directly, without any simulations, how the width of the pointwise envelope strip for the pair correlation function behaves:
(1) it is proportional to 1/n (Fig. 3c);
(2) it decreases with the square root of the bandwidth h;
(3) it decreases with the square root of distance r; and
(4) it is proportional to the square root of the area A of the observation window W.

The influence of the geometry of W is captured by the factor √(A/γ_W(r)). In particular, the width of the envelope strip increases for long and narrow observation windows, where U becomes large. For example, for a … m observation window and distance r = 50 m, the factor √(A/γ_W(r)) is 1.069. For a narrow transect of 20 × 12,500 m it is 20% larger. For r = 250 m and W = … m it is 1.5, and for r = 359 m it increases to 2.0. This result confirms the common rule of thumb that g(r) should not be explored for distances larger than half of the smallest side of W (e.g., Haase 1995).

Appendix S2 derives Eq. 5 following Stoyan et al. (1993). The derivation rests on a heuristic assumption about the number N_r of pairs of points in the window that are a distance r ± h apart: N_r follows a distribution with mean = variance. (This is true for the Poisson distribution, but this special distribution is not necessary for our calculations.) Simulations show that this is a realistic assumption for many homogeneous point process models, ranging from strongly hyperdispersed to moderately clustered patterns. It can therefore be applied in many practical situations. However, Eq. 5 becomes inaccurate for larger distances r and/or a large number n of points, as well as for small r and/or small n. In these cases the property mean = variance no longer holds or the distribution becomes skewed (see Appendix S2). A similar approximation can be used for the L-function, as we show in Appendix S2 (see also Ward and Ferrandino 1999):

$$\sigma_L(r) = \frac{0.5}{\sqrt{\pi}}\,\frac{\sqrt{A}}{n}\,\sqrt{\frac{A}{\pi r^2}\int_0^r \frac{2\pi t}{\gamma_W(t)}\,dt} \tag{6}$$

Eq. 6 yields good estimates of σ_L(r) under CSR for the range of distances of practical interest (Fig. 3b). However, even small departures from CSR lead to strong departures from Eq. 6 (see Fig. S4g, o in Appendix S1). In contrast, the corresponding Eq. 5 for the pair correlation function also holds for larger departures from CSR (see Fig. S4b, f, j, n, r in Appendix S1). This is one of our reasons for recommending the pair correlation function instead of the L- or K-function in the AGE test. Ripley (1988) also provided an approximation of the variance of K(r) under CSR.

Independence property of the summary function. The third important distributional property is independence. Consider many estimators of non-cumulative summary functions, such as the box kernel estimator of the pair correlation function used here and other kernel estimators. Their values at not too small spatial lags Δr [i.e., ĝ(r_j) and ĝ(r_j + Δr)] are approximately uncorrelated if the lag Δr is larger than twice the bandwidth h of the estimator (Fig. 4a). This is plausible because, for the box kernel, different point pairs are used to estimate ĝ(r_j) and ĝ(r_j + Δr). Fig. 4a shows the correlation coefficients for the pair correlation function for different values of r_j and Δr, taken from 1000 simulations of the CSR null model. In all cases the correlation coefficients are small. For Δr ≥ 2h = 3 m (Fig. 4d), they mostly range between 0.09 and 0.19 (Fig. 4a). The same analysis of the cumulative L-function reveals, as expected, strong correlations (Loop and McClure 2015), especially for small lags Δr and large distances r (Fig. 4b, e). To show that the independence property also holds for null models other than CSR, we analyzed the correlations of ĝ(r_j) and ĝ(r_j + Δr) for the Thomas process shown in Fig. S1u in Appendix S1.
The spatial clustering introduces some correlation for lags Δr within the range of clustering (here < 20 m) (Fig. 4f), but for larger values of Δr the correlation coefficients are below 0.2 (Fig. 4c, f). This means that one has to use a larger spacing of distance bins to ensure the approximate independence needed for the AGE test. Similar arguments apply to regular point processes. In the following we always assume a suitable spacing. As we will see below, simulation experiments show that weak correlations between values of the pair correlation function are not critical for the construction of the AGE test. Note that this does not apply to the cumulative L-function (cf. Fig. 4b, e)!

Construction of the analytical global envelope (AGE) test

We now construct global envelopes S_g^+(r) and S_g^-(r) that correspond to a prescribed global significance level α for a given distance interval B represented by b distances r_j. To this end we reinterpret Eq. 4 so that the β in Eqs. 1 and 4 is the local significance level that makes the type I error probability of the AGE test equal to α, and z_β is the corresponding critical value. To determine β we take advantage of the (approximate) independence of the values Ŝ(r_j) at the selected distances r_j. We are thus testing a hypothesis composed of b independent sub-hypotheses (one for each distance bin r_j), and we reject the hypothesis if at least one sub-hypothesis is rejected. If the type I error probability β of each sub-hypothesis is the same for all distances r_j, then the overall type I error probability α for the hypothesis over the entire distance interval B is simply

$$\alpha = 1 - (1 - \beta)^b. \qquad (7)$$

Therefore, the value of β that yields the prescribed significance level α of the AGE test is

$$\beta = 1 - (1 - \alpha)^{1/b}. \qquad (8)$$

Thus, the global envelopes S_g^+(r) and S_g^-(r) constructed by Eq. 4 with the β of Eq. 8 are the envelopes we want.
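Eqs. 7 and 8 are short enough to check directly. The Python sketch below (standard library only) computes β for given α and b, the two-sided critical value z_β under the normality assumption used in Eq. 4, and runs an idealized check in which the standardized deviations z_j at the b bins are drawn as independent standard normal variables. This mimics the assumptions behind Eq. 8, not a simulation of point patterns, and the function names are illustrative rather than taken from the authors' R script.

```python
import random
from statistics import NormalDist


def local_beta(alpha: float, b: int) -> float:
    """Eq. 8: per-bin level so that b independent bins give global level alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / b)


def global_alpha(beta: float, b: int) -> float:
    """Eq. 7: global type I error of b independent per-bin tests at level beta."""
    return 1.0 - (1.0 - beta) ** b


def critical_z(beta: float) -> float:
    """Two-sided standard normal critical value z_beta (normality assumed)."""
    return NormalDist().inv_cdf(1.0 - beta / 2.0)


def simulated_global_error(alpha: float, b: int, n_rep: int = 20000, seed: int = 1) -> float:
    """Idealized check: draw i.i.d. N(0, 1) deviations z_j for b bins and
    reject if any |z_j| exceeds z_beta; return the empirical rejection rate."""
    rng = random.Random(seed)
    z_crit = critical_z(local_beta(alpha, b))
    rejections = 0
    for _ in range(n_rep):
        if any(abs(rng.gauss(0.0, 1.0)) > z_crit for _ in range(b)):
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    for b in (1, 20, 50):
        beta = local_beta(0.05, b)
        print(b, round(beta, 5), round(critical_z(beta), 2),
              simulated_global_error(0.05, b))
```

For b = 1 the sketch returns the pointwise values β = 0.05 and z_β ≈ 1.96, and the empirical rejection rate stays close to α = 0.05 for every b, which is what Eq. 8 promises as long as the bins are independent.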
They indicate departure from the null model with the prescribed significance level α if the empirical summary function wanders outside the envelope strip at one or more distances r_j, and so provide the desired intuitive assessment of scale effects. In addition to simulation-free envelopes, our approach also allows analytical calculation of the p-value. For its determination we first compute the maximum value z of all z_j = |Ŝ(r_j) − S(r_j)|/σ_S(r_j), taken over the b distances r_j used in the AGE test. Based on z we then compute the p-value of the pointwise envelope test, i.e., a local p-value p_loc = 2(1 − Φ(z)), where Φ is the cumulative distribution function of the standard normal distribution. Finally, the p-value of the AGE test is calculated analogously to Eq. 7 as

$$p_{\mathrm{global}} = 1 - (1 - p_{\mathrm{loc}})^b. \qquad (9)$$

Fig. 3. Standard deviation of ĝ(r) and L(r) under complete spatial randomness (CSR). (a) Analytical approximation of the standard deviation σ_g(r) of ĝ(r) (Eq. 5; bold gray line) and σ_g(r) estimated from 1000 simulations of the CSR null model (circles). We used h = 2.5 m, A = … m, and n = 626. (b) Same as (a), but for the L-function. (c) Standard deviation of ĝ(r) over the 1000 simulations of CSR as a function of n. (d) Same as (c), but for the L-function.

The SBGE test

In some circumstances not all assumptions of the AGE test hold, and simulations can help. For example, if the moments S(r) and σ_S(r) cannot be determined analytically, as assumed so far, they can be estimated by simulating the underlying point process model. This is a simple variant of the AGE test that expands the range of point process models that can be handled. Note that this is not an application of Monte Carlo testing! More importantly, we present an SBGE test that can be conducted in exactly the same way as the classical pointwise approach and requires only the independence assumption. Like the AGE test, the SBGE test uses Eq. 8 to determine the value of β required for the prescribed significance level α, but it then inverts Eq. 1 to obtain the value of k that corresponds to this β:

$$k = \begin{cases} \beta(1+s)/2 & \text{if the test is two-sided,} \\ \beta(1+s) & \text{if the test is one-sided.} \end{cases} \qquad (10)$$

However, s and β must be selected such that k is close to an integer value.
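Eqs. 9 and 10 are one-liners once β is available. The sketch below, again standard-library Python with illustrative names, computes the local and global p-values from the maximum standardized deviation and the SBGE rank k. The envelope helper assumes the usual rank convention for pointwise simulation envelopes (the k-th smallest and k-th largest of the s simulated values at each r_j); Eq. 1 itself is not reproduced here, so treat that convention as an assumption.

```python
from statistics import NormalDist


def age_p_values(s_hat, s_mean, s_sd):
    """Eq. 9 sketch: local and global p-values from the maximum |z_j|.
    s_hat, s_mean, s_sd are sequences over the b distance bins r_j."""
    b = len(s_hat)
    z = max(abs(x - m) / sd for x, m, sd in zip(s_hat, s_mean, s_sd))
    p_loc = 2.0 * (1.0 - NormalDist().cdf(z))
    p_global = 1.0 - (1.0 - p_loc) ** b
    return z, p_loc, p_global


def sbge_k(alpha, b, s, two_sided=True):
    """Eq. 10 sketch: (non-integer) rank k for s simulations and b bins."""
    beta = 1.0 - (1.0 - alpha) ** (1.0 / b)  # Eq. 8
    return beta * (1 + s) / 2.0 if two_sided else beta * (1 + s)


def sbge_envelopes(simulated, k):
    """k-th smallest and k-th largest simulated value at each bin.
    simulated: list of s curves, each a list of b values (assumed convention)."""
    lower, upper = [], []
    for values in zip(*simulated):
        ordered = sorted(values)
        lower.append(ordered[k - 1])
        upper.append(ordered[-k])
    return lower, upper


if __name__ == "__main__":
    # b = 25 bins, alpha = 0.05: k is about 1.02 for s = 999, but below 1 for s = 199
    print(sbge_k(0.05, 25, 999), sbge_k(0.05, 25, 199))
```

The usage line reproduces the statement about Table 2: with b = 25 and α = 0.05, a rank of at least k = 1 becomes available only at about 999 simulations.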
Because the simulation-based SBGE test does not use Eq. 4, it does not require the normality assumption. Table 2 shows appropriate values of k for different numbers s of simulations and different numbers b of distance bins. It also shows that the SBGE test generally needs a large number of simulations to obtain at least k = 1. For example, an SBGE test with b = 25 distance bins and α = 0.05 requires at least 999 simulations of the null model (Table 2). The SBGE test offers the user flexibility in the choice of point process models, but it retains the advantage of estimating β analytically with Eq. 8. Of course, S(r) has to be a non-cumulative summary function.

Fig. 4. (a) Correlation coefficients between ĝ(r_j) and ĝ(r_j + Δr) for different distances r_j and different spatial lags Δr, taken from 1000 pair correlation functions estimated for the CSR null model with n = 626 points within a … m observation window. The pair correlation functions were estimated at distances r_j = 1.5, 2.5, …, 50.5 m with bandwidth h = 1.5 m. (b) Same as (a), but for the L-function, estimated at distances r_j = 5, 10, 15, …, 50 m. (c) Same as (a), but for the Thomas process shown in Fig. S1u of Appendix S1. (d) Average correlation coefficient for CSR at lag Δr over all distances r_j = 1.5, 2.5, …, 50.5 m. Note that the correlations are large if the lag Δr is smaller than twice the bandwidth h. (e) Same as (d), but for the L-function. (f) Same as (d), but for the Thomas process.

Evaluation of the AGE test

We used simulations to check the quality of the AGE test and the predictions of Eqs. 4 and 8. In a first simulation experiment we tested whether the empirical type I error of the AGE test is really close to the nominal level α. To do this we generated 10,000 point patterns under CSR (627 points within a … m observation window) and applied the AGE test based on Eqs. 4, 5, and 8 to the pair correlation function (with bandwidth h = 2 m). The supplement provides the R script used for the AGE test. In a second simulation experiment we applied the Monte Carlo test proposed by Myllymäki et al. (in press) to check the robustness of the analytical estimate of β in Eq. 8 (and of the resulting value of z_β used in Eq. 4) with respect to the independence assumption and the number b of distances r_j.
We used 1000 simulations of the CSR null model and conducted the test (1) with the pair correlation function, different values of b, and two different spacings of the distance bins (Δr = 1 m and 5 m), and (2) with the L-function and different values of b.

Results

Fig. 5 shows, for the forest data, the AGE envelopes S_g^+(r) and S_g^-(r) (red circles; α = 0.05) together with the analytical pointwise envelopes (blue circles) for the pair correlation function. As expected, the AGE envelope strip is clearly wider than the pointwise envelope strip (cf. red and blue circles), because the local significance level β required by Eq. 8 to obtain the prescribed global α for b = 50 is smaller than the β assumed for the pointwise envelope test. For b = 50 and α = 0.05, Eq. 8 predicts β ≈ 0.00103, which corresponds to a critical value z_β ≈ 3.28, compared with the pointwise z_β = 1.96. We also find that the AGE envelopes are in excellent agreement with the corresponding simulation envelopes (gray lines in Fig. 5). Additionally, because Eq. 4 contains the standard deviation σ_g(r) of the summary function under the null model, we can also assess the influence of n, h, A, U, and r on the width of the envelope strip (see the discussion after Eq. 5). Finally, we note that for g(r) the envelopes of the null model can be determined without simulation for many point processes (Fig. S4 in Appendix S1) if the theoretical g(r) is known. Otherwise, the SBGE test, or the variant of the AGE test that uses simulations to determine the mean S(r) and the standard deviation σ_S(r) of the summary function, is recommended.

Table 2. Values of k required to obtain a prescribed significance level α = 0.05 in the two-sided SBGE test (Eqs. 8 and 10) as a function of the number b of distance bins over which the test is conducted and the number s of simulations of the null model. Columns: b; β; k required to obtain the prescribed α = 0.05 for s = 39, s = 199, s = 999, s = 1999, and s = ….
Note: Suitable values of k do not exist for all combinations of b and s because k has to be an integer. The values of k presented in the table yield values of α between … and …. b = 1 is the value of b assumed in the pointwise test.

The influence of b on the width of the envelope strip

Eq. 8 predicts the values of β that the AGE test requires for a given number b of distance bins r_j to obtain the prescribed significance level α (gray line in Fig. 6a). These values of β apply to all summary functions and null models as long as the Ŝ_i(r) are independent across the b distance bins r_j used to cover the distance interval B. Eq. 8 also allows us to explore how the critical value z_β of the AGE test (Eq. 4) depends on b. This is of interest because the width of the envelope strip is proportional to z_β (see Eq. 4).
For each value of β resulting from Eq. 8 we determined the corresponding critical value z_β of the standard normal distribution. The gray line in Fig. 6c shows how z_β depends on the number b of distance bins r_j. We find that, for α = 0.05, z_β increases approximately linearly with ln(b) (Fig. 6c), i.e., quickly for small values of b and more slowly for larger values of b. These values of z_β apply if the estimates Ŝ_i(r) are independent for different r and normally distributed for fixed r.
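Because the curve in Fig. 6c follows from Eq. 8 alone, it can be tabulated without any point pattern data. The sketch below does this for α = 0.05 (two-sided, normality assumed), reports the widening of the envelope strip relative to the pointwise test (b = 1), and also plugs the pointwise β = 0.05 into Eq. 7 to show the actual global type I error of an uncorrected pointwise test under independence. Names are illustrative.

```python
import math
from statistics import NormalDist


def z_beta_for(b, alpha=0.05):
    """Critical value z_beta for b independent bins (Eq. 8, two-sided)."""
    beta = 1.0 - (1.0 - alpha) ** (1.0 / b)
    return NormalDist().inv_cdf(1.0 - beta / 2.0)


def pointwise_global_error(b, beta=0.05):
    """Eq. 7 with the pointwise beta: chance that at least one of b bins rejects."""
    return 1.0 - (1.0 - beta) ** b


for b in (1, 2, 5, 10, 20, 50, 100):
    z = z_beta_for(b)
    print(f"b={b:4d}  ln(b)={math.log(b):5.2f}  z_beta={z:4.2f}  "
          f"width ratio={z / z_beta_for(1):4.2f}  "
          f"pointwise global error={pointwise_global_error(b):4.2f}")
```

The table reproduces the values used in the text (z_β ≈ 1.96 for b = 1 and ≈ 3.02 for b = 20, a widening of roughly 50%), shows that each doubling of b adds less to z_β than the previous one, and shows that an uncorrected pointwise test over b = 50 independent bins rejects a true null model with probability of about 0.92.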

The analytical relationship between z_β and b is the heart of the AGE and SBGE tests, because it defines the envelopes in Eq. 4, and it has practical consequences. First, because the abscissa in Fig. 6c has a logarithmic scale, we obtain the rule of thumb that z_β (and thereby the width of the envelope strip) increases approximately with the logarithm of b as the number b of distance bins r_j increases. Therefore, a priori information on the scales at which departures from the null hypothesis are expected is useful, because it can reduce b and thereby increase the power of the AGE and SBGE tests. Second, because the critical value z_β of the AGE and SBGE tests must coincide for b = 1 with the z_β of the pointwise envelope test, we can assess the error made by the pointwise test. For example, when b = 20 distance bins are used, the envelope strip becomes about 50% wider going from the pointwise to the global envelopes (because z_β increases from 1.96 to 3.02 for α = 0.05). Strong departures of the empirical summary function from the pointwise envelopes will therefore generally be confirmed by the AGE test, but weak departures, where the empirical summary function wanders just slightly outside the pointwise envelopes, will not be confirmed.

Fig. 5. Different simulation envelopes for S. turbacensis. Empirical pair correlation function ĝ(r) (black circles), pointwise analytical envelopes S_p^+(r) and S_p^-(r) for α = 0.05 (blue circles), expectation under complete spatial randomness (CSR) (horizontal black line), and analytical global envelopes S_g^+(r) and S_g^-(r) (red circles) taken over the … m distance interval with a distance bin of 1 m (i.e., b = 50). The corresponding simulation envelopes based on 999 simulations of the CSR null model are shown as gray lines behind the analytical envelopes. The bandwidth was h = 2.5 m.

Evaluation of the AGE test

The global significance level α = 0.052 estimated from 10,000 replicates of the AGE test under CSR is in very good agreement with the theoretical value α = 0.05. In Fig. 6a we used a simulation procedure for different numbers b of distance bins r_j to test the predictions of Eq. 8 for β. Because we used a bandwidth h = 2.5 m in these simulations, we first evaluated the pair correlation function at b distance bins r_j = 2.5, 7.5, … m (i.e., r_j − r_{j−1} = 2h) to ensure independence between g(r_j) and all g(r_j + Δr). The simulations show that the resulting β and z_β values for the pair correlation function (red circles) indeed agree well with the predictions (gray line) (Fig. 6a, c). Interestingly, the predictions remain good even for r_j − r_{j−1} = 1 m (black circles); the AGE test is thus robust against some correlation at short lag distances (Fig. 6a, c). In contrast, the cumulative nature of the L-function leads to strong correlations over many distances (Fig. 4b, e), which results in substantially higher values of β (Fig. 6b) and substantially lower values of z_β (Fig. 6d; see also Loop and McClure 2015). For this reason we recommend the non-cumulative pair correlation function (instead of the cumulative L-function) as the summary function, because then β and z_β can be determined analytically and a test with theoretically well-understood properties can be constructed.

Fig. 6. Factors influencing the local significance level β of the pointwise test (Eq. 8) and the corresponding critical value z_β required to obtain the prescribed significance level α = 0.05 of the Analytical Global Envelope test. (a) Analytical estimates of β (Eq. 8) as a function of the number b of distance bins r_j used for the tested interval (gray lines) and simulation results for g(r) based on 1000 simulations of complete spatial randomness (black and red circles). We evaluated the pair correlation function at 1 m steps r_j = 1.5, 2.5, … m (black circles) or at 5 m steps r_j = 2.5, 7.5, … m (red circles). The bandwidth was h = 2.5 m. The horizontal line gives the significance level β = 0.05 of the pointwise envelope test. (b) Same as (a), but for the L-function. (c) Same as (a), but for the critical value z_β. The black line shows a fit of z_β(b) that is linear in ln(b), and the horizontal line gives the critical value z_β = 1.96 of the pointwise envelope test. (d) Same as (c), but for the L-function.

Discussion

This article presents a simple and elegant solution to a long-standing problem in point process statistics: the construction of envelopes with prescribed significance level α for goodness-of-fit tests. These envelopes have the desired property that the null model is rejected with the prescribed global significance level α if the empirical summary function wanders outside the envelopes at some distance r. Additionally, we show that these envelopes can be determined without simulation. We obtained this result not from simulations (as in Ripley's CSR test; Ripley 1988) but by mathematical reasoning, combining central limit theorems, variance approximations, and empirical knowledge of the independence of pair correlation function estimators at different distances r_j. Simulations confirmed a posteriori that our calculations are correct. However, we also present an SBGE test that is based only on the independence assumption and can be applied in exactly the same way as the traditional pointwise simulation envelopes; only s and k have to be chosen in a new way. The SBGE test requires a smaller ratio k/s, which, with integer k, can be obtained only with larger numbers s of simulations of the null model than in the pointwise test (Table 2, Eichhorn 2010).

The significance-level problem arises from multiple testing, and Fig. 5 demonstrates its effect well: the envelope strip of the AGE test with the correct significance level α is clearly wider than that of the pointwise test, and the widening grows approximately with ln(b), where b is the number of distance bins over which the test is conducted.
Thus, marginal departures of the empirical summary function from the pointwise envelopes will in most cases lead to spurious rejection of the null hypothesis. This result is especially relevant in light of a recent review by Velázquez et al. (in press), which showed that only 12% of the ecological studies reviewed used some correction for type I error. For example, the CSR hypothesis has probably been rejected too often.

The analytical approach allowed us to derive mathematical formulas for the width of the envelope strip that provide precise information on the role of test characteristics such as the number n of points, the area A and perimeter length U of the observation window, the number b of distance bins used in the AGE test, and other settings of the estimator of the summary function. As we will see below, some of our results call into question practices that are typically encountered in ecological applications. Eqs. 4, 5, 6, 8 and 10 give valuable information on envelope tests to be considered in practical applications (note that points 1–3 also apply to the pointwise envelopes):

1. The standard deviation of the pair correlation function g(r) (and of the L-function), and thereby the width of the envelope strip, is for a wide range of point processes most strongly determined by the number n of points in the pattern, and it scales approximately with 1/n (Eq. 5). This casts substantial doubt on the practice in ecological statistics of working with small samples of, say, fewer than 100 points (Velázquez et al., in press).
2. Because the standard deviation of g(r) scales with √g(r)/n (Eq. 5), regular patterns can be analyzed with a lower number of points n than aggregated patterns. If the number of points n is low, the variance of g(r) becomes large; this effect can be counterbalanced to a certain extent by using a larger bandwidth h.
3. The geometry of the observation window influences the variance of the pair correlation function and therefore the width of the envelope strip. In particular, long and narrow observation windows produce wide envelope strips. The isotropized set covariance γ_W(r), which represents the geometry of the observation window (Eq. 5), allows us to assess the strength of this effect directly.
4. The width of the global envelope strip increases approximately logarithmically with the number b of distance bins r_j that define the distance interval B over which the AGE or SBGE test is conducted (Fig. 6c), and therefore the probability of detecting significant departures from the null model within a specific distance interval declines with increasing b. This means that short intervals B are useful; they can be chosen if a priori information is available on the scales at which departures from the null hypothesis are expected. In addition, the spacing of the distance bins should be adjusted to obtain approximate independence between the corresponding values of the pair correlation function.

We notice that Eq. 5 predicts the variance of g(r) very well for highly regular to moderately aggregated point processes (Fig. S4 in Appendix S1).
One can therefore apply the AGE test to a wide range of point processes relevant in ecology. We also note that the problem of choosing a suitable number s of simulations of the null model vanishes, since the AGE test is simulation-free. This problem has been discussed, for example, by Loosmore and Ford (2006) and Grabarnik et al. (2011), and a recent literature review by Velázquez et al. (in press: Fig. 5f) revealed that in methodological studies the values of s ranged from as low as 19 or 39 (e.g., Baddeley et al. 2014) to as large as 10,000 (Goreaud and Pélissier 2003).

Assumptions and caveats of our approach

The AGE test can be applied in all cases where the distribution of the unbiased estimator of the summary function S(r) satisfies three fundamental properties: asymptotic normality, approximate independence for sufficiently spaced r_j, and the possibility of approximating the estimation variance. We believe that this may also be shown for cases where S(r) is a probability density function of, say, the nearest-neighbor distance distribution function D(r) [or G(r)] or the spherical contact distribution function H_s(r) [or F(r)]. Otherwise, the simulation-based SBGE test can be applied. While the asymptotic normality of the summary function under the null model holds in a wide range of cases, there are nevertheless situations where the distribution of the summary function under the null model is not symmetric. This happens, for example, for the pair correlation function g(r) if the number n of points is low and the distance r is small (see Appendix S2). Departures of the distribution of Ŝ(r) from normality may also occur for the nearest-neighbor summary functions D(r) and H_s(r). In such situations the simulation-based SBGE test can help, because it makes no assumptions about the distribution of Ŝ(r). The AGE test and its associated p-values are somewhat sensitive to user-defined settings such as the number b of distance bins.
This may create a temptation to exploit researcher degrees of freedom, i.e., post hoc tinkering with test parameters to obtain a more pleasing p-value. While this is true and bears the danger of misuse, the p-value, if used appropriately, nevertheless provides important additional information for the evaluation of the test. The graphical display of the AGE test (as in Fig. 5) shows clearly whether the empirical summary function wanders only marginally outside the simulation envelope, a case where the ecological significance
