Christophe Ange Napoléon Biscio and Rasmus Waagepetersen


CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING

RESEARCH REPORT No. 05, April 2018

Christophe Ange Napoléon Biscio and Rasmus Waagepetersen

A general central limit theorem and subsampling variance estimator for α-mixing point processes

A general central limit theorem and subsampling variance estimator for α-mixing point processes

Christophe Ange Napoléon Biscio and Rasmus Waagepetersen

Department of Mathematical Sciences, Aalborg University, Denmark

Abstract

We establish a central limit theorem for multivariate summary statistics of non-stationary α-mixing spatial point processes and a subsampling estimator of the covariance matrix of such statistics. The central limit theorem is crucial for establishing asymptotic properties of estimators in statistics for spatial point processes. The covariance matrix subsampling estimator is flexible and model free. It is needed e.g. to construct confidence intervals and ellipsoids based on asymptotic normality of estimators. We also provide a simulation study investigating an application of our results to estimating functions.

Keywords: α-mixing, central limit theorem, estimating function, random field, spatial point process, subsampling, summary statistics.

1 Introduction

Let $X$ denote a spatial point process on $\mathbb{R}^d$ observed on some bounded window $W \subseteq \mathbb{R}^d$. In statistics for spatial point processes, much interest is focused on possibly multivariate summary statistics or estimating functions $T_W(X)$ of the form

$$T_W(X) = \sum^{\neq}_{u_1,\ldots,u_p \in X \cap W} h(u_1,\ldots,u_p) \quad (1.1)$$

where $h : (\mathbb{R}^d)^p \to \mathbb{R}^q$, $p, q \ge 1$, and the $\neq$ signifies that summation is over pairwise distinct points. Central limit theorems for such statistics have usually been developed using either of the two following approaches, both based on assumptions of α-mixing. One approach uses Bernstein's blocking technique and a telescoping argument that goes back to Ibragimov and Linnik (1971, Chapter 18, Section 4). This approach has been used in a number of papers such as Guan and Sherman (2007), Guan and Loh (2007), Prokešová and Jensen (2013), Guan et al. (2015), and Xu et al. (2018).
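As a concrete illustration (ours, not from the paper), the statistic (1.1) for $p = 2$ can be evaluated by brute force. The choice of $h$ below, an indicator that two points are closer than $r$, is hypothetical; it is an unnormalised building block of Ripley's K-function.

```python
import numpy as np

def pairwise_statistic(points, h):
    """Brute-force evaluation of T_W(X) = sum over ordered pairs of
    pairwise DISTINCT points (u1, u2) of h(u1, u2), i.e. (1.1) with p = 2."""
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:          # the sum in (1.1) excludes u1 = u2
                total += h(points[i], points[j])
    return total

# Hypothetical h: indicator that the pair is closer than r = 0.1.
h_close = lambda u, v: float(np.linalg.norm(u - v) <= 0.1)

rng = np.random.default_rng(1)
pts = rng.random((100, 2))      # 100 points in the window W = [0, 1]^2
print(pairwise_statistic(pts, h_close))
```

For a vector-valued $h$ ($q > 1$) the accumulator would simply be a NumPy array; the quadratic loop is only meant to make the definition of the sum over distinct tuples explicit.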
The other approach is due to Bolthausen (1982), who considered stationary random fields and whose proof was later generalised to non-stationary random fields by Guyon (1995) and Karácsony (2006). This approach is e.g. used in Waagepetersen and Guan (2009), Coeurjolly and Møller (2014), Biscio and Coeurjolly (2016), Coeurjolly (2017),

and Poinas et al. (2017). Regarding the point process references mentioned above, it is characteristic that essentially the same central limit theorems are (re-)invented again and again for each specific setting and statistic considered. We therefore find it useful to provide a unified framework to state, once and for all, a central limit theorem under general non-stationary settings for multivariate point process statistics $T_W(X)$ admitting certain additive decompositions. We believe this can save a lot of work and tedious repetition in future applications of α-mixing point processes. The framework of α-mixing is general and easily applicable to e.g. Cox and cluster point processes and a wide class of determinantal point processes (DPPs) (Poinas et al., 2017). For certain model classes other approaches may be more relevant. For Gibbs processes it is often convenient to apply central limit theorems for conditionally centered random fields (Jensen and Künsch, 1994; Coeurjolly and Lavancier, 2017), while Heinrich (1992) developed a central limit theorem specifically for the case of Poisson cluster point processes using their strong independence properties.

Consider for example (1.1) and assume that $\{C(l)\}_{l \in L}$ forms a disjoint partitioning of $\mathbb{R}^d$. Then we can decompose $T_W(X)$ as

$$T_W(X) = \sum_{l \in L} f_{l,W}(X) \quad (1.2)$$

with

$$f_{l,W}(X) = \sum_{u_1 \in X \cap C(l) \cap W} \; \sum^{\neq}_{u_2,\ldots,u_p \in (X \cap W) \setminus \{u_1\}} h(u_1,\ldots,u_p).$$

Thus $T_W(X)$ can be viewed as a sum of the variables in a discrete index set random field $\{f_{l,W}(X)\}_{l \in L}$. This is covered by our set-up provided $h$ satisfies certain finite range conditions; see the following sections for details. In connection to the Bolthausen approach, we remark that Guyon (1995) does not cover the case where the function $f$ in (1.2) depends on the observation window. This kind of generalisation is e.g. needed in Jalilian et al. (2017).
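To make the decomposition (1.2) concrete, the following sketch (our illustration, with a hypothetical pair function $h$) partitions $\mathbb{R}^2$ into squares $C(l)$ of side $s$ and verifies numerically that the cell-wise terms $f_{l,W}(X)$ sum to $T_W(X)$.

```python
import numpy as np

def cell_terms(points, h, s):
    """Cell-wise terms f_{l,W}(X) of (1.2) for p = 2: the first point u1 is
    restricted to the square C(l) of side s containing it, while u2 ranges
    over all other points of X in W."""
    n = len(points)
    labels = [tuple(v) for v in np.floor(points / s).astype(int)]
    terms = {}
    for i in range(n):
        f = sum(h(points[i], points[j]) for j in range(n) if j != i)
        terms[labels[i]] = terms.get(labels[i], 0.0) + f
    return terms

rng = np.random.default_rng(2)
pts = rng.random((50, 2))
h = lambda u, v: float(np.linalg.norm(u - v) <= 0.2)   # hypothetical h

terms = cell_terms(pts, h, s=0.25)
direct = sum(h(pts[i], pts[j])
             for i in range(len(pts)) for j in range(len(pts)) if i != j)
# Summing the f_{l,W}(X) over the cells recovers T_W(X) exactly.
assert abs(sum(terms.values()) - direct) < 1e-9
```

Note that each $f_{l,W}(X)$ still depends on points outside $C(l)$ through $u_2$; the finite range conditions mentioned above limit this dependence to a bounded neighbourhood of $C(l)$.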
By considering triangular arrays, Karácsony (2006) is more general than Guyon (1995), but Karácsony (2006) on the other hand considers a combination of increasing domain and infill asymptotics that is not so natural in a spatial point process framework. Moreover, the results in Guyon (1995) and Karácsony (2006) are not applicable to non-parametric kernel estimators depending on a bandwidth converging to zero. Using Bolthausen's approach, we establish a central limit theorem that does not have these limitations. For completeness we also provide in the supplementary material a central limit theorem based on Bernstein's blocking technique and we discuss why its conditions may be more restrictive than those for our central limit theorem.

A common problem regarding application of central limit theorems is that the variance of the asymptotic distribution is intractable or difficult to compute. However, knowledge of the variance is needed for instance to assess the efficiency of an estimator or to construct confidence intervals and ellipsoids. Bootstrap and subsampling methods for estimation of the variance of statistics of random fields have been studied in e.g. Politis and Romano (1994) and Lahiri (2003). For statistics of point processes, these methods have been considered in e.g. Guan and Sherman (2007), Guan and Loh (2007), Loh (2010) and Mattfeldt et al. (2013), but they have been limited to stationary or second-order intensity reweighted stationary point processes in $\mathbb{R}^2$ and

only for estimators of the intensity and Ripley's K-function. For general statistics of the form (1.2), we adapt results from Sherman (1996) and Ekström (2008) to propose a subsampling estimator of the variance. We establish its asymptotic properties in the framework of a possibly non-stationary α-mixing point process and discuss its application to estimate the variance of point process estimating functions. The good performance of our subsampling estimator is illustrated in a simulation study considering coverage of approximate confidence intervals when estimates of intensity function parameters are obtained by composite likelihood.

In Section 2 we define notation and the different α-mixing conditions used in our paper. Section 3 states the central limit theorem based on Bolthausen's technique and the subsampling estimator is described in Section 4. Application of our subsampling estimator to estimating functions is discussed in Section 5 and is illustrated in a simulation study in Section 6. Finally, our subsampling estimator is discussed in relation to other approaches in Section 7. The proofs of our results are presented in the Appendix. A discussion of the approach based on Bernstein's blocking technique, technical lemmas, and some extensive technical derivations are provided in the supplementary material.

2 Mixing spatial point processes and random fields

For $d \in \mathbb{N} = \{1, 2, \ldots\}$, we define a random point process $X$ on $\mathbb{R}^d$ as a random locally finite subset of $\mathbb{R}^d$ and refer to Daley and Vere-Jones (2003) and Daley and Vere-Jones (2008) for measure theoretical details. We define a lattice $L$ as a countable subset of $\mathbb{Z}^d$, where $\mathbb{Z} = \mathbb{N} \cup \{0, -1, -2, \ldots\}$. When considering vertices of a lattice, we use bold letters, for instance $\mathbf{i} \in \mathbb{Z}^d$. We define

$$d(x, y) = \max\{|x_i - y_i| : 1 \le i \le d\}, \quad x, y \in \mathbb{R}^d.$$

Reusing notation we also define

$$d(A, B) = \inf\{d(x, y) : x \in A,\ y \in B\}, \quad A, B \subseteq \mathbb{R}^d.$$

For a subset $A \subseteq \mathbb{R}^d$ we denote by $|A|$ the cardinality or Lebesgue measure of $A$. The meaning of $|\cdot|$ and $d(\cdot, \cdot)$ will be clear from the context.
Moreover, for $R \ge 0$, we define $A^R = \{x \in \mathbb{R}^d : \inf_{y \in A} d(x, y) \le R\}$.

The α-mixing coefficient of two random variables $X$ and $Y$ is

$$\alpha(X, Y) = \alpha(\sigma(X), \sigma(Y)) = \sup\{|P(A \cap B) - P(A)P(B)| : A \in \sigma(X),\ B \in \sigma(Y)\},$$

where $\sigma(X)$ and $\sigma(Y)$ are the σ-algebras generated by $X$ and $Y$, respectively. This definition extends to random fields on a lattice and to point processes as follows. The α-mixing coefficients of a random field $\{Z(l)\}_{l \in L}$ on a lattice $L$ and of a point process $X$ are given for $m, c_1, c_2 \ge 0$ by

$$\alpha^Z_{c_1,c_2}(m) = \sup\{\alpha(\sigma(Z(l) : l \in I_1), \sigma(Z(k) : k \in I_2)) : I_1, I_2 \subseteq L,\ |I_1| \le c_1,\ |I_2| \le c_2,\ d(I_1, I_2) \ge m\}$$

and

$$\alpha^X_{c_1,c_2}(m) = \sup\{\alpha(\sigma(X \cap E_1), \sigma(X \cap E_2)) : E_1, E_2 \subseteq \mathbb{R}^d,\ |E_1| \le c_1,\ |E_2| \le c_2,\ d(E_1, E_2) \ge m\}.$$

Note that the definition of $\alpha^X_{c_1,c_2}$ differs from the usual definition in spatial statistics, see e.g. Waagepetersen and Guan (2009), by the use of $d(\cdot, \cdot)$ in place of the Euclidean norm. This choice has been made to ease the proofs and makes no substantial difference since all norms on $\mathbb{R}^d$ are equivalent. For a matrix $M$ we use the Frobenius norm $\|M\| = \big(\sum_{i,j} M(i,j)^2\big)^{1/2}$.

3 Central limit theorem based on Bolthausen's approach

We consider a sequence of statistics $T_{W_n}(X)$ where $\{W_n\}_{n \in \mathbb{N}}$ is a sequence of increasing compact observation windows that verify

(H1) $W_1 \subseteq W_2 \subseteq \ldots$ and $\lim_{l \to \infty} |W_l| = \infty$.

Note that we do not assume that each $W_i$ is convex nor that $\bigcup_{i=1}^{\infty} W_i = \mathbb{R}^d$, as is usually the case in spatial statistics, see e.g. Waagepetersen and Guan (2009) or Biscio and Lavancier (2017). We assume that $T_{W_n}(X)$ can be additively decomposed as

$$T_{W_n}(X) = \sum_{l \in D_n(W_n)} f_{n,l,W_n}(X) \quad (3.1)$$

where, for $n \in \mathbb{N}$, $D_n$ is a finite index set defined below, and $f_{n,l,W_n}$ is a function from the sample space of $X$ to $\mathbb{R}^q$, $q \in \mathbb{N}$. We assume that $f_{n,l,W_n}(X)$ depends on $X$ only through $X \cap W_n \cap C_n^R(l)$ for some $R \ge 0$, where $C_n(l)$ is a hypercube of side length $s_n > 0$,

$$C_n(l) = \prod_{j=1}^{d} (l_j - s_n/2,\ l_j + s_n/2], \quad l \in s_n \mathbb{Z}^d, \quad (3.2)$$

and $C_n^R(l) = (C_n(l))^R$. Thus the $C_n(l)$, $l \in s_n \mathbb{Z}^d$, form a disjoint partition of $\mathbb{R}^d$. We denote by $v_n = |C_n^R(l)|$ the common volume of the $C_n^R(l)$, and $D_n(A)$ is defined for any $A \subseteq \mathbb{R}^d$ by

$$D_n(A) = \{l \in s_n \mathbb{Z}^d : C_n(l) \cap A \ne \emptyset\}. \quad (3.3)$$

For brevity, we write $D_n$ in place of $D_n(W_n)$. Then $W_n$ is the disjoint union of the $C_n(l) \cap W_n$, $l \in D_n$. For $n \in \mathbb{N}$ and $l \in \mathbb{Z}^d$, let for ease of notation $Z_n(l) = f_{n,l,W_n}(X)$ and consider the following assumptions.

(H2) There exists $0 \le \eta < 1$ such that $s_n = |W_n|^{\eta/d}$, and if $\eta > 0$, $|D_n| = O(|W_n|/s_n^d)$. Further, there exists $\epsilon > 0$ such that $\sup_{n \in \mathbb{N}} \alpha^X_{2v_n,\infty}(s) = O(1/s^{d+\epsilon})$.

(H3) There exists $\tau > 2d/\epsilon$ such that $\sup_{n \in \mathbb{N}} \sup_{l \in D_n} E|Z_n(l) - EZ_n(l)|^{2+\tau} < \infty$.

(H4) We have $0 < \liminf_{n \to \infty} \lambda_{\min}(\Sigma_n/|D_n|)$, where $\Sigma_n = \operatorname{Var}(T_{W_n}(X))$ and $\lambda_{\min}(M)$ denotes the smallest eigenvalue of a symmetric matrix $M$.

We then obtain the following theorem.

Theorem 3.1. Let $\{T_{W_n}(X)\}_{n \in \mathbb{N}}$ be a sequence of $q$-dimensional statistics of the form (3.1). If (H1)-(H4) hold, then we have the convergence

$$\Sigma_n^{-1/2}\big(T_{W_n}(X) - ET_{W_n}(X)\big) \xrightarrow{\text{distr.}} N(0, I_q),$$

where $\Sigma_n = \operatorname{Var}(T_{W_n}(X))$ and $I_q$ is the identity matrix.

Remark 3.2. The existence of $\Sigma_n^{-1/2}$ for $n$ large enough is ensured by (H4).

Remark 3.3. In many applications we can simply take $\eta = 0$ so that $s_n = 1$. In that case, we do not require further assumptions on $D_n$. However, in applications dealing with kernel estimators depending on a bandwidth $h_n$ tending towards 0, we may have $\operatorname{Var}(T_{W_n}(X))$ of the order $|W_n| h_n^d$ (e.g. Heinrich and Klein, 2014). Then, (H4) can be fulfilled if $s_n = 1/h_n$ and $\eta > 0$ so that by (H2), $|D_n|$ is also of the order $|W_n|/s_n^d = |W_n| h_n^d$.

Remark 3.4. For a point process, moments are calculated using so-called joint intensity functions. To verify (H3) it often suffices to assume boundedness of the joint intensities up to order $2(2+\tau)$.

Remark 3.5. As presented in Section S1, the convergence in Theorem 3.1 can be proved under different assumptions using Bernstein's blocking technique. However, as explained in Section S2, the assumptions on the observation windows and on the asymptotic variance of $T_{W_n}(X)$ are more restrictive when working with Bernstein's blocking technique than with Bolthausen's approach.

4 Subsampling variance estimator

By Theorem 3.1, for $\alpha \in (0, 1)$, we may establish an asymptotic $1-\alpha$ confidence ellipsoid for $ET_{W_n}(X)$ using the $1-\alpha$ quantile $q_{1-\alpha}$ of the $\chi^2(q)$ distribution, i.e.

$$P\big(E(X) \le q_{1-\alpha}\big) \approx 1 - \alpha \quad (4.1)$$

where

$$E(X) = |D_n|^{-1}\big(T_{W_n}(X) - ET_{W_n}(X)\big)^T \Big(\frac{\Sigma_n}{|D_n|}\Big)^{-1}\big(T_{W_n}(X) - ET_{W_n}(X)\big).$$

The matrix $\Sigma_n$ is usually not known in practice.
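As a toy numerical check of (4.1) (ours, not from the paper), take $q = 1$ and assume the statistic is exactly Gaussian: the $\chi^2(1)$ quantile is the squared normal quantile, the ellipsoid reduces to an interval, and the simulated coverage is close to $1 - \alpha$.

```python
import numpy as np
from statistics import NormalDist

# For q = 1, the 0.95 quantile of chi-square(1) is the squared 0.975
# standard normal quantile (about 3.84).
q95 = NormalDist().inv_cdf(0.975) ** 2

rng = np.random.default_rng(3)
sigma2 = 4.0                                 # plays the role of Var(T_{W_n}(X))
t = rng.normal(0.0, np.sqrt(sigma2), 20000)  # T - ET, assumed Gaussian
e = t * t / sigma2                           # E(X) of (4.1) for q = 1
coverage = np.mean(e <= q95)
print(coverage)                              # close to 0.95
```

In practice $T_{W_n}(X)$ is only asymptotically Gaussian by Theorem 3.1, and $\Sigma_n$ must itself be estimated, which is exactly the purpose of the subsampling estimator below.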
Thus we suggest to replace $\Sigma_n/|D_n|$ by a subsampling estimate, adapting results from Sherman (1996) and Ekström (2008) to establish the consistency of the subsampling estimator. The setting and notation are as in Section 3 except that we only consider rectangular windows $W_n$, so that (H1) is replaced with the following assumption.

(S0) We let $\{m_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathbb{N}^d$ such that the rectangles defined by $W_n = \prod_{j=1}^{d} (-m_{n,j}/2,\ m_{n,j}/2)$ verify $W_1 \subseteq W_2 \subseteq \ldots$ and $\lim_{n \to \infty} |W_n| = \infty$.

Let $\{k_n\}_{n \in \mathbb{N}}$ be a sequence in $\mathbb{N}^d$ and consider for $t \in \mathbb{Z}^d$ the (overlapping) subrectangles

$$B_{k_n,t} = \prod_{j=1}^{d} (t_j - k_{n,j}/2,\ t_j + k_{n,j}/2), \quad (4.2)$$

and define $T_{k_n,n} = \{t \in \mathbb{Z}^d : B_{k_n,t} \subseteq W_n\}$. We want to estimate

$$\varsigma_n = \frac{\operatorname{Var}(T_{W_n}(X))}{|D_n|} = \frac{\Sigma_n}{|D_n|},$$

where $T_{W_n}(X)$ is as in (3.1). We suggest the subsampling estimator

$$\hat{\varsigma}_n = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \frac{1}{|D_n(B_{k_n,t})|} \bigg(T_{B_{k_n,t}}(X) - \frac{1}{|T_{k_n,n}|} \sum_{s \in T_{k_n,n}} T_{B_{k_n,s}}(X)\bigg)^{\otimes 2}, \quad (4.3)$$

where $x^{\otimes 2} = x x^T$. To establish consistency of $\hat{\varsigma}_n$ we consider the following assumptions.

(S1) For $j = 1, \ldots, d$, $k_{n,j} < m_{n,j}$. There is at least one $j$ such that $m_{n,j}$ goes to infinity. If $m_{n,j} \to \infty$ as $n \to \infty$, so does $k_{n,j}$, and $k_{n,j}/m_{n,j} \to 0$ as $n \to \infty$. If $m_{n,j}$ converges to a constant, then $k_{n,j}$ converges to a constant less than or equal to the previous constant. Moreover, $\max_i (k_{n,i}^d)/\prod_{i=1}^{d} (m_{n,i} - k_{n,i})$ converges towards 0 as $n$ tends to infinity.

(S2) For some $\epsilon > 0$, $\sup_{n \in \mathbb{N}} \sup_{l \in T_{k_n,n}} E\big(|Z_n(l) - EZ_n(l)|^{4+\epsilon}\big) < +\infty$.

(S3) We have, for $t \in T_{k_n,n}$,

$$\big\| |D_n|^{-1} \Sigma_n - |D_n(B_{k_n,t})|^{-1} \operatorname{Var}(T_{B_{k_n,t}}(X)) \big\| \to 0 \quad \text{as } n \to \infty,$$

and $\limsup_{n \to \infty} \lambda_{\max}(\Sigma_n/|D_n|) < \infty$, where $\lambda_{\max}(M)$ denotes the maximal eigenvalue of a symmetric matrix $M$.

(S4) We have

$$\frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \frac{1}{|D_n(B_{k_n,t})|} \Big\| ET_{B_{k_n,t}}(X) - \frac{1}{|T_{k_n,n}|} \sum_{s \in T_{k_n,n}} ET_{B_{k_n,s}}(X) \Big\|^2 \to 0 \quad \text{as } n \to \infty.$$

(S5) There exist $c > 0$ and $\delta > 0$ such that $\sup_{p \in \mathbb{N}} \alpha^X_{p,p}(m)/p \le c/m^{d+\delta}$ and, for $v_n$ as below (3.2), $v_n \prod_{j=1}^{d} (2k_{n,j} + 1)/(\max_i k_{n,i}\, s_n)^{d+\delta}$ converges towards 0 as $n$ tends to infinity.

(S6) There exist $c, \delta > 0$ and $\epsilon' > \epsilon > 0$ such that $\alpha^X_{5v_n,5v_n}(r) \le c\, r^{-5d(6+\epsilon')/(\epsilon'-\epsilon) - \delta}$.

Theorem 4.1. Let $\{T_{W_n}(X)\}_{n \in \mathbb{N}}$ be a sequence of $q$-dimensional statistics of the form (3.1). Let further $\hat{\varsigma}_n$ be defined as in (4.3) and assume that (S0)-(S6) hold. Then we have the convergence

$$E\Big\| \hat{\varsigma}_n - \frac{\Sigma_n}{|D_n|} \Big\|^2 \to 0.$$

For practical application, it would be enough to state Theorem 4.1 with convergence in probability, but the proof is easier when considering mean square convergence. Assumption (S1) ensures that the sub-rectangles are large enough to mimic the

behaviour of the point process on $W_n$ while at the same time their number grows to infinity. Assumption (S2) looks stronger than (H3) for Theorem 3.1. However, note that in (H3), $\tau$ depends on the mixing properties of the process controlled by (H2). Thus, depending on the mixing properties, (S2) is not much stronger than (H3). Assumption (S3) should hold for any process that is not too exotic and ensures that the variance of the studied statistic on each sub-rectangle is not too different from $\Sigma_n$. In particular, it holds naturally if there exists a matrix $\Sigma$ such that

$$\lim_{n \to \infty} |D_n(A)|^{-1} \operatorname{Var}(T_A(X)) = \Sigma, \quad A = W_n \text{ or } A = B_{k_n,t}, \text{ with } t \in T_{k_n,n}.$$

The condition (S4) is needed to control that the expectations over sub-rectangles $B_{k_n,t}$ do not vary too much. For instance, this assumption is automatically verified if the point process $X$ is stationary or if the statistics (1.2) are centred so that $ET_{W_n}(X) = ET_{B_{k_n,t}}(X) = 0$ for $t \in T_{k_n,n}$, see Section 5. Moreover, depending on the statistic (1.2), this assumption may also be verified if $X$ is second-order intensity reweighted stationary as assumed for the bootstrap method developed by Loh (2010). Note that (S5) includes a condition on the size of the $B_{k_n,t}$ that holds trivially if $s_n = 1$, which is usually the case if we do not consider non-parametric kernel estimators.

We use two different α-mixing conditions (S5)-(S6) to apply Theorem 4.1. Moreover, the decreasing rate in (S6) is restrictive due to the constant $5d$. Hence, the mixing conditions are stronger than for Theorem 3.1. However, in the proof of Theorem 4.1, assumption (S6) is used only to verify (C.4), which ensures the validity of assumption (i) of Theorem C.1. Depending on the problem considered, (C.4) may be verified without additional constraints on the α-mixing coefficient.
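The mechanics of the estimator (4.3) can be illustrated by a one-dimensional toy version (ours, not from the paper) with $s_n = 1$ and $T$ equal to the point count. For a homogeneous Poisson process of intensity $\lambda$, the target $\operatorname{Var}(T_B(X))/|B|$ equals $\lambda$ for every block $B$.

```python
import numpy as np

def subsample_var(points, W_len, k):
    """Toy 1-d analogue of (4.3): average over overlapping blocks B of
    length k, with integer centres, of |B| * (T_B/|B| - block mean)^2,
    where T_B is the point count in B (so s_n = 1 and |D_n(B)| = k)."""
    points = np.sort(points)
    centres = np.arange(int(np.ceil(k / 2)), int(np.floor(W_len - k / 2)) + 1)
    lo = np.searchsorted(points, centres - k / 2)
    hi = np.searchsorted(points, centres + k / 2)
    per_block = (hi - lo) / k                 # T_B(X) / |B| for each block
    return np.mean(k * (per_block - per_block.mean()) ** 2)

# Sanity check: homogeneous Poisson process of rate lam on [0, W_len],
# for which Var(count in B)/|B| = lam.
rng = np.random.default_rng(4)
lam, W_len = 5.0, 20000.0
pts = rng.uniform(0.0, W_len, rng.poisson(lam * W_len))
est = subsample_var(pts, W_len, k=25)
print(est)    # close to lam = 5
```

The block length $k$ here plays the role of $k_n$ in (S1): it must grow with the window but remain of smaller order, so that block statistics both mimic the dependence structure and are numerous enough to average over.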
For example, if we are in the setting of Biscio and Lavancier (2016, Section 4.1), where in particular $X$ is a stationary determinantal point process, then (C.4) is an immediate consequence of Biscio and Lavancier (2016, Proposition 4.2).

Remark 4.2. By Theorems 3.1 and 4.1, we may replace the confidence ellipsoid in (4.1) by a subsampling confidence ellipsoid $\hat{E}_n$, i.e.

$$P\big(\hat{E}_n(X) \le q_{1-\alpha}\big) \approx 1 - \alpha \quad (4.4)$$

where

$$\hat{E}_n(X) = |D_n|^{-1}\big(T_{W_n}(X) - ET_{W_n}(X)\big)^T \hat{\varsigma}_n^{-1}\big(T_{W_n}(X) - ET_{W_n}(X)\big).$$

Remark 4.3. Although the size of the $B_{k_n,t}$ is controlled by assumptions (S1) and (S5), we have not addressed the issue of finding their optimal size, i.e. the one ensuring the fastest convergence rate in Theorem 4.1. Concerning that problem, there are several recommendations in the literature, see for instance Lahiri (2003).

Remark 4.4. Note that the centres of the $B_{k_n,t}$ are chosen to be a subset of $\mathbb{Z}^d$, but any other fixed lattice could be used as well. Further, similarly to Loh (2010) and Ekström (2008), it is possible to extend Theorem 4.1 by relaxing the assumption in (S0) that the windows are rectangular.

5 Variance estimation for estimating functions

Consider a parametric family of point processes $\{X_\theta : \theta \in \Theta\}$ for a non-empty subset $\Theta \subseteq \mathbb{R}^q$, $q \in \mathbb{N}$. We further assume that we observe a realisation of $X_{\theta_0}$ for a $\theta_0 \in \Theta$.

To estimate $\theta$ it is common to use estimating functions of the form

$$e_n(\theta) = \sum^{\neq}_{u_1,\ldots,u_p \in X_{\theta_0} \cap W_n} h_\theta(u_1,\ldots,u_p) - \int_{W_n^p} h_\theta(u_1,\ldots,u_p)\, \rho^{(p)}_\theta(u_1,\ldots,u_p)\, du_1 \cdots du_p, \quad (5.1)$$

where $h_\theta$ is a function from $\mathbb{R}^{dp}$ into $\mathbb{R}^q$, and $\rho^{(p)}_\theta$ denotes the $p$-th order joint intensity of $X_\theta$. Then an estimate $\theta_n$ of $\theta_0$ is obtained by solving $e_n(\theta) = 0$. The case $p = 1$ is relevant if interest is focused on estimation of the intensity function $\lambda_\theta(u) = \rho^{(1)}_\theta(u)$. Several papers have discussed choices of $h$ and studied asymptotic properties of $\theta_n$ for the case $p = 1$, see for instance Waagepetersen (2007), Guan and Shen (2010), and Guan et al. (2015). A popular and simple choice is $h_\theta(u) = \nabla_\theta \lambda_\theta(u)/\lambda_\theta(u)$, where $\nabla_\theta$ denotes the gradient with respect to $\theta$. In this case $e_n$ can be viewed as the score of a composite likelihood. In the aforementioned references, the asymptotic results are of the form

$$|W_n|^{1/2} \Big(S_n^{-1}(\theta_0)\, \frac{\Sigma_{n,\theta_0}}{|W_n|}\, S_n^{-1}(\theta_0)\Big)^{-1/2} (\theta_n - \theta_0) \xrightarrow{\text{distr.}} N(0, I_q), \quad (5.2)$$

where $S_n(\theta_0) = -|W_n|^{-1} E\big(d e_n(\theta_0)/d\theta^T\big)$ and $\Sigma_{n,\theta_0} = \operatorname{Var}(e_n(\theta_0))$. The matrix $\Sigma_{n,\theta_0}$ is crucial but usually unknown. To estimate $\Sigma_{n,\theta_0}$, a bootstrap method was proposed in Guan and Loh (2007) under several mild mixing and moment conditions. However, their method has been established only for second-order intensity reweighted stationary point processes on $\mathbb{R}^2$ when $p = 1$ and for a specific function $h$. Using the theory established in Section 4, we propose a subsampling estimator of $\Sigma_{n,\theta_0}$ that may be used in a more general setting but under slightly stronger mixing conditions.

Following the notation in Sections 3 and 4, for $\theta \in \Theta$ we let $T_{W_n,\theta}(X_{\theta_0}) = e_n(\theta)$ and

$$Z_{n,\theta}(l) = \sum_{u_1 \in X_{\theta_0} \cap C(l) \cap W_n} \; \sum^{\neq}_{u_2,\ldots,u_p \in (X_{\theta_0} \cap W_n) \setminus \{u_1\}} h_\theta(u_1,\ldots,u_p) - \int_{C(l) \cap W_n} \int_{W_n^{p-1}} h_\theta(u_1,\ldots,u_p)\, \rho^{(p)}_\theta(u_1,\ldots,u_p)\, du_1 \cdots du_p.$$

Further, $\hat{\varsigma}_n(\theta)$ is defined as in (4.3), but now stressing the dependence on $\theta$. In practice, if $|W_n|/|D_n| \approx 1$, we estimate $\Sigma_{n,\theta_0}/|W_n|$ by $\hat{\varsigma}_n(\theta_n)$.
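A minimal sketch (ours, with hypothetical names and a simple quadrature rule) of the composite likelihood score, i.e. (5.1) with $p = 1$ and $h_\theta = \nabla_\theta \log \lambda_\theta$ for a log-linear intensity:

```python
import numpy as np

def cl_score(points, z, theta, grid, w):
    """Composite likelihood score e_n(theta): (5.1) with p = 1 and
    h_theta(u) = grad_theta log lambda_theta(u) = (1, z(u)) for the
    log-linear intensity lambda_theta(u) = exp(theta0 + theta1 * z(u)).
    The integral over W_n is approximated by quadrature points `grid`
    with common weight `w`."""
    zp = z(points)
    data_term = np.array([float(len(points)), zp.sum()])
    zg = z(grid)
    lam = np.exp(theta[0] + theta[1] * zg)
    integral = w * np.array([lam.sum(), (zg * lam).sum()])
    return data_term - integral

# Toy check on W = [0, 1] with covariate z(u) = u: at theta = (log 3, 0)
# the intensity is constant 3, so with 3 observed points whose covariate
# values average 0.5 the score vanishes up to rounding.
z = lambda x: np.asarray(x)
grid = (np.arange(1000) + 0.5) / 1000        # midpoint rule on [0, 1]
pts = np.array([0.2, 0.5, 0.8])
s = cl_score(pts, z, (np.log(3.0), 0.0), grid, w=1.0 / 1000)
print(s)
```

An estimate $\theta_n$ would then be obtained by driving this score to zero with a root finder; in the simulation study below this role is played by the spatstat procedure ppm.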
The validity of this relies in a standard way on a Taylor expansion

$$\hat{\varsigma}_n(\theta_n) = \hat{\varsigma}_n(\theta_0) + \frac{d}{d\theta} \hat{\varsigma}_n(\theta^*)\,(\theta_n - \theta_0),$$

where $\|\theta^* - \theta_0\| \le \|\theta_n - \theta_0\|$, and one needs to check that $d\hat{\varsigma}_n(\theta^*)/d\theta$ is bounded in probability. We illustrate the applicability of $\hat{\varsigma}_n(\theta_n)$ as an estimator of $\Sigma_{n,\theta_0}/|W_n|$ in the simulation study in the next section.

6 Simulation study

To assess the performance of our subsampling estimator, we estimate by simulation the coverage achieved by asymptotic 95% confidence intervals when considering

intensity estimation by composite likelihood as discussed in the previous section. The confidence intervals are obtained in the standard way using the asymptotic normality (5.2) and replacing $\Sigma_{n,\theta_0}/|W_n|$ by our subsampling estimator.

When computing $\hat{\varsigma}_n$, the user must specify the shape, the size, and the possible overlap of the sub-rectangles (blocks) used for the subsampling estimator. For simplicity we assume that $W_n = [0, n]^2 \subset \mathbb{R}^2$ and use square blocks. We denote by $b_l$ the side length of the blocks and by $\kappa$ the maximal proportion of overlap possible between two blocks. The block centres are located on a grid $W_n \cap \big(h_{n,\kappa}\mathbb{Z}^2 + h_{n,\kappa}(1/2, 1/2)\big)$, where $h_{n,\kappa}$ is chosen such that $\kappa$ is the ratio between the area of the overlap of two contiguous blocks centred at $h_{n,\kappa}(1/2, 1/2)$ and $h_{n,\kappa}(1/2, 1/2 + 1)$, and the area of one block. For instance, for $W_1 = [0, 1]^2$, $b_l = 0.5$ and $\kappa = 0.5$, the centres of the blocks completely included in $W_1$ are $(0.25, 0.25)$, $(0.5, 0.25)$, $(0.75, 0.25)$, $(0.25, 0.5)$, and so on until $(0.75, 0.75)$.

The simulations have been done for every possible combination of $n = 1, 2, 3$, $b_l = 0.2, 0.5$, $\kappa = 0, 0.5, 0.75, 0.875$, and the four following point process models: a non-stationary Poisson point process, two different non-stationary log-Gaussian Cox processes (LGCPs), and a non-stationary determinantal point process (DPP). For a presentation of these models, we refer to Baddeley et al. (2015). For each point process simulation, the intensity is driven by a realisation $Z_1$ of a zero mean Gaussian random field with exponential covariance function (scale parameter 0.5 and variance 0.1). As specified below, we have for each realisation of $Z_1$ chosen the parameters so that the average number of points on $W_n$ is $100|W_n|$ (100, 400 or 900). For the non-stationary Poisson point processes we use the intensity function

$$\lambda_n(x) = \exp\big(\theta_{0,n} + Z_1(x)\big) \quad (6.1)$$

where $\theta_{0,n} = \log(100|W_n|) - \log \int_{W_n} \exp(Z_1(x))\, dx$.
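The block grid described above can be sketched as follows (our illustration; the grid-offset convention is simplified to the spacing implied by the overlap ratio, $h = b_l(1 - \kappa)$ for contiguous square blocks, which reproduces the $W_1$ example).

```python
import numpy as np

def block_centres(n, b_l, kappa):
    """One coordinate of the block-centre grid in W = [0, n]^2 (the 2-d grid
    is the Cartesian product).  Contiguous square blocks of side b_l overlap
    in a fraction kappa of their area, giving spacing h = b_l * (1 - kappa).
    Only blocks completely included in the window are kept."""
    h = b_l * (1.0 - kappa)
    ks = np.arange(0, int(np.floor(n / h)) + 1)
    c = ks * h
    inside = (c - b_l / 2 >= -1e-12) & (c + b_l / 2 <= n + 1e-12)
    return c[inside]

cs = block_centres(1, 0.5, 0.5)
print(cs)    # the centres 0.25, 0.5, 0.75 from the example above
```

With $\kappa = 0$ the blocks tile the window without overlap, which is the computationally cheapest choice examined in the study.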
For the two LGCPs, the random intensity functions are of the form

$$\Lambda_n(x) = \exp\big(\theta_{0,n} + Z_1(x) + Z_2(x)\big) \quad (6.2)$$

where $\theta_{0,n} = \log(100|W_n|) - \operatorname{Var}(Z_2(0))/2 - \log \int_{W_n} \exp(Z_1(x))\, dx$, and where $Z_2$ is a zero mean Gaussian random field independent of $Z_1$ and also with exponential covariance function (scale parameter 0.05, and variance 0.25 for one LGCP and 1 for the other).

The non-stationary DPP has been simulated as follows. First, we simulate a stationary DPP with the Gaussian kernel

$$C_n(x, y) = \lambda_{n,\mathrm{dom}} \exp\Big(-\frac{\|x - y\|^2}{\beta^2}\Big),$$

where $\beta \approx 0.04$ and $\lambda_{n,\mathrm{dom}} = 100|W_n| / \int_{W_n} \exp\big(Z_1(x) - \max_{x \in W_n} Z_1(x)\big)\, dx$. Second, we apply an independent thinning with probability

$$\lambda^*_n(x) = \exp\big(Z_1(x) - \max_{x \in W_n} Z_1(x)\big), \quad x \in W_n,$$

of retaining a point. Specifically, $\beta$ actually equals $1/\sqrt{\pi \lambda_{n,\mathrm{dom}}}$ and corresponds to the most repulsive Gaussian DPP according to Lavancier et al. (2014). Following Appendix A in Lavancier et al. (2014), the result is then a realisation of a non-stationary DPP with kernel

$$C^*_n(x, y) = \sqrt{\lambda^*_n(x)\lambda^*_n(y)}\, \lambda_{n,\mathrm{dom}} \exp\Big(-\frac{\|x - y\|^2}{\beta^2}\Big). \quad (6.3)$$

The intensity of the DPP is given by $\lambda_n(x) = C^*_n(x, x) = \lambda_{n,\mathrm{dom}} \lambda^*_n(x)$.

Realisations of the DPP and each LGCP are plotted in Figure 1 along with the corresponding pair correlation functions defined by $g(r) = \lambda^{(2)}(u, v)/\big(\lambda(u)\lambda(v)\big)$ where $r = \|u - v\|$, and $\lambda$, $\lambda^{(2)}$ denote the intensity and second order product density, respectively. Note that in the DPP case, the pair correlation function depends on the realisation of $Z_1$ via $\beta$. In Figure 1 the DPP pair correlation function is plotted with $\beta = 0.04$.

For each of the models, the intensity function is of log-linear form $\lambda_\theta(x) = \exp(\theta_0 + \theta_1 Z_1(x))$, $x \in \mathbb{R}^2$, where $Z_1$ is considered as a known covariate. The parameter $\theta = (\theta_0, \theta_1)$ is estimated using composite likelihood as implemented in the procedure ppm of the R package spatstat (Baddeley et al., 2015). The true value of $\theta_1$ is one while the true value of $\theta_0$ depends on the window $W_n$ and the realisation of $Z_1$.

For each combination of $n$, $b_l$, $\kappa$ and point process type we apply the estimation procedure to 5000 simulations and compute the estimated coverage of the confidence interval for the parameter $\theta_1$. The results are plotted in Figure 2. The Monte Carlo standard error for the estimated coverages is approximately 0.3%. For each combination of $b_l$ and $n$, the corresponding plot shows the estimated coverage for combinations of $\kappa = 0, 0.5, 0.75, 0.875$ and the four point process models. To aid the visual interpretation, points are connected by line segments.

Figure 1: From left to right, the first three panels show a realisation on $[0, 2]^2$ of a LGCP with variance parameter 0.25, a LGCP with variance parameter 1, and a DPP. Last panel: a plot of the corresponding theoretical pair correlation functions.

Except for the lower left plot, the results seem rather insensitive to the choice of $\kappa$ (in the lower left plot, $b_l = 0.5$ seems to be too large relative to the window $W_1$).
From a computational point of view $\kappa = 0$ is advantageous, and it is never outperformed in terms of coverage by other choices of $\kappa$. The results are more sensitive to the choice of $b_l$. For the LGCPs we see the anticipated convergence of the estimated coverages to 95% when $b_l = 0.5$ and $n$ is increased, but not when $b_l = 0.2$. This suggests that $b_l = 0.2$ is too small for the statistics on blocks to represent the statistic on the windows $W_1$-$W_3$ in case of the LGCPs. Among the LGCPs, the coverages are closer to 95% for the LGCP with the lowest variance. For the Poisson process and the DPP, the estimated coverages are very close to 95% both for $b_l = 0.2$ and $b_l = 0.5$, except for the small window $W_1$. The general impression from the simulation study is that the subsampling method works well when the point patterns are of reasonable size (hundreds of points) and the blocks are of appropriate size relative to the observation window.

Figure 2: Estimated coverages of the confidence intervals for $\theta_1$ when using the subsampling estimator (4.3). Upper row to lower row: $b_l = 0.2, 0.5$. Left column to right column: $n = 1, 2, 3$. In each plot the estimated coverage is computed for four point process models (non-stationary Poisson point process, DPP, and two LGCPs) and $\kappa = 0, 0.5, 0.75, 0.875$. The lines joining the points just serve to aid visual interpretation. The straight horizontal red line indicates the value 0.95.

7 Discussion

Our simulations have shown that the subsampling estimator may be used to obtain confidence intervals in the framework of intensity estimation by composite likelihood. The results obtained were satisfying, with estimated coverages close to the nominal level 95%, except for small point patterns and provided a suitable block size was used. These results may be compared with the estimated coverage obtained when using the variance estimate provided by the function vcov.kppm of the R package spatstat. This function computes an estimate of the asymptotic variance of the composite likelihood estimators by plugging a parametric estimate of the pair correlation function into the theoretical expression for the covariance matrix following Waagepetersen (2007).
Using vcov.kppm for the simulated realisations of LGCPs from Section 6, the estimated coverages of the resulting approximate confidence intervals for the parameter $\theta_1$ range from 93% to 96% (including results for $n = 1$ and cases with a misspecified parametric model for the pair correlation function). Thus, the results are closer to the nominal level of the confidence interval than for the subsampling estimator. On the other hand, the subsampling estimator is much more flexible as it is model free and may be applied to any statistic of the form (3.1), in any dimension.

We have also compared our subsampling estimator with the thinned block bootstrap estimator proposed in Guan and Loh (2007) and got very similar results within the simulation study settings of that paper. This is to be expected given the similarities of the methods. However, the method in Guan and Loh (2007) requires that it is possible to thin the point process into a second-order stationary point process.

Acknowledgements

Christophe A.N. Biscio and Rasmus Waagepetersen are supported by The Danish Council for Independent Research, Natural Sciences, grant DFF "Statistics for point processes in space and beyond", and by the Centre for Stochastic Geometry and Advanced Bioimaging, funded by grant 8721 from the Villum Foundation.

References

Adrian Baddeley, Ege Rubak, and Rolf Turner. Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, 2015.

Patrick Billingsley. Probability and Measure. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons, New York-Chichester-Brisbane.

Christophe Ange Napoléon Biscio and Jean-François Coeurjolly. Standard and robust intensity parameter estimation for stationary determinantal point processes. Spatial Statistics, 18:24-39, 2016.

Christophe Ange Napoléon Biscio and Frédéric Lavancier. Brillinger mixing of determinantal point processes and statistical applications. Electronic Journal of Statistics, 2016.

Christophe Ange Napoléon Biscio and Frédéric Lavancier. Contrast estimation for parametric stationary determinantal point processes. Scandinavian Journal of Statistics, 44, 2017.

Christophe Ange Napoléon Biscio, Arnaud Poinas, and Rasmus Waagepetersen. A note on gaps in proofs of central limit theorems. Statistics and Probability Letters.

Erwin Bolthausen. On the central limit theorem for stationary mixing random fields. The Annals of Probability, 10(4), 1982.

Kai Lai Chung. A Course in Probability Theory. Academic Press.

Jean-François Coeurjolly. Median-based estimation of the intensity of a spatial point process. Annals of the Institute of Statistical Mathematics, 69, 2017.

Jean-François Coeurjolly and Frédéric Lavancier.
Parametric estimation of pairwise Gibbs point processes with infinite range interaction. Bernoulli, 23, 2017.

Jean-François Coeurjolly and Jesper Møller. Variational approach for spatial point process intensity estimation. Bernoulli, 20(3), 2014.

Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes. Volume I: Elementary Theory and Methods. Springer-Verlag, New York, second edition, 2003.

Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes. Volume II: General Theory and Structure. Springer-Verlag, New York, second edition, 2008.

Paul Doukhan. Mixing: Properties and Examples, volume 85 of Lecture Notes in Statistics. Springer-Verlag, New York, 1994.

Magnus Ekström. Subsampling variance estimation for non-stationary spatial lattice data. Scandinavian Journal of Statistics, 35:38-63, 2008.

István Fazekas, Alexander Kukush, and Tibor Tómács. On the Rosenthal inequality for mixing fields. Ukrainian Mathematical Journal, 52(2), 2000.

Yongtao Guan and Ji Meng Loh. A thinned block bootstrap variance estimation procedure for inhomogeneous spatial point patterns. Journal of the American Statistical Association, 102, 2007.

Yongtao Guan and Ye Shen. A weighted estimating equation approach for inhomogeneous spatial point processes. Biometrika, 97, 2010.

Yongtao Guan and Michael Sherman. On least squares fitting for stationary spatial point processes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(1):31-49, 2007.

Yongtao Guan, Abdollah Jalilian, and Rasmus Waagepetersen. Quasi-likelihood for spatial point processes. Journal of the Royal Statistical Society, Series B, 77, 2015.

Xavier Guyon. Random Fields on a Network. Probability and its Applications. Springer, New York, 1995.

Lothar Heinrich. Minimum contrast estimates for parameters of spatial ergodic point processes. In Transactions of the 11th Prague Conference on Random Processes, Information Theory and Statistical Decision Functions, pages 479-492, 1992.

Lothar Heinrich and Stella Klein. Central limit theorems for empirical product densities of stationary point processes.
Statistical Inference for Stochastic Processes, 172): , Jul Ildar Abdulovich Ibragimov and Yuri Vladimirovich Linnik. Independent and stationary sequences of random variables. Wolters-Noordhoff, Abdollah Jalilian, Yongtao Guan, and Rasmus Waagepetersen. Orthogonal series estimation of the pair correlation function for spatial point processes. Statistica Sinica, accepted. 13

Jens Ledet Jensen and Hans R. Künsch. On asymptotic normality of pseudo likelihood estimates for pairwise interaction processes. Annals of the Institute of Statistical Mathematics, 46.

Zsolt Karácsony. A central limit theorem for mixing random fields. Miskolc Mathematical Notes, 7(2).

Soumendra Nath Lahiri. Resampling Methods for Dependent Data. Springer-Verlag, New York.

Frédéric Lavancier, Jesper Møller, and Ege Rubak. Determinantal point process models and statistical inference (extended version). Technical report, available on arXiv.

Ji Meng Loh. Bootstrapping an inhomogeneous point process. Journal of Statistical Planning and Inference, 140.

Eugene Lukacs. Characteristic Functions. Griffin.

Torsten Mattfeldt, Henrike Häbel, and Frank Fleischer. Block bootstrap methods for the estimation of the intensity of a spatial point process with confidence bounds. Journal of Microscopy, 251:84–98.

Arnaud Poinas, Bernard Delyon, and Frédéric Lavancier. Mixing properties and central limit theorem for associated point processes. arXiv preprint.

Dimitris N. Politis and Joseph P. Romano. Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics, 22.

Michaela Prokešová and Eva Bjørn Vedel Jensen. Asymptotic Palm likelihood theory for stationary point processes. Annals of the Institute of Statistical Mathematics, 65.

Michael Sherman. Variance estimation for statistics computed from spatial lattice data. Journal of the Royal Statistical Society, Series B (Methodological), 58.

Rasmus Waagepetersen and Yongtao Guan. Two-step estimation for inhomogeneous spatial point processes. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 71(3).

Rasmus Plenge Waagepetersen. An estimating function approach to inference for inhomogeneous Neyman–Scott processes. Biometrics, 63. doi: 10.1111/j.1541-0420.2006.00667.x.

Ganggang Xu, Rasmus Waagepetersen, and Yongtao Guan. Stochastic quasi-likelihood for case-control point pattern data. Journal of the American Statistical Association, to appear.

The following appendix contains the proofs of Theorems 3.1 and 4.1. The last section of the appendix contains a number of technical lemmas used in the proofs of the main results. Proofs of technical lemmas and some lengthy technical derivations are available in the supplementary material.

A Proof of Theorem 3.1

Suppose first that we have verified Theorem 3.1 in the univariate case q = 1. Then, by H4) and Lemma F.3, we may use the extension of the Cramér-Wold device in Lemma F.6 to verify Theorem 3.1 also for q > 1. We thus focus on the case q = 1. The proof of Theorem 3.1 for q = 1 follows quite closely Karácsony (2006) and is based on the following theorem, which is proved in Section B.

Theorem A.1. Let the situation be as in Theorem 3.1 with q = 1, and assume in addition

H_b) \(Z_n(l)\) is uniformly bounded with respect to \(n \in \mathbb{N}\) and \(l \in D_n\).

Then
\[ \frac{1}{\sigma_n} \sum_{l \in D_n} \big( Z_n(l) - \mathbb{E} Z_n(l) \big) \xrightarrow{\text{distr.}} N(0,1), \]
where \(\sigma_n^2 = \operatorname{Var} \sum_{l \in D_n} Z_n(l)\).

Proof of Theorem 3.1. Define for \(L > 0\) and \(n \in \mathbb{N}\):
\[ Z_n^{(L)}(l) = Z_n(l)\, 1\big( Z_n(l) \in [-L, L] \big), \qquad \bar{Z}_n^{(L)}(l) = Z_n(l) - Z_n^{(L)}(l), \qquad l \in \mathbb{Z}^d, \]
\[ X_n = \frac{1}{\sigma_n} \sum_{l \in D_n} \big( Z_n(l) - \mathbb{E} Z_n(l) \big), \quad X_n^{(L)} = \frac{1}{\sigma_n} \sum_{l \in D_n} \big( Z_n^{(L)}(l) - \mathbb{E} Z_n^{(L)}(l) \big), \quad \bar{X}_n^{(L)} = \frac{1}{\sigma_n} \sum_{l \in D_n} \big( \bar{Z}_n^{(L)}(l) - \mathbb{E} \bar{Z}_n^{(L)}(l) \big). \]
By Lemma F.2, we have for \(r \ge 0\),
\[ \alpha^{\bar{Z}_n^{(L)}}_{1,1}(s_n r) \le \alpha^{X}_{v_n, v_n}(s_n r - s_n - 2R). \]
Further, by H1)–H2), \(s_n\) is non-decreasing in \(n\), so we may find \(r_0 \ge 1\) such that for all \(r \ge r_0\) and \(n \in \mathbb{N}\), \(s_n(1 - 1/r) - 2R/r > 0\). Combining this with H2), there exist constants \(c_0, c_1 > 0\) so that
\[
\sup_{n \in \mathbb{N}} \sum_{r=1}^{\infty} r^{d-1}\, \alpha^{\bar{Z}_n^{(L)}}_{1,1}(r s_n)^{\frac{\tau}{2+\tau}}
\le c_0 + c_1 \sup_{n \in \mathbb{N}} \sum_{r=r_0}^{\infty} r^{d-1} \big( r s_n - s_n - 2R \big)^{-\frac{(d+\epsilon)\tau}{2+\tau}}
= c_0 + c_1 \sup_{n \in \mathbb{N}} \sum_{r=r_0}^{\infty} r^{d-1} \Big( r \big( s_n (1 - \tfrac{1}{r}) - \tfrac{2R}{r} \big) \Big)^{-\frac{(d+\epsilon)\tau}{2+\tau}}
\le c_0 + c_1 \sup_{n \in \mathbb{N}} \big( s_n (1 - \tfrac{1}{r_0}) - \tfrac{2R}{r_0} \big)^{-\frac{(d+\epsilon)\tau}{2+\tau}} \sum_{r=r_0}^{\infty} r^{d-1-\frac{(d+\epsilon)\tau}{2+\tau}}.
\]

By H3), the last expression in the inequality is bounded. We may then adapt Theorem 1 in Fazekas et al. (2000) to the lattice \(s_n \mathbb{Z}^d\), and so there exists a constant \(c_2 > 0\) such that
\[
\mathbb{E}\big( \bar{X}_n^{(L)} \big)^2 = \mathbb{E}\Big( \frac{1}{\sigma_n} \sum_{l \in D_n} \big( \bar{Z}_n^{(L)}(l) - \mathbb{E} \bar{Z}_n^{(L)}(l) \big) \Big)^2
\le c_2 \frac{|D_n|}{\sigma_n^2} \Big( 1 + \sum_{r=1}^{\infty} (2r+1)^{d-1}\, \alpha^{\bar{Z}_n^{(L)}}_{1,1}(r s_n)^{\frac{\tau}{2+\tau}} \Big) \sup_{l \in D_n} \mathbb{E}\big( |\bar{Z}_n^{(L)}(l)|^{2+\tau} \big)^{\frac{2}{2+\tau}}.
\]
By H3),
\[ \sup_{n \in \mathbb{N}} \sup_{l \in D_n} \mathbb{E}\big( |\bar{Z}_n^{(L)}(l)|^{2+\tau} \big)^{\frac{2}{2+\tau}} \xrightarrow{L \to \infty} 0. \]
Hence, it follows from the two last equations and H4) that
\[ \sup_{n \in \mathbb{N}} \mathbb{E}\big( \bar{X}_n^{(L)} \big)^2 \xrightarrow{L \to \infty} 0. \tag{A.1} \]
We denote by \(\sigma_n^2(L)\) the variance of \(\sigma_n X_n^{(L)}\). Noticing that \(\mathbb{E} X_n^2 = 1\), we have
\[
\Big| \frac{\sigma_n^2(L)}{\sigma_n^2} - 1 \Big| = \big| \mathbb{E}(X_n^{(L)})^2 - \mathbb{E} X_n^2 \big| = \big| \mathbb{E}(X_n - \bar{X}_n^{(L)})^2 - \mathbb{E} X_n^2 \big| = \big| \mathbb{E}(\bar{X}_n^{(L)})^2 - 2\, \mathbb{E}(X_n \bar{X}_n^{(L)}) \big|.
\]
Then, by the Cauchy-Schwarz inequality and (A.1),
\[ \sup_{n \in \mathbb{N}} \Big| \frac{\sigma_n^2(L)}{\sigma_n^2} - 1 \Big| \xrightarrow{L \to \infty} 0. \tag{A.2} \]
For \(n \in \mathbb{N}\), we have
\[
\big| \mathbb{E} e^{itX_n} - e^{-\frac{t^2}{2}} \big| = \Big| \mathbb{E}\big[ (e^{it\bar{X}_n^{(L)}} - 1) e^{itX_n^{(L)}} \big] + \mathbb{E} e^{itX_n^{(L)}} - e^{-\frac{t^2}{2}} \Big|
\le \mathbb{E}\big| e^{it\bar{X}_n^{(L)}} - 1 \big| + \Big| \mathbb{E} e^{itX_n^{(L)}} - e^{-\frac{\sigma_n^2(L)}{\sigma_n^2}\frac{t^2}{2}} \Big| + \Big| e^{-\frac{\sigma_n^2(L)}{\sigma_n^2}\frac{t^2}{2}} - e^{-\frac{t^2}{2}} \Big|. \tag{A.3}
\]
Since for all \(x \in \mathbb{R}\), \(|e^{ix} - 1| \le |x|\),
\[ \mathbb{E}\big| e^{it\bar{X}_n^{(L)}} - 1 \big| \le |t|\, \mathbb{E}\big| \bar{X}_n^{(L)} \big| \le |t| \sup_{n \in \mathbb{N}} \sqrt{\mathbb{E}\big( \bar{X}_n^{(L)} \big)^2}. \tag{A.4} \]
Writing \(\delta_L = \sup_{n \in \mathbb{N}} \big| \sigma_n^2(L)/\sigma_n^2 - 1 \big|\) and \(U_n = \frac{\sigma_n}{\sigma_n(L)} X_n^{(L)}\), we have
\[ \Big| \mathbb{E} e^{itX_n^{(L)}} - e^{-\frac{\sigma_n^2(L)}{\sigma_n^2}\frac{t^2}{2}} \Big| = \Big| \mathbb{E} e^{it \frac{\sigma_n(L)}{\sigma_n} U_n} - e^{-\frac{\sigma_n^2(L)}{\sigma_n^2}\frac{t^2}{2}} \Big| \le \sup_{v \in [1-\delta_L,\, 1+\delta_L]} \big| \mathbb{E} e^{itvU_n} - e^{-\frac{(tv)^2}{2}} \big|
\]

so, by Theorem A.1 and Corollary 1 to a theorem in Lukacs (1970), for fixed \(L\),
\[ \Big| \mathbb{E} e^{itX_n^{(L)}} - e^{-\frac{\sigma_n^2(L)}{\sigma_n^2}\frac{t^2}{2}} \Big| \xrightarrow{n \to \infty} 0. \tag{A.5} \]
Moreover, by a first order Taylor expansion with remainder,
\[
\sup_{n \in \mathbb{N}} \Big| e^{-\frac{\sigma_n^2(L)}{\sigma_n^2}\frac{t^2}{2}} - e^{-\frac{t^2}{2}} \Big| = e^{-\frac{t^2}{2}} \sup_{n \in \mathbb{N}} \Big| e^{-\big(\frac{\sigma_n^2(L)}{\sigma_n^2} - 1\big)\frac{t^2}{2}} - 1 \Big|
\le \frac{t^2}{2} \delta_L + \frac{t^4}{8} \exp\Big( \delta_L \frac{t^2}{2} \Big) \delta_L^2. \tag{A.6}
\]
Therefore, by (A.3), (A.4), (A.5), and (A.6),
\[ \limsup_{n \to \infty} \big| \mathbb{E} e^{itX_n} - e^{-\frac{t^2}{2}} \big| \le |t| \sup_{n \in \mathbb{N}} \sqrt{\mathbb{E}\big( \bar{X}_n^{(L)} \big)^2} + \frac{t^2}{2} \delta_L + \frac{t^4}{8} \exp\Big( \delta_L \frac{t^2}{2} \Big) \delta_L^2, \]
which by (A.1) and (A.2) tends to 0 as \(L\) tends to infinity.

B Proof of Theorem A.1

For ease of presentation, we assume that the bound in H_b) is 1. Define
\[ Y_n(l) = Z_n(l) - \mathbb{E} Z_n(l), \qquad S_n = \sum_{l \in D_n} Y_n(l), \qquad a_n = \sum_{\substack{i,j \in D_n \\ d(i,j) \le m_n}} \mathbb{E}\big[ Y_n(i) Y_n(j) \big], \]
\[ \bar{S}_n = \frac{1}{\sqrt{a_n}} \sum_{l \in D_n} Y_n(l), \qquad \bar{S}_n(i) = \frac{1}{\sqrt{a_n}} \sum_{\substack{j \in D_n \\ d(i,j) \le m_n}} Y_n(j), \]
where \(m_n = |D_n|^{1/(2d+\epsilon/2)}\) if \(\eta = 0\), and \(m_n = |W_n|^{\xi/d}\) if \(\eta > 0\), with \(\xi\) verifying \(\max\{\eta,\, d(1-\eta)/(2d+\epsilon)\} < \xi < (1+\eta)/2\). Note that such a \(\xi\) always exists. Then, by H1)–H2),
\[ m_n \to \infty, \quad m_n / s_n \to \infty, \tag{B.1} \]
\[ |D_n|\, m_n^{-d-\epsilon} \to 0, \tag{B.2} \]
\[ |D_n|\, (m_n / s_n)^{-d} \to \infty. \tag{B.3} \]
By Lemma F.5, \(\sup_{n \in \mathbb{N}} \mathbb{E} \bar{S}_n^2 < \infty\). Thus, by Lemma 2 in Bolthausen (1982) (see also the discussion in Biscio et al., 2017), Theorem A.1 is proved if
\[ \mathbb{E}\big[ (it - \bar{S}_n) e^{it\bar{S}_n} \big] \to 0. \tag{B.4} \]
Notice that \((it - \bar{S}_n) e^{it\bar{S}_n} = A_1 - A_2 - A_3,\)

where
\[ A_1 = it\, e^{it\bar{S}_n} \Big( 1 - \frac{1}{a_n} \sum_{\substack{i,j \in D_n \\ d(i,j) \le m_n}} Y_n(i) Y_n(j) \Big), \tag{B.5} \]
\[ A_2 = \frac{e^{it\bar{S}_n}}{\sqrt{a_n}} \sum_{i \in D_n} Y_n(i) \big( 1 - it\bar{S}_n(i) - e^{-it\bar{S}_n(i)} \big), \tag{B.6} \]
\[ A_3 = \frac{1}{\sqrt{a_n}} \sum_{i \in D_n} Y_n(i)\, e^{it(\bar{S}_n - \bar{S}_n(i))}. \tag{B.7} \]
Hence, (B.4) follows from the convergence to zero of \(A_1\), \(A_2\), and \(A_3\), as established in Section S4 in the supplementary material.

C Proof of Theorem 4.1

The proof is based on the following result for a random field on a lattice, which is proved in Section D.

Theorem C.1. For \(n \in \mathbb{N}\), let \(R_n\) be a random field on \(\mathbb{Z}^d\), let \(\{W_n\}_{n \in \mathbb{N}}\) be a sequence of compact sets verifying S0), and, for \(n \in \mathbb{N}\), let \(\{B_{k_n,t} : k_n \in \mathbb{N}^d,\ t \in T_{k_n,n}\}\) be sub-rectangles defined as in (4.2) and such that S1) holds. For \(q, n \in \mathbb{N}\) and \(t \in \mathbb{Z}^d\), let further \(\Psi\) be a function defined on subsets of the sample space of \(R_n\) and taking values in \(\mathbb{R}^q\), and let \(\Psi_A = \Psi(R_n(l) : l \in \mathbb{Z}^d \cap A)\) for \(A = B_{k_n,t}\) or \(A = W_n\). We assume that the following assumptions hold:

i) \(\{\|\Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}}\|^4 : t \in T_{k_n,n},\ n \in \mathbb{N}\}\) is uniformly integrable,

ii) \(\alpha^{R_n}_{b_n,b_n}(\max_i k_{n,i}) \to 0\) as \(n \to \infty\), where \(b_n = \prod_{j=1}^{d} (2k_{n,j} + 1)\),

iii) \(\big\| \operatorname{Var}(\Psi_{W_n}) - \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \operatorname{Var}(\Psi_{B_{k_n,t}}) \big\| \to 0\) as \(n \to \infty\),

iv) \(\frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \Big\| \mathbb{E}\Psi_{B_{k_n,t}} - \frac{1}{|T_{k_n,n}|} \sum_{s \in T_{k_n,n}} \mathbb{E}\Psi_{B_{k_n,s}} \Big\|^2 \to 0\) as \(n \to \infty\).

Let further
\[ \hat{\varsigma}_n^{R_n} = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \Big( \Psi_{B_{k_n,t}} - \frac{1}{|T_{k_n,n}|} \sum_{s \in T_{k_n,n}} \Psi_{B_{k_n,s}} \Big)^{\otimes 2}, \]
where \(x^{\otimes 2} = x x^{\mathsf{T}}\). Then, we have the convergence
\[ \lim_{n \to \infty} \mathbb{E}\big( \| \hat{\varsigma}_n^{R_n} - K_n \|^2 \big) = 0, \]
where \(K_n = \operatorname{Var}(\Psi_{W_n})\).

Below, we check the assumptions i)–iv) in Theorem C.1 with \(R_n(l) = Z_n(l)\) and \(\Psi_A = T_A(X)/\sqrt{|D_n(A)|}\), for \(A \subseteq \mathbb{R}^d\). Then, Theorem 4.1 is proved directly by Theorem C.1.
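The subsampling estimator above has a simple generic form: evaluate the (normalized) statistic on each sub-block of the window and take the empirical covariance of the block values. The following Python sketch is an illustration only, not the authors' implementation; the function name, the rectangular-window representation, and the toy Poisson example are assumptions made for the demonstration.

```python
import numpy as np

def subsampling_variance(points, lo, hi, block_len, stat):
    """Empirical covariance of a statistic evaluated on the disjoint
    cubic sub-blocks of side block_len tiling the window [lo, hi)."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    counts = ((hi - lo) // block_len).astype(int)   # blocks per axis
    vals = []
    for idx in np.ndindex(*counts):
        b_lo = lo + np.asarray(idx) * block_len
        b_hi = b_lo + block_len
        inside = np.all((points >= b_lo) & (points < b_hi), axis=1)
        vals.append(stat(points[inside], b_lo, b_hi))
    vals = np.asarray(vals, float).reshape(len(vals), -1)
    centered = vals - vals.mean(axis=0)             # subtract grand mean
    return centered.T @ centered / len(vals)

# Toy check on a homogeneous Poisson process with intensity 5 on [0, 20]^2:
# with Psi_B = N(B)/sqrt(|B|), Var(Psi_B) equals the intensity.
rng = np.random.default_rng(1)
lam, side = 5.0, 20.0
pts = rng.uniform(0.0, side, size=(rng.poisson(lam * side**2), 2))
psi = lambda p, b_lo, b_hi: len(p) / np.sqrt(np.prod(b_hi - b_lo))
est = subsampling_variance(pts, [0, 0], [side, side], 2.0, psi)
print(float(est[0, 0]))  # should be close to 5
```

As in the theorem, the estimator centers each block value at the grand mean of all block values, so no knowledge of the true mean is required.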

Assumption i). For \(l \in \mathbb{Z}^d\) and \(n \in \mathbb{N}\), let \(Y_n(l) = Z_n(l) - \mathbb{E} Z_n(l)\), so that
\[ T_{B_{k_n,t}}(X) - \mathbb{E} T_{B_{k_n,t}}(X) = \sum_{l \in D_n(B_{k_n,t})} Y_n(l). \tag{C.1} \]
Let \(\epsilon\) be as in S2). Then, by Lemma F.2 and S6), we have for \(\epsilon' < \epsilon\),
\[ \sum_{r=1}^{\infty} r^{5d-1}\, \alpha^{Z_n}_{5,5}(s_n r)^{\frac{\epsilon'}{6+\epsilon'}} \le \sum_{r=1}^{\infty} r^{5d-1}\, \alpha^{X}_{5v_n, 5v_n}\big( s_n(r-1) - 2R \big)^{\frac{\epsilon'}{6+\epsilon'}} < \infty. \tag{C.2} \]
Then, by S2) and (C.2), we may apply Theorem 1 in Fazekas et al. (2000), which states the existence of \(c_1 > 0\) such that
\[ \mathbb{E}\bigg| \frac{\sum_{l \in D_n(B_{k_n,t})} Y_n(l)}{\sqrt{|D_n(B_{k_n,t})|}} \bigg|^{4+\epsilon'/2} \le c_1 \max\{U, V\}, \]
where
\[ U = \frac{\sum_{l \in D_n(B_{k_n,t})} \mathbb{E}\big( |Y_n(l)|^{4+\epsilon'} \big)^{\frac{4+\epsilon'/2}{4+\epsilon'}}}{|D_n(B_{k_n,t})|^{2+\epsilon'/4}}, \qquad
V = \Bigg\{ \frac{\sum_{l \in D_n(B_{k_n,t})} \mathbb{E}\big( |Y_n(l)|^{2+\epsilon'/2} \big)^{\frac{2}{2+\epsilon'/2}}}{|D_n(B_{k_n,t})|} \Bigg\}^{2+\epsilon'/4}. \tag{C.3} \]
Further, by S2), there exists a constant \(c_2\) such that
\[ U \le c_2 \frac{|D_n(B_{k_n,t})|}{|D_n(B_{k_n,t})|^{2+\epsilon'/4}} = c_2\, |D_n(B_{k_n,t})|^{-(1+\epsilon'/4)}
\quad \text{and} \quad
V \le \bigg( c_2 \frac{|D_n(B_{k_n,t})|}{|D_n(B_{k_n,t})|} \bigg)^{2+\epsilon'/4} = c_2^{2+\epsilon'/4}. \]
Both of the last upper bounds on \(U\) and \(V\) are bounded. Therefore, by (C.1) and (C.3),
\[ \sup_{n \in \mathbb{N}} \sup_{t} \mathbb{E}\bigg| \frac{T_{B_{k_n,t}}(X) - \mathbb{E} T_{B_{k_n,t}}(X)}{\sqrt{|D_n(B_{k_n,t})|}} \bigg|^{4+\epsilon'/2} < \infty, \tag{C.4} \]
which implies i) by (25.13) in Billingsley (1995).

Assumption ii). For \(c, \delta\) as in S5) and \(v_n\) as below (3.2), we have by Lemma F.2,
\[ \alpha^{Z_n}_{b_n,b_n}\big( \max_i k_{n,i} \big) \le \alpha^{X}_{b_n v_n,\, b_n v_n}\big( (\max_i k_{n,i}) s_n - s_n - 2R \big) \le c\, b_n v_n \big( \max_i k_{n,i}\, s_n \big)^{-(d+\delta)}, \]
which by S5) converges to 0 as \(n\) tends to infinity.

Assumptions iii)–iv). Assumptions iii) and iv) are the same as S3) and S4).

D Proof of Theorem C.1

The proof of Theorem C.1 is based on several applications of the following Theorem D.1, which states an intermediate result and is proved in Section E. Consequently, these theorems may look similar at first sight.

Theorem D.1. For \(n \in \mathbb{N}\), let \(R_n\) be a random field on \(\mathbb{Z}^d\), let \(\{W_n\}_{n \in \mathbb{N}}\) be a sequence of compact sets verifying S0), and, for \(n \in \mathbb{N}\), let \(\{B_{k_n,t} : k_n \in \mathbb{N}^d,\ t \in T_{k_n,n}\}\) be sub-rectangles defined as in (4.2) and verifying S1). For \(q, n \in \mathbb{N}\) and \(t \in \mathbb{Z}^d\), let further \(h\) be a function defined on subsets of the sample space of \(R_n\), taking values in \(\mathbb{R}^q\), and let \(h_A = h(R_n(l) : l \in \mathbb{Z}^d \cap A)\) for \(A = B_{k_n,t}\) or \(A = W_n\). We assume that the following assumptions hold:

i') \(\{\|h_{B_{k_n,t}}\|^2 : t \in T_{k_n,n},\ n \in \mathbb{N}\}\) is uniformly integrable,

ii') \(\alpha^{R_n}_{b_n,b_n}(\max_i k_{n,i}) \to 0\) as \(n \to \infty\), where \(b_n = \prod_{j=1}^{d} (2k_{n,j} + 1)\),

iii') \(\big\| \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \mathbb{E} h_{B_{k_n,t}} - \mathbb{E} h_{W_n} \big\| \to 0\) as \(n \to \infty\).

Then, we have the convergence
\[ \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} h_{B_{k_n,t}} - \mathbb{E} h_{W_n} \xrightarrow{L^2} 0. \]

We now give the proof of Theorem C.1. To shorten notation, we define \(\bar{\Psi}_{B_{k_n}} = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \Psi_{B_{k_n,t}}\). For \(x = (x_1, \dots, x_d)^{\mathsf{T}} \in \mathbb{R}^d\) and \(M\) a square matrix in \(\mathbb{R}^d \times \mathbb{R}^d\), we further denote by \(\|x\| = \sqrt{\sum_{i=1}^{d} x_i^2}\) and \(\|M\| = \sqrt{\sum_{i,j} M_{ij}^2}\) the Euclidean norms of \(x\) and \(M\), respectively, and by \(x^{\otimes 2}\) the matrix \(x x^{\mathsf{T}}\). From the statement of Theorem C.1, we have
\[ \hat{\varsigma}_n^{R_n} = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} + \mathbb{E}\Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} + \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big)^{\otimes 2}. \]
Hence,
\[ \hat{\varsigma}_n^{R_n} = C_1 + C_2 + C_3 + C_4 + C_5 + C_6, \tag{D.1} \]
where the terms \(C_1\)–\(C_6\) are all \(q \times q\) matrices given below:
\[ C_1 = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big)^{\otimes 2}, \qquad
C_2 = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( \mathbb{E}\Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} \big)^{\otimes 2}, \qquad
C_3 = \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big)^{\otimes 2}, \]

\[ C_4 = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \Big[ \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big) \big( \mathbb{E}\Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} \big)^{\mathsf{T}} + \big( \mathbb{E}\Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} \big) \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big)^{\mathsf{T}} \Big], \]
\[ C_5 = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \Big[ \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big) \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big)^{\mathsf{T}} + \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big) \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big)^{\mathsf{T}} \Big], \]
\[ C_6 = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \Big[ \big( \mathbb{E}\Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} \big) \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big)^{\mathsf{T}} + \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big) \big( \mathbb{E}\Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} \big)^{\mathsf{T}} \Big]. \]
The assumption iv) implies directly that
\[ \|C_2\| \to 0. \tag{D.2} \]
By applying the Cauchy-Schwarz inequality to each sum in \(C_4\), we have \(\mathbb{E}(\|C_4\|^2) \le 4\, \mathbb{E}(\|C_1\|)\, \|C_2\|\). Further, by i) and (25.11) in Billingsley (1995), \(\mathbb{E}(\|C_1\|)\) is uniformly bounded with respect to \(n \in \mathbb{N}\) and \(t \in \mathbb{Z}^d\). Thus, by (D.2), it follows that
\[ \mathbb{E}(\|C_4\|^2) \to 0. \tag{D.3} \]
We have
\[ C_5 + C_6 = \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big) \Big( \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( \Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} \big) \Big)^{\mathsf{T}} + \Big( \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( \Psi_{B_{k_n,t}} - \mathbb{E}\bar{\Psi}_{B_{k_n}} \big) \Big) \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big)^{\mathsf{T}} = -2 \big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big)^{\otimes 2}, \]
so
\[ C_3 + C_5 + C_6 = -\big( \mathbb{E}\bar{\Psi}_{B_{k_n}} - \bar{\Psi}_{B_{k_n}} \big)^{\otimes 2}. \tag{D.4} \]
Let \(Y_n = \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big)\) and notice that \(C_3 + C_5 + C_6 = -Y_n^{\otimes 2}\). Using i)–ii), we may apply Theorem D.1 with \(h = \Psi - \mathbb{E}\Psi\), so that
\[ \mathbb{E}(\|Y_n\|^2) \to 0. \tag{D.5} \]
Using i)–ii) and iii), we may apply Theorem D.1 with \(h = (\Psi - \mathbb{E}\Psi)^{\otimes 2}\). Hence,
\[ \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big)^{\otimes 2} - \mathbb{E}\big( \Psi_{W_n} - \mathbb{E}\Psi_{W_n} \big)^{\otimes 2} \xrightarrow{L^2} 0, \]
which may be written as the convergence
\[ C_1 - K_n \xrightarrow{L^2} 0. \tag{D.6} \]

By a theorem in Chung (2001) relating uniform integrability and \(L^p\) convergence, (D.6) implies that \(\|C_1 - K_n\|^2\) is uniformly integrable with respect to \(n\). Moreover, \(\mathbb{E}(\|C_1\|^2) = \mathbb{E}(\|C_1 - K_n + K_n\|^2) \le 2\, \mathbb{E}(\|C_1 - K_n\|^2) + 2\|K_n\|^2\), and it follows from S3) that \(\|K_n\|\) is uniformly bounded. Thus, \(\|C_1\|^2\) is uniformly integrable, so that by Lemma F.7, \(\|Y_n\|^4\) is also uniformly integrable. Further, (D.5) implies that \(\|Y_n\|^2 \xrightarrow{P} 0\), so by the same theorem in Chung (2001), we have \(\|Y_n\|^2 \xrightarrow{L^2} 0\). By (D.4), the last implies that
\[ C_3 + C_5 + C_6 \xrightarrow{L^2} 0. \tag{D.7} \]
Finally, Theorem C.1 is proved by combining (D.1), (D.2), (D.3), (D.6), and (D.7).

E Proof of Theorem D.1

Since the difference of means on the right-hand side below is deterministic and the first term is centered, the cross term vanishes and we obtain the decomposition
\[
\mathbb{E}\Big\| \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} h_{B_{k_n,t}} - \mathbb{E} h_{W_n} \Big\|^2
= \mathbb{E}\Big\| \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( h_{B_{k_n,t}} - \mathbb{E} h_{B_{k_n,t}} \big) \Big\|^2 + \Big\| \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \mathbb{E} h_{B_{k_n,t}} - \mathbb{E} h_{W_n} \Big\|^2. \tag{E.1}
\]
Hence, if in (E.1) the first expectation on the right-hand side converges to 0 as \(n\) tends to infinity, Theorem D.1 is proved by iii') and (E.1). We have
\[ \mathbb{E}\Big\| \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big( h_{B_{k_n,t}} - \mathbb{E} h_{B_{k_n,t}} \big) \Big\|^2 \le \frac{1}{|T_{k_n,n}|^2} \sum_{t_1, t_2 \in T_{k_n,n}} \big\| \operatorname{Cov}\big( h_{B_{k_n,t_1}}, h_{B_{k_n,t_2}} \big) \big\| = M_1 + M_2, \]
where
\[ M_1 = \frac{1}{|T_{k_n,n}|^2} \sum_{\substack{t_1, t_2 \in T_{k_n,n} \\ d(\mathbb{Z}^d \cap B_{k_n,t_1},\, \mathbb{Z}^d \cap B_{k_n,t_2}) \le \max_i k_{n,i}}} \big\| \operatorname{Cov}\big( h_{B_{k_n,t_1}}, h_{B_{k_n,t_2}} \big) \big\|, \qquad
M_2 = \frac{1}{|T_{k_n,n}|^2} \sum_{\substack{t_1, t_2 \in T_{k_n,n} \\ d(\mathbb{Z}^d \cap B_{k_n,t_1},\, \mathbb{Z}^d \cap B_{k_n,t_2}) > \max_i k_{n,i}}} \big\| \operatorname{Cov}\big( h_{B_{k_n,t_1}}, h_{B_{k_n,t_2}} \big) \big\|. \]
Regarding \(M_1\), for a given \(t_1 \in T_{k_n,n}\), there are at most \((2\max_i k_{n,i} + 1)^d\) choices for \(t_2\). Thus,
\[
M_1 \le \frac{(2\max_i k_{n,i} + 1)^d}{|T_{k_n,n}|} \sup \big\| \operatorname{Cov}\big( h_{B_{k_n,t_1}}, h_{B_{k_n,t_2}} \big) \big\|
\le \frac{(2\max_i k_{n,i} + 1)^d}{|T_{k_n,n}|} \sup \sqrt{ \mathbb{E}\big( \|h_{B_{k_n,t_1}}\|^2 \big)\, \mathbb{E}\big( \|h_{B_{k_n,t_2}}\|^2 \big) },
\]
where the suprema are over \(t_1, t_2 \in T_{k_n,n}\) with \(d(\mathbb{Z}^d \cap B_{k_n,t_1},\, \mathbb{Z}^d \cap B_{k_n,t_2}) \le \max_i k_{n,i}\). By i'), there exists a constant \(c_1 > 0\) such that
\[ M_1 \le \frac{(2\max_i k_{n,i} + 1)^d}{\prod_{i=1}^{d} (m_{n,i} - k_{n,i} + 1)}\, c_1, \]

which by S1) implies that \(M_1\) tends to 0 as \(n\) tends to infinity. We have
\[ M_2 \le \sup_{\substack{t_1, t_2 \in T_{k_n,n} \\ d(\mathbb{Z}^d \cap B_{k_n,t_1},\, \mathbb{Z}^d \cap B_{k_n,t_2}) > \max_i k_{n,i}}} \big\| \operatorname{Cov}\big( h_{B_{k_n,t_1}}, h_{B_{k_n,t_2}} \big) \big\|. \]
Further, by (4.2), for all \(t \in \mathbb{Z}^d\), \(|\mathbb{Z}^d \cap B_{k_n,t}| \le b_n\), where \(b_n = \prod_{j=1}^{d} (2k_{n,j} + 1)\). Then, by Lemma 1 in Sherman (1996), for any \(\eta > 0\), we have
\[ M_2 \le 4\eta^2\, \alpha^{R_n}_{b_n,b_n}\big( \max_i k_{n,i} \big) + 3 \Big( \sqrt{ c_2\, \mathbb{E}\|X_1 - X_1^{(\eta)}\|^2 } + \sqrt{ c_2\, \mathbb{E}\|X_2 - X_2^{(\eta)}\|^2 } \Big), \tag{E.2} \]
where, for \(i = 1, 2\), \(X_i = h_{B_{k_n,t_i}}\), \(X_i^{(\eta)} = X_i\, 1(\|X_i\| \le \eta)\), and \(c_2 = \sup_{t \in T_{k_n,n}} \mathbb{E}(\|h_{B_{k_n,t}}\|^2)\), which by i') is finite. Hence, by first taking \(\limsup\) as \(n\) tends to infinity and second letting \(\eta\) tend to infinity, it follows by (E.2), i'), and ii') that \(M_2\) tends to 0 as \(n\) tends to infinity. Therefore, \(\mathbb{E}\| \frac{1}{|T_{k_n,n}|} \sum_{t} ( h_{B_{k_n,t}} - \mathbb{E} h_{B_{k_n,t}} ) \|^2\) converges to 0 as \(n\) tends to infinity.

F Lemmas

This section contains a number of technical lemmas used in the proofs of the main results. Proofs of the lemmas are given in the supplementary material.

Lemma F.1. For all \(l, k \in s_n \mathbb{Z}^d\), we have
\[ d(l, k) - s_n - 2R \le d\big( C_n^R(l), C_n^R(k) \big) \le d(l, k) + s_n + 2R. \]

Lemma F.2. For \(c_1, c_2, r \ge 0\), we have
\[ \alpha^{Z_n}_{c_1, c_2}(r) \le \alpha^{X}_{c_1 v_n,\, c_2 v_n}(r - s_n - 2R). \]

Lemma F.3. We have
\[ \limsup_{n \to \infty} \lambda_{\max}\Big( \frac{\Sigma_n}{|D_n|} \Big) < \infty, \]
where \(\lambda_{\max}(M)\) denotes the maximal eigenvalue of a symmetric matrix \(M\).

Lemma F.4. For \(k \in \mathbb{N}\) and \(i \in s_n \mathbb{Z}^d\),
\[ \big| \{ j \in s_n \mathbb{Z}^d : d(i, j) = s_n k \} \big| \le 3^d k^{d-1}. \]

Lemma F.5. Under the assumptions H_b), H2), and H4), we have the convergence
\[ 1 - \frac{a_n}{\sigma_n^2} \to 0. \]

Lemma F.6 (Biscio et al. 2017). Let \(\{X_n\}_{n \in \mathbb{N}}\) be a sequence of random variables in \(\mathbb{R}^p\), for \(p \in \mathbb{N}\), such that
\[ 0 < \liminf_{n \to \infty} \lambda_{\min}\big( \operatorname{Var}(X_n) \big) \le \limsup_{n \to \infty} \lambda_{\max}\big( \operatorname{Var}(X_n) \big) < \infty, \]
where, for a symmetric matrix \(M\), \(\lambda_{\min}(M)\) and \(\lambda_{\max}(M)\) denote the minimal and maximal eigenvalues of \(M\). Then,
\[ \operatorname{Var}(X_n)^{-1/2} X_n \xrightarrow{\text{distr.}} N(0, I_p) \quad \text{if for all } a \in \mathbb{R}^p, \quad \big( a^{\mathsf{T}} \operatorname{Var}(X_n)\, a \big)^{-1/2} a^{\mathsf{T}} X_n \xrightarrow{\text{distr.}} N(0, 1). \]
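The counting bound in Lemma F.4 is easy to check numerically. The sketch below is an illustration that assumes \(d(\cdot,\cdot)\) is the max-norm (Chebyshev) distance on the lattice; under that assumption the spacing \(s_n\) cancels and the set in the lemma is a cubic shell in \(\mathbb{Z}^d\).

```python
from itertools import product

def shell_count(d, k):
    """Number of lattice points j in Z^d whose max-norm distance
    from the origin is exactly k (a hollow cubic shell)."""
    return sum(1 for j in product(range(-k, k + 1), repeat=d)
               if max(abs(c) for c in j) == k)

# Check |{j : d(i, j) = k}| <= 3^d * k^(d-1) for small d and k.
for d in (1, 2, 3):
    for k in (1, 2, 3, 4):
        assert shell_count(d, k) <= 3**d * k**(d - 1)
print("bound of Lemma F.4 holds for d <= 3, k <= 4")
```

The exact shell size under this assumption is \((2k+1)^d - (2k-1)^d\), e.g. \(8k\) for \(d = 2\), which is indeed at most \(9k = 3^2 k^{2-1}\).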

Lemma F.7. Let the situation be as in Appendix D. We have \(\|Y_n\|^4 \le q\, \|C_1\|^2\).

Proof of Lemma F.7. For any vector \(x\), let \([x]_i\) denote its \(i\)-th coordinate. By the Cauchy-Schwarz inequality,
\[ \|Y_n\|^2 \le \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \sum_{i=1}^{q} \big[ \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big]_i^2, \]
so that, by applying the Cauchy-Schwarz inequality to the sum over \(i\),
\[ \|Y_n\|^4 \le q \sum_{i=1}^{q} \Big( \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big[ \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big]_i^2 \Big)^2. \]
On the other hand,
\[ \|C_1\|^2 = \sum_{i,j=1}^{q} \Big( \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big[ \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big]_i \big[ \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big]_j \Big)^2
\ge \sum_{i=1}^{q} \Big( \frac{1}{|T_{k_n,n}|} \sum_{t \in T_{k_n,n}} \big[ \Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}} \big]_i^2 \Big)^2, \]
where the inequality keeps only the diagonal terms, which implies that \(\|Y_n\|^4 \le q\, \|C_1\|^2\).
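The inequality of Lemma F.7 is purely deterministic, so it can be sanity-checked on arbitrary vectors standing in for the centered block statistics \(\Psi_{B_{k_n,t}} - \mathbb{E}\Psi_{B_{k_n,t}}\). The following snippet is an illustration only, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(42)
q, T = 4, 50                      # dimension q and number of blocks |T|
A = rng.standard_normal((T, q))   # stand-ins for Psi_t - E(Psi_t)

Y = A.mean(axis=0)                          # Y_n: average of the vectors
C1 = sum(np.outer(a, a) for a in A) / T     # C_1: average of outer products

lhs = np.linalg.norm(Y) ** 4                # ||Y_n||^4
rhs = q * np.linalg.norm(C1) ** 2           # q * ||C_1||^2 (Frobenius norm)
assert lhs <= rhs
print(lhs, "<=", rhs)
```

Because the bound follows from two applications of Cauchy-Schwarz and discarding off-diagonal terms, it holds for every such array, not just Gaussian draws.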

Supplemental material

S1 Central limit theorem by blocking technique

For \(0 < \beta < \gamma < 1\) and \(l_n = (l_{1,n}, \dots, l_{d,n}) \in n^{\gamma} \mathbb{Z}^d\), define blocks
\[ B_n(l_n) = \prod_{j=1}^{d} \big( l_{j,n} - (n^{\gamma} - n^{\beta})/2,\; l_{j,n} + (n^{\gamma} - n^{\beta})/2 \big]. \tag{S1.1} \]
For \(n \in \mathbb{N}\), we denote by \(w_n = (n^{\gamma} - n^{\beta})^d\) the volume of each \(B_n(l_n)\). Note that, contrary to (3.2), the union of the blocks defined in (S1.1) over \(n^{\gamma} \mathbb{Z}^d\) does not cover \(\mathbb{R}^d\). To apply the blocking technique central limit theorem, we assume that \(T_{W_n}(X)\) can be approximated (see C5) below) by a sum \(\sum_{k_n \in E_n} f_{n, B_n(k_n)}(X)\), where
\[ E_n = \{ l \in n^{\gamma} \mathbb{Z}^d : B_n(l) \subseteq W_n \} \tag{S1.2} \]
and, for each \(n\) and \(k_n \in n^{\gamma} \mathbb{Z}^d\), \(f_{n, B_n(k_n)}(X)\) is a \(q \times 1\) dimensional statistic depending on \(X\) only through \(X \cap (B_n(k_n) \oplus R)\). We consider the following conditions.

C1) The cardinality \(|E_n|\) of \(E_n\) verifies \(|E_n| = O(n^{d(1-\gamma)})\).

C2) There exists \(\epsilon > 0\) such that \(\sup_{m \ge 0} \alpha^{X}_{m,m}(s)/m = O\big( s^{-(d+\epsilon)} \big)\) and \(2d/(2d+\epsilon) < \beta < \gamma < 1\).

C3) There exists a \(\tau > 0\) so that
\[ \sup_{n \in \mathbb{N}} \sum_{k_n \in E_n} \mathbb{E}\Big\| (w_n |E_n|)^{-\frac{1}{2}} \big[ f_{n, B_n(k_n)}(X) - \mathbb{E} f_{n, B_n(k_n)}(X) \big] \Big\|^{2+\tau} < \infty. \]

C4) There exists a positive definite matrix \(\Sigma\) such that, for all \(k_n \in n^{\gamma} \mathbb{Z}^d\),
\[ \lim_{n \to \infty} w_n^{-1} \operatorname{Var} f_{n, B_n(k_n)}(X) = \Sigma. \]

A preliminary central limit theorem is then given below.

Theorem S1.1. If C1)–C4) hold,
\[ (w_n |E_n|)^{-\frac{1}{2}} \sum_{k \in E_n} \big( f_{n, B_n(k)}(X) - \mathbb{E}[f_{n, B_n(k)}(X)] \big) \xrightarrow{\text{distr.}} N(0, \Sigma). \]

C5) \(\displaystyle \lim_{n \to \infty} \operatorname{Var}\Big[ |W_n|^{-\frac{1}{2}} T_{W_n}(X) - (w_n |E_n|)^{-\frac{1}{2}} \sum_{k \in E_n} f_{n, B_n(k)}(X) \Big] = 0.\)

A prerequisite for verifying C5) will typically be \(\lim |W_n| / (w_n |E_n|) = 1\); see for instance Prokešová and Jensen (2013). In other words, \(W_n\) can be approximated well by the union of the blocks \(B_n(k_n)\) contained in \(W_n\). This implies that \(W_n\) expands in all directions and, in light of C1), that \(|W_n|\) is proportional to \(n^d\).
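The block construction in (S1.1)–(S1.2) can be sketched in a few lines. The code below is an illustration with hypothetical parameter choices, taking \(W_n = [0, n]^d\); it builds the block centers in \(E_n\) and the block volume \(w_n\).

```python
import numpy as np
from itertools import product

def blocks(n, d, gamma, beta):
    """Centers l in n^gamma * Z^d whose block B_n(l) -- a cube of side
    n^gamma - n^beta centered at l -- is contained in W_n = [0, n]^d."""
    spacing = n ** gamma                     # grid spacing of the centers
    half = (n ** gamma - n ** beta) / 2      # half the block side length
    grid_1d = spacing * np.arange(np.ceil(n / spacing) + 1)
    centers = [l for l in product(grid_1d, repeat=d)
               if all(c - half >= 0 and c + half <= n for c in l)]
    return centers, half

centers, half = blocks(n=100, d=2, gamma=0.8, beta=0.5)
w_n = (2 * half) ** 2            # block volume for d = 2
print(len(centers), w_n)
```

Note the gaps: adjacent centers are \(n^{\gamma}\) apart while each block has side \(n^{\gamma} - n^{\beta}\), leaving corridors of width \(n^{\beta}\) between blocks; these corridors are what make the block statistics asymptotically independent under the mixing condition C2).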


Asymptotic normality of the L k -error of the Grenander estimator Asymptotic normality of the L k -error of the Grenander estimator Vladimir N. Kulikov Hendrik P. Lopuhaä 1.7.24 Abstract: We investigate the limit behavior of the L k -distance between a decreasing density

More information

Random Bernstein-Markov factors

Random Bernstein-Markov factors Random Bernstein-Markov factors Igor Pritsker and Koushik Ramachandran October 20, 208 Abstract For a polynomial P n of degree n, Bernstein s inequality states that P n n P n for all L p norms on the unit

More information

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1

Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension. n=1 Chapter 2 Probability measures 1. Existence Theorem 2.1 (Caratheodory). A (countably additive) probability measure on a field has an extension to the generated σ-field Proof of Theorem 2.1. Let F 0 be

More information

Topics in Stochastic Geometry. Lecture 4 The Boolean model

Topics in Stochastic Geometry. Lecture 4 The Boolean model Institut für Stochastik Karlsruher Institut für Technologie Topics in Stochastic Geometry Lecture 4 The Boolean model Lectures presented at the Department of Mathematical Sciences University of Bath May

More information

JUHA KINNUNEN. Harmonic Analysis

JUHA KINNUNEN. Harmonic Analysis JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes

More information

Introduction. log p θ (y k y 1:k 1 ), k=1

Introduction. log p θ (y k y 1:k 1 ), k=1 ESAIM: PROCEEDINGS, September 2007, Vol.19, 115-120 Christophe Andrieu & Dan Crisan, Editors DOI: 10.1051/proc:071915 PARTICLE FILTER-BASED APPROXIMATE MAXIMUM LIKELIHOOD INFERENCE ASYMPTOTICS IN STATE-SPACE

More information

Stat 710: Mathematical Statistics Lecture 31

Stat 710: Mathematical Statistics Lecture 31 Stat 710: Mathematical Statistics Lecture 31 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 31 April 13, 2009 1 / 13 Lecture 31:

More information

Spatial analysis of tropical rain forest plot data

Spatial analysis of tropical rain forest plot data Spatial analysis of tropical rain forest plot data Rasmus Waagepetersen Department of Mathematical Sciences Aalborg University December 11, 2010 1/45 Tropical rain forest ecology Fundamental questions:

More information

RESEARCH REPORT. A logistic regression estimating function for spatial Gibbs point processes.

RESEARCH REPORT. A logistic regression estimating function for spatial Gibbs point processes. CENTRE FOR STOCHASTIC GEOMETRY AND ADVANCED BIOIMAGING www.csgb.dk RESEARCH REPORT 2013 Adrian Baddeley, Jean-François Coeurjolly, Ege Rubak and Rasmus Waagepetersen A logistic regression estimating function

More information

arxiv: v1 [stat.me] 27 May 2015

arxiv: v1 [stat.me] 27 May 2015 Modelling aggregation on the large scale and regularity on the small scale in spatial point pattern datasets arxiv:1505.07215v1 [stat.me] 27 May 2015 Frédéric Lavancier 1,2 and Jesper Møller 3 1 Laboratoire

More information

Statistica Sinica Preprint No: SS

Statistica Sinica Preprint No: SS Statistica Sinica Preprint No: SS-017-0013 Title A Bootstrap Method for Constructing Pointwise and Uniform Confidence Bands for Conditional Quantile Functions Manuscript ID SS-017-0013 URL http://wwwstatsinicaedutw/statistica/

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

Some Results on the Ergodicity of Adaptive MCMC Algorithms

Some Results on the Ergodicity of Adaptive MCMC Algorithms Some Results on the Ergodicity of Adaptive MCMC Algorithms Omar Khalil Supervisor: Jeffrey Rosenthal September 2, 2011 1 Contents 1 Andrieu-Moulines 4 2 Roberts-Rosenthal 7 3 Atchadé and Fort 8 4 Relationship

More information

arxiv: v1 [math.pr] 22 Dec 2018

arxiv: v1 [math.pr] 22 Dec 2018 arxiv:1812.09618v1 [math.pr] 22 Dec 2018 Operator norm upper bound for sub-gaussian tailed random matrices Eric Benhamou Jamal Atif Rida Laraki December 27, 2018 Abstract This paper investigates an upper

More information

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains

A regeneration proof of the central limit theorem for uniformly ergodic Markov chains A regeneration proof of the central limit theorem for uniformly ergodic Markov chains By AJAY JASRA Department of Mathematics, Imperial College London, SW7 2AZ, London, UK and CHAO YANG Department of Mathematics,

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University

Introduction to the Mathematical and Statistical Foundations of Econometrics Herman J. Bierens Pennsylvania State University Introduction to the Mathematical and Statistical Foundations of Econometrics 1 Herman J. Bierens Pennsylvania State University November 13, 2003 Revised: March 15, 2004 2 Contents Preface Chapter 1: Probability

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

Properties of spatial Cox process models

Properties of spatial Cox process models Properties of spatial Cox process models Jesper Møller Department of Mathematical Sciences, Aalborg University Abstract: Probabilistic properties of Cox processes of relevance for statistical modelling

More information

Lecture Note 1: Probability Theory and Statistics

Lecture Note 1: Probability Theory and Statistics Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would

More information

Density Estimation (II)

Density Estimation (II) Density Estimation (II) Yesterday Overview & Issues Histogram Kernel estimators Ideogram Today Further development of optimization Estimating variance and bias Adaptive kernels Multivariate kernel estimation

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,

More information

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS

A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Statistica Sinica 26 (2016), 1117-1128 doi:http://dx.doi.org/10.5705/ss.202015.0240 A CENTRAL LIMIT THEOREM FOR NESTED OR SLICED LATIN HYPERCUBE DESIGNS Xu He and Peter Z. G. Qian Chinese Academy of Sciences

More information

Analysis-3 lecture schemes

Analysis-3 lecture schemes Analysis-3 lecture schemes (with Homeworks) 1 Csörgő István November, 2015 1 A jegyzet az ELTE Informatikai Kar 2015. évi Jegyzetpályázatának támogatásával készült Contents 1. Lesson 1 4 1.1. The Space

More information

Decomposition of variance for spatial Cox processes

Decomposition of variance for spatial Cox processes Decomposition of variance for spatial Cox processes Rasmus Waagepetersen Department of Mathematical Sciences Aalborg University Joint work with Abdollah Jalilian and Yongtao Guan November 8, 2010 1/34

More information

The main results about probability measures are the following two facts:

The main results about probability measures are the following two facts: Chapter 2 Probability measures The main results about probability measures are the following two facts: Theorem 2.1 (extension). If P is a (continuous) probability measure on a field F 0 then it has a

More information

Supermodular ordering of Poisson arrays

Supermodular ordering of Poisson arrays Supermodular ordering of Poisson arrays Bünyamin Kızıldemir Nicolas Privault Division of Mathematical Sciences School of Physical and Mathematical Sciences Nanyang Technological University 637371 Singapore

More information

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define

(1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define Homework, Real Analysis I, Fall, 2010. (1) Consider the space S consisting of all continuous real-valued functions on the closed interval [0, 1]. For f, g S, define ρ(f, g) = 1 0 f(x) g(x) dx. Show that

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Part II Probability and Measure

Part II Probability and Measure Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan

Monte-Carlo MMD-MA, Université Paris-Dauphine. Xiaolu Tan Monte-Carlo MMD-MA, Université Paris-Dauphine Xiaolu Tan tan@ceremade.dauphine.fr Septembre 2015 Contents 1 Introduction 1 1.1 The principle.................................. 1 1.2 The error analysis

More information

Gibbs point processes : modelling and inference

Gibbs point processes : modelling and inference Gibbs point processes : modelling and inference J.-F. Coeurjolly (Grenoble University) et J.-M Billiot, D. Dereudre, R. Drouilhet, F. Lavancier 03/09/2010 J.-F. Coeurjolly () Gibbs models 03/09/2010 1

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

Second-Order Analysis of Spatial Point Processes

Second-Order Analysis of Spatial Point Processes Title Second-Order Analysis of Spatial Point Process Tonglin Zhang Outline Outline Spatial Point Processes Intensity Functions Mean and Variance Pair Correlation Functions Stationarity K-functions Some

More information

The Australian National University and The University of Sydney. Supplementary Material

The Australian National University and The University of Sydney. Supplementary Material Statistica Sinica: Supplement HIERARCHICAL SELECTION OF FIXED AND RANDOM EFFECTS IN GENERALIZED LINEAR MIXED MODELS The Australian National University and The University of Sydney Supplementary Material

More information

A fast sampler for data simulation from spatial, and other, Markov random fields

A fast sampler for data simulation from spatial, and other, Markov random fields A fast sampler for data simulation from spatial, and other, Markov random fields Andee Kaplan Iowa State University ajkaplan@iastate.edu June 22, 2017 Slides available at http://bit.ly/kaplan-phd Joint

More information

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS

More information

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University

PCA with random noise. Van Ha Vu. Department of Mathematics Yale University PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical

More information

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( )

Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio ( ) Mathematical Methods for Neurosciences. ENS - Master MVA Paris 6 - Master Maths-Bio (2014-2015) Etienne Tanré - Olivier Faugeras INRIA - Team Tosca October 22nd, 2014 E. Tanré (INRIA - Team Tosca) Mathematical

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Theory and Applications of Stochastic Systems Lecture Exponential Martingale for Random Walk

Theory and Applications of Stochastic Systems Lecture Exponential Martingale for Random Walk Instructor: Victor F. Araman December 4, 2003 Theory and Applications of Stochastic Systems Lecture 0 B60.432.0 Exponential Martingale for Random Walk Let (S n : n 0) be a random walk with i.i.d. increments

More information

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017 Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION By Degui Li, Peter C. B. Phillips, and Jiti Gao September 017 COWLES FOUNDATION DISCUSSION PAPER NO.

More information

Gaussian Estimation under Attack Uncertainty

Gaussian Estimation under Attack Uncertainty Gaussian Estimation under Attack Uncertainty Tara Javidi Yonatan Kaspi Himanshu Tyagi Abstract We consider the estimation of a standard Gaussian random variable under an observation attack where an adversary

More information

On the Goodness-of-Fit Tests for Some Continuous Time Processes

On the Goodness-of-Fit Tests for Some Continuous Time Processes On the Goodness-of-Fit Tests for Some Continuous Time Processes Sergueï Dachian and Yury A. Kutoyants Laboratoire de Mathématiques, Université Blaise Pascal Laboratoire de Statistique et Processus, Université

More information

Applicability of subsampling bootstrap methods in Markov chain Monte Carlo

Applicability of subsampling bootstrap methods in Markov chain Monte Carlo Applicability of subsampling bootstrap methods in Markov chain Monte Carlo James M. Flegal Abstract Markov chain Monte Carlo (MCMC) methods allow exploration of intractable probability distributions by

More information

Analysis of variance using orthogonal projections

Analysis of variance using orthogonal projections Analysis of variance using orthogonal projections Rasmus Waagepetersen Abstract The purpose of this note is to show how statistical theory for inference in balanced ANOVA models can be conveniently developed

More information

Spatial Modeling and Prediction of County-Level Employment Growth Data

Spatial Modeling and Prediction of County-Level Employment Growth Data Spatial Modeling and Prediction of County-Level Employment Growth Data N. Ganesh Abstract For correlated sample survey estimates, a linear model with covariance matrix in which small areas are grouped

More information

The expansion of random regular graphs

The expansion of random regular graphs The expansion of random regular graphs David Ellis Introduction Our aim is now to show that for any d 3, almost all d-regular graphs on {1, 2,..., n} have edge-expansion ratio at least c d d (if nd is

More information

Class 2 & 3 Overfitting & Regularization

Class 2 & 3 Overfitting & Regularization Class 2 & 3 Overfitting & Regularization Carlo Ciliberto Department of Computer Science, UCL October 18, 2017 Last Class The goal of Statistical Learning Theory is to find a good estimator f n : X Y, approximating

More information

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite

More information