On Model Fitting Procedures for Inhomogeneous Neyman-Scott Processes

Yongtao Guan
July 31, 2006

ABSTRACT

In this paper we study computationally efficient procedures to estimate the second-order parameters for a class of inhomogeneous Neyman-Scott processes proposed by Waagepetersen (2006). Specifically, we consider three estimation procedures: a minimum contrast estimation procedure (MCEP) using the K-function, an MCEP using the pair correlation function, and a procedure based on composite likelihood. We give recommendations on how to select the tuning parameters involved in each of these three procedures. We also discuss which procedure to use, based on a preliminary examination of the data at hand and on the goal of the study.

KEY WORDS: Composite Likelihood, Inhomogeneous Neyman-Scott Process, Minimum Contrast Estimation, Pair Correlation Function.

Yongtao Guan is Assistant Professor, Division of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT, yongtao.guan@yale.edu. This research was supported by National Science Foundation grant DMS

1. INTRODUCTION

Many emerging spatial point pattern data sets are inhomogeneous in nature (e.g., Diggle 2003). Conventional point process models often have difficulty modeling such data because they were developed under the assumption of stationarity (i.e., homogeneity). To address this problem, a large number of new models have recently been proposed, the majority of which belong to the general class of Gibbs point process models (e.g., Stoyan and Stoyan 1998, Nielsen and Jensen 2004). Although they are flexible for modeling repulsive spatial interactions, Gibbs point process models are often not appropriate for attractive point patterns, as pointed out by Møller and Waagepetersen (2006). Waagepetersen (2006) proposed a new class of inhomogeneous Neyman-Scott process (INSP) models that allow attractive interactions between events. This class of models is analytically simple yet practically sensible for modeling attractive point patterns, especially those arising from ecological studies. In addition, statistical properties of estimators of the first-order structure (i.e., the inhomogeneity parameters) have been developed, which makes inference on these parameters possible.

In this paper, we discuss procedures that can be used to estimate the second-order parameters (SOPs) of an INSP, a subject that has not been well studied in the literature. Estimation of the SOPs is important for two reasons: 1) the SOPs provide insight into how events interact with each other, such as the interaction range and strength, which are often of great interest in themselves; and 2) they are critical for correct inference on the first-order parameters (FOPs), since the variances of the FOP estimates depend on the SOPs. We will consider three procedures to estimate the SOPs.
The first is a minimum contrast estimation procedure (MCEP) based on the K-function, as given in Waagepetersen (2006); the second is an MCEP based on the pair correlation function (PCF); and the last is an estimating equation approach based on composite likelihood (CL). As will be seen in Section 3, each of these three procedures involves at least one tuning parameter that must be chosen by the user. The choice of these tuning parameters is critical for the accuracy of the resulting estimates. The main objective of this paper is to provide data-driven procedures (whenever available) and/or simulation-based empirical evidence on how to select these parameters.

The remainder of this article is organized as follows. In Section 2 we give some background on INSPs. We discuss the three estimation procedures in Section 3 and assess their performance through simulations in Section 4. Conclusions are given in Section 5. Some technical details are given in the Appendix.

2. INHOMOGENEOUS NEYMAN-SCOTT PROCESSES

Consider a two-dimensional homogeneous spatial Poisson process with first-order intensity ρ. Each event of this process is considered a parent, which in turn generates a Poisson number of offspring with expected value µ. Conditional on the location of a parent, the offspring are dispersed independently according to a common probability density function (pdf). For any location s, let $X(s)$ be a $p \times 1$ vector of covariates recorded at that location. An offspring at s is retained with probability $\exp[X(s)^T\beta]/M$, where $M = \max_s \exp[X(s)^T\beta]$ and β is a $p \times 1$ vector of unknown parameters. The resulting offspring process forms an INSP, which will be denoted by N henceforth. From this definition, we see that the covariates, which often reflect conditions of the local environment, control the survival rate of an offspring at a given location. Equivalently, the initial offspring process is thinned with thinning probabilities that depend on the covariate process X (Waagepetersen 2006). A nice property of INSPs is that their summary functions are often available in closed form.
For example, the first- and second-order intensity functions of N are

$$\lambda(s) = \rho\mu\exp[X(s)^T\beta]/M, \qquad (1)$$

$$\lambda_2(s_1, s_2) = \lambda(s_1)\lambda(s_2)\,g(s_1, s_2; \theta), \qquad (2)$$

respectively, where $g(s_1, s_2; \theta)$ is the PCF (e.g., Stoyan and Stoyan 1994) of the unthinned offspring process, depending on some unknown parameter vector θ. For example, if the pdf of an offspring location relative to its parent is a radially symmetric bivariate normal distribution (e.g., Diggle 2003), then

$$g(s_1, s_2; \theta) = 1 + \frac{\exp[-\|s_1 - s_2\|^2/(4\sigma^2)]}{4\pi\rho\sigma^2},$$

where σ² is the variance parameter of the dispersal pdf and ‖·‖ denotes the Euclidean norm. We refer to θ as the SOPs since it affects only the second-order structure of the process; for the above example, θ = (ρ, σ²). Throughout the remainder of this article we assume that $g(s_1, s_2; \theta) = g(\|s_1 - s_2\|; \theta)$, i.e., the PCF is isotropic.

The FOPs β are often estimated by an estimating equation approach. Let $\beta_0 = \rho\mu/M$ and let D be the domain of interest in which a realization of N is observed. Waagepetersen (2006) proposed to estimate β by solving

$$u(\beta) = \sum_{s \in N \cap D} \frac{\partial}{\partial\beta}\left[\log\beta_0 + X(s)^T\beta\right] - \int_D \frac{\partial}{\partial\beta}\,\beta_0\exp[X(s)^T\beta]\,ds = 0. \qquad (3)$$

An alternative approach is to solve the estimating equation

$$u(\beta) = \sum_{s \in N \cap D} \frac{\partial}{\partial\beta}\left\{X(s)^T\beta - \log\int_D \exp[X(v)^T\beta]\,dv\right\} = 0. \qquad (4)$$

Our simulation experience suggests that (3) and (4) often give nearly identical estimates, although (4) may be preferable since it eliminates the need to estimate $\beta_0$. The statistical properties of the estimator obtained from (3) are given in Waagepetersen (2006). It should be emphasized that although neither (3) nor (4) requires estimation of the SOPs, inference on the FOPs does, since the sampling distributions of the FOP estimates depend on the true SOPs.
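A minimal Python sketch of the construction in this section and of estimating equation (4) may help fix ideas. It assumes, as in the simulation study of Section 4, the unit square D = [0,1]², the scalar covariate X(s) = s_x, and Gaussian dispersal. Parents are generated on D enlarged by a margin of 4σ; that margin is our assumption, chosen so that clusters centred just outside D still contribute offspring. With this covariate, (4) reduces to $\sum_i s_{x,i} - n\int_0^1 x e^{\beta x}dx / \int_0^1 e^{\beta x}dx = 0$ (the $s_y$ integral cancels), which the sketch solves by bisection with a midpoint-rule integral. All function names are illustrative, not part of any library.

```python
import math
import random


def poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    limit, k, p = math.exp(-mean), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_insp(rho, mu, sigma, beta, seed=None):
    """Simulate an INSP on D = [0,1]^2 with scalar covariate X(s) = s_x.

    Parents: Poisson(rho) on D enlarged by a 4*sigma margin (an assumption).
    Offspring: Poisson(mu) per parent, displaced by N(0, sigma^2 I).
    Thinning: an offspring at s in D is kept with probability
    exp(beta * s_x) / M, where M = max over D of exp(beta * s_x).
    """
    rng = random.Random(seed)
    margin = 4.0 * sigma
    side = 1.0 + 2.0 * margin
    big_m = math.exp(max(beta, 0.0))  # s_x ranges over [0, 1]
    points = []
    for _ in range(poisson(rho * side * side, rng)):
        cx = -margin + side * rng.random()
        cy = -margin + side * rng.random()
        for _ in range(poisson(mu, rng)):
            x = cx + rng.gauss(0.0, sigma)
            y = cy + rng.gauss(0.0, sigma)
            inside = 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
            if inside and rng.random() < math.exp(beta * x) / big_m:
                points.append((x, y))
    return points


def mean_covariate(beta, n_grid=2000):
    """Midpoint-rule value of int x e^{beta x} dx / int e^{beta x} dx over [0, 1]."""
    num = den = 0.0
    for i in range(n_grid):
        x = (i + 0.5) / n_grid
        w = math.exp(beta * x)
        num += x * w
        den += w
    return num / den


def solve_eq4(xs, lo=-20.0, hi=20.0, tol=1e-8):
    """Solve u(beta) = sum(xs) - n * mean_covariate(beta) = 0 by bisection.

    u is strictly decreasing in beta (its derivative is -n times a variance),
    so a sign change on [lo, hi] brackets the unique root.
    """
    total, n = sum(xs), len(xs)

    def u(b):
        return total - n * mean_covariate(b)

    if u(lo) <= 0.0 or u(hi) >= 0.0:
        raise ValueError("root not bracketed; widen [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if u(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    pts = simulate_insp(rho=25.0, mu=8.0, sigma=0.02, beta=1.0, seed=1)
    print(len(pts), solve_eq4([x for x, _ in pts]))
```

Because the derivative of u is minus n times a weighted variance of $s_x$, the estimating function is monotone in β, which is why plain bisection suffices here; with a vector covariate, a Newton-type solver would be the natural replacement. Note also that β₀ never appears, which is the practical advantage of (4) mentioned above.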

3. PROCEDURES TO ESTIMATE THE SOPS

3.1 MCEP based on the K-function

Throughout the remainder of the paper, let $\hat\beta_0$ and $\hat\beta$ denote estimates of $\beta_0$ and β obtained by solving either (3) or (4). Following Waagepetersen (2006), we define the empirical K-function for an INSP as

$$\hat K(t) = \sum_{s_1, s_2 \in N \cap D} \frac{I(0 < \|s_1 - s_2\| \le t)\, e(s_1, s_2)}{\hat\beta_0^2 \exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}},$$

where I(·) is the indicator function and $e(s_1, s_2)$ is an edge correction term. For the various forms of $e(s_1, s_2)$, see Diggle (2003) and Stoyan and Stoyan (1994). To estimate θ, the empirical K-function is compared with the theoretical K-function

$$K(t; \theta) = 2\pi\int_0^t u\,g(u; \theta)\,du$$

through the discrepancy measure

$$U_K(\theta) = \int_0^r \left\{[\hat K(t)]^c - [K(t; \theta)]^c\right\}^2 dt. \qquad (5)$$

The minimizer of $U_K$ over θ gives the estimate $\hat\theta$. The two additional parameters c and r in (5) are tuning parameters and must be preselected. Diggle (2003) suggested using c = 0.25 or smaller for attractive point patterns in the homogeneous case. Guan and Sherman (2006) found this recommendation questionable, since using c = 1 with an appropriate choice of r often outperformed using c = 0.25. In the INSP case, Waagepetersen (2006) used c = 0.25. To the author's best knowledge, no specific recommendation has been given on the choice of r. Although the choice of c and r may matter less when the sample size is very large, it is often very important for data with a moderate sample size (say, 200 points). We give recommendations on the choice of these parameters based on the results of our simulation study in Section 4.
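The MCEPK can be sketched as follows. This is a minimal illustration, not the implementation used in the paper. It assumes D = [0,1]², the translation edge correction $e(s_1, s_2) = 1/|D \cap D_{s_2 - s_1}|$ with $|D \cap D_u| = (1 - |u_x|)(1 - |u_y|)$ on the unit square, and the K-function implied by the Gaussian PCF of Section 2, $K(t;\theta) = 2\pi\int_0^t u\,g(u;\theta)\,du = \pi t^2 + [1 - \exp(-t^2/(4\sigma^2))]/\rho$. The integral in (5) is approximated by the trapezoidal rule on a grid of t values covering (0, r], and a plain grid search over (ρ, σ) stands in for a numerical optimizer. The fitted intensity $\hat\beta_0\exp[X(s)^T\hat\beta]$ is passed in as a function; the grids and function names are hypothetical.

```python
import math


def k_hat(points, ts, lam):
    """Empirical inhomogeneous K-function on D = [0,1]^2, translation edge correction.

    lam(s) returns the fitted intensity beta0_hat * exp(X(s)^T beta_hat).
    Each ordered pair s1 != s2 adds e(s1, s2) / (lam(s1) * lam(s2)) to K(t)
    for every t >= ||s1 - s2||, with e(s1, s2) = 1 / |D ∩ D_{s2 - s1}|.
    """
    out = [0.0] * len(ts)
    lams = [lam(p) for p in points]
    for i, (x1, y1) in enumerate(points):
        for j, (x2, y2) in enumerate(points):
            if i == j:
                continue
            dx, dy = x2 - x1, y2 - y1
            d = math.hypot(dx, dy)
            area = (1.0 - abs(dx)) * (1.0 - abs(dy))  # |D ∩ D_{s2 - s1}|
            if d == 0.0 or area <= 0.0:
                continue
            w = 1.0 / (area * lams[i] * lams[j])
            for k, t in enumerate(ts):
                if d <= t:
                    out[k] += w
    return out


def thomas_k(t, rho, sigma):
    """K(t; theta) = 2*pi*int_0^t u g(u; theta) du for the Gaussian PCF of Section 2."""
    return math.pi * t * t + (1.0 - math.exp(-t * t / (4.0 * sigma * sigma))) / rho


def contrast_uk(khat, ts, rho, sigma, c):
    """Trapezoidal approximation of U_K in (5) on the increasing grid ts (ts[-1] = r)."""
    f = [(kh ** c - thomas_k(t, rho, sigma) ** c) ** 2 for kh, t in zip(khat, ts)]
    return sum(0.5 * (f[i] + f[i + 1]) * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))


def fit_mcepk(khat, ts, c=0.25, rho_grid=None, sigma_grid=None):
    """Grid-search minimiser of U_K; returns (rho_hat, sigma_hat)."""
    rho_grid = rho_grid or [5.0 * 1.05 ** i for i in range(60)]
    sigma_grid = sigma_grid or [0.005 * 1.05 ** i for i in range(50)]
    candidates = [(r, s) for r in rho_grid for s in sigma_grid]
    return min(candidates, key=lambda p: contrast_uk(khat, ts, p[0], p[1], c))
```

For the covariate of Section 4 one would pass something like `lam=lambda s: b0_hat * math.exp(b_hat * s[0])`, where `b0_hat` and `b_hat` are placeholders for the fitted FOPs. The tuning parameters c and r enter only through the `c` argument and the last grid value `ts[-1]`, which makes it easy to reproduce the kind of sensitivity comparison reported in Section 4.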

3.2 MCEP based on the PCF

The MCEP discussed in the previous section can be extended to the PCF. Specifically, we may define the following discrepancy measure, analogous to (5):

$$U_g(\theta) = \int_0^r \left\{[\hat g(t; h)]^c - [g(t; \theta)]^c\right\}^2 dt, \qquad (6)$$

where $\hat g(t; h)$ is a nonparametric (kernel) estimate of g(t; θ) and h is the bandwidth used to obtain it. Let k(·) denote a kernel function. We define

$$\hat g(t; h) = \frac{1}{2\pi}\sum_{s_1, s_2 \in N \cap D} \frac{I(s_1 \neq s_2)\, k[(t - \|s_1 - s_2\|)/h]\, e(s_1, s_2)}{\hat\beta_0^2 \exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}\, \|s_1 - s_2\|\, h}. \qquad (7)$$

Note that (6) introduces one more parameter, h, compared with (5); we discuss how to select h later in this section. One difficulty with using (6) is that, unlike in the K-function case, there is no general recommendation on how to select c, so it would be desirable to eliminate the need to select c altogether. For this purpose, we extend the method of Guan (2006a) for the homogeneous case to the current setting. Specifically, we derive the asymptotic variance of $\hat g(t; h)$ and estimate it from its asymptotic expression. From the results detailed in the Appendix, up to a multiplicative constant,

$$\mathrm{Var}[\hat g(t; h)] \propto \frac{g(t; \theta)}{t}\int_D\int_0^{2\pi} \frac{1}{\hat\beta_0^2\exp[\{X(s)^T + X[s + u(t, \psi)]^T\}\hat\beta]}\,d\psi\,ds, \qquad (8)$$

where $u(t, \psi) = [t\cos(\psi), t\sin(\psi)]$. Assume that r is small relative to the size of D and that X(·) is smooth enough that $X[s + u(t_1, \psi)] \approx X[s + u(t_2, \psi)]$ for all $t_1, t_2 \le r$. Then the double integral in (8) is approximately constant in t, and (8) simplifies to $\mathrm{Var}[\hat g(t; h)] \propto g(t; \theta)/t$. In light of this result, we set the weight $w(t) = t/\hat g(t; h)$, an estimate of the inverse variance up to a constant, and modify (6) as

$$U_g(\theta) = \int_0^r w(t)\,[\hat g(t; h) - g(t; \theta)]^2\,dt. \qquad (9)$$
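The kernel estimate (7) and the weighted contrast (9) can be sketched in the same spirit. As before, this is illustrative only and rests on assumptions not fixed by the text: D = [0,1]², the translation edge correction $e(s_1, s_2) = 1/|D \cap D_{s_2 - s_1}|$ on the unit square, the Epanechnikov kernel $k(v) = 0.75(1 - v^2)$ for $|v| \le 1$, the Gaussian PCF of Section 2, the trapezoidal rule for the integral in (9), and a grid search over (ρ, σ). Grid points with $\hat g(t;h) \le 0$, where $w(t) = t/\hat g(t;h)$ is undefined, are given zero weight; that guard is our own assumption. Function names are hypothetical.

```python
import math


def epanechnikov(v):
    """Epanechnikov kernel on [-1, 1] (an assumed choice of k)."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0


def g_hat(points, ts, lam, h):
    """Kernel PCF estimate (7) on D = [0,1]^2 with translation edge correction."""
    out = [0.0] * len(ts)
    lams = [lam(p) for p in points]
    for i, (x1, y1) in enumerate(points):
        for j, (x2, y2) in enumerate(points):
            if i == j:
                continue
            dx, dy = x2 - x1, y2 - y1
            d = math.hypot(dx, dy)
            area = (1.0 - abs(dx)) * (1.0 - abs(dy))  # |D ∩ D_{s2 - s1}|
            if d == 0.0 or area <= 0.0:
                continue
            base = 1.0 / (area * lams[i] * lams[j] * d * h)
            for k, t in enumerate(ts):
                out[k] += epanechnikov((t - d) / h) * base
    return [v / (2.0 * math.pi) for v in out]


def thomas_g(t, rho, sigma):
    """Gaussian (Thomas) PCF from Section 2."""
    return 1.0 + math.exp(-t * t / (4.0 * sigma * sigma)) / (4.0 * math.pi * rho * sigma * sigma)


def contrast_ug(ghat, ts, rho, sigma):
    """Trapezoidal U_g in (9) with w(t) = t / g_hat(t); zero weight where g_hat <= 0."""
    f = []
    for gh, t in zip(ghat, ts):
        w = t / gh if gh > 0.0 else 0.0
        f.append(w * (gh - thomas_g(t, rho, sigma)) ** 2)
    return sum(0.5 * (f[i] + f[i + 1]) * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))


def fit_mcepp(ghat, ts, rho_grid=None, sigma_grid=None):
    """Grid-search minimiser of (9); returns (rho_hat, sigma_hat)."""
    rho_grid = rho_grid or [5.0 * 1.05 ** i for i in range(60)]
    sigma_grid = sigma_grid or [0.005 * 1.05 ** i for i in range(50)]
    candidates = [(r, s) for r in rho_grid for s in sigma_grid]
    return min(candidates, key=lambda p: contrast_ug(ghat, ts, p[0], p[1]))
```

Unlike the K-function sketch, no power c appears: the variance stabilisation is carried entirely by w(t). In practice h would first be chosen by minimizing (10) or maximizing (13), and the only remaining tuning parameter is the upper limit r, that is, `ts[-1]`.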

Note that the power transformation c in both (5) and (6) is intended to control the sampling fluctuation of the estimated summary function. In (9), this is achieved through the weight function w(t). An advantage of (9) over (6) is that there is one fewer tuning parameter to select. To use (9), it is also important to select the bandwidth h, preferably by a data-driven method. In this section, we discuss one method based on least-squares cross-validation. In what follows, let $r_h$ be a preselected constant and let $\hat g^{-(s_1,s_2)}(t; h)$ denote the estimate of g(t; θ) obtained after deleting $s_1$ and $s_2$ from the observed events. Define $D \cap D_u = \{s : s \in D,\ s + u \in D\}$. Guan (2006b) proposed to select h by minimizing the criterion

$$M(h) = \int_{\|u\| \le r_h} [\hat g(\|u\|; h)]^2\,du - 2\sum_{s_1, s_2 \in N \cap D} \frac{I(0 < \|s_1 - s_2\| \le r_h)\,\hat g^{-(s_1,s_2)}(\|s_1 - s_2\|; h)}{\hat\beta_0^2\,|D \cap D_{s_2 - s_1}|\,\exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}}. \qquad (10)$$

Guan (2006b) showed that, asymptotically and under some mild regularity conditions, minimizing M(h) is equivalent to minimizing

$$R(h) = \int_{\|u\| \le r_h} [\hat g(\|u\|; h) - g(\|u\|; \theta)]^2\,du.$$

Note that minimizing (10) requires the value of $r_h$. Our numerical experience is that setting $r_h$ slightly larger than the range of the process (i.e., the distance beyond which the dependence between events dies out) often works well. Only a rough estimate of the range is needed; it can be obtained by examining the empirical PCF plot computed with a pilot bandwidth, or from a model fitted by a different procedure, say an MCEP using the K-function.

3.3 Estimation procedures based on CL

The estimation procedures discussed in this section are motivated by the recent work of Guan (2006a) and Guan (2006c). Let $W(s_1, s_2)$ denote an appropriately selected weight function. Guan (2006c) proposed to estimate θ for a homogeneous spatial point process by maximizing the CL criterion

$$\sum_{s_1, s_2 \in N \cap D} W(s_1, s_2)\left\{\log[\lambda_2(s_1, s_2; \theta)] - \log\left[\int_D\int_D W(u, v)\,\lambda_2(u, v; \theta)\,du\,dv\right]\right\}. \qquad (11)$$

For the weight function, Guan (2006c) recommended using $W(s_1, s_2) = I(0 < \|s_1 - s_2\| \le r_c)\,g(\|s_1 - s_2\|; \hat\theta)$ for attractive point patterns, where $r_c$ is a preselected constant and $\hat\theta$ is a preliminary estimate of θ. Criterion (11) is a CL criterion because each term in the summation is a legitimate (log) likelihood, owing to the fact that

$$f(s_1, s_2; \theta) = \frac{\lambda_2(s_1, s_2; \theta)}{\int_D\int_D \lambda_2(u, v; \theta)\,du\,dv} \qquad (12)$$

is the pdf of the locations of two arbitrary events in N ∩ D. In other words, criterion (11) simply sums up all the pairwise (log) likelihoods.

For INSPs, (11) can still be used in principle, since (12) does not depend on homogeneity and is therefore still a valid pdf. However, the integral in (11) may be computationally cumbersome to evaluate because of the inhomogeneity. To solve this problem, we set

$$W(s_1, s_2) = \frac{g^*(\|s_1 - s_2\|; \hat\theta)}{\hat\beta_0^2\exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}}$$

for some isotropic function $g^*(\cdot)$, e.g., $g^*(u) = I(0 < u \le r_c)\,g(u; \hat\theta)$. Then

$$\int_D\int_D W(u, v)\,\lambda_2(u, v; \theta)\,du\,dv \approx \int_D\int_D g^*(\|u - v\|; \hat\theta)\,g(\|u - v\|; \theta)\,du\,dv,$$

assuming that $\hat\beta_0$ and $\hat\beta$ are close to $\beta_0$ and β, respectively. Because both $g^*$ and g are isotropic, the computation is greatly simplified; in particular, the computationally efficient algorithms in Guan (2006c) can be applied without difficulty.

Another use of criterion (11) is to select the bandwidth h in (7). To see this, first define

$$W(s_1, s_2) = \frac{g^*(\|s_1 - s_2\|; \hat\theta)}{\hat\beta_0^2\exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}\,|D \cap D_{s_2 - s_1}|},$$

where the factor $1/|D \cap D_{s_2 - s_1}|$ is an edge correction term that simplifies the computation. Ignoring terms that do not depend on h, we propose the following composite likelihood cross-validation criterion:

$$C(h) = \sum_{s_1, s_2 \in N \cap D} W(s_1, s_2)\left\{\log[\hat g(\|s_1 - s_2\|; h)] - \log\left[\int_{\mathbb{R}} u\,g^*(u)\,\hat g(u; h)\,du\right]\right\}. \qquad (13)$$

The bandwidth h can then be selected by maximizing C(h). For an m-dependent (i.e., events separated by a distance larger than m are independent) and homogeneous spatial point process, Guan (2006a) argued that maximizing (13) is asymptotically equivalent to minimizing a modified Kullback-Leibler information distance (e.g., Silverman 1998) between the pdf f defined by (12) and its nonparametric estimate, obtained by replacing $\lambda_2$ and θ in (12) with $\hat\lambda_2$ and h, respectively.

4. A SIMULATION STUDY

To compare the performance of the procedures, we applied them to realizations of INSPs on the unit square, with a radially symmetric bivariate normal offspring dispersal distribution. We introduced inhomogeneity by setting $X(s) = s_x$, where $s = (s_x, s_y)$, and β = 1. Thus the probability that an offspring survives increases with $s_x$. The expected number of events per simulation (denoted by λ) was 100 or 200; the expected number of parents was ρ = 12.5 or 25 for λ = 100, and ρ = 25 or 50 for λ = 200. The cluster spread parameter σ was 0.01, 0.02, or 0.04. To assess the effects of the tuning parameters, we set r = 3σ, 4σ, 5σ, 6σ and c = 0.125, 0.25, 0.5, 1 for the MCEP using the K-function; $r_h$ = 5σ and r = 3σ, 4σ, 5σ, 6σ for the MCEP using the PCF; and $r_c$ = 0.2, 0.3, 0.4 for the CL estimation procedure. Note that 4σ is often regarded as the dependence range for this type of process (e.g., Diggle 2003); we therefore chose to express the tuning parameter r in terms of the range of the process. A recommendation on r stated in terms of the range is practically appealing, since a rough estimate of the range can often be obtained with relative ease. For convenience of presentation, we use MCEPK, MCEPP, and CLEP to denote the MCEP based on the K-function, the MCEP based on the PCF, and the CL estimation procedure, respectively.

Table 1 lists the mean squared errors (MSEs) for MCEPK and MCEPP. For MCEPK, we omit the results for c = 1, since they are similar to (and in fact often worse than) those for c = 0.5. Similarly, for MCEPP we present only the results obtained when (13) was used to select the bandwidth. Due to space constraints, we also omit the results for ρ = 50. These omissions do not affect our main conclusions.

From Table 1 we see that the choice of tuning parameters plays an important role in the performance of MCEPK. In particular, the combination r = 3σ and c = 0.125 often led to unstable estimates. In contrast, MCEPP is fairly insensitive to the choice of its tuning parameter r. Nevertheless, r = 3σ and 4σ appear to yield the best overall results. From a practical point of view, this means that r should be chosen around or slightly below the dependence range of the process. Compared with the best MCEPK results, MCEPP gave competitive results for $\hat\rho$ and generally comparable results for $\hat\sigma$ across all levels of r. It is worth noting that for MCEPK, a small MSE for $\hat\rho$ is often accompanied by a large MSE for $\hat\sigma$, and vice versa; this is especially true when σ = 0.04. We thus recommend MCEPP over MCEPK in general.

We also comment on how to select r and c for MCEPK, since this is a very popular approach among practitioners. As noted in the foregoing paragraph, combining a small r value (relative to the range of the process) with a small c value should be avoided. For a process with a short dependence range (i.e., tight clusters, as is the case for σ = 0.01 in the simulation), c = 0.5 consistently yielded the best results across all r values.
Note that this contradicts the general perception that c = 0.25 or less gives better results for attractive point patterns. To select r in this case, we recommend r = 4σ or 5σ, i.e., around or slightly larger than the dependence range. For a process with a medium dependence range, comparable to the case σ = 0.02, c = 0.25 or 0.5 worked well. Although c = 0.125 sometimes gave better results when estimating ρ, this was often achieved at the price of a highly variable estimate of σ. To select r in this case, we recommend r = 4σ, i.e., around the range of the process. For a process with a long dependence range (e.g., σ = 0.04 in our simulation), c = 0.125 often performed better than c = 0.25 and 0.5, except when r = 3σ. However, one should be aware that it may also produce an unstable estimate of σ, especially when µ (i.e., the cluster size) is large. We thus do not recommend c = 0.125 in this case. To select r, we recommend r = 4σ with c = 0.25, or r = 3σ with c = 0.5.

Table 2 lists the MSEs for CLEP. Overall, $r_c$ = 0.3 or 0.4 worked well. This agrees with the findings of Guan (2006b) in the homogeneous case. Compared with MCEPK and MCEPP, CLEP often yielded better estimates of σ but slightly worse estimates of ρ. Thus CLEP should be preferred only if the main interest of a study is to assess the interaction range between events.

The performance of all three procedures was affected by the model parameters ρ, µ, σ, and λ. For the same λ, σ can be estimated more accurately when ρ is larger. This is because σ is a cluster-level parameter: a larger number of clusters (i.e., a larger ρ) means a bigger sample size for estimating σ, which in turn leads to a better estimate. For ρ, more accurate estimates can be achieved when the clusters are tighter (i.e., smaller σ) and larger (i.e., larger µ), because tighter and larger clusters are more easily distinguished from each other. When the sample size increases with µ and σ fixed, both ρ and σ can be estimated more accurately.

5. CONCLUSION

We have discussed three procedures (MCEPK, MCEPP, and CLEP) that can be used to estimate the SOPs of an INSP. Estimation of the SOPs is important since they play a critical role in inference on the FOPs. We have given specific recommendations on how to select the tuning parameters involved in each procedure. These recommendations are based either on theoretical justifications or on simulation evidence, as presented in Section 3 and Section 4, respectively. MCEPP is easier to use than MCEPK, since only one tuning parameter is involved (compared with two for MCEPK) and its performance is fairly insensitive to the choice of that parameter. We thus recommend MCEPP over MCEPK in general, especially when the dependence range of the process is relatively large. If the main goal of the study is to estimate the interaction range among events, we recommend CLEP.

APPENDIX: DERIVATION OF (8)

Assume that λ(s) is bounded away from zero. Consider a sequence of regions $D_n$ and bandwidths $h_n$. Let $\partial D_n$ denote the boundary of $D_n$ and $|\partial D_n|$ its length. We assume the following conditions on $D_n$ and $h_n$:

$$|D_n| = O(n^2), \quad |\partial D_n| = O(n), \quad h_n = O(n^{-\beta}) \text{ for some } \beta \in (0, 2). \qquad (14)$$

For ease of presentation, let $e(s_1, s_2) = 1/|D_n|$; that is, we do not consider any edge correction term. Also assume that $\hat\beta_0$ and $\hat\beta$ converge to the target parameters faster than $\hat g_n(t; h)$ converges to g(t; θ); we can thus substitute the true values for them without altering the asymptotic results. Also define $\lambda(s) = \beta_0\exp[X(s)^T\beta]$. Let $\lambda_k(s_1, \ldots, s_k)$ denote the kth-order intensity function of the process, which has the general form $\lambda(s_1)\cdots\lambda(s_k)\,g_k(s_2 - s_1, \ldots, s_k - s_1)$ for k ≥ 2. Then

$$
\begin{aligned}
4\pi^2|D_n|^2\,\mathrm{Var}[\hat g_n(t; h)]
&= 2\int\!\!\int \frac{\{k[(t - \|s_1 - s_2\|)/h_n]\}^2\,g(\|s_1 - s_2\|)}{\lambda(s_1)\lambda(s_2)\,\|s_1 - s_2\|^2 h_n^2}\,ds_1\,ds_2 \\
&\quad + 4\int\!\!\int\!\!\int \frac{k[(t - \|s_1 - s_2\|)/h_n]\,k[(t - \|s_1 - s_3\|)/h_n]\,g_3(s_2 - s_1, s_3 - s_1)}{\lambda(s_1)\,\|s_1 - s_2\|\,\|s_1 - s_3\|\,h_n^2}\,ds_1\,ds_2\,ds_3 \\
&\quad + \int\!\!\int\!\!\int\!\!\int \frac{k[(t - \|s_1 - s_2\|)/h_n]\,k[(t - \|s_3 - s_4\|)/h_n]}{\|s_1 - s_2\|\,\|s_3 - s_4\|\,h_n^2}\,\big[g_4(s_2 - s_1, s_3 - s_1, s_4 - s_1) - g(\|s_1 - s_2\|)\,g(\|s_3 - s_4\|)\big]\,ds_1\,ds_2\,ds_3\,ds_4,
\end{aligned}
$$

where all integrals are over $D_n$. Lengthy but straightforward algebra (not included here) shows that the first term on the right-hand side dominates the other two. This follows from condition (14) and the fact that any INSP is Brillinger mixing (Heinrich 1988). Furthermore, the first term is approximately

$$\frac{2\,g(t; \theta)}{h_n t}\int_{D_n}\int_0^{2\pi} \frac{1}{\lambda(s)\,\lambda[s + u(t, \psi)]}\,d\psi\,ds \int [k(v)]^2\,dv.$$

This proves (8).

REFERENCES

Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press.

Guan, Y. (2006a), A Composite Likelihood Cross-Validation Approach in Selecting Bandwidth for the Estimation of the Pair Correlation Function, Scandinavian Journal of Statistics, to appear.

Guan, Y. (2006b), A Least-Squares Cross-Validation Bandwidth Selection Approach in Pair Correlation Estimations, Statistics & Probability Letters, submitted.

Guan, Y. (2006c), A Composite Likelihood Approach in Fitting Spatial Point Process Models, Journal of the American Statistical Association, to appear.

Guan, Y. and Sherman, M. (2006), On Least Squares Fitting for Spatial Point Processes, Journal of the Royal Statistical Society, Ser. B, submitted revision.

Heinrich, L. (1988), Asymptotic Gaussianity of Some Estimators for Reduced Factorial Moment Measures and Product Densities of Stationary Poisson Cluster Processes, Statistics, 19.

Møller, J. and Waagepetersen, R. P. (2004), Statistical Inference and Simulation for Spatial Point Processes, New York: Chapman & Hall.

Møller, J. and Waagepetersen, R. P. (2006), Modern Statistics for Spatial Point Processes, Scandinavian Journal of Statistics, submitted.

Nielsen, L. S. and Jensen, E. B. V. (2004), Statistical Inference for Transformation Inhomogeneous Point Processes, Scandinavian Journal of Statistics, 28.

Silverman, B. W. (1998), Density Estimation for Statistics and Data Analysis, New York: Chapman & Hall.

Stoyan, D. and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.

Stoyan, D. and Stoyan, H. (1998), Non-Homogeneous Gibbs Process Models for Forestry - A Case Study, Biometrical Journal, 40.

Waagepetersen, R. P. (2006), An Estimating Function Approach to Inference for Inhomogeneous Neyman-Scott Processes, Biometrics, to appear.

Table 1. MSEs of $\hat\rho$ and $\hat\sigma$ for the MCEPs using the K-function and the PCF. Each MSE is divided by the squared target parameter. Column groups: ρ = 12.5, λ = 100; ρ = 25, λ = 100; ρ = 25, λ = 200, each with K and PCF columns. Rows: $\hat\rho$ and $\hat\sigma$ for σ = 0.01, 0.02, 0.04 and r = 3σ, 4σ, 5σ, 6σ.

Table 2. MSEs of $\hat\rho$ and $\hat\sigma$ for CLEP. Each MSE is divided by the squared target parameter. Column groups: ρ = 12.5, λ = 100; ρ = 25, λ = 100; ρ = 25, λ =. Rows: $\hat\rho$, $\hat\sigma$.
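The entries of Tables 1 and 2 are relative MSEs: the MSE of the replicate estimates about the true value, divided by the squared true value, so that ρ and σ errors are on a comparable scale. A minimal sketch of that normalisation, assuming the replicate estimates are available as a list (the function name is hypothetical):

```python
def relative_mse(estimates, target):
    """MSE of replicate estimates about the target, divided by target ** 2."""
    if not estimates:
        raise ValueError("need at least one estimate")
    mse = sum((e - target) ** 2 for e in estimates) / len(estimates)
    return mse / target ** 2
```

For example, estimates 24 and 26 of ρ = 25 give an MSE of 1 and a relative MSE of 1/625 = 0.0016.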
