On Model Fitting Procedures for Inhomogeneous Neyman-Scott Processes

Yongtao Guan

July 31, 2006

ABSTRACT

In this paper we study computationally efficient procedures to estimate the second-order parameters for a class of inhomogeneous Neyman-Scott processes proposed by Waagepetersen (2006). Specifically, we consider three different estimation procedures: a minimum contrast estimation procedure (MCEP) using the K-function, an MCEP using the pair correlation function, and a procedure based on composite likelihood. We give recommendations on how to select the tuning parameters involved in each of these three estimation procedures. We also discuss which procedure to use, based on a preliminary examination of the data at hand and the goal of the study.

KEY WORDS: Composite Likelihood, Inhomogeneous Neyman-Scott Process, Minimum Contrast Estimation, Pair Correlation Function.

Yongtao Guan is Assistant Professor, Division of Biostatistics, Yale School of Public Health, Yale University, New Haven, CT 06520-8034 (e-mail: yongtao.guan@yale.edu). This research was supported by National Science Foundation grant DMS-0603673.

1. INTRODUCTION

Many emerging spatial point pattern data sets are inhomogeneous in nature (e.g., Diggle 2003). Conventional point process models often have difficulty with such data since they were developed under stationarity (i.e., homogeneity). To solve this problem, a large number of new models have recently been proposed, the majority of which belong to the general class of Gibbs point process models (e.g., Stoyan and Stoyan 1998, Nielsen and Jensen 2004). Although they are flexible for modeling repulsive spatial interactions, Gibbs point process models are often not appropriate for attractive point patterns, as pointed out by Møller and Waagepetersen (2006).

Waagepetersen (2006) proposed a new class of inhomogeneous Neyman-Scott process (INSP) models that allow attractive interactions between events. This class of models is analytically simple yet practically sensible for modeling attractive point patterns, especially those arising from ecological studies. In addition, statistical properties for estimators of the first-order structure (i.e., the inhomogeneity parameters) have been developed, which makes inference on these parameters possible.

In this paper, we discuss procedures that can be used to estimate the second-order parameters (SOPs) of an INSP, a subject that has not been well studied in the literature. The estimation of the SOPs is important for two reasons: 1) they provide insight into how events interact with each other, such as the interaction range and strength, which are often of great interest in themselves, and 2) they are critical for correct inference on the first-order parameters (FOPs), since the variances of the FOP estimates depend on the SOPs.

We will consider three procedures to estimate the SOPs. The first is a minimum contrast estimation procedure (MCEP) based on the K-function as given in Waagepetersen (2006); the second is an MCEP based on the pair correlation function (PCF); and the last is an estimating equation approach based on composite likelihood (CL). It will be seen from Section 3 that each of these three procedures involves at least one unknown tuning parameter. The choice of these tuning parameters is critical for the accuracy of the resulting estimates. The main objective of this paper is to give data-driven procedures (whenever available) and/or empirical evidence obtained from simulations on how to select these parameters.

The remainder of this article is organized as follows. In Section 2 we give some background on INSPs. We discuss the aforementioned three estimation procedures in Section 3 and assess their performance through simulations in Section 4. The conclusion of the paper is given in Section 5. Some technical details are given in the Appendix.

2. INHOMOGENEOUS NEYMAN-SCOTT PROCESSES

Consider a two-dimensional homogeneous spatial Poisson process whose first-order intensity is equal to $\rho$. Each event of the process is considered as a parent, which in turn generates a Poisson number of offspring with expected value $\mu$. Conditional on the location of a parent, the offspring are dispersed independently following some common probability density function (pdf). For any location $s$, let $X(s)$ be a $p \times 1$ vector of covariates recorded at this location. An offspring at $s$ is retained with probability $\exp[X(s)^T\beta]/M$, where $M = \max_s\{\exp[X(s)^T\beta]\}$ and $\beta$ is a $p \times 1$ vector of unknown parameters. The resulting offspring process then forms an INSP, which will be denoted by $N$ henceforth.

From the definition of an INSP, we see that the covariates, which often reflect conditions of the local environment, control the survival rate of an offspring at a given location. Another way to understand this formulation is that the initial offspring process is thinned by thinning probabilities that depend on the covariate process $X$ (Waagepetersen 2006). A nice property of INSPs is that their summary functions are often available in closed form.
For example, the first- and second-order intensity functions of the process are given by

$$\lambda(s) = \rho\mu \exp[X(s)^T\beta]/M, \tag{1}$$

$$\lambda_2(s_1, s_2) = \lambda(s_1)\lambda(s_2)\, g(s_1, s_2; \theta), \tag{2}$$

respectively, where $g(s_1, s_2; \theta)$ is the PCF (e.g., Stoyan and Stoyan 1994) for the unthinned offspring process, depending on some unknown parameter vector $\theta$. For example, if the pdf for an offspring location relative to its parent is a bivariate radially symmetric normal distribution (e.g., Diggle 2003), then

$$g(s_1, s_2; \theta) = 1 + \exp[-\|s_1 - s_2\|^2/(4\sigma^2)]/(4\pi\rho\sigma^2),$$

where $\sigma^2$ is the variance parameter associated with the pdf and $\|\cdot\|$ is the Euclidean norm. We call $\theta$ the SOPs since it affects only the second-order structure of the process. Note that for the above example, $\theta = (\rho, \sigma^2)$. Throughout the remainder of this article, we will assume that $g(s_1, s_2; \theta) = g(\|s_1 - s_2\|; \theta)$, i.e., the PCF is isotropic.

The FOPs $\beta$ are often estimated by an estimating equation approach. Let $\beta_0 = \rho\mu/M$ and let $D$ be the domain of interest where a realization of $N$ is observed. Waagepetersen (2006) proposed to estimate $\beta$ by solving

$$u(\beta) = \sum_{s \in N \cap D} \left[\log(\beta_0) + X(s)^T\beta\right] - \int_D \beta_0 \exp[X(s)^T\beta]\,ds = 0. \tag{3}$$

An alternative approach to estimate $\beta$ is to solve the following estimating equation:

$$u(\beta) = \sum_{s \in N \cap D} \left[X(s)^T\beta - \log\left\{\int_D \exp[X(s)^T\beta]\,ds\right\}\right] = 0. \tag{4}$$

Our simulation experience suggests that (3) and (4) often give nearly identical estimates, although (4) may be preferable since it eliminates the need to estimate $\beta_0$. The statistical properties of the resulting estimator from (3) are available in Waagepetersen (2006). It should be emphasized that although neither (3) nor (4) requires the estimation of the SOPs, inference on the FOPs does, since the distributions of their estimators depend on the true SOPs.
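The construction of an INSP described at the start of this section (Poisson parents, Poisson offspring counts, normal dispersal, covariate-dependent thinning) can be sketched as a short simulation. This is a minimal illustration and not the code used for the paper. The function names `poisson` and `simulate_insp` are hypothetical, the covariate X(s) = s_x on the unit square is borrowed from the simulation setting of Section 4, and padding the parent window by 4σ to reduce edge effects is an assumption made here.

```python
import math
import random


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_insp(rho, mu, sigma, beta, seed=None):
    """Simulate an INSP on the unit square D with scalar covariate X(s) = s_x.

    rho: parent intensity; mu: mean number of offspring per parent;
    sigma: standard deviation of the normal dispersal; beta: scalar FOP.
    """
    rng = random.Random(seed)
    pad = 4.0 * sigma  # assumption: parents just outside D may still contribute
    side = 1.0 + 2.0 * pad
    n_parents = poisson(rng, rho * side * side)
    # M = max over D of exp[X(s) * beta]; X(s) = s_x ranges over [0, 1].
    big_m = max(1.0, math.exp(beta))
    points = []
    for _ in range(n_parents):
        px = rng.uniform(-pad, 1.0 + pad)
        py = rng.uniform(-pad, 1.0 + pad)
        for _ in range(poisson(rng, mu)):
            x = rng.gauss(px, sigma)
            y = rng.gauss(py, sigma)
            if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
                continue  # offspring falls outside the observation window D
            # Thinning step: retain with probability exp[X(s) * beta] / M.
            if rng.random() < math.exp(beta * x) / big_m:
                points.append((x, y))
    return points


pattern = simulate_insp(rho=25.0, mu=8.0, sigma=0.02, beta=1.0, seed=1)
```

With beta > 0 the retained points become denser toward the right edge of the square, which is the inhomogeneity that the first-order intensity (1) describes.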

3. PROCEDURES TO ESTIMATE THE SOPS

3.1 MCEP based on the K-function

Throughout the remainder of the paper, let $\hat\beta_0$ and $\hat\beta$ denote estimates of $\beta_0$ and $\beta$ obtained by solving either (3) or (4). Following Waagepetersen (2006), we define the empirical K-function for an INSP as

$$\hat K(t) = \sum_{s_1, s_2 \in N \cap D} \frac{I(0 < \|s_1 - s_2\| \le t)\, e(s_1, s_2)}{(\hat\beta_0)^2 \exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}},$$

where $I(\cdot)$ is an indicator function and $e(s_1, s_2)$ is an edge correction term. For the various forms of $e(s_1, s_2)$, see Diggle (2003) and Stoyan and Stoyan (1994).

To estimate $\theta$, the empirical K-function is often compared with the theoretical K-function $K(t; \theta) = 2\pi\int_0^t u\, g(u; \theta)\,du$ through the following discrepancy measure:

$$U_K(\theta) = \int_0^r \left\{[\hat K(t)]^c - [K(t; \theta)]^c\right\}^2 dt. \tag{5}$$

The minimizer of $U_K$ over $\theta$ gives the estimated model parameters, $\hat\theta$. The two additional parameters $c$ and $r$ in (5) are called tuning parameters and need to be preselected. Diggle (2003) suggested using $c = .25$ or smaller for attractive point patterns in the homogeneous case. Guan and Sherman (2006) found this recommendation questionable, since using $c = 1$ with an appropriate choice of $r$ often outperformed using $c = .25$. In the INSP case, Waagepetersen (2006) used $c = .25$. To the best of the author's knowledge, no specific recommendation has been given on the choice of $r$. Although the choice of $c$ and $r$ may be less of an issue if the sample size is very large, it is often very important for data with a moderate sample size (say 200 points). We intend to give some recommendations on the choice of these parameters based on the results of our simulation study in Section 4.
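For the bivariate normal dispersal example of Section 2, integrating $2\pi u\, g(u; \theta)$ from 0 to $t$ gives the closed form $K(t; \theta) = \pi t^2 + \{1 - \exp[-t^2/(4\sigma^2)]\}/\rho$, so the discrepancy (5) is cheap to evaluate. The sketch below shows one way this might be done. All function names are hypothetical, `k_hat` is assumed to be a callable returning the empirical K-function, and the midpoint rule and grid search are simple stand-ins for whatever quadrature and optimizer one prefers.

```python
import math


def thomas_pcf(t, rho, sigma2):
    """PCF of the Section 2 example: 1 + exp[-t^2/(4 sigma^2)] / (4 pi rho sigma^2)."""
    return 1.0 + math.exp(-t * t / (4.0 * sigma2)) / (4.0 * math.pi * rho * sigma2)


def thomas_k(t, rho, sigma2):
    """Closed form of 2*pi * integral_0^t u g(u) du for the PCF above."""
    return math.pi * t * t + (1.0 - math.exp(-t * t / (4.0 * sigma2))) / rho


def contrast_k(k_hat, rho, sigma2, r, c, n_grid=200):
    """Midpoint-rule approximation of U_K(theta) in (5)."""
    step = r / n_grid
    total = 0.0
    for i in range(n_grid):
        t = (i + 0.5) * step
        diff = k_hat(t) ** c - thomas_k(t, rho, sigma2) ** c
        total += diff * diff * step
    return total


def fit_by_grid(k_hat, rho_grid, sigma_grid, r, c):
    """Return the (rho, sigma) pair on a grid that minimizes U_K (illustrative only)."""
    best = None
    for rho in rho_grid:
        for sigma in sigma_grid:
            value = contrast_k(k_hat, rho, sigma * sigma, r, c)
            if best is None or value < best[0]:
                best = (value, rho, sigma)
    return best[1], best[2]
```

In practice `k_hat` would be computed from the observed pattern using the fitted first-order intensity, and the tuning parameters `r` and `c` would be chosen along the lines discussed in Section 4.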

3.2 MCEP based on the PCF

The MCEP discussed in the previous section can be extended to the PCF. Specifically, we may define the following discrepancy measure, analogous to (5):

$$U_g(\theta) = \int_0^r \left\{[\hat g(t; h)]^c - [g(t; \theta)]^c\right\}^2 dt, \tag{6}$$

where $\hat g(t; h)$ is an estimate of $g(t; \theta)$ obtained by nonparametric smoothing and $h$ is the bandwidth used to obtain $\hat g(t; h)$. Let $k(\cdot)$ denote a kernel function. We define $\hat g(t; h)$ as

$$\hat g(t; h) = \frac{1}{2\pi} \sum_{s_1, s_2 \in N \cap D} \frac{I(s_1 \ne s_2)\, k[(t - \|s_1 - s_2\|)/h]\, e(s_1, s_2)}{(\hat\beta_0)^2 \exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}\, \|s_1 - s_2\|\, h}. \tag{7}$$

Note that (6) introduces one more parameter, $h$, compared to (5). We discuss how to select $h$ later in this section.

One difficulty with using (6) is that, unlike in the K-function case, there is no general recommendation on how to select $c$. It would certainly be desirable to eliminate the need to select $c$. For this purpose, we extend the method of Guan (2006a) from the homogeneous case to the current setting. Specifically, we derive the asymptotic variance of $\hat g(t; h)$ and obtain an estimate of the variance based on its asymptotic expression. From the results detailed in the Appendix, we see that, ignoring a multiplicative constant,

$$\mathrm{Var}[\hat g(t; h)] \propto \frac{g(t; \theta)}{t} \int_D \int_0^{2\pi} \frac{1}{(\hat\beta_0)^2 \exp[\{X(s)^T + X[s + u(t, \psi)]^T\}\hat\beta]}\, d\psi\, ds, \tag{8}$$

where $u(t, \psi) = [t\cos(\psi), t\sin(\psi)]$. Assume that $r$ is relatively small compared to the size of $D$ and that $X(\cdot)$ is smooth enough that $X[s + u(t_1, \psi)] \approx X[s + u(t_2, \psi)]$ for all $t_1, t_2 \le r$. Then (8) may be simplified to $\mathrm{Var}[\hat g(t; h)] \propto g(t; \theta)/t$. In light of this result, we assign $w(t) = t/\hat g(t; h)$ and modify (6) as follows:

$$U_g(\theta) = \int_0^r w(t)\, [\hat g(t; h) - g(t; \theta)]^2\, dt. \tag{9}$$
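The kernel estimate (7) and the weighted discrepancy (9) can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the edge correction is taken as $e(s_1, s_2) = 1/|D|$ (no edge correction, the same simplification used in the Appendix), the Epanechnikov kernel is an arbitrary choice since the paper does not fix $k$, `intensity(s)` is a hypothetical caller-supplied function standing for $\hat\beta_0 \exp[X(s)^T\hat\beta]$, and the theoretical PCF is that of the normal dispersal example in Section 2.

```python
import math


def epanechnikov(z):
    """Kernel k(z) = 0.75 (1 - z^2) on |z| < 1 (an assumed choice of k)."""
    return 0.75 * (1.0 - z * z) if abs(z) < 1.0 else 0.0


def pcf_hat(points, intensity, t, h, area=1.0):
    """Kernel estimate (7) with e(s1, s2) = 1/|D|, i.e. no edge correction."""
    total = 0.0
    for i, p in enumerate(points):
        for j, q in enumerate(points):
            if i == j:
                continue
            dist = math.dist(p, q)
            if dist == 0.0:
                continue  # coincident points would divide by zero
            weight = epanechnikov((t - dist) / h)
            if weight > 0.0:
                total += weight / (intensity(p) * intensity(q) * dist * h)
    return total / (2.0 * math.pi * area)


def thomas_pcf(t, rho, sigma2):
    """PCF of the Section 2 example with theta = (rho, sigma^2)."""
    return 1.0 + math.exp(-t * t / (4.0 * sigma2)) / (4.0 * math.pi * rho * sigma2)


def weighted_contrast(g_hat, rho, sigma2, r, n_grid=200):
    """Midpoint-rule approximation of (9) with w(t) = t / g_hat(t)."""
    step = r / n_grid
    total = 0.0
    for i in range(n_grid):
        t = (i + 0.5) * step
        g_obs = g_hat(t)
        if g_obs <= 0.0:
            continue  # the weight is undefined where the estimate vanishes
        diff = g_obs - thomas_pcf(t, rho, sigma2)
        total += (t / g_obs) * diff * diff * step
    return total
```

A typical use would pass `lambda t: pcf_hat(points, intensity, t, h)` as `g_hat` and minimize `weighted_contrast` over (rho, sigma2). The double loop over pairs is quadratic in the number of points, which is acceptable for patterns of a few hundred events.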

Note that the power transformation $c$ in both (5) and (6) is intended to control the sampling fluctuation associated with the estimate of the respective summary function. In (9), this is achieved through the weight function $w(t)$. An advantage of using (9) over (6) is that there is one fewer tuning parameter to select.

To use (9), it is also important to select the bandwidth $h$, preferably by some data-driven method. In this section, we discuss one method based on least-squares cross-validation. In what follows, let $r_h$ be a preselected constant and let $\hat g^{-(s_1, s_2)}(t; h)$ denote the estimate of $g(t; \theta)$ obtained by deleting $s_1$ and $s_2$ from the observed events. Define $D \cap D_{-u} = \{s : s \in D, s + u \in D\}$. Guan (2006b) proposed to select $h$ by minimizing the following criterion:

$$M(h) = \int_{\|u\| \le r_h} [\hat g(\|u\|; h)]^2\, du - 2 \sum_{s_1, s_2 \in D \cap N} \frac{I(0 < \|s_1 - s_2\| \le r_h)\, \hat g^{-(s_1, s_2)}(\|s_1 - s_2\|; h)}{(\hat\beta_0)^2\, |D \cap D_{-s_1 + s_2}|\, \exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}}. \tag{10}$$

Guan (2006b) showed that, asymptotically, minimizing $M(h)$ is equivalent to minimizing $R(h) = \int_{\|u\| \le r_h} [\hat g(\|u\|; h) - g(\|u\|; \theta)]^2\, du$ under some mild regularity conditions. Note that to minimize (10), we need the value of $r_h$. Our numerical experience is that setting $r_h$ slightly larger than the range of the process (i.e., the distance beyond which the dependence between events dies out) often works well. Only a rough estimate of the range is necessary. This can be obtained by examining the empirical PCF plot using a pilot bandwidth, or from a model fitted by a different procedure, say an MCEP using the K-function.

3.3 Estimation procedures based on CL

The estimation procedures discussed in this section are motivated by the recent work of Guan (2006a) and Guan (2006c). Let $W(s_1, s_2)$ denote an appropriately selected weight function. Guan (2006c) proposed to estimate $\theta$ for a homogeneous spatial point process by maximizing the following CL criterion:

$$\sum_{s_1, s_2 \in D \cap N} W(s_1, s_2) \left\{ \log[\lambda_2(s_1, s_2; \theta)] - \log\left[\int\!\!\int W(u, v)\, \lambda_2(u, v; \theta)\, du\, dv\right] \right\}. \tag{11}$$

Here the integrations in (11) are both over $D$. For the weight function, Guan (2006c) recommended using $W(s_1, s_2) = I(0 < \|s_1 - s_2\| \le r_c)\, g(\|s_1 - s_2\|; \hat\theta)$ for attractive point patterns, where $r_c$ is a preselected constant and $\hat\theta$ is an estimate of $\theta$. We note that (11) is a CL criterion because each term in the summation is a legitimate (log) likelihood, due to the fact that

$$f(s_1, s_2; \theta) = \frac{\lambda_2(s_1, s_2; \theta)}{\int\!\!\int \lambda_2(u, v; \theta)\, du\, dv} \tag{12}$$

is the pdf for two arbitrary events in $D \cap N$ to be at $s_1$ and $s_2$. In other words, the criterion given by (11) simply sums up all the pairwise (log) likelihoods.

For INSPs, (11) can still be used in principle, since (12) does not depend on homogeneity and thus is still a valid pdf. However, the calculation of the integrals in (11) may be computationally cumbersome due to the introduction of inhomogeneity. To solve this problem, we assign

$$W(s_1, s_2) = \frac{g^*(\|s_1 - s_2\|; \hat\theta)}{(\hat\beta_0)^2 \exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}}$$

for some isotropic function $g^*(\cdot)$, e.g., $g^*(u) = I(0 < u \le r_c)\, g(u; \hat\theta)$. Then

$$\int\!\!\int W(u, v)\, \lambda_2(u, v; \theta)\, du\, dv \approx \int\!\!\int g^*(\|u - v\|; \hat\theta)\, g(\|u - v\|; \theta)\, du\, dv,$$

assuming that $\hat\beta_0$ and $\hat\beta$ are close to $\beta_0$ and $\beta$, respectively. The fact that both $g^*$ and $g$ are isotropic greatly simplifies the computational effort. In particular, the computationally efficient algorithms in Guan (2006c) can be applied without difficulty.

Another use of criterion (11) is to select the bandwidth $h$ in (7). To see this, first define

$$W(s_1, s_2) = \frac{g^*(\|s_1 - s_2\|; \hat\theta)}{(\hat\beta_0)^2 \exp\{[X(s_1)^T + X(s_2)^T]\hat\beta\}} \cdot \frac{1}{|D \cap D_{-s_1 + s_2}|},$$

where the term $1/|D \cap D_{-s_1 + s_2}|$ is an edge correction term that simplifies the computation. Ignoring some terms that do not depend on $h$, we propose the following composite likelihood cross-validation criterion:

$$C(h) = \sum_{s_1, s_2 \in D \cap N} W(s_1, s_2) \left\{ \log[\hat g(\|s_1 - s_2\|; h)] - \log\left[\int_{\mathbb{R}} u\, g^*(u)\, \hat g(u; h)\, du\right] \right\}. \tag{13}$$

The bandwidth $h$ can then be selected by maximizing $C(h)$. For an $m$-dependent (i.e., events separated by a distance larger than $m$ are independent) and homogeneous spatial point process, Guan (2006a) argued that maximizing (13) is asymptotically equivalent to minimizing a modified Kullback-Leibler information distance measure (e.g., Silverman 1998) between the pdf $f$ defined by (12) and its nonparametric estimate, obtained by replacing $\lambda_2$ and $\theta$ in (12) with $\hat\lambda_2$ and $h$, respectively.

4. A SIMULATION STUDY

To compare the performance of the proposed methods, we applied them to realizations from INSPs on a unit square, where the offspring dispersion of the processes was a bivariate radially symmetric normal distribution. We added inhomogeneity to the model by assigning $X(s) = s_x$, where $s = (s_x, s_y)$, and $\beta = 1$. Thus the probability that an offspring survives increases as $s_x$ increases. The expected number of events per simulation (denoted by $\lambda$) was 100 or 200, and the expected number of parents was $\rho = 12.5, 25$ for $\lambda = 100$ and $\rho = 25, 50$ for $\lambda = 200$. The cluster spread parameter $\sigma$ was $.01, .02, .04$.

To assess the effects of the tuning parameters, we set $r = 3\sigma, 4\sigma, 5\sigma, 6\sigma$ and $c = .125, .25, .5, 1$ for the MCEP using the K-function, $r_h = 5\sigma$ and $r = 3\sigma, 4\sigma, 5\sigma, 6\sigma$ for the MCEP using the PCF, and $r_c = .2, .3, .4$ for the CL estimation procedure. Note that $4\sigma$ is often regarded as the dependence range for this type of process (e.g., Diggle 2003). Thus we simply intended to link the tuning parameter $r$ to the range of the process. A recommendation on the choice of $r$ in terms of the range of a process is practically appealing, since a rough estimate of the range can often be obtained with relative ease. For convenience of presentation, we will use MCEPK, MCEPP and CLEP to denote the MCEP based on the K-function, the MCEP based on the PCF, and the CL estimation procedure, respectively.

Table 1 lists the mean squared errors (MSEs) for MCEPK and MCEPP. For MCEPK, we omit the results for $c = 1$ since they are similar to (and in fact often inferior to) those for $c = .5$. Similarly, we present only the results for MCEPP when (13) was used to select the bandwidth. Due to space constraints, we also do not include the results for $\rho = 50$. These omissions do not affect our main conclusions.

From Table 1 we see that the choice of tuning parameters plays an important role in the performance of MCEPK. Note that the combination of $r = 3\sigma$ and $c = .125$ often led to unstable estimates. In contrast, MCEPP is fairly insensitive to the choice of its tuning parameter (i.e., $r$). Nevertheless, it appears that $r = 3\sigma, 4\sigma$ yielded the best results overall. From a practical point of view, this means that we should choose $r$ to be around or slightly smaller than the dependence range of the process. When compared to the best MCEPK, we see that MCEPP gave competitive results for $\hat\rho$ and generally comparable results for $\hat\sigma$ across all levels of $r$. It is worth noting that for MCEPK, a small MSE for $\hat\rho$ is often accompanied by a large MSE for $\hat\sigma$ and vice versa. This is especially true when $\sigma = .04$. We thus recommend MCEPP over MCEPK in general.

We also comment on how to select $r$ and $c$ for MCEPK, since this is a very popular approach among practitioners. As noted in the foregoing paragraph, the combination of a small $r$ value relative to the range of the process and a small $c$ value should be avoided. For a process with a short dependence range (i.e., tight clusters, as is the case for $\sigma = .01$ in the simulation), $c = .5$ consistently yielded the best results across all $r$ values.
Note that this contradicts the general perception that $c = .25$ or less gives better results for attractive point patterns. To select $r$ in this case, we recommend using $r = 4\sigma, 5\sigma$, i.e., around or slightly larger than the dependence range. For a process with a medium dependence range, comparable to the case of $\sigma = .02$, $c = .25$ or $.5$ worked well. Although $c = .125$ sometimes gave better results when estimating $\rho$, this was often achieved at the price of a highly variable estimate for $\sigma$. To select $r$ in this case, we recommend using $r = 4\sigma$, i.e., around the range of the process. For a process with a long dependence range (e.g., $\sigma = .04$ in our simulation), $c = .125$ often performed better than $c = .25$ and $.5$ except when $r = 3\sigma$. However, one should be aware that it may also result in an unstable estimate for $\sigma$, especially when $\mu$ (i.e., the cluster size) is large. We thus do not recommend using $c = .125$ in this case. To select $r$, we recommend using $r = 4\sigma$ with $c = .25$ or $r = 3\sigma$ with $c = .5$.

Table 2 lists the MSEs for CLEP. Overall, $r_c = .3$ or $.4$ worked well. This agrees with the findings of Guan (2006b) in the homogeneous case. When compared to MCEPK and MCEPP, we see that CLEP often yielded better estimates for $\sigma$ but slightly worse estimates for $\rho$. Thus CLEP should be preferred only if the main interest of a study is to assess the interaction range between events.

The performance of all three procedures was affected by the model parameters $\rho$, $\mu$, $\sigma$ and $\lambda$. For the same $\lambda$, $\sigma$ can be estimated more accurately when $\rho$ is larger. This is because $\sigma$ is a cluster-level parameter: a larger number of clusters (i.e., larger $\rho$) means a bigger sample size for the estimation of $\sigma$, which in turn leads to a better estimate. For the estimation of $\rho$, more accurate estimates can be achieved when the clusters are tighter (i.e., smaller $\sigma$) and the cluster size (i.e., $\mu$) is larger. This is because tighter and larger clusters can be more easily distinguished from each other. When the sample size increases, both $\rho$ and $\sigma$ can be estimated more accurately when $\mu$ and $\sigma$ are fixed.

5. CONCLUSION

We have discussed three estimation procedures (i.e., MCEPK, MCEPP and CLEP) that can be used to estimate the SOPs of an INSP. The estimation of the SOPs is important since they play a critical role in inference on the FOPs. We have given specific recommendations on how to select the tuning parameters involved in each procedure. These recommendations are based either on theoretical justification or on simulation evidence, as presented in Section 3 and Section 4, respectively. MCEPP is easier to use than MCEPK because only one tuning parameter is involved (compared to two for MCEPK) and because its performance is fairly insensitive to the choice of that tuning parameter. We thus recommend the use of MCEPP over MCEPK in general, especially when the dependence range of the process is relatively large. If the main goal of the study is to obtain the interaction range among events, we recommend the use of CLEP.

APPENDIX: DERIVATION OF (8)

Assume that $\lambda(s)$ is bounded away from zero. Consider a sequence of regions $D_n$ and bandwidths $h_n$. Let $\partial D_n$ denote the boundary of $D_n$ and $|\partial D_n|$ denote the length of $\partial D_n$. We assume the following condition on $D_n$ and $h_n$:

$$|D_n| = O(n^2), \quad |\partial D_n| = O(n), \quad \text{and} \quad h_n = O(n^{-\beta}) \text{ for some } \beta \in (0, 2). \tag{14}$$

For ease of mathematical presentation, let $e(s_1, s_2) = 1/|D_n|$; that is, we do not consider any edge correction term. Also assume that $\hat\beta_0$ and $\hat\beta$ converge to the target parameters faster than $\hat g_n(t; h)$ converges to $g(t; \theta)$. We thus substitute them with the true values without altering the asymptotic results. Also define $\lambda(s) = \beta_0 \exp[X(s)^T\beta]$. Let $\lambda_k(s_1, \ldots, s_k)$ denote the $k$th-order intensity function of the process. Note that $\lambda_k(s_1, \ldots, s_k)$ has the general form $\lambda(s_1) \cdots \lambda(s_k)\, g_k(s_2 - s_1, \ldots, s_k - s_1)$ for $k \ge 2$. Then

$$4\pi^2 |D_n|^2\, \mathrm{Var}[\hat g_n(t; h)] = 2 \int\!\!\int \frac{\{k[(t - \|s_1 - s_2\|)/h_n]\}^2\, g(\|s_1 - s_2\|)}{\lambda(s_1)\lambda(s_2)\, \|s_1 - s_2\|^2\, (h_n)^2}\, ds_1\, ds_2$$

$$+\; 4 \int\!\!\int\!\!\int \frac{k[(t - \|s_1 - s_2\|)/h_n]\, k[(t - \|s_1 - s_3\|)/h_n]\, g_3(s_2 - s_1, s_3 - s_1)}{\lambda(s_1)\, \|s_1 - s_2\|\, \|s_1 - s_3\|\, (h_n)^2}\, ds_1\, ds_2\, ds_3$$

$$+ \int\!\!\int\!\!\int\!\!\int \frac{k[(t - \|s_1 - s_2\|)/h_n]\, k[(t - \|s_3 - s_4\|)/h_n]}{\|s_1 - s_2\|\, \|s_3 - s_4\|\, (h_n)^2} \left[g_4(s_2 - s_1, s_3 - s_1, s_4 - s_1) - g(\|s_1 - s_2\|)\, g(\|s_3 - s_4\|)\right] ds_1\, ds_2\, ds_3\, ds_4,$$

where the integrations are all over $D_n$. Lengthy yet straightforward algebra, not included in the paper, shows that the first term on the right side of the equality dominates the other two. This is due to condition (14) and the fact that any INSP is Brillinger mixing (Heinrich 1988). Furthermore, the first term is approximately (up to a multiplicative constant)

$$\frac{g(t; \theta)}{h_n t} \int_{D_n} \int_0^{2\pi} \frac{1}{\lambda(s)\, \lambda[s + u(t, \psi)]}\, d\psi\, ds \int [k(u)]^2\, du.$$

Thus we have proved (8).

REFERENCES

Diggle, P. J. (2003), Statistical Analysis of Spatial Point Patterns, New York: Oxford University Press Inc.

Guan, Y. (2006a), A Composite Likelihood Cross-Validation Approach in Selecting Bandwidth for the Estimation of the Pair Correlation Function, Scandinavian Journal of Statistics, to appear.

Guan, Y. (2006b), A Least-Squares Cross-Validation Bandwidth Selection Approach in Pair Correlation Estimations, Statistics & Probability Letters, submitted.

Guan, Y. (2006c), A Composite Likelihood Approach in Fitting Spatial Point Process Models, Journal of the American Statistical Association, to appear.

Guan, Y. and Sherman, M. (2006), On Least Squares Fitting for Spatial Point Processes, Journal of the Royal Statistical Society, Ser. B, submitted revision.

Heinrich, L. (1988), Asymptotic Gaussianity of Some Estimators for Reduced Factorial Moment Measures and Product Densities of Stationary Poisson Cluster Processes, Statistics, 19, 87-106.

Møller, J. and Waagepetersen, R. P. (2004), Statistical Inference and Simulation for Spatial Point Processes, New York: Chapman & Hall.

Møller, J. and Waagepetersen, R. P. (2006), Modern Statistics for Spatial Point Processes, Scandinavian Journal of Statistics, submitted.

Nielsen, L. S. and Jensen, E. B. V. (2004), Statistical Inference for Transformation Inhomogeneous Point Processes, Scandinavian Journal of Statistics, 28, 641-656.

Silverman, B. W. (1998), Density Estimation for Statistics and Data Analysis, New York: Chapman & Hall.

Stoyan, D. and Stoyan, H. (1994), Fractals, Random Shapes and Point Fields, New York: Wiley.

Stoyan, D. and Stoyan, H. (1998), Non-Homogeneous Gibbs Process Models for Forestry - A Case Study, Biometrical Journal, 40, 521-531.

Waagepetersen, R. P. (2006), An Estimating Function Approach to Inference for Inhomogeneous Neyman-Scott Processes, Biometrics, to appear.

Table 1. MSEs of ρ̂ and σ̂ for MCEPs using the K-function (K, by value of c) and the PCF. Each MSE is divided by the squared target parameter. A dash marks an entry that is not available.

               ρ = 12.5, λ = 100        | ρ = 25, λ = 100          | ρ = 25, λ = 200
               K                  PCF   | K                  PCF   | K                  PCF
Est. σ    r    c=.125 c=.25 c=.5        | c=.125 c=.25 c=.5        | c=.125 c=.25 c=.5
ρ̂    .01  3σ   -      .087  .060  .054  | .194   .108  .092  .088  | .115   .036  .028  .027
          4σ   .096   .054  .050  .053  | .107   .088  .085  .086  | .036   .027  .026  .028
          5σ   .060   .051  .051  .055  | .091   .086  .085  .086  | .029   .028  .028  .029
          6σ   .055   .052  .052  .055  | .089   .087  .086  .085  | .030   .030  .030  .030
     .02  3σ   -      .097  .077  .074  | .205   .114  .101  .095  | .166   .060  .053  .049
          4σ   .113   .076  .078  .081  | .118   .102  .099  .095  | .063   .053  .055  .054
          5σ   .086   .083  .089  .084  | .107   .104  .104  .094  | .058   .060  .063  .057
          6σ   .084   .090  .097  .085  | .109   .111  .113  .094  | .063   .069  .075  .058
     .04  3σ   -      .250  .264  .201  | .241   .204  .197  .166  | -      .205  .216  .201
          4σ   .254   .284  .334  .255  | .207   .225  .235  .192  | .194   .253  .290  .244
          5σ   .267   .371  .455  .271  | .246   .278  .302  .201  | .243   .335  .410  .258
          6σ   .331   .468  .612  .276  | .296   .339  .390  .202  | .300   .409  .544  .266
σ̂    .01  3σ   -      .047  .018  .017  | .327   .027  .013  .012  | .141   .015  .007  .008
          4σ   .105   .017  .010  .013  | .038   .012  .008  .010  | .026   .007  .005  .008
          5σ   .046   .012  .009  .014  | .021   .009  .008  .012  | .015   .006  .006  .008
          6σ   .034   .011  .009  .015  | .017   .009  .008  .012  | .013   .007  .007  .009
     .02  3σ   -      .047  .020  .026  | .392   .027  .015  .016  | .287   .022  .014  .015
          4σ   .131   .020  .015  .025  | .045   .014  .012  .016  | .042   .013  .012  .016
          5σ   .059   .018  .017  .026  | .026   .014  .014  .019  | .026   .013  .014  .018
          6σ   .046   .020  .021  .027  | .023   .016  .018  .019  | .023   .016  .019  .019
     .04  3σ   -      .099  .049  .062  | .388   .028  .021  .024  | -      .049  .035  .053
          4σ   .653   .048  .045  .054  | .047   .024  .026  .026  | .097   .035  .039  .051
          5σ   .125   .047  .058  .058  | .032   .029  .037  .028  | .055   .040  .051  .056
          6σ   .087   .053  .073  .064  | .032   .037  .050  .029  | .049   .045  .065  .061

Table 2. MSEs of ρ̂ and σ̂ for CLEP, by value of r_c. Each MSE is divided by the squared target parameter.

           ρ = 12.5, λ = 100  | ρ = 25, λ = 100    | ρ = 25, λ = 200
Est. σ     .2    .3    .4     | .2    .3    .4     | .2    .3    .4
ρ̂    .01   .086  .064  .056   | .169  .124  .110   | .036  .031  .028
     .02   .105  .088  .082   | .181  .135  .122   | .071  .062  .060
     .04   .336  .276  .270   | .238  .207  .203   | .227  .222  .227
σ̂    .01   .009  .009  .009   | .008  .008  .008   | .006  .006  .005
     .02   .017  .017  .017   | .012  .012  .012   | .012  .012  .012
     .04   .047  .044  .048   | .019  .022  .026   | .033  .039  .044
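Both tables report each MSE divided by the squared target parameter, which puts estimates of ρ and σ on a common relative scale. A minimal sketch of that summary is given below. The function name `relative_mse` is hypothetical, and `estimates` is assumed to hold the estimates of one parameter across simulated realizations.

```python
def relative_mse(estimates, target):
    """Mean squared error of the estimates divided by the squared target value."""
    mse = sum((est - target) ** 2 for est in estimates) / len(estimates)
    return mse / target ** 2
```

For example, estimates of 24 and 26 for a target of ρ = 25 have an MSE of 1, so the relative MSE is 1/625.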