Stochastic Quasi-likelihood for Case-Control Point Pattern Data


Ganggang Xu, Rasmus Waagepetersen and Yongtao Guan

January 8, 2018

Abstract

We propose a novel stochastic quasi-likelihood estimation procedure for case-control point processes. Quasi-likelihood for point processes depends on a certain optimal weight function and for the new method the weight function is stochastic since it depends on the control point pattern. The new procedure also provides a computationally efficient implementation of quasi-likelihood for univariate point processes, in which case a synthetic control point process is simulated by the user. Under mild conditions, the proposed approach yields consistent and asymptotically normal parameter estimators. We further show that the estimators are optimal in the sense that the associated Godambe information is maximal within a wide class of estimating functions for case-control point processes. The effectiveness of the proposed method is further illustrated using extensive simulation studies and two data examples.

Some key words: Case-control data, Godambe information, Optimal estimating equations, Point process, Stochastic Quasi-likelihood.

Short title: Stochastic Quasi-likelihood for Case-Control Point Pattern Data

Ganggang Xu is Assistant Professor (gang@math.binghamton.edu), Department of Mathematical Sciences, Binghamton University, State University of New York, NY. Rasmus Waagepetersen is Professor (rw@math.aau.dk), Department of Mathematical Sciences, Aalborg University, Denmark. Yongtao Guan is Leslie O. Barnes Professor (yguan@bus.miami.edu), Department of Management Science, University of Miami, Coral Gables, FL. Xu's research was supported by Collaboration Grants for Mathematicians from the Simons Foundation (Award Number: ).
Waagepetersen's research was supported by The Danish Council for Independent Research-Natural Sciences, grant DFF "Statistics for point processes in space and beyond", and by the Centre for Stochastic Geometry and Advanced Bioimaging, funded by grant 8721 from the Villum Foundation. Guan's research was supported by National Institutes of Health grant R01 CA. The authors thank the editor, the associate editor and anonymous referees for their constructive comments that led to substantial improvements of the article. The authors also thank Prof. Hansheng Wang and Mr. Yu Chen for their help in collecting the Beijing restaurant location data.

1 Introduction

Today spatially referenced datasets on human activities can be easily harvested from social media platforms and mobile devices equipped with GPS. Such data can be of great interest, e.g., to sociologists, geographers, economists and marketing analysts. As one example, we consider in Section 4.1 bivariate point pattern data obtained from the Chinese search engine baidu.com, with the locations of Chinese and Western-style restaurants in Beijing. Other examples include data from Twitter or analogues giving times of tweets and locations of the persons tweeting (Lu et al., 2016).

For spatial point pattern data regarding human activities in urban environments, a particular challenge is the very complex form of the intensity function. For example, commercial restaurants in Beijing typically cannot be found in parks and certain official areas, so that the intensity function in such areas is effectively zero. Also, the intensity can vary abruptly when moving from one neighbourhood into another. This means that a full modeling of the intensity function would require very detailed information on the geography of the city, and gathering such information can be a cumbersome task.

The aforementioned difficulties are well-known in spatial epidemiology, where spatial point processes have been used as an effective tool to investigate risk factors for various diseases; see, for example, Diggle (1990), Diggle and Rowlingson (1994), Diggle et al. (1997), and Zimmerman et al. (2012). Suppose that a spatial point process $N$ is used to model the locations of occurrences of a disease over a spatial domain $W$ in the population at risk. Then a commonly used model for the intensity function $\lambda_N(\cdot)$ of $N$ is

$$\lambda_N(s) = \psi(s)\lambda_0(s), \quad s \in W, \tag{1}$$

where $\lambda_0(s)$ serves as a baseline intensity function related to the population at risk and the nonnegative factor $\psi(s)$ models the elevated or reduced risk that an individual located at $s$ catches the disease.
In this model, $\psi(\cdot)$ is of primary interest while $\lambda_0(\cdot)$ can be viewed as an

infinite dimensional nuisance parameter. Typically a parametric model $\psi(s;\beta)$ is assumed for the dependence of the risk at location $s$ on some risk factors $Z(s)$ associated with $s$. For example, one popular choice is $\psi(s;\beta) = \exp\{\beta^T Z(s)\}$ with $Z(s)$ being some environmental, demographic, or life-style related variables at $s \in W$. With such a structure, the parameter $\beta$ gives a direct interpretation of the potential risk related to the risk factors $Z(s)$. For reasons similar to the ones mentioned previously, the specification of $\lambda_0(\cdot)$ is much more intricate and a simple parametric model may not be tenable. Instead it is common to estimate $\lambda_0(\cdot)$ nonparametrically, e.g., by using kernel smoothing (Diggle, 1990). However, the resulting estimator may not be consistent, as argued by Guan (2008). The impact of using a potentially inconsistent estimator of $\lambda_0(\cdot)$ on the inference regarding the parameter $\beta$ is further difficult to quantify.

An appealing alternative is to use case-control data including an additional control process $M$, also observed over $W$. The intensity function of $M$ is assumed to be of the form

$$\lambda_M(s) = \alpha(s)\lambda_0(s), \quad s \in W, \tag{2}$$

where $\alpha(s)$ is the sampling intensity when collecting the control data. The value of $\alpha(\cdot)$ is determined by the actual sampling scheme of the case-control study and is thus often considered known (Diggle and Rowlingson, 1994; Zimmerman et al., 2012). For the Beijing restaurants, for example, we will study the spatial pattern of Western-style restaurants using a random sample of the Chinese restaurants as a control process.

To clarify the assumptions and scope of our modeling approach we consider the following illustrative example. Suppose there exist three sets of spatial covariates, $X(s)$, $Y(s)$ and $Z(s)$, conditioned on which the case process $N$ and the control process $M$ are independent Poisson processes with intensity functions

$$\Lambda_N(s) = \exp\{\beta_Y^T Y(s) + (\beta_Z + \beta)^T Z(s) + \beta_X^T X(s)\} \quad \text{and} \quad \Lambda_M(s) = \alpha(s)\exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}. \tag{3}$$

However, only $Z(s)$ are collected in the observed data and both $X(s)$ and $Y(s)$ need to be treated

as latent processes. Note that the latent process $Y(s)$ affects both the case and control processes equally, but $X(s)$ only affects the case process. Assume that $X(s)$ is independent of $Y(s)$ and $Z(s)$ and that $\beta_0 = \log[E\exp\{\beta_X^T X(s)\}]$ does not depend on $s$. Then, conditioned on $Y(s)$ and $Z(s)$, the control process $M$ is still a Poisson process with intensity $\lambda_M(s) = \Lambda_M(s) = \alpha(s)\lambda_0(s)$, where $\lambda_0(s) = \exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}$, and the case process $N$ becomes a Cox process with intensity $\lambda_N(s) = E\{\Lambda_N(s) \mid Y(s), Z(s)\} = \lambda_0(s)\psi(s;\beta)$, where $\psi(s;\beta) = \exp\{\beta_0 + \beta^T Z(s)\}$.

The first take-away from the above example is that the parameter $\beta$ needs to be interpreted as the elevated/reduced impacts of $Z(s)$ on the case intensity $\lambda_N(s)$ compared to their impacts on the control intensity $\lambda_M(s)$. Let $Z_j(s)$ and $\beta_j$ be the $j$th elements of $Z(s)$ and $\beta$, respectively. Then $\beta_j = 0$ means that $Z_j(s)$ has the same effect on $N$ and $M$. Secondly, a key assumption in (3) is that all factors affecting both the case and control intensities are either observed, i.e., $Z(s)$, or otherwise contribute equally to both intensities, such as $Y(s)$. The major difficulty in treating the baseline intensity $\lambda_0(s) = \exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}$ as an unknown deterministic function is that its estimation can be rather challenging without observing $Y(s)$. This difficulty, however, can be avoided by using the proportional structure between the intensity functions defined in (1) and (2), where $\lambda_0(s)$ need not be estimated. Consequently, our theoretical investigations can treat $\lambda_0(s)$ and $\psi(s)$ in (1) and (2) as deterministic functions, conditioned on which $N$ and $M$ are assumed to be independent of each other with $M$ being a Poisson process.

Diggle and Rowlingson (1994) studied this case-control setting where $N$ and $M$ are independent Poisson processes and proposed a conditional likelihood approach to estimate the unknown parameter $\beta$.
However, the strong independence properties implied by the Poisson assumption for the case process $N$ may be too restrictive since they preclude possible interactions among the cases. For example, this may not be appropriate for modeling infectious diseases (Diggle et al., 1997; Diggle et al., 2007). For this reason, we consider the scenario where the case process $N$ may have some clustering patterns. For example, in the above illustrative example, by allowing spatial

dependence in the latent process $X(s)$ in (3), conditioned on $\lambda_0(s) = \exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}$ and $\psi(s;\beta) = \exp\{\beta_0 + \beta^T Z(s)\}$, the case process $N$ becomes a Cox process, which may have additional aggregations relative to the control process $M$.

Diggle and Rowlingson's conditional likelihood may be viewed from an estimating function point of view. Consider the following spatial increment:

$$U(ds;\beta) = N(ds) - \frac{\psi(s;\beta)}{\alpha(s)} M(ds), \tag{4}$$

where $N(ds)$ and $M(ds)$ denote the numbers of points from $N$ and $M$ in an infinitesimal spatial increment $ds$ located around the spatial location $s$. Note that $E\{N(ds)\} = \lambda_N(s)ds$ and $E\{M(ds)\} = \lambda_M(s)ds$. Then, based on (1) and (2), $E\{U(ds;\beta)\} = \lambda_0(s)\psi(s;\beta)ds - \{\psi(s;\beta)/\alpha(s)\}\alpha(s)\lambda_0(s)ds = 0$. As a result, using general theories on estimating equations (see, e.g., Crowder, 1986), we can estimate the $p \times 1$ parameter vector $\beta$ consistently by solving the following estimating equation:

$$Q_f(\beta) = \int_W f(s;\beta)U(ds;\beta) = \sum_{s \in N} f(s;\beta) - \sum_{s \in M} \frac{\psi(s;\beta)}{\alpha(s)} f(s;\beta) = 0_p, \tag{5}$$

where $f(s;\beta)$ is a $p \times 1$ real vector-valued function and $0_p$ is a $p \times 1$ vector of zeros. When $f(s;\beta) = \gamma(s;\beta)\psi^{(1)}(s;\beta)/\psi(s;\beta)$, where $\gamma(s;\beta) = \alpha(s)/\{\alpha(s) + \psi(s;\beta)\}$ and $\psi^{(1)}(s;\beta) = \partial\psi(s;\beta)/\partial\beta$, (5) becomes equivalent to the score function of Diggle and Rowlingson's conditional likelihood. In fact, Rathbun (2012) showed that if both $N$ and $M$ are Poisson processes, this choice of weight function is optimal in the sense of yielding minimal parameter estimation variance. However, when $N$ is not Poisson, the conditional likelihood is no longer optimal. To the best of our knowledge, no previous work has attempted to determine the optimal weight function $f(\cdot;\beta)$ for the class of estimating functions of the form (5). In this paper we fill this gap.
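As a concrete reading of (5), the following minimal sketch evaluates the estimating function with the Diggle and Rowlingson weight for the log-linear model $\psi(s;\beta) = \exp\{\beta_0 + \beta_1 Z(s)\}$, a scalar covariate and a constant sampling intensity $\alpha$. Under this model $\psi^{(1)}/\psi = (1, Z(s))^T$, so the weight is $\gamma(s;\beta)(1, Z(s))^T$. The function name `dr_estimating_function` and the toy covariate values are assumptions made purely for illustration; they are not taken from the paper.

```python
import math


def dr_estimating_function(beta, z_cases, z_controls, alpha):
    """Evaluate Q_f(beta) in (5) with f = gamma * psi^(1) / psi.

    Assumes psi(s; beta) = exp(beta0 + beta1 * z(s)) with a scalar covariate
    and a constant control sampling intensity alpha (illustrative choices).
    """
    b0, b1 = beta
    q = [0.0, 0.0]
    # Sum of f(s; beta) over the case locations.
    for z in z_cases:
        psi = math.exp(b0 + b1 * z)
        gamma = alpha / (alpha + psi)
        q[0] += gamma
        q[1] += gamma * z
    # Subtract psi/alpha * f(s; beta) over the control locations.
    for z in z_controls:
        psi = math.exp(b0 + b1 * z)
        gamma = alpha / (alpha + psi)
        q[0] -= (psi / alpha) * gamma
        q[1] -= (psi / alpha) * gamma * z
    return q


# Toy covariate values at two case locations and one control location.
q_value = dr_estimating_function((0.0, 0.0), [1.0, 2.0], [3.0], alpha=1.0)
```

Solving `dr_estimating_function(beta, ...) = 0` in `beta`, for instance by Newton's method, would give the conditional likelihood estimate under these assumptions.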
Our approach is built upon a recent development on the quasi-likelihood method for spatial point processes (Guan et al., 2015), where the authors considered the problem of finding the optimal first-order estimating function for a single spatial point process by solving a certain Fredholm integral equation. The development of quasi-likelihood for case-control data faces two

major challenges. Firstly, there is an unobserved latent baseline intensity function $\lambda_0(\cdot)$. As we will see in Section 2, the theoretical optimal weight function is the solution to a Fredholm integral equation involving $\lambda_0(\cdot)$. If one chooses to estimate $\lambda_0(\cdot)$, then the benefits of using the case-control approach would be lost. Secondly, the quasi-likelihood method in Guan et al. (2015) relies on deterministic numerical approximations of two key integrals: (1) the integral in the estimating equation, see subsection 2.6; and (2) the integral in the Fredholm integral equation when solving for the optimal weight function. While the former introduces bias to the estimating equation that is difficult to quantify (Baddeley et al., 2014), the latter approximation may invalidate the asymptotic results obtained based on the theoretical optimal weight function. Further, both approximations require covariate information at all numerical quadrature points. For many case-control data sets, covariate information may be readily available at the observed case and control locations but not necessarily at an arbitrary point in the study region. It may therefore require additional work to derive the covariate information at the quadrature points required for the deterministic numerical approximation.

To overcome these challenges, we develop a stochastic quasi-likelihood approach for case-control data that does not rely on $\lambda_0(\cdot)$ and uses only the observed covariate information. We propose a carefully designed leave-one-out algorithm to eliminate estimation bias. We prove that, under suitable conditions, our proposed approach leads to an estimator that is asymptotically as efficient as the one based on the theoretical optimal approach. Furthermore, we derive the asymptotic distribution of the regression parameter estimators based on the estimated weight function and not its theoretical optimal counterpart.
We also discuss how the method can be applied to univariate point pattern data by simulating a synthetic control point process.

The rest of the paper is organized as follows. A detailed discussion is given in Section 2 on the motivation and practical implementation of the proposed method. Simulation studies are conducted in Section 3 and two real data applications are considered in Section 4. Asymptotic results are given in Section 5. A sketch of the proposed algorithm is given in the Appendix and

all technical proofs are collected in the supplementary material.

2 Stochastic quasi-likelihood using case-control data

2.1 Background

In this paper we assume that the control process $M$ is an inhomogeneous Poisson process independent of $N$. However, we allow for possible correlations between counts of $N$ as quantified by the pair correlation function $g(\cdot,\cdot)$ of $N$ defined through

$$\mathrm{Cov}\{N(ds), N(dt)\} = \delta(s-t)\lambda_N(s;\beta)ds + \lambda_N(s;\beta)\lambda_N(t;\beta)\{g(s,t) - 1\}dsdt, \tag{6}$$

where $\delta(\cdot)$ is the Dirac function satisfying $\int_A \delta(s-t)dt = I(s \in A)$ and $I(\cdot)$ denotes the indicator function. For a Poisson process, where the counts in distinct sets are independent, $g(s,t) = 1$ for any $s \neq t$. Values of $g$ greater (smaller) than one typically correspond to clustered (regular) behavior of $N$. By (6), the pair correlation function is symmetric, $g(s,t) = g(t,s)$. In addition we assume that $g(s,t)$ depends on $s$ and $t$ only through $s - t$. In other words, the case process $N$ is assumed to be second-order intensity reweighted stationary (Baddeley et al., 2000).

A popular measure of efficiency for estimating functions is the Godambe information (Song, 2007). For our estimating function (5), the Godambe information is

$$G_f(\beta) = S_f^T(\beta)V_f^{-1}(\beta)S_f(\beta), \tag{7}$$

where $S_f(\beta) = E\{-\partial Q_f(\beta)/\partial\beta^T\}$, $V_f(\beta) = \mathrm{Var}\{Q_f(\beta)\}$, and the expectation and variance are with respect to both the case process $N$ and the control process $M$. For any two weight functions $f_1(\cdot;\beta)$ and $f_2(\cdot;\beta)$, $Q_{f_1}(\beta)$ is said to be more efficient than $Q_{f_2}(\beta)$ if $G_{f_1}(\beta) - G_{f_2}(\beta)$ is nonnegative definite, denoted as $G_{f_1}(\beta) \succeq G_{f_2}(\beta)$. The optimal estimating function $Q_\phi(\beta)$ is thus defined as the one associated with the optimal weight function $\phi(\cdot;\beta)$ such that

$$G_\phi(\beta) \succeq G_f(\beta) \quad \text{for any } f(\cdot;\beta): W \to \mathbb{R}^p. \tag{8}$$
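To make the sandwich form in (7) concrete, the sketch below computes $G = S^T V^{-1} S$ for small matrices and checks one elementary property: rescaling the weight function by a constant $c$ replaces $S$ by $cS$ and $V$ by $c^2 V$ and therefore leaves the Godambe information unchanged. The helper names `solve` and `godambe` and the toy matrices are assumptions for illustration only; in practice $S$ and $V$ would come from the model.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting.

    a is an n x n list of rows, b an n x k list of rows; returns x as n x k.
    """
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def godambe(S, V):
    """Return S^T V^{-1} S for p x p sensitivity S and variance V."""
    p = len(S)
    v_inv_s = solve(V, S)
    return [[sum(S[k][i] * v_inv_s[k][j] for k in range(p)) for j in range(p)]
            for i in range(p)]


# Toy sensitivity and variance matrices (illustrative values only).
S = [[2.0, 0.5], [0.0, 1.0]]
V = [[4.0, 1.0], [1.0, 2.0]]
G = godambe(S, V)

# Rescaling the weight function by c = 3 gives S -> 3S and V -> 9V.
G_scaled = godambe([[3.0 * v for v in row] for row in S],
                   [[9.0 * v for v in row] for row in V])
```

The invariance under rescaling is why optimality in (8) concerns the shape of the weight function rather than its overall scale.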

2.2 The optimal estimating function

By Guan et al. (2015), a sufficient condition for $\phi(\cdot;\beta)$ to be the optimal weight function is that

$$S_f(\beta) = \mathrm{Cov}\{Q_f(\beta), Q_\phi(\beta)\} \quad \text{for all } f(\cdot;\beta): W \to \mathbb{R}^p. \tag{9}$$

By the definition of the spatial increment process $U(\cdot;\beta)$ in (4), it is straightforward to show that

$$E\left\{-\frac{\partial U(ds;\beta)}{\partial\beta}\right\} = \lambda_0(s)\psi^{(1)}(s;\beta)ds,$$

and

$$\mathrm{Cov}\{U(ds;\beta), U(dt;\beta)\} = \delta(s-t)\frac{\alpha(s) + \psi(s;\beta)}{\alpha(s)}\lambda_N(s;\beta)ds + \lambda_N(s;\beta)\lambda_N(t;\beta)\{g(s,t) - 1\}dsdt,$$

which leads to

$$S_f(\beta) = \int_W \lambda_0(s)f(s;\beta)\psi^{(1)}(s;\beta)^T ds, \tag{10}$$

and

$$\mathrm{Cov}\{Q_f(\beta), Q_\phi(\beta)\} = \int_W\int_W \lambda_0(s)\lambda_0(t)\psi(s;\beta)\psi(t;\beta)\{g(s,t) - 1\}f(s;\beta)\phi(t;\beta)^T dsdt + \int_W \lambda_0(s)\gamma^{-1}(s;\beta)\psi(s;\beta)f(s;\beta)\phi(s;\beta)^T ds, \tag{11}$$

where $\gamma(\cdot;\beta)$ and $\psi^{(1)}(\cdot;\beta)$ were defined under equation (5). Combining (9)-(11), the optimal weight function $\phi(\cdot;\beta)$ is the solution to the following integral equation:

$$\phi(s;\beta) + \gamma(s;\beta)\int_W \lambda_0(t)\psi(t;\beta)\{g(s,t) - 1\}\phi(t;\beta)dt = \gamma(s;\beta)\frac{\psi^{(1)}(s;\beta)}{\psi(s;\beta)},$$

or equivalently,

$$\phi(s;\beta) + \int_W \lambda_0(t)\alpha(t)R(s,t;\beta)\phi(t;\beta)dt = \eta(s;\beta), \tag{12}$$

where $R(s,t;\beta) = \gamma(s;\beta)\psi(t;\beta)\{g(s,t) - 1\}/\alpha(t)$ and $\eta(s;\beta) = \gamma(s;\beta)\psi^{(1)}(s;\beta)/\psi(s;\beta)$. When $\gamma(\cdot;\beta)$, $\psi(\cdot;\beta)$ and $g(\cdot,\cdot)$ are continuous functions and $g(\cdot,\cdot) - 1$ is a positive definite function, the solution to (12) is unique (Guan et al., 2015). From now on, for ease of notation, we suppress the dependence of the functions $R(\cdot,\cdot)$, $\gamma(\cdot)$, $\psi(\cdot)$, and $\eta(\cdot)$ on $\beta$ whenever there is no ambiguity.

When $N$ is a Poisson point process, $g(s,t) = 1$ for any $s, t$. Then the optimal weight function obtained through (12) has a closed form $\phi_d(\cdot;\beta) = \eta(\cdot;\beta)$, for which the resulting $Q_{\phi_d}$

is equivalent to the score function of the conditional likelihood proposed by Diggle and Rowlingson (1994). Furthermore, $\phi_d(\cdot;\beta)$ also coincides with the optimal weight function studied in Waagepetersen (2008) and Rathbun (2012) for inhomogeneous Poisson processes. Our approach is therefore an important generalization of the conditional likelihood approach by Diggle and Rowlingson (1994). For a more general point process $N$, the integral equation (12) appropriately takes into account the correlation structure of $N$ as given by the pair correlation function $g(\cdot,\cdot)$.

2.3 A naive stochastic estimator of $\phi(\cdot;\beta)$ using all controls

The integral equation (12) belongs to the extensively studied class of Fredholm integral equations of the second kind, see e.g., Hackbusch (1995), Zemyan (2012) and Kress (2014). Guan et al. (2015) proposed to find an approximate solution of the Fredholm integral equation under their consideration through the Nyström method, which is based on a deterministic numerical quadrature approximation of the integral that effectively converts the integral equation into a matrix equation. As we argued in Section 1, such an approach may not be applicable in our setting because $\lambda_0(\cdot)$ is typically unknown and also complete knowledge of $\psi(t)$ and $\eta(t)$ may not be available at all quadrature points. Below we introduce a stochastic variant of the Nyström method which avoids the aforementioned difficulty.

Considering the integral in (12), Campbell's formula gives

$$\int_W \lambda_0(t)\alpha(t)R(s,t)\phi(t;\beta)dt = E\left\{\sum_{t \in M \cap W} R(s,t)\phi(t;\beta)\right\}, \quad s \in W.$$

Therefore, a naive proposal to estimate $\phi(\cdot;\beta)$ is to solve the following stochastic equation:

$$\hat\phi(s;\beta) + \sum_{t \in M \cap W} R(s,t)\hat\phi(t;\beta) = \eta(s), \quad \text{where } s \in M \cap W. \tag{13}$$

Note that $\lambda_0(s)$ is not present in the above equation. Denote by $\hat\phi_k(s;\beta)$ and $\eta_k(s)$ the $k$th components of $\hat\phi(s;\beta)$ and $\eta(s)$, respectively, for $k = 1, \ldots, p$. Let further $\{s_1, \ldots, s_{|M|}\}$ denote

a realization of $M$ of cardinality $|M|$. Then the components of the solution to equation (13) are

$$\begin{pmatrix} \hat\phi_k(s_1, M;\beta) \\ \hat\phi_k(s_2, M;\beta) \\ \vdots \\ \hat\phi_k(s_{|M|}, M;\beta) \end{pmatrix} = \{I_{|M|} + R(M,M)\}^{-1}\begin{pmatrix} \eta_k(s_1) \\ \eta_k(s_2) \\ \vdots \\ \eta_k(s_{|M|}) \end{pmatrix}, \quad k = 1, \ldots, p, \tag{14}$$

where $R(M,M)$ is an $|M| \times |M|$ matrix whose $ij$th entry is $R(s_i, s_j)$, $i, j = 1, \ldots, |M|$. Note that we deliberately emphasize the dependence of $\hat\phi_k(s_i, M;\beta)$ on the entire control process $M$, since this dependence is critical to our theoretical investigation. Denoting $\hat\phi(s_i, M;\beta) = (\hat\phi_1(s_i, M;\beta), \ldots, \hat\phi_p(s_i, M;\beta))^T$ for any $s_i \in M$, we use the following interpolation for an arbitrary location $s \in W$:

$$\hat\phi(s, M;\beta) = \eta(s) - \sum_{t \in M \cap W} R(s,t)\hat\phi(t, M;\beta). \tag{15}$$

Plugging $\hat\phi(s, M;\beta)$ in for $f(s;\beta)$ in (5), we obtain

$$Q_{SQLn}(\beta) = \sum_{s \in N \cap W} \hat\phi(s, M;\beta) - \sum_{s \in M \cap W} \hat\phi(s, M;\beta)\frac{\psi(s)}{\alpha(s)}.$$

However, by the Campbell-Mecke theorem (Baddeley, 2007, Theorem 3.2),

$$E_{M,N}\{Q_{SQLn}(\beta)\} = \int_W \left[E_M\{\hat\phi(s, M;\beta)\}\lambda_N(s) - E_M^s\{\hat\phi(s, M;\beta)\}\lambda_0(s)\psi(s)\right]ds,$$

where $E_M^s$ denotes the Palm expectation of the point process $M$ at a point $s \in W$; see Baddeley (2007) for more details. Since the Palm distribution and the original point process distribution are different even in the case of a Poisson process (Baddeley, 2007), it follows that $E_{M,N}\{Q_{SQLn}(\beta)\} \neq 0_p$. Therefore, the naive plug-in estimating function $Q_{SQLn}(\beta)$ is biased, which in turn results in bias in the associated parameter estimator. Our empirical simulation studies confirm that the parameter estimator can be substantially biased.

2.4 Leave-one-out bias correction method

The bias of the estimating function $Q_{SQLn}(\beta)$ can be corrected by an application of a generalized version of the Slivnyak-Mecke theorem (Mecke, 1967). More specifically, for a Poisson point

process $M$ with an intensity function $\lambda(\cdot)$ and any function $f(s, M)$, one has that

$$E\left\{\sum_{s \in M \cap W} f(s, M\backslash\{s\})\right\} = \int_W E\{f(s, M)\}\lambda(s)ds, \tag{16}$$

provided that (16) is well-defined. Motivated by equation (16), we modify the estimating function $Q_{SQLn}(\beta)$ as

$$Q_{SQLu}(\beta) = \sum_{s \in N \cap W} \hat\phi(s, M;\beta) - \sum_{s \in M \cap W} \hat\phi(s, M\backslash\{s\};\beta)\frac{\psi(s)}{\alpha(s)}, \tag{17}$$

where $\hat\phi(s, M\backslash\{s\};\beta)$ is as defined in (15) with $M$ replaced by $M\backslash\{s\}$. Using the Slivnyak-Mecke theorem (16) and the independence between $N$ and $M$,

$$E_{M,N}\{Q_{SQLu}(\beta)\} = \int_W \left[E_M\{\hat\phi(s, M;\beta)\}\lambda_N(s) - E_M\{\hat\phi(s, M;\beta)\}\lambda_0(s)\psi(s)\right]ds = 0_p,$$

which shows that $Q_{SQLu}(\beta)$ is an unbiased estimating function.

It appears that the computation of all weights $\hat\phi(s, M\backslash\{s\};\beta)$, $s \in M$, requires inverting an $(|M| - 1) \times (|M| - 1)$ matrix as described in (14) repeatedly for $|M|$ times, which leads to a computational cost of $O(|M|^4)$ floating-point operations. However, the following lemma shows that this can be avoided by providing a short-cut formula for the leave-one-out estimator $\hat\phi(s, M\backslash\{s\};\beta)$.

Lemma 1. For any $s \in M$, there is a one-to-one relationship between the estimators $\hat\phi(s, M;\beta)$ and $\hat\phi(s, M\backslash\{s\};\beta)$ as follows:

$$\hat\phi(s, M\backslash\{s\};\beta) = \frac{\hat\phi(s, M;\beta)}{w_{ss}(s, M\backslash\{s\})}, \tag{18}$$

where $w_{ss}(s, M\backslash\{s\})$ is the diagonal entry of the matrix $\{I_{|M|} + R(M,M)\}^{-1}$ corresponding to location $s$.

The proof is given in the supplementary material. Lemma 1 shows that the leave-one-out estimator $\hat\phi(s, M\backslash\{s\};\beta)$ at all control locations can be obtained by inverting an $|M| \times |M|$ matrix just once, as is needed in (14). The computational cost of evaluating $Q_{SQLu}(\beta)$ is thus of the order $O(|M|^3 + |N||M|)$, the same as that of $Q_{SQLn}(\beta)$.
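The short-cut in Lemma 1 can be checked numerically. The sketch below reads (18) as the ratio of the full-sample solution to the matching diagonal entry of $\{I + R(M,M)\}^{-1}$, which is what a Schur complement argument applied to (14)-(15) gives, and compares it with the brute-force leave-one-out computation on four toy control points. All numerical values, the toy kernel and the helper names (`solve`, `loo_shortcut`, `loo_brute_force`) are assumptions made for illustration.

```python
import math


def solve(a, b):
    """Solve a @ x = b (a: n x n, b: n x k, lists of rows) by Gauss-Jordan."""
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        m[col], m[piv] = m[piv], m[col]
        scale = m[col][col]
        m[col] = [v / scale for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def loo_shortcut(R, eta):
    """Leave-one-out weights from a single inverse, as in (18)."""
    n = len(R)
    A = [[R[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    W = solve(A, identity)  # W = (I + R)^{-1}
    phi_full = [sum(W[i][j] * eta[j] for j in range(n)) for i in range(n)]
    return [phi_full[i] / W[i][i] for i in range(n)]


def loo_brute_force(R, eta):
    """Leave-one-out weights by re-solving (14) without point i, then using (15)."""
    n = len(R)
    out = []
    for i in range(n):
        keep = [j for j in range(n) if j != i]
        A = [[R[j][k] + (1.0 if j == k else 0.0) for k in keep] for j in keep]
        phi = [row[0] for row in solve(A, [[eta[j]] for j in keep])]
        out.append(eta[i] - sum(R[i][j] * p for j, p in zip(keep, phi)))
    return out


# Toy one-dimensional control locations and function values (illustrative).
x = [0.0, 0.3, 1.1, 2.0]
gamma = [0.4, 0.6, 0.5, 0.7]
psi = [1.2, 0.8, 1.0, 1.5]
alpha = 2.0
# R(s_i, s_j) = gamma_i * psi_j * {g - 1} / alpha with a toy g - 1 = 0.5 * exp(-|d|).
R = [[gamma[i] * psi[j] * 0.5 * math.exp(-abs(x[i] - x[j])) / alpha
      for j in range(4)] for i in range(4)]
eta = [gamma[i] * x[i] for i in range(4)]

max_gap = max(abs(f - s) for f, s in zip(loo_shortcut(R, eta), loo_brute_force(R, eta)))
```

The brute-force route solves one reduced system per control point, whereas the short-cut reuses a single inverse, which is the source of the cost reduction described above.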

It remains to quantify the potential loss of Godambe information by replacing the optimal $\phi(s;\beta)$ by the estimates $\hat\phi(s, M;\beta)$, $s \in N$, or $\hat\phi(s, M\backslash\{s\};\beta)$, $s \in M$. The Godambe information matrix for (17) is defined as

$$G(\beta) = S^T(\beta)V^{-1}(\beta)S(\beta), \tag{19}$$

where $S(\beta) = E\{-\partial Q_{SQLu}(\beta)/\partial\beta^T\} = \int_W \lambda_0(s)E\{\hat\phi(s, M;\beta)\}\{\partial\psi(s;\beta)/\partial\beta^T\}ds$ and $V(\beta) = \mathrm{Var}\{Q_{SQLu}(\beta)\}$. Regarding $\partial Q_{SQLu}(\beta)/\partial\beta^T$, the weight function $\hat\phi(s, M;\beta)$ also depends on the parameter $\beta$, but by (16) the terms in $\partial Q_{SQLu}(\beta)/\partial\beta^T$ involving $\partial\hat\phi(s, M;\beta)/\partial\beta^T$ and $\partial\hat\phi(s, M\backslash\{s\};\beta)/\partial\beta^T$ cancel after taking the expectation. The derivation of $V(\beta)$ is more involved and will be addressed later in Section 5.1. Furthermore, we show in Section 5.3 that the difference between $G(\beta)$ and the optimal Godambe information matrix $G_\phi(\beta)$, as defined in (7) with $f(\cdot;\beta)$ replaced by $\phi(\cdot;\beta)$, is asymptotically negligible.

2.5 Practical implementation details

Although computing the estimating function (17) is quite straightforward due to Lemma 1, several issues need to be addressed in practice. First note that $R(s,t)$ in the integral equation (12) is not symmetric, which is inconvenient both for practical implementation and for theoretical investigations. So we first derive a symmetric counterpart of $R(s,t)$. Define functions

$$m(s) = \alpha(s)\{\alpha(s) + \psi(s)\}^{-1/2}\psi(s)^{-1/2}, \quad \phi^*(s;\beta) = m^{-1}(s)\phi(s;\beta), \quad \eta^*(s) = m^{-1}(s)\eta(s). \tag{20}$$

Then the integral equation (12) can be written as

$$\phi^*(s;\beta) + \int_W \lambda_0(t)\alpha(t)R^*(s,t)\phi^*(t;\beta)dt = \eta^*(s), \tag{21}$$

where $R^*(s,t) = m^{-1}(s)R(s,t)m(t) = r(s)r(t)\{g(s,t) - 1\}$ and $r(s) = \psi(s)^{1/2}\{\alpha(s) + \psi(s)\}^{-1/2}$.
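The symmetrisation step can be verified numerically. Substituting $\phi = m\phi^*$ into (12) and dividing by $m(s)$ gives the transformed kernel $m^{-1}(s)R(s,t)m(t)$; the sketch below checks, at two toy locations, that this kernel equals $r(s)r(t)\{g(s,t) - 1\}$ and is therefore symmetric, while the original kernel $R(s,t)$ is not. The numerical values of $\alpha$, $\psi$ and $g$ and the function names are assumptions chosen only for illustration.

```python
import math


def m_fn(alpha, psi):
    """m(s) = alpha * (alpha + psi)^(-1/2) * psi^(-1/2), as in (20)."""
    return alpha / math.sqrt((alpha + psi) * psi)


def r_fn(alpha, psi):
    """r(s) = psi^(1/2) * (alpha + psi)^(-1/2)."""
    return math.sqrt(psi / (alpha + psi))


def kernel(alpha_s, psi_s, alpha_t, psi_t, g):
    """R(s, t) = gamma(s) * psi(t) * {g(s, t) - 1} / alpha(t)."""
    gamma_s = alpha_s / (alpha_s + psi_s)
    return gamma_s * psi_t * (g - 1.0) / alpha_t


def kernel_star(alpha_s, psi_s, alpha_t, psi_t, g):
    """Transformed kernel m(s)^(-1) * R(s, t) * m(t)."""
    return (kernel(alpha_s, psi_s, alpha_t, psi_t, g)
            * m_fn(alpha_t, psi_t) / m_fn(alpha_s, psi_s))


# Toy values of (alpha, psi) at two locations s and t, and a toy g(s, t).
s = (2.0, 1.0)
t = (3.0, 4.0)
g = 1.5

forward = kernel_star(*s, *t, g)
backward = kernel_star(*t, *s, g)
product_form = r_fn(*s) * r_fn(*t) * (g - 1.0)
raw_asymmetry = abs(kernel(*s, *t, g) - kernel(*t, *s, g))
```

A symmetric kernel means the matrix to be inverted is symmetric, which is what makes a Cholesky-type factorisation applicable.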

Following the procedure from equations (14)-(15), we define

$$\begin{pmatrix} \hat\phi^*_k(s_1, M;\beta) \\ \hat\phi^*_k(s_2, M;\beta) \\ \vdots \\ \hat\phi^*_k(s_{|M|}, M;\beta) \end{pmatrix} = \{I_{|M|} + R^*(M,M)\}^{-1}\begin{pmatrix} \eta^*_k(s_1) \\ \eta^*_k(s_2) \\ \vdots \\ \eta^*_k(s_{|M|}) \end{pmatrix}, \quad k = 1, \ldots, p, \tag{22}$$

where $R^*(M,M)$ is an $|M| \times |M|$ matrix whose $ij$th entry is $R^*(s_i, s_j)$, $i, j = 1, \ldots, |M|$. Similarly, denote $\hat\phi^*(s_i, M;\beta) = (\hat\phi^*_1(s_i, M;\beta), \ldots, \hat\phi^*_p(s_i, M;\beta))^T$ for any $s_i \in M$. Then for an arbitrary location $s \in W$, we define the interpolated function

$$\hat\phi^*(s, M;\beta) = \eta^*(s) - \sum_{t \in M \cap W} R^*(s,t)\hat\phi^*(t, M;\beta), \quad \text{for any } s \in W. \tag{23}$$

By the above definition, we have the following relationship:

$$\hat\phi(s, M;\beta) = m(s)\hat\phi^*(s, M;\beta). \tag{24}$$

The second issue is the computation of the inverse matrix $\{I_{|M|} + R^*(M,M)\}^{-1}$ when $|M|$ is large. Fortunately, by the definition of $R^*(M,M)$ in (22), a significant portion of its entries may be very close to 0, depending on how fast the function $g(s,t)$ decays to 1 as $\|s - t\|$ increases. Assume that the pair correlation function $g(s,t)$ is isotropic and can be expressed in the form $g_0(\|s - t\|)$. Then we can create a tapered version $R^*_{taper}(M,M)$ such that its $ij$th entry is the same as that of $R^*(M,M)$ if $\|s_i - s_j\| \le d_{taper}$ for some $d_{taper} > 0$ and 0 otherwise. Following Guan et al. (2015), the taper distance $d_{taper}$ is chosen such that $\{g_0(d_{taper}) - 1\}/\{g_0(0) - 1\} = \tau_0$ for some small threshold $\tau_0$. Then a sparse matrix Cholesky decomposition can be used to obtain $\{I_{|M|} + R^*_{taper}(M,M)\}^{-1}$ in a computationally efficient way. In our simulation studies, we use $\tau_0 = 10^{-6}$ if $|M| > 4{,}000$ and otherwise we use the exact $R^*(M,M)$.

The random labeling theorem for Poisson processes provides an alternative to tapering for reducing the computational burden of inverting $R^*(M,M)$ when the cardinality of the control process $M$ is high. The theorem states that for $B \ge 1$, $M$ can be randomly split into independent and identically distributed Poisson processes $M_1, \ldots, M_B$. Assuming that $M$ has intensity function $\alpha(\cdot)\lambda_0(\cdot)$, each $M_b$ has intensity function $\alpha(\cdot)\lambda_0(\cdot)/B$. We may then apply the case-control approach to obtain an estimate $\hat\beta^{(b)}$ for each pair $(N, M_b)$. The cardinality of $M_b$ is roughly $1/B$ times that of $M$, which makes the inversion of $R^*(M_b, M_b)$ more feasible. Finally, $\beta$ is estimated by the average $\hat\beta = B^{-1}\sum_{b=1}^B \hat\beta^{(b)}$, whose theoretical properties are investigated in Corollary 1 in Section 5.1. Obviously, the choice of $B$ plays an important role for this divide-and-conquer strategy, which will be addressed in future work.

Another issue is that we assumed full knowledge of the pair correlation function $g(\cdot,\cdot)$ when finding the weight function $\hat\phi(s, M;\beta)$. In practice $g(\cdot,\cdot)$ needs to be estimated. It is common practice to assume that $g(\cdot,\cdot)$ belongs to a parametric family, $g(\cdot,\cdot;\theta)$, governed by a parameter vector $\theta$, and to estimate $\theta$ from the data. We first obtain an estimate $\hat\theta$ of $\theta$ using Guan et al. (2008) and then plug in $g(\cdot,\cdot;\hat\theta)$ for $g(\cdot,\cdot)$ in (22).

We construct confidence intervals for $\beta$ based on approximate normality of $\hat\beta$. The theoretical justification for this is Theorem 1 in Section 5.1, where consistency and asymptotic normality of $\hat\beta$ are stated under the condition that $\hat\theta$ is consistent with a sufficiently fast rate of convergence. In Section 5.1 we also provide estimates for the covariance matrix of $\hat\beta$. Our simulation studies in Section 3 support the validity of the confidence intervals for $\beta$.

2.6 Stochastic quasi-likelihood as Monte Carlo approximation

Letting the intensity of controls tend to infinity, it is easy to see that (5) converges to

$$Q_f(\beta) = \sum_{s \in N \cap W} f(s;\beta) - \int_W f(s;\beta)\lambda_0(s)\psi(s;\beta)ds.$$

Thus, (5) can be viewed as a Monte Carlo approximation of $Q_f$ using $M$ as a set of random quadrature points. Suppose that $\lambda_0(s)$ is known, which implies that $\lambda_N(s)$ is purely parametric with a known multiplicative offset.
Then we may simulate a synthetic control process $M$ of known intensity $\alpha(\cdot)$ and approximate $Q_f(\beta)$ by (5) as an alternative to the deterministic quadrature approximation used in Guan et al. (2015). The use of deterministic quadrature approximation to

the integral in $Q_f(\beta)$ introduces bias that can be difficult to quantify (Baddeley et al., 2014). In contrast, unbiasedness can be maintained using our proposed stochastic approximation. The use of the random quadrature process $M$ introduces additional parameter estimation error. The error can be reduced by using a larger control intensity $\alpha(\cdot)$, but this will on the other hand increase the computing time due to the need to solve a larger matrix equation, cf. Section 2.5. An alternative is to simulate several independent synthetic control processes and apply the divide-and-conquer strategy in Section 2.5. The use of replicated synthetic control processes is exemplified in Section .

3 Simulation Studies

In this section, we conduct a simulation study to investigate the finite sample performance of the proposed method. Both the case and control processes were simulated over an $n \times n$ square window with $n = 1, 2$ using the R package spatstat (Baddeley and Turner, 2005). For each $n$, we set the baseline intensity $\lambda_{0,n}(s) = \exp\{\beta^M_{0,n} + Y(s) + Z(s)/4\}$, where $Y(s)$ and $Z(s)$ are two independent realizations of stationary and isotropic Gaussian random fields with mean 0 and $\beta^M_{0,n}$ is chosen such that $n^{-2}\int_{[0,n]^2}\lambda_{0,n}(s)ds = 1$; see Figure 1(a) for details. Then the case and control intensities were specified as

$$\lambda_{N,n}(s) = \lambda_{0,n}(s)\exp\{\beta^N_{0,n} + Z(s)\beta_1\} \quad \text{and} \quad \lambda_{M,n}(s) = \alpha(s)\lambda_{0,n}(s)\Pi(s), \tag{25}$$

where $\beta_1 = 1$ and the intercept $\beta^N_{0,n}$ was chosen so that on average $400n^2$ case events were simulated. The function $\Pi(s)$ was introduced here to allow various types of departure from the proportional assumption between $\lambda_{N,n}(\cdot)$ and $\lambda_{M,n}(\cdot)$ and will be specified later. The control processes were simulated using an inhomogeneous Poisson process by choosing $\alpha(s)$ equal to a constant $\alpha$ for all $s \in W_n$, where $\alpha = 400, 500, \ldots, 1500,$ . The case processes were simulated

as inhomogeneous Thomas processes (Waagepetersen, 2007) with a pair correlation function

$$g(s,t) = 1 + (4\pi\omega^2\kappa)^{-1}\exp\{-(4\omega^2)^{-1}\|s - t\|^2\}, \tag{26}$$

where $\kappa > 0$ and $\omega > 0$ are the intensity of the parent process and the dispersal parameter, respectively. We considered $\kappa = 50, 100$ and $\omega = 0.02, 0.04$ for different clustering scenarios.

3.1 Correct model specification with $\Pi(s) \equiv 1$

In this subsection, we first consider the scenario where the assumptions of (1) and (2) hold by setting $\Pi(s) \equiv 1$. For each pair of simulated case and control processes, $\theta = (\kappa, \omega)^T$ was first estimated using the approach given in Guan et al. (2008) and then the proposed procedure was applied to estimate $\beta^N_{0,n}$ and $\beta_1$ by plugging in the estimated $\hat\theta_n = (\hat\kappa_n, \hat\omega_n)^T$. Three estimation approaches were considered: Diggle and Rowlingson (1994)'s conditional likelihood estimate (CLE) and the two proposed stochastic quasi-likelihood estimation approaches based on the naive method (SQLn) given in Section 2.3 and the unbiased version (SQLu) given in (17), where the leave-one-out correction was applied to $\hat\phi(s, M\backslash\{s\})$ for $s \in M$. For the SQLu method, we also considered the situation when the parametric family of the pair correlation function was mis-specified. More specifically, instead of (26) we used a pair correlation function for a variance-gamma shot-noise Cox process (Jalilian et al., 2013), which has the incorrect exponential form

$$g(s,t) = 1 + a^{-1}\exp(-b^{-1}\|s - t\|), \quad a > 0,\ b > 0.$$

Summary statistics based on 1,000 simulations are presented in Table 1 and Figure 1, where rmse represents root mean square error of the parameter estimates and CP90 represents the coverage probabilities of the nominal 90% confidence intervals constructed using Theorem 1 by plugging in the estimated matrices given in (30). The first observation is that SQLn produced biased estimators for $\beta_1$, which confirms our discussion in Section 2.3. The large bias typically

led to a larger rmse than for CLE. Table 1 also suggests that this bias decreased as $\alpha$ increased. In contrast, the SQLu estimate of $\beta_1$ was close to unbiased.

Table 1: Biases and rmses of the different estimators for $\beta_1$. Columns: $(\kappa, \omega)$, $n$, $\alpha$; BIAS and rmse for CLE; BIAS and rmse for SQLn; BIAS, rmse and CP90 for SQLu-Est.Thomas; BIAS, rmse and CP90 for SQLu-Est.Exponential. Rows are grouped by $(\kappa, \omega)$ = (50, 0.02), (50, 0.04), (100, 0.02), (100, 0.04). [Table entries not recoverable from the source.]

In terms of rmse, the SQLu estimating function outperformed CLE in almost all cases and the improvement in rmse could be substantial. In accordance with the asymptotic results in Theorem 1, the rmses were approximately halved when $n$ was increased from 1 to 2. Figure 1(b)-(c) show that the rmses of the CLE did not necessarily decrease as $\alpha$ increased. In contrast, the rmses for SQLu decreased steadily as $\alpha$ increased. This indicates that CLE made less efficient use of the control processes than the SQLu method. In addition, Figure 1(f) illustrates that the averages of the empirical $|W_n|^{-1}\mathrm{tr}(G_{\hat\phi_n})$ also increased steadily as $\alpha$ increased. Both

Figure 1: (a) The baseline intensity function $\lambda_0(s)$; (b)-(c): empirical rmses of estimates of $\beta_1$ obtained using CLE and SQLu with the true $g(\cdot, \cdot)$, the estimated Thomas PCF and the estimated Exponential PCF ($\kappa = 50$, $\omega = 0.04$; $n = 1$ in (b) and $n = 2$ in (c)). The abbreviation SQLu ASE stands for the asymptotic standard errors given by Theorem 1; (d)-(e): the estimated Thomas PCFs and Exponential PCFs; (f): the averages of the empirical $|W_n|^{-1}\,\mathrm{tr}(G_{\hat\phi_n})$.

observations support our theoretical findings in Theorem 4. Furthermore, Figure 1(b)-(c) and Table 1 show that even when the parametric family of $g(s, t; \theta)$ was misspecified, the estimation accuracy and the coverage probabilities of the confidence intervals were almost unaffected. This surprising observation can be explained by Figure 1(d)-(e), where we can see that the estimated exponential pair correlation functions, although misspecified, were still able to capture the clustering pattern among the cases.

Finally, we investigated the impact of using the preliminary estimator $\hat\theta_n$ on the statistical properties of the estimates obtained using the SQLu method. To do so, we again estimated $\beta^N_{0,n}$ and $\beta_1$ using SQLu, but now with the true pair correlation function $g(s, t)$ instead of the estimated pair correlation function. The results are summarized in Figure 1(b)-(c), where

we can see that the estimated $\hat\theta_n$ indeed caused additional variability in $\hat\beta_1$. As a result, the asymptotic standard error given in Theorem 1 slightly underestimated the standard error of $\hat\beta_1$. However, when $n = 2$, the asymptotic standard error matched the empirical standard error quite well, which resulted in valid coverage probabilities of the confidence intervals in almost all cases. This confirms our theoretical finding of Theorem

3.2 Misspecified models with $\Pi(s) \not\equiv 1$

To study the robustness of the SQLu method to model misspecification, we applied it to case-control point pattern data that depart from assumptions (1) and (2). More specifically, let $X(s)$ be an isotropic Gaussian random field with mean 0, variance 1 and an exponential covariance function with range parameter 10. We consider three forms of $\Pi(s)$:

$$\begin{aligned} \text{Model I:}\quad & \Pi(s) = 1 + q\sin(2\pi y), \quad \text{for } s = (x, y), \\ \text{Model II:}\quad & \Pi(s) = \Phi\big(\rho_q Z(s) + \sqrt{1 - \rho_q^2}\, X^*(s)\big), \\ \text{Model III:}\quad & \Pi(s) = \exp\{q X(s) - q^2/2\}, \end{aligned} \qquad (27)$$

where $X^*(s)$ denotes a single realization of $X(s)$, $q = 0.25, 0.5, 0.75, 1.0$, $\rho_q = 0.8(q - 0.25)$ and $\Phi(\cdot)$ is the standard normal cumulative distribution function. Following the same estimation procedures outlined in the previous subsection and pretending that $\Pi(s) \equiv 1$, summary statistics based on 1,000 simulation runs with $\kappa = 50$, $\omega = 0.04$ and $n = 2$ are presented in Table 2.

Model I investigates the case where the sampling scheme $\alpha(s)$ is systematically misspecified, with the misspecification becoming more severe as $q$ increases. In this case, we can see from Table 2 that both CLE and SQLu may produce biased estimators for $\beta_1$ and that the biases increased as $q$ grew. However, one noticeable feature is that SQLu consistently produced much smaller biases than CLE until $q = 1$. Furthermore, the coverage probabilities of the confidence intervals of the SQLu method appear to be reasonably good for small to moderate values of $q$. In this scenario, SQLu appeared to be more robust than CLE.
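As a minimal sketch of the three forms of $\Pi(s)$ in (27), the following Python functions evaluate each model at a single location. The function names and the use of plain floats for the values $Z(s)$, $X^*(s)$ and $X(s)$ are assumptions made for illustration; simulating the Gaussian random field itself is not shown.

```python
import math


def rho_q(q):
    """Correlation parameter rho_q = 0.8 (q - 0.25) used in Model II."""
    return 0.8 * (q - 0.25)


def std_normal_cdf(u):
    """Standard normal cumulative distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def pi_model_1(y, q):
    """Model I: Pi(s) = 1 + q sin(2 pi y) for s = (x, y)."""
    return 1.0 + q * math.sin(2.0 * math.pi * y)


def pi_model_2(z, x_star, q):
    """Model II: Pi(s) = Phi(rho_q Z(s) + sqrt(1 - rho_q^2) X*(s)).

    z is the covariate value Z(s); x_star is the value at s of the single
    fixed field realization X*(s).
    """
    r = rho_q(q)
    return std_normal_cdf(r * z + math.sqrt(1.0 - r * r) * x_star)


def pi_model_3(x, q):
    """Model III: Pi(s) = exp{q X(s) - q^2 / 2}, with x the field value X(s)."""
    return math.exp(q * x - q * q / 2.0)
```

With $q = 0.25$ the correlation $\rho_q$ is zero, so in this sketch Model II no longer depends on $Z(s)$, which matches the $\rho_q = 0$ column of Table 2.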
Model II mimics the situation where some covariate, namely $X^*(s)$, that only affects the

control process is left out. In a sense, this can also be viewed as a misspecification of the function $\psi(s)$ in (1), which should be $\psi(s, \beta) = \exp\{\beta^N_{0,n} + Z(s)\beta_1\}\,\Pi^{-1}(s)$ as opposed to $\psi(s, \beta) = \exp\{\beta^N_{0,n} + Z(s)\beta_1\}$ given in (25). In this case, it appears that when $\rho_q = 0$ with $q = 0.25$, both the CLE and SQLu methods still produced unbiased estimators for $\beta_1$. As expected, the estimation biases became larger as $\rho_q$ increased. However, when $\rho_q \neq 0$, $\beta_1$ can no longer be interpreted as the elevated/reduced impact of $Z(s)$ on the case process relative to the control process.

Model III deals with the scenario where, even conditional on $\lambda_0(s)$ and $\psi(s)$, the control process is still not a Poisson process. With a new $X(s)$ simulated for each simulation run, the control process $M$ becomes a log-Gaussian Cox process with pair correlation function $g_M(s, t) = \exp\{q^2 \exp(-\|s - t\|/10)\}$. In this case, both the CLE and SQLu methods yielded unbiased estimators for $\beta_1$ for all values of $q$. However, as $q$ increased, the variances of both estimators generally increased due to the additional aggregation introduced into the control process. Coverage probabilities of the resulting confidence intervals are slightly off the nominal level. Nevertheless, the SQLu estimator outperformed the CLE estimator in terms of rmse for every $q$ in this case.

4 Data examples

4.1 Beijing restaurant locations

The first data example concerns locations of two types of restaurants in Beijing, China. The data were collected from 11 districts of Beijing through the search engine . The control process consisted of locations of traditional Chinese restaurants. Due to the limit on the number of restaurant locations that can be returned by the search engine, we extracted a random sample consisting of 6% of the Chinese restaurants, i.e. using a uniform sampling probability $\alpha(s) = 0.06$. This resulted in 2,659 control locations. The case process consisted of locations of all 1,781 Western-style restaurants in Beijing.
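The uniform sampling of controls described above can be illustrated with a short Python sketch. It assumes independent (Bernoulli) thinning, in which each location is retained with probability $\alpha$; the source only states that a 6% random sample was drawn, so the exact sampling mechanism, the function name and the toy coordinates are all assumptions.

```python
import random


def thin_uniformly(points, alpha, seed=None):
    """Keep each point independently with probability alpha (assumed mechanism)."""
    rng = random.Random(seed)
    return [p for p in points if rng.random() < alpha]


# Toy usage with made-up coordinates; real data would be restaurant locations.
all_points = [(float(i), float(i % 7)) for i in range(10000)]
controls = thin_uniformly(all_points, alpha=0.06, seed=1)
```

Under this mechanism the number of retained controls is random with mean $\alpha$ times the number of candidate locations, whereas a fixed-size sample would return exactly that fraction.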
Figure 2(a) shows all restaurant locations, where

Table 2: Biases and rmses of the different estimators for $\beta_1$ with $\kappa = 50$, $\omega = 0.04$ and $n = 2$. Column blocks: $q = 0.25$ ($\rho_q = 0$), $q = 0.5$ ($\rho_q = 0.2$), $q = 0.75$ ($\rho_q = 0.4$) and $q = 1.0$ ($\rho_q = 0.6$); within each block, CLE (BIAS, rmse) and SQLu (BIAS, rmse, CP90). Rows: Models I, II and III, each for several values of $\alpha$.

it appears that the Western-style restaurants tended to be more concentrated than the Chinese restaurants. For model estimation, we converted the longitude/latitude locations into UTM coordinates (northing, easting) following Snyder (1987). We modeled the possible differences between the spatial patterns of Western and Chinese restaurants using $\psi(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$, where the covariate vector $Z(s)$ consisted of two district-level covariates: the average annual income of a regular worker (in 10,000 RMB, "Income") and the logarithm of the total number of foreign tourists (in 10,000, "log-travel") in . The intercept $\beta_0$ was introduced in $\psi(s; \beta)$ to model the overall difference in the intensities between the Western and Chinese restaurants in Beijing. To model possible clustering in the Western restaurant locations that was not explained by the intensity function, we further


Figure 2: (a) Locations of restaurants (Western and Chinese, longitude/latitude); (b) estimated parametric and nonparametric pair correlation functions against distance (10 km); (c) residuals from the reduced model with $h = 0.5$ (locations in UTM coordinates, easting/northing in 10 km).

introduced a parametric pair correlation function $g(s, t)$ as defined in (26). Using the approach in Guan et al. (2008), the estimated parameters are $\hat\kappa = 5.65$ and $\hat\omega = $ . Figure 2(b) shows that the estimated parametric pair correlation function agrees well with the nonparametric one (Guan et al. 2008) and both indicate the presence of clustering.

Table 3: Regression parameter estimates for the restaurant data.

                              Full Model                                  Reduced Model
Method                  Intercept     Income          log-travel     Intercept     log-travel
CLE   Estimates (SE)    -4.38(0.51)   0.050(0.079)    0.21(0.094)    -4.05(0.19)   0.25(0.067)
      P-value
SQLu  Estimates (SE)    -4.30(0.32)   0.028(0.045)    0.19(0.048)    -4.11(0.12)   0.21(0.037)
      P-value

Finally, the estimated regression parameters are summarized in Table 3. The covariate Income is not significant, while the covariate log-travel is significant regardless of the estimation method used. The covariate Income affects the distributions of both Chinese and Western-style restaurants and is therefore likely absorbed into the baseline intensity $\lambda_0(s)$. The positive parameter estimate for the covariate log-travel shows that the Western-style restaurants tended to be more concentrated (relative to the Chinese restaurants) in districts that attracted more foreign tourists. Comparing the two approaches, SQLu produced much smaller standard errors than CLE, which illustrates the potential advantage of the proposed method. To

assess the goodness of fit, we also computed standardized smoothed residuals (see Guan et al. 2008) on a grid over the banded region in Figure 2(c) (residuals are only calculated for the 602 grid points that have at least 5 restaurants within a 5 km radius). The residuals are all of moderate magnitude and do not contradict the proposed model. Note that the apparent correlation in the residual plot is partly due to the smoothing procedure and partly due to the correlation in the point pattern data, cf. the fitted pair correlation function in Figure 2(b).

4.2 Tropical rain forest data

The second data example concerns the spatial locations of three tropical forest tree species, Acalypha diversifolia (528 trees), Lonchocarpus heptaphyllus (836 trees) and Capparis frondosa (3299 trees), in a 1000 m × 500 m rectangular window on Barro Colorado Island (Condit, 1998; Hubbell et al. 1999; Hubbell et al. 2005). Guan et al. (2015) conducted a detailed investigation of the point patterns of locations of these three species and their associations with environmental variables such as elevation (dem), slope gradient (grad), and soil contents of potassium (K), mineralized nitrogen (Nmin) and phosphorus (P). All three species display clustering patterns, which were modeled using parametric pair correlation functions. For each species, there is no apparent control process available to assist modeling of the underlying spatial intensity function. The purpose of this analysis is to show how the case-control methodology can be used, as described in Section 2.6, as a computationally efficient alternative to deterministic quadrature approximation when implementing quasi-likelihood for spatial point patterns (Guan et al. 2015). More specifically, we treated each species of interest as a case process separately and assumed that the case intensity function took a purely parametric form $\lambda_N(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$, as assumed in Guan et al.
(2015), where the covariate vector $Z(s)$ consisted of environmental variables. Such a parametric assumption on $\lambda_N(s; \beta)$ leads to a special case of model (1) with $\lambda_0(s) = 1$ and $\psi(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$ and enabled us to simulate controls from a homogeneous Poisson process with a constant intensity $\alpha$ for the analysis of each

species. The proportional structure (1)-(2) was therefore maintained with such constructions of case and control point patterns. Furthermore, in this case, the regression parameter $\beta$ should be interpreted as the elevated/reduced impact of $Z(s)$ on the tree location intensities relative to complete spatial randomness, under which all tree locations follow a homogeneous Poisson process. For a fair comparison with Guan et al. (2015), we adopted both the selected covariates and the estimated pair correlation functions given there; see Guan et al. (2015) for more details. The controls were simulated using increasing intensities $\alpha$ such that the average numbers of simulated controls $|W|\alpha$ ranged from 500 to . For each given intensity $\alpha$, 1,000 independent realizations of the control process were simulated, and an averaged estimator of $\beta$ as well as its standard error were obtained for both the CLE and the SQLu method following Corollary 1. The results are summarized in Table 4 and Figure 3, where QL stands for the quasi-likelihood approach proposed in Guan et al. (2015). We did not apply any tapering for either the SQLu method or the QL method.
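The control simulation and averaging described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the window is taken to be the 1000 m × 500 m rectangle, `estimator` is a hypothetical callable standing in for a CLE or SQLu fit on one simulated control pattern, and the standard error formula of Corollary 1 is not reproduced.

```python
import numpy as np


def simulate_poisson_controls(alpha, width=1000.0, height=500.0, rng=None):
    """Simulate a homogeneous Poisson process with intensity alpha on a rectangle.

    The number of points is Poisson with mean alpha * width * height, and the
    locations are independent and uniform on the window.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_points = rng.poisson(alpha * width * height)
    xs = rng.uniform(0.0, width, size=n_points)
    ys = rng.uniform(0.0, height, size=n_points)
    return np.column_stack([xs, ys])


def averaged_estimate(estimator, alpha, n_rep, seed=0):
    """Average a (hypothetical) estimator over n_rep independent control patterns."""
    rng = np.random.default_rng(seed)
    estimates = [
        np.asarray(estimator(simulate_poisson_controls(alpha, rng=rng)))
        for _ in range(n_rep)
    ]
    return np.mean(estimates, axis=0)
```

For example, an intensity of `alpha = 0.001` per square metre gives on average 500 controls in this window. Because each replicate is independent, the loop in `averaged_estimate` could be distributed across workers.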
Table 4: Estimates and standard errors for the tropical forest data.

                          Acalypha      Lonchocarpus                  Capparis
Method       |W_n|α       K             Nmin          P               dem          grad          K
CLE                           (1.22)    -2.79(0.71)   -0.16(0.057)    2.85(0.83)   -0.88(1.05)   4.11(0.99)
                              (1.22)    -2.78(0.71)   -0.16(0.057)    2.84(0.83)   -0.98(1.05)   4.15(0.99)
                              (1.23)    -2.74(0.72)   -0.16(0.057)    2.86(0.83)   -0.97(1.06)   4.19(0.99)
                              (1.24)    -2.75(0.73)   -0.16(0.058)    2.86(0.84)   -1.03(1.07)   4.19(0.99)
SQLu                          (1.22)    -2.77(0.70)   -0.15(0.056)    2.74(0.81)   -1.05(1.00)   4.03(0.96)
                              (1.22)    -2.74(0.70)   -0.15(0.056)    2.67(0.80)   -1.27(0.98)   4.05(0.95)
                              (1.23)    -2.72(0.70)   -0.14(0.056)    2.57(0.80)   -1.43(0.96)   4.05(0.94)
                              (1.23)    -2.72(0.70)   -0.14(0.055)    2.45(0.79)   -1.70(0.95)   4.01(0.94)
QL(100×50)   N/A          4.39(1.22)    -2.77(0.70)   -0.15(0.055)    2.29(0.79)   -1.88(0.94)   4.04(0.94)

Table 4 shows that the estimates for both Acalypha and Lonchocarpus are very similar for all approaches. This is because the pair correlation functions drop quickly, see Figure 3(e). On the other hand, for Capparis, where the pair correlation function decays much more slowly, see Figure 3(f), SQLu and QL produced very different estimates from those obtained with CLE. One noticeable feature is that as $\alpha$ increased, the estimated coefficient of grad as well as the

associated standard error decreased for the SQLu method. To give a better idea of the efficiency of each method, Figure 3 shows the efficiency of CLE/SQLu relative to QL (using grid points) as a function of $|W_n|\alpha$. The CLE method is almost always less efficient than SQLu and QL. In contrast, for the SQLu approach, the standard errors quickly reached the same level as those of the approximately optimal QL method as $\alpha$ increased. The SQLu method may be more computationally scalable because (a) far fewer control locations were needed in all three examples to reach standard errors similar to those of the QL method, which relied on 5,000 quadrature points in all cases, and (b) the computation of the averaged SQLu estimate can be easily parallelized.

Figure 3: Top panels ((a) Acalypha, (b) Lonchocarpus, (c) Capparis): relative efficiency, defined as the standard error of CLE or SQLu divided by the standard error of QL estimators, as a function of $|W_n|\alpha$; bottom panels ((d) Acalypha, (e) Lonchocarpus, (f) Capparis): estimated pair correlation functions $g(r)$ against distance (km).

5 Asymptotic properties

In this section, we first study the asymptotic properties of the estimator $\hat\beta$ obtained using the estimating function (17). Then we show that under certain conditions, this estimator is asymp-
