Stochastic Quasi-likelihood for Case-Control Point Pattern Data


Ganggang Xu, Rasmus Waagepetersen and Yongtao Guan

January 8, 2018

Abstract

We propose a novel stochastic quasi-likelihood estimation procedure for case-control point processes. Quasi-likelihood for point processes depends on a certain optimal weight function, and for the new method the weight function is stochastic since it depends on the control point pattern. The new procedure also provides a computationally efficient implementation of quasi-likelihood for univariate point processes, in which case a synthetic control point process is simulated by the user. Under mild conditions, the proposed approach yields consistent and asymptotically normal parameter estimators. We further show that the estimators are optimal in the sense that the associated Godambe information is maximal within a wide class of estimating functions for case-control point processes. The effectiveness of the proposed method is further illustrated using extensive simulation studies and two data examples.

Some key words: Case-control data, Godambe information, Optimal estimating equations, Point process, Stochastic Quasi-likelihood.

Short title: Stochastic Quasi-likelihood for Case-Control Point Pattern Data

Ganggang Xu is Assistant Professor (gang@math.binghamton.edu), Department of Mathematical Sciences, Binghamton University, State University of New York, NY. Rasmus Waagepetersen is Professor (rw@math.aau.dk), Department of Mathematical Sciences, Aalborg University, Denmark. Yongtao Guan is Leslie O. Barnes Professor (yguan@bus.miami.edu), Department of Management Science, University of Miami, Coral Gables, FL. Xu's research was supported by Collaboration Grants for Mathematicians from the Simons Foundation (Award Number: ).
Waagepetersen's research was supported by The Danish Council for Independent Research-Natural Sciences, grant DFF "Statistics for point processes in space and beyond", and by the Centre for Stochastic Geometry and Advanced Bioimaging, funded by grant 8721 from the Villum Foundation. Guan's research was supported by National Institutes of Health grant R01 CA. The authors thank the editor, the associate editor and anonymous referees for their constructive comments that led to substantial improvements of the article. The authors also thank Prof. Hansheng Wang and Mr. Yu Chen for their help in collecting the Beijing restaurant location data.

1 Introduction

Today spatially referenced datasets on human activities can be easily harvested from social media platforms and mobile devices equipped with GPS. Such data can be of great interest, e.g., to sociologists, geographers, economists and marketing analysts. As one example, we consider in Section 4.1 bivariate point pattern data obtained from the Chinese search engine baidu.com, with the locations of Chinese and Western-style restaurants in Beijing. Other examples include data from Twitter or analogues giving times of tweets and locations of the persons tweeting (Lu et al., 2016). For spatial point pattern data regarding human activities in urban environments, a particular challenge is the very complex form of the intensity function. For example, commercial restaurants in Beijing typically cannot be found in parks and certain official areas, so that the intensity function in such areas is effectively zero. Also the intensity can vary abruptly when moving from one neighbourhood into another. This means that a full modeling of the intensity function would require very detailed information on the geography of the city, and gathering such information can be a cumbersome task.

The aforementioned difficulties are well-known in spatial epidemiology, where spatial point processes have been used as an effective tool to investigate risk factors for various diseases; see, for example, Diggle (1990), Diggle and Rowlingson (1994), Diggle et al. (1997), and Zimmerman et al. (2012). Suppose that a spatial point process $N$ is used to model the locations of occurrences of a disease over a spatial domain $W$ in the population at risk. Then a commonly used model for the intensity function $\lambda_N(\cdot)$ of $N$ is

$$\lambda_N(s) = \psi(s)\lambda_0(s), \quad s \in W, \tag{1}$$

where $\lambda_0(s)$ serves as a baseline intensity function related to the population at risk and the nonnegative factor $\psi(s)$ models the elevated or reduced risk that an individual located at $s$ catches the disease.
In this model, $\psi(\cdot)$ is of primary interest while $\lambda_0(\cdot)$ can be viewed as an infinite-dimensional nuisance parameter. Typically a parametric model $\psi(s; \beta)$ is assumed for the dependence of the risk at location $s$ on some risk factors $Z(s)$ associated with $s$. For example, one popular choice is $\psi(s; \beta) = \exp\{\beta^T Z(s)\}$ with $Z(s)$ being some environmental, demographic, or life-style related variables at $s \in W$. With such a structure, the parameter $\beta$ gives a direct interpretation of the potential risk related to the risk factors $Z(s)$. For reasons similar to the ones mentioned previously, the specification of $\lambda_0(\cdot)$ is much more intricate and a simple parametric model may not be tenable. Instead it is common to estimate $\lambda_0(\cdot)$ nonparametrically, e.g., by using kernel smoothing (Diggle, 1990). However, the resulting estimator may not be consistent, as argued by Guan (2008). The impact of using a potentially inconsistent estimator of $\lambda_0(\cdot)$ on the inference regarding the parameter $\beta$ is further difficult to quantify.

An appealing alternative is to use case-control data including an additional control process $M$, also observed over $W$. The intensity function of $M$ is assumed to be of the form

$$\lambda_M(s) = \alpha(s)\lambda_0(s), \quad s \in W, \tag{2}$$

where $\alpha(s)$ is the sampling intensity when collecting the control data. The value of $\alpha(\cdot)$ is determined by the actual sampling scheme of the case-control study and is thus often considered known (Diggle and Rowlingson, 1994; Zimmerman et al., 2012). For the Beijing restaurants, for example, we will study the spatial pattern of Western-style restaurants using a random sample of the Chinese restaurants as a control process.

To clarify the assumptions and scope of our modeling approach we consider the following illustrative example. Suppose there exist three sets of spatial covariates, $X(s)$, $Y(s)$ and $Z(s)$, conditioned on which the case process $N$ and the control process $M$ are independent Poisson processes with intensity functions

$$\Lambda_N(s) = \exp\{\beta_Y^T Y(s) + (\beta_Z + \beta)^T Z(s) + \beta_X^T X(s)\} \quad \text{and} \quad \Lambda_M(s) = \alpha(s)\exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}. \tag{3}$$

However, only $Z(s)$ are collected in the observed data and both $X(s)$ and $Y(s)$ need to be treated

as latent processes. Note that the latent process $Y(s)$ affects both the case and control processes equivalently, but $X(s)$ only affects the case process. Assume that $X(s)$ is independent of $Y(s)$ and $Z(s)$ and that $\beta_0 = \log[E \exp\{\beta_X^T X(s)\}]$ does not depend on $s$. Then, conditioned on $Y(s)$ and $Z(s)$, the control process $M$ is still a Poisson process with intensity $\lambda_M(s) = \Lambda_M(s) = \alpha(s)\lambda_0(s)$, where $\lambda_0(s) = \exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}$, and the case process $N$ becomes a Cox process with intensity $\lambda_N(s) = E\{\Lambda_N(s) \mid Y(s), Z(s)\} = \lambda_0(s)\psi(s; \beta)$, where $\psi(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$.

The first take-away from the above example is that the parameter $\beta$ needs to be interpreted as the elevated or reduced impact of $Z(s)$ on the case intensity $\lambda_N(s)$ compared to its impact on the control intensity $\lambda_M(s)$. Let $Z_j(s)$ and $\beta_j$ be the $j$th elements of $Z(s)$ and $\beta$, respectively. Then $\beta_j = 0$ means that $Z_j(s)$ has the same effect on $N$ and $M$. Secondly, a key assumption in (3) is that all factors affecting both the case and control intensities are either observed, i.e., $Z(s)$, or otherwise contribute equivalently to both intensities, such as $Y(s)$. The major difficulty in treating the baseline intensity $\lambda_0(s) = \exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}$ as an unknown deterministic function is that its estimation can be rather challenging without observing $Y(s)$. This difficulty, however, can be avoided by using the proportional structure between the intensity functions defined in (1) and (2), where $\lambda_0(s)$ need not be estimated. Consequently, our theoretical investigations can treat $\lambda_0(s)$ and $\psi(s)$ in (1) and (2) as deterministic functions, conditioned on which $N$ and $M$ are assumed to be independent of each other, with $M$ being a Poisson process.

Diggle and Rowlingson (1994) studied this case-control setting where $N$ and $M$ are independent Poisson processes and proposed a conditional likelihood approach to estimate the unknown parameter $\beta$.
However, the strong independence properties implied by the Poisson assumption for the case process $N$ may be too restrictive since they preclude possible interactions among the cases. For example, this may not be appropriate for modeling infectious diseases (Diggle et al., 1997; Diggle et al., 2007). For this reason, we consider the scenario where the case process $N$ may have some clustering patterns. For example, in the above illustrative example, by allowing spatial

dependence in the latent process $X(s)$ in (3), conditioned on $\lambda_0(s) = \exp\{\beta_Y^T Y(s) + \beta_Z^T Z(s)\}$ and $\psi(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$, the case process $N$ becomes a Cox process, which may have additional aggregation relative to the control process $M$.

Diggle and Rowlingson's conditional likelihood may be viewed from an estimating function point of view. Consider the following spatial increment:

$$U(ds; \beta) = N(ds) - \frac{\psi(s; \beta)}{\alpha(s)} M(ds), \tag{4}$$

where $N(ds)$ and $M(ds)$ denote the numbers of points from $N$ and $M$ in an infinitesimal spatial increment $ds$ located around the spatial location $s$. Note that $E\{N(ds)\} = \lambda_N(s)ds$ and $E\{M(ds)\} = \lambda_M(s)ds$. Then, based on (1) and (2), it is easy to see that $E\{U(ds; \beta)\} = 0$. As a result, using general theory on estimating equations (see, e.g., Crowder, 1986), we can estimate the $p \times 1$ parameter vector $\beta$ consistently by solving the following estimating equation:

$$Q_f(\beta) = \int_W f(s; \beta) U(ds; \beta) = \sum_{s \in N} f(s; \beta) - \sum_{s \in M} \frac{\psi(s; \beta)}{\alpha(s)} f(s; \beta) = 0_p, \tag{5}$$

where $f(s; \beta)$ is a $p \times 1$ real vector-valued function and $0_p$ is a $p \times 1$ vector of zeros. When $f(s; \beta) = \gamma(s; \beta)\psi^{(1)}(s; \beta)/\psi(s; \beta)$, where $\gamma(s; \beta) = \alpha(s)/\{\alpha(s) + \psi(s; \beta)\}$ and $\psi^{(1)}(s; \beta) = \partial\psi(s; \beta)/\partial\beta$, (5) becomes equivalent to the score function of Diggle and Rowlingson's conditional likelihood. In fact, Rathbun (2012) showed that if both $N$ and $M$ are Poisson processes, this choice of weight function is optimal in the sense of yielding minimal parameter estimation variance. However, when $N$ is not Poisson, the conditional likelihood is no longer optimal. To the best of our knowledge, no previous work has attempted to determine the optimal weight function $f(\cdot; \beta)$ for the class of estimating functions of the form (5). In this paper we fill this gap.
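As a small numerical illustration of the estimating function (5), the following Python sketch evaluates $Q_f(\beta)$ with the conditional-likelihood weight $f = \gamma\psi^{(1)}/\psi$ and compares it with a finite-difference gradient of the conditional log-likelihood. It is only a sketch under stated assumptions: the log-linear form $\psi(s; \beta) = \exp\{\beta_0 + \beta_1 z(s)\}$, the constant sampling intensity `alpha`, the simulated covariate values, and all function names are hypothetical choices made for illustration and do not come from the paper.

```python
import numpy as np

def psi(beta, z):
    """Assumed log-linear risk: psi(s; beta) = exp(beta0 + beta1 * z(s))."""
    return np.exp(beta[0] + beta[1] * z)

def q_cl(beta, z_case, z_ctrl, alpha):
    """Estimating function (5) with the weight f = gamma * psi^(1) / psi."""
    def weight(z):
        gamma = alpha / (alpha + psi(beta, z))
        # For the log-linear model, psi^(1) / psi = (1, z)^T.
        return gamma * np.vstack([np.ones_like(z), z])
    case_term = weight(z_case).sum(axis=1)
    ctrl_term = (weight(z_ctrl) * psi(beta, z_ctrl) / alpha).sum(axis=1)
    return case_term - ctrl_term

def cond_loglik(beta, z_case, z_ctrl, alpha):
    """Conditional log-likelihood: each point is a case with prob. psi / (alpha + psi)."""
    p_case = psi(beta, z_case) / (alpha + psi(beta, z_case))
    p_ctrl = alpha / (alpha + psi(beta, z_ctrl))
    return np.log(p_case).sum() + np.log(p_ctrl).sum()

def num_grad(fun, beta, eps=1e-6):
    """Central finite-difference gradient."""
    grad = np.zeros_like(beta)
    for k in range(beta.size):
        step = np.zeros_like(beta)
        step[k] = eps
        grad[k] = (fun(beta + step) - fun(beta - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
z_case = rng.normal(size=200)   # hypothetical covariate values at case locations
z_ctrl = rng.normal(size=300)   # hypothetical covariate values at control locations
alpha = 1.5                     # assumed constant sampling intensity
beta = np.array([0.2, 0.7])

score = q_cl(beta, z_case, z_ctrl, alpha)
grad = num_grad(lambda b: cond_loglik(b, z_case, z_ctrl, alpha), beta)
print(score, grad)
```

The two vectors agree up to finite-difference error, which reflects the statement that (5) with this weight is the conditional-likelihood score. Only covariate values at the observed points are needed; $\lambda_0(\cdot)$ never enters the computation.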
Our approach is built upon a recent development on the quasi-likelihood method for spatial point processes (Guan et al., 2015), where the authors considered the problem of finding the optimal first-order estimating function for a single spatial point process by solving a certain Fredholm integral equation. The development of quasi-likelihood for case-control data faces two

major challenges. Firstly, there is an unobserved latent baseline intensity function $\lambda_0(\cdot)$. As we will see in Section 2, the theoretical optimal weight function is the solution to a Fredholm integral equation involving $\lambda_0(\cdot)$. If one chooses to estimate $\lambda_0(\cdot)$, then the benefits of using the case-control approach would be lost. Secondly, the quasi-likelihood method in Guan et al. (2015) relies on deterministic numerical approximations of two key integrals: (1) the integral in the estimating equation, see Section 2.6; and (2) the integral in the Fredholm integral equation when solving for the optimal weight function. While the former introduces bias to the estimating equation that is difficult to quantify (Baddeley et al., 2014), the latter approximation may invalidate the asymptotic results obtained based on the theoretical optimal weight function. Further, both approximations require covariate information at all numerical quadrature points. For many case-control data sets, covariate information may be readily available at the observed case and control locations but not necessarily at an arbitrary point in the study region. It may therefore require additional work to derive the covariate information at the quadrature points required for the deterministic numerical approximation.

To overcome these challenges, we develop a stochastic quasi-likelihood approach for case-control data that does not rely on $\lambda_0(\cdot)$ and uses only the observed covariate information. We propose a carefully designed leave-one-out algorithm to eliminate estimation bias. We prove that, under suitable conditions, our proposed approach leads to an estimator that is asymptotically as efficient as the one based on the theoretical optimal weight function. Furthermore, we derive the asymptotic distribution of the regression parameter estimators based on the estimated weight function and not its theoretical optimal counterpart.
We also discuss how the method can be applied to univariate point pattern data by simulating a synthetic control point process.

The rest of the paper is organized as follows. A detailed discussion is given in Section 2 on the motivation and practical implementation of the proposed method. Simulation studies are conducted in Section 3 and two real data applications are considered in Section 4. Asymptotic results are given in Section 5. A sketch of the proposed algorithm is given in the Appendix and

all technical proofs are collected in the supplementary material.

2 Stochastic quasi-likelihood using case-control data

2.1 Background

In this paper we assume that the control process $M$ is an inhomogeneous Poisson process independent of $N$. However, we allow for possible correlations between counts of $N$ as quantified by the pair correlation function $g(\cdot, \cdot)$ of $N$, defined through

$$\mathrm{Cov}\{N(ds), N(dt)\} = \delta(s - t)\lambda_N(s; \beta)ds + \lambda_N(s; \beta)\lambda_N(t; \beta)\{g(s, t) - 1\}dsdt, \tag{6}$$

where $\delta(\cdot)$ is the Dirac function satisfying $\int_A \delta(s - t)dt = I(s \in A)$ and $I(\cdot)$ denotes the indicator function. For a Poisson process, where the counts in distinct sets are independent, $g(s, t) = 1$ for any $s \neq t$. Values of $g$ greater (smaller) than one typically correspond to clustered (regular) behavior of $N$. By (6), the pair correlation function is symmetric, $g(s, t) = g(t, s)$. In addition we assume that $g(s, t)$ depends on $s$ and $t$ only through $s - t$. In other words, the case process $N$ is assumed to be second-order intensity reweighted stationary (Baddeley et al., 2000).

A popular measure of efficiency for estimating functions is the Godambe information (Song, 2007). For our estimating function (5), the Godambe information is

$$G_f(\beta) = S_f^T(\beta) V_f^{-1}(\beta) S_f(\beta), \tag{7}$$

where $S_f(\beta) = -E\{\partial Q_f(\beta)/\partial\beta^T\}$, $V_f(\beta) = \mathrm{Var}\{Q_f(\beta)\}$, and the expectation and variance are with respect to both the case process $N$ and the control process $M$. For any two weight functions $f_1(\cdot; \beta)$ and $f_2(\cdot; \beta)$, $Q_{f_1}(\beta)$ is said to be more efficient than $Q_{f_2}(\beta)$ if $G_{f_1}(\beta) - G_{f_2}(\beta)$ is nonnegative definite, denoted as $G_{f_1}(\beta) \succeq G_{f_2}(\beta)$. The optimal estimating function $Q_\phi(\beta)$ is thus defined as the one associated with the optimal weight function $\phi(\cdot; \beta)$ such that

$$G_\phi(\beta) \succeq G_f(\beta) \quad \text{for any } f(\cdot; \beta): W \to \mathbb{R}^p. \tag{8}$$

2.2 The optimal estimating function

By Guan et al. (2015), a sufficient condition for $\phi(\cdot; \beta)$ to be the optimal weight function is that

$$S_f(\beta) = \mathrm{Cov}\{Q_f(\beta), Q_\phi(\beta)\} \quad \text{for all } f(\cdot; \beta): W \to \mathbb{R}^p. \tag{9}$$

By the definition of the spatial increment process $U(\cdot; \beta)$ in (4), it is straightforward to show that

$$E\left\{-\frac{\partial U(ds; \beta)}{\partial\beta}\right\} = \lambda_0(s)\psi^{(1)}(s; \beta)ds,$$

and

$$\mathrm{Cov}\{U(ds; \beta), U(dt; \beta)\} = \delta(s - t)\frac{\alpha(s) + \psi(s; \beta)}{\alpha(s)}\lambda_N(s; \beta)ds + \lambda_N(s; \beta)\lambda_N(t; \beta)\{g(s, t) - 1\}dsdt,$$

which leads to

$$S_f(\beta) = \int_W \lambda_0(s) f(s; \beta)\psi^{(1)}(s; \beta)^T ds, \tag{10}$$

and

$$\mathrm{Cov}\{Q_f(\beta), Q_\phi(\beta)\} = \int_W \int_W \lambda_0(s)\lambda_0(t)\psi(s; \beta)\psi(t; \beta)\{g(s, t) - 1\} f(s; \beta)\phi(t; \beta)^T dsdt + \int_W \lambda_0(s)\gamma^{-1}(s; \beta)\psi(s; \beta) f(s; \beta)\phi(s; \beta)^T ds, \tag{11}$$

where $\gamma(\cdot; \beta)$ and $\psi^{(1)}(\cdot; \beta)$ were defined below equation (5). Combining (9)-(11), the optimal weight function $\phi(\cdot; \beta)$ is the solution to the following integral equation:

$$\phi(s; \beta) + \gamma(s; \beta)\int_W \lambda_0(t)\psi(t; \beta)\{g(s, t) - 1\}\phi(t; \beta)dt = \gamma(s; \beta)\frac{\psi^{(1)}(s; \beta)}{\psi(s; \beta)},$$

or equivalently,

$$\phi(s; \beta) + \int_W \lambda_0(t)\alpha(t)R(s, t; \beta)\phi(t; \beta)dt = \eta(s; \beta), \tag{12}$$

where $R(s, t; \beta) = \gamma(s; \beta)\psi(t; \beta)\{g(s, t) - 1\}/\alpha(t)$ and $\eta(s; \beta) = \gamma(s; \beta)\psi^{(1)}(s; \beta)/\psi(s; \beta)$. When $\gamma(\cdot; \beta)$, $\psi(\cdot; \beta)$ and $g(\cdot, \cdot)$ are continuous functions and $g(\cdot, \cdot) - 1$ is a positive definite function, the solution to (12) is unique (Guan et al., 2015). From now on, for ease of notation, we suppress the dependence of the functions $R(\cdot, \cdot)$, $\gamma(\cdot)$, $\psi(\cdot)$, and $\eta(\cdot)$ on $\beta$ whenever there is no ambiguity.

When $N$ is a Poisson point process, $g(s, t) = 1$ for any $s, t$. Then the optimal weight function obtained through (12) has the closed form $\phi_d(\cdot; \beta) = \eta(\cdot; \beta)$, for which the resulting $Q_{\phi_d}$

is equivalent to the score function of the conditional likelihood proposed by Diggle and Rowlingson (1994). Furthermore, $\phi_d(\cdot; \beta)$ also coincides with the optimal weight function studied in Waagepetersen (2008) and Rathbun (2012) for inhomogeneous Poisson processes. Our approach is therefore an important generalization of the conditional likelihood approach by Diggle and Rowlingson (1994). For a more general point process $N$, the integral equation (12) appropriately takes into account the correlation structure of $N$ as given by the pair correlation function $g(\cdot, \cdot)$.

2.3 A naive stochastic estimator of $\phi(\cdot; \beta)$ using all controls

The integral equation (12) belongs to the extensively studied class of Fredholm integral equations of the second kind; see, e.g., Hackbusch (1995), Zemyan (2012) and Kress (2014). Guan et al. (2015) proposed to find an approximate solution of the Fredholm integral equation under their consideration through the Nyström method, which is based on a deterministic numerical quadrature approximation of the integral that effectively converts the integral equation into a matrix equation. As we argued in Section 1, such an approach may not be applicable in our setting because $\lambda_0(\cdot)$ is typically unknown and complete knowledge of $\psi(t)$ and $\eta(t)$ may not be available at all quadrature points. Below we introduce a stochastic variant of the Nyström method which avoids the aforementioned difficulty.

Considering the integral in (12), Campbell's formula gives

$$\int_W \lambda_0(t)\alpha(t)R(s, t)\phi(t; \beta)dt = E\left\{\sum_{t \in M \cap W} R(s, t)\phi(t; \beta)\right\}, \quad s \in W.$$

Therefore, a naive proposal to estimate $\phi(\cdot; \beta)$ is to solve the following stochastic equation:

$$\hat\phi(s; \beta) + \sum_{t \in M \cap W} R(s, t)\hat\phi(t; \beta) = \eta(s), \quad \text{where } s \in M \cap W. \tag{13}$$

Note that $\lambda_0(s)$ is not present in the above equation. Denote by $\hat\phi_k(s; \beta)$ and $\eta_k(s)$ the $k$th components of $\hat\phi(s; \beta)$ and $\eta(s)$, respectively, for $k = 1, \ldots, p$. Let further $\{s_1, \ldots, s_{|M|}\}$ denote

a realization of $M$ of cardinality $|M|$. Then the components of the solution to equation (13) are

$$\begin{pmatrix} \hat\phi_k(s_1, M; \beta) \\ \hat\phi_k(s_2, M; \beta) \\ \vdots \\ \hat\phi_k(s_{|M|}, M; \beta) \end{pmatrix} = \{I_{|M|} + R(M, M)\}^{-1} \begin{pmatrix} \eta_k(s_1) \\ \eta_k(s_2) \\ \vdots \\ \eta_k(s_{|M|}) \end{pmatrix}, \quad k = 1, \ldots, p, \tag{14}$$

where $R(M, M)$ is an $|M| \times |M|$ matrix whose $ij$th entry is $R(s_i, s_j)$, $i, j = 1, \ldots, |M|$. Note that we deliberately emphasize the dependence of $\hat\phi_k(s_i, M; \beta)$ on the entire control process $M$, since this dependence is critical to our theoretical investigation. Denoting $\hat\phi(s_i, M; \beta) = (\hat\phi_1(s_i, M; \beta), \ldots, \hat\phi_p(s_i, M; \beta))^T$ for any $s_i \in M$, we use the following interpolation for an arbitrary location $s \in W$:

$$\hat\phi(s, M; \beta) = \eta(s) - \sum_{t \in M \cap W} R(s, t)\hat\phi(t, M; \beta). \tag{15}$$

Plugging $\hat\phi(s, M; \beta)$ in for $f(s; \beta)$ in (5), we obtain

$$Q_{SQLn}(\beta) = \sum_{s \in N \cap W} \hat\phi(s, M; \beta) - \sum_{s \in M \cap W} \hat\phi(s, M; \beta)\frac{\psi(s)}{\alpha(s)}.$$

However, by the Campbell-Mecke theorem (Baddeley, 2007, Theorem 3.2),

$$E_{M,N}\{Q_{SQLn}(\beta)\} = \int_W \left[E_M\{\hat\phi(s, M; \beta)\}\lambda_N(s) - E_M^{s}\{\hat\phi(s, M; \beta)\}\lambda_0(s)\psi(s)\right] ds,$$

where $E_M^{s}$ denotes the Palm expectation of the point process $M$ at a point $s \in W$; see Baddeley (2007) for more details. Since the Palm distribution and the original point process distribution are different even in the case of a Poisson process (Baddeley, 2007), it follows that $E_{M,N}\{Q_{SQLn}(\beta)\} \neq 0_p$. Therefore, the naive plug-in estimating function $Q_{SQLn}(\beta)$ is biased, which in turn results in bias in the associated parameter estimator. Our empirical simulation studies confirm that the parameter estimator can be substantially biased.

2.4 Leave-one-out bias correction method

The bias of the estimating function $Q_{SQLn}(\beta)$ can be corrected by an application of a generalized version of the Slivnyak-Mecke theorem (Mecke, 1967). More specifically, for a Poisson point

process $M$ with an intensity function $\lambda(\cdot)$ and any function $f(s, M)$, one has that

$$E\left\{\sum_{s \in M \cap W} f(s, M \setminus \{s\})\right\} = \int_W E\{f(s, M)\}\lambda(s)ds, \tag{16}$$

provided that (16) is well-defined. Motivated by equation (16), we modify the estimating function $Q_{SQLn}(\beta)$ as

$$Q_{SQLu}(\beta) = \sum_{s \in N \cap W} \hat\phi(s, M; \beta) - \sum_{s \in M \cap W} \hat\phi(s, M \setminus \{s\}; \beta)\frac{\psi(s)}{\alpha(s)}, \tag{17}$$

where $\hat\phi(s, M \setminus \{s\}; \beta)$ is as defined in (15) with $M$ replaced by $M \setminus \{s\}$. Using the Slivnyak-Mecke theorem (16) and the independence between $N$ and $M$,

$$E_{M,N}\{Q_{SQLu}(\beta)\} = \int_W \left[E_M\{\hat\phi(s, M; \beta)\}\lambda_N(s) - E_M\{\hat\phi(s, M; \beta)\}\lambda_0(s)\psi(s)\right] ds = 0_p,$$

which shows that $Q_{SQLu}(\beta)$ is an unbiased estimating function.

It appears that the computation of all weights $\hat\phi(s, M \setminus \{s\}; \beta)$, $s \in M$, requires inverting an $(|M| - 1) \times (|M| - 1)$ matrix as described in (14) repeatedly $|M|$ times, which leads to a computational cost of $O(|M|^4)$ floating-point operations. However, the following lemma shows that this can be avoided by providing a short-cut formula for the leave-one-out estimator $\hat\phi(s, M \setminus \{s\}; \beta)$.

Lemma 1. For any $s \in M$, there is a one-to-one relationship between the estimators $\hat\phi(s, M; \beta)$ and $\hat\phi(s, M \setminus \{s\}; \beta)$ as follows:

$$\hat\phi(s, M \setminus \{s\}; \beta) = \frac{\hat\phi(s, M; \beta)}{w_{ss}(s, M \setminus \{s\})}, \tag{18}$$

where $w_{ss}(s, M \setminus \{s\})$ is the diagonal entry of the matrix $\{I_{|M|} + R(M, M)\}^{-1}$ corresponding to location $s$.

The proof is given in the supplementary material. Lemma 1 shows that the leave-one-out estimator $\hat\phi(s, M \setminus \{s\}; \beta)$ at all control locations can be obtained by inverting an $|M| \times |M|$ matrix just once, as is needed in (14). The computational cost of evaluating $Q_{SQLu}(\beta)$ is thus of the order $O(|M|^3 + |N||M|)$, the same as that of $Q_{SQLn}(\beta)$.
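The short-cut in Lemma 1 can be checked numerically. The Python sketch below builds a small matrix $R(M, M)$, solves (14) once, applies the division by the diagonal of $\{I_{|M|} + R(M, M)\}^{-1}$, and compares the result with the brute-force leave-one-out computation that re-solves (14) on $M \setminus \{s_i\}$ and interpolates with (15). Everything concrete in it is an assumption made for illustration: the simulated control locations and covariate, the constant `alpha`, the parameter values, and the Gaussian-shaped choice of $g - 1$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_ctrl = 40
pts = rng.uniform(size=(n_ctrl, 2))     # hypothetical control locations in the unit square
z = rng.normal(size=n_ctrl)             # hypothetical covariate at the controls
alpha = 2.0                             # assumed constant sampling intensity
psi = np.exp(0.3 + 0.5 * z)             # psi(s; beta) for an assumed log-linear model
gamma = alpha / (alpha + psi)
eta = gamma * z                         # one component of eta = gamma * psi^(1) / psi

# R(s_i, s_j) = gamma(s_i) * psi(s_j) * {g(s_i, s_j) - 1} / alpha(s_j)
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
g_minus_1 = 0.8 * np.exp(-(dist / 0.1) ** 2)   # illustrative choice of g - 1
R = gamma[:, None] * psi[None, :] * g_minus_1 / alpha

A_inv = np.linalg.inv(np.eye(n_ctrl) + R)
phi_full = A_inv @ eta                      # solution of (14) using all controls
phi_loo_fast = phi_full / np.diag(A_inv)    # short-cut formula of Lemma 1

def phi_loo_brute(i):
    """Leave-one-out weight at s_i obtained by re-solving (14) without s_i."""
    keep = np.arange(n_ctrl) != i
    A_sub = np.eye(n_ctrl - 1) + R[np.ix_(keep, keep)]
    phi_sub = np.linalg.solve(A_sub, eta[keep])
    return eta[i] - R[i, keep] @ phi_sub    # interpolation (15) at s_i

phi_loo_slow = np.array([phi_loo_brute(i) for i in range(n_ctrl)])
print(np.max(np.abs(phi_loo_fast - phi_loo_slow)))
```

The agreement follows from the block-inverse identity for the row of $\{I_{|M|} + R(M, M)\}^{-1}$ belonging to $s_i$, so a single inversion replaces $|M|$ separate solves.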

It remains to quantify the potential loss of Godambe information incurred by replacing the optimal $\phi(s; \beta)$ by the estimates $\hat\phi(s, M; \beta)$, $s \in N$, or $\hat\phi(s, M \setminus \{s\}; \beta)$, $s \in M$. The Godambe information matrix for (17) is defined as

$$G(\beta) = S^T(\beta)V^{-1}(\beta)S(\beta), \tag{19}$$

where $S(\beta) = -E\{\partial Q_{SQLu}(\beta)/\partial\beta^T\} = \int_W \lambda_0(s)E\{\hat\phi(s, M; \beta)\}\{\partial\psi(s; \beta)/\partial\beta^T\}ds$ and $V(\beta) = \mathrm{Var}\{Q_{SQLu}(\beta)\}$. Regarding $\partial Q_{SQLu}(\beta)/\partial\beta^T$, the weight function $\hat\phi(s, M; \beta)$ also depends on the parameter $\beta$, but by (16) the terms in $\partial Q_{SQLu}(\beta)/\partial\beta^T$ involving $\partial\hat\phi(s, M; \beta)/\partial\beta^T$ and $\partial\hat\phi(s, M \setminus \{s\}; \beta)/\partial\beta^T$ cancel after taking the expectation. The derivation of $V(\beta)$ is more involved and will be addressed later in Section 5.1. Furthermore, we show in Section 5.3 that the difference between $G(\beta)$ and the optimal Godambe information matrix $G_\phi(\beta)$, as defined in (7) with $f(\cdot; \beta)$ replaced by $\phi(\cdot; \beta)$, is asymptotically negligible.

2.5 Practical implementation details

Although computing the estimating function (17) is quite straightforward due to Lemma 1, several issues need to be addressed in practice. First note that $R(s, t)$ in the integral equation (12) is not symmetric, which is inconvenient both for practical implementation and for theoretical investigations. So we first derive a symmetric counterpart of $R(s, t)$. Define the functions

$$m(s) = \alpha(s)\{\alpha(s) + \psi(s)\}^{-1/2}\psi(s)^{-1/2}, \quad \phi^*(s; \beta) = m^{-1}(s)\phi(s; \beta), \quad \eta^*(s) = m^{-1}(s)\eta(s). \tag{20}$$

Then the integral equation (12) can be written as

$$\phi^*(s; \beta) + \int_W \lambda_0(t)\alpha(t)R^*(s, t)\phi^*(t; \beta)dt = \eta^*(s), \tag{21}$$

where $R^*(s, t) = m(t)R(s, t)/m(s) = r(s)r(t)\{g(s, t) - 1\}$ and $r(s) = \psi(s)^{1/2}\{\alpha(s) + \psi(s)\}^{-1/2}$.
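The effect of the rescaling can be verified on a small example. The Python sketch below uses the convention $\phi^* = m^{-1}\phi$ and $\eta^* = m^{-1}\eta$, under which the rescaled kernel has entries $R(s_i, s_j)m(s_j)/m(s_i)$. It checks that this matrix is symmetric and equal to $r(s_i)r(s_j)\{g(s_i, s_j) - 1\}$, and that solving the rescaled matrix equation and multiplying by $m$ recovers the solution of the original system (14). The simulated locations, the spatially varying `alpha`, the values of `psi` and `eta`, and the form of $g - 1$ are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_ctrl = 30
pts = rng.uniform(size=(n_ctrl, 2))            # hypothetical control locations
alpha = rng.uniform(1.0, 3.0, size=n_ctrl)     # assumed sampling intensity alpha(s)
psi = np.exp(rng.normal(size=n_ctrl))          # assumed positive values of psi(s)
gamma = alpha / (alpha + psi)
eta = gamma * rng.normal(size=n_ctrl)          # one component of eta(s)

dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
g_minus_1 = 0.8 * np.exp(-(dist / 0.1) ** 2)   # illustrative choice of g - 1

# Non-symmetric kernel: R(s_i, s_j) = gamma(s_i) * psi(s_j) * (g - 1) / alpha(s_j)
R = gamma[:, None] * psi[None, :] * g_minus_1 / alpha[None, :]

m = alpha / np.sqrt((alpha + psi) * psi)       # m(s) as in (20)
r = np.sqrt(psi / (alpha + psi))               # r(s)
R_star = R * m[None, :] / m[:, None]           # entries R(s_i, s_j) * m(s_j) / m(s_i)

identity = np.eye(n_ctrl)
phi = np.linalg.solve(identity + R, eta)               # original system
phi_star = np.linalg.solve(identity + R_star, eta / m) # symmetric system
print(np.max(np.abs(m * phi_star - phi)))
```

Working with the symmetric matrix is what makes a Cholesky factorization of $I_{|M|} + R^*(M, M)$ possible when $g - 1$ is positive definite.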

Following the procedure from equations (14)-(15), we define

$$\begin{pmatrix} \hat\phi^*_k(s_1, M; \beta) \\ \hat\phi^*_k(s_2, M; \beta) \\ \vdots \\ \hat\phi^*_k(s_{|M|}, M; \beta) \end{pmatrix} = \{I_{|M|} + R^*(M, M)\}^{-1} \begin{pmatrix} \eta^*_k(s_1) \\ \eta^*_k(s_2) \\ \vdots \\ \eta^*_k(s_{|M|}) \end{pmatrix}, \quad k = 1, \ldots, p, \tag{22}$$

where $R^*(M, M)$ is an $|M| \times |M|$ matrix whose $ij$th entry is $R^*(s_i, s_j)$, $i, j = 1, \ldots, |M|$. Similarly, denote $\hat\phi^*(s_i, M; \beta) = (\hat\phi^*_1(s_i, M; \beta), \ldots, \hat\phi^*_p(s_i, M; \beta))^T$ for any $s_i \in M$. Then for an arbitrary location $s \in W$, we define the interpolated function

$$\hat\phi^*(s, M; \beta) = \eta^*(s) - \sum_{t \in M \cap W} R^*(s, t)\hat\phi^*(t, M; \beta), \quad \text{for any } s \in W. \tag{23}$$

By the above definition, we have the following relationship:

$$\hat\phi(s, M; \beta) = m(s)\hat\phi^*(s, M; \beta). \tag{24}$$

The second issue is the computation of the inverse matrix $\{I_{|M|} + R^*(M, M)\}^{-1}$ when $|M|$ is large. Fortunately, by the definition of $R^*(M, M)$ in (22), a significant portion of its entries may be very close to 0, depending on how fast the function $g(s, t)$ decays to 1 as $\|s - t\|$ increases. Assume that the pair correlation function $g(s, t)$ is isotropic and can be expressed in the form $g_0(\|s - t\|)$. Then we can create a tapered version $R^*_{\mathrm{taper}}(M, M)$ such that its $ij$th entry is the same as that of $R^*(M, M)$ if $\|s_i - s_j\| \le d_{\mathrm{taper}}$ for some $d_{\mathrm{taper}} > 0$ and 0 otherwise. Following Guan et al. (2015), the taper distance $d_{\mathrm{taper}}$ is chosen such that $\{g_0(d_{\mathrm{taper}}) - 1\}/\{g_0(0) - 1\} = \tau_0$ for some small threshold $\tau_0$. Then a sparse matrix Cholesky decomposition can be used to obtain $\{I_{|M|} + R^*_{\mathrm{taper}}(M, M)\}^{-1}$ efficiently. In our simulation studies, we use $\tau_0 = 10^{-6}$ if $|M| > 4{,}000$ and otherwise we use the exact $R^*(M, M)$.

The random labeling theorem for Poisson processes provides an alternative to tapering for reducing the computational burden of inverting $I_{|M|} + R^*(M, M)$ when the cardinality of the control process $M$ is high. The theorem states that for $B \ge 1$, $M$ can be randomly split into independent and identically distributed Poisson processes $M_1, \ldots, M_B$. Assuming that $M$ has intensity function $\alpha(\cdot)\lambda_0(\cdot)$, each $M_b$ has intensity function $\alpha(\cdot)\lambda_0(\cdot)/B$. We may then apply the case-control approach to obtain an estimate $\hat\beta^{(b)}$ for each pair $(N, M_b)$. The cardinality of $M_b$ is roughly $1/B$ times that of $M$, which makes the inversion for $R^*(M_b, M_b)$ more feasible. Finally, $\beta$ is estimated by the average $\hat\beta = B^{-1}\sum_{b=1}^{B}\hat\beta^{(b)}$, whose theoretical properties are investigated in Corollary 1 in Section 5.1. The choice of $B$ plays an important role for this divide-and-conquer strategy and will be addressed in future work.

Another issue is that we assumed full knowledge of the pair correlation function $g(\cdot, \cdot)$ when finding the weight function $\hat\phi(s, M; \beta)$. In practice $g(\cdot, \cdot)$ needs to be estimated. It is common practice to assume that $g(\cdot, \cdot)$ belongs to a parametric family, $g(\cdot, \cdot; \theta)$, governed by a parameter vector $\theta$, and to estimate $\theta$ from the data. We first obtain an estimate $\hat\theta$ of $\theta$ using Guan et al. (2008) and then plug in $g(\cdot, \cdot; \hat\theta)$ for $g(\cdot, \cdot)$ in (22).

We construct confidence intervals for $\beta$ based on approximate normality of $\hat\beta$. The theoretical justification for this is Theorem 1 in Section 5.1, where consistency and asymptotic normality of $\hat\beta$ are stated under the condition that $\hat\theta$ is consistent with a sufficiently fast rate of convergence. In Section 5.1 we also provide estimates for the covariance matrix of $\hat\beta$. Our simulation studies in Section 3 support the validity of the confidence intervals for $\beta$.

2.6 Stochastic quasi-likelihood as Monte Carlo approximation

Letting the intensity of controls tend to infinity, it is easy to see that (5) converges to the integral form

$$Q_f(\beta) = \sum_{s \in N \cap W} f(s; \beta) - \int_W f(s; \beta)\lambda_0(s)\psi(s; \beta)ds.$$

Thus, (5) can be viewed as a Monte Carlo approximation of this integral form of $Q_f$ using $M$ as a set of random quadrature points. Suppose that $\lambda_0(s)$ is known, which implies that $\lambda_N(s)$ is purely parametric with a known multiplicative offset.
Then we may simulate a synthetic control process $M$ of known intensity $\alpha(\cdot)$ and approximate $Q_f(\beta)$ by (5) as an alternative to the deterministic quadrature approximation used in Guan et al. (2015). The use of deterministic quadrature approximation to

the integral in $Q_f(\beta)$ introduces bias that can be difficult to quantify (Baddeley et al., 2014). In contrast, unbiasedness can be maintained using our proposed stochastic approximation. The use of the random quadrature process $M$ introduces additional parameter estimation error. The error can be reduced by using a larger control intensity $\alpha(\cdot)$, but this will in turn increase the computing time due to the need to solve a larger matrix equation, cf. Section 2.5. An alternative is to simulate several independent synthetic control processes and apply the divide-and-conquer strategy in Section 2.5. The use of replicated synthetic control processes is exemplified in Section .

3 Simulation Studies

In this section, we conducted a simulation study to investigate the finite sample performance of the proposed method. Both the case and control processes were simulated over an $n \times n$ square window with $n = 1, 2$ using the R package spatstat (Baddeley and Turner, 2005). For each $n$, we set the baseline intensity $\lambda_{0,n}(s) = \exp\{\beta^M_{0,n} + Y(s) + Z(s)/4\}$, where $Y(s)$ and $Z(s)$ are two independent realizations of stationary and isotropic Gaussian random fields with mean 0 and $\beta^M_{0,n}$ is chosen such that $n^{-2}\int_{[0,n]^2}\lambda_{0,n}(s)ds = 1$; see Figure 1(a) for details. Then the case and control intensities were specified as

$$\lambda_{N,n}(s) = \lambda_{0,n}(s)\exp\{\beta^N_{0,n} + Z(s)\beta_1\} \quad \text{and} \quad \lambda_{M,n}(s) = \alpha(s)\lambda_{0,n}(s)\Pi(s), \tag{25}$$

where $\beta_1 = 1$ and the intercept $\beta^N_{0,n}$ was chosen so that on average $400n^2$ case events were simulated. The function $\Pi(s)$ was introduced here to allow various types of departure from the proportionality assumption between $\lambda_{N,n}(\cdot)$ and $\lambda_{M,n}(\cdot)$ and will be specified later. The control processes were simulated using an inhomogeneous Poisson process by choosing $\alpha(s)$ equal to a constant $\alpha$ for all $s \in W_n$, where $\alpha = 400, 500, \ldots, 1500$. The case processes were simulated

as inhomogeneous Thomas processes (Waagepetersen, 2007) with a pair correlation function

$$g(s, t) = 1 + (4\pi\omega^2\kappa)^{-1}\exp\{-(4\omega^2)^{-1}\|s - t\|^2\}, \tag{26}$$

where $\kappa > 0$ and $\omega > 0$ are the intensity of the parent process and the dispersal parameter, respectively. We considered $\kappa = 50, 100$ and $\omega = 0.02, 0.04$ for different clustering scenarios.

3.1 Correct model specification with $\Pi(s) \equiv 1$

In this subsection, we first consider the scenario where the assumptions of (1) and (2) hold by setting $\Pi(s) \equiv 1$. For each simulated pair of case and control processes, $\theta = (\kappa, \omega)^T$ was first estimated using the approach given in Guan et al. (2008) and then the proposed procedure was applied to estimate $\beta^N_{0,n}$ and $\beta_1$ by plugging in the estimated $\hat\theta_n = (\hat\kappa_n, \hat\omega_n)^T$. Three estimation approaches were considered: the conditional likelihood estimate (CLE) of Diggle and Rowlingson (1994) and the two proposed stochastic quasi-likelihood estimation approaches based on the naive method (SQLn) given in Section 2.3 and the unbiased version (SQLu) given in (17), where the leave-one-out correction was applied to $\hat\phi(s, M \setminus \{s\})$ for $s \in M$. For the SQLu method, we also considered the situation when the parametric family of the pair correlation function was mis-specified. More specifically, instead of (26) we used a pair correlation function for a variance-gamma shot-noise Cox process (Jalilian et al., 2013), which has the incorrect exponential form $g(s, t) = 1 + a^{-1}\exp(-b^{-1}\|s - t\|)$, $a > 0$, $b > 0$.

Summary statistics based on 1,000 simulations are presented in Table 1 and Figure 1, where rmse represents the root mean square error of the parameter estimates and CP90 represents the coverage probabilities of the nominal 90% confidence intervals constructed using Theorem 1 by plugging in the estimated matrices given in (30). The first observation is that SQLn produced biased estimators for $\beta_1$, which confirms our discussion in Section 2.3. The large bias typically

led to a larger rmse than for CLE. Table 1 also suggests that this bias decreased as $\alpha$ increased. In contrast, the SQLu estimate of $\beta_1$ was close to unbiased.

Table 1: Biases and rmses of the different estimators for $\beta_1$.
[Table layout: rows indexed by $(\kappa, \omega) \in \{(50, 0.02), (50, 0.04), (100, 0.02), (100, 0.04)\}$, $n$ and $\alpha$; columns: CLE (BIAS, rmse), SQLn (BIAS, rmse), SQLu-Est.Thomas (BIAS, rmse, CP90), SQLu-Est.Exponential (BIAS, rmse, CP90). The numerical entries are missing from this transcription.]

In terms of rmse, the SQLu estimating function outperformed CLE in almost all cases and the improvement in rmse could be quite substantial. In accordance with the asymptotic results in Theorem 1, the rmses were approximately halved when $n$ was increased from 1 to 2. Figure 1(b)-(c) show that the rmses of the CLE did not necessarily decrease as $\alpha$ increased. In contrast, the rmses for SQLu decreased steadily as $\alpha$ increased. This indicates that CLE made less efficient use of the control processes than the SQLu method. In addition, Figure 1(f) illustrates that the averages of the empirical $|W_n|^{-1}\mathrm{tr}(G_{\hat\phi_n})$ also increased steadily as $\alpha$ increased. Both

observations support our theoretical findings in Theorem 4.

[Figure 1 panel titles: (a) The baseline intensity $\lambda_0(s)$; (b) Estimation accuracies (κ = 50, ω = 0.04, n = 1); (c) Estimation accuracies (κ = 50, ω = 0.04, n = 2); (d) Estimated PCF (Thomas); (e) Estimated PCF (Exponential); (f) The Godambe information (κ = 50, ω = 0.04).]

Figure 1: (a) The baseline intensity function $\lambda_0(s)$; (b)-(c): empirical rmses of estimates of $\beta_1$ obtained using CLE and SQLu with the true g(·, ·), the estimated Thomas PCF and the estimated Exponential PCF. The abbreviation SQLu ASE stands for the asymptotic standard errors given by Theorem 1; (d)-(e): the estimated Thomas PCFs and Exponential PCFs; (f): the averages of the empirical $|W_n|^{-1}\mathrm{tr}(G_{\hat\phi_n})$.

Furthermore, Figure 1(b)-(c) and Table 1 show that even when the parametric family of g(s, t; θ) was mis-specified, the estimation accuracy as well as the coverage probabilities of the confidence intervals were hardly affected. This surprising observation can be explained by Figure 1(d)-(e), where we can see that the estimated exponential pair correlation functions, although mis-specified, were still able to capture the clustering pattern among the cases.

Finally, we investigated the impact of using the preliminary estimator $\hat\theta_n$ on the statistical properties of the estimates obtained using the SQLu method. To do so, we again estimated $\beta^N_{0,n}$ and $\beta_1$ using SQLu, but now with the true pair correlation function g(s, t) instead of plugging in the estimated pair correlation function. The results are summarized in Figure 1(b)-(c), where

we can see that the estimated $\hat\theta_n$ indeed caused additional variability in $\hat\beta_1$. As a result, the asymptotic standard error given in Theorem 1 slightly underestimated the standard error of $\hat\beta_1$. However, when n = 2, the asymptotic standard error matched the empirical standard error quite well, which resulted in valid coverage probabilities of the confidence intervals in almost all cases. This confirms our theoretical finding of Theorem […].

3.2 Misspecified models with Π(s) ≢ 1

To study the robustness of the SQLu method to model misspecification, we applied it to case-control point pattern data that depart from assumptions (1) and (2). More specifically, let X(s) be an isotropic Gaussian random field with mean 0, variance 1 and an exponential covariance function with range parameter 10. We consider three forms of Π(s):

$$\text{Model I:}\quad \Pi(s) = 1 + q \sin(2\pi y), \quad \text{for } s = (x, y),$$
$$\text{Model II:}\quad \Pi(s) = \Phi\left(\rho_q Z(s) + \sqrt{1 - \rho_q^2}\, X^*(s)\right), \qquad (27)$$
$$\text{Model III:}\quad \Pi(s) = \exp\{q X(s) - q^2/2\},$$

where $X^*(s)$ denotes a single realization of X(s), q = 0.25, 0.5, 0.75, 1.0, $\rho_q = 0.8(q - 0.25)$ and Φ(·) is the standard normal cumulative distribution function. Following the same estimation procedures outlined in the previous subsection and pretending that Π(s) ≡ 1, summary statistics based on 1,000 simulation runs with κ = 50, ω = 0.04 and n = 2 are presented in Table 2.

Model I investigates the case where the sampling scheme α(s) is systematically misspecified, with the misspecification becoming more severe as q increases. In this case, we can see from Table 2 that both CLE and SQLu may produce a biased estimator for $\beta_1$ and that the biases increased as q grew. However, one noticeable feature is that SQLu consistently produced much smaller biases than CLE until q = 1. Furthermore, the coverage probabilities of the confidence intervals of the SQLu method appear to be reasonably good for small to moderate values of q. In this scenario, SQLu appeared to be more robust than CLE.
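As a rough illustration of the three forms of Π(s) in (27), the following sketch evaluates them pointwise using only the Python standard library. It assumes that Model II takes the form Φ(ρ_q Z(s) + sqrt(1 − ρ_q²) X*(s)); the function names are hypothetical, and the field values `z`, `x_star` and `x` are assumed to be supplied by the caller (simulating the Gaussian random field itself is not shown).

```python
import math

def rho_q(q):
    """Correlation parameter rho_q = 0.8 (q - 0.25) used in Model II."""
    return 0.8 * (q - 0.25)

def std_normal_cdf(u):
    """Standard normal cumulative distribution function Phi(u)."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def pi_model_1(y, q):
    """Model I: Pi(s) = 1 + q sin(2 pi y) for s = (x, y)."""
    return 1.0 + q * math.sin(2.0 * math.pi * y)

def pi_model_2(z, x_star, q):
    """Model II (assumed form): Phi(rho_q z + sqrt(1 - rho_q^2) x_star)."""
    r = rho_q(q)
    return std_normal_cdf(r * z + math.sqrt(1.0 - r * r) * x_star)

def pi_model_3(x, q):
    """Model III: Pi(s) = exp(q x - q^2 / 2), with x the field value X(s)."""
    return math.exp(q * x - 0.5 * q * q)

# Values of rho_q for the four settings of q considered in the simulations.
for q in (0.25, 0.5, 0.75, 1.0):
    print(q, round(rho_q(q), 2))
```

The term −q²/2 in Model III offsets the mean of exp{qX(s)} when X(s) is standard normal, so that Π(s) has expectation 1 for every q.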
Model II mimics the situation when some covariate, namely $X^*(s)$, that only affects the

control process is left out. In a sense, this can also be viewed as a misspecification of the function ψ(s) in (1), which should be $\psi(s, \beta) = \exp\{\beta^N_{0,n} + Z(s)\beta_1\}\Pi^{-1}(s)$ as opposed to $\psi(s, \beta) = \exp\{\beta^N_{0,n} + Z(s)\beta_1\}$ given in (25). In this case, it appears that when $\rho_q = 0$ with q = 0.25, both the CLE and SQLu methods still produced unbiased estimators for $\beta_1$. As expected, the estimation biases became larger as $\rho_q$ increased. However, when $\rho_q \neq 0$, $\beta_1$ can no longer be interpreted as the elevated/reduced impact of Z(s) on the case process relative to the control process.

Model III deals with the scenario where, even conditional on $\lambda_0(s)$ and ψ(s), the control process is still not a Poisson process. With a new X(s) simulated for each simulation run, the control process M becomes a log-Gaussian Cox process with pair correlation function $g_M(s, t) = \exp\{q^2 \exp(-\|s - t\|/10)\}$. In this case, both the CLE and SQLu methods yielded unbiased estimators for $\beta_1$ for all values of q. However, as q increased, the variances of both estimators generally increased due to the additional aggregation introduced into the control process. Coverage probabilities of the resulting confidence intervals are slightly off the nominal level. Nevertheless, the SQLu estimator outperformed the CLE estimator in terms of rmse for every q in this case.

4 Data examples

4.1 Beijing restaurant locations

The first data example concerns locations of two types of restaurants in Beijing, China. The data were collected from 11 districts of Beijing through the search engine […]. The control process consisted of locations of traditional Chinese restaurants. Due to the limit on the number of restaurant locations that can be returned by the search engine, we extracted a random sample consisting of 6% of the Chinese restaurants, i.e. using a uniform sampling probability α(s) = 0.06. This resulted in 2,659 control locations. The case process consisted of the locations of all 1,781 Western-style restaurants in Beijing.
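The uniform sampling of control locations described above can be sketched as independent thinning, in which each location is retained with the same probability α. This is only an illustration under that assumption: the text does not say whether the 6% sample was drawn by independent thinning or as a fixed-size subsample, and the function name and toy coordinates below are hypothetical.

```python
import random

def thin_uniform(points, alpha, seed=None):
    """Keep each point independently with probability alpha (independent thinning)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    return [p for p in points if rng.random() < alpha]

# Toy example with made-up coordinates (not the Beijing data).
toy_rng = random.Random(1)
locations = [(toy_rng.random(), toy_rng.random()) for _ in range(10000)]
controls = thin_uniform(locations, alpha=0.06, seed=2)
print(len(controls))  # random; about 0.06 * 10000 = 600 on average
```

Under independent thinning with a constant retention probability α, the intensity of the retained pattern is α times the intensity of the full pattern, which is how α(s) enters the model for the control process.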
Figure 2(a) gives all restaurant locations, where

it appears that the Western-style restaurants tended to be more concentrated than the Chinese restaurants.

Table 2: Biases and rmses of the different estimators for $\beta_1$ with κ = 50, ω = 0.04 and n = 2.
[Columns, for each of q = 0.25 ($\rho_q$ = 0), q = 0.5 ($\rho_q$ = 0.2), q = 0.75 ($\rho_q$ = 0.4) and q = 1.0 ($\rho_q$ = 0.6): CLE (BIAS, rmse) and SQLu (BIAS, rmse, CP90). Rows: Models I, II and III, by α. Cell values not recoverable.]

For model estimation, we converted the longitude/latitude locations into UTM coordinates (northing, easting) following Snyder (1987). We modeled the possible differences between the spatial patterns of Western and Chinese restaurants using $\psi(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$, where the covariate vector Z(s) consisted of two district-level covariates: the average annual income of a regular worker (in 10,000 RMB, "Income") and the logarithm of the total number of foreign tourists (in 10,000, "log-travel"). The intercept $\beta_0$ was introduced in ψ(s; β) to model the overall difference in the intensities between the Western and Chinese restaurants in Beijing. To model possible clustering in the Western restaurant locations that was not explained by the intensity function, we further

introduced a parametric pair correlation function g(s, t) as defined in (26).

[Figure 2 panel titles: (a) Restaurant locations (Western and Chinese, longitude/latitude); (b) Estimated pair correlation function (parametric and nonparametric PCF against distance in 10 km); (c) Residual plot (h = 0.5; northing against easting in 10 km).]

Figure 2: (a) Locations of restaurants; (b) estimated pair correlation functions; (c) residuals from the reduced model (locations with UTM coordinates).

Using the approach in Guan et al. (2008), the estimated parameters are κ = 5.65 and ω = […]. Figure 2(b) shows that the estimated parametric pair correlation function agrees well with the nonparametric one (Guan et al. 2008) and both indicate the presence of clustering.

Table 3: Regression parameter estimates for the restaurant data

Method                   Full Model                                     Reduced Model
                         Intercept      Income          log-travel      Intercept      log-travel
CLE   Estimates (SE)     -4.38 (0.51)   0.050 (0.079)   0.21 (0.094)    -4.05 (0.19)   0.25 (0.067)
      P-value            […]
SQLu  Estimates (SE)     -4.30 (0.32)   0.028 (0.045)   0.19 (0.048)    -4.11 (0.12)   0.21 (0.037)
      P-value            […]

Finally, the estimated regression parameters are summarized in Table 3. The covariate Income is not significant, while the covariate log-travel is significant regardless of the estimation method used. The covariate Income affects the distributions of both Chinese and Western-style restaurants and is therefore likely absorbed into the baseline intensity $\lambda_0(s)$. The positive parameter estimate for the covariate log-travel shows that the Western-style restaurants tended to be more concentrated (relative to the Chinese restaurants) in districts that attracted more foreign tourists. Comparing the two approaches, SQLu produced much smaller standard errors than CLE, which illustrates the potential advantage of the proposed method. To

assess the goodness of fit, we also computed standardized smoothed residuals (see Guan et al. 2008) on a grid over the banded region in Figure 2(c) (residuals are only calculated for the 602 grid points that have at least 5 restaurants within a 5 km radius). The residuals are all of moderate magnitude and do not contradict the proposed model. Note that the apparent correlation in the residual plot is partly due to the smoothing procedure and partly due to the correlation in the point pattern data, cf. the fitted pair correlation function in Figure 2(b).

4.2 Tropical rain forest data

The second data example concerns the spatial locations of three tropical forest tree species, Acalypha diversifolia (528 trees), Lonchocarpus heptaphyllus (836 trees) and Capparis frondosa (3299 trees), in a 1000 m × 500 m rectangular window on Barro Colorado Island (Condit, 1998; Hubbell et al. 1999; Hubbell et al. 2005). Guan et al. (2015) conducted a detailed investigation of the point patterns of locations of these three species and their associations with environmental variables such as elevation (dem), slope gradient (grad), and soil contents of potassium (K), mineralized nitrogen (Nmin) and phosphorus (P). All three species display clustering, which was modeled using parametric pair correlation functions. For each species, there is no apparent control process available to assist modeling of the underlying spatial intensity function. The purpose of this analysis is to show how the case-control methodology can be used, as described in Section 2.6, as a computationally efficient alternative to deterministic quadrature approximation when implementing quasi-likelihood for spatial point patterns (Guan et al. 2015). More specifically, we treated each species of interest as a case process separately and assumed that the case intensity function took a purely parametric form $\lambda_N(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$, as assumed in Guan et al.
(2015), where the covariate vector Z(s) consisted of environmental variables. Such a parametric assumption on $\lambda_N(s; \beta)$ leads to a special case of model (1) with $\lambda_0(s) = 1$ and $\psi(s; \beta) = \exp\{\beta_0 + \beta^T Z(s)\}$ and enabled us to simulate controls from a homogeneous Poisson process with a constant intensity α for the analysis of each

species. The proportional structure (1)-(2) was therefore maintained with such constructions of case and control point patterns. Furthermore, in this case, the regression parameter β should be interpreted as the elevated/reduced impact of Z(s) on the tree location intensities relative to complete spatial randomness, under which all tree locations follow a homogeneous Poisson process. For a fair comparison with Guan et al. (2015), we adopted both the selected covariates and the estimated pair correlation functions given there; see Guan et al. (2015) for more details. The controls were simulated using increasing intensities α such that the average number of simulated controls |W|α ranged from 500 to […]. For each given intensity α, 1,000 independent realizations of the control process were simulated, and an averaged estimator of β as well as its standard error were obtained for both the CLE and the SQLu method following Corollary 1. The results are summarized in Table 4 and Figure 3, where QL stands for the quasi-likelihood approach proposed in Guan et al. (2015). We did not apply any tapering for either the SQLu method or the QL method.
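The simulation of synthetic controls and the averaging over control realizations described above can be sketched as follows. This is a minimal standard-library illustration, not the authors' implementation: the function names are hypothetical, `estimate_fn` is a placeholder for a routine that would solve the CLE or SQLu estimating equation for a given control pattern, and the standard error calculation of Corollary 1 is not reproduced.

```python
import random

def simulate_controls(alpha, width, height, rng):
    """Homogeneous Poisson process with intensity alpha on [0, width] x [0, height]."""
    mean_count = alpha * width * height
    # The number of points is Poisson(mean_count): count the arrivals of a
    # unit-rate Poisson process before time mean_count.
    n = 0
    t = rng.expovariate(1.0)
    while t < mean_count:
        n += 1
        t += rng.expovariate(1.0)
    # Given the count, the points are independent and uniform on the window.
    return [(rng.uniform(0.0, width), rng.uniform(0.0, height)) for _ in range(n)]

def averaged_estimate(estimate_fn, alpha, width, height, n_rep, seed=0):
    """Average estimate_fn(controls) over n_rep independent control patterns."""
    if n_rep < 1:
        raise ValueError("n_rep must be at least 1")
    rng = random.Random(seed)
    total = None
    for _ in range(n_rep):
        est = estimate_fn(simulate_controls(alpha, width, height, rng))
        total = list(est) if total is None else [a + b for a, b in zip(total, est)]
    return [a / n_rep for a in total]

# Placeholder "estimator": the empirical control intensity. A real analysis
# would return the estimate of beta for the given control pattern instead.
W, H = 1000.0, 500.0   # window size in metres
alpha = 0.001          # gives |W| * alpha = 500 controls on average
avg = averaged_estimate(lambda c: [len(c) / (W * H)], alpha, W, H, n_rep=200)
print(avg)
```

Because the replicates are independent, the loop in `averaged_estimate` could be distributed across workers, which is the sense in which the averaged estimate parallelizes easily.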
Table 4: Estimates and standard errors for the tropical forest data

                           Acalypha        Lonchocarpus                    Capparis
Method          |W_n|α     K               Nmin            P               dem            grad            K
CLE             […]        […] (1.22)      -2.79 (0.71)    -0.16 (0.057)   2.85 (0.83)    -0.88 (1.05)    4.11 (0.99)
                […]        […] (1.22)      -2.78 (0.71)    -0.16 (0.057)   2.84 (0.83)    -0.98 (1.05)    4.15 (0.99)
                […]        […] (1.23)      -2.74 (0.72)    -0.16 (0.057)   2.86 (0.83)    -0.97 (1.06)    4.19 (0.99)
                […]        […] (1.24)      -2.75 (0.73)    -0.16 (0.058)   2.86 (0.84)    -1.03 (1.07)    4.19 (0.99)
SQLu            […]        […] (1.22)      -2.77 (0.70)    -0.15 (0.056)   2.74 (0.81)    -1.05 (1.00)    4.03 (0.96)
                […]        […] (1.22)      -2.74 (0.70)    -0.15 (0.056)   2.67 (0.80)    -1.27 (0.98)    4.05 (0.95)
                […]        […] (1.23)      -2.72 (0.70)    -0.14 (0.056)   2.57 (0.80)    -1.43 (0.96)    4.05 (0.94)
                […]        […] (1.23)      -2.72 (0.70)    -0.14 (0.055)   2.45 (0.79)    -1.70 (0.95)    4.01 (0.94)
QL (100 × 50)   N/A        4.39 (1.22)     -2.77 (0.70)    -0.15 (0.055)   2.29 (0.79)    -1.88 (0.94)    4.04 (0.94)

Table 4 shows that the estimates for both Acalypha and Lonchocarpus are very similar for all approaches. This is because the pair correlation functions decay quickly, see Figure 3(e). On the other hand, for Capparis, where the pair correlation function decays much more slowly, see Figure 3(f), SQLu and QL produced very different estimates from those obtained with CLE. One noticeable feature is that as α increased, the estimated coefficient of grad as well as the

associated standard error decreased for the SQLu method. To give a better idea of the efficiency of each method, Figure 3 shows the efficiency of CLE/SQLu relative to QL (using grid points) as a function of $|W_n|\alpha$. The CLE method is almost always less efficient than SQLu and QL. In contrast, for the SQLu approach, the standard errors quickly reached the same level as those of the approximately optimal QL method as α increased. The SQLu method may be more computationally scalable because (a) far fewer control locations were needed in all three examples to reach standard errors similar to those of the QL method, which relied on 5,000 quadrature points in all cases, and (b) the computation of the averaged SQLu estimate can be easily parallelized.

[Figure 3 panel titles: (a) Acalypha; (b) Lonchocarpus; (c) Capparis (relative efficiency against $|W_n|\alpha$); (d) Estimated PCF (Acalypha); (e) Estimated PCF (Lonchocarpus); (f) Estimated PCF (Capparis) (g(r) against distance in km).]

Figure 3: Top panels: relative efficiency, defined as the standard error of CLE or SQLu divided by the standard error of the QL estimators; bottom panels: estimated pair correlation functions.

5 Asymptotic properties

In this section, we first study the asymptotic properties of the estimator of β obtained using the estimating function (17). Then we show that under certain conditions, this estimator is asymp-


More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA

Gauge Plots. Gauge Plots JAPANESE BEETLE DATA MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA JAPANESE BEETLE DATA JAPANESE BEETLE DATA 6 MAXIMUM LIKELIHOOD FOR SPATIALLY CORRELATED DISCRETE DATA Gauge Plots TuscaroraLisa Central Madsen Fairways, 996 January 9, 7 Grubs Adult Activity Grub Counts 6 8 Organic Matter

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT

ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT ON THE CONSEQUENCES OF MISSPECIFING ASSUMPTIONS CONCERNING RESIDUALS DISTRIBUTION IN A REPEATED MEASURES AND NONLINEAR MIXED MODELLING CONTEXT Rachid el Halimi and Jordi Ocaña Departament d Estadística

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

Multivariate spatial modeling

Multivariate spatial modeling Multivariate spatial modeling Point-referenced spatial data often come as multivariate measurements at each location Chapter 7: Multivariate Spatial Modeling p. 1/21 Multivariate spatial modeling Point-referenced

More information

Open Problems in Mixed Models

Open Problems in Mixed Models xxiii Determining how to deal with a not positive definite covariance matrix of random effects, D during maximum likelihood estimation algorithms. Several strategies are discussed in Section 2.15. For

More information

arxiv: v2 [stat.me] 8 Jun 2016

arxiv: v2 [stat.me] 8 Jun 2016 Orthogonality of the Mean and Error Distribution in Generalized Linear Models 1 BY ALAN HUANG 2 and PAUL J. RATHOUZ 3 University of Technology Sydney and University of Wisconsin Madison 4th August, 2013

More information

Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information

Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information p. 1/27 Analysis of Marked Point Patterns with Spatial and Non-spatial Covariate Information Shengde Liang, Bradley

More information

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates

Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Least Squares Estimation of a Panel Data Model with Multifactor Error Structure and Endogenous Covariates Matthew Harding and Carlos Lamarche January 12, 2011 Abstract We propose a method for estimating

More information

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu

More information

ENGRG Introduction to GIS

ENGRG Introduction to GIS ENGRG 59910 Introduction to GIS Michael Piasecki October 13, 2017 Lecture 06: Spatial Analysis Outline Today Concepts What is spatial interpolation Why is necessary Sample of interpolation (size and pattern)

More information

Bias-Correction in Vector Autoregressive Models: A Simulation Study

Bias-Correction in Vector Autoregressive Models: A Simulation Study Econometrics 2014, 2, 45-71; doi:10.3390/econometrics2010045 OPEN ACCESS econometrics ISSN 2225-1146 www.mdpi.com/journal/econometrics Article Bias-Correction in Vector Autoregressive Models: A Simulation

More information

Introduction. Spatial Processes & Spatial Patterns

Introduction. Spatial Processes & Spatial Patterns Introduction Spatial data: set of geo-referenced attribute measurements: each measurement is associated with a location (point) or an entity (area/region/object) in geographical (or other) space; the domain

More information

Decomposition of variance for spatial Cox processes Jalilian, Abdollah; Guan, Yongtao; Waagepetersen, Rasmus Plenge

Decomposition of variance for spatial Cox processes Jalilian, Abdollah; Guan, Yongtao; Waagepetersen, Rasmus Plenge Aalborg Universitet Decomposition of variance for spatial Cox processes Jalilian, Abdollah; Guan, Yongtao; Waagepetersen, Rasmus Plenge Publication date: 2011 Document Version Early version, also known

More information

Extreme Value Analysis and Spatial Extremes

Extreme Value Analysis and Spatial Extremes Extreme Value Analysis and Department of Statistics Purdue University 11/07/2013 Outline Motivation 1 Motivation 2 Extreme Value Theorem and 3 Bayesian Hierarchical Models Copula Models Max-stable Models

More information

On block bootstrapping areal data Introduction

On block bootstrapping areal data Introduction On block bootstrapping areal data Nicholas Nagle Department of Geography University of Colorado UCB 260 Boulder, CO 80309-0260 Telephone: 303-492-4794 Email: nicholas.nagle@colorado.edu Introduction Inference

More information

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,

More information

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA

PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA PENALIZED LIKELIHOOD PARAMETER ESTIMATION FOR ADDITIVE HAZARD MODELS WITH INTERVAL CENSORED DATA Kasun Rathnayake ; A/Prof Jun Ma Department of Statistics Faculty of Science and Engineering Macquarie University

More information

6.435, System Identification

6.435, System Identification System Identification 6.435 SET 3 Nonparametric Identification Munther A. Dahleh 1 Nonparametric Methods for System ID Time domain methods Impulse response Step response Correlation analysis / time Frequency

More information

Chapter 2 Inference on Mean Residual Life-Overview

Chapter 2 Inference on Mean Residual Life-Overview Chapter 2 Inference on Mean Residual Life-Overview Statistical inference based on the remaining lifetimes would be intuitively more appealing than the popular hazard function defined as the risk of immediate

More information

Forecasting Levels of log Variables in Vector Autoregressions

Forecasting Levels of log Variables in Vector Autoregressions September 24, 200 Forecasting Levels of log Variables in Vector Autoregressions Gunnar Bårdsen Department of Economics, Dragvoll, NTNU, N-749 Trondheim, NORWAY email: gunnar.bardsen@svt.ntnu.no Helmut

More information

An adapted intensity estimator for linear networks with an application to modelling anti-social behaviour in an urban environment

An adapted intensity estimator for linear networks with an application to modelling anti-social behaviour in an urban environment An adapted intensity estimator for linear networks with an application to modelling anti-social behaviour in an urban environment M. M. Moradi 1,2,, F. J. Rodríguez-Cortés 2 and J. Mateu 2 1 Institute

More information

A note on profile likelihood for exponential tilt mixture models

A note on profile likelihood for exponential tilt mixture models Biometrika (2009), 96, 1,pp. 229 236 C 2009 Biometrika Trust Printed in Great Britain doi: 10.1093/biomet/asn059 Advance Access publication 22 January 2009 A note on profile likelihood for exponential

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

An application of the GAM-PCA-VAR model to respiratory disease and air pollution data

An application of the GAM-PCA-VAR model to respiratory disease and air pollution data An application of the GAM-PCA-VAR model to respiratory disease and air pollution data Márton Ispány 1 Faculty of Informatics, University of Debrecen Hungary Joint work with Juliana Bottoni de Souza, Valdério

More information

Wrapped Gaussian processes: a short review and some new results

Wrapped Gaussian processes: a short review and some new results Wrapped Gaussian processes: a short review and some new results Giovanna Jona Lasinio 1, Gianluca Mastrantonio 2 and Alan Gelfand 3 1-Università Sapienza di Roma 2- Università RomaTRE 3- Duke University

More information

Issues on quantile autoregression

Issues on quantile autoregression Issues on quantile autoregression Jianqing Fan and Yingying Fan We congratulate Koenker and Xiao on their interesting and important contribution to the quantile autoregression (QAR). The paper provides

More information

Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach"

Kneib, Fahrmeir: Supplement to Structured additive regression for categorical space-time data: A mixed model approach Kneib, Fahrmeir: Supplement to "Structured additive regression for categorical space-time data: A mixed model approach" Sonderforschungsbereich 386, Paper 43 (25) Online unter: http://epub.ub.uni-muenchen.de/

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Summary statistics for inhomogeneous spatio-temporal marked point patterns

Summary statistics for inhomogeneous spatio-temporal marked point patterns Summary statistics for inhomogeneous spatio-temporal marked point patterns Marie-Colette van Lieshout CWI Amsterdam The Netherlands Joint work with Ottmar Cronie Summary statistics for inhomogeneous spatio-temporal

More information

Bayesian Analysis of Latent Variable Models using Mplus

Bayesian Analysis of Latent Variable Models using Mplus Bayesian Analysis of Latent Variable Models using Mplus Tihomir Asparouhov and Bengt Muthén Version 2 June 29, 2010 1 1 Introduction In this paper we describe some of the modeling possibilities that are

More information

Inference For High Dimensional M-estimates: Fixed Design Results

Inference For High Dimensional M-estimates: Fixed Design Results Inference For High Dimensional M-estimates: Fixed Design Results Lihua Lei, Peter Bickel and Noureddine El Karoui Department of Statistics, UC Berkeley Berkeley-Stanford Econometrics Jamboree, 2017 1/49

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Environmentrics 00, 1 12 DOI: 10.1002/env.XXXX Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Regina Wu a and Cari G. Kaufman a Summary: Fitting a Bayesian model to spatial

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

Spatial Misalignment

Spatial Misalignment Spatial Misalignment Jamie Monogan University of Georgia Spring 2013 Jamie Monogan (UGA) Spatial Misalignment Spring 2013 1 / 28 Objectives By the end of today s meeting, participants should be able to:

More information

Improving the travel time prediction by using the real-time floating car data

Improving the travel time prediction by using the real-time floating car data Improving the travel time prediction by using the real-time floating car data Krzysztof Dembczyński Przemys law Gawe l Andrzej Jaszkiewicz Wojciech Kot lowski Adam Szarecki Institute of Computing Science,

More information

Fusing point and areal level space-time data. data with application to wet deposition

Fusing point and areal level space-time data. data with application to wet deposition Fusing point and areal level space-time data with application to wet deposition Alan Gelfand Duke University Joint work with Sujit Sahu and David Holland Chemical Deposition Combustion of fossil fuel produces

More information

Measuring Social Influence Without Bias

Measuring Social Influence Without Bias Measuring Social Influence Without Bias Annie Franco Bobbie NJ Macdonald December 9, 2015 The Problem CS224W: Final Paper How well can statistical models disentangle the effects of social influence from

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Impact of serial correlation structures on random effect misspecification with the linear mixed model. Impact of serial correlation structures on random effect misspecification with the linear mixed model. Brandon LeBeau University of Iowa file:///c:/users/bleb/onedrive%20 %20University%20of%20Iowa%201/JournalArticlesInProgress/Diss/Study2/Pres/pres.html#(2)

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Hierarchical Modelling for Univariate Spatial Data

Hierarchical Modelling for Univariate Spatial Data Spatial omain Hierarchical Modelling for Univariate Spatial ata Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, U.S.A.

More information

On the errors introduced by the naive Bayes independence assumption

On the errors introduced by the naive Bayes independence assumption On the errors introduced by the naive Bayes independence assumption Author Matthijs de Wachter 3671100 Utrecht University Master Thesis Artificial Intelligence Supervisor Dr. Silja Renooij Department of

More information

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 12: Frequentist properties of estimators (v4) Ramesh Johari ramesh.johari@stanford.edu 1 / 39 Frequentist inference 2 / 39 Thinking like a frequentist Suppose that for some

More information

A Non-parametric bootstrap for multilevel models

A Non-parametric bootstrap for multilevel models A Non-parametric bootstrap for multilevel models By James Carpenter London School of Hygiene and ropical Medicine Harvey Goldstein and Jon asbash Institute of Education 1. Introduction Bootstrapping is

More information