Residuals for spatial point processes based on Voronoi tessellations

Ka Wong (1), Frederic Paik Schoenberg (2), Chris Barr (3)

(1) Google, Mountain View, CA.
(2) Corresponding author, 8142 Math-Science Building, Department of Statistics, University of California, Los Angeles, CA 90095-1554, USA. Email: frederic@stat.ucla.edu. Phone: 310-794-5193. Fax: 310-206-5658.
(3) Department of Biostatistics, Harvard University.

Abstract

A residual analysis method for spatial point processes is proposed, where differences between the modeled conditional intensity and the observed number of points are assessed over the Voronoi cells generated by the observations. The resulting residuals appear to be substantially less skewed and hence more stable, particularly for point processes with conditional intensities close to zero, compared to ordinary Pearson residuals and other pixel-based methods. An application to models for Southern California earthquakes is provided.

Key words: Papangelou intensity, Pearson residuals, point patterns, residual analysis, Voronoi residuals.

1 Introduction

The aim of this paper is to propose a new form of residual analysis for assessing the goodness of fit of spatial or spatial-temporal point process models. The proposed method relies on comparing the normalized observed and expected numbers of points over Voronoi cells generated by the observed point pattern.

The excellent treatment of Pearson residuals and other pixel-based residuals by Baddeley et al. (2005), the thorough discussion of their properties in Baddeley et al. (2008), and the fact that such residuals extend so readily to the case of spatial-temporal point processes may suggest that the problem of residual analysis for such point processes is generally solved. Hence, we feel it is necessary to devote a substantial portion of this paper to a major shortcoming of such pixel-based residuals, in order to motivate our proposed alternative. In brief, Pearson residuals and other pixel-based residuals tend to be highly skewed when the integrated conditional intensity over some of the cells is close to zero, which is common in many applications. By contrast, the proposed Voronoi residuals are approximately Gamma distributed and tend to be far less skewed than Pearson residuals, and are thus far more amenable to assessment of goodness of fit.

ZZQ: We might need to define spatial and spatial-temporal point processes and their intensities, and say that we are assuming throughout that the point processes are simple.

We are assuming that the observation region is equipped with Lebesgue measure, µ. Note that we are not emphasizing the distinction between conditional and Papangelou intensities here. The methods and results here are essentially equivalent for spatial and spatial-temporal point processes.

This paper is organized as follows. In Section 2, we briefly review the goals of residual analysis as well as Pearson residuals and other pixel-based residuals described in Baddeley et al. (2005), and discuss their limitations when the integrated conditional intensity is small. Section 3 describes Voronoi residuals and discusses their properties. The simulations shown in Section 4 demonstrate the potential advantages of the Voronoi residuals over conventional pixel-based residuals in cases where the conditional intensity is occasionally close to zero. Section 5 includes an application to models for earthquake occurrences in Southern California.

2 Pearson residuals and other pixel-based methods

Residual plots for spatial point processes have two related purposes: (i) to suggest locations or aspects of the model where the fit is poor, so that an incorrectly specified model may be improved; (ii) to form the basis of formal testing, i.e. to assess the overall appropriateness of a model or to what extent the model fits well and hence results based on the model may be trusted.

Baddeley et al. (2005) discuss a variety of pixel-based residuals for spatial point processes. The residual diagnostics are plots showing the standardized differences between the number of points occurring in each pixel and the number expected according to the fitted model, where the standardization may be performed in various ways. For instance, for Pearson residuals, one divides the difference by the estimated standard deviation of the number of points in the pixel, in analogy with Pearson residuals in the context of linear models. Baddeley et al. (2005) also propose scaling the residuals based on the contribution of each pixel to the total pseudo-loglikelihood of the model, in analogy with score statistics in generalized linear modeling. Standardization is important for both purposes (i) and (ii): without it, plots of the residuals tend to overemphasize deviations in pixels where the rate is high, and formal testing based on individual pixels requires the standard deviation of the number of points in the pixel to be taken into account.

Behind the term Pearson residuals lies the suggestion, sometimes implicit and sometimes explicit (see e.g. the error bounds in Fig. 7 of Baddeley et al. 2005), that these standardized residuals should be approximately standard normally distributed, so that the squared residuals, or their sum, are distributed approximately according to Pearson's $\chi^2$ distribution. Pearson residuals appear to be effective model evaluation tools in examples where the estimate of the conditional intensity, $\lambda$, is moderately sized throughout the space of observation, as is the case throughout Baddeley et al. (2005).

ZZQ: briefly outline other residuals in Baddeley et al. (2005) and Baddeley et al. (2008)?

If $\lambda$ is small, however, then the Pearson residuals will be heavily skewed and their distribution will not be well approximated by the normal or $\chi^2$ distributions. Indeed, when $\lambda$ is close to zero, the raw residuals tend to have a distribution that is very highly skewed, and the standardization to form Pearson residuals actually exacerbates this skew. Unfortunately, these situations arise in many applications. For example, in modeling earthquake occurrences, the modeled conditional intensity is typically close to zero far away from known faults or previous seismicity, and in the case of modeling wildfires, one may have a modeled conditional intensity close to zero in areas far from human use or frequent lightning, or with vegetation types that do not readily support much wildfire activity (ZZQ: cite). Furthermore, even if $\lambda$ is not extremely close to zero, if the pixels used for Pearson residuals are sufficiently small that the integral of $\lambda$ over pixels is occasionally very small, then the same skew occurs.

Since the Pearson residuals are standardized to have mean zero and unit (or approximately unit) variance under the null hypothesis that the modeled conditional intensity is correct (see Baddeley et al. 2008), one may inquire whether the skew of these residuals is indeed problematic. Consider a case of a planar Poisson process where the estimate of $\lambda$ is exactly correct, i.e. $\hat{\lambda}(x, y) = \lambda(x, y)$ at all locations, and where one elects to use Pearson residuals on pixels. Suppose that there are several pixels where the integral of $\lambda$ over the pixel is roughly 0.01. Given many of these pixels, it is not unlikely that at least one of them will contain a point of the process. In such pixels, the raw residual will be 0.99, and the standard deviation of the number of points in the pixel is $\sqrt{0.01} = 0.1$, so the Pearson residual is 9.90. This may yield the following effects:

a) Such Pearson residuals may overwhelm the others in a visual inspection, rendering a plot of the Pearson residuals largely useless in terms of evaluating the quality of the fit of the model.

b) Conventional tests based on the normal approximation will have grossly incorrect p-values, and will commonly tend to reject the model, although it is correct, based on one such residual alone. Even if one adjusts for the non-normality of the residual and instead uses exact p-values based on the Poisson distribution for one such pixel individually, the test will still reject the model at the significance level of 0.01.

c) If one adjusts for the non-normality of the residual and computes exact simultaneous p-values, then the resulting tests will have extremely low power. Indeed, if 10,000 pixels (a $100 \times 100$ grid) are used, then gross mis-specification would be required in order to reject the null hypothesis with probability greater than a mere 10% under such circumstances.

ZZQ: We need to check this. We need to simulate Poisson processes to make this last statement more concrete. We will need to make an assumption about the intensity.

3 Voronoi residuals

A Voronoi tessellation is a division of the metric space on which a point process is defined into convex polygons, or Voronoi cells. Specifically, given a spatial or spatial-temporal point pattern $N$, one may define its corresponding Voronoi tessellation as follows: for each point $\tau_i$ of the point process, its corresponding cell $D_i$ is the region consisting of all locations which are closer to $\tau_i$ than to any other point of $N$. The Voronoi tessellation is the collection of such cells. See e.g. Okabe et al. (2000) for a thorough treatment of Voronoi tessellations and their properties.

Given a model for the conditional intensity of a spatial or space-time point process, one may construct residuals simply by evaluating the Pearson residuals over cells rather than rectangular pixels, where the cells comprise the Voronoi tessellation of the observed spatial or spatial-temporal point pattern. We will refer to such residuals as Voronoi residuals.

Voronoi residuals offer one obvious advantage over conventional pixel-based methods: the cell sizes are entirely automatic and data-driven. With pixel-based methods, the cell boundaries are often determined rather arbitrarily, yet these boundaries can have immense impacts on the results, particularly when $\lambda$ is volatile.

More importantly, the distributions of the Voronoi residuals tend to be far less skewed than those of pixel-based residuals such as Pearson residuals, particularly when $\hat{\lambda}$ is small in some areas. Indeed, since each Voronoi cell has exactly one point inside it by construction, the Voronoi residual for cell $i$ is given by

$$\hat{r}_i := \frac{1 - \int_{D_i} \hat{\lambda} \, d\mu}{\sqrt{\int_{D_i} \hat{\lambda} \, d\mu}} = \frac{1 - |D_i| \bar{\lambda}}{\sqrt{|D_i| \bar{\lambda}}}, \qquad (1)$$

where $\bar{\lambda}$ denotes the mean of $\hat{\lambda}$ over $D_i$ and $|D_i|$ denotes the size of the cell. Note that when $N$ is a homogeneous Poisson process, the cell size $|D_i|$ is approximately Gamma distributed. Indeed, for a homogeneous Poisson process, the expected area of a Voronoi cell is equal to the reciprocal of the intensity of the process (Meijering 1953), and simulation studies have shown that the area of a typical Voronoi cell is approximately Gamma distributed (Hinde and Miles 1980; Tanemura 2003). These properties continue to hold approximately in the inhomogeneous case provided that the conditional intensity is approximately constant near the location in question (Barr and Schoenberg 2010). Hence the numerator in equation (1) will often tend to be distributed approximately like a rescaled Gamma random variable. By contrast, for pixels over which the integrated conditional intensity is close to zero, the conventional raw residuals are approximately Bernoulli distributed.

4 Simulated examples

The exact distributions of the Voronoi residuals are generally quite intractable due to the fact that the cells themselves are random. Simulations may be useful to investigate the approximate distributions of these residuals.

ZZQ1: Assume a certain specific intensity for a spatial inhomogeneous Poisson process with small pixels. For instance, we could take $\lambda(x, y) = 100 x^2 |y|$ over the space $[-1, 1] \times [-1, 1]$. There will be around 67 points. Use a $100 \times 100$ grid of pixels. Look at a typical plot of the raw residuals, Pearson residuals, and tessellation residuals. The raw and Pearson residuals will probably just show the points basically, near the origin.

ZZQ2: Simulate the inhomogeneous Poisson process many times and look at histograms of the residuals at the origin for Pearson and Voronoi residuals. For Pearson residuals, use the pixel $[0, 0.01] \times [0, 0.01]$.
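The simulation outlined in the notes above could be sketched along the following lines. This is a minimal illustration rather than the authors' procedure, and it rests on several assumptions: the example intensity is taken to be 100 x^2 |y| on [-1, 1] x [-1, 1] (which integrates to about 67, matching the expected count mentioned above); the process is simulated by thinning a homogeneous Poisson process; and the integral of the intensity over each Voronoi cell is approximated by assigning the nodes of a fine grid to their nearest observed point, instead of constructing the cell polygons exactly. All function names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

LAM_MAX = 100.0  # upper bound of the assumed intensity on [-1, 1]^2


def intensity(x, y):
    """Assumed example intensity: lambda(x, y) = 100 x^2 |y|."""
    return 100.0 * x**2 * np.abs(y)


def simulate(rng):
    """Simulate the inhomogeneous Poisson process on [-1, 1]^2 by thinning."""
    n = rng.poisson(LAM_MAX * 4.0)  # the window has area 4
    pts = rng.uniform(-1.0, 1.0, size=(n, 2))
    keep = rng.uniform(size=n) < intensity(pts[:, 0], pts[:, 1]) / LAM_MAX
    return pts[keep]


def pixel_integrals(n_side=100):
    """Exact integral of the intensity over each pixel of a regular grid."""
    e = np.linspace(-1.0, 1.0, n_side + 1)
    ix = (e[1:] ** 3 - e[:-1] ** 3) / 3.0                         # integral of x^2
    iy = (e[1:] * np.abs(e[1:]) - e[:-1] * np.abs(e[:-1])) / 2.0  # integral of |y|
    return 100.0 * np.outer(ix, iy), e


def pearson_residuals(pts, n_side=100):
    """(count - expected) / sqrt(expected) for every pixel."""
    expected, e = pixel_integrals(n_side)
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[e, e])
    return (counts - expected) / np.sqrt(expected)


def voronoi_integrals(pts, n_fine=400):
    """Approximate the integral of the intensity over each Voronoi cell.

    Each node of a fine midpoint grid is assigned to its nearest observed
    point, which is a discrete approximation to the Voronoi tessellation
    clipped to the observation window.
    """
    h = 2.0 / n_fine
    mid = -1.0 + h * (np.arange(n_fine) + 0.5)
    gx, gy = np.meshgrid(mid, mid, indexing="ij")
    nodes = np.column_stack([gx.ravel(), gy.ravel()])
    _, owner = cKDTree(pts).query(nodes)
    weights = intensity(nodes[:, 0], nodes[:, 1]) * h * h
    return np.bincount(owner, weights=weights, minlength=len(pts))


def voronoi_residuals(pts, n_fine=400):
    """(1 - expected) / sqrt(expected) for every cell, as in equation (1)."""
    expected = voronoi_integrals(pts, n_fine)
    out = np.full(len(pts), np.nan)  # NaN if a cell captures no grid mass
    ok = expected > 0
    out[ok] = (1.0 - expected[ok]) / np.sqrt(expected[ok])
    return out


rng = np.random.default_rng(0)
pts = simulate(rng)
pearson = pearson_residuals(pts)
voronoi = voronoi_residuals(pts)
print(len(pts), "points simulated")
print("largest |Pearson residual|:", np.abs(pearson).max())
print("largest |Voronoi residual|:", np.nanmax(np.abs(voronoi)))
```

Repeating the last block over many seeds and pooling the residuals would give the histograms suggested in the second note. The grid resolution `n_fine` trades accuracy of the cell integrals against run time, and an exact polygon-based computation would be preferable for any formal comparison.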

5 A seismological application

ZZQ3: There are many options here. We can use two models in the CSEP (Collaboratory for the Study of Earthquake Predictability) project. I will get this data.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. zzq. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Baddeley, A., Turner, R., Møller, J., and Hazelton, M. (2005). Residual analysis for spatial point processes (with discussion). Journal of the Royal Statistical Society, Series B, 67(5):617-666.

Baddeley, A., Møller, J., and Pakes, A.G. (2008). Properties of residuals for spatial point processes. Annals of the Institute of Statistical Mathematics, 60:627-649.

Daley, D., and Vere-Jones, D. (1988). An Introduction to the Theory of Point Processes. Springer, New York.