Residuals for spatial point processes based on Voronoi tessellations

Ka Wong 1, Frederic Paik Schoenberg 2, Chris Barr 3.

1 Google, Mountain View, CA.
2 Corresponding author, 8142 Math-Science Building, Department of Statistics, University of California, Los Angeles, CA, USA. frederic@stat.ucla.edu. phone: fax:
3 Department of Biostatistics, Harvard University.

Abstract

A residual analysis method for spatial point processes is proposed, where differences between the modeled conditional intensity and the observed number of points are assessed over the Voronoi cells generated by the observations. The resulting residuals appear to be substantially less skewed and hence more stable, particularly for point processes with conditional intensities close to zero, compared to ordinary Pearson residuals and other pixel-based methods. An application to models for Southern California earthquakes is provided.

Key words: Papangelou intensity, Pearson residuals, point patterns, residual analysis, Voronoi residuals.

1 Introduction.

The aim of this paper is to propose a new form of residual analysis for assessing the goodness of fit of spatial or spatial-temporal point process models. The proposed method relies on comparing the normalized observed and expected numbers of points over Voronoi cells generated by the observed point pattern.

The excellent treatment of Pearson residuals and other pixel-based residuals by Baddeley et al. (2005), the thorough discussion of their properties in Baddeley et al. (2008), and the fact that such residuals extend so readily to the case of spatial-temporal point processes may suggest that the problem of residual analysis for such point processes is generally solved. Hence, we feel it is necessary to devote a substantial portion of this paper to a major shortcoming of such pixel-based residuals, in order to motivate our proposed alternative. In brief, Pearson residuals and other pixel-based residuals tend to be highly skewed when the integrated conditional intensity over some of the cells is close to zero, which is common in many applications. By contrast, the proposed Voronoi residuals are approximately Gamma distributed and tend to be far less skewed than Pearson residuals, and are thus far more amenable to assessment of goodness of fit.

ZZQ: We might need to define spatial and spatial-temporal point processes and their intensities, and say that we are assuming throughout that the point processes are simple. We are assuming that the observation region is equipped with Lebesgue measure, µ. Note that we are not emphasizing the distinction between conditional and Papangelou intensities here. The methods and results here are essentially equivalent for spatial and spatial-temporal point processes.

This paper is organized as follows. In Section 2, we briefly review the goals of residual analysis as well as Pearson residuals and other pixel-based residuals described in Baddeley et al. (2005), and discuss their limitations when the integrated conditional intensity is small. Section 3 describes Voronoi residuals and discusses their properties. The simulations shown in Section 4 demonstrate the potential advantages of the Voronoi residuals over conventional pixel-based residuals in cases where the conditional intensity is occasionally close to zero. Section 5 includes an application to models for earthquake occurrences in Southern California.

2 Pearson residuals and other pixel-based methods.

Residual plots for spatial point processes have two related purposes: (i) to suggest locations or aspects of the model where the fit is poor, so that an incorrectly specified model may be improved; (ii) to form the basis of formal testing, i.e. to assess the overall appropriateness of a model, or the extent to which the model fits well and hence results based on the model may be trusted.

Baddeley et al. (2005) discuss a variety of pixel-based residuals for spatial point processes. The residual diagnostics are plots showing the standardized differences between the number of points occurring in each pixel and the number expected according to the fitted model, where the standardization may be performed in various ways. For instance, for Pearson residuals, one divides the difference by the estimated standard deviation of the number of points in the pixel, in analogy with Pearson residuals in the context of linear models. Baddeley et al. (2005) also propose scaling the residuals based on the contribution of each pixel to the total pseudo-loglikelihood of the model, in analogy with score statistics in generalized linear modeling. Standardization is important for both purposes (i) and (ii): without it, plots of the residuals will tend to overemphasize deviations in pixels where the rate is high, and formal testing based on individual pixels requires the standard deviation of the number of points in the pixel to be taken into account.

Behind the term Pearson residuals lies the implication, both implicit and explicit (see e.g. the error bounds in Fig. 7 of Baddeley et al. 2005), that these standardized residuals should be approximately standard normally distributed, so that the squared residuals, or their sum, are distributed approximately according to Pearson's χ²-distribution. Pearson residuals appear to be effective model evaluation tools in examples where the estimate of the conditional intensity, ˆλ, is moderately sized throughout the space of observation, as is the case throughout Baddeley et al. (2005).

ZZQ: briefly outline other residuals in Baddeley et al. (2005) and Baddeley et al. (2008)?

If λ is small, however, then the Pearson residuals will be heavily skewed and their distribution will not be well approximated by the normal or χ² distributions. Indeed, when λ is close to zero, the raw residuals tend to have a distribution that is very highly skewed, and the standardization to form Pearson residuals actually exacerbates this skew. Unfortunately, these situations arise in many applications. For example, in modeling earthquake occurrences, the modeled conditional intensity is typically close to zero far away from known faults or previous seismicity, and in modeling wildfires, one may have a modeled conditional intensity close to zero in areas far from human use or frequent lightning, or with vegetation types that do not readily support much wildfire activity (ZZQ: cite). Furthermore, even if λ is not extremely close to zero, if the pixels used for Pearson residuals are sufficiently small that the integral of λ over some pixels is occasionally very small, then the same skew occurs.

Since the Pearson residuals are standardized to have mean zero and unit (or approximately unit) variance under the null hypothesis that the modeled conditional intensity is correct (see Baddeley et al. 2008), one may inquire whether the skew of these residuals is indeed problematic. Consider a case of a planar Poisson process where the estimate of λ is exactly correct, i.e. ˆλ(x, y) = λ(x, y) at all locations, and where one elects to use Pearson residuals on pixels. Suppose that there are several pixels where the integral of λ over the pixel is roughly 0.01. Given many of these pixels, it is not unlikely that at least one of them will contain a point of the process. In such pixels, the raw residual will be 1 − 0.01 = 0.99, and the standard deviation of the number of points in the pixel is √0.01 = 0.1, so the Pearson residual is 0.99/0.1 = 9.9. This may yield the following effects:

a) Such Pearson residuals may overwhelm the others in a visual inspection, rendering a plot of the Pearson residuals largely useless in terms of evaluating the quality of the fit of the model.

b) Conventional tests based on the normal approximation will have grossly incorrect p-values, and will commonly tend to reject the model, although it is correct, based on one such residual alone. Even if one adjusts for the non-normality of the residual and instead uses exact p-values based on the Poisson distribution for one such pixel individually, the test will still reject the model at the significance level of 0.01.

c) If one adjusts for the non-normality of the residual and computes exact simultaneous p-values, then the resulting tests will have extremely low power. Indeed, if 10,000 pixels (a grid) are used, then gross mis-specification would be required in order to reject the null hypothesis with a probability of more than merely 10% under such circumstances.

ZZQ: We need to check this. We need to simulate Poisson processes to make this last statement more concrete. We will need to make an assumption about the intensity.

3 Voronoi residuals.

A Voronoi tessellation is a division of the metric space on which a point process is defined into convex polygons, or Voronoi cells. Specifically, given a spatial or spatial-temporal point pattern N, one may define its corresponding Voronoi tessellation as follows: for each point τ_i of the point process, its corresponding cell D_i is the region consisting of all locations which are closer to τ_i than to any other point of N. The Voronoi tessellation is the collection of such cells. See e.g. Okabe et al. (2000) for a thorough treatment of Voronoi tessellations and their properties.

Given a model for the conditional intensity of a spatial or space-time point process, one may construct residuals simply by evaluating the Pearson residuals over cells rather than rectangular pixels, where the cells comprise the Voronoi tessellation of the observed spatial or spatial-temporal point pattern. We will refer to such residuals as Voronoi residuals.

Voronoi residuals offer one obvious advantage over conventional pixel-based methods: the cell sizes are entirely automatic and data-driven. With pixel-based methods, the cell boundaries are often determined rather arbitrarily, yet these boundaries can have immense impacts on the results, particularly when λ is volatile.

More importantly, the distributions of the Voronoi residuals tend to be far less skewed than those of pixel-based residuals such as Pearson residuals, particularly when ˆλ is small in some areas. Indeed, since each Voronoi cell has exactly one point inside it by construction, the Voronoi residual for cell i is given by

$$\hat{r}_i := \frac{1 - \int_{D_i} \hat{\lambda} \, d\mu}{\sqrt{\int_{D_i} \hat{\lambda} \, d\mu}} = \frac{1 - |D_i| \bar{\lambda}}{\sqrt{|D_i| \bar{\lambda}}}, \qquad (1)$$

where |D_i| denotes the size of cell D_i and λ̄ denotes the mean of ˆλ over D_i. Note that when N is a homogeneous Poisson process, the cell size |D_i| is approximately Gamma distributed. Indeed, for a homogeneous Poisson process, the expected area of a Voronoi cell is equal to the reciprocal of the intensity of the process (Meijering 1953), and simulation studies have shown that the area of a typical Voronoi cell is approximately Gamma distributed (Hinde and Miles, 1980; Tanemura, 2003). These properties continue to hold approximately in the inhomogeneous case provided that the conditional intensity is approximately constant near the location in question (Barr and Schoenberg 2010). Hence the numerator in equation (1) will often tend to be distributed approximately like a rescaled Gamma random variable. By contrast, for pixels over which the integrated conditional intensity is close to zero, the conventional raw residuals are approximately Bernoulli distributed.

4 Simulated examples.

The exact distributions of the Voronoi residuals are generally quite intractable due to the fact that the cells themselves are random. Simulations may be useful to investigate the approximate distributions of these residuals.

ZZQ1: Assume a certain specific intensity for a spatial inhomogeneous Poisson process with small pixels. For instance, we could take λ(x, y) = 100x²|y| over the space [−1, 1] × [−1, 1]. There will be around 67 points. Use a grid of pixels. Look at a typical plot of the raw residuals, Pearson residuals, and tessellation residuals. The raw and Pearson residuals will probably just show the points basically, near the origin.

ZZQ2: Simulate the inhomogeneous Poisson process many times and look at histograms of the residuals at the origin for Pearson and Voronoi residuals. For Pearson residuals, use the pixel [0, .01] × [0, .01].
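The experiment outlined in the notes above might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: it takes the intensity to be λ(x, y) = 100x²|y| on [−1, 1] × [−1, 1] (which integrates to 200/3, or about 67 points), treats the model as exactly correct so that ˆλ = λ, uses a 20 × 20 pixel grid for the Pearson residuals, and approximates each Voronoi cell integral by assigning the centres of a fine grid to their nearest observed point. The function names, grid sizes, and seed are hypothetical choices.

```python
import math
import random

def intensity(x, y):
    """Assumed intensity 100 x^2 |y| on [-1, 1] x [-1, 1]."""
    return 100.0 * x * x * abs(y)

def simulate(rng, lam_max=100.0):
    """Simulate the inhomogeneous Poisson process by thinning."""
    # Poisson(lam_max * area) count, area = 4, obtained by counting
    # unit-rate exponential arrivals that occur before the mean.
    mean, n, t = lam_max * 4.0, 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    cand = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    # Keep each candidate with probability intensity / lam_max.
    return [(x, y) for x, y in cand if rng.random() < intensity(x, y) / lam_max]

def pixel_mean(a, b, c, d):
    """Exact integral of the intensity over the pixel [a, b] x [c, d]."""
    return 100.0 * (b**3 - a**3) / 3.0 * (d * abs(d) - c * abs(c)) / 2.0

def pearson_residuals(points, k=20):
    """Pearson residuals (count - mean) / sqrt(mean) on a k x k pixel grid."""
    h = 2.0 / k
    counts = [[0] * k for _ in range(k)]
    for x, y in points:
        i = min(int((x + 1) / h), k - 1)
        j = min(int((y + 1) / h), k - 1)
        counts[i][j] += 1
    out = []
    for i in range(k):
        for j in range(k):
            m = pixel_mean(-1 + i * h, -1 + (i + 1) * h,
                           -1 + j * h, -1 + (j + 1) * h)
            out.append((counts[i][j] - m) / math.sqrt(m))
    return out

def voronoi_residuals(points, g=100):
    """Voronoi residuals (1 - I_i) / sqrt(I_i), where I_i approximates the
    integral of the intensity over cell i by a midpoint rule on a g x g grid."""
    h = 2.0 / g
    integrals = [0.0] * len(points)
    for a in range(g):
        x = -1 + (a + 0.5) * h
        for b in range(g):
            y = -1 + (b + 0.5) * h
            nearest = min(range(len(points)),
                          key=lambda i: (points[i][0] - x) ** 2
                                        + (points[i][1] - y) ** 2)
            integrals[nearest] += intensity(x, y) * h * h
    # Cells too small to contain any grid centre are dropped.
    return [(1 - I) / math.sqrt(I) for I in integrals if I > 0]

def skewness(v):
    """Sample skewness (third standardized central moment)."""
    n = len(v)
    m = sum(v) / n
    s2 = sum((u - m) ** 2 for u in v) / n
    return sum((u - m) ** 3 for u in v) / n / s2 ** 1.5

rng = random.Random(1)
pts = simulate(rng)
pearson = pearson_residuals(pts)
voronoi = voronoi_residuals(pts)
print(len(pts), skewness(pearson), skewness(voronoi))
```

Repeating the last block over many seeds and collecting the residuals would give the histograms described in ZZQ2. How the two skewness values compare in any single run depends on the realization and on the grid sizes chosen here, so a single run should not be read as confirming or refuting the claims in Section 3.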

5 A seismological application.

ZZQ3: There are many options here. We can use two models in the CSEP (Collaboratory for the Study of Earthquake Predictability) project. I will get this data.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Grant No. zzq. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Baddeley, A., Turner, R., Møller, J., and Hazelton, M. (2005). Residual analysis for spatial point processes (with discussion). Journal of the Royal Statistical Society, series B, 67(5).

Baddeley, A., Møller, J., and Pakes, A.G. (2008). Properties of residuals for spatial point processes. Annals of the Institute of Statistical Mathematics, 60.

Daley, D., and Vere-Jones, D. (1988). An Introduction to the Theory of Point Processes. Springer, New York.


More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

UNIVERSITY OF NOTTINGHAM. Discussion Papers in Economics CONSISTENT FIRM CHOICE AND THE THEORY OF SUPPLY

UNIVERSITY OF NOTTINGHAM. Discussion Papers in Economics CONSISTENT FIRM CHOICE AND THE THEORY OF SUPPLY UNIVERSITY OF NOTTINGHAM Discussion Papers in Economics Discussion Paper No. 0/06 CONSISTENT FIRM CHOICE AND THE THEORY OF SUPPLY by Indraneel Dasgupta July 00 DP 0/06 ISSN 1360-438 UNIVERSITY OF NOTTINGHAM

More information

A Monte-Carlo study of asymptotically robust tests for correlation coefficients

A Monte-Carlo study of asymptotically robust tests for correlation coefficients Biometrika (1973), 6, 3, p. 661 551 Printed in Great Britain A Monte-Carlo study of asymptotically robust tests for correlation coefficients BY G. T. DUNCAN AND M. W. J. LAYAKD University of California,

More information

Multivariate Time Series: Part 4

Multivariate Time Series: Part 4 Multivariate Time Series: Part 4 Cointegration Gerald P. Dwyer Clemson University March 2016 Outline 1 Multivariate Time Series: Part 4 Cointegration Engle-Granger Test for Cointegration Johansen Test

More information

Analytic computation of nonparametric Marsan-Lengliné. estimates for Hawkes point processes. phone:

Analytic computation of nonparametric Marsan-Lengliné. estimates for Hawkes point processes. phone: Analytic computation of nonparametric Marsan-Lengliné estimates for Hawkes point processes. Frederic Paik Schoenberg 1, Joshua Seth Gordon 1, and Ryan Harrigan 2. 1 Department of Statistics, University

More information

Analysis of Variance (ANOVA)

Analysis of Variance (ANOVA) Analysis of Variance (ANOVA) Two types of ANOVA tests: Independent measures and Repeated measures Comparing 2 means: X 1 = 20 t - test X 2 = 30 How can we Compare 3 means?: X 1 = 20 X 2 = 30 X 3 = 35 ANOVA

More information

On the Arbitrary Choice Regarding Which Inertial Reference Frame is "Stationary" and Which is "Moving" in the Special Theory of Relativity

On the Arbitrary Choice Regarding Which Inertial Reference Frame is Stationary and Which is Moving in the Special Theory of Relativity Regarding Which Inertial Reference Frame is "Stationary" and Which is "Moving" in the Special Theory of Relativity Douglas M. Snyder Los Angeles, CA The relativity of simultaneity is central to the special

More information

Spatial Autocorrelation

Spatial Autocorrelation Spatial Autocorrelation Luc Anselin http://spatial.uchicago.edu spatial randomness positive and negative spatial autocorrelation spatial autocorrelation statistics spatial weights Spatial Randomness The

More information

A vector identity for the Dirichlet tessellation

A vector identity for the Dirichlet tessellation Math. Proc. Camb. Phil. Soc. (1980), 87, 151 Printed in Great Britain A vector identity for the Dirichlet tessellation BY ROBIN SIBSON University of Bath (Received 1 March 1979, revised 5 June 1979) Summary.

More information

Loglikelihood and Confidence Intervals

Loglikelihood and Confidence Intervals Stat 504, Lecture 2 1 Loglikelihood and Confidence Intervals The loglikelihood function is defined to be the natural logarithm of the likelihood function, l(θ ; x) = log L(θ ; x). For a variety of reasons,

More information

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due

More information

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the CHAPTER 4 VARIABILITY ANALYSES Chapter 3 introduced the mode, median, and mean as tools for summarizing the information provided in an distribution of data. Measures of central tendency are often useful

More information

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1 9.1 Scatter Plots and Linear Correlation Answers 1. A high school psychologist wants to conduct a survey to answer the question: Is there a relationship between a student s athletic ability and his/her

More information

Probability Distributions.

Probability Distributions. Probability Distributions http://www.pelagicos.net/classes_biometry_fa18.htm Probability Measuring Discrete Outcomes Plotting probabilities for discrete outcomes: 0.6 0.5 0.4 0.3 0.2 0.1 NOTE: Area within

More information

Application of branching point process models to the study of invasive red banana plants in Costa Rica

Application of branching point process models to the study of invasive red banana plants in Costa Rica Application of branching point process models to the study of invasive red banana plants in Costa Rica Earvin Balderama Department of Statistics University of California Los Angeles, CA 90095 Frederic

More information

Empirical Bayes Moderation of Asymptotically Linear Parameters

Empirical Bayes Moderation of Asymptotically Linear Parameters Empirical Bayes Moderation of Asymptotically Linear Parameters Nima Hejazi Division of Biostatistics University of California, Berkeley stat.berkeley.edu/~nhejazi nimahejazi.org twitter/@nshejazi github/nhejazi

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong

Statistics Primer. ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong Statistics Primer ORC Staff: Jayme Palka Peter Boedeker Marcus Fagan Trey Dejong 1 Quick Overview of Statistics 2 Descriptive vs. Inferential Statistics Descriptive Statistics: summarize and describe data

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80

The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 The University of Hong Kong Department of Statistics and Actuarial Science STAT2802 Statistical Models Tutorial Solutions Solutions to Problems 71-80 71. Decide in each case whether the hypothesis is simple

More information

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD Paper: ST-161 Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop Institute @ UMBC, Baltimore, MD ABSTRACT SAS has many tools that can be used for data analysis. From Freqs

More information

Confidence Intervals and Hypothesis Tests

Confidence Intervals and Hypothesis Tests Confidence Intervals and Hypothesis Tests STA 281 Fall 2011 1 Background The central limit theorem provides a very powerful tool for determining the distribution of sample means for large sample sizes.

More information

Harvard University. Rigorous Research in Engineering Education

Harvard University. Rigorous Research in Engineering Education Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected

More information

EE290H F05. Spanos. Lecture 5: Comparison of Treatments and ANOVA

EE290H F05. Spanos. Lecture 5: Comparison of Treatments and ANOVA 1 Design of Experiments in Semiconductor Manufacturing Comparison of Treatments which recipe works the best? Simple Factorial Experiments to explore impact of few variables Fractional Factorial Experiments

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Two-sample Categorical data: Testing

Two-sample Categorical data: Testing Two-sample Categorical data: Testing Patrick Breheny October 29 Patrick Breheny Biostatistical Methods I (BIOS 5710) 1/22 Lister s experiment Introduction In the 1860s, Joseph Lister conducted a landmark

More information

A Spatio-Temporal Point Process Model for Firemen Demand in Twente

A Spatio-Temporal Point Process Model for Firemen Demand in Twente University of Twente A Spatio-Temporal Point Process Model for Firemen Demand in Twente Bachelor Thesis Author: Mike Wendels Supervisor: prof. dr. M.N.M. van Lieshout Stochastic Operations Research Applied

More information

Descriptive Statistics-I. Dr Mahmoud Alhussami

Descriptive Statistics-I. Dr Mahmoud Alhussami Descriptive Statistics-I Dr Mahmoud Alhussami Biostatistics What is the biostatistics? A branch of applied math. that deals with collecting, organizing and interpreting data using well-defined procedures.

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

Recall the Basics of Hypothesis Testing

Recall the Basics of Hypothesis Testing Recall the Basics of Hypothesis Testing The level of significance α, (size of test) is defined as the probability of X falling in w (rejecting H 0 ) when H 0 is true: P(X w H 0 ) = α. H 0 TRUE H 1 TRUE

More information

Advanced Regression Topics: Violation of Assumptions

Advanced Regression Topics: Violation of Assumptions Advanced Regression Topics: Violation of Assumptions Lecture 7 February 15, 2005 Applied Regression Analysis Lecture #7-2/15/2005 Slide 1 of 36 Today s Lecture Today s Lecture rapping Up Revisiting residuals.

More information

Area1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed)

Area1 Scaled Score (NAPLEX) .535 ** **.000 N. Sig. (2-tailed) Institutional Assessment Report Texas Southern University College of Pharmacy and Health Sciences "An Analysis of 2013 NAPLEX, P4-Comp. Exams and P3 courses The following analysis illustrates relationships

More information

Regression Analysis: Exploring relationships between variables. Stat 251

Regression Analysis: Exploring relationships between variables. Stat 251 Regression Analysis: Exploring relationships between variables Stat 251 Introduction Objective of regression analysis is to explore the relationship between two (or more) variables so that information

More information

Earthquake predictability measurement: information score and error diagram

Earthquake predictability measurement: information score and error diagram Earthquake predictability measurement: information score and error diagram Yan Y. Kagan Department of Earth and Space Sciences University of California, Los Angeles, California, USA August, 00 Abstract

More information

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE Course Title: Probability and Statistics (MATH 80) Recommended Textbook(s): Number & Type of Questions: Probability and Statistics for Engineers

More information

Probabilistic approach to earthquake prediction

Probabilistic approach to earthquake prediction ANNALS OF GEOPHYSICS, VOL. 45, N. 6, December 2002 Probabilistic approach to earthquake prediction Rodolfo Console, Daniela Pantosti and Giuliana D Addezio Istituto Nazionale di Geofisica e Vulcanologia,

More information

Two-sample inference: Continuous data

Two-sample inference: Continuous data Two-sample inference: Continuous data Patrick Breheny April 6 Patrick Breheny University of Iowa to Biostatistics (BIOS 4120) 1 / 36 Our next several lectures will deal with two-sample inference for continuous

More information

A Space-Time Conditional Intensity Model for. Evaluating a Wildfire Hazard Index

A Space-Time Conditional Intensity Model for. Evaluating a Wildfire Hazard Index A Space-Time Conditional Intensity Model for Evaluating a Wildfire Hazard Index Roger D. Peng Frederic Paik Schoenberg James Woods Author s footnote: Roger D. Peng (rpeng@stat.ucla.edu) is a Graduate Student,

More information

Gibbs point processes : modelling and inference

Gibbs point processes : modelling and inference Gibbs point processes : modelling and inference J.-F. Coeurjolly (Grenoble University) et J.-M Billiot, D. Dereudre, R. Drouilhet, F. Lavancier 03/09/2010 J.-F. Coeurjolly () Gibbs models 03/09/2010 1

More information

UC Berkeley Math 10B, Spring 2015: Midterm 2 Prof. Sturmfels, April 9, SOLUTIONS

UC Berkeley Math 10B, Spring 2015: Midterm 2 Prof. Sturmfels, April 9, SOLUTIONS UC Berkeley Math 10B, Spring 2015: Midterm 2 Prof. Sturmfels, April 9, SOLUTIONS 1. (5 points) You are a pollster for the 2016 presidential elections. You ask 0 random people whether they would vote for

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information