CONDUCTING INFERENCE ON RIPLEY'S K-FUNCTION OF SPATIAL POINT PROCESSES WITH APPLICATIONS


By MICHAEL ALLEN HYMAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2013

© 2013 Michael Allen Hyman

To my loving family and friends

"The first law of ecology is that everything is related to everything else." - Barry Commoner

ACKNOWLEDGMENTS

First and foremost, I would like to acknowledge my advisor, Dr. Linda J. Young. Dr. Young, you were the person who convinced me to pursue this degree four years ago and the person who pushed me every step of the way to see it finished. Throughout the past four years, you have devoted an enormous amount of your free time to helping and teaching me. You have also granted me many incredible opportunities and I thank you for every one of them. You have gone above and beyond the role of an advisor and I am forever grateful. I have learned so much about statistics and life and also have had a lot of fun working with you. You have taught me to stay calm when things don't work out and how to identify the problem, whether it is big or small. Thank you for everything you have done.

I would also like to acknowledge all of my professors at the University of Florida. Specifically, I would like to thank Dr. George Casella and Dr. Nikolay Blitznuk for going out of their way to help students in IFAS statistics. I would also like to acknowledge Dr. Christina Staudhammer for introducing me to applied statistical research and Dr. Mihai Giurcanu for always finding the time to help me when I needed it. Nikolay, thank you for taking the initiative to start the R workshop this past year. Not only have I become a better programmer, but I would not have finished the simulations for this dissertation without learning to use the HPC. Christie, before becoming one of your students I had spent two years in classrooms learning the theoretical components of statistics. Working on the first applications inspired a love of a subject that I had previously only appreciated, and gave me the motivation to continue learning. Thank you for sticking by as my co-advisor despite the distance. Dr. Casella, you have inspired so many people with your passion for statistics and love of life. Despite being incredibly busy, you always found time to help students, no matter how small the problem. You are a true inspiration and I am so happy to have had the opportunity to know you and learn from you.

Finally, I would like to acknowledge my family and friends. To my Mom and Dad, thank you for convincing me (or making me) to go to graduate school. You have shown your love and support in so many ways. Thank you for always being there when I needed you. To my sister Casey and brother-in-law Elliot, thank you for the endless support and encouragement. To my girlfriend, Whitney, thank you for your love and support over the past four years. I would not have been able to do this without you working by my side. Finally, I'd like to acknowledge all of my friends in the SNRE and statistics department. Thank you Emily and Dan for always being there when a happy hour was needed. Nate and Kenny, thank you for being such good friends and making these past few years so much fun. I would not have been able to complete this degree without you guys walking the road ahead of me. Thank you for all your help.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 BACKGROUND AND MOTIVATION
    Introduction
    Point Process Summary
    Inference on Point Processes - Confidence Intervals
    Inference on Point Processes - Hypothesis Testing
    Objectives

2 CONFIDENCE INTERVALS FOR RIPLEY'S K-FUNCTION
    Introduction
    Bootstrapping a Spatial Point Pattern
    Network Resampling to Construct Confidence Intervals of K(r)
    Unbiasedness of Bootstrapped Estimator of K(r)
    Simulation Study
    Results
    Discussion

3 HYPOTHESIS TESTING FOR RIPLEY'S K-FUNCTION
    Introduction
    Hahn's (2012) Studentized Permutation Test
    Proposed Test Statistic
    Proposed Test Statistic
    Simulation Study
    Results
    Discussion

4 APPLICATION OF METHODS WITH JOSEPH W. JONES ECOLOGICAL RESEARCH CENTER DATA
    Introduction
    Joseph W. Jones Ecological Research Center
    Exploratory Data Analysis
    Estimation of Confidence Intervals for the K-Function
    Hypothesis Testing of the K-Function
        4.5.1 Hypothesis 1
        4.5.2 Hypothesis 2
        4.5.3 Hypothesis 3
    Discussion

5 FUTURE WORK
    Estimating Confidence Intervals for the K-function
    Appropriate Number of Networks to Resample
    Extension to Inhomogeneous Spatial Point Processes
    Bayesian Methods of Inference
    Hypothesis Testing for the K-function

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1 The processes and their respective parameters used in the simulation study
Number of each tree classification observed in each plot of the Joseph W. Jones Ecological Research Center
Estimated deviation from stationarity for each pattern observed at each plot. The unit of measurement for each plot is 1 meter x 1 meter
P-values for tests of Hypothesis 1 using adult and juvenile trees in all plots
P-values for tests of Hypothesis 2 using adult trees in Plot 1 (wiregrass understory) and Plot 2 (old-field plot)
P-values for tests of Hypothesis 2 using juvenile pine trees in Plot 1 (wiregrass understory) and Plot 2 (old-field plot)
P-values for tests of Hypothesis 3 using juvenile pine trees in Plot 1 (single tree harvesting) and Plot 3 (control - no harvesting)

LIST OF FIGURES

Figure

1-1 Sample patterns and their resulting K and L functions
Example of tiling method
Example of Loh and Stein's marked point method
Example of toroidal wrapping
Example of Loh and Stein's marked point method
Dendrogram of networking method
Realizations of point patterns for confidence interval simulation
Percent coverages for Poisson patterns with intensity=
Percent coverages for Poisson patterns with intensity=
Percent coverages for Poisson patterns with intensity=
Percent coverages for softcore patterns with intensity=
Percent coverages for softcore patterns with intensity=
Percent coverages for softcore patterns with intensity=
Percent coverages for Matern clustered patterns with intensity=
Percent coverages for Matern clustered patterns with intensity=
Percent coverages for Matern clustered patterns with intensity=
Percent coverages for Matern clustered patterns with intensity=
Confidence interval widths for Poisson patterns with intensity=
Confidence interval widths for Poisson patterns with intensity=
Confidence interval widths for Poisson patterns with intensity=
Confidence interval widths for softcore patterns with intensity=
Confidence interval widths for softcore patterns with intensity=
Confidence interval widths for softcore patterns with intensity=
Confidence interval widths for Matern clustered patterns with intensity=
Confidence interval widths for Matern clustered patterns with intensity=
2-23 Confidence interval widths for Matern clustered patterns with intensity=
Confidence interval widths for Matern clustered patterns with intensity=
Sizes of tests for Poisson point patterns of varying intensities
Sizes of tests for softcore point patterns of varying intensities
Sizes of tests for Matern point patterns 1 of varying intensities
Sizes of tests for Matern point patterns 2 of varying intensities
Sizes of tests for hardcore point patterns of varying intensities
Sizes of tests for Poisson point patterns when patterns have different intensities
Powers of tests comparing Matern 1 patterns and Poisson patterns of varying intensities
Powers of tests comparing Matern 2 patterns and Poisson patterns of varying intensities
Powers of tests comparing softcore patterns and Poisson patterns of varying intensities
Powers of tests comparing hardcore patterns and Poisson patterns of varying intensities
Sizes of tests for Poisson point patterns using different numbers of quadrats
Sizes of tests for Matern 1 point patterns using different numbers of quadrats
Sizes of tests for Matern 2 point patterns using different numbers of quadrats
Sizes of tests for softcore point patterns using different numbers of quadrats
Sizes of tests for hardcore point patterns using different numbers of quadrats
Sizes of tests for Poisson point patterns with different intensities and different numbers of quadrats
Powers of tests comparing Matern 1 patterns and Poisson patterns using different numbers of quadrats
Powers of tests comparing Matern 2 patterns and Poisson patterns using different numbers of quadrats
Powers of tests comparing softcore patterns and Poisson patterns using different numbers of quadrats
Powers of tests comparing hardcore patterns and Poisson patterns using different numbers of quadrats
4-1 Locations of trees in three plots of the Joseph W. Jones Research Center
Estimated K-functions from patterns of trees in three plots
Cross-K functions of adult and juvenile pine trees
Confidence intervals for adult trees, Plot
Confidence intervals for juvenile pine trees, Plot
Confidence intervals for adult trees, Plot
Confidence intervals for juvenile pine trees, Plot
Confidence intervals for adult trees, Plot
Confidence intervals for juvenile pine trees, Plot

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CONDUCTING INFERENCE ON RIPLEY'S K-FUNCTION OF SPATIAL POINT PROCESSES WITH APPLICATIONS

By Michael Allen Hyman

August 2013

Chair: Linda J. Young
Major: Interdisciplinary Ecology

In many sciences, spatially-referenced data are collected and analyzed. In some cases, these data represent the locations of a set of events recorded over an area. Collections of these data are referred to as point patterns and the underlying distributions determining these data are point processes. Specifically, when data are collected in 2-dimensional space, the underlying distributions for these data are referred to as spatial point processes. In these cases, analysis is typically focused on the number of points observed and the locations of these events relative to one another.

Many summary functions have been developed to describe the interaction among points of spatial point patterns. Ripley's K-function is a commonly used function to describe the spatial interaction of an observed set of points at a range of distances. The statistical properties of this function and its estimators are unknown for most cases. Thus, empirical methods are typically used to conduct inference on the K-function of an observed point pattern. In this work, we propose several new methods for conducting inference on Ripley's K-function.

A new method of setting confidence intervals for Ripley's K-function is proposed using a bootstrap technique for spatial point patterns. The proposed method accounts for the intensity and interaction among points in the pattern and adjusts the bootstrap sample accordingly. Confidence intervals are estimated using the quantiles from bootstrap estimates of the K-function. The variance of the proposed bootstrap estimator more closely approximates the variance of the estimator of the K-function than do many current methods of bootstrapping spatial point patterns. A simulation study is conducted to compare this new method to current methods of interval estimation. The percent coverage of the resulting confidence intervals for the K-function and confidence interval widths are determined for the proposed method and current bootstrap methods using point processes with different intensities and interactions among points.

A hypothesis test used to compare the K-function across multiple observed patterns is proposed. The purpose of the proposed test is to compare the K-function from single realizations of spatial point processes. Here, two test statistics are proposed using different methods to account for the heteroskedasticity of the K-function at larger distances. A permutation test is used to calculate p-values for the tests. A simulation study compares the proposed test to an existing permutation test using processes with varying intensities and interactions among points. The sizes of the proposed tests are better controlled when testing patterns with different intensities.

The proposed methods for conducting inference on Ripley's K-function are applied to several point patterns recorded at the Joseph W. Jones Ecological Research Center. These patterns represent the locations of longleaf pine trees on plots with different understory composition and different harvesting schemes. The interaction of adult trees and juvenile pine trees is assessed for plots with different treatments and/or forest characteristics. Conclusions drawn from this analysis are helpful for management and conservation efforts of longleaf pine forests.

CHAPTER 1
BACKGROUND AND MOTIVATION

1.1 Introduction

Spatially-referenced data are commonly observed in ecology as well as other disciplines. In general, spatial data present challenges in analysis due to spatial correlation among the observations. The two basic types of spatial data are geostatistical data and point process data. In geostatistics, geographically-referenced, quantitative random variables are observed at a set of locations. Inference is conducted on the distribution of this quantitative variable, adjusting for the spatial covariance structure. A point process is an underlying process giving rise to spatially-referenced events, and a spatial point pattern is a realization of that process. The distribution of all possible realizations is a point process distribution. For point processes, the events themselves are the observations, and interest is in the number of points located in an area and the position of the points relative to one another. The focus of this work is inferential procedures for spatial point processes.

A spatial point pattern may be recorded as a marked point process, in which a categorical or quantitative mark is associated with each of the points. This mark might represent the species or diameter of a tree observed at a particular location. Marked point processes can help assess the interaction among several classifications of points.

A point pattern is observed in d-dimensional space, where d is typically 1, 2, or 3 for most applications. The locations of cellular phone towers in a state are an example of a 2-dimensional spatial point pattern. Similarly, the locations of cell nuclei on a piece of brain tissue or the locations of galaxy clusters in the observable universe are examples of 2-dimensional and 3-dimensional point patterns, respectively, observed at vastly different scales.
In ecology, point processes can represent the locations of some species of interest or the locations at which an event that affects the ecosystem occurs. Examples include the locations of nesting sites of endangered species or the locations of a particular invasive plant species recorded over an area. A specific example is given in Chapter 4, where the marked point patterns contain the locations of trees in several plots of land in the Joseph W. Jones Ecological Research Center. The marks are an age classification, labeling each tree as either adult or juvenile. As with these data, spatially-referenced ecological data are commonly recorded in $d = 2$ dimensions. For the remainder of this dissertation, a point process will represent the general process in some d-dimensional space, and a spatial point process will specifically refer to a point process in 2 dimensions.

Recently, ecological modeling methods have been proposed that utilize point process distributions (Illian and Burslem, 2007; Illian et al., 2009; Warton and Shephard, 2010). However, many of the statistical properties of the distributions themselves are unknown, and conducting inference is challenging. Summary statistics have been developed to assess the quantities of interest for point processes (e.g., the mean number of points per unit area or the interaction among points). The statistical distributions of these statistics are generally unknown, except for the simplest cases. Therefore, empirical methods of conducting inference have been proposed to analyze observed point patterns. The purpose of this dissertation is to expand upon these methods and to propose new methods for conducting inference on spatial point processes. In Chapter 4, we apply these new methods and some existing methods to several recorded spatial point patterns to demonstrate how they can be used to inform management decisions in forestry.

1.2 Point Process Summary

A spatial process is a set of events $Z$ occurring at locations $s$ in a set $X$, a subset of $\mathbb{R}^d$; that is, $\{Z(s) : s \in X \subset \mathbb{R}^d\}$. For spatial point processes, $X$ is a subset of $\mathbb{R}^2$. In point process theory, the exact locations of the points $\{s_1, s_2, \ldots, s_n\}$ are typically set aside in favor of more convenient notation.
Let $N$ be a counting measure on $X$. For each Borel set $A$, $N(A)$ represents the number of events in $A$. Thus $N(A) \in \{0, 1, 2, \ldots\}$ for all $A \in \mathcal{X}$, where $\mathcal{X}$ represents the Borel σ-field of $X \subset \mathbb{R}^2$. If $N(A)$ is known for every set $A \in \mathcal{X}$, then this is equivalent to knowing all locations of events $\{s_1, \ldots, s_n\}$. If the counting measure $N$ is locally finite, then $N(A) < \infty$ for all bounded sets $A \in \mathcal{X}$ (Cressie, 1991). Here we also make the assumption that point processes are simple. That is, the probability of observing more than one point at a given location is 0. This assumption often holds in ecological applications, and violations can make interpretation and some inferential methods difficult.

In the above notation, $X$ can be thought of as a surface over which the statistical distribution is found. This statistical distribution is defined by the counting measure $N$. For any observed, bounded region $A$ of this surface, the number of points $N(A)$ observed in $A$ is a random variable with a probability assigned to each possible value ($N(A) = 0, 1, 2, \ldots$). Notating a point process in this way allows us to assign probability distributions to a process observed on a bounded, closed region and to conduct basic inference on point patterns.

Similar to quantitative distributions, point processes can be defined by their moments. The first-order moment of a point process is referred to as the first-order intensity of the process (often simply the intensity) and is analogous to the mean of a quantitative distribution. Formally, if $\nu_d$ represents Lebesgue measure on $D \subset \mathbb{R}^d$ and $N(ds)$ represents the counting measure on the infinitesimally small Borel set $ds \subset D$ centered at point $s$, then

$$\lambda(s) = \lim_{|ds| \to 0} \frac{E[N(ds)]}{\nu_d(ds)}. \tag{1-1}$$

In the case where $d = 2$, $\nu_d(A)$ represents the area of the Borel set $A \subset D$, and if $d = 3$, $\nu_d(A)$ represents the volume of $A \subset D$. If the process is stationary, then $\lambda(s)$ is constant for all locations $s$ and the intensity ($\lambda$) of a point process is the expected number of points per unit area,

$$\lambda = E[\text{\# of points per unit area}]. \tag{1-2}$$
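The counting-measure notation can be made concrete with a short sketch. The Python below is illustrative only: the helper names (`count_in_rect`, `empirical_intensity`), the restriction to rectangular sets, and the uniformly scattered sample points are assumptions introduced here and are not part of the original text. It evaluates N(A) for rectangles and divides the count by the area, which is the empirical counterpart of the per-unit-area expectation in (1-2) when the process is stationary.

```python
import random


def count_in_rect(points, xmin, xmax, ymin, ymax):
    """N(A) for the half-open rectangle A = [xmin, xmax) x [ymin, ymax)."""
    return sum(1 for (x, y) in points if xmin <= x < xmax and ymin <= y < ymax)


def empirical_intensity(points, xmin, xmax, ymin, ymax):
    """Observed number of points per unit area in A, i.e. N(A) / area(A)."""
    area = (xmax - xmin) * (ymax - ymin)
    return count_in_rect(points, xmin, xmax, ymin, ymax) / area


# Illustrative data (an assumption): 200 points scattered uniformly on a 10 x 10 window.
rng = random.Random(1)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]

# Counts over disjoint sets add up to the count over their union.
n_left = count_in_rect(points, 0, 5, 0, 10)
n_right = count_in_rect(points, 5, 10, 0, 10)
n_all = count_in_rect(points, 0, 10, 0, 10)
print(n_left, n_right, n_all)
print(empirical_intensity(points, 0, 10, 0, 10))
```

Half-open rectangles are used so that a point on a shared edge is counted in exactly one of two adjacent sets, which keeps the counts additive.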

If the process is heterogeneous, then $\lambda(s)$ represents the intensity of the process at location $s \in X$. The intensity $\lambda(s)$ may be a function of a set of covariates whose values are recorded at each location $s$. In this case, the average intensity over a region $A$ can be determined by

$$\lambda_A = \frac{1}{|A|} \int_A \lambda(s)\, ds. \tag{1-3}$$

Under the assumption of stationarity, an unbiased estimator of the intensity of a process is

$$\hat{\lambda} = \frac{n}{|A|}, \tag{1-4}$$

where $n$ is the observed number of points in region $A$. Here we let $|A|$ refer to the area of a bounded region $A$ in 2-dimensional space.

Stationary, isotropic point processes can be categorized as one of three classes: completely random, spatially clustered, or regular. A point process is completely random, and is said to exhibit complete spatial randomness (CSR), if 1) the average number of points per unit area is constant over the entire area $A$ and 2) the numbers of events in two non-overlapping Borel sets, $A_1$ and $A_2$, are independent of one another. In this case, the number of events in $A$ is Poisson distributed (Schabenberger and Gotway, 2005). A CSR process is also known as a homogeneous Poisson point process. Under CSR, each event has a constant probability of occurring at any location in the region (Cressie, 1991; Illian et al., 2008). More formally, the counting measure defined on a set $A$ is distributed as

$$P(N(A) = n) = \frac{e^{-\lambda \nu(A)} (\lambda \nu(A))^n}{n!}, \tag{1-5}$$

where $\nu(A)$ represents the volume of $A$ in d-dimensional space (the area if $d = 2$) and $\lambda$ represents the mean number of points per unit area (constant at all locations).

For spatial point processes, the first null hypothesis tested is generally that a pattern exhibits CSR. If a pattern exhibits CSR, points are dispersed randomly and a researcher can do little more to summarize or explain them. If this null hypothesis is rejected, a researcher can conclude that there is heterogeneity or interaction among events and may choose to investigate the intensity function or interpoint dependence in the pattern (Schabenberger and Gotway, 2005).

A binomial point process is closely related to a homogeneous Poisson process and results from a Poisson process conditional on $n$ points being observed. In this case, if a pattern is a realization of a homogeneous Poisson process with $n$ points observed on $A$, then the number of points in any subset $A_{sub} \subset A$ is distributed as a binomial random variable with a mean equal to $n |A_{sub}| / |A|$.

Although the homogeneous Poisson process is a common benchmark distribution against which observed realizations are tested, it is rarely seen in practice. Points recorded in disjoint regions might indeed be independent of one another, yet the density at which the points occur may not be homogeneous. In other cases, a constant density might be observed across a sample area, but the events are not independent of one another. Interactions might exist among points, making it more or less probable that another point is located nearby relative to CSR.

General classifications for stationary patterns that depart from CSR are clustered and regular. In regular patterns, the probability of observing a point near an arbitrary point in the pattern is smaller, and thus the expected number of points within a given radius of an arbitrary point is also smaller, than it is under the assumption of CSR. An example of this might include the locations of nesting sites of a species whose members tend to avoid one another due to competition. For clustered patterns, the probability of observing points that are near one another is greater than under CSR. Thus the expected number of points within a given radius of a particular point is greater than under CSR. Trees may be clustered at early stages of forest development because parent trees drop seeds from which seedlings grow.
However, trees compete for the same resources and, after some time, may have a regular pattern.

The second-order intensity of a point process describes the interaction or relative position among points. If $s_i$ and $s_j$ are two points in d-dimensional space and $ds_i$ and $ds_j$ are the infinitesimally small balls centered at these points, then the second-order intensity of a point process is

$$\lambda_2(s_i, s_j) = \lim_{\nu_d(ds_i) \to 0,\ \nu_d(ds_j) \to 0} \frac{E[N(ds_i) N(ds_j)]}{\nu_d(ds_i)\, \nu_d(ds_j)}. \tag{1-6}$$

Interpretation of the second-order intensity is difficult, and simpler measures to assess the dependency among points are desired. Thus, Ripley introduced the use of the K-function (also known as the reduced second moment measure) to assess the second-order characteristics of stationary, isotropic processes (Ripley, 1976). The K-function is a function of distance $r$:

$$K(r) = \frac{2\pi}{\lambda^2} \int_0^r x\, \lambda_2(x)\, dx. \tag{1-7}$$

For a simple process, $\lambda K(r)$ represents the expected number of other points less than distance $r$ from an arbitrary point in the process and thus,

$$K(r) = \frac{E[N(s_0, r) \mid \text{point at } s_0]}{\lambda}. \tag{1-8}$$

The moments of point processes can be used to define some of the typical assumptions that are made when conducting inference on point processes. A point process is considered homogeneous if the first-order moment (the intensity) is constant over space. A process is considered stationary if the process is invariant to translations (Schabenberger and Gotway, 2005). A process is called isotropic if it is invariant to rotations around a point (the second-order moment depends only on the distance between two events) (Schabenberger and Gotway, 2005). Together, $K(r)$ and $\lambda$ define the first and second moments of a stationary and isotropic process (Stoyan et al., 1995). However, just as the mean and covariance of two random variables do not provide a complete description of their bivariate distribution, the first and second order intensity measures give an incomplete description of a point process (Baddeley and Silverman, 1984).

The K-function has several properties that make it the most common function for assessing the second-order properties of an observed pattern. Because $\lambda K(r)$ represents the average number of points within distance $r$ of an arbitrary point in the pattern, it is easily interpretable. The second-order intensity for a process can be derived if its K-function is known (Schabenberger and Gotway, 2005). The K-function is invariant to the intensity of a process, so the second-order characteristics of patterns can be compared despite differences in the numbers of points (Baddeley et al., 2000). The K-function can be used to observe the interaction among points at a range of distances (Schabenberger and Gotway, 2005). This can be useful for processes that exhibit one particular type of interaction at small distances and a different form of interaction at larger distances. Finally, the K-function is invariant to observations missing completely at random (Schabenberger and Gotway, 2005).

The theoretical value for $K(r)$ under CSR is a function of $r$, $K(r) = \pi r^2$, the area of the circle of radius $r$:

$$K(r) = \frac{E[\text{\# of points} < r \text{ from an arbitrary point in the observed region}]}{\lambda} = \frac{1}{\lambda} E\!\left[\frac{e^{-\lambda a} (\lambda a)^n}{n!}\right] = \frac{\lambda \pi r^2}{\lambda} = \pi r^2. \tag{1-9}$$

Here the bracketed term denotes the mean of the Poisson distribution in (1-5) evaluated on the circle of area $a = \pi r^2$.

Several transformations of the K-function have also been proposed. The most common of these is the L-function, which is used to correct for the heteroscedasticity of the K-function (Besag, 1977):

$$L(r) = \sqrt{\frac{K(r)}{\pi}}. \tag{1-10}$$
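A brief numerical check can make the relationship between (1-9) and (1-10) concrete. The sketch below is illustrative; the function names `k_csr` and `l_transform` and the chosen radii are assumptions made here, not notation from the text. It evaluates the CSR benchmark $\pi r^2$ and applies the L transformation, which returns the radius itself under CSR and a larger value whenever $K(r)$ exceeds the CSR benchmark (as in a clustered pattern).

```python
import math


def k_csr(r):
    """Theoretical K-function under CSR: the area of a circle of radius r."""
    return math.pi * r ** 2


def l_transform(k_value):
    """Besag's L transformation of a K-function value: sqrt(K / pi)."""
    return math.sqrt(k_value / math.pi)


radii = [0.5, 1.0, 2.0, 4.0]
for r in radii:
    # Under CSR the transformed value equals r (up to floating-point rounding).
    print(r, k_csr(r), l_transform(k_csr(r)))

# A K value above the CSR benchmark maps to an L value above r.
print(l_transform(2 * k_csr(1.0)))
```

Plotting $L(r) - r$ against $r$ is a common way to read departures from CSR, since the reference line is then horizontal at zero.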

Thus, in the case of a homogeneous Poisson process, $L(r) = r$. Figure 1-1 shows realizations of three spatial point processes: a CSR pattern, a clustered pattern, and a regular pattern. The corresponding K and L functions are also shown for each of the patterns.

To estimate $K(r)$, the term $\lambda^2 \nu_d(A) K(r)$ is typically estimated and divided by an estimator of $\lambda^2 \nu_d(A)$. A naive estimator of $\lambda^2 \nu_d(A) K(r)$ is

$$\sum_{x \in A} \sum_{y \neq x} I\{\|y - x\| \le r\},$$

where $I$ represents the indicator function and $\|\cdot\|$ represents Euclidean distance. However, this estimator is biased low. The bias is due to points lying outside the boundaries of region $A$ that are not observed, but are still within distance $r$ of a point in the pattern. That is, if point $z$ is unobserved because it lies outside of $A$ but $\|x - z\| \le r$, then this point pair is not included in the summation. This can make substantial differences for values of $r$ that are large relative to $\nu_d(A)$.

Multiple methods can be used to obtain more accurate or unbiased estimators of $K(r)$ despite being unable to observe points outside of the region's boundary. Most methods assign a weight $w(x, y)$ to each pair of points $(x, y)$. Ripley's isotropic edge correction (Ripley, 1988) is a common edge correction weight that is useful under the assumptions of stationarity and isotropy. Let $A$ be the 2-dimensional region in which a realization of a process is observed. Let $x, y \in A$ be points observed in $A$. If $C$ represents the circumference of the circle centered at point $x$ and passing directly through point $y$, and $C_A$ represents the length of that circumference lying inside $A$, then the weight is $w(x, y) = C / C_A$. That is, $w(x, y)$ is the reciprocal of the proportion of the circumference of the circle centered at $x$ and passing through $y$ that lies inside of the region $A$. This implies that, if a point fell directly on a straight boundary of the region, any circle with an arbitrary radius $r$ centered at this point would fall half inside the region.
Therefore, any point falling within distance $r$ of this point would be weighted by two, because it would be equally likely to have observed another point outside of the boundary. Calculating this weight for each point pair observed in $A$, an estimator of $K(r)$ is

$$\hat{K}(r) = \frac{|A|}{n^2} \sum_{x \in A} \sum_{y \neq x} w(x, y)\, I(\|x - y\| < r). \tag{1-11}$$

This estimator is slightly biased but is the most common because the bias is low and it is relatively easy to calculate, especially for rectangular or circular windows (Ohser, 1983). $\hat{K}(r)$ has been shown to be asymptotically normal as the number of points approaches infinity (Ripley, 1981). It is also consistent for $K(r)$ for ergodic point processes, as the size of the study area increases in $\mathbb{R}^2$ (Nguyen and Zessin, 1979). Ripley's isotropic edge correction can easily be extended to patterns in d dimensions where $d > 2$. In addition, other estimators and edge correction weights have been suggested that result in an unbiased estimate of $K(r)$ (Cressie, 1991; Ohser, 1983). Ripley's weights are easily calculated for a rectangular or circular window; however, other weights may be more appropriate for more complex windows (Ohser, 1983).

1.3 Inference on Point Processes - Confidence Intervals

For spatial point patterns, inference must be conducted on the information provided by the points themselves. Because interpoint distances are some of the most informative and defining characteristics of a particular point process, the summary statistics used to describe a pattern, such as the K-function, tend to be functions of distance. Second-order analysis of point processes is a relatively unexplored area of statistics, so many of the original methods are the most commonly used techniques today. Exact distributions of the common estimators and the statistical properties of the processes are unknown, except under CSR or other simple processes. Thus, work on spatial point processes tends to be empirical in nature. Here we review the previous work in the area of interval estimation and hypothesis testing for point processes or for a specific parameter of a point process. Note that, although these works are related, they may have different objectives.
For instance, the K-function can be identical for processes having different second-order structures (Baddeley et al., 2000; Baddeley and Silverman, 1984). Thus testing the equivalence of $K(r)$ for multiple processes is not equivalent to testing whether the second-order intensity of the processes is equal. However, testing the equivalence of the K-function can result in valuable information about the second-order structures of the processes. The motivations for the following works range from confidence interval calculation and hypothesis testing under model specification to testing process equivalence.

Ripley's development of the K-function (Ripley, 1976) allowed researchers to interpret the spatial dependence inherent in a point pattern. Monte Carlo approaches are the foundation for the most widely applied inferential methods for $K(r)$ and other summary statistics for point processes (Barnard, 1963; Besag and Diggle, 1977; Chiu, 2007; Hope, 1968; Koen, 1991). Most commonly, an observed statistic is compared to the distribution of the statistic under the assumption of CSR. Suppose an observed point pattern contains $n$ points. To determine a simulation envelope for a parameter under the assumption of CSR, $n$ points are simulated randomly in a finite region $A$, and the estimate of that parameter is calculated. This is replicated $B$ times, resulting in a simulation envelope. For example, to find an interval for the K-function under CSR, $B$ homogeneous Poisson patterns are simulated and $\hat{K}_i(r)$ for $i = 1, \ldots, B$ calculated. Then the 100% simulation envelope is given by

$$\hat{K}_l(r) = \min_{i = 1, \ldots, B} \{\hat{K}_i(r)\} \quad \text{and} \quad \hat{K}_u(r) = \max_{i = 1, \ldots, B} \{\hat{K}_i(r)\}. \tag{1-12}$$

To test whether the observed pattern is from a CSR process, the observed K-function is compared to this simulated envelope. Many methods of comparison can be used, and Cressie (1991) suggests a test statistic of the form

$$TS = \int_{0} \left\{ \left(\hat{K}(r)\right)^{1/2} - \pi^{1/2} r \right\}^2 dr, \tag{1-13}$$

where this statistic is calculated for the observed pattern and for each of the simulated patterns from a CSR process. An observed test statistic that is greater than any calculated from CSR would imply deviation from CSR, whether from clustering, regularity, or a combination of the two. Tests of this nature can be used in ecology to test for pattern transference (such as the observed locations of migratory animals over different regions) and space-time interactions (such as testing for contagion in the observed locations of a particular event) by specifying a null model exhibiting the absence of these interactions and comparing the observed K-function with K-functions simulated from the null model (Besag and Diggle, 1977).

Monte Carlo methods for testing a null hypothesis have several advantages. An approximation of the distribution of the test statistic is not necessary, and thus the p-values are exact in that sense (Schabenberger and Gotway, 2005). They are also flexible in that the null hypothesis can easily be adapted to test complex point process distributions. However, many critical choices are left to the researcher, such as the number of simulations to perform and the distance to which the test statistic is to be evaluated. Diggle and others (Diggle, 1977, 1979; Ho and Chiu, 2006) explored these choices as well as additional test statistics.

Monte Carlo methods are beneficial under the assumption of a specified null model. Methods of model fitting and parameter estimation exist for point process data. However, variation in realizations from a parametric point process model can be great, and the power and confidence of the respective hypothesis tests and confidence intervals are contingent on the accuracy of the fitted models (Loh and Stein, 2004).
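The Monte Carlo envelope and test under CSR described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the window is taken to be the unit square, all edge-correction weights are set to 1, the test statistic is read as the integrated squared difference between the square root of the estimated K-function and $\pi^{1/2} r$ and is approximated by a Riemann sum truncated at the largest distance in the grid, and the function names (`k_hat`, `ts_statistic`, `csr_monte_carlo`) are hypothetical.

```python
import bisect
import math
import random


def k_hat(points, r_values, area=1.0):
    """Naive estimate (|A| / n^2) * #{ordered pairs x != y with ||x - y|| < r}.

    All edge-correction weights w(x, y) are set to 1 (an assumption for brevity).
    """
    n = len(points)
    dists = sorted(
        math.dist(points[i], points[j])
        for i in range(n) for j in range(i + 1, n)
    )
    # Each unordered pair appears twice in the double sum over x and y != x.
    return [area * 2 * bisect.bisect_left(dists, r) / n ** 2 for r in r_values]


def ts_statistic(k_values, r_values):
    """Riemann sum of (sqrt(K_hat(r)) - sqrt(pi) * r)^2 over an evenly spaced r grid."""
    dr = r_values[1] - r_values[0]
    return dr * sum(
        (math.sqrt(k) - math.sqrt(math.pi) * r) ** 2
        for k, r in zip(k_values, r_values)
    )


def csr_monte_carlo(points, r_values, n_sim=99, seed=0):
    """Min/max simulation envelope and Monte Carlo p-value under CSR (unit square)."""
    rng = random.Random(seed)
    n = len(points)
    sims = [
        k_hat([(rng.random(), rng.random()) for _ in range(n)], r_values)
        for _ in range(n_sim)
    ]
    lower = [min(col) for col in zip(*sims)]
    upper = [max(col) for col in zip(*sims)]
    ts_obs = ts_statistic(k_hat(points, r_values), r_values)
    ts_sim = [ts_statistic(k, r_values) for k in sims]
    # Rank of the observed statistic among the simulated ones.
    p_value = (1 + sum(t >= ts_obs for t in ts_sim)) / (n_sim + 1)
    return lower, upper, p_value


# Hypothetical example: a tightly clustered pattern tested against CSR.
rng = random.Random(1)
clustered = [(0.5 + rng.gauss(0, 0.01), 0.5 + rng.gauss(0, 0.01)) for _ in range(40)]
r_grid = [0.01 * k for k in range(1, 26)]
lower, upper, p_value = csr_monte_carlo(clustered, r_grid)
```

Because the same uncorrected estimator is applied to the observed and the simulated patterns, the comparison is made on equal terms even though each estimate is biased near the boundary; an edge-corrected estimator would normally be substituted in practice.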
Model fitting typically involves the assumption of a specific type of process, and incorrect model specification can result in poor performance of simulation envelopes for both confidence interval estimation and hypothesis testing.

Often, confidence intervals for the K-function are desired for an observed point pattern without specification of a null process. Replicated patterns from a process are typically not available to estimate the standard error of the estimate of $K(r)$. As an alternative, bootstrap methods for dependent data have been applied to point patterns (Davison and Hinkley, 1997; Loh and Stein, 2004) to estimate confidence intervals for the K-function, as well as other parameters.

The simplest way to obtain confidence intervals for the K-function of a point pattern observed over a square region, called the splitting method, is to divide the region into $N$ congruent subregions and to calculate $N$ separate estimates of $K(r)$ (Loh and Stein, 2004). Assuming that the $N$ estimates are independent and approximately normally distributed, an estimate of the variance of $\hat{K}(r)$ can be obtained and the $100(1-\alpha)\%$ confidence interval can be computed by

\[ \hat{K}(r) \pm t_{N-1,\alpha/2} \sqrt{\frac{\widehat{\mathrm{Var}}\{\hat{K}_i(r)\}}{N}}, \tag{1-14} \]

where $\widehat{\mathrm{Var}}\{\hat{K}_i(r)\}$ is the sample variance of $\hat{K}_1(r), \hat{K}_2(r), \ldots, \hat{K}_N(r)$, and $\hat{K}(r)$ is the overall estimate of $K(r)$ (Loh and Stein, 2004). For patterns with sufficient numbers of points, this method of confidence interval calculation creates fairly accurate intervals. However, these intervals can be wide because they are calculated from a small number of samples (the quadrats). An obvious limitation is the range of distances over which intervals can be determined. For a pattern on the unit square divided into quadrats of area 0.25, the length of each quadrat edge is 0.5. Calculations of $K(r)$ for distances greater than 0.25 begin to weight the edge correction heavily because a high proportion of any circle of that radius would fall outside of each quadrat. At these distances, the type of edge correction used to estimate $K(r)$ has a much larger influence on the estimates of the K-function than at shorter distances.

Issues with small sample sizes, dependence among quadrats, and non-normal samples from quadrats, encountered when calculating confidence intervals by dividing a pattern into multiple independent samples, led to the development of resampling methods for point patterns.
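The splitting interval in Equation (1-14) can be sketched as follows. This is an illustrative sketch only, assuming a pattern on the unit square, an uncorrected estimator with all edge weights set to 1, and a critical value supplied by the caller; the names `k_hat` and `splitting_interval` are hypothetical. Quadrats with fewer than two points are skipped, which is a simplification: if any were dropped, the supplied critical value would no longer match the number of estimates.

```python
import math
import random
import statistics


def k_hat(points, r, area):
    """Naive (|A| / n^2) * #{ordered pairs within distance r}; edge weights set to 1."""
    n = len(points)
    close = sum(
        1 for i in range(n) for j in range(i + 1, n)
        if math.dist(points[i], points[j]) < r
    )
    return area * 2 * close / n ** 2


def splitting_interval(points, r, splits=2, t_crit=3.182):
    """Interval for K(r) from splits x splits congruent quadrats of the unit square.

    t_crit is the upper alpha/2 quantile of the t distribution with N - 1 degrees
    of freedom; the default 3.182 corresponds to N = 4 quadrats and alpha = 0.05.
    """
    side = 1.0 / splits
    quadrats = {}
    for x, y in points:
        key = (min(int(x / side), splits - 1), min(int(y / side), splits - 1))
        quadrats.setdefault(key, []).append((x, y))
    # One estimate of K(r) per quadrat, each computed with the quadrat's own area.
    estimates = [k_hat(q, r, side ** 2) for q in quadrats.values() if len(q) > 1]
    half_width = t_crit * math.sqrt(statistics.variance(estimates) / len(estimates))
    centre = k_hat(points, r, 1.0)  # overall estimate from the full pattern
    return centre - half_width, centre + half_width


# Hypothetical example: 200 uniformly placed points, interval at r = 0.05.
rng = random.Random(2)
pattern = [(rng.random(), rng.random()) for _ in range(200)]
lower, upper = splitting_interval(pattern, r=0.05)
```

With only four quadrats the critical value is large, which illustrates why these intervals tend to be wide.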
Hall (1985) extended block bootstrapping methods used to estimate the statistical characteristics of time series data to spatial Boolean models in two dimensions. Davison and Hinkley (1997) gave a more general explanation of this method, referring to it as tile resampling; it is referred to here as tiling. In this method, the goal is to create new patterns that maintain the spatial dependence of the observed pattern. If the observed region $A \subset \mathbb{R}^2$ is partitioned into $N$ disjoint tiles, $A_1, A_2, \ldots, A_N$, the statistic of interest can be defined as $T = t(A_1, A_2, \ldots, A_N)$. Then a resampled pattern is created by taking random samples of the $N$ disjoint tiles, $A^*_1, A^*_2, \ldots, A^*_N$, and a bootstrapped estimate of the statistic of interest is calculated from this newly created pattern, $T^* = t(A^*_1, A^*_2, \ldots, A^*_N)$. A common variation is to use moving, overlapping tiles by setting $A^*_j = U_j + A_j$, where $U_j$ is a random vector (Davison and Hinkley, 1997). Politis and Romano (1992) suggest using toroidal wrapping before resampling, such that tiles are allowed to fall outside of the boundary and, in this case, are wrapped around to the opposite side of the window. This can be accomplished by creating a new region that is a 3 x 3 grid of copies of the observed region (assuming that the observed region is rectangular). The tiles then contain the same points as if toroidal wrapping were used. This helps avoid bias created by undersampling points near the boundaries of the observed region (Davison and Hinkley, 1997).

By performing $B$ resamples, calculating $\hat{K}^*_1(r), \hat{K}^*_2(r), \ldots, \hat{K}^*_B(r)$, and letting $\hat{K}^*_{(1)}(r) \le \cdots \le \hat{K}^*_{(B)}(r)$ denote the ordered bootstrap estimates, a $100(1-\alpha)\%$ confidence interval can be created for $K(r)$ using the formula

\[ \left[\, 2\hat{K}(r) - \hat{K}^*_{((B+1)(1-\alpha/2))}(r),\; 2\hat{K}(r) - \hat{K}^*_{((B+1)(\alpha/2))}(r) \,\right] \tag{1-15} \]

(Davison and Hinkley, 1997). This creates an equal-tailed confidence interval for $K(r)$. It is possible that other methods of confidence interval estimation might be more appropriate. However, interest here is in the method of bootstrapping from the pattern.
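A rough sketch of tiling with toroidal wrapping, followed by the interval in Equation (1-15), is given below. It is meant only as an illustration under several assumptions: the window is the unit square, the moving-tile variant is used (each source tile is placed uniformly at random rather than drawn from a fixed partition), wrapping is done with modular arithmetic instead of an explicit 3 x 3 grid, and the K-function estimator ignores edge correction. The names `tile_resample` and `basic_bootstrap_interval` are hypothetical.

```python
import math
import random


def k_hat(points, r, area=1.0):
    """Naive (|A| / n^2) * #{ordered pairs within distance r}; edge weights set to 1."""
    n = len(points)
    if n < 2:
        return 0.0
    close = sum(
        1 for i in range(n) for j in range(i + 1, n)
        if math.dist(points[i], points[j]) < r
    )
    return area * 2 * close / n ** 2


def tile_resample(points, splits, rng):
    """One bootstrap pattern built from randomly placed, torus-wrapped square tiles."""
    side = 1.0 / splits
    new_points = []
    for i in range(splits):
        for j in range(splits):
            # Random lower-left corner of the source tile; it may cross the boundary.
            u, v = rng.random(), rng.random()
            for x, y in points:
                # Offsets from the corner, wrapped around the unit torus.
                dx, dy = (x - u) % 1.0, (y - v) % 1.0
                if dx < side and dy < side:
                    # Reposition the point inside destination tile (i, j).
                    new_points.append((i * side + dx, j * side + dy))
    return new_points


def basic_bootstrap_interval(points, r, splits=4, n_boot=199, alpha=0.05, seed=0):
    """Equal-tailed interval [2K - K*_((B+1)(1-a/2)), 2K - K*_((B+1)(a/2))]."""
    rng = random.Random(seed)
    k_obs = k_hat(points, r)
    boot = sorted(k_hat(tile_resample(points, splits, rng), r) for _ in range(n_boot))
    # Convert the 1-based order-statistic positions to 0-based list indices.
    hi_idx = round((n_boot + 1) * (1 - alpha / 2)) - 1
    lo_idx = round((n_boot + 1) * (alpha / 2)) - 1
    return 2 * k_obs - boot[hi_idx], 2 * k_obs - boot[lo_idx]


# Hypothetical example: 100 uniformly placed points, interval at r = 0.1.
rng = random.Random(3)
pattern = [(rng.random(), rng.random()) for _ in range(100)]
lower, upper = basic_bootstrap_interval(pattern, r=0.1)
```

Choosing the number of resamples so that $(B+1)\alpha/2$ is an integer (here $B = 199$ with $\alpha = 0.05$) keeps the order-statistic positions exact. The number of points in each resampled pattern varies, since tiles are placed independently.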
Figure 1-2 shows an example of the creation of an artificial realization from the process using the tile method. For further information on the statistical properties of this resampling method, see Hall (1985) for a 2-dimensional spatial case and Künsch (1989) for a 1-dimensional time series analysis.

The primary problem with creating bootstrapped patterns using tiling is that when tiles are rearranged to produce the new pattern, points that were not originally positioned together can be placed in close proximity. Under the assumption of CSR, the placement of points within the bootstrap samples is random and the resulting patterns should still follow the same distribution. However, if the process has obvious spatial dependence, construction of a new pattern might violate the interpoint dependence structure of the process. If the spatial dependence between points is relatively short-ranged, and the blocks are large enough to adequately capture the spatial dependence between points, consistent results can be obtained (Davison and Hinkley, 1997). However, if the spatial dependence is longer ranged, this method fails to maintain the original distribution of the observed pattern. Lahiri (1993) shows that putting independent resampled blocks together destroys the long-range dependence of the original observations. A simple example is that of a hardcore inhibited (regular) process. In this case, the probability of points falling within a certain radius of interaction $r_0$ of one another is 0. Thus, $K(r) = 0$ for $r < r_0$. However, when a block bootstrapping approach is applied and the blocks taken from the original pattern are repositioned to create a new pattern, the placement of these blocks might violate the hardcore assumption of the process (i.e., points might fall within $r_0$ of one another in the bootstrapped patterns) (Loh and Stein, 2004).

Politis and Romano (1994) develop a similar method of resampling data based on using subsets, called the subsetting method.
They extend this to cases of 1-dimensional time series and multi-dimensional random fields of dependent data (Politis and Romano, 1994). When applied to point pattern data, subsets of the original pattern are used to estimate the value of interest. For this method, suppose a pattern is observed over region $A$ with area $|A|$. Then $N$ square subregions $A_i$, $i = 1, \ldots, N$, are formed with tiles of size $|A|/N$, allowing for overlapping between tiles and toroidal wrapping around $A$. Let $x_{ij}$ and $x_{ik}$, $j, k = 1, \ldots, n_i$, represent two distinct points in subregion $A_i$. Then the estimate of $K(r)$ is calculated by combining, over the subregions $i = 1, \ldots, N$, the within-subregion weighted pair counts

\[ \sum_{j=1}^{n_i} \sum_{k \neq j} w_{A_i}(x_{ij}, x_{ik})\, I(\|x_{ij} - x_{ik}\| < r), \tag{1-16} \]

scaled by a factor involving $|A|$, $N$, and $n_i$, where $w_{A_i}(x_{ij}, x_{ik})$ is the new weight for the point pair $x_{ij}$ and $x_{ik}$ calculated over subregion $A_i$ and $I(\cdot)$ represents the indicator function (Loh and Stein, 2004). Multiple bootstrap resamples are conducted and confidence intervals are created from the bootstrap estimates as before. Considering only pairs of points in the same subregion to estimate $K(r)$ reduces the problem, encountered in the method previously described, of producing new point pairs that violate process assumptions. If wrapping is used, some new point pairs will still exist in the resampled points, but far fewer than with tiling. Fewer point pairs are used in each region, which is accounted for in the estimation of $K(r)$ by larger edge correction weights. The larger weights are a result of the smaller subregions being used to calculate the weights in the estimator. This means that the estimates of $K(r)$ are more highly influenced by the edge corrections used, and limitations exist as to the largest value of $r$ for which $K(r)$ can accurately be estimated (Loh and Stein, 2004).

Though several bootstrapping methods have been suggested to resample point processes, asymptotic results for the bootstrap techniques have rarely been provided. One exception is a method developed by Braun and Kulperger (1998), which they refer to as the marked point method and which is referred to here as marking or the marked method. The results they provide are for point processes in one dimension, and it is not known whether the theoretical results apply to processes with dimensions greater than one.
Because the authors work in one dimension and show their results using the second-order intensity instead of the K-function, we use their notation and definitions.

The first two moments of the process, $p_1(\tau)$ and $p_2(\tau_1, \tau_2)$, are defined as

\[ p_1(\tau) = \lim_{h \to 0} \frac{P(X((\tau, \tau + h]) > 0)}{h} \tag{1-17} \]

and

\[ p_2(\tau_1, \tau_2) = \lim_{h_1 \to 0,\, h_2 \to 0} \frac{P(X((\tau_1, \tau_1 + h_1]) > 0,\; X((\tau_2, \tau_2 + h_2]) > 0)}{h_1 h_2}, \tag{1-18} \]

respectively. Similar to the estimators of the first- and second-order intensities, the first two moments are estimated from an observed pattern on the interval $(0, T]$ using the following equations (Brillinger, 1975):

\[ \hat{p}_1 = \frac{1}{T} X((0, T]) \tag{1-19} \]

\[ \hat{p}_2(\tau) = \frac{1}{hT} \sum_{x_i} \{\#\text{ of points in } (x_i + \tau,\, x_i + \tau + h]\} \tag{1-20} \]

where $\tau = \tau_2 - \tau_1$ and $h$ is a window, or bin width, parameter. Under several assumptions regarding higher-order moments of the process defined by Brillinger (1975, 1978), these estimators are asymptotically normal as $T$ increases to infinity. However, the asymptotic variance of $\hat{p}_1$ is difficult to estimate, and the asymptotic variance of $\hat{p}_2$ is a poor approximation to the true variance of the estimator, even for large intervals (large values of $T$) (Braun and Kulperger, 1998). Braun and Kulperger (1998) therefore propose a block bootstrap approach intended to provide more accurate confidence intervals.

Braun and Kulperger first describe an approach similar to the tile method described in Davison and Hinkley (1997). Let $X$ represent a point process of which a pattern $x$ is observed on an interval $(0, T]$. Let $A$ represent the set of points (locations) $t$ on $(0, T]$ such that $x(t) = 1$. Then the point pattern $x$ can be bootstrapped using the following method:

1. Take $b$ to be some positive integer. For each integer $i = 1, \ldots, b$, generate a uniform variate $U_i$ on the interval $(0, T - T/b]$.
2. For $j = 1, 2, \ldots, b$, set each of the following:
   a) $A_j = (U_j, U_j + T/b] \cap A$
   b) $A^*_j = A_j - U_j + (j - 1)T/b$
   c) $X^*_j(\cdot) = |A^*_j \cap \cdot|$
3. Set $X^* = \sum_{j=1}^{b} X^*_j$.

Essentially, $b$ randomly selected blocks of length $T/b$ are taken from the observed pattern. The points occurring in the selected blocks are repositioned to create an artificial realization of $X$. The authors provide some asymptotic results for the distribution of the estimator of the first moment from Equation (1-17) (Braun and Kulperger, 1998). As with the Hall and the Davison and Hinkley methods, this Braun and Kulperger bootstrap method fails to capture the second-order characteristics of the process.

In the same paper, the authors suggest another bootstrap method for estimating the second-order properties of a point process, which they label the marked point process method. Again using their notation for the 1-dimensional case, the second-order properties of the point process, $\rho(\tau)$ (analogous to $K(r)$), are estimated for the 1-dimensional region $(0, T]$. For each observed point $x \in (0, T]$, a mark is set equal to the number of points in the interval $(x + \tau, x + \tau + h]$ for a fixed value $h$. The estimate of the second-order intensity $\rho(\tau)$ is given by the equation

\[ \hat{\rho}(\tau) = \frac{1}{hT} \sum_{x_i} \{\#\text{ of points in } (x_i + \tau,\, x_i + \tau + h]\}. \tag{1-21} \]

The following theorem is offered by the authors.

THEOREM: Suppose the point process $X$ has finite and integrable fourth moment densities in the sense of Brillinger (1975). For a given $h$, conditional on the observed point process $X$ on $[0, T]$,

\[ \sqrt{hT}\,\left( \hat{\rho}^*_2(\tau) - E\left(\hat{\rho}^*_2(\tau) \mid X\right) \right) \longrightarrow N\left(0, \sigma_2^2\right) \tag{1-22} \]

as $T \to \infty$, and $\sigma_2^2 = \rho_2(\tau) + O(h)$. For small $h$, the limiting variance $\sigma_2^2$ is approximately the limiting variance of $\hat{\rho}_2(\tau)$ (Braun and Kulperger, 1998).

Thus, Braun and Kulperger (1998) show that, for small values of $h$, the bootstrap estimates of the second-order moment give approximately the correct distribution asymptotically. It is still unclear how close an approximation the bootstrap estimators are to the true second-order moment. In a simulation study using a Matérn cluster process, use of the marked bootstrap approach led to confidence intervals with coverage closer to the nominal level than their block bootstrap approach. Results improved for smaller values of $h$. It is also unknown whether these results hold in dimensions greater than 1.

Loh and Stein (2004) provide an adaptation of the Braun and Kulperger (1998) marked point method for 2-dimensional point patterns, referred to here as the marked point method or marking. For each point $x$ in an observed pattern $X$ in region $A$, a mark $m_x(r)$ is assigned for distance $r$. The mark equals the sum of all weights $w_A(x, y)$ for points $y$ within distance $r$ of point $x$. Thus,

\[ m_x(r) = \sum_{y : y \neq x} w_A(x, y)\, I\{\|y - x\| < r\} \tag{1-23} \]

and is the total contribution by all points within distance $r$ of point $x$ to the estimate of $\lambda^2 |A| K(r)$, where $\lambda$ is the intensity of the process and $|A|$ is the area of the observed region. Once marks are applied, points are resampled using one of the various resampling strategies. For the example shown in Figure 1-3, information regarding the spatial locations of points $y$ and $z$ is recorded in the mark given to $x$. Here, the


More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

Summary statistics for inhomogeneous spatio-temporal marked point patterns

Summary statistics for inhomogeneous spatio-temporal marked point patterns Summary statistics for inhomogeneous spatio-temporal marked point patterns Marie-Colette van Lieshout CWI Amsterdam The Netherlands Joint work with Ottmar Cronie Summary statistics for inhomogeneous spatio-temporal

More information

Discriminant analysis and supervised classification

Discriminant analysis and supervised classification Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical

More information

Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara

Statistical Perspectives on Geographic Information Science. Michael F. Goodchild University of California Santa Barbara Statistical Perspectives on Geographic Information Science Michael F. Goodchild University of California Santa Barbara Statistical geometry Geometric phenomena subject to chance spatial phenomena emphasis

More information

The Not-Formula Book for C2 Everything you need to know for Core 2 that won t be in the formula book Examination Board: AQA

The Not-Formula Book for C2 Everything you need to know for Core 2 that won t be in the formula book Examination Board: AQA Not The Not-Formula Book for C Everything you need to know for Core that won t be in the formula book Examination Board: AQA Brief This document is intended as an aid for revision. Although it includes

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

How do we compare the relative performance among competing models?

How do we compare the relative performance among competing models? How do we compare the relative performance among competing models? 1 Comparing Data Mining Methods Frequent problem: we want to know which of the two learning techniques is better How to reliably say Model

More information

Spatial Models: A Quick Overview Astrostatistics Summer School, 2009

Spatial Models: A Quick Overview Astrostatistics Summer School, 2009 Spatial Models: A Quick Overview Astrostatistics Summer School, 2009 Murali Haran Department of Statistics Penn State University May 20, 2009 1 Spatial Data Beginning statistics: Data are assumed to be

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Last Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression

Last Time. Today. Bayesian Learning. The Distributions We Love. CSE 446 Gaussian Naïve Bayes & Logistic Regression CSE 446 Gaussian Naïve Bayes & Logistic Regression Winter 22 Dan Weld Learning Gaussians Naïve Bayes Last Time Gaussians Naïve Bayes Logistic Regression Today Some slides from Carlos Guestrin, Luke Zettlemoyer

More information

Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr.

Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance ECON 509. Dr. Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing this chapter, you should be able

More information

2010 HSC NOTES FROM THE MARKING CENTRE MATHEMATICS EXTENSION 1

2010 HSC NOTES FROM THE MARKING CENTRE MATHEMATICS EXTENSION 1 Contents 2010 HSC NOTES FROM THE MARKING CENTRE MATHEMATICS EXTENSION 1 Introduction... 1 Question 1... 1 Question 2... 2 Question 3... 3 Question 4... 4 Question 5... 5 Question 6... 5 Question 7... 6

More information

Stat 587: Key points and formulae Week 15

Stat 587: Key points and formulae Week 15 Odds ratios to compare two proportions: Difference, p 1 p 2, has issues when applied to many populations Vit. C: P[cold Placebo] = 0.82, P[cold Vit. C] = 0.74, Estimated diff. is 8% What if a year or place

More information

Inferences about Parameters of Trivariate Normal Distribution with Missing Data

Inferences about Parameters of Trivariate Normal Distribution with Missing Data Florida International University FIU Digital Commons FIU Electronic Theses and Dissertations University Graduate School 7-5-3 Inferences about Parameters of Trivariate Normal Distribution with Missing

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing

Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing Statistical Inference: Estimation and Confidence Intervals Hypothesis Testing 1 In most statistics problems, we assume that the data have been generated from some unknown probability distribution. We desire

More information

A Practitioner s Guide to Cluster-Robust Inference

A Practitioner s Guide to Cluster-Robust Inference A Practitioner s Guide to Cluster-Robust Inference A. C. Cameron and D. L. Miller presented by Federico Curci March 4, 2015 Cameron Miller Cluster Clinic II March 4, 2015 1 / 20 In the previous episode

More information

Bias-Variance Tradeoff

Bias-Variance Tradeoff What s learning, revisited Overfitting Generative versus Discriminative Logistic Regression Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University September 19 th, 2007 Bias-Variance Tradeoff

More information

Unit 6 Quadratic Relations of the Form y = ax 2 + bx + c

Unit 6 Quadratic Relations of the Form y = ax 2 + bx + c Unit 6 Quadratic Relations of the Form y = ax 2 + bx + c Lesson Outline BIG PICTURE Students will: manipulate algebraic expressions, as needed to understand quadratic relations; identify characteristics

More information

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner

Econometrics II. Nonstandard Standard Error Issues: A Guide for the. Practitioner Econometrics II Nonstandard Standard Error Issues: A Guide for the Practitioner Måns Söderbom 10 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom,

More information

Supplement A. The relationship of the proportion of populations that replaced themselves

Supplement A. The relationship of the proportion of populations that replaced themselves ..8 Population Replacement Proportion (Year t).6.4 Carrying Capacity.2. 2 3 4 5 6 7.6.5 Allee.4 Threshold.3.2. 5 5 2 25 3 Moth Density (/trap) in Year t- Supplement A. The relationship of the proportion

More information

Bootstrapping Dependent Data in Ecology

Bootstrapping Dependent Data in Ecology Bootstrapping Dependent Data in Ecology Mark L Taper Department of Ecology 310 Lewis Hall Montana State University/ Bozeman Bozeman, MT 59717 < taper@ rapid.msu.montana.edu> Introduction I have two goals

More information

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION

Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION Eric Shou Stat 598B / CSE 598D METHODS FOR MICRODATA PROTECTION INTRODUCTION Statistical disclosure control part of preparations for disseminating microdata. Data perturbation techniques: Methods assuring

More information

An Introduction to Spatial Statistics. Chunfeng Huang Department of Statistics, Indiana University

An Introduction to Spatial Statistics. Chunfeng Huang Department of Statistics, Indiana University An Introduction to Spatial Statistics Chunfeng Huang Department of Statistics, Indiana University Microwave Sounding Unit (MSU) Anomalies (Monthly): 1979-2006. Iron Ore (Cressie, 1986) Raw percent data

More information

Practical Solutions to Behrens-Fisher Problem: Bootstrapping, Permutation, Dudewicz-Ahmed Method

Practical Solutions to Behrens-Fisher Problem: Bootstrapping, Permutation, Dudewicz-Ahmed Method Practical Solutions to Behrens-Fisher Problem: Bootstrapping, Permutation, Dudewicz-Ahmed Method MAT653 Final Project Yanjun Yan Syracuse University Nov. 22, 2005 Outline Outline 1 Introduction 2 Problem

More information

Envelope tests for spatial point patterns with and without simulation

Envelope tests for spatial point patterns with and without simulation Envelope tests for spatial point patterns with and without simulation Thorsten Wiegand, 1,2, Pavel Grabarnik, 3 and Dietrich Stoyan 4 1 Department of Ecological Modelling, Helmholtz Centre for Environmental

More information

Types of Spatial Data

Types of Spatial Data Spatial Data Types of Spatial Data Point pattern Point referenced geostatistical Block referenced Raster / lattice / grid Vector / polygon Point Pattern Data Interested in the location of points, not their

More information

Why do we Care About Forest Sampling?

Why do we Care About Forest Sampling? Fundamentals of Forest Sampling Why Forest Sampling Sampling Theory Terminology Why Use a Sample? Readings: Avery and Burkhart Sampling Chapters Elzinga (website) USDA Sampling Handbook 232 (website) Why

More information

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM

CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH. Awanis Ku Ishak, PhD SBM CHOOSING THE RIGHT SAMPLING TECHNIQUE FOR YOUR RESEARCH Awanis Ku Ishak, PhD SBM Sampling The process of selecting a number of individuals for a study in such a way that the individuals represent the larger

More information

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions

Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions Approximate and Fiducial Confidence Intervals for the Difference Between Two Binomial Proportions K. Krishnamoorthy 1 and Dan Zhang University of Louisiana at Lafayette, Lafayette, LA 70504, USA SUMMARY

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Tests for spatial randomness based on spacings

Tests for spatial randomness based on spacings Tests for spatial randomness based on spacings Lionel Cucala and Christine Thomas-Agnan LSP, Université Paul Sabatier and GREMAQ, Université Sciences-Sociales, Toulouse, France E-mail addresses : cucala@cict.fr,

More information

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics

Types of spatial data. The Nature of Geographic Data. Types of spatial data. Spatial Autocorrelation. Continuous spatial data: geostatistics The Nature of Geographic Data Types of spatial data Continuous spatial data: geostatistics Samples may be taken at intervals, but the spatial process is continuous e.g. soil quality Discrete data Irregular:

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

More information

HAPPY BIRTHDAY CHARLES

HAPPY BIRTHDAY CHARLES HAPPY BIRTHDAY CHARLES MY TALK IS TITLED: Charles Stein's Research Involving Fixed Sample Optimality, Apart from Multivariate Normal Minimax Shrinkage [aka: Everything Else that Charles Wrote] Lawrence

More information

INTRODUCTION TO ANALYSIS OF VARIANCE

INTRODUCTION TO ANALYSIS OF VARIANCE CHAPTER 22 INTRODUCTION TO ANALYSIS OF VARIANCE Chapter 18 on inferences about population means illustrated two hypothesis testing situations: for one population mean and for the difference between two

More information

Tree influence on understory vegetation: an edge correction and a conditional model

Tree influence on understory vegetation: an edge correction and a conditional model THESIS FOR THE DEGREE OF LICENTIATE OF PHILOSOPHY Tree influence on understory vegetation: an edge correction and a conditional model SHARON KÜHLMANN-BERENZON Department of Mathematical Statistics Chalmers

More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Lecture 2: Poisson point processes: properties and statistical inference

Lecture 2: Poisson point processes: properties and statistical inference Lecture 2: Poisson point processes: properties and statistical inference Jean-François Coeurjolly http://www-ljk.imag.fr/membres/jean-francois.coeurjolly/ 1 / 20 Definition, properties and simulation Statistical

More information

Constructing Prediction Intervals for Random Forests

Constructing Prediction Intervals for Random Forests Senior Thesis in Mathematics Constructing Prediction Intervals for Random Forests Author: Benjamin Lu Advisor: Dr. Jo Hardin Submitted to Pomona College in Partial Fulfillment of the Degree of Bachelor

More information

Practical Statistics

Practical Statistics Practical Statistics Lecture 1 (Nov. 9): - Correlation - Hypothesis Testing Lecture 2 (Nov. 16): - Error Estimation - Bayesian Analysis - Rejecting Outliers Lecture 3 (Nov. 18) - Monte Carlo Modeling -

More information

PhysicsAndMathsTutor.com

PhysicsAndMathsTutor.com 1. A researcher claims that, at a river bend, the water gradually gets deeper as the distance from the inner bank increases. He measures the distance from the inner bank, b cm, and the depth of a river,

More information

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance

Chapter 8 Student Lecture Notes 8-1. Department of Economics. Business Statistics. Chapter 12 Chi-square test of independence & Analysis of Variance Chapter 8 Student Lecture Notes 8-1 Department of Economics Business Statistics Chapter 1 Chi-square test of independence & Analysis of Variance ECON 509 Dr. Mohammad Zainal Chapter Goals After completing

More information

Habilitation à diriger des recherches. Resampling methods for periodic and almost periodic processes

Habilitation à diriger des recherches. Resampling methods for periodic and almost periodic processes Habilitation à diriger des recherches Resampling methods for periodic and almost periodic processes Anna Dudek Université Rennes 2 aedudek@agh.edu.pl diriger des recherches Resampling methods for periodic

More information

Bootstrap Tests: How Many Bootstraps?

Bootstrap Tests: How Many Bootstraps? Bootstrap Tests: How Many Bootstraps? Russell Davidson James G. MacKinnon GREQAM Department of Economics Centre de la Vieille Charité Queen s University 2 rue de la Charité Kingston, Ontario, Canada 13002

More information