School on Modelling Tools and Capacity Building in Climate and Public Health April Point Event Analysis

Similar documents
Cluster Analysis using SaTScan

Cluster Analysis using SaTScan. Patrick DeLuca, M.A. APHEO 2007 Conference, Ottawa October 16 th, 2007

A Spatio-Temporal Point Process Model for Firemen Demand in Twente

Bayesian Hierarchical Models

Spatial Point Pattern Analysis

GIST 4302/5302: Spatial Analysis and Modeling Point Pattern Analysis

An adapted intensity estimator for linear networks with an application to modelling anti-social behaviour in an urban environment

Point Pattern Analysis

A homogeneity test for spatial point patterns

Problem #1 #2 #3 #4 #5 #6 Total Points /6 /8 /14 /10 /8 /10 /56

An Introduction to SaTScan

An Introduction to Pattern Statistics

Outline. Practical Point Pattern Analysis. David Harvey s Critiques. Peter Gould s Critiques. Global vs. Local. Problems of PPA in Real World

Spatial point processes

Points. Luc Anselin. Copyright 2017 by Luc Anselin, All Rights Reserved

Supplementary Information

Lecture 14: Introduction to Poisson Regression

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Overview of Spatial analysis in ecology

Spatial Point Pattern Analysis

Identification of hotspots of rat abundance and their effect on human risk of leptospirosis in a Brazilian slum community

ESRI 2008 Health GIS Conference

EXPLORATORY SPATIAL DATA ANALYSIS OF BUILDING ENERGY IN URBAN ENVIRONMENTS. Food Machinery and Equipment, Tianjin , China

Inclusion of Non-Street Addresses in Cancer Cluster Analysis

Economics 671: Applied Econometrics Department of Economics, Finance and Legal Studies University of Alabama

Spatial Analysis I. Spatial data analysis Spatial analysis and inference

In matrix algebra notation, a linear model is written as

Spatial segregation and socioeconomic inequalities in health in major Brazilian cities. An ESRC pathfinder project

We know from STAT.1030 that the relevant test statistic for equality of proportions is:

Urbanization, Land Cover, Weather, and Incidence Rates of Neuroinvasive West Nile Virus Infections In Illinois

Analyzing the Geospatial Rates of the Primary Care Physician Labor Supply in the Contiguous United States

Everything is related to everything else, but near things are more related than distant things.

USING CLUSTERING SOFTWARE FOR EXPLORING SPATIAL AND TEMPORAL PATTERNS IN NON-COMMUNICABLE DISEASES

Spatial Pattern Analysis: Mapping Trends and Clusters

Exploring the Association Between Family Planning and Developing Telecommunications Infrastructure in Rural Peru

Long Island Breast Cancer Study and the GIS-H (Health)

Multifractal portrayal of the distribution of the Swiss population

Spatial Clusters of Rates

Spatial point process. Odd Kolbjørnsen 13.March 2017

Climate and Health Vulnerability & Adaptation Assessment Profile Manaus - Brazil

STATISTICAL DATA ANALYSIS AND MODELING OF SKIN DISEASES

Panel Data. March 2, () Applied Economoetrics: Topic 6 March 2, / 43

Luc Anselin Spatial Analysis Laboratory Dept. Agricultural and Consumer Economics University of Illinois, Urbana-Champaign

BIOMETRICS INFORMATION

Modelling the dynamics of dengue real epidemics

Change Allocation in Spatially-Explicit Models for Aedes aegypti Population Dynamics

LECTURE 10. Introduction to Econometrics. Multicollinearity & Heteroskedasticity

Statistical Analysis of Spatio-temporal Point Process Data. Peter J Diggle

Hassan Muhsan Khormi #a,b & Lalit Kumar a

MISSING or INCOMPLETE DATA

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Using AMOEBA to Create a Spatial Weights Matrix and Identify Spatial Clusters, and a Comparison to Other Clustering Algorithms

Lecture 1 Introduction to Multi-level Models

Spatial Analysis and Modeling (GIST 4302/5302) Guofeng Cao Department of Geosciences Texas Tech University

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

Chapter 6 Spatial Analysis

Getting started with spatstat

Econometrics of Panel Data

Concepts and Applications of Kriging. Eric Krause

CPSC 540: Machine Learning

Lecture 9 Point Processes

Fully Bayesian Spatial Analysis of Homicide Rates.

Introduction to Structural Equation Modeling

Roger S. Bivand Edzer J. Pebesma Virgilio Gömez-Rubio. Applied Spatial Data Analysis with R. 4:1 Springer

Modeling Spatial Relationships Using Regression Analysis

A MODEL FOR THE EFFECT OF TEMPERATURE AND RAINFALL ON THE POPULATION DYNAMICS AND THE EFFICIENCY OF CONTROL OF Aedes aegypit MOSQUITO

Financial Econometrics and Quantitative Risk Managenent Return Properties

Beside the point? Biodiversity research and spatial point process modelling

Markov Chains. Arnoldo Frigessi Bernd Heidergott November 4, 2015

CRISP: Capture-Recapture Interactive Simulation Package

Medical GIS: New Uses of Mapping Technology in Public Health. Peter Hayward, PhD Department of Geography SUNY College at Oneonta

Bayesian Spatial Health Surveillance

Using Instrumental Variables to Find Causal Effects in Public Health

Econometrics I. Professor William Greene Stern School of Business Department of Economics 1-1/40. Part 1: Introduction

Known unknowns : using multiple imputation to fill in the blanks for missing data

Might using the Internet while travelling affect car ownership plans of Millennials? Dr. David McArthur and Dr. Jinhyun Hong

A stationarity test on Markov chain models based on marginal distribution

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Community Health Needs Assessment through Spatial Regression Modeling

GIST 4302/5302: Spatial Analysis and Modeling

A brief introduction to mixed models

Probability: Why do we care? Lecture 2: Probability and Distributions. Classical Definition. What is Probability?

Modeling and Performance Analysis with Discrete-Event Simulation

Typical information required from the data collection can be grouped into four categories, enumerated as below.

THE EFFECT OF GEOGRAPHY ON SEXUALLY TRANSMITTED

Computer Science, Informatik 4 Communication and Distributed Systems. Simulation. Discrete-Event System Simulation. Dr.

The regression model with one fixed regressor cont d

Cluster investigations using Disease mapping methods International workshop on Risk Factors for Childhood Leukemia Berlin May

1Department of Demography and Organization Studies, University of Texas at San Antonio, One UTSA Circle, San Antonio, TX

ORF 245 Fundamentals of Engineering Statistics. Final Exam

Hypothesis Testing hypothesis testing approach

Measuring community health outcomes: New approaches for public health services research

Multilevel/Mixed Models and Longitudinal Analysis Using Stata

Heteroscedasticity 1

Log Gaussian Cox Processes. Chi Group Meeting February 23, 2016

Spatial Mapping of Dengue Incidence: A Case Study in Hulu Langat District, Selangor, Malaysia

Disease mapping with Gaussian processes

Lecture 2: Probability and Distributions

Outline. Introduction to SpaceStat and ESTDA. ESTDA & SpaceStat. Learning Objectives. Space-Time Intelligence System. Space-Time Intelligence System

Multiscale Analysis and Modelling of Aedes aegyti Population Spatial Dynamics

Transcription:

2453-12 School on Modelling Tools and Capacity Building in Climate and Public Health 15-26 April 2013 Point Event Analysis SA CARVALHO Marilia PROCC FIOCRUZ Avenida Brasil 4365 Rio De Janeiro 21040360 BRAZIL

Point Event Analysis Marilia Sá Carvalho Fundação Oswaldo Cruz MSC (Fiocruz) Time 1 / 49

Outline 1 Introduction 2 Exploratory Analysis 3 Hypothesis tests 4 Modelling with location 5 Dengue fever MSC (Fiocruz) Time 2 / 49

Introduction Outline 1 Introduction MSC (Fiocruz) Time 3 / 49

Introduction References Bailey,T.C. and Gatrell,A.C.. Interactive Spatial Data Analysis. Longman, 1996. Baddeley, A and Turne, R. Spatstat: an R package for analyzing spatial point patterns. Journal of Statistical Software, 12(6):1-42, 2005. Baddeley, A. Analysing spatial point patterns in R. Workshop Notes, Version 4.1, 2010. Available at http://www.spatstat.org/spatstat/. Wood, S.N.. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC Texts in Statistical Science Series, 2006. MSC (Fiocruz) Time 4 / 49

Introduction What is point data The simplest spatial data Event Coord X Coord Y 1 3.5 0.34 2 1.6 0.56 3 9.2 1.45 MSC (Fiocruz) Time 5 / 49

Introduction Definition An observed point pattern x is a realisation of a random point process X in two-dimensional space: The number of points is random The locations of the points is random The aim: to estimate parameters of the distribution of X Another: to estimate the effect of a given covariate on the observed pattern Or: to model the point process MSC (Fiocruz) Time 6 / 49

Point pattern state-of-art Introduction Techniques to fit realistic models to point pattern data are new (2000 s) Most applied work is based on hypothesis testing, to detect whether the point pattern is completely random We will try to cover both the classical tests and a bit of modelling The main reason to include this topic in this course is the availability of data (GPS!), that is generally poorly analysed MSC (Fiocruz) Time 7 / 49

Is this a point process? Introduction Location of dengue fever cases Number of Aedes aegypti collected in a random sample of households Results (Positive/Negative) of a dengue fever seroprevalence survey in a random sample of households Original dataset is the counts of leprosy by census tract. As those areas area very small, can we use the centroid? MSC (Fiocruz) Time 8 / 49

Introduction Marked point process Results (Positive/Negative) of a dengue fever seroprevalence survey Counts of leprosy by census tract y = (x 1,m 1 ),...,(x n,m n ),x i W,m i M MSC (Fiocruz) Time 9 / 49

Introduction Marks and Covariates Marks are response variable, integrating the pair of plane coordinates Time Positive/negative results of tests Counts Size of trees Covariates are explanatory variables Income Education Rainfall Temperature MSC (Fiocruz) Time 10 / 49

Introduction Intensity Average density of points per unit area May be constant uniform or homogeneous May vary from location to location inhomogeneous First step in analysing a point pattern MSC (Fiocruz) Time 11 / 49

Introduction Theory If X is homogeneous in any sub-region B of two-dimensional space the expected number of points in B is proportional to the area of B: E[N(XB)] = λarea(b) The constant of proportionality λ is the intensity If a point process is homogeneous, then the empirical density of points is: ˆλ = n(x) area(w) ˆλ is an unbiased estimator of the true intensity λ MSC (Fiocruz) Time 12 / 49

Introduction Complete Spatial Randomness The basic reference model of a random point pattern is the uniform Poisson point process in the plane with intensity λ Complete Spatial Randomness (CSR) Properties: the number of points falling in any region A has a Poisson distribution with mean λ the n points inside A are uniformly distributed inside A the contents of two regions A and B are independent Uniform Poisson process are the null model in a statistical test MSC (Fiocruz) Time 13 / 49

Introduction Stationary & Isotropic If the process is stationary: λ(x) = constant Invariant to translation If the process is isotropic: λ(x,y) = λ h (h=distance between x and y) Invariant to rotation Most hypothesis tests assume a stationary and isotropic process It is a global feature MSC (Fiocruz) Time 14 / 49

Introduction Interaction Stochastic dependence between the points Interaction generates either clusterisation or repulsion Distance between points are used to investigate interaction It is a local feature MSC (Fiocruz) Time 15 / 49

Exploratory Analysis Outline 2 Exploratory Analysis MSC (Fiocruz) Time 16 / 49

Exploratory Analysis Map of Points: External Causes Mortality MSC (Fiocruz) Time 17 / 49

Exploratory Analysis Map of Points The simplest! It allows brief inspection of spatial patterns Different types of events can be depicted However... events in human populations follow the demographic pattern MSC (Fiocruz) Time 18 / 49

Exploratory Analysis Different spatial patterns MSC (Fiocruz) Time 19 / 49

Exploratory Analysis Different spatial patterns MSC (Fiocruz) Time 20 / 49

Exploratory Analysis Different spatial patterns MSC (Fiocruz) Time 21 / 49

Exploratory Analysis Kernel MSC (Fiocruz) Time 22 / 49

Kernel Map: Homicides Exploratory Analysis MSC (Fiocruz) Time 23 / 49

Hypothesis tests Outline 3 Hypothesis tests MSC (Fiocruz) Time 24 / 49

Hypothesis tests Cluster detection A cluster is a group of events geographically limited in size and density that is improbable to be due to randomness (Knox) Causes of clusters: common source, contagion In general space and time concentrated To take into account: Various risk factors age, population density Place of living and place of working Latency period Two types of tests: focused around a suspected source generic MSC (Fiocruz) Time 25 / 49

Hypothesis tests Many tests for CSR χ 2 for quadrats Kolmogorov-Smirnov Maximum likelihood for Poisson processes Nearest neighbour distances: G function Ripley s K-function with Monte Carlo envelope MSC (Fiocruz) Time 26 / 49

Hypothesis tests Pairwise distances and K-function The observed pairwise distances r ij = x i x j in the data is a biased sample of pairwise distances in the point process, in favour of small distances Observed K-function: If CRS ˆK(s) = πs 2 Envelope dimulation ˆK(s) = 1 λn I(d ij < s) i j MSC (Fiocruz) Time 27 / 49

Hypothesis tests L-function Variance stabilised: L-function: ˆL(s) = ˆK(s) π If CSR E(L) = s MSC (Fiocruz) Time 28 / 49

Hypothesis tests K and L-function MSC (Fiocruz) Time 29 / 49

Hypothesis tests K and L-function MSC (Fiocruz) Time 30 / 49

Hypothesis tests Be careful K-function (and other tests F, G) assume the process is stationary Difference between the empirical and theoretical functions are not evidence of cluster, but may be just the variation in intensity over the large scale MSC (Fiocruz) Time 31 / 49

Hypothesis tests Focused tests The cluster is around a point or a line Is there an excess number of cases as compared to some control? Tests include a function of distance to the source MSC (Fiocruz) Time 32 / 49

Hypothesis tests Focused tests Fig.: Larynx Cancer: is it due to the incinerator? MSC (Fiocruz) Time 33 / 49

Hypothesis tests Finding where Local Indicators of Spatial Association (LISA) Scan tests Both will be discussed in Areal data class MSC (Fiocruz) Time 34 / 49

Modelling with location Outline 4 Modelling with location MSC (Fiocruz) Time 35 / 49

Modelling with location Statistical models Define a statistical distribution so that: functional form reflects some properties of interest terms of the probability distribution have an interpretation introduction of covariates is possible Gibbs point processes MSC (Fiocruz) Time 36 / 49

Modelling with location GAM Models For point pattern process we need cases and controls Just the distribution of points could me modelled only as a Gibbs point processes For human diseases, controls can be negative serology or samples of demographic census, but we do need controls It reduces to logistic regression, with a spatial term y s = s(coordx,coordy )+covs s ε MSC (Fiocruz) Time 37 / 49

Modelling with location GAM Models Consider a model with just the spatial term The spatial distribution of points could be modelled as a Gibbs point processes But in GAM setting we use another approach, comparing the spatial distribution of cases to controls For human diseases, controls can be negative serology or samples of demographic census, but we do need controls It reduces to logistic regression, with a spatial term Y Bernoulli(p i ) logit(p i ) = s(coordx,coordy )+ε MSC (Fiocruz) Time 38 / 49

Modelling with location GAM Models Each covariate included could explain the spatial distribution If explained, the map goes flatter The spatial term may interact with some factor covariate logit(p i ) = s(coordx,coordy )+covs s +ε MSC (Fiocruz) Time 39 / 49

Modelling with location Interaction Time-Space Fig.: Seroprevalence for Leptopirosis, Pau da Lima/Ba, two years follow-up MSC (Fiocruz) Time 40 / 49

Dengue fever Outline 5 Dengue fever MSC (Fiocruz) Time 41 / 49

Dengue fever Dengue fever Dengue is a mosquito-borne viral infection Four serotypes, no cross imunogenicity High frequency of asymptomatic infection, as shown by seroprevalence studies: 45.5% of schoolchildren in Niterói in 1987 29.2% of schoolchildren in Paracambi in 1997 26.6% in pre-schoolchildren Salvador in 1998 and 33.2% in 2000 No easy serotype identification based on IgG MSC (Fiocruz) Time 42 / 49

Dengue fever Dengue fever transmission Aedes aegypti is the most important dengue vector worldwide It is domestic it mates, feeds, rests, and lays eggs in and around human habitation Vector population is sensitive to rainfall and temperature Virus transmission varies with temperature MSC (Fiocruz) Time 43 / 49

Dengue fever Intra-city variation Seroprevalence varies among different areas inside the same city: MSC (Fiocruz) Time 44 / 49

Dengue fever Aim of the study 1 To identify potential high-risk intra-urban areas of dengue, using data collected at household level from survey 1 Siqueira-Junior,JB; Maciel, IJ; Barcellos, C; Souza, WV; Carvalho, MS; Nascimento, NE; Oliveira, RM; Morais-Neto, O; Martelli, CMT. Spatial point analysis based on dengue surveys at household level in central Brazil. BMC Public Health 8:361, 2008. MSC (Fiocruz) Time 45 / 49

Dengue fever The data 2581 participants of the 2002 survey Individual data: household coordinates (UTM) eventually more then one person in the same location positive or negative serology age sex school: Incomplete Basic less then 8 years at school Basic complete 8 years High School 11 years College or university nrooms number of rooms in the household ninhab number of people living in the same household MSC (Fiocruz) Time 46 / 49

Dengue fever The data Census tract level data: pop2000 population count ptrashcol % of households with regular trash collection phighschool % of head of household with complete high school meanincome average household income in minimum wages of census tract MSC (Fiocruz) Time 47 / 49

Dengue fever The data 0 40 80 120 10 20 30 40 age 0 40 80 20 60 meanincome ninhab 0 5 15 phighschool 10 30 ptrashcol 50 70 90 20 40 60 80 0 5 10 15 20 50 70 90 MSC (Fiocruz) Time 48 / 49

Dengue fever The data 0 500 1000 1500 College High School Basic Incomp.Basic 0 500 1000 1500 Female Male Positive Negative Positive Negative MSC (Fiocruz) Time 49 / 49