Overview of Statistical Analysis of Spatial Data


Geog 210C: Introduction to Spatial Data Analysis

Phaedon C. Kyriakidis
www.geog.ucsb.edu/ phaedon
Department of Geography
University of California Santa Barbara
Santa Barbara, CA 936-6
phaedon@geog.ucsb.edu
Spring Quarter 9

Outline
- Preliminaries
- Types of Spatial Data
- Why Spatial Statistics?
- Points to Remember

Ph. Kyriakidis (UCSB) Geog 210C Spring 9

Preliminaries: Introduction & Objectives

Spatial data
- geo-referenced attribute measurements; each measurement is associated with a location (point) or an entity (region or object) in geographical (or other) space
- attribute measurement scale can be continuous or discrete, e.g., chemical concentration, soil types, disease occurrences
- sample locations can have a regular or irregular spatial arrangement, i.e., data locations on a raster (regular lattice) or scattered in space
- the domain informed by a measurement is called the sample unit or support, e.g., points, pixels, polygons
- spatial data often have an additional temporal component: dynamic attribute evolution in space and time, spatiotemporal support

Objectives of this handout
- to provide a brief overview of types of spatial data
- to highlight the role of spatial statistics in analyzing data of each type

Preliminaries: Stages in Spatial Data Analysis

Exploratory analysis
- explore spatial data using cartographic (or other visual) representations
- statistical analysis for detecting possible sub-populations, outliers, trends, relationships with neighboring values or other spatial variables

Modeling or confirmatory analysis
- establish parametric or non-parametric model(s) characterizing attribute spatial distribution
- estimate model parameters from data; evaluate their statistical significance; predict attribute values at other locations and/or future time instants

Notes
- any processing of spatial data, e.g., filtering or interpolation, affects any inference made from them
- boundaries between the above stages are not always clear-cut

Types of Spatial Data: Attributes Varying Continuously in Space

Characteristics
- also known (unfortunately) as geostatistical data, e.g., temperature, rainfall, elevation, population density
- measurements of nominal scale, e.g., land cover types, or interval/ratio scale, e.g., sea floor depth
- often, sparse samples are available only at a fixed set of locations

[Figure: Bay Area rain gauge precipitation (mm/day), 1981-82 NDJ average]

Types of Spatial Data: Area or Lattice Data

Characteristics
- attributes take values only at a fixed set of areas or zones, e.g., administrative districts, pixels of satellite images
- typically, all possible locations have been sampled; no attribute values between sampling units (unless there are missing values)

[Figure: SIDS cases in North Carolina, from 1979 to 1984]

The distinction between spatially continuous and area (lattice) data is not always clear-cut, particularly when the latter are derived via aggregation from the former.

Types of Spatial Data: Point Pattern Data

Characteristics
- series of point locations with recorded events, e.g., locations of trees, disease or crime incidents
- point locations correspond to all possible events (mapped point pattern), or to a subset (sampled point pattern)
- attribute values also possible at same locations, e.g., tree diameter, magnitude of earthquakes (marked point pattern)

[Figure: Lansing Woods tree locations (maple, hickory)]
[Figure: Bay Area earthquake magnitudes, 1962-1981]

Types of Spatial Data: Spatial Interaction or Network Data

Characteristics
- attributes relate to pairs of points or areas: flows from origins to destinations, e.g., patient flows from residences to hospitals
- less tangible flows, e.g., information, could be defined

Analysis objectives
- modeling of flow patterns, i.e., finding relationships between observed flows and explanatory variables, e.g., number of trips from origins to destinations as a function of income
- classical analysis methods focus on patterns of aggregate interaction, rather than on individuals themselves; more recent focus is placed on understanding individual preferences and choice modeling
- spatial location/allocation problems, and more generally spatial optimization problems, typically involve network data

Methods for analyzing spatial interaction data are not covered in this course.

Why Spatial Statistics? Univariate Statistics and Spatial Pattern

Two 1D attribute profiles with the same histogram:

[Figure: two panels, each titled "1D population", plotting value against x]

Shortcomings of univariate statistics
- univariate statistics, e.g., average, variance, histogram, do not suffice to describe spatial pattern; the spatial arrangement of attribute values matters, too

Spatial auto-correlation: an aspect of spatial pattern
- attribute values measured at nearby supports tend to be more similar than those measured at distant supports; Tobler's 1st law(?) of Geography

Why Spatial Statistics? Role of Spatial Statistics in Spatial Data Analysis

Spatially continuous data
- model attribute spatial variation over the study area from sampled point values
- predict attribute values at non-sampled locations (accounting for covariates)

Area (lattice) data
- detect and model spatial patterns or trends in area values; no prediction at non-sampled locations, unless smoothing of existing values or imputation of missing values is required
- use covariates or relationships with adjacent attribute values for inference, e.g., disease rates in light of socioeconomic variables

Point patterns
- detect clustering or regularity, as opposed to complete randomness, of event locations in space and/or time
- if clustering is detected, investigate possible relations between clusters and nearby sources or pertinent covariates
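To make the earlier point about univariate statistics concrete, here is a minimal sketch that builds two 1D profiles from exactly the same values, one in random order and one sorted into a smooth trend. The profiles, the seed, and the helper `lag1_autocorrelation` are illustrative assumptions, not the data or method behind the handout's figure.

```python
import random
from statistics import mean, pstdev


def lag1_autocorrelation(profile):
    """Lag-1 autocorrelation of a regularly spaced 1D profile."""
    m = mean(profile)
    numerator = sum((a - m) * (b - m) for a, b in zip(profile[:-1], profile[1:]))
    denominator = sum((v - m) ** 2 for v in profile)
    return numerator / denominator


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Profile A: values in random order along x (no spatial structure).
profile_a = [rng.gauss(0.0, 1.0) for _ in range(200)]

# Profile B: exactly the same values, rearranged into a smooth trend.
profile_b = sorted(profile_a)

# Same set of values, hence the same histogram, mean and standard deviation.
print(mean(profile_a), mean(profile_b))
print(pstdev(profile_a), pstdev(profile_b))

# The spatial arrangement differs: A is expected near 0, B near 1.
print(lag1_autocorrelation(profile_a))
print(lag1_autocorrelation(profile_b))
```

Only the last two numbers distinguish the profiles, which is why a measure of spatial arrangement is needed alongside the histogram.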

Why Spatial Statistics? Spatial Versus Non-Spatial Statistics

Classical statistics
- samples are assumed to be realizations of independent and identically distributed (iid) random variables
- most hypothesis testing procedures call for samples from iid random variables
- problems with inference and hypothesis testing in a spatial setting

Spatial statistics
- multivariate statistics in a spatial/temporal context: each observation is viewed as a realization from a different random variable, but such random variables are auto-correlated in space and/or time
- each sample is not an independent piece of information, precisely because it is redundant with other samples (due to the corresponding random variables being auto-correlated)
- auto- and cross-correlation (in space and/or time) is explicitly accounted for to establish confidence intervals for hypothesis testing

One can always choose to analyze spatial data with non-spatial statistics; problems arise when confidence intervals need to be reported...

Why Spatial Statistics? Software for Statistical Analysis of Spatial Data

GIS-based
- ESRI's Spatial Analyst, Geostatistical Analyst...
- opt for close or loose coupling with specialized external packages when specific functionalities are missing from a GIS

Statistical packages
- extremely versatile in modeling; recent improvements in visualization
- R and SpaceStat/GeoDa most popular in Geography

Image processing packages
- mature technology, lots of new developments
- IDL and Matlab most popular in Remote Sensing and Electrical Engineering

Access to source code written in a straightforward programming language is critical for research development in an academic environment...
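Returning to the contrast between classical and spatial statistics above, the following sketch illustrates why redundancy between samples matters once confidence intervals are reported. It assumes n equally spaced samples with common variance `sigma2` and a correlation that decays as `rho**lag`; this correlation model and the function name `variance_of_mean` are assumptions chosen for illustration, not part of the handout.

```python
def variance_of_mean(n, sigma2, rho):
    """Variance of the mean of n equally spaced samples with correlation rho**lag."""
    # Var(mean) = (sigma2 / n**2) * sum over all pairs (i, j) of corr(i, j).
    correlation_sum = sum(rho ** abs(i - j) for i in range(n) for j in range(n))
    return sigma2 * correlation_sum / n ** 2


n, sigma2 = 100, 1.0
var_iid = variance_of_mean(n, sigma2, rho=0.0)      # reduces to sigma2 / n
var_spatial = variance_of_mean(n, sigma2, rho=0.8)  # auto-correlated samples

# Number of independent samples that would give the same variance of the mean.
effective_n = sigma2 / var_spatial

print(var_iid, var_spatial, effective_n)
```

Under these assumptions the 100 auto-correlated samples carry roughly as much information about the mean as a dozen independent ones, so an interval computed with the iid formula would be too narrow.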

Some Issues Specific to Spatial Data Analysis: A first look

- differences from time series analysis:
  1. irregular sampling
  2. lack of clear indexing; no notion of past-present-future
  3. auto- and cross-correlation in multiple directions
- multi-source data associated with different spatial/temporal resolutions
- data often reported as aggregates over arbitrarily defined zones/areas; statistics of aggregates are not the same as those of individuals:
  1. Modifiable Areal Unit Problem (MAUP)
  2. Ecological Fallacy or Inference Problem (EIP)
- edge/boundary effects: samples near the edges of a study region have fewer neighbors than samples in the interior; near-edge samples might bear the effects of different spatial processes
- spatial process models typically distinguish between first- and second-order effects, i.e., between environmental controls and interactions (distinction between the two not always clear-cut)

Modifiable Areal Unit Problem: Aggregation Effect

Two spatial variables and their univariate/bivariate statistics

[Figure: Spatial Variable #1 and Spatial Variable #2 at the original resolution. ρ12 = .83; Variable #1: m = 43.14, s = .17; Variable #2: m = 42.92, s = 18.32]
[Figure: the same variables under Aggregation Scheme #1. ρ12 = .9; Variable #1: m = 43.14, s = 16.79; Variable #2: m = 42.92, s = 12.65]

Statistics and relationships between spatial attributes depend on aggregation extent.
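The aggregation effect can be reproduced with a small simulation. The sketch below uses hypothetical data (a shared trend plus independent noise on a 16 x 16 grid), not the grids shown in the handout, and the helper names `pearson` and `aggregate` are illustrative. It averages both variables over 2 x 2 blocks and compares the statistics before and after.

```python
import random
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def aggregate(grid, size):
    """Average a square grid over non-overlapping size x size blocks (flat list)."""
    n = len(grid)
    return [
        mean(grid[r + dr][c + dc] for dr in range(size) for dc in range(size))
        for r in range(0, n, size)
        for c in range(0, n, size)
    ]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 16
# Hypothetical variables: the same smooth trend (r + c) plus independent noise.
var1 = [[r + c + rng.gauss(0, 6) for c in range(n)] for r in range(n)]
var2 = [[r + c + rng.gauss(0, 6) for c in range(n)] for r in range(n)]

fine1 = [v for row in var1 for v in row]
fine2 = [v for row in var2 for v in row]
coarse1 = aggregate(var1, 2)
coarse2 = aggregate(var2, 2)

print("fine:  ", mean(fine1), pstdev(fine1), pearson(fine1, fine2))
print("coarse:", mean(coarse1), pstdev(coarse1), pearson(coarse1, coarse2))
```

With equal-size blocks the mean is preserved and the standard deviation can only shrink. In this setup the correlation is also expected to rise, because averaging removes much of the independent noise while keeping the shared trend.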

Modifiable Areal Unit Problem: Zonation Effect

Upscaling spatial variables using two different aggregation schemes

[Figure: the two variables under Aggregation Scheme #1. ρ12 = .9; Variable #1: m = 43.14, s = 16.79; Variable #2: m = 42.92, s = 12.65]
[Figure: the two variables under Aggregation Scheme #2. ρ12 = .94; Variable #1: m = 43.14, s = 15.23; Variable #2: m = 42.92, s = 15.59]

For a given aggregation extent, statistics and relationships between spatial attributes depend on which individual values are aggregated and how.

Ecological Inference Problem I

Downscaling spatial variables

[Figure: observed (aggregated) variables. ρ12 = .9; Variable #1: m = 43.14, s = 16.79; Variable #2: m = 42.92, s = 12.65]
[Figure: Spatial Variable #1 and Spatial Variable #2 at the finer resolution. ρ12 = .83; Variable #1: m = 43.14, s = .17; Variable #2: m = 42.92, s = 18.32]

Statistics and relationships between spatial variables at a finer spatial resolution differ from those derived at the original coarse resolution.
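The zonation effect described above can be shown with a deliberately extreme toy example. The two 4 x 4 fields below are hypothetical, and `zone_means` is an illustrative helper. Both zonations use 8 zones of 2 cells each, differing only in whether a zone joins neighbours along a row or along a column.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def zone_means(grid, height, width):
    """Average a grid over non-overlapping height x width zones (flat list)."""
    return [
        mean(grid[r + dr][c + dc] for dr in range(height) for dc in range(width))
        for r in range(0, len(grid), height)
        for c in range(0, len(grid[0]), width)
    ]


n = 4
# Hypothetical fields: variable 1 follows the row index and alternates by 10
# between adjacent columns; variable 2 follows the row index only.
var1 = [[r + 10 * (c % 2) for c in range(n)] for r in range(n)]
var2 = [[r for c in range(n)] for r in range(n)]

zonations = {
    "along rows": (1, 2),     # joins column neighbours: alternation averaged out
    "along columns": (2, 1),  # joins row neighbours: alternation survives
}

results = {}
for name, (height, width) in zonations.items():
    z1 = zone_means(var1, height, width)
    z2 = zone_means(var2, height, width)
    results[name] = (mean(z1), pstdev(z1), pearson(z1, z2))
    print(name, results[name])
```

Both zonations return the same mean, yet the standard deviation of variable 1 and its correlation with variable 2 differ sharply, even though the number and size of zones are identical.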

Ecological Inference Problem II

Under-determined inverse problem

[Figure: observed (aggregated) variables. ρ12 = .9; Variable #1: m = 43.14, s = 16.79; Variable #2: m = 42.92, s = 12.65]
[Figure: an alternative set of fine-resolution values for Spatial Variable #1 and Spatial Variable #2. ρ12 = .21; Variable #1: m = 43.14, s = .17; Variable #2: m = 42.92, s = 18.32]

Multiple combinations of fine spatial resolution attribute values can lead to the same aggregate values at a coarser resolution (equi-finality).

First- Versus Second-Order Effects

[Figure: 1D population, value against x]

First-order effects
- spatial pattern explained by environmental (or extrinsic) factors, e.g., attribute value y(x) is high at location x due to another attribute value y'(x) at the same location x, or another attribute value y'(x') at a nearby location x'

Second-order effects
- spatial pattern explained by interaction (or intrinsic) factors, e.g., attribute value y(x) is low at location x due to another (same-attribute) value y(x') at a nearby location x', provided both locations x and x' lie in the same environment
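The equi-finality noted under Ecological Inference Problem II can be checked on a small 1D transect. The values below are hypothetical, and the helpers `pair_means` and `swap_within_pairs` are illustrative. Swapping the two fine values inside each aggregation unit leaves every aggregate unchanged while altering the fine-resolution relationship between the variables.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def pair_means(values):
    """Aggregate a 1D transect into non-overlapping pairs of neighbours."""
    return [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]


def swap_within_pairs(values):
    """Exchange the two fine values inside every pair; pair means are unchanged."""
    swapped = list(values)
    for i in range(0, len(values), 2):
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped


# Hypothetical fine-resolution transects (not the handout's grids).
fine1 = [1, 5, 2, 8, 3, 9, 6, 10]
fine2 = [2, 6, 1, 9, 4, 8, 5, 11]
alt1 = swap_within_pairs(fine1)

print(pair_means(fine1) == pair_means(alt1))  # identical aggregates
print(pearson(fine1, fine2))                  # strong positive correlation
print(pearson(alt1, fine2))                   # negative correlation
```

An analyst who sees only the aggregates cannot tell `fine1` from `alt1`, which is why inference about fine-resolution relationships from coarse data is under-determined.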

Points to Remember: Recap I

Spatial data
- set of geo-referenced measurements with attribute values and coordinates (topology & context also important)
- data types:
  1. spatial point patterns (events)
  2. data continuously varying in space (fields)
  3. area or lattice data (objects)
  4. spatial interaction data (flows)

Spatial data analysis objectives
- exploratory analysis: looking for patterns/relationships
- confirmatory analysis: establishing spatial process models from spatial patterns + model parameter estimation

Points to Remember: Recap II

Spatial statistics
- statistical framework for analysis and modeling of spatial data: accounts for spatial auto-correlation and scale effects; allows assessing uncertainty in spatial analysis results
- multivariate statistics tailored to the analysis of spatial data

Issues to be aware of
- any spatial analysis result is tied to a particular observation scale, i.e., to the particular sample support(s); the Modifiable Areal Unit Problem (MAUP) and the Ecological Inference Problem (EIP) are consequences of this
- spatial process models typically distinguish between:
  - first-order effects or environmental controls
  - second-order effects or interactions (spatial auto-correlation)
- this dichotomy does not apply to actual data, only to data generating models...