OPEN GEODA WORKSHOP / CRASH COURSE FACILITATED BY M. KOLAK

WHAT IS GEODA?
A software program that serves as an introduction to spatial data analysis. It is free and open source; the source code is available under the GNU General Public License (GPL). As of the final version, it runs on Windows, Mac OS, and Linux. It can open shapefiles or tables.

WHAT IS GEODA?
Developed by Dr. Luc Anselin's team, with applications in spatial econometrics and epidemiology. Supported by the National Science Foundation and the Center for Spatially Integrated Social Science. It is the flagship of the GeoDa Center at Arizona State University: geodacenter.asu.edu/projects/opengeoda

PART I
Open a file in GeoDa
Make different choropleth maps
Open a table in GeoDa
Link between the table and maps
Navigate, sort, select, and query data in the table
Create a new variable
Calculate a raw rate for the new variable
Save as a new shapefile

OPENING A FILE IN GEODA
Open GeoDa from the desktop, choose File/Open Shapefile, and open SIDS.shp. There are many ways to change the map you see in the view: right-click on the display and change the Category, or go to Map/ in the navigation menu.

CHOROPLETH MAPS
"A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the measurement of the statistical variable being displayed on the map." (Wikipedia)
Quantile map: create a quantile map for NWBIR74 and SID74 (using the defaults).
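
As a rough illustration of the classification behind a quantile map (not GeoDa's own code), the Python sketch below ranks the areas by value and splits the ranks into k classes of about n/k observations each. The function name `quantile_classes`, the default of k = 4, and the way ties are split by sort order are assumptions.

```python
def quantile_classes(values, k=4):
    """Return a class index 0..k-1 for each value, by rank (sketch of a quantile map)."""
    n = len(values)
    # Indices of the observations ordered by value; ties keep their original order.
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, i in enumerate(order):
        # Rank r goes to class floor(r * k / n), so each class gets about n / k areas.
        classes[i] = rank * k // n
    return classes
```

For eight areas and k = 4, each class receives two areas: `quantile_classes([5, 1, 3, 2, 8, 7, 4, 6])` returns `[2, 0, 1, 0, 3, 3, 1, 2]`.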

CHOROPLETH MAPS
Percentile map: create a percentile map for NWBIR74 and SID74 (using the defaults).
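
A percentile map differs from a quantile map in that its classes have unequal sizes and emphasize the tails. The sketch below assumes the six categories usually shown in GeoDa's percentile map legend (<1%, 1%-10%, 10%-50%, 50%-90%, 90%-99%, >99%) and computes the breakpoints with Python's `statistics.quantiles`. The interpolation method and the tie rule are assumptions, so class boundaries may differ slightly from GeoDa's.

```python
import statistics

LABELS = ["<1%", "1%-10%", "10%-50%", "50%-90%", "90%-99%", ">99%"]

def percentile_classes(values):
    """Label each value with one of six percentile-map categories (sketch)."""
    # 99 cut points by linear interpolation: cuts[p - 1] is the p-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    breaks = [cuts[0], cuts[9], cuts[49], cuts[89], cuts[98]]  # 1st, 10th, 50th, 90th, 99th
    # The class is the number of breakpoints a value strictly exceeds,
    # so a value equal to a breakpoint stays in the lower class.
    return [LABELS[sum(v > b for b in breaks)] for v in values]
```

With the 200 values 1 to 200, only the two smallest values fall in <1% and only the two largest in >99%, which is why this map type needs a reasonably large number of observations.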

WORKING WITH DATA IN TABLE
Navigation, selection, and sorting (live demos). Linking between the data table and the map. Moving the selection to the top. Queries: use the Selection Dialog to select something specific. The selection can be added as a variable, for example by assigning the value 1 where the query is true. The selection can also be moved to the top.

WORKING WITH DATA IN TABLE
Creating a variable: right-click and choose Add Variable. Name your new variable SIDR74; it will record the raw rate of SIDS occurrence in 1974 relative to the specified population.

WORKING WITH DATA IN TABLE
Raw rate: the raw rate is simply the rate (a proportion, often expressed as a percentage). It is built from an event variable (numerator) and a base variable (denominator).
Event and Base variables: for rates, the Event field refers to the numerator and the Base field to the denominator. The Event field can be thought of as a count field, since it refers to variables such as counts, dollar values, or indices. The Base field holds the reference universe for the Event variable; because it is the denominator, it cannot contain any zero values. For instance, in the St. Louis homicide dataset, an Event variable is HC7984 (homicide count, 1979-84) while a Base variable is PO7984 (population total, 1979-84).
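
The raw-rate calculation itself is a single division per area. Here is a minimal sketch in Python, assuming the Event and Base columns are already available as lists. The function `raw_rate` and its `scale` parameter (for rescaling, e.g. per 100,000 births) are hypothetical names, and the zero check enforces the rule that the Base variable cannot contain zeros.

```python
def raw_rate(events, base, scale=1.0):
    """Raw rate = event / base for each area, optionally multiplied by `scale`."""
    if any(b == 0 for b in base):
        # The base is the denominator, so a zero would cause a division by zero.
        raise ValueError("base variable must not contain zero values")
    return [scale * e / b for e, b in zip(events, base)]
```

With hypothetical counts of 2 and 5 events among 1,000 and 4,000 births, `raw_rate([2, 5], [1000, 4000], scale=100_000)` returns `[200.0, 125.0]`, i.e. 200 and 125 events per 100,000 births.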

WORKING WITH DATA IN TABLE
Creating a variable: in the Variable Calculation tool, assign the SID74Rate variable to equal the Raw Rate.

WORKING WITH DATA IN TABLE
Rescale the rate to events per 100,000 births (multiply by 100,000).

WORKING WITH DATA IN TABLE
Confirm the changes in the table.

WORKING WITH DATA IN TABLE
Under File, save as a new shapefile (with a new name).

PRACTICE
Create a SIDS raw rate variable for 1979 and save the changes as a new shapefile. Try out other map options using the Category options.

PART II
Introduction to exploratory data analysis
Make a histogram and a box plot from the data
Investigate outliers
Make rate maps (raw and excess)
Make an empirical Bayes (EB) smoothed map
Make a spatial weights file for your data

EDA BASICS - HISTOGRAM
Create a histogram for a variable: click the histogram icon in the navigation toolbar and select a variable (e.g., the calculated SIDS rate). Right-click on the histogram to adjust the display, for example to change the number of intervals. Link the histogram to the map by clicking on the bars of interest; the corresponding areas are highlighted on the map.
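
To see what changing the number of intervals does, here is a sketch of an equal-width histogram in Python: the range from the minimum to the maximum is cut into `intervals` bins of equal width and each value is counted once. The function name is hypothetical, and the edge rule (a value on a boundary goes to the upper bin, except the maximum) is an assumption that may not match GeoDa exactly.

```python
def histogram_counts(values, intervals):
    """Count observations in `intervals` equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / intervals
    counts = [0] * intervals
    for v in values:
        # Bin index from the distance above the minimum; if all values are
        # equal (width 0), everything goes into the first bin.
        idx = int((v - lo) / width) if width > 0 else 0
        # The maximum lands exactly on the upper edge, so clamp it into the last bin.
        counts[min(idx, intervals - 1)] += 1
    return counts
```

`histogram_counts(list(range(11)), 5)` returns `[2, 2, 2, 2, 3]`; rerunning with a different number of intervals redistributes the same values.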

EDA BASICS - BOX PLOT
Create a box plot for a variable: click the box plot icon in the navigation toolbar and select a variable (e.g., the calculated SIDS rate). Right-click on the box plot to adjust the display; the hinge can be set to 1.5 or 3. You can also create a map from the box plot data.

EDA BASICS - BOX PLOT
The box plot depicts the non-spatial distribution of a variable, based on the observations sorted by value. The value in parentheses in the upper right corner is the number of observations. The plot shows the median, first quartile, and third quartile of the distribution (50%, 25%, 75%) as well as any outliers. Outliers lie more than a given multiple of the interquartile range (the difference in value between the 75% and 25% observations) above the third quartile or below the first quartile. The standard multiples are 1.5 and 3 times the interquartile range.
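
The box plot rules above can be written out as a short Python sketch. Quartiles are computed here with `statistics.quantiles(..., method="inclusive")` (linear interpolation), which is an assumption; GeoDa may use a slightly different quartile definition. The `hinge` argument plays the role of the 1.5 or 3 multiple, and the function name is hypothetical.

```python
import statistics

def box_plot_summary(values, hinge=1.5):
    """Quartiles, fences, and outliers for a box plot with the given hinge."""
    # Quartiles by linear interpolation ("inclusive" method).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                     # interquartile range
    lower_fence = q1 - hinge * iqr    # below this: low outlier
    upper_fence = q3 + hinge * iqr    # above this: high outlier
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3,
            "fences": (lower_fence, upper_fence), "outliers": outliers}
```

For the hypothetical values 1 through 9 plus 100, the quartiles are 3.25, 5.5, and 7.75, the 1.5-hinge fences are -3.5 and 14.5, and only 100 is flagged; with `hinge=3` the upper fence moves to 21.25 and 100 is still an outlier.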

EDA BASICS
Explore the data further by clicking on interesting areas, outliers, etc. Then change the hinge and explore again.

BASIC RATE MAPPING - Raw Rate Map
Keep your box plot map (hinge 1.5) open. Create a new, themeless map. Right-click the map and select Rates/Raw Rate. Choose SID74 as the event variable and BIR74 as the base. Right-click the map and select Save Rate to write the rate as a new variable (default name R_RAWRATE). Drag and drop the column next to the previously calculated rate; the two should differ only by our multiplying factor.

BASIC RATE MAPPING - Excess Rate Map
The standardized mortality ratio (SMR) is a commonly used notion to compare an observed rate to a standard. In GeoDa, the excess ratio is the ratio of the observed rate to the average rate computed for all the data. This average is NOT the average of all the rates; it is calculated as the ratio of the total sum of all events over the sum of all populations at risk.
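
The distinction between the pooled reference rate and the mean of the rates is easy to check in Python. In this sketch (hypothetical function name and data), the reference rate is total events divided by total population at risk, and each area's excess risk is its own rate divided by that reference.

```python
def excess_risk(events, base):
    """Ratio of each area's rate to the pooled reference rate."""
    # Reference rate: total events over total population at risk,
    # NOT the unweighted mean of the individual area rates.
    reference = sum(events) / sum(base)
    return [(e / b) / reference for e, b in zip(events, base)]
```

For two hypothetical areas with 1 event among 10 and 30 events among 1,000, the rates are 0.1 and 0.03. Their simple mean is 0.065, but the pooled reference is 31/1010, about 0.0307, so the excess risks are about 3.26 (red, above 1) and 0.98 (blue, below 1).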

BASIC RATE MAPPING - Excess Rate Map
Right-click the map and click on Rates/Excess Rates. Choose the appropriate event and base variables. Right-click the map again to Save Rates and add them to the table.

BASIC RATE MAPPING - Excess Rate Map
Areas with less risk are blue (<1.00); areas with more risk are red (>1.00). The legend categories are hard-coded. To do further analysis or visualization, you must add the rates to the table (done on the previous slide). Drag and drop the column to the appropriate place in the table.

PRACTICE
Create histograms, box plots, and rate maps for the Ohio lung cancer sample data.

RATE SMOOTHING
Rate smoothing techniques correct for the inherent variance instability of rates: rates computed from small populations at risk are far more variable than rates computed from large ones. Empirical Bayes smoothing (according to L. Anselin) computes a weighted average between the raw rate for each county and the state average, with weights proportional to the underlying population at risk. That is, small counties, with small populations at risk, will tend to have their rates adjusted considerably, whereas the rates of large counties will barely change.
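
Here is a sketch of global empirical Bayes smoothing in Python, assuming Marshall's method-of-moments estimator: the prior mean is the pooled rate, and the prior variance is estimated from the population-weighted spread of the raw rates. Under this estimator the weight on an area's raw rate grows with its population at risk, though it is not literally proportional to it. The function name and data are hypothetical, and GeoDa's implementation may differ in details.

```python
def empirical_bayes(events, base):
    """Global empirical Bayes smoothed rates (Marshall's method of moments, assumed)."""
    n = len(events)
    total_base = sum(base)
    theta = sum(events) / total_base          # prior mean: pooled rate
    rates = [e / b for e, b in zip(events, base)]
    if theta == 0:
        return rates                          # no events anywhere: nothing to shrink toward
    # Population-weighted variance of the raw rates around the prior mean.
    s2 = sum(b * (r - theta) ** 2 for b, r in zip(base, rates)) / total_base
    # Method-of-moments prior variance, truncated at zero.
    phi = max(s2 - theta / (total_base / n), 0.0)
    smoothed = []
    for r, b in zip(rates, base):
        # Weight on the raw rate: near 1 for large b, smaller for small b.
        w = phi / (phi + theta / b)
        smoothed.append(w * r + (1 - w) * theta)
    return smoothed
```

With hypothetical events `[1, 200, 10]` and bases `[100, 1000, 1000]`, the first and third areas share the raw rate 0.01, yet the small first area is pulled up to about 0.019 while the larger third area only moves to about 0.011.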

RATE SMOOTHING - Empirical Bayes (EB) Smoothed Rates
Right-click the map and select Rates/Empirical Bayes. Choose your event and base variables. Use a 1.5-hinge box plot; a percentile map can be used if appropriate, but use the box plot if there are fewer than 100 observations. Right-click to Save Rates and add them to the table. Compare the EB-smoothed map with the previous rate maps: how are the outliers affected?

RATE SMOOTHING - Spatial Weight Smoothing
Does proximity to neighbors affect the results? In GeoDa, neighbors are defined in a spatial weights file. Create a simple spatial weights file with the 8 nearest neighbors for each county. Go to the menu Tools/Weights/Create. Choose FIPSNO as the ID variable; each county (or tract or block) has a unique ID number. Leave the defaults in the Distance Weights section. Click the k-Nearest Neighbors radio button and set it to 8 neighbors. Save the result as a .gwt file in your folder.
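
The k-nearest-neighbor criterion can be sketched in Python from one representative point per area, such as a polygon centroid (an assumption, as are the coordinates below). The function returns a dictionary of neighbor lists instead of writing a .gwt file.

```python
import math

def knn_neighbors(coords, k):
    """Map each point index to the indices of its k nearest other points."""
    neighbors = {}
    for i, (xi, yi) in enumerate(coords):
        # Sort the other points by Euclidean distance; ties are broken by index.
        ranked = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbors[i] = [j for _, j in ranked[:k]]
    return neighbors
```

`knn_neighbors([(0, 0), (1, 0), (3, 0), (10, 0)], k=2)` returns `{0: [1, 2], 1: [0, 2], 2: [1, 0], 3: [2, 1]}`. Every area gets exactly k neighbors, so there are no islands, but the relation is not symmetric: area 3 lists area 2 as a neighbor, while area 2 does not list area 3.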

RATE SMOOTHING - Spatial Weight Smoothing
Load the spatial weights file you just created: go to the menu Tools/Weights/Open. The spatial weights will now be used for the next maps. Create a new map with spatial rate smoothing: right-click and choose Rates/Spatial Rates. Use the same event and base variables, and use a box plot map with a 1.5 hinge. Compare it to the previous box plot maps!
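
One way to picture the spatial rate smoother is as a moving window. The sketch below assumes that the window for each area is the area itself plus its neighbors, and that the smoothed rate is total events over total base within that window; `spatial_rate` and the inputs are hypothetical.

```python
def spatial_rate(events, base, neighbors):
    """Spatially smoothed rate over a window of each area plus its neighbors."""
    smoothed = []
    for i in range(len(events)):
        window = [i] + list(neighbors[i])   # assumed: the area itself is included
        total_events = sum(events[j] for j in window)
        total_base = sum(base[j] for j in window)
        smoothed.append(total_events / total_base)
    return smoothed
```

With events `[1, 3, 0]`, bases `[10, 10, 20]`, and neighbors `{0: [1], 1: [0, 2], 2: [1]}`, the raw rates 0.1, 0.3, and 0.0 become 0.2, 0.1, and 0.1: the extreme values are pooled with their surroundings, which is why outliers tend to disappear.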

RATE SMOOTHING - Spatial Weight Smoothing
Spatially smoothed maps emphasize broad regional patterns. What happened to the outliers?

SPATIAL WEIGHTS - Contiguity-Based Spatial Weights
The definition of a neighbor is based on sharing a common boundary.
Connectivity histogram (according to L. Anselin): the histogram reflects the distribution of the number of neighbors (connectivity) in the data set. It detects strange features in that distribution that could affect spatial autocorrelation statistics and spatial regression specifications. Beware of 1) islands, or unconnected observations, and 2) a bimodal distribution of the number of neighbors across locations.

SPATIAL WEIGHTS - Rook-Based Contiguity
Go to Tools/Weights/Create and create a rook-based weights file using the Key variable. Go to Tools/Weights/Connectivity Histogram to see the results.

SPATIAL WEIGHTS - Queen-Based Contiguity
Go to Tools/Weights/Create and create a queen-based weights file using the Key variable. Go to Tools/Weights/Connectivity Histogram to see the results.

SPATIAL WEIGHTS
How are neighboring units determined? The rook criterion defines neighbors as units that share a common boundary. The queen criterion defines neighboring units as those that have any point in common, including both common boundaries and common corners. Therefore, the number of neighbors for any given unit under the queen criterion will be equal to or greater than under the rook criterion.
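
Rook and queen contiguity are easiest to compare on a regular grid of square cells, used here as a stand-in for real polygons (an assumption; real boundaries require geometric tests). The Python sketch builds both neighbor structures and a connectivity histogram that maps a number of neighbors to how many units have it. Function names are hypothetical.

```python
from collections import Counter

def grid_contiguity(nrows, ncols, rule="rook"):
    """Neighbor lists for an nrows x ncols grid of square cells."""
    if rule == "rook":
        # Rook: cells sharing an edge.
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif rule == "queen":
        # Queen: cells sharing an edge or a corner.
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("rule must be 'rook' or 'queen'")
    neighbors = {}
    for r in range(nrows):
        for c in range(ncols):
            neighbors[(r, c)] = [(r + dr, c + dc) for dr, dc in offsets
                                 if 0 <= r + dr < nrows and 0 <= c + dc < ncols]
    return neighbors

def connectivity_histogram(neighbors):
    """Number of neighbors -> count of units with that many neighbors."""
    return dict(sorted(Counter(len(nbrs) for nbrs in neighbors.values()).items()))
```

On a 3 x 3 grid, the rook histogram is `{2: 4, 3: 4, 4: 1}` (corners, edges, center) and the queen histogram is `{3: 4, 5: 4, 8: 1}`; each cell has at least as many queen neighbors as rook neighbors.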

SPATIAL WEIGHTS - Higher Order Contiguity
There are two definitions of higher order contiguity. Pure: does not include locations that were also contiguous at a lower order. Cumulative: includes all lower order neighbors.
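
Both definitions of higher order contiguity follow from the fewest contiguity steps between two units. The Python sketch below runs a breadth-first search over a first-order neighbor dictionary (hypothetical input) and returns either pure or cumulative neighbors of a given order.

```python
def higher_order(neighbors, order, cumulative=False):
    """Pure or cumulative contiguity of the given order from first-order neighbor lists."""
    result = {}
    for start in neighbors:
        # Breadth-first search: steps[j] is the fewest contiguity steps from start to j.
        steps = {start: 0}
        frontier = [start]
        for step in range(1, order + 1):
            next_frontier = []
            for u in frontier:
                for v in neighbors[u]:
                    if v not in steps:
                        steps[v] = step
                        next_frontier.append(v)
            frontier = next_frontier
        if cumulative:
            # All neighbors from order 1 up to the requested order.
            result[start] = sorted(j for j, s in steps.items() if 1 <= s <= order)
        else:
            # Only units first reached at exactly this order.
            result[start] = sorted(j for j, s in steps.items() if s == order)
    return result
```

For four units in a row, `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}`, the pure second-order neighbors of unit 1 are `[3]`, while the cumulative second-order neighbors are `[0, 2, 3]`.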

SPATIAL LAG CONSTRUCTION - Spatially Lagged Variables
Load a weights file. Open the table, right-click, and select Variable Calculation. Choose Spatial Lag construction. You can add a variable with a new name (W_INC). The spatial weights file will already be loaded. Choose the variable to be spatially lagged (HH_INC). The new variable is calculated and added to the table. For a contiguity weights file with row-standardized weights, the spatially lagged variable is the simple average of the values for the neighboring units.

SPATIAL LAG CONSTRUCTION
The lagged value for each unit is the weighted average of the values of the variable in its neighboring units.
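
Here is a minimal sketch of the spatial lag in Python, assuming binary contiguity weights that are row-standardized so each of a unit's k neighbors gets weight 1/k. The function name is hypothetical, and giving an island (a unit without neighbors) a lag of 0 is an assumed convention.

```python
def spatial_lag(values, neighbors):
    """Spatial lag with row-standardized binary contiguity weights."""
    lag = []
    for i in range(len(values)):
        nbrs = neighbors[i]
        if not nbrs:
            # Island with no neighbors: a lag of 0 is an assumed convention here.
            lag.append(0.0)
        else:
            # Row standardization gives each of the k neighbors the weight 1/k,
            # so the lag is the simple average of the neighbors' values.
            lag.append(sum(values[j] for j in nbrs) / len(nbrs))
    return lag
```

For hypothetical HH_INC values `[10, 20, 30, 40]` on four units in a row, the lag is `[20.0, 20.0, 30.0, 30.0]`.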

SPATIAL AUTOCORRELATION - Moran Scatter Plot
The Moran scatter plot has the variable of interest on the x-axis and its spatial lag on the y-axis. Use the Scatter Plot icon to create a Moran scatter plot manually: put W_INC on the left side (y-axis) and HH_INC on the right side (x-axis). The slope of the regression line is the Moran's I statistic for HH_INC under the rook contiguity weights definition (with row-standardized weights).
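
Why the slope equals Moran's I: with row-standardized weights the scaling factor n/S0 is 1, so I = z'Wz / z'z for deviations z from the mean. Because each row of W sums to one, the lag of the raw values differs from the lag of the deviations only by the constant mean, which the regression intercept absorbs. The Python sketch below (hypothetical names and data; every unit must have at least one neighbor) checks this numerically.

```python
def row_standardized_lag(values, neighbors):
    """Average of each unit's neighbors' values (row-standardized binary weights)."""
    return [sum(values[j] for j in neighbors[i]) / len(neighbors[i])
            for i in range(len(values))]

def morans_i(values, neighbors):
    """Moran's I with row-standardized weights: z'Wz / z'z, z = deviations from the mean."""
    mean = sum(values) / len(values)
    z = [v - mean for v in values]
    wz = row_standardized_lag(z, neighbors)
    return sum(a * b for a, b in zip(z, wz)) / sum(a * a for a in z)

def ols_slope(x, y):
    """Slope of the least-squares line of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
```

For values `[10, 20, 30, 40]` on four units in a row, both `morans_i` and the OLS slope of the lag on the values equal 0.4.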

SPATIAL AUTOCORRELATION - Global Spatial Autocorrelation
We will work with the univariate case and the Moran scatter plot. Using the Scottish lip cancer data, choose Map/Raw Rate with Cancer as the event and Pop as the base variable. Set the map to the box map type with a 1.5 hinge. Save the rates (R_RAWRATE is the default name). Create a weights file with the 5 nearest neighbors (the k-nearest neighbors option with k = 5).

SPATIAL AUTOCORRELATION - Moran I Plot and Statistic
Go to the menu and select Space/Univariate Moran I. Select R_RAWRATE as the variable and select your weights file. Notice that the x- and y-axes are set up accordingly: the spatial lag variable is constructed for the y-axis, and R_RAWRATE on the x-axis has been standardized so that its units are standard deviations (observations beyond 2 SD can be treated as outliers). The plot is centered on the mean, with axes dividing it into 4 quadrants.

SPATIAL AUTOCORRELATION - Moran I Plot and Statistic
The 4 quadrants correspond to different types of spatial autocorrelation: high-high and low-low for positive spatial autocorrelation, and low-high and high-low for negative spatial autocorrelation. The value listed at the top is the Moran's I statistic. Selected observations can optionally be excluded from the calculation. Intermediate calculations can be saved to the data table: right-click on the graph and select Save Results.
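
The quadrant logic of the Moran scatter plot can be sketched in Python: standardize the variable, take the row-standardized spatial lag of the standardized values, and classify each point by the signs of the two coordinates. Using the population standard deviation and putting points that lie exactly on an axis on the "high" side are assumptions; the function name is hypothetical.

```python
import statistics

def moran_scatter_quadrants(values, neighbors):
    """Standardized values, their spatial lag, and the quadrant of each point."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)          # population SD: an assumption
    z = [(v - mean) / sd for v in values]   # x-axis, in standard deviation units
    # y-axis: row-standardized spatial lag of the standardized values.
    wz = [sum(z[j] for j in neighbors[i]) / len(neighbors[i]) for i in range(len(z))]
    quadrants = []
    for x, y in zip(z, wz):
        # Points exactly on an axis are put on the "high" side (arbitrary choice).
        if x >= 0 and y >= 0:
            quadrants.append("high-high")
        elif x < 0 and y < 0:
            quadrants.append("low-low")
        elif x < 0:
            quadrants.append("low-high")
        else:
            quadrants.append("high-low")
    return z, wz, quadrants
```

On four units in a row, the values `[10, 20, 30, 40]` give low-low, low-low, high-high, high-high (positive autocorrelation), while `[10, 40, 10, 40]` alternates low-high and high-low (negative autocorrelation).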

SPATIAL AUTOCORRELATION - Inference
Inference for Moran's I is based on a random permutation procedure, which recalculates the statistic many times under randomly permuted values to generate a reference distribution. The obtained statistic is compared to this reference distribution to compute a pseudo significance level. Right-click the plot and select Randomization > 999 permutations. Click Run again to assess the sensitivity of the results. The smallest attainable pseudo p-value depends directly on the number of permutations: with 999 permutations it is 1/(999 + 1) = 0.001.
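
The permutation procedure can be sketched in Python: the observed values are repeatedly shuffled across locations, Moran's I is recomputed each time, and the pseudo p-value is (M + 1) / (R + 1), where M counts permuted statistics at least as large as the observed one and R is the number of permutations. This sketch is a one-sided test for positive autocorrelation and uses a fixed `seed` (a hypothetical parameter) for reproducibility; GeoDa may choose the tail differently. Changing the seed mimics clicking Run again.

```python
import random

def moran_i(values, neighbors):
    """Moran's I with row-standardized weights (every unit needs at least one neighbor)."""
    mean = sum(values) / len(values)
    z = [v - mean for v in values]
    num = sum(z[i] * sum(z[j] for j in neighbors[i]) / len(neighbors[i])
              for i in range(len(z)))
    return num / sum(a * a for a in z)

def permutation_test(values, neighbors, permutations=999, seed=12345):
    """Observed Moran's I and its one-sided pseudo p-value by random permutation."""
    observed = moran_i(values, neighbors)
    rng = random.Random(seed)
    shuffled = list(values)
    extreme = 0
    for _ in range(permutations):
        # Reassign the observed values to locations at random (spatial randomness).
        rng.shuffle(shuffled)
        if moran_i(shuffled, neighbors) >= observed:
            extreme += 1
    # The observed statistic counts as one draw, so p is never below 1/(permutations + 1).
    pseudo_p = (extreme + 1) / (permutations + 1)
    return observed, pseudo_p
```

For the values 1 to 20 on a path of 20 units, the observed I is about 0.97 and, with 999 permutations, the pseudo p-value reaches its floor of 0.001.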