GEOGRAPHICAL STATISTICS & THE GRID

Similar documents
Grid Enabling Geographically Weighted Regression

Geographically Weighted Regression LECTURE 2 : Introduction to GWR II

1. calibrating the size of the kernel or search window to the amount of spatial autocorrelation found in the attributes of the data being examined;

Geographically Weighted Regression (GWR)

CSISS Tools and Spatial Analysis Software

Context-dependent spatial analysis: A role for GIS?

Measures from the Adult Social Care Outcomes Framework, England

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Sam Williamson

Introduction to ArcGIS Server Development

ArcGIS Deployment Pattern. Azlina Mahad

A GEOSTATISTICAL APPROACH TO PREDICTING A PHYSICAL VARIABLE THROUGH A CONTINUOUS SURFACE

GeoDa-GWR Results: GeoDa-GWR Output (portion only): Program began at 4/8/2016 4:40:38 PM

Geog 469 GIS Workshop. Managing Enterprise GIS Geodatabases

Data Aggregation with InfraWorks and ArcGIS for Visualization, Analysis, and Planning

GIS Spatial Statistics for Public Opinion Survey Response Rates

Joint MISTRAL/CESI lunch workshop 15 th November 2017

Modeling Spatial Relationships Using Regression Analysis

Spatial analysis in XML/GML/SVG based WebGIS

Modeling Spatial Relationships Using Regression Analysis. Lauren M. Scott, PhD Lauren Rosenshein Bennett, MS

Overview of Geospatial Open Source Software which is Robust, Feature Rich and Standards Compliant

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde

Geographically weighted methods for examining the spatial variation in land cover accuracy

Exploratory Spatial Data Analysis (ESDA)

How does ArcGIS Server integrate into an Enterprise Environment? Willy Lynch Mining Industry Specialist ESRI, Denver, Colorado USA

Applying cluster analysis to 2011 Census local authority data

Geodatabase Replication for Utilities Tom DeWitte Solution Architect ESRI Utilities Team

Exploring the Impact of Ambient Population Measures on Crime Hotspots

Design and implementation of a new meteorology geographic information system

The econ Planning Suite: CPD Maps and the Con Plan in IDIS for Consortia Grantees Session 1

Demographic Data in ArcGIS. Harry J. Moore IV

Time: the late arrival at the Geocomputation party and the need for considered approaches to spatio- temporal analyses

Lecture 3 GIS outputs. Dr. Zhang Spring, 2017

Exelis and Esri Technologies for Defense and National Security. Cherie Muleh

Geographical Inequalities and Population Change in Britain,

Outline. Introduction to SpaceStat and ESTDA. ESTDA & SpaceStat. Learning Objectives. Space-Time Intelligence System. Space-Time Intelligence System

ESRI 2008 Health GIS Conference

Changes in Esri GIS, practical ways to be ready for the future

A Space-Time Model for Computer Assisted Mass Appraisal

Introduction to ArcGIS Maps for Office. Greg Ponto Scott Ball

ArcGIS Pro: Analysis and Geoprocessing. Nicholas M. Giner Esri Christopher Gabris Blue Raster

Modeling Spatial Relationships using Regression Analysis

Gridded Ambient Air Pollutant Concentrations for Southern California, User Notes authored by Beau MacDonald, 11/28/2017

UNIT 4: USING ArcGIS. Instructor: Emmanuel K. Appiah-Adjei (PhD) Department of Geological Engineering KNUST, Kumasi

What is GIS and How Can It Help Me?

GeoWEPP Tutorial Appendix

Introduction to Portal for ArcGIS. Hao LEE November 12, 2015

Portal for ArcGIS: An Introduction. Catherine Hynes and Derek Law

Exploring Digital Welfare data using GeoTools and Grids

Northrop Grumman Concept Paper

INTRODUCTION TO ARCGIS Version 10.*

Geographical Information Processing for Cultural Resources

HASSET A probability event tree tool to evaluate future eruptive scenarios using Bayesian Inference. Presented as a plugin for QGIS.

Geographically Weighted Regression and Kriging: Alternative Approaches to Interpolation A Stewart Fotheringham

NATO Headquarters The Situation Center GIS experience.

Finding Hot Spots in ArcGIS Online: Minimizing the Subjectivity of Visual Analysis. Nicholas M. Giner Esri Parrish S.

Spatial Analysis with ArcGIS Pro STUDENT EDITION

Portal for ArcGIS: An Introduction

Working with Census 2000 Data from MassGIS

YYT-C3002 Application Programming in Engineering GIS I. Anas Altartouri Otaniemi

Watershed Application of WEPP and Geospatial Interfaces. Dennis C. Flanagan

Among various open-source GIS programs, QGIS can be the best suitable option which can be used across partners for reasons outlined below.

Introducing the WISERD Geoportal. WISERD DATA TEAM Dr Robert Berry & Dr Richard Fry, University of Glamorgan Dr Scott Orford, Cardiff University

Enabling ENVI. ArcGIS for Server

ArcGIS Pro Q&A Session. NWGIS Conference, October 11, 2017 With John Sharrard, Esri GIS Solutions Engineer

ArcGIS Earth for Enterprises DARRON PUSTAM ARCGIS EARTH CHRIS ANDREWS 3D

Spatial information sharing technology based on Grid

Evaluating Physical, Chemical, and Biological Impacts from the Savannah Harbor Expansion Project Cooperative Agreement Number W912HZ

Incorporating ArcGIS Pro in your Curriculum

Models for Count and Binary Data. Poisson and Logistic GWR Models. 24/07/2008 GWR Workshop 1

GIS at UCAR. The evolution of NCAR s GIS Initiative. Olga Wilhelmi ESIG-NCAR Unidata Workshop 24 June, 2003

Administering your Enterprise Geodatabase using Python. Jill Penney

Introduction to Portal for ArcGIS

Lecture 2. Introduction to ESRI s ArcGIS Desktop and ArcMap

WEB-BASED SPATIAL DECISION SUPPORT: TECHNICAL FOUNDATIONS AND APPLICATIONS

CHAPTER 22 GEOGRAPHIC INFORMATION SYSTEMS

GEOGRAPHICAL INFORMATION SYSTEMS. GIS Foundation Capacity Building Course. Introduction

Comparison of spatial methods for measuring road accident hotspots : a case study of London

What are we like? Population characteristics from UK censuses. Justin Hayes & Richard Wiseman UK Data Service Census Support

Running head: GEOGRAPHICALLY WEIGHTED REGRESSION 1. Geographically Weighted Regression. Chelsey-Ann Cu GEOB 479 L2A. University of British Columbia

Geospatial Fire Behavior Modeling App to Manage Wildfire Risk Online. Kenyatta BaRaKa Jackson US Forest Service - Consultant

GIS Software. Evolution of GIS Software

Forecast solutions for the energy sector

Leveraging Web GIS: An Introduction to the ArcGIS portal

Speakers: Jeff Price, Federal Transit Administration Linda Young, Center for Neighborhood Technology Sofia Becker, Center for Neighborhood Technology

Enabling Web GIS. Dal Hunter Jeff Shaner

GIS and the Other Enterprise Database

ArcGIS GeoAnalytics Server: An Introduction. Sarah Ambrose and Ravi Narayanan

An introduction to ArcGIS Maps for Office. Scott Ball & Mike Flanagan

Data Aggregation with InfraWorks and ArcGIS for Visualization, Analysis, and Planning

Esri Defense Mapping: Cartographic Production. Bo King

ArcGIS. for Server. Understanding our World

an accessible interface to marine environmental data Russell Moffitt

Didacticiel - Études de cas

LEHMAN COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF ENVIRONMENTAL, GEOGRAPHIC, AND GEOLOGICAL SCIENCES CURRICULAR CHANGE

What is GIS? ESRI Canada. August 2011

WindNinja Tutorial 3: Point Initialization

Gridded Traffic Density Estimates for Southern

BY: Daniel Kwame Ladzagla (Bsc Eng.,Msc I.T) The Energy Centre, KNUST-Kumasi. July 27 Dakar, Senegal

The CrimeStat Program: Characteristics, Use, and Audience

Point data in mashups: moving away from pushpins in maps

Transcription:

GEOGRAPHICAL STATISTICS & THE GRID Rich Harris, Chris Brunsdon and Daniel Grose (Universities of Bristol, Leicester & Lancaster) http://rose.bris.ac.uk

OUTLINE About Geographically Weighted Regression (GWR) Chris Brunsdon, Department of Geography, University of Leicester cb179@le.ac.uk An example to illustrate a problem of using GWR with large datasets The solution Rich Harris, School of Geographical Sciences, University of Bristol rich.harris@bris.ac.uk

LOCAL VS GLOBAL STATISTICS Global similarities across space single-valued statistics non-mappable GIS unfriendly search for regularities aspatial Local differences across space multi-valued statistics mappable GIS friendly search for exceptions spatial Local statistics are spatial disaggregations of global statistics

WHY MIGHT RELATIONSHIPS VARY SPATIALLY? Sampling variation Relationships intrinsically different across space e.g. differences in attitudes, preferences or different administrative, political or other contextual effects produce different responses to the same stimuli. Model misspecification - suppose a global statement can ultimately be made but models not properly specified to allow us to make it. Local models good indicator of how model is misspecified.

GEOGRAPHICALLY WEIGHTED REGRESSION (GWR) Regression What is it? Point Extension of regression model Allows model to vary over space How it works... Data Points

IN GWR WE CAN ALSO estimate local standard errors calculate local leverage measures perform tests to assess the significance of the spatial variation in the local parameter estimates perform tests to determine if the local model performs better than the global one

BUT Computationally very demanding Need to fit weighted regression models in several places Sometimes not viable on a single computer How do we address this problem?

EXAMPLE Y: Proportion of households without a car X 1 : Proportion of persons of working age unemployed X 2 : Proportion of households in public housing X 3 : Proportion of households that are lone parent households X 4 : Proportion of persons 16 or above that are single X 5 : Proportion of persons that are white British n = 165,665

SPATIAL VARIATION IN THE LONE PARENT COEFFICIENT (WEST MIDLANDS)

SPATIAL VARIATION IN THE LONE PARENT COEFFICIENT (LONDON)

SCALING PROBLEMS There are n regression models But actually there are many more: n g g is the number of iterations to optimize the bandwidth, b And you also need to calculate the distance matrix, D: (n 2 )

TAKES A LONG TIME! If n = 100,000, on a single processor Would take about half a day to calculate D Would take about a fortnight to find b But GWR is intended for exploratory analysis! The main bottleneck is the calibration of b Because the regression calculations are O(n 3 ), the distance calculations are O(n 2 )

FORTUNATELY The regression models are fitted entirely independent of each other. The results are pooled and compared at the end. The process is sequential but it can also be embarrassingly parallel For GWR, the (distance-weighted regression) function stays the same, only the data are changing. Each spatial subset is handled separately from the next. True of many methods of spatially localized analysis Suitable for a computational grid The UK s National Grid Service (NGS)

MULTIR AND PARALLEL R R is a free software environment for statistical computing and graphics http://www.r-project.org/ There is an implementation of GWR in R The spgwr package In R, there is a method to invoke a function a number of times with varying argument values sapply The idea is to invoke the function on different processors running on the NGS. multir (thanks to Daniel Grose) is both an add in to R and a server (currently at Lancaster) which provides middleware between desktop R and the NGS.

THE THREE TIER CLIENT/SERVER ARCHITECTURE EMPLOYED BY MULTIR

THIS IS R (IN WINDOWS) Available free from http://cran.rproject.org/ We have a tutorial www.esrcsocietyto day.ac.uk type Grid Enabled Spatial Regression Models into the Search

FITTING A GWR MODEL IN R > library(spgwr) > load("carsmsoa.rdata") > names(car.msoa) [1] "Name" "Borough" "ProfMan" "Renting" "HHNoCar" "Easting" "Northing" > coords = cbind(car.msoa$easting, car.msoa$northing) > bandwidth = gwr.sel(hhnocar ~ Renting, data = car.msoa, coords) Bandwidth: 25571.63 CV score: 4.278767 Bandwidth: 41334.45 CV score: 4.373939 Bandwidth: 1467.076 CV score: 2.922873 > gwr.model1 = gwr(hhnocar ~ Renting, data = car.msoa, coords, bandwidth)

FITTING A GRID RUN GWR MODEL > library(spgwr.dist) > load("d:\\data\\gwr- Workshop\\Data\\Exercises\\carsmsoa.RData") > names(car.msoa) [1] "Name" "Borough" "ProfMan" "Renting" "HHNoCar" "Easting" "Northing" > coords = cbind(car.msoa$easting, car.msoa$northing) > session = multir.session.dlg() > bandwidth = gwr.sel.dist(session, HHNoCar ~ Renting, data = car.msoa, coords, max.processors = 50) > gwr.model2 = gwr.dist(session, HHNoCar ~ Renting, data = car.msoa, coords, bandwidth, max.processors = 50)

THE SESSION It is loading and dealing with the various security certificates that are needed to use the NGS

OBTAINING THE CERTIFICATES This is required for all NGS services User certificates: Apply at https://ca.grid-support.ac.uk/ The certificate is issued for a web browser: the one you apply from needs to be the same as the one which will receive it. Then need to: Export the certificate from the browser use OpenSSL toolkit to convert into two files (the certificate and its key file) and to generate a proxy certificate See www.grid-support.ac.uk/content/view/67/184/ You also need to apply for an NGS account http://www.grid-support.ac.uk/content/view/221/171/ CA certificate: Download from http://www.gridsupport.ac.uk/content/view/182/184/

CONCLUSIONS AND CAVEATS There is no point in using Grid GWR for small data sets (e.g. n < 1000) And you should not expect instant results with large datasets (it still takes a second or so for each regression fit and you don t have that many processors) You cannot presently disconnect from R when running a Grid GWR but that should follow (and return later to collect the results) The software are still being tested

POTENTIAL multir is more generic than GWR Imagine function1 = function(its_parameters) { what it does } Then some_results = multir(session, function1, list=(the_data)) In other words, the function is running in parallel on the NGS Applications include spatial statistics, geostatistics, hot spot analysis, simulation, etc.

ACKNOWLEDGMENTS The ESRC (RES-149-25-1041) The National Centre for e-social Science (NCeSS) Particular thanks to the Centre for e-science, Lancaster University