Challenges in Geocoding Socially-Generated Data

Jonny Huck (2nd year part-time PhD student), Duncan Whyatt, Paul Coulton. Lancaster Environment Centre; School of Computing and Communications.

Royal Wedding: Whilst we were all on the way back from GISRUK at Portsmouth last year, Prince William and Kate Middleton got married. c. 1.7 million tweets were collected worldwide. The event was emotive and had a predictable timescale.

Socially-Generated Data: Data created within social networking websites (Twitter, Facebook etc.). Potentially a rich dataset. Significant growth in use as geographical data. c. 1% has coordinates already attached. Most data will need to be geocoded, using the place name specified in the profile of the user.

Geocoding: Adding spatial information to non-spatial data. This covers both coordinates and address components. Formerly the domain of skilled specialist operators. This changed with free, online global geocoding services. Multiple results are often returned.

Aims and Objectives: Highlight the issues that we have found. Explore the impact that these can have upon analysis. Suggest a methodology to address both of these issues. Investigate the effect that applying these techniques can have upon analysis.

Problem 1: Place Name Ambiguity

Place-Name Ambiguity: Place names are not unique identifiers. Multiple places have the same name. A single place can have multiple names. Automated identification of the correct place is therefore unreliable.

[Figure: Place Name Ambiguity. Panels: Exact Match, Rough Match]

Problem 2: Undefined Level of Detail

Undefined Level of Detail: Data at a multitude of levels of detail are compared within the same analysis. False hotspots occur at the centroid of places. This creates the false impression of activity and masks variations in actual activity.

[Figure: legend from High to Low]

Methodology 1: Place Name Ambiguity

Ambiguous Place Names: Single locations with multiple names. Use standard administrative data. Deal only with the coordinates associated with each location from the geocoder. If administrative information is required, it should then be attached to the tweets using the coordinates.

Ambiguous Place Names: Multiple locations with the same name. Tobler's law ("Everything is related to everything else, but near things are more related than distant things"). Locations are chosen based on other (non-ambiguous) tweets collected on the same topic. Rankings are determined by the density of non-ambiguous tweets at each ambiguous location.
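The ranking idea above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the slides do not say how density is measured, so the sketch assumes a simple count of non-ambiguous tweets within a fixed radius. The function names, the 100 km radius, and the approximate coordinates are all hypothetical.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def rank_candidates(candidates, unambiguous_points, radius_km=100.0):
    """Order candidate locations by the number of non-ambiguous tweets nearby.

    Assumption: 'density' is approximated by a count within radius_km.
    """
    def nearby(candidate):
        return sum(1 for p in unambiguous_points
                   if haversine_km(candidate, p) <= radius_km)
    return sorted(candidates, key=nearby, reverse=True)

# Two places called "Lancaster" (approximate, illustrative coordinates).
lancaster_uk = (54.047, -2.801)
lancaster_pa = (40.038, -76.306)

# Locations of tweets whose place names were not ambiguous (illustrative).
known = [(53.76, -2.70), (53.48, -2.24), (40.71, -74.01)]

ranked = rank_candidates([lancaster_pa, lancaster_uk], known)
```

With these made-up points, the UK candidate has more non-ambiguous tweets nearby and so ranks first. A kernel density surface could replace the radius count without changing the overall shape of the approach.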

[Figure: Unique Tweet Locations, legend from High to Low]

Methodology 2: Undefined Level of Detail

Undefined Level of Detail: The aim is to standardise the level of detail. Retrieve all of the address components for each tweet with the geocoder. Get coordinates for each address component individually. At analysis time, locations of the required level of detail are used. Data with locations at insufficient detail are discarded from the analysis.

Workflow: (1) Take the Twitter location text, e.g. Lancaster. (2) Submit it to the geocoder, which returns a location with no scale attached to it, together with a detailed address: Lancaster, Lancashire, England, United Kingdom. (3) Re-submit to the geocoder to geocode every level of geography in the address. (4) Select a scale at which analysis will take place, e.g. county-scale analysis of tweet activity. (5) Attach the appropriate location to the tweet at analysis time, e.g. Lancashire.
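The analysis-time selection step might look something like the sketch below. It assumes that each tweet already carries a dictionary of separately geocoded address components. The level names ("town", "county", "country"), the record layout, and the rough coordinates are hypothetical and are not taken from any particular geocoder.

```python
def standardise(tweets, level):
    """Attach the location at the chosen level; drop tweets that lack it."""
    kept = []
    for tweet in tweets:
        match = tweet["components"].get(level)
        if match is None:
            continue  # insufficient detail for this analysis: discard
        kept.append({"id": tweet["id"],
                     "place": match["name"],
                     "coords": match["coords"]})
    return kept

# Illustrative records; coordinates are rough placeholders, not geocoder output.
tweets = [
    {"id": 1, "components": {
        "town": {"name": "Lancaster", "coords": (54.05, -2.80)},
        "county": {"name": "Lancashire", "coords": (53.9, -2.6)},
        "country": {"name": "United Kingdom", "coords": (54.0, -2.0)},
    }},
    {"id": 2, "components": {
        "country": {"name": "United Kingdom", "coords": (54.0, -2.0)},
    }},
]

county_level = standardise(tweets, "county")    # only tweet 1 has county detail
country_level = standardise(tweets, "country")  # both tweets resolve to a country
```

Choosing a finer level keeps fewer tweets, which is the trade-off between scale of analysis and volume of data.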

[Figure: Raw Data vs Processed Data, legend from High to Low]

Data at an Undefined Scale: Trade-off between scale of analysis and volume of data. [Chart: % Tweets (0 to 100) by Area: Raw, Country, County, Town]

Why Does This Matter? Case Study

Why Does This Matter? Number of tweets per 1000 people of tweeting age, across the Counties and Unitary Authorities in the UK. Tweeting age was defined as 10 to 59. A count of tweets was taken for each county and normalised by the tweeting-age population of that county.
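The normalisation described above amounts to rate = 1000 × tweets / population aged 10 to 59. A minimal sketch follows. The function name and the figures for the two areas are invented for illustration and are not results from the study.

```python
def tweets_per_thousand(tweet_counts, tweeting_population):
    """Tweets per 1000 people of tweeting age, for each area with a known population."""
    rates = {}
    for area, population in tweeting_population.items():
        if population > 0:  # skip areas with no usable population figure
            rates[area] = 1000.0 * tweet_counts.get(area, 0) / population
    return rates

# Hypothetical counts and populations aged 10 to 59.
counts = {"Area A": 50, "Area B": 10}
population_10_59 = {"Area A": 100000, "Area B": 5000}

rates = tweets_per_thousand(counts, population_10_59)
```

In this made-up case the smaller area has the higher rate despite fewer tweets, which is why raw counts alone can mislead.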

[Figure: Raw Data vs Processed Data]

Summary: Data derived from social websites are frequently and increasingly used in spatial analysis. The locations attached to such data tend to rely on place names, which are non-unique and lack information regarding level of detail.

Summary: This poses two issues in attempting to geocode the data: establishing which place is the correct one, and the introduction of false hotspots. A methodology is demonstrated to address these issues.

Summary: It has been demonstrated that this methodology has a significant impact upon analysis of this data. Our example was very UK-centric, but these issues have a global significance, and are intensified at the global scale. Geocoders are powerful, but can be misleading if taken at face value.

Questions?