Exploring the Geography of Communities in Social Networks

Similar documents
COSMIC: COmplexity in Spatial dynamic

Geographically weighted methods for examining the spatial variation in land cover accuracy

The Importance of Spatial Literacy

DM-Group Meeting. Subhodip Biswas 10/16/2014

ARCHITECTURAL SPACE AS A NETWORK

City, University of London Institutional Repository

Profiling Burglary in London using Geodemographics

Distribution Pattern Analysis of Green space in Al-Madinah Using GIS Haifaa Al-Ballaa 1, Alexis Comber 2, Claire Smith 3

The Building Blocks of the City: Points, Lines and Polygons

Challenges in Geocoding Socially-Generated Data

A Cloud Computing Workflow for Scalable Integration of Remote Sensing and Social Media Data in Urban Studies

A new metric of crime hotspots for Operational Policing

Time: the late arrival at the Geocomputation party and the need for considered approaches to spatio- temporal analyses

A Modified DBSCAN Clustering Method to Estimate Retail Centre Extent

Efficient Network Structures with Separable Heterogeneous Connection Costs

Monsuru Adepeju 1 and Andy Evans 2. School of Geography, University of Leeds, LS21 1HB 1

Infrastructure to Explore Geographic Systems through Models and Maps

Basics of GIS reviewed

On Resolution Limit of the Modularity in Community Detection

Book Review: A Social Atlas of Europe

Uncovering the Digital Divide and the Physical Divide in Senegal Using Mobile Phone Data

1. Richard Milton 2. Steven Gray 3. Oliver O Brien Centre for Advanced Spatial Analysis (UCL)

Efficient Network Structures with Separable Heterogeneous Connection Costs

Exploring Digital Welfare data using GeoTools and Grids

Generalisation and Multiple Representation of Location-Based Social Media Data

Clustering Analysis of London Police Foot Patrol Behaviour from Raw Trajectories

Using Social Media for Geodemographic Applications

A spatial literacy initiative for undergraduate education at UCSB

The role of topological outliers in the spatial analysis of georeferenced social media data

Museumpark Revisit: A Data Mining Approach in the Context of Hong Kong. Keywords: Museumpark; Museum Demand; Spill-over Effects; Data Mining

Geographic Data Science - Lecture II

Visualisation of Uncertainty in a Geodemographic Classifier

Subject: Geography- Grade Descriptors

Tackling the Curse of Cartograms: addressing misrepresentation due to invisibility and to distortion

THE GEOGRAPHY OF KNOWLEDGE CREATION

Regionalizing and Understanding Commuter Flows: An Open Source Geospatial Approach

Lecture 2: Modelling Histories: Types and Styles:

Smart Solutions for Spatial Planning

GeoComputation 2011 Session 4: Posters Accuracy assessment for Fuzzy classification in Tripoli, Libya Abdulhakim khmag, Alexis Comber, Peter Fisher ¹D

Assessment Schedule 2014 Geography: Demonstrate understanding of how interacting natural processes shape a New Zealand geographic environment (91426)

Improving Geographical Data Finder Using Tokenize Approach from GIS Map API

The Nature of Geographic Data

Confeego: Tool Set for Spatial Configuration Studies

Multislice community detection

THE CRASH INTENSITY EVALUATION USING GENERAL CENTRALITY CRITERIONS AND A GEOGRAPHICALLY WEIGHTED REGRESSION

Fernando Benitez, Alexis Comber, Sergi Trilles, Joaquin Huerta T R A N S A C T I O N S I N G I S A R T I C L E

Simulating Geodesign:

M. Saraiva* 1 and J. Barros 1. * Keywords: Agent-Based Models, Urban Flows, Accessibility, Centrality.

A Land Use Transport Model for Greater London:

Geographic Information Systems (GIS) in Environmental Studies ENVS Winter 2003 Session III

Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data

Behavioral Data Mining. Lecture 2

Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area

Sample assessment task. Task details. Content description. Year level 7

CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University

How Geographers View the World: Human Geography. ESSENTIAL QUESTION: How does geography influence the way people live?

Modelling Spatial Behaviour in Music Festivals Using Mobile Generated Data and Machine Learning

Why Is Cartographic Generalization So Hard?

THE GENERAL URBAN MODEL: RETROSPECT AND PROSPECT. Alan Wilson Centre for Advanced Spatial Analysis University College London

A Modified Method Using the Bethe Hessian Matrix to Estimate the Number of Communities

Grade 6 Social Studies

Unit 1 Part 2. Concepts Underlying The Geographic Perspective

THE DIGITAL SOIL MAP OF WALLONIA (DSMW/CNSW)

Cities, AI, Design, & the Future Can Artificial Intelligence Improve Design Intelligence?

Linking Neighbourhood Data in the Analysis of Crime

December 3, Dipartimento di Informatica, Università di Torino. Felicittà. Visualizing and Estimating Happiness in

MACHINE LEARNING FOR MOBILITY STUDIES IN URBAN AND NATURAL AREAS TUULI TOIVONEN / DIGITAL GEOGRAPHY LAB

A 3D GEOVISUALIZATION APPROACH TO CRIME MAPPING

Outline. Geographic Information Analysis & Spatial Data. Spatial Analysis is a Key Term. Lecture #1

Spatio-temporal dynamics of the urban fringe landscapes

Abstract 1. The challenges facing Cartography in the new era.

WELCOME TO GCSE GEOGRAPHY WHERE WILL IT TAKE US TODAY?

Community detection in time-dependent, multiscale, and multiplex networks

A simple agricultural land use model implemented using Geographical Vector Agents (GVA)

Simple Spatial Growth Models The Origins of Scaling in Size Distributions

Design of Indicators to monitor Strategic priorities: Mapping Knowledge Clusters

Profiling Burglary in London using Geodemographics. Chris Gale 1, Alex Singleton 2, Paul Longley 1

Assessing the impact of seasonal population fluctuation on regional flood risk management

Tools for building and delivering 3D models Perspectives by the BGS

Data science with multilayer networks: Mathematical foundations and applications

GIS Institute Center for Geographic Analysis

ENV208/ENV508 Applied GIS. Week 1: What is GIS?

GIS = Geographic Information Systems;

SPACE Workshop NSF NCGIA CSISS UCGIS SDSU. Aldstadt, Getis, Jankowski, Rey, Weeks SDSU F. Goodchild, M. Goodchild, Janelle, Rebich UCSB

Geography Progression

Visualization Schemes for Spatial Processes

The linguistic landscape of social media: A case study of the Senate Square, Helsinki

2018 ESRI Education Summit. San Diego. California. Sunday July 8 th 2018 Harper College, Palatine, Illinois, USA Dr. Tong Cheng (Biology), Dr.

Exploring the Impact of Ambient Population Measures on Crime Hotspots

Space-adjusting Technologies and the Social Ecologies of Place

Urban form, resource intensity & renewable energy potential of cities

ELA A NOTE ON GRAPHS WHOSE LARGEST EIGENVALUE OF THE MODULARITY MATRIX EQUALS ZERO

Level 1 Geography PROGRAMME OVERVIEW 2014

Geography. Programme of study for key stage 3 and attainment target (This is an extract from The National Curriculum 2007)

Welcome to GCSE Geography. Where will it take us today?

INTRODUCTION TO HUMAN GEOGRAPHY. Chapter 1

Causality and communities in neural networks

Year 8 standard elaborations Australian Curriculum: Geography

Table of Contents. Foreword... xiii Michael F. GOODCHILD Introduction... xix Stéphane ROCHE and Claude CARON

GIS and Spatial Statistics: One World View or Two? Michael F. Goodchild University of California Santa Barbara

Transcription:

Exploring the Geography of Communities in Social Networks Alexis Comber 1, Michael Batty 2, Chris Brunsdon 3, Andrew Hudson-Smith 2, Fabian Neuhaus 2 and Steven Gray 2 1 Department of Geography, University of Leicester, Leicester, LE1 7RH UK Tel. +44 116 252 3812/ 3823 Fax +44 116 252 3854 ajc36@le.ac.uk, http://www2.le.ac.uk/departments/geography/people/ajc36 2 Centre for Advanced Spatial Analysis, University College London, W1T 4TJ UK mbatty@geog.ucl.ac.uk, a.hudson-smith@ucl.ac.uk, fabian.neuhaus@ucl.ac.uk, steven.gray@ucl.ac.uk 3 Department of Geography, University of Liverpool, Liverpool, L69 3BX UK christopher.brunsdon@liverpool.ac.uk, http://www.liv.ac.uk/geography/staff/brunsdon.htm Summary: This research analyses social network data to identify communities or sub-graph regions. These sub-graph areas are indentified based on the arrangement of edges between vertices. The geographies of the communities are analysed, compared and visualised using kernel density estimations. A research agenda is suggested. KEYWORDS: geo-code, sub-graph, geography, statistical analysis 1. Introduction This paper develops a spatial analysis of social network data. A vast body of research analysing structure in networks and graphs exists in order to identify communities or sub-graph areas that homogenous in some respect. One of the recurring themes in the community detection research concerns the reliability of the communities that are identified and the difficulty in understanding what they mean. For example, Porter (2009, p1098) notes that few methods have been developed to use or even validate the communities that we find and Newman (2008) states that methods for understanding what the communities mean after you find them are... still quite primitive (Newman, 2008, p38). Much social network data now has a geographical component and it is possible to explore how the geographies different communities and partitioning methods compare. Given these contexts, the objectives of this research were: i) To apply community detection algorithms to social network data; ii) To analyse the how virtual communities defined in social network space and social network structures relate to geographic space; 2. Case studies Real networks tend to have a wide distribution of the number of edges connected to each vertex. Thus they are highly heterogeneous with specific sub-graph regions having high concentrations of interconnected vertices. The basis of community detection is to identify areas of the network with higher than expected concentrations of edges connecting groups of vertices and with lower than expected concentrations of edges between these groups. These areas can be considered as communities (Girvan and Newman, 2002). An important area of research in network sciences has been the development of methods for partitioning networks to identify communities. One of the primary aims of these activities is to identify areas within the graph (sub-graph areas) where the nature of interactions between vertices indicates some local clustering of interactions. The assumption is that sub-graph areas with high internal interactions are homogenous to some degree. The interested reader is directed to number of reviews of the methods arising from statistical physics (Porter et al.,

2009; Fortunato, 2010; Leicht & Newman, 2008)). The case studies consider the identification of subgraph areas from 4 different perspectives: the connectedness of concepts, communities based on interaction, user geography and interaction content. In the case studies specific tags in communications between social network data users ( @user in Twitter data in this case) are used to identify and illustrate the connectedness of different concepts. The network is defined by the interactions (edges) between users (vertices). A subset of the data was analysed. It contained 87,555 records. Of these 52,397 contained tweets at ( @ ) a specific user, 52,280 that were not self tweets and 11,968 at users with a spatial reference (Geotag). At the end of the data cleaning, the network comprised 6,659 vertices and 7,491 edges (Figure 1). a) b) Figure 1. a) The network of users, displayed using a Fruchterman-Reingold layout b) The network displayed over geographic space. 3. Results The Walktrap or Random Walk algorithm (Pons and Latapy, 2005) was used to partitioned the network into sub-graph areas based. It identified 1181 communities and in total, 33 communities contain more than 20 vertices. These communities can be visualised in geographic and nongeographic space, and the users / vertices that perform hub functions identified by their degree value the degree of each vertex is the number of edges connected to it (Figure 2). a) b)

Figure 2. The network of communities, displayed a) using a Fruchterman-Reingold layout b) over geographic space, with vertices with size related to their degree. Some examples for different communities are shown in Figure 3, where he varying cohesion and edge degree of community members are evident. Some communities, some sub-graph regions exists over a small specific geographic space, specific locales in London while other exhibit much greater spatial heterogeneity. Geographic Space Network Community 4 Community 11 Community 124 Figure 3. Examples of communities extracted using the Walktrap algorithm displayed using a Fruchterman-Reingold layout (top) and over geographic space (below), with the vertex size related to its degree. Kernel Density Estimation (KDE) can be used to describe the spatial distribution of membership to individual communities. KDE models the relative density of the vertex locations (community members) as a surface created by summing the vertices in the sub-graph over a kernel function (a 2 dimensional distribution curve). Then for each cell, x, in a grid of discrete subdivisions of space, the relative likelihood of the occurrence of a vertex in that grid cell f(x) are computed. KDE s were generated for communities illustrated in Figure 4. The surfaces in Figure 4 reveal much more than simply locating the community member (vertex) locations as in Figure 4. The density surface values describe relative the intensity of the presence of the community in different locations. What they reveal are the very different spatial extents and densities of different communities. For example, Community 124 for example is relative large but very spatially concentrated in a specific area. By contrast, Community 5 is a smaller community with much greater spatial spread but with much lower density or spatial concentration of vertices.

a) Community 5 b) Community 124 Figure 4. Kernel density surfaces showing the spatial extent and spatial concentrations of for different communities. 4. Discussion This research raises a number of discussion points: The results the use of methods from statistical physics for partitioning networks based on the network structure, where edges indicate interactions between users. The geography of these networks can be visualised by plotting user geo-codes. Surfaces describing the relative likelihood of the occurrence of a user in any given community in any given location can be generated using kernel density estimates and future work will use text-mining of the content to communications in order to to identify connected users around specific themes. Emphasising the geographic attributes of such networks provides greater insight into the structure, and possibly the activities and interests, of the communities they represent. Other areas for investigation include - examining other partitioning algorithms (Spinglass, Leading Eigenvector, Fastgreedy and Edge Betweenness etc) ad they have been shown to be more or less suited to analysing geographic networks (Comber et a., in press); - Exploring the use of different statistical models to determine surprising arrangements of edges with the Modulaity measure a widely used partition quality measure. - Using methods for analysing semantics such as LDA to generate a weighted network which can be partitioned. - Temporal / dynamic nature of social networks need for on the fly analyses - Exploration of heuristic search methods that explicitly deal with groups and set such as modified Grouping GAs for partitioning the network. - Partitioning of text weighted networks 5. References Comber A., Brunsdon, C. and Farmer, C. (in press). Community detection in spatial networks: inferring land use from a planar graph of land cover objects. Paper accepted for publication in International Journal of Applied Earth Observation and Geoinformation (January 2012) Fortunato S (2010). Community detection in graphs. Physics Reports, 486(3-5): 75-174. Girvan M and Newman MEJ (2002). Community structure in social and biological networks, Proceedings of the National Academy of Sciences, 99: 7821-7826. Leicht EA and Newman MEJ (2008), Community structure in directed networks, Physical Review Letters, 100: 118703. Newman MEJ (2008). The physics of networks, Physics Today, 61(11): 33-38. Pons P and Latapy M (2005). Computing communities in large networks using random walks. http://arxiv.org/abs/physics/0512106v1. Porter, MA, Onnela, J-P and Mucha, PJ, (2009). Communities in Networks. Notices of the AMS,

56(9): 1082-1166. 6. Biography Lex Comber and Chris Brunsdon have research interests in spatial analysis geocomputation. Lex is now a boring old git and Chris can claim to have been to every GISRUK that ever was. Michael Batty is Emeritus Professor of Planning at the Centre for Advanced Spatial Analysis and has a long interest in all things urban and spatial. Fabian Neuhaus is a researcher at the Bartlett Centre for Advanced Spatial Analysis examining the temporal aspects of the city, focusing on routine and repetition. Andrew Hudson-Smith has an extensive research interest in digital technologies for communication. Steven Gray is a researcher at the Bartlett Centre for Advanced Spatial Analysis examining the temporal aspects of the city, focusing on routine and repetition.