Spatial Extension of the Reality Mining Dataset


R&D Centre for Mobile Applications, Czech Technical University in Prague
Spatial Extension of the Reality Mining Dataset
Michal Ficek, Lukas Kencl

Mobility-Related Applications Wanted!
- Urban planning
- Transport management
- Content delivery
- Cloud computing for mobiles
- Delay Tolerant Networks
- ...
Appropriate user-tracking datasets to study mobility are missing!

Reality Mining Dataset
Nathan Eagle & Alex Pentland, Massachusetts Institute of Technology, 2005
- Machine-sensed data: mobile terminal-based recording
  - communication (voice, SMS, data, duration)
  - proximity (Bluetooth devices nearby)
  - location (date, area, Cell-ID, network)
  - phone status (charging, idle, apps in use)
  - social ties (friends, colleagues)
- 100 users during 9 months
  - MIT students, staff
  - unique and rare source of data
- Publicly available for download*
  - tens of publications
- Spatial dimension not available!
  - exploit dataset for further research
  - support and validate results derived to date
* http://reality.media.mit.edu/download.php

From Cell to Geographical Coordinates - I
Cell identification in GSM / UMTS mobile network: Cell Global Identity
- Mobile Country Code (MCC)
- Mobile Network Code (MNC)
- Location Area Code (LAC)
- Cell Identifier (Cell-ID)
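The four components above can be pictured as a small record type. The following is only an illustrative sketch: the class name, field names, the `key()` display format and the sample values are assumptions made for this example, not something defined in the slides. MCC and MNC are optional here because the Reality Mining Dataset records only the LAC and Cell-ID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellGlobalIdentity:
    """Hypothetical container for the four Cell Global Identity components."""
    lac: int                    # Location Area Code
    cell_id: int                # Cell Identifier
    mcc: Optional[str] = None   # Mobile Country Code, kept as a digit string
    mnc: Optional[str] = None   # Mobile Network Code, kept as a digit string

    def is_complete(self) -> bool:
        """True only when all four components are known."""
        return self.mcc is not None and self.mnc is not None

    def key(self) -> str:
        """Readable key for logs; '?' marks a missing component (ad hoc format)."""
        parts = (self.mcc, self.mnc, str(self.lac), str(self.cell_id))
        return "-".join(p if p is not None else "?" for p in parts)


# Illustrative values only.
full = CellGlobalIdentity(lac=1234, cell_id=5678, mcc="230", mnc="01")
partial = CellGlobalIdentity(lac=1234, cell_id=5678)
print(full.key(), full.is_complete())
print(partial.key(), partial.is_complete())
```

A record like `partial` is what a lookup has to work with when only the LAC / Cell-ID pair was logged.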

From Cell to Geographical Coordinates - II
- From MCC, MNC, LAC, Cell-ID to Longitude / Latitude
  - Cell-ID list from operator: not publicly available
  - Cell-ID databases
    - publicly available (OpenCellID, CellDB, CellSpotting): only sparse coverage
    - commercial (Location-API.com, Cell-id Look-up API): limited access
  - need full Cell Global Identifier
- Reality Mining Dataset contains only Location Area Code and Cell-ID
- From LAC / Cell-ID to Lat / Lon: Google Location API
  - hidden HTTP API for My Location service
  - direct request from plain PC possible: computer mimics mobile phone
  - accepts even only LAC and Cell-ID

Location Data Acquisition
- Almost 33,000 unique cells present in Reality Mining Dataset
- Retrieved 46.75% of locations with geographical coordinates
  1. Five years' delay between dataset recording and location retrieval
  2. Mobile network evolution (3G) and renumbering
  3. Boston (Massachusetts, USA) area completely missing due to operator acquisitions
[Figure: All retrieved cell locations]

Outliers Detection and Removal - I
- Unlikely places
- Distorted trajectories
  - impossible distant hops (between continents in 2 seconds)
- Why not simply remove distant hops?
  - because airplanes fly fast and far
- Why not use Mobile Country Code to detect cells outside the corresponding country?
  - trajectory distortion caused by LAC / Cell-ID pair reuse
  - MCC and MNC codes are missing!
[Figures: Distorted trajectories; Unlikely places]
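The difference between a "distant" hop and an "impossible" one can be illustrated by computing the speed implied by two consecutive fixes. The sketch below is not the authors' method (they use LAC-clustering). It only flags hops for inspection, and the 1000 km/h threshold, the function names and the tuple layout are assumptions chosen for the example.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0
Fix = Tuple[float, float, float]  # (time in seconds, latitude, longitude)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def suspicious_hops(fixes: List[Fix], max_speed_kmh: float = 1000.0) -> List[int]:
    """Return indices i where the hop from fix i-1 to fix i implies a speed
    above max_speed_kmh (assumed threshold, roughly above airliner speed)."""
    flagged = []
    for i in range(1, len(fixes)):
        t0, lat0, lon0 = fixes[i - 1]
        t1, lat1, lon1 = fixes[i]
        dist = haversine_km(lat0, lon0, lat1, lon1)
        hours = (t1 - t0) / 3600.0
        if hours <= 0:
            # No elapsed time: any displacement is physically impossible.
            if dist > 0:
                flagged.append(i)
        elif dist / hours > max_speed_kmh:
            flagged.append(i)
    return flagged


# Boston area to Prague area in 2 seconds (flagged), then a short local move.
jump = [(0, 42.36, -71.09), (2, 50.08, 14.42), (36000, 50.09, 14.43)]
# The same intercontinental distance spread over 10 hours looks like a flight.
flight = [(0, 42.36, -71.09), (36000, 50.08, 14.42)]
print(suspicious_hops(jump), suspicious_hops(flight))
```

A speed check keeps plausible flights, which a plain distance cut would remove, but it cannot tell which end of a flagged hop is the wrongly located cell.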

Outliers Detection and Removal - II
- Observation: Location Area consistency
  - cells with same LAC form compact clusters
  - Location Areas cover small areas
  - common mobile network design pattern
- Location Areas from retrieved cell locations: neither compact nor covering small areas
[Figures: All retrieved cell locations; Location Areas in CZ; Location Areas from cell locations]

LAC-clustering Algorithm
- heuristic extension of general agglomerative hierarchical clustering
  1. Select cells with the same LAC
  2. Let each cell location be a cluster
  3. repeat: merge the two closest clusters
  4. until only one cluster remains
  5. Use distance criterion for forming clusters
  6. Select one Location Area representative
  ... and iterate over all LACs
- 35 km GSM radio limit
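The steps above can be sketched roughly as follows. The slides do not state the linkage, how the distance criterion is applied, or how the Location Area representative is chosen. This sketch assumes single linkage, uses the 35 km limit as the merge cutoff, and keeps the largest cluster as the representative. Stopping the merge loop at the cutoff gives the same clusters as building the full single-linkage hierarchy and cutting it at that distance. All function and variable names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int, float, float]   # (lac, cell_id, latitude, longitude)
Member = Tuple[int, float, float]      # (cell_id, latitude, longitude)
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def single_link_km(c1: List[Member], c2: List[Member]) -> float:
    """Single-linkage distance: closest pair of cells across two clusters."""
    return min(haversine_km(p[1], p[2], q[1], q[2]) for p in c1 for q in c2)


def lac_cluster(cells: List[Cell], cutoff_km: float = 35.0) -> Dict[int, List[int]]:
    """For every LAC, return the cell IDs of its largest cluster (assumed
    representative); cells outside it are treated as outliers."""
    by_lac: Dict[int, List[Member]] = defaultdict(list)
    for lac, cell_id, lat, lon in cells:          # step 1: group by LAC
        by_lac[lac].append((cell_id, lat, lon))

    kept: Dict[int, List[int]] = {}
    for lac, members in by_lac.items():
        clusters = [[m] for m in members]          # step 2: one cluster per cell
        while len(clusters) > 1:                   # steps 3-4: merge closest pair
            dist, i, j = min(
                (single_link_km(clusters[i], clusters[j]), i, j)
                for i in range(len(clusters))
                for j in range(i + 1, len(clusters))
            )
            if dist > cutoff_km:                   # step 5: distance criterion
                break
            clusters[i].extend(clusters.pop(j))
        representative = max(clusters, key=len)    # step 6: assumed selection rule
        kept[lac] = sorted(cell_id for cell_id, _, _ in representative)
    return kept


# Illustrative data: LAC 1 has three nearby cells and one on another continent.
sample = [
    (1, 1, 50.08, 14.42), (1, 2, 50.10, 14.45), (1, 3, 50.05, 14.40),
    (1, 4, 42.36, -71.09),
    (2, 9, 48.15, 17.11),
]
print(lac_cluster(sample))
```

The pairwise search is quadratic per merge, which is acceptable for a sketch but would need a proper hierarchical-clustering routine for Location Areas with many cells.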

Location Data Acquisition: Outliers Removed
- Removed ~1,500 unique cells from ~15,000 cell locations with coordinates
- 42% of all unique locations in the Reality Mining Dataset left
- Locations all around the world, not only Boston!
- Note: can't verify result correctness
[Figure: LAC-clustered cell locations]

Movement Trajectory Reconstruction
- Space-time cube visualization
- Missing locations will distort trajectory
  - unknown locations: mobility info missing
- Reconstruct user trajectory from consistent subsequences that
  - have majority of known locations (e.g. > 95%)
  - are representatively long (e.g. > 300 locations)
[Figure: Trajectory example from Reality Mining Dataset]

Finding Consistent Subsequences - I
- Example
  - desired subsequence length L: 5
  - desired known locations ratio C: 60% (3 out of 5)
- Two consistent subsequences: 6 out of 10 (60%) and 6 out of 9 (66%)
[Figure: timeline of known and unknown locations with the consistent subsequences marked]

Finding Consistent Subsequences - II
1. Handle locations sequence as discrete signal: known location = 1, unknown location = 0
2. Apply moving average filter with window size L
3. Select locations above desired known locations ratio threshold C
[Figure: L > 300 locations, C > 95%]
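The three steps above might be sketched as below. The slides do not specify how the filter window is aligned or how selected locations are joined. This version assumes that every window of length L whose known ratio reaches C is selected in full, and that overlapping or adjacent windows are merged into one subsequence. The function name and the half-open `(start, end)` output format are choices made for this example.

```python
from typing import List, Tuple


def consistent_subsequences(known: List[int], window: int, ratio: float) -> List[Tuple[int, int]]:
    """known: 1 for a known location, 0 for an unknown one.
    Returns half-open index ranges (start, end) covered by at least one
    window of the given length whose fraction of known locations >= ratio."""
    n = len(known)
    if window <= 0 or n < window:
        return []

    covered = [False] * n
    count = sum(known[:window])                    # known locations in first window
    for start in range(n - window + 1):
        if start > 0:                              # slide the window by one position
            count += known[start + window - 1] - known[start - 1]
        if count / window >= ratio:                # moving average vs. threshold C
            for i in range(start, start + window):
                covered[i] = True

    runs: List[Tuple[int, int]] = []
    run_start = None
    for i, flag in enumerate(covered):             # merge covered positions into runs
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            runs.append((run_start, i))
            run_start = None
    if run_start is not None:
        runs.append((run_start, n))
    return runs


# Small illustration with L = 5 and C = 60%, as in the example slide.
signal = [1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0]
print(consistent_subsequences(signal, window=5, ratio=0.6))
```

With the parameters named in the slides (L of 300 locations, C of 95%), the same function would be called as `consistent_subsequences(signal, 300, 0.95)` on a per-user sequence.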

Reality Mining Dataset Locations Summary
- Fraction of unique known locations per user varies between 0% and 68%
  - retrieved locations don't cover whole user pool
- Time spent by users on known locations similar
  - same ratio, but different distribution
  - heavy/long tail: users spend most of their time at only a few places
- Consistent subsequences describe approx. 0.6% to 15% of user mobility trace
  - from total 9200 hours of tracking
  - based on parameters of consistent subsequences
  - cell locations cover most likely business, conference & vacation trips
- Total time in consistent subsequences (95% of known locations, length 300 locations)
  - total time in all consistent subsequences approx. 326 days

Conclusion
- Method for retrieving geographical locations for GSM / UMTS cells
  - based on querying Google Locations API
  - LAC-clustering for outliers detection and removal
  - representative movement trajectory reconstruction
- Retrieved coordinates for 42% of unique cells from Reality Mining Dataset
  - method suitable for similar datasets
- Spatial information opens further research possibilities
  - 326 days of valuable user-mobility traces

What Next?
- What can be derived from such spatial data?
  - usage patterns at different locations, when traveling at different speeds
  - mobile user movement prediction
  - validation and support of previously published results
  - active vs. passive tracking comparison
  - correlation of mobility and behavior of the user
  - ...
- Greater level of Cell-ID obfuscation for further dataset recordings?
  - hashing / obfuscation preserving cellular network nature
- Limits of informed consent?
  - Google Locations API did not exist at the time the Reality Mining Dataset was recorded
  - can we provide trustworthy guarantees about restrictions on future information retrieval from monitored data at all?

Thank you!
Interested? Why not read our paper?
Michal Ficek, Lukas Kencl: Spatial Extension of the Reality Mining Dataset
Czech Technical University in Prague
michal.ficek@rdc.cz
www.rdc.cz