Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales. ACM MobiCom 2014, Maui, HI

Similar documents
Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales

A Heterogeneous Model Integration for Multi-source Urban Infrastructure Data

Exploring the Patterns of Human Mobility Using Heterogeneous Traffic Trajectory Data

Travel Pattern Recognition using Smart Card Data in Public Transit

Encapsulating Urban Traffic Rhythms into Road Networks

ArcGIS GeoAnalytics Server: An Introduction. Sarah Ambrose and Ravi Narayanan

Bus Landscapes: Analyzing Commuting Pattern using Bus Smart Card Data in Beijing

The Challenge of Geospatial Big Data Analysis

Analysis of the urban travel structure using smartcard and GPS data from Santiago, Chile

Smart Card clustering to extract typical temporal passenger habits in Transit network. Two case studies: Rennes in France and Gatineau in Canada

Assessing spatial distribution and variability of destinations in inner-city Sydney from travel diary and smartphone location data

Typical information required from the data collection can be grouped into four categories, enumerated as below.

Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

Data Collection. Lecture Notes in Transportation Systems Engineering. Prof. Tom V. Mathew. 1 Overview 1

Characterizing Travel Time Reliability and Passenger Path Choice in a Metro Network

Mapping Accessibility Over Time

* Abstract. Keywords: Smart Card Data, Public Transportation, Land Use, Non-negative Matrix Factorization.

Now That You ve Downloaded Some StreetLight Data, What Should You Do First? Data Representativeness and Expansion Considerations

Assessing pervasive user-generated content to describe tourist dynamics

Extracting mobility behavior from cell phone data DATA SIM Summer School 2013

Sensitivity of estimates of travel distance and travel time to street network data quality

VISUAL EXPLORATION OF SPATIAL-TEMPORAL TRAFFIC CONGESTION PATTERNS USING FLOATING CAR DATA. Candra Kartika 2015

GIS Analysis of Crenshaw/LAX Line

Data Mining II Mobility Data Mining

Exploiting Geographic Dependencies for Real Estate Appraisal

PATREC PERSPECTIVES Sensing Technology Innovations for Tracking Congestion

Detecting Origin-Destination Mobility Flows From Geotagged Tweets in Greater Los Angeles Area

ArcGIS is Advancing. Both Contributing and Integrating many new Innovations. IoT. Smart Mapping. Smart Devices Advanced Analytics

Forecasts from the Strategy Planning Model

Discovering Urban Spatial-Temporal Structure from Human Activity Patterns

Research Seminar on Urban Information Systems

Traffic Demand Forecast

Understanding individual and collective mobility patterns from smart card records: A case study in Shenzhen

The 3V Approach. Transforming the Urban Space through Transit Oriented Development. Gerald Ollivier Transport Cluster Leader World Bank Hub Singapore

Shared Subway Shuttle Bus Route Planning Based on Transport Data Analytics

BROOKINGS May

Visualizing Big Data on Maps: Emerging Tools and Techniques. Ilir Bejleri, Sanjay Ranka

The prediction of passenger flow under transport disturbance using accumulated passenger data

Collection and Analyses of Crowd Travel Behaviour Data by using Smartphones

Inferring Passenger Boarding and Alighting Preference for the Marguerite Shuttle Bus System

Combining smart card data and household travel survey to analyze jobs-housing relationships in Beijing

Status Report: Ongoing review of O-D cellular data for the TPB modeled area

A route map to calibrate spatial interaction models from GPS movement data

Understanding Individual Daily Activity Space Based on Large Scale Mobile Phone Location Data

Research Article Urban Mobility Dynamics Based on Flexible Discrete Region Partition

Validating general human mobility patterns on massive GPS data

GIS Based Transit Information System for Metropolitan Cities in India

Spatial Data Science. Soumya K Ghosh

Policy Note 6. Measuring Unemployment by Location and Transport: StepSA s Access Envelope Technologies

Where to Find My Next Passenger?

City monitoring with travel demand momentum vector fields: theoretical and empirical findings

Location Determination Technologies for Sensor Networks

Measuring connectivity in London

Measurement of human activity using velocity GPS data obtained from mobile phones

Mobility Analytics through Social and Personal Data. Pierre Senellart

Orbital Insight Energy: Oil Storage v5.1 Methodologies & Data Documentation

Modelling exploration and preferential attachment properties in individual human trajectories

Perspectives on Stability and Mobility of Passenger s Travel Behavior through Smart Card Data

True Smart and Green City? 8th Conference of the International Forum on Urbanism

Integrating Origin and Destination (OD) Study into GIS in Support of LIRR Services and Network Improvements

Travel behavior of low-income residents: Studying two contrasting locations in the city of Chennai, India

Continental Divide National Scenic Trail GIS Program

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Shreyas Shinde

Data Analysis of NYC Cab Services

Unit 1, Lesson 2. What is geographic inquiry?

A Cloud Computing Workflow for Scalable Integration of Remote Sensing and Social Media Data in Urban Studies

DATA SOURCES AND INPUT IN GIS. By Prof. A. Balasubramanian Centre for Advanced Studies in Earth Science, University of Mysore, Mysore

Profiling underprivileged residents with mid-term public transit smartcard data of Beijing

Web Visualization of Geo-Spatial Data using SVG and VRML/X3D

Central Coast Tracking Trash. Trash Webinar #3 September 19, 2017

ESRI Survey Summit August Clint Brown Director of ESRI Software Products

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Sam Williamson

Geographical Bias on Social Media and Geo-Local Contents System with Mobile Devices

Smart Data Collection and Real-time Digital Cartography

GIS = Geographic Information Systems;

Regularity and Conformity: Location Prediction Using Heterogeneous Mobility Data

ArcGIS Enterprise: Out-of-the-Box Spatial Analysis. Vicki Cove Hilary Curtis

Data for transport planning and management Why integration and management? Issues in transport data integration

Introduction to ArcGIS GeoAnalytics Server. Sarah Ambrose & Noah Slocum

Developing harmonised indicators on urban public transport in Europe

Understand Urban Human Mobility through Crowdsensed Data

Simplified Trips-on-Project Software (STOPS): Strategies for Successful Application

Space-adjusting Technologies and the Social Ecologies of Place

Caesar s Taxi Prediction Services

Estimating Large Scale Population Movement ML Dublin Meetup

Metropolitan Wi-Fi Research Network at the Los Angeles State Historic Park

Geospatial SDI Portal for effective Governance of Pune METROPOLIS region

California Urban Infill Trip Generation Study. Jim Daisa, P.E.

Evaluation of urban mobility using surveillance cameras

Understanding Travel Time to Airports in New York City Sierra Gentry Dominik Schunack

The Model Research of Urban Land Planning and Traffic Integration. Lang Wang

COMBINATION OF MACROSCOPIC AND MICROSCOPIC TRANSPORT SIMULATION MODELS: USE CASE IN CYPRUS

Geographic Data Science - Lecture II

Supplementary Figures

Estimating the vehicle accumulation: Data-fusion of loop-detector flow and floating car speed data

Classification in Mobility Data Mining

Knowledge claims in planning documents on land use and transport infrastructure impacts

A Machine Learning Approach to Trip Purpose Imputation in GPS-Based Travel Surveys

Visitor Flows Model for Queensland a new approach

SPATIAL INEQUALITIES IN PUBLIC TRANSPORT AVAILABILITY: INVESTIGATION WITH SMALL-AREA METRICS

Transcription:

Exploring Human Mobility with Multi-Source Data at Extremely Large Metropolitan Scales Desheng Zhang & Tian He University of Minnesota, USA Jun Huang, Ye Li, Fan Zhang, Chengzhong Xu Shenzhen Institute of Advanced Technology, China ACM MobiCom 2014, Maui, HI 1

Outline Introduction Design Evaluation Application Conclusion 2

Human Mobility Patterns Mobile Networking Location Based Services Real-time Navigation Transit Services Social Networking Mobile Networking Location Based Services Real-Time Navigation Transit Services Social Networking 3

Human Location Tracking Devices GPS Devices Cellphones by Call Detail Records (CDR) Cell Tower Levels Automatic Fare Collection System (AFC) Station Levels: Subways, Buses, Taxicabs Massive Empirical Data Collection Onboard GPS Cellphone Subway Station Bus Taxi 4

Empirical Mobility Data Empirical Data for Mobility Modeling Large Scale Fine Granularity Long Collection Period Taxicab Passengers in Shenzhen 5

Human Mobility Models Legacy Mobility Models: MobiCom 03: Obstacles based Mobility Model by Jardosh et al. MobiCom 04: Weighted Waypoint Model by Hsu et al. MobiCom 07: Mobility Modeling in Bus-based DTN: Zhang et al. UbiComp 11: Mobility Modeling with Smartcards: Lathia et al. KDD 11: Mobility in Social Networks: Cho et al. MobiSys 12: Cellphone based Mobility Model: Isaacman et al. MobiCom 13: Residence Time Prediction: Baumann et al. MobiCom 13: Ballistic Model: Bogo et al. Models based on Single-Source Data Cellphone One Kind of Urban Transit Taxicab, Subway or Bus 6

Common Drawback: Biased Sampling Using Residents in Single-Source Data as a Sample for ALL 7

Common Drawback: Biased Sampling Using Residents in Single-Source Data as a Sample for ALL Introducing a Bias against Residents not Involved 7

Models Based on Cellphone Data Use Residents with cellphone activities as a Sample for all Biased against Residents without cellphone activities Biased Sample 9

Models Based on Transit Data Use a type of Passengers (e.g., taxicab) as a sample Biased against Residents using other transit Sample Biased Biased Biased Taxi Passengers Bus Passengers Subway Passengers Private Cars 10

Visualizing Biased Sampling with Mobility Graph Vertex: a Urban Region Vertex Size: Number of Mobile Residents Edge: Mobility between a Pair of Regions Edge thickness: Mobility Volume Mobility Volume Urban Region Urban Region Number of Residents Urban Region 11

Mobility Graph based on Cellphone Data Urban Region Urban Region Industrial 12 Colors: Different Urban Districts

Biased Low Industrial Cellphone Mobility High Industrial Transit Mobility 13 (Subway+Bus+Taxi)

Possible Solution: Multi-Source Data Quick Expansion of Urban Infrastructures Enabling Multi-Source Data to address biased sampling Integrating Transit Networks with Cellular networks Subway Network SmartCard Data Bus Network Taxicab Network Bus Data Taxicab Data Multi-Source Data Cellular Network Cellphone Dataset Cellphone Data 14

Contributions Mitigating biased sampling in single-source data by crossreferencing multi-source data Analyzing spatial-temporal dynamics of multi-source data to infer real-time mobility Designing the first generic architecture mpat for mobility modeling, separating low level data collection and high level service design Implementing mpat with extremely large-scale multi-source data capturing 10 million residents in Shenzhen Enabling an inter-region mobility inference with a 75% accuracy Developing a transit service based on inferred inter-region mobility to reduce 46% of passenger travel time 15

Outline Introduction Design Evaluation Application Conclusion 16

mpat Architecture Application Design Layer Wi-Fi AP Deployment Cell Tower Depolyment Ad hoc Networking Inter- Region Transit Mobility Abstraction Layer Mobility Abstraction Real-Time Data Feed Layer Cellular Data Feed Taxicab Data Feed Bus Data Feed Subway Data Feed Urban Infrastructures (Taxicabs, Buses, Subway, Cellular Networks) 17

Data Feed Layer: Overview Data Feeding Data Managing Data Storing Data Cleaning Data Protecting 18

Data Feeding Close Collaboration Shenzhen Government Agencies A Reliable Feeding Mechanism Cellphones: CDRs for 10.4 Million Users Smart Cards: Fare Transactions for 16 Million Users Taxicabs: GPS for 14 Thousand Taxicabs Buses: GPS for 10 Thousand Buses Shenzhen Transport Commission 19

Data Managing Hardware: 11 Node Cluster with 34 TB Storage Node with 32 Cores and 32 GB RAM Software: Hadoop Distributed File System (HDFS) Pig and Hive Cluster in Shenzhen 20

Data Storing 21

Data Cleaning Errant Data in mpat Errant GPS Data Duplicated Data Data with Logical Errors Missing Data 11% of Data Removed Raw GPS Removing Errant and Duplicated GPS Map Matching 22

Data Protecting: Privacy Anonymization: Anonymizing All Data Replacing IDs with Serial Numbers Minimal Exposure: Processing Mobility Info Only Dropping Other Info Aggregation: Presenting Mobility in Aggregation Not Focusing on Individual Users 23

Mobility Abstraction Layer: Overview Trip Extraction Spatial and Temporal Characteristic Analysis Urban Region Partition Inter-Region Mobility Inference 24

Trip Extraction Cellphone User Trips Obtaining trips by a continuous trace of cellphone towers associated CDRs for the same user Taxicab Passenger Trips By finding pickup and related dropoff locations Bus Passenger Trips By finding boarding and alighting bus stations Subway Passenger Trips By finding entering and exiting metro stations Details in the paper 25

Trip Extraction Cell User Origin Bus/Sub Passenger Origin Taxi Passenger Destination Taxi Passenger Origin Cell User Destination Bus/Sub Passenger Destination 26

Characteristic Analysis Classifying All Trips Cellphone Trips Transit Trips Spatial Characteristic Variety in Lengths Temporal Characteristic Variety in Time Periods 50% 40% 30% 20% 10% 0% 27

Spatial Characteristic Trips from Transit Data Trips between 1 km and 35 km Trips from Cellphone data Trips with various lengths 28

Temporal Characteristic Captured Trips in the slot 7-8 AM in 14 different Mondays Fewer Trips from Cellphone data More Trips from Transit data 29

Empirical Insights Bias in Transit Data: Capturing fewer short (<1km) or long (>35km) trips Difficult to be mitigated Bias in Cellphone Data: Capturing fewer trips in a given time slot Possible to be mitigated Mitigating the Bias in Cellphone Data: Urban trips are highly repeatable, e.g., daily commute A traveling resident may use cellphone before Accumulatively using historical data to capture residents 30

Cumulatively using Historical Data Captured Trips from Unique Residents in Accumulative Mondays Cellphone Data are better 31

Urban Region Partition Utilizing 496 Shenzhen Urban Regions as a spatial partition Bus Trips Cell Trips Taxi Trips Downtown Subway Trips Color: Population Level 32

Cellphone User Mobility Graph Taxicab Passenger Mobility Graph Urban Region Partition Mobility Graph Vertex: a Urban Region Vertex Size: Number of Mobile Residents Edge: Mobility between a Pair of Regions Edge thickness: Mobility Volume Bus Passenger Mobility Graph Subway Passenger Mobility Graph 33

Online Inference: Objective: inferring the real-time mobility among different urban regions by a mobility graph G for the current slot Aggregating individual mobility to obtain mobility volumes for every region pairs Trips from Data Urban Region Partition A mobility graph G 34

Online Inference by Cellphone Data: 90% of urban residents have cellphones Infer G = G c +G c in a slot τ G c for active cellphone users with activities in τ G c for inactive cellphone users without activities in τ Active Users G c Inactive Users Key Challenge: G c? G c for inactive users is Unknown 35

Online Inference: Solution: Infer G c by accumulatively using historical data Active Users Inactive Users All Cellphone Users G c Plus G c Approx ~ G Today Real Time Cellphone Data Yester day Day Before Day X Yesterday Historical Cellphone Data Repeatable Trip: Inactive users may use cellphones before for same 36trip

Design Issue: Accumulation When to Stop Accumulatively Using Historical Data? One Day or One Week or One Month Avoiding Under or Overestimated Finding a Bound by another Data Source to stop the accumulation Using Mobility from Transit Data as a Lower Bound for Total Mobility Yester day Inactive Users G c Day X Historical Cellphone Data 37

Online Inference: Active Users Inactive Users Lower Bound Transit Users G c Today X Real Time Cellphone Data Plus Day X-1 G c Day X-2 Day X-3 Historical Cellphone Data Cover? G t Today X Real Time Transit Data Stop Accumulation, if G c plus G c covers G t in terms of edge weights 38

Online Inference: Active Users Inactive Users All Cellphone Users G c G c Approx ~ Today X Real Time Cellphone Data Day X-1 Day X-2 Day X-3 Historical Cellphone Data Using G c plus G c to approximate G for all inter region mobility G 39

Outline Introduction Design Evaluation Application Conclusion 40

Evaluation Summary Comparison: Radiation: Statistical Model without Real-Time Data WHERE: Single-Source Model with Cellphone Data Metric: Mean Average Percent Error (MAPE) Among : Inferred Mobility in an OD pair region pairs, i.e., an OD pair : Real Mobility in an OD pair (Ground Truth) Ground Truth: Obtained by a location updating dataset of 7 million cellphone users Logging locations of all users in every 15 mins even without activities Did not use in analysis since it cannot generalize to their cities, and need extra support in terms of software, hardware, and policies 41

Accuracy on different levels Region Levels Street Levels 42

Impact of slot length 43

Impact of Historical Data Accuracy Running Time mpat-s: using all historical cellphone data without analyzing the correlation with transit data 44

Outline Introduction Design Evaluation Application Conclusion 45

Inter Region Transit Based on mpat, finding urban region pairs with High human mobility (Cellphone Data) Low public transit mobility (Transit Data) Indicating Inadequate Transit Service Providing non-stop express inter region transit (IRT) services between these region pairs 46

Real World Implementation Implementing between two urban regions Using 3 taxis as IRT Vehicles to deliver 12 volunteers Logging Travel Time for 30 days 47

Experiment Results Comparing IRT with walking and taking regular bus Quantifying speed difference between taxicabs and buses with a factor v 48

Conclusion Design an architecture mpat for the analysis and inference of the human mobility with a 75% inference accuracy Two key insights models based on single-source data introduce biases, which can be mitigated by multi-source data multi-source data can be used for cross-referencing to increase the performance 49

Thanks 50