Improving forecasting under missing data on sparse spatial networks

Similar documents
Graphical LASSO for local spatio-temporal neighbourhood selection

Transportation Big Data Analytics

Improving the travel time prediction by using the real-time floating car data

Spatio-Temporal Analytics of Network Data

TRAFFIC FLOW MODELING AND FORECASTING THROUGH VECTOR AUTOREGRESSIVE AND DYNAMIC SPACE TIME MODELS

A Dynamic Spatial Weight Matrix and Localised STARIMA for Network Modelling. T Cheng, J Wang, J Haworth, BG Heydecker, AHF Chow

Clustering Analysis of London Police Foot Patrol Behaviour from Raw Trajectories

MODELLING TRAFFIC FLOW ON MOTORWAYS: A HYBRID MACROSCOPIC APPROACH

Traffic Flow Forecasting for Urban Work Zones

Estimating Urban Traffic Patterns through Probabilistic Interconnectivity of Road Network Junctions

Space-Time Kernels. Dr. Jiaqiu Wang, Dr. Tao Cheng James Haworth University College London

THE POTENTIAL OF APPLYING MACHINE LEARNING FOR PREDICTING CUT-IN BEHAVIOUR OF SURROUNDING TRAFFIC FOR TRUCK-PLATOONING SAFETY

Dynamic OD Matrix Estimation Exploiting Bluetooth Data in Urban Networks

Estimating the vehicle accumulation: Data-fusion of loop-detector flow and floating car speed data

Cheng Soon Ong & Christian Walder. Canberra February June 2018

IBM Research Report. Road Traffic Prediction with Spatio-Temporal Correlations

Predicting freeway traffic in the Bay Area

Path and travel time inference from GPS probe vehicle data

DEVELOPMENT OF ROAD SURFACE TEMPERATURE PREDICTION MODEL

Estimation of the Subjective Topological Structure of Action Spaces using Learning Agents

Airborne Traffic Flow Data and Traffic Management. Mark Hickman and Pitu Mirchandani

Travel Time Estimation with Correlation Analysis of Single Loop Detector Data

CAPACITY DROP: A RELATION BETWEEN THE SPEED IN CONGESTION AND THE QUEUE DISCHARGE RATE

Belief Propagation for Traffic forecasting

transportation research in policy making for addressing mobility problems, infrastructure and functionality issues in urban areas. This study explored

Chapter 2 Travel Time Definitions

GIS-BASED VISUALIZATION FOR ESTIMATING LEVEL OF SERVICE Gozde BAKIOGLU 1 and Asli DOGRU 2

MODELLING AND UNDERSTANDING MULTI-TEMPORAL LAND USE CHANGES

Examining travel time variability using AVI data

Empirical assessment of urban traffic congestion

Predicting Link Travel Times from Floating Car Data

A Method for Predicting Road Surface Temperature Distribution Using Pasquill Stability Classes

Urban traffic flow prediction: a spatio-temporal variable selectionbased

Regression Methods for Spatially Extending Traffic Data

J1.2 Short-term wind forecasting at the Hong Kong International Airport by applying chaotic oscillatory-based neural network to LIDAR data

Data - Model Synchronization in Extended Kalman Filters for Accurate Online Traffic State Estimation

Real-Time Travel Time Prediction Using Multi-level k-nearest Neighbor Algorithm and Data Fusion Method

Open Research Online The Open University s repository of research publications and other research outputs

Spatio Temporal Analysis of Network Data. Dr Tao Cheng

Loop detector data error diagnosing and interpolating with probe vehicle data

LARGE-SCALE TRAFFIC STATE ESTIMATION

Existence, stability, and mitigation of gridlock in beltway networks

2.2 Geographic phenomena

Urban Link Travel Time Estimation Using Large-scale Taxi Data with Partial Information

STARIMA-based Traffic Prediction with Time-varying Lags

NATHAN HALE HIGH SCHOOL PARKING AND TRAFFIC ANALYSIS. Table of Contents

Understanding Travel Time to Airports in New York City Sierra Gentry Dominik Schunack

Solar Nowcasting with Cluster-based Detrending

Multi-Plant Photovoltaic Energy Forecasting Challenge: Second place solution

Diagnosing New York City s Noises with Ubiquitous Data

The Tyndall Cities Integrated Assessment Framework

GIS Based Transit Information System for Metropolitan Cities in India

Encapsulating Urban Traffic Rhythms into Road Networks

A GIS-based Traffic Control Strategy Planning at Urban Intersections

Estimation of Measures of Effectiveness Based on Connected Vehicle Data

5.1 Introduction. 5.2 Data Collection

Empirical Study of Traffic Velocity Distribution and its Effect on VANETs Connectivity

Optimizing Urban Modeling based on Road Network and Land Use/Cover Changes by Using Agent Base Cellular Automata Model

California Urban Infill Trip Generation Study. Jim Daisa, P.E.

A hydrodynamic theory based statistical model of arterial traffic

CHAPTER 5 FORECASTING TRAVEL TIME WITH NEURAL NETWORKS

Comparison of traffic forecasting methods in urban and suburban context

The document was not produced by the CAISO and therefore does not necessarily reflect its views or opinion.

Delay of Incidents Consequences of Stochastic Incident Duration

$QDO\]LQJ$UWHULDO6WUHHWVLQ1HDU&DSDFLW\ RU2YHUIORZ&RQGLWLRQV

Real Time Traffic Control to Optimize Waiting Time of Vehicles at A Road Intersection

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Maximum Direction to Geometric Mean Spectral Response Ratios using the Relevance Vector Machine

Spatial Thinking and Modeling of Network-Based Problems

Cumulative Count Curve and Queueing Analysis

Changes in the Spatial Distribution of Mobile Source Emissions due to the Interactions between Land-use and Regional Transportation Systems

Analysis and Design of Urban Transportation Network for Pyi Gyi Ta Gon Township PHOO PWINT ZAN 1, DR. NILAR AYE 2

Appendix C Final Methods and Assumptions for Forecasting Traffic Volumes

Motivation Traffic control strategies Main control schemes: Highways Variable speed limits Ramp metering Dynamic lane management Arterial streets Adap

Univariate Short-Term Prediction of Road Travel Times

NEURAL NETWORK FOR TRAVEL DEMAND FORECAST USING GIS AND REMOTE SENSING

Developing travel time estimation methods using sparse GPS data

Traffic Flow Theory & Simulation

Modifications of asymmetric cell transmission model for modeling variable speed limit strategies

PATREC PERSPECTIVES Sensing Technology Innovations for Tracking Congestion

Predicting flight on-time performance

July Forecast Update for Atlantic Hurricane Activity in 2017

THE CRASH INTENSITY EVALUATION USING GENERAL CENTRALITY CRITERIONS AND A GEOGRAPHICALLY WEIGHTED REGRESSION

July Forecast Update for Atlantic Hurricane Activity in 2016

VEHICULAR TRAFFIC FLOW MODELS

1 Spatio-temporal Variable Selection Based Support

Transportation and Road Weather

Assessing the uncertainty in micro-simulation model outputs

The development of a Kriging based Gauge and Radar merged product for real-time rainfall accumulation estimation

Capacity Drop. Relationship Between Speed in Congestion and the Queue Discharge Rate. Kai Yuan, Victor L. Knoop, and Serge P.

APPENDIX IV MODELLING

FREEWAY SHORT-TERM TRAFFIC FLOW FORECASTING BY CONSIDERING TRAFFIC VOLATILITY DYNAMICS AND MISSING DATA SITUATIONS. A Thesis YANRU ZHANG

A Hybrid ARIMA and Neural Network Model to Forecast Particulate. Matter Concentration in Changsha, China

Visitor Flows Model for Queensland a new approach

Leaving the Ivory Tower of a System Theory: From Geosimulation of Parking Search to Urban Parking Policy-Making

CHAPTER 6 CONCLUSION AND FUTURE SCOPE

Open Research Online The Open University s repository of research publications and other research outputs

AN INTERNATIONAL SOLAR IRRADIANCE DATA INGEST SYSTEM FOR FORECASTING SOLAR POWER AND AGRICULTURAL CROP YIELDS

TRAVEL TIME PREDICTION BY ADVANCED NEURAL NETWORK

Appendix BAL Baltimore, Maryland 2003 Annual Report on Freeway Mobility and Reliability

Transcription:

Improving forecasting under missing data on sparse spatial networks J. Haworth 1, T. Cheng 1, E. J. Manley 1 1 SpaceTimeLab, Department of Civil, Environmental and Geomatic Engineering, University College London, Chadwick Building, Gower Street, London WC1E 6BT Telephone: +44 Email: j.haworth@ucl.ac.uk, t.cheng@ucl.ac.uk, edward.manley.09@ucl.ac.uk 1. Introduction Missing data is a major issue in many real world sensor networks. It can complicate the calculation of diagnostic statistics in an offline setting, as well as making prediction of processes difficult in a real time setting. The longer the period of missing data, the more difficult it is to mitigate its effects. In spatial sensor networks, data from neighbouring locations can be used to impute or forecast missing values. Efforts have been made to deal with missing spatio-temporal data in environmental monitoring (Glasbey, 1995; Smith et al, 1996, 2003) and traffic forecasting (Whitlock and Queen, 2001; van Lint et al, 2005; Haworth and Cheng, 2012) amongst others. Crucial to the use of spatio-temporal approaches for dealing with missing data is correctly capturing the dependency structure between locations. In networks of flows this can be implicit in the network structure: for instance, on road networks traffic flows from upstream to downstream in free flowing conditions, and queues build up in the opposite direction in congested conditions. Although this relationship is complicated by traffic signals in the urban environment, it can still be captured using physical models if flow data are available with sufficient spatial and temporal resolution. However, on real world sensor networks, the data collection system often does not capture a sufficient level of spatial and/or temporal granularity, or indeed the right type of data, to model the physical process in detail. In this study, we extend our previous work (Haworth and Cheng, 2012) to improve forecasting of road link travel times under missing data. We combine two networks; the data collection network and the underlying road network. The structure of the road network is used to improve spatio-temporal forecasting of missing data on the sensor network, which is spatially sparse. 2. Data and Methods 2.1 The sensor network and the road network The sensor network is a system of Automatic number plate recognition (ANPR) cameras that record travel times, as part of Transport for London s (TfL) London Congestion Analysis Project (LCAP). Automatic number plate recognition (ANPR) technology is used to record the elapsed time between vehicles entering and exiting a road link, and the individual observations are averaged at 5 minute intervals. Although the LCAP network is a fully connected network, it is spatially sparse, consisting only of major roads that are

of strategic importance to TfL. Therefore, many of the spatial connections between LCAP links are ignored in its network structure. Figure 1. Connecting the LCAP and ITN Networks A more complete representation of London s road network is provided by Ordnance Survey s Integrated Transport Network (ITN), upon which the LCAP network lies. The two networks are shown in Fig. 1, illustrating the sparse coverage of the LCAP network. A comparison of their characteristics is provided in Table 1. In this study, we use the ITN to build up a more complete network structure that captures the possible flows between LCAP links in order to improve spatio-temporal forecasting. This is described in the following section. Variable LCAP ITN No. Links ~1500 500,000+ Avg. Link length (m) 2700 97 Total Length (km) ~3800 60,000+ Table 1. Comparison of LCAP and ITN in the London area

2.2 Methodology First, the ITN links that intersect with each of the LCAP links are identified. Following this, the shortest paths are calculated between each of the LCAP links on the ITN network, to provide a measure of network proximity. This is shown in Fig 2, and is described below. Figure 2. Connecting the LCAP and ITN Networks Fig. 2 a) shows an example of the typical relationship between LCAP and ITN links. Each LCAP link consists of several ITN links, and there are various paths between LCAP links on the ITN network. Fig. 2 b) depicts the approach we take here graphically. The ITN links associated with an LCAP link are viewed as a single node, with the ITN network forming edges between them. The arcs in the diagram represent the ITN links in Fig. 2 a). Shortest paths are calculated along the ITN network between all LCAP links using the igraph library in R statistical package. An N*N spatial weight matrix W is then created from these paths, where N is the number of LCAP links. The i,j elements of W contain the average shortest path distance between the ITN links on LCAP links i and j. 3. Experimental setup To train the models we use the kernel regression model from Haworth and Cheng, 2012 (Eq. 6, p. 543). For brevity, the equations are not repeated here. For each LCAP link, a neighbour set is constructed consisting of those LCAP links that are within a specified network distance. The spatial weight matrix W is converted into a binary adjacency matrix, where, 0 otherwise. When making the forecasts, we assume no data is being collected at the current sensor location, therefore. In total, the travel times of 108 LCAP links are forecast. The size of the training set is 25 days, and k-fold cross validation is used as the training method, with k=5. The selected model is the one which minimises the root mean squared error criterion. A further 10

days, immediately following the training period, are used as a validation set to evaluate the performance of the fitted models. 3.1 Comparison Model For comparison purposes, a further model is created based on the connectivity structure of the LCAP network. For each LCAP link, its directly adjacent upstream and downstream neighbours are used to forecast its values. The experimental setup is identical to that outlined above. 4. Initial Results For the new model, the average RMSE of training across the 108 test links was 94.7 seconds. This rises to 101.2 seconds for the validation data. The RMSE of training and testing for the comparison model was 96.04 and 98.5 seconds respectively. The new model performs slightly better than the comparison model in terms of training, but slightly worse in terms of forecasting, indicating that the LCAP connectivity structure alone is a good predictor of network conditions overall. However, in terms of the performance of the models on individual links, the new model performed better than the comparison model on 46 (42%) of the 108 links, with 4 having exactly equal performance. This indicates that, despite the comparison model having better forecasting performance overall, performance can be improved on a significant number of the LCAP links by incorporating the ITN network structure. An extreme example of this is given in Fig. 3. Figure 3. Example of model performance

5. Conclusions The approach used here demonstrates that spatio-temporal forecasts on sparse sensor networks can be improved by incorporating knowledge of the underlying network structure. In this example, network shortest paths were used to build up spatial neighbourhoods of road links, which were used for forecasting under the assumption of missing data. This study opens up a number of avenues for continuing research. Firstly, it is necessary to examine the characteristics of LCAP links that lead to one model being favoured over the other. It can be speculated that centrally located links, where the network is denser, will see a greater performance increase from the inclusion of the ITN network. Secondly, although shortest path lengths were used to define the connectivity structure in this study, other measures could be used. For instance, the paths could be weighted using other traffic variables such as historical travel times. Furthermore, the number of; or strength of, connections between links on the sensor network could be used a proxy for their degree of connectedness. A further consideration is the weighting of each of the members of the neighbour set. In this study they are weighted equally, but an inverse distance weighting could improve results further. 6. References Haworth, J & Cheng, T, 2012. Non-parametric regression for space time forecasting under missing data. Computers, Environment and Urban Systems, 36(6): 538 550. Glasbey, C A, 1995. Imputation of missing values in spatio-temporal solar radiation data. Environmetrics, 6(4): 363 371 Queen, C M, & Albers, C J, 2008. Forecasting traffic flows in road networks: A graphical dynamic model approach. In Proceedings of the 28th international symposium of forecasting. International Institute of Forecasters. Smith, T M, Reynolds, R W, Livezey, R E & Stokes, D C. 1996. Reconstruction of historical sea surface temperatures using empirical orthogonal functions. Journal of Climat,e 9: 1403 1420. Smith, R L, Kolenikov, S, & Cox, L H, 2003. Spatio-temporal modeling of PM2.5 data with missing values. Journal of Geophysical Research Atmospheres, 128: 10 1029. van Lint, J W C, Hoogendoorn, S P, & van Zuylen, H J, 2005. Accurate freeway travel time prediction with state-space neural networks under missing data. Transportation Research Part C: Emerging Technologies, 13(5 6): 347 369.