Belief Propagation for Traffic Forecasting

Belief Propagation for Traffic Forecasting

Cyril Furtlehner (INRIA Saclay, TAO team)
Context: Travesti project (http://travesti.gforge.inria.fr/)

Collaborators: Anne Auger (INRIA Saclay), Dimo Brockhoff (INRIA Lille), Yufei Han (CAOR Armines and INRIA Rocquencourt), Jean-Marc Lasgouttes (INRIA Rocquencourt), Fabrice Marchal (formerly at CNRS LET), Victorin Martin (INRIA), Fabien Moutarde (CAOR Armines), Maxim Samsonov (formerly at INRIA Saclay)

IWI, INRIA Paris, June 12, 2012

Problem at hand

Goal: reconstruct and predict road traffic on the secondary network.

Existing solutions are based on data coming from static sensors (magnetic loops) on main roads, which are too expensive to install in all streets.

Solution: more and more vehicles are equipped with GPS and able to exchange data (through cellular phone connections): Floating Car Data (FCD).

⇒ We want to build an inference scheme adapted to these FCD.

Project guidelines

Travesti (Traffic Volume Estimation by Spatio-Temporal Inference, http://travesti.gforge.inria.fr/)

Goal 1: find a macroscopic description of a large-scale traffic network.
- Machine learning approach: clustering & dimensionality reduction.
- At the microscopic level we consider the travel time along each segment (FCD observations).
- Hypothesis: large-scale behaviour is dominated by various dynamical congestion patterns.
- Multi-modal joint measure of travel times, mixing macroscopic & microscopic variables.

Goal 2: set up an MRF to encode the dependencies between degrees of freedom (dynamics at the macroscopic level).
- Real-time inference: search for a model compatible with the belief propagation algorithm.

Related project: FUI Pumas (http://pumas.inria.fr/), a concrete implementation in the Rouen agglomeration (1000 probe vehicles).

An Ising model for traffic?

Is it possible to encode traffic data on the basis of a binary latent state $s_{i,t} \in \{-1, 1\}$ (Ising), corresponding to congested/non-congested?

Input data $x_{i,t} \in \mathbb{R}$:
- static sensors: density and speed;
- floating car data: speed and travel times.

Two questions:
- Assuming a historical dataset $\{\hat x_{i,t}\}$: which probabilistic model $P(\{x_{i,t}, (i,t) \in \Omega\})$?
- Given actual observations $\{x_{i,t}, (i,t) \in \Omega^*\}$: how to infer $\{x_{i,t}, (i,t) \in \Omega \setminus \Omega^*\}$?

Ising-based proposed solution

Notation: $i$ is the spatial segment label, $t$ the discretized time label; observations $x \in \mathbb{R}$ are mapped to latent states $s \in \{-1, 1\}$.

Four components of the model:
- a single-variable statistical model $\Lambda(x) = P(s = 1 \mid x)$, translating real-valued observations into binary latent states;
- a pairwise statistical model of the dependency between latent states, e.g. $\hat p_{11} = P(s = 1, s' = 1)$;
- an MRF model to encode the network of dependencies;
- the belief propagation algorithm to decode a partially observed network $\{x_{i,t}\}$ (a sketch of this step follows).
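To fix ideas on the decoding step, here is a minimal sketch of loopy belief propagation on a pairwise Ising MRF with some spins clamped to observed values. The function name, the atanh message parameterization and the damping value are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def loopy_bp_ising(h, J, clamped=None, n_iter=200, damping=0.5):
    """Loopy BP for P(s) ~ exp(sum_i h_i s_i + sum_(ij) J_ij s_i s_j),
    with s_i in {-1,+1}.  `h` maps node -> field, `J` maps edge (i, j)
    -> coupling, `clamped` maps node -> observed spin (+1 or -1).
    Messages i -> j are stored through their atanh (log-odds) parameter."""
    clamped = clamped or {}
    nbrs, msgs, coup = {}, {}, {}
    for (i, j), w in J.items():
        coup[(i, j)] = coup[(j, i)] = w
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
        msgs[(i, j)] = msgs[(j, i)] = 0.0
    for _ in range(n_iter):
        for (i, j) in list(msgs):
            if i in clamped:  # observed spin: infinitely strong cavity field
                cavity = np.inf * clamped[i]
            else:             # cavity field at i, excluding the message from j
                cavity = h[i] + sum(msgs[(k, i)] for k in nbrs[i] if k != j)
            new = np.arctanh(np.tanh(coup[(i, j)]) * np.tanh(cavity))
            msgs[(i, j)] = damping * msgs[(i, j)] + (1 - damping) * new
    # node beliefs, reported as the BP estimate of E[s_i]
    return {i: (clamped[i] if i in clamped else
                np.tanh(h[i] + sum(msgs[(k, i)] for k in nbrs[i])))
            for i in nbrs}
```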

Building and testing the model

Input: a learning set $\{x_i^k,\ i \in \{1 \dots N\},\ k = 1 \dots M\}$.

- Step 1: define a mapping $\Lambda(x) = P(s = 1 \mid x)$ for each link. This fixes the set of $\hat p_i(s_i)$.
- Step 2: build (by EM) the set of pairwise marginals $\hat p_{ij}(s_i, s_j)$, based on the model hypothesis $P(x_i, x_j, s_i, s_j) = \hat p_{ij}(s_i, s_j)\,P(x_i \mid s_i)\,P(x_j \mid s_j)$.
- Step 3: from the set $\{\hat p_i, \hat p_{ij}\}$, find an MRF (inverse Ising problem).

Experimental test: from a test set $\{y_i^k,\ i = 1 \dots N,\ k = 1 \dots Q\}$, for each $k$, reveal the variables one by one at random, infer the still-hidden ones with BP, and plot the prediction error as a function of the fraction $\rho$ of observed variables (see the sketch below).
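This protocol is easy to script. A hedged sketch, reusing the loopy_bp_ising function above; for brevity it scores the sign-prediction error on the latent spins rather than the $L_1$ travel-time error shown on the following slides.

```python
import numpy as np

def reveal_and_score(h, J, sample, rng):
    """Reveal the spins of one test configuration in random order and
    score BP predictions on the still-hidden ones.  `sample` maps
    node -> true spin; returns a list of (rho, error) pairs."""
    order = rng.permutation(list(sample))
    clamped, curve = {}, []
    for node in order:
        beliefs = loopy_bp_ising(h, J, clamped=clamped)
        hidden = [n for n in sample if n not in clamped]
        if hidden:
            # fraction of hidden spins whose sign BP predicts wrongly
            err = np.mean([np.sign(beliefs[n]) != sample[n] for n in hidden])
            curve.append((len(clamped) / len(sample), err))
        clamped[node] = sample[node]  # reveal one more variable
    return curve

# usage: rng = np.random.default_rng(0); reveal_and_score(h, J, sample, rng)
```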

Inverse Ising Problem

Inverse problem: from observations to an underlying Markov random field. Given a set of $M$ joint observations $\{\hat s_i^k,\ i \in \{1 \dots N\},\ k = 1 \dots M\}$, look for a model

$$P_{\text{Ising}}(s) = \frac{1}{Z(h, J)} \exp\Big(\sum_i h_i s_i + \sum_{i,j} J_{ij} s_i s_j\Big).$$

Find the set of parameters $(h, J)$ such that the log-likelihood is maximal (equivalently, minimizing $D_{KL}(\hat P \,\|\, P_{\text{Ising}})$):

$$\mathcal{L}(h, J) = -\log Z(h, J) + \sum_i h_i \hat{\mathbb{E}}(s_i) + \sum_{ij} J_{ij} \hat{\mathbb{E}}(s_i s_j).$$

$\mathcal{L}(h, J)$ is concave and optimal when the moment-matching constraints are satisfied:

$$\frac{\partial \log Z}{\partial h_i}[h, J] = \hat{\mathbb{E}}(s_i), \qquad \frac{\partial \log Z}{\partial J_{ij}}[h, J] = \hat{\mathbb{E}}(s_i s_j).$$

Problem: $Z(h, J)$ is difficult to evaluate in general (exponential cost).
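For very small $N$ the moment-matching conditions can be solved directly by gradient ascent on $\mathcal{L}$, with $Z(h, J)$ computed by brute-force enumeration of the $2^N$ states. A toy sketch of this exact, non-scalable baseline, with illustrative learning-rate and iteration settings:

```python
import itertools
import numpy as np

def fit_ising_exact(samples, lr=0.1, n_steps=3000):
    """Maximum-likelihood inverse Ising by exact gradient ascent.
    `samples` is an (M, n) array of +/-1 spins; enumerates all 2^n
    states, so only viable for small n.  The gradient is exactly the
    moment-matching residual from the slide."""
    M, n = samples.shape
    m_hat = samples.mean(axis=0)              # empirical E(s_i)
    C_hat = samples.T @ samples / M           # empirical E(s_i s_j)
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    h, J = np.zeros(n), np.zeros((n, n))      # J symmetric, zero diagonal
    for _ in range(n_steps):
        logp = S @ h + 0.5 * np.einsum('ki,ij,kj->k', S, J, S)
        p = np.exp(logp - logp.max())
        p /= p.sum()                          # exact Boltzmann weights
        m = p @ S                             # model E(s_i)
        C = S.T @ (S * p[:, None])            # model E(s_i s_j)
        h += lr * (m_hat - m)
        dJ = lr * (C_hat - C)
        np.fill_diagonal(dJ, 0.0)             # no self-couplings
        J += dJ
    return h, J
```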

Inverse Ising Problem

A generic question encountered in biology, image processing, ...

Statistical-physics-based methods:
- Boltzmann machines (Hinton, Sejnowski 83)
- Linear response theory, Plefka expansion (Welling, Teh 03)
- Susceptibility propagation (Mézard, Mora 07)
- Advanced linear response theory (Yasuda, Tanaka 09)
- Adaptive cluster expansion for neural coupling determination (Cocco et al. 09)

Sparse optimization methods in machine learning:
- $L_1$ optimization with belief propagation (Lee, Ganapathi, Koller 2006)
- $L_1$ regularization for graph selection (Wainwright, Ravikumar, Lafferty 2008)
- $L_1$-penalized pseudo-likelihood (Höfling, Tibshirani 2009)

Mean-field methods

Inputs are the magnetizations $\hat m_i = \hat{\mathbb{E}}(s_i)$ and the susceptibilities $\hat\chi_{ij} = \mathrm{Cov}(s_i, s_j)$.

Small interaction expansion: $J_{ij} \approx \dfrac{\hat\chi_{ij}}{(1 - \hat m_i^2)(1 - \hat m_j^2)}$.

Naive mean field: $J^{MF}_{ij} = -[\hat\chi^{-1}]_{ij}$.

Thouless-Anderson-Palmer (TAP, 77): $J^{TAP}_{ij} = \dfrac{-2\,[\hat\chi^{-1}]_{ij}}{1 + \sqrt{1 - 8\,\hat m_i \hat m_j\,[\hat\chi^{-1}]_{ij}}}$.

Tree-like interaction graph: Bethe approximation

$$P_{\text{Bethe}}(s) = \prod_{(i,j)\in\mathcal{T}} \frac{\hat p_{ij}(s_i, s_j)}{\hat p_i(s_i)\,\hat p_j(s_j)} \prod_i \hat p_i(s_i),$$

with $\hat p_{ij}(s_i, s_j) = \frac{1}{4}\big(1 + \hat m_i s_i + \hat m_j s_j + (\hat\chi_{ij} + \hat m_i \hat m_j)\,s_i s_j\big)$.
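A small sketch computing the naive mean-field and TAP coupling estimates from a sample matrix, under the sign conventions reconstructed above (the clamping of the TAP discriminant is a numerical guard, not part of the theory):

```python
import numpy as np

def mean_field_couplings(samples):
    """Naive MF and TAP coupling estimates from the empirical
    magnetizations and susceptibility matrix.  `samples` is an (M, n)
    array of +/-1 spins; the susceptibility matrix must be invertible
    (enough samples, no frozen spin)."""
    m = samples.mean(axis=0)
    chi = np.cov(samples, rowvar=False)           # Cov(s_i, s_j)
    chi_inv = np.linalg.inv(chi)
    J_mf = -chi_inv.copy()
    np.fill_diagonal(J_mf, 0.0)
    disc = 1.0 - 8.0 * np.outer(m, m) * chi_inv   # TAP discriminant
    J_tap = -2.0 * chi_inv / (1.0 + np.sqrt(np.maximum(disc, 0.0)))
    np.fill_diagonal(J_tap, 0.0)
    return J_mf, J_tap
```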

Mean-field methods

Direct identification leads to $J^{B1}_{ij}$:

$$J^{B1}_{ij} = \frac{1}{4}\,\log \frac{\hat p_{ij}(1,1)\,\hat p_{ij}(-1,-1)}{\hat p_{ij}(-1,1)\,\hat p_{ij}(1,-1)}.$$

Remark: $Z_{\text{Bethe}}(h, J)$ is an explicit function of $\hat p_i$ and $\hat p_{ij}$.

The inverse susceptibility (Nguyen, Berg 2012), with $d_i$ the degree of node $i$,

$$[\hat\chi^{-1}]_{ij} = \Big[\frac{1 - d_i}{1 - \hat m_i^2} + \sum_{k \in \partial i} \frac{1 - \hat m_k^2}{(1 - \hat m_i^2)(1 - \hat m_k^2) - \hat\chi_{ik}^2}\Big]\,\delta_{ij} - \frac{\hat\chi_{ij}}{(1 - \hat m_i^2)(1 - \hat m_j^2) - \hat\chi_{ij}^2}\,\delta_{j \in \partial i},$$

leads to $J^{B2}_{ij}$ (equivalent to susceptibility propagation):

$$J^{B2}_{ij} = -\frac{1}{2}\,\mathrm{atanh}\left(\frac{2\,[\hat\chi^{-1}]_{ij}}{\sqrt{1 + 4(1 - \hat m_i^2)(1 - \hat m_j^2)\,[\hat\chi^{-1}]_{ij}^2} - 2\,\hat m_i \hat m_j\,[\hat\chi^{-1}]_{ij}}\right).$$
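Both Bethe estimates are closed-form and cheap. A sketch, again under the reconstructed sign conventions (minus signs are easily lost in slide extraction, so treat them as assumptions); the arctanh argument is clipped as a numerical guard:

```python
import numpy as np

def bethe_couplings_direct(p_pp, p_pm, p_mp, p_mm):
    """J^B1 by direct identification from the empirical pairwise
    marginals: arguments are matrices of p_ij(+,+), p_ij(+,-),
    p_ij(-,+), p_ij(-,-)."""
    return 0.25 * np.log((p_pp * p_mm) / (p_mp * p_pm))

def bethe_couplings_chi(samples):
    """J^B2 from magnetizations and the inverse susceptibility
    (closed form equivalent to susceptibility propagation)."""
    m = samples.mean(axis=0)
    chi_inv = np.linalg.inv(np.cov(samples, rowvar=False))
    v = np.outer(1.0 - m**2, 1.0 - m**2)
    denom = np.sqrt(1.0 + 4.0 * v * chi_inv**2) - 2.0 * np.outer(m, m) * chi_inv
    arg = np.clip(2.0 * chi_inv / denom, -0.999999, 0.999999)
    J = -0.5 * np.arctanh(arg)
    np.fill_diagonal(J, 0.0)
    return J
```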

Two-parameter Bethe model

Additional constraint: compatibility with belief propagation.

In practice we consider two calibration parameters:
- a global rescaling factor $\alpha$ of the interactions (an inverse temperature): $J_{ij} \to \alpha J_{ij}$ for all $(i, j)$;
- a mean connectivity $K$ of the graph, obtained by various pruning procedures.

Bethe model:

$$P^{\alpha}_{\text{Bethe}}(s) \propto \prod_{(i,j)\in\mathcal{G}} \left(\frac{\hat p_{ij}(s_i, s_j)}{\hat p_i(s_i)\,\hat p_j(s_j)}\right)^{\alpha}\,\prod_i \hat p_i(s_i).$$

Various pruning procedures: e.g. maximum spanning tree + thresholding on the other links (a sketch follows).
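A minimal sketch of such a pruning procedure: a Kruskal-style maximum spanning tree over the weighted interaction graph (weights could be mutual informations or coupling magnitudes), with optional thresholding to add back heavy non-tree links. All names are illustrative.

```python
def prune_graph(weights, n_nodes, threshold=None):
    """Keep a maximum spanning tree of the weighted graph, then
    optionally add back non-tree edges whose weight exceeds
    `threshold`.  `weights` maps edge (i, j) -> weight."""
    parent = list(range(n_nodes))          # union-find forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    kept = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)          # Kruskal, heaviest edge first
        if ri != rj:
            parent[ri] = rj
            kept.append((i, j))
    if threshold is not None:
        tree = set(kept)
        kept += [e for e, w in weights.items()
                 if e not in tree and w >= threshold]
    return kept
```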

BP fixed points correspond to congestion patterns

Real-valued synthetic multimodal data: combining the approximate MRF with the mapping $\Lambda$.

[Figure: $L_1$ travel-time error vs. the fraction of observed variables $\rho$, for BP ($\alpha = 0.08$, $K = 90$) against median, mean and KNN travel-time predictors; synthetic data with $N = 200$, $C = 5$, $a = 0.1$, $b = 0.4$, $dd = 0.1$.]

Simulation data: Sioux Falls network

Data generated by the METROPOLIS traffic simulator, written by Fabrice Marchal (formerly at LET, CNRS-Lyon) and André de Palma (ENS Cachan).

Experiments I

Variables are revealed one by one in a random order ($\rho$ = fraction of revealed variables).

[Figure: $L_1$ travel-time error vs. $\rho$ on the Sioux Falls network ($N = 60$): median and mean TT predictors, BP1 ($\alpha = 0.024$, $K = 28$), BP2 ($\alpha = 0.098$, $K = 8$), mean field ($\alpha = 0.044$, $K = 20$), Hopfield ($\alpha = 0.042$, $K = 18$), and KNN.]

Experiments II

Variables are revealed one by one in a random order ($\rho$ = fraction of revealed variables).

[Figure: $L_1$ error vs. $\rho$ on the IAURIF network ($N = 1000$): BP travel-time predictors with encodings ENCOD 3, ENCOD 7 and ENCOD 9, against median, mean and K-NN travel-time predictors.]

Traffic index for real data

For a given travel time $tt$, the index is defined by $x_i = f(tt) \stackrel{\mathrm{def}}{=} P(tt_i < tt)$, using a weighted cumulative distribution based on the segmentation.

[Figures: travel-time probability density (log scale); the resulting traffic index as a function of travel time; various index maps.]
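In its plain empirical version (ignoring the segmentation-based weighting mentioned above, which is an extra refinement), the index is just the empirical CDF of the historical travel times of a segment:

```python
import numpy as np

def traffic_index(tt_history, tt):
    """Traffic index x = f(tt) = P(tt_i < tt): empirical CDF of the
    historical travel times on segment i, evaluated at the current
    travel time `tt` (scalar or array)."""
    return np.searchsorted(np.sort(tt_history), tt) / len(tt_history)

# usage: traffic_index(np.array([120., 90., 300., 150.]), 140.)  -> 0.5
```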

Highway data

- 62 segments, $\Delta t = 3$ minutes, strong correlations.
- Extraction of a spanning tree with maximal weight; the weight is the mutual information (see the sketch below).
- 3 temporal layers.
- Decimation.

[Figure: RMSE on unrevealed variables vs. the fraction of revealed variables, comparing BP prediction, t-mean, t0-prediction and mean predictors.]
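A small sketch of the mutual-information edge weight used for the spanning-tree extraction, computed from two binary spin time series (illustrative, not the project's code):

```python
import numpy as np

def mutual_information(si, sj):
    """Empirical mutual information (in nats) between two +/-1 spin
    series, used as the edge weight for the spanning-tree extraction."""
    eps = 1e-12  # guards the log against empty cells
    pij = np.array([[np.mean((si == a) & (sj == b)) for b in (-1, 1)]
                    for a in (-1, 1)]) + eps
    pij /= pij.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    return float((pij * np.log(pij / np.outer(pi, pj))).sum())
```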

Conclusion

- The Ising model can be used as a building block.
- Macroscopic variables have to be incorporated into the model.
- Real data are needed to check the various hypotheses.