Belief Propagation for Traffic Forecasting

Belief Propagation for Traffic Forecasting

Cyril Furtlehner (INRIA Saclay, TAO team)
Context: Travesti project (http://travesti.gforge.inria.fr/)

Collaborators: Anne Auger (INRIA Saclay), Dimo Brockhoff (INRIA Lille), Yufei Han (CAOR Armines and INRIA Rocquencourt), Jean-Marc Lasgouttes (INRIA Rocquencourt), Fabrice Marchal (formerly at CNRS LET), Victorin Martin (INRIA), Fabien Moutarde (CAOR Armines), Maxim Samsonov (formerly at INRIA Saclay)

IWI, INRIA Paris, June 12, 2012

Problem at hand

Goal: reconstruct and predict road traffic on the secondary network.

Existing solutions are based on data coming from static sensors (magnetic loops) on main roads, which are too expensive to install in all streets.

Solution: more and more vehicles are equipped with GPS and able to exchange data (through cellular phone connections): Floating Car Data (FCD).

⇒ We want to build an inference scheme adapted to these FCD.

Project guidelines

Travesti (Traffic Volume Estimation by Spatio-Temporal Inference, http://travesti.gforge.inria.fr/)

Goal 1: find a macroscopic description of a large-scale traffic network.
- Machine learning approach: clustering & dimensionality reduction.
- At the microscopic level we consider the travel time along each segment (FCD observations).
- Hypothesis: large-scale behaviour is dominated by various dynamical congestion patterns.
- Multi-modal joint measure of travel times, mixing macroscopic & microscopic variables.

Goal 2: set up an MRF to encode the dependencies between degrees of freedom (dynamics at the macroscopic level).
- Real-time inference: search for a model compatible with the belief propagation algorithm.

Related project: FUI Pumas (http://pumas.inria.fr/), a concrete implementation in the Rouen agglomeration (1000 probe vehicles).

An Ising model for traffic?

Is it possible to encode traffic data on the basis of a binary latent state $s_{i,t} \in \{-1, 1\}$ (Ising), corresponding to congested/non-congested?

Input data $x_{i,t} \in \mathbb{R}$:
- static sensors: density and speed;
- floating car data: speed and travel times.

Two questions:
- Assuming a historical dataset $\{\hat x_{i,t}\}$: which probabilistic model $P(\{x_{i,t}, (i,t) \in \Omega\})$?
- Given actual observations $\{x_{i,t}, (i,t) \in \Omega^*\}$: how to infer $\{x_{i,t}, (i,t) \in \Omega \setminus \Omega^*\}$?

Ising-based proposed solution

Notation: $i$ is the spatial segment label, $t$ the discretized time label; observations $x \in \mathbb{R}$ are mapped to latent states $s \in \{-1, 1\}$.

Four components of the model:
- a single-variable statistical model $\Lambda(x) = P(s = 1 \mid x)$, translating real-valued observations into binary latent states;
- a pairwise statistical model of the dependency between latent states, e.g. $\hat p_{11} = P(s = 1, s' = 1)$;
- an MRF model to encode the network of dependencies;
- the belief propagation algorithm to decode a partially observed network $\{x_{i,t}\}$ (a sketch of this step follows).
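To fix ideas on the decoding step, here is a minimal sketch of loopy belief propagation on a pairwise Ising MRF with some spins clamped to observed values. The function name, the atanh message parameterization and the damping value are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def loopy_bp_ising(h, J, clamped=None, n_iter=200, damping=0.5):
    """Loopy BP for P(s) ~ exp(sum_i h_i s_i + sum_(ij) J_ij s_i s_j),
    with s_i in {-1,+1}.  `h` maps node -> field, `J` maps edge (i, j)
    -> coupling, `clamped` maps node -> observed spin (+1 or -1).
    Messages i -> j are stored through their atanh (log-odds) parameter."""
    clamped = clamped or {}
    nbrs, msgs, coup = {}, {}, {}
    for (i, j), w in J.items():
        coup[(i, j)] = coup[(j, i)] = w
        nbrs.setdefault(i, []).append(j)
        nbrs.setdefault(j, []).append(i)
        msgs[(i, j)] = msgs[(j, i)] = 0.0
    for _ in range(n_iter):
        for (i, j) in list(msgs):
            if i in clamped:  # observed spin: infinitely strong cavity field
                cavity = np.inf * clamped[i]
            else:             # cavity field at i, excluding the message from j
                cavity = h[i] + sum(msgs[(k, i)] for k in nbrs[i] if k != j)
            new = np.arctanh(np.tanh(coup[(i, j)]) * np.tanh(cavity))
            msgs[(i, j)] = damping * msgs[(i, j)] + (1 - damping) * new
    # node beliefs, reported as the BP estimate of E[s_i]
    return {i: (clamped[i] if i in clamped else
                np.tanh(h[i] + sum(msgs[(k, i)] for k in nbrs[i])))
            for i in nbrs}
```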

Building and testing the model

Input: a learning set $\{x_i^k,\ i \in \{1 \dots N\},\ k = 1 \dots M\}$.

- Step 1: define a mapping $\Lambda(x) = P(s = 1 \mid x)$ for each link. This fixes the set of $\hat p_i(s_i)$.
- Step 2: build (by EM) the set of pairwise marginals $\hat p_{ij}(s_i, s_j)$, based on the model hypothesis $P(x_i, x_j, s_i, s_j) = \hat p_{ij}(s_i, s_j)\,P(x_i \mid s_i)\,P(x_j \mid s_j)$.
- Step 3: from the set $\{\hat p_i, \hat p_{ij}\}$, find an MRF (inverse Ising problem).

Experimental test: from a test set $\{y_i^k,\ i = 1 \dots N,\ k = 1 \dots Q\}$, for each $k$, reveal the variables one by one at random, infer the still-hidden ones with BP, and plot the prediction error as a function of the fraction $\rho$ of observed variables (see the sketch below).
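This protocol is easy to script. A hedged sketch, reusing the loopy_bp_ising function above; for brevity it scores the sign-prediction error on the latent spins rather than the $L_1$ travel-time error shown on the following slides.

```python
import numpy as np

def reveal_and_score(h, J, sample, rng):
    """Reveal the spins of one test configuration in random order and
    score BP predictions on the still-hidden ones.  `sample` maps
    node -> true spin; returns a list of (rho, error) pairs."""
    order = rng.permutation(list(sample))
    clamped, curve = {}, []
    for node in order:
        beliefs = loopy_bp_ising(h, J, clamped=clamped)
        hidden = [n for n in sample if n not in clamped]
        if hidden:
            # fraction of hidden spins whose sign BP predicts wrongly
            err = np.mean([np.sign(beliefs[n]) != sample[n] for n in hidden])
            curve.append((len(clamped) / len(sample), err))
        clamped[node] = sample[node]  # reveal one more variable
    return curve

# usage: rng = np.random.default_rng(0); reveal_and_score(h, J, sample, rng)
```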

Inverse Ising Problem

Inverse problem: from observations to an underlying Markov random field. Given a set of $M$ joint observations $\{\hat s_i^k,\ i \in \{1 \dots N\},\ k = 1 \dots M\}$, look for a model

$$P_{\text{Ising}}(s) = \frac{1}{Z(h, J)} \exp\Big(\sum_i h_i s_i + \sum_{i,j} J_{ij} s_i s_j\Big).$$

Find the set of parameters $(h, J)$ such that the log-likelihood is maximal (equivalently, minimizing $D_{KL}(\hat P \,\|\, P_{\text{Ising}})$):

$$\mathcal{L}(h, J) = -\log Z(h, J) + \sum_i h_i \hat{\mathbb{E}}(s_i) + \sum_{ij} J_{ij} \hat{\mathbb{E}}(s_i s_j).$$

$\mathcal{L}(h, J)$ is concave and optimal when the moment-matching constraints are satisfied:

$$\frac{\partial \log Z}{\partial h_i}[h, J] = \hat{\mathbb{E}}(s_i), \qquad \frac{\partial \log Z}{\partial J_{ij}}[h, J] = \hat{\mathbb{E}}(s_i s_j).$$

Problem: $Z(h, J)$ is difficult to evaluate in general (exponential cost).
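For very small $N$ the moment-matching conditions can be solved directly by gradient ascent on $\mathcal{L}$, with $Z(h, J)$ computed by brute-force enumeration of the $2^N$ states. A toy sketch of this exact, non-scalable baseline, with illustrative learning-rate and iteration settings:

```python
import itertools
import numpy as np

def fit_ising_exact(samples, lr=0.1, n_steps=3000):
    """Maximum-likelihood inverse Ising by exact gradient ascent.
    `samples` is an (M, n) array of +/-1 spins; enumerates all 2^n
    states, so only viable for small n.  The gradient is exactly the
    moment-matching residual from the slide."""
    M, n = samples.shape
    m_hat = samples.mean(axis=0)              # empirical E(s_i)
    C_hat = samples.T @ samples / M           # empirical E(s_i s_j)
    S = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    h, J = np.zeros(n), np.zeros((n, n))      # J symmetric, zero diagonal
    for _ in range(n_steps):
        logp = S @ h + 0.5 * np.einsum('ki,ij,kj->k', S, J, S)
        p = np.exp(logp - logp.max())
        p /= p.sum()                          # exact Boltzmann weights
        m = p @ S                             # model E(s_i)
        C = S.T @ (S * p[:, None])            # model E(s_i s_j)
        h += lr * (m_hat - m)
        dJ = lr * (C_hat - C)
        np.fill_diagonal(dJ, 0.0)             # no self-couplings
        J += dJ
    return h, J
```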

Inverse Ising Problem

A generic question encountered in biology, image processing, ...

Statistical-physics-based methods:
- Boltzmann machines (Hinton, Sejnowski 83)
- Linear response theory, Plefka expansion (Welling, Teh 03)
- Susceptibility propagation (Mézard, Mora 07)
- Advanced linear response theory (Yasuda, Tanaka 09)
- Adaptive cluster expansion for neural coupling determination (Cocco et al. 09)

Sparse optimization methods in machine learning:
- $L_1$ optimization with belief propagation (Lee, Ganapathi, Koller 2006)
- $L_1$ regularization for graph selection (Wainwright, Ravikumar, Lafferty 2008)
- $L_1$-penalized pseudo-likelihood (Höfling, Tibshirani 2009)

Mean-field methods

Inputs are the magnetizations $\hat m_i = \hat{\mathbb{E}}(s_i)$ and the susceptibilities $\hat\chi_{ij} = \mathrm{Cov}(s_i, s_j)$.

Small interaction expansion: $J_{ij} \approx \dfrac{\hat\chi_{ij}}{(1 - \hat m_i^2)(1 - \hat m_j^2)}$.

Naive mean field: $J^{MF}_{ij} = -[\hat\chi^{-1}]_{ij}$.

Thouless-Anderson-Palmer (TAP, 77): $J^{TAP}_{ij} = \dfrac{-2\,[\hat\chi^{-1}]_{ij}}{1 + \sqrt{1 - 8\,\hat m_i \hat m_j\,[\hat\chi^{-1}]_{ij}}}$.

Tree-like interaction graph: Bethe approximation

$$P_{\text{Bethe}}(s) = \prod_{(i,j)\in\mathcal{T}} \frac{\hat p_{ij}(s_i, s_j)}{\hat p_i(s_i)\,\hat p_j(s_j)} \prod_i \hat p_i(s_i),$$

with $\hat p_{ij}(s_i, s_j) = \frac{1}{4}\big(1 + \hat m_i s_i + \hat m_j s_j + (\hat\chi_{ij} + \hat m_i \hat m_j)\,s_i s_j\big)$.
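A small sketch computing the naive mean-field and TAP coupling estimates from a sample matrix, under the sign conventions reconstructed above (the clamping of the TAP discriminant is a numerical guard, not part of the theory):

```python
import numpy as np

def mean_field_couplings(samples):
    """Naive MF and TAP coupling estimates from the empirical
    magnetizations and susceptibility matrix.  `samples` is an (M, n)
    array of +/-1 spins; the susceptibility matrix must be invertible
    (enough samples, no frozen spin)."""
    m = samples.mean(axis=0)
    chi = np.cov(samples, rowvar=False)           # Cov(s_i, s_j)
    chi_inv = np.linalg.inv(chi)
    J_mf = -chi_inv.copy()
    np.fill_diagonal(J_mf, 0.0)
    disc = 1.0 - 8.0 * np.outer(m, m) * chi_inv   # TAP discriminant
    J_tap = -2.0 * chi_inv / (1.0 + np.sqrt(np.maximum(disc, 0.0)))
    np.fill_diagonal(J_tap, 0.0)
    return J_mf, J_tap
```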

Mean-field methods

Direct identification leads to $J^{B1}_{ij}$:

$$J^{B1}_{ij} = \frac{1}{4}\,\log \frac{\hat p_{ij}(1,1)\,\hat p_{ij}(-1,-1)}{\hat p_{ij}(-1,1)\,\hat p_{ij}(1,-1)}.$$

Remark: $Z_{\text{Bethe}}(h, J)$ is an explicit function of $\hat p_i$ and $\hat p_{ij}$.

The inverse susceptibility (Nguyen, Berg 2012), with $d_i$ the degree of node $i$,

$$[\hat\chi^{-1}]_{ij} = \Big[\frac{1 - d_i}{1 - \hat m_i^2} + \sum_{k \in \partial i} \frac{1 - \hat m_k^2}{(1 - \hat m_i^2)(1 - \hat m_k^2) - \hat\chi_{ik}^2}\Big]\,\delta_{ij} - \frac{\hat\chi_{ij}}{(1 - \hat m_i^2)(1 - \hat m_j^2) - \hat\chi_{ij}^2}\,\delta_{j \in \partial i},$$

leads to $J^{B2}_{ij}$ (equivalent to susceptibility propagation):

$$J^{B2}_{ij} = -\frac{1}{2}\,\mathrm{atanh}\left(\frac{2\,[\hat\chi^{-1}]_{ij}}{\sqrt{1 + 4(1 - \hat m_i^2)(1 - \hat m_j^2)\,[\hat\chi^{-1}]_{ij}^2} - 2\,\hat m_i \hat m_j\,[\hat\chi^{-1}]_{ij}}\right).$$
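Both Bethe estimates are closed-form and cheap. A sketch, again under the reconstructed sign conventions (minus signs are easily lost in slide extraction, so treat them as assumptions); the arctanh argument is clipped as a numerical guard:

```python
import numpy as np

def bethe_couplings_direct(p_pp, p_pm, p_mp, p_mm):
    """J^B1 by direct identification from the empirical pairwise
    marginals: arguments are matrices of p_ij(+,+), p_ij(+,-),
    p_ij(-,+), p_ij(-,-)."""
    return 0.25 * np.log((p_pp * p_mm) / (p_mp * p_pm))

def bethe_couplings_chi(samples):
    """J^B2 from magnetizations and the inverse susceptibility
    (closed form equivalent to susceptibility propagation)."""
    m = samples.mean(axis=0)
    chi_inv = np.linalg.inv(np.cov(samples, rowvar=False))
    v = np.outer(1.0 - m**2, 1.0 - m**2)
    denom = np.sqrt(1.0 + 4.0 * v * chi_inv**2) - 2.0 * np.outer(m, m) * chi_inv
    arg = np.clip(2.0 * chi_inv / denom, -0.999999, 0.999999)
    J = -0.5 * np.arctanh(arg)
    np.fill_diagonal(J, 0.0)
    return J
```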

Two-parameter Bethe model

Additional constraint: compatibility with belief propagation.

In practice we consider two calibration parameters:
- a global rescaling factor $\alpha$ of the interactions (an inverse temperature): $J_{ij} \to \alpha J_{ij}$ for all $(i, j)$;
- a mean connectivity $K$ of the graph, obtained by various pruning procedures.

Bethe model:

$$P^{\alpha}_{\text{Bethe}}(s) \propto \prod_{(i,j)\in\mathcal{G}} \left(\frac{\hat p_{ij}(s_i, s_j)}{\hat p_i(s_i)\,\hat p_j(s_j)}\right)^{\alpha}\,\prod_i \hat p_i(s_i).$$

Various pruning procedures: e.g. maximum spanning tree + thresholding on the other links (a sketch follows).
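A minimal sketch of such a pruning procedure: a Kruskal-style maximum spanning tree over the weighted interaction graph (weights could be mutual informations or coupling magnitudes), with optional thresholding to add back heavy non-tree links. All names are illustrative.

```python
def prune_graph(weights, n_nodes, threshold=None):
    """Keep a maximum spanning tree of the weighted graph, then
    optionally add back non-tree edges whose weight exceeds
    `threshold`.  `weights` maps edge (i, j) -> weight."""
    parent = list(range(n_nodes))          # union-find forest
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    kept = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        ri, rj = find(i), find(j)          # Kruskal, heaviest edge first
        if ri != rj:
            parent[ri] = rj
            kept.append((i, j))
    if threshold is not None:
        tree = set(kept)
        kept += [e for e, w in weights.items()
                 if e not in tree and w >= threshold]
    return kept
```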

BP fixed points correspond to congestion patterns

Real-valued synthetic multimodal data: combining the approximate MRF with the mapping $\Lambda$.

[Figure: $L_1$ travel-time error vs. the fraction of observed variables $\rho$, for BP ($\alpha = 0.08$, $K = 90$) against median, mean and KNN travel-time predictors; synthetic data with $N = 200$, $C = 5$, $a = 0.1$, $b = 0.4$, $dd = 0.1$.]

Simulation data: Sioux Falls network

Data generated by the METROPOLIS traffic simulator, written by Fabrice Marchal (formerly at LET, CNRS-Lyon) and André de Palma (ENS Cachan).

Experiments I

Variables are revealed one by one in a random order ($\rho$ = fraction of revealed variables).

[Figure: $L_1$ travel-time error vs. $\rho$ on the Sioux Falls network ($N = 60$): median and mean TT predictors, BP1 ($\alpha = 0.024$, $K = 28$), BP2 ($\alpha = 0.098$, $K = 8$), mean field ($\alpha = 0.044$, $K = 20$), Hopfield ($\alpha = 0.042$, $K = 18$), and KNN.]

Experiments II

Variables are revealed one by one in a random order ($\rho$ = fraction of revealed variables).

[Figure: $L_1$ error vs. $\rho$ on the IAURIF network ($N = 1000$): BP travel-time predictors with encodings ENCOD 3, ENCOD 7 and ENCOD 9, against median, mean and K-NN travel-time predictors.]

Traffic index for real data

For a given travel time $tt$, the index is defined by $x_i = f(tt) \stackrel{\mathrm{def}}{=} P(tt_i < tt)$, using a weighted cumulative distribution based on the segmentation.

[Figures: travel-time probability density (log scale); the resulting traffic index as a function of travel time; various index maps.]
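In its plain empirical version (ignoring the segmentation-based weighting mentioned above, which is an extra refinement), the index is just the empirical CDF of the historical travel times of a segment:

```python
import numpy as np

def traffic_index(tt_history, tt):
    """Traffic index x = f(tt) = P(tt_i < tt): empirical CDF of the
    historical travel times on segment i, evaluated at the current
    travel time `tt` (scalar or array)."""
    return np.searchsorted(np.sort(tt_history), tt) / len(tt_history)

# usage: traffic_index(np.array([120., 90., 300., 150.]), 140.)  -> 0.5
```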

Highway data

- 62 segments, $\Delta t = 3$ minutes, strong correlations.
- Extraction of a spanning tree with maximal weight; the weight is the mutual information (see the sketch below).
- 3 temporal layers.
- Decimation.

[Figure: RMSE on unrevealed variables vs. the fraction of revealed variables, comparing BP prediction, t-mean, t0-prediction and mean predictors.]
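A small sketch of the mutual-information edge weight used for the spanning-tree extraction, computed from two binary spin time series (illustrative, not the project's code):

```python
import numpy as np

def mutual_information(si, sj):
    """Empirical mutual information (in nats) between two +/-1 spin
    series, used as the edge weight for the spanning-tree extraction."""
    eps = 1e-12  # guards the log against empty cells
    pij = np.array([[np.mean((si == a) & (sj == b)) for b in (-1, 1)]
                    for a in (-1, 1)]) + eps
    pij /= pij.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    return float((pij * np.log(pij / np.outer(pi, pj))).sum())
```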

Conclusion

- The Ising model can be used as a building block.
- Macroscopic variables have to be incorporated into the model.
- Real data are needed to check the various hypotheses.