Efficient MCMC Samplers for Network Tomography
Martin Hazelton
Institute of Fundamental Sciences, Massey University
m.hazelton@massey.ac.nz
AUT Mathematical Sciences Symposium, Auckland, 7-8 December
Networks

[Sequence of figures: examples of networks.]
Networks and Statistics

Network data generates a host of interesting statistical problems. In some cases interest centres on the structure of an abstract network: biological networks; social networks.
A Social Network

[Figure: example of a social network.]
Network Tomography

This talk focuses on inference for flows on physical networks: road networks, electronic communication networks, biological networks. The target for inference is often higher dimensional than the observed data, which led Vardi (1996) to coin the phrase "network tomography" for such problems.

Vardi, Y. (1996). Network tomography: estimating source-destination traffic intensities from link data. Journal of the American Statistical Association 91.
Modelling Framework: Traffic Network

The system is abstracted to a network G = (N, A). N is the set of nodes; nodes represent origins and/or destinations of travel, and intersections. A is the set of directed links, also referred to as arcs.
Modelling Framework: Routes

Travel is possible only between pre-specified origin-destination (OD) node pairs; e.g. node 1 to node 6 might be the only OD pair. Typically travel can be by a variety of routes (paths). Sometimes there is no route choice, e.g. if (4, 3) is another OD pair.
Traffic Models

Traffic models are typically specified in terms of route flows (volumes). Let x denote the vector of route flows for some observational period, with the model parameterized by θ.

Canonical Example: x ~ Pois(θ), interpreted elementwise with independence. OD flows z are defined by z_i = Σ_{j∼i} x_j, where j ∼ i indicates that route j serves OD pair i.

More Sophisticated Example: z ~ NegBin (allows for over-dispersion), with route choice probabilities {p_j} defined by random utility models.
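To make the canonical example concrete, here is a minimal simulation sketch in Python. The three-route network, the route-to-OD assignment and the rates are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: three routes, the first two serving OD pair 0
# and the third serving OD pair 1.
theta = np.array([4.0, 7.0, 3.0])   # Poisson rate for each route
route_to_od = np.array([0, 0, 1])   # OD pair served by each route

# Route flows: x ~ Pois(theta), elementwise and independent.
x = rng.poisson(theta)

# OD flows: z_i is the sum of x_j over routes j serving OD pair i.
z = np.bincount(route_to_od, weights=x, minlength=2)
print("route flows:", x, "OD flows:", z)
```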
Data for Model Fitting and Assessment

The traffic model is defined by a probability function f_X(x | θ) for route flows; we want to conduct inference for θ.

                 Route flows, x                  Link counts, y
Nature           Direct observation of x         Indirect observation of x
Example sources  Vehicle tracking; numberplate   Vehicle counters on road; counts at
                 matching; travel surveys        router/switch; computer packet tracking
Cost             High                            Low

Focus on inference from link counts. This is relevant unless routing data are complete; e.g. Parry & Hazelton (2012).

Parry, K. and Hazelton, M.L. (2012). Estimation of origin-destination matrices from link counts and sporadic routing data. Transportation Research Part B 46.
Toy Example

[Figure: two-link network with observed link counts y_1 = 10, y_2 = 10.]

Assume three OD pairs: (1, 2), (1, 3), (2, 3). Fixed routing, so we have r = 3 routes. For definiteness, suppose x = (x_1, x_2, x_3)^T ~ Pois(θ). The vector of n = 2 observed link counts is y = (y_1, y_2)^T. What does this tell us about the latent route flows x, and hence about the parameter vector θ?
Link Counts and Route Flows

Fundamental Relationship: the link count vector y and route flow vector x are related by y = Ax, where A = (a_ij) is the link-path incidence matrix defined by a_ij = 1 if link i forms part of route j, and a_ij = 0 otherwise.

This is just a counting exercise: y_i = Σ_{j=1}^r a_ij x_j. It is usually a massively under-determined linear system, so link counts provide only indirect information about x and θ. Inference is a form of linear inverse problem.
Back to the Toy Example

y = (y_1, y_2)^T = Ax, where

A = [ 1 1 0
      0 1 1 ].
Feasible Route Flow Set

For a given link count vector y, the set of feasible route flows is X_y = {x : y = Ax, x ≥ 0}, where the inequality is interpreted component-wise. The set X_y is of fundamental importance for inference.
Back to the Toy Example

With y_1 = 10 and y_2 = 10,

X_y = {(x_1, x_2, x_3)^T ∈ Z^3_≥0 : x_1 + x_2 = y_1, x_2 + x_3 = y_2}
    = {(0, 10, 0)^T, (1, 9, 1)^T, ..., (10, 0, 10)^T}.
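For a network this small, X_y can be found by brute force. A sketch, using the incidence matrix A reconstructed above:

```python
import itertools
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1]])   # link-path incidence matrix of the toy network
y = np.array([10, 10])      # observed link counts

# Enumerate X_y = {x in Z^3, x >= 0 : Ax = y}. Each flow is bounded above
# by max(y), so the search space is finite (and tiny here).
X_y = [x for x in itertools.product(range(y.max() + 1), repeat=3)
       if np.array_equal(A @ np.array(x), y)]
print(len(X_y), X_y[0], X_y[-1])   # 11 flows, (0, 10, 0) through (10, 0, 10)
```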
Likelihood Functions for Traffic Models

The probability function for the observed link count data y is denoted f_Y(y | θ). The model likelihood is

L(θ) = f_Y(y | θ) = Σ_x f_{Y|X}(y | x, θ) f_X(x | θ) = Σ_{x ∈ X_y} f_X(x | θ).

Explanation: f_{Y|X}(y | x, θ) is the indicator of the event {y = Ax, x ≥ 0}, i.e. that the route flow is compatible with the observed link counts.

For the canonical Poisson example: if x ~ Pois(θ) then

L(θ) = Σ_{x ∈ X_y} Π_{i=1}^r e^{−θ_i} θ_i^{x_i} / x_i!,

an awkward mathematical form.
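Reusing the enumeration of X_y from the sketch above, the Poisson likelihood can be evaluated directly for the toy example. This is only a brute-force sketch: it is exactly the computation that becomes infeasible on realistic networks.

```python
import numpy as np
from scipy.stats import poisson

def poisson_likelihood(theta, X_y):
    # L(theta) = sum over x in X_y of prod_i exp(-theta_i) theta_i^{x_i} / x_i!
    return sum(np.prod(poisson.pmf(np.asarray(x), theta)) for x in X_y)

# X_y as enumerated in the previous sketch.
print(poisson_likelihood(np.array([5.0, 5.0, 5.0]), X_y))
```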
Model Identifiability

A single link count vector y provides limited information about x, and hence θ. What about an iid sample y^(1), ..., y^(n)? The model is identifiable from link count data if f_Y(· | θ) = f_Y(· | θ′) implies θ = θ′. A nice study for normal models is given by Singhal & Michailidis (2007); it extends to correlated data cases, but places restrictions on network topology and model parameterization.

Singhal, H. and Michailidis, G. (2007). Identifiability of flow distributions from link measurements in computer networks. Inverse Problems 23.
Identifiability Theory for the Integer-Valued Traffic Case

Vardi (1996) proved identifiability for the canonical Poisson traffic model with a single route per OD pair. A new result generalizes this:

Theorem (Hazelton 2015). Assume that the route flows are independent, and that f_X and f_Y have support equal to the non-negative integers. Then if θ is identifiable from independent observations on x, it is also identifiable from independent observations on y.

Hazelton, M.L. (2015). Network tomography for integer-valued traffic. Annals of Applied Statistics 9(1).
Vardi, Y. (1996). Network tomography: estimating source-destination traffic intensities from link data. Journal of the American Statistical Association 91.
Learning from the Dependence Structure

Identifiability can be explained in part by information from the link count dependence structure. Consider two scenarios for the toy network:

Scenario 1, θ = (0, 10, 0)^T:
  Observation 1: y_1 = 11, y_2 = 11
  Observation 2: y_1 = 6, y_2 = 6
  Observation 3: y_1 = 7, y_2 = 7

Scenario 2, θ = (10, 0, 10)^T:
  Observation 1: y_1 = 8, y_2 = 6
  Observation 2: y_1 = 9, y_2 = 12
  Observation 3: y_1 = 10, y_2 = 11

In Scenario 1 all traffic uses route 2, which traverses both links, so the two link counts are always equal; in Scenario 2 the counts on the two links vary independently.
Identifiability: Open Research Questions

1. For integer-valued flows, the results require independence of route flows, and independence between days. Can either of these independence assumptions be relaxed? If so, in what ways? What are the minimal requirements?

2. The proof is constructive in nature, and relies on observation of very low probability events. What are the practical implications of theoretical identifiability from link count data?
Likelihood-Based Inference

Observe n link count vectors y^(1), ..., y^(n). The likelihood is L(θ) = Π_{t=1}^n f_Y(y^(t) | θ). Recall that for each single link count vector, f_Y(y | θ) = Σ_{x ∈ X_y} f_X(x | θ). It is usually computationally infeasible to enumerate X_y = {x : y = Ax, x ≥ 0}, and hence infeasible to evaluate f_Y(y | θ) and L(θ) explicitly.

Passing comment: many ad hoc methods of estimation seek a single optimal element of X_y. That is demonstrably sub-optimal.
Sampling-Based Inference

Crux of the idea: replace enumeration of X_y = {x : y = Ax, x ≥ 0} by a representative sample from it. Practical implementations: a stochastic EM algorithm for likelihood inference, and MCMC algorithms for Bayesian inference. Li (2005) discusses EM and stochastic EM algorithms for trip matrix estimation; Tebaldi & West (1998) is the seminal reference for MCMC Bayesian network tomography. All previous methods have struggled because of the difficulty of sampling from X_y.

Li, B. (2005). Bayesian inference for origin-destination matrices of transport networks using the EM algorithm. Technometrics 47(4).
Tebaldi, C. and West, M. (1998). Bayesian inference on network traffic using link count data (with discussion). Journal of the American Statistical Association 93.
Stochastic EM Algorithm

An iterative algorithm. Suppose θ′ is the current value of θ.

Step 1: Compute Q̂(θ | θ′) = M^{−1} Σ_{i=1}^M Σ_{t=1}^n log f_X(x_i^(t) | θ), where x_1^(t), ..., x_M^(t) is a random sample from f_{X|Y}(· | y^(t), θ′).

Step 2: Maximize Q̂(θ | θ′) to give the new value of θ.

The algorithm converges to the maximum likelihood estimate θ̂. Standard errors are available from the (approximate) information matrix.
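A schematic sketch of the two steps for the canonical Poisson model with a single count vector, where the M-step has a closed form: the complete-data MLE of θ_i is the sample mean of the x_i draws. The `sample_posterior` routine is a hypothetical placeholder for any sampler of f_{X|Y}(· | y, θ′), such as the MCMC scheme described next.

```python
import numpy as np

def stochastic_em(y, theta0, sample_posterior, n_iter=50, M=100, rng=None):
    """Stochastic EM sketch for x ~ Pois(theta) with one count vector y.

    `sample_posterior(y, theta, M, rng)` is assumed to return an (M, r)
    array of draws from f(x | y, theta); it is a placeholder here.
    """
    rng = rng or np.random.default_rng()
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        xs = sample_posterior(y, theta, M, rng)   # Step 1: sample latent flows
        theta = xs.mean(axis=0)                   # Step 2: maximize Q-hat
    return theta
```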
Bayesian Inference using MCMC

Metropolis-Hastings Algorithm. Prior: π(θ). Posterior: p(θ | y^(1), ..., y^(n)) ∝ L(θ)π(θ).

MCMC algorithm (for the n = 1 case):
1. Generate initial values for θ, x.
2. Sample a candidate x′ from a proposal distribution q; we want q to have support X_y.
3. Accept x′ with probability min(1, α), where
   α = f_{X|Y}(x′ | y, θ) q(x) / [f_{X|Y}(x | y, θ) q(x′)]
     = f_{X,Y}(x′, y | θ) q(x) / [f_{X,Y}(x, y | θ) q(x′)]
     = f_X(x′ | θ) q(x) / [f_X(x | θ) q(x′)].
4. Sample θ from p(θ | x) (usually straightforward).
5. Go to step 2.
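For the toy network the polytope X_y is one-dimensional, so steps 2-3 reduce to a random walk on the free coordinate t = x_2, with x = (y_1 − t, t, y_2 − t). A minimal sketch of this x-update with θ held fixed; the symmetric ±1 proposal makes the q terms cancel:

```python
import numpy as np
from scipy.stats import poisson

def mh_route_flows(y, theta, n_iter=5000, rng=None):
    """Metropolis-Hastings over X_y for the toy network, theta held fixed."""
    rng = rng or np.random.default_rng()
    y1, y2 = y
    t_max = min(y1, y2)
    t = t_max // 2                               # start mid-polytope
    draws = np.empty((n_iter, 3), dtype=int)
    for k in range(n_iter):
        t_new = t + rng.choice((-1, 1))          # symmetric proposal
        if 0 <= t_new <= t_max:                  # candidate must stay in X_y
            x = np.array([y1 - t, t, y2 - t])
            x_new = np.array([y1 - t_new, t_new, y2 - t_new])
            log_alpha = (poisson.logpmf(x_new, theta).sum()
                         - poisson.logpmf(x, theta).sum())
            if np.log(rng.uniform()) < log_alpha:
                t = t_new                        # accept the move
        draws[k] = (y1 - t, t, y2 - t)
    return draws

draws = mh_route_flows((10, 10), np.array([5.0, 5.0, 5.0]))
print(draws.mean(axis=0))                        # posterior means of the flows
```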
Sampling Feasible Route Flow Vectors

The trick is to sample from a proposal distribution q with support matching X_y = {x : y = Ax, x ≥ 0}. This can be an iterative process, updating one element of x at a time. Development of an efficient algorithm for such sampling is a 20 year old problem. There is recent geometrical insight from Airoldi & Haas (2011) and Airoldi & Blocker (2013) for continuous flow models, but still no reliable solution for integer-valued traffic.

Airoldi, E.M. and Blocker, A.W. (2013). Estimating latent processes on a network from indirect measurements. Journal of the American Statistical Association 108.
Airoldi, E.M. and Haas, B. (2011). Polytope samplers for inference in ill-posed inverse problems. In Int. Conf. Artificial Intelligence and Statistics, Vol. 15.
Geometry of the Feasible Route Flow Set

X_y is the intersection of the linear manifold {x : y = Ax} with the non-negative orthant {x ≥ 0}. Hence X_y = {x : y = Ax, x ≥ 0} is a convex polytope. If there are r routes and n links then X_y is an (r − n)-dimensional object embedded in r-dimensional space. We have flexibility in its representation.
Geometry of the Feasible Route Flow Set: Example

Nodes 1 and 2 are origins of flow; nodes 3, 4 and 5 are destinations. Hence there are six routes, indexed (o, d) for o = 1, 2 and d = 3, 4, 5. The link-route incidence matrix is A = [A_1 A_2], partitioned so that A_1 is invertible, and hence the polytope can be represented by the flows on the last two routes, (2, 4) and (2, 5).
Geometry of the Feasible Route Flow Set: Example continued

Suppose the link counts are y = (20, 30, 20, 10)^T (a single day). The polytope is represented by the flows on routes 5 = (2, 4) and 6 = (2, 5).

[Figure: the polytope plotted in the (x_5, x_6) plane.]
Sampling the Feasible Route Flow Set: Example continued

Sampling (e.g. uniformly) from the polytope is not straightforward, because there is no convenient expression for the support. Sampling iteratively in the coordinate directions is much simpler.

[Figures: iterative coordinate-direction moves within the polytope.]
Sampling for Awkward Geometries: Revised example

Suppose y = (10, 20, 20, 10)^T (left) or y = (10, 20, 19, 9)^T (right). Sampling in the coordinate directions is impossible (left) or very slow to converge (right).

[Figures: the two awkward polytopes.]
Change of Representation

Tebaldi & West (1998) proposed coordinate-direction iterative sampling, but failed to appreciate that it won't necessarily work for awkward geometries. Recall, however, that there is flexibility in the choice of representation of X_y. Hazelton (2015) recently proved that there is always a convenient representation of X_y for coordinate-direction sampling, with the caveat that the link-path matrix A must be unimodular.

Hazelton, M.L. (2015). Network tomography for integer-valued traffic. Annals of Applied Statistics 9(1).
Tebaldi, C. and West, M. (1998). Bayesian inference on network traffic using link count data (with discussion). Journal of the American Statistical Association 93.
Change of Representation: Back to the Example

As before, y = (10, 20, 20, 10)^T (left) or y = (10, 20, 19, 9)^T (right). Represent the polytope by the flows on routes 4 = (2, 3) and 6 = (2, 5). Sampling in the coordinate directions is then efficient.

[Figures: the same polytopes under the new representation.]
Adaptive MCMC Samplers

We can construct a good route flow sampler by allowing adaptive representation of X_y. The strategy is to exclude routes carrying heavy traffic from the basis for the polytope; intuitively this maximizes the slack and hence permits the longest possible moves. This is implemented by monitoring estimated route flows and updating the basis periodically.
Example: Road Intersection in Leicester, UK

[Figures: map of the Leicester intersection and its network abstraction.]

OD pairs: (o, d) ∈ {1, 2, 3, 5, 6} × {1, 2, 3, 5, 6} with o ≠ d, giving 20 OD pairs. Traffic is counted on all links except link 5.
Example: Road Intersection in Leicester, UK — Model

Canonical Poisson model: x ~ Pois(θ). Fixed routing, so route flows equal OD flows. A single link count vector was observed over 15 minutes on a weekday in May:

y = (72, 56, 217, 120, 119, 127, 178, 117, 181)^T.

Prior information is available. This is critical: otherwise useful inference is impossible. It is specified through gamma priors on the elements of θ.

We employ the MCMC algorithm to sample x and θ, switching between sampling x | θ, y and θ | x, with adaptive choice of basis for the polytope X_y (basis updates at 10,000 and 20,000 iterations).
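The θ | x update is conjugate here: with a Gamma(a_i, b_i) prior (rate parameterization) on θ_i and Poisson route flows observed for a single period, the full conditional is θ_i | x ~ Gamma(a_i + x_i, rate = b_i + 1). A minimal sketch, in which the hyperparameter vectors a and b are illustrative placeholders:

```python
import numpy as np

def sample_theta(x, a, b, rng):
    """Gibbs update theta_i | x ~ Gamma(a_i + x_i, rate = b_i + 1).

    NumPy's gamma sampler is parameterized by shape and scale = 1/rate.
    """
    return rng.gamma(shape=a + x, scale=1.0 / (b + 1.0))
```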
Example: Road Intersection in Leicester, UK — Trace Plots for x

[Figure: MCMC trace plots for the sampled route flows.]
Example: Road Intersection in Leicester, UK — Results

[Table: route, O-D pair, prior mean, posterior mean and 95% credible interval for each of the 20 OD flows. The 95% intervals: (27.6, 52.7), (12.0, 30.3), (42.5, 76.3), (16.3, 38.4), (12.8, 33.8), (25.9, 53.8), (43.5, 80.2), (1.6, 11.0), (11.5, 31.8), (21.9, 46.3), (26.1, 48.9), (5.2, 18.0), (37.3, 67.0), (5.8, 27.1), (16.1, 39.8), (0.3, 16.0), (48.7, 85.1), (1.5, 9.5), (3.4, 14.3), (27.8, 51.6).]
Closing Comments: More Open Research Questions

The preceding theory and methods work if A is totally unimodular: that is, if every square sub-matrix of A has determinant 0, +1 or −1 (equivalently, every non-singular square sub-matrix has an integer-valued inverse). Most realistic link-route incidence matrices seem to have this property, but artificial counter-examples can be constructed. Why do real networks tend to have totally unimodular link-route incidence matrices? Are we likely to find counter-examples in practice?
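Total unimodularity can be verified directly for small incidence matrices by checking every square sub-matrix; a brute-force sketch, exponential in the matrix size and so only sensible for small examples like the toy network above:

```python
import itertools
import numpy as np

def is_totally_unimodular(A, tol=1e-9):
    """Check that every square sub-matrix of A has determinant -1, 0 or +1."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                d = np.linalg.det(A[np.ix_(rows, cols)])
                if min(abs(d - v) for v in (-1.0, 0.0, 1.0)) > tol:
                    return False
    return True

A = np.array([[1, 1, 0],
              [0, 1, 1]])               # toy link-path incidence matrix
print(is_totally_unimodular(A))         # True
```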
For a Copy of These Slides...