Efficient MCMC Samplers for Network Tomography
Martin Hazelton
Institute of Fundamental Sciences, Massey University
m.hazelton@massey.ac.nz
AUT Mathematical Sciences Symposium, Auckland, 7-8 December
Networks

[Sequence of figures: examples of networks.]
Networks and Statistics

Network data generates a host of interesting statistical problems. In some cases interest centres on the structure of an abstract network: biological networks; social networks.
A Social Network

[Figure: example of a social network.]
Network Tomography

This talk focuses on inference for flows on physical networks: road networks, electronic communication networks, biological networks. The target for inference is often higher dimensional than the observed data, which led Vardi (1996) to coin the phrase "network tomography" for such problems.

Vardi, Y. (1996). Network tomography: estimating source-destination traffic intensities from link data. Journal of the American Statistical Association 91.
Modelling Framework: Traffic Network

The system is abstracted to a network G = (N, A). N is the set of nodes; nodes represent origins and/or destinations of travel, and intersections. A is the set of directed links, also referred to as arcs.
Modelling Framework: Routes

Travel is possible only between pre-specified origin-destination (OD) node pairs; e.g. node 1 to node 6 might be the only OD pair. Typically travel can be by a variety of routes (paths). Sometimes there is no route choice, e.g. if (4, 3) is another OD pair.
Traffic Models

Traffic models are typically specified in terms of route flows (volumes). Let x denote the vector of route flows for some observational period, with the model parameterized by θ.

Canonical Example: x ~ Pois(θ), interpreted elementwise with independence. OD flows z are defined by z_i = Σ_{j∼i} x_j, where j ∼ i indicates that route j serves OD pair i.

More Sophisticated Example: z ~ NegBin (allows for over-dispersion), with route choice probabilities {p_j} defined by random utility models.
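To make the canonical example concrete, here is a minimal simulation sketch in Python. The three-route network, the route-to-OD assignment and the rates are illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative assumptions: three routes, the first two serving OD pair 0
# and the third serving OD pair 1.
theta = np.array([4.0, 7.0, 3.0])   # Poisson rate for each route
route_to_od = np.array([0, 0, 1])   # OD pair served by each route

# Route flows: x ~ Pois(theta), elementwise and independent.
x = rng.poisson(theta)

# OD flows: z_i is the sum of x_j over routes j serving OD pair i.
z = np.bincount(route_to_od, weights=x, minlength=2)
print("route flows:", x, "OD flows:", z)
```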
Data for Model Fitting and Assessment

The traffic model is defined by a probability function f_X(x | θ) for route flows; we want to conduct inference for θ.

                 Route flows, x                  Link counts, y
Nature           Direct observation of x         Indirect observation of x
Example sources  Vehicle tracking; numberplate   Vehicle counters on road; counts at
                 matching; travel surveys        router/switch; computer packet tracking
Cost             High                            Low

Focus on inference from link counts. This is relevant unless routing data are complete; e.g. Parry & Hazelton (2012).

Parry, K. and Hazelton, M.L. (2012). Estimation of origin-destination matrices from link counts and sporadic routing data. Transportation Research Part B 46.
Toy Example

[Figure: two-link network with observed link counts y_1 = 10, y_2 = 10.]

Assume three OD pairs: (1, 2), (1, 3), (2, 3). Fixed routing, so we have r = 3 routes. For definiteness, suppose x = (x_1, x_2, x_3)^T ~ Pois(θ). The vector of n = 2 observed link counts is y = (y_1, y_2)^T. What does this tell us about the latent route flows x, and hence about the parameter vector θ?
Link Counts and Route Flows

Fundamental Relationship: the link count vector y and route flow vector x are related by y = Ax, where A = (a_ij) is the link-path incidence matrix defined by a_ij = 1 if link i forms part of route j, and a_ij = 0 otherwise.

This is just a counting exercise: y_i = Σ_{j=1}^r a_ij x_j. It is usually a massively under-determined linear system, so link counts provide only indirect information about x and θ. Inference is a form of linear inverse problem.
Back to the Toy Example

y = (y_1, y_2)^T = Ax, where

A = [ 1 1 0
      0 1 1 ].
Feasible Route Flow Set

For a given link count vector y, the set of feasible route flows is X_y = {x : y = Ax, x ≥ 0}, where the inequality is interpreted component-wise. The set X_y is of fundamental importance for inference.
Back to the Toy Example

With y_1 = 10 and y_2 = 10,

X_y = {(x_1, x_2, x_3)^T ∈ Z^3_≥0 : x_1 + x_2 = y_1, x_2 + x_3 = y_2}
    = {(0, 10, 0)^T, (1, 9, 1)^T, ..., (10, 0, 10)^T}.
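For a network this small, X_y can be found by brute force. A sketch, using the incidence matrix A reconstructed above:

```python
import itertools
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1]])   # link-path incidence matrix of the toy network
y = np.array([10, 10])      # observed link counts

# Enumerate X_y = {x in Z^3, x >= 0 : Ax = y}. Each flow is bounded above
# by max(y), so the search space is finite (and tiny here).
X_y = [x for x in itertools.product(range(y.max() + 1), repeat=3)
       if np.array_equal(A @ np.array(x), y)]
print(len(X_y), X_y[0], X_y[-1])   # 11 flows, (0, 10, 0) through (10, 0, 10)
```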
Likelihood Functions for Traffic Models

The probability function for the observed link count data y is denoted f_Y(y | θ). The model likelihood is

L(θ) = f_Y(y | θ) = Σ_x f_{Y|X}(y | x, θ) f_X(x | θ) = Σ_{x ∈ X_y} f_X(x | θ).

Explanation: f_{Y|X}(y | x, θ) is the indicator of the event {y = Ax, x ≥ 0}, i.e. that the route flow is compatible with the observed link counts.

For the canonical Poisson example: if x ~ Pois(θ) then

L(θ) = Σ_{x ∈ X_y} Π_{i=1}^r e^{−θ_i} θ_i^{x_i} / x_i!,

an awkward mathematical form.
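Reusing the enumeration of X_y from the sketch above, the Poisson likelihood can be evaluated directly for the toy example. This is only a brute-force sketch: it is exactly the computation that becomes infeasible on realistic networks.

```python
import numpy as np
from scipy.stats import poisson

def poisson_likelihood(theta, X_y):
    # L(theta) = sum over x in X_y of prod_i exp(-theta_i) theta_i^{x_i} / x_i!
    return sum(np.prod(poisson.pmf(np.asarray(x), theta)) for x in X_y)

# X_y as enumerated in the previous sketch.
print(poisson_likelihood(np.array([5.0, 5.0, 5.0]), X_y))
```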
Model Identifiability

A single link count vector y provides limited information about x, and hence θ. What about an iid sample y^(1), ..., y^(n)? The model is identifiable from link count data if f_Y(· | θ) = f_Y(· | θ′) implies θ = θ′. A nice study for normal models is given by Singhal & Michailidis (2007); it extends to correlated data cases, but places restrictions on network topology and model parameterization.

Singhal, H. and Michailidis, G. (2007). Identifiability of flow distributions from link measurements in computer networks. Inverse Problems 23.
Identifiability Theory for the Integer-Valued Traffic Case

Vardi (1996) proved identifiability for the canonical Poisson traffic model with a single route per OD pair. A new result generalizes this:

Theorem (Hazelton 2015). Assume that the route flows are independent, and that f_X and f_Y have support equal to the non-negative integers. Then if θ is identifiable from independent observations on x, it is also identifiable from independent observations on y.

Hazelton, M.L. (2015). Network tomography for integer-valued traffic. Annals of Applied Statistics 9(1).
Vardi, Y. (1996). Network tomography: estimating source-destination traffic intensities from link data. Journal of the American Statistical Association 91.
Learning from the Dependence Structure

Identifiability can be explained in part by information from the link count dependence structure. Consider two scenarios for the toy network:

Scenario 1, θ = (0, 10, 0)^T:
  Observation 1: y_1 = 11, y_2 = 11
  Observation 2: y_1 = 6, y_2 = 6
  Observation 3: y_1 = 7, y_2 = 7

Scenario 2, θ = (10, 0, 10)^T:
  Observation 1: y_1 = 8, y_2 = 6
  Observation 2: y_1 = 9, y_2 = 12
  Observation 3: y_1 = 10, y_2 = 11

In Scenario 1 all traffic uses route 2, which traverses both links, so the two link counts are always equal; in Scenario 2 the counts on the two links vary independently.
Identifiability: Open Research Questions

1. For integer-valued flows, the results require independence of route flows, and independence between days. Can either of these independence assumptions be relaxed? If so, in what ways? What are the minimal requirements?

2. The proof is constructive in nature, and relies on observation of very low probability events. What are the practical implications of theoretical identifiability from link count data?
Likelihood-Based Inference

Observe n link count vectors y^(1), ..., y^(n). The likelihood is L(θ) = Π_{t=1}^n f_Y(y^(t) | θ). Recall that for each single link count vector, f_Y(y | θ) = Σ_{x ∈ X_y} f_X(x | θ). It is usually computationally infeasible to enumerate X_y = {x : y = Ax, x ≥ 0}, and hence infeasible to evaluate f_Y(y | θ) and L(θ) explicitly.

Passing comment: many ad hoc methods of estimation seek a single optimal element of X_y. That is demonstrably sub-optimal.
Sampling-Based Inference

Crux of the idea: replace enumeration of X_y = {x : y = Ax, x ≥ 0} by a representative sample from it. Practical implementations: a stochastic EM algorithm for likelihood inference, and MCMC algorithms for Bayesian inference. Li (2005) discusses EM and stochastic EM algorithms for trip matrix estimation; Tebaldi & West (1998) is the seminal reference for MCMC Bayesian network tomography. All previous methods have struggled because of the difficulty of sampling from X_y.

Li, B. (2005). Bayesian inference for origin-destination matrices of transport networks using the EM algorithm. Technometrics 47(4).
Tebaldi, C. and West, M. (1998). Bayesian inference on network traffic using link count data (with discussion). Journal of the American Statistical Association 93.
Stochastic EM Algorithm

An iterative algorithm. Suppose θ′ is the current value of θ.

Step 1: Compute Q̂(θ | θ′) = M^{−1} Σ_{i=1}^M Σ_{t=1}^n log f_X(x_i^(t) | θ), where x_1^(t), ..., x_M^(t) is a random sample from f_{X|Y}(· | y^(t), θ′).

Step 2: Maximize Q̂(θ | θ′) to give the new value of θ.

The algorithm converges to the maximum likelihood estimate θ̂. Standard errors are available from the (approximate) information matrix.
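A schematic sketch of the two steps for the canonical Poisson model with a single count vector, where the M-step has a closed form: the complete-data MLE of θ_i is the sample mean of the x_i draws. The `sample_posterior` routine is a hypothetical placeholder for any sampler of f_{X|Y}(· | y, θ′), such as the MCMC scheme described next.

```python
import numpy as np

def stochastic_em(y, theta0, sample_posterior, n_iter=50, M=100, rng=None):
    """Stochastic EM sketch for x ~ Pois(theta) with one count vector y.

    `sample_posterior(y, theta, M, rng)` is assumed to return an (M, r)
    array of draws from f(x | y, theta); it is a placeholder here.
    """
    rng = rng or np.random.default_rng()
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        xs = sample_posterior(y, theta, M, rng)   # Step 1: sample latent flows
        theta = xs.mean(axis=0)                   # Step 2: maximize Q-hat
    return theta
```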
Bayesian Inference using MCMC

Metropolis-Hastings Algorithm. Prior: π(θ). Posterior: p(θ | y^(1), ..., y^(n)) ∝ L(θ)π(θ).

MCMC algorithm (for the n = 1 case):
1. Generate initial values for θ, x.
2. Sample a candidate x′ from a proposal distribution q; we want q to have support X_y.
3. Accept x′ with probability min(1, α), where
   α = f_{X|Y}(x′ | y, θ) q(x) / [f_{X|Y}(x | y, θ) q(x′)]
     = f_{X,Y}(x′, y | θ) q(x) / [f_{X,Y}(x, y | θ) q(x′)]
     = f_X(x′ | θ) q(x) / [f_X(x | θ) q(x′)].
4. Sample θ from p(θ | x) (usually straightforward).
5. Go to step 2.
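For the toy network the polytope X_y is one-dimensional, so steps 2-3 reduce to a random walk on the free coordinate t = x_2, with x = (y_1 − t, t, y_2 − t). A minimal sketch of this x-update with θ held fixed; the symmetric ±1 proposal makes the q terms cancel:

```python
import numpy as np
from scipy.stats import poisson

def mh_route_flows(y, theta, n_iter=5000, rng=None):
    """Metropolis-Hastings over X_y for the toy network, theta held fixed."""
    rng = rng or np.random.default_rng()
    y1, y2 = y
    t_max = min(y1, y2)
    t = t_max // 2                               # start mid-polytope
    draws = np.empty((n_iter, 3), dtype=int)
    for k in range(n_iter):
        t_new = t + rng.choice((-1, 1))          # symmetric proposal
        if 0 <= t_new <= t_max:                  # candidate must stay in X_y
            x = np.array([y1 - t, t, y2 - t])
            x_new = np.array([y1 - t_new, t_new, y2 - t_new])
            log_alpha = (poisson.logpmf(x_new, theta).sum()
                         - poisson.logpmf(x, theta).sum())
            if np.log(rng.uniform()) < log_alpha:
                t = t_new                        # accept the move
        draws[k] = (y1 - t, t, y2 - t)
    return draws

draws = mh_route_flows((10, 10), np.array([5.0, 5.0, 5.0]))
print(draws.mean(axis=0))                        # posterior means of the flows
```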
Sampling Feasible Route Flow Vectors

The trick is to sample from a proposal distribution q with support matching X_y = {x : y = Ax, x ≥ 0}. This can be an iterative process, updating one element of x at a time. Development of an efficient algorithm for such sampling is a 20 year old problem. There is recent geometrical insight from Airoldi & Haas (2011) and Airoldi & Blocker (2013) for continuous flow models, but still no reliable solution for integer-valued traffic.

Airoldi, E.M. and Blocker, A.W. (2013). Estimating latent processes on a network from indirect measurements. Journal of the American Statistical Association 108.
Airoldi, E.M. and Haas, B. (2011). Polytope samplers for inference in ill-posed inverse problems. In Int. Conf. Artificial Intelligence and Statistics, Vol. 15.
Geometry of the Feasible Route Flow Set

X_y is the intersection of the linear manifold {x : y = Ax} with the non-negative orthant {x ≥ 0}. Hence X_y = {x : y = Ax, x ≥ 0} is a convex polytope. If there are r routes and n links then X_y is an (r − n)-dimensional object embedded in r-dimensional space. We have flexibility in its representation.
Geometry of the Feasible Route Flow Set: Example

Nodes 1 and 2 are origins of flow; nodes 3, 4 and 5 are destinations. Hence there are six routes, indexed (o, d) for o = 1, 2 and d = 3, 4, 5. The link-route incidence matrix is A = [A_1 A_2], partitioned so that A_1 is invertible, and hence the polytope can be represented by the flows on the last two routes, (2, 4) and (2, 5).
Geometry of the Feasible Route Flow Set: Example continued

Suppose the link counts are y = (20, 30, 20, 10)^T (a single day). The polytope is represented by the flows on routes 5 = (2, 4) and 6 = (2, 5).

[Figure: the polytope plotted in the (x_5, x_6) plane.]
Sampling the Feasible Route Flow Set: Example continued

Sampling (e.g. uniformly) from the polytope is not straightforward, because there is no convenient expression for the support. Sampling iteratively in the coordinate directions is much simpler.

[Figures: iterative coordinate-direction moves within the polytope.]
Sampling for Awkward Geometries: Revised example

Suppose y = (10, 20, 20, 10)^T (left) or y = (10, 20, 19, 9)^T (right). Sampling in the coordinate directions is impossible (left) or very slow to converge (right).

[Figures: the two awkward polytopes.]
Change of Representation

Tebaldi & West (1998) proposed coordinate-direction iterative sampling, but failed to appreciate that it won't necessarily work for awkward geometries. Recall, however, that there is flexibility in the choice of representation of X_y. Hazelton (2015) recently proved that there is always a convenient representation of X_y for coordinate-direction sampling, with the caveat that the link-path matrix A must be unimodular.

Hazelton, M.L. (2015). Network tomography for integer-valued traffic. Annals of Applied Statistics 9(1).
Tebaldi, C. and West, M. (1998). Bayesian inference on network traffic using link count data (with discussion). Journal of the American Statistical Association 93.
Change of Representation: Back to the Example

As before, y = (10, 20, 20, 10)^T (left) or y = (10, 20, 19, 9)^T (right). Represent the polytope by the flows on routes 4 = (2, 3) and 6 = (2, 5). Sampling in the coordinate directions is then efficient.

[Figures: the same polytopes under the new representation.]
Adaptive MCMC Samplers

We can construct a good route flow sampler by allowing adaptive representation of X_y. The strategy is to exclude routes carrying heavy traffic from the basis for the polytope; intuitively this maximizes the slack and hence permits the longest possible moves. This is implemented by monitoring estimated route flows and updating the basis periodically.
Example: Road Intersection in Leicester, UK

[Figures: map of the Leicester intersection and its network abstraction.]

OD pairs: (o, d) ∈ {1, 2, 3, 5, 6} × {1, 2, 3, 5, 6} with o ≠ d, giving 20 OD pairs. Traffic is counted on all links except link 5.
Example: Road Intersection in Leicester, UK — Model

Canonical Poisson model: x ~ Pois(θ). Fixed routing, so route flows equal OD flows. A single link count vector was observed over 15 minutes on a weekday in May:

y = (72, 56, 217, 120, 119, 127, 178, 117, 181)^T.

Prior information is available. This is critical: otherwise useful inference is impossible. It is specified through gamma priors on the elements of θ.

We employ the MCMC algorithm to sample x and θ, switching between sampling x | θ, y and θ | x, with adaptive choice of basis for the polytope X_y (basis updates at 10,000 and 20,000 iterations).
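The θ | x update is conjugate here: with a Gamma(a_i, b_i) prior (rate parameterization) on θ_i and Poisson route flows observed for a single period, the full conditional is θ_i | x ~ Gamma(a_i + x_i, rate = b_i + 1). A minimal sketch, in which the hyperparameter vectors a and b are illustrative placeholders:

```python
import numpy as np

def sample_theta(x, a, b, rng):
    """Gibbs update theta_i | x ~ Gamma(a_i + x_i, rate = b_i + 1).

    NumPy's gamma sampler is parameterized by shape and scale = 1/rate.
    """
    return rng.gamma(shape=a + x, scale=1.0 / (b + 1.0))
```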
Example: Road Intersection in Leicester, UK — Trace Plots for x

[Figure: MCMC trace plots for the sampled route flows.]
Example: Road Intersection in Leicester, UK — Results

[Table: route, O-D pair, prior mean, posterior mean and 95% credible interval for each of the 20 OD flows. The 95% intervals: (27.6, 52.7), (12.0, 30.3), (42.5, 76.3), (16.3, 38.4), (12.8, 33.8), (25.9, 53.8), (43.5, 80.2), (1.6, 11.0), (11.5, 31.8), (21.9, 46.3), (26.1, 48.9), (5.2, 18.0), (37.3, 67.0), (5.8, 27.1), (16.1, 39.8), (0.3, 16.0), (48.7, 85.1), (1.5, 9.5), (3.4, 14.3), (27.8, 51.6).]
Closing Comments: More Open Research Questions

The preceding theory and methods work if A is totally unimodular: that is, if every square sub-matrix of A has determinant 0, +1 or −1 (equivalently, every non-singular square sub-matrix has an integer-valued inverse). Most realistic link-route incidence matrices seem to have this property, but artificial counter-examples can be constructed. Why do real networks tend to have totally unimodular link-route incidence matrices? Are we likely to find counter-examples in practice?
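Total unimodularity can be verified directly for small incidence matrices by checking every square sub-matrix; a brute-force sketch, exponential in the matrix size and so only sensible for small examples like the toy network above:

```python
import itertools
import numpy as np

def is_totally_unimodular(A, tol=1e-9):
    """Check that every square sub-matrix of A has determinant -1, 0 or +1."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                d = np.linalg.det(A[np.ix_(rows, cols)])
                if min(abs(d - v) for v in (-1.0, 0.0, 1.0)) > tol:
                    return False
    return True

A = np.array([[1, 1, 0],
              [0, 1, 1]])               # toy link-path incidence matrix
print(is_totally_unimodular(A))         # True
```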
For a Copy of These Slides...