ECEN 689 Special Topics in Data Science for Communications Networks


1 ECEN 689 Special Topics in Data Science for Communications Networks
Nick Duffield, Department of Electrical & Computer Engineering, Texas A&M University
Lecture 13: Measuring and Inferring Traffic Matrices

2 Measuring and Inferring Network Usage
Flow records → origin-destination traffic matrices. What if we don't collect flow records network-wide? Core backbone network: yes. Access networks: sometimes not.

3 Sources of ISP Operational Data
[Figure: ISP network showing router centers, peering, access, backbone, business, management, and datacenters]

4 Sources of ISP Operational Data
Traffic flow through the core: NetFlow capture in core routers.

5 Sources of ISP Operational Data
Traffic flow within the access network does not touch the core, so there is no NetFlow record of it.

6 Sources of ISP Operational Data
Link traffic rates: time series of traffic per router interface, at 5-minute granularity.
[Figure: ISP network (router centers, peering, access, backbone, business, management, datacenters) annotated with per-interface traffic time series]

7 Router Management Information Base
The Management Information Base (MIB) is a database used for managing routers and other network devices: a hierarchical database of managed objects maintained in the router, e.g. operational statistics, interface parameters, and settings. Interaction with the MIB is via SNMP (Simple Network Management Protocol), which operates on a pull model: poll the device using SNMP to retrieve statistics. Contrast the push model of NetFlow, which exports records to a collector. SNMP can in fact also set traps: event contingencies that generate actions if they occur, e.g. when an interface goes down, send a warning. This doesn't scale to anything resembling NetFlow.

8 Router Interface Counters
Managed objects in a table maintained in the MIB, per interface: byte and packet counters. Use 64-bit counters; 32-bit counters can wrap too quickly. Poll interfaces periodically to obtain a time series of counter values, then compute Average rate = CounterDifference / PollingInterval to obtain a time series of average rates. A 5-minute polling interval is common practice: conservative, reflecting historical concerns about load on the router. Data issues: round-robin polling of routers during the period means intervals don't align; counter rollover must be detected and corrected; missing data values must be detected and flagged (router did not respond to the SNMP poll, network congestion or outage).
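The rate computation and two of the data issues above (rollover, missing polls) can be sketched as follows. This is a minimal illustration, not an SNMP client: the function name `average_rates`, the `(timestamp, counter)` input format, and the use of `None` for a missed poll are assumptions made for the example.

```python
def average_rates(samples, counter_bits=64):
    """Convert (timestamp_seconds, counter_value) polls into average rates.

    Returns a list of (interval_end_time, rate), where
    rate = CounterDifference / PollingInterval.
    A missed poll (counter value None) yields a None rate for the
    intervals it touches, so gaps are flagged rather than guessed.
    """
    modulus = 1 << counter_bits
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if c0 is None or c1 is None or t1 <= t0:
            rates.append((t1, None))
            continue
        # Modular difference undoes at most one rollover between polls;
        # it cannot tell a rollover apart from a counter reset.
        diff = (c1 - c0) % modulus
        rates.append((t1, diff / (t1 - t0)))
    return rates


# Hypothetical 32-bit byte counter polled every 300 s; the poll at t=600 was lost.
polls = [(0, 4294967000), (300, 704), (600, None), (900, 3704)]
rates = average_rates(polls, counter_bits=32)
for end_time, rate in rates:
    print(end_time, rate)
```

The actual interval length `t1 - t0` is used rather than a nominal 5 minutes, since round-robin polling means intervals need not be exactly equal.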

9 Packet level traffic measurements
The other end of the spectrum! Special-purpose platforms collect packet header traces. This enables detailed statistical analysis of packet arrivals and helps develop traffic models for use in analysis.

10 Poisson Process
[Figure: Poisson($\lambda$) arrivals viewed with increasing window width $t$]
Poisson process with intensity $\lambda$: interarrival times are IID exponential with mean $1/\lambda$. The number of arrivals in a time window of width $t$ is a Poisson random variable with mean = variance = $\lambda t$. Coefficient of variation: SD/Mean = $(\lambda t)^{-1/2}$. The process becomes smoother as the window width $t$ increases. Good model for the telephone call start process, with time-of-day rate dependence.
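The smoothing claim can be checked numerically. The sketch below simulates a Poisson process from exponential interarrival times and compares the empirical coefficient of variation of window counts with $(\lambda t)^{-1/2}$; the intensity, horizon, window widths, and seed are arbitrary choices for illustration.

```python
import math
import random


def window_counts(lam, horizon, width, rng):
    """Simulate a Poisson process on [0, horizon) and count arrivals per window."""
    counts = [0] * int(horizon / width)
    t = rng.expovariate(lam)  # IID exponential interarrival times, mean 1/lam
    while t < horizon:
        idx = int(t / width)
        if idx < len(counts):
            counts[idx] += 1
        t += rng.expovariate(lam)
    return counts


def coefficient_of_variation(values):
    """Standard deviation divided by mean."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean


rng = random.Random(0)
lam = 10.0
cv_by_width = {}
for width in (1.0, 10.0):
    counts = window_counts(lam, horizon=2000.0, width=width, rng=rng)
    empirical = coefficient_of_variation(counts)
    predicted = (lam * width) ** -0.5
    cv_by_width[width] = (empirical, predicted)
    print(width, round(empirical, 3), round(predicted, 3))
```

Widening the window by a factor of 10 should shrink the coefficient of variation by roughly $\sqrt{10}$.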

11 Actual Packet Arrival Process
[Figure: measured packet arrivals viewed with increasing window width $t$]
Arrivals do not smooth out with increasing time window: there is variability at all timescales. Self-similarity: the process looks statistically the same at all timescales. Example of a self-similar process: fractional Brownian motion (fBm). Under window scaling by a factor $a$, $B_H(at) \overset{d}{=} a^{H} B_H(t)$ for some Hurst parameter $0 < H < 1$. $H = 1/2$ is standard Brownian motion; $H > 1/2$ gives positive correlations. Long-range correlations give rise to burstiness at all timescales.

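12 Traffic matrices from link measurements?
[Figure: node $C$ with inbound links $Y_1, \dots, Y_n$ and outbound links $Y_{n+1}, \dots, Y_{2n}$]
We know the link aggregate rates into $C$, $(Y_1, \dots, Y_n)$, and out of $C$, $(Y_{n+1}, \dots, Y_{2n})$. We want the traffic matrix $X_{ij}$ = traffic from $i$ to $j$.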

13 Linear Relations
$Y_i$: traffic in from $i$; $Y_{n+j}$: traffic out to $j$; $X_{ij}$: traffic from $i$ to $j$. We speak of $ij$ as an origin-destination (OD) pair. What do $X$ and $Y$ model? First model: $X$, $Y$ as average rates over the same single measurement interval.
$Y_i = \sum_j X_{ij}$, $i = 1, \dots, n$: traffic from origin $i$ = sum over destinations $j$ of traffic from $i$ to $j$.
$Y_{n+j} = \sum_i X_{ij}$, $j = 1, \dots, n$: traffic to destination $j$ = sum over origins $i$ of traffic from $i$ to $j$.
Linear relation $Y = AX$, where $A$ is the routing matrix: $A_{lp} = 1$ if traffic on OD pair $p$ is carried over link $l$, and 0 otherwise.

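14 Can we solve Y = A X?
Can we recover the $n^2$ OD rates $X_{ij}$ from the $2n$ link rates $Y_i$? E.g. $n = 3$: 9 OD rates, 6 link rates.
$X = (X_{11}, X_{12}, X_{13}, X_{21}, X_{22}, X_{23}, X_{31}, X_{32}, X_{33})$
$Y = (Y_1, Y_2, Y_3, Y_4, Y_5, Y_6)$: 1, 2, 3 in; 4, 5, 6 out.
With this ordering, the definitions $Y_i = \sum_j X_{ij}$ and $Y_{n+j} = \sum_i X_{ij}$ give
$$A = \begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{pmatrix}$$
Underconstrained linear system: $A$ is not of full rank, with only 5 linearly independent rows (equivalently, columns), because total inbound traffic equals total outbound traffic.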
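The rank deficiency can be verified directly. The following sketch builds the routing matrix for the single-node setting (row-major ordering of OD pairs, as in the $n = 3$ example) and computes its exact rank by Gaussian elimination over the rationals; the helper names `routing_matrix` and `rank` are illustrative, not from the lecture.

```python
from fractions import Fraction


def routing_matrix(n):
    """Rows 0..n-1 are inbound links, rows n..2n-1 are outbound links.
    Columns enumerate OD pairs (i, j) in row-major order."""
    A = [[0] * (n * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            p = i * n + j
            A[i][p] = 1      # OD pair (i, j) enters on inbound link i
            A[n + j][p] = 1  # and leaves on outbound link j
    return A


def rank(matrix):
    """Exact rank via forward elimination with rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((k for k in range(r, len(rows)) if rows[k][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            factor = rows[k][col] / rows[r][col]
            rows[k] = [a - factor * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r


A = routing_matrix(3)
print(rank(A))  # 5
```

For general $n$ the same construction gives rank $2n - 1$, far below the $n^2$ unknowns.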

15 Statistical Modeling
Make a statistical model of OD traffic, specified by some parameter set $\phi$. Link traffic is a composite of the OD traffic that traverses it; analysis yields the composite parameters $\theta = F(\phi)$ of link traffic. Suppose we could determine the link parameters $\theta$: measure, and form an estimate $\hat{\theta}$. If $F$ is invertible then we could estimate $\hat{\phi} = F^{-1}(\hat{\theta})$. Is $F$ invertible? That is, are the OD parameters $\phi$ uniquely identifiable from link measurements?

16 Simple Poisson Model
Model OD traffic as independent Poisson processes: traffic for OD pair $ij$ is Poisson($X_{ij}$), where $X_{ij}$ is the intensity, i.e. the mean arrival rate per unit time. Inter-arrival times are IID with exponential distribution and mean $1/X_{ij}$. What is the composite model for link traffic? Additivity: for independent processes, $\mathrm{Poisson}(X) + \mathrm{Poisson}(X') \overset{d}{=} \mathrm{Poisson}(X + X')$. Link processes: the link $i$ process is Poisson with intensity $Y_i = (AX)_i$. Invertible? (Can we find $X$ uniquely in terms of $Y$?) Same problem as before: an under-constrained linear system.

17 Exploit whole random process?
Consider $\{x_{ij}(t): t = 1, 2, \dots, T\}$: $n^2$ values of OD rates during each of $T$ successive time periods, modeled as independent random processes. Corresponding link processes: $y_i(t) = (A\,x(t))_i$. How do we get $n^2$ known quantities? Use the covariances amongst the $n^2$ pairs of inbound and outbound traffic.
Inbound: $y_i(t) = \sum_m x_{im}(t)$, $i = 1, \dots, n$.
Outbound: $y_{n+j}(t) = \sum_k x_{kj}(t)$, $j = 1, \dots, n$.
$\mathrm{Cov}(y_i, y_{n+j}) = \sum_{m,k} \mathrm{Cov}(x_{im}, x_{kj}) = \mathrm{Var}(x_{ij})$: assuming independent OD processes, the only nonzero term is $m = j$, $k = i$. This recovers the variances of the OD processes. Later: how to generalize this approach.
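A small simulation illustrates the covariance identity. This is a sketch under stated assumptions: the OD series are drawn as independent Gaussians around an arbitrary mean of 50 (the identity only needs independence, not a particular distribution), and the variances, $n = 2$, $T$, and seed are made up for the example.

```python
import random


def sample_cov(a, b):
    """Sample covariance of two equal-length series."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


n, T = 2, 20000
rng = random.Random(1)
true_var = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 9.0, (1, 1): 16.0}

# Independent OD rate series x_ij(t), one per OD pair (illustrative data).
x = {od: [rng.gauss(50.0, v ** 0.5) for _ in range(T)] for od, v in true_var.items()}

# Link series: inbound y_i(t) = sum_m x_im(t), outbound y_{n+j}(t) = sum_k x_kj(t).
inbound = [[sum(x[(i, m)][t] for m in range(n)) for t in range(T)] for i in range(n)]
outbound = [[sum(x[(k, j)][t] for k in range(n)) for t in range(T)] for j in range(n)]

# Cov(y_i, y_{n+j}) estimates Var(x_ij) using link measurements only.
estimated_var = {(i, j): sample_cov(inbound[i], outbound[j])
                 for i in range(n) for j in range(n)}
for od in sorted(true_var):
    print(od, true_var[od], round(estimated_var[od], 2))
```

Only the link series `inbound` and `outbound` enter the estimate; the OD series are used solely to generate them.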

18 General Setting
Origin-destination (OD) pairs $j \in \{1, \dots, c\}$: Poisson processes of intensity $\lambda_j$, with observed values $X_j(t)$ in time periods $t = 1, \dots, T$. Directed links $i \in \{1, \dots, r\}$. Routing matrix: $A_{ij} = 1$ if OD pair $j$ is routed over link $i$, and 0 otherwise. Link processes $Y = AX$: $Y_i(t) = \sum_j A_{ij} X_j(t)$, Poissonian with rates $\sum_j A_{ij} \lambda_j$. When can the $\{\lambda_j\}$ be identified from the processes $Y$, i.e. from the values $\{Y_i(t): i = 1, \dots, r;\ t = 1, \dots, T\}$?

19 Identifiability and Estimation
Identifiability: distinct sets of OD rates $\{\lambda_j\}$ give distinct distributions of $Y = \{Y_i(t)\}$. Estimation: if the $\{\lambda_j\}$ are identifiable, an estimator is a mapping from the outcomes $Y = \{Y_i(t)\}$ to estimates $\{\hat{\lambda}_j\}$ of $\{\lambda_j\}$. Good statistical properties are desirable: some measure of accuracy, e.g. unbiasedness or low variance. It should be computable in useful cases, with complexity feasible for use in real networks.

20 When is general Poissonian model identifiable?
Theorem: if the routing matrix $A$ has distinct columns, and each column has at least one non-zero entry, then the $\{\lambda_j\}$ are identifiable.
Necessity: if columns $j$ and $k$ are identical, $Y$ depends only on the sum $X_j + X_k$, so we could only identify the sum $\lambda_j + \lambda_k$, not the constituents. If column $j$ is zero, $Y$ does not depend on $X_j$ at all.
Sufficiency: see Vardi 1996.
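The two conditions of the theorem are easy to test for a given routing matrix. A minimal sketch follows; the function name and the example matrices are hypothetical.

```python
def identifiability_problems(A):
    """Check the theorem's conditions on a 0/1 routing matrix A (list of rows).

    Returns (zero_columns, duplicate_groups). Both lists empty means the
    columns are distinct and non-zero, so the OD rates are identifiable.
    """
    columns = list(zip(*A))
    zero_columns = [j for j, col in enumerate(columns) if not any(col)]
    groups = {}
    for j, col in enumerate(columns):
        groups.setdefault(col, []).append(j)
    duplicate_groups = [js for js in groups.values() if len(js) > 1]
    return zero_columns, duplicate_groups


# Two links, three OD pairs: columns (1,0), (1,1), (0,1) are distinct and non-zero.
A_ok = [[1, 1, 0],
        [0, 1, 1]]
# OD pairs 0 and 1 use exactly the same links; OD pair 2 crosses no measured link.
A_bad = [[1, 1, 0],
         [1, 1, 0]]

print(identifiability_problems(A_ok))   # ([], [])
print(identifiability_problems(A_bad))  # ([2], [[0, 1]])
```

In the second case only the sum $\lambda_0 + \lambda_1$ could be identified, and $\lambda_2$ not at all.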

21 Moment-based Estimator
Match the model mean and covariance to their sample versions (normal approximation).
Mean: $E[Y] = A\lambda$. Sample mean: $\bar{Y} = \frac{1}{T} \sum_t Y(t)$.
Covariance: $\mathrm{Cov}(Y_j, Y_k) = \sum_{i,l} A_{ji}\, \mathrm{Cov}(X_i, X_l)\, A_{kl} = (A\, \mathrm{diag}(\lambda)\, A^T)_{jk}$.
Sample covariance: $S_{jk} = \frac{1}{T} \sum_t \left( Y_j(t) Y_k(t) - \bar{Y}_j \bar{Y}_k \right)$.

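22 Linear Estimation
Equate sample and model versions: $\bar{Y} = A\lambda$ and $S = B\lambda$, where $B_{jk,i} = A_{ji} A_{ki}$. We are back to solving a linear equation for $\lambda$.
Difficulties: the system is generally inconsistent, since the sample moments $S_{ii}$ and $\bar{Y}_i$ need not satisfy the relation that the model imposes between them. The system is also generally over-constrained: $A$ is $r \times c$, $B$ is $r(r-1) \times c$.
Approach: the iterative method LinInPos (Linear Inversion with Positive Constraints). Some success in example cases.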
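To make the second-moment equations concrete, the sketch below builds $B$ from $A$ using $B_{jk,i} = A_{ji} A_{ki}$, which follows from $\mathrm{Cov}(Y_j, Y_k) = \sum_i A_{ji} \lambda_i A_{ki}$ under independent Poisson OD processes. Enumerating only link pairs with $j \le k$ is a choice made here for illustration; the example routing matrix and rates are hypothetical.

```python
def second_moment_matrix(A):
    """Rows are indexed by link pairs (j, k) with j <= k;
    the entry for OD pair i is A[j][i] * A[k][i]."""
    r, c = len(A), len(A[0])
    pairs = [(j, k) for j in range(r) for k in range(j, r)]
    B = [[A[j][i] * A[k][i] for i in range(c)] for j, k in pairs]
    return pairs, B


def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A = [[1, 1, 0],
     [0, 1, 1]]
lam = [2.0, 3.0, 5.0]

pairs, B = second_moment_matrix(A)
model_mean = matvec(A, lam)  # E[Y] = A lam
model_cov = matvec(B, lam)   # Cov(Y_j, Y_k), listed in the order of `pairs`

print(pairs)       # [(0, 0), (0, 1), (1, 1)]
print(model_mean)  # [5.0, 8.0]
print(model_cov)   # [5.0, 3.0, 8.0]
```

Because $A$ is 0/1, the rows of $B$ for pairs $(j, j)$ coincide with the rows of $A$: the model variance of each link equals its mean, as expected for Poisson traffic.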

23 Maximum Likelihood Estimator
Likelihood function: $L(\lambda) = \Pr_\lambda(Y) = \sum_{X:\, Y = AX} \Pr_\lambda(X)$.
Maximum likelihood estimate: find the MLE $\hat{\lambda}$ that maximizes $L(\lambda)$. This is often difficult to compute: it is difficult to determine the set $\{X : Y = AX\}$, and the MLE may lie on the boundary of $\{X : Y = AX\}$.
Iterative methods: the Expectation-Maximization (EM) algorithm. Given the current value $\lambda$, find $\lambda^*$ that maximizes $E_{X \mid Y, \lambda}\left[\log \Pr_{\lambda^*}(X)\right]$. The iterative mapping $\lambda \to \lambda^*$ converges (in some cases) to the MLE, but not always.

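24 Time-dependent model (Cao 2000)
OD process: $X_t \sim \mathrm{Normal}(\lambda_t, \Sigma_t)$, where $\lambda_t = (\lambda_1, \dots, \lambda_n)$ is the vector of mean OD rates at time $t$ and $\Sigma_t$ is the covariance matrix.
Link process: $Y_t = A X_t \sim \mathrm{Normal}(A\lambda_t, A \Sigma_t A^T)$.
Constraint: $\Sigma_t = k \cdot \mathrm{diag}(\lambda_t^{c})$ for some constant $k > 0$ and exponent $c > 1$. An exponent $c > 1$ captures higher variance than Poisson. A simple characterization inspired by network traffic measurement.
MLE via EM, with high computation cost: $O(r^5)$.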

25 Can structural models help?
Gravity model: assume a product form for OD traffic. Estimate $X_{ij}$ by $X^G_{ij} = k\, Y^{\mathrm{in}}_i\, Y^{\mathrm{out}}_j$, where $k$ is a constant chosen for correct normalization of total traffic. The name is by analogy to Newton's law of gravitation: the attractive force between two objects is proportional to the product of their masses.
How good is the gravity model? Computation: very quick to compute. Accuracy: $X^G_{ij}$ does not solve the linear system $Y = AX$ in general, so accuracy needs to be investigated experimentally.
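The gravity estimate is a one-line outer product. The sketch below assumes total inbound traffic equals total outbound traffic and takes $k = 1/\text{total}$ as the normalization; the traffic figures are made up.

```python
def gravity_model(y_in, y_out):
    """X^G[i][j] = k * y_in[i] * y_out[j], with k = 1 / total traffic
    so that the estimated entries sum to the total traffic.
    Assumes sum(y_in) == sum(y_out)."""
    total = sum(y_in)
    k = 1.0 / total
    return [[k * a * b for b in y_out] for a in y_in]


y_in = [30.0, 50.0, 20.0]   # traffic entering at each origin
y_out = [40.0, 40.0, 20.0]  # traffic leaving at each destination
x_g = gravity_model(y_in, y_out)
for row in x_g:
    print(row)
```

The cost is $O(n^2)$ for $n$ endpoints, which is why the gravity model is attractive as a starting point despite its limited accuracy.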

26 Gravity Model Performance
Not very accurate, although some improvement is possible by aggregation in real networks.

27 Generalized Gravity Model
Internet routing is asymmetric. Hot-potato routing: use the closest exit point. Generalized gravity model: for outbound traffic, assume proportionality on a per-peer basis as opposed to per-router.
[Figure: network with peer links and access links]

28 Generalized Gravity Model

29 Can we improve on the Gravity Model?
For the gravity model: quick to compute. Against the gravity model: does not satisfy the constraint $Y = AX$. Can we get a best-of-both-worlds approach, a compromise between the gravity model and the constraint?

30 Tomogravity
Tomogravity = tomography + gravity modeling. Use a least-squares method to get the solution which satisfies the constraints $Y = AX$ and is closest to the gravity model solution. Weighted least squares can be used to make this more robust.
[Figure: least-squares solution as the projection of the gravity model solution onto the constraint subspace $Y = AX$]

31 Tomogravity: Accuracy
Accurate within 10-20% (esp. for large elements).

32 Regularization
We want a solution that satisfies the constraints $Y = AX$, but there are many more unknowns than measurements: $N$ end points give $O(N^2)$ paths and $O(N)$ links. This is an underconstrained linear system: many solutions satisfy the equations, and we must somehow choose the best one.
Regularization: include a penalty function $J(X)$ that expresses preference amongst the different possible solutions. Estimate $\hat{X} = \arg\min_X \left\{ \|Y - AX\|^2 + \mu^2 J(X) \right\}$.

33 Regularization in Tomogravity
Consider $J(X) = \|X - X^G\|^2$, the norm distance from the gravity model solution: $\hat{X} = \arg\min_X \left\{ \mu^{-2} \|Y - AX\|^2 + \|X - X^G\|^2 \right\}$. For small $\mu$, $\hat{X}$ approaches the least-squares solution $\arg\min_{X:\, Y = AX} \|X - X^G\|^2$. More robust results are found with the weighted least-squares solution $J(X) = \|(X - X^G) / X^G\|^2$, with component-wise division by $X^G$.
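As a worked step for the unweighted penalty, the minimizer has a closed form. This is a sketch assuming the system is consistent (some $X$ satisfies $Y = AX$); $A^{+}$ denotes the Moore-Penrose pseudoinverse, a notation not used on the slides. Setting the gradient of the objective to zero gives $(A^T A + \mu^2 I)(X - X^G) = A^T (Y - A X^G)$, and hence:

```latex
\begin{aligned}
\hat{X}_\mu &= \arg\min_X \left\{ \mu^{-2}\,\|Y - AX\|^2 + \|X - X^G\|^2 \right\} \\
            &= X^G + \left(A^{T}A + \mu^{2} I\right)^{-1} A^{T}\left(Y - A X^G\right), \\
\lim_{\mu \to 0} \hat{X}_\mu &= X^G + A^{+}\left(Y - A X^G\right)
            = \arg\min_{X:\,Y = AX} \|X - X^G\|^2 .
\end{aligned}
```

The limit is the orthogonal projection of the gravity solution $X^G$ onto the constraint set $\{X : Y = AX\}$: the gravity estimate is corrected by the smallest change that reproduces the measured link loads.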

34 Information and Entropy
$X$ with discrete distribution $\{p(x_i),\ i = 1, \dots, n\}$. Entropy: $H(X) = -\sum_i p(x_i) \log p(x_i)$. $H(X) = 0$ if one $p(x_i) = 1$ (all others 0). $H(X) = \log n$ if all $p(x_i) = 1/n$: the maximum value. $H(X)$ is a measure of the uncertainty of $X$.
[Figure: plot of $-p \log p$]
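The two extreme cases can be checked with a few lines of code. A minimal sketch, using natural logarithms (so entropy is in nats) and the usual convention that terms with $p(x_i) = 0$ contribute zero:

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability outcomes contribute 0."""
    return -sum(q * math.log(q) for q in p if q > 0)


certain = entropy([1.0, 0.0, 0.0, 0.0])      # one outcome has probability 1
uniform = entropy([0.25, 0.25, 0.25, 0.25])  # all outcomes equally likely
skewed = entropy([0.7, 0.1, 0.1, 0.1])

print(certain, uniform, math.log(4), skewed)
```

Any non-uniform distribution on four outcomes, such as the skewed one here, falls strictly between 0 and $\log 4$.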

35 Mutual Information
Entropy: $H(X) = -\sum_i p(x_i) \log p(x_i)$.
Conditional entropy $H(Z \mid X)$: the $X$-averaged uncertainty in the conditional distribution of $Z$ given $X$, $H(Z \mid X) = -\sum_j p(x_j) \sum_i p(z_i \mid x_j) \log p(z_i \mid x_j)$. Note that if $Z$ and $X$ are independent, $p(z_i \mid x_j) = p(z_i)$ and so $H(Z \mid X) = H(Z)$.
Mutual information: $I(Z,X) = H(Z) - H(Z \mid X)$, with $I(Z,X) \ge 0$ and equality iff $Z$, $X$ are independent. Equivalently, $I(Z,X) = \sum_{i,j} p(x_j, z_i) \log \frac{p(x_j, z_i)}{p(x_j)\, p(z_i)}$. Independent case: $p(x_j, z_i) = p(x_j)\, p(z_i)$, so $I(Z,X) = 0$.

36 Traffic Matrices and Information
What does the distribution of source traffic tell us about the distribution at the destinations? First, normalize the OD matrix: $p(s,d)$ = fraction of all traffic from source $s$ to destination $d$, the joint distribution of source and destination traffic.
$p_S(d) = \sum_s p(s,d)$: fraction of traffic to destination $d$.
$p_D(s) = \sum_d p(s,d)$: fraction of traffic from source $s$.
Suppose $p(s,d) = p_D(s)\, p_S(d)$. Then the source and destination distributions are independent: this is the GRAVITY MODEL.
Mutual information $I(S,D) = \sum_{s,d} p(s,d) \log \frac{p(s,d)}{p_D(s)\, p_S(d)}$: how much the source distribution tells us about the destination distribution. Gravity model: no mutual information, $I(S,D) = 0$.
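The link between the gravity model and zero mutual information can be checked numerically. The sketch below normalizes an OD matrix, forms both marginals, and evaluates $I(S,D)$ in nats; the variable names `p_src` and `p_dst` (for the source and destination marginals) and the example matrices are illustrative.

```python
import math


def mutual_information(traffic):
    """I(S, D) in nats for a non-negative OD traffic matrix (list of rows)."""
    total = sum(map(sum, traffic))
    p = [[v / total for v in row] for row in traffic]
    p_src = [sum(row) for row in p]        # fraction of traffic from each source
    p_dst = [sum(col) for col in zip(*p)]  # fraction of traffic to each destination
    return sum(
        p[s][d] * math.log(p[s][d] / (p_src[s] * p_dst[d]))
        for s in range(len(p))
        for d in range(len(p[0]))
        if p[s][d] > 0
    )


# Product-form (gravity) matrix: rows are proportional to each other.
gravity = [[12.0, 12.0, 6.0],
           [20.0, 20.0, 10.0],
           [8.0, 8.0, 4.0]]
# Each source talks to exactly one destination: maximal dependence.
diagonal = [[1.0, 0.0],
            [0.0, 1.0]]

mi_gravity = mutual_information(gravity)
mi_diagonal = mutual_information(diagonal)
print(mi_gravity, mi_diagonal, math.log(2))
```

In the diagonal case, knowing the source determines the destination, and $I(S,D)$ equals the full entropy $\log 2$ of the destination distribution.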

37 Regularization and MMI
MMI: minimal mutual information. Regularization with penalty function $J = I(S,D)$: $\hat{X} = \arg\min_X \left\{ \|Y - AX\|^2 + \mu^2 I(S,D) \right\}$.
Intuition: the gravity model represents lack of information, i.e. no model of source/destination dependence. Find a compromise between that and the constraints $Y = AX$.
Relation to original tomogravity: for close distributions $P$ and $Q$, $I(P,Q) \approx \sum_i (p_i - q_i)^2 / q_i$ up to a constant factor, which is a weighted mean square distance, as before.

38 References
Network Tomography: Estimating Source-Destination Traffic Intensities From Link Data, Vardi, 1996.
Time-Varying Network Tomography: Router Link Data, Cao et al., 2000.
Estimating Point-to-Point and Point-to-Multipoint Traffic Matrices: An Information-Theoretic Approach, Zhang, Roughan, Lund & Donoho.
Chapter 9 of Kolaczyk, Statistical Analysis of Network Data: Methods and Models.
