Sampling and Censoring in Estimation of Flow Distributions

Nelson Antunes
Center for Computational and Stochastic Mathematics, University of Lisbon, Portugal

Vladas Pipiras
Department of Statistics and Operations Research, University of North Carolina, USA

Abstract

Traffic monitoring and estimation of flow characteristics, such as the size and duration distributions, can be problematic when the length of the observation window is constrained (e.g., by hard limits on network resources). Indeed, as shown in this work, sampled flows are usually affected by censoring in an observation window, which leads to biased estimators. To account for censoring, we develop a mathematical framework that describes the sampling of flows in a time window. Using censoring analysis, we provide nonparametric maximum likelihood estimators for the flow duration and size distributions. The estimators are computed using the EM algorithm. Finally, the estimators are applied to an actual traffic trace and are found to perform very well.

I. INTRODUCTION

Measuring and monitoring flow traffic plays a central role in operating today's computer networks. Among the flow metrics considered in inference is the flow size (number of packets) [1], which is useful for traffic modeling, management and security. Flow duration [2] is another important metric, used in traffic prediction, in traffic engineering to support QoS and in accounting, and it is arguably the most important one from the viewpoint of the user [3]. Due to the massive volume of traffic and the costs of storage and processing, it has become impossible to reconstruct and track all flows on a network link. To overcome this problem, flow sampling has become commonplace [4].

One issue that has seemingly attracted little attention so far in the networking community is the role of the observation (time) window in making inference about flow metrics. An exception seems to be the work [5] on the flow duration distribution, which relies on full (non-sampled) traffic.
An observation window, typically of the order of several minutes, naturally censors the packets of a flow (see Figure 1). This can be more pronounced as the number of long-lived flows increases. The length of the observation window is often constrained by the need to sample flows under hard resource limits in the measurement infrastructure [6]. On the other hand, the length of the time window also depends on the targeted application. For instance, in network attacks [7], short time intervals are adequate to detect a sudden increase in the number of flows with one packet. In addition, it is well known that Internet traffic is nonstationary [8] over long time periods because of daily variations of interactive services (video, web, etc.), which limits the length of the observation window.

What are the censoring effects in estimation and how significant are they? The answer depends on the quantities used in estimation, as well as on other factors (see Section II-D below). For example, if only uncensored flows are used in estimating the distribution of flow sizes or durations, then the true distribution function will naturally tend to be overestimated for larger values of sizes and durations. Note that this leads not only to a bias but also to a loss of the information carried by censored flows. Perhaps more surprising are the effects on estimation when all the sampled flows are used as if they were not censored. To leave the reader in some suspense, the answer can be found in Section II-D below. But regardless of which sampled flows are used, estimation that ignores censoring will lead to bias. Moreover, the bias will be larger for smaller observation windows. The problem might be mitigated if the observation window is sufficiently long, but the measurement cost would then be prohibitive.

The main goal of this paper is to show how the bias can be eliminated by taking censoring into account. We develop an analytical framework that describes the sampling of flows in an observation window.
Estimating the flow duration distribution under censoring turns out to be related to the problem considered by Wijers [9] in censoring analysis. The resulting estimator contrasts with the Kaplan-Meier estimator [10] used in [5], which does not describe all types of censored flows (only the ones that start in the observation window). It should be noted that, while censoring has not been considered in depth in networking problems, it is widespread in other contexts, especially biostatistics [11]. The only assumption needed to apply the Wijers setting concerns the arrival process of flows.

The problem concerning the flow size distribution is more challenging. However, if one is willing to assume a simple and natural model for the packet arrival process within a flow, we show that the censoring framework is close to that considered in a celebrated work of Laslett [12] on the so-called line segment problem. The corresponding nonparametric maximum likelihood estimators for the flow duration and size distributions are implemented using the EM algorithm. We assess the adequacy of the modeling assumptions and apply the results to infer the flow distributions from censored flows on an Internet trace. The estimates are very accurate compared with the true distributions.

The paper is organized as follows. Section II describes the analytical framework considered. Section III gives the nonparametric maximum likelihood estimators for the flow duration and size distributions based on sampled censored flows. The results are checked on an Internet trace in Section IV. Finally, conclusions are drawn in Section V.

II. ANALYTICAL FRAMEWORK

In this section, we introduce the conceptual framework for the censoring analysis. We first describe the traffic model and the sampling method used. We then classify the types of censored flows and show that ignoring censoring can lead to biased estimators.

A. Traffic modeling

A flow is defined by the usual 5-tuple of origin and destination IP addresses, port numbers and protocol field. We first need to describe the dynamics of network traffic in terms of flow arrivals and the internal flow structure of packet arrivals. The model that we use was verified against a large number of Internet traffic data sets and was found to reproduce the main features of network traffic [13].

We suppose that flows arrive in time according to a Poisson process with rate $\lambda$. A flow arriving at random time $T$ consists of a number of packets (size) $W$, with probability mass function $f_W(w) = P(W = w)$, $w \ge 1$, distribution function $F_W(w) = P(W \le w)$, $w \ge 1$, and mean $\mu_W = E(W)$. The $W$ packets of the flow are separated by interarrival times (IATs) $D_i$, $i = 1, \dots, W-1$, which are assumed to be i.i.d. and independent of $W$. Let $D$ be a variable with the common distribution of the $D_i$, with distribution function $F_D(t) = P(D \le t)$, $t \ge 0$, and mean $\mu_D = E(D)$. Note that the flow duration is $V = \sum_{i=1}^{W-1} D_i$, with $F_V(v) = P(V \le v \mid W \ge 2)$ and $\mu_V = E(V \mid W \ge 2)$ denoting the distribution function and the mean of the duration of a flow with at least two packets. The distribution functions of $W$ and $D$ are treated as nonparametric, and the flows are i.i.d. In the stochastics literature, this model is known as the Bartlett-Lewis cluster point process [14], the only difference being the terminology used (cluster points instead of flow packets).

B. Flow sampling in a time window

Consider a measurement interval of duration $t > 0$, with $0$ and $t$ denoting the endpoints of the interval. Conceptually, the FS (flow sampling) scheme is very simple: flows are sampled independently with probability $p$, and thus discarded with probability $1 - p$, within the observation window $(0, t)$. Sampling a given flow means that each packet of the flow that falls within the observation window is sampled, and none of its packets outside this window are captured. Hence, the size $W$ and duration $V$ of a sampled flow may not be observed, that is, they may be censored.

C. Censored and uncensored flows

Suppose that $n$ sampled flows arrive during the interval $(0, t)$. Suppose that we also observe $m$ sampled flows in $(0, t)$ that arrived before time $0$. According to the traffic model above, the random variables that describe the quantities $n$ and $m$ are independent and have Poisson distributions.

[Fig. 1: Classification of sampled flows (u.c., l.c., d.c., r.c.) relative to the observation window; both empty and full circles correspond to packets, and full circles to sampled packets.]

TABLE I: Summary statistics of the data trace.
Trace name     Local start time   Duration   Packets      TCP flows
Auckland IX    2::                ::         48 million   ,37,756

Assume that the sampled flows are enumerated in some way. Let $T_i$, $W_i$ and $V_i$ represent the arrival time, size and duration of sampled flow $i$. For a sampled flow arriving within $(0, t)$, if $T_i + V_i < t$, then the whole flow is observed and it is uncensored (u.c., in short). If $T_i + V_i \ge t$, then the sampled flow is right censored (r.c.). Sampled flows arriving before time $0$ are necessarily either left censored (l.c.), in the case $T_i + V_i < t$, or double censored (d.c.), when $T_i + V_i \ge t$. See Figure 1. The original sizes $W_i$ and durations $V_i$ are not observed for single end censored (s.e.c., i.e., r.c. or l.c.) flows or for d.c. flows. Moreover, the observed duration of a d.c. flow is always equal to $t$.

We consider only TCP flows, which account for 80-90% of packets in Internet traffic [15].
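The classification rules of Section II-C can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Flow`, `classify` and `observed_duration` are hypothetical, times are measured relative to the window start at 0, and a flow starting exactly at time 0 is treated as arriving inside the window.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    start: float     # arrival time T_i, relative to the window start at 0
    duration: float  # true duration V_i (not observable when censored)


def classify(flow, t):
    """Return 'u.c.', 'r.c.', 'l.c.' or 'd.c.' for a flow seen in (0, t).

    Returns None when the flow does not overlap the window at all.
    """
    end = flow.start + flow.duration
    if flow.start >= t or end <= 0:
        return None  # entirely outside the observation window
    if flow.start >= 0:
        # Flow arrives inside the window: uncensored or right censored.
        return "u.c." if end < t else "r.c."
    # Flow arrived before time 0: left censored or double censored.
    return "l.c." if end < t else "d.c."


def observed_duration(flow, t):
    """Duration of the part of the flow that falls inside (0, t)."""
    end = flow.start + flow.duration
    return min(end, t) - max(flow.start, 0.0)
```

For example, with `t = 180`, `classify(Flow(-5, 500), 180)` gives `'d.c.'` and the corresponding observed duration equals the window length, as noted in the text.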
The classification of the sampled flows is based on the SYN and FIN flags in the packet header. The analysis and results of this paper can be applied to other kinds of flows, provided that suitable substitutes can be found for connection startup (SYN) and connection termination (FIN).

D. Trivial estimators

In this paper, we are interested in estimating the complementary distribution function (CDF) $\bar F_W = 1 - F_W$ of the sizes of flows, and the CDF $\bar F_V = 1 - F_V$ of the durations of flows, from the available (and hence possibly censored) data in the observation window $(0, t)$. In this regard, it is important first to understand the relationship of these CDFs to the corresponding ones obtained using some of the available data without accounting for censoring. We call the latter the trivial estimators.

We consider the publicly available Internet trace Auckland IX [16] (see Table I), spanning 60 min, and take the observation window to start 2 min after the start of the trace, with an interval length of 3 min. Figure 2 presents the log plot of $\bar F_V$ using different available types of censored flows without sampling (i.e., $p = 1$), in order to show the full magnitude of the differences. If a sampled flow has only one SYN packet, we assume that it is u.c. if a timeout (3 sec) since its arrival has expired during the observation window. Otherwise, it is assumed to be r.c. and is used in the estimation of the CDF of flow durations.

[Fig. 2: CDF of flow durations in an observation window (t = 3 min), with curves for the original distribution and for the u.c., u.c.+r.c. and u.c.+s.e.c.+d.c. trivial estimators.]

The solid line in Figure 2 corresponds to the empirical $\bar F_V$ computed from all the flows with at least two packets in the trace, called original. The figure provides the estimates of the CDF when using only u.c. flows and when adding the r.c. flows in the estimation. In both cases, as expected, the complementary distribution is underestimated. Somewhat surprisingly, when all flows (i.e., u.c., s.e.c. and d.c.) in the observation interval are considered, the estimator lies above the original distribution. This is because the l.c. and d.c. flows, which started before the beginning of the measurement interval, have in general a larger number of packets, due to the heavy-tailed nature of the flow size distribution, and therefore longer durations. These are the flows that are more likely to be active at a given time. The bias of the trivial estimators that ignore censoring in Figure 2 motivates the study of more accurate estimators. The same can be observed in a time window of 5, or even 30 min, but to a lesser extent. Similar conclusions can also be drawn for the CDF of the flow sizes $\bar F_W$ using the trivial estimators.

III. CENSORING ANALYSIS

The censoring analysis is based on the construction of a likelihood function in which each type of sampled flow is treated differently. The likelihood function can be maximized in practice using the EM algorithm. To avoid defining different variables to denote similar quantities in the censoring analysis of flow durations and of flow sizes, the notation introduced below is restricted to each subsection.

A. Flow duration

We are interested here in estimating the distribution of flow durations $F_V$ from the data of possibly censored durations in the observation window $(0, t)$. Suppose that $N = n$ sampled flows (with at least two packets) start in the observation window at times $T_i$, with the corresponding durations $V_i$, $i = 1, \dots, n$.
The random variable $N$ is Poisson with mean $\lambda p \bar F_W(1) t$. Let

$$Z_i = \min(V_i, t - T_i), \qquad L_i = \begin{cases} 1, & V_i < t - T_i, \\ 0, & V_i \ge t - T_i, \end{cases}$$

be the observed (possibly censored) durations and the observed variables indicating whether right censoring occurs, respectively. Similarly, we suppose that there are $M = m$ sampled flows (with at least two packets) that started at times, abusing the notation, $T_j$ before time $0$ ($T_j < 0$) and that continue in the observation window ($T_j + V_j > 0$). $M$ follows a Poisson distribution with parameter $\lambda p \bar F_W(1) \mu_V$. For $j = 1, \dots, m$, let

$$Y_j = \min(T_j + V_j, t), \qquad E_j = \begin{cases} 1, & T_j + V_j < t, \\ 0, & T_j + V_j \ge t, \end{cases}$$

be the corresponding observed (censored) durations and the observed variables indicating whether left or double censoring occurs, respectively. Note that the values $E_j = 1, 0$ and $L_i = 1, 0$ correspond, respectively, to the left censored (l.c.), double censored (d.c.), uncensored (u.c.) and right censored (r.c.) durations.

Again, the basic problem is to estimate the distribution $F_V$ from the available data $(Z_i = z_i, L_i = l_i)$ and $(Y_j = y_j, E_j = e_j)$. Given $N + M = n + m$, the distribution $F_V$ can be estimated by a nonparametric maximum likelihood estimator (NPMLE) $\hat F_V$, maximizing the likelihood proportional to

$$(t + \mu_V)^{-(n+m)} \prod_{i=1}^{n} \bigl(dF_V(z_i)\bigr)^{l_i} \bigl(1 - F_V(z_i)\bigr)^{1-l_i} \prod_{j=1}^{m} \bigl(1 - F_V(y_j)\bigr)^{e_j} \Bigl( \int_{y_j = t}^{\infty} \bigl(1 - F_V(u)\bigr)\,du \Bigr)^{1-e_j}. \tag{1}$$

Note that the likelihood (1) is defined in a natural way. For example, when $l_i = 1$, $(dF_V(z_i))^{l_i} = dF_V(z_i)$ can be thought of as the probability that the duration $V$ equals $z_i$ and is within the observation window $(0, t)$. The term $(1 - F_V(y_j))^{e_j} = 1 - F_V(y_j)$, when $e_j = 1$, comes from the stationary density of the duration of the flow after time $0$, given that the flow started before time $0$. Indeed, recall that this density is $(1 - F_V(v))/\mu_V$, $v > 0$.

The solution of the problem stated above can be obtained following the censoring analysis developed by Wijers [9]. It is more convenient to find and compute the solution $\hat F_V$ through a related probability distribution $G$ satisfying

$$dG(v) = \frac{t + v}{t + \mu_V}\, dF_V(v).$$

The likelihood (1) can be rewritten using $G$ as

$$\prod_{i=1}^{r} \bigl(dG(x_i)\bigr)^{\phi_i} \Bigl( \int_{v = x_i}^{\infty} \frac{1}{t + v}\, dG(v) \Bigr)^{\gamma_i} \Bigl( \int_{v = t}^{\infty} \frac{v - t}{t + v}\, dG(v) \Bigr)^{n+m-r}, \tag{2}$$

where $x_1 < x_2 < \dots < x_r$ are the ordered values of $y_j$ and $z_i$ for which either $e_j = 1$ or $l_i = 0, 1$ (that is, the uncensored and single end censored durations), and $\phi_i$ and $\gamma_i$ are, respectively, the numbers of uncensored and single end censored values at $x_i$. Note that, unlike (1), the likelihood (2) no longer involves the term $1/(t + \mu_V)^{n+m}$ at the very front. The likelihood (2) can further be expressed as

$$\prod_{i=1}^{r} \bigl(dG(x_i)\bigr)^{\phi_i} \Bigl( \int_{v = x_i}^{t} \frac{1}{t + v}\, dG(v) + g(t) \Bigr)^{\gamma_i} h^{\,n+m-r}, \tag{3}$$

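where

$$g(t) = \int_{t}^{\infty} \frac{1}{t + v}\, dG(v), \qquad h = \int_{t}^{\infty} \frac{v - t}{t + v}\, dG(v).$$

The log-likelihood for (3) can finally be written as

$$\int_{0}^{t} \log\bigl(dG(v)\bigr)\, dF_{\mathrm{u.c.}}(v) + \int_{0}^{t} \log\bigl(g(v)\bigr)\, dF_{\mathrm{s.e.c.}}(v) + \log(h)\, F_{\mathrm{d.c.}}(t), \tag{4}$$

where

$$g(v) = \int_{v}^{t} \frac{dG(u)}{t + u} + g(t),$$

with $g(t)$ defined by $G(t) + 2t\, g(t) + h = 1$, and where $F_{\mathrm{u.c.}}$, $F_{\mathrm{s.e.c.}}$ and $F_{\mathrm{d.c.}}$ are, respectively, the empirical distributions of the u.c., s.e.c. and d.c. durations, but normalized by $n + m$ rather than by the number of respective points; see [9] for further details.

When maximizing (4), the NPMLE $(\hat G, \hat h)$ of $(G, h)$ can be found using the EM algorithm. This amounts essentially to differentiating (4) with respect to the unknown parameters $dG(v)$ and setting the derivative equal to $0$, leading to the so-called self-consistency equations

$$d\hat G(v) = dF_{\mathrm{u.c.}}(v) + \Bigl( \int_{s=0}^{s=v} \frac{1}{\hat g(s)}\, dF_{\mathrm{s.e.c.}}(s) \Bigr) \frac{1}{t + v}\, d\hat G(v),$$

$$\hat h = F_{\mathrm{d.c.}}(t), \qquad 2t\, \hat g(t) = 1 - \hat h - \hat G(t).$$

In practice, the solution $(\hat G, \hat h)$ is found through the following steps.

Algorithm 1:

Input: $z_i, l_i$, $i = 1, \dots, n$, and $y_j, e_j$, $j = 1, \dots, m$. The following quantities can then be computed: $x_i$, $i = 1, \dots, r$ (see above), $F_{\mathrm{u.c.}}(v)$, $F_{\mathrm{s.e.c.}}(v)$ and $F_{\mathrm{d.c.}}(v)$. E.g., $F_{\mathrm{d.c.}}(t) = (\#\text{ of } y_j = t)/(n + m)$.

Initialization: Let $\hat h = F_{\mathrm{d.c.}}(t)$ and, for $k = 0$, set $d\hat G_i^k = (1 - \hat h)/(n + m)$, $i = 1, \dots, r$.

Loop: Until the stopping condition below is reached, compute

$$\hat g^k(t) = \frac{1 - \hat h - \sum_{i=1}^{r} d\hat G_i^k}{2t},$$

$$\hat g_1^k = \sum_{i=1}^{r} \frac{d\hat G_i^k}{t + x_i} + \hat g^k(t), \qquad \hat g_i^k = \hat g_{i-1}^k - \frac{d\hat G_{i-1}^k}{t + x_{i-1}}, \quad i = 2, \dots, r,$$

$$d\hat G_i^{k+1} = \Delta F_{\mathrm{u.c.}}(x_i) + \Bigl( \sum_{j=1}^{i} \frac{\Delta F_{\mathrm{s.e.c.}}(x_j)}{\hat g_j^k} \Bigr) \frac{d\hat G_i^k}{t + x_i}, \quad i = 1, \dots, r,$$

$$k := k + 1,$$

where $\Delta F_{\mathrm{u.c.}}(x_i) = \phi_i/(n+m)$ and $\Delta F_{\mathrm{s.e.c.}}(x_j) = \gamma_j/(n+m)$ denote the jumps of the respective empirical distributions.

Stopping condition: $\max_{i=1,\dots,r} |d\hat G_i^{k+1} - d\hat G_i^k| < \epsilon$, for a given $\epsilon > 0$.

The implementation of the algorithm in MATLAB code is available upon request. The estimator of $F_V$ is obtained from $d\hat G_i$, $i = 1, \dots, r$, and $\hat h$ through

$$\hat F_V(x_i) = \frac{1}{\nu} \sum_{j=1}^{i} \frac{d\hat G_j}{t + x_j}, \qquad \nu = \frac{1 - \hat h - \sum_{i=1}^{r} d\hat G_i}{2t} + \sum_{j=1}^{r} \frac{d\hat G_j}{t + x_j}.$$

Moreover, the mean flow duration estimator is

$$\hat\mu_V = \frac{1}{\nu} - t. \tag{5}$$

B. Flow size

The estimation of the flow size distribution under censoring is more challenging. We first need to develop a framework where flow sizes and censoring variables are independent.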
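Returning briefly to Algorithm 1 of Section III-A, its iteration can be sketched in Python as follows. This is a minimal sketch and not the authors' MATLAB implementation: the function name `em_duration` and the parameters `eps` and `max_iter` are hypothetical, the values $\hat g_i$ are computed by a backward cumulative sum (equivalent to $\hat g_i = \sum_{j \ge i} d\hat G_j/(t + x_j) + \hat g(t)$), and at least one non-d.c. observation is assumed.

```python
def em_duration(z, l, y, e, t, eps=1e-10, max_iter=10000):
    """Sketch of Algorithm 1 (EM for the duration distribution).

    z, l: observed durations and indicators (1 = u.c., 0 = r.c.) of flows
          starting inside the window.
    y, e: observed durations and indicators (1 = l.c., 0 = d.c.) of flows
          that started before the window.
    Returns (x, F, mu): support points, estimated F_V(x_i), estimated mean.
    """
    total = len(z) + len(y)
    h = sum(1 for ej in e if ej == 0) / total  # fraction of d.c. flows
    uc = [zi for zi, li in zip(z, l) if li == 1]
    sec = [zi for zi, li in zip(z, l) if li == 0]
    sec += [yj for yj, ej in zip(y, e) if ej == 1]
    x = sorted(set(uc) | set(sec))
    r = len(x)
    phi = [uc.count(xi) / total for xi in x]   # jumps of F_u.c.
    gam = [sec.count(xi) / total for xi in x]  # jumps of F_s.e.c.
    dG = [(1.0 - h) / total] * r               # initialization
    for _ in range(max_iter):
        g_t = (1.0 - h - sum(dG)) / (2.0 * t)
        # g[i] = sum_{j >= i} dG[j] / (t + x[j]) + g_t
        g, acc = [0.0] * r, g_t
        for i in reversed(range(r)):
            acc += dG[i] / (t + x[i])
            g[i] = acc
        new, cum = [], 0.0
        for i in range(r):
            if gam[i] > 0.0:
                cum += gam[i] / g[i]
            new.append(phi[i] + cum * dG[i] / (t + x[i]))
        delta = max(abs(a - b) for a, b in zip(new, dG))
        dG = new
        if delta < eps:  # stopping condition
            break
    g_t = (1.0 - h - sum(dG)) / (2.0 * t)
    w = [d / (t + xi) for d, xi in zip(dG, x)]
    nu = g_t + sum(w)
    F, c = [], 0.0
    for wi in w:
        c += wi
        F.append(c / nu)
    return x, F, 1.0 / nu - t  # mean from equation (5)
```

With only uncensored durations, e.g. `em_duration([2.0, 5.0], [1, 1], [], [], 10.0)`, the iteration stops after one pass and the estimate reduces to weighting each observation by $1/(t + x_i)$.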
In particular, we will have to consider explicitly the IATs between consecutive packets of a flow. We show below how this formulation can be achieved.

Suppose that $N = n$ sampled flows start in $(0, t)$, and that $T_i$ and $W_i$ denote the arrival time and size of sampled flow $i$, respectively, which are independent. The arrival times $T_i$, $i = 1, \dots, n$, are independently and uniformly distributed in $(0, t)$. With these $n$ sampled flows, we associate $n$ flows with an infinite number of packets. For infinite size flow $i$, let $S_{i,k}$ be the time of the $k$th packet, defined as

$$S_{i,1} = T_i, \qquad S_{i,k} = S_{i,1} + D_{i,1} + D_{i,2} + \dots + D_{i,k-1}, \quad k = 2, 3, \dots,$$

where the IATs $D_{i,1}, D_{i,2}, \dots$ are i.i.d. and have the same distribution as $D$. Then, $C_i = \max\{k : S_{i,k} < t\}$ represents the number of packets of infinite size flow $i$ over $[T_i, t)$. The size of sampled flow $i$ can now be viewed as censored by the independent random variable $C_i$. That is, we observe $Z_i = z_i$, $L_i = l_i$, $i = 1, \dots, n$, where

$$Z_i = \min(W_i, C_i), \qquad L_i = \begin{cases} 1, & W_i \le C_i, \\ 0, & W_i > C_i. \end{cases}$$

To derive the estimator of the flow size distribution, we assume for simplicity that the size of a flow is driven by a positive continuous random variable. More specifically, this is used to simplify the likelihood (7) below. The approximation is a common assumption in the literature [17], since the range of flow sizes is very large. (The accuracy of the estimator obtained using this approximation is good; see Section IV.) Thus, the function $f_W$ should be read as a probability density function. The probability that we observe the pairs $(Z_i = z_i, L_i = l_i)$ for the $n$ sampled flows is proportional to

$$\prod_{i=1}^{n} f_W(z_i)^{l_i} \bigl(1 - F_W(z_i)\bigr)^{1-l_i}. \tag{6}$$
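A likelihood of the form (6) involves only right censoring, and its nonparametric maximizer is the product-limit (Kaplan-Meier) estimator. The sketch below illustrates that estimator on pairs $(z_i, l_i)$; the function name `kaplan_meier` is hypothetical, and the code covers only the $n$ flows that start inside the window, not the flows that started before it.

```python
def kaplan_meier(z, l):
    """Product-limit estimate of P(W > x) from right-censored sizes.

    z: observed sizes Z_i = min(W_i, C_i).
    l: indicators, 1 if the size is uncensored and 0 if right censored.
    Returns a list of (x, survival) pairs at the distinct uncensored values.
    """
    points = sorted({zi for zi, li in zip(z, l) if li == 1})
    result, surv = [], 1.0
    for x in points:
        # Flows still "at risk" at x: observed size at least x.
        at_risk = sum(1 for zi in z if zi >= x)
        # Flows whose uncensored size equals x.
        events = sum(1 for zi, li in zip(z, l) if zi == x and li == 1)
        surv *= 1.0 - events / at_risk
        result.append((x, surv))
    return result
```

For instance, `kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 0])` returns survival values of approximately 0.8, 0.6 and 0.3 at sizes 1, 2 and 3.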

Maximization of (6) yields the celebrated Kaplan-Meier estimator [10] of $F_W$ for right censored flows.

Recall that we also observe $M = m$ sampled flows that start before time $0$. Similarly to the above, consider $m$ flows of infinite size that started a long time ago (i.e., the infinite renewal process is in a stationary regime). For infinite size flow $j$, denote the arrival time of the $k$th packet after time $0$ by

$$S^*_{j,k} = D^*_{j,1} + D_{j,2} + \dots + D_{j,k}, \quad k = 1, 2, \dots,$$

where $D^*_{j,1}$ has the equilibrium density function of $D$ given by $(1 - F_D(u))/\mu_D$, $u \ge 0$, and $D_{j,2}, \dots, D_{j,k}$ have the same distribution as $D$. Thus, $C^*_j = \max\{k : S^*_{j,k} < t\}$ is the number of packets of infinite size flow $j$ in $(0, t)$. For the $m$ sampled flows that start before time $0$, let $W^*_j$ denote the number of packets of sampled flow $j$ after time $0$. The random variable $W^*_j$ has density $(1 - F_W(w))/\mu_W$ and is possibly censored by the independent random variable $C^*_j$. We thus observe $(Y_j = y_j, E_j = e_j)$ for the sampled flow $j$ that starts before time $0$, where

$$Y_j = \min(W^*_j, C^*_j), \qquad E_j = \begin{cases} 1, & W^*_j \le C^*_j, \\ 0, & W^*_j > C^*_j. \end{cases}$$

The probability that we observe the pairs $(Y_j = y_j, E_j = e_j)$, $j = 1, \dots, m$, for the sampled flows is proportional to

$$\prod_{j=1}^{m} \Bigl( \frac{1 - F_W(y_j)}{\mu_W} \Bigr)^{e_j} \Bigl( \int_{y_j}^{\infty} \frac{1 - F_W(v)}{\mu_W}\, dv \Bigr)^{1-e_j}. \tag{7}$$

Under the traffic model, the total number of sampled flows $N + M$ has a Poisson distribution with mean $\lambda p (E[V] + t)$, and $E[V] \approx \mu_W \mu_D$. Conditioning on $N + M = n + m$, and using (6) and (7), the likelihood to be maximized is

$$\binom{m+n}{m} \Bigl( \frac{\mu_W \mu_D}{\mu_W \mu_D + t} \Bigr)^{m} \Bigl( \frac{t}{\mu_W \mu_D + t} \Bigr)^{n} \prod_{i=1}^{n} f_W(z_i)^{l_i} \bigl(1 - F_W(z_i)\bigr)^{1-l_i} \prod_{j=1}^{m} \Bigl( \frac{1 - F_W(y_j)}{\mu_W} \Bigr)^{e_j} \Bigl( \int_{y_j}^{\infty} \frac{1 - F_W(v)}{\mu_W}\, dv \Bigr)^{1-e_j},$$

which simplifies, up to a constant factor, to

$$\prod_{i=1}^{n} f_W(z_i)^{l_i} \bigl(1 - F_W(z_i)\bigr)^{1-l_i} \prod_{j=1}^{m} \bigl(1 - F_W(y_j)\bigr)^{e_j} \Bigl( \int_{y_j}^{\infty} \bigl(1 - F_W(v)\bigr)\, dv \Bigr)^{1-e_j} \Big/ \bigl(\mu_W + t/\mu_D\bigr)^{m+n}. \tag{8}$$

Since $\mu_W$ in the denominator also depends on $F_W$, the maximization of (8) is a non-standard problem. The likelihood in (8) has the same form as the one considered in Laslett's line segment problem [12]. Let $x_1 < \dots < x_r$ be the ordered values of $z_i$ and $y_j$ for which either $l_i = 0, 1$ or $e_j = 1$ (these are the uncensored and single end censored sampled sizes), and let $\phi_i$ and $\gamma_i$ be the numbers of uncensored and single end censored values, respectively, at $x_i$ (with $x_0 = 0$). Denote the $n_i$ double-end censored flow sizes in $(x_{i-1}, x_i]$ by $u_{i1}, \dots, u_{i n_i}$, with $x_{r+1} = \infty$. Instead of using the likelihood (8) with the distribution $F_W$, where we have to deal with $\mu_W$, Laslett proposed to write (8) in terms of

$$v_i = \frac{(t^* + x_i)\, f_W(x_i)}{t^* + \mu_W}, \quad i = 1, \dots, r,$$

with $t^* = t/\mu_D$, and to let $v_{r+1}$ be such that $v_1 + \dots + v_{r+1} = 1$. We get the equivalent likelihood

$$\prod_{i=1}^{r} \Bigl[ (\theta_i - \theta_{i+1})^{\phi_i}\, \theta_i^{\gamma_i} \prod_{j=1}^{n_i} \bigl( H_i - (u_{ij} - x_i)\theta_i \bigr) \Bigr] H_r^{\,n_{r+1}}, \tag{9}$$

where

$$\theta_i = \sum_{j=i}^{r} \frac{v_j}{t^* + x_j}, \qquad H_i = v_{r+1} + \sum_{j=i}^{r} v_j \Bigl( 1 - \frac{t^* + x_i}{t^* + x_j} \Bigr), \quad i = 1, \dots, r,$$

are linear functions of $v_1, \dots, v_{r+1}$, $\theta_{r+1} = 0$ and $H_0 = (t^* H_1 + x_1)/(t^* + x_1)$. The logarithm of the likelihood (9) can be written in the form

$$L(v) = \sum_{i=1}^{N} \log\Bigl( \sum_{j=1}^{r+1} \alpha_{ij} v_j \Bigr), \tag{10}$$

where $\alpha_{ij} > 0$. Turnbull [18] gives an algorithm (now known to be the EM algorithm) for maximizing the likelihood (10) with respect to the $v_j$'s, as follows.

Algorithm 2:

Input: $x_i$, $u_{ij}$, $i = 1, \dots, r$, $j = 1, \dots, n_i$, and $t^*$.

Initialization: $v_i^k = 1/(r + 1)$, $k = 0$, $i = 1, \dots, r + 1$.

Loop: Until the stopping condition below is reached, compute

$$\theta_i^k = \sum_{j=i}^{r} \frac{v_j^k}{t^* + x_j}, \qquad H_i^k = v_{r+1}^k + \sum_{j=i}^{r} v_j^k \Bigl( 1 - \frac{t^* + x_i}{t^* + x_j} \Bigr), \qquad H_0^k = \frac{t^* H_1^k + x_1}{t^* + x_1},$$

$$v_i^{k+1} = \frac{1}{n + m} \Bigl( \phi_i + \frac{v_i^k}{t^* + x_i} \sum_{j=1}^{i} \frac{\gamma_j}{\theta_j^k} + \frac{v_i^k}{t^* + x_i} \sum_{u_{lj} \le x_i} \frac{x_i - u_{lj}}{H_l^k - (u_{lj} - x_l)\theta_l^k} \Bigr), \quad i = 1, \dots, r,$$

$$k := k + 1.$$

Stopping condition: $\max_{i=1,\dots,r} |v_i^{k+1} - v_i^k| < \epsilon$, for a given $\epsilon > 0$.

The MATLAB code is available upon request. The estimator of the flow size distribution is obtained by going from the $v_i$'s back to $\bar F_W$ through

$$\hat{\bar F}_W(x_i) = \frac{(H_{i-1} - H_i)\, t^*}{(x_i - x_{i-1})(1 - H_0)}, \quad i = 1, \dots, r.$$

[Fig. 3: Checking the model assumptions. (a) Empirical distribution of the flow arrival times in an observation window (t = 3 min). (b) Flow density for flow size against the average IAT. (c) Average autocorrelation function of IATs at lag 3 for flow size against average IAT.]

IV. EXPERIMENTS

In this section, we assess the adequacy of the traffic model and the accuracy of the estimators for the flow duration and flow size distributions under censoring with real Internet traffic [16]. The observation window starts 2 min after the beginning of the trace (which is one hour long).

A. Model checking

To check the Poisson arrival assumption for flows, we may test whether, conditionally on the number of arrivals until time $t$, the arrival times come from a uniform distribution on $(0, t)$. Figure 3a plots the empirical distribution of the arrival times of flows during an observation window of 3 min, which agrees with the linear distribution function of the uniform distribution. The standard Kolmogorov-Smirnov test for the uniform distribution gives the p-value smaller than 6.

The adequacy of the finite renewal process to describe a flow can be assessed in several ways. We need to check whether the IATs between packets of a flow are i.i.d. and independent of the flow size. This is particularly important for the analysis in Section III-B. Note that $V_i/(W_i - 1)$, $W_i \ge 2$, can be thought of as the average IAT between packets of flow $i$. We consider all the flows in the trace. Figure 3b depicts $W$ against the average IAT on a logarithmic scale. The level corresponds to the flow density (number of flows in each small square). If IATs are identically distributed, the average IAT should cover a wide range of values for small flow sizes. Since most flows have a small number of packets, the highest density is concentrated in this region.
We do not see multiple regions of high concentration that would suggest several non-identical IAT distributions. As the flow size $W$ increases, the independence of IATs and size should translate into the top of the triangle and the average IAT lying around the value where the density is higher. In the plot shown, some large flows have shorter IATs, indicating small deviations from the finite renewal process assumptions for large flows.

Regarding the independence of IATs, Figure 3c gives the average of the autocorrelation function of IATs at lag 3 within a flow, calculated individually for each flow with at least three packets and then averaged over squares in a log-log plot, for flow size against average IAT. We conclude that the autocorrelation is weak, and we have observed that it diminishes as the lag between IATs increases. In conclusion, there is no indication of any severe problem with the model assumptions used in the censoring analysis.

B. Flow duration and size distribution

Figure 4a depicts the log plot of the complementary distribution function (CDF) $\bar F_V$ of the duration of flows with at least two packets, obtained with the trivial estimators and with the estimator developed in the censoring analysis of Section III-A. The length of the observation window is 3 min and the probability of sampling a flow is .9. If $p$ is very small, we do not have enough sampled flows to use in the estimation. This also depends on the number of active flows in the time window. Sampled flows are classified based on the SYN and FIN flags and a timeout between packets (3 sec); see Sections II-C and II-D. The estimators are plotted against the original CDF computed from all flows in the trace with at least two packets. As in Figure 2, the trivial estimators perform poorly, now with sampling. When we consider all the sampled types of flows (i.e., u.c., s.e.c. and d.c. flows), there is an intriguing overestimation of the complementary distribution in comparison with the other trivial estimators; see Section II-D for a discussion. The estimator of Section III-A, which uses the information of the censored sampled flows, is very accurate, and the convergence of the EM algorithm is quite fast. We only observe durations within $(0, 180)$, and so we are not able to estimate the CDF on $[180, \infty)$. However, we can use (5) to estimate the mean flow duration $\mu_V$. This gives .9425, with the true value being equal to .353.

Figure 4b shows the estimation of $\bar F_V$ over a 5 min window. The sampling probability is $p = .5$, since more flows can now be sampled in a larger window. As the window size increases, the bias of the trivial estimators becomes smaller. There is more variability in the tail of the estimator based on the censoring analysis, due to the reduced number of large durations in the sample. The mean flow duration estimate is

[Fig. 4: CDF of flow durations. (a) t = 3 min and p = .9. (b) t = 5 min and p = .5.]

Figure 5a presents the log plot of the CDF of flow size $\bar F_W$ given by the trivial estimators and by the estimator developed using censoring analysis (Section III-B). We use the same setting as in Figure 4a with respect to the observation window length and sampling probability. The original flow size distribution is computed using all flows in the trace. The trivial estimator that includes l.c. and d.c. sampled flows gives a higher estimate of the complementary distribution. These sampled flows, which start before the beginning of the observation window, tend in general to have a large number of packets, due to the heavy-tailed distribution of the flow sizes. The estimator that takes into account the information of the censored flows produces an accurate estimate of the distribution. Contrary to the flow duration, which is limited to the length of the window, there is no bound here on the flow size in packets, since this depends on the IATs between packets of the flow. We compute this nonparametric estimator using the Turnbull algorithm, which converges quickly, and plot it over the range where its performance is satisfactory. The mean IAT between consecutive packets of a flow, $\mu_D$, used as input in the algorithm, is estimated from the sampled flows.

[Fig. 5: CDF of flow sizes (in packets). (a) t = 3 min and p = .5. (b) t = 5 min and p = .5.]

Finally, Figure 5b shows $\bar F_W$ in log scale, where t = 5 min and p = .5. Similar conclusions can be drawn, and a larger part of the distribution tail can be estimated with censoring analysis. Due to space limitations, we only presented the results for one data trace. The performance of the estimators was also good for other network traces found in [16].

V. CONCLUSIONS AND FUTURE WORK

This paper was motivated by the need to sample flows in a constrained time interval due to limited network resources or application needs. The incomplete (censored) record of sampled flows can seriously degrade the accuracy of the estimates of the flow duration and size distributions. Under an appropriate mathematical description of flow sampling in an observation window, and using censoring analysis, we presented nonparametric maximum likelihood estimators that rely on the information of the censored sampled flows. The estimators are computed using the EM algorithm. We evaluated them on a real Internet trace and showed that they provide an accurate estimation of the distributions of flow level characteristics. As future work, we would like to extend this analysis to other available and popular sampling methods, such as Dual Sampling and Sample and Hold [19].

REFERENCES

[1] N. Duffield, C. Lund, and M. Thorup, "Estimating flow distributions from sampled flow statistics," IEEE/ACM Trans. Netw., vol. 13, Oct. 2005.
[2] A. Chen, Y. Jin, J. Cao, and L. E. Li, "Tracking long duration flows in network traffic," in Proc. INFOCOM, March 2010, pp. 1-5.
[3] N. Dukkipati and N. McKeown, "Why flow-completion time is the right metric for congestion control," SIGCOMM Comput. Commun. Rev., vol. 36, January 2006.
[4] N. Duffield, "Sampling for passive internet measurement: A review," Statist. Sci., vol. 19, no. 3, 2004.
[5] Y. Sakai, M. Uchida, M. Tsuru, and Y. Oie, "Impact of censoring on estimation of flow duration distribution and its mitigation using Kaplan-Meier-based method," IEICE Trans. Inf. & Syst., vol. E92-D, 2009.
[6] N. Duffield, C. Lund, and M. Thorup, "Flow sampling under hard resource constraints," in Proc. ACM SIGMETRICS, 2004.
[7] M. Thottan and C. Ji, "Anomaly detection in IP networks," IEEE Trans. Signal Process., vol. 51, no. 8, Aug. 2003.
[8] J. Cao, W. S. Cleveland, D. Lin, and D. X. Sun, "On the nonstationarity of internet traffic," in Proc. ACM SIGMETRICS, 2001.
[9] B. J. Wijers, "Consistent non-parametric estimation for a one-dimensional line segment process observed in an interval," Scandinavian Journal of Statistics, vol. 22, no. 3, 1995.
[10] E. L. Kaplan and P. Meier, "Nonparametric estimation from incomplete observations," Journal of the American Statistical Association, vol. 53, no. 282, 1958.
[11] J. P. Klein and M. L. Moeschberger, Survival Analysis: Techniques for Censored and Truncated Data. Springer, 2003.
[12] G. M. Laslett, "The survival curve under monotone density constraints with application to two-dimensional line segment processes," Biometrika, vol. 69, no. 1, pp. 153-160, 1982.
[13] N. Hohn, D. Veitch, and P. Abry, "Cluster processes: a natural language for network traffic," IEEE Trans. Signal Process., vol. 51, no. 8, Aug. 2003.
[14] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes. Springer-Verlag, 1988.
[15] B. Ribeiro, D. Towsley, T. Ye, and J. C. Bolot, "Fisher information of sampled packets: an application to flow size estimation," in Proc. ACM SIGCOMM, 2006.
[16] Auckland IX, file , Available:
[17] C. Barakat, G. Iannaccone, and C. Diot, "Ranking flows from sampled traffic," in Proc. CoNEXT, 2005.
[18] B. W. Turnbull, "The empirical distribution function with arbitrarily grouped, censored and truncated data," Journal of the Royal Statistical Society. Series B, vol. 38, no. 3, 1976.
[19] P. Tune and D. Veitch, "Fisher information in flow size estimation," IEEE Trans. Info. Theory, vol. 57, no. 10, Oct. 2011.


More information

Fast Evaluation of Ensemble Transients of Large IP Networks. University of Maryland, College Park CS-TR May 11, 1998.

Fast Evaluation of Ensemble Transients of Large IP Networks. University of Maryland, College Park CS-TR May 11, 1998. Fast Evaluation of Ensemble Transients of Large IP Networks Catalin T. Popescu cpopescu@cs.umd.edu A. Udaya Shankar shankar@cs.umd.edu Department of Computer Science University of Maryland, College Park

More information

Modeling Residual-Geometric Flow Sampling

Modeling Residual-Geometric Flow Sampling Modeling Residual-Geometric Flow Sampling Xiaoming Wang Joint work with Xiaoyong Li and Dmitri Loguinov Amazon.com Inc., Seattle, WA April 13 th, 2011 1 Agenda Introduction Underlying model of residual

More information

Reliability Engineering I

Reliability Engineering I Happiness is taking the reliability final exam. Reliability Engineering I ENM/MSC 565 Review for the Final Exam Vital Statistics What R&M concepts covered in the course When Monday April 29 from 4:30 6:00

More information

Game Theory and its Applications to Networks - Part I: Strict Competition

Game Theory and its Applications to Networks - Part I: Strict Competition Game Theory and its Applications to Networks - Part I: Strict Competition Corinne Touati Master ENS Lyon, Fall 200 What is Game Theory and what is it for? Definition (Roger Myerson, Game Theory, Analysis

More information

Simulation. Where real stuff starts

Simulation. Where real stuff starts Simulation Where real stuff starts March 2019 1 ToC 1. What is a simulation? 2. Accuracy of output 3. Random Number Generators 4. How to sample 5. Monte Carlo 6. Bootstrap 2 1. What is a simulation? 3

More information