Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution

Abhishek Kumar, Minho Sung, Jun (Jim) Xu -- College of Computing, Georgia Institute of Technology; Jia Wang -- AT&T Labs Research

ABSTRACT
Knowing the distribution of the sizes of traffic flows passing through a network link helps a network operator to characterize network resource usage, infer traffic demands, detect traffic anomalies, and accommodate new traffic demands through better traffic engineering. Previous work on estimating the flow size distribution has focused on making inferences from sampled network traffic. Its accuracy is limited by the (typically) low sampling rate required to make the sampling operation affordable. In this paper we present a novel data streaming algorithm to provide much more accurate estimates of the flow size distribution, using a "lossy data structure" which consists of an array of counters fitted well into SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter, making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that sizes of multiple flows may collide into the same counter. Our algorithm uses Bayesian statistical methods such as Expectation Maximization to infer the most likely flow size distribution that results in the observed counter values after collision. Evaluations of this algorithm on large Internet traces obtained from several sources (including a tier-1 ISP) demonstrate that it has very high measurement accuracy (within 2%). Our algorithm not only dramatically improves the accuracy of flow size distribution measurement, but also contributes to the field of data streaming by formalizing an existing methodology and applying it to the context of estimating the flow size distribution.

Categories and Subject Descriptors: C.2.3 [COMPUTER-COMMUNICATION NETWORKS]: Network Operations - Network Monitoring; E.1 [DATA STRUCTURES]

General Terms: Algorithms, Measurement, Theory

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGMETRICS/Performance '04, June 12-16, 2004, New York, NY, USA. Copyright 2004 ACM /04/...$5.00.

Keywords: Network Measurement, Traffic Analysis, Data Streaming, Statistical Inference

1. INTRODUCTION
The problem of estimating the flow size distribution on a high-speed link has received considerable attention recently [1, 2, 3, 4, 5, 6]. In this problem, given an arbitrary flow size s, we are interested in knowing the number of flows that contain s packets within a monitoring interval. In other words, we would like to know how the total traffic volume splits into flows of different sizes. An estimate of the flow size distribution contains knowledge about the number of flows for all possible sizes, including elephants (large flows), kangaroos/rabbits (medium flows), and mice (small flows).

1.1 Motivation
Flow size distribution information can be useful in a number of applications in network measurement and monitoring. (Footnote 1: Here we minimized the overlap with the motivation provided in [].) First, flow size distribution information may allow service providers to infer the usage pattern of their networks, such as the approximate number of users with dial-up or broadband access. Such information on usage patterns can be important for the purpose of pricing, billing, infrastructure engineering, and resource planning.
In addition, network operators may also infer the types of applications that are running over a network link without looking into the details of the traffic, such as how many users are using streamed music, streamed video, and voice over IP. In the future, we expect more network applications to be recognizable through flow size distribution information. Second, flow size distribution information can help locally detect the existence of an event that causes the transition of the global network dynamics from one mode to another. An example of such a mode transition is a sudden increase in the number of large flows (i.e., elephants) on a link. Possible events that may cause this include link failure or route flapping. Merely looking at the total load of the link may not detect such a transition, since this link could be consistently heavily used anyway. Furthermore, flow size distribution information may also help us detect various types of Internet security attacks such as DDoS and Internet worms. In the case of DDoS attacks, if the attackers are using spoofed IP addresses, we will observe a significant increase in flows of size 1. In the case of Internet worms, we may suddenly find a large number of flows of a particular size in Internet links around the same time, if the worm is a naive one that does not change in size.

Also, the historical flow size distribution information stored at various links may help us study its evolution over time. Finally, knowing the flow size distribution of each link may help other network measurement applications such as traffic matrix estimation [7, 8, 9, 10]. Recent work [9, 10] shows that it is possible to use tomography techniques to infer the traffic matrix from link load and aggregate input/output traffic at each node. We have preliminary evidence to believe that the flow size distribution at each node will make such tomography much more accurate, since it allows the correlation of not only the total traffic volume (load), but also the correlation of its distribution into different flows.

1.2 Problem statement
The problem of computing the distribution of the sizes of the flows can be formalized as follows. The set of possible flow sizes is the set of all positive integers between 1 and z. Here z is the maximum flow size that can be determined from the observed data. We denote the total number of flows as n, and the number of flows that have i packets as n_i. We denote the fraction of flows that have i packets as φ_i, i.e., φ_i = n_i / n. The data that need to be estimated are the values of n and φ = {φ_1, φ_2, ..., φ_z}. Our goal is to find an efficient scheme to estimate this flow size distribution information on a high-speed link (e.g., OC-192 to OC-768) with high accuracy.

A naive solution to this problem is to use a hash table of per-flow counters to keep track of all active flows. These counters will later be examined to obtain the flow size distribution. Although this approach is straightforward, it is not suitable for a high-speed link for the following reasons. Each flow entry in the hash table is large (about 160 bits) because it needs to store a flow label (about 100 bits), a pointer (about 32 bits) to the next entry if chaining is used to resolve hash collisions (Footnote 2: Linear probing and double hashing will not help save space since there is a tradeoff between the occupancy ratio and probe length.), and a packet counter (about 32 bits). Since there can be a large number of flows (e.g., 0.5 million) on backbone links during a typical measurement period, a hash table of this size typically can only fit into DRAM. However, DRAM speed cannot keep up with the link rate of OC-192 and higher. (Footnote 3: With an average packet size of 1000 bits, per-packet processing time can be no more than 100 ns and 25 ns for OC-192 and OC-768, respectively. A hash table operation in DRAM will take hundreds of nanoseconds due to the need to retrieve the correct flow entry, compare the flow labels, and increment and write back the counter.)

Another possible approach [] is to sample a small percentage of packets and then infer the flow size distribution from the sampled traffic. The algorithm proposed in [] may well be the best algorithm for extracting as much information from the sampled data as possible. However, its accuracy is limited by the typically low sampling rate (e.g., 1%) required to make the sampling operation affordable. Recent work [] has provided theoretical insights into the limitations of inferring the flow size distribution from sampled traffic.
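To make the per-packet time budget and per-flow storage figures above concrete, here is a small back-of-the-envelope sketch in Python; the 1000-bit average packet size and the per-field bit widths are the figures assumed in the text and footnotes above, not new measurements:

    # Per-packet time budget (footnote 3): packet size / link rate.
    AVG_PACKET_BITS = 1000

    def per_packet_budget_ns(link_gbps):
        return AVG_PACKET_BITS / (link_gbps * 1e9) * 1e9  # nanoseconds

    print(per_packet_budget_ns(10))   # OC-192: 100.0 ns
    print(per_packet_budget_ns(40))   # OC-768:  25.0 ns

    # Approximate per-flow hash-table entry (Section 1.2):
    # flow label + chaining pointer + packet counter.
    ENTRY_BITS = 100 + 32 + 32        # about 160 bits per entry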
1.3 Our approach and contributions
The main contribution of this paper is a novel data streaming algorithm to provide much more accurate estimates of the flow size distribution. Our algorithm uses a lossy data structure that consists of an array of counters. Its total size is small enough to fit easily in fast SRAM. For each incoming packet, our algorithm only needs to increment one underlying counter (in SRAM), making the algorithm fast enough even for 40 Gbps (OC-768) links. The data structure is lossy in the sense that, due to collisions in hashing, the sizes of multiple flows may be accumulated in the same counter. Therefore, the raw information obtained from the counters can be far away from the actual flow size distribution. Our algorithm then uses Bayesian statistical methods such as Expectation Maximization (EM) to infer the most likely flow size distribution that results in the observed counter values after collision. Experiments with this algorithm on a number of large traces demonstrate that it has very high measurement accuracy (within 2% relative error).

However, to achieve this level of accuracy, our algorithm needs to know the approximate (within ±50%) value of n, the total number of flows, in order to provision a sufficient number of counters for streaming. Provisioning for the worst case (i.e., when the number of concurrent flows is the largest) leads to unnecessary waste of precious SRAM resources in the average case. To address this challenge, we propose a multi-resolution variant of our algorithm that uses a small and fixed amount of SRAM and does not require any prior knowledge about the approximate range of n. It guarantees high accuracy in the average case and graceful degradation in accuracy in the worst case.

Our algorithm not only dramatically improves the accuracy of flow size distribution measurement, but also contributes to the field of data streaming by formalizing an existing yet implicit methodology and exploring it in a new direction. Data streaming [] is concerned with processing a long stream of data items in one pass using a small working memory in order to answer a class of queries regarding the stream. The challenge is to use this small memory to remember as much information pertinent to the queries as possible. In designing this algorithm, we formalize the following methodology:

Lossy data structure + Bayesian statistics = Accurate streaming

Its main idea is to first perform data streaming at very high speed in a small memory to get streaming results that are lossy. There are two reasons why this loss is inevitable. First, due to the stringent computational complexity requirement of the application (e.g., 25 ns per packet when processing OC-768 traffic), the streaming algorithm does not have enough processing time to put the data into the exact place. Second, the streaming algorithm does not have enough space to store all the relevant data. Due to the loss, the streaming result is typically far away from the information we would like to estimate. Bayesian statistics is therefore used to recover as much information from the streaming result as possible. While Bayesian statistics is typically used in existing streaming algorithms to recover from the second cause of loss, our algorithm uses it mainly to recover from the first cause of loss. Also, to the best of our knowledge, our algorithm is the first to use sophisticated Bayesian tools such as EM in this recovery.

The rest of this paper is organized as follows. In the next section, we provide an overview of the data collection portion of our solution and describe the design of our streaming data structure in detail.

Section 3 describes our estimation mechanisms. We formalize the estimation mechanism and analyze its correctness in Section 4. Section 5 presents a multi-resolution version of our mechanism that can operate with an array of fixed size. Section 6 evaluates the proposed scheme over a number of large packet header traces obtained from various places, including a tier-1 ISP backbone network. We present a brief look at related work, with a discussion about the context of our work, in Section 7 before concluding in Section 8.

2. DATA STREAMING USING A LOSSY DATA STRUCTURE
In this section, we first give an overview of the system model and the design philosophy of our approach. Then we describe our online update scheme (i.e., the "lossy data structure") and analyze its computational and storage complexity. Finally, we show how our scheme interfaces with the technique in [] to reduce the storage complexity.

2.1 System model
The overall architecture of our solution is shown in Figure 1.

[Figure 1: System model of using data streaming to estimate the flow size distribution. A packet stream feeds the online streaming module (arc 1: update); the raw streaming result is passed to the offline processing module (arc 2), which produces the estimated flow size distribution (arc 3).]

The online streaming module is updated upon each packet arrival (arc 1 in Figure 1). The measurement proceeds in epochs. At the end of each measurement epoch, the counter values, which we refer to as the raw data, will be paged out from the online streaming module, and these counters will be reset to 0 for the next measurement epoch. This raw data will be processed by an offline processing module (arc 2 in Figure 1) that produces a final estimate (arc 3 in Figure 1) of the flow size distribution using statistical inference techniques. (Footnote 4: In practice, the raw data collected at the streaming module can also be summarized and paged to persistent storage, where it can be stored until subsequent retrieval and estimation.) This system model reflects our aforementioned design philosophy of collecting as much pertinent information as possible at the streaming module, and then compensating for the information loss during data collection using Bayesian statistics.

2.2 Online streaming module
Our algorithm for updating the data streaming module upon packet arrivals is shown in Figure 2.

Figure 2: Algorithm for updating the online streaming module.
1. Initialize
2.   A[i] := 0, i = 1, 2, ..., m
3. Update
4.   Upon the arrival of a packet pkt
5.     ind := hash(pkt.flow_label);
6.     A[ind] := A[ind] + 1;
7. Export data when an epoch ends
8.   y_j := number of j's in A, j = 1, 2, ..., z;
9.   Forward the values y_j to offline analysis;

The streaming data structure used by our mechanism is extremely simple: an array of counters. Upon arrival of a packet at the router, its flow label (Footnote 5: Our design does not place any constraints on the definition of the flow label. It can be any combination of fields from the packet header.) is hashed to generate an index into this array, and the counter at this index is incremented by 1. Collisions due to hashing might cause two or more flow labels to be hashed to the same index. The counter at such an index would contain the total number of packets belonging to all of the flows colliding into it. We do not have any explicit mechanisms to handle collisions, as any such mechanism would impose additional processing and storage overheads that are unsustainable at high speeds. This makes the encoding process very simple and fast.
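The following is a minimal Python sketch of the update and export logic of Figure 2. The class and function names, the use of CRC32 as the hash, and the 5-tuple flow label in the usage example are illustrative assumptions; any uniform hash over any flow-label definition works:

    import socket
    import struct
    import zlib

    class OnlineStreamingModule:
        """Array of m counters, updated once per packet (Figure 2)."""

        def __init__(self, m):
            self.m = m
            self.counters = [0] * m                  # A[1..m] in the paper

        def update(self, flow_label):
            # One hash computation and one counter increment per packet.
            ind = zlib.crc32(flow_label) % self.m
            self.counters[ind] += 1

        def export(self):
            # End of epoch: summarize the raw data as <counter value,
            # frequency> tuples (y_j = number of counters with value j)
            # and reset the array for the next epoch.
            y = {}
            for v in self.counters:
                if v > 0:
                    y[v] = y.get(v, 0) + 1
            self.counters = [0] * self.m
            return y

    # Example flow label: the 5-tuple packed into bytes (one common choice).
    def flow_label(src_ip, dst_ip, src_port, dst_port, proto):
        return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) +
                struct.pack("!HHB", src_port, dst_port, proto))

    module = OnlineStreamingModule(1 << 20)
    module.update(flow_label("10.0.0.1", "192.168.1.7", 52100, 443, 6))

The export step also illustrates the summarization into <counter value, frequency> tuples discussed in the next subsection.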
Efficient implementations of hash functions [3] allow the online streaming module to operate at speeds as high as OC-768 without missing any packets.

2.3 Complexity of the online streaming module
In this section, we discuss the storage and computational complexities of operating the data streaming module.

1. Storage complexity. This refers to both the amount of fast memory required for implementing the array of counters, and the amount of space (in DRAM or on disk) to store the raw counter values for later retrieval and estimation by the offline estimation module. Leveraging the techniques for efficient implementation of a counter array proposed in [], we require 9 bits of SRAM per counter (to be discussed in Section 2.4). This allows us to implement about 1 million counters with 1.1 MB of SRAM. Interestingly, this raw data can be summarized to a very small size when paged to DRAM or disk. The key fact here is that our estimation mechanism does not need to know the mapping between counter values and indices. Instead, it only needs to know, for each possible counter value, the number (i.e., frequency) of counters that have this value. Therefore, we can summarize this raw data into a list of <counter value, frequency> tuples. It turns out that, while the number of flows is large, the number of unique flow sizes (and consequently, unique counter values) is usually quite small. For example, in a trace with 2.6 million packets and 9,000 flows, we observed only about 500 unique counter values. This implies that most counter values do not occur (i.e., occur with a frequency of zero) in the array, resulting in a very small list of <counter value, frequency> tuples. For the above example, the summary can be stored in 8 KB on persistent storage, thus requiring less than 0.05 bits per packet, or 1 bit for 40 packets.

2. Computational complexity. For each packet, the data streaming module needs to compute exactly one hash function and increment exactly one counter.

This is manageable even at OC-768 (40 Gbps) speeds with off-the-shelf 10 ns SRAM. We will show that our efficient (compact) implementation of counters (discussed in Section 2.4) causes very little overhead, allowing operation at OC-768 speed.

2.4 Efficient implementation of an array of counters
Internet traffic is known to have the property that a few flows can be very large, while most other flows are small. Thus, the counters in our array need to be large enough to accommodate the largest flow size. On the other hand, the counter size needs to be made as small as possible to save precious SRAM. Recent work on the efficient implementation of statistical counters [] provides an ideal mechanism to balance these two conflicting requirements, which we leverage in our scheme. For each counter in the array, say 32 bits wide, this mechanism uses 32 bits of slow memory (DRAM) to store a large counter and maintains a smaller counter, say 7 bits wide, in fast memory (SRAM). As a counter in SRAM exceeds a certain threshold value (say 64) due to increments, the mechanism increments the value of the corresponding counter in DRAM by 64 and resets the counter in SRAM to 0. There is a 2-bit per-counter overhead that covers the cost of keeping track of counters above the threshold, bringing the total number of bits per counter in SRAM to 9. For suitable choices of parameters, this scheme allows an efficient implementation of wide counters using a small amount of SRAM. This technique can be applied seamlessly to implementing the array of counters required in our data streaming module. In our algorithm (Footnote 6: We have carefully checked these parameters against the specifications in [].), the size of each counter in SRAM is 9 bits and in DRAM is 32 bits. Also, since the scheme in [] incurs very little extra computational and memory access overhead, our streaming algorithm running on top of it can still achieve high speeds such as OC-768.
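A simplified sketch of the small-SRAM/large-DRAM counter arrangement described above; the flush policy and parameter values follow the text, while the class itself and the synchronous flush are simplifying assumptions (the cited scheme performs these flushes in the background):

    class HybridCounter:
        """One logical counter: a small counter in fast memory backed by
        a wide counter in slow memory (Section 2.4)."""

        THRESHOLD = 64

        def __init__(self):
            self.sram = 0   # small (7-bit) counter in SRAM
            self.dram = 0   # wide (32-bit) counter in DRAM

        def increment(self):
            self.sram += 1
            if self.sram >= self.THRESHOLD:
                # Fold the threshold into the DRAM counter and reset
                # the SRAM counter, as described in the text.
                self.dram += self.THRESHOLD
                self.sram = 0

        def value(self):
            return self.dram + self.sram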
3. ESTIMATION MECHANISMS
In this section, we describe a collection of estimation mechanisms used in the offline processing module (shown in Figure 1). They help to infer the actual flow size distribution from the counter values collected by the online streaming module. Consider the hypothetical case where there are no hash collisions. In this case the distribution of counter values is the same as the actual flow size distribution. However, collisions do occur with real-world hash functions, thus distorting the distribution of counter values away from the true flow size distribution. This effect (Footnote 7: Our experiments on other traffic traces exhibit similar distortion effects.) is shown in Figure 3, where we process a traffic trace (with 560K flows in it) on arrays of 1024K, 512K, 256K, and 128K counters, respectively. We can see that, as the load factor (formally defined later in this section) of the array increases, the number of collisions increases, which further exacerbates this distortion.

[Figure 3: The distribution of flow sizes and raw counter values using various numbers of counters, m = 1024K, 512K, 256K, and 128K (both x and y axes are in log scale); m = number of counters.]

3.1 Estimating the total number of flows
The first quantity that we can estimate from our counter array is the total number of flows during the measurement interval. The first mechanism for estimating this quantity (in a different application) using a (0-1) bitmap is proposed in [4]. It can be used in our context with slight adaptation. The process of inserting (with collisions) flow counts into our counter array can be modeled as a coupon collector's problem, under the assumption of uniform hashing. As shown in [4], in an array of m counters, if the number of zero entries is m_0 after the insertion of n flows, the maximum likelihood estimator for n is

    n̂ = m · ln(m / m_0)    (1)

This result is also exploited in [6] to design a more general multi-resolution bitmap scheme to estimate n using much smaller memory.

3.2 Estimating the number of flows of size 1
The total number of flows containing exactly one packet is arguably the most significant single piece of information hidden in the distribution of flow sizes. From a modeling perspective, this number helps affirm or reject statistical hypotheses such as whether the distribution is Zipfian. More importantly, abnormal or malicious behavior in the Internet, such as port scanning and DDoS attacks, often manifests itself as a significant increase in the number of flows of size 1. To estimate the number of flows of size 1 (denoted by n_1), let us look at the process of inserting flow counts into the counter array. Note that a counter of value 1 must contain exactly one flow of size 1 (i.e., no collision). Based on this insight, we can derive a very accurate estimator for n_1. Let λ̂ = n̂/m be the estimated load factor (in terms of the average number of flows that are mapped to the same index) on the array. Our simple estimator for n_1 is n̂_1 = y_1 · e^λ̂, where y_1 is the number of counters with value 1. This surprisingly simple estimator n̂_1 turns out to be very accurate. In our experiments shown later, we observed an accuracy of ±2% using n̂_1. Next, we explain the reasoning behind n̂_1. Since the order of packet or flow arrivals does not affect the final values in the counter array, we consider a hypothetical situation where all flows of size 2 and above were inserted into the counter array first. There are altogether n − n_1 of them. At this point, none of the flows of size 1 has been inserted. The number of flows hashed to an index can be modeled as a binomial distribution Binomial(n − n_1, 1/m), which in turn can be approximated by Poisson((n − n_1)/m).

The total number of indices that are not hit by any flow at this point (i.e., indices where the counter value is 0) can be estimated as m_0′ ≈ m · e^(−(n−n_1)/m). Now, assume all the flows of size 1 are inserted into this array. Due to this insertion, some of these m_0′ counters will become non-zero. The counters with value 1 will be those, out of a total of m_0′, that were zero before the insertion of the n_1 flows of size 1, and were hit by exactly one of these new insertions. By the same argument as above, the total number of such indices is m_0′ · λ_1 · e^(−λ_1), where λ_1 = n_1/m. But this number should be equal to y_1. Therefore we have

    y_1 = m_0′ · λ_1 · e^(−λ_1) = m · e^(−(n−n_1)/m) · (n_1/m) · e^(−n_1/m) = n_1 · e^(−n/m),

which can be simplified as

    n_1 = y_1 · e^(n/m)    (2)

3.3 Estimating the flow size distribution
One is tempted to generalize the above process to derive an estimator for the number of flows of size 2, 3, and so on (i.e., estimating n_2, n_3, ..., n_z). However, this proves to be difficult for the following reason. While a counter of value 1 is definitely not involved in a collision, counter values of 2 and above could be caused by the collision of two or more flows. For example, among the counters of value 2, some correspond to a flow of size 2, while the others could be two flows of size 1 hashing to the same index. Thus, while the estimate of n_1 (i.e., the number of flows of size 1) depends only on our estimate of n, the estimate of n_2 will depend on both n and n_1. More generally, the estimate of n_i will depend on our estimates of n, n_1, n_2, ..., n_{i−1}. Thus for a large i, the estimate is more susceptible to errors due to this cumulative dependence effect, resulting in a sharp increase in estimation errors. Therefore, to accurately estimate the flow size distribution, we take a more holistic approach, rather than estimating each quantity step by step. This approach, based on the Expectation Maximization (EM) method for computing the Maximum Likelihood Estimation (MLE), is the sole topic of the next section.
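A short Python sketch of the two estimators above, Equation (1) for the total number of flows and Equation (2) for the number of flows of size 1; the numbers in the usage lines are hypothetical:

    import math

    def estimate_total_flows(m, m0):
        # Eq. (1): n_hat = m * ln(m / m0), where m0 is the number of
        # counters that are still zero at the end of the epoch.
        return m * math.log(m / m0)

    def estimate_size1_flows(y1, n_hat, m):
        # Eq. (2): n1_hat = y1 * e^(n/m); y1 is the number of counters
        # whose value is exactly 1, and n_hat/m is the estimated load factor.
        return y1 * math.exp(n_hat / m)

    m, m0, y1 = 1 << 20, 420_000, 250_000        # hypothetical counter-array summary
    n_hat = estimate_total_flows(m, m0)          # about 959,000 flows
    n1_hat = estimate_size1_flows(y1, n_hat, m)  # about 624,000 flows of size 1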
4. ESTIMATING THE FLOW SIZE DISTRIBUTION USING EXPECTATION MAXIMIZATION
In this section, we describe our Maximum Likelihood Estimation (MLE) algorithm that computes the flow size distribution that is most likely to result in the observed counter values after the hash collisions. To find this MLE directly is difficult because there is neither a closed-form formula nor a computation procedure for p(φ | y), the distribution of the flow size distribution φ conditioned on the observation y. The difficulty of computing p(φ | y) can be attributed to the fact that our observed data is incomplete. To address this problem, we adopt a powerful method in statistics called Expectation Maximization (EM) to iteratively compute the local MLE. (Footnote 8: EM algorithms in general can only guarantee convergence to a local maximum [5], while MLE often refers to the global maximum. With this understanding, we will omit the word "local" from subsequent discussions of MLE using EM.) EM is especially effective in finding the MLE when the observation can be viewed as incomplete data. In our context, the observed counter values can be viewed as incomplete data, and the missing part is how flows collide with each other during hashing. The evaluation in Section 6 shows that our EM algorithm accurately estimates the flow size distribution on all traces we have experimented with. To the best of our knowledge, this is the first work that applies an EM algorithm to computing the MLE from a lossy data structure.

4.1 Background on EM
Let y denote our observation and φ denote the random variable whose value we would like to estimate. In MLE, we would like to find the φ that maximizes p(φ | y). However, it is usually hard to compute such a φ because the formula for p(φ | y) is either complicated or does not have a closed form due to missing data. The EM algorithm, which captures our intuition on handling missing data, works as follows. It starts with a guess of the parameters, then replaces missing values by their expectations given the guessed parameters, and finally estimates the parameters assuming the missing data are equal to their estimated values. This new estimate of missing values gives us a better estimate of the parameters. This process is iterated multiple times until the estimated parameters converge to a set of values (typically a local maximum, as mentioned above). Formally, EM begins with a guess of the parameter φ^ini, which serves as φ^old for the first iteration. Then the following two alternating steps are executed iteratively.

Expectation step.

    Q(φ, φ^old) = E_old[log p(γ, φ | y)] = ∫ log p(γ, φ | y) · p(γ | φ^old, y) dγ,

where the expectation averages over the conditional posterior distribution of the missing data γ, given the current estimate φ^old. We use the notation Q(φ, φ^old) to denote E_old[log p(γ, φ | y)], per the convention in the statistics literature. For many applications, both p(γ | φ, y) and p(φ | γ, y) inside the integration formula above are straightforward to compute.

Maximization step. Let φ^new be the value of φ that maximizes Q(φ, φ^old). This φ^new will serve as φ^old for the next iteration.

These two steps are iterated for a number of steps until φ^old and φ^new are close enough to each other, a notion that will be made rigorous in Section 6.2.

4.2 Applying EM to our context
Our observation y, obtained from the output of the online streaming module, is the set of values y_i (i = 1, 2, ..., z), where y_i is the number of counters that have value i. Our goal is to estimate φ_i, the fraction of flows that are of size i (i = 1, 2, ..., z). Here z is the maximum counter value observed in the array. Our EM algorithm for estimating φ is shown in Figure 4.

Figure 4: EM algorithm for computing the flow size distribution.
Input: y_i, the number of counters that have value i (1 ≤ i ≤ z)
Output: MLE of the flow size distribution φ
1.  Initialization: pick an initial flow size distribution φ^ini and estimate the total flow count n^ini from Section 3.1.
2.  φ^new := φ^ini; n^new := n^ini
3.  while (convergence condition is not satisfied)
4.    φ^old := φ^new; n^old := n^new
5.    for i := 1 to z
6.      foreach β ∈ Ω_i
7.        /* Ω_i is the set of all collision patterns
8.           that add up to i, defined in Theorem 1 */
9.        Suppose β is that f_1 flows of size s_1, f_2 flows of
10.         size s_2, ..., and f_q flows of size s_q collide into
11.         a counter of value i, then
12.       for j := 1 to q
13.         n_{s_j} := n_{s_j} + y_i · f_j · p(β | φ^old, n, V = i)
14.         /* The procedure for computing p(β | φ^old, n, V = i)
15.            is shown in Theorem 1 and Lemma 1. */
16.       end
17.     end
18.   end
19.   n^new := Σ_{i=1}^{z} n_i
20.   for i := 1 to z
21.     φ^new_i := n_i / n^new
22.   end
23.   /* normalize the counts n_i into the flow size distribution φ */
24. end

We first need a guess of the flow size distribution φ^ini and the total number of flows n^ini. In our algorithm, we simply use the distribution obtained from the raw counter values as φ^ini and the total number of non-zero counters as n^ini. Based on this φ^ini and n^ini, we can compute, for each possible way of splitting an observed counter value, its average number of occurrences. Then the counts n_i for flows of the corresponding sizes will be credited according to this average. For example, when the value of a counter is 3, there are three possible events that result in this observation: (i) 3 = 3 (no hash collision); (ii) 3 = 1 + 2 (a flow of size 1 colliding with a flow of size 2); and (iii) 3 = 1 + 1 + 1 (three flows of size 1 hashed to the same index). Given a guess of the flow size distribution, we can estimate the posterior probabilities of these three cases.

Say the respective probabilities of these three events are 0.5, 0.3, and 0.2, and there are 1000 counters with value 3. Then we estimate that, on average, 500, 300, and 200 counters split in the three ways above, respectively. So we credit 300 · 1 + 200 · 3 = 900 to n_1, the count of flows of size 1, and credit 300 and 500 to n_2 and n_3, respectively. Finally, after all observed counter values are split this way, we get the new counts n_1, n_2, ..., n_z, and obtain n^new (= Σ_{i=1}^{z} n_i). We then renormalize these into a new (and refined) flow size distribution φ^new. We will prove in Section 4.3 that this program is indeed an instance of the EM algorithm.

Computing the probability p(β | φ, n, v). Let both n and m (the size of the counter array) be very large, so that we can approximate the binomial distribution with a Poisson distribution. This approximation is necessary since our estimates of flow counts can be non-integers. Let λ_i denote the average number of size-i flows (before collision) that are hashed to an (arbitrary) index of the array. In other words, λ_i = n_i/m = n·φ_i/m. We define λ = Σ_{i=1}^{z} λ_i, which is the average number of flows (of all sizes) that are hashed to an (arbitrary) index. Let ind be an arbitrary index into the array and v the observed value at this index. Let β be the event that f_1 flows of size s_1, f_2 flows of size s_2, ..., f_q flows of size s_q collide into this slot, where s_1 < s_2 < ... < s_q ≤ z.

Lemma 1. Given φ and n, the a priori (i.e., before observing the value v) probability that event β happens is

    p(β | φ, n) = e^(−λ) · ∏_{i=1}^{q} λ_{s_i}^{f_i} / f_i!

Proof. Let B_i be the event that f_i flows of size s_i are mapped to the counter indexed by ind. Let C be the event that all other flows have zero arrivals at ind. Since the hashing is uniform, the events B_1, B_2, ..., B_q, and C are independent. Therefore, p(β | φ, n) = p(C | φ, n) · ∏_{i=1}^{q} p(B_i | φ, n). Let I = {s_1, s_2, ..., s_q}. Then p(B_i | φ, n) = e^(−λ_{s_i}) · λ_{s_i}^{f_i} / f_i! by the Poisson approximation of the binomial distribution. So

    ∏_{i=1}^{q} p(B_i | φ, n) = ∏_{i=1}^{q} e^(−λ_{s_i}) · λ_{s_i}^{f_i} / f_i! = (∏_{j∈I} e^(−λ_j)) · ∏_{i=1}^{q} λ_{s_i}^{f_i} / f_i!

Also, p(C | φ, n) = ∏_{j∉I} e^(−λ_j). Therefore,

    p(β | φ, n) = (∏_{j∉I} e^(−λ_j)) · (∏_{j∈I} e^(−λ_j)) · ∏_{i=1}^{q} λ_{s_i}^{f_i} / f_i! = e^(−λ) · ∏_{i=1}^{q} λ_{s_i}^{f_i} / f_i!

However, the situation changes after we have already seen v, the value of the counter indexed by ind.

Theorem 1. Let Ω_v be the set of all collision patterns that add up to v. Then p(β | φ, n, v) = p(β | φ, n) / Σ_{α∈Ω_v} p(α | φ, n), where p(β | φ, n) and p(α | φ, n) can be computed using Lemma 1.

Proof. Let Ω be the set of all possible collision patterns as defined before. Let us choose an arbitrary index ind and let V be the counter value at this index.
By Bayes' rule,

    p(β | φ, n, V = v) = p(V = v | β, φ, n) · p(β | φ, n) / Σ_{α∈Ω} p(V = v | α, φ, n) · p(α | φ, n)

However, note that p(V = v | α, φ, n) = 1 for all α ∈ Ω_v (including β) and p(V = v | α, φ, n) = 0 for all α ∈ Ω − Ω_v. Therefore,

    p(β | φ, n, v) = p(β | φ, n) / Σ_{α∈Ω_v} p(α | φ, n)
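The sketch below puts Lemma 1, Theorem 1, and the splitting step of Figure 4 together in Python. It enumerates the collision patterns Ω_v, weights them by their posterior probabilities, and credits the expected flow counts accordingly; the function names, the dictionary representations of φ and y, and the max_parts cap (which mirrors the truncation discussed in Section 4.4 below) are my own choices, not the authors' code:

    import math
    from collections import Counter
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def patterns(v, max_part=None, max_parts=None):
        """All collision patterns (multisets of flow sizes) adding up to v."""
        if max_part is None:
            max_part = v
        if v == 0:
            return ((),)
        if max_parts == 0:
            return ()
        out = []
        for s in range(min(v, max_part), 0, -1):
            rest_cap = None if max_parts is None else max_parts - 1
            for rest in patterns(v - s, s, rest_cap):
                out.append((s,) + rest)
        return tuple(out)

    def pattern_prior(pattern, lam, lam_total):
        # Lemma 1: p(beta | phi, n) = e^(-lambda) * prod_i lam[s_i]^f_i / f_i!
        p = math.exp(-lam_total)
        for s, f in Counter(pattern).items():
            p *= lam.get(s, 0.0) ** f / math.factorial(f)
        return p

    def split_counters(y, phi, n, m, max_parts=6):
        """One splitting pass of Figure 4: spread each observed frequency
        y[v] over the patterns in Omega_v according to their posterior
        probabilities (Theorem 1) and return the refined counts n_i."""
        lam = {s: n * p / m for s, p in phi.items()}   # lambda_i = n * phi_i / m
        lam_total = sum(lam.values())                  # lambda = n / m
        counts = Counter()
        for v, freq in y.items():
            pats = patterns(v, None, max_parts)
            priors = [pattern_prior(p, lam, lam_total) for p in pats]
            norm = sum(priors)
            if norm == 0.0:
                counts[v] += freq              # fallback: treat as one flow of size v
                continue
            for pat, prior in zip(pats, priors):
                weight = freq * prior / norm   # expected number of counters split this way
                for s in pat:
                    counts[s] += weight
        return counts

A full EM iteration then renormalizes the returned counts into φ^new, exactly as in lines 19 to 23 of Figure 4, and repeats until the convergence test of Section 6.2 is met.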

4.3 Our algorithm is an EM algorithm
We next prove that the algorithm shown in Figure 4 is indeed an EM algorithm. This proof is important since the fact that the algorithm is an instance of EM guarantees that the outputs from the iterations of the algorithm will converge to a set of local MLEs, according to [5].

Theorem 2. The algorithm in Figure 4 is an EM algorithm.

Proof. Let γ_ij denote the number of size-i flows that are collided (merged) into counters of value j (1 ≤ i ≤ j ≤ z). These are the missing data that the algorithm in Figure 4 needs to guess in order to estimate the flow size distribution φ. The complete-data log-likelihood function L(φ) (i.e., log p(γ, φ | y) as defined in Section 4.1), assuming γ_ij is known, is Σ_{i=1}^{z} Σ_{j=i}^{z} γ_ij · log φ_i. Then in the expectation step,

    E_{(φ^old, n)}[L(φ) | y] = Σ_{i=1}^{z} Σ_{j=i}^{z} E[γ_ij | φ^old, y, n] · log φ_i

This corresponds to Q(φ, φ^old) in Section 4.1. Let γ_i = Σ_{j=i}^{z} γ_ij. Define n_ij = E[γ_ij | φ^old, y, n] and n_i = E[γ_i | φ^old, y, n]. By the linearity of expectation, we know that n_i = Σ_{j=i}^{z} n_ij. Therefore, E_{(φ^old, n)}[L(φ) | y] = Σ_{i=1}^{z} n_i · log φ_i. Note that the definition of n_i here matches the computation of n_i in our algorithm (lines 5 to 18). Finally, in the maximization step, we need to maximize Σ_{i=1}^{z} n_i · log φ_i, subject to the constraint Σ_{i=1}^{z} φ_i = 1. Here the n_i (i = 1, 2, ..., z) are constants and the φ_i are the variables. Using the method of Lagrange multipliers, we know that the maximum value is achieved when φ_i = n_i / Σ_{j=1}^{z} n_j. This is exactly the renormalization step in our program (lines 19 to 23) shown in Figure 4. Therefore, our algorithm is indeed an EM algorithm.

4.4 Computational complexity of the EM algorithm
It is easy to enumerate all possible events that give rise to a small counter value. But for large counter values, the number of possible events (hash collisions) that could give rise to the observed value is immense. Thus it is not possible to exhaustively compute the probabilities for all such events. The Zipfian nature of the flow size distribution comes to our rescue here. To reduce the complexity of enumerating all events that could give rise to a large counter value (say larger than 300), we ignore the cases involving the collision of 4 or more flows at the corresponding index. Since the number of counters with a value larger than 300 is quite small, and collisions involving 4 or more flows occur with low probability, this assumption has very little impact on the overall estimation mechanism. With similar justifications we ignore events involving 5 or more collisions for counters larger than 50 but smaller than 300, and those involving 7 or more collisions for all other counters. This reduces the asymptotic computational complexity of splitting a counter value j to O(j^3) (for j > 300). Note that we need to do this computation only once for all counters that have a value j, and the number of unique counter values is quite small (as discussed earlier in Section 2.3). Finally, since the number of counters with very large values (say larger than 1000) is extremely small, we can ignore splitting such counter values entirely and instead report the counter value as the size of a single flow. This clearly leads to a slight overestimation of the size of such large flows, but since the average flow size (around 10) is two to three orders of magnitude smaller than these large flows, this error is minuscule in relative terms. These optimizations bring the overall computational complexity well under control. On a 3.2 GHz Intel Pentium 4 desktop, each iteration of the EM takes about 20 seconds. If the measurement epoch is 100 seconds long and we terminate the estimation after five iterations, then the estimation can run as fast as the data streaming module.

5. MULTI-RESOLUTION ESTIMATION OF THE FLOW SIZE DISTRIBUTION
As shown in Figure 3, the raw counter value distribution deviates more and more from the actual flow size distribution as the size of the counter array decreases. Our experiments in Section 6 show that the accuracy of estimation falls sharply if the size of the array is less than 1/3 of the total number of flows n. Therefore, for accurate estimation of the flow size distribution, we need a counter array that contains at least n/3 entries. However, in real-world Internet traffic, the number of flows in the worst case can be many times more than in the average case. Provisioning enough counters for the worst case would result in excessive waste of precious SRAM in the average case. In this section, we present a multi-resolution version of our solution that uses a fixed-size array of counters and allows a graceful degradation in estimation accuracy when the total number of flows increases.
This makes the scheme accurate and memory-efficient for the average case, while its accuracy degrades only slightly in the worst case. Our design is inspired by a multi-resolution scheme used in [6]. We apply it here to a different context. Our Multi-Resolution Array of Counters (MRAC) scheme works as follows. Imagine a virtual array of counters that is large enough to accurately estimate the flow size distribution even in the worst case. However, the physical (actual) counter array is much smaller. Therefore, the virtual array needs to be mapped/folded onto the actual physical array, as shown in Figure 5.

[Figure 5: The Multi-Resolution Array of Counters. A virtual array of M counters is folded onto r + 1 physical arrays A_1, A_2, ..., A_{r+1} of m counters each, with array A_j covering hash range R_j.]

Here we describe a base-2 version of our mapping; its generalization to an arbitrary base b is straightforward. In the base-2 version, we map a logical array of M = 2^r · m counters to r + 1 physical arrays of size m each. Half of the hash space will be mapped to (folded into) array 1, half of the remaining hash space (i.e., 1/4 of the total hash space) will be mapped to array 2, and so on. Finally, we are left with two blocks of hash space of size m each. They are directly mapped to arrays r and (r + 1). The total space taken by the arrays is m · (log_2(M/m) + 1). The actual mapping/folding algorithm is shown in Figure 6.

Figure 6: Algorithm for updating the MRAC.
1. Initialize
2.   r = log_2(M/m)
3.   R_i = [ (1 − 1/2^{i−1})·M, (1 − 1/2^i)·M ), i = 1, 2, 3, ..., r;  R_{r+1} = [ (1 − 1/2^r)·M, M )
4.   Arrays A_1, A_2, ..., A_{r+1} are all initialized to 0
5. Update
6.   Upon the arrival of a packet pkt
7.     ind := hash(pkt.flow_label);
8.     if (ind ∈ R_j)
9.       A_j[ind mod m]++;

As described above, the arrays A_1, A_2, ..., A_r, A_{r+1} cover the respective hash ranges [0, M/2), [M/2, 3M/4), [3M/4, 7M/8), ..., [(1 − 1/2^{r−1})·M, (1 − 1/2^r)·M), and [(1 − 1/2^r)·M, M). If a hash index ind is mapped to an array, the counter indexed by (ind mod m) in that array will be incremented. Therefore, the values of 2^{r−1} counters in the virtual array map to (fold into) one counter in array A_1, the values of 2^{r−2} virtual counters map to one counter in array A_2, and so on. The (r + 1) arrays together cover the entire virtual hash space, and the regions covered by any two arrays are disjoint. Such a mapping is implicitly a flow sampling (not packet sampling) scheme. Array A_1 processes approximately 1/2 of the flows (i.e., every packet in approximately half of the flows), array A_2 processes approximately 1/4 of the flows, and so on. Note that the computational complexity of this scheme is almost the same as that of the baseline approach, which is one hash function computation and one memory access to SRAM. The only additional processing here is to recognize the range that a hash value falls into and to perform a modulo operation (ind mod m in line 9 of Figure 6). Since all operations involve powers of 2, they can be implemented efficiently using simple binary logic.
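A small Python sketch of the base-2 MRAC update of Figure 6; CRC32 as the hash and the linear scan over ranges are illustrative simplifications (a hardware implementation would use the leading bits of the hash value directly):

    import math
    import zlib

    class MultiResolutionArray:
        """r + 1 physical arrays of m counters folding a virtual array of
        M counters (Figures 5 and 6); M and m are assumed to be powers of 2."""

        def __init__(self, M, m):
            self.M, self.m = M, m
            self.r = int(math.log2(M // m))
            self.arrays = [[0] * m for _ in range(self.r + 1)]

        def _resolution(self, ind):
            # A_1 covers [0, M/2), A_2 covers [M/2, 3M/4), ..., and
            # A_{r+1} covers the final block of size m.
            for j in range(1, self.r + 1):
                if ind < self.M - self.M // (1 << j):
                    return j
            return self.r + 1

        def update(self, flow_label):
            ind = zlib.crc32(flow_label) % self.M
            j = self._resolution(ind)
            self.arrays[j - 1][ind % self.m] += 1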

The estimation algorithm works as follows. It first picks the array that will result in the best estimate of the original flow size distribution; the criteria for picking such an array are discussed next. Suppose the array we pick is 1/2^i of the size of the virtual array, that is, this array samples approximately a 1/2^i fraction of the flows. The algorithm first estimates the flow size distribution from this array using the baseline approach described in the previous two sections, and then scales the result by 2^i to obtain the estimate for the overall traffic. Since the number of very large flows (say, larger than 1000 packets) is quite small, we can use the counter values larger than 1000 from all resolutions to refine our estimate of the tail of the distribution. For each of these large counter values, we subtract the average counter value in the corresponding resolution and use the result as the estimated size of the large flow hashed to this counter.

In general, the arrays where the sampling rate is high (i.e., the arrays that cover large portions of the virtual hash space) tend to be over-crowded (i.e., with a higher average number of flows mapped to the same slot). This corresponds to using a very small array of counters, which results in inaccurate estimation. On the other hand, when the sampling rate is low (i.e., when the array covers a very small portion of the virtual hash space), the estimation from the corresponding array will be accurate, but the errors due to (flow) sampling become high. Therefore, there is a clear tradeoff between the loss of accuracy due to over-crowding on the one hand and due to sampling on the other. We find that there exists an optimal array size in the middle that minimizes the overall loss of accuracy, which can be found using the following criterion: we pick the array with as high a sampling rate as possible, under the constraint that no more than 1.5 flows are mapped to the same slot on average. The reasoning behind this rule is similar to that used in two existing multi-resolution based schemes [6, 6] (for different applications); we omit the details here in the interest of space. Finally, the above design with base 2 can be generalized to an arbitrary base. Choosing a base that is a power of 2 allows efficient hardware and software implementation. Our implementation evaluated in Section 6 uses base 4. The base-4 algorithm needs 50% less memory than base 2, with nominal loss in estimation accuracy.

6. EVALUATION
In this section, we evaluate the accuracy of our estimation mechanism using real-world Internet traffic traces. We also compare our results with those obtained in [] from sampled traffic. Our experiments demonstrate that our mechanism achieves very high accuracy, typically an order of magnitude better than sampling-based approaches.

6.1 Traffic traces
We use three sets of traces in our evaluation. The first set comprises two packet header traces obtained from a tier-1 ISP backbone, collected by a Gigascope server [7] on a high-speed link leaving a data center in October 2003.
Each of the packet header traces lasts a few hours, consists of 700 million packet headers, and carries 350 GB of traffic. In our experiments, we used segments taken from these two traces, one for heavier traffic load on a weekday and the other for light traffic load on a weekend. Table 1 lists the number of flows and packets in each trace.

Table 1: Traces used in our evaluation.
  Source  Trace    # of flows   # of packets
  ISP     Weekday  ,34,89       68,595,755
  ISP     Weekend  ,39,746      8,86,457
  NLANR   Long     563,080      ,769,43
  NLANR   Medium   9,380        ,668,69
  NLANR   Short    55,55        58,43
  []      CAMPUS   45,70        0,065,600
  []      COS      6,038,554    37,000,000
  []      PEERING  ,89,85       0,000,000

The second set of traces we use are publicly available traffic traces from NLANR. We use three NLANR traces (Footnote 9: We experimented on many other NLANR traces, which yield results similar to those reported in this paper.) named Long, Medium, and Short, based on the number of flows in each trace (Table 1). Notice that the trace Long actually has fewer packets than Medium. However, the attribute of significance in our evaluation is the number of flows in each trace, and the names are intuitive in this light. Finally, we use a set of three traces from [] to compare with previous work on estimating the flow size distribution from sampled statistics. Trace CAMPUS was collected at a LAN near the border of a campus network during a period of 300 minutes. Trace COS was collected at an OC3 link at Colorado State University during January 25 and 26, 2003; this period overlaps the onset of the Slammer worm [8]. Trace PEERING was collected at a peering link for a period of 37 minutes.

6.2 Evaluation metrics
For comparing the estimated flow size distribution with the actual distribution, we considered two possible metrics, the Mean Relative Difference (MRD) and the Weighted Mean Relative Difference (WMRD). We eventually adopt WMRD as our evaluation metric; the rationale for this choice is given below. The metric MRD is often used to measure the distance between two probability distributions or mass functions, defined in our context as follows. Suppose the number of flows of size i is n_i and our estimate of this number is n̂_i. The relative error in estimation (i.e., the relative difference) is given by |n_i − n̂_i| / ((n_i + n̂_i)/2). The mean relative difference is obtained by taking the mean of the relative difference over all possible flow sizes 1, 2, 3, ..., z. Therefore, the MRD between the estimated and actual distributions is given by:

    MRD = (1/z) · Σ_i |n_i − n̂_i| / ((n_i + n̂_i)/2)

However, this metric is not suitable for evaluating the estimated flow size distribution, for the following reason. The Zipfian nature of Internet traffic implies that there are a large number of small flows and only a few large flows. In other words, as i becomes larger, n_i becomes smaller and |n_i − n̂_i| / ((n_i + n̂_i)/2) becomes larger. Therefore, the errors in estimating the tail of the distribution (i.e., |n_i − n̂_i| / ((n_i + n̂_i)/2) for large values of i) dominate the value of MRD. This makes no sense, since the main body of the distribution is the large number of small flows, whose estimation accuracy is discounted in MRD. To reflect the errors in estimating the numbers of large and small flows in proportion to their actual population, we adopt the aforementioned second metric, the Weighted Mean Relative Difference (WMRD). It is proposed and used in [], for the same purpose of evaluating the accuracy of an estimated flow size distribution. In WMRD, we assign a weight of (n_i + n̂_i)/2 to the relative error in estimating the number of flows of size i. Thus the value of WMRD is given by:

    WMRD = [ Σ_i (|n_i − n̂_i| / ((n_i + n̂_i)/2)) · ((n_i + n̂_i)/2) ] / [ Σ_i (n_i + n̂_i)/2 ] = Σ_i |n_i − n̂_i| / Σ_i (n_i + n̂_i)/2

WMRD is also used in our EM algorithm to determine how close our estimate is to the convergence point. In our algorithm, we choose a threshold ε and terminate the iterative estimation procedure when the WMRD between the estimates produced by two consecutive iterations falls below ε. The intuition here is that, as the estimates get closer to the convergence point, the improvement from one iteration to the next becomes smaller, implying a smaller WMRD between the estimates produced by two consecutive iterations.
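A direct Python transcription of the WMRD metric defined above, with the two distributions given as dictionaries mapping flow size to number of flows (the representation is my choice):

    def wmrd(actual, estimate):
        """Weighted Mean Relative Difference between two flow size
        distributions: sum_i |n_i - n^_i| / sum_i (n_i + n^_i)/2."""
        sizes = set(actual) | set(estimate)
        num = sum(abs(actual.get(i, 0) - estimate.get(i, 0)) for i in sizes)
        den = sum((actual.get(i, 0) + estimate.get(i, 0)) / 2.0 for i in sizes)
        return num / den if den else 0.0

This is also the quantity used for the convergence test above: iterate until wmrd(previous_estimate, current_estimate) falls below the threshold ε.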

6.3 Bucketing the flow size distribution
As can be seen from Figure 3, the tail of the flow size distribution plot is very noisy. This is due to the fact that a small number of large flows are distributed over a large range of sizes in a very sparse way. To obtain a more intuitive visual depiction of the flow size distribution, we use a bucketing scheme to smooth out the noise. Buckets are sets of one or more consecutive integers. The total number of flows in a bucket is the sum of the numbers of flows of each unique size in the bucket. For small flow sizes, where there are a large number of flows for each unique size, we use a bucket size of 1, implying no smoothing. As we proceed towards large flow sizes, gaps between two sizes that have non-zero counts start appearing (and then widening). We scale the bucket size appropriately so that most buckets have at least one flow. In the figures depicting the flow size distribution, each bucket is depicted as a data point, with the mid-point of the bucket as its x-coordinate and the total number of flows in the bucket divided by the bucket size as its y-coordinate. We emphasize that this bucketing scheme is used only for better visualization of results. Our estimation mechanism and the numerical results (in WMRD) reported later in this section do not use smoothing of any form.

6.4 Estimation using an array of counters
As mentioned earlier in Section 5, the estimation procedure is likely to be more accurate when the number of counters is close to or larger than the total number of flows in the measurement epoch. Table 2 shows the effect of the choice of the number of counters on estimation accuracy for the NLANR traces.

[Table 2: WMRD of the initial guesses and the final estimates for the NLANR traces (Long: 563,080 flows; Medium; Short) under several counter array sizes.]

The deviation of both the initial guess (taken from the observed counter value distribution) and the final estimate after 10 iterations of the EM algorithm becomes larger when the number of counters becomes smaller. However, this increase in WMRD is very small when the number of counters stays larger than or equal to the number of flows.
The increase becoes pronounced only after the nuber of counters drops to less than of 3 the nuber of flows. Figure 7 shows how the WMRD of the estiates decreases when the nuber of EM iterations increases. The trace used for this experient is trace Long containing 563,080 flows. Each curve corresponds to a different choice of the nuber of counters, ranging fro 8K ( 7 ) to M ( 0 ). The points for iteration 0 correspond to the WMRD of the initial guess, obtained fro the distribution of the raw counter values. All the curves show a downward trend on WMRD to approach zero, indicating progress toward convergence. The curves for = 04K and = 5K begin with uch better initial guesses, thus achieving a uch saller WMRD (in absolute value) within a sall nuber of iterations. This reinforces the notion that using approxiately the sae nuber of counters as the nuber of flows provides uch better estiation accuracy than using a significantly saller nuber of counters. We observe siilar results on other traces. Figure 8 presents the results of running our estiation echanis on the trace Long (siilar results are observed on all traces in Table ). In this experient, the nuber of counters was set to a s power that is closest to the nuber of flows n. Each figure has three curves, corresponding to the actual distribution of s, the distribution of raw counter values, and the result of our estiation echanis, respectively. The near overlap of our estiate with


More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

General Properties of Radiation Detectors Supplements

General Properties of Radiation Detectors Supplements Phys. 649: Nuclear Techniques Physics Departent Yarouk University Chapter 4: General Properties of Radiation Detectors Suppleents Dr. Nidal M. Ershaidat Overview Phys. 649: Nuclear Techniques Physics Departent

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Ch 12: Variations on Backpropagation

Ch 12: Variations on Backpropagation Ch 2: Variations on Backpropagation The basic backpropagation algorith is too slow for ost practical applications. It ay take days or weeks of coputer tie. We deonstrate why the backpropagation algorith

More information

Computable Shell Decomposition Bounds

Computable Shell Decomposition Bounds Coputable Shell Decoposition Bounds John Langford TTI-Chicago jcl@cs.cu.edu David McAllester TTI-Chicago dac@autoreason.co Editor: Leslie Pack Kaelbling and David Cohn Abstract Haussler, Kearns, Seung

More information

Figure 1: Equivalent electric (RC) circuit of a neurons membrane

Figure 1: Equivalent electric (RC) circuit of a neurons membrane Exercise: Leaky integrate and fire odel of neural spike generation This exercise investigates a siplified odel of how neurons spike in response to current inputs, one of the ost fundaental properties of

More information

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5,

Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, Sequence Analysis, WS 14/15, D. Huson & R. Neher (this part by D. Huson) February 5, 2015 31 11 Motif Finding Sources for this section: Rouchka, 1997, A Brief Overview of Gibbs Sapling. J. Buhler, M. Topa:

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information

A method to determine relative stroke detection efficiencies from multiplicity distributions

A method to determine relative stroke detection efficiencies from multiplicity distributions A ethod to deterine relative stroke detection eiciencies ro ultiplicity distributions Schulz W. and Cuins K. 2. Austrian Lightning Detection and Inoration Syste (ALDIS), Kahlenberger Str.2A, 90 Vienna,

More information

A Smoothed Boosting Algorithm Using Probabilistic Output Codes

A Smoothed Boosting Algorithm Using Probabilistic Output Codes A Soothed Boosting Algorith Using Probabilistic Output Codes Rong Jin rongjin@cse.su.edu Dept. of Coputer Science and Engineering, Michigan State University, MI 48824, USA Jian Zhang jian.zhang@cs.cu.edu

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations

Randomized Accuracy-Aware Program Transformations For Efficient Approximate Computations Randoized Accuracy-Aware Progra Transforations For Efficient Approxiate Coputations Zeyuan Allen Zhu Sasa Misailovic Jonathan A. Kelner Martin Rinard MIT CSAIL zeyuan@csail.it.edu isailo@it.edu kelner@it.edu

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

Research in Area of Longevity of Sylphon Scraies

Research in Area of Longevity of Sylphon Scraies IOP Conference Series: Earth and Environental Science PAPER OPEN ACCESS Research in Area of Longevity of Sylphon Scraies To cite this article: Natalia Y Golovina and Svetlana Y Krivosheeva 2018 IOP Conf.

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

Defect-Aware SOC Test Scheduling

Defect-Aware SOC Test Scheduling Defect-Aware SOC Test Scheduling Erik Larsson +, Julien Pouget*, and Zebo Peng + Ebedded Systes Laboratory + LIRMM* Departent of Coputer Science Montpellier 2 University Linköpings universitet CNRS Sweden

More information

Inference in the Presence of Likelihood Monotonicity for Polytomous and Logistic Regression

Inference in the Presence of Likelihood Monotonicity for Polytomous and Logistic Regression Advances in Pure Matheatics, 206, 6, 33-34 Published Online April 206 in SciRes. http://www.scirp.org/journal/ap http://dx.doi.org/0.4236/ap.206.65024 Inference in the Presence of Likelihood Monotonicity

More information

LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting

LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting LogLog-Beta and More: A New Algorith for Cardinality Estiation Based on LogLog Counting Jason Qin, Denys Ki, Yuei Tung The AOLP Core Data Service, AOL, 22000 AOL Way Dulles, VA 20163 E-ail: jasonqin@teaaolco

More information

Biostatistics Department Technical Report

Biostatistics Department Technical Report Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent

More information

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks

Optimal Resource Allocation in Multicast Device-to-Device Communications Underlaying LTE Networks 1 Optial Resource Allocation in Multicast Device-to-Device Counications Underlaying LTE Networks Hadi Meshgi 1, Dongei Zhao 1 and Rong Zheng 2 1 Departent of Electrical and Coputer Engineering, McMaster

More information

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool

Constant-Space String-Matching. in Sublinear Average Time. (Extended Abstract) Wojciech Rytter z. Warsaw University. and. University of Liverpool Constant-Space String-Matching in Sublinear Average Tie (Extended Abstract) Maxie Crocheore Universite de Marne-la-Vallee Leszek Gasieniec y Max-Planck Institut fur Inforatik Wojciech Rytter z Warsaw University

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Combining Classifiers

Combining Classifiers Cobining Classifiers Generic ethods of generating and cobining ultiple classifiers Bagging Boosting References: Duda, Hart & Stork, pg 475-480. Hastie, Tibsharini, Friedan, pg 246-256 and Chapter 10. http://www.boosting.org/

More information

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES

Proc. of the IEEE/OES Seventh Working Conference on Current Measurement Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Proc. of the IEEE/OES Seventh Working Conference on Current Measureent Technology UNCERTAINTIES IN SEASONDE CURRENT VELOCITIES Belinda Lipa Codar Ocean Sensors 15 La Sandra Way, Portola Valley, CA 98 blipa@pogo.co

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Curious Bounds for Floor Function Sums

Curious Bounds for Floor Function Sums 1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 17th February 2010 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion

More information

Bayes Decision Rule and Naïve Bayes Classifier

Bayes Decision Rule and Naïve Bayes Classifier Bayes Decision Rule and Naïve Bayes Classifier Le Song Machine Learning I CSE 6740, Fall 2013 Gaussian Mixture odel A density odel p(x) ay be ulti-odal: odel it as a ixture of uni-odal distributions (e.g.

More information

lecture 36: Linear Multistep Mehods: Zero Stability

lecture 36: Linear Multistep Mehods: Zero Stability 95 lecture 36: Linear Multistep Mehods: Zero Stability 5.6 Linear ultistep ethods: zero stability Does consistency iply convergence for linear ultistep ethods? This is always the case for one-step ethods,

More information

OBJECTIVES INTRODUCTION

OBJECTIVES INTRODUCTION M7 Chapter 3 Section 1 OBJECTIVES Suarize data using easures of central tendency, such as the ean, edian, ode, and idrange. Describe data using the easures of variation, such as the range, variance, and

More information

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models

Compression and Predictive Distributions for Large Alphabet i.i.d and Markov models 2014 IEEE International Syposiu on Inforation Theory Copression and Predictive Distributions for Large Alphabet i.i.d and Markov odels Xiao Yang Departent of Statistics Yale University New Haven, CT, 06511

More information

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks

Intelligent Systems: Reasoning and Recognition. Artificial Neural Networks Intelligent Systes: Reasoning and Recognition Jaes L. Crowley MOSIG M1 Winter Seester 2018 Lesson 7 1 March 2018 Outline Artificial Neural Networks Notation...2 Introduction...3 Key Equations... 3 Artificial

More information

SPECTRUM sensing is a core concept of cognitive radio

SPECTRUM sensing is a core concept of cognitive radio World Acadey of Science, Engineering and Technology International Journal of Electronics and Counication Engineering Vol:6, o:2, 202 Efficient Detection Using Sequential Probability Ratio Test in Mobile

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

USEFUL HINTS FOR SOLVING PHYSICS OLYMPIAD PROBLEMS. By: Ian Blokland, Augustana Campus, University of Alberta

USEFUL HINTS FOR SOLVING PHYSICS OLYMPIAD PROBLEMS. By: Ian Blokland, Augustana Campus, University of Alberta 1 USEFUL HINTS FOR SOLVING PHYSICS OLYMPIAD PROBLEMS By: Ian Bloland, Augustana Capus, University of Alberta For: Physics Olypiad Weeend, April 6, 008, UofA Introduction: Physicists often attept to solve

More information

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs

On the Inapproximability of Vertex Cover on k-partite k-uniform Hypergraphs On the Inapproxiability of Vertex Cover on k-partite k-unifor Hypergraphs Venkatesan Guruswai and Rishi Saket Coputer Science Departent Carnegie Mellon University Pittsburgh, PA 1513. Abstract. Coputing

More information

A Model for the Selection of Internet Service Providers

A Model for the Selection of Internet Service Providers ISSN 0146-4116, Autoatic Control and Coputer Sciences, 2008, Vol. 42, No. 5, pp. 249 254. Allerton Press, Inc., 2008. Original Russian Text I.M. Aliev, 2008, published in Avtoatika i Vychislitel naya Tekhnika,

More information

arxiv: v1 [math.nt] 14 Sep 2014

arxiv: v1 [math.nt] 14 Sep 2014 ROTATION REMAINDERS P. JAMESON GRABER, WASHINGTON AND LEE UNIVERSITY 08 arxiv:1409.411v1 [ath.nt] 14 Sep 014 Abstract. We study properties of an array of nubers, called the triangle, in which each row

More information

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA)

Bayesian Learning. Chapter 6: Bayesian Learning. Bayes Theorem. Roles for Bayesian Methods. CS 536: Machine Learning Littman (Wu, TA) Bayesian Learning Chapter 6: Bayesian Learning CS 536: Machine Learning Littan (Wu, TA) [Read Ch. 6, except 6.3] [Suggested exercises: 6.1, 6.2, 6.6] Bayes Theore MAP, ML hypotheses MAP learners Miniu

More information

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits

Correlated Bayesian Model Fusion: Efficient Performance Modeling of Large-Scale Tunable Analog/RF Integrated Circuits Correlated Bayesian odel Fusion: Efficient Perforance odeling of Large-Scale unable Analog/RF Integrated Circuits Fa Wang and Xin Li ECE Departent, Carnegie ellon University, Pittsburgh, PA 53 {fwang,

More information

Kinematics and dynamics, a computational approach

Kinematics and dynamics, a computational approach Kineatics and dynaics, a coputational approach We begin the discussion of nuerical approaches to echanics with the definition for the velocity r r ( t t) r ( t) v( t) li li or r( t t) r( t) v( t) t for

More information

CS Lecture 13. More Maximum Likelihood

CS Lecture 13. More Maximum Likelihood CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 9th February 011 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion of

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley ENSIAG 2 / osig 1 Second Seester 2012/2013 Lesson 20 2 ay 2013 Kernel ethods and Support Vector achines Contents Kernel Functions...2 Quadratic

More information

Easy Evaluation Method of Self-Compactability of Self-Compacting Concrete

Easy Evaluation Method of Self-Compactability of Self-Compacting Concrete Easy Evaluation Method of Self-Copactability of Self-Copacting Concrete Masanori Maruoka 1 Hiroi Fujiwara 2 Erika Ogura 3 Nobu Watanabe 4 T 11 ABSTRACT The use of self-copacting concrete (SCC) in construction

More information

arxiv: v3 [cs.lg] 7 Jan 2016

arxiv: v3 [cs.lg] 7 Jan 2016 Efficient and Parsionious Agnostic Active Learning Tzu-Kuo Huang Alekh Agarwal Daniel J. Hsu tkhuang@icrosoft.co alekha@icrosoft.co djhsu@cs.colubia.edu John Langford Robert E. Schapire jcl@icrosoft.co

More information

Analysis of Impulsive Natural Phenomena through Finite Difference Methods A MATLAB Computational Project-Based Learning

Analysis of Impulsive Natural Phenomena through Finite Difference Methods A MATLAB Computational Project-Based Learning Analysis of Ipulsive Natural Phenoena through Finite Difference Methods A MATLAB Coputational Project-Based Learning Nicholas Kuia, Christopher Chariah, Mechatronics Engineering, Vaughn College of Aeronautics

More information

Fixed-to-Variable Length Distribution Matching

Fixed-to-Variable Length Distribution Matching Fixed-to-Variable Length Distribution Matching Rana Ali Ajad and Georg Böcherer Institute for Counications Engineering Technische Universität München, Gerany Eail: raa2463@gail.co,georg.boecherer@tu.de

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

On the Maximum Likelihood Estimation of Weibull Distribution with Lifetime Data of Hard Disk Drives

On the Maximum Likelihood Estimation of Weibull Distribution with Lifetime Data of Hard Disk Drives 314 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'17 On the Maxiu Likelihood Estiation of Weibull Distribution with Lifetie Data of Hard Disk Drives Daiki Koizui Departent of Inforation and Manageent

More information

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical

ASSUME a source over an alphabet size m, from which a sequence of n independent samples are drawn. The classical IEEE TRANSACTIONS ON INFORMATION THEORY Large Alphabet Source Coding using Independent Coponent Analysis Aichai Painsky, Meber, IEEE, Saharon Rosset and Meir Feder, Fellow, IEEE arxiv:67.7v [cs.it] Jul

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Homework 3 Solutions CSE 101 Summer 2017

Homework 3 Solutions CSE 101 Summer 2017 Hoework 3 Solutions CSE 0 Suer 207. Scheduling algoriths The following n = 2 jobs with given processing ties have to be scheduled on = 3 parallel and identical processors with the objective of iniizing

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China

Department of Electronic and Optical Engineering, Ordnance Engineering College, Shijiazhuang, , China 6th International Conference on Machinery, Materials, Environent, Biotechnology and Coputer (MMEBC 06) Solving Multi-Sensor Multi-Target Assignent Proble Based on Copositive Cobat Efficiency and QPSO Algorith

More information

Tracking using CONDENSATION: Conditional Density Propagation

Tracking using CONDENSATION: Conditional Density Propagation Tracking using CONDENSATION: Conditional Density Propagation Goal Model-based visual tracking in dense clutter at near video frae rates M. Isard and A. Blake, CONDENSATION Conditional density propagation

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3

More information

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes

Graphical Models in Local, Asymmetric Multi-Agent Markov Decision Processes Graphical Models in Local, Asyetric Multi-Agent Markov Decision Processes Ditri Dolgov and Edund Durfee Departent of Electrical Engineering and Coputer Science University of Michigan Ann Arbor, MI 48109

More information

Randomized Recovery for Boolean Compressed Sensing

Randomized Recovery for Boolean Compressed Sensing Randoized Recovery for Boolean Copressed Sensing Mitra Fatei and Martin Vetterli Laboratory of Audiovisual Counication École Polytechnique Fédéral de Lausanne (EPFL) Eail: {itra.fatei, artin.vetterli}@epfl.ch

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

ma x = -bv x + F rod.

ma x = -bv x + F rod. Notes on Dynaical Systes Dynaics is the study of change. The priary ingredients of a dynaical syste are its state and its rule of change (also soeties called the dynaic). Dynaical systes can be continuous

More information