Analysis of an algorithm catching elephants on the Internet

Size: px
Start display at page:

Download "Analysis of an algorithm catching elephants on the Internet"

Transcription

1 Fifth Colloquiu on Matheatics and Coputer Science DMTCS proc. AI, 2008, Analysis of an algorith catching elephants on the Internet Yousra Chabchoub 1, Christine Fricker 1, Frédéric Meunier 2 Tibi 3 and Danielle 1 INRIA, Doaine de Voluceau, Le Chesnay CX, France 2 Université Paris Est, LVMT, Ecole des Ponts et Chaussées, 6 et 8 avenue Blaise Pascal, Cité Descartes, Chaps sur Marne Marne-la-Vallée 3 Université Paris 7, UMR 7599, 2 Place Jussieu, Paris Cedex 05 The paper deals with the proble of catching the elephants in the Internet traffic. The ai is to investigate an algorith proposed by Azzana based on a ultistage Bloo filter, with a refreshent echanis called shift in the present paper), able to treat on-line a huge aount of flows with high traffic variations. An analysis of a siplified odel estiates the nuber of false positives. Liit theores for the Markov chain that describes the algorith for large filters are rigorously obtained. The asyptotic behavior of the stochastic odel is here deterinistic. The liit has a nice forulation in ters of a M/G/1/C queue, which is analytically tractable and which allows to tune the algorith optially. Keywords: attack detection; Bloo filter; coupon collector; elephants and ice; network ining Introduction One traditionally distinguishes two kinds of flows in the Internet traffic: long flows, called elephants, which are the less nuerous typically 5-10%), and short flows, called ice, which are the ost nuerous. The convention is to fix a threshold C and to call elephant any flow having ore than C packets, and ouse any flow having strictly less than C packets. For various reasons, detections of attacks, pricing, statistics, it is an iportant task to be able to catch the elephants, that eans to be able to get the list of all elephants, with their IP addresses, flowing through a given router. We ephasize the fact that this proble is distinct fro the one consisting only in providing statistical estiates for the traffic. Even a siple on-line counting of the nuber of distinct flows reveals to be difficult due to the high throughput of the traffic. There is a wide literature on algoriths for fast estiations of cardinality i.e. of the nuber of distinct eleents in a set with repeated eleents) of huge data sets see [6], [9] and [11]). A siilar question consists in finding the k ost frequent flows the so-called icebergs see [4] and [12]). If one asks for the proportion of elephants or the size distribution of elephants, it is possible to use the Adaptive Sapling algorith proposed by Wegan and analyzed by Flajolet [8], which provides a saple of the flows independently fro their size. This saple can then be used to copute statistics for the elephants c 2008 Discrete Matheatics and Theoretical Coputer Science DMTCS), Nancy, France

2 296 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi and actually for all the flows). For counting the elephants, Gandouet and Jean-Marie have proposed in [10] an algorith based on sapling, thus requiring a knowledge on the flow size distribution, which reduces its application. For the target applications, a unique elephant hidden in a huge traffic of ice which does not exist fro a statistical point of view has to be detected. For this purpose, an algorith based on Bloo filters has been already presented by Estan and Varghese [7] in As explained here, a Bloo filter allows elephants to accuulate but due to huge traffic, collisions occur and ice can be detected as elephants. Collisions will be controlled by taking several stages and cleverly flushing the filter. More precisely, the principle of this latter algorith is the following. The IP header of each packet is hashed with k independent hashing functions in a k-ulti-stage filter. Counters are increented and when a counter reaches C, the corresponding flow is declared as an elephant and its packets are counted to give eventually its size. The proble is that in fact, under a heavy Internet traffic, the ultistage filter quickly gets totally filled. To avoid this proble, Estan and Varghese propose to periodically every 5 seconds) reinitialize the filter to zero. But, without any a priori knowledge about the traffic intensity, flows arrival rate...), 5 seconds can either be too long in which case the filter can be saturated) or too short a lot of long flows can be issed). Therefore the accuracy of the algorith closely depends on the characteristics of the traffic trace. In order to settle this proble, Azzana [2] introduces an adaptative refreshent echanis, that we will call shift, in the ulti-stage filter algorith. It is an efficient ethod to adapt the algorith to traffic variations: The filter is refreshed with a frequency depending on the current traffic intensity. Moreover the filter is not reinitialized to zero, but a softer technique is used to avoid issing soe elephants. The ain difference with the Estan and Varghese algorith is its ability to deal with traffic variations. Azzana shows in [2] that the refreshent echanis iproves notably the efficiency and accuracy of the algorith see Section 3 for practical results). Paraeters, as the filter size or related to the refreshent echanis, are experientally optiized. Azzana proposes soe eleents of analysis for this algorith. Our purpose is to go further and to provide analytical results when the filter is large. stage 1 Flow A h 1A) h A) 2 h k A) stage 2 stage k All > C Elephants Meory Fig. 1: The ulti-stage filter Description of the algorith The algorith designed by Azzana uses a Bloo filter with counters defined below) and involves four paraeters in input: the nuber k of stages in the Bloo filter, the nuber of counters in each stage, the axiu value C of each counter, i.e. the size threshold C to be declared as an elephant and the filling rate r. A Bloo filter is a set of k stages, each of these stages being a set of counters, initially at 0 and

3 Analysis of an algorith catching elephants on the Internet 297 taking values in {0,..., C}. Together with the k stages F 1,..., F k, one supposes that k hashing functions h 1,..., h k are given, one for each stage. We ake the strong) assuption that these hashing functions are independent, which iplies that k is sall k = 10 is probably the upper liit). Each hashing function h i aps the part of the IP header of a packet indicating the flow to which it belongs, to one of the counters of stage F i. The algorith works on-line on the strea, processing the packets one after the other. Flows identified as elephants are stored in a list E. When a packet is processed, it is first checked if it belongs to a flow already identified as an elephant that is a flow already in E). Indeed, in this case, there is no interest in apping it to the k counters, and the algorith siply forgets this packet. If not, it is apped by the hashing functions on one counter per stage and it increases these counters by one, except for those that have already reached C, in which case they reain at C. When, after processing soe packet, all the k counters are at C, the flow is declared to be an elephant and stored in the dedicated eory E. When the proportion of nonzero counters reaches r in the whole set of k counters, one decreases all nonzero counters by one. This last operation is called the shift. See Figure 1 for an illustration. Motivation of the algorith Packets of a sae flow hit the sae k counters, but two distinct flows ay also increase the sae counter in one or several stages. The idea of using several stages where flows are apped independently and uniforly, intends to reduce the probability of collisions between flows. The shift is crucial in the sense that it prevents the filters to be copletely saturated, that is, to have any counters with high values. Without the shift operation, ice would be very quickly apped to counters equal to C and declared as elephants. The algorith would have a finite lifetie because when the filter is saturated, nothing can be detected. False positive and false negative A false positive is a ouse detected as an elephant by the algorith. A false negative is an elephant not declared as such hence considered as a ouse) by the algorith. Generally, a false negative is worse than a false positive. Think of an attack: One does not want to iss it, and a false alar has less serious consequences than a successful attack. In our context, a false positive is a ouse one packet of which is apped onto counters all C 1. A false negative is due to the shift, and if it happens, it eans that there were at least f C + 1 shifts during the transission tie of soe elephant of size f. If shifts do not occur too often, a false negative is then an elephant whose packets are broadcast at a slow rate. Intuitively, and it will be confired by the forthcoing analysis, if the paraeters actually r) are chosen so as to aintain counters at low values, then shifts occur often, and if one tries to decrease the shift frequency, then the counters tend to have high values. Therefore, a coproise has to be found between these two properties frequency of the shifts, height of the counters), which translates into a coproise between false positives and false negatives. This last coproise depends on the applications. A Markovian representation In this paper, we will ainly focus our analysis on the one-stage filter case when the traffic is ade up only of ice of size 1. The ai is to estiate the proportion of false positives. Fro this analysis, we will then derive results for the general case. Let us now introduce our ain notations.

4 298 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi We thus assue that k = 1 and that all flows have size 1. Throughout the paper, Wn i) denotes the proportion of counters having value i just before the nth shift in a filter with counters. According to this notation, Wn 0) is close to and C i=0 W n i) = 1. Notice that the nth shift exactly decreases the nuber of nonzero counters by Wn 1). An iportant part of our analysis will consist in estiating Wn C 1) + Wn C). Indeed, it gives an upper bound on the probability that a flow is declared as an elephant that is a false positive) between the n 1)th and the nth shifts, since, according to our assuptions, there is no elephant at all. In this fraework k = 1 and flows of size one), the algorith has a siple description in ters of urns and balls. Each flow is a ball thrown at rando into one of urns each urn being one of the counters). When a ball falls into an urn with C balls, it is iediately reoved, in order to have at ost C balls in each urn. When the proportion of non epty urns) reaches r, one ball is reoved in every non epty urn. For fixed, Wn ) n N = Wn i)) i=0,...,c is an ergodic Markov chain on soe finite state n N space. Its invariant probability easure π is the distribution of soe variable W. For C = 2, the first non-trivial case, even the expression of the transition atrix P of the Markov chain is cobinatorially quite coplicated and an expression for π sees out of reach. In practice, the nuber of counters per stage is large. This suggests to look at the liiting behavior of the algorith when tends to. We use as far as possible the Markovian structure of the algorith in order to derive rigorous liit theores and analytical expressions for the liiting regie. This is the longest and ost technical part of the paper, which also contains the ain result, fro a atheatical point of view. Main results The odel considered in the paper describes the collisions between ice in order to evaluate the nuber of false positives due to these collisions. In a one-stage filter where all flows are ice of size 1, the Markov chain Wn ) n N describes the evolution of the counters observed just before shift ties. The ain result is that, when is large, the rando vector W converges in distribution to soe deterinistic value w. This result is not quite copletely proved. The way to proceed is classical for large Markovian odels see for exaple [5] and [1]). The idea is to study the convergence of the process over finite ties. It is shown that the Markov chain given by the epirical distributions Wn ) n N converges to a deterinistic dynaical syste w n+1 = F w n ), which has a unique fixed point w. The situation is analogous in discrete tie to the study by Antunes and al. [1]. A Lyapunov function for F would allow to prove the convergence in distribution of W. Such a Lyapunov function is exhibited in the particular case C = 2. The dynaical syste provides a liiting description of the original chain which stationary behavior is then described by w. The fixed point w has the following interpretation. The fixed point w is identified as the invariant probability easure µ λ of the nuber of custoers in an M/G/1/C queue where service ties are 1 and arrival rate is soe λ satisfying the fixed point equation or equivalently µ λ 0) = λ = log 1 + µ λ 1) ). As a byproduct, the stationary tie between two shifts divided by converges in distribution to the constant λ. Thus the inter-shift tie closely related to the nuber of false negatives) and the probability

5 Analysis of an algorith catching elephants on the Internet 299 of false positives are respectively approxiated by λ and bounded by µ λ C 1) + µ λ C) when is large. When ice have general size distribution, the previous odel is extended to an approxiated odel where packets of a given ouse arrive siultaneously. The involved quantity is the invariant easure of an M/G/1/C queue with arrivals by batches with distribution the ouse size distribution. In the case of size 1 ice, the ulti-stage filter case is investigated. Even if µ λ is not explicit, which coplicates the exhibition of a Lyapunov function, the quantities λ and µ λ C 1) + µ λ C) can be nuerically coputed. It appears that the latter quantity is an increasing function of r as r varies fro 0 to 1). Hence, given the ouse size distribution, one can nuerically deterine the values of r for which the algorith perfors well. Section 1 is the ost technical part of the paper. It investigates the one-stage filter in case of size 1 flows. In Section 2, this analysis is generalized to a general ouse size distribution in a siplified odel and to a ulti-stage filter. Then Section 3 is devoted to discussing the perforance of the algorith, to experiental results and iproveents validated through an ipleentation). 1 The Markovian urn and ball odel In this section, C is fixed and we consider the sequence Wn ) n N, where Wn denotes the vector of the proportions of urns with 0,..., C balls just before the nth shift tie. For 1, Wn ) n N is an ergodic Markov chain on the finite state space P r) = { w = w0),..., wc)) ) C+1 N, C wi) = 1 and i=0 C i=1 } wi) = r, where r denotes the sallest integer larger or equal to r) with transition atrix P defined as follows: If Wn = w P r), then Wn+1, distributed according to P w,.), is the epirical distribution of urns when, starting with distribution w, one ball is reoved fro every non epty urn and then balls are thrown at rando until r urns are non epty again, balls overflowing the capacity C being rejected. The required nuber of thrown balls is τ n = r 1 l= r W n 1) Y l, 1) where Y l, l N are independent rando variables with geoetrical distributions on N with respective paraeters l/, i.e. PY l = k) = l/) k 1 1 l/), k 1. { Let F be defined on P = w R C+1 +, } C i=0 wi) = 1 by F w) = T C sw) Pλw) ) 2)

6 300 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi where s : w w0) + w1), w2),..., wc), 0) on P { } T C : PN) = w n ) n N, λ : P R +, w log + i=0 w i = w1) ) P, w w0),..., wc 1), wi) i C and P λ is the Poisson { distribution with paraeter λ. Notice that F aps P to itself and, also by definition of λ, P r) def = w R C+1 +, C i=0 wi) = 1 and } C i=1 wi) = r to itself. 1.1 Convergence to a dynaical syste We prove the convergence of W n ) n N to the dynaical syste given by F as tends to +. The following lea is the key arguent. The unifor convergence stated below appears as the convenient way to express the convergence of P w,.) to δ F w) in order to prove both the convergence of W n ) n N, and, later on, the convergence of the stationary distributions. Define x = sup C i=0 x i for x R C+1. Lea 1 For ε > 0, sup P w, {w P r) : w F w) > ε}) 0. w P r) + Proof: The first step is to prove that, for ε > 0, ) λw) > ε 0 3) where λw) = log 1 + w1) = w). By Bienayé-Chebyshev s inequality, it is enough to prove that and τ1 sup P w w P r) ) and P w.) denotes P. W 0 sup w P r) E w ) τ 1 By equation 1), as EY l ) = 1/1 l/), using a change of index, ) τ E 1 w = λw) 0 4) ) τ sup Var 1 w 0. 5) w P r) r 1 l= r w1) r +w1) 1 l = 1 j. 6) j= r +1

7 Analysis of an algorith catching elephants on the Internet 301 A coparison with integrals leads to the following inequalities: log 1 r + w1) r + 1 ) τ E 1 w log 1 r + w1). 1 r It is then easy to show that the two extree ters tend to λw) = log1 + w1)/)), uniforly in w1) [0, 1]. This gives 4). For 5), as VarY l ) = l/)/1 l/) 2, by the sae change of index, ) τ Var 1 w = 1 r +w1) j= r +1 r +w1) j 1 j 2 = j 2 1 ) τ E 1 w. 7) j= r +1 The first ter of the right-hand side is bounded independently of w by + j= r +1 1/j2, which tends to 0 as tends to +. The second ter tends to 0 uniforly in w using 4) together with the unifor bound λw) log1 + 1/)). To obtain the lea, it is then sufficient to prove that, for each ε > 0, ) sup w P r) P w W1 F w) > ε, τ1 λw) ε ) Since W1 and F w) are probability easures on {0,..., C}, to get 8), it is sufficient to prove that for j {0,..., C 1}, sup w P r) P w W1 j) F w)j) > ε, τ1 λw) ε ) ) Let w P r). Define the following rando variables: For 1 i, Ni respectively Ñ i w)) is the nuber of additional balls in urn i when τ1 respectively λw)) new balls are thrown in the urns. One can construct these variables fro the sae sequence of balls i.e. of i.i.d. unifor on {1,..., } rando variables), eaning that balls are thrown in the sae locations for both operations until stopping. This provides a natural coupling for the N i s and Ñi s. Let j {0,..., C 1} be fixed. Given W0 = w, as j C 1, the capacity constraint does not interfere and W1 j) can be represented as W 1 j) = 1 j k=0 i Iw,k 1 {N i =j k} 10) where Iw,k is the set of urns with k balls in soe configuration of urns with distribution sw), so that card Iw,k = sw)k). The su over i is exactly the nuber of urns that contains k balls after the reoving of one ball per urn, and having j balls after new balls have been thrown. By coupling, on the event {W0 = w, τ1 / λw) ε/2}, the following is true: card{i, N i Ñ i w)} ε 2 11)

8 302 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi thus, denoting W 1 j) = 1 j k=0 i I 1 w,k { N e i w)=j k}, on the sae event, W 1 j) W 1 j) ε 2. To prove equation 9), it is then sufficient to show that This will result fro sup P w W 1 j) F w)j) > ε w P r) sup w P r) ) 0. E w W 1 j)) F w)j) 0 and sup r) w P Var w W 1 j)) 0. which is quite standard to prove. The key arguent, with classical proof, is the following: If L i is the nuber of balls in urn i when throwing λ balls at rando in urns, if 0 < a < b, then, for all i 1, i 2 ) N 2, i) sup PL 1 λ [a,b] = i 1 ) P λ i 1 ) 0, ii) sup PL 1 = i 1, L 2 = i 2 ) P λ i 1 )P λ i 2 ) 0. λ [a,b] It is applied since λw) [0, log1 + 1/)]. It ends the proof. Proposition 1 If W0 converges in distribution to w 0 P r) then Wn ) n N converges in distribution to the dynaical syste w n ) n N given by the recursion w n+1 = F w n ), n N. Proof: Assue that W0 converges in distribution to w 0 P r). Convergence of W0,..., Wn ) can be proved by induction on n N. By assuption it is true for n = 0. Let us just prove it for n = 1, the sae arguents holding for general n, fro the assued property for n 1. Let g be continuous on the copact) set P r)2. Since the distribution µ of W0 has support in P r), E gw0, W1 )) = gw, w )P w, dw )dµ w) = P r)2 P r) P r) 12) gw, w )P w, dw ) gw, F w))) dµ w) + gw, F w))dµ w). P r) Since g., F.)) is continuous on P r) F being continuous as can be easily checked), the last integral converges to gw 0, w 1 ) by assuption or case n = 0). The first ter is bounded in odulus, for each η > 0, by sup gw, w )P w, dw ) gw, F w)) w P r) P r) 2 g { }) sup P w, w P r), w F w) > ε + η w P r)

9 Analysis of an algorith catching elephants on the Internet 303 where ε is associated to η by the unifor continuity of g on P r)2. By Lea 1, this is less than 2η for sufficiently large. Thus, as tends to +, E gw 0, W 1 )) gw 0, w 1 ). 1.2 Convergence of invariant easures Let, for N, π be the stationary distribution of W n ) n N. Define P as the transition on P r) given by P w,.) = δ F w). Proposition 2 Any liiting point π of π ) N is a probability easure on P r) which is invariant for P i.e. that satisfies F π) = π. Proof: A classical result states that, if P and P, N, are transition kernels on soe etric space E such that, for any bounded continuous f on E, P f is continuous and P f converges to P f uniforly on E then, for any sequence π ) of probability easures such that π is invariant under P, any liiting point of π is invariant under P. Indeed, for any and any bounded continuous f, π P f = π f. If a subsequence π p ) converges weakly to π, then π p f converges to πf. Writing π p P p f = π p P f + π p P p f P f), since P f continuous and bounded since f is), the first ter π p P f converges to πp f and the second ter tends to 0 by unifor convergence of P f to P f. Equation π p P p f = π p f thus gives, in the liit, πp f = πf for any bounded continuous f. Here the difficulty is that the P s and P are transitions on P r) and P r), which are in general disjoint. To solve this difficulty, extend artificially P and P to P by setting: P w,.) = δ F w) for w P \ P r) P w,.) = δ F w) for w P \ P r). The proposition is then deduced fro the classical result if we prove that, for each f continuous on P notice that then P f = f F is continuous), sup P fw) ff w)) 0, w P r) which is straightforward fro Lea 1. The fact that the support of π is in P r) is deduced fro the portanteau theore see Billingsley [3] p.16) using the sequence of closed sets { } C P r),n = w P, r wi) r + 1. n The fixed points of the dynaical syste are the probability easures w on P r) such that i=1 w = F w) = T C sw) P λw) )

10 304 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi where λw) = log1 + w1)/)). This is exactly the invariant easure equation for the nuber of custoers just after copletion ties in an M/G/1/C queue with arrival rate λw) and service ties 1, so that it is equivalent to w = µ λw) 13) where µ λ respectively ν λ ) is the liiting distribution of the process of the nuber of custoers in an M/G/1/C respectively M/G/1/ ) queue with arrival rate λ and service ties 1. Indeed, it is well-known that this queue has a liiting distribution for λ R + respectively 0 λ < 1) which is the invariant probability easure of the ebedded Markov chain of the nuber of custoers just after copletion ties. The balance equations here reduce to a recursion syste, so that, even when λ 1, ν λ is well defined up to a ultiplicative constant which can not be noralized into a probability easure in this case). Moreover, ν λ is given by the Pollaczek-Khintchine forula for its generating function: n N ν λ n)u n = ν λ 0) g λu)u 1), for u < 1 14) u g λ u) where g λ u) = e λ1 u) and for λ < 1, ν λ 0) = 1 λ see for exaple Robert [13] p ). Notice that ν λ n) n N) has no closed for. For exaple, the expressions of the first ters are ν λ 1) = ν λ 0)e λ 1), ν λ 2) = ν λ 0)e λ e λ 1 λ), ν λ 3) = ν λ 0)e λ λλ + 2) λ)e λ + e 2λ ) 15) where ν λ 0) = 1 λ if λ < 1. For the M/G/1/C queue, µ λ i) = The following proposition characterizes the fixed points of F. ν λ i) C l=0 ν, i {0,..., C}. 16) λl) Proposition 3 F defined by 2) has one unique fixed point, denoted by w, in P r) given by the liiting distribution µ λ of the nuber of custoers in an M/G/1/C queue with arrival rate λ and service ties 1, where λ is deterined by the iplicit equation µ λ 0) = which is equivalent to λ = log 1 + µ λ 1) ), 17) where µ λ is given by 16) and ν λ by the Pollaczek-Kintchine forula 14). Notice that, oreover, r λ log). The upper bound on λ, obtained fro equation 17) using µ λ 1) r just says that the stationary ean nuber of balls between two shifts is less than the ean nuber of balls thrown until the first shift

11 Analysis of an algorith catching elephants on the Internet 305 starting with epty urns). Moreover λ r, which is clear if λ 1, and obtained if λ < 1 writing 16) for i = 0 and using C l=0 ν λ l) 1 in this equation. This is exactly the fact that the asyptotic stationary ean nuber of balls λ arriving between two shift ties is greater than the nuber of reoved balls at each shift, which is r. It is due to the losses under the capacity liit C. Proof: Only the existence and uniqueness result reains to prove. According to 13), w is soe fixed point if and only if it is a fixed point of the function P r) P r) w µ λw) with λw) = log1 + w1)/)). This function being continuous on the convex copact set P r), by Brouwer s theore, it has a fixed point. To prove uniqueness, let w and w be two fixed points of F in P r). By definition of P r), µ λw) 0) = µ λw )0) =. 18) A coupling arguent shows that, if λ λ then µ λ is stochastically doinated by µ λ, and in particular, µ λ 0) + µ λ 1) µ λ 0) + µ λ 1). 19) It can then be deduced that λw) = λw ). Indeed, if for exaple λw) < λw ), by equations 18) and 19), µ λw) 1) µ λw )1). thus, using 15) together with 18), λw) = log 1 + µ ) λw)1) λw ) = log 1 + µ ) λw )1) which contradicts λw) < λw ). One finally gets λw) = λw ), and then by equation 13), w = w. A Lyapunov function for the dynaical syste given by F on P r) is a function g 0 on P r) such that, for each w P r), gf w)) gw) with equality if and only if w is the fixed point of F. In the particular case C = 2, a Lyapunov function can be exhibited, resulting fro a contracting property of F in this case. Indeed, restricted to P r), F is here given by: [ w = 1 r, w1), w2) = r w1)) F w) =, ) log [ 1 ) log 1 + w1) 1 + w1) ) + ) + r w1) + w1) 1 + w1) P r) is soe one diensional subvariety of R 3, so that any w P r) can be identified with its second coordinate w1) [0, r], or equivalently with λw) = log1 + w1)/)) [0, log1/))]. ]). ],

12 306 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi Using this last paraetrization of P r), it is easy to show that F rewrites as G, ) apping the interval I = [0, log1/))] to itself and defined, for λ I, by Gλ) = log λ + e λ. An eleentary coputation shows that G has derivative on I taking values in the interval ] 1, 0], which gives the already known existence and uniqueness of a fixed point λ for G or F, both assertions being equivalent, and λ being equal to λw)). Moreover, the following inequality holds for λ I: Gλ) λ λ λ, equality occurring only at λ = λ. As a result, g defined on P r) by gw) = λw) λ = λw) λw) = + w1) log + w1), is a Lyapunov function for the dynaical syste defined by F. For C > 2, we conjecture the existence of such a g. Theore 1 Assue that a Lyapunov function exists for the dynaical syste given by F on P r) then, as tends to +, the invariant easure of W n ) n N converges to δ w where w is the unique fixed point of F. Thus the following diagra coutes, Wn d) ) n N n + + d) W d) w n ) n N w Proof: We prove that δ w is the unique invariant easure π of P with support in P r). Let g be the Lyapunov function for F on P r). π is P -invariant, thus πp = π and πp g = πg which can be rewritten g F g)dπ = 0. This iplies that g = g F holds π alost surely because g g F 0. Equality being only true at w, π has support in { w}. 2 A ore general odel 2.1 Mice with general size distribution Let Wn ) n N be the sequence of vectors giving the proportions of urns at 0,..., C just before the nth shift tie in a odel where balls are thrown by batches. The balls in a batch are thrown together in a unique urn chosen at rando aong the urns. The ith batch is coposed with S i balls and S i ) i N is a sequence of i.i.d. rando variables distributed as a rando variable S on N with support containing 1. Let φ be the generating function of S. The quantity S i is called the size of batch i. The dynaics is the sae: If, before the nth shift tie, the state is w P r), it first becoes sw) and then a nuber τn defined by 1) of successive batches are thrown in urns until r urns are non epty. The odel generalizes the previous one obtained for S = 1.

13 Analysis of an algorith catching elephants on the Internet 307 Let F be defined on P by F w) = T c sw) C λw),s ) 20) where T C, λ and S are already defined and C λw),s is a copound Poisson distribution i.e. the distribution of the rando variable Y = X S i 21) i=1 where X is independent of S i ) i N with Poisson distribution of paraeter λw). We iic the arguents in Section 1 to obtain the convergence of the stationary distribution of the ergodic Markov chain W n ) n N as tends to + to a Dirac easure at the unique fixed point of F. Propositions 1 and 2 hold. The fixed points of F are described in the following proposition. Proposition 4 F defined for w P by F w) = T c sw) C λw),s ) has a unique fixed point on P r) which is exactly the invariant easure µ λ of the nuber of custoers in a M/G/1/C queue with batches of custoers arriving according to a Poisson process with intensity λ, batch sizes being i.i.d. distributed as S with generating function φ and service ties 1, where λ is deterined by the iplicit equation µ λ 0) = which is equivalent to where for i {0,..., C}, and ν λ is given by + n=0 λ = log 1 + µ λ1) ) µ λ i) = ν λ i) C l=0 ν λl) ν λ n)u n g φu)u 1) = ν λ 0) u g φu), u < 1 where g λ u) = e λ1 u) and ν λ 0) = 1 λes) when λ < 1. Recall that the first ters of ν λ are given by ν λ 1) = ν λ 0)e λ 1) and ν λ 2) = ν λ 0)e λ e λ 1 λps = 1)) where ν λ 0) = 1 λes) when λ < 1, which generalizes the previous expressions. For C = 2, the Lyapunov function defined when S = 1 still works. Furtherore, for C > 2, we assue the existence of a Lyapunov function for F. Theore 1 still holds.

14 308 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi 2.2 A ulti-stage filter The filter is previously supposed to have only one stage. Let now assue that the filter has k stages of counters each. The natural odel then consists of k sets of urns where, when a ball is thrown, k copies of this ball are sent siultaneously and independently into the k stages, each falling at rando in one of the urns of its set. If soe ball hits an urn with C balls, then it is rejected. Moreover, when the proportion of non-epty urns in the whole filter reaches r, then one ball is reoved fro each non-epty urn: This is called a shift. The previous analysis extends with one ain difference: When the syste is initialized at soe state w j i), 1 j k, 0 i C), where w j i) is the proportion of urns with i balls in stage j, the nuber τ1 of balls thrown in each stage before the next shift is now asyptotically equivalent to λw), where w = wi), 0 i C) here gives the global proportions of urns in each possible state in the whole filter, that is wi) = 1 k w j i) for 0 i C. Thus λ is the sae function as for the one-filter case, here k j=1 evaluated at the global proportions: λw) = log 1 + k j=1 w j1) k) The one-stage proof of 3) is however not reproducible here, due to the lack of a representation of τ 1 analogous to 1) of Section 1. Another proof can be written. The alternative arguent is provided by noticing that when α balls per stage are thrown, with α / [λw) ε, λw)+ε] using the sae arguents i) and ii) as in Section 1), the epirical distributions of the urns at each stage are precisely known for large ) and do not correspond to the global proportion r of non-epty urns. Once the unifor) convergence of τ 1 / is established, the proof then proceeds along the sae lines as for k = 1 the sae reasoning holding for each stage). Notice however that the Markov property does not hold for the process of global proportions at shift ties, so that convergence in distribution is proved for the process of proportions detailed by stage, then inducing convergence. 3 Discussions 3.1 Synthesis: false positives and false negatives Fro a practical point of view, the ain results are Propositions 3 and 4. Given soe size distribution for the flows the generating function φ of Section 2), these propositions show how the values of the counters can be coputed fro the different paraeters of the algorith, since these values are encoded by the fixed point w of F : according to Theore 1, w is the state reached in the stationary regie when there is one stage and also when there are several stages see Subsection 2.2: one has k j=1 w jc) = kwc)). Moreover, the convergence is experientally really fast see the reark below), which ensures that in practice the algorith lives in the stationary phase. The coponent wi) of w gives the approxiate proportion of counters having value i in the whole Bloo filter. λ is the nuber of packets that arrive between two shifts. w and λ are respectively related to the nuber of false positives and to the nuber of false negatives: ).

15 Analysis of an algorith catching elephants on the Internet 309 The probability that a packet is a false positive is less than wc 1) + wc)) k since, in stage j, the probability to hit a counter at height C is at ost w j C 1) + w j C) and k w j C 1) + w j C)) wc 1) + wc)) k. j=1 The quantity λ is the tie nuber of packets) between two shifts, which is connected to the nuber of false negatives according to the discussion False positive and false negative of the Introduction. 3.2 Ipleentation and tests The algorith has been ipleented with an iproveent called the in-rule already proposed in [7]. Instead of increasing the k counters, an arriving packet is increenting only the counters aong k having the iniu value. Analytically ore difficult to study, the algorith should perfor better: Heuristically, ore flows are needed to reach high values of the counters inducing fewer false positives; oreover, the tie between two shifts is longer and hence the nuber of false negatives is decreased. It has been tested against on two ADSL traffic traces fro France Teleco, involving illions of flows. The perforance of the algorith is evaluated coparing the real nuber of elephants with the value estiated by the algorith. Even under the in-rule, the algorith perfors well only if r stays under a critical value r c, closely dependent on the ice distribution. Siulations have been processed with a one-stage filter with flows of size 1 to evaluate the transient phase duration. It appears that the nuber of shifts to reach the stationary phase is not uch greater than C. Such a result on the speed of convergence sees however theoretically out of reach. References [1] Nelson Antunes, Christine Fricker, Philippe Robert, and Danielle Tibi. Stochastic networks with ultiple stable points. Annals of Probability, 361): , [2] Youssef Azzana. Mesures de la topologie et du trafic Internet. PhD thesis, Université Pierre et Marie Curie, july [3] Patrick Billingsley. Convergence of probability easures. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc., New York, second edition, A Wiley- Interscience Publication. [4] E. D. Deaine, A. López-Ortiz, and J. I. Munro. Frequency estiation, of internet packet streas with liited space. In Proceedings of the 10th Annual European Syposiu on Algoriths ESA 2002), pages , Roe, Italy, Septeber [5] Vincent Duas, Fabrice Guillein, and Philippe Robert. A Markovian analysis of additive-increase ultiplicative-decrease algoriths. Adv. in Appl. Probab., 341):85 111, [6] M. Durand and P. Flajolet. Loglog counting of large cardinalities. In G. Di Battista and U. Zwick, editors, Proceedings of the annual european syposiu on algoriths, ESA03), pages , 2003.

16 310 Yousra Chabchoub, Christine Fricker, Frédéric Meunier and Danielle Tibi [7] C. Estan and G. Varghese. New directions in traffic easureent and accounting. In John Wroclawski, editor, Proceedings of the ACM SIGCOMM 2002 Conference on Applications, Technologies, Architectures, and Protocols for Coputer Counications SIGCOMM-02), volue 32, 4 of Coputer Counication Review, pages , New York, August ACM Press. [8] P. Flajolet. On adaptative sapling. Coputing, pages , [9] P. Flajolet, E. Fusy, O. Gandouët, and F. Meunier. Hyperloglog: the analysis of a near-optial cardinality estiation algorith. In Proceedings of the 13th conference on analysis of algorith AofA 07), pages , Juan-les-Pins, France, [10] O. Gandouet and A. Jean-Marie. Loglog counting for the estiation of ip traffic. In Proceedings of the 4th Colloquiu on Matheatics and Coputer Science Algoriths, Trees, Cobinatorics and Probabilities, Nancy, France, [11] F. Giroire. Order statistics and estiating cardinalities of assive data sets. Discrete Applied Matheatics, to appear. [12] G. S. Manku and R. Motwani. Approxiate frequency counts aver data streas. In Proceedings of the 28th VLDB Conference, pages , Hong Kong, China, [13] Philippe Robert. Stochastic networks and queues, volue 52 of Applications of Matheatics. Springer-Verlag, Berlin, Stochastic Modelling and Applied Probability.

LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting

LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting LogLog-Beta and More: A New Algorith for Cardinality Estiation Based on LogLog Counting Jason Qin, Denys Ki, Yuei Tung The AOLP Core Data Service, AOL, 22000 AOL Way Dulles, VA 20163 E-ail: jasonqin@teaaolco

More information

Analyzing Simulation Results

Analyzing Simulation Results Analyzing Siulation Results Dr. John Mellor-Cruey Departent of Coputer Science Rice University johnc@cs.rice.edu COMP 528 Lecture 20 31 March 2005 Topics for Today Model verification Model validation Transient

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

A note on the multiplication of sparse matrices

A note on the multiplication of sparse matrices Cent. Eur. J. Cop. Sci. 41) 2014 1-11 DOI: 10.2478/s13537-014-0201-x Central European Journal of Coputer Science A note on the ultiplication of sparse atrices Research Article Keivan Borna 12, Sohrab Aboozarkhani

More information

On Constant Power Water-filling

On Constant Power Water-filling On Constant Power Water-filling Wei Yu and John M. Cioffi Electrical Engineering Departent Stanford University, Stanford, CA94305, U.S.A. eails: {weiyu,cioffi}@stanford.edu Abstract This paper derives

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay

A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless Networks with Order-Optimal Per-Flow Delay A Low-Coplexity Congestion Control and Scheduling Algorith for Multihop Wireless Networks with Order-Optial Per-Flow Delay Po-Kai Huang, Xiaojun Lin, and Chih-Chun Wang School of Electrical and Coputer

More information

Solutions of some selected problems of Homework 4

Solutions of some selected problems of Homework 4 Solutions of soe selected probles of Hoework 4 Sangchul Lee May 7, 2018 Proble 1 Let there be light A professor has two light bulbs in his garage. When both are burned out, they are replaced, and the next

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 17th February 2010 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion

More information

Estimating Entropy and Entropy Norm on Data Streams

Estimating Entropy and Entropy Norm on Data Streams Estiating Entropy and Entropy Nor on Data Streas Ait Chakrabarti 1, Khanh Do Ba 1, and S. Muthukrishnan 2 1 Departent of Coputer Science, Dartouth College, Hanover, NH 03755, USA 2 Departent of Coputer

More information

Fairness via priority scheduling

Fairness via priority scheduling Fairness via priority scheduling Veeraruna Kavitha, N Heachandra and Debayan Das IEOR, IIT Bobay, Mubai, 400076, India vavitha,nh,debayan}@iitbacin Abstract In the context of ulti-agent resource allocation

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

The Weierstrass Approximation Theorem

The Weierstrass Approximation Theorem 36 The Weierstrass Approxiation Theore Recall that the fundaental idea underlying the construction of the real nubers is approxiation by the sipler rational nubers. Firstly, nubers are often deterined

More information

Kinetic Theory of Gases: Elementary Ideas

Kinetic Theory of Gases: Elementary Ideas Kinetic Theory of Gases: Eleentary Ideas 9th February 011 1 Kinetic Theory: A Discussion Based on a Siplified iew of the Motion of Gases 1.1 Pressure: Consul Engel and Reid Ch. 33.1) for a discussion of

More information

Interactive Markov Models of Evolutionary Algorithms

Interactive Markov Models of Evolutionary Algorithms Cleveland State University EngagedScholarship@CSU Electrical Engineering & Coputer Science Faculty Publications Electrical Engineering & Coputer Science Departent 2015 Interactive Markov Models of Evolutionary

More information

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography Tight Bounds for axial Identifiability of Failure Nodes in Boolean Network Toography Nicola Galesi Sapienza Università di Roa nicola.galesi@uniroa1.it Fariba Ranjbar Sapienza Università di Roa fariba.ranjbar@uniroa1.it

More information

Study on Markov Alternative Renewal Reward. Process for VLSI Cell Partitioning

Study on Markov Alternative Renewal Reward. Process for VLSI Cell Partitioning Int. Journal of Math. Analysis, Vol. 7, 2013, no. 40, 1949-1960 HIKARI Ltd, www.-hikari.co http://dx.doi.org/10.12988/ia.2013.36142 Study on Markov Alternative Renewal Reward Process for VLSI Cell Partitioning

More information

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science

A Better Algorithm For an Ancient Scheduling Problem. David R. Karger Steven J. Phillips Eric Torng. Department of Computer Science A Better Algorith For an Ancient Scheduling Proble David R. Karger Steven J. Phillips Eric Torng Departent of Coputer Science Stanford University Stanford, CA 9435-4 Abstract One of the oldest and siplest

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Polygonal Designs: Existence and Construction

Polygonal Designs: Existence and Construction Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G

More information

Sharp Time Data Tradeoffs for Linear Inverse Problems

Sharp Time Data Tradeoffs for Linear Inverse Problems Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used

More information

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs

Research Article On the Isolated Vertices and Connectivity in Random Intersection Graphs International Cobinatorics Volue 2011, Article ID 872703, 9 pages doi:10.1155/2011/872703 Research Article On the Isolated Vertices and Connectivity in Rando Intersection Graphs Yilun Shang Institute for

More information

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness

A Note on Scheduling Tall/Small Multiprocessor Tasks with Unit Processing Time to Minimize Maximum Tardiness A Note on Scheduling Tall/Sall Multiprocessor Tasks with Unit Processing Tie to Miniize Maxiu Tardiness Philippe Baptiste and Baruch Schieber IBM T.J. Watson Research Center P.O. Box 218, Yorktown Heights,

More information

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search

Quantum algorithms (CO 781, Winter 2008) Prof. Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search Quantu algoriths (CO 781, Winter 2008) Prof Andrew Childs, University of Waterloo LECTURE 15: Unstructured search and spatial search ow we begin to discuss applications of quantu walks to search algoriths

More information

arxiv: v1 [cs.ds] 3 Feb 2014

arxiv: v1 [cs.ds] 3 Feb 2014 arxiv:40.043v [cs.ds] 3 Feb 04 A Bound on the Expected Optiality of Rando Feasible Solutions to Cobinatorial Optiization Probles Evan A. Sultani The Johns Hopins University APL evan@sultani.co http://www.sultani.co/

More information

arxiv: v3 [cs.ds] 22 Mar 2016

arxiv: v3 [cs.ds] 22 Mar 2016 A Shifting Bloo Filter Fraewor for Set Queries arxiv:1510.03019v3 [cs.ds] Mar 01 ABSTRACT Tong Yang Peing University, China yangtongeail@gail.co Yuanun Zhong Nanjing University, China un@sail.nju.edu.cn

More information

arxiv: v1 [cs.dc] 19 Jul 2011

arxiv: v1 [cs.dc] 19 Jul 2011 DECENTRALIZED LIST SCHEDULING MARC TCHIBOUKDJIAN, NICOLAS GAST, AND DENIS TRYSTRAM arxiv:1107.3734v1 [cs.dc] 19 Jul 2011 ABSTRACT. Classical list scheduling is a very popular and efficient technique for

More information

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies

Lost-Sales Problems with Stochastic Lead Times: Convexity Results for Base-Stock Policies OPERATIONS RESEARCH Vol. 52, No. 5, Septeber October 2004, pp. 795 803 issn 0030-364X eissn 1526-5463 04 5205 0795 infors doi 10.1287/opre.1040.0130 2004 INFORMS TECHNICAL NOTE Lost-Sales Probles with

More information

Birthday Paradox Calculations and Approximation

Birthday Paradox Calculations and Approximation Birthday Paradox Calculations and Approxiation Joshua E. Hill InfoGard Laboratories -March- v. Birthday Proble In the birthday proble, we have a group of n randoly selected people. If we assue that birthdays

More information

Multi-Dimensional Hegselmann-Krause Dynamics

Multi-Dimensional Hegselmann-Krause Dynamics Multi-Diensional Hegselann-Krause Dynaics A. Nedić Industrial and Enterprise Systes Engineering Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu B. Touri Coordinated Science Laboratory

More information

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008

Constrained Consensus and Optimization in Multi-Agent Networks arxiv: v2 [math.oc] 17 Dec 2008 LIDS Report 2779 1 Constrained Consensus and Optiization in Multi-Agent Networks arxiv:0802.3922v2 [ath.oc] 17 Dec 2008 Angelia Nedić, Asuan Ozdaglar, and Pablo A. Parrilo February 15, 2013 Abstract We

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

ZISC Neural Network Base Indicator for Classification Complexity Estimation

ZISC Neural Network Base Indicator for Classification Complexity Estimation ZISC Neural Network Base Indicator for Classification Coplexity Estiation Ivan Budnyk, Abdennasser Сhebira and Kurosh Madani Iages, Signals and Intelligent Systes Laboratory (LISSI / EA 3956) PARIS XII

More information

arxiv: v1 [cs.ds] 17 Mar 2016

arxiv: v1 [cs.ds] 17 Mar 2016 Tight Bounds for Single-Pass Streaing Coplexity of the Set Cover Proble Sepehr Assadi Sanjeev Khanna Yang Li Abstract arxiv:1603.05715v1 [cs.ds] 17 Mar 2016 We resolve the space coplexity of single-pass

More information

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines

Intelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes

More information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information

Inspection; structural health monitoring; reliability; Bayesian analysis; updating; decision analysis; value of information Cite as: Straub D. (2014). Value of inforation analysis with structural reliability ethods. Structural Safety, 49: 75-86. Value of Inforation Analysis with Structural Reliability Methods Daniel Straub

More information

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization

Support Vector Machine Classification of Uncertain and Imbalanced data using Robust Optimization Recent Researches in Coputer Science Support Vector Machine Classification of Uncertain and Ibalanced data using Robust Optiization RAGHAV PAT, THEODORE B. TRAFALIS, KASH BARKER School of Industrial Engineering

More information

Lecture 21. Interior Point Methods Setup and Algorithm

Lecture 21. Interior Point Methods Setup and Algorithm Lecture 21 Interior Point Methods In 1984, Kararkar introduced a new weakly polynoial tie algorith for solving LPs [Kar84a], [Kar84b]. His algorith was theoretically faster than the ellipsoid ethod and

More information

Device-to-Device Collaboration through Distributed Storage

Device-to-Device Collaboration through Distributed Storage Device-to-Device Collaboration through Distributed Storage Negin Golrezaei, Student Meber, IEEE, Alexandros G Diakis, Meber, IEEE, Andreas F Molisch, Fellow, IEEE Dept of Electrical Eng University of Southern

More information

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.

This model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t. CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when

More information

Computational and Statistical Learning Theory

Computational and Statistical Learning Theory Coputational and Statistical Learning Theory Proble sets 5 and 6 Due: Noveber th Please send your solutions to learning-subissions@ttic.edu Notations/Definitions Recall the definition of saple based Radeacher

More information

In this chapter, we consider several graph-theoretic and probabilistic models

In this chapter, we consider several graph-theoretic and probabilistic models THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair

A Simplified Analytical Approach for Efficiency Evaluation of the Weaving Machines with Automatic Filling Repair Proceedings of the 6th SEAS International Conference on Siulation, Modelling and Optiization, Lisbon, Portugal, Septeber -4, 006 0 A Siplified Analytical Approach for Efficiency Evaluation of the eaving

More information

STRONG LAW OF LARGE NUMBERS FOR SCALAR-NORMED SUMS OF ELEMENTS OF REGRESSIVE SEQUENCES OF RANDOM VARIABLES

STRONG LAW OF LARGE NUMBERS FOR SCALAR-NORMED SUMS OF ELEMENTS OF REGRESSIVE SEQUENCES OF RANDOM VARIABLES Annales Univ Sci Budapest, Sect Cop 39 (2013) 365 379 STRONG LAW OF LARGE NUMBERS FOR SCALAR-NORMED SUMS OF ELEMENTS OF REGRESSIVE SEQUENCES OF RANDOM VARIABLES MK Runovska (Kiev, Ukraine) Dedicated to

More information

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics

ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents

More information

Distributed Subgradient Methods for Multi-agent Optimization

Distributed Subgradient Methods for Multi-agent Optimization 1 Distributed Subgradient Methods for Multi-agent Optiization Angelia Nedić and Asuan Ozdaglar October 29, 2007 Abstract We study a distributed coputation odel for optiizing a su of convex objective functions

More information

Asynchronous Gossip Algorithms for Stochastic Optimization

Asynchronous Gossip Algorithms for Stochastic Optimization Asynchronous Gossip Algoriths for Stochastic Optiization S. Sundhar Ra ECE Dept. University of Illinois Urbana, IL 680 ssrini@illinois.edu A. Nedić IESE Dept. University of Illinois Urbana, IL 680 angelia@illinois.edu

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

A Simple Regression Problem

A Simple Regression Problem A Siple Regression Proble R. M. Castro March 23, 2 In this brief note a siple regression proble will be introduced, illustrating clearly the bias-variance tradeoff. Let Y i f(x i ) + W i, i,..., n, where

More information

Curious Bounds for Floor Function Sums

Curious Bounds for Floor Function Sums 1 47 6 11 Journal of Integer Sequences, Vol. 1 (018), Article 18.1.8 Curious Bounds for Floor Function Sus Thotsaporn Thanatipanonda and Elaine Wong 1 Science Division Mahidol University International

More information

Simulation of Discrete Event Systems

Simulation of Discrete Event Systems Siulation of Discrete Event Systes Unit 9 Queueing Models Fall Winter 207/208 Prof. Dr.-Ing. Dipl.-Wirt.-Ing. Sven Tackenberg Benedikt Andrew Latos M.Sc.RWTH Chair and Institute of Industrial Engineering

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials

Fast Montgomery-like Square Root Computation over GF(2 m ) for All Trinomials Fast Montgoery-like Square Root Coputation over GF( ) for All Trinoials Yin Li a, Yu Zhang a, a Departent of Coputer Science and Technology, Xinyang Noral University, Henan, P.R.China Abstract This letter

More information

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation

On the Communication Complexity of Lipschitzian Optimization for the Coordinated Model of Computation journal of coplexity 6, 459473 (2000) doi:0.006jco.2000.0544, available online at http:www.idealibrary.co on On the Counication Coplexity of Lipschitzian Optiization for the Coordinated Model of Coputation

More information

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition

Upper bound on false alarm rate for landmine detection and classification using syntactic pattern recognition Upper bound on false alar rate for landine detection and classification using syntactic pattern recognition Ahed O. Nasif, Brian L. Mark, Kenneth J. Hintz, and Nathalia Peixoto Dept. of Electrical and

More information

Error Exponents in Asynchronous Communication

Error Exponents in Asynchronous Communication IEEE International Syposiu on Inforation Theory Proceedings Error Exponents in Asynchronous Counication Da Wang EECS Dept., MIT Cabridge, MA, USA Eail: dawang@it.edu Venkat Chandar Lincoln Laboratory,

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval

Uniform Approximation and Bernstein Polynomials with Coefficients in the Unit Interval Unifor Approxiation and Bernstein Polynoials with Coefficients in the Unit Interval Weiang Qian and Marc D. Riedel Electrical and Coputer Engineering, University of Minnesota 200 Union St. S.E. Minneapolis,

More information

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13

CSE525: Randomized Algorithms and Probabilistic Analysis May 16, Lecture 13 CSE55: Randoied Algoriths and obabilistic Analysis May 6, Lecture Lecturer: Anna Karlin Scribe: Noah Siegel, Jonathan Shi Rando walks and Markov chains This lecture discusses Markov chains, which capture

More information

Feature Extraction Techniques

Feature Extraction Techniques Feature Extraction Techniques Unsupervised Learning II Feature Extraction Unsupervised ethods can also be used to find features which can be useful for categorization. There are unsupervised ethods that

More information

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER

ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER IEPC 003-0034 ANALYSIS OF HALL-EFFECT THRUSTERS AND ION ENGINES FOR EARTH-TO-MOON TRANSFER A. Bober, M. Guelan Asher Space Research Institute, Technion-Israel Institute of Technology, 3000 Haifa, Israel

More information

A Markov Framework for the Simple Genetic Algorithm

A Markov Framework for the Simple Genetic Algorithm A arkov Fraework for the Siple Genetic Algorith Thoas E. Davis*, Jose C. Principe Electrical Engineering Departent University of Florida, Gainesville, FL 326 *WL/NGS Eglin AFB, FL32542 Abstract This paper

More information

Algebraic Approach for Performance Bound Calculus on Transportation Networks

Algebraic Approach for Performance Bound Calculus on Transportation Networks Algebraic Approach for Perforance Bound Calculus on Transportation Networks (Road Network Calculus) Nadir Farhi, Habib Haj-Sale & Jean-Patrick Lebacque Université Paris-Est, IFSTTAR, GRETTIA, F-93166 Noisy-le-Grand,

More information

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs

Algorithms for parallel processor scheduling with distinct due windows and unit-time jobs BULLETIN OF THE POLISH ACADEMY OF SCIENCES TECHNICAL SCIENCES Vol. 57, No. 3, 2009 Algoriths for parallel processor scheduling with distinct due windows and unit-tie obs A. JANIAK 1, W.A. JANIAK 2, and

More information

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS

DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS ISSN 1440-771X AUSTRALIA DEPARTMENT OF ECONOMETRICS AND BUSINESS STATISTICS An Iproved Method for Bandwidth Selection When Estiating ROC Curves Peter G Hall and Rob J Hyndan Working Paper 11/00 An iproved

More information

An Extension to the Tactical Planning Model for a Job Shop: Continuous-Time Control

An Extension to the Tactical Planning Model for a Job Shop: Continuous-Time Control An Extension to the Tactical Planning Model for a Job Shop: Continuous-Tie Control Chee Chong. Teo, Rohit Bhatnagar, and Stephen C. Graves Singapore-MIT Alliance, Nanyang Technological Univ., and Massachusetts

More information

1 Proof of learning bounds

1 Proof of learning bounds COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #4 Scribe: Akshay Mittal February 13, 2013 1 Proof of learning bounds For intuition of the following theore, suppose there exists a

More information

The degree of a typical vertex in generalized random intersection graph models

The degree of a typical vertex in generalized random intersection graph models Discrete Matheatics 306 006 15 165 www.elsevier.co/locate/disc The degree of a typical vertex in generalized rando intersection graph odels Jerzy Jaworski a, Michał Karoński a, Dudley Stark b a Departent

More information

A remark on a success rate model for DPA and CPA

A remark on a success rate model for DPA and CPA A reark on a success rate odel for DPA and CPA A. Wieers, BSI Version 0.5 andreas.wieers@bsi.bund.de Septeber 5, 2018 Abstract The success rate is the ost coon evaluation etric for easuring the perforance

More information

Statistics and Probability Letters

Statistics and Probability Letters Statistics and Probability Letters 79 2009 223 233 Contents lists available at ScienceDirect Statistics and Probability Letters journal hoepage: www.elsevier.co/locate/stapro A CLT for a one-diensional

More information

Supplement to: Subsampling Methods for Persistent Homology

Supplement to: Subsampling Methods for Persistent Homology Suppleent to: Subsapling Methods for Persistent Hoology A. Technical results In this section, we present soe technical results that will be used to prove the ain theores. First, we expand the notation

More information

Support recovery in compressed sensing: An estimation theoretic approach

Support recovery in compressed sensing: An estimation theoretic approach Support recovery in copressed sensing: An estiation theoretic approach Ain Karbasi, Ali Horati, Soheil Mohajer, Martin Vetterli School of Coputer and Counication Sciences École Polytechnique Fédérale de

More information

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion

Supplementary Material for Fast and Provable Algorithms for Spectrally Sparse Signal Reconstruction via Low-Rank Hankel Matrix Completion Suppleentary Material for Fast and Provable Algoriths for Spectrally Sparse Signal Reconstruction via Low-Ran Hanel Matrix Copletion Jian-Feng Cai Tianing Wang Ke Wei March 1, 017 Abstract We establish

More information

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40

On Poset Merging. 1 Introduction. Peter Chen Guoli Ding Steve Seiden. Keywords: Merging, Partial Order, Lower Bounds. AMS Classification: 68W40 On Poset Merging Peter Chen Guoli Ding Steve Seiden Abstract We consider the follow poset erging proble: Let X and Y be two subsets of a partially ordered set S. Given coplete inforation about the ordering

More information

Ensemble Based on Data Envelopment Analysis

Ensemble Based on Data Envelopment Analysis Enseble Based on Data Envelopent Analysis So Young Sohn & Hong Choi Departent of Coputer Science & Industrial Systes Engineering, Yonsei University, Seoul, Korea Tel) 82-2-223-404, Fax) 82-2- 364-7807

More information

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material

Consistent Multiclass Algorithms for Complex Performance Measures. Supplementary Material Consistent Multiclass Algoriths for Coplex Perforance Measures Suppleentary Material Notations. Let λ be the base easure over n given by the unifor rando variable (say U over n. Hence, for all easurable

More information

OPTIMIZATION in multi-agent networks has attracted

OPTIMIZATION in multi-agent networks has attracted Distributed constrained optiization and consensus in uncertain networks via proxial iniization Kostas Margellos, Alessandro Falsone, Sione Garatti and Maria Prandini arxiv:603.039v3 [ath.oc] 3 May 07 Abstract

More information

IN modern society that various systems have become more

IN modern society that various systems have become more Developent of Reliability Function in -Coponent Standby Redundant Syste with Priority Based on Maxiu Entropy Principle Ryosuke Hirata, Ikuo Arizono, Ryosuke Toohiro, Satoshi Oigawa, and Yasuhiko Takeoto

More information

Ph 20.3 Numerical Solution of Ordinary Differential Equations

Ph 20.3 Numerical Solution of Ordinary Differential Equations Ph 20.3 Nuerical Solution of Ordinary Differential Equations Due: Week 5 -v20170314- This Assignent So far, your assignents have tried to failiarize you with the hardware and software in the Physics Coputing

More information

Research Article Approximate Multidegree Reduction of λ-bézier Curves

Research Article Approximate Multidegree Reduction of λ-bézier Curves Matheatical Probles in Engineering Volue 6 Article ID 87 pages http://dxdoiorg//6/87 Research Article Approxiate Multidegree Reduction of λ-bézier Curves Gang Hu Huanxin Cao and Suxia Zhang Departent of

More information

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x)

Ştefan ŞTEFĂNESCU * is the minimum global value for the function h (x) 7Applying Nelder Mead s Optiization Algorith APPLYING NELDER MEAD S OPTIMIZATION ALGORITHM FOR MULTIPLE GLOBAL MINIMA Abstract Ştefan ŞTEFĂNESCU * The iterative deterinistic optiization ethod could not

More information

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation Course Notes for EE227C (Spring 2018): Convex Optiization and Approxiation Instructor: Moritz Hardt Eail: hardt+ee227c@berkeley.edu Graduate Instructor: Max Sichowitz Eail: sichow+ee227c@berkeley.edu October

More information

4 = (0.02) 3 13, = 0.25 because = 25. Simi-

4 = (0.02) 3 13, = 0.25 because = 25. Simi- Theore. Let b and be integers greater than. If = (. a a 2 a i ) b,then for any t N, in base (b + t), the fraction has the digital representation = (. a a 2 a i ) b+t, where a i = a i + tk i with k i =

More information

3.8 Three Types of Convergence

3.8 Three Types of Convergence 3.8 Three Types of Convergence 3.8 Three Types of Convergence 93 Suppose that we are given a sequence functions {f k } k N on a set X and another function f on X. What does it ean for f k to converge to

More information

Chaotic Coupled Map Lattices

Chaotic Coupled Map Lattices Chaotic Coupled Map Lattices Author: Dustin Keys Advisors: Dr. Robert Indik, Dr. Kevin Lin 1 Introduction When a syste of chaotic aps is coupled in a way that allows the to share inforation about each

More information

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy

Nonmonotonic Networks. a. IRST, I Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I Povo (Trento) Italy Storage Capacity and Dynaics of Nononotonic Networks Bruno Crespi a and Ignazio Lazzizzera b a. IRST, I-38050 Povo (Trento) Italy, b. Univ. of Trento, Physics Dept., I-38050 Povo (Trento) Italy INFN Gruppo

More information

Modeling Chemical Reactions with Single Reactant Specie

Modeling Chemical Reactions with Single Reactant Specie Modeling Cheical Reactions with Single Reactant Specie Abhyudai Singh and João edro Hespanha Abstract A procedure for constructing approxiate stochastic odels for cheical reactions involving a single reactant

More information

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links

Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Time-Varying Jamming Links Vulnerability of MRD-Code-Based Universal Secure Error-Correcting Network Codes under Tie-Varying Jaing Links Jun Kurihara KDDI R&D Laboratories, Inc 2 5 Ohara, Fujiino, Saitaa, 356 8502 Japan Eail: kurihara@kddilabsjp

More information

Optical Properties of Plasmas of High-Z Elements

Optical Properties of Plasmas of High-Z Elements Forschungszentru Karlsruhe Techni und Uwelt Wissenschaftlishe Berichte FZK Optical Properties of Plasas of High-Z Eleents V.Tolach 1, G.Miloshevsy 1, H.Würz Project Kernfusion 1 Heat and Mass Transfer

More information

The Transactional Nature of Quantum Information

The Transactional Nature of Quantum Information The Transactional Nature of Quantu Inforation Subhash Kak Departent of Coputer Science Oklahoa State University Stillwater, OK 7478 ABSTRACT Inforation, in its counications sense, is a transactional property.

More information

Pattern Recognition and Machine Learning. Artificial Neural networks

Pattern Recognition and Machine Learning. Artificial Neural networks Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016 Lessons 7 14 Dec 2016 Outline Artificial Neural networks Notation...2 1. Introduction...3... 3 The Artificial

More information

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network

Warning System of Dangerous Chemical Gas in Factory Based on Wireless Sensor Network 565 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 59, 07 Guest Editors: Zhuo Yang, Junie Ba, Jing Pan Copyright 07, AIDIC Servizi S.r.l. ISBN 978-88-95608-49-5; ISSN 83-96 The Italian Association

More information

REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION

REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION ISSN 139 14X INFORMATION TECHNOLOGY AND CONTROL, 008, Vol.37, No.3 REDUCTION OF FINITE ELEMENT MODELS BY PARAMETER IDENTIFICATION Riantas Barauskas, Vidantas Riavičius Departent of Syste Analysis, Kaunas

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

Chapter 6 1-D Continuous Groups

Chapter 6 1-D Continuous Groups Chapter 6 1-D Continuous Groups Continuous groups consist of group eleents labelled by one or ore continuous variables, say a 1, a 2,, a r, where each variable has a well- defined range. This chapter explores:

More information

A Theoretical Analysis of a Warm Start Technique

A Theoretical Analysis of a Warm Start Technique A Theoretical Analysis of a War Start Technique Martin A. Zinkevich Yahoo! Labs 701 First Avenue Sunnyvale, CA Abstract Batch gradient descent looks at every data point for every step, which is wasteful

More information