Extreme Value FEC for Reliable Broadcasting in Wireless Networks

Weiyao Xiao and David Starobinski
Department of Electrical and Computer Engineering, Boston University, Boston, MA 02215
Email: {weiyao, staro}@bu.edu

Abstract: The advent of practical rateless codes enables the implementation of highly efficient packet-level forward error correction (FEC) strategies for reliable data broadcasting in loss-prone wireless networks, such as sensor networks. Yet the critical question of accurately quantifying the proper amount of redundancy has remained largely unsolved. In this paper, we exploit advances in extreme value theory to rigorously address this problem. Under the asymptotic regime of a large number of receivers, we derive a closed-form expression for the cumulative distribution function (CDF) of the completion time of file distribution. We show the existence of a phase transition associated with this CDF and accurately locate the transition point. We derive tight convergence bounds demonstrating the accuracy of the asymptotic estimate for the practical case of a finite number of receivers. Further, we asymptotically characterize the CDF of the completion time under heterogeneous packet loss by establishing a close relationship between the data broadcasting and multiset coupon collector problems. We demonstrate the benefits of our approach through simulation and through real experiments on a testbed of Tmote Sky sensors. Specifically, we augment the existing Rateless Deluge software dissemination protocol with an extreme value FEC strategy. The experimental results reveal a reduction by a factor of five in retransmission request messages and by a factor of two in total dissemination time, at the cost of a marginally higher number of data packet transmissions, on the order of 5%.

Index Terms: Extreme Value Theory, Coupon Collector's Problem, Forward Error Correction (FEC), Rateless Coding, Over-the-Air Programming.

I. INTRODUCTION

Reliable data broadcasting for wireless networks is an essential service supporting a plethora of applications, including distribution of text and multimedia content, podcasting, and over-the-air programming (OAP) [ 6]. The lossy nature of wireless channels, however, significantly complicates the task of reliable broadcasting. Due to the potentially large number of wireless devices, the so-called broadcast storm [7] phenomenon may arise when multiple receivers contend over a shared channel to request retransmissions of lost packets via either acknowledgement (ACK) or negative acknowledgement (NACK) messages. Although some mechanisms exist to mitigate the broadcast storm problem, such as ACK and NACK suppression [8], the impact of this problem can still be considerable [2, 5].

A preliminary version of this paper appeared in the proceedings of the IEEE INFOCOM'09 conference.

Ideally, instead of relying on receivers to notify a source about missing packets, a procedure commonly referred to as automatic repeat request (ARQ), the source should be able to accurately predict the total number of transmissions required and send out data without the need for acknowledgements. Packet-level forward error correction (FEC) [9] provides a practical approach towards implementing this idea. With the advent of rateless codes, such as random linear codes, LT, and Raptor codes [, ], FEC can be implemented in a very efficient fashion, whereby a source continuously encodes new packets based on the M original packets of a given file. The source then sends out the encoded packets, and as soon as a receiver obtains M (or slightly more) distinct packets, it can reconstruct the entire file successfully. Although FEC broadcasting has been shown to outperform ARQ in many cases [2, 3], the major issue of quantifying the proper amount of redundancy has remained unsolved to a large extent (cf. Section II for related work).
While transmitting too many redundant packets wastes bandwidth and energy, too little redundancy leaves many receivers unable to reconstruct the original file, leading to retransmission requests and eventually to the same problems as those encountered by ARQ schemes.

In this paper, we exploit advances in extreme value theory (EVT) [4] to rigorously address the problem of quantifying FEC redundancy in lossy wireless broadcast networks. Our main contributions are as follows.

First, under the asymptotic regime of a large number of receivers N, we derive a closed-form expression for the cumulative distribution function (CDF) of the completion time (i.e., the total number of packets to be sent by a source to ensure file recovery by all the receivers). Our analysis reveals the existence of a phase transition property associated with this CDF. Specifically, we show that there exists a threshold on the number of packets to be sent below which the probability that a file can be recovered by all the nodes in the network is close to zero. However, if the number of packets sent is slightly greater than the threshold, then the probability that every node in the network is able to reconstruct the file quickly approaches one. We accurately locate the threshold value and conduct a sensitivity analysis for the case where the packet loss probability and the number of receivers are imperfectly known. Further, we extend the analysis to the case where complete file reception is only required for N - K out of N receivers, where K is a small, fixed number.

Our second contribution is the derivation of tight convergence bounds for a finite number of receivers N. These bounds allow us to estimate the error committed by replacing the exact CDF by its limiting form. They also provide a means to compute the amount of redundancy needed for finite values of N. The bounds reveal that the asymptotic formula is remarkably accurate even for small values of N.

Third, we analyze the heterogeneous packet loss case, whereby different receivers experience different packet loss probabilities. To this effect, we establish a relationship between the data broadcasting problem and the multi-set coupon collector's problem [5]. Exploiting this relationship, we provide asymptotically tight bounds on the CDF of the completion time to successfully disseminate a file to a set of receivers with heterogeneous packet loss probabilities.

Last, we conduct real experiments on a testbed of Tmote Sky sensors that illustrate the practical use of our theoretical findings. Specifically, we embed our extreme value FEC strategy into the Rateless Deluge OAP protocol [5] and demonstrate the potential for a significant reduction in retransmission requests (about 80%) and in completion time (about 50%), at the cost of marginally more data packet transmissions (less than 5%) with respect to the original protocol.

The rest of this paper is organized as follows. We first discuss related research on FEC data broadcasting in Section II. After reviewing the basics of extreme value theory in Section III, we present our network model and problem formalization for homogeneous packet loss in Section IV-A. We conduct an asymptotic analysis of FEC broadcasting as $N \to \infty$ in Section IV-B, derive convergence bounds for finite N in Section IV-C, and perform a sensitivity analysis in Section IV-D. An asymptotic analysis of the completion time under heterogeneous packet loss is carried out in Section V. We present our simulation results and prototype implementation in Sections VI and VII, respectively, and conclude the paper in Section VIII.
Due to space limitations, the proofs of some theorems are omitted. They can be found in [6].

II. RELATED WORK

Reliable data dissemination is a key enabling technology for wireless sensor networks. It provides fundamental services, such as the dissemination of a software program from one source to an entire network [ 6]. Thus, the problem considered in this paper is different from that of data aggregation, where multiple sensors send their data to a sink [7, 8].

The concept of exploiting FEC for reliable multicasting/broadcasting has been the subject of a considerable amount of work, in both wireline and wireless settings. We survey here only the most closely related work. (The terms multicasting and broadcasting are used interchangeably in this paper.) Rubenstein et al. [9] propose a multicast protocol that requires a source to forward redundant packets in advance. This protocol is shown to achieve a significant decrease in the expected time for reliable delivery of data. Rizzo et al. propose RMDP [], another FEC-based reliable multicast protocol, and show that FEC effectively reduces the number of acknowledgments. However, the problem of quantifying FEC redundancy remains unsolved.

Huitema [2] and Nonnenmacher et al. [22] evaluate the performance improvements achieved with different levels of FEC redundancy via numerical computation. Ghaderi et al. [3] and Mosko et al. [23] obtain numerical evaluations of the distribution of the completion time. However, no closed form is provided to relate the redundancy needed to the probability of success. Eryilmaz et al. [24] provide a recursive expression for the average completion time, and Ghaderi et al. [3] derive an asymptotic expression for it. However, they do not provide results for the CDF. Furthermore, to our knowledge, our work is the first to demonstrate the phase transition associated with this CDF and to derive bounds on the asymptotic error for the case of a finite number of receivers.
While in practice the packet loss probability differs from node to node due to many factors (e.g., link quality, distance to the source, antenna sensitivity), all the following references [4, 3, 9 2] assume homogeneous packet loss rates in their analysis. The works in [22, 23] provide analyses for heterogeneous packet loss scenarios. However, unlike in our paper, their results are only numerical.

III. BACKGROUND

A. Extreme Value Theory

Let $X_1, X_2, \ldots, X_N$ be independent, identically distributed (i.i.d.) random variables. Extreme value theory provides tools for characterizing the possible limit distributions of the sample maxima of these random variables. Denote by F the CDF of $X_1$ and by $F^N$ the CDF of the maximum of $X_1, X_2, \ldots, X_N$. Suppose there exist sequences of constants $a_N > 0$ and $b_N$ such that $(\max(X_1, X_2, \ldots, X_N) - b_N)/a_N$ has a nondegenerate limit distribution as $N \to \infty$. Then

$$\lim_{N\to\infty} F^N(a_N x + b_N) = G(x), \tag{1}$$

or equivalently,

$$\lim_{N\to\infty} N\left[1 - F(a_N x + b_N)\right] = -\log G(x), \tag{2}$$

where G(x) is the CDF of one of the three extreme value distributions, namely Fréchet, Gumbel and Weibull [4, p. 9]. For a given random variable, various tests exist to determine its domain of attraction (i.e., the corresponding extreme value distribution) and its normalizing constants. In this paper, all the distributions of interest belong to the domain of attraction of the Gumbel distribution, i.e.,

$$G(x) = \exp\left(-e^{-x}\right), \quad x \in \mathbb{R}. \tag{3}$$

Under mild technical assumptions [4, p. 77], the domain of attraction conditions also imply moment convergence. Thus, for distributions belonging to the Gumbel domain of attraction,

$$\lim_{N\to\infty} \frac{E[\max(X_1, X_2, \ldots, X_N)] - b_N}{a_N} = \gamma, \tag{4}$$

where $\gamma \approx 0.5772$ is Euler's constant.

The above result can be generalized to characterize the asymptotic distribution of the K-th largest random variable, where K is a fixed number. Arrange the random variables $X_i$ in increasing order as follows: $X_{1:N} \le X_{2:N} \le \cdots \le X_{N:N}$. Thus, $X_{N:N} = \max_{n=1,\ldots,N} X_n$

is the largest random variable, $X_{N-1:N}$ is the second largest random variable, etc. According to [25, p. 27], if

$$\lim_{N\to\infty} \Pr\left\{\frac{X_{N:N} - b_N}{a_N} \le x\right\} = G(x), \tag{5}$$

then

$$\lim_{N\to\infty} \Pr\left\{\frac{X_{N-K:N} - b_N}{a_N} \le x\right\} = G(x) \sum_{i=0}^{K} \frac{\left(-\log G(x)\right)^i}{i!}. \tag{6}$$

B. Convergence Metric

As mentioned above, an estimate based on EVT assumes $N \to \infty$. We now provide a metric to study the quality of convergence when N is finite. Specifically, we fix a value y on the y-axis and measure the distance on the x-axis between the points corresponding to the exact distribution $F^N$ and the limit distribution G. As shown in Fig. 1, let

$$G(x^*) = y, \qquad F^N(a_N x + b_N) = y; \tag{7}$$

then the convergence metric is defined as

$$\Delta = x - x^*. \tag{8}$$

Fig. 1. Illustration of convergence metric.

In the following section, we derive a bound on $\Delta$ that applies uniformly to an entire interval $[y_l, y_h]$, where $y_l \le y_h < 1$. If the desired completion probability y is known in advance, then $y_l$ and $y_h$ can simply be set to y, leading to a tighter bound on $\Delta$. Otherwise, one can select a larger interval, and the bound will apply to all values of y belonging to that interval.

IV. THE HOMOGENEOUS CASE: LIMIT DISTRIBUTION AND CONVERGENCE BOUNDS

A. Model and Problem Formulation

We consider the problem of broadcasting a file consisting of M packets from a source (e.g., a base station) to N nodes within its transmission range. The time axis is slotted, and each packet transmission takes one time slot. In this section, we assume that each node experiences the same packet loss probability ρ, independent of any other events. We assume that FEC is implemented using a perfect rateless code, i.e., each node needs to correctly receive M distinct packets to recover a file. Thus, a source transmits new packets until all the nodes have received M different packets. If slightly more packets are needed (say M') because of the imperfection of codes, then one just needs to replace M by M' in the following analysis. Denote by T the random variable representing the completion time, i.e.,
the number of time slots needed to disseminate M packets to a cluster of N nodes. Our goal is to characterize the CDF of T, namely $\Pr\{T \le t\}$, from which one can determine the number of redundant packets needed in FEC. Towards this end, we use EVT to characterize the limiting form of the CDF of T as $N \to \infty$, and then derive bounds on the error for finite values of N. In this paper, we do not enter into the details of how to estimate the network parameters N and ρ. We refer the interested reader to [5, 26] for possible approaches. Nevertheless, in Section IV-D, we study the robustness of our FEC prediction model to inaccurate estimation of these parameters.

B. Asymptotic Analysis of Completion Time

Denote by $T_n^m$ the number of slots node n needs to receive its m-th packet after having received its (m-1)-th packet, $1 \le m \le M$. Clearly, $T_n^m$ follows a geometric distribution with mean $1/(1-\rho)$, i.e., $\Pr\{T_n^m = i\} = \rho^{i-1}(1-\rho)$. Thus, the time $T_n$ needed for node n to receive M different packets is the sum of M i.i.d. geometric random variables with mean $1/(1-\rho)$, i.e., $T_n = \sum_{m=1}^{M} T_n^m$, and $T_n$ is said to follow a negative binomial or Pascal distribution [27, p. 64]. Due to the broadcast nature of the channel, the completion time for broadcasting a file to all the nodes is the maximum of N negative binomial random variables, i.e., $T = \max(T_1, T_2, \ldots, T_N)$. The following theorem tightly bounds the distribution of T as $N \to \infty$. Before proceeding, we recall the definition of stochastic ordering [28, p. 44].

Definition 1: A random variable X is stochastically larger than a random variable Y, denoted $X \ge_{st} Y$, if

$$\Pr(X > a) \ge \Pr(Y > a) \quad \text{for all } a. \tag{9}$$

Theorem 1: The completion time T to disseminate M packets to N nodes using FEC broadcasting is bounded by random variables belonging to the Gumbel domain of attraction. Namely, there exist $T_l$ and $T_u = T_l + 1$ satisfying

$$T_l \le_{st} T \le_{st} T_u, \tag{10}$$

$$\lim_{N\to\infty} \Pr\left\{\frac{T_l - b_N}{a_N} \le x\right\} = G(x), \tag{11}$$

where

$$a_N = \frac{1}{\log(1/\rho)}, \tag{12}$$

$$b_N = \log_{1/\rho}(N) + (M-1)\log_{1/\rho}(\tau) + (M-1)\log_{1/\rho}\left(\frac{1-\rho}{\rho}\right) - \log_{1/\rho}\big((M-1)!\big), \tag{13}$$

$$\tau = \log_{1/\rho}(N) + (M-1)\log_{1/\rho}\left(\frac{1-\rho}{\rho}\right). \tag{14}$$

Proof: Let $D(t) = \Pr\{T_n \le t\}$.
Since $T_n$ follows a negative binomial distribution, $D(t) = I(1-\rho; M, t-M+1)$ [27, p. 64], [29, p. 59], where t ($t \ge M$) is an integer and $I(z; a, b)$ is the regularized beta function, defined as follows [29, p. 56]:

$$I(z; a, b) = 1 - \frac{(1-z)^b}{B(a,b)} \sum_{i=0}^{a-1} \binom{a-1}{i} \frac{(-1)^i (1-z)^i}{b+i}, \tag{15}$$

where a and b are integers, and B(a, b) is the complete beta function [29, pp. 594, 597],

$$B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}, \tag{16}$$

where Γ is the gamma function, i.e., $\Gamma(z) = \int_0^\infty x^{z-1} e^{-x}\,dx$, and $\Gamma(a) = (a-1)\Gamma(a-1)$ for $a > 1$ [27, p. 66].

Create a continuous random variable $T_u^n$ with CDF $F(x) = I(1-\rho; M, x-M+1)$, where $x > M$. Let $T_l^n = T_u^n - 1$. The CDF of $T_l^n$ is thus $F(x+1)$. According to Eqs. (15) and (16), we have

$$F(x) = I(1-\rho; M, x-M+1) \tag{17}$$

$$= 1 - \frac{\rho^{x-M+1}}{B(M, x-M+1)} \sum_{i=0}^{M-1} \binom{M-1}{i} \frac{(-1)^i \rho^i}{x-M+1+i} \tag{18}$$

$$= 1 - \frac{\prod_{j=0}^{M-1}(x-j)}{(M-1)!}\, \rho^{x-M+1} \sum_{i=0}^{M-1} \binom{M-1}{i} \frac{(-1)^i \rho^i}{x-M+1+i}. \tag{19}$$

Let $\tilde{D}(x) = D(\lceil x \rceil)$. Since $I(1-\rho; M, x-M+1)$ is an increasing function of x, we have

$$F(x+1) \ge \tilde{D}(x) \ge F(x). \tag{20}$$

From Eq. (20), we have $T_l^n \le_{st} T_n \le_{st} T_u^n$ according to Definition 1. Let $T_u = \max_{n=1,\ldots,N} T_u^n$ and $T_l = \max_{n=1,\ldots,N} T_l^n = T_u - 1$. Since the max operation preserves stochastic ordering, we have $T_l \le_{st} T \le_{st} T_u$. Inserting Eqs. (12) and (13) into Eq. (19) yields

$$\lim_{N\to\infty} N\left[1 - F(a_N x + b_N)\right] = \lim_{N\to\infty} \frac{N \prod_{j=0}^{M-1}(a_N x + b_N - j)\, \rho^{a_N x + b_N - (M-1)}}{(M-1)!} \sum_{i=0}^{M-1}\binom{M-1}{i}\frac{(-1)^i\rho^i}{a_N x + b_N - M + 1 + i}. \tag{21}$$

According to Eqs. (12) and (13), we have

$$\rho^{a_N x + b_N} = \rho^{a_N x}\,\rho^{b_N} \tag{22}$$

$$= e^{-x} \left(\frac{\rho}{1-\rho}\right)^{M-1} \frac{(M-1)!}{N \tau^{M-1}}. \tag{23}$$

Therefore, from Eqs. (21) and (23), we obtain

$$\lim_{N\to\infty} N\left[1 - F(a_N x + b_N)\right] = \lim_{N\to\infty} \frac{\prod_{j=0}^{M-1}(a_N x + b_N - j)\, e^{-x}}{\tau^{M-1}(1-\rho)^{M-1}} \sum_{i=0}^{M-1}\binom{M-1}{i}\frac{(-1)^i\rho^i}{a_N x + b_N - M + 1 + i}. \tag{24}$$

Noting that $b_N \to \infty$ as $N \to \infty$, we get

$$\lim_{N\to\infty} \frac{\sum_{i=0}^{M-1}\binom{M-1}{i}\frac{(-1)^i\rho^i}{a_N x + b_N - M + 1 + i}}{\sum_{i=0}^{M-1}\binom{M-1}{i}\frac{(-1)^i\rho^i}{a_N x + b_N}} = 1, \tag{25}$$

$$\sum_{i=0}^{M-1}\binom{M-1}{i}\frac{(-1)^i\rho^i}{a_N x + b_N} = \frac{1}{a_N x + b_N}\sum_{i=0}^{M-1}\binom{M-1}{i}(-\rho)^i \tag{26}$$

$$= \frac{(1-\rho)^{M-1}}{a_N x + b_N}. \tag{27}$$

The last step follows from the binomial theorem. Inserting Eqs. (25)-(27) into Eq. (24) yields

$$\lim_{N\to\infty} N\left[1 - F(a_N x + b_N)\right] = \lim_{N\to\infty} \frac{\prod_{j=0}^{M-1}(a_N x + b_N - j)}{\tau^{M-1}(a_N x + b_N)}\, e^{-x}. \tag{28}$$

Consider the expression

$$\frac{\prod_{j=0}^{M-1}(a_N x + b_N - j)}{\tau^{M-1}(a_N x + b_N)}. \tag{29}$$

According to Eqs. (12), (13), and (14), the dominating component of both the numerator and the denominator of that expression is $\left(\log_{1/\rho}(N)\right)^M$, so the expression tends to 1. Accordingly, from Eq. (28), we obtain

$$\lim_{N\to\infty} N\left[1 - F(a_N x + b_N)\right] = e^{-x} = -\log G(x). \tag{30}$$

Thus, F is in the domain of attraction of G with normalizing constants $a_N$ and $b_N$ [4], namely,

$$\lim_{N\to\infty} \Pr\left\{\frac{T_l - b_N}{a_N} \le x\right\} = G(x). \tag{31}$$

Theorem 1 shows that as $N \to \infty$, the CDF of the completion time converges to a scaled and shifted Gumbel distribution, namely,

$$\Pr\{T \le t\} \approx G\left(\frac{t - b_N}{a_N}\right). \tag{32}$$

Since the packet loss probability ρ is usually small, $a_N$ is small as well, and the completion time distribution has a sharp phase transition around the point $b_N$. This is verified by our numerical results in Section VI.

As a corollary of the theorem, we can also asymptotically characterize the distribution of the completion time when allowing up to K receivers not to recover the entire file. We denote the corresponding random variable by $T_{N-K:N}$. From Theorem 1 and Eq. (6), we have

$$\Pr(T_{N-K:N} \le t) \approx G\left(\frac{t - b_N}{a_N}\right) \sum_{i=0}^{K} \frac{e^{-i(t-b_N)/a_N}}{i!}. \tag{33}$$

Such a characterization sheds light on the trade-off between the stringency of the requirement for file completion and the amount of FEC redundancy. For instance, according to Eqs. (32) and (33), if the source transmits $t = a_N + b_N$

packets, the probability that at least N - 1 nodes receive the file is about 37% larger than the probability that all N nodes receive the file: at $(t - b_N)/a_N = 1$, the ratio of Eq. (33) with K = 1 to Eq. (32) is $1 + e^{-1} \approx 1.37$. We investigate this trade-off further in the numerical results section. Another corollary of the theorem is that the performance of rateless coding on a single channel is identical to that of plaintext coding over an unlimited number of channels [4].

C. Convergence Bounds

Theorem 1 characterizes the limiting form of the CDF of T as $N \to \infty$. The following theorem bounds the asymptotic error using the convergence criterion defined in Eq. (8); that is, it bounds the distance between the asymptotic estimate $x^*$ and the exact value x.

Theorem 2: The distance $\Delta = x - x^*$ between the exact distribution $\Pr\{T \le a_N x + b_N\} = y$ and the Gumbel distribution $G(x^*) = y$ is bounded as follows for all probability values y belonging to the interval $[y_l, y_h]$:

l + h + log( + N log ), (34) y (M ) 2

where

l = log 2( G (y l ) + b N ) M + 3 τ + (M )( G ), (35) (y l ) + b N +

h =(M ) log( ) + (M ) G (y h ) + b N + τ, (36) τ

and $G^{-1}$ is the inverse function of G.

Theorem 2 provides a means to conservatively implement FEC for finite values of N: if one wants to guarantee a completion probability y, then the source should transmit at least $a_N(x^* + \Delta) + b_N$ packets, with Δ replaced by its upper bound from Eq. (34). The bound provided by Eq. (34) also exhibits the desirable property of becoming tighter as the number of recipient nodes N increases and as the completion probability y approaches 1.

D. Sensitivity Analysis

In practice, one seldom has perfect information on the network parameters ρ and N. We next analyze the effect of imperfect estimation of these parameters on the computation of the FEC redundancy. Denote by $\hat N$ and $\hat\rho$ the estimates of N and ρ, respectively. Without loss of generality, we can write $\hat N = (1 + \epsilon_N) N$ and $\hat\rho = (1 + \epsilon_\rho)\rho$. A positive value of $\epsilon_N$ means that N is overestimated, while a negative value of $\epsilon_N$ means that N is underestimated. A similar relation holds between $\epsilon_\rho$ and ρ.
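The closed-form quantities of Theorem 1 lend themselves to a small redundancy calculator, which is also a convenient way to see how estimated values of N and ρ propagate into the packet count. The Python sketch below is illustrative only and rests on stated assumptions: the function names and the example parameters (N = 100, M = 20, ρ = 5%, y = 99%) are hypothetical, the channel is the idealized model of Section IV-A (perfect rateless code, i.i.d. losses), and the number of packets is taken as the smallest integer t with $t \ge b_N + a_N G^{-1}(y)$, where $G^{-1}(y) = -\log(-\log y)$ inverts Eq. (3).

```python
import math
import random


def gumbel_cdf(x):
    """Standard Gumbel CDF G(x) = exp(-e^{-x}), Eq. (3)."""
    if x < -30:                       # G(x) underflows to 0 long before this point
        return 0.0
    return math.exp(-math.exp(-x))


def normalizing_constants(N, M, rho):
    """Return (a_N, b_N, tau) of Eqs. (12)-(14); assumes N >= 2, M >= 1, 0 < rho < 0.5."""
    log_inv_rho = math.log(1.0 / rho)            # log_{1/rho}(z) = log(z) / log(1/rho)
    odds = math.log((1.0 - rho) / rho)
    a_N = 1.0 / log_inv_rho
    tau = (math.log(N) + (M - 1) * odds) / log_inv_rho
    log_tau_term = (M - 1) * math.log(tau) if M > 1 else 0.0
    b_N = (math.log(N) + log_tau_term + (M - 1) * odds
           - math.lgamma(M)) / log_inv_rho       # lgamma(M) = log((M - 1)!)
    return a_N, b_N, tau


def completion_cdf(t, N, M, rho, K=0):
    """Asymptotic Pr{T_{N-K:N} <= t}: Eq. (32) for K = 0, Eq. (33) for small K > 0."""
    a_N, b_N, _ = normalizing_constants(N, M, rho)
    x = (t - b_N) / a_N
    if x < -30:
        return 0.0
    return gumbel_cdf(x) * sum(math.exp(-i * x) / math.factorial(i) for i in range(K + 1))


def packets_for(y, N, M, rho):
    """Smallest integer t whose asymptotic completion probability, Eq. (32), is >= y."""
    a_N, b_N, _ = normalizing_constants(N, M, rho)
    x_star = -math.log(-math.log(y))             # G^{-1}(y)
    return math.ceil(b_N + a_N * x_star)


def simulate_completion(N, M, rho, rng):
    """One Monte Carlo draw of T = max_n T_n under i.i.d. per-slot losses."""
    worst = 0
    for _ in range(N):
        slots = received = 0
        while received < M:
            slots += 1
            if rng.random() >= rho:              # packet received with probability 1 - rho
                received += 1
        worst = max(worst, slots)
    return worst


if __name__ == "__main__":
    N, M, rho, y = 100, 20, 0.05, 0.99           # hypothetical example parameters
    t = packets_for(y, N, M, rho)
    rng = random.Random(1)
    runs = 500
    hits = sum(simulate_completion(N, M, rho, rng) <= t for _ in range(runs))
    print(f"send {t} packets: asymptotic Pr = {completion_cdf(t, N, M, rho):.4f}, "
          f"simulated Pr = {hits / runs:.3f}")
```

For these hypothetical parameters the rule suggests 28 packets, i.e., 8 redundant packets for a 20-packet file. Calling `packets_for` with over- or underestimated N and ρ gives a direct numerical counterpart of the sensitivity relations derived next. The Monte Carlo check uses the same idealized model, so it tests the approximation, not real radio behavior.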
Denote by T the number of packet transmissions needed to achieve a completion probability y, as determined by Theorem 1 using the parameters N and ρ. Correspondingly, $\hat T_N$ and $\hat T_\rho$ denote the number of packet transmissions needed to achieve a completion probability y, as determined by Theorem 1 using $\hat N$ and ρ in the first case, and N and $\hat\rho$ in the second case. When ρ and M are fixed, T increases roughly logarithmically with N. Thus,

$$\hat T_N \approx T + \log_{1/\rho}(1 + \epsilon_N). \tag{37}$$

When N and M are fixed, T is approximately a linear function of $1/\log(1/\rho)$. Therefore,

$$\frac{\hat T_\rho}{T} \approx \frac{\log(1/\rho)}{\log\frac{1}{(1+\epsilon_\rho)\rho}} = \frac{\log(1/\rho)}{\log\frac{1}{1+\epsilon_\rho} + \log\frac{1}{\rho}} = \frac{1}{1 - \frac{\log(1+\epsilon_\rho)}{\log(1/\rho)}}. \tag{38}$$

For a small value of $\epsilon_\rho$, by Taylor expansion, we have

$$\log(1 + \epsilon_\rho) \approx \epsilon_\rho \tag{39}$$

and

$$\frac{1}{1 - \frac{\epsilon_\rho}{\log(1/\rho)}} \approx 1 + \frac{\epsilon_\rho}{\log(1/\rho)}. \tag{40}$$

From Eqs. (38), (39), and (40), we get

$$\hat T_\rho \approx T\left(1 + \frac{\epsilon_\rho}{\log(1/\rho)}\right). \tag{41}$$

Eqs. (37) and (41) can be used to conservatively calibrate the amount of FEC redundancy. They show that the completion time is more sensitive to the receiver packet loss probability ρ than to the number of receivers N. For example, when ρ = 5%, a % overestimate of N, namely $\epsilon_N$ = %, will result in sending only .6 extra packets on average. A % overestimate of ρ will require the transmission of 6% more packets overall, which is still reasonable.

V. THE HETEROGENEOUS CASE: LIMIT OF DISTRIBUTION AND EXPECTATION

In this section, we relax the assumption of homogeneous packet loss probabilities. We consider a model whereby receivers are deployed uniformly at random within a disk of radius R, with the source at the origin. The signal quality is discretized into L levels based on the distance from the source. Denote by $\alpha = [\alpha_1, \alpha_2, \ldots, \alpha_L]$ the distance vector (normalized by R), where $0 < \alpha_1 < \cdots < \alpha_L = 1$. Next, let $\omega_\alpha = [\omega_{\alpha_1}, \omega_{\alpha_2}, \ldots, \omega_{\alpha_L}]$ be the corresponding packet loss vector, where $0 < \omega_{\alpha_1} < \cdots < \omega_{\alpha_L} < 1$. The packet loss probability of a node is $\omega_{\alpha_l}$ if its distance from the source is between $\alpha_{l-1}R$ and $\alpha_l R$ ($\alpha_0 = 0$ by definition). This radio model, which captures the spatial correlation of packet loss, is illustrated in Fig. 2.
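A minimal sketch of this radio model follows, assuming Python's standard `random` and `bisect` modules; the function name `sample_loss_probabilities` and the example vectors α = [0.9, 1] and $\omega_\alpha$ = [0.01, 0.1] are hypothetical and chosen only for illustration. A node placed uniformly at random in a disk of radius R has normalized distance $d = \sqrt{U}$ with U uniform on [0, 1), so it falls within $\alpha_l R$ of the source with probability $\alpha_l^2$.

```python
import bisect
import math
import random


def sample_loss_probabilities(N, alpha, omega, rng):
    """Assign each of N nodes the loss probability of its ring (radio model of Fig. 2).

    alpha: increasing normalized ring radii with alpha[-1] == 1.
    omega: loss probability of each ring, same length as alpha.
    """
    if len(alpha) != len(omega) or alpha[-1] != 1.0:
        raise ValueError("alpha and omega must have equal length and alpha must end at 1")
    losses = []
    for _ in range(N):
        d = math.sqrt(rng.random())              # normalized distance of a uniform point in the disk
        ring = bisect.bisect_left(alpha, d)      # first ring l with d <= alpha_l
        losses.append(omega[ring])
    return losses


if __name__ == "__main__":
    alpha, omega = [0.9, 1.0], [0.01, 0.1]       # hypothetical two-level example
    losses = sample_loss_probabilities(20000, alpha, omega, random.Random(0))
    outer = sum(p == omega[-1] for p in losses) / len(losses)
    print(f"fraction in outermost ring: {outer:.3f} (model: {1 - alpha[-2] ** 2:.3f})")
```

The empirical fraction of nodes with the highest loss probability should be close to $\alpha_L^2 - \alpha_{L-1}^2$ (0.19 here), which is the step structure of the CDF given next.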
Then, the CDF of the packet loss probability $\rho_n$ of node n is a multi-step function defined as follows:

$$\Pr(\rho_n \le x) = \begin{cases} 0, & 0 < x < \omega_{\alpha_1}, \\ \alpha_l^2, & \omega_{\alpha_l} \le x < \omega_{\alpha_{l+1}},\ l = 1, \ldots, L-1, \\ 1, & x \ge \omega_{\alpha_L}. \end{cases} \tag{42}$$

Note that the radio model of the previous section is a special case of this model, obtained by setting L = 1 and $\omega_{\alpha_1} = \rho$.

In the remainder of this section, we first establish a relation between the FEC data broadcasting problem in wireless networks and the multi-set coupon collector's problem. This

connection enables us, in the second part of the section, to leverage recent analytical results on the asymptotic behavior of heterogeneous coupon collector systems [5] to analyze FEC data broadcasting with heterogeneous packet loss probabilities.

Fig. 2. Radio model for heterogeneous packet loss.

A. Relation between Coupon Collector and Data Broadcasting Problems

In the multi-set coupon collector's problem, a shopper tries to collect M complete sets of N different coupons in several attempts. Coupon n is associated with a value $q_n > 0$. Each attempt provides the collector with coupon n with probability $q_n / \sum_{i=1}^{N} q_i$. Assume that there is an unlimited supply of coupons of each kind. Let $\eta_{\{q_n\}}$ be the number of attempts the collector needs to make in order to obtain M complete sets of N coupons, where $\{q_n\}$ represents the set of coupon values. Asymptotic limits of the CDF of $\eta_{\{q_n\}}$ for large values of N are studied in [5].

Back to our original problem, let $\{\rho_n\}$ be the set of packet loss probabilities associated with the receivers, and let $T_{\{\rho_n\}}$ be the time to transmit M packets to N users with packet loss probabilities $\rho_1, \rho_2, \ldots, \rho_N$ using FEC data broadcasting. The following theorem establishes a relation between the CDF of $T_{\{\rho_n\}}$ and that of $\eta_{\{q_n\}}$.

Theorem 3: Suppose the packet loss probability at each receiver n, n = 1, ..., N, is an i.i.d. random variable $\rho_n$, and let $q_n = \log(1/\rho_n)$. Then, as $N \to \infty$,

$$\Pr\left(\frac{\eta_{\{q_n\}}}{N\mu} + M \le x\right) \le \Pr\left(T_{\{\rho_n\}} \le x\right) \le \Pr\left(\frac{\eta_{\{q_n\}}}{N\mu} \le x\right), \tag{43}$$

where μ is the mean of the random variable $\log(1/\rho_n)$.

B. Asymptotic Completion Time

We next derive a closed-form expression for the distribution and the expectation of the completion time as $N \to \infty$, for the heterogeneous radio model presented at the beginning of this section.

Theorem 4: If the CDF of the packet loss probability of each node n, n = 1, ..., N, satisfies Eq. (42), then, as $N \to \infty$,

G(x M ( ) T{n } b N ) Pr x G(x), (44)

$$E[T_{\{\rho_n\}}] = b'_N + \Theta(1), \tag{45}$$

where

$$\rho' = \omega_{\alpha_L}, \qquad N' = (\alpha_L^2 - \alpha_{L-1}^2)N, \tag{46}$$

$$a'_N = \frac{1}{\log(1/\rho')}, \tag{47}$$

$$b'_N = \log_{1/\rho'}(N') + (M-1)\log_{1/\rho'}(\tau') + (M-1)\log_{1/\rho'}\left(\frac{1-\rho'}{\rho'}\right) - \log_{1/\rho'}\big((M-1)!\big), \tag{48}$$

$$\tau' = \log_{1/\rho'}(N') + (M-1)\log_{1/\rho'}\left(\frac{1-\rho'}{\rho'}\right). \tag{49}$$

This theorem provides the following insight. On average, the number of nodes located within the outermost ring of the disk is $(\alpha_L^2 - \alpha_{L-1}^2)N$. By comparing Eqs. (47), (48), and (49) with Eqs. (12), (13), and (14), respectively, we can see that $T_{\{\rho_i\}}$ is asymptotically identical to the expected time needed to disseminate M packets to $(\alpha_L^2 - \alpha_{L-1}^2)N$ nodes with homogeneous packet loss probability $\omega_{\alpha_L}$. Hence, as $N \to \infty$, the time needed to disseminate packets to the nodes with the highest packet loss probability dominates.

VI. NUMERICAL RESULTS

In this section, we illustrate the major analytical findings of this paper, namely, (i) the accuracy of the asymptotic estimate of the CDF of the completion time provided by Theorem 1 and the phase transition behavior of this CDF, (ii) the tightness of the upper and lower bounds derived along the proof of Theorem 2, (iii) the tradeoff between redundancy and completion requirement, and (iv) the accuracy of the asymptotic limit on the expected completion time provided by Theorem 4. All the simulation plots are obtained by averaging results over repeated simulations with identical parameters but different random seeds.

A. Accuracy of Asymptotic Estimate

Fig. 3. Accuracy of asymptotic estimate and phase transition demonstration. (Plot: probability of completion versus number of packets sent; simulation and analytical curves for several combinations of N, M, and ρ.)

Fig. 3 compares the CDF estimated by Theorem 1 with the CDF obtained from simulation for various parameters M, N, and ρ. It is evident from the figure that the limit form provides an accurate estimate of the actual distribution. It is

interesting to note that even with a large number of receivers and a relatively high packet loss probability, we do not need a large number of redundant packets to ensure file reception (with high probability) by all the nodes. Fig. 3 also clearly demonstrates the phase transition behavior of the CDF. As expected, the CDF shifts to the right as the number of nodes N or the number of file packets M increases, but the sharpness of the phase transition is not much affected. On the other hand, the packet loss rate ρ affects both the translation and the scaling of the CDF. A smaller value of ρ shifts the CDF to the left and also results in a sharper transition.

Fig. 4. Number of packets needed to be sent to guarantee completion with probability 99%: varying different parameters. (a) Varying the number of nodes N. (b) Varying the packet loss rate ρ. (c) Varying the number of file packets M. Each panel compares simulation and analytical curves.

Fig. 5. Number of packets needed to guarantee completion with probability 99%: comparison of simulation and analytical bounds. (a) Varying N. (b) Varying ρ. (c) Varying M. Each panel shows the analytical upper bound, the simulation result, and the analytical lower bound.
As discussed in Section IV-B, the phase transition occurs around the point $b_N$. Using Eq. (13), one can compute the values of $b_N$ for the three cases shown in Fig. 3, which are found to be 2.79, 5.659, 32.. These values accurately locate the phase transition points.

According to Theorem 1, as $N \to \infty$, the CDF converges to a Gumbel distribution scaled by $a_N$ and translated to the right by $b_N$. To verify this finding, we closely examine each parameter by fixing the other two. Results are shown in Figs. 4(a), 4(b), and 4(c). We study how the phase transition of the CDF shifts as the parameters change by evaluating the case where the completion probability is 99%.

Fig. 4(a) shows the number of packets needed as N increases. As predicted by Eq. (37), when M and ρ are fixed, the number of packets needed increases logarithmically with N. This is true even when the number of nodes is small. This result explains why the redundancy needed is relatively small, even for large values of N. Fig. 4(b) shows the case where N and M are fixed. From Eq. (4), the number of packets is approximately a linear function of $1/\log(1/\rho)$, which coincides with Fig. 4(b). Fig. 4(c) shows that, with N and ρ fixed, the phase transition shifts to the right linearly as M increases. This is because the CDF shifts by $b_N$, which is roughly a linear function of M.

B. Tightness of Bounds

We next compare the analytical upper and lower bounds with simulation results. Each parameter (i.e., N, M, ρ) is investigated by fixing the other two. Simulation results are compared with the analytical bounds in Figs. 5(a), 5(b), and 5(c). The analytical bounds are obtained from Theorem 2. We fix $y_l = y_h = 99\%$, that is, a 99% probability of completion. As expected, the curve representing the simulation results lies between the analytical lower and upper bounds. More importantly, the gap between the upper bound and the simulation result is reasonably small for a variety of different parameters.
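In the homogeneous model, $\Pr\{T \le t\} = D(t)^N$ holds exactly, so the gap between the exact distribution and its asymptotic form, which Theorem 2 bounds, can also be inspected directly for integer t. The sketch below is an assumption-laden illustration: the helper names are hypothetical, D(t) is computed from the binomial sum "at least M of t packets arrive" rather than from Eq. (15) (the two are equal), `math.comb` requires Python 3.8 or later, and the parameters N = 100, M = 20, ρ = 5%, y = 99% are invented for the example. It compares integer quantiles, not the continuous metric Δ of Eq. (8).

```python
import math


def node_cdf(t, M, rho):
    """D(t) = Pr{T_n <= t}: at least M of t broadcast packets reach one node."""
    if t < M:
        return 0.0
    p = 1.0 - rho
    missing = sum(math.comb(t, k) * p ** k * rho ** (t - k) for k in range(M))
    return 1.0 - missing


def exact_packets_for(y, N, M, rho):
    """Smallest integer t with exact Pr{T <= t} = D(t)^N >= y."""
    t = M
    while node_cdf(t, M, rho) ** N < y:
        t += 1
    return t


def asymptotic_packets_for(y, N, M, rho):
    """Smallest integer t >= b_N + a_N * G^{-1}(y), via Eqs. (12)-(14) and (32).

    Assumes N >= 2 and 0 < rho < 0.5 so that tau > 0.
    """
    log_inv_rho = math.log(1.0 / rho)
    odds = math.log((1.0 - rho) / rho)
    tau = (math.log(N) + (M - 1) * odds) / log_inv_rho
    b_N = (math.log(N) + (M - 1) * (math.log(tau) + odds) - math.lgamma(M)) / log_inv_rho
    a_N = 1.0 / log_inv_rho
    return math.ceil(b_N + a_N * (-math.log(-math.log(y))))


if __name__ == "__main__":
    N, M, rho, y = 100, 20, 0.05, 0.99      # hypothetical example parameters
    exact = exact_packets_for(y, N, M, rho)
    approx = asymptotic_packets_for(y, N, M, rho)
    print(f"exact: {exact} packets, asymptotic estimate: {approx} packets")
```

For these hypothetical parameters it prints `exact: 27 packets, asymptotic estimate: 28 packets`, so in this instance the asymptotic rule errs by one packet on the conservative side.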
If one uses the analytical upper bound to estimate the number of redundant packets, then only one or two more packets than necessary are transmitted.

C. Tradeoff between Redundancy and Completion Requirement

We next characterize the benefit of loosening the requirement of file recovery by all nodes. Specifically, we allow incomplete file reception at up to K nodes. The analytical estimate in that case is obtained from Eq. (33). Fig. 6 shows

the number of transmissions required as a function of K for various cases. We set the completion probability to 95%. We note a good match between the analytical estimates and the simulation results. We also observe that sacrificing one or two nodes can significantly reduce FEC redundancy. For example, for the case N = 5, M = , and ρ = 5%, allowing incomplete file recovery by one node out of 5 leads to a % reduction in the amount of redundancy. However, the marginal gain becomes less significant as K increases.

Fig. 6. Tradeoff between redundancy and number of incomplete file receptions allowed. (Plot: number of packets sent versus the maximum number of nodes failing to recover the file, K; simulation and analysis curves for several combinations of N, M, and ρ.)

D. Heterogeneous Packet Loss

In this part, we verify the result of Theorem 4, which states that as $N \to \infty$, the time needed to disseminate packets to the nodes with the highest packet loss probability dominates. We investigate a variety of network settings, listed in the table of Fig. 7. We compare three results: (i) the expected time to disseminate M packets to N nodes under heterogeneous packet loss described by the distance vector α and the packet loss vector $\omega_\alpha$, based on simulation; (ii) the expected time to disseminate M packets to $(\alpha_L^2 - \alpha_{L-1}^2)N$ nodes all having the same packet loss probability $\omega_{\alpha_L}$, based on simulation; and (iii) the analytical estimate based on Theorem 4. The results are shown in Fig. 8.

Fig. 7. Different network settings for heterogeneous packet loss simulation.

Fig. 8. Average completion time for scenarios with heterogeneous packet loss vs. homogeneous packet loss. (Plot: average completion time for Cases 1 to 5; analytical estimate, homogeneous packet loss with $(\alpha_L^2 - \alpha_{L-1}^2)N$ nodes, and heterogeneous packet loss with N nodes.)
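The flavor of this comparison can be reproduced with a short simulation. The sketch below is a hypothetical illustration, not the simulator behind Fig. 8: the function names are invented, the setting (N = 200, M = 20, α = [0.9, 1], $\omega_\alpha$ = [0.01, 0.1]) does not correspond to any case of Fig. 7, and it assumes $\rho' < 0.5$ and $N' \ge 2$ so that Eqs. (46), (48), and (49) are well defined. It simulates result (i) and result (ii) side by side and evaluates $N'$, $\rho'$, and $b'_N$.

```python
import bisect
import math
import random


def node_time(M, rho, rng):
    """Slots until one node with loss probability rho has received M packets."""
    slots = received = 0
    while received < M:
        slots += 1
        if rng.random() >= rho:
            received += 1
    return slots


def heterogeneous_T(N, M, alpha, omega, rng):
    """One draw of T_{rho_n}: N nodes uniform in the disk, loss probability set by ring."""
    worst = 0
    for _ in range(N):
        ring = bisect.bisect_left(alpha, math.sqrt(rng.random()))
        worst = max(worst, node_time(M, omega[ring], rng))
    return worst


def homogeneous_T(N, M, rho, rng):
    """One draw of T for N nodes sharing the same loss probability rho."""
    return max(node_time(M, rho, rng) for _ in range(N))


def theorem4_estimate(N, M, alpha, omega):
    """Return (N', rho', b'_N) from Eqs. (46), (48), and (49)."""
    inner = alpha[-2] if len(alpha) > 1 else 0.0
    n_prime = (alpha[-1] ** 2 - inner ** 2) * N
    rho_prime = omega[-1]
    log_inv = math.log(1.0 / rho_prime)
    odds = math.log((1.0 - rho_prime) / rho_prime)
    tau = (math.log(n_prime) + (M - 1) * odds) / log_inv
    b_prime = (math.log(n_prime) + (M - 1) * (math.log(tau) + odds)
               - math.lgamma(M)) / log_inv
    return n_prime, rho_prime, b_prime


if __name__ == "__main__":
    N, M = 200, 20
    alpha, omega = [0.9, 1.0], [0.01, 0.1]          # hypothetical setting
    rng, runs = random.Random(2), 200
    n_prime, rho_prime, b_prime = theorem4_estimate(N, M, alpha, omega)
    het = sum(heterogeneous_T(N, M, alpha, omega, rng) for _ in range(runs)) / runs
    hom = sum(homogeneous_T(round(n_prime), M, rho_prime, rng) for _ in range(runs)) / runs
    print(f"N' = {n_prime:.1f}, rho' = {rho_prime}, b'_N = {b_prime:.2f}")
    print(f"mean T: heterogeneous {het:.2f}, homogeneous with N' nodes {hom:.2f}")
```

In this hypothetical setting, the two simulated averages should nearly coincide even though about 81% of the nodes enjoy a much lower loss probability, which is the dominance effect stated by Theorem 4. Since Eq. (45) only fixes the expectation up to a Θ(1) term, $b'_N$ is a location estimate rather than a prediction of the exact mean.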
In Cases 1 through 3, we assume L = 2 levels of signal quality: nodes within distance $\alpha_1 R$ of the source have a lower packet loss probability $\omega_{\alpha_1}$, and nodes beyond this radius have a higher packet loss probability $\omega_{\alpha_2}$. The results show that the presence of nodes with lower packet loss rates has little impact on the completion time for the entire network. This holds even when only a small fraction of nodes suffers from higher packet loss rates. For example, in Case 3, on average only 9% of the N nodes have the higher packet loss probability $\omega_{\alpha_2}$, and yet distributing a file to these nodes alone takes as much time as distributing it to the whole network, in which 8% of the N nodes have packet loss probability $\omega_{\alpha_1}$ and 9% have packet loss probability $\omega_{\alpha_2}$. This can also be observed in Fig. 9, which depicts the CDF of the completion time. Cases 4 and 5, in which the signal quality is discretized into L = 5 levels, reveal similar behavior. In all cases, Theorem 4 predicts the simulation results well.

Fig. 9. CDF of completion time under heterogeneous packet loss, with parameters as described in Case 3 of Fig. 7 (probability of completion vs. number of packets sent; analytical estimate, homogeneous packet loss with $(\alpha_L^2 - \alpha_{L-1}^2)N$ nodes, and heterogeneous packet loss with N nodes).

VII. PROTOTYPE IMPLEMENTATION

A. Set-up

In this section, we describe a practical implementation of extreme value FEC in the Rateless Deluge over-the-air programming protocol [5]. This protocol uses random linear codes for data encoding and enables efficient distribution of a new program file to all the nodes of a sensor network. The default setting of Rateless Deluge is as follows. A file is divided into pages, and each page consists of packets, where each packet contains 23 bytes of data. A sensor sends out a request if it discovers that its neighbors have new data. The request message specifies the page number and the number of packets the sensor needs. When a sensor has received enough packets, it can decode the page successfully. As in the original Deluge protocol [1], a sensor suppresses its request if it has recently overheard similar requests.

Here, we augment Rateless Deluge with extreme value FEC and refer to the new protocol as Extreme Value FEC Deluge. Extreme Value FEC Deluge operates in the same way as Rateless Deluge, except that, upon receiving a request for a new page, the base station broadcasts a redundant number of packets, chosen to guarantee with sufficiently high probability that all receivers recover the file. In our case, we set the desired completion probability to 97%, and the redundancy is then computed using Theorem 1.

The performance of Rateless Deluge and Extreme Value FEC Deluge is evaluated on a testbed consisting of Tmote Sky sensors (see Fig. 10). All sensors are within communication range of each other. Sensors transmit at their highest power setting over short distances to ensure good links, and packet loss at the receivers is induced by dropping packets uniformly at random. One sensor serves as the base station and 8 others are receivers. The last sensor is used to record network traffic. During each experiment, a new file is injected from a PC into the base station, which then disseminates it to the network.

Fig. 10. Experimental testbed with Tmotes.

B. Results

In our first experiment, we disseminate a single-page file of M packets using Rateless Deluge. The packet loss probability is p = 8%. We record the number of data packets sent until every node finishes receiving the file. Based on a series of identical runs, we plot in Fig. 11 the CDF of the number of packets sent and compare it with the analytical estimate from Theorem 1. We observe that the theory predicts the experimental results well.
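For a finite number of homogeneous receivers, the quantity that Extreme Value FEC Deluge needs per page (how many packets to broadcast so that all receivers decode with the target probability) can also be computed directly. The Python sketch below is not Theorem 1, which gives an asymptotic closed form; it evaluates the exact finite-N probability under the model's idealizations (independent losses with a common probability p, and decoding from any M packets, whereas random linear codes may in practice need slightly more). With $q_n = P(\mathrm{Bin}(n, 1-p) \ge M)$, the probability that at most K of the N receivers are still incomplete after n broadcasts is $\sum_{j=0}^{K} \binom{N}{j} (1-q_n)^j q_n^{N-j}$, which reduces to $q_n^N$ for K = 0 and covers the relaxed requirement of Section C for K > 0. The function names, the value M = 100, and the search horizon are hypothetical choices; p = 0.08 mirrors the 8% loss used in the experiment. `math.comb` requires Python 3.8 or later.

```python
import math


def completion_probs(N, M, p, n_max, K=0):
    """probs[n] = P(at most K of N receivers still lack M packets after n
    broadcasts), assuming i.i.d. loss probability p at every receiver."""
    dist = [0.0] * (M + 1)  # packets held by one receiver, capped at M
    dist[0] = 1.0
    probs = []
    for _ in range(n_max + 1):
        q = dist[M]  # P(a single receiver can decode)
        probs.append(sum(math.comb(N, j) * (1 - q) ** j * q ** (N - j)
                         for j in range(K + 1)))
        new = [0.0] * (M + 1)
        for k in range(M):
            new[k] += dist[k] * p            # packet lost
            new[k + 1] += dist[k] * (1 - p)  # packet received
        new[M] += dist[M]                    # already decodable (absorbing)
        dist = new
    return probs


def packets_needed(N, M, p, target, K=0):
    """Smallest number of broadcasts n whose completion probability >= target."""
    n_max = int(3 * M / (1 - p)) + 100  # generous search horizon (heuristic)
    for n, prob in enumerate(completion_probs(N, M, p, n_max, K)):
        if prob >= target:
            return n
    raise ValueError("search horizon too short")


M, p = 100, 0.08  # hypothetical page size; loss probability as in the experiment
for N in (8, 100, 1000, 10000):
    n = packets_needed(N, M, p, target=0.97)
    print(f"N={N:>5}: send {n} packets ({n - M} redundant)")
```

Under these assumptions, the printed redundancy grows only slowly as N increases by factors of ten, in line with the logarithmic growth in N noted in the concluding remarks; calling `packets_needed(..., K=1)` gives the redundancy when one receiver is allowed to fail.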
Further, even though the number of sensors in the network is relatively small, the sharp phase transition is still evident.

Fig. 11. Real sensor experiments vs. analysis: N = 8, M = , p = 8% (probability of completion vs. number of data packets sent; experimental and analytical results).

Next, we compare the performance of Rateless Deluge and Extreme Value FEC Deluge. We distribute a file of M packets and average the results over a series of identical experiments. We analyze the network traffic in both the control plane and the data plane; namely, we record the number of request messages and data messages sent. We also record the completion time to disseminate the file. The results of the comparison are summarized in Fig. 12.

Fig. 12. Rateless Deluge vs. EV-FEC Deluge: 1 page, N = 8, M = , p = 8%.

The results show that Extreme Value FEC Deluge sends out slightly more data messages (less than 5% more). However, it drastically reduces the number of feedback request messages, by a factor of about five compared to Rateless Deluge. Note that the minimum possible number of request messages is one, since at least one request message must be sent to initiate the dissemination process. With Extreme Value FEC Deluge, the average number of requests is 1.225. Thus, most of the time, the entire network finishes receiving enough packets after the base station's first set of transmissions. Thanks to its lower control-plane overhead, Extreme Value FEC Deluge reduces the completion time to disseminate the file to the 8-node network to .4 sec, which is about half of the time needed by Rateless Deluge. We observed similar results when disseminating larger files.

VIII. CONCLUDING REMARKS

In this paper, we developed theoretical foundations and demonstrated the practical use of a highly efficient strategy for reliable data broadcasting, called extreme value FEC. This strategy accurately predicts the number of redundant packets to be disseminated by a source so as to avoid (with high probability) unnecessary retransmission requests by receivers.
Our analysis, based on extreme value theory, accurately captures the characteristics of the completion time of FEC data broadcasting. Not only does it demonstrate the phase transition of the CDF of the completion time, but it also accurately pinpoints the location of the phase transition point. The analysis also reveals that the number of redundant packets required to guarantee file completion by all receivers increases only logarithmically with N. Another major contribution of the paper is the derivation of convergence bounds for finite N, demonstrating fast convergence of the asymptotic estimate. By establishing a relation with the coupon collector's problem, we provide asymptotically tight bounds on the CDF of the completion time to disseminate a file to receivers with heterogeneous packet loss probabilities. The result shows that, as N gets large, the time needed to disseminate packets to the nodes with the highest packet loss probability dominates. Simulations confirm this finding even when only a small fraction of nodes suffers from high packet loss rates.

Our sensitivity analysis shows that FEC redundancy is robust to imperfect knowledge of the total number of receivers, while uncertainty in the packet loss probability results in the same order of uncertainty in FEC redundancy. On the other hand, we show that FEC redundancy can be significantly reduced (e.g., on the order of %) if we allow incomplete file reception at a single node in the network. However, the marginal gain from allowing incomplete file reception at more nodes quickly diminishes.

Finally, the paper reports a practical implementation of the extreme value FEC strategy in conjunction with the Rateless Deluge OAP protocol. The results show significant performance improvement with respect to control-plane overhead and average data dissemination time, thereby validating the benefits of our approach under real network settings.

The paper leaves many interesting problems for future work. These include extending the analysis to the case where receivers have temporally correlated packet losses, as well as to multihop network scenarios.

REFERENCES

[1] J. Hui and D. Culler, "The dynamic behavior of a data dissemination protocol for network programming at scale," in SenSys '04, Nov. 2004.
[2] W. Xiao and D. Starobinski, "Poster abstract: Exploiting multi-channel diversity to speed up over-the-air programming of wireless sensor networks," in SenSys '05, San Diego, California, USA, Nov. 2005.
[3] S. Kulkarni and L. Wang, "MNP: Multihop network reprogramming service for sensor networks," in 25th IEEE International Conference on Distributed Computing Systems, 2005, pp. 7–16.
[4] D. Starobinski and W. Xiao, "Asymptotically optimal data dissemination in multi-channel wireless sensor networks: Single radios suffice," IEEE/ACM Transactions on Networking, to appear.
[5] A. Hagedorn, D. Starobinski, and A. Trachtenberg, "Rateless Deluge: Over-the-air programming of wireless sensor networks using random linear codes," in IPSN '08, Saint Louis, MO, USA, Apr. 2008.
[6] C.-J. M. Liang, R. Musăloiu-E., and A. Terzis, "Typhoon: A reliable data dissemination protocol for wireless sensor networks," in Wireless Sensor Networks. Springer Berlin/Heidelberg, 2008, pp. 268–285.
[7] Y.-C. Tseng, S.-Y. Ni, Y.-S. Chen, and J.-P. Sheu, "The broadcast storm problem in a mobile ad hoc network," Wireless Networks, vol. 8, no. 2/3. Kluwer Academic Publishers, 2002, pp. 153–167.
[8] P. Levis, N. Patel, S. Shenker, and D. Culler, "Trickle: A self-regulating algorithm for code propagation and maintenance in wireless sensor networks," University of California at Berkeley, Tech. Rep., 2004.
[9] N. Shacham and P. McKenney, "Packet recovery in high-speed networks using coding and buffer management," in INFOCOM '90, Jun. 1990.
[10] Y. Bartal, J. Byers, M. Luby, and D. Raz, "Feedback-free multicast prefix protocols," in ISCC '98, 1998.
[11] J. Byers, M. Luby, and M. Mitzenmacher, "A digital fountain approach to asynchronous reliable multicast," IEEE Journal on Selected Areas in Communications, vol. 20, no. 8, pp. 1528–1540, Oct. 2002.
[12] M. Ghaderi, D. Towsley, and J. Kurose, "Network coding performance for reliable multicast," in IEEE MILCOM '07, pp. 1–7, Oct. 2007.
[13] M. Ghaderi, D. Towsley, and J. Kurose, "Reliability gain of network coding in lossy wireless networks," in INFOCOM '08, Apr. 2008.
[14] S. I. Resnick, Extreme Values, Regular Variation, and Point Processes. Springer, 1987.
[15] L. Holst, "Extreme value distributions for random coupon collector and birthday problems," Extremes, vol. 4, no. 2, pp. 129–145, 2001.
[16] W. Xiao, "Reliable data dissemination in dense wireless networks," Ph.D. dissertation, Boston University.
[17] S. Madden, M. J. Franklin, J. M. Hellerstein, and W. Hong, "TAG: a Tiny AGgregation service for ad-hoc sensor networks," SIGOPS Oper. Syst. Rev., vol. 36, no. SI, pp. 131–146, 2002.
[18] T. He, B. M. Blum, J. A. Stankovic, and T. Abdelzaher, "AIDA: Adaptive application-independent data aggregation in wireless sensor networks," ACM Trans. Embed. Comput. Syst., vol. 3, no. 2, pp. 426–457, 2004.
[19] D. Rubenstein, J. Kurose, and D. Towsley, "Real-time reliable multicast using proactive forward error correction," in NOSSDAV '98, 1998.
[20] L. Rizzo and L. Vicisano, "RMDP: an FEC-based reliable multicast protocol for wireless environments," SIGMOBILE Mob. Comput. Commun. Rev., vol. 2, no. 2, pp. 23–31, 1998.
[21] C. Huitema, "The case for packet level FEC," in Protocols for High-Speed Networks '96. Chapman & Hall, Ltd., 1996.
[22] J. Nonnenmacher, E. Biersack, and D. Towsley, "Parity-based loss recovery for reliable multicast transmission," in SIGCOMM '97, 1997.
[23] M. Mosko and J. J. Garcia-Luna-Aceves, "An analysis of packet loss correlation in FEC-enhanced multicast trees," in ICNP.
[24] A. Eryilmaz, A. Ozdaglar, and M. Medard, "On delay performance gains from network coding," in CISS, 2006.
[25] J. Galambos, The Asymptotic Theory of Extreme Order Statistics. Robert Krieger Publishing Company, 1987.
[26] D. S. J. De Couto, D. Aguayo, J. Bicket, and R. Morris, "A high-throughput path metric for multi-hop wireless routing," in MobiCom '03, 2003, pp. 134–146.
[27] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1. John Wiley & Sons, Inc., 1968.
[28] S. Ross, Stochastic Processes, 1996.
[29] M. Fogiel and J. R. Ogden, Handbook of Mathematical, Scientific, and Engineering Formulas, Tables, Functions, Graphs, Transforms. Research & Education Assoc., 1984.

Weiyao Xiao received the B.E. degree from Harbin Institute of Technology, Harbin, China, and the M.S. degree from Boston University in 2004 and 2007, respectively. He is currently working towards his Ph.D. degree in Electrical and Computer Engineering, also at Boston University. His research interests center on reliable data dissemination in wireless networks.

David Starobinski received his Ph.D. in Electrical Engineering (1999) from the Technion-Israel Institute of Technology. In 1999- he was a visiting post-doctoral researcher in the EECS department at UC Berkeley. In 2007–2008, he was an invited Professor at EPFL (Switzerland). Since September, he has been with Boston University, where he is now an Associate Professor. Dr. Starobinski received a CAREER award from the U.S. National Science Foundation and an Early Career Principal Investigator (ECPI) award from the U.S. Department of Energy. He is on the Editorial Board of the IEEE/ACM Transactions on Networking. His research interests are in the modeling and performance evaluation of high-speed, wireless, and sensor networks.