In-Order Delivery Delay of Transport Layer Coding

Jason Cloud, Douglas Leith, and Muriel Médard
Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Hamilton Institute, National University of Ireland Maynooth, Ireland
Email: jcloud@mit.edu, doug.leith@nuim.ie, medard@mit.edu
arxiv:408.440v cs.it] Aug 04

Abstract
A large number of streaming applications use reliable transport protocols such as TCP to deliver content over the Internet. However, head-of-line blocking due to packet loss recovery can often result in unwanted behavior and poor application layer performance. Transport layer coding can help mitigate this issue by helping to recover from lost packets without waiting for retransmissions. We consider the use of an on-line network code that inserts coded packets at strategic locations within the underlying packet stream. If retransmissions are necessary, additional coding packets are transmitted to ensure the receiver's ability to decode. An analysis of this scheme is provided that helps determine both the expected in-order packet delivery delay and its variance. Numerical results are then used to determine when and how many coded packets should be inserted into the packet stream, in addition to determining the trade-offs between reducing the in-order delay and the achievable rate. The analytical results are finally compared with experimental results to provide insight into how to minimize the delay of existing transport layer protocols.

I. INTRODUCTION
Reliable transport protocols are used in a variety of settings to provide data transport for time sensitive applications. In fact, video streaming services such as Netflix and YouTube, which both use TCP, account for the majority of fixed and mobile traffic in both North America and Europe [1]. In fixed, wireline networks where the packet erasure rate is low, the quality of user experience (QoE) for these services is usually satisfactory.
However, the growing trend towards wireless networks, especially at the network edge, is increasing non-congestion related packet erasures within the network. This can result in degraded TCP performance and unacceptable QoE for time sensitive applications. While TCP congestion control throttling is a major factor in the degraded performance, head-of-line blocking when recovering from packet losses is another. This paper will focus on the latter by applying coding techniques to overcome lost packets and reduce head-of-line blocking issues so that overall in-order packet delay is minimized.

These head-of-line blocking issues result from using techniques like selective repeat automatic-repeat-request (SR-ARQ), which is used in most reliable transport protocols (e.g., TCP). While it helps to ensure high efficiency, one problem with SR-ARQ is that packet recovery due to a loss can take on the order of a round-trip time (RTT) or more [2]. When the RTT (or more precisely the bandwidth-delay product (BDP)) is very small and feedback is close to being instantaneous, SR-ARQ provides near optimal in-order packet delay. Unfortunately, feedback is often delayed and only contains a partial map of the receiver's knowledge. This can have major implications for applications that require reliable delivery with constraints on the time between the transmission and in-order delivery of a packet. As a result, we are forced to look at alternatives to SR-ARQ.

This paper will explore the use of a systematic random linear network code (RLNC), in conjunction with a coded generalization of SR-ARQ, to help reduce the time needed to recover from losses. The scheme considered first adds redundancy to the original data stream by injecting coded packets at key locations to help overcome potential losses. This has the benefit of reducing the number of retransmissions and, consequently, the delay. However, correlated losses or incorrect information about the network can result in the receiver's inability to decode.
Therefore, feedback and coded retransmission of data are also considered.

The following sections will provide the answers to two questions about the proposed scheme: when should redundant packets be inserted into the original packet stream to minimize in-order packet delay; and how much redundancy should be added to meet a user's requested QoE. These answers will be provided through an analysis of the in-order delivery delay as a function of the coding window size and redundancy. We will then use numerical results to help determine the cost (in terms of rate) of reducing the delay and as a tool to help determine the appropriate coding window size for a given network path/link. While an in-depth comparison of our scheme with others is not within the scope of this paper, we will use SR-ARQ as a baseline to help show the benefits of coding at the transport layer.

The remainder of the paper is organized as follows. Section II provides an overview of the related work in the area of transport layer coding and coding for reducing delay. Section III describes the coding algorithm and system model used throughout the paper. Section IV provides the tools needed to analyze the proposed scheme; and an analysis of the first two moments of the in-order delay is provided in Sections V and VI. Furthermore, the throughput efficiency is derived in Section VII to help determine the cost of coding. Numerical results are finally presented in Section VIII and we conclude in Section IX.

II. RELATED WORK
A resurgence of interest in coding at the transport layer has taken place to help overcome TCP's poor performance in wireless networks. Sundararajan et al. [3] first proposed TCP with Network Coding (TCP/NC). They insert a coding shim

between the TCP and IP layers that introduces redundancy into the network in order to spoof TCP into believing the network is error-free. Loss-Tolerant TCP (LT-TCP) [4], [5], [6] is another approach using Reed-Solomon (RS) codes and explicit congestion notification (ECN) to overcome random packet erasures and improve performance. In addition, Coded TCP (CTCP) [7] uses RLNC [8] and a modified additive-increase, multiplicative decrease (AIMD) algorithm for maintaining high throughput in high packet erasure networks. While these proposals have shown coding can help increase throughput, especially in challenged networks, only anecdotal evidence has been provided showing the benefits for time sensitive applications.

On the other hand, a large body of research investigating the delay of coding in different settings has taken place. In general, most of these works can be summarized by Figure 1. The coding delay of chunked and overlapping chunked codes [9] (shown in Figure 1(a)), network coding in time-division duplexing (TDD) channels [10], [11], [12], and network coding in line networks where coding also occurs at intermediate nodes [13] is well understood. In addition, a non-asymptotic analysis of the delay distributions of random linear network coding (RLNC) [14] and various multicast scenarios [15], [16], [17] using a variant of the scheme in Figure 1(b) have also been investigated. Furthermore, the research that looks at the in-order packet delay is provided in [2] and [18] for uncoded systems, while [19], [20], and [21] consider the in-order packet delay for non-systematic coding schemes similar to the one shown in Figure 1(b). However, these non-systematic schemes may not be the optimum strategy in networks or communication channels with a long RTT.

Possibly the closest work to ours is that done by Joshi et al. [22], [23] and Tömösközi et al. [24]. Bounds on the expected in-order delay and a study of the rate/delay trade-offs using a time-invariant coding scheme are provided in [22] and [23] where they assume feedback is instantaneous, provided in a block-wise manner, or not available at all.
A generalized example of their coding scheme is shown in Figure 1(c). While their analysis provides insight into the benefits of coding for streaming applications, their model is similar to a half-duplex communication channel where the sender transmits a finite block of information and then waits for the receiver to send feedback. Unfortunately, it is unclear if their analysis can be extended to full-duplex channels or models where feedback does not provide complete information about the receiver's state-space. Finally, the work in [24] considers the in-order delay of online network coding where feedback determines the source packets used to generate coded packets. However, they only provide experimental results and do not attempt an analysis.

III. CODING ALGORITHM AND SYSTEM MODEL
We consider a time-slotted model to determine the coding window size $k$ and added redundancy $R$ that minimizes the per-packet, or playback, delay $D$. The duration of each slot is $t_s = s_{pkt}/Rate$ where $s_{pkt}$ is the size of each transmitted packet and $Rate$ is the transmission rate of the network.

Figure 1: Coding matrices for various schemes assuming an identical loss pattern and a feedback delay of 4 time-slots: (a) Chunked Code, (b) Rateless Code, (c) Time-Invariant Streaming Code, (d) Systematic Code with Feedback (Our Scheme). The columns represent the original information packets $p_i$, and the rows represent the composition of the transmitted packet at time $t$ (e.g., a coded packet is a linear combination $\sum_i \alpha_{i,j} p_i$ of information packets, where $\alpha_{i,j}$ is defined in Section III). Lines within a matrix indicate when feedback about a generation (represented by different colors) was received.
Also let $t_p$ be the propagation delay between the sender and the receiver (i.e., $RTT = t_s + 2t_p$ assuming that the size of each acknowledgement is sufficiently small). Packet erasures are assumed to be independently and identically distributed (i.i.d.) with $\epsilon$ being the probability of a packet erasure within the network. Source packets $p_i$, $i \in \{1,\ldots,N\}$, are first partitioned into coding generations $G_j = \{p_{(j-1)k+1},\ldots,p_{\min(jk,N)}\}$, $j \in [1, \lceil N/k \rceil]$. Each generation $G_j$ is transmitted using the systematic network coding scheme shown in Algorithm 1 where the coding window spans the entire generation. The coded packets $c_{j,m}$ shown in the algorithm are generated by taking random linear combinations of the information packets contained within the generation where the coding coefficients $\alpha_{i,j,m} \in \mathbb{F}_q$ are chosen at random and $p_i$ is treated as a vector in $\mathbb{F}_q$. Once every packet in $G_j$ has been transmitted (both uncoded and coded), the coding window slides to the next generation $G_{j+1}$ and the process repeats without waiting for feedback. We assume that delayed feedback is provided about each generation, i.e., multiple coding generations can be in-flight at

any time; and this feedback contains the number of degrees of freedom (dofs) $l$ still required to decode the generation. If $l > 0$, an additional $n_l = Rl \geq l$ coded packets (or dofs) are retransmitted. This process is shown in Algorithm 2 and continues until all dofs have been received and the generation can be decoded and delivered.

Algorithm 1: Code Generation Algorithm
  for all $j \in [1, \lceil N/k \rceil]$ do
    for all packets $p_i$, $i \in [(j-1)k+1, \min(jk,N)]$ do
      Transmit $p_i$
    for all $m \in [1, n_k - k]$ do
      Transmit $c_{j,m} = \sum_{i=(j-1)k+1}^{\min(jk,N)} \alpha_{i,j,m} p_i$

Algorithm 2: DOF Retransmission Algorithm
  ACK from $G_j$ received
  if no packets from $G_j$ in-flight and $l > 0$ then
    for all $m \in [1, Rl]$ do
      Transmit $c_{j,m} = \sum_{i=(j-1)k+1}^{\min(jk,N)} \alpha_{i,j,m} p_i$

Figure 1(d) provides an example of the proposed scheme. Here we can see that source packets are partitioned into coding generations of size $k = 3$ packets, and one coded packet is also transmitted for each generation (i.e., $R = 1.33$). In this case, the first two packets of the blue generation can be delivered, but the third packet cannot since it is lost and the generation cannot be decoded. Delayed feedback indicates that additional dofs are needed and two additional transmission attempts are required to successfully transmit the required dof. Once the dof is delivered, the remainder of the blue generation, as well as the entire green generation, can be delivered in-order.

Due to the complexity of the process, several assumptions are needed. First, retransmissions occur immediately after feedback is obtained indicating additional dofs are needed, without waiting for the coding window to shift to a new generation. Second, the time to transmit packets after the first round does not increase the delay. For example, the packet transmission time is $t_s$ seconds. Assuming $l$ dof retransmissions are needed, the additional $n_l t_s$ seconds needed to transmit these packets are not taken into account. Third, the number of previously transmitted generations that can cause head-of-line blocking is limited to $b$ where $b = BDP/n_k$.
Fourth, all packets within a generation are available to the transport layer without delay (i.e., we assume an infinite packet source). Finally, the coding window/generation size with the added redundancy is smaller than the BDP (i.e., $n_k < BDP$). Without this assumption, feedback will be received prior to the transmission of the coded packets, allowing for the use of SR-ARQ without a large impact on the performance. It is important to note that these assumptions provide a lower bound. The first two assumptions ensure feedback is acted upon immediately and does not impact the delay experienced by other generations. The third assumption limits the possibility of a previously transmitted generation preventing delivery, thereby decreasing the overall delay.

IV. PRELIMINARIES
We first define several probability distributions and random variables that will be used extensively in later sections. Define $P \in \mathbb{R}^{(k+1)\times(k+1)}$ to be the transition matrix of a Markov chain. Each transition within the chain represents the number of dofs, or packets, successfully received after a round of transmissions, and each state represents the number of dofs still needed by the client to decode. As a result, the elements $a_{ij}$ of $P$ can be defined as follows, where $n_i$ is the number of packets transmitted in a round that starts in state $i$:
$$a_{ij} = \begin{cases} B(n_i, i-j, 1-\epsilon) & \text{for } i \in [1,k],\ 0 < j \leq i \\ \sum_{m=i}^{n_i} B(n_i, m, 1-\epsilon) & \text{for } i \in [1,k],\ j = 0 \\ 1 & \text{for } i = 0,\ j = 0, \end{cases}$$
where $B(n,k,p) = \binom{n}{k} p^k (1-p)^{n-k}$. Let $X_r$ be the state of the chain at time $r$. It follows [26] that $\Pr\{X_r = j \mid X_0 = i\} = [P^r]_{ij}$ for $r \geq 1$ and $[P^0]_{ij} = 0$ for $i \neq j$. In our model, $X_0 = k$ with probability equal to 1 and a generation is successfully decoded when state 0 is entered at time $r \geq 1$. Furthermore, the probability $[P^r]_{i0}$ is the probability that all packets within a single generation have been successfully received in or before $r$ transmission rounds.

Using this Markov chain, define $Y$ to be the number of transmission rounds required to transfer a single generation. The distribution on $Y$ is:
$$p_Y(y) = \begin{cases} [P^y]_{k0} - [P^{y-1}]_{k0} & \text{for } y \geq 1 \\ 0 & \text{otherwise.} \end{cases} \tag{1}$$
Next, define $Z_i$ to be the number of transmission rounds required to transfer $i$ generations.
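The Markov chain and the distribution of $Y$ described above can be sketched numerically. The following is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes that a round starting in state $i$ transmits $n_i = \lceil R i \rceil$ packets (the rounding is an assumption made here so that $n_i$ is an integer).

```python
from math import ceil, comb


def binom_pmf(n: int, m: int, p: float) -> float:
    """B(n, m, p): probability of exactly m successes in n trials."""
    return comb(n, m) * p**m * (1.0 - p) ** (n - m)


def transition_matrix(k: int, R: float, eps: float) -> list[list[float]]:
    """Build the (k+1) x (k+1) matrix; state i = dofs still missing."""
    P = [[0.0] * (k + 1) for _ in range(k + 1)]
    P[0][0] = 1.0  # state 0 (generation decoded) is absorbing
    for i in range(1, k + 1):
        n_i = ceil(R * i)  # assumption: packets sent per round, rounded up
        for j in range(1, i + 1):
            # exactly i - j of the n_i packets arrive
            P[i][j] = binom_pmf(n_i, i - j, 1.0 - eps)
        # at least i packets arrive, so the generation decodes
        P[i][0] = sum(binom_pmf(n_i, m, 1.0 - eps) for m in range(i, n_i + 1))
    return P


def rounds_pmf(k: int, R: float, eps: float, y_max: int) -> list[float]:
    """Return [p_Y(1), ..., p_Y(y_max)] using [P^y]_{k0} - [P^{y-1}]_{k0}."""
    P = transition_matrix(k, R, eps)
    row = [0.0] * (k + 1)
    row[k] = 1.0  # X_0 = k with probability 1
    pmf, prev = [], 0.0
    for _ in range(y_max):
        # one more round: row vector times P
        row = [sum(row[i] * P[i][j] for i in range(k + 1)) for j in range(k + 1)]
        pmf.append(row[0] - prev)
        prev = row[0]
    return pmf
```

For instance, `rounds_pmf(1, 1.0, 0.3, 5)` reduces to a plain geometric distribution, since a single uncoded packet is simply retransmitted until it arrives.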
Before defining the distribution on $Z_i$, we first provide the following Lemma.

Lemma 1. Let $N$ independent processes defined by the transition matrix $P$ start at the same time. The probability that all processes complete in less than or equal to $z$ rounds, or transitions, with at least one process completing in round $z$ is
$$\Pr\{Z = z\} = \left([P^z]_{k0}\right)^N - \left([P^{z-1}]_{k0}\right)^N. \tag{2}$$

Proof: Let $f(z) = \sum_{j=1}^{z-1} p_Y(j) = [P^{z-1}]_{k0}$. The probability of $N$ independent processes completing in less than or equal to $z$ rounds with at least one process completing in round $z$ is:
$$\Pr\{Z = z\} = \sum_{i=1}^{N} \binom{N}{i} p_Y(z)^i f(z)^{N-i} \tag{3}$$
$$= \sum_{i=0}^{N} \binom{N}{i} p_Y(z)^i f(z)^{N-i} - f(z)^N \tag{4}$$
$$= \left(p_Y(z) + f(z)\right)^N - f(z)^N \tag{5}$$
$$= \left([P^z]_{k0}\right)^N - \left([P^{z-1}]_{k0}\right)^N. \tag{6}$$
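Lemma 1 is the usual distribution of the maximum of $N$ independent, identically distributed completion times. As a sanity check, the sketch below compares the closed form against brute-force enumeration; the function names and the toy distribution used in the usage note are hypothetical and chosen only for illustration.

```python
from itertools import product


def max_rounds_pmf(cdf: list[float], N: int) -> list[float]:
    """Lemma 1: Pr{Z = z} = F(z)^N - F(z-1)^N, where cdf[z] = F(z) and cdf[0] = 0."""
    return [cdf[z] ** N - cdf[z - 1] ** N for z in range(1, len(cdf))]


def brute_force_max_pmf(pmf: list[float], N: int) -> list[float]:
    """Enumerate every N-tuple of completion rounds; pmf[y - 1] = Pr{Y = y}."""
    out = [0.0] * len(pmf)
    for rounds in product(range(len(pmf)), repeat=N):
        prob = 1.0
        for r in rounds:
            prob *= pmf[r]
        # the slowest process determines when all N are complete
        out[max(rounds)] += prob
    return out
```

With a toy distribution `pmf = [0.6, 0.3, 0.1]` (so `cdf = [0.0, 0.6, 0.9, 1.0]`) and `N = 3`, both functions agree to floating-point precision.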

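Given Lemma 1, the distribution on $Z_i$ is:
$$p_{Z_i}(z_i) = \begin{cases} \left([P^{z_i}]_{k0}\right)^i - \left([P^{z_i-1}]_{k0}\right)^i & \text{for } z_i \geq 1,\ i \leq b \\ 0 & \text{otherwise.} \end{cases} \tag{7}$$
Also define $S$ to be the number of uncoded packets that are successfully transferred within a generation prior to the first packet loss. The distribution on $S$ is:
$$p_{S|Y}(s|y) = \begin{cases} \epsilon(1-\epsilon)^s & \text{for } s \in [0,k-1],\ y = 1 \\ (1-\epsilon)^s & \text{for } s = k,\ y = 1 \\ \dfrac{\epsilon(1-\epsilon)^s}{1-(1-\epsilon)^k} & \text{for } s \in [0,k-1],\ y \neq 1 \\ 0 & \text{otherwise,} \end{cases} \tag{8}$$
and its first three moments are given by the following lemma.

Lemma 2. Define $\bar{s}_1^i = E[S^i | Y = 1]$ and $\bar{s}_2^i = E[S^i | Y \neq 1]$. Then given $Y = y$, the first through third moments of $S$ are
$$\bar{s}_1^1 = \frac{1-\epsilon}{\epsilon}\left(1-(1-\epsilon)^k\right), \tag{9}$$
$$\bar{s}_1^2 = \frac{2-\epsilon}{\epsilon}\,\bar{s}_1^1 - \frac{2k(1-\epsilon)^{k+1}}{\epsilon}, \tag{10}$$
$$\bar{s}_1^3 = \frac{6-3\epsilon}{2\epsilon}\,\bar{s}_1^2 - \frac{1}{2}\,\bar{s}_1^1 - \frac{3k^2(1-\epsilon)^{k+1}}{\epsilon}, \tag{11}$$
and
$$\bar{s}_2^i = \frac{\bar{s}_1^i - k^i(1-\epsilon)^k}{1-(1-\epsilon)^k}, \tag{12}$$
for $i = 1,2,3$.

Proof: Define the moment generating function of $S$ when $Y = 1$ to be
$$M_{S|Y}(t) = E[e^{tS} | Y = 1] \tag{13}$$
$$= \frac{\epsilon + (1-\epsilon)^{k+1} e^{kt}\left(1-e^{t}\right)}{1 - e^{t} + \epsilon e^{t}}. \tag{14}$$
The first, second, and third moments of $S$ when $Y = 1$ are then $\frac{\partial}{\partial t} M_{S|Y}(0)$, $\frac{\partial^2}{\partial t^2} M_{S|Y}(0)$, and $\frac{\partial^3}{\partial t^3} M_{S|Y}(0)$ respectively. For $Y \neq 1$, we need to scale the above expectations accordingly. This can be done by subtracting the term $k^i(1-\epsilon)^k$ from each of the moments above and dividing by $1-(1-\epsilon)^k$.

Finally, let $V_i$, $i \leq b$, describe the position of the last received generation preventing delivery in round $z_i$. The following lemma helps to define the distribution on $V_i$.

Lemma 3. Let $N$ independent processes defined by the transition matrix $P$ start at the same time, and all processes complete in or before round $z_N$ with at least one process completing in round $z_N$. The probability that the $j$th process is the last to complete is defined by the distribution
$$p_{V_N|Z_N}(v_N|z_N) = \frac{\left([P^{z_N}]_{k0}\right)^{N-v_N-1}\left([P^{z_N-1}]_{k0}\right)^{v_N} p_Y(z_N)}{p_{Z_N}(z_N)} \tag{15}$$
for $v_N = 0,\ldots,N-1$, $j = N - v_N$, $p_Y(y)$ defined in (1), and $p_{Z_i}(z_i)$ defined in (7). Furthermore, define $\bar{v}_N^i = E[V_N^i | Z_N]$. Then
$$\bar{v}_N^1 = \frac{[P^{z_N-1}]_{k0}}{p_Y(z_N)} - \frac{N\left([P^{z_N-1}]_{k0}\right)^N}{p_{Z_N}(z_N)} \tag{16}$$
and
$$\bar{v}_N^2 = \frac{2\left([P^{z_N-1}]_{k0}\right)^2}{p_Y(z_N)^2} + \frac{[P^{z_N-1}]_{k0}}{p_Y(z_N)} - \frac{N^2\left([P^{z_N-1}]_{k0}\right)^N}{p_{Z_N}(z_N)} - \frac{2N\left([P^{z_N-1}]_{k0}\right)^{N+1}}{p_Y(z_N)\,p_{Z_N}(z_N)}. \tag{17}$$

Proof: Let $\beta_{z_N} = p_Y(z_N)/[P^{z_N}]_{k0}$ be the probability of a generation finishing in round $z_N$ given all of the $N$ generations have completed transmission in or before round $z_N$. The distribution on $V_N \in [0, N-1]$ is
$$p_{V_N|Z_N}(v_N|z_N) = \frac{\beta_{z_N}(1-\beta_{z_N})^{v_N}}{\sum_{j=0}^{N-1}\beta_{z_N}(1-\beta_{z_N})^{j}} \tag{18}$$
$$= \frac{\beta_{z_N}(1-\beta_{z_N})^{v_N}}{1-(1-\beta_{z_N})^{N}} \tag{19}$$
$$= \frac{\left([P^{z_N}]_{k0}\right)^{N-v_N-1}\left([P^{z_N-1}]_{k0}\right)^{v_N} p_Y(z_N)}{p_{Z_N}(z_N)}. \tag{20}$$
Define the moment generating function of $V_N$ given $Z_N$ to be
$$M_{V_N|Z_N}(t) = E[e^{tV_N} | Z_N = z_N] = \frac{\left(\left([P^{z_N}]_{k0}\right)^N - \left([P^{z_N-1}]_{k0}\right)^N e^{Nt}\right) p_Y(z_N)}{\left([P^{z_N}]_{k0} - [P^{z_N-1}]_{k0}\,e^{t}\right) p_{Z_N}(z_N)}. \tag{21}$$
The first and second moments of $V_N$ given $Z_N$ are $\frac{\partial}{\partial t} M_{V_N|Z_N}(0)$ and $\frac{\partial^2}{\partial t^2} M_{V_N|Z_N}(0)$ respectively.

Now that we have the distributions for the random variables $Y$, $Z_i$, $S$, and $V_i$, as well as several relevant moments, we have the tools needed to derive the expected in-order delivery delay.

V. EXPECTED IN-ORDER DELIVERY DELAY
A lower bound on the expected delay, $E[D]$, can be derived using the law of total expectation:
$$E[D] = E_Y\left[E_{Z_b}\left[E_D[D \mid Y, Z_b]\right]\right]. \tag{22}$$
From (22), there are four distinct cases that must be evaluated. For each case, define $\bar{d}_{Y=y,Z=z} = E[D \mid Y = y, Z_b = z]$.

A. Case 1: $Y = 1$, $Z_b = 1$
The latest generation in transit completes within the first round of transmission and no previously transmitted generations prevent delivery. As a result, all packets received prior to the first loss (i.e., packets $p_1,\ldots,p_s$) are immediately delivered. Once a packet loss is observed, packets received after the loss (i.e., packets $p_{s+1},\ldots,p_k$) are buffered until the entire generation is decoded. An example is given in Figure 2(a) where $n_k = 6$, $k = 4$, the number of packets received prior to the first loss is $s = 2$, and the number of coded packets needed to recover from the two packet losses is $c = 2$.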
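Case 1 relies on the conditional distribution of $S$, the number of uncoded packets delivered before the first loss, and on its moments. The sketch below is a small numerical check under the assumption that, for $Y = 1$, $S = s$ has probability $\epsilon(1-\epsilon)^s$ for $s < k$ and $(1-\epsilon)^k$ for $s = k$; the function names are hypothetical, and the closed forms are the ones implied by that distribution.

```python
def s_pmf_first_round(k: int, eps: float) -> list[float]:
    """Pr{S = s | Y = 1} for s = 0..k (assumed form of the distribution of S)."""
    q = 1.0 - eps
    # s packets arrive, then one is lost (s < k); or all k arrive (s = k)
    return [eps * q**s for s in range(k)] + [q**k]


def s_moment(k: int, eps: float, order: int) -> float:
    """E[S^order | Y = 1] computed directly from the pmf."""
    return sum(s**order * p for s, p in enumerate(s_pmf_first_round(k, eps)))


def s_mean_closed_form(k: int, eps: float) -> float:
    """First moment: (1 - eps) * (1 - (1 - eps)^k) / eps."""
    q = 1.0 - eps
    return q * (1.0 - q**k) / eps


def s_second_moment_closed_form(k: int, eps: float) -> float:
    """Second moment: ((2 - eps)/eps) * mean - 2k(1 - eps)^(k+1) / eps."""
    q = 1.0 - eps
    return (2.0 - eps) / eps * s_mean_closed_form(k, eps) - 2.0 * k * q ** (k + 1) / eps
```

Comparing `s_moment(k, eps, 1)` with `s_mean_closed_form(k, eps)` for a few values of `k` and `eps` is a quick way to confirm that a transcription of the moments is consistent with the distribution.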

Figure 2: Example of (a) Case 1: $Y = 1$, $Z_b = 1$ and (b) Case 2: $Y > 1$, $Z_b = 1$. The delay $d_i$ of each packet is listed next to the time when it is delivered to the application layer.

Taking the expectation over all $S$ and all packets within the generation, the mean delay is
d Y,Z t s +t s + s t + si i0 +EC S] t s S Y s +t s +t S Y t + t s s s + + sec S] S Y s + t + t s s s + + s S Y s + 3 4 5 t s s +s + +3 +t,
where $\bar{s}_1^1$ and $\bar{s}_1^2$ are given by Lemma 2; and $E[C|S]$ is the expected number of coded packets needed to recover from all packet erasures occurring in the first $k$ packets. When $s < k$, the number of coded packets required is at least one (i.e., $E[C|S] \geq 1$), leading to the bound in (25).

B. Case 2: $Y > 1$, $Z_b = 1$
All packets $\{p_1,\ldots,p_s\}$ are delivered immediately until the first packet loss is observed. Since $Y > 1$, at least one retransmission event is needed to properly decode. Once all dofs have been received and the generation can be decoded, the remaining packets $\{p_{s+1},\ldots,p_k\}$ are delivered in-order. An example is provided in Figure 2(b). The generation cannot be decoded because there are too many packet losses during the first transmission attempt. As a result, one additional dof is retransmitted allowing the client to decode in round two (i.e., $Y = 2$). Taking the expectation over all $S$ and all packets within the generation, the expected delay for this case is
d Y>,Z t +t s s+ s i0 t +y t + si+n t s S Y s y 7 s n s +n + t s y s y + t, 8
where $\bar{s}_2^1$ and $\bar{s}_2^2$ are given by Lemma 2. It is important to note that we do not take into account the time to transmit packets after the first round (see the assumptions in Section III).

Figure 3: Example of (a) Case 3: $Z_b > Y$, $Z_b > 1$ and (b) Case 4: $Y \geq Z_b$, $Z_b > 1$, where $b = 3$.

C. Case 3: $Z_b > Y$, $Z_b > 1$
In this case, generation $G_j$ completes prior to a previously sent generation.
As a result, all packets $\{p_{(j-1)k+1},\ldots,p_{jk}\} \in G_j$ are buffered until all previous generations have been delivered. Once there are no earlier generations preventing in-order delivery, all packets in $G_j$ are immediately delivered. Figure 3(a) provides an example. Consider the delay experienced by packets in $G_3$. While $G_3$ is successfully decoded after the first transmission attempt, an earlier generation cannot be decoded, forcing all packets in $G_3$ to be buffered until that generation is delivered. Taking the expectation over all packets within the generation and all possible locations of the last unsuccessfully decoded generation, the expected delay is
d Z>Y,Z> n +it s +t +t z i v n b + t s 9 z t vb n + t s, 30
where $\bar{v}_b^1$ is given by Lemma 3.

D. Case 4: $Y \geq Z_b$, $Z_b > 1$
Finally, this case is a mixture of the last two. The generation $G_j$ completes after all previously transmitted generations, but it requires more than one transmission round to decode. Packets received before the first packet loss are buffered until

all previous generations are delivered, and packets received after the first packet loss are buffered until $G_j$ can be decoded. An example is provided in Figure 3(b). Consider the delay of packets in $G_3$. Neither $G_3$ nor an earlier generation can be decoded after the first transmission attempt. After the second transmission attempt, the earlier generation can be decoded, allowing packets $\{p_{2k+1},\ldots,p_{2k+s}\} \in G_3$ to be delivered; although packets $\{p_{2k+s+1},\ldots,p_{3k}\} \in G_3$ must wait to be delivered until after $G_3$ is decoded. Taking the expectation over all $S$, all packets within the generation, and all possible locations of the last unsuccessfully decoded generation, the expected delay is
d Y Z,Z> s n i+t s +t z i v b + n t s + js+ t y +n j +t s S Y s y 3 z y s t +y n v b + s n + t s. 3
The expectations $\bar{s}_2^1$ and $\bar{v}_b^1$ are given by Lemmas 2 and 3 respectively. Combining the cases above, we obtain the following:

Theorem 4. The expected in-order packet delay for the proposed coding scheme is lower bounded by
$$E[D] \geq \sum_{z_b \geq 1}\sum_{y \geq 1} \bar{d}_{Y=y,Z=z_b}\, p_Y(y)\, p_{Z_b}(z_b), \tag{33}$$
where $\bar{d}_{Y=y,Z=z_b}$ is given in equations (26), (28), (30), and (32); and the distributions $p_Y(y)$ and $p_{Z_b}(z_b)$ are given in equations (1) and (7) respectively.

VI. IN-ORDER DELIVERY DELAY VARIANCE
The second moment of the in-order delivery delay can be determined in a similar manner as the first. Again, we can use the law of total expectation to find the moment:
$$E[D^2] = E_Y\left[E_{Z_b}\left[E_D[D^2 \mid Y, Z_b]\right]\right]. \tag{34}$$
As with the first moment, four distinct cases exist that must be dealt with separately. For each case, define $\bar{d}^2_{Y=y,Z=z} = E[D^2 \mid Y = y, Z_b = z]$. While we omit the initial step in the derivation of each case, $\bar{d}^2_{Y,Z}$ can be determined using the same assumptions as above.

A. Case 1: $Y = 1$, $Z_b = 1$
Using the expectations defined in Lemma 2, the second moment $\bar{d}^2_{Y=1,Z=1}$ is shown below. For $s < k$, the number of coded packets needed to decode the generation will always be greater than or equal to one (i.e., $c \geq 1$). Therefore, the bound in (35) follows from letting $c = 1$ and $C|S = s$ for all $s$.
d Y,Z t + +3t t s + +9+3 t s +3+ 7 + t s t + t s s +3 + t s + t t s s 3 t ss 3. 35

B.
Case 2: $Y > 1$, $Z_b = 1$
This case can be derived in a similar manner as the last. Again, each $\bar{s}_2^i$, $i \in \{1,2,3\}$, is given by Lemma 2.
d Y>,Z n n ++ 3 3 + + t s + n y y + + t t s + n +t s +yt t s s 3 t ss 3 n +n + t s +yn +t t s +y t s + y + t. 3

C. Case 3: $Z_b > Y$, $Z_b > 1$
The second moment $\bar{d}^2_{Z>Y,Z>1}$ can be derived as follows:
d Z>Y,Z> n v b + n vb + 3 t s z n v b + t t s +z t. 37

D. Case 4: $Y \geq Z_b$, $Z_b > 1$
Using Lemmas 2 and 3, the second moment $\bar{d}^2_{Y \geq Z,Z>1}$ is:
d Y Z,Z> y n +t s t +y t + n n ++ 3 + t s + n v t b + s +y zt t s s + n n v v b b t s n v b z +y +y z t t s 4y zy +z t s. 38
Combining the cases above, we obtain:

Theorem 5. The second moment of the in-order packet delay for the proposed coding scheme is lower bounded by
$$E[D^2] \geq \sum_{z_b \geq 1}\sum_{y \geq 1} \bar{d}^2_{Y=y,Z=z_b}\, p_Y(y)\, p_{Z_b}(z_b), \tag{39}$$
where $\bar{d}^2_{Y=y,Z=z_b}$ is given in equations (35), (36), (37), and (38); and the distributions $p_Y(y)$ and $p_{Z_b}(z_b)$ are given in (1) and (7) respectively. Furthermore, the in-order delay variance is $\sigma_D^2 = E[D^2] - E[D]^2$ where $E[D]$ is given in (22).

VII. EFFICIENCY
The above results show adding redundancy into a packet stream decreases the in-order delivery delay. However, doing so comes with a cost. We characterize this cost in terms of efficiency. Before defining the efficiency, let $M_i$, $i \in [0,k]$, be the number of packets received at the sink as a result of transmitting a generation of size $i$. Alternatively, $M_i$ is the total number of packets received by the sink for any path starting in state $i$ and ending in state 0 of the Markov chain defined in Section IV. Furthermore, define $M_{ij}$ to be the number of packets received by the sink as a result of a single transition from state $i$ to state $j$ (i.e., $i \rightarrow j$). $M_{ij}$ is deterministic (e.g., $m_{ij} = i - j$) when $i, j \geq 1$ and $i \geq j$. For any transition $i \rightarrow 0$, $i \geq 1$, $m_{i0} \in [i, n_i]$ has probability
$$p_{M_{i0}}(m_{i0}) = \frac{B(n_i, m_{i0}, 1-\epsilon)}{\sum_{j=i}^{n_i} B(n_i, j, 1-\epsilon)} \tag{40}$$
$$= \frac{1}{a_{i0}} B(n_i, m_{i0}, 1-\epsilon). \tag{41}$$
Therefore, the expected number of packets received by the sink is
$$E[M_{ij}] = \begin{cases} i - j & \text{for } i, j \in [1,k],\ i \geq j \\ \dfrac{1}{a_{i0}} \sum_{x=i}^{n_i} x\, B(n_i, x, 1-\epsilon) & \text{for } i \in [1,k],\ j = 0. \end{cases} \tag{42}$$
Given $E[M_{ij}]$ for all $i, j$, the total number of packets received by the sink when transmitting a generation of size $i$ is
$$E[M_i] = \frac{1}{1-a_{ii}} \sum_{j=0}^{i-1} \left(E[M_{ij}] + E[M_j]\right) a_{ij} \tag{43}$$
where $E[M_0] = 0$. This leads us to the following theorem.

Theorem 6. The efficiency $\eta_k$, defined as the ratio between the number of information packets (or dofs) within each generation of size $k$ and the expected number of packets received by the sink, is
$$\eta_k = \frac{k}{E[M_k]}. \tag{44}$$

VIII. NUMERICAL RESULTS
The analysis presented in the last few sections provided a method to lower bound the expected in-order delivery delay. Unfortunately, the complexity of the process prevents us from determining a closed form expression for this bound. However, this section will provide numerical results.
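The efficiency recursion of Section VII is straightforward to evaluate numerically. The following is a minimal sketch, not the authors' implementation: the function names are hypothetical, and it assumes $n_i = \lceil R i \rceil$ packets are sent in a round that starts with $i$ missing dofs, and $\epsilon < 1$.

```python
from math import ceil, comb


def binom_pmf(n: int, m: int, p: float) -> float:
    """B(n, m, p): probability of exactly m successes in n trials."""
    return comb(n, m) * p**m * (1.0 - p) ** (n - m)


def efficiency(k: int, R: float, eps: float) -> float:
    """eta_k = k / E[M_k], with E[M_i] built up from E[M_0] = 0."""
    EM = [0.0] * (k + 1)
    for i in range(1, k + 1):
        n_i = ceil(R * i)  # assumption: packets sent per round, rounded up
        # recv[m] = probability that exactly m of the n_i packets arrive
        recv = [binom_pmf(n_i, m, 1.0 - eps) for m in range(n_i + 1)]
        a_ii = recv[0]  # nothing arrives, chain stays in state i
        total = 0.0
        # transitions i -> j with 1 <= j < i: exactly i - j packets arrive
        for j in range(1, i):
            total += ((i - j) + EM[j]) * recv[i - j]
        # transition i -> 0: a_i0 * E[M_i0] = sum over x >= i of x * B(n_i, x, 1 - eps)
        total += sum(x * recv[x] for x in range(i, n_i + 1))
        EM[i] = total / (1.0 - a_ii)
    return k / EM[k]
```

Two limiting cases are easy to reason about: with no losses every transmitted packet is received, so `efficiency(4, 1.5, 0.0)` equals 4/6, and with no redundancy (`R = 1.0`) exactly `k` packets are ever received, so the efficiency is 1.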
Before proceeding, several items need to be noted. First, we do not consider the terms where $p_Y(y)\,p_{Z_b}(z_b)$ is below a small threshold when calculating $E[D]$ and $E[D^2]$ since they have little effect on the overall calculation. Second, the analytical curves are sampled at local maxima. As the code generation size $k$ increases, the number of in-flight generations, $b = BDP/k$, incrementally decreases. Upon each decrease in $b$, a discontinuity occurs that causes an artificial decrease in $E[D]$ that becomes less noticeable as $k$ increases towards the next decrease in $b$. This transient behavior in the analysis is more prominent in the cases where $R \approx 1/(1-\epsilon)$ and less so when $R \gg 1/(1-\epsilon)$. Regardless, the figures show an approximation with this transient behavior removed. Third, we note that $Rk$ may not be an integer. To overcome this issue when generating and transmitting coded packets, $\lceil Rk \rceil - k$ and $\lfloor Rk \rfloor - k$ coded packets are sent with probability $Rk - \lfloor Rk \rfloor$ and $\lceil Rk \rceil - Rk$ respectively. Finally, we denote the redundancy used in each of the figures as $R_x = (1+x)/(1-\epsilon)$.

Figure 4: The in-order delay for two erasure rates (panels (a) and (b)) as a function of $k$ on a […]0 Mbps link, plotted as $E[D]$ against the coding window/generation size for two RTTs, two redundancy levels, and SR-ARQ. The error bars show $\sigma_D$ above and below the mean. The analytical and simulated results are represented using solid and dotted lines respectively. Note the log scale of both the x-axis and y-axis.

A. Coding Window Size and Redundancy Selection
Results for four different networks/links are shown in Figure 4. The simulation was developed in Matlab utilizing a model similar to that presented in Section III, although several of the assumptions are relaxed. The time it takes to retransmit coded packets after feedback is received is taken into account. Furthermore, the number of generations preventing delivery is not limited to a single BDP of packets, which increases

the probability of head-of-line blocking. Both of these relaxations effectively increase the delay experienced by a packet. Finally, the figure shows the delay of an idealized version of SR-ARQ where we assume infinite buffer sizes and the delay is measured from the time a packet is first transmitted until the time it is delivered in-order.

Figure 5: $k^*$ as a function of the BDP.

Figure 4 illustrates that adding redundancy and/or choosing the correct coding window/generation size can have major implications on the in-order delay. Not only does choosing correctly reduce the delay, but it can also reduce the jitter. However, it is apparent when viewing $E[D]$ as a function of $k$ that the proper selection of $k$ for a given $R$ is critical for minimizing $E[D]$ and $E[D^2]$. In fact, Figure 4 indicates that adding redundancy and choosing a moderately sized generation is needed in most cases to ensure both are minimized.

Before proceeding, it is important to note that a certain level of redundancy is needed to see benefits. Each curve shows results for $R > 1/(1-\epsilon)$. For $R \leq 1/(1-\epsilon)$, it is possible to see in-order delays and jitter worse than the idealized ARQ scheme. Consider an example where a packet loss is observed near the beginning of a generation that cannot be decoded after the first transmission attempt. Since feedback is not sent/acted upon until the end of the generation, the extra time waiting for the feedback can induce larger delays than what would have occurred under a simple ARQ scheme. We can reduce this time by reacting to feedback before the end of a generation; but it is still extremely important to ensure that the choice of $k$ and $R$ will decrease the probability of a decoding failure and provide improved delay performance.

The shape of the curves in the figure also indicates that there are two major contributors to the in-order delay that need to be balanced.
Let $k^*$ be the generation size where $E[D]$ is minimized for a given $\epsilon$ and $R$, i.e.,
$$k^* = \arg\min_k E[D]. \tag{45}$$
To the left of $k^*$, the delay is dominated by head-of-line blocking and resequencing delay created by previous generations. To the right of $k^*$, the delay is dominated by the time it takes to receive enough dofs to decode the generation. While there are gains in efficiency for $k > k^*$, the benefits are negligible for most time-sensitive applications. As a result, we show $k^*$ for a given $\epsilon$ and $R$ as a function of the BDP in Figure 5 and make three observations concerning this figure. First, the coding window size $k^*$ increases with $\epsilon$, which is the opposite of what we would expect from a typical erasure code [25]. In the case of small $\epsilon$, it is better to try and quickly correct only some of the packet losses occurring within a generation using the initially transmitted coded packets while relying heavily on feedback to overcome any decoding errors. In the case of large $\epsilon$, a large generation size is better where the majority of packet losses occurring within a generation are corrected using the initially transmitted coded packets and feedback is relied upon to help overcome the rare decoding error. Second, increasing $R$ decreases $k^*$. This is due to the receiver's increased ability to decode a generation without having to wait for retransmissions. Third, $k^*$ is not very sensitive to the BDP in most cases, enabling increased flexibility during system design and implementation.

Figure 6: Rate-delay trade-off for a […]0 Mbps link with an RTT of […]00 ms, plotted as delay (ms) against efficiency $\eta$. The error bars represent $\sigma_D$ above and below the mean, and the delay for ARQ is shown for $\eta = 1$. Note the log scale of the y-axis.

B. Rate-Delay Trade-Off
While transport layer coding can help meet strict delay constraints, the decreased delay comes at the cost of throughput, or efficiency. Let $E[D^*]$, $\sigma_D^*$, and $\eta^*$ be the expected in-order delay, the standard deviation, and the expected efficiency respectively that correspond to $k^*$ defined in eq. (45).
The rate-delay trade-off is shown by plotting $E[D^*]$ as a function of $\eta^*$ in Figure 6. The expected SR-ARQ delay (i.e., the data point for $\eta = 1$) is also plotted for each packet erasure rate as a reference. The figure shows that an initial increase in $R$ (or a decrease in $\eta$) has the biggest effect on $E[D]$. In fact, the majority of the decrease is observed at the cost of just a few percent ([…]-5%) of the available network capacity when $\epsilon$ is small. As $R$ is increased further, the primary benefit presents itself as a reduction in the jitter (or $E[D^2]$). Furthermore, the figure shows that even for high packet erasure rates (e.g., […]0%), strict delay constraints can be met as long as the user is willing to sacrifice throughput.

C. Real-World Comparison
We finally compare the analysis with experimentally obtained results in Figure 7 and show that our analysis provides

ED] t ms 0 00 80 0 40 0 0.4..8...4. Redundancy R 3 4 8 Figure 7: Exerimental solid lines and analytical dotted lines results for various over a 5 Mbs lin with RTT 0 ms and ǫ 0.. a reasonable aroximation to real-world rotocols. The exeriments were conducted using Coded TC CTC over an emulated networ similar to the one used in 7] with a rate of 5 Mbs and a RTT of 0 ms. The only difference between our setu and theirs was that we fixed CTC s congestion control window size cwnd to be equal to the BD of the networ in order to eliminate the affects of fluctuating cwnd sizes. There are several contributing factors for the differences between the exerimental and analytical results shown in the figure. First, the analytical model aroximates the algorithm used in CTC. Where we assume feedbac is only acted uon at the end of a generation, CTC roactively acts uon feedbac and does not wait until the end of a generation to determine if retransmissions are required. CTC s standard deviation is less than the analytical standard deviation as a result. Second, the exeriments include additional rocessing time needed to accomlish tass such as coding and decoding, while the analysis does not. Finally, the assumtions made in Sections III and V effectively lower boundsed] and E D ]. Regardless, the analysis does rovide a fairly good estimate of the in-order delay and can be used to hel inform decisions regarding the aroriate generation size to use for a given networ/lin. IX. CONCLUSION In this aer, we addressed the use of transort layer coding to imrove alication layer erformance. A coding algorithm and an analysis of the in-order delivery delay s first two moments were resented, in addition to numerical results addressing when and how much redundancy should be added to a acet stream to meet a user s delay constraints. These results showed that the coding window size that minimizes the exected in-order delay is largely insensitive to the BD of the networ for some cases. 
Finally, we compared our analysis with the measured delay of an implemented transport protocol, CTCP. While our analysis and the behavior of CTCP do not provide a one-to-one comparison, we illustrated how our work can be used to help inform system decisions when attempting to minimize delay.

ACKNOWLEDGMENTS

We would like to thank the authors of [7] for the use of their CTCP code. Without their help, we would not have been able to collect the experimental results.

REFERENCES

[1] Sandvine, "Global Internet Phenomena." Online, May 2014.
[2] Y. Xia and D. Tse, "Analysis on Packet Resequencing for Reliable Network Protocols," in INFOCOM, vol., pp. 990 000, Mar. 2003.
[3] J. K. Sundararajan, D. Shah, M. Médard, S. Jakubczak, M. Mitzenmacher, and J. Barros, "Network Coding Meets TCP: Theory and Implementation," Proc. of the IEEE, vol. 99, pp. 490 5, Mar. 0.
[4] V. Subramanian, S. Kalyanaraman, and K. K. Ramakrishnan, "Hybrid Packet FEC and Retransmission-Based Erasure Recovery Mechanisms for Lossy Networks: Analysis and Design," in COMSWARE, 2007.
[5] O. Tickoo, V. Subramanian, S. Kalyanaraman, and K. K. Ramakrishnan, "LT-TCP: End-to-End Framework to Improve TCP Performance Over Networks with Lossy Channels," in IWQoS, pp. 8 93, 2005.
[6] B. Ganguly, B. Holzbauer, K. Kar, and K. Battle, "Loss-Tolerant TCP (LT-TCP): Implementation and Experimental Evaluation," in MILCOM, 0.
[7] M. Kim, J. Cloud, A. ParandehGheibi, L. Urbina, K. Fouli, D. J. Leith, and M. Médard, "Congestion Control for Coded Transport Layers," in ICC, June 2014.
[8] T. Ho, M. Médard, R. Koetter, D. Karger, M. Effros, J. Shi, and B. Leong, "A Random Linear Network Coding Approach to Multicast," IEEE Trans. on Info. Theory, vol. 5, no. 0, pp. 443 4430, 00.
[9] A. Heidarzadeh, Design and Analysis of Random Linear Network Coding Schemes: Dense Codes, Chunked Codes and Overlapped Chunked Codes. Ph.D. thesis, Carleton University, Ottawa, Canada, Dec. 0.
[10] D. Lucani, M. Médard, and M. Stojanovic, "Broadcasting in Time-Division Duplexing: A Random Linear Network Coding Approach," in NetCod, pp. 7, June 2009.
[11] D. Lucani, M. Médard, and M. Stojanovic, "Online Network Coding for Time-Division Duplexing," in GLOBECOM, Dec. 00.
[12] D. Lucani, M. Stojanovic, and M. Médard, "Random Linear Network Coding For Time Division Duplexing: When To Stop Talking And Start Listening," in INFOCOM, pp. 800 808, Apr. 2009.
[13] T. Dikaliotis, A. Dimakis, T. Ho, and M. Effros, "On the Delay of Network Coding Over Line Networks," in ISIT, June 2009.
[14] M. Nistor, R. Costa, T. Vinhoza, and J. Barros, "Non-Asymptotic Analysis of Network Coding Delay," in NetCod, June 00.
[15] E. Drinea, C. Fragouli, and L. Keller, "Delay with Network Coding and Feedback," in ISIT, pp. 844 848, June 2009.
[16] A. Eryilmaz, A. Ozdaglar, and M. Médard, "On Delay Performance Gains From Network Coding," in CISS, pp. 84 870, Mar. 00.
[17] B. Swapna, A. Eryilmaz, and N. Shroff, "Throughput-Delay Analysis of Random Linear Network Coding for Wireless Broadcasting," IEEE Trans. on Information Theory, vol. 59, pp. 38 34, Oct. 2013.
[18] H. Yao, Y. Kochman, and G. W. Wornell, "A Multi-Burst Transmission Strategy for Streaming Over Blockage Channels with Long Feedback Delay," IEEE JSAC, vol. 9, pp. 033 043, Dec. 0.
[19] M. Nistor, J. Barros, F. Vieira, T. Vinhoza, and J. Widmer, "Network Coding Delay: A Brute-Force Analysis," in ITA, Jan. 00.
[20] J. Sundararajan, P. Sadeghi, and M. Médard, "A Feedback-Based Adaptive Broadcast Coding Scheme for Reducing In-Order Delivery Delay," in NetCod, June 2009.
[21] W. Zeng, C. Ng, and M. Médard, "Joint Coding and Scheduling Optimization in Wireless Systems with Varying Delay Sensitivities," in SECON, pp. 4 44, June 0.
[22] G. Joshi, Y. Kochman, and G. W. Wornell, "On Playback Delay in Streaming Communication," in ISIT, pp. 85 80, July 0.
[23] G. Joshi, Y. Kochman, and G. Wornell, "Effect of Block-Wise Feedback on the Throughput-Delay Trade-Off in Streaming," in INFOCOM Workshop on Contemporary Video, Apr. 2014.
[24] M. Tömösközi, F. H. Fitzek, D. E. Lucani, M. V. Pedersen, and P. Seeling, "On the Delay Characteristics for Point-to-Point Links using Random Linear Network Coding with On-the-Fly Coding Capabilities," in European Wireless 2014, May 2014.
[25] R. Koetter and F. Kschischang, "Coding for Errors and Erasures in Random Network Coding," IEEE Trans. on Information Theory, vol. 54, pp. 3579 359, Aug. 2008.