Copyright 1999 University of California

Reliability and Queueing
by David G. Messerschmitt

Supplementary section for Understanding Networked Applications: A First Course, Morgan Kaufmann, 1999.

Copyright notice: Permission is granted to copy and distribute this material for educational purposes only, provided that this copyright notice remains attached.

The reliability of a system of components is important to anyone designing high-availability systems. The congestion phenomenon affects the scalability and capacity of both servers and networks. What both these issues have in common is random arrivals. In the case of reliability, these arrivals are component failures; in the case of servers, they are tasks to compute; and in communication links, they are packets. These three problems share a common modeling framework based on probability theory.

This appendix will characterize the effectiveness of component redundancy, in which a component is replicated to mitigate the effect of component failure. It will also consider the statistics of waiting time in queues, applying to server task or network queues. A basic knowledge of probability theory is presumed, and further details can be found in [Rad89], [Wal91], and [Wal96].

The Exponential Distribution

An exponentially distributed random variable $X \ge 0$ has a probability density function $f(x) = \lambda e^{-\lambda x}$ with mean value $1/\lambda$ and probability distribution function $F(x) = 1 - e^{-\lambda x}$. This distribution plays a major role in both reliability analysis and queueing analysis.

The exponential distribution has one important property that helps in understanding what follows: it is memoryless. If we let $0 < x_0 < x$, then the conditional probability that $X < x$ given that $X \ge x_0$ is

$$\Pr\{X < x \mid X \ge x_0\} = \frac{F(x) - F(x_0)}{1 - F(x_0)} = 1 - e^{-\lambda (x - x_0)} = F(x - x_0). \tag{1}$$

This is an exponential distribution referenced to $x_0$ rather than to $0$; the distribution is unaffected by the knowledge that $X \ge x_0$.

Component Failure

In modeling the failure of components in a system, statistical techniques should be used because the time of failure cannot be predicted with certainty (see "How Effective is Redundancy?" in Chapter 3).
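The memoryless property of Equation 1 can be checked numerically. The following is a minimal sketch, not part of the original text; the function names and the values of `lam`, `x0`, and `x` are illustrative assumptions.

```python
import math


def exp_cdf(x, lam):
    """Exponential distribution function F(x) = 1 - exp(-lam * x), for x >= 0."""
    return 1.0 - math.exp(-lam * x)


def conditional_cdf(x, x0, lam):
    """Pr{X < x | X >= x0} = (F(x) - F(x0)) / (1 - F(x0)), for 0 < x0 < x."""
    return (exp_cdf(x, lam) - exp_cdf(x0, lam)) / (1.0 - exp_cdf(x0, lam))


if __name__ == "__main__":
    lam, x0, x = 0.5, 2.0, 5.0  # illustrative values only
    # Both numbers should agree up to floating-point rounding.
    print(conditional_cdf(x, x0, lam), exp_cdf(x - x0, lam))
```

The two printed values agree up to rounding, which is the statement that surviving to $x_0$ merely shifts the origin of the distribution.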
The simplest assumption about failures, and a reasonable one for electronic components and subsystems, is that a component fails at a constant rate $\lambda$. This means that for a large number of components, in an interval of length $\delta$ a fraction $\lambda\delta$ of the components that are still working at the beginning of that interval will fail on average, in the limit as $\delta \to 0$. Let $X$ be a random variable denoting the time to failure for one component, and let its distribution function be $F(x)$; that is, $F(x)$ is the probability that $X \le x$. Then at some time $x$, a fraction $1 - F(x)$ of these components will not have failed, and the probability of component failure in the interval $[x, x + \delta]$ is $F(x + \delta) - F(x)$. Thus, the condition for constant failure rate is

$$F(x + \delta) - F(x) \approx \lambda\delta \, (1 - F(x)) \quad \text{as } \delta \to 0. \tag{2}$$

This can be rewritten as

$$F'(x) = \lambda \, (1 - F(x)). \tag{3}$$

The solution to this differential equation (with the appropriate boundary condition $F(0) = 0$) is the exponential distribution, $F(x) = 1 - e^{-\lambda x}$. The average time to failure is the mean value of this distribution, $T = 1/\lambda$.

On reflection, the memoryless property of the exponential distribution is consistent with the constant failure rate assumption. Knowing that a component has lasted for some period says nothing (in the statistical sense) about remaining time to failure; the component has no mechanism for wearing out, which would result in an increasing failure rate with time and make its previous lifetime relevant. This assumption is reasonably valid for electronic components, although it would be questionable for components with mechanical elements (like disk drives or keyboards).

Component Redundancy

Suppose $n$ replicas of a component are provided, only one of which need work for system availability. Presumably this redundancy increases the time to system failure, but how much?

Example: If a computer must be functioning as part of a networked application, then the availability of the application can be increased by replicating the computer (and its software) $n$ times. If one or more of these computers fail, one of the still-functioning computers is used. It is only necessary for any one of these computers to be functioning for the application to be available.

Of course, a different analysis is required in different circumstances. For example, in some systems there may be $n$ different components, all of which have to be working in order for the system to be available.

Assume there are $n$ components, all of which must fail for a system to fail. Further assume that each has a constant failure rate $\lambda$, and the components fail independently. Then at time $x$, the probability that one component has failed is $F(x)$ and the probability that all $n$ components have failed is $F^n(x)$, since they fail independently.
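The redundancy setup described above can be explored numerically before any formula is derived. The sketch below is an illustration under stated assumptions, not the author's method: the function names, the failure rate `lam = 1.0`, the trial count, and the fixed seed are all hypothetical choices. It evaluates $F^n(x)$ directly and estimates the mean time until all $n$ replicas have failed by drawing independent exponential lifetimes and taking the largest.

```python
import math
import random


def prob_all_failed(x, lam, n):
    """Return F(x)**n, the probability that all n components have failed by time x."""
    return (1.0 - math.exp(-lam * x)) ** n


def simulate_mean_system_lifetime(lam, n, trials=100_000, seed=0):
    """Monte Carlo estimate of the mean time until all n components have failed."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    total = 0.0
    for _ in range(trials):
        # The system survives until its longest-lived replica fails.
        total += max(rng.expovariate(lam) for _ in range(n))
    return total / trials


if __name__ == "__main__":
    lam = 1.0  # illustrative failure rate (failures per unit time)
    for n in (1, 2, 3):
        print(n, prob_all_failed(1.0, lam, n), simulate_mean_system_lifetime(lam, n))
```

With these assumed values, the estimated mean lifetime grows with $n$, but each added replica contributes less than the previous one.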
The probability density of the time to system failure is thus the derivative of the distribution function $F^n(x)$, namely $n F^{n-1}(x) F'(x)$, and the average time to failure of all components is

$$T = \int_0^\infty x \, n F^{n-1}(x) F'(x) \, dx. \tag{4}$$

Substituting the exponential distribution,

$$T = \int_0^\infty n x \left(1 - e^{-\lambda x}\right)^{n-1} \lambda e^{-\lambda x} \, dx, \tag{5}$$

which (after some tedious algebra and reference to a table of integrals) evaluates to

$$T = \frac{1}{\lambda} \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} \frac{1}{i} = \frac{1}{\lambda} \sum_{i=1}^{n} \frac{1}{i}. \tag{6}$$

For example, $n = 2$ gives $T = 1.5/\lambda$ and $n = 3$ gives $T \approx 1.83/\lambda$, so each additional replica extends the mean time to failure by less than the one before.

This last identity can be established by defining the polynomials

$$g(x) = \sum_{i=1}^{n} (-1)^{i+1} \binom{n}{i} \frac{x^i}{i} \quad \text{and} \quad h(x) = \sum_{i=1}^{n} \frac{x^i}{i}, \tag{7}$$

where it suffices to show that $g(1) = h(1)$. The derivatives are easily determined,

$$g'(x) = \frac{1 - (1 - x)^n}{x} \quad \text{and} \quad h'(x) = \frac{1 - x^n}{1 - x} = g'(1 - x). \tag{8}$$

Integrating the right side of Equation 8 over $[0, 1]$ (with the substitution $u = 1 - x$), it is readily shown that $g(1) - g(0) = h(1) - h(0)$, and the expected result follows from $g(0) = h(0) = 0$.

Task and Packet Arrivals

Suppose tasks or packets arrive at random times $0 < T_1 < T_2 < T_3 < \cdots$. These arrival times form a Poisson process with rate $\lambda$ if the interarrival times $T_1, T_2 - T_1, T_3 - T_2, \ldots$ are independent random variables with identical exponential distributions with mean $1/\lambda$. The arrival rate (expected number of arrivals in unit time) is $\lambda$. Note the parallel to reliability, where the concern is with a single "arrival"; namely, a failure. With queueing, there is a sequence of random arrivals, but by assumption the distribution of interarrival times is the same as the distribution of time to failure.

From the memoryless property, knowing the elapsed time from the last arrival says nothing about the time to the next arrival; it has the same exponential distribution and takes $1/\lambda$ on average. Thus, each arrival has no awareness (in the statistical sense) of the last arrival, which is a reasonable assumption if the arrivals (tasks or packets) are from independent sources.

In order to characterize the waiting time in a queue (see "Modeling Congestion" in Chapter 7), assumptions must also be made as to the statistics of task processing time or packet transmission time (which in queueing theory is called the service time). The simplest assumption is that there is a single server, and the service times are random, independent, and exponentially distributed with mean $1/\mu$. This says that as long as there are tasks or packets waiting, the constant rate at which they are serviced (processed or transmitted) is $\mu$. A queue with this exponential service time and Poisson arrivals is called an M/M/1 queue.
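The construction of a Poisson process from exponential interarrival times can be sketched in a few lines. This is an illustrative sketch only; the function name `poisson_arrival_times`, the rate of 2 arrivals per unit time, the observation horizon, and the fixed seed are assumptions introduced here.

```python
import random


def poisson_arrival_times(lam, horizon, seed=0):
    """Arrival times in (0, horizon], built from i.i.d. exponential gaps with mean 1/lam."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    times = []
    t = rng.expovariate(lam)  # first arrival time T_1
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(lam)  # add the next interarrival time
    return times


if __name__ == "__main__":
    lam, horizon = 2.0, 10_000.0  # illustrative values only
    arrivals = poisson_arrival_times(lam, horizon)
    # The observed arrivals per unit time should be close to lam.
    print(len(arrivals) / horizon)
```

Over a long horizon, the observed number of arrivals per unit time settles near the rate $\lambda$, consistent with the definition of the arrival rate above.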
The M/M/1 queue is the simplest case; many more complicated models are possible. The utilization of the server is defined as $\rho = \lambda/\mu$. For the server to be able to keep up with the arrivals, the arrival rate must be less than the service rate; that is, $\lambda < \mu$ or $\rho < 1$. Further derivation of queueing results is beyond the scope of this book, other than to note two useful results. The average number of tasks or packets waiting in the queue (counting any that is in service) is

$$L = \frac{\rho}{1 - \rho}, \tag{9}$$

and the average completion time of a task (from arrival to completion of service) or latency of a packet (from arrival until the end of transmission time) is

$$T = \frac{1}{\mu - \lambda}. \tag{10}$$

Statistical Multiplexing

Statistical multiplexing can now be modeled using the M/M/1 queueing model (see "Sharing Communication Links: Statistical Multiplexing" in Chapter 8). Assume that $N$ different streams of packets, each an independent Poisson process with arrival rate $\lambda$, are multiplexed together. The aggregate arrivals can then be shown to be a Poisson process with rate $N\lambda$. Packets are assumed to have random lengths independent of one another and with an exponential distribution with mean $L$ bits (here $L$ denotes packet length, not the queue occupancy of Equation 9). This implies that the transmission time for a communication link with bitrate $R$ is exponential with average transmission time $1/\mu = L/R$, or rate $\mu = R/L$. The performance parameter of interest is the average latency, which is

$$T = \frac{1}{R/L - N\lambda} = \frac{L}{R - N\lambda L}. \tag{11}$$

In this equation, $N\lambda L$ is the average aggregate incoming bitrate (packet arrival rate times average packet length), which must be less than the link bitrate $R$. The latency is proportional to the average packet length, and increases as the aggregate bitrate increases. It approaches a minimum of $T = L/R$, the average transmission time, for low arrival rates.

The statistical multiplexing advantage arises from comparing this latency to the case where the link bitrate $R$ is statically divided into $N$ lower-bitrate streams each with rate $R/N$, with one incoming packet stream applied to each (this is called time-division multiplexing). For each of these subchannels, the packet arrival rate is $\lambda$ and the average service time is $1/\mu = NL/R$, or rate $\mu = R/(NL)$, and thus the latency is

$$\frac{1}{R/(NL) - \lambda} = \frac{NL}{R - N\lambda L} = NT. \tag{12}$$

Thus, the average latency has been increased by a factor $N$ over the statistical multiplexing case. This quantifies one advantage of statistical multiplexing, the reduction in packet latency, but there are others.
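The M/M/1 results and the comparison between statistical and time-division multiplexing can be worked through with concrete numbers. The following is a minimal sketch; the function names and the example values (10 streams, 50 packets per second per stream, 1000-bit average packets, a 1 Mb/s link) are assumptions chosen only for illustration.

```python
def mm1_avg_number(lam, mu):
    """Average number of tasks or packets in an M/M/1 queue: rho / (1 - rho)."""
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("arrival rate must be less than service rate")
    return rho / (1.0 - rho)


def mm1_avg_latency(lam, mu):
    """Average time from arrival to completion of service: 1 / (mu - lam)."""
    if lam >= mu:
        raise ValueError("arrival rate must be less than service rate")
    return 1.0 / (mu - lam)


def statmux_latency(n_streams, lam, mean_bits, bitrate):
    """All streams share one link: arrival rate N*lam, service rate R/L."""
    return mm1_avg_latency(n_streams * lam, bitrate / mean_bits)


def tdm_latency(n_streams, lam, mean_bits, bitrate):
    """Each stream gets a subchannel of rate R/N: arrival rate lam, service rate R/(N*L)."""
    return mm1_avg_latency(lam, bitrate / (n_streams * mean_bits))


if __name__ == "__main__":
    N, lam, L_bits, R = 10, 50.0, 1000.0, 1_000_000.0  # illustrative values only
    t_stat = statmux_latency(N, lam, L_bits, R)
    t_tdm = tdm_latency(N, lam, L_bits, R)
    # The ratio should equal N up to floating-point rounding.
    print(t_stat, t_tdm, t_tdm / t_stat)
```

With these assumed values the aggregate load is half the link bitrate, and the time-division latency is $N = 10$ times the statistically multiplexed latency, up to rounding.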
A further advantage is that time-division multiplexing limits the average bitrate of each and every packet stream to $R/N$, whereas statistical multiplexing limits only the aggregate average bitrate. Although not reflected in the assumptions and analysis above, this is of practical importance because the bitrate resources are allocated more dynamically and flexibly.

Exercises

E1. Suppose a simple system is decomposed into two components. If either component fails, the system fails. Assume the components fail independently at constant rates $\lambda_1$ and $\lambda_2$ respectively, and show that the mean time to system failure is $1/(\lambda_1 + \lambda_2)$.

E2. Assume a system is composed of $n$ components, all with the same constant failure rate $\lambda$. Failure of any one of the components means system failure.
a. Find the distribution of time to system failure.
b. Find the mean time to system failure.