Understanding and Using Availability
Jorge Luis Romeu, Ph.D.
ASQ CQE/CRE, & Senior Member
Email: romeu@cortland.edu
http://myprofile.cos.com/romeu
ASQ/RD Webinar Series
November 5
J. L. Romeu - Consultant (c)

Webinar Take-Aways
* Understanding Availability from a practical standpoint
* Calculating different Availability ratings
* Practical and economic ways of enhancing Availability

Summary
Availability is a performance measure concerned with assessing a maintained system or device with respect to its ability to be used when needed. We overview how it is measured under its three different definitions, and via several methods (theoretical/practical), using both statistical and Markov approaches. We overview the cases where redundancy is used and where degradation is allowed. Finally, we discuss ways of improving Availability and provide numerical examples.

When to use Availability
* When a system/device can fail and be repaired
* During maintenance, the system is down
* After maintenance, the system is again up
* Formal definition: a measure of the degree to which an item is in an operable state at any time. (Reliability Toolkit, RIAC)

System Availability
* A probabilistic concept based on two random variables, X and Y:
  * X, system or device time between failures
  * Y, maintenance or repair time
* Long-run averages of X and Y are:
  * E(X): Mean Time Between Failures (MTBF)
  * E(Y): Expected Maintenance Time (MTTR)

Availability by Mission Type
* Blanchard (Ref. 2): availability may be expressed differently, depending on the system and its mission.
* There are three types of Availability:
  * Inherent
  * Achieved
  * Operational

Inherent Availability: $A_i$
* Probability that a system, when used under stated conditions, will operate satisfactorily at any point in time.
* $A_i$ excludes preventive maintenance, logistics and administrative delays, etc.

$A_i = \dfrac{MTBF}{MTBF + MTTR}$

Achieved Availability: $A_a$
* Probability that a system, when used under stated conditions, will operate satisfactorily at any point in time, when called upon.
* $A_a$ includes other activities such as preventive maintenance, logistics, etc.

Operational Availability: $A_o$
* Probability that a system, when used under stated conditions, will operate satisfactorily when called upon.
* $A_o$ includes all factors that contribute to system downtime (now called Mean Down Time, MDT) for all reasons (maintenance actions and delays, access, diagnostics, active repair, supply delays, etc.).

$A_o$ long-run average formula:

$A_o = P(\text{System Up}) = \dfrac{\text{Fav. Cases}}{\text{Tot. Cases}} = \dfrac{\text{Up Time}}{\text{Cycle Time}} = \dfrac{E(X)}{E(X) + E(Y)} = \dfrac{MTBF}{MTBF + MDT}$
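
Numerical Example
Sub-events counted by each Availability measure:

Event  Sub-event    Inherent  Achieved  Operational
Up     Running      yes       yes       yes
Down   Wait-D       no        no        yes
Down   Diagnose     yes       yes       yes
Down   Wait-S       no        no        yes
Down   Wait-Adm     no        no        yes
Down   Install      yes       yes       yes
Down   Wait-Adm     no        no        yes
Up     Running      yes       yes       yes
Down   Preventive   no        yes       yes
Up     Running      yes       yes       yes

Totals (hours)      Inherent  Achieved  Operational
Up time             147       147       147
Maintenance         13        20        38
Availability        0.9188    0.8802    0.7946

Maintenance totals: Diagnose (5) + Install (8) = 13; adding Preventive (7) gives 20; adding all waiting periods gives 38.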
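
The three ratings in this example can be re-derived from its totals. The following is a minimal sketch, assuming an up time of 147 hours and downtime totals of 13, 20 and 38 hours as read from the example; the names `availability` and `down_time` are illustrative and not part of the webinar.

```python
# Totals (hours) taken from the numerical example; names are illustrative.
up_time = 147.0
down_time = {
    "inherent": 13.0,     # active repair only (diagnose + install)
    "achieved": 20.0,     # active repair plus preventive maintenance
    "operational": 38.0,  # all downtime, including waits and delays
}


def availability(up, down):
    """Long-run availability: up time divided by total cycle time."""
    return up / (up + down)


ratings = {kind: availability(up_time, down) for kind, down in down_time.items()}
for kind, value in ratings.items():
    print(f"{kind:12s} {value:.4f}")
```

The same up time yields three different ratings because each definition counts a different share of the downtime.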

Formal Definition of Availability
* Hoyland et al. (Ref. 1): availability at time t, denoted A(t), is the probability that the system is functioning (up and running) at time t.
* X(t): the state of a system at time t
  * up and running, [X(t) = 1]
  * down and failed, [X(t) = 0]
* A(t) can then be written: $A(t) = P\{X(t) = 1\};\ t > 0$

Availability as a Random Variable

$A = \dfrac{X}{X + Y};\quad X, Y > 0$

* The problem of obtaining the density function of A is resolved via a variable transformation of the joint distribution
* Based on the two random variables X and Y: time to failure X, and time to repair Y
* Expected value and variance of the Availability r.v.
* $L_{10}$ (10th percentile of A): $P\{A < L_{10}\} = 0.10$
* First and third quartiles of Availability, etc.
* Theoretical results, approximated by Monte Carlo

Monte Carlo Simulation
* Generate n = 5000 random Exponential failure and repair times: $X_i$ and $Y_i$
* Obtain the corresponding Availabilities: $A_i = X_i / (X_i + Y_i);\ 1 \le i \le 5000$
* Sort them, and calculate all the n = 5000 $A_i$ results numerically
* Obtain the desired parameters from them
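
The simulation steps on this slide can be sketched in Python. This is a minimal sketch under stated assumptions, not the webinar's own code: the function name `simulate_availability`, the fixed seed, n = 5000 draws, and the means of 500 and 30 hours (borrowed from the numerical example) are illustrative choices, and percentiles are simply read off the sorted sample by index.

```python
import random
import statistics


def simulate_availability(mtbf, mttr, n, seed=0):
    """Return n sorted Monte Carlo draws of A = X / (X + Y).

    X and Y are independent Exponential times with means mtbf and mttr.
    """
    rng = random.Random(seed)  # fixed seed (assumption) for repeatable runs
    draws = []
    for _ in range(n):
        x = rng.expovariate(1.0 / mtbf)  # time to failure
        y = rng.expovariate(1.0 / mttr)  # time to repair
        draws.append(x / (x + y))
    draws.sort()
    return draws


samples = simulate_availability(mtbf=500.0, mttr=30.0, n=5000)
mean_a = statistics.mean(samples)
var_a = statistics.variance(samples)
l10 = samples[int(0.10 * len(samples))]  # empirical 10th percentile
q1 = samples[int(0.25 * len(samples))]   # empirical first quartile
q3 = samples[int(0.75 * len(samples))]   # empirical third quartile
print(f"mean={mean_a:.4f} var={var_a:.6f} L10={l10:.4f} Q1={q1:.4f} Q3={q3:.4f}")
```

Sorting once makes every percentile a simple index lookup, which is why the slide asks for the sorted sample before any parameter is computed.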

Numerical Example
* Use the Beta distribution for expediency
* The ratio yielding $A_i$ is distributed Beta($\mu_1$; $\mu_2$)
  * Time to failure (X) mean: $\mu_1$ = 500 hours
  * Time to repair (Y) mean: $\mu_2$ = 30 hours
* Generate n = 5000 random Beta values with the above parameters $\mu_1$ and $\mu_2$
* Obtain the MC Availabilities: $A_i$

Frequency Histogram of Example
[Figure: frequency histogram of the simulated Availabilities, Beta(500, 30); horizontal axis from about 0.89 to 0.98.]

Estimated Parameters of Example
MC results for the Beta(500, 30) example:
* Average Availability = 0.9435
* Variance of Availability ≈ 9.9 × 10^-5
* Life $L_{10}$ = 0.9305
* Quartiles: 0.937 and 0.9505
* P(A > 0.95): the value 0.9499 has rank R(0.9499) = 3653 in the sorted sample, so

$P\{A \le 0.95\} = \dfrac{3653}{5000} = 0.7306, \qquad P\{A > 0.95\} = 1 - 0.7306 = 0.2694$
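
Estimates of this kind can be reproduced approximately with the standard library. The sketch below assumes n = 5000 draws from Beta(500, 30) and a fixed seed; variable names are illustrative, and because the draws are random the figures will differ slightly from those on the slide.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed (assumption) so the run is repeatable
n = 5000
draws = sorted(rng.betavariate(500, 30) for _ in range(n))

mean_a = statistics.mean(draws)
var_a = statistics.variance(draws)
l10 = draws[int(0.10 * n)]  # empirical 10th percentile
q1 = draws[int(0.25 * n)]   # empirical first quartile
q3 = draws[int(0.75 * n)]   # empirical third quartile

# Share of simulated availabilities above 0.95, i.e. an estimate of P{A > 0.95}.
p_above = sum(a > 0.95 for a in draws) / n

print(f"mean={mean_a:.4f} var={var_a:.2e} L10={l10:.4f}")
print(f"Q1={q1:.4f} Q3={q3:.4f} P(A>0.95)={p_above:.4f}")
```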

Markov Model Approach
* Two-state Markov Chain (Refs. 4, 5, 6, 7)
* Monitor the status of the system at time T: X(T)
* Denote State 0 (Down) and State 1 (Up)
* X(T) = 0: system S is down at time T
* What is the probability q (or p) that system S is Up (or Down) at time T, given that it was Down (or Up) at time T-1?

Markov Representation of S:

$P\{X(T) = 1 \mid X(T-1) = 0\} = p_{01} = q$

$P\{X(T) = 0 \mid X(T-1) = 1\} = p_{10} = p$

Numerical Example:
* System S is in state Up; it then moves to state Down in one step, with probability $p_{10} = p = 0.002$
  * A Geometric distribution with mean $\mu = 1/p = 500$ hours
* System S is in state Down; it then moves to state Up in one step, with probability $p_{01} = q = 0.033$
  * A Geometric distribution with mean $\mu = 1/q \approx 30$ hours
* Every step (time period to transition) is an hour
* The Geometric distribution is the discrete counterpart of the continuous Exponential

Transition Probability Matrix P
Rows and columns are ordered by state: 0 (Down), 1 (Up).

$P = \begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix} = \begin{pmatrix} 1-q & q \\ p & 1-p \end{pmatrix} = \begin{pmatrix} 0.967 & 0.033 \\ 0.002 & 0.998 \end{pmatrix}$

* Entries of matrix $P = (p_{ij})$ correspond to the Markov Chain's one-step transition probabilities.
* Rows represent every system state that S can be in at time T.
* Columns represent every state that S can go into in one step (i.e. where S will be at time T+1).
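
Obtain the probability of S moving from state Up to Down in two hours

$P^2 = \begin{pmatrix} 1-q & q \\ p & 1-p \end{pmatrix}^2 = \begin{pmatrix} (1-q)^2 + pq & q(1-q) + q(1-p) \\ p(1-q) + (1-p)p & pq + (1-p)^2 \end{pmatrix} = \begin{pmatrix} p^{(2)}_{00} & p^{(2)}_{01} \\ p^{(2)}_{10} & p^{(2)}_{11} \end{pmatrix}$

Hence, the probability that S will go down in two hours is:

$p^{(2)}_{10} = p(1-q) + (1-p)p = 0.002 \times 0.967 + 0.998 \times 0.002 \approx 0.0039$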
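
The two-step probability can be checked by squaring the matrix numerically. This is a minimal sketch assuming p = 0.002 and q = 0.033 from the numerical example and the state order (0 = Down, 1 = Up); the helper `matmul` is an illustrative name.

```python
p = 0.002  # P(Up -> Down) in one hour
q = 0.033  # P(Down -> Up) in one hour

# Rows and columns are ordered (0 = Down, 1 = Up).
P = [[1 - q, q],
     [p, 1 - p]]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


P2 = matmul(P, P)
p10_two_steps = P2[1][0]                 # Down at hour 2, given Up at hour 0
closed_form = p * (1 - q) + (1 - p) * p  # the same entry, written out
print(p10_two_steps, closed_form)
```

Each row of `P2` still sums to 1, which is a quick sanity check on any power of a transition matrix.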
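
Other useful Markov results:
* If $p^{(2)}_{10} \approx 0.0039$, then $p^{(2)}_{11} = 1 - p^{(2)}_{10} = A(T) \approx 0.9961$ is the system Availability after T = 2 hours of operation, starting from Up.
* Probability of moving from state 1 to state 0 in 10 steps: $(P)^{10} \Rightarrow p^{(10)}_{10} = 0.017$; this includes the cases where S could have gone Down or Up and then been restored again, several times.
* For sufficiently large n (long run) and two states:

$\lim_{n \to \infty} P^n = \lim_{n \to \infty} \begin{pmatrix} 1-q & q \\ p & 1-p \end{pmatrix}^n = \begin{pmatrix} p/(p+q) & q/(p+q) \\ p/(p+q) & q/(p+q) \end{pmatrix}$

* Example: Up = q/(p+q) = 0.943; Down = p/(p+q) = 0.057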
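
The n-step and long-run results can be illustrated by raising the two-state matrix to a power. The sketch assumes p = 0.002 and q = 0.033 with state order (0 = Down, 1 = Up); `matmul` and `matpow` are illustrative helpers, and 2000 steps is an arbitrary stand-in for "sufficiently large n".

```python
p, q = 0.002, 0.033
P = [[1 - q, q],
     [p, 1 - p]]  # state order: 0 = Down, 1 = Up


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def matpow(m, n):
    """Raise a square matrix to a non-negative integer power."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, m)
    return result


P10 = matpow(P, 10)       # ten-hour transition probabilities
P_long = matpow(P, 2000)  # stand-in for the long-run limit
limit_down, limit_up = p / (p + q), q / (p + q)
print(P10[1][0], P_long[1], (limit_down, limit_up))
```

Both rows of `P_long` agree with the closed-form limit, showing that the long-run probabilities no longer depend on the starting state.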

Markov Model for a redundant system
* A redundant system is composed of two identical devices, in parallel.
* The system is maintained and can function at a degraded level, with only one unit UP.
* The system now has three states: 0, 1, 2
  * State 0, the Down state: both units are DOWN
  * State 1, the Degraded state: only one unit is UP
  * State 2, the UP state: both units are operating

Markov Model system representation

$p_{ij} = P\{X(T) = j \mid X(T-1) = i\}, \qquad p_{ii} = 1 - \sum_{j \ne i} p_{ij}$

$P\{X(T) = 1 \mid X(T-1) = 0\} = q$

$P\{X(T) = 2 \mid X(T-1) = 1\} = q$

$P\{X(T) = 0 \mid X(T-1) = 1\} = p$

$P\{X(T) = 1 \mid X(T-1) = 2\} = 2p$

Operational Conditions
* Every step (hour) T is an independent trial
* Success probability $p_{ij}$ corresponds to a transition from current state i into state j = 0, 1, 2
* The distribution of every change of state is the Geometric (counterpart of the Exponential)
* Mean time to accomplishing such a change of state is: $\mu = 1/p_{ij}$

Transition Probability Matrix P:
Rows and columns are ordered by state: 0 (Down), 1 (Degraded), 2 (Up).

$P = \begin{pmatrix} 1-q & q & 0 \\ p & 1-p-q & q \\ 0 & 2p & 1-2p \end{pmatrix}$

As before, the probability of being in state j after n steps, given that we started in some state i of S, is obtained by raising matrix P to the power n, and then looking at entry $p_{ij}$ of the resulting matrix $P^n$.

Numerical Example
* Probability p of a given unit failing in the next hour is 0.002
* Probability q of the repair crew completing a maintenance job in the next hour is 0.033
* Only one failure is allowed in each unit time period, and only one repair can be undertaken at any unit time

Probability that a degraded system (in State 1) remains degraded after two hours of operation:
Sum the probabilities corresponding to 3 events:
* the system status has never changed
* one unit is repaired, but another fails during the 2nd hour
* the remaining unit fails in the first hour (system goes down), but a repair is completed in the 2nd hour

Numerical Example:

$p^{(2)}_{11} = [P^2]_{11} = (1-p-q)^2 + q(2p) + pq = (1 - 0.035)^2 + 0.033 \times 0.004 + 0.002 \times 0.033 = 0.9314$

The probability that a system in the degraded state is still in the degraded state after two hours is: $p^{(2)}_{11} = 0.9314$
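
The three contributing events can be checked against the squared three-state matrix. This is a minimal sketch assuming p = 0.002, q = 0.033 and the state order (0 = Down, 1 = Degraded, 2 = Up); the helper and variable names are illustrative.

```python
p, q = 0.002, 0.033

# State order: 0 = Down, 1 = Degraded, 2 = Up.
P = [[1 - q, q, 0.0],
     [p, 1 - p - q, q],
     [0.0, 2 * p, 1 - 2 * p]]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


P2 = matmul(P, P)

never_changed = (1 - p - q) ** 2  # stays Degraded in both hours
repair_then_fail = q * (2 * p)    # Degraded -> Up -> Degraded
fail_then_repair = p * q          # Degraded -> Down -> Degraded
by_events = never_changed + repair_then_fail + fail_then_repair

print(P2[1][1], by_events)
```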

Mean time μ that the system S spends in the Degraded state
* System S can change to Up or Down with probabilities q and p, respectively
* S will remain in the Degraded state with probability 1 - p - q (i.e. no change)
* On average, S will spend a sojourn of length 1/(p + q) = 1/0.035 = 28.57 hours in the Degraded state before moving out.
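
The 28.57-hour figure is the mean of a Geometric distribution whose exit probability is p + q. A short worked derivation follows, writing r = p + q and N for the number of hours spent in the Degraded state (both symbols are introduced here for illustration only):

```latex
\begin{align*}
P\{N = n\} &= (1 - r)^{\,n-1}\, r, \qquad n = 1, 2, \dots \\
E[N] &= \sum_{n=1}^{\infty} n\,(1 - r)^{\,n-1}\, r = \frac{1}{r} \\
r &= p + q = 0.002 + 0.033 = 0.035
  \;\Longrightarrow\; E[N] = \frac{1}{0.035} \approx 28.57 \text{ hours}
\end{align*}
```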

Availability at time T
* A(T) = P{S is Available at T}
* System S is not Down at time T
* Then, S can be either Up or Degraded
* A(T) depends on the initial state of S
* Find the probability that S is Available at T, given that S was Degraded at T = 0 (initially):

$A(T) = P\{X(T) \ne 0 \mid X(0) = 1\} = p^{(T)}_{11} + p^{(T)}_{12} = 1 - p^{(T)}_{10}$
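
A(T) can be evaluated numerically by propagating the state probabilities hour by hour. The sketch below assumes p = 0.002, q = 0.033, the state order (0 = Down, 1 = Degraded, 2 = Up), and a start in the Degraded state; the functions `step` and `availability_at` are illustrative names.

```python
p, q = 0.002, 0.033

# State order: 0 = Down, 1 = Degraded, 2 = Up.
P = [[1 - q, q, 0.0],
     [p, 1 - p - q, q],
     [0.0, 2 * p, 1 - 2 * p]]


def step(dist, matrix):
    """One-hour update of a row vector of state probabilities."""
    size = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def availability_at(hours, start_state=1):
    """P{S is Degraded or Up after `hours` steps}, given the starting state."""
    dist = [0.0, 0.0, 0.0]
    dist[start_state] = 1.0
    for _ in range(hours):
        dist = step(dist, P)
    return dist[1] + dist[2]


for hours in (1, 2, 10, 100):
    print(hours, round(availability_at(hours), 6))
```

Passing `start_state=2` gives the corresponding curve for a system that starts fully Up, which makes the dependence on the initial state visible.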

State Occupancies
* Long-run averages of system sojourns
* Asymptotic probabilities of system S being in each one of its possible states at any time T
* Or the percent of time S spent in these states
* Irrespective of the state S was in initially
* Results are obtained by considering the vector Π of the long-run probabilities
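
Characteristics of Vector Π

$\Pi = \lim_{T \to \infty} \big( P\{X(T) = 0\},\ P\{X(T) = 1\},\ P\{X(T) = 2\} \big)$

Vector Π fulfills two important properties:

$(1)\ \Pi \times P = \Pi; \qquad (2)\ \sum_i \Pi_i = 1; \qquad \text{with } \Pi_i = \lim_{T \to \infty} P\{X(T) = i\}$

Π × P = Π (vector Π times matrix P equals Π) defines a system of linear equations, normalized by the 2nd property:

$\Pi \times P = \Pi \quad \text{with } \Pi = (\Pi_0, \Pi_1, \Pi_2),\quad P = \begin{pmatrix} p_{00} & p_{01} & p_{02} \\ p_{10} & p_{11} & p_{12} \\ p_{20} & p_{21} & p_{22} \end{pmatrix}$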
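
Numerical Example

$P = \begin{pmatrix} 0.967 & 0.033 & 0 \\ 0.002 & 0.965 & 0.033 \\ 0 & 0.004 & 0.996 \end{pmatrix}$

$0.967\,\Pi_0 + 0.002\,\Pi_1 = \Pi_0$

$0.033\,\Pi_0 + 0.965\,\Pi_1 + 0.004\,\Pi_2 = \Pi_1$

$0.033\,\Pi_1 + 0.996\,\Pi_2 = \Pi_2; \qquad \text{with } \sum_i \Pi_i = 1$

Solution of the system yields the long-run occupancy rates: $\Pi_0 = 0.0065$, $\Pi_1 = 0.1074$, $\Pi_2 = 0.886$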
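
One simple way to obtain the long-run occupancy rates numerically is to apply the transition matrix repeatedly to any starting distribution until it stops changing. This is a minimal sketch, assuming p = 0.002 and q = 0.033; it uses iteration rather than solving the linear system directly, and the function name `stationary` and the iteration count are illustrative choices.

```python
p, q = 0.002, 0.033

# State order: 0 = Down, 1 = Degraded, 2 = Up.
P = [[1 - q, q, 0.0],
     [p, 1 - p - q, q],
     [0.0, 2 * p, 1 - 2 * p]]


def stationary(matrix, iterations=5000):
    """Approximate the long-run vector by repeated multiplication by P."""
    size = len(matrix)
    dist = [1.0 / size] * size  # any starting distribution works
    for _ in range(iterations):
        dist = [sum(dist[i] * matrix[i][j] for i in range(size))
                for j in range(size)]
    return dist


pi = stationary(P)
long_run_availability = 1.0 - pi[0]
print([round(v, 4) for v in pi], round(long_run_availability, 4))
```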

Interpretation of results:
* $\Pi_2 = 0.886$ indicates that system S is operating at full capacity 88% of the time.
* $\Pi_1 = 0.1074$ indicates that system S is operating at Degraded capacity about 10% of the time.
* $\Pi_0$: the probability corresponding to State 0 (Down) is associated with S being Unavailable (= 0.0065).
* Long-run system Availability is given by: $A = 1 - \Pi_0 = 1 - 0.0065 = 0.9935$

Expected Times
* For system S to go Down, if initially S was Degraded (denoted $V_1$) or Up ($V_2$)
* Or the average time system S spent in each of these states (1, 2) before going Down
* Assume Down is an absorbing state: one that, once entered, can never be left
* Solve a system of equations leading to all such possible situations

Numerical Example:
* One step, at minimum (initial visit), before system S goes Down.
* If S is not absorbed then, system S will move on to any of the other, non-absorbing (Up, Degraded) states with the corresponding probability, and then the process restarts:

$V_1 = 1 + p_{11} V_1 + p_{12} V_2 = 1 + 0.965\,V_1 + 0.033\,V_2$

$V_2 = 1 + p_{21} V_1 + p_{22} V_2 = 1 + 0.004\,V_1 + 0.996\,V_2$

* Average times until system S goes down: $V_1 = 4625$ (if starting in state Degraded) and $V_2 = 4875$ (if starting in state Up).
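
The two equations for the expected times form a 2 x 2 linear system that can be solved directly. The sketch below assumes p = 0.002 and q = 0.033 and uses Cramer's rule; the coefficient names a, b, c, d are illustrative.

```python
p, q = 0.002, 0.033

# One-step probabilities among the non-absorbing states (1 = Degraded, 2 = Up).
p11, p12 = 1 - p - q, q
p21, p22 = 2 * p, 1 - 2 * p

# Rearranged system:
#   (1 - p11) * V1 - p12 * V2 = 1
#   -p21 * V1 + (1 - p22) * V2 = 1
a, b = 1 - p11, -p12
c, d = -p21, 1 - p22

det = a * d - b * c
v1 = (d - b) / det  # expected hours to Down, starting Degraded
v2 = (a - c) / det  # expected hours to Down, starting Up
print(round(v1, 2), round(v2, 2))
```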

Model Comparisons
* The initially non-maintained system version would work an expected $3/(2\lambda) = 3/0.004 = 750$ hours in the Up state before going Down (Ref. 7). Here $\lambda = p = 0.002$ and $\mu = q = 0.033$.
* The fact that maintenance is now possible, and that S can operate in a Degraded state, results in an increase of $\mu/(2\lambda^2) = 0.033/(2 \times 0.002^2) = 4125$ hours in its expected time to go Down (from Up).
* The improved expected time is the sum of the two expected times to failure:

$V_2 = \dfrac{3}{2\lambda} + \dfrac{\mu}{2\lambda^2} = 750 + 4125 = 4875$
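
The closed form for the expected time can be recovered from the two expected-time equations of the redundant system. The derivation below is a sketch that assumes λ = p = 0.002 and μ = q = 0.033, with V1 and V2 the expected times to go Down from the Degraded and Up states:

```latex
\begin{align*}
V_2 &= 1 + 2\lambda V_1 + (1 - 2\lambda) V_2
  &&\Longrightarrow\; V_2 = V_1 + \frac{1}{2\lambda} \\
V_1 &= 1 + (1 - \lambda - \mu) V_1 + \mu V_2
  &&\Longrightarrow\; (\lambda + \mu) V_1 = 1 + \mu V_1 + \frac{\mu}{2\lambda} \\
V_1 &= \frac{1}{\lambda} + \frac{\mu}{2\lambda^2} = 500 + 4125 = 4625 \\
V_2 &= \frac{3}{2\lambda} + \frac{\mu}{2\lambda^2} = 750 + 4125 = 4875
\end{align*}
```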

Conclusions
* Availability is the ratio of Up Time to Cycle Time
* Hence, we can enhance Availability by:
  * Increasing the device or system life
  * Decreasing the maintenance time
  * Doing both of the above simultaneously
* Decreasing maintenance time is usually easier.
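
The trade-off between the two levers can be made concrete with the ratio itself. This is a minimal sketch assuming a baseline MTBF of 500 hours and MTTR of 30 hours (the figures from the earlier example); the factor of two is an arbitrary illustration.

```python
def availability(mtbf, mttr):
    """Long-run availability: up time over cycle time."""
    return mtbf / (mtbf + mttr)


base = availability(500.0, 30.0)
longer_life = availability(2 * 500.0, 30.0)    # double the time between failures
faster_repair = availability(500.0, 30.0 / 2)  # halve the repair time

print(f"base={base:.4f} longer_life={longer_life:.4f} faster_repair={faster_repair:.4f}")
```

Doubling the life and halving the repair time give the same Availability, because the ratio depends only on MTTR/MTBF; the choice between them therefore comes down to which change is cheaper to achieve.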

Bibliography
1. System Reliability Theory: Models and Statistical Methods. Hoyland, A. and M. Rausand. Wiley, NY. 1994.
2. Logistics Engineering and Management. Blanchard, B. S. Prentice Hall, NJ. 1998.
3. An Introduction to Stochastic Modeling. Taylor, H. and S. Karlin. Academic Press, NY. 1993.
4. Methods for Statistical Analysis of Reliability and Life Data. Mann, N., R. Schafer and N. Singpurwalla. John Wiley, NY. 1974.
5. Applicability of Markov Analysis Methods to Availability, Maintainability and Safety. Fuqua, N. RAC START. http://rac.alionscience.com/pdf/markov.pdf
6. Appendix C of the Operational Availability Handbook (OPAH). Manary, J. RAC.
7. Understanding Series and Parallel Systems. Romeu, J. L. RAC START, No. 6. http://src.alionscience.com/pdf/availstat.pdf

About the Author
Jorge Luis Romeu is a Research Professor at Syracuse University (SU), where he teaches statistics, quality and operations research courses. He is also a Senior Science Advisor with Quanterion Solutions Inc., which operates the RIAC (Reliability Information Analysis Center). Romeu has 30 years of experience applying statistical and operations research methods to reliability, quality and industrial engineering. Romeu retired Emeritus from SUNY, after fourteen years of teaching mathematics, statistics and computers. He was a Fulbright Senior Speaker Specialist at several foreign universities (Mexico: 1994, and 2003; Dominican Republic: 2004; and Ecuador: 2006). Romeu is lead author of the book A Practical Guide to Statistical Analysis of Materials Property Data. He has developed and taught many workshops and training courses for practicing engineers and statistics faculty, and has published over thirty articles on applied statistics and statistical education. He obtained the Saaty Award for Best Applied Statistics Paper, published in the American Journal of Mathematics and Management Sciences (AJMMS), in 1997 and 2007, and the MVEEC Award for Outstanding Professional Development. Romeu holds a Ph.D. in Operations Research, is a Chartered Statistician Fellow of the Royal Statistical Society, and a Senior Member of the American Society for Quality, holding certifications in Quality and Reliability. Romeu is the ASQ Syracuse Chapter Chair. For more information, visit http://myprofile.cos.com/romeu