Reliability of Safety-Critical Systems Chapter 8. Probability of Failure on Demand using IEC 61508 formulas Mary Ann Lundteigen and Marvin Rausand mary.a.lundteigen@ntnu.no &marvin.rausand@ntnu.no RAMS Group Department of Production and Quality Engineering NTNU (Version 1.2 per August 2016) M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 1 / 28
Reliability of Safety-Critical Systems Slides related to the book Reliability of Safety-Critical Systems Theory and Applications Wiley, 2014 Theory and Applications Marvin Rausand Homepage of the book: http://www.ntnu.edu/ross/ books/sis M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 2 / 28
Purpose The main purpose of this presentation is to: Introduce and explain the simplified formulas in IEC 61508, part 6 for calculating the average probability of failure on demand (PFD avg ). Introduce and discuss some of the related concepts and assumptions The main content of this presentation is taken from Chapter 8.4. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 3 / 28
Remark The explanation used in the textbook is not the only one. A more thorough derivation of formulas for IEC 61508 has been made by Dr. Fares Innal, in his PhD thesis: Innal, F. (2008) Contribution to modeling safety instrumented systems and to assessing their performance. Critical analysis of IEC 6 508 standard. PhD thesis. University of Bordeaux, Bordeaux, France. Link to the report is found here: http://frigg.ivt.ntnu.no/ross/sis/innal08.pdf (copy the link) M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 4 / 28
Overall concept: Assumptions IEC 61508 (part 6) presents simplified formulas for selected voted configurations, derived with basis in the following assumptions: PFD (G) avg =,G t GE where,g is the Group failure frequency of dangerous failures,,g and t GE is the Group-equivalent mean downtime. The following slides are all about explaining how the formulas in IEC 61508 (part 6) are derived and to suggest generalized formulas for koon votes systems. Assumptions underlying formulas: Any parallel structure of channels constitutes identical components. Time to failure and repair times are exponentially distributed. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 5 / 28
Overall concept: Illustration PFD avg=,g t GE Group of channels,g t GE M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 6 / 28
Overall concept: Illustration Group of channels which upon failure allows continued operation in degraded mode Channel D t CE,2=MTTR U t CE,1=τ/2 + MRT Group of channels which if all fail - results in system failure M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 7 / 28
Equivalent mean downtimes It is useful to introduce three categories of equivalent mean downtimes: Channel equivalent mean downtime, t CE, i.e. the equivalent mean downtime with a single dangerous failure Group equivalent mean downtime, t GE, in the meaning of group equivalent mean downtime where the subsystem is fully unavailable: In case of a single element subsystem: This is t GE calculated for the case where the subsystem has a single failures. I.e., t GE = t CE for single subsystems. In case of a koon voted subsystem: This is t GE, calculated for the case where the subsystem has n k + 1 failures. Group equivalent downtime in degraded mode of operation, where the subsystem is still able to carry out its function: This is t GjE with j=1..n-k. This parameter is needed for the calculation of,g. For j = 1, t G1E = t CE. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 8 / 28
Overall concept: Illustration t CE Group of channels which upon failure allows continued operation in degraded mode Channel λdd tce,2=mttr λdu tce,1=τ/2 + MRT t GjE t GE Group of channels which if all fail - results in system failure M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 9 / 28
Examples from standard: 2oo3 voted subsystem M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 10 / 28
Examples from standard: 1oo3 voted subsystem M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 11 / 28
Remark Use of notations To maintain the general equation PFD (G) avg =,G t GE, we decided to always use t GE for the multiplicity of channel failures where the subsystem enters the failed state. This means that there may be some difference in notations used here and IEC 61508, however, when all formulas are expanded the result is the same. For example, in the case of the formula for 1oo3 copied from IEC 61508 (part 6) we would switch around the meaning of t GE and t G2E. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 12 / 28
Formulas for equivalent mean downtimes Channel equivalent mean downtime, t CE : t CE = U ( ) τ 2 + MRT Group equivalent mean downtime, t GE : t GE = λ ( ) DU τ n k + 2 + MRT + D MTTR + D MTTR Group equivalent mean downtime in degraded mode, t GjE, j = 2..(n k): t GjE = λ ( ) DU τ j + 1 + MRT + D MTTR For the group we assume that the downtime of DD failures is not affected by how many channels being down. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 13 / 28
Explanation: 1oo3 voted system Illustration considering DU failures only: Upon first failure: 1oo3 system F τ OK τ/2 (i.e. tce) OK Upon second failure: F τ F τ/3 (i.e. tg2e) OK Upon third failure: F τ F τ/4 (i.e. tge) F M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 14 / 28
Dangerous group frequency,,g The dangerous group frequency -,G - of a subsystem:.. is the frequency of events where the subsystem is unavailable... where the subsystem is regarded as a group of elements... and if the group is voted koon, this is the frequency of events where the subsystem has n k + 1 failures. May be calculated using the following reasoning: First, one of the channels must fail (failure rate) Then the second channel must fail while the first channel is down (probability) Then the third channel must fail while two channels are down (probability)... Then the (n-k+1)th channel must fail while n-k channels are down (probability) May be regarded as disjoint events as the restoration is fast compared to failure rate of channels. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 15 / 28
How to determine,g? Case study of a 1oo2 system Consider a 1oo2 system with identical and independent channels with failure rate. As the channels are independent, they will not fail at exactly the same time. The first D failure occurs with rate 2, since any of the two components may fail first. The mean downtime of a single channel is t CE. A dangerous group failure occurs if the second channel fails when the first channel is down. The probability that this happens is 1 e t CE t CE when t CE < 0.1. This means that the group failure rate for a 1oo2 system is: λ (1oo2) D,G 2 t CE = 2λ 2 D t CE M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 16 / 28
How to determine,g? Case study of a 2oo3 system For a 2oo3 system with identical and independent channels with failure rate, we may assume that: The first (D) failure occurs with rate 3, since any of the three components may fail first. A dangerous group failure occurs if a second channel out of the the two remaining channels fails, while a single channel is unavailable. A channel is unavailable for a period t CE.The probability of this event occurring is Pr(T D < t CE ) = 1 e 2t CE 2 t CE when t CE < 0.1. This means that: λ (2oo3) D,G 6λ 2 D t CE M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 17 / 28
What is,g? Case study of a 1oo3 system For a 1oo3 system with identical and independent channels with failure rate, we may assume that: The first (D) failure occurs with rate 3, since any of the three components may fail first. The probability that a second channel fails (out of the remaining two channels) while a single channel is unavailable is Pr(T D ) < t CE = 1 e 2t CE 2 t CE when t CE < 0.1. The dangerous group failure occurs if a third channel fails while two other channels are unavailable. Two channels are unavailable for a period t G2E. The probability of this event occurring is Pr(T D < t G2E ) = 1 e t G2E t G2E when t G2E < 0.1. This means that: λ (1oo3) D,G 6λ 3 D t CEt G2E M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 18 / 28
What is,g? koon system Now consider a koon system. The first failure occurs with rate n. The probability that the second failure occurs while the first one is down is (n 1) t CE, the probability that a third failure occurs while two are down is (n 2) t GE2, and so on till the group failure (involving n-k+1 failures) occurs with probability that (n k + 1) t GE(n k). λ (koon) D,G λ n k+1 D n k k (n j + 1)t GjE where t GjE is the equivalent downtime due to the jth failure: j=1 t GjE = U τ ( j + 1 + MRT ) + D MTTR Note that t G1E is what we previously have defined as t CE. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 19 / 28
What is,g? Example equations The equation developed for koon may be used to set up the equations for specific combinations of k and n: k/n 1oo1 1oo2 1oo3 2oo3 1oo4 2oo4 Group failure frequency 2λ 2 D t CE 6λ 3 D t CEt GE2 6λ 2 D t CE 24λ 4 D t CEt GE2 t GE3 24λ 3 D t CEt GE2 M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 20 / 28
What is PFD avg? Recall that the general equation for PFD avg was: PFD avg =,G t GE We may expand this equation for a selection of systems: k/n PFD avg t GE λ 1oo1 t DU GE ( τ 2 + MRT ) + D 1oo2 2λ 2 D t λ CEt DU GE ( τ 3 + MRT ) + D 1oo3 6λ 3 D t CEt GE2 t GE U 2oo3 6λ 2 D t CEt GE U 1oo4 24λ 4 D t CEt GE2 t GE3 t GE U 2oo4 24λ 3 D t CEt GE2 t GE U MTTR MTTR ( τ 4 + MRT ) + D MTTR ( τ 3 + MRT ) + D MTTR ( τ 5 + MRT ) + D MTTR ( τ 4 + MRT ) + D MTTR M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 21 / 28
Inclusion of CCFs IEC 61508 includes the contribution from CCFs due to DU as well as DD failures using the standard beta factor model. The fraction of the DD failure rate that is CCFs is denoted β D : λ (i) DD = (1 β D )D and λ (c) DD = β DD The fraction of the DU failure rate that is CCFs is denoted β λ (i) DU = (1 β)u and λ (c) DU = βu M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 22 / 28
Inclusion of CCFs CCFs are included as a virtual functional block in the reliability block diagram, one for the CCFs associated with DD failures and one for CCFs associated with DU failures. DU ind DD ind 1 DU ind DD ind 2 DU ind DD ind 1 DU ind DD ind 3 CCF DU CCF DD DU ind DD ind 2 DU ind DD ind 3 Figure: A RBD of a 2oo3 system with CCFs included M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 23 / 28
Inclusion of CCFs As the functional block for CCFs are single elements, it is rather straight forward to set up the PFD avg (note that λ (c) D = λ(c) DU + λ(c) DD : [ λ (c) ( PFDavg CCF = λ (c) D t(c) CE = λ(c) DU τ D λ (c) 2 + MRT D = βu ( τ 2 + MRT ) + β D D MTTR ) + λ(c) ] DD λ (c) MTTR D The contribution to PFD from the independent part remains as before, except that the fraction (1 β) and (1 β D ) must be extracted for the DU and DD failure rate respectively. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 24 / 28
Inclusion of CCFs: Example Consider a 1oo3 system of identical components that may be subject to CCFs. The PFD avg becomes: PFD avg = 6((1 β D )D + (1 β)u ) 3 t CE t GE2 t GE + β D D MTTR + βu ( τ 2 + MRT ) M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 25 / 28
IEC formulas using Markov approach The same formulas may be derived using Markov methods, (or more precisely using approximate Markov steady state models based on multiphase Markov models). Consider figure 3.19 in Fares Innal master thesis. This diagram can be modified for different architectures, but this one is used to illustrate derivation of formula for a xoo3 system. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 26 / 28
IEC formulas using Markov approach For a 1oo3 system, the PFDavg corresponds to the steady state probability of state 4: PFD avg = P 4 ( ) = 6 3 6 λ 3 D + 6 λ 2 6 D µ 3 + 3 µ 2 µ 3 + µ 1 µ 2 µ 3 µ 1 µ 2 µ 3 3 For a 2oo3 system, the PFDavg corresponds to the steady state probability of state 3 and 4 (but probability of state 3 is the dominating): PFD avg P 3 ( ) = 6λ 2 D µ 3 6 λ 3 D + 6 λ 2 6 D µ 3 + 3 µ 2 µ 3 + µ 1 µ 2 µ 3 µ 1 µ 2 2 M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 27 / 28
Meaning of µ i The meaning of µ i is identical to the meaning of t GEi : µ 1 = t G1E = t CE µ 2 = t G2E (which is t GE for 2oo3 system) µ 3 = t G3E (which is t GE for 1oo3 system) Remarks: t CE is always used when the equivalent mean downtime concerns a single channel t GE is always used when the equivalent mean downtime concerns the whole group (at (n-k+1)th failure) t GjE is otherwise used to represent multiple failures up to the (n-k)th failures. M.A.Lundteigen (RAMS Group) Reliability of Safety-Critical Systems (Version 1.2) 28 / 28