Terminology and Concepts

Prof. Naga Kandasamy

1 Goals of Fault Tolerance

Dependability is an umbrella term encompassing the concepts of reliability, availability, performability, safety, and testability. We now define these terms intuitively.

1.1 Reliability

The reliability $R(t)$ of a system is a function of time, defined as the conditional probability that the system will perform correctly throughout the interval $[t_0, t]$, given that the system was performing correctly at time $t_0$. So, reliability is the probability that the system will operate correctly throughout a complete interval of time.

Reliability is used to characterize systems in which even momentary periods of incorrect performance are unacceptable, or in which it is impossible to repair the system (e.g., a spacecraft, where the time interval of concern may be years). In other applications, such as flight control, the time interval of concern may be a few hours. Fault tolerance can improve a system's reliability by keeping the system operational when hardware and software failures occur.

1.2 Availability

Availability $A(t)$ is a function of time, defined as the probability that a system is operating correctly and is available to perform its functions at the instant of time $t$. Availability differs from reliability in that reliability depends on an interval of time, whereas availability is taken at an instant of time. So, a system can be highly available yet experience frequent periods of down time, as long as each down-time period is very short. The most common measure of availability is the expected fraction of time that a system is available to correctly perform its functions.

1.3 Performability

In many cases, it is possible to design systems that continue to perform correctly after the occurrence of hardware/software failures, but at a diminished level of performance.
So, the performability $P(L, t)$ of a system is a function of time, defined as the probability that the system performance will be at, or above, some level $L$ at the instant of time $t$. Performability differs from reliability in that reliability measures the likelihood that all of the functions are performed correctly, whereas performability measures the likelihood that some subset of the functions is performed correctly.

Graceful degradation is the ability of the system to automatically decrease its level of performance to compensate for hardware/software failures. Fault tolerance can provide graceful degradation and improve performability by eliminating failed hardware/software components, allowing performance at some reduced level.

These notes are adapted from: B. W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison Wesley.

1.4 Safety

Safety $S(t)$ is the probability that a system will either perform its functions correctly or will discontinue its functions in a manner that does not compromise the safety of any people associated with the system (fail-safe capability). Safety and reliability differ because reliability is the probability that a system will perform its functions correctly, whereas safety is the probability that a system will either perform its functions correctly or discontinue them in a fail-safe manner.

1.5 Maintainability and Testability

Maintainability is a measure of the ease with which a system can be repaired once it has failed. So, the maintainability $M(t)$ is the probability that a failed system will be restored to an operational state within a specified period of time $t$. The restoration process includes locating and diagnosing the problem, repairing and reconfiguring the system, and bringing the system back to its operational state.

2 Faults, errors, and system failures

We define the following basic terms:

A fault is a defect within the system.
An error is a deviation from the required operation of the (sub)system.
A system failure occurs when the system delivers a function deviating from the one specified.

There is a cause-and-effect relationship between faults, errors, and failures. Faults result in errors, and errors can lead to system failures. In other words, errors are the effect of faults, and failures are the effect of errors.

The full-adder circuit shown in Fig. 1 can be used to illustrate the distinction between faults and errors. The inputs $A_i$, $B_i$, and $C_i$ are the two operand bits and the carry bit, respectively. The truth table showing the correct performance of this circuit is shown in Fig. 2.

Fig. 1: The full adder circuit used to illustrate the distinction between faults and errors.
If a short occurs between line L and the power supply line, so that line L becomes permanently fixed at a logic 1 value, then a fault (or defect) has occurred in the circuit. The fault is the actual short within the circuit. Fig. 3 shows the truth table of the circuit that contains the physical fault. Comparing Figs. 2 and 3, we see that the circuit performs correctly for four of the eight input combinations (including 111) but not for the other four (including 000). So, whenever an input pattern is supplied to the circuit that results in an incorrect output, an error has occurred. If the output of the circuit is used to control a relay, and the relay is opened when it should be closed, a failure has occurred.

Fig. 2: The truth table for the fault-free full-adder circuit.

Fig. 3: The truth table for the full-adder circuit when the line L is stuck at logic value 1.

2.1 Characteristics of Faults

System faults can be classified based on their duration.

Permanent faults remain in existence indefinitely if no corrective action is taken. Though many are residual design or manufacturing faults, they can also be caused by catastrophic events such as an accident.

Intermittent faults appear, disappear, and reappear repeatedly. They are difficult to predict, but their effects are highly correlated. Most intermittent faults are due to marginal design, testing, or manufacturing, and manifest themselves under certain environmental or system conditions.

Transient faults appear and disappear quickly, and are not correlated with each other. They are most commonly induced by random environmental disturbances such as electromagnetic interference.

Faults can also be characterized based on the underlying cause: (1) specification mistakes; (2) implementation mistakes; (3) external disturbances; and (4) component failures.
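The fault/error distinction from the full-adder example in Section 2 can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the notes do not say here which physical line is called L, so the sketch arbitrarily ties the stuck-at-1 fault to the line carrying operand bit `a`. The function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c_in, stuck_a_at_1=False):
    """Return (sum, carry_out) of a one-bit full adder.

    When stuck_a_at_1 is True, the line carrying operand bit a is forced to
    logic 1, modelling a stuck-at-1 fault. Mapping the faulty line to input a
    is an assumption of this sketch, not something stated in the notes.
    """
    if stuck_a_at_1:
        a = 1
    s = a ^ b ^ c_in
    c_out = (a & b) | (a & c_in) | (b & c_in)
    return s, c_out


def erroneous_inputs():
    """List the input patterns for which the faulty circuit produces an error."""
    bad = []
    for a, b, c_in in product((0, 1), repeat=3):
        good_output = full_adder(a, b, c_in)
        faulty_output = full_adder(a, b, c_in, stuck_a_at_1=True)
        if good_output != faulty_output:
            bad.append((a, b, c_in))
    return bad


if __name__ == "__main__":
    # The fault is always present; an error shows up only for some inputs.
    for pattern in erroneous_inputs():
        print("error for input", pattern)
```

Under this assumption the fault is present for every input, but an error is observed only for the patterns in which the fault-free value of the stuck line would have been 0.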

Fig. 4: A taxonomy of possible failure-response strategies. (Node labels: System reliability; Non-redundant systems; Redundant systems; Fault avoidance; Fault detection; Masking redundancy; Fault-tolerant systems; Dynamic redundancy; On-line detection/masking; Reconfiguration; Retry; Online repair.)

2.2 Fault Models

To design a fault-tolerant system, it is necessary to assume that the underlying faults behave according to some fault model. Even though, in practice, faults can be transient in nature and exhibit complex behavior, fault models make the problem of designing fault-tolerant systems more manageable and restrict our attention to a subset of all faults that can occur. A commonly used fault model for capturing the behavior of faulty digital circuits is the logical stuck-fault model.

2.3 Failure Response Strategies

A taxonomy of the primary techniques used to design systems that operate in a fault-prone environment is shown in Fig. 4. Broadly speaking, there are three primary methods: fault avoidance (e.g., shielding from EMI), fault masking (e.g., TMR systems), and fault tolerance.

Fault detection does not tolerate faults, but provides a warning that a fault has occurred. Masking redundancy (also called static redundancy) tolerates failures, but provides no warning of them. Dynamic redundancy covers those systems whose configuration can be dynamically changed in response to a fault, or in which masking redundancy is enhanced by on-line fault detection, which allows on-line repair.

3 Quantitative Evaluation of System Reliability

The reliability $R(t)$ of a system is defined to be the probability of a component or system functioning correctly over a given time period $[t_0, t]$ under a given set of operating conditions. Consider a set of $N$ identical components, all of which begin operating at the same time. Then, at some time $t$, the number of components operating correctly is $N_o(t)$ and the number of failed components is $N_f(t)$. The reliability of a component at time $t$ is given by

\[ R(t) = \frac{N_o(t)}{N} = \frac{N_o(t)}{N_o(t) + N_f(t)} \]

which is simply the probability that a component has survived the interval $[t_0, t]$. We can also define the unreliability $Q(t)$ as the probability that a system will not function correctly over a given period of time. This is also called the probability of failure. If the number of components that have failed by time $t$ is $N_f(t)$, then

\[ Q(t) = \frac{N_f(t)}{N} = \frac{N_f(t)}{N_o(t) + N_f(t)} \]

From the definitions of reliability and unreliability, we obtain

\[ Q(t) = 1 - R(t) \]

If we write the reliability function as

\[ R(t) = 1 - \frac{N_f(t)}{N} \]

and differentiate $R(t)$ with respect to time, we obtain

\[ \frac{dR(t)}{dt} = -\frac{1}{N} \frac{dN_f(t)}{dt} \]

which can be rewritten as

\[ \frac{dN_f(t)}{dt} = -N \frac{dR(t)}{dt} \]

The derivative $dN_f(t)/dt$ is simply the instantaneous rate at which components are failing. At time $t$, there are still $N_o(t)$ components operating correctly. Dividing $dN_f(t)/dt$ by $N_o(t)$, we obtain

\[ z(t) = \frac{1}{N_o(t)} \frac{dN_f(t)}{dt} \]

where $z(t)$ is called the hazard function, hazard rate, or failure rate. The unit for the failure-rate function is failures per unit of time. Since $N_o(t)/N = R(t)$, the failure-rate function can also be written in terms of the reliability function as

\[ z(t) = \frac{1}{N_o(t)} \frac{dN_f(t)}{dt} = \frac{1}{N_o(t)} \left[ -N \frac{dR(t)}{dt} \right] = -\frac{1}{R(t)} \frac{dR(t)}{dt} \]

Rearranging, we obtain the following differential equation:

\[ \frac{dR(t)}{dt} = -z(t) R(t) \]

The failure-rate function $z(t)$ of electronic components exhibits a bathtub curve, shown in Fig. 5, comprising three distinct regions: burn in, useful life, and wear out. It is typically assumed that the failure rate is constant during a component's useful life and is given by $z(t) = \lambda$. So, the differential equation is

\[ \frac{dR(t)}{dt} = -\lambda R(t) \]

Solving the above equation with $R(0) = 1$ gives us

\[ R(t) = e^{-\lambda t} \qquad (1) \]

The exponential relationship between reliability and time is known as the exponential failure law. Thus, the probability of a system working correctly throughout a given period of time decreases exponentially with the length of this time period.

The exponential failure law is extremely valuable for the analysis of electronic components, and is by far the most commonly used relationship between reliability and time.
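As a rough check of the definitions above, the following Python sketch draws failure times for a population of identical components with a constant failure rate and compares the empirical ratio N_o(t)/N with the exponential failure law. The failure rate, population size, and random seed are hypothetical values chosen only for illustration.

```python
import math
import random


def empirical_reliability(failure_times, t):
    """Fraction of components still operating at time t, i.e. N_o(t) / N."""
    survivors = sum(1 for ft in failure_times if ft > t)
    return survivors / len(failure_times)


def exponential_reliability(lam, t):
    """Reliability under the exponential failure law, R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


# Hypothetical parameters, not taken from the notes.
LAM = 0.01      # constant failure rate in failures per hour
N = 100_000     # number of identical components started at time 0

rng = random.Random(42)  # fixed seed so the run is repeatable
failure_times = [rng.expovariate(LAM) for _ in range(N)]

if __name__ == "__main__":
    for t in (10.0, 50.0, 100.0, 200.0):
        emp = empirical_reliability(failure_times, t)
        law = exponential_reliability(LAM, t)
        print(f"t={t:6.1f} h  N_o(t)/N={emp:.4f}  exp(-lambda*t)={law:.4f}")
```

With a population this large, the two columns should agree to within sampling noise, which is what the definition of R(t) as a surviving fraction suggests.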

Fig. 5: The bathtub form of the failure curve.

Mean time to failure. Mean time to failure (MTTF) is another way to quantify system reliability. MTTF gives the expected time that a system will operate before the first failure occurs. If we have $N$ identical components operating at time $t = 0$, and we measure the time each component operates before failing, the average time is the MTTF. If each component $i$ operates for a time $t_i$ before encountering the first failure, the MTTF is given by

\[ MTTF = \frac{\sum_{i=1}^{N} t_i}{N} \]

We can calculate the MTTF by finding the expected value of the time of failure. From probability theory, we know that the expected value of a random variable $X$ is

\[ E[X] = \int_{-\infty}^{\infty} x f(x)\, dx \]

where $f(x)$ is the probability density function. From a reliability viewpoint, we are interested in the MTTF. So,

\[ MTTF = \int_{0}^{\infty} t f(t)\, dt \]

where $f(t)$ is the failure density function. The failure density function is

\[ f(t) = \frac{dQ(t)}{dt} = -\frac{dR(t)}{dt} \]

So, the MTTF can be written as

\[ MTTF = -\int_{0}^{\infty} t \frac{dR(t)}{dt}\, dt \]

Using integration by parts, we can show that

\[ MTTF = \Big[ -t R(t) \Big]_{0}^{\infty} + \int_{0}^{\infty} R(t)\, dt = \int_{0}^{\infty} R(t)\, dt \]

If the reliability function obeys the exponential failure law, the MTTF is given by

\[ MTTF = \int_{0}^{\infty} e^{-\lambda t}\, dt = \frac{1}{\lambda} \]

The above equation leads to the very simple result that the MTTF is the inverse of the system failure rate. Thus, a system with a constant failure rate of 0.1 failures per hour will have a mean time to failure of 10 hours. Finally, under the exponential failure law, the reliability of a system at a time equal to its MTTF is

\[ R(MTTF) = e^{-\lambda (1/\lambda)} = e^{-1} \approx 0.37 \qquad (2) \]

In other words, the system has only a 37% chance of operating correctly for an amount of time equal to its MTTF.
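The relation between the MTTF and the integral of R(t) can be checked numerically. The sketch below approximates the integral with the trapezoidal rule, truncating the infinite upper limit at a point where R(t) is negligible. The truncation point, step count, and failure rate are assumptions of this sketch.

```python
import math


def reliability(lam, t):
    """Exponential failure law, R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)


def mttf_numeric(lam, horizon_multiples=50.0, steps=200_000):
    """Approximate MTTF as the integral of R(t) from 0 to infinity.

    The upper limit is truncated at horizon_multiples / lam, where R(t) is
    already vanishingly small (an assumption of this sketch), and the
    integral is evaluated with the composite trapezoidal rule.
    """
    upper = horizon_multiples / lam
    h = upper / steps
    total = 0.5 * (reliability(lam, 0.0) + reliability(lam, upper))
    for k in range(1, steps):
        total += reliability(lam, k * h)
    return total * h


LAM = 0.1  # illustrative constant failure rate in failures per hour

if __name__ == "__main__":
    mttf = mttf_numeric(LAM)
    print("numerical MTTF :", mttf)        # should be close to 1 / LAM
    print("1 / lambda     :", 1.0 / LAM)
    print("R(MTTF)        :", reliability(LAM, 1.0 / LAM))  # close to exp(-1)
```

The numerical integral should agree closely with 1/λ, and evaluating R at that time should give a value near 0.37, matching equation (2).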

Mean time to repair. The mean time to repair (MTTR) is the average time taken to repair a failed system. Just as we describe the reliability of a system using its failure rate, we can quantify the repairability of a system using its repair rate $\mu$. The MTTR is $1/\mu$.

Mean time between failures. If a failed system can be repaired and made as good as new, then the mean time between failures (MTBF) is given by

\[ MTBF = MTTF + MTTR \qquad (3) \]

The availability of the system is the probability that the system will be functioning correctly at any given time. In other words, it is the fraction of time for which a system is operational:

\[ \text{Availability} = \frac{\text{Time system is operational}}{\text{Total time}} = \frac{MTTF}{MTTF + MTTR} = \frac{MTTF}{MTBF} \qquad (4) \]
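Equations (3) and (4) translate directly into code. The sketch below is a minimal illustration with hypothetical function names and example numbers. The rate-based variant additionally assumes constant failure and repair rates, so that MTTF = 1/λ and MTTR = 1/µ.

```python
def mtbf(mttf, mttr):
    """Mean time between failures, equation (3): MTBF = MTTF + MTTR."""
    return mttf + mttr


def availability(mttf, mttr):
    """Steady-state availability, equation (4): MTTF / (MTTF + MTTR)."""
    return mttf / mtbf(mttf, mttr)


def availability_from_rates(lam, mu):
    """Availability from a constant failure rate lam and repair rate mu.

    Substituting MTTF = 1/lam and MTTR = 1/mu into equation (4)
    gives mu / (lam + mu).
    """
    return mu / (lam + mu)


if __name__ == "__main__":
    # Hypothetical system: fails on average every 1000 h, takes 10 h to repair.
    print("MTBF         :", mtbf(1000.0, 10.0), "hours")
    print("availability :", availability(1000.0, 10.0))
    print("from rates   :", availability_from_rates(1.0 / 1000.0, 1.0 / 10.0))
```

Note how a short repair time keeps availability high even when failures are fairly frequent, which is the contrast with reliability drawn in Section 1.2.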


More information

Lecture 5 Fault Modeling

Lecture 5 Fault Modeling Lecture 5 Fault Modeling Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults Single stuck-at faults Fault equivalence Fault dominance and checkpoint theorem Classes

More information

Quiz #2 A Mighty Fine Review

Quiz #2 A Mighty Fine Review Quiz #2 A Mighty Fine Review February 27: A reliable adventure; a day like all days filled with those events that alter and change the course of history and you will be there! What is a Quiz #2? Three

More information

B.H. Far

B.H. Far SENG 637 Dependability, Reliability & Testing of Software Systems Chapter 3: System Reliability Department of Electrical & Computer Engineering, University of Calgary B.H. Far (far@ucalgary.ca) http://www.enel.ucalgary.ca/people/far/lectures/seng637/

More information

ANALYSIS FOR A PARALLEL REPAIRABLE SYSTEM WITH DIFFERENT FAILURE MODES

ANALYSIS FOR A PARALLEL REPAIRABLE SYSTEM WITH DIFFERENT FAILURE MODES Journal of Reliability and Statistical Studies; ISSN (Print): 0974-8024, (Online):2229-5666, Vol. 5, Issue 1 (2012): 95-106 ANALYSIS FOR A PARALLEL REPAIRABLE SYSTEM WITH DIFFERENT FAILURE MODES M. A.

More information

Monte Carlo Simulation for Reliability and Availability analyses

Monte Carlo Simulation for Reliability and Availability analyses Monte Carlo Simulation for Reliability and Availability analyses Monte Carlo Simulation: Why? 2 Computation of system reliability and availability for industrial systems characterized by: many components

More information

Maintenance free operating period an alternative measure to MTBF and failure rate for specifying reliability?

Maintenance free operating period an alternative measure to MTBF and failure rate for specifying reliability? Reliability Engineering and System Safety 64 (1999) 127 131 Technical note Maintenance free operating period an alternative measure to MTBF and failure rate for specifying reliability? U. Dinesh Kumar

More information

FAULT TOLERANT DESIGN: AN INTRODUCTION

FAULT TOLERANT DESIGN: AN INTRODUCTION FAULT TOLERANT DESIGN: AN INTRODUCTION ELENA DUBROVA Department of Microelectronics and Information Technology Royal Institute of Technology Stockholm, Sweden Kluwer Academic Publishers Boston/Dordrecht/London

More information

Constant speed drive time between overhaul extension: a case study from Italian Air Force Fleet.

Constant speed drive time between overhaul extension: a case study from Italian Air Force Fleet. Constant speed drive time between overhaul extension: a case study from Italian Air Force Fleet. Capt. M. Amura¹, Capt. F. De Trane², Maj. L. Aiello¹ Italian Air Force Centro Sperimentale Volo - Airport

More information

Chapter 2. Planning Criteria. Turaj Amraee. Fall 2012 K.N.Toosi University of Technology

Chapter 2. Planning Criteria. Turaj Amraee. Fall 2012 K.N.Toosi University of Technology Chapter 2 Planning Criteria By Turaj Amraee Fall 2012 K.N.Toosi University of Technology Outline 1- Introduction 2- System Adequacy and Security 3- Planning Purposes 4- Planning Standards 5- Reliability

More information

Chapter 6. a. Open Circuit. Only if both resistors fail open-circuit, i.e. they are in parallel.

Chapter 6. a. Open Circuit. Only if both resistors fail open-circuit, i.e. they are in parallel. Chapter 6 1. a. Section 6.1. b. Section 6.3, see also Section 6.2. c. Predictions based on most published sources of reliability data tend to underestimate the reliability that is achievable, given that

More information

Rel: Estimating Digital System Reliability

Rel: Estimating Digital System Reliability Rel 1 Rel: Estimating Digital System Reliability Qualitatively, the reliability of a digital system is the likelihood that it works correctly when you need it. Marketing and sa les people like to say that

More information

DIAGNOSIS OF FAULT IN TESTABLE REVERSIBLE SEQUENTIAL CIRCUITS USING MULTIPLEXER CONSERVATIVE QUANTUM DOT CELLULAR AUTOMATA

DIAGNOSIS OF FAULT IN TESTABLE REVERSIBLE SEQUENTIAL CIRCUITS USING MULTIPLEXER CONSERVATIVE QUANTUM DOT CELLULAR AUTOMATA DIAGNOSIS OF FAULT IN TESTABLE REVERSIBLE SEQUENTIAL CIRCUITS USING MULTIPLEXER CONSERVATIVE QUANTUM DOT CELLULAR AUTOMATA Nikitha.S.Paulin 1, S.Abirami 2, Prabu Venkateswaran.S 3 1, 2 PG students / VLSI

More information

Reliability and Availability Simulation. Krige Visser, Professor, University of Pretoria, South Africa

Reliability and Availability Simulation. Krige Visser, Professor, University of Pretoria, South Africa Reliability and Availability Simulation Krige Visser, Professor, University of Pretoria, South Africa Content BACKGROUND DEFINITIONS SINGLE COMPONENTS MULTI-COMPONENT SYSTEMS AVAILABILITY SIMULATION CONCLUSION

More information

Optimal Time and Random Inspection Policies for Computer Systems

Optimal Time and Random Inspection Policies for Computer Systems Appl. Math. Inf. Sci. 8, No. 1L, 413-417 214) 413 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/1.12785/amis/81l51 Optimal Time and Random Inspection Policies for

More information

Chapter 8. Calculation of PFD using Markov

Chapter 8. Calculation of PFD using Markov Chapter 8. Calculation of PFD using Markov Mary Ann Lundteigen Marvin Rausand RAMS Group Department of Mechanical and Industrial Engineering NTNU (Version 0.1) Lundteigen& Rausand Chapter 8.Calculation

More information

Looking at a two binary digit sum shows what we need to extend addition to multiple binary digits.

Looking at a two binary digit sum shows what we need to extend addition to multiple binary digits. A Full Adder The half-adder is extremely useful until you want to add more that one binary digit quantities. The slow way to develop a two binary digit adders would be to make a truth table and reduce

More information

Stochastic Renewal Processes in Structural Reliability Analysis:

Stochastic Renewal Processes in Structural Reliability Analysis: Stochastic Renewal Processes in Structural Reliability Analysis: An Overview of Models and Applications Professor and Industrial Research Chair Department of Civil and Environmental Engineering University

More information

Software Reliability.... Yes sometimes the system fails...

Software Reliability.... Yes sometimes the system fails... Software Reliability... Yes sometimes the system fails... Motivations Software is an essential component of many safety-critical systems These systems depend on the reliable operation of software components

More information

Reliability, Redundancy, and Resiliency

Reliability, Redundancy, and Resiliency Lecture #11 October 3, 2017 Review of probability theory Component reliability Confidence Redundancy Reliability diagrams Intercorrelated Failures System resiliency Resiliency in fixed fleets 1 2017 David

More information

Comparing the Effects of Intermittent and Transient Hardware Faults on Programs

Comparing the Effects of Intermittent and Transient Hardware Faults on Programs Comparing the Effects of Intermittent and Transient Hardware Faults on Programs Jiesheng Wei, Layali Rashid, Karthik Pattabiraman and Sathish Gopalakrishnan Department of Electrical and Computer Engineering,

More information

B.H. Far

B.H. Far SENG 521 Software Reliability & Software Quality Chapter 6: Software Reliability Models Department of Electrical & Computer Engineering, University of Calgary B.H. Far (far@ucalgary.ca) http://www.enel.ucalgary.ca/people/far/lectures/seng521

More information

Module No. # 03 Lecture No. # 11 Probabilistic risk analysis

Module No. # 03 Lecture No. # 11 Probabilistic risk analysis Health, Safety and Environmental Management in Petroleum and offshore Engineering Prof. Dr. Srinivasan Chandrasekaran Department of Ocean Engineering Indian Institute of Technology, Madras Module No. #

More information

Dictionary-Less Defect Diagnosis as Surrogate Single Stuck-At Faults

Dictionary-Less Defect Diagnosis as Surrogate Single Stuck-At Faults Dictionary-Less Defect Diagnosis as Surrogate Single Stuck-At Faults Chidambaram Alagappan and Vishwani D. Agrawal Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849,

More information

Failure detectors Introduction CHAPTER

Failure detectors Introduction CHAPTER CHAPTER 15 Failure detectors 15.1 Introduction This chapter deals with the design of fault-tolerant distributed systems. It is widely known that the design and verification of fault-tolerent distributed

More information

An approach to the design of highly reliable alld fail-safe digital systems*

An approach to the design of highly reliable alld fail-safe digital systems* An approach to the design of highly reliable alld fail-safe digital systems* by HEKRY Y. H. CHUANG University of Pittsburgh Pittsburgh, Pennsylvania and SANTANU DAS North Electric Com pan y Delaware, Ohio

More information

Page 1. Outline. Experimental Methodology. Modeling. ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems. Modeling and Evaluation

Page 1. Outline. Experimental Methodology. Modeling. ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems. Modeling and Evaluation Outline Fault Tolerant and Testable Computing Systems Modeling and Evaluation Copyright 2011 Daniel J. Sorin Duke University Experimental Methodology and Modeling Random Variables Probabilistic Models

More information

Safety Analysis Using Petri Nets

Safety Analysis Using Petri Nets Safety Analysis Using Petri Nets IEEE Transactions on Software Engineering (1987) Nancy G. Leveson and Janice L. Stolzy Park, Ji Hun 2010.06.21 Introduction Background Petri net Time petri net Contents

More information

Semiconductor Reliability

Semiconductor Reliability Semiconductor Reliability. Semiconductor Device Failure Region Below figure shows the time-dependent change in the semiconductor device failure rate. Discussions on failure rate change in time often classify

More information

Reliability, Redundancy, and Resiliency

Reliability, Redundancy, and Resiliency Review of probability theory Component reliability Confidence Redundancy Reliability diagrams Intercorrelated failures System resiliency Resiliency in fixed fleets Perspective on the term project 1 2010

More information

Chapter 2. Theory of Errors and Basic Adjustment Principles

Chapter 2. Theory of Errors and Basic Adjustment Principles Chapter 2 Theory of Errors and Basic Adjustment Principles 2.1. Introduction Measurement is an observation carried out to determine the values of quantities (distances, angles, directions, temperature

More information

In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System

In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System In-Flight Engine Diagnostics and Prognostics Using A Stochastic-Neuro-Fuzzy Inference System Dan M. Ghiocel & Joshua Altmann STI Technologies, Rochester, New York, USA Keywords: reliability, stochastic

More information

CHAPTER 3 ANALYSIS OF RELIABILITY AND PROBABILITY MEASURES

CHAPTER 3 ANALYSIS OF RELIABILITY AND PROBABILITY MEASURES 27 CHAPTER 3 ANALYSIS OF RELIABILITY AND PROBABILITY MEASURES 3.1 INTRODUCTION The express purpose of this research is to assimilate reliability and its associated probabilistic variables into the Unit

More information

Non-observable failure progression

Non-observable failure progression Non-observable failure progression 1 Age based maintenance policies We consider a situation where we are not able to observe failure progression, or where it is impractical to observe failure progression:

More information

6. STRUCTURAL SAFETY

6. STRUCTURAL SAFETY 6.1 RELIABILITY 6. STRUCTURAL SAFETY Igor Kokcharov Dependability is the ability of a structure to maintain its working parameters in given ranges for a stated period of time. Dependability is a collective

More information

R E A D : E S S E N T I A L S C R U M : A P R A C T I C A L G U I D E T O T H E M O S T P O P U L A R A G I L E P R O C E S S. C H.

R E A D : E S S E N T I A L S C R U M : A P R A C T I C A L G U I D E T O T H E M O S T P O P U L A R A G I L E P R O C E S S. C H. R E A D : E S S E N T I A L S C R U M : A P R A C T I C A L G U I D E T O T H E M O S T P O P U L A R A G I L E P R O C E S S. C H. 5 S O F T W A R E E N G I N E E R I N G B Y S O M M E R V I L L E S E

More information

Page 1. Outline. Modeling. Experimental Methodology. ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems. Modeling and Evaluation

Page 1. Outline. Modeling. Experimental Methodology. ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems. Modeling and Evaluation Page 1 Outline ECE 254 / CPS 225 Fault Tolerant and Testable Computing Systems Modeling and Evaluation Copyright 2004 Daniel J. Sorin Duke University Experimental Methodology and Modeling Modeling Random

More information

Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation

Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation H. Zhang, E. Cutright & T. Giras Center of Rail Safety-Critical Excellence, University of Virginia,

More information

Advanced Testing. EE5375 ADD II Prof. MacDonald

Advanced Testing. EE5375 ADD II Prof. MacDonald Advanced Testing EE5375 ADD II Prof. MacDonald Functional Testing l Original testing method l Run chip from reset l Tester emulates the outside world l Chip runs functionally with internally generated

More information