Combinational Techniques for Reliability Modeling
Prof. Naga Kandasamy, ECE Department, Drexel University, Philadelphia, PA
January 24, 2009

The following material is derived from these textbooks:

- D. P. Siewiorek and R. S. Swarz, Reliable Computer Systems, 3rd Edition, A. K. Peters, Natick, Massachusetts.
- M. L. Shooman, Reliability of Computer Systems and Networks, John Wiley & Sons.

When designing a system, it is important to be able to predict the reliability of the final system from the reliability of its many components. The two most common methods of estimating the reliability of complex systems are combinational modeling and Markov state modeling.

1 Canonical Structures

We will first consider some canonical structures and discuss how their reliability can be quantified using combinational techniques.

1.1 Series and Parallel Systems

In a series combination of components, the failure of any one component results in the failure of the overall system. If a system contains $N$ components arranged in series, and if the components fail independently, then the system's failure rate $\lambda$ is given by

$$\lambda = \sum_{i=1}^{N} \lambda_i$$

where $\lambda_i$ is the failure rate of the $i$-th component.

The reliability of the series arrangement may also be expressed in terms of the reliability of the individual components. If $R_i(t)$ is the reliability of the $i$-th component in the system, the overall system reliability $R(t)$ is given by

$$R(t) = R_1(t)\,R_2(t) \cdots R_N(t)$$

which may be written as

$$R(t) = \prod_{i=1}^{N} R_i(t)$$

The reliability of a parallel combination of components is given by

$$R(t) = 1 - \prod_{i=1}^{N} \left[1 - R_i(t)\right]$$
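The series and parallel formulas above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes; the function names are hypothetical, and the numbers in the usage section are the sub-block values of the Fig. 2 example discussed in Section 1.2.

```python
from math import prod


def series_failure_rate(lambdas):
    """Failure rate of a series system: the sum of the component rates."""
    return sum(lambdas)


def series_reliability(rs):
    """Series system: every component must work, so reliabilities multiply."""
    return prod(rs)


def parallel_reliability(rs):
    """Parallel system: it fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in rs)


if __name__ == "__main__":
    # Three modules of reliability 0.80 in parallel.
    print(f"{parallel_reliability([0.80] * 3):.3f}")  # 0.992
    # One module of 0.90 in series with one of 0.95.
    path = series_reliability([0.90, 0.95])
    print(f"{path:.3f}")  # 0.855
    # Two such paths in parallel.
    print(f"{parallel_reliability([path, path]):.3f}")  # 0.979
```

Both reliability functions assume independent component failures, as the formulas do.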
[Fig. 1: Series and parallel combination of system components.]

1.2 Series-Parallel Combinations

Consider the system shown in Fig. 2. Suppose the reliability of module M1 is 0.99, that of M2, M3, and M4 is 0.80, that of M5 and M6 is 0.90, that of M7 and M8 is 0.95, and that of M9 is 0.94. Then, the reliability of the parallel combination of modules M2, M3, and M4 is given by

$$R(t) = 1 - [1 - 0.8]^3 = 0.992$$

The series combination of M5 and M7 (and likewise of M6 and M8) has reliability $0.90 \times 0.95$, or 0.855, and the parallel combination of these two paths is

$$R(t) = 1 - [1 - 0.855]^2 = 0.979$$

The system can be simplified as shown in Fig. 2. The series combination of M10 and M11 has reliability $0.992 \times 0.979 = 0.971$, and combining this in parallel with M9 gives us

$$R(t) = 1 - [1 - 0.971][1 - 0.94] = 0.998$$

1.3 Nonseries/Nonparallel Models

Sometimes, a success diagram is used to describe the operational modes of a system. A success diagram may not be directly reducible by applying the series/parallel formulas. In such cases, one can obtain a lower bound on system reliability in terms of the minimal cut sets of the system.

A cut set of a graph is a set of branches which, when removed from the graph, interrupts all connections between the input and the output. A minimal cut set is a cut set containing the minimum number of branches, that is, one from which no branch can be dropped without reconnecting the input to the output. Every system failure can be represented by the removal of at least one minimal cut set from the graph. The probability of system failure is, therefore, the probability that at least one minimal cut set fails. Let $Q_{cut_i}$ denote the probability that cut set $i$ fails. Then the lower bound on system reliability is given by

$$R_{sys} \ge \prod_i \left(1 - Q_{cut_i}\right)$$
[Fig. 2: Series/parallel combination of components.]

[Fig. 3: A success diagram of a system.]

The minimal cut sets in Fig. 3 are {M1, M5}, {M1, M3}, {M4}, {M3, M6}, and {M2, M5, M6}. Assuming all modules have identical reliability $R$,

$$R_{sys} \ge R\left[1 - (1 - R)^2\right]^3\left[1 - (1 - R)^3\right]$$
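The cut-set bound can be compared with the exact reliability by brute-force enumeration of module states. The sketch below is an illustration under the assumptions already stated (independent modules with identical reliability R); the function names are hypothetical, and the exact value is obtained by treating the system as failed whenever every module of some minimal cut set has failed.

```python
from itertools import product
from math import prod

# Minimal cut sets of Fig. 3, as listed in the text.
CUT_SETS = [{"M1", "M5"}, {"M1", "M3"}, {"M4"}, {"M3", "M6"}, {"M2", "M5", "M6"}]
MODULES = sorted(set().union(*CUT_SETS))


def cut_set_lower_bound(r):
    """Product over cut sets of (1 - Q_cut), with Q_cut = (1 - r)^size."""
    return prod(1.0 - (1.0 - r) ** len(cut) for cut in CUT_SETS)


def exact_reliability(r):
    """Sum the probability of every module state in which no cut set has fully failed."""
    total = 0.0
    for states in product([True, False], repeat=len(MODULES)):
        failed = {m for m, ok in zip(MODULES, states) if not ok}
        if any(cut <= failed for cut in CUT_SETS):
            continue  # some minimal cut set is entirely failed: system is down
        working = sum(states)
        total += r ** working * (1.0 - r) ** (len(MODULES) - working)
    return total


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        print(f"R={r}: bound={cut_set_lower_bound(r):.6f} exact={exact_reliability(r):.6f}")
```

For highly reliable modules the bound is tight, which is why it is useful in practice; for low module reliability it becomes noticeably pessimistic.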
A tie set is a group of branches which, when traversed, forms a connection between the input and the output. A minimal tie set is one containing a minimum number of elements. If no node is traversed more than once in tracing out the tie set, then the tie set is minimal. If $R_{path_i}$ denotes the serial reliability of path $i$, then

$$R_{sys} \le 1 - \prod_i \left(1 - R_{path_i}\right)$$

The minimal tie sets in Fig. 3 are {M1, M2, M3, M4}, {M1, M6, M4}, and {M5, M3, M4}. Assuming identical module reliability $R$, the system reliability is bounded by

$$R_s \le R^4 + 2R^3$$

2 Masking Redundancy using M-out-of-N structures

Another simple structure that serves as a useful model for many reliability problems is an M-out-of-N structure. Such a model represents a system of $N$ components in which at least $M$ of the $N$ components must be good for the system to succeed. The probability that exactly $M$ out of $N$ identical and independent components are good is given by

$$\binom{N}{M} p^M (1 - p)^{N - M}$$

Here, $p$ denotes the (identical) reliability of each component. For a constant failure rate $\lambda$, and using the exponential failure law $p = e^{-\lambda t}$ for each item, the probability that at least $M$ out of $N$ items are good is given by

$$R(t) = \sum_{i=M}^{N} \binom{N}{i} e^{-i\lambda t}\left(1 - e^{-\lambda t}\right)^{N - i}$$

In general, $N$ is an odd integer. However, as we shall soon see, if we can diagnose and lock out faulty modules, it is feasible to let $N$ be an even integer. If we let $N = 2n + 1$, $n \ge 1$, then, in a simple masking scheme, we need a majority of the modules to work correctly, that is, $M = n + 1$.

Fig. 4 shows the reliability function for various values of $n$, assuming $p = e^{-\lambda t}$. The figure shows that NMR is superior to a single unit in the high-reliability region; specifically, NMR is superior to the single unit for $\lambda t < \ln 2 \approx 0.69$ (that is, for $p > 0.5$). Therefore, when designing any system, we must carefully evaluate the reliability values obtained over the range $0 < t <$ maximum mission time for various values of $n$ and $\lambda$.

[Fig. 4: The reliability of an NMR system comprising 2n + 1 modules as a function of time.]
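The M-out-of-N sum is easy to evaluate numerically. The following sketch (function names are hypothetical, not from the notes) computes the reliability of a majority-voted NMR system with a perfect voter and shows the behaviour on either side of the point where the module reliability equals 0.5.

```python
from math import comb, exp


def m_out_of_n(p, m, n):
    """Probability that at least m of n independent modules, each of reliability p, are good."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(m, n + 1))


def nmr_reliability(lam_t, n):
    """Majority-voted NMR with N = 2n + 1 modules and p = exp(-lambda * t); perfect voter assumed."""
    p = exp(-lam_t)
    return m_out_of_n(p, n + 1, 2 * n + 1)


if __name__ == "__main__":
    for lam_t in (0.1, 0.5, 1.0):
        single = exp(-lam_t)
        row = ", ".join(f"n={n}: {nmr_reliability(lam_t, n):.4f}" for n in (1, 2, 3))
        print(f"lambda*t={lam_t}: single={single:.4f}, {row}")
```

Adding modules helps only while each module is more likely to work than not; beyond that point a larger majority requirement makes the redundant system worse than a single unit.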
2.1 Triple Modular Redundancy

A special case of an M-out-of-N structure is triple modular redundancy, or TMR. The basic TMR structure, shown in Fig. 5, consists of three parallel modules, each provided with the same input. The outputs of the three modules are compared by the voter, which gives the majority opinion as the system output. If all three modules are operating properly, all outputs agree, and thus the system output is correct. If one module has failed and produced an incorrect output, the voter chooses the output of the two good modules as the system output because they agree, and thus the system output is still correct. If two modules have failed, the voter agrees with the majority (the two that have failed), and thus the system output is incorrect.

[Fig. 5: The basic triple-modular redundancy (TMR) scheme: Modules 1, 2, and 3 feed a voter.]

A TMR system will function correctly provided that at least two modules are operational, assuming that the voter does not fail, that is, $R_v = 1$. Thus, the probability of the system working correctly is given by

$$R = R_v\left[\binom{3}{3} p^3 (1 - p)^0 + \binom{3}{2} p^2 (1 - p)^1\right] = 3p^2 - 2p^3 = p^2(3 - 2p)$$

This is, of course, the reliability expression for a two-out-of-three system. If we assume a constant failure rate $\lambda$, then each module has a reliability $p = e^{-\lambda t}$, and substituting in the above equation yields

$$R(t) = 3e^{-2\lambda t} - 2e^{-3\lambda t}$$

We can compute the MTTF for this system by integrating the reliability function:

$$MTTF = \int_0^\infty \left(3e^{-2\lambda t} - 2e^{-3\lambda t}\right) dt = \frac{3}{2\lambda} - \frac{2}{3\lambda} = \frac{5}{6\lambda}$$

This TMR system can be called a 3-2 system because the system succeeds if 3 or 2 units are good. When a second failure occurs, the voter does not know which of the components have failed and cannot determine which is the good component. In some cases, additional information is available by such means as observation (by a human operator or a diagnostic system) of the remaining two units after the first failure occurs. If one of the two remaining units has behaved erratically, it would be locked out (i.e., disconnected) and the other unit would be assumed to operate properly. In such a case, the TMR system becomes a 1-out-of-3 system with a voter, which can be called a TMR 3-2-1 system. The reliability equation then becomes

$$R(t) = 3p^2 - 2p^3 + 3p(1 - p)^2 = e^{-3\lambda t} - 3e^{-2\lambda t} + 3e^{-\lambda t}$$

and the MTTF calculation yields

$$MTTF = \frac{1}{3\lambda} - \frac{3}{2\lambda} + \frac{3}{\lambda} = \frac{11}{6\lambda}$$

[Fig. 6: Comparison of the reliability functions of a single system/component/module, a TMR 3-2 system, and a TMR 3-2-1 system in the high-reliability region.]

Fig. 6 shows the superiority of the TMR systems in the high-reliability region. Note that the TMR 3-2 system reliability decreases to about the same value as a single component when $\lambda t$ increases from about 0.3 to [...]. Thus, TMR 3-2 is of most use for $\lambda t < 0.2$, whereas TMR 3-2-1 is of greater benefit and provides a considerably higher reliability for $\lambda t <$ [...].

2.2 System Versus Component Redundancy

Suppose we want to use NMR for a digital system composed of three components A, B, and C. We must answer the following question: do we use NMR on three full systems ($A_1B_1C_1$, $A_2B_2C_2$, and $A_3B_3C_3$) with one voter, or do we vote at a lower, component level, with one voter comparing $A_1A_2A_3$, a second comparing $B_1B_2B_3$, and a third comparing $C_1C_2C_3$? In general, two redundancy techniques that are easily classified and studied are component and system redundancy. We can, in fact, prove that component redundancy is superior to system redundancy in a wide variety of situations.

Consider the three system configurations shown in Fig. 7. The reliability of the simplex system in Fig. 7(a) is given by

$$R_a(t) = R_{M_1}(t)\,R_{M_2}(t) = p^2$$

where the components $M_1$ and $M_2$ are independent but have identical reliability $R(t) = p$.

[Fig. 7: Comparison of three different systems: (a) a simplex (or non-redundant) system comprising two components $M_1$ and $M_2$, (b) system redundancy, and (c) component redundancy.]

The reliability expression for Fig. 7(b), comprising two simplex units connected in parallel, is given by

$$R_b(t) = R_a(t)^2 + \binom{2}{1} R_a(t)\left(1 - R_a(t)\right) = p^2(2 - p^2) \tag{1}$$

For Fig. 7(c), we combine each component pair in parallel to obtain

$$R_c(t) = \left[p^2 + 2p(1 - p)\right]^2 = p^2(2 - p)^2 \tag{2}$$

To compare Equations 1 and 2, we use the ratio

$$\frac{R_c(t)}{R_b(t)} = \frac{p^2(2 - p)^2}{p^2(2 - p^2)} = \frac{(2 - p)^2}{2 - p^2} \tag{3}$$

Some algebraic manipulation yields

$$\frac{R_c(t)}{R_b(t)} = 1 + \frac{2(1 - p)^2}{2 - p^2}$$

Since $0 < p < 1$, the term $2 - p^2 > 0$, and $R_c(t)/R_b(t) \ge 1$. Therefore, component redundancy is superior to system redundancy for this example. (They are, of course, equal at the extremes when $p = 0$ or $p = 1$.) We can extend the above analysis to $m$ components, in which case Equation 3 becomes

$$\frac{R_c(t)}{R_b(t)} = \frac{(2 - p)^m}{2 - p^m} \tag{4}$$

It can be shown by induction that this ratio is always greater than 1 and that component redundancy is superior regardless of the number of components. The superiority of component redundancy over system redundancy also holds true for non-identical components.
A simpler proof of the foregoing principle can be formulated by considering tie sets. In Fig. 7(b), the tie sets are $M_1M_2$ and $M_1'M_2'$, whereas in Fig. 7(c), the tie sets are $M_1M_2$, $M_1M_2'$, $M_1'M_2$, and $M_1'M_2'$. Since the system reliability is the probability of the union of the tie sets, and since the redundant system in Fig. 7(c) has the same two tie sets as Fig. 7(b) as well as two additional ones, the component-redundancy configuration has a greater reliability than the configuration with two simplex units connected in parallel. This tie-set proof can be extended to the general case. The reliability of system and component redundancy are compared graphically in Fig. 8.

[Fig. 8: Redundancy comparison: (a) component redundancy and (b) system redundancy.]

3 Voter Design Issues

This section considers various issues related to voter design, including relaxing our assumption of perfect voters. Returning to the TMR reliability equation, if the unit reliability is denoted as $p_c$ and the voter reliability by $p_v$, then the reliability equation must be modified to yield

$$R_{sys} = p_v\,p_c^2\,(3 - 2p_c)$$

To achieve an overall gain in reliability, the reliability of the TMR scheme with an imperfect voter should be greater than the reliability of a single component. That is,

$$R_{sys} > p_c \quad\text{or}\quad \frac{R_{sys}}{p_c} > 1$$

This requires that

$$\frac{R_{sys}}{p_c} = p_v\,p_c\,(3 - 2p_c) > 1$$

So, the minimum value of $p_v$ can be obtained by setting $p_v\,p_c\,(3 - 2p_c) = 1$, or

$$p_v = \frac{1}{(3 - 2p_c)\,p_c}$$

3.1 Use of Redundant Voters

In certain cases, it may not be possible to build individual voters with a high enough reliability to meet the requirements of an ultra-reliable system. Since the voter reliability multiplies the N-modular redundancy reliability,

$$R(t) = p_v \sum_{i=M}^{N} \binom{N}{i} p^i (1 - p)^{N - i}$$

the system reliability can never exceed that of the voter. Moreover, if voting is done at the component level, the situation is even worse, since the reliability function will be multiplied by $p_v^n$ (the voters are arranged in series). This can significantly lower the reliability of the NMR scheme. Therefore, in such cases, we must consider the possibility of using redundant voters.

[Fig. 9: A TMR system with component-level voting and redundant voters.]

Fig. 9 shows a TMR system with component-level voting using redundant voters. The figure shows a system composed of $n$ different components (or sub-systems). Each sub-system is organized as a TMR structure with redundant voters. In the last stage of voting, only a single voter $V_n$ can be used. Note also that errors do not propagate more than one stage. If the sub-systems $A_1$, $B_1$, and $C_1$ are working properly, the outputs of the replicated voters will agree. If one module, say $B_1$, fails, then the three voters $V_1$, $V_1'$, and $V_1''$ will agree with the majority ($A_1$ and $C_1$). If a voter, say $V_1'$, fails, then it will provide an incorrect input to $B_2$, leading to an erroneous output. However, the next stage of voters will have correct inputs from $A_2$ and $C_2$, and the erroneous output from $B_2$ will be masked.

3.2 Exact versus Inexact Voting

Voting on the outputs can be performed in an exact fashion (bit-wise voting), or in an inexact or approximate fashion. Please read the Harper and Lala paper, available from the course web site, for more information.
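Returning to the imperfect-voter condition of Section 3, the break-even voter reliability can be tabulated directly. The sketch below uses hypothetical function names and only the two formulas given there; it also illustrates the limiting cases implied by them.

```python
def tmr_with_voter(p_c, p_v):
    """TMR reliability with unit reliability p_c and a single voter of reliability p_v."""
    return p_v * p_c**2 * (3.0 - 2.0 * p_c)


def min_voter_reliability(p_c):
    """Voter reliability at which TMR merely equals one unit: 1 / ((3 - 2 p_c) p_c)."""
    return 1.0 / ((3.0 - 2.0 * p_c) * p_c)


if __name__ == "__main__":
    for p_c in (0.5, 0.6, 0.75, 0.9, 0.99):
        print(f"p_c={p_c}: voter must exceed {min_voter_reliability(p_c):.4f}")
```

At p_c = 0.5 the required voter reliability is exactly 1, so no real voter yields a gain; the requirement is least demanding at p_c = 0.75 and tightens again as p_c approaches 1, where there is little left for redundancy to improve.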
3.3 Synchronization Issues

Synchronizing the outputs of the replicated units is another concern. The problem of synchronization is often solved using a common (fault-tolerant) clock. Another method of synchronization is used by the COMTRAC railroad traffic control computer shown in Fig. 10 [1]. Synchronization is maintained at the program task level. The system controller (DSC) ensures that both processors are performing the same calculation. When both computers have finished the calculation, the DSC compares the two results. If a mismatch occurs, the controller forces both processors to run identical test programs. The test program exercises the entire processor during the course of calculating a single constant.

[Fig. 10: Synchronization in the COMTRAC railroad traffic control computer.]

[1] Ihara et al., Fault-Tolerant Computer System with Three Symmetric Computers, Proceedings of the IEEE, pp , October

4 Dynamic Redundancy

One of the drawbacks of an NMR scheme is that the fault-masking ability deteriorates as more copies fail. In its pure form, fault masking neutralizes the effects of failed units without notification of their failures. Therefore, the faulty modules can eventually outvote the good modules. However, an NMR system could continue to function longer if the known bad modules could be discounted in the vote. Two methods of reconfiguration based on NMR are: (1) hybrid redundancy, where failed modules are replaced with good spares, and (2) dynamic modification of the voting process, or adaptive voting.

4.1 Hybrid Redundancy

Fig. 11 illustrates the basic concept, in which a core group of $N$ identical modules is used at any one time, and their outputs are voted upon to produce the system output. When a disagreement is detected, the module(s) in the minority are assumed to have failed and are replaced by an equivalent number of spare modules. Initially, the system contains a total of $(N + S)$ modules. As long as the number of failed modules in the core group does not exceed $t = \lfloor N/2 \rfloor$ before reconfiguration can take place, the system in Fig. 11 can tolerate the failure of $t + S$ of its modules.

[Fig. 11: Organization of a system with hybrid redundancy.]

4.2 Adaptive voting with lockout

When N-modular redundancy is used and $N$ is greater than three, additional considerations emerge. For example, consider the quad-redundant system shown in Fig. 12. This is the architecture used for the Space Shuttle's primary flight control system (FCS). Let us focus on the first four computers in the FCS. Here, we have an example of 4-level voting with lockout. Let us assume that unit B fails permanently. There is no reason to leave B in the system if we have a way to remove it from the voting process. The rationale is that a second failure, say that of unit C, can lead to a situation where the two failed units agree and the two good units agree, leading to a stand-off. Clearly, this can be avoided if, after the failure of B, it is locked out and the system reconfigures to become a TMR system.

[Fig. 12: The quad-redundant flight control system of the space shuttle.]
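The lockout idea can be illustrated with a toy voter. The sketch below is not the Space Shuttle design; the class and exception names are hypothetical, outputs are compared exactly, and any unit that disagrees with the majority is assumed to have failed permanently and is removed from later votes.

```python
from collections import Counter


class NoMajorityError(RuntimeError):
    """Raised when the active units do not produce a strict majority."""


class LockoutVoter:
    def __init__(self, unit_names):
        self.active = list(unit_names)  # units still taking part in the vote

    def vote(self, outputs):
        """Return the strict-majority value among active units and lock out dissenters.

        `outputs` maps unit name -> output value for this voting round.
        """
        counts = Counter(outputs[name] for name in self.active)
        value, count = counts.most_common(1)[0]
        if 2 * count <= len(self.active):
            raise NoMajorityError("stand-off: no strict majority among active units")
        # Assumption: a unit that disagrees once is treated as permanently failed.
        self.active = [name for name in self.active if outputs[name] == value]
        return value


if __name__ == "__main__":
    voter = LockoutVoter("ABCD")
    print(voter.vote({"A": 1, "B": 0, "C": 1, "D": 1}), voter.active)  # B is locked out
    print(voter.vote({"A": 2, "B": 0, "C": 0, "D": 2}), voter.active)  # C is locked out
```

In the second round, B and C produce the same wrong value, yet the vote succeeds because B no longer participates. With both still voting, the same outputs would be a two-against-two stand-off, which is the situation the text describes.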
In the case of adaptive voting with lockout in Fig. 12, after a first failure, the system assumes a TMR configuration. After the second failure, the system assumes a duplex configuration in which the outputs of the functioning units are simply compared to detect faults. If the comparison fails, the system can shut off and the backup system takes over. If we assume that the lockout works perfectly, the system in Fig. 12 will succeed if there are 0, 1, or 2 failures. If the reliability of each module is $p$, the reliability of the overall system is given by

$$R(2\text{ of }4) = p^4 + (4p^3 - 4p^4) + (6p^2 - 12p^3 + 6p^4) = 3p^4 - 8p^3 + 6p^2 \tag{5}$$

The reliability of the system will be even higher if we can detect and isolate a third failure, that is, if after the comparison fails in the duplex system, the two units are taken off-line and, through a series of tests, the faulty unit is identified. In this case, we can start with Equation 5 and add the probability that the system functions properly when only a single unit works, $4p(1 - p)^3$, to obtain

$$R(1\text{ of }4) = -p^4 + 4p^3 - 6p^2 + 4p$$

Fig. 13 plots the above equations for various values of $p$ (the reliability of a single unit). Note that the TMR scheme is poorer than a single element for $p < 0.5$ but better than a single element for $p > 0.5$.

[Fig. 13: Reliability comparison of the various voting systems.]
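The closed-form polynomials above can be cross-checked against the binomial sum they come from. This is a short verification sketch with hypothetical function names, assuming perfect lockout and independent modules of reliability p.

```python
from math import comb


def at_least(m, n, p):
    """Probability that at least m of n independent modules of reliability p are good."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(m, n + 1))


def r_tmr(p):
    return 3 * p**2 - 2 * p**3


def r_2_of_4(p):
    return 3 * p**4 - 8 * p**3 + 6 * p**2  # Equation 5


def r_1_of_4(p):
    return -(p**4) + 4 * p**3 - 6 * p**2 + 4 * p


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7, 0.9):
        print(f"p={p}: TMR={r_tmr(p):.4f} 2-of-4={r_2_of_4(p):.4f} 1-of-4={r_1_of_4(p):.4f}")
```

The 1-of-4 expression is the same as 1 - (1 - p)^4, the reliability of four units in parallel, which is what perfect detection and isolation of every failure amounts to.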
More informationAgreement Protocols. CS60002: Distributed Systems. Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur
Agreement Protocols CS60002: Distributed Systems Pallab Dasgupta Dept. of Computer Sc. & Engg., Indian Institute of Technology Kharagpur Classification of Faults Based on components that failed Program
More informationIntroduction. An Introduction to Algorithms and Data Structures
Introduction An Introduction to Algorithms and Data Structures Overview Aims This course is an introduction to the design, analysis and wide variety of algorithms (a topic often called Algorithmics ).
More informationFault-Tolerant Computer System Design ECE 60872/CS 590. Topic 2: Discrete Distributions
Fault-Tolerant Computer System Design ECE 60872/CS 590 Topic 2: Discrete Distributions Saurabh Bagchi ECE/CS Purdue University Outline Basic probability Conditional probability Independence of events Series-parallel
More informationWhat is a quantum computer? Quantum Architecture. Quantum Mechanics. Quantum Superposition. Quantum Entanglement. What is a Quantum Computer (contd.
What is a quantum computer? Quantum Architecture by Murat Birben A quantum computer is a device designed to take advantage of distincly quantum phenomena in carrying out a computational task. A quantum
More informationGraphing Radicals Business 7
Graphing Radicals Business 7 Radical functions have the form: The most frequently used radical is the square root; since it is the most frequently used we assume the number 2 is used and the square root
More informationAvailability and Reliability Analysis for Dependent System with Load-Sharing and Degradation Facility
International Journal of Systems Science and Applied Mathematics 2018; 3(1): 10-15 http://www.sciencepublishinggroup.com/j/ijssam doi: 10.11648/j.ijssam.20180301.12 ISSN: 2575-5838 (Print); ISSN: 2575-5803
More informationComparative Reliability Analysis of Reactor Trip System Architectures: Industrial Case
Comparative Reliability Analysis of Reactor Trip System Architectures: Industrial Case Aleksei Vambol 1 and Vyacheslav Kharchenko 1,2 1 Department of Computer Systems, Networks and Cybersecurity, National
More informationBackground on Coherent Systems
2 Background on Coherent Systems 2.1 Basic Ideas We will use the term system quite freely and regularly, even though it will remain an undefined term throughout this monograph. As we all have some experience
More information2nd Exam. First Name: Second Name: Matriculation Number: Degree Programme (please mark): CS Bachelor CS Master CS Lehramt SSE Master Other:
2 Concurrency Theory WS 2013/2014 Chair for Software Modeling and Verification Rheinisch-Westfälische Technische Hochschule Aachen Prof. Dr. Ir. Joost-Pieter Katoen apl. Prof. Dr. Thomas Noll S. Chakraorty,
More informationTime Dependent Analysis with Common Cause Failure Events in RiskSpectrum
Time Dependent Analysis with Common Cause Failure Events in RiskSpectrum Pavel Krcal a,b and Ola Bäckström a a Lloyd's Register Consulting, Stockholm, Sweden b Uppsala University, Uppsala, Sweden Abstract:
More informationThe Simplex Method: An Example
The Simplex Method: An Example Our first step is to introduce one more new variable, which we denote by z. The variable z is define to be equal to 4x 1 +3x 2. Doing this will allow us to have a unified
More informationBinary Decision Diagrams and Symbolic Model Checking
Binary Decision Diagrams and Symbolic Model Checking Randy Bryant Ed Clarke Ken McMillan Allen Emerson CMU CMU Cadence U Texas http://www.cs.cmu.edu/~bryant Binary Decision Diagrams Restricted Form of
More informationDepartment of Electrical and Computer Engineering University of Wisconsin Madison. Fall Midterm Examination CLOSED BOOK
Department of Electrical and Computer Engineering University of Wisconsin Madison ECE 553: Testing and Testable Design of Digital Systems Fall 2014-2015 Midterm Examination CLOSED BOOK Kewal K. Saluja
More informationDeterministic Consensus Algorithm with Linear Per-Bit Complexity
Deterministic Consensus Algorithm with Linear Per-Bit Complexity Guanfeng Liang and Nitin Vaidya Department of Electrical and Computer Engineering, and Coordinated Science Laboratory University of Illinois
More informationLan Performance LAB Ethernet : CSMA/CD TOKEN RING: TOKEN
Lan Performance LAB Ethernet : CSMA/CD TOKEN RING: TOKEN Ethernet Frame Format 7 b y te s 1 b y te 2 o r 6 b y te s 2 o r 6 b y te s 2 b y te s 4-1 5 0 0 b y te s 4 b y te s P r e a m b le S ta r t F r
More informationElementary Algebra - Problem Drill 01: Introduction to Elementary Algebra
Elementary Algebra - Problem Drill 01: Introduction to Elementary Algebra No. 1 of 10 Instructions: (1) Read the problem and answer choices carefully (2) Work the problems on paper as 1. Which of the following
More informationDiagnosis of Repeated/Intermittent Failures in Discrete Event Systems
Diagnosis of Repeated/Intermittent Failures in Discrete Event Systems Shengbing Jiang, Ratnesh Kumar, and Humberto E. Garcia Abstract We introduce the notion of repeated failure diagnosability for diagnosing
More informationModular numbers and Error Correcting Codes. Introduction. Modular Arithmetic.
Modular numbers and Error Correcting Codes Introduction Modular Arithmetic Finite fields n-space over a finite field Error correcting codes Exercises Introduction. Data transmission is not normally perfect;
More informationWeek Cuts, Branch & Bound, and Lagrangean Relaxation
Week 11 1 Integer Linear Programming This week we will discuss solution methods for solving integer linear programming problems. I will skip the part on complexity theory, Section 11.8, although this is
More informationFinally the Weakest Failure Detector for Non-Blocking Atomic Commit
Finally the Weakest Failure Detector for Non-Blocking Atomic Commit Rachid Guerraoui Petr Kouznetsov Distributed Programming Laboratory EPFL Abstract Recent papers [7, 9] define the weakest failure detector
More informationCoordination. Failures and Consensus. Consensus. Consensus. Overview. Properties for Correct Consensus. Variant I: Consensus (C) P 1. v 1.
Coordination Failures and Consensus If the solution to availability and scalability is to decentralize and replicate functions and data, how do we coordinate the nodes? data consistency update propagation
More informationIntroduction to the Simplex Algorithm Active Learning Module 3
Introduction to the Simplex Algorithm Active Learning Module 3 J. René Villalobos and Gary L. Hogg Arizona State University Paul M. Griffin Georgia Institute of Technology Background Material Almost any
More informationthen the hard copy will not be correct whenever your instructor modifies the assignments.
Assignments for Math 2030 then the hard copy will not be correct whenever your instructor modifies the assignments. exams, but working through the problems is a good way to prepare for the exams. It is
More informationFAULT TOLERANT SYSTEMS
ﻋﻨﻮان درس ﻧﺎم اﺳﺘﺎد 1394-95 ﻧﺎم درس ﻧﺎم رﺷﺘﻪ ﻧﺎم ﮔﺮاﯾﺶ ﻧﺎم ﻣﻮﻟﻒ ﻧﺎم ﮐﺎرﺷﻨﺎس درس FAULT TOLERANT SYSTEMS Part 8 RAID Systems Chapter 3 Information Redundancy RAID - Redundant Arrays of Inexpensive (Independent
More information9.2 Multiplication Properties of Radicals
Section 9.2 Multiplication Properties of Radicals 885 9.2 Multiplication Properties of Radicals Recall that the equation x 2 = a, where a is a positive real number, has two solutions, as indicated in Figure
More informationRead this before starting!
Points missed: Student's Name: Total score: / points East Tennessee State University Department of Computer and Information Sciences CSCI 25 (Tarnoff) Computer Organization TEST 2 for Fall Semester, 28
More informationFault Tolerance Technique in Huffman Coding applies to Baseline JPEG
Fault Tolerance Technique in Huffman Coding applies to Baseline JPEG Cung Nguyen and Robert G. Redinbo Department of Electrical and Computer Engineering University of California, Davis, CA email: cunguyen,
More informationMassachusetts Institute of Technology Department of Electrical Engineering and Computer Science Algorithms For Inference Fall 2014
Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.438 Algorithms For Inference Fall 2014 Recitation 3 1 Gaussian Graphical Models: Schur s Complement Consider
More informationEGFC: AN EXACT GLOBAL FAULT COLLAPSING TOOL FOR COMBINATIONAL CIRCUITS
EGFC: AN EXACT GLOBAL FAULT COLLAPSING TOOL FOR COMBINATIONAL CIRCUITS Hussain Al-Asaad Department of Electrical & Computer Engineering University of California One Shields Avenue, Davis, CA 95616-5294
More informationReliability of Technical Systems
Reliability of Technical Systems Main Topics 1. Short Introduction, Reliability Parameters: Failure Rate, Failure Probability, etc. 2. Some Important Reliability Distributions 3. Component Reliability
More informationDictionary-Less Defect Diagnosis as Surrogate Single Stuck-At Faults
Dictionary-Less Defect Diagnosis as Surrogate Single Stuck-At Faults Chidambaram Alagappan and Vishwani D. Agrawal Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849,
More informationCS Homework 3. October 15, 2009
CS 294 - Homework 3 October 15, 2009 If you have questions, contact Alexandre Bouchard (bouchard@cs.berkeley.edu) for part 1 and Alex Simma (asimma@eecs.berkeley.edu) for part 2. Also check the class website
More informationThe Applications of Inductive Method in the Construction of Fault Trees MENG Qinghe 1,a, SUN Qin 2,b
The Applications of Inductive Method in the Construction of Fault Trees MENG Qinghe 1,a, SUN Qin 2,b 1 School of Aeronautics, Northwestern Polytechnical University, Xi an 710072, China 2 School of Aeronautics,
More informationTime-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation
Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation H. Zhang, E. Cutright & T. Giras Center of Rail Safety-Critical Excellence, University of Virginia,
More informationQuadruple Adaptive Redundancy with Fault Detection Estimator
Quadruple Adaptive Redundancy with Fault Detection Estimator Dohyeung Kim 1 and Richard Voyles 2 Abstract As a result of advances in technology, systems have grown more and more complex, leading to greater
More informationLearning Objectives:
Learning Objectives: t the end of this topic you will be able to; draw a block diagram showing how -type flip-flops can be connected to form a synchronous counter to meet a given specification; explain
More informationChapter 15. System Reliability Concepts and Methods. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University
Chapter 15 System Reliability Concepts and Methods William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on
More information2 Generating Functions
2 Generating Functions In this part of the course, we re going to introduce algebraic methods for counting and proving combinatorial identities. This is often greatly advantageous over the method of finding
More informationSwarm-bots. Marco Dorigo FNRS Research Director IRIDIA Université Libre de Bruxelles
Swarm-bots Marco Dorigo FNRS Research Director IRIDIA Université Libre de Bruxelles Swarm-bots The swarm-bot is an experiment in swarm robotics Swarm robotics is the application of swarm intelligence principles
More informationCoping with disk crashes
Lecture 04.03 Coping with disk crashes By Marina Barsky Winter 2016, University of Toronto Disk failure types Intermittent failure Disk crash the entire disk becomes unreadable, suddenly and permanently
More informationCyber Physical Power Systems Power in Communications
1 Cyber Physical Power Systems Power in Communications Information and Communications Tech. Power Supply 2 ICT systems represent a noticeable (about 5 % of total t demand d in U.S.) fast increasing load.
More informationAlgebra 2 Secondary Mathematics Instructional Guide
Algebra 2 Secondary Mathematics Instructional Guide 2009-2010 ALGEBRA 2AB (Grade 9, 10 or 11) Prerequisite: Algebra 1AB or Geometry AB 310303 Algebra 2A 310304 Algebra 2B COURSE DESCRIPTION Los Angeles
More informationIntegrating Reliability into the Design of Power Electronics Systems
Integrating Reliability into the Design of Power Electronics Systems Alejandro D. Domínguez-García Grainger Center for Electric Machinery and Electromechanics Department of Electrical and Computer Engineering
More informationN= {1,2,3,4,5,6,7,8,9,10,11,...}
1.1: Integers and Order of Operations 1. Define the integers 2. Graph integers on a number line. 3. Using inequality symbols < and > 4. Find the absolute value of an integer 5. Perform operations with
More information