UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Fault Tolerant Computing ECE 655
|
|
- Herbert Baker
- 5 years ago
- Views:
Transcription
1 UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering Fault Tolerant Computing ECE 655 Part 1 Introduction C. M. Krishna Fall 2006 ECE655/Krishna Part.1.1 Prerequisites Basic courses in Digital Design Hardware Organization/Computer Architecture Probability ECE655/Krishna Part.1.2 Page 1
2 Fault Tolerance - Basic definition Fault-tolerant systems - ideally systems capable of executing their tasks correctly regardless of either hardware failures or software errors In practice - we can never guarantee the flawless execution of tasks under any circumstances Limit ourselves to types of failures and errors which are more likely to occur ECE655/Krishna Part.1.3 Need For Fault Tolerance ECE655/Krishna Part.1.4 Page 2
3 Need For Fault Tolerance - Life Critical Applications Life-critical applications such as; aircrafts, nuclear reactors, chemical plants, and medical equipment A malfunction of a computer in such applications can lead to catastrophe Their probability of failure must be extremely low, possibly one in a billion per hour of operation ECE655/Krishna Part.1.5 Need for Fault Tolerance - Harsh Environment A computing system operating in a harsh environment where it is subjected to electromagnetic disturbances particle hits and alike Very large number of failures means: the system will not produce useful results unless some fault-tolerance is incorporated ECE655/Krishna Part.1.6 Page 3
4 Need For Fault Tolerance - Highly Complex Systems Complex systems consist of millions of devices Every physical device has a certain probability of failure A very large number of devices implies that the likelihood of failures is high The system will experience faults at such a frequency which renders it useless ECE655/Krishna Part.1.7 Fault Tolerance Measures It is important to have proper yardsticks - measures - by which to measure the effect of fault tolerance A measure is a mathematical abstraction, which expresses only some subset of the object's nature Measures? ECE655/Krishna Part.1.8 Page 4
5 Traditional Measures Assumption: The system can be in one of two states: up or down Examples: A lightbulb is either good or burned out; A wire is either connected or broken Two traditional measures: Reliability and Availability Reliability, R(t): probability that the system is up during the interval [0,t], given it was up at time 0 Availability, A(t), is the fraction of time that the system is up during the interval [0,t] Point Availability, Ap(t), is the probability that the system is up at time t A related measure is MTTF - Mean Time To Failure - the average time the system remains up before it goes down and has to be repaired or replaced ECE655/Krishna Part.1.9 Need For More Measures The assumption of the system being in state up or down is very limiting Example: A processor with one of its several hundreds of millions of gates stuck at logic value 0 and the rest is functional - may affect the output of the processor once in every 25,000 hours of use The processor is not fault-free, but cannot be defined as being down More general measures than the traditional reliability and availability are needed ECE655/Krishna Part.1.10 Page 5
6 More General Measures Capacity Reliability - probability that the system capacity (as measured, for example, by throughput) at time t exceeds some given threshold at that time Another extension - consider everything from the perspective of the application This approach was taken to define the measure known as Performability ECE655/Krishna Part.1.11 Computational Capacity Example: N processors in a gracefully degrading system System recovers from failures of processors and is useful as long as at least one processor remains operational Let Pi = Prob {i processors are operational} R(t) = S Pi Let c be the computational capacity of a processor (e.g., number of fixed-size tasks it can execute) Computational capacity of i processors: Ci = i c Computational capacity of the system: S Ci Pi ECE655/Krishna Part.1.12 Page 6
7 Performability Application is used to define accomplishment levels L1, L2,...,Ln Each represents a level of quality of service delivered by the application Example: Li indicates i system crashes during the mission time period T Performability is a vector (P(L1),P(L2),...,P(Ln)) where P(Li) is the probability that the computer functions well enough to permit the application to reach up to accomplishment level Li ECE655/Krishna Part.1.13 Network Connectivity Measures Focus on the network that connects the processors Classical Node and Line Connectivity - the minimum number of nodes and lines, respectively, that have to fail before the network becomes disconnected Measure indicates how vulnerable the network is to disconnection A network disconnected by the failure of just one (critically-positioned) node is potentially more vulnerable than another which requires several nodes to fail before it becomes disconnected ECE655/Krishna Part.1.14 Page 7
8 Connectivity - Examples ECE655/Krishna Part.1.15 Network Resilience Measures Classical connectivity distinguishes between only two network states: connected and disconnected It says nothing about how the network degrades as nodes fail before becoming disconnected Two possible resilience measures: Average node-pair distance Network diameter - the maximum node-pair distance Both calculated given the probability of node and/or link failure ECE655/Krishna Part.1.16 Page 8
9 More Network Measures What happens upon network disconnection A network that splits into one large connected component and several small pieces may be able to function, vs. a network that splinters into a large number of small pieces Another measure of a network's resilience to failure: probability distribution of the largest component upon disconnection ECE655/Krishna Part.1.17 Redundancy Redundancy is at the heart of fault tolerance Redundancy can be defined as the incorporation of extra parts in the design of a system in such a way that its function is not impaired in the event of a failure We will study four forms of redundancy: ECE655/Krishna Part.1.18 Page 9
10 Hardware Redundancy Extra hardware is added to override the effects of a failed component Static Hardware Redundancy - for immediate masking of a failure Example: Use three processors (instead of one), each performing the same function. The majority output of these processors can override the wrong output of a single faulty processor Dynamic Hardware Redundancy - spare components are activated upon the failure of a currently active component Hybrid Hardware Redundancy - A combination of static and dynamic redundancy techniques ECE655/Krishna Part.1.19 Software Redundancy Software redundancy is provided by having multiple teams of programmers Write different versions of software for the same function The hope is that such diversity will ensure that not all the copies will fail on the same set of input data ECE655/Krishna Part.1.20 Page 10
11 Information and Time Redundancy Information redundancy: provided by adding bits to the original data bits so that an error in the data bits can be detected and even corrected Error detecting and correcting codes have been developed and are being used Information redundancy often requires hardware redundancy to process the additional bits Time redundancy: provided by having additional time during which a failed execution can be repeated Most failures are transient - they go away after some time If enough slack time is available, the failed unit can recover and redo the affected computation ECE655/Krishna Part.1.21 Hardware Faults Classification Three types of faults: Transient Faults - disappear after a relatively short time Example - a memory cell whose contents are changed spuriously due to some electromagnetic interference Overwriting the memory cell with the right content will make the fault go away Permanent Faults - never go away, component has to be repaired or replaced Intermittent Faults - cycle between active and benign states Example - a loose connection ECE655/Krishna Part.1.22 Page 11
12 Failure Rate The rate at which a component suffers faults depends on its age, the ambient temperature, any voltage or physical shocks that it suffers, and the technology The dependence on age is usually captured by the bathtub curve: ECE655/Krishna Part.1.23 Bathtub Curve When components are very young, the failure rate is quite high: there is a good chance that some units with manufacturing defects slipped through manufacturing quality control and were released As time goes on, these units are weeded out, and the unit spends the bulk of its life showing a fairly constant failure rate As it becomes very old, aging effects start to take over, and the failure rate rises again ECE655/Krishna Part.1.24 Page 12
13 Empirical Formula for l - Failure Rate l = pl pq (C1 pt pv + C2 pe) pl: Learning factor, (how mature the technology is) pq: Manufacturing process Quality factor (0.25 to 20.00) pt: Temperature factor, (from 0.1 to 1000), proportional to exp(-ea/kt) where Ea is the activation energy in electronvolts associated with the technology, k is the Boltzmann constant and T is the temperature in Kelvin pv: Voltage stress factor for CMOS devices (from 1 to 10 depending on the supply voltage and the temperature); does not apply to other technologies (set to 1) pe: Environment shock factor: from about 0.4 (airconditioned environment), to 13.0 (harsh environment) C1, C2: Complexity factors; functions of number of gates on the chip and number of pins in the package Further details: MIL-HDBK-217E handbook ECE655/Krishna Part.1.25 Environment Impact Devices operating in space, which is replete with energy-charged particles and can subject devices to severe temperature swings, can be expected to fail more often Similarly for computers in automobiles (high temperature and vibration) and industrial applications ECE655/Krishna Part.1.26 Page 13
14 Faults Vs. Errors A fault can be either a hardware defect or a software/programming mistake By contrast, an error is a manifestation of a fault For example, consider an adder circuit one of whose output lines is stuck at 1. This is a fault, but not (yet) an error. This fault causes an error when the adder is used and the result on that line is supposed to have been a 0, rather than a 1 ECE655/Krishna Part.1.27 Propagation of Faults and Errors Both faults and errors can spread through the system If a chip shorts out power to ground, it may cause nearby chips to fail as well Errors can spread because the output of one processor is frequently used as input by other processors Adder example: the erroneous result of the faulty adder can be fed into further calculations, thus propagating the error ECE655/Krishna Part.1.28 Page 14
15 Containment Zones To limit such situations, designers incorporate containment zones into systems Barriers that reduce the chance that a fault or error in one zone will propagate to another A fault-containment zone can be created by providing an independent power supply to each zone The designer tries to electrically isolate one zone from another An error-containment zone can be created by using redundant units and voting on their output ECE655/Krishna Part.1.29 Time to Failure - Analytic Model Consider the following model: N identical components, all operational at time t=0 Each component remains operational until it is hit by a failure All failures are permanent and occur in each component independently from other components We first concentrate on one component T - the lifetime of one component - the time until it fails T is a random variable f(t) - the density function of T F(t) - the cumulative distribution function of T ECE655/Krishna Part.1.30 Page 15
16 Probabilistic Interpretation F(t) - the probability that the component will fail at or before time t F(t) = Prob (T t) f(t) - the momentary rate of failure f(t)dt = Prob (t T t+dt) Like any density function (defined for t 0) f(t) dt =1 0 f(t) 0, for all t 0 These functions are related through f(t)=df(t)/dt F(t)= f(s) ds ECE655/Krishna Part t Reliability and Failure (Hazard) Rate The reliability of a single component - R(t) R(t) = Prob (T>t) = 1- F(t) The failure probability of a component at time t, p(t) - the conditional probability that the component will fail at time t, given it has not failed before p(t) = Prob (t T t+dt T t) = Prob (t T t+dt) / Prob(T t) = f(t)dt / (1-F(t)) The failure rate (or hazard rate) of a component at time t, h(t), is defined as p(t)/dt h(t) = f(t)/(1- F(t)) Since dr(t)/dt = -f(t), we get h(t) = -1/R(t) dr(t)/dt ECE655/Krishna Part.1.32 Page 16
17 Constant Failure Rate If the failure rate is constant over time - h(t) = l - then dr(t) / dt = - l R(t) ; R(0)=1 The solution of the differential equation is - l t R(t) = e - l t f(t)= l e - l t F(t)=1- e A constant failure rate is obtained if and only if T, the lifetime of the component, has an exponential distribution ECE655/Krishna Part.1.33 Mean Time to Failure MTTF - the expected value of the lifetime T MTTF = E[T] = t f(t) dt 0 dr(t)/dt= - f(t) MTTF = - t dr(t)/dt dt = [-t R(t) ] + R(t) dt 0 -t R(t) = 0 when t=0 and when t= since R( )=0 Therefore, MTTF = R(t) dt 0 If the failure rate is a constant l - l t R(t) = e - l t MTTF = e dt = 1/ l ECE655/Krishna Part.1.34 Page 17
18 Weibull Distribution - Introduction In most calculations of reliability, a constant failure rate l is assumed, or equivalently - the Exponential distribution for the component lifetime T There are cases in which this simplifying assumption is inappropriate Example - during the infant mortality and wear-out phases of the bathtub curve In such cases, the Weibull distribution for the lifetime T is often used for reliability calculation ECE655/Krishna Part.1.35 Weibull distribution - Equation The Weibull distribution has two parameters, l and b The density function of the component lifetime T: b b-1 - l t f(t)= l b t e The failure rate for the Weibull distribution is b-1 h(t)= l b t The failure rate h(t) is decreasing with time for b<1, increasing with time for b>1, constant for b=1, appropriate for the infant mortality, wearout and middle phases, respectively ECE655/Krishna Part.1.36 Page 18
19 MTTF for the Weibull Distribution The reliability for the Weibull distribution is b - l t R(t) = e The MTTF for the Weibull distribution is 1/b MTTF = G(1/b)/(b l ) where G(x) is the Gamma function The special case b = 1 is the Exponential distribution with a constant failure rate ECE655/Krishna Part.1.37 Page 19
Terminology and Concepts
Terminology and Concepts Prof. Naga Kandasamy 1 Goals of Fault Tolerance Dependability is an umbrella term encompassing the concepts of reliability, availability, performability, safety, and testability.
More informationQuantitative evaluation of Dependability
Quantitative evaluation of Dependability 1 Quantitative evaluation of Dependability Faults are the cause of errors and failures. Does the arrival time of faults fit a probability distribution? If so, what
More informationQuantitative evaluation of Dependability
Quantitative evaluation of Dependability 1 Quantitative evaluation of Dependability Faults are the cause of errors and failures. Does the arrival time of faults fit a probability distribution? If so, what
More informationFault Tolerance. Dealing with Faults
Fault Tolerance Real-time computing systems must be fault-tolerant: they must be able to continue operating despite the failure of a limited subset of their hardware or software. They must also allow graceful
More informationReliability of Technical Systems
Main Topics 1. Introduction, Key Terms, Framing the Problem 2. Reliability Parameters: Failure Rate, Failure Probability, etc. 3. Some Important Reliability Distributions 4. Component Reliability 5. Software
More informationFault-Tolerant Computing
Fault-Tolerant Computing Motivation, Background, and Tools Slide 1 About This Presentation This presentation has been prepared for the graduate course ECE 257A (Fault-Tolerant Computing) by Behrooz Parhami,
More informationChapter 6. a. Open Circuit. Only if both resistors fail open-circuit, i.e. they are in parallel.
Chapter 6 1. a. Section 6.1. b. Section 6.3, see also Section 6.2. c. Predictions based on most published sources of reliability data tend to underestimate the reliability that is achievable, given that
More informationVLSI Design I. Defect Mechanisms and Fault Models
VLSI Design I Defect Mechanisms and Fault Models He s dead Jim... Overview Defects Fault models Goal: You know the difference between design and fabrication defects. You know sources of defects and you
More informationLecture 5 Fault Modeling
Lecture 5 Fault Modeling Why model faults? Some real defects in VLSI and PCB Common fault models Stuck-at faults Single stuck-at faults Fault equivalence Fault dominance and checkpoint theorem Classes
More informationEECS150 - Digital Design Lecture 26 - Faults and Error Correction. Types of Faults in Digital Designs
EECS150 - Digital Design Lecture 26 - Faults and Error Correction April 25, 2013 John Wawrzynek 1 Types of Faults in Digital Designs Design Bugs (function, timing, power draw) detected and corrected at
More informationSignal Handling & Processing
Signal Handling & Processing The output signal of the primary transducer may be too small to drive indicating, recording or control elements directly. Or it may be in a form which is not convenient for
More informationEECS150 - Digital Design Lecture 26 Faults and Error Correction. Recap
EECS150 - Digital Design Lecture 26 Faults and Error Correction Nov. 26, 2013 Prof. Ronald Fearing Electrical Engineering and Computer Sciences University of California, Berkeley (slides courtesy of Prof.
More informationReliable Computing I
Instructor: Mehdi Tahoori Reliable Computing I Lecture 5: Reliability Evaluation INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE NANO COMPUTING (CDNC) National Research Center of the Helmholtz
More informationRel: Estimating Digital System Reliability
Rel 1 Rel: Estimating Digital System Reliability Qualitatively, the reliability of a digital system is the likelihood that it works correctly when you need it. Marketing and sa les people like to say that
More informationChapter 2 Fault Modeling
Chapter 2 Fault Modeling Jin-Fu Li Advanced Reliable Systems (ARES) Lab. Department of Electrical Engineering National Central University Jungli, Taiwan Outline Why Model Faults? Fault Models (Faults)
More informationDependable Systems. ! Dependability Attributes. Dr. Peter Tröger. Sources:
Dependable Systems! Dependability Attributes Dr. Peter Tröger! Sources:! J.C. Laprie. Dependability: Basic Concepts and Terminology Eusgeld, Irene et al.: Dependability Metrics. 4909. Springer Publishing,
More informationEE 445 / 850: Final Examination
EE 445 / 850: Final Examination Date and Time: 3 Dec 0, PM Room: HLTH B6 Exam Duration: 3 hours One formula sheet permitted. - Covers chapters - 5 problems each carrying 0 marks - Must show all calculations
More informationDependable Computer Systems
Dependable Computer Systems Part 3: Fault-Tolerance and Modelling Contents Reliability: Basic Mathematical Model Example Failure Rate Functions Probabilistic Structural-Based Modeling: Part 1 Maintenance
More informationFigure 1.1: Schematic symbols of an N-transistor and P-transistor
Chapter 1 The digital abstraction The term a digital circuit refers to a device that works in a binary world. In the binary world, the only values are zeros and ones. Hence, the inputs of a digital circuit
More informationReliability Analysis of Moog Ultrasonic Air Bubble Detectors
Reliability Analysis of Moog Ultrasonic Air Bubble Detectors Air-in-line sensors are vital to the performance of many of today s medical device applications. The reliability of these sensors should be
More informationCombinational Techniques for Reliability Modeling
Combinational Techniques for Reliability Modeling Prof. Naga Kandasamy, ECE Department Drexel University, Philadelphia, PA 19104. January 24, 2009 The following material is derived from these text books.
More information10 Introduction to Reliability
0 Introduction to Reliability 10 Introduction to Reliability The following notes are based on Volume 6: How to Analyze Reliability Data, by Wayne Nelson (1993), ASQC Press. When considering the reliability
More informationWe are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists. International authors and editors
We are IntechOpen, the world s leading publisher of Open Access books Built by scientists, for scientists 3,900 116,000 120M Open access books available International authors and editors Downloads Our
More informationCHAPTER 10 RELIABILITY
CHAPTER 10 RELIABILITY Failure rates Reliability Constant failure rate and exponential distribution System Reliability Components in series Components in parallel Combination system 1 Failure Rate Curve
More informationEngineering Risk Benefit Analysis
Engineering Risk Benefit Analysis 1.155, 2.943, 3.577, 6.938, 10.816, 13.621, 16.862, 22.82, ESD.72, ESD.721 RPRA 3. Probability Distributions in RPRA George E. Apostolakis Massachusetts Institute of Technology
More informationAdvanced Testing. EE5375 ADD II Prof. MacDonald
Advanced Testing EE5375 ADD II Prof. MacDonald Functional Testing l Original testing method l Run chip from reset l Tester emulates the outside world l Chip runs functionally with internally generated
More informationFailure rate in the continuous sense. Figure. Exponential failure density functions [f(t)] 1
Failure rate (Updated and Adapted from Notes by Dr. A.K. Nema) Part 1: Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. It is
More informationConcept of Reliability
Concept of Reliability Prepared By Dr. M. S. Memon Department of Industrial Engineering and Management Mehran University of Engineering and Technology Jamshoro, Sindh, Pakistan RELIABILITY Reliability
More informationLecture 16: Circuit Pitfalls
Introduction to CMOS VLSI Design Lecture 16: Circuit Pitfalls David Harris Harvey Mudd College Spring 2004 Outline Pitfalls Detective puzzle Given circuit and symptom, diagnose cause and recommend solution
More informationECE-470 Digital Design II Memory Test. Memory Cells Per Chip. Failure Mechanisms. Motivation. Test Time in Seconds (Memory Size: n Bits) Fault Types
ECE-470 Digital Design II Memory Test Motivation Semiconductor memories are about 35% of the entire semiconductor market Memories are the most numerous IPs used in SOC designs Number of bits per chip continues
More informationELE 491 Senior Design Project Proposal
ELE 491 Senior Design Project Proposal These slides are loosely based on the book Design for Electrical and Computer Engineers by Ford and Coulston. I have used the sources referenced in the book freely
More informationPractical Applications of Reliability Theory
Practical Applications of Reliability Theory George Dodson Spallation Neutron Source Managed by UT-Battelle Topics Reliability Terms and Definitions Reliability Modeling as a tool for evaluating system
More informationTradeoff between Reliability and Power Management
Tradeoff between Reliability and Power Management 9/1/2005 FORGE Lee, Kyoungwoo Contents 1. Overview of relationship between reliability and power management 2. Dakai Zhu, Rami Melhem and Daniel Moss e,
More informationComparative Distributions of Hazard Modeling Analysis
Comparative s of Hazard Modeling Analysis Rana Abdul Wajid Professor and Director Center for Statistics Lahore School of Economics Lahore E-mail: drrana@lse.edu.pk M. Shuaib Khan Department of Statistics
More informationFault Modeling. 李昆忠 Kuen-Jong Lee. Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan. VLSI Testing Class
Fault Modeling 李昆忠 Kuen-Jong Lee Dept. of Electrical Engineering National Cheng-Kung University Tainan, Taiwan Class Fault Modeling Some Definitions Why Modeling Faults Various Fault Models Fault Detection
More informationChapter 5. System Reliability and Reliability Prediction.
Chapter 5. System Reliability and Reliability Prediction. Problems & Solutions. Problem 1. Estimate the individual part failure rate given a base failure rate of 0.0333 failure/hour, a quality factor of
More informationChapter 15. System Reliability Concepts and Methods. William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University
Chapter 15 System Reliability Concepts and Methods William Q. Meeker and Luis A. Escobar Iowa State University and Louisiana State University Copyright 1998-2008 W. Q. Meeker and L. A. Escobar. Based on
More informationECE 3060 VLSI and Advanced Digital Design. Testing
ECE 3060 VLSI and Advanced Digital Design Testing Outline Definitions Faults and Errors Fault models and definitions Fault Detection Undetectable Faults can be used in synthesis Fault Simulation Observability
More informationEvaluation and Validation
Evaluation and Validation Peter Marwedel TU Dortmund, Informatik 12 Germany Graphics: Alexandra Nolte, Gesine Marwedel, 2003 2011 06 18 These slides use Microsoft clip arts. Microsoft copyright restrictions
More informationImportance of the Running-In Phase on the Life of Repairable Systems
Engineering, 214, 6, 78-83 Published Online February 214 (http://www.scirp.org/journal/eng) http://dx.doi.org/1.4236/eng.214.6211 Importance of the Running-In Phase on the Life of Repairable Systems Salima
More informationLoad-strength Dynamic Interaction Principle and Failure Rate Model
International Journal of Performability Engineering Vol. 6, No. 3, May 21, pp. 25-214. RAMS Consultants Printed in India Load-strength Dynamic Interaction Principle and Failure Rate Model LIYANG XIE and
More informationReliability of Bulk Power Systems (cont d)
Reliability of Bulk Power Systems (cont d) Important requirements of a reliable electric power service Voltage and frequency must be held within close tolerances Synchronous generators must be kept running
More informationKey Words: Lifetime Data Analysis (LDA), Probability Density Function (PDF), Goodness of fit methods, Chi-square method.
Reliability prediction based on lifetime data analysis methodology: The pump case study Abstract: The business case aims to demonstrate the lifetime data analysis methodology application from the historical
More informationUnderstanding Integrated Circuit Package Power Capabilities
Understanding Integrated Circuit Package Power Capabilities INTRODUCTION The short and long term reliability of National Semiconductor s interface circuits like any integrated circuit is very dependent
More informationReliability Engineering I
Happiness is taking the reliability final exam. Reliability Engineering I ENM/MSC 565 Review for the Final Exam Vital Statistics What R&M concepts covered in the course When Monday April 29 from 4:30 6:00
More informationReliability and Availability Simulation. Krige Visser, Professor, University of Pretoria, South Africa
Reliability and Availability Simulation Krige Visser, Professor, University of Pretoria, South Africa Content BACKGROUND DEFINITIONS SINGLE COMPONENTS MULTI-COMPONENT SYSTEMS AVAILABILITY SIMULATION CONCLUSION
More informationIntroduction to Reliability Theory (part 2)
Introduction to Reliability Theory (part 2) Frank Coolen UTOPIAE Training School II, Durham University 3 July 2018 (UTOPIAE) Introduction to Reliability Theory 1 / 21 Outline Statistical issues Software
More informationUnderstanding Integrated Circuit Package Power Capabilities
Understanding Integrated Circuit Package Power Capabilities INTRODUCTION The short and long term reliability of s interface circuits, like any integrated circuit, is very dependent on its environmental
More informationSemiconductor Reliability
Semiconductor Reliability. Semiconductor Device Failure Region Below figure shows the time-dependent change in the semiconductor device failure rate. Discussions on failure rate change in time often classify
More informationCyber Physical Power Systems Power in Communications
1 Cyber Physical Power Systems Power in Communications Information and Communications Tech. Power Supply 2 ICT systems represent a noticeable (about 5 % of total t demand d in U.S.) fast increasing load.
More informationIoT Network Quality/Reliability
IoT Network Quality/Reliability IEEE PHM June 19, 2017 Byung K. Yi, Dr. of Sci. Executive V.P. & CTO, InterDigital Communications, Inc Louis Kerofsky, PhD. Director of Partner Development InterDigital
More informationPart 3: Fault-tolerance and Modeling
Part 3: Fault-tolerance and Modeling Course: Dependable Computer Systems 2012, Stefan Poledna, All rights reserved part 3, page 1 Goals of fault-tolerance modeling Design phase Designing and implementing
More informationSemiconductor Technologies
UNIVERSITI TUNKU ABDUL RAHMAN Semiconductor Technologies Quality Control Dr. Lim Soo King 1/2/212 Chapter 1 Quality Control... 1 1. Introduction... 1 1.1 In-Process Quality Control and Incoming Quality
More informationStatistics for Engineers Lecture 4 Reliability and Lifetime Distributions
Statistics for Engineers Lecture 4 Reliability and Lifetime Distributions Chong Ma Department of Statistics University of South Carolina chongm@email.sc.edu February 15, 2017 Chong Ma (Statistics, USC)
More informationContents. Introduction Field Return Rate Estimation Process Used Parameters
Contents Introduction Field Return Rate Estimation Process Used Parameters Parts Count Reliability Prediction Accelerated Stress Testing Maturity Level Field Return Rate Indicator Function Field Return
More informationFault Tolerant Computing CS 530 Fault Modeling
CS 53 Fault Modeling Yashwant K. Malaiya Colorado State University Fault Modeling Why fault modeling? Stuck-at / fault model The single fault assumption Bridging and delay faults MOS transistors and CMOS
More informationConfidence Intervals. Confidence interval for sample mean. Confidence interval for sample mean. Confidence interval for sample mean
Confidence Intervals Confidence interval for sample mean The CLT tells us: as the sample size n increases, the sample mean is approximately Normal with mean and standard deviation Thus, we have a standard
More informationDictionary-Less Defect Diagnosis as Surrogate Single Stuck-At Faults
Dictionary-Less Defect Diagnosis as Surrogate Single Stuck-At Faults Chidambaram Alagappan and Vishwani D. Agrawal Department of Electrical and Computer Engineering Auburn University, Auburn, AL 36849,
More informationLecture 16: Circuit Pitfalls
Lecture 16: Circuit Pitfalls Outline Variation Noise Budgets Reliability Circuit Pitfalls 2 Variation Process Threshold Channel length Interconnect dimensions Environment Voltage Temperature Aging / Wearout
More informationTemperature and Humidity Acceleration Factors on MLV Lifetime
Temperature and Humidity Acceleration Factors on MLV Lifetime With and Without DC Bias Greg Caswell Introduction This white paper assesses the temperature and humidity acceleration factors both with and
More informationCRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY
CRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY MEMSYS-2017 SWAMIT TANNU DOUG CARMEAN MOINUDDIN QURESHI Why Quantum Computers? 2 Molecule and Material Simulations
More informationTime-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation
Time-varying failure rate for system reliability analysis in large-scale railway risk assessment simulation H. Zhang, E. Cutright & T. Giras Center of Rail Safety-Critical Excellence, University of Virginia,
More informationCMP 338: Third Class
CMP 338: Third Class HW 2 solution Conversion between bases The TINY processor Abstraction and separation of concerns Circuit design big picture Moore s law and chip fabrication cost Performance What does
More informationMemory Thermal Management 101
Memory Thermal Management 101 Overview With the continuing industry trends towards smaller, faster, and higher power memories, thermal management is becoming increasingly important. Not only are device
More informationLecture 5 Probability
Lecture 5 Probability Dr. V.G. Snell Nuclear Reactor Safety Course McMaster University vgs 1 Probability Basic Ideas P(A)/probability of event A 'lim n64 ( x n ) (1) (Axiom #1) 0 # P(A) #1 (1) (Axiom #2):
More informationConsidering Security Aspects in Safety Environment. Dipl.-Ing. Evzudin Ugljesa
Considering Security spects in Safety Environment Dipl.-ng. Evzudin Ugljesa Overview ntroduction Definitions of safety relevant parameters Description of the oo4-architecture Calculation of the FD-Value
More informationPERFORMANCE METRICS. Mahdi Nazm Bojnordi. CS/ECE 6810: Computer Architecture. Assistant Professor School of Computing University of Utah
PERFORMANCE METRICS Mahdi Nazm Bojnordi Assistant Professor School of Computing University of Utah CS/ECE 6810: Computer Architecture Overview Announcement Jan. 17 th : Homework 1 release (due on Jan.
More informationWhy the fuss about measurements and precision?
Introduction In this tutorial you will learn the definitions, rules and techniques needed to record measurements in the laboratory to the proper precision (significant figures). You should also develop
More informationEE241 - Spring 2000 Advanced Digital Integrated Circuits. Announcements
EE241 - Spring 2 Advanced Digital Integrated Circuits Lecture 11 Low Power-Low Energy Circuit Design Announcements Homework #2 due Friday, 3/3 by 5pm Midterm project reports due in two weeks - 3/7 by 5pm
More informationB.H. Far
SENG 637 Dependability, Reliability & Testing of Software Systems Chapter 3: System Reliability Department of Electrical & Computer Engineering, University of Calgary B.H. Far (far@ucalgary.ca) http://www.enel.ucalgary.ca/people/far/lectures/seng637/
More informationSystem-Level Modeling and Microprocessor Reliability Analysis for Backend Wearout Mechanisms
System-Level Modeling and Microprocessor Reliability Analysis for Backend Wearout Mechanisms Chang-Chih Chen and Linda Milor School of Electrical and Comptuer Engineering, Georgia Institute of Technology,
More informationCHAPTER 3 ANALYSIS OF RELIABILITY AND PROBABILITY MEASURES
27 CHAPTER 3 ANALYSIS OF RELIABILITY AND PROBABILITY MEASURES 3.1 INTRODUCTION The express purpose of this research is to assimilate reliability and its associated probabilistic variables into the Unit
More informationDVClub Europe Formal fault analysis for ISO fault metrics on real world designs. Jörg Große Product Manager Functional Safety November 2016
DVClub Europe Formal fault analysis for ISO 26262 fault metrics on real world designs Jörg Große Product Manager Functional Safety November 2016 Page 1 11/27/2016 Introduction Functional Safety The objective
More informationIncremental Latin Hypercube Sampling
Incremental Latin Hypercube Sampling for Lifetime Stochastic Behavioral Modeling of Analog Circuits Yen-Lung Chen +, Wei Wu *, Chien-Nan Jimmy Liu + and Lei He * EE Dept., National Central University,
More informationFundamentals of Reliability Engineering and Applications
Fundamentals of Reliability Engineering and Applications E. A. Elsayed elsayed@rci.rutgers.edu Rutgers University Quality Control & Reliability Engineering (QCRE) IIE February 21, 2012 1 Outline Part 1.
More informationAvailability and Reliability Analysis for Dependent System with Load-Sharing and Degradation Facility
International Journal of Systems Science and Applied Mathematics 2018; 3(1): 10-15 http://www.sciencepublishinggroup.com/j/ijssam doi: 10.11648/j.ijssam.20180301.12 ISSN: 2575-5838 (Print); ISSN: 2575-5803
More informationDesign for Manufacturability and Power Estimation. Physical issues verification (DSM)
Design for Manufacturability and Power Estimation Lecture 25 Alessandra Nardi Thanks to Prof. Jan Rabaey and Prof. K. Keutzer Physical issues verification (DSM) Interconnects Signal Integrity P/G integrity
More informationSingle Stuck-At Fault Model Other Fault Models Redundancy and Untestable Faults Fault Equivalence and Fault Dominance Method of Boolean Difference
Single Stuck-At Fault Model Other Fault Models Redundancy and Untestable Faults Fault Equivalence and Fault Dominance Method of Boolean Difference Copyright 1998 Elizabeth M. Rudnick 1 Modeling the effects
More informationNon-observable failure progression
Non-observable failure progression 1 Age based maintenance policies We consider a situation where we are not able to observe failure progression, or where it is impractical to observe failure progression:
More informationIntroduction to Engineering Reliability
Introduction to Engineering Reliability Robert C. Patev North Atlantic Division Regional Technical Specialist (978) 318-8394 Topics Reliability Basic Principles of Reliability Analysis Non-Probabilistic
More informationLecture 2: Metrics to Evaluate Systems
Lecture 2: Metrics to Evaluate Systems Topics: Metrics: power, reliability, cost, benchmark suites, performance equation, summarizing performance with AM, GM, HM Sign up for the class mailing list! Video
More informationDesign of Reliable Processors Based on Unreliable Devices Séminaire COMELEC
Design of Reliable Processors Based on Unreliable Devices Séminaire COMELEC Lirida Alves de Barros Naviner Paris, 1 July 213 Outline Basics on reliability Technology Aspects Design for Reliability Conclusions
More informationFault Tolerant Computing CS 530 Fault Modeling. Yashwant K. Malaiya Colorado State University
CS 530 Fault Modeling Yashwant K. Malaiya Colorado State University 1 Objectives The number of potential defects in a unit under test is extremely large. A fault-model presumes that most of the defects
More informationMulti-State Availability Modeling in Practice
Multi-State Availability Modeling in Practice Kishor S. Trivedi, Dong Seong Kim, Xiaoyan Yin Depart ment of Electrical and Computer Engineering, Duke University, Durham, NC 27708 USA kst@ee.duke.edu, {dk76,
More informationQuality and Reliability Report
Quality and Reliability Report GaAs Schottky Diode Products 051-06444 rev C Marki Microwave Inc. 215 Vineyard Court, Morgan Hill, CA 95037 Phone: (408) 778-4200 / FAX: (408) 778-4300 Email: info@markimicrowave.com
More informationFault Modeling. Fault Modeling Outline
Fault Modeling Outline Single Stuck-t Fault Model Other Fault Models Redundancy and Untestable Faults Fault Equivalence and Fault Dominance Method of oolean Difference Copyright 1998 Elizabeth M. Rudnick
More informationOverview ECE 553: TESTING AND TESTABLE DESIGN OF. Memory Density. Test Time in Seconds (Memory Size n Bits) 10/28/2014
ECE 553: TESTING AND TESTABLE DESIGN OF DIGITAL SYSTES Memory testing Overview Motivation and introduction Functional model of a memory A simple minded test and its limitations Fault models March tests
More information6. STRUCTURAL SAFETY
6.1 RELIABILITY 6. STRUCTURAL SAFETY Igor Kokcharov Dependability is the ability of a structure to maintain its working parameters in given ranges for a stated period of time. Dependability is a collective
More informationQuantitative Reliability Analysis
Quantitative Reliability Analysis Moosung Jae May 4, 2015 System Reliability Analysis System reliability analysis is conducted in terms of probabilities The probabilities of events can be modelled as logical
More informationChapter 2 Reliability Metrology
Chapter 2 Reliability Metrology Abstract In this chapter, first, reliability is defined. Then, different ways of modeling reliability are discussed. Empirical models are based on field data and are easy
More informationECE 512 Digital System Testing and Design for Testability. Model Solutions for Assignment #3
ECE 512 Digital System Testing and Design for Testability Model Solutions for Assignment #3 14.1) In a fault-free instance of the circuit in Fig. 14.15, holding the input low for two clock cycles should
More informationMaintenance free operating period an alternative measure to MTBF and failure rate for specifying reliability?
Reliability Engineering and System Safety 64 (1999) 127 131 Technical note Maintenance free operating period an alternative measure to MTBF and failure rate for specifying reliability? U. Dinesh Kumar
More informationDC/DC converter Input 9-36 and Vdc Output up to 0.5A/3W
PKV 3000 I PKV 5000 I DC/DC converter Input 9-36 and 18-72 Vdc up to 0.5A/3W Key Features Industry standard DIL24 Wide input voltage range, 9 36V, 18 72V High efficiency 74 83% typical Low idling power
More informationAN Reliability of High Power Bipolar Devices Application Note AN September 2009 LN26862 Authors: Dinesh Chamund, Colin Rout
Reliability of High Power Bipolar Devices Application Note AN5948-2 September 2009 LN26862 Authors: Dinesh Chamund, Colin Rout INTRODUCTION We are often asked What is the MTBF or FIT rating of this diode
More informationStatistical Analysis of BTI in the Presence of Processinduced Voltage and Temperature Variations
Statistical Analysis of BTI in the Presence of Processinduced Voltage and Temperature Variations Farshad Firouzi, Saman Kiamehr, Mehdi. B. Tahoori INSTITUTE OF COMPUTER ENGINEERING (ITEC) CHAIR FOR DEPENDABLE
More informationExample: sending one bit of information across noisy channel. Effects of the noise: flip the bit with probability p.
Lecture 20 Page 1 Lecture 20 Quantum error correction Classical error correction Modern computers: failure rate is below one error in 10 17 operations Data transmission and storage (file transfers, cell
More informationFAULT TOLERANT DESIGN: AN INTRODUCTION
FAULT TOLERANT DESIGN: AN INTRODUCTION ELENA DUBROVA Department of Microelectronics and Information Technology Royal Institute of Technology Stockholm, Sweden Kluwer Academic Publishers Boston/Dordrecht/London
More information2.6 Complexity Theory for Map-Reduce. Star Joins 2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51
2.6. COMPLEXITY THEORY FOR MAP-REDUCE 51 Star Joins A common structure for data mining of commercial data is the star join. For example, a chain store like Walmart keeps a fact table whose tuples each
More informationWeibull models for software reliability: case studies
Weibull models for software reliability: case studies Pavel Praks a Tomáš Musil b Petr Zajac c Pierre-Etienne Labeau d mailto:pavel.praks@vsb.cz December 6, 2007 avšb - Technical University of Ostrava,
More informationRadiation Effects in Nano Inverter Gate
Nanoscience and Nanotechnology 2012, 2(6): 159-163 DOI: 10.5923/j.nn.20120206.02 Radiation Effects in Nano Inverter Gate Nooshin Mahdavi Sama Technical and Vocational Training College, Islamic Azad University,
More information