Understanding and Using Availability

Similar documents
Understanding and Using Availability

START Selected Topics in Assurance

MODELING THE RELIABILITY OF C4ISR SYSTEMS HARDWARE/SOFTWARE COMPONENTS USING AN IMPROVED MARKOV MODEL

Availability and Maintainability. Piero Baraldi

Calculation of MTTF values with Markov Models for Safety Instrumented Systems

CHAPTER-II Control Charts for Fraction Nonconforming using m-of-m Runs Rules

Estimation of the large covariance matrix with two-step monotone missing data

Approximating min-max k-clustering

Estimation of component redundancy in optimal age maintenance

System Reliability Estimation and Confidence Regions from Subsystem and Full System Tests

Lower Confidence Bound for Process-Yield Index S pk with Autocorrelated Process Data

A MIXED CONTROL CHART ADAPTED TO THE TRUNCATED LIFE TEST BASED ON THE WEIBULL DISTRIBUTION

3-Unit System Comprising Two Types of Units with First Come First Served Repair Pattern Except When Both Types of Units are Waiting for Repair

Shadow Computing: An Energy-Aware Fault Tolerant Computing Model

4. Score normalization technical details We now discuss the technical details of the score normalization method.

Module 8. Lecture 3: Markov chain

MATHEMATICAL MODELLING OF THE WIRELESS COMMUNICATION NETWORK

Wolfgang POESSNECKER and Ulrich GROSS*

Developing A Deterioration Probabilistic Model for Rail Wear

Probabilistic Analysis of a Computer System with Inspection and Priority for Repair Activities of H/W over Replacement of S/W

Homework Solution 4 for APPM4/5560 Markov Processes

Outline. Markov Chains and Markov Models. Outline. Markov Chains. Markov Chains Definitions Huizhen Yu

Improved Capacity Bounds for the Binary Energy Harvesting Channel

Asymptotic Properties of the Markov Chain Model method of finding Markov chains Generators of..

A Comparative Study of Profit Analysis of Two Reliability Models on a 2-unit PLC System

8 STOCHASTIC PROCESSES

On split sample and randomized confidence intervals for binomial proportions

Department of Mathematics

Evaluating Circuit Reliability Under Probabilistic Gate-Level Fault Models

Proceedings of the 2017 Winter Simulation Conference W. K. V. Chan, A. D Ambrogio, G. Zacharewicz, N. Mustafee, G. Wainer, and E. Page, eds.

STABILITY ANALYSIS AND CONTROL OF STOCHASTIC DYNAMIC SYSTEMS USING POLYNOMIAL CHAOS. A Dissertation JAMES ROBERT FISHER

Distributed Rule-Based Inference in the Presence of Redundant Information

Using the Divergence Information Criterion for the Determination of the Order of an Autoregressive Process

An Investigation on the Numerical Ill-conditioning of Hybrid State Estimators

Characterizing the Behavior of a Probabilistic CMOS Switch Through Analytical Models and Its Verification Through Simulations

Discrete-time Geo/Geo/1 Queue with Negative Customers and Working Breakdowns

Optimal Design of Truss Structures Using a Neutrosophic Number Optimization Model under an Indeterminate Environment

SIMULATED ANNEALING AND JOINT MANUFACTURING BATCH-SIZING. Ruhul SARKER. Xin YAO

An Analysis of Reliable Classifiers through ROC Isometrics

arxiv: v1 [physics.data-an] 26 Oct 2012

Methods for detecting fatigue cracks in gears

Re-entry Protocols for Seismically Active Mines Using Statistical Analysis of Aftershock Sequences

A generalization of Amdahl's law and relative conditions of parallelism

A Model for Randomly Correlated Deposition

Analysis of Pressure Transient Response for an Injector under Hydraulic Stimulation at the Salak Geothermal Field, Indonesia

The Binomial Approach for Probability of Detection

Homogeneous and Inhomogeneous Model for Flow and Heat Transfer in Porous Materials as High Temperature Solar Air Receivers

Tests for Two Proportions in a Stratified Design (Cochran/Mantel-Haenszel Test)

arxiv: v1 [cond-mat.stat-mech] 1 Dec 2017

Information collection on a graph

Analysis of M/M/n/K Queue with Multiple Priorities

How to Estimate Expected Shortfall When Probabilities Are Known with Interval or Fuzzy Uncertainty

Modeling and Estimation of Full-Chip Leakage Current Considering Within-Die Correlation

Information collection on a graph

Parallel Quantum-inspired Genetic Algorithm for Combinatorial Optimization Problem

On the Relationship Between Packet Size and Router Performance for Heavy-Tailed Traffic 1

Radial Basis Function Networks: Algorithms

Model checking, verification of CTL. One must verify or expel... doubts, and convert them into the certainty of YES [Thomas Carlyle]

Actual exergy intake to perform the same task

Research Note REGRESSION ANALYSIS IN MARKOV CHAIN * A. Y. ALAMUTI AND M. R. MESHKANI **

Monopolist s mark-up and the elasticity of substitution

Convex Optimization methods for Computing Channel Capacity

Multi-Operation Multi-Machine Scheduling

A PEAK FACTOR FOR PREDICTING NON-GAUSSIAN PEAK RESULTANT RESPONSE OF WIND-EXCITED TALL BUILDINGS

Genetic Algorithms, Selection Schemes, and the Varying Eects of Noise. IlliGAL Report No November Department of General Engineering

PETER J. GRABNER AND ARNOLD KNOPFMACHER

ECE 534 Information Theory - Midterm 2

Deriving Indicator Direct and Cross Variograms from a Normal Scores Variogram Model (bigaus-full) David F. Machuca Mory and Clayton V.

Statics and dynamics: some elementary concepts

Combining Logistic Regression with Kriging for Mapping the Risk of Occurrence of Unexploded Ordnance (UXO)

Towards understanding the Lorenz curve using the Uniform distribution. Chris J. Stephens. Newcastle City Council, Newcastle upon Tyne, UK

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

1 Probability Spaces and Random Variables

Paper C Exact Volume Balance Versus Exact Mass Balance in Compositional Reservoir Simulation

Bayesian inference & Markov chain Monte Carlo. Note 1: Many slides for this lecture were kindly provided by Paul Lewis and Mark Holder

Analysis of some entrance probabilities for killed birth-death processes

Yang Y * and Jung I U.S. NRC Abstract

The Noise Power Ratio - Theory and ADC Testing

Session 5: Review of Classical Astrodynamics

ON GENERALIZED SARMANOV BIVARIATE DISTRIBUTIONS

Periodic scheduling 05/06/

A continuous review inventory model with the controllable production rate of the manufacturer

Brownian Motion and Random Prime Factorization

A Qualitative Event-based Approach to Multiple Fault Diagnosis in Continuous Systems using Structural Model Decomposition

Additive results for the generalized Drazin inverse in a Banach algebra

Bayesian Networks Practice

Modeling Residual-Geometric Flow Sampling

MULTIVARIATE SHEWHART QUALITY CONTROL FOR STANDARD DEVIATION

MODELING AND SIMULATION OF A SATELLITE PROPULSION SUBSYSTEM BY PHYSICAL AND SIGNAL FLOWS. Leonardo Leite Oliva. Marcelo Lopes de Oliveira e Souza

Chapter 7: Special Distributions

General Linear Model Introduction, Classes of Linear models and Estimation

Applied Mathematics and Computation

On Wald-Type Optimal Stopping for Brownian Motion

Sums of independent random variables

FORMAL DEFINITION OF TOLERANCING IN CAD AND METROLOGY

An Ant Colony Optimization Approach to the Probabilistic Traveling Salesman Problem

Evaluation of the critical wave groups method for calculating the probability of extreme ship responses in beam seas

arxiv:cond-mat/ v2 25 Sep 2002

Supplementary Materials for Robust Estimation of the False Discovery Rate

Finite-State Verification or Model Checking. Finite State Verification (FSV) or Model Checking

Transcription:

Understanding and Using Availability Jorge Luis Romeu, Ph.D. ASQ CQE/CRE, & Senior Member C. Stat Fellow, Royal Statistical Society Past Director, Region II (NY & PA) Director: Juarez Lincoln Marti Int l Ed. Project Email: romeu@cortland.edu Web: htt://www.linkedin.com/ub/jorge-luis-romeu/6/566/04/ Webinar: June 8, 07 J. L. Romeu - Consultant (c) 07

Webinar Take-Aways Understanding Availability from a ractical standoint Calculating different Availability ratings Practical and Economic ways of enhancing Availability J. L. Romeu - Consultant (c) 07

Summary Availability is a erformance measure concerned with assessing a maintained system or device, with resect to its ability to be used when needed. We overview how it is measured under its three different definitions, and via several methods (theoretical/ractical), using both statistical and Markov aroaches. We overview the cases where redundancy is used and where degradation is allowed. Finally, we discuss ways of imroving Availability and rovide numerical examles. J. L. Romeu - Consultant (c) 07 3

When to use Availability When system/device can fail and be reaired During maintenance, system is down After maintenance, system is again u Formal Definition: a measure of the degree to which an item is in an oerable state at any time. (Reliability Toolkit, RIAC) J. L. Romeu - Consultant (c) 07 4

System Availability A robabilistic concet based on: Two Random Variables X and Y X, System or device time between failures Y, Maintenance or reair time Long run averages of X and Y are: E(X) Mean time Between Failures (MTBF) E(Y) Exected Maintenance Time (MTTR) J. L. Romeu - Consultant (c) 07 5

Availability by Mission Tye Blanchard (Ref. ): availability may be exressed differently, deending on the system and its mission. There are three tyes of Availability: Inherent Achieved Oerational J. L. Romeu - Consultant (c) 07 6

Inherent Availability: A i * Probability that a system, when used under stated conditions, will oerate satisfactorily at any oint in time. * A i excludes reventive maintenance, logistics and administrative delays, etc. A i MTBF MTBF MTTR J. L. Romeu - Consultant (c) 0 7

Achieved Availability: A a * Probability that a system, when used under stated conditions, will oerate satisfactorily at any oint in time, when called uon. * A a includes other activities such as reventive maintenance, logistics, etc. J. L. Romeu - Consultant (c) 07 8

Oerational Availability: A o * Probability that a system, when used under stated conditions will oerate satisfactorily when called uon. * A o includes all factors that contribute to system downtime (now called Mean Down Time, MDT) for all reasons (maintenance actions and delays, access, diagnostics, active reair, suly delays, etc.). J. L. Romeu - Consultant (c) 07 9

A o Long Run average formula: Fav. Cases A o P( System. U) Tot. Cases U. Time E( X ) Cycle. Time E( X ) E( Y ) MTBF MTBF MDT J. L. Romeu - Consultant (c) 07 0

Numerical Examle Event SubEvent Time Inherent Achieved Oerational U Running 50 50 50 50 Down Wait-D 0 0 Down Diagnose 5 5 5 5 Down Wait-S 3 3 Down Wait-Adm Down Install 8 8 8 8 Down Wait-Adm 3 3 U Running 45 45 45 45 Down Preventive 7 7 7 U Running 5 5 5 5 UTime 47 47 47 47 Maintenance 3 0 38 Availability 0.988 0.880 0.7946 J. L. Romeu - Consultant (c) 07

Formal Definition of Availability Hoyland et al (Ref. ): availability at time t, denoted A(t), is the robability that the system is functioning (u and running) at time t. X(t): the state of a system at time t * u and running, [X(t) = ], * down and failed [X(t) = 0] A(t) can then be written: A(t) = P{X(t) = }; t > 0 J. L. Romeu - Consultant (c) 07

Availability as a R. V. A X X Y ; X, Y 0 The roblem of obtaining the density function of A resolved via variable transformation of the joint distribution Based on the two Random Variables X and Y time to failure X, and time to reair Y Exected and Variance of the Availability r.v. L 0 (0th Percentile of A) = P{A < 0.} = 0. First and Third Quartiles of Availability, etc. Theoretical results, aroximated by Monte Carlo J. L. Romeu - Consultant (c) 07 3

Monte Carlo Simulation Generate n = 5000 random Exonential failure and reair times: X i and Y i Obtain the corresonding Availabilities: A i = X i /( X i + Y i ); i 5000 Sort them, and calculate all the n = 5000 A i results, numerically Obtain the desired arameters from them. J. L. Romeu - Consultant (c) 07 4

Numerical Examle Use Beta distribution for exediency Ratio yielding A i is distributed Beta(μ ; μ ) Time to failure (X) mean: μ = 500 hours Time to reair (Y): μ = 30 hours Generate n = 5000 random Beta values with the above arameters μ and μ Obtain the MC Availabilities: A i J. L. Romeu - Consultant (c) 07 5

Frequency Histogram of Examle 400 300 00 00 0 0.89 0.90 0.9 0.9 0.93 0.94 0.95 0.96 0.97 0.98 Beta500-30 J. L. Romeu - Consultant (c) 07 6

Estimated Parameters of Examle MC Results for Beta(500,30) Examle: Average Availability = 0.9435 Variance of Availability = 9.9x0-5 Life L 0 = 0.9305 Quartiles: 0.9370 and 0.9505 P(A) > 0.9505 0.694 3673 P{ A 0.95} P{ A 0.95} 0.7346 5000 0.654 J. L. Romeu - Consultant (c) 07 7

Markov Model Aroach Two-state Markov Chain (Refs. 4, 5, 6, 7) Monitor status of system at time T: X(T) Denote State 0 (Down), and State (U) Let X(T) = 0: system S is down at time T Define the robability q that system S is U at time T (or, that S was Down at T) given that it was Down (or U) at time T-? J. L. Romeu - Consultant (c) 07 8

Markov Reresentation of S: 0 P{ X ( T ) X ( T ) 0} q 0 P{ X ( T ) 0 X ( T ) } 0 J. L. Romeu - Consultant (c) 07 9

Numerical Examle: System S is in state U; then moves to state Down in one ste, with Prob. 0 = = 0.00 A Geometric distribution with Mean μ=/ = 500 hours. System S is in state Down; then moves to state U in one-ste, with Probability 0 = q = 0.033 A Geometric distribution, with Mean μ=/q = 30 hours. Every ste (time eriod to transition) is an hour. The Geometric Distribution is the Discrete counterart of the Continuous Exonential J. L. Romeu - Consultant (c) 07 0

Transition Probability Matrix P States 0 States 0 0 ( q q) 0 (0.967 0.033) ( ) (0.00 0.998) Entries of Matrix P = ( ij ) corresond to the Markov Chain s one-ste transition robabilities. Rows reresent every system state that S can be in, at time T. Columns reresent every other state that S can go into, in one ste (i.e. where S will be, at time T+). J. L. Romeu - Consultant (c) 07

Obtain the robability of S moving from state U to Down, in Two Hours P q ( () 0 ( q) q) 0 q ( 00 q ) q q( 0 q q) ( q q q( ( ) q) ) ( ) q () oo () 0 () 0 () Hence, the robability S will go down in two hours is: () 0 ( q) ( ) 0.003 J. L. Romeu - Consultant (c) 07

J. L. Romeu - Consultant (c) 07 3 Other useful Markov results: If 0 () = 0.003 => () =- 0 () =A(T)=0.993 system Availability, after T= hours of oeration Prob. of moving from state to 0, in 0 stes: (P) 0 => 0 (0) = 0.07; includes that S could have gone Down or U, then restored again, several times. For sufficiently large n (long run) and two-states: ) /( ) /( ) /( ) /( } ) ( { q q q q q q q q q q q q q Limit P Limit n n n n Examle: U = q/(+q) = 0.943 ; Down = /(+q) = 0.057

Markov Model for redundant system A Redundant System is comosed of two identical devices, in arallel. The System is maintained and can function at a degraded level, with only one unit UP. The System has now three States: 0,, : State 0, the Down state; both units are DOWN State, the Degraded state; only one unit is UP State, the UP state; both units are oerating J. L. Romeu - Consultant (c) 07 4

J. L. Romeu - Consultant (c) 07 5 i j ii ij i T X i T X P T X T X P q T X T X P T X T X P q T X T X P } ) ( ) ( { } ) ( ) ( { } ) ( ) ( { } ) ( 0 ) ( { 0} ) ( ) ( { 0 0 Markov Model system reresentation 0

Oerational Conditions Every ste (hour) T is an indeendent trial Success Prob. ij corresonds to a transition from current state i into state j = 0,, Distribution of every change of state is the Geometric (Counterart of the Exonential) Mean time to accomlishing such change of state is: μ = / ij J. L. Romeu - Consultant (c) 07 6

Transition Probability Matrix P: States 0 States 0 P 0 00 0 0 0 0 0 q 0 q q 0 q As before, the robability of being in state j after n stes, given that we started in some state i of S, is obtained by raising matrix P to the ower n, and then looking at entry ij of the resulting matrix P n. J. L. Romeu - Consultant (c) 07 7

Numerical Examle Probability of either unit failing in the next hour is 0.00 Probability q of the reair crew comleting a maintenance job in the next hour is 0.033 Only one failure is allowed in each unit time eriod, and only one reair can be undertaken at any unit time J. L. Romeu - Consultant (c) 07 8

Probability that a degraded system (in State ) remains degraded after two hours of oeration: Sum robabilities corresonding to 3 events the system status has never changed. one unit reaired but another fails during nd hour remaining unit fails in the first hour (system goes down), but a reair is comleted in the nd hour J. L. Romeu - Consultant (c) 07 9

Numerical Examle: P [ P P] q ( 0.000.033 P 0.934 q) () q 0 ( 0.035) 0 0.000.033 The robability that a system, in degraded state, is still in degraded state after two hours, is: P = 0.934 J. L. Romeu - Consultant (c) 07 30

Mean time μ that the system S sends in the Degraded state System S can change to U or Down with robabilities and q, resectively S will remain in the state Degraded with robability - - q (i.e. no change) On average, S will send a sojourn of length / ( + q) = / 0.035 = 8.57 hrs in the Degraded state, before moving out. J. L. Romeu - Consultant (c) 07 3

Availability at time T A(T) = P{S is Available at T} System S is not Down at time T Then, S can be either U, or Degraded A(T) deends on the initial state of S Find Prob. S is Degraded Available at T given that S was Degraded (initially ) at T=0 ( T ) 0 A( T) ( T ) P{ X ( T) ( T ) X (0) ( T ) } ( T ) ( T ) 0 ( T ) ( T ) 0 ( T ) J. L. Romeu - Consultant (c) 07 3

State Occuancies Long run averages of system sojourns Asymtotic robabilities of system S being in each one of its ossible states at any time T Or the ercent time S sent in these states Irresective of the state S was in, initially. Results are obtained by considering Vector П of the long run robabilities: J. L. Romeu - Consultant (c) 07 33

Characteristics of Vector П Limit T (Prob{ X ( T) 0},Pr ob{ X ( T) },Pr ob{ X ( T) }) Vector П fulfills two imortant roerties: ( ) : P ;() ; with : Limit Pr ob{ X( T) i} i i T П P = П (Vector П times matrix P equals П) defines a system of linear equations, normalized by the nd roerty. P with :,, 3,, i 0 i 0 3 3 3 33 0 J. L. Romeu - Consultant (c) 07 34

Numerical Examle P 0.967 0.033,, 0.00 0.965 0.033,, 0 0 0.004 0.967 0 0.00 0 0.033 0 0.965 0.004 0.033 0.996 0 0.996 ; with : i 0 i 0 Solution of the system yields long run occuancy rates:, 0.0065,0.074,0.886 0, J. L. Romeu - Consultant (c) 07 35

Interretation of results: П = 0.886 indicates that system S is oerating at full caacity 88% of the time. П = 0.074 indicates that system S is oerating at Degraded caacity 0% of the time. П 0 : robability corresonding to State 0 (Down) is associated with S being Unavailable (= 0.0065) long run System Availability is given by: A = - П 0 = - 0.0065 = 0.9935 J. L. Romeu - Consultant (c) 07 36

Exected Times For System S to go Down, if initially S was U (denoted V ), or Degraded (V ) Or the average time System S sent in each of these states (, ) before going Down. Assume Down is an absorbing state one that, once entered, can never be left Solve a system of equations leading to all such ossible situations. J. L. Romeu - Consultant (c) 07 37

Numerical Examle: One ste, at minimum (initial visit), before system S goes Down. If S is not absorbed then, system S will move on to any of other, non-absorbing (U, Degraded) state with corresonding robability, and then the rocess restarts: V V V V V V 0.965V 0.004V 0.033V 0.996V Average times until system S goes down yield: V = 465 (if starting in state Degraded) and V = 4875 (if starting in state U). J. L. Romeu - Consultant (c) 07 38

Model Comarisons The initially non-maintained system version, would work an Exected hours in U state, before going Down (Ref. 7). The fact that maintenance is now ossible, and that S can oerate in a Degraded state: results in an increase of 3/ 3/ 0.004 / 750 0.033/ 0.00 hours in its Exected Time to go Down (from U). The imroved Exected Time is due to the Sum of the Two Exected times to failures: V = 3/λ + μ/λ = 750 + 45 = 4875 45 J. L. Romeu - Consultant (c) 07 39

Examle of Increasing Availability MTBF 85 A 0.85 85% MTBF MTTR 85 5 Assume MTTR is largely affected by delays: Waiting for a secialist mechanic Waiting for a secial sare art Assume Availability HAS to be at least 90%: We can hire additional secialists or mechanics We can increase the warehouse arts inventory Assume such would reduce MTTR to 8 units: Therefore, the New System Availability is: A = 85/(85+8) =85/93 = 0.94 ~ 9.4% J. L. Romeu - Consultant (c) 0 40

Conclusions Availability is the ratio of: U.Time to Cycle.Time Hence, we can enhance Availability by: Increasing the device or system Life (R) Decreasing/Imroving maintenance time Simultaneously, doing both above. Decreasing maintenance is usually: Easier and/or Cheaer. J. L. Romeu - Consultant (c) 07 4

Juarez Lincoln Marti Int l Ed. Project The JLM International Project develos rograms to suort Higher Education in Iberoamerica Its Web Page: htt://web.cortland.edu/matresearch/ A quick overview of the Project in PPT: htt://web.cortland.edu/romeu/juarezuscots09.df JLM Project Sonsors the Quality, Reliability and Industrial Statistics Institute Web Site: htt://ecs.syr.edu/faculty/romeu/qr&cii.htm This aer is available in the Internet: htt://src.alionscience.com/df/availstat.df

Bibliograhy. System Reliability Theory: Models and Statistical Methods. Hoyland, A. and M. Rausand. Wiley, NY. 994.. Logistics Engineering and Management. Blanchard, B. S. Prentice Hall NJ 998. 3. An Introduction to Stochastic Modeling. Taylor, H. and S. Karlin. Academic Press. NY. 993. 4. Methods for Statistical Analysis of Reliability and Life Data. Mann, N., R. Schafer and N. Singurwalla. John Wiley. NY. 974. 5. Alicability of Markov Analysis Methods to Availability, Maintainability and Safety. Fuqua, N. RAC START Volume 0, No.. htt://rac.alionscience.com/df/markov.df 6. Aendix C of the Oerational Availability Handbook (OPAH). Manary, J. RAC. 7. Understanding Series and Parallel Systems. Romeu, J. L. RAC START. Vol., No. 6. htt://src.alionscience.com/df/availstat.df J. L. Romeu - Consultant (c) 07 43

About the Author Jorge Luis Romeu is a Research Professor, Syracuse University (SU), where he teaches statistics, quality and oerations research courses. He worked as a Senior Engineer for the RIAC (Reliability Information Analysis Center). Romeu has 40 years alying statistical and oerations research methods to HW/SW reliability, quality and industrial engineering. Romeu retired Emeritus from SUNY, where he taught mathematics, statistics and comuters. He was a Fulbright Senior Secialist, at universities in Mexico (994, 000 and 003), Dominican Reublic (004), and Ecuador (006). He created and directs the Juarez-Lincoln-Marti Int l Ed. Project. Romeu is lead author of A Practical Guide to Statistical Analysis of Materials Proerty Data. He has develoed and teaches many workshos and training courses for racticing engineers and statistics faculty, and has ublished over forty articles on alied statistics and statistical education. He obtained the Saaty Award for the Best Alied Statistics Paer in American Journal of Mathematics and Management Sciences (AJMMS), in 997 and 007, and the MVEEC Award for Outstanding Professional Develoment in 00 & 0. Romeu holds a Ph.D. in Oerations Research, is a Chartered Statistician Fellow of the Royal Statistical Society, member of the American Statistical Association, and a Senior Member of the American Society for Quality. He holds ASQ certifications in Quality and Reliability, He is Past Regional Director of ASQ Region II and Vice-Chair of NEQC. For more information, visit his web site in htt://web.cortland.edu/romeu J. L. Romeu - Consultant (c) 07 44