Risk Matrix vs Risk Matrix Reloaded
B. McLaughlin, ScD, MInstP
Regional Managing Director
ReliabilitySolutions@charter.net
www.lifetime-reliability.com
864.430.2695
Classical Risk Matrix
The classical Risk Matrix is not really a matrix in the mathematical sense. It is simply the grid on a log-log plot in which each rectangle carries a label or color indicating the level of risk associated with that rectangle. The X axis represents consequence (cost/failure) and the Y axis represents frequency (failures/time). The signature color pattern of the classical Risk Matrix, with straight lines of constant risk running diagonally downward from left to right at 45 degrees, is merely a consequence of the definition of risk.
Classical Risk Matrix (cont.)
(risk) = (consequence)(failure frequency), or (cost/time) = (cost/failure)(failures/time), or
$$r = cf$$
Taking the log of both sides yields
$$\log r = \log c + \log f$$
which is of the form $b = x + y$, where $x = \log c$, $y = \log f$ and $b = \log r$ = constant. This is the equation of a straight line having a slope of $-1$ and intercepts of $b = \log r$ on both axes. Applications of the Classical Risk Matrix are illustrated in subsequent slides.
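A minimal sketch of the algebra above, using base-10 logs so that each grid line is one decade: it computes $r = cf$ and generates points on a line of constant risk in $(\log c, \log f)$ coordinates, confirming the slope of $-1$. The function names and the $10,000/year example line are illustrative assumptions, not part of the original slides.

```python
import math


def risk(consequence: float, frequency: float) -> float:
    """Risk r = c * f, e.g. ($/failure) * (failures/yr) = $/yr."""
    return consequence * frequency


def constant_risk_line(r: float, log_c_values):
    """Points (log10 c, log10 f) on the line of constant risk r.

    From log r = log c + log f it follows that log f = log r - log c:
    a straight line of slope -1 with intercept log r on both axes.
    """
    b = math.log10(r)
    return [(x, b - x) for x in log_c_values]


# Illustrative $10,000/yr line for consequences of $100 to $1,000,000 per failure.
points = constant_risk_line(10_000, [2, 3, 4, 5, 6])
for x, y in points:
    c, f = 10 ** x, 10 ** y
    print(f"c = ${c:,.0f}/failure, f = {f:g} failures/yr, r = ${risk(c, f):,.0f}/yr")

(x0, y0), (x1, y1) = points[0], points[-1]
print("slope on log-log axes:", (y1 - y0) / (x1 - x0))
```

Every printed point carries the same risk, which is why the color bands of the classical Risk Matrix run along these diagonals.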
Classical Risk Matrix Applications
The classical Risk Matrix is a useful tool to:
(1) estimate the magnitude of a particular risk,
(2) identify the consequences of asset failure,
(3) match maintenance strategy to risk level,
(4) identify risks you will not carry,
(5) track the effectiveness of asset risk reduction steps,
(6) determine the relative impact of multiple choices.
Rating with the Risk Triangle
Frequency Factors (failure frequency):
Frequency Factor | Events per yr | Mean time between events (yr) | Description
7 | 1,000 | | Happens regularly
6 | 100 | |
5 | 10 | |
4 | 1 | 1 |
3 | 0.1 | 10 | Unlikely but possible events
2 | 0.01 | 100 |
1 | 0.001 | 1,000 |

Consequence Factors:
Consequence Factor | Operations/Maint Cost | Health & Safety | Environment/Quality/Reputation
1 | $100 | Trivial injury/impact | Negligible effect
2 | $1k | Minor injury/impact | Minor effect
3 | $10k | Moderate injury/impact | Moderate effect
4 | $100k | Significant injury/impact | Significant effect
5 | $1M | Serious injury/impact | Serious effect
6 | $10M | Single fatality | Extreme event
7 | $100M | Multiple fatalities | Major disaster

Add the Consequence Factor to the Frequency Factor to get the Risk/Criticality Factor. Fractional values can be used, but remember the log scale: e.g. a frequency factor of 4.5 = 3 events/yr, not 5.
Thanks to Howard Witt for the slide.
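A minimal sketch of the Risk Triangle arithmetic, assuming the slide's scales are exact decades anchored at frequency factor 4 = 1 event/yr and consequence factor 1 = $100 (Operations/Maint Cost column). The helper names are hypothetical; the sketch only reproduces the lookup-and-add rule and the "4.5 = about 3 events/yr" reminder.

```python
import math

# Assumed reading of the slide's log scales (helper names are hypothetical):
#   frequency factor 4 = 1 event/yr, each +1 step is x10 (7 = 1,000/yr, 1 = 0.001/yr);
#   consequence factor 1 = $100, each +1 step is x10 (7 = $100M).


def frequency_factor(events_per_year: float) -> float:
    return 4 + math.log10(events_per_year)


def consequence_factor(cost_per_event: float) -> float:
    return math.log10(cost_per_event) - 1


def events_from_factor(freq_factor: float) -> float:
    """Invert the frequency scale; fractional factors are allowed."""
    return 10 ** (freq_factor - 4)


def risk_factor(events_per_year: float, cost_per_event: float) -> float:
    """Risk/Criticality Factor = Frequency Factor + Consequence Factor."""
    return frequency_factor(events_per_year) + consequence_factor(cost_per_event)


print(round(events_from_factor(4.5), 2))  # about 3 events/yr, not 5
print(risk_factor(1, 1_000_000))          # $1M event once a year: 4 + 5
```

Under this reading the sum equals $3 + \log_{10}(cf)$, so equal Risk/Criticality Factors correspond to equal annual cost risk: the same diagonal structure as the classical Risk Matrix.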
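Identify Consequences of Asset Failure on a Calibrated Risk Matrix
Rows: likelihood/frequency of equipment failure event per year. Columns: DAFT cost per event.
DAFT cost per event by column: C1 = $30, C2 = $100, C3 = $300, C4 = $1,000, C5 = $3,000, C6 = $10,000, C7 = $30,000, C8 = $100,000, C9 = $300,000, C10 = $1,000,000, C11 = $3,000,000, C12 = $10,000,000, C13 = $30,000,000, C14 = $100,000,000, C15 = $300,000,000, C16 = $1,000,000,000.
Risk level: E = Extreme (red), H = High (amber), M = Medium (yellow), L = Low (green), - = Accepted (blue).

Level | Count/yr | Time scale | Descriptor | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | C11 | C12 | C13 | C14 | C15 | C16
L13 | 100 | Twice per week | | H | H | H | E | E | E | E | E | E | E | E | E | E | E | E | E
L12 | 30 | Once per fortnight | | M | M | M | M | H | E | E | E | E | E | E | E | E | E | E | E
L11 | 10 | Once per month | Certain | L | L | L | L | M | H | E | E | E | E | E | E | E | E | E | E
L10 | 3 | Once per quarter | | - | - | - | - | L | M | H | E | E | E | E | E | E | E | E | E
L9 | 1 | Once per year | Almost Certain | - | - | - | - | - | L | M | H | E | E | E | E | E | E | E | E
L8 | 0.3 | Once every 3 years | Likely | - | - | - | - | - | - | L | M | H | E | E | E | E | E | E | E
L7 | 0.1 | Once per 10 years | Possible | - | - | - | - | - | - | - | L | M | H | E | E | E | E | E | E
L6 | 0.03 | Once per 30 years | Unlikely | - | - | - | - | - | - | - | - | L | M | H | E | E | E | E | E
L5 | 0.01 | Once per 100 years | Rare | - | - | - | - | - | - | - | - | - | L | M | H | E | E | E | E
L4 | 0.003 | Once every 300 years | | - | - | - | - | - | - | - | - | - | - | L | M | H | E | E | E
L3 | 0.001 | Once every 1,000 years | Very Rare | - | - | - | - | - | - | - | - | - | - | - | L | M | H | E | E
L2 | 0.0003 | Once every 3,000 years | | - | - | - | - | - | - | - | - | - | - | - | - | L | M | H | E
L1 | 0.0001 | Once every 10,000 years | Almost Incredible | - | - | - | - | - | - | - | - | - | - | - | - | - | L | M | H

Descriptor comments:
Almost Certain: event will occur on an annual basis.
Likely: event has occurred several times or more in a lifetime career.
Possible: event might occur once in a lifetime career.
Unlikely: event does occur somewhere from time to time.
Rare: heard of something like it occurring elsewhere.
Very Rare: never heard of this happening.
Almost Incredible: theoretically possible but not expected to occur.

Notes:
1) The risk boundary is adjustable and is selected to be at the 'LOW' level. Recalibrate the risk matrix to a company's risk boundaries by re-colouring the cells to suit.
2) Based on HB436:2004 Risk Management.
3) Identify 'Black Swan' events as B-S (a 'Black Swan' event is one that people say 'will not happen' because it has not yet happened).
Risk level colours: Red = Extreme, Amber = High, Yellow = Medium, Green = Low, Blue = Accepted.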
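The calibrated matrix steps by roughly a factor of 3 (half a decade) per row and per column, so its color bands follow anti-diagonals. The sketch below assumes exact $\sqrt{10}$ steps anchored at L9 = 1 event/yr and C1 = $30, and uses a hypothetical `band` rule fitted to rows L1 to L10; neither the rule nor the function names come from the slide.

```python
# Nominal scales, assuming each row and column is an exact half-decade (x sqrt(10))
# step; the slide rounds these to 1-3-10 values (e.g. L10 = 3/yr, C2 = $100).


def events_per_year(level: int) -> float:
    """Row L1..L13 -> nominal events per year, anchored at L9 = 1/yr."""
    return 10 ** ((level - 9) / 2)


def daft_cost(column: int) -> float:
    """Column C1..C16 -> nominal DAFT cost per event, anchored at C1 = $30."""
    return 30 * 10 ** ((column - 1) / 2)


def annual_risk(level: int, column: int) -> float:
    """Nominal DAFT risk of cell (L, C) in $/yr."""
    return events_per_year(level) * daft_cost(column)


def band(level: int, column: int) -> str:
    """Hypothetical rule that reproduces rows L1-L10 of the slide.

    Cells with the same L + C lie on one anti-diagonal and share the same
    nominal annual risk; each step along L + C multiplies that risk by sqrt(10).
    """
    k = level + column
    if k <= 14:
        return "Accepted"
    return {15: "Low", 16: "Medium", 17: "High"}.get(k, "Extreme")


print(band(10, 5), band(9, 6), band(1, 14))  # the 'Low' cell in rows L10, L9, L1
print(round(annual_risk(9, 6)))              # about $9,500/yr on the 'Low' diagonal
```

Rows L11 to L13 deviate from this pure diagonal rule on the slide (for example, L11 marks C1 to C4 as Low rather than Accepted, and L13 marks C4 as Extreme), so treat the function as an approximation of the slide rather than a replacement for it. The nominal 'Low' diagonal, about $9,500/yr, sits close to the $10,000/year 'LOW' boundary quoted on the EWW tracking slide.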
Maintenance Strategies Matched to Risk Levels
Frequency vs Consequence | 1 Insignificant | 2 Minor | 3 Moderate | 4 Major | 5 Catastrophic
6 Certain | PM / Precision | CM / Precision | Precision / Design-out | Design-out | Design-out
5 Likely | PM / Precision | CM / Precision | Precision / Design-out | Precision / Design-out | Design-out
4 Possible | PM / Precision | PM / Precision | CM / Precision | Precision / Design-out | Precision / Design-out
3 Unlikely | BD | PM / Precision | CM / Precision | CM / Precision | Precision / Design-out
2 Rare | BD | PM / Precision | PM / Precision | CM / Precision | CM / Precision
1 Very Rare | BD | PM / Precision | PM / Precision | CM / Precision | CM / Precision
(BD = breakdown maintenance, PM = preventive maintenance, CM = condition monitoring)
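The strategy table is a two-key lookup, so it transcribes directly into a dictionary. This is a minimal sketch; the dictionary layout and the `strategy` helper are hypothetical conveniences, and the cell contents are copied from the slide.

```python
# Rows: frequency rank 6 (Certain) .. 1 (Very Rare).
# Columns: consequence rank 1 (Insignificant) .. 5 (Catastrophic).
STRATEGY = {
    6: ["PM / Precision", "CM / Precision", "Precision / Design-out", "Design-out", "Design-out"],
    5: ["PM / Precision", "CM / Precision", "Precision / Design-out", "Precision / Design-out", "Design-out"],
    4: ["PM / Precision", "PM / Precision", "CM / Precision", "Precision / Design-out", "Precision / Design-out"],
    3: ["BD", "PM / Precision", "CM / Precision", "CM / Precision", "Precision / Design-out"],
    2: ["BD", "PM / Precision", "PM / Precision", "CM / Precision", "CM / Precision"],
    1: ["BD", "PM / Precision", "PM / Precision", "CM / Precision", "CM / Precision"],
}
FREQUENCY = {"Certain": 6, "Likely": 5, "Possible": 4, "Unlikely": 3, "Rare": 2, "Very Rare": 1}
CONSEQUENCE = {"Insignificant": 1, "Minor": 2, "Moderate": 3, "Major": 4, "Catastrophic": 5}


def strategy(frequency: str, consequence: str) -> str:
    """Return the maintenance strategy matched to a frequency/consequence cell."""
    return STRATEGY[FREQUENCY[frequency]][CONSEQUENCE[consequence] - 1]


print(strategy("Possible", "Major"))       # Precision / Design-out
print(strategy("Rare", "Insignificant"))   # BD
```

Keeping the table as data means a company can recalibrate the cells without touching the lookup logic.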
Identify What Risks You WILL NOT Carry
[Figure labels: Reduce Consequence; Reduce Chance]
This table is the basic approach to identifying the extent of risk. Full mathematical modelling is also possible, but this basic method is a fine start. The layout is universal: change the consequence descriptions to those you are willing to accept, and the costs to the DAFT costs you are willing to pay.
EWW Solution: Tracking Risk Matrix Used to Prove Asset Operating Risk Reduction
[Figure: tracking risk matrix. Rows: likelihood of equipment failure event per year, from 100 (twice per week) down to 0.0001 (once every 10,000 years), with descriptors Certain, Almost Certain, Likely, Possible, Unlikely, Rare, Very Rare and Almost Incredible. Columns: DAFT cost per event, from $30 to $1,000,000,000. Cells carry numeric risk ratings in steps of 0.5. Risk-reduction actions plotted on the matrix: CM oil condition analysis, CM cable thermographs, PM oil filtration, PM oil change, PM oil leaks from TX, PM water ingress paths, PM oil breather contamination, PM cable connections.]
Notes:
1) The risk boundary 'LOW' level is set at a total of $10,000/year.
2) Based on HB436:2004 Risk Management.
3) Identify 'Black Swan' events as B-S (a 'Black Swan' event is one that people say 'will not happen' because it has not yet happened).
4) DAFT Cost (Defect and Failure True Cost) is the total business-wide cost from the event.
Risk level colours: Red = Extreme, Amber = High, Yellow = Medium, Green = Low, Blue = Accepted.
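Note 1 sets the 'LOW' boundary as a total of $10,000/year, so tracking reduces to summing (events/yr) x (DAFT cost/event) over all failure events and comparing the total before and after the PM/CM actions. A minimal sketch; the class, function names and every number in the example are hypothetical and only illustrate the bookkeeping.

```python
from dataclasses import dataclass

LOW_BOUNDARY = 10_000  # $/yr: the slide's 'LOW' risk boundary (a total over all events)


@dataclass
class FailureEvent:
    name: str
    events_per_year: float
    daft_cost: float  # DAFT (Defect and Failure True Cost) per event, $

    @property
    def annual_risk(self) -> float:
        return self.events_per_year * self.daft_cost


def total_annual_risk(events) -> float:
    return sum(e.annual_risk for e in events)


def within_low_boundary(events) -> bool:
    return total_annual_risk(events) <= LOW_BOUNDARY


# Hypothetical before/after figures, purely to illustrate the bookkeeping.
before = [FailureEvent("item A", 0.1, 300_000), FailureEvent("item B", 0.3, 30_000)]
after = [FailureEvent("item A", 0.01, 300_000), FailureEvent("item B", 0.03, 30_000)]
for label, events in (("before", before), ("after", after)):
    print(f"{label}: ${total_annual_risk(events):,.0f}/yr, "
          f"within LOW boundary: {within_low_boundary(events)}")
```

Moving an event down one row (about one third of its frequency) or left one column (about one third of its cost) cuts its annual risk by the same factor, which is what the tracking matrix makes visible.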
Use a Risk Matrix to Show Impact of Choices
Nothing is certain with risk; it changes unless it is controlled.
$$\text{Risk} = \text{Consequence} \times [\text{Opportunity} \times \text{Unreliability}]$$
Classical Risk Matrix in 3D
A three-dimensional version of the classical Risk Matrix is also possible by expressing failure frequency as the product of the opportunity rate $q$ (opportunities/time) and the unreliability $F$ (failures/opportunity). This means $f = qF$ and $r = cqF$, or (cost/time) = (cost/failure)(opportunities/time)(failures/opportunity). It follows that
$$\log r = \log c + \log q + \log F$$
which is of the form $b = x + y + z$, where $b = \log r$ = constant. This is the equation of a plane having intercepts of $b = \log r$ on all three axes. Instead of lines of constant risk, the 3D risk matrix would have planes of constant risk. However, $0 \le F \le 1$ by definition, because unreliability is a probability. Therefore the $\log F$ axis is constrained to non-positive values and cannot have a positive intercept. One way to avoid this problem would be to select units of cost and time that constrain $r$, $c$, and $q$ to the interval [0, 1].
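A minimal sketch of the 3D decomposition $r = cqF$ and the intercept problem described above. The function names and example values are hypothetical; the sketch only shows that a positive $\log r$ would require $F > 1$ on the $\log F$ axis, while rescaled units with $r \le 1$ keep the intercept at or below zero.

```python
import math


def risk_3d(c: float, q: float, F: float) -> float:
    """r = c * q * F, i.e. (cost/failure)(opportunities/time)(failures/opportunity)."""
    if not 0.0 <= F <= 1.0:
        raise ValueError("unreliability F is a probability and must lie in [0, 1]")
    return c * q * F


def log_f_intercept(r: float) -> float:
    """Where the constant-risk plane log r = log c + log q + log F meets the
    log F axis (log c = log q = 0): at log F = log r."""
    return math.log10(r)


# r = 10,000 in raw units: intercept at log F = +4, i.e. F = 10^4 > 1,
# which no probability can reach.
print(log_f_intercept(10_000))
# Rescaled so that r = 0.01: intercept at log F = -2, i.e. F = 0.01, a valid probability.
print(log_f_intercept(0.01))
```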
Risk Matrix Reloaded
The Matrix Management of Risk Abatement utilizes a new kind of Risk Matrix called the Project Management Progress Matrix (PMPM) or Risk Matrix Reloaded.
Practitioners of process improvement, asset management, quality control, risk abatement and reliability optimization frequently combine multiple analytical procedures because engineers and managers have not embraced a single procedure that covers all issues. The most prominent example is the combination of Lean and Six Sigma. Several other analytical procedures (TPM, P-M Analysis, TOC, TQM, ACE 3T QMS, EWW) and a variety of maintenance methodologies are also combined in various ways. But how can different analytical procedures be systematically integrated to achieve balanced progress toward a common goal? One way is to use a portion of the Enterprise Wellness Way methodology as a vehicle for matrix management.
The matrix integration process starts with the overall Value Stream Map and drills down, with increasing magnification, through successive levels (map, individual map activities, resources within each activity and components within each resource) for a coordinated four-level attack on risk. Some enterprises may require more than four levels of drill-down, but the concept remains the same. Appropriate analytical procedures are selected for each level. This selection is generally unconstrained, and several procedures can be combined at a given level. However, the procedures should share a common objective, such as failure elimination rather than achieving acceptable failure rates. Otherwise, a disunity of purpose will confuse the integration process. Example procedures for each level are shown in Table 1:
Table 1. Example Analytical Procedures Appropriate for the Four Levels
Level | Analytical Procedure
Map (overall enterprise) | Lean Optimization
Activity (manufacturing step, medical activity step, service activity step) | P-M Analysis
Resource (tool, instrument, equipment, operator, work product) | ACE 3T QMS
Component (individual part in resource) | Optimal Reliability
Without integrated implementation, these analytical procedures may: (1) bottleneck at difficult internal stages, (2) lack coordinated risk abatement between levels, and (3) fail to balance their rates of progress. However, systematic integration may be achieved by conducting each procedure against the backdrop of IONICS, a six-step methodology that facilitates matrix management and is defined by: Identify risks, Order by importance, Numerate options, Introduce solutions, Control processes and Synthesize new ideas. The scope of each IONICS step is broadly defined by a custom logic tree created to extract risk abatement information from the analytical procedures. Integrated implementation of analytical procedures is achieved by intra-level and inter-level coordination as well as statistical control of the implementation process.
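Conceptual Basis
This section defines the basis for intra-level and inter-level coordination and statistical control of the implementation process. The cornerstone matrix is the 4 by 6 risk matrix
$$R = [r_{ij}], \quad i = 1, \dots, 4, \quad j = 1, \dots, 6$$
whose rows are the levels (1 map, 2 activity, 3 resource, 4 component) and whose columns are the IONICS steps (1 Identify, 2 Order, 3 Numerate, 4 Introduce, 5 Control, 6 Synthesize). For example, $r_{31}$ denotes the total number of Identified risks at the resource level, $r_{24}$ is the total number of Introduced solutions at the activity level, and $r_{15}$ is the total number of Controlled processes at the map level.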
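A minimal sketch of the risk matrix as a 4 by 6 NumPy array, indexed by level (rows) and IONICS step (columns). The row order follows Table 1 and is an assumption, as are the helper name `element` and all counts, which are hypothetical values chosen only for illustration.

```python
import numpy as np

# Assumed row order (follows Table 1) and the six IONICS steps as columns.
LEVELS = ["map", "activity", "resource", "component"]
STEPS = ["Identify", "Order", "Numerate", "Introduce", "Control", "Synthesize"]

# Hypothetical cumulative counts, for illustration only.
R = np.array([
    [12, 10, 9, 7, 5, 2],
    [30, 28, 25, 20, 15, 6],
    [45, 40, 36, 30, 22, 10],
    [60, 55, 50, 41, 30, 12],
])


def element(level: str, step: str) -> int:
    """Return r_ij: the count for one IONICS step at one level."""
    return int(R[LEVELS.index(level), STEPS.index(step)])


print(R.shape)                          # (4, 6)
print(element("resource", "Identify"))  # 45 identified risks at the resource level
```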
Intra-Level Coordination
Ideally, $r_{i1} = r_{i2} = \dots = r_{i6}$ for each level $i$, where $r_{ij}$ is the element of the risk matrix $R$ for level $i$ and IONICS step $j$. Equal values for all six elements in a given row are the target for intra-level coordination. However, accomplishment actually progresses from left to right along a row, so usually $r_{i1} \ge r_{i2} \ge \dots \ge r_{i6}$. In some organizations that depend on DMAIC and are rigidly constrained by a Change Control Board, a solution may not be introduced until months after the corresponding risk has been identified. This can put several months of defects into the product stream. If all six elements in every row had the same value, then all six columns would be identical and every 4 by 4 matrix formed from any four columns would have a determinant of zero. In fact, the determinant of a single 4 by 4 matrix would be zero if any two of its columns (e.g. Identify risks and Order by importance) were identical or directly proportional to each other. This suggests using various determinants constructed from the risk matrix as metrics of the degree of intra-level coordination.
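A minimal sketch of the determinant idea: it forms every 4 by 4 matrix from four of the six IONICS columns ($\binom{6}{4} = 15$ of them) and computes each determinant with `numpy.linalg.det`. The function name and the "ideal" example are hypothetical; the slides do not specify how the 15 determinants are combined into one metric or normalized for the size of the counts, so that choice is left open here.

```python
from itertools import combinations

import numpy as np


def column_subset_determinants(R: np.ndarray) -> dict:
    """Determinant of every 4 x 4 matrix built from four of the six IONICS columns.

    A determinant at or near zero flags columns that are (nearly) identical or
    proportional, i.e. steps that are keeping pace with each other.
    """
    return {cols: float(np.linalg.det(R[:, list(cols)]))
            for cols in combinations(range(R.shape[1]), 4)}


# Perfect intra-level coordination (hypothetical): every row holds one repeated value.
ideal = np.repeat(np.array([[10], [20], [30], [40]]), 6, axis=1)
dets = column_subset_determinants(ideal)
print(len(dets))                                  # C(6, 4) = 15 submatrices
print(max(abs(d) for d in dets.values()) < 1e-9)  # all (numerically) zero
```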
Inter-Level Coordination
At each step, risk abatement accomplishments should be coordinated between the $\binom{4}{2} = 6$ pairs of levels that can be formed from the four levels. Unfortunately, this may be overlooked when different engineers are dedicated to each level or are otherwise responsible for different portions of the overall process. The same basic reliability issues will frequently surface at multiple levels. For example, unreliable parts at the component level correspond to unreliable equipment at the resource level. Failure to identify such overlaps means that risks at some level are being overlooked.
Balanced Progress Rate
The overall risk abatement process could take from six to twelve months. During that time, each level should move forward at a rate proportional to the number of risks identified at that level. Progress rate correlation can be measured by determining Pearson's correlation coefficient for each of the $\binom{4}{2} = 6$ pairs of levels (rows) in the rate matrix $R/t$, formed by dividing each element of the risk matrix $R$ by the elapsed time $t$. These correlation coefficients will be the same as those between the levels of $R$ itself, since correlation is not affected by dividing every element by the same positive number.
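A minimal sketch of the pairwise level correlations using `numpy.corrcoef`, which by default treats each row as one variable. It also checks the claim above that dividing by the elapsed time leaves the coefficients unchanged. The row order, the helper name and all counts and times are hypothetical.

```python
from itertools import combinations

import numpy as np

LEVELS = ["map", "activity", "resource", "component"]  # assumed row order


def level_correlations(M: np.ndarray) -> dict:
    """Pearson's correlation coefficient for each of the C(4, 2) = 6 pairs of levels (rows)."""
    C = np.corrcoef(M)  # each row is treated as one variable
    return {(LEVELS[i], LEVELS[k]): float(C[i, k])
            for i, k in combinations(range(len(LEVELS)), 2)}


# Hypothetical cumulative counts R and elapsed time t (months), for illustration only.
R = np.array([
    [12, 10, 9, 7, 5, 2],
    [30, 28, 25, 20, 15, 6],
    [45, 40, 36, 30, 22, 10],
    [60, 55, 50, 41, 30, 12],
], dtype=float)
t = 4.5
rate = R / t

corr_counts = level_correlations(R)
corr_rates = level_correlations(rate)
for pair, value in corr_rates.items():
    print(pair, round(value, 3))
```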
Pearson's Correlation Coefficient
Pearson's correlation coefficient is a measure of the linear dependence between two variables. A value of -1 or +1 indicates a perfect correlation; an X,Y plot would be a straight line. A value of zero indicates no linear correlation. From the sampling theory of correlation, the statistic
$$t = \frac{c\sqrt{n-2}}{\sqrt{1-c^2}}$$
has a Student's t distribution with $n-2$ degrees of freedom, where $c$ is the correlation coefficient and $n = 6$ is the number of paired observations in the analysis (one per IONICS step). For 4 degrees of freedom, $t = 2.13$ at the 5% level of significance (one-tailed), and the corresponding value of $c$ is 0.729. Therefore, if the correlation coefficient exceeds 0.729, we can reject the hypothesis that the two levels are uncorrelated at the 5% significance level.
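The critical correlation follows from solving the t formula for $c$: $c = t / \sqrt{t^2 + n - 2}$. A minimal sketch that reproduces the slide's numbers; the function names are hypothetical, and the critical value 2.13 is taken from the slide rather than computed (a statistics library such as `scipy.stats.t.ppf(0.95, 4)` would supply it).

```python
import math


def t_statistic(c: float, n: int) -> float:
    """t = c * sqrt(n - 2) / sqrt(1 - c^2), Student's t with n - 2 degrees of freedom."""
    return c * math.sqrt(n - 2) / math.sqrt(1 - c * c)


def critical_c(t_crit: float, n: int) -> float:
    """Invert the formula above: the correlation whose t statistic equals t_crit."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)


print(round(critical_c(2.13, 6), 3))    # 0.729
print(round(t_statistic(0.729, 6), 2))  # 2.13
```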
Qualitative Assessments of Progress Rate
Another characteristic is the rank of the rate matrix. If the rows are linearly independent, the rank will be 4. If one row is a linear combination of the other three, the rank will be at most 3, and so forth. The correlation coefficients themselves can be collected in a symmetric 4 by 4 correlation matrix $C = [c_{ij}]$ with $c_{ij} = c_{ji}$; $c_{23}$, for example, is Pearson's correlation coefficient between level 2 and level 3 of the rate matrix. For a perfect correlation between all pairs of levels, every correlation coefficient would be 1, and the Frobenius norm of $C$ (the square root of the sum of the squares of its elements) would be $\sqrt{16} = 4$. The actual Frobenius norm can be compared with 4 for a qualitative estimate of balanced progress rate. Other methods, such as Singular Value Decomposition, are also available.
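A minimal sketch of the rank and Frobenius-norm checks with `numpy.linalg.matrix_rank`, `numpy.corrcoef` and `numpy.linalg.norm(..., "fro")`, plus the singular values from `numpy.linalg.svd`. The helper name and the "balanced" example, in which every level's row is proportional to one profile, are hypothetical.

```python
import numpy as np


def progress_rate_summary(rate: np.ndarray):
    """Return (rank of the rate matrix, 4 x 4 correlation matrix C, Frobenius norm of C)."""
    rank = int(np.linalg.matrix_rank(rate))
    C = np.corrcoef(rate)  # symmetric: c_ij == c_ji, ones on the diagonal
    return rank, C, float(np.linalg.norm(C, "fro"))


# Perfectly balanced progress (hypothetical): every level's row is proportional
# to the same profile, so all c_ij = 1 and the Frobenius norm is sqrt(16) = 4.
profile = np.array([10.0, 9.0, 8.0, 6.0, 4.0, 2.0])
balanced = np.outer([1.0, 2.0, 3.0, 4.0], profile)
rank, C, fro = progress_rate_summary(balanced)
print(rank, round(fro, 6))  # rank 1, norm 4

# Singular values give another view of the same structure.
print(np.linalg.svd(balanced, compute_uv=False))
```

Because the diagonal of $C$ is always 1, the norm cannot fall below 2. Squaring also means strong negative correlations raise the norm, so a value near 4 is worth confirming against the signs of the individual $c_{ij}$.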
Conclusions
Managers and engineers are unlikely to embrace only one or two analytical procedures for process improvement and risk abatement in the foreseeable future. This means a variety of procedures must be systematically integrated to achieve balanced progress toward a common goal. One such integration technique has been outlined in this presentation.