Section II: Assessing Chart Performance (Jim Benneyan)
Learning Objectives
Understand concepts of chart performance
Two types of errors
o Type 1: Call an in-control process out-of-control
o Type 2: Call an unstable process stable
Calculation and interpretation
o Probability of a point outside limits
o Operating Characteristic (OC) curves
o Time until detection
o Average run length (ARL) curves
What is the objective?
1. Detect true changes (good & bad), not miss them
2. Detect them fast, not slowly
3. Rarely make a wrong conclusion, not err a lot
Most important charts in SPC?
Testing over time
[Concept sketch: Tests 1, 2, and 3 plotted over time against a long-term median.]
Where, if any, are the true improvements? We are unlikely to detect improvement immediately:
1. Run length is a statistical fact; detection often takes a while
2. Intervention tuning and effect lags
Important process changes?
o Mean increases or decreases
o Percentage rate changes
o Count rate changes
o Variance changes
[Concept sketch.] Are small or large changes more important to detect faster?
Average Run Length
Control chart performance is a sensitivity vs. specificity concept (type 1 vs. type 2 error), true of any statistical method. The goal is faster detection and/or detection with higher probability.
[Concept sketch: in-control vs. out-of-control distributions (no difference vs. difference); a narrow, tall curve is better. Poor design vs. good design.]
ARL = Average run length = mean time until a signal of change
[Figure: OC curve comparison for the in-control VAP rate, n = 250. Vertical axis: probability of signal (0 to 1.0); horizontal axis: VAP rate (.0000 to .0050). Curves shown for rule 1, rule 2, and g-ma-min. Interpretation: which is better?]
Types of Errors

Decision from the data      True state: In control        True state: Out of control
Call it in control          Good                          Type II Error (beta risk)
Call it out of control      Type I Error (alpha risk)     Good
Example: Medical Records Audit We have been auditing records for a particular type of error (defect) for the past 24 months. We select 200 records at random each month and review them for the defect. The following control chart shows our observations
[Figure: proportion control chart for defect rate (p chart of errors), months 1 to 24. UCL = 0.06914, center line p-bar = 0.03188, LCL = 0.]
Type 1 Error - Concept
Probability of calling an in-control process out-of-control.
If the process is unchanged (rate = .03188, i.e., about 3.19 defects per 100 records), what is the probability a point falls outside the limits?
The probability of exceeding 3 sigma (for a normal distribution) is .00135, so
Probability outside either limit = P(X < LCL) + P(X > UCL) = .00135 + .00135 = .0027
(Note for later: 1/.0027 ≈ 370.37.) This seems like a pretty good balance.
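These two numbers can be checked with a short calculation (a sketch using only the standard library; the exact 3-sigma normal tail is 0.5 * erfc(3/sqrt(2))):

```python
import math

# One-sided tail probability beyond 3 sigma for a normal distribution:
# P(Z > 3) = 0.5 * erfc(3 / sqrt(2)) ≈ 0.00135
tail = 0.5 * math.erfc(3 / math.sqrt(2))

# Probability outside either limit (both tails)
alpha = 2 * tail            # ≈ 0.0027

# In-control ARL: average number of points until a false alarm
arl_in_control = 1 / alpha  # ≈ 370.4
```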
Type 2 Error - Concept
Probability of calling an out-of-control process in-control.
Suppose the process has changed (to what? It matters!). What is the probability a point falls outside the limits?
Call delta the amount of the change and p_a = p + delta the new defect rate. Suppose the defect rate increased to p_a = 0.05. Then
Probability outside either limit = P(X < LCL) + P(X > UCL) = 1 minus the P(X ≤ 13) values tabulated below
Calculation for p chart type 1
The upper 3-sigma control limit is calculated as follows:
UCL = p-bar + 3 * sqrt(p-bar * (1 − p-bar) / n)
If p-bar = 0.03188 and n = 200, then UCL = .06914.
Similarly, the UCL for the np chart (n = 200) is 13.82.
P(X > UCL) = P(X > 13.82) = P(X ≥ 14) = .0053
P(X < LCL) = P(X < 0) = 0
P(false alarm) = .0053 + 0 = .0053
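These limit and false-alarm numbers can be reproduced with exact binomial probabilities (a sketch; `p_bar` and `n` come from the audit example above):

```python
import math

p_bar, n = 0.03188, 200

# 3-sigma upper control limit for the p chart
ucl_p = p_bar + 3 * math.sqrt(p_bar * (1 - p_bar) / n)  # ≈ 0.06914

# Equivalent np-chart limit
ucl_np = n * ucl_p                                       # ≈ 13.8

def binom_pmf(k, n, p):
    """Exact binomial probability mass P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# False alarm probability: the first integer count above the limit is 14
false_alarm = sum(binom_pmf(k, n, p_bar) for k in range(14, n + 1))  # ≈ .0053
```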
What if the process has changed (type 2)? Much the same: calculate the probability a point falls outside the limits, but use the new defect rate (p_a = p + delta). In-control defect rate = 0.031875.

New defect rate   P(X ≤ 13)
0.032             0.9947
0.035             0.9887
0.040             0.9688
0.045             0.9308
0.050             0.8701
0.055             0.7866
0.060             0.6849
0.065             0.5731
0.070             0.4606
0.075             0.3557
0.080             0.2643
0.085             0.1892
0.090             0.1308
0.095             0.0874
0.100             0.0566
0.105             0.0355

[Figure: Operating Characteristic curve (OCC) plotting P(X ≤ 13) against the true p from 0.00 to 0.12, with the in-control defect rate and the out-of-control (OOC) defect rates marked. Is this good?]
Ideal OC Curve
[Figure: Operating Characteristic curve for n = 200 versus the ideal curve, which would detect every change perfectly and never false-alarm.]
Not possible! But steeper and taller is better.
Average Run Length (ARL) - a more practical meaning
Remember we are monitoring the process over time: more than just this one point - many points!
If the process shifts to a new level p_a, it may not be detected on the very next point (usually not!).
How many points (the run length) until the shift is detected?
The mean of this is the Average Run Length (ARL): mean time until detection - a key concept!
ARL Calculation
Let theta = P(point outside limits); this is 1 minus the value on the OC curve's vertical axis.
Then ARL = 1/theta.
It is easy to translate an OC curve into an ARL curve - just take the inverse: ARL = 1/(1 − vertical value).
Theory:
o Each plotted point is a Bernoulli trial with parameter theta
o RL = number of points (Bernoulli trials) until the first one outside the limits
o Therefore RL ~ geometric(theta)
o Expected value of a geometric random variable = 1/theta
Average Run Length
For the above example:

New defect rate   P(X ≤ UCL)     ARL
0.01              0.999999979    48140529
0.015             0.999997455    392992
0.02              0.999940032    16676
0.025             0.999426674    1744
0.03              0.996898361    322
0.032             0.9947         190
0.035             0.9887         88.15
0.040             0.9688         32.04
0.045             0.9308         14.44
0.050             0.8701         7.70
0.055             0.7866         4.69
0.060             0.6849         3.17
0.065             0.5731         2.34
0.070             0.4606         1.85
0.075             0.3557         1.55
0.080             0.2643         1.36
0.085             0.1892         1.23
0.090             0.1308         1.15
0.095             0.0874         1.10
0.100             0.0566         1.06
0.105             0.0355         1.04
0.110             0.0217         1.02

[Figure: Average run length vs. true p (% with defect), ARL axis 0 to 100. Annotations: OK to never detect? Some detection delays OK. Critical detection region.]
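The ARL column can be regenerated from the OC values with the geometric relationship ARL = 1/(1 − P(X ≤ UCL)) (a sketch using the same n = 200 and np-chart limit count of 13 as the example; there is no lower-limit signal since LCL = 0):

```python
import math

n, ucl_count = 200, 13  # from the np-chart example: signal when X >= 14

def binom_cdf(k, n, p):
    """Exact binomial P(X <= k)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def arl(p):
    """Average run length until a point exceeds the upper limit."""
    beta = binom_cdf(ucl_count, n, p)  # P(no signal on a single point)
    return 1 / (1 - beta)

for p in (0.035, 0.05, 0.07, 0.10):
    print(f"p = {p:.3f}  ARL = {arl(p):.2f}")
```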
Your turn
1. For your earlier control chart, suppose the process remains the same.
a. Compute the type 1 error:
b. Compute the ARL:

Your turn
2. Suppose the process defect rate increases by 25% of its current p-bar value. For this one value of p_a:
a. Compute the type 2 error:
b. Compute the ARL:
Section III: Improving Performance (Victoria Jordan)
Learning Objectives
o Sample size selection, simple rules
o Supplementary rules: pros and cons
o EWMA and CUSUM charts: advantages and disadvantages
Improving SPC Performance
How do we improve the power of the chart?
Options for improving performance
1. Add within-limit run rules
o Use caution: lots of bad advice out there
2. Increase sample size
o See the following guidelines and reference
3. Use a more powerful chart
o EWMA, CUSUM, others
4. Use rare-event charts
5. Other tricks, advanced topics
Option 1. Unnatural-variation run rules
1. Any point outside either control limit
2. 8 consecutive points on the same side of the center line (CL)
3. 4 of 5 consecutive points outside CL ± 1 SD (same side)
4. 13 consecutive values within CL ± 1 SD
5. 2 of 3 consecutive points outside CL ± 2 SD (same side)
6. 12 of 14 consecutive points on the same side of the CL
7. 8 consecutive points exhibiting either an increasing or decreasing trend
8. Cyclic or periodic behavior

These patterns are unlikely to occur by chance alone if the process has not improved (or worsened); for scale, the likelihood of 8 heads in a row in coin tosses is about .004. So if they occur, it implies the process has changed.
Note: slight variations exist in different publications. (Color key on the original slide: black = recommended; white = variation reduction; grey = inflates false signal rate, not recommended.)
Run Rule Comments
1. Often recommended.
2. User beware: adding rules also changes the Type I risk.
3. Assuming the run rules are independent of each other (and this is not the case): if each rule is set at the equivalent of 3 sigma (probability of a Type I error = 0.0013), then with 8 run rules in play the combined probability of a Type I error is 1 − (1 − alpha)^8 = 1 − (1 − 0.0013)^8 = 0.0104. This is about 2.3 sigma instead of 3 sigma.
With the total set of rules, the false alarm rate approaches ~5% (ARL near 20, not 370!!)
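The combined false-alarm arithmetic is easy to verify (a sketch; the per-rule alpha of 0.0013 is the value from the slide, and independence is the same simplifying assumption made above):

```python
alpha_per_rule = 0.0013
k_rules = 8

# Under the (incorrect but illustrative) independence assumption,
# the chance that at least one of the k rules fires on an in-control point:
alpha_combined = 1 - (1 - alpha_per_rule) ** k_rules  # ≈ 0.0104

# The in-control ARL shrinks accordingly
arl_combined = 1 / alpha_combined                      # ≈ 96, down from ~370
```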
Run Rule Impact on p charts: better sensitivity, but a higher false alarm rate. Buyer beware!
Option 2: Improving sample sizes
Different rules of thumb exist.
Rule 1:
o np and p charts: n ≥ 5/p-bar (p-bar = defect rate)
o c and u charts: n ≥ 5/u-bar (u-bar = average count rate)
o x-bar charts: (see paper)
Rule 1 gives a reasonably bell-shaped distribution for binomial and Poisson charts (CL ≥ 5).
Improving the Power (Ability to Detect a Process Change)
For attribute charts (np, p, c, u):
o Add run rules. But this will increase the Type I risk (false alarms).
o Increase sample size. A general rule for the normal approximation (Shewhart charts) to hold: np ≥ 5 for p charts, c ≥ 10 for u charts. You may need to increase from there.
[Figure: u chart of drug events per 1000 cases by month, Jan 2006 to Aug 2008 (some months have no data). UCL = 3.83708, CL = 0.96397, LCL = 0.]
Effect of Sample Size on Performance
Note that as the sample size increases, the OC curve moves closer to the ideal: larger samples are better.
Sample Size Calculation - Example If SCIP bundle compliance averages 50% and is being tracked weekly on a p-chart, what sample size is suggested for the chart? What if compliance improves to 95%? Will that be enough to ensure a powerful chart?
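Applying the n ≥ 5/p-bar rule of thumb to this example can be sketched as follows (my assumption, not stated on the slide: at 95% compliance the chart tracks the rarer outcome, the 5% noncompliance rate):

```python
import math

def min_sample_size(p_defect):
    """Rule of thumb n * p >= 5, i.e. n >= 5/p, where p is the defect rate."""
    return math.ceil(5 / p_defect)

# 50% compliance -> noncompliance (defect) rate 0.5
n_at_50 = min_sample_size(0.5)   # 10

# 95% compliance -> noncompliance rate 0.05 (assumption: chart the
# rarer outcome), requiring a much larger weekly sample
n_at_95 = min_sample_size(0.05)  # 100
```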
Rules 2 and 3
Rule 2: Avoid lots of zeros (choose n so that P(X = 0) < .05)
o p and np charts:
o c and u charts:
Rule 3: LCL > 0
o p and np charts:
o c and u charts:
Results for p charts
Results for u charts
Sample size look-up table: see the reference paper.
Improving the Power (Ability to Detect a Process Change)
For variables charts (x-bar & R; x-bar & s; X & MR):
o Add run rules. But this will increase the Type I risk (false alarms).
o Increase sample size.
o Use x-bar & R or x-bar & s instead of X & MR (not always possible in healthcare: must have rational subgroups).
o Increase the frequency of sampling (daily instead of weekly).
Similar Effect of Sample Size on X-bar charts
Option 3: More advanced SPC charts
Exponentially weighted moving average (EWMA) and cumulative sum (CUSUM) charts both combine past data, essentially forming large samples.
Advantages:
o More power for small shifts, where Shewhart charts are weak, since past data amplifies small signals
o Filter out noise to help see trends
Disadvantages:
o Less power for large shifts, since past data dilutes recent large signals
o Can smooth away important features along with the noise
What they look like: opioid abuse rates.
Healthcare Systems Engineering, www.coe.neu.edu/healthcare, Northeastern University, 2010
Comparison: Shewhart p chart (opioid abuse rates), EWMA p chart (same data), CUSUM p chart (same data; more later).
Benneyan J, Butler S, Villapiano A, Katz N, Duffy M, Budman S (2011), A Statistical Process Control Approach to Prescription Medication and Opioid Abuse Surveillance, Journal of Addiction Medicine, 5(2): 99-109.
[Figure: performance comparison of Shewhart and EWMA charts. Vertical axis: probability of detection (0 to 1); horizontal axis: shift in p (−0.9 to 0.9).]
EWMA charts how they are computed (see flip chart)
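The flip-chart computation is not reproduced here, but the standard EWMA recursion and its time-varying limits can be sketched as follows (lambda = 0.2 and the 3-sigma limit multiplier are common textbook defaults, not values from the slides):

```python
import math

def ewma(data, lam=0.2, z0=0.0):
    """EWMA statistic: z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    z, out = z0, []
    for x in data:
        z = lam * x + (1 - lam) * z
        out.append(z)
    return out

def ewma_limits(mu0, sigma, n_points, lam=0.2, L=3):
    """Time-varying L-sigma EWMA control limits around the target mu0."""
    lims = []
    for t in range(1, n_points + 1):
        w = sigma * math.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
        lims.append((mu0 - L * w, mu0 + L * w))
    return lims
```

For example, `ewma([1, 1, 1], lam=0.2, z0=0.0)` gives 0.2, 0.36, 0.488, showing how the statistic pulls gradually toward a sustained new level.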
Exercise: modify your earlier Shewhart p chart in Excel to be an EWMA p chart.
CUSUM charts - what they look like
[Figure: CUSUM chart where the mean shifted at point 50 from 100 to 105.]
CUSUM charts - concept
Useful for detecting small shifts in the mean. The idea is to look at the difference between the target and the actual observations. In a stable process, these differences should be randomly distributed about zero, so if we add them up as we go, the running sum should also fluctuate around zero. If the mean shifts, the sum will begin to grow in magnitude and signal a change.
The math is beyond our scope and time; see the references.
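A minimal tabular-CUSUM sketch, illustrated on the kind of shift shown above (mean 100 shifting to 105 at point 50, here with noise-free data for clarity; the allowance k and decision interval h are standard defaults, not values from the slides):

```python
def cusum(data, target, k, h):
    """Tabular CUSUM: accumulate deviations beyond the allowance k;
    signal when either one-sided sum exceeds the decision interval h.
    Returns (c_plus, c_minus, first signal index or None), 1-based."""
    c_plus = c_minus = 0.0
    signal = None
    for i, x in enumerate(data, start=1):
        c_plus = max(0.0, c_plus + (x - target - k))
        c_minus = max(0.0, c_minus + (target - x - k))
        if signal is None and (c_plus > h or c_minus > h):
            signal = i
    return c_plus, c_minus, signal

# Noise-free illustration: 50 points at 100, then a shift to 105.
data = [100.0] * 50 + [105.0] * 10
# Assume sigma = 2.5; use allowance k = 0.5*sigma and interval h = 4*sigma.
_, _, first_signal = cusum(data, target=100.0, k=1.25, h=10.0)
# Each post-shift point adds 105 - 100 - 1.25 = 3.75 to the upper sum,
# so it crosses h = 10 on the third post-shift point: index 53.
```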
Univariate SPC Recap
Section V: Measurement Systems (J.B. and V.J.)
Table Exercise
At your table, write on a piece of paper the information from the following slide. Do not compare with your neighbor (yet).
Day 1 - Count the number of F's:
"Finished files are the result of years of scientific study combined with the experience of years"
Day 2 - Count the number of F's:
"It is easy to miss the finer points in life. Folks are frequently guilty of falling into this trap"
Day 3 - Count the number of F's:
"The necessity of training hands for first-class farms in the fatherly handling of friendly farm livestock is foremost in the minds of farm owners. The forefathers of the farm owners trained the farm hands for the first-class farms in the fatherly handling of livestock."
Day 4 - Count the number of F's:
"The owners of the farms feel they should carry on with the family tradition of training farm hands in the fatherly handling of farm stock because they feel it is the basis of good future farming."
Sample 1 results
[Histogram: frequency of number of falls per period.] Suppose F = patient fall. Average = , St dev = . True count: 6.
Sample 2 results
[Histogram: frequency of number of falls per period.] Average = , St dev = . True count: 5.
Sample 3 results
[Histogram: frequency of number of falls per period.] Average = , St dev = . True count: 30.
Sample 4 results
[Histogram: frequency of number of falls per period.] Average = , St dev = . True count: 12.
Summary

Sample    n      x-bar    s^2       R       sigma-hat
1         n1 =   x1 =     s1^2 =    R1 =    sigma-hat_1 =
2         n2 =   x2 =     s2^2 =    R2 =    sigma-hat_2 =
3         n3 =   x3 =     s3^2 =    R3 =    sigma-hat_3 =
4         n4 =   x4 =     s4^2 =    R4 =    sigma-hat_4 =
Average          x-bar-bar =  s^2-bar =  R-bar =  sigma-hat-bar =
Some comments
o Some F's are easier to see or miss than others; human factors rule the day
o There is no process variation within an item (each passage has an exact count)
o So all within-item variation is 100% due to measurement system error
o True number each day = ? True process variation day-to-day = ?
A1c Example
Table Discussions
Discuss:
1. Where these concepts apply in your work environments
2. Strategies for reducing measurement error
3. Strategies for dealing with the existence of measurement error (if it can't be eliminated)
Assessing and Considering Measurement Error
When an instrument reads a blood sugar level, is it correct? Is the recorded hand hygiene compliance correct? When someone reviews a medical record and determines the information to be coded, is it correct? What do we mean by correct?
Measurement Error - Definitions
o True value: the value that IS correct (but is unknown)
o Resolution: the smallest increment that can be measured
o Bias: average distance between the observed value and the true value
o Accuracy: a measure of bias; is the instrument measuring close to the true value on average?
o Precision: variability in the measurement instrument
Two Types of Measurement Error
o Accuracy: bias
o Precision: variability
Example
Is the measurement system accurate or precise if the true value = 120.0 and repeated measurements of the same sample are:
1. 100.0, 140.5, 130.0, 109.5, ...
2. 140.2, 139.9, 140.1, 139.8, ...
3. 120.0, 120.1, 119.9, 120.0, ...
Quantifying Measurement Error
Observed Value = True Value + Measurement Error
Var(OV) = Var(TV + E), so sigma_O^2 = sigma_P^2 + sigma_E^2
How do we estimate sigma_E^2?
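The additivity of process and measurement-error variance can be checked with a quick simulation (a sketch; the sigma values and the mean of 50 are made up for illustration):

```python
import random
import statistics

random.seed(1)
sigma_p, sigma_e = 2.0, 1.0  # process and measurement-error SDs (illustrative)

# Observed value = true value + independent measurement error
observed = [random.gauss(50, sigma_p) + random.gauss(0, sigma_e)
            for _ in range(50_000)]

var_observed = statistics.variance(observed)
# Should be close to sigma_p**2 + sigma_e**2 = 5.0
```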
Two Components of Precision
o Repeatability (RPT): differences in measurements on one sample by the same measurer (technician, nurse, instrument) over and over
o Reproducibility (RPD): differences in measurements on one sample between measurers
Quantifying Measurement Error
sigma_E^2 = sigma_RPT^2 + sigma_RPD^2
Estimate of Measurement Error

Sample    n      x-bar    s^2       R       sigma-hat
1         n1 =   x1 =     s1^2 =    R1 =    sigma-hat_1 =
2         n2 =   x2 =     s2^2 =    R2 =    sigma-hat_2 =
3         n3 =   x3 =     s3^2 =    R3 =    sigma-hat_3 =
4         n4 =   x4 =     s4^2 =    R4 =    sigma-hat_4 =
Average          x-bar-bar =  s^2-bar =  R-bar =  sigma-hat-bar =
Four Types of Gage Studies
o Gage R&R (tabular form)
o Short term (x-bar, R chart)
o Long term (x-bar, s chart)
o ANOVA
Measurement Error
o Present in all processes
o Adds to the total variability measured
o Can often be quantified
o If large, it can lead to wrong decisions