SESUG 2013 Paper SD-07

Maximizing Confidence and Coverage for a Nonparametric Upper Tolerance Limit for a Fixed Number of Samples

Dennis J. Beal, Science Applications International Corporation, Oak Ridge, Tennessee

ABSTRACT

A nonparametric upper tolerance limit (UTL) bounds a specified percentage of the population distribution with specified confidence. The most common UTL is based on the largest order statistic (the maximum), for which the number of samples required for a given confidence and coverage is easily derived for an infinitely large population. This relationship can be used to determine the number of samples prior to sampling to achieve a given confidence and coverage. However, statisticians are often given a data set and asked to calculate a UTL for the number of samples provided. Since the number of samples usually cannot be increased to raise the confidence or coverage of the UTL, the maximum confidence and coverage for the given number of samples is desired. This paper derives the maximum confidence and coverage for a fixed number of samples. This relationship is demonstrated both graphically and in tabular form. The maximum confidence and coverage are calculated for several sample sizes using results from the maximization. This paper is for intermediate SAS users of Base SAS who understand statistical intervals.

Key words: upper tolerance limit, order statistics, sample size, confidence, coverage, maximization

INTRODUCTION

A one-sided distribution-free (nonparametric) upper tolerance limit (UTL) is equivalent to a one-sided distribution-free confidence bound for a percentile of that population. No distributional assumptions such as normality, lognormality, gamma or any other continuous distribution are necessary. However, the nonparametric UTL does assume the data collected are randomly selected from an infinitely large population, are statistically independent samples, and are statistically representative of the population. UTLs have both a confidence and a coverage attribute.
The coverage of a UTL is the percentage p (0 < p < 1) of the population distribution that is bounded by the order statistic from the sample. The confidence of a UTL is how confident one is that the specified order statistic bounds the percentile of the population distribution and is denoted 100(1 - α)%, where α is the Type I error rate (0 < α < 1). The Type I error rate α is the probability of rejecting the null hypothesis when in fact the null hypothesis is true. Once the confidence, coverage and desired order statistic are specified, the minimum number of samples (n) necessary to achieve these parameters can be calculated (Beal 2012). The SAS code uses the SAS System for personal computers version 9.3 running on Windows 7.

THEORY OF ORDER STATISTICS

For a one-sided nonparametric UTL based on the maximum, assuming an infinitely large population, the relationship between confidence (1 - α), coverage (p), and the number of samples (n) is shown in Equation (1) (Hahn and Meeker, 1991). All n independent samples fall below the 100p-th percentile with probability p^n, so the maximum bounds that percentile with confidence 1 - p^n.

$$p^{n} = \alpha \qquad (1)$$

For a fixed sample size, the objective function to maximize is the sum of confidence and coverage, as shown in Equation (2).

$$f(\alpha) = 1 - \alpha + p \qquad (2)$$

Substituting Equation (1) into Equation (2) yields Equation (3).

$$f(\alpha) = 1 - \alpha + \alpha^{1/n} \qquad (3)$$

To maximize Equation (3), we take the first derivative of f(α) and set it equal to 0, as shown in Equation (4).
$$f'(\alpha) = -1 + \frac{1}{n}\,\alpha^{\frac{1-n}{n}} = 0 \qquad (4)$$

Solving Equation (4) for α yields Equation (5) for n > 1.

$$\alpha = n^{\frac{n}{1-n}} \qquad (5)$$

The α from Equation (5) maximizes Equation (3), as shown by the negative second derivative in Equation (6) for n > 1.

$$f''(\alpha) = \frac{1-n}{n^{2}}\,\alpha^{\frac{1-2n}{n}} < 0 \qquad (6)$$

Therefore, the maximum confidence term from Equation (3) is shown in Equation (7).

$$1 - \alpha = 1 - n^{\frac{n}{1-n}} \qquad (7)$$

The maximum coverage term from Equation (3) is shown in Equation (8).

$$p = \alpha^{1/n} = n^{\frac{1}{1-n}} \qquad (8)$$

GRAPHS OF THE OBJECTIVE FUNCTION

Figure 1 shows line plots of the function from Equation (3) to be maximized on the vertical axis with α on the horizontal axis. The function of confidence plus coverage is shown for selected numbers of samples (n) from 5 to 100. The top panel of Figure 1 shows the complete function for all α (0 < α < 1). The bottom panel shows the same curves, but magnifies the plot for only smaller values of α (0 < α < 0.20) since the functions are maximized within this range. The vertical lines show where the function is maximized for each n. Figure 1 shows that as n increases the optimal α decreases. As α decreases, confidence increases and coverage decreases for any n. Any combination of confidence and coverage along each line plot may be selected for each n. For example, for n = 10 one could choose 99% confidence (α = 0.01) with approximately 63% coverage. This would result in only 99% confidence + 63% coverage = 162% combined confidence and coverage. Selecting the optimal α = 0.0774 yields approximately 92.26% confidence and 77.43% coverage for a total of 169.7%. Table 1 shows the maximized confidence, maximized coverage and optimal α for various n using Equations (5), (7) and (8).
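The n = 10 comparison in the paragraph above can be checked by substituting directly into the relationship between α, p and n and into the closed-form optimum. The following is a worked sketch using only the paper's own formulas; intermediate values are rounded to four decimal places, so the last digit may differ slightly from the figures quoted in the text.

$$
\begin{aligned}
\text{Chosen } \alpha = 0.01: \quad & p = \alpha^{1/n} = 0.01^{1/10} \approx 0.6310, \\
& f(0.01) = 1 - 0.01 + 0.6310 \approx 1.621 \\[4pt]
\text{Optimal } \alpha = 10^{\frac{10}{1-10}} = 10^{-10/9} \approx 0.0774: \quad & p = 10^{\frac{1}{1-10}} = 10^{-1/9} \approx 0.7743, \\
& f(0.0774) \approx 0.9226 + 0.7743 \approx 1.697
\end{aligned}
$$

Moving from α = 0.01 to the optimal α gives up about 6.7 percentage points of confidence and gains about 14.3 percentage points of coverage, a net increase of roughly 7.6 percentage points in the combined total.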
[Figure 1: two line-plot panels with Confidence + Coverage (1.0 to 2.0) on the vertical axis and Alpha on the horizontal axis. Top panel: 0 < α < 1. Bottom panel: 0 < α < 0.20. Each panel shows one curve for each of n = 5, 10, 20, 30, 50, 100.]

Figure 1. Line plots of confidence plus coverage objective function for n = 5, 10, 20, 30, 50, 100
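Table 1. Optimal confidence and coverage for selected sample sizes

| Sample Size (n) | Optimal α | Optimal Confidence (%) | Optimal Coverage (%) | Confidence + Coverage (%) |
|---|---|---|---|---|
| 2 | 0.250 | 75.000 | 50.000 | 125.00 |
| 3 | 0.192 | 80.755 | 57.735 | 138.49 |
| 4 | 0.157 | 84.251 | 62.996 | 147.25 |
| 5 | 0.134 | 86.625 | 66.874 | 153.50 |
| 6 | 0.116 | 88.353 | 69.883 | 158.24 |
| 7 | 0.103 | 89.671 | 72.302 | 161.97 |
| 8 | 0.093 | 90.713 | 74.300 | 165.01 |
| 9 | 0.084 | 91.557 | 75.984 | 167.54 |
| 10 | 0.077 | 92.257 | 77.426 | 169.68 |
| 11 | 0.072 | 92.847 | 78.679 | 171.53 |
| 12 | 0.066 | 93.352 | 79.780 | 173.13 |
| 13 | 0.062 | 93.788 | 80.755 | 174.54 |
| 14 | 0.058 | 94.169 | 81.627 | 175.80 |
| 15 | 0.055 | 94.506 | 82.413 | 176.92 |
| 16 | 0.052 | 94.805 | 83.124 | 177.93 |
| 17 | 0.049 | 95.072 | 83.772 | 178.84 |
| 18 | 0.047 | 95.313 | 84.365 | 179.68 |
| 19 | 0.045 | 95.531 | 84.910 | 180.44 |
| 20 | 0.043 | 95.729 | 85.413 | 181.14 |
| 21 | 0.041 | 95.911 | 85.879 | 181.79 |
| 22 | 0.039 | 96.077 | 86.313 | 182.39 |
| 23 | 0.038 | 96.230 | 86.717 | 182.95 |
| 24 | 0.036 | 96.371 | 87.095 | 183.47 |
| 25 | 0.035 | 96.502 | 87.449 | 183.95 |
| 26 | 0.034 | 96.624 | 87.781 | 184.40 |
| 27 | 0.033 | 96.737 | 88.094 | 184.83 |
| 28 | 0.032 | 96.843 | 88.390 | 185.23 |
| 29 | 0.031 | 96.942 | 88.669 | 185.61 |
| 30 | 0.030 | 97.036 | 88.933 | 185.97 |
| 40 | 0.023 | 97.726 | 90.975 | 188.70 |
| 50 | 0.018 | 98.153 | 92.327 | 190.48 |
| 75 | 0.013 | 98.742 | 94.332 | 193.07 |
| 100 | 0.010 | 99.045 | 95.455 | 194.50 |

SAS CODE FOR GENERATING GRAPHS

The SAS code that calculates the curves for the confidence plus coverage objective function is shown below.

    data a;
       do n = 5, 10, 20, 30, 50, 100;
          alpha_opt = n**(n/(1-n));              ** equation 5: optimal alpha;
          conf_opt = 1 - alpha_opt;              ** equation 7: confidence at the optimum;
          p_opt = n**(1/(1-n));                  ** equation 8: coverage at the optimum;
          do alpha = 0.001 to 0.999 by 0.001;
             f = 1 - alpha + alpha**(1/n);       ** equation 3: confidence plus coverage;
             output;                             ** one row per (n, alpha) pair;
          end;
       end;
    run;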
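The paper lists only the DATA step that builds the curves. As a hedged illustration, the sketch below shows one way the tabulated quantities and the two plot panels could be produced from the same formulas. The data set names `table1` and `curves`, the variable `total`, the formats, and the use of PROC PRINT and PROC SGPLOT are assumptions made for this example and are not taken from the paper. The vertical lines marking each optimum in the figure are not drawn here.

```sas
/* Sketch only: data set names (table1, curves) and the variable  */
/* total are illustrative and do not come from the paper.         */

/* Optimal alpha, confidence and coverage for each sample size n. */
data table1;
   do n = 2 to 30, 40, 50, 75, 100;
      alpha_opt = n**(n/(1-n));            /* optimal alpha              */
      conf_opt  = 100*(1 - alpha_opt);     /* confidence, in percent     */
      p_opt     = 100*n**(1/(1-n));        /* coverage, in percent       */
      total     = conf_opt + p_opt;        /* confidence plus coverage   */
      output;
   end;
run;

proc print data=table1 noobs label;
   var n alpha_opt conf_opt p_opt total;
   format alpha_opt 5.3 conf_opt p_opt 7.3 total 7.2;
   label n         = 'Sample Size (n)'
         alpha_opt = 'Optimal Alpha'
         conf_opt  = 'Optimal Confidence (%)'
         p_opt     = 'Optimal Coverage (%)'
         total     = 'Confidence + Coverage (%)';
run;

/* Objective function f = 1 - alpha + alpha**(1/n) on a grid of alpha. */
data curves;
   do n = 5, 10, 20, 30, 50, 100;
      do alpha = 0.001 to 0.999 by 0.001;
         f = 1 - alpha + alpha**(1/n);
         output;
      end;
   end;
run;

/* Full range of alpha: one line per n. */
proc sgplot data=curves;
   series x=alpha y=f / group=n;
   xaxis label='Alpha';
   yaxis label='Confidence + Coverage';
run;

/* Same curves restricted to small alpha, where the maxima occur. */
proc sgplot data=curves;
   where alpha <= 0.20;
   series x=alpha y=f / group=n;
   xaxis label='Alpha';
   yaxis label='Confidence + Coverage';
run;
```

Restricting the second plot with a WHERE statement, rather than adjusting axis options, keeps the zoomed panel limited to the plotted rows without relying on any axis-clipping behavior.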
CONCLUSION

For a given data set with a fixed number of samples, the confidence and coverage can be selected for a nonparametric UTL based on the maximum result, assuming an infinitely large population from which the representative samples are drawn. However, for small samples there are insufficient data to achieve both high confidence and high coverage. An increase in confidence will cause a decrease in coverage, while an increase in coverage will cause a decrease in confidence. This paper derives the equations to calculate the maximum confidence and coverage for any n > 1. This relationship is demonstrated both graphically and in tabular form for various values of n. These results allow the data analyst to obtain the maximum confidence and coverage for a nonparametric UTL from any data set.

REFERENCES

Beal, Dennis J. 2012. Sample Size Determination for a Nonparametric Upper Tolerance Limit for any Order Statistic, Proceedings of the 20th Annual Conference of the SouthEast SAS Users Group.

Hahn, G. and W. Meeker. 1991. Statistical Intervals: A Guide for Practitioners. 91-92. New York, New York: John Wiley & Sons, Inc.

CONTACT INFORMATION

The author welcomes and encourages any questions, corrections, feedback, and remarks. Contact the author at:

Dennis J. Beal, Ph.D.
Senior Statistician / Risk Scientist
Science Applications International Corporation
301 Laboratory Road
Oak Ridge, Tennessee 37831
phone: 865-481-8736
e-mail: beald@saic.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.