SESUG 2013 Paper SD-07

Maximizing Confidence and Coverage for a Nonparametric Upper Tolerance Limit for a Fixed Number of Samples

Dennis J. Beal, Science Applications International Corporation, Oak Ridge, Tennessee

ABSTRACT

A nonparametric upper tolerance limit (UTL) bounds a specified percentage of the population distribution with specified confidence. The most common UTL is based on the largest order statistic (the maximum), where the number of samples required for a given confidence and coverage is easily derived for an infinitely large population. This relationship can be used to determine the number of samples prior to sampling to achieve a given confidence and coverage. However, statisticians are often given a data set and asked to calculate a UTL for the number of samples provided. Since the number of samples usually cannot be increased to increase confidence or coverage for the UTL, the maximum confidence and coverage for the given number of samples is desired. This paper derives the maximum confidence and coverage for a fixed number of samples. This relationship is demonstrated both graphically and in tabular form. The maximum confidence and coverage are calculated for several sample sizes using results from the maximization. This paper is for intermediate SAS users of Base SAS who understand statistical intervals.

Key words: upper tolerance limit, order statistics, sample size, confidence, coverage, maximization

INTRODUCTION

A one-sided distribution-free (nonparametric) upper tolerance limit (UTL) is equivalent to a one-sided distribution-free confidence bound for a percentile of that population. No distributional assumptions are necessary such as normality, lognormality, gamma or any other continuous distribution. However, the nonparametric UTL does assume the data collected are randomly selected from an infinitely large population, are statistically independent samples, and are statistically representative of the population. UTLs have both a confidence and a coverage attribute.
The coverage of a UTL is the percentage p (0 < p < 1) of the population distribution that is bounded by the order statistic from the sample. The confidence of a UTL is how confident one is that the specified order statistic bounds the percentile of the population distribution and is denoted 100 × (1 − α)%, where α is the Type I error rate (0 < α < 1). A Type I error (α) is the probability of rejecting the null hypothesis when in fact the null hypothesis is true. Once the confidence, coverage and desired order statistic are specified, the minimum number of samples (n) necessary to achieve these parameters can be calculated (Beal 2012). The SAS code uses the SAS System for personal computers version 9.3 running on Windows 7.

THEORY OF ORDER STATISTICS

A one-sided nonparametric UTL assuming an infinitely large population that relates confidence (1 − α), coverage (p), and the number of samples (n) is shown in Equation (1) (Hahn and Meeker, 1991).

$$ p^{n} = \alpha \tag{1} $$

For a fixed sample size n, the objective function to maximize is the sum of confidence and coverage, as shown in Equation (2).

$$ f(\alpha) = 1 - \alpha + p \tag{2} $$

Substituting Equation (1) into Equation (2) yields Equation (3).

$$ f(\alpha) = 1 - \alpha + \alpha^{\frac{1}{n}} \tag{3} $$

To maximize Equation (3), we take the first derivative of f(α) and set it equal to 0, as shown in Equation (4).

$$ f'(\alpha) = -1 + \frac{1}{n}\,\alpha^{\frac{1}{n}-1} = 0 \tag{4} $$

Solving Equation (4) for α yields Equation (5) for n > 1.

$$ \alpha = n^{\frac{n}{1-n}} \tag{5} $$

The α from Equation (5) maximizes Equation (3), as shown by Equation (6) for n > 1.

$$ f''(\alpha) = \frac{1-n}{n^{2}}\,\alpha^{\frac{1-2n}{n}} < 0 \tag{6} $$

Therefore, the maximum confidence term from Equation (3) is shown in Equation (7).

$$ 1 - \alpha = 1 - n^{\frac{n}{1-n}} \tag{7} $$

The maximum coverage term from Equation (3) is shown in Equation (8).

$$ p = \alpha^{\frac{1}{n}} = n^{\frac{1}{1-n}} \tag{8} $$

GRAPHS OF THE OBJECTIVE FUNCTION

Figure 1 shows line plots of the function from Equation (3) to be maximized on the vertical axis with α on the horizontal axis. The function of confidence plus coverage is shown for selected numbers of samples (n) from 5 to 100. The top panel of Figure 1 shows the complete function for all α (0 < α < 1). The bottom panel shows the same curves, but magnifies the plot for only smaller values of α (0 < α < 0.20), since the functions are maximized within this range. The vertical lines show where the function is maximized for each n. Figure 1 shows that as n increases the optimal α decreases. As α decreases, confidence increases and coverage decreases for any n. Any combination of confidence and coverage along each line plot may be selected for each n. For example, for n = 10 one could choose 99% confidence (α = 0.01) with approximately 63% coverage. This would result in only 99% confidence + 63% coverage = 162% combined confidence and coverage. Selecting the optimal α = 0.0774 yields approximately 92.26% confidence and 77.43% coverage for a total of 169.7%. Table 1 shows the maximized confidence, maximized coverage and optimal α for various n using Equations (5), (7) and (8).
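The n = 10 comparison above can be checked by hand from Equations (3), (5), (7) and (8). The following is only a worked sketch, with values rounded to four decimal places; the starred symbols α* and p* are notation introduced here for the optimal values and do not appear in the original paper.

$$ \alpha^{*} = 10^{\frac{10}{1-10}} = 10^{-10/9} \approx 0.0774, \qquad 1 - \alpha^{*} \approx 0.9226 $$

$$ p^{*} = 10^{\frac{1}{1-10}} = 10^{-1/9} \approx 0.7743 $$

$$ f(\alpha^{*}) = 1 - \alpha^{*} + p^{*} \approx 1.697 $$

$$ f(0.01) = 1 - 0.01 + 0.01^{1/10} \approx 0.99 + 0.6310 = 1.621 $$

Dividing Equation (5) by Equation (8) gives α* = p*/n, which offers a quick consistency check: 0.7743 / 10 ≈ 0.0774.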

[Figure 1: two panels. Vertical axis: Confidence + Coverage (1.0 to 2.0). Horizontal axis: Alpha (top panel 0.0 to 1.0; bottom panel 0.00 to 0.20). One curve for each n = 5, 10, 20, 30, 50, 100.]

Figure 1. Line plots of confidence plus coverage objective function for n = 5, 10, 20, 30, 50, 100

Table 1. Optimal confidence and coverage for selected sample sizes

Sample     Optimal   Optimal          Optimal        Confidence
Size (n)   α         Confidence (%)   Coverage (%)   + Coverage (%)
  2        0.250     75.000           50.000         125.00
  3        0.192     80.755           57.735         138.49
  4        0.157     84.251           62.996         147.25
  5        0.134     86.625           66.874         153.50
  6        0.116     88.353           69.883         158.24
  7        0.103     89.671           72.302         161.97
  8        0.093     90.713           74.300         165.01
  9        0.084     91.557           75.984         167.54
 10        0.077     92.257           77.426         169.68
 11        0.072     92.847           78.679         171.53
 12        0.066     93.352           79.780         173.13
 13        0.062     93.788           80.755         174.54
 14        0.058     94.169           81.627         175.80
 15        0.055     94.506           82.413         176.92
 16        0.052     94.805           83.124         177.93
 17        0.049     95.072           83.772         178.84
 18        0.047     95.313           84.365         179.68
 19        0.045     95.531           84.910         180.44
 20        0.043     95.729           85.413         181.14
 21        0.041     95.911           85.879         181.79
 22        0.039     96.077           86.313         182.39
 23        0.038     96.230           86.717         182.95
 24        0.036     96.371           87.095         183.47
 25        0.035     96.502           87.449         183.95
 26        0.034     96.624           87.781         184.40
 27        0.033     96.737           88.094         184.83
 28        0.032     96.843           88.390         185.23
 29        0.031     96.942           88.669         185.61
 30        0.030     97.036           88.933         185.97
 40        0.023     97.726           90.975         188.70
 50        0.018     98.153           92.327         190.48
 75        0.013     98.742           94.332         193.07
100        0.010     99.045           95.455         194.50

SAS CODE FOR GENERATING GRAPHS

The SAS code that calculates the curves for the confidence plus coverage objective function is shown below.

    data a;
      do n = 5, 10, 20, 30, 50, 100;
        alpha_opt = n**(n/(1-n));        * equation 5;
        conf_opt = 1 - alpha_opt;        * equation 7;
        p_opt = n**(1/(1-n));            * equation 8;
        * evaluate the objective function on a grid of alpha values;
        do alpha = 0.001 to 0.999 by 0.001;
          f = 1 - alpha + alpha**(1/n);  * equation 3;
          output;
        end;
      end;
    run;
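As a complement to the plotting code above, the values in Table 1 could be regenerated with a short DATA step that applies Equations (5), (7) and (8) to each sample size. This is a minimal sketch, not the author's program: the data set name utl_opt, the variable total, and the chosen formats are assumptions made for illustration.

```sas
/* Sketch: tabulate Equations (5), (7) and (8) for the sample sizes in Table 1. */
/* The data set name (utl_opt) and variable names are illustrative assumptions. */
data utl_opt;
   do n = 2 to 30, 40, 50, 75, 100;
      alpha_opt = n**(n/(1-n));          /* Equation (5): optimal alpha             */
      conf_opt  = 100*(1 - alpha_opt);   /* Equation (7): optimal confidence (%)    */
      p_opt     = 100*n**(1/(1-n));      /* Equation (8): optimal coverage (%)      */
      total     = conf_opt + p_opt;      /* maximized confidence plus coverage (%)  */
      output;
   end;
   format alpha_opt 5.3 conf_opt p_opt 7.3 total 7.2;
run;

/* List the results one row per sample size */
proc print data=utl_opt noobs;
run;
```

For n = 2 the computed row is α = 0.250, confidence = 75.000%, coverage = 50.000% and total = 125.00%, matching the first row of Table 1.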

CONCLUSION

For a given data set with a fixed number of samples, the confidence and coverage can be selected for a nonparametric UTL on the maximum result, assuming an infinitely large population from which the representative samples are drawn. However, for small samples there is insufficient data to achieve both high confidence and high coverage. An increase in confidence will cause a decrease in coverage, while an increase in coverage will cause a decrease in confidence. This paper derives the equations to calculate the maximum confidence and coverage for any n > 1. This relationship is demonstrated both graphically and in tabular form for various values of n. These results allow the data analyst to obtain the maximum confidence and coverage for a nonparametric UTL from any data set.

REFERENCES

Beal, Dennis J. 2012. Sample Size Determination for a Nonparametric Upper Tolerance Limit for any Order Statistic, Proceedings of the 20th Annual Conference of the SouthEast SAS Users Group.

Hahn, G. and W. Meeker. 1991. Statistical Intervals: A Guide for Practitioners. 91-92. New York, New York: John Wiley & Sons, Inc.

CONTACT INFORMATION

The author welcomes and encourages any questions, corrections, feedback, and remarks. Contact the author at:

Dennis J. Beal, Ph.D.
Senior Statistician / Risk Scientist
Science Applications International Corporation
30 Laboratory Road
Oak Ridge, Tennessee 3783
phone: 86-48-8736
e-mail: beald@saic.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.