Quality Control Using Inferential Statistics in Weibull-based Reliability Analyses


GRAPHITE TESTING FOR NUCLEAR APPLICATIONS: THE SIGNIFICANCE OF TEST SPECIMEN VOLUME AND GEOMETRY AND THE STATISTICAL SIGNIFICANCE OF TEST SPECIMEN POPULATION

STP 1578, 2014 / available online at / doi: /STP

Stephen F. Duffy¹ and Ankurben Parikh²

Reference

Duffy, Stephen F. and Parikh, Ankurben, Quality Control Using Inferential Statistics in Weibull-based Reliability Analyses, Graphite Testing for Nuclear Applications: The Significance of Test Specimen Volume and Geometry and the Statistical Significance of Test Specimen Population, STP 1578, Nassia Tzelepi and Mark Carroll, pp. 1-18, doi: / STP , ASTM International, West Conshohocken, PA

ABSTRACT

Design codes and fitness-for-service protocols have recognized the need to characterize the tensile strength of graphite as a random variable through the use of probability density functions. Characterizing probability density functions requires more tests than are typically needed to simply define an average value for tensile strength. ASTM and the needs of nuclear design codes should dovetail on this issue. The two-parameter Weibull distribution (an extreme-value distribution) is adopted for the tensile strength of this material. The failure data from bend tests or tensile tests are used to determine the Weibull modulus ($m$) and Weibull characteristic strength ($\sigma_\theta$). To determine an estimate of the true Weibull distribution parameters, maximum likelihood estimators are used. The quality of the estimated parameters relative to the true distribution parameters depends fundamentally on the number of samples taken to failure. The statistical concepts of confidence intervals and hypothesis testing are presented pertaining to their use in assessing the goodness of the estimated distribution parameters. The inferential statistics tools enable the calculation of likelihood confidence rings. The concept of how the true distribution parameters lie within a likelihood ring with a specified confidence is presented. A material acceptance criterion is defined here, and the criterion depends on establishing an acceptable probability of failure of the component under design, as well as an acceptable level of confidence associated with the estimated distribution parameter determined using failure data from a specific type of strength test.

Keywords

graphite, Weibull, confidence bounds, likelihood ratio rings

Manuscript received August 15, 2013; accepted for publication March 10, 2014; published online July 18.
¹ Ph.D., P.E., Cleveland State Univ., Cleveland, OH 44115, United States of America.
² Cleveland State Univ., Cleveland, OH 44115, United States of America.
³ ASTM Symposium on Graphite Testing for Nuclear Applications: The Significance of Test Specimen Volume and Geometry and the Statistical Significance of Test Specimen Population on Sept 19-20, 2013 in Seattle, WA.
Copyright © 2014 by ASTM International, 100 Barr Harbor Drive, PO Box C700, West Conshohocken, PA

Nomenclature

$H_0$ = null hypothesis
$H_1$ = alternative hypothesis
$L$ = likelihood function
$\mathcal{L}$ = natural log of the likelihood function
$m$ = Weibull modulus
$\tilde{m}$ = estimated Weibull modulus
$n$ = sample size
$P_f$ = probability of failure
$T$ = test statistic
$\alpha$ = significance level
$\beta$ = probability of a Type II error
$\gamma$ = confidence level
$\Theta_0$ = vector of all the maximum likelihood estimator parameter estimates
$\Theta_0^{c}$ = vector of all point estimates that are not maximum likelihood estimator parameter estimates
$\sigma_\theta$ = Weibull characteristic strength
$\tilde{\sigma}_\theta$ = estimated Weibull characteristic strength

Introduction

This work presents the mathematical concepts behind statistical tools that, when combined properly, lead to a simple quality control program for components fabricated from graphite. The data on mechanistic strength (which is treated as a random variable) should be used to accept or reject a graphite material for a given application. The two-parameter Weibull distribution is used to characterize the tensile strength of graphite. The Weibull distribution is an extreme-value distribution, and this facet makes it the preferred distribution for characterizing a material's minimum strength. Estimates of the true Weibull distribution parameters should be determined using maximum likelihood estimators (MLEs). The quality of the estimated parameters relative to the true distribution parameters depends fundamentally on the number of samples taken to failure. The statistical concepts of confidence intervals and hypothesis testing are employed to assess quality.
Quality is defined by how close the estimated parameters are to the true distribution parameters. The quality of the distribution parameters can have a direct effect on whether a certain grade or type of graphite material is acceptable for a given application. Both inferential statistics concepts (i.e., confidence intervals and hypothesis testing) enable the calculation of likelihood confidence rings. Work showing how the true distribution parameters lie within a likelihood ring with a specified confidence is presented here. The size of the ring has direct bearing on the quality of the estimated parameters. One must specify and associate an acceptable level of confidence with the estimated distribution parameters. This work shows how to construct likelihood ratio confidence rings that establish an acceptance region based on a given level of confidence. Material performance curves are presented that are based on an acceptable component probability of failure. Combining the two elements (i.e., the material performance curve and a likelihood ratio ring) allows the design engineer to determine whether a material is suited for the component design at hand. The result is a simple approach to a quality assurance criterion.

Point Estimates and Confidence Bounds

Data related to the tensile strength of graphite can be generated through the use of tensile tests outlined in ASTM C565 [1], ASTM C749 [2], ASTM C781 [3], and ASTM D7775 [4]. Bend tests are preferred for their simplicity, and flexural test procedures are outlined in ASTM C651 [5]. Given data on the tensile strength of graphite, the first step is to ascertain values for the Weibull distribution parameters using this information. MLEs, outlined in ASTM D7846 [6], are used to compute point estimates. The next question is, have we estimated parameters to the best of our ability? This is directly related to the fundamental question asked repeatedly, "How many samples must be tested?" The typical answer to this question seems to be "about 30." However, the appropriate question to ask is, "How many samples must be tested to establish a given confidence level for component reliability?" The work outlined here answers this question quantitatively, utilizing interval estimates along with hypothesis testing. The methods outlined here are currently being implemented in the ASME Boiler and Pressure Vessel Code [7].

Confidence intervals are used to indicate the potential variability in estimated parameters. Every time a sample is taken from the same population, a point estimate of the distribution parameters can be calculated. Successive samples produce different point estimate values, and thus the point estimates are treated as random variables. Interval estimates bracketing the true distribution parameters are therefore as necessary as point estimates. If the interval that brackets the true distribution parameter contains the estimated parameter, then the estimate is consistent with the true value. Increasing the sample size will always narrow the interval bounds and provide point estimates that approach the true distribution parameters. Interval bounds on parameter estimates represent the range of values for the distribution parameters that are both reasonable and plausible.
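As a rough illustration of the point-estimate step, the sketch below computes maximum likelihood estimates of the Weibull modulus and characteristic strength for a complete (uncensored) set of strengths by solving the standard Weibull likelihood equation with bisection. It is a minimal sketch under stated assumptions and not a substitute for the procedure in ASTM D7846, whose details are not reproduced here. The function name `weibull_mle`, the bisection solver, and the simulated data (modulus 17, characteristic strength 400 MPa) are choices made for this example only.

```python
import math
import random


def weibull_mle(strengths, iterations=200):
    """Return (m_hat, sigma_hat) for complete (uncensored) strength data.

    Hypothetical helper for illustration; assumes positive strengths with
    at least two distinct values.
    """
    n = len(strengths)
    x_max = max(strengths)
    if min(strengths) == x_max:
        raise ValueError("need at least two distinct strength values")
    # Work with ln(x / x_max) <= 0 so that exp(m * log) never overflows.
    logs = [math.log(x / x_max) for x in strengths]
    mean_log = sum(logs) / n

    def score(m):
        # Likelihood equation for m, written so that it increases with m
        # and crosses zero at the maximum likelihood estimate.
        weights = [math.exp(m * v) for v in logs]
        weighted = sum(w * v for w, v in zip(weights, logs)) / sum(weights)
        return weighted - 1.0 / m - mean_log

    low, high = 1e-6, 1.0
    while score(high) < 0.0:        # expand until the root is bracketed
        high *= 2.0
    for _ in range(iterations):     # plain bisection
        mid = 0.5 * (low + high)
        if score(mid) < 0.0:
            low = mid
        else:
            high = mid
    m_hat = 0.5 * (low + high)

    # Closed-form characteristic strength once m_hat is known.
    scale = sum(math.exp(m_hat * v) for v in logs) / n
    sigma_hat = x_max * scale ** (1.0 / m_hat)
    return m_hat, sigma_hat


if __name__ == "__main__":
    random.seed(0)
    for n in (10, 30, 100):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
        sample = [random.weibullvariate(400.0, 17.0) for _ in range(n)]
        m_hat, sigma_hat = weibull_mle(sample)
        print(n, round(m_hat, 2), round(sigma_hat, 1))
```

Running the loop for several sample sizes gives a feel for how much the point estimates move from sample to sample, which is the variability that interval estimates are meant to quantify.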

Inferences and Hypothesis Testing

As noted above, there is a need to know whether a sample is large enough that the point estimates of the distribution parameters are in the same statistical neighborhood as the true population distribution parameters. The techniques for making this kind of assessment utilize inferential statistics. The type of inference focused on here is the bounds on the true population parameters (which are never known) given a particular sample. Here a particular type of bound that is referred to as a likelihood ratio confidence bound is employed. The basic issue is this: consider an infinitely large population with a known frequency distribution but unknown distribution parameters. Because of diminished knowledge of the overall population available from a small sample taken from the infinitely large population, that sample will generate a frequency distribution that is different from that of the parent population, different in the sense that the distribution parameters estimated from the small sample (a subset) will not be the same as the parameters of the parent population. The population and the sample can be characterized by the same frequency distribution, but the two will have different distribution parameters. As the sample size increases, the frequency distribution associated with the sample more closely resembles that of the parent population (i.e., the estimated parameters approach the true distribution parameters of the parent population).

Hypothesis testing is used to establish whether the true distribution parameters lie close to the point estimates. Two statistical hypotheses are proposed concerning the estimated parameter values. The first, $H_1$, is referred to as the alternative hypothesis. The second, $H_0$, is referred to as the null hypothesis. Both hypotheses are then tested with the samples taken from the parent population. The goal of the analyst is to decide whether there is enough evidence (data) to refute the null hypothesis $H_0$. That decision is made based on the value of a test statistic. Here that test statistic is the ratio of two likelihood functions whose probability distribution is known under the assumption that $H_0$, the null hypothesis, is true. If the test statistic takes on a value rarely encountered using the data collected, then the test statistic indicates that the null hypothesis is unlikely and $H_0$ is rejected. The value of the test statistic at which the rejection is made defines a rejection region. The probability that the test statistic falls into the rejection region by chance is referred to as the significance level, denoted by $\alpha$. The significance level is defined as the probability of mistakenly rejecting a hypothesis when the hypothesis is valid.

Rejecting Hypotheses

Making a decision regarding a hypothesis is associated with a statistical event with an attending probability, so an ability to assess the probability of making incorrect decisions is required. Fisher [8] established a method for quantifying the amount of evidence required in order for an event to be deemed unlikely to occur by chance.

He originally defined this quantity as the significance level. Significance levels are different from confidence levels, but the two are related. The significance level and the confidence level are functionally related through the following simple expression, with the confidence level denoted by $\gamma$:

$$\gamma = 1 - \alpha \tag{1}$$

The confidence level is associated with a range, or more specifically with bounds or an interval, within which a true population parameter resides. The confidence level and, through the equation above, the significance level are chosen a priori based on the design application at hand.

Given a significance level $\alpha$ defined by Eq 1, a rejection region can be established. This is known as the critical region for the test statistic selected. For our purposes, the observed tensile strength data for a graphite material are used to determine whether the computed value of the test statistic associated with a hypothesis (not the parameter estimates) lies within or outside the rejection region. The amount of data helps define the size of the critical region. If the test statistic is within the rejection region, then we say the hypothesis is rejected at the $100\alpha$ % significance level. If $\alpha$ is quite small, then the probability of rejecting the null hypothesis when it is true can be made quite small.

Type I and Type II Errors

Consider that for a true distribution parameter $\theta$ there is a need to test the null hypothesis that $\theta = \theta_0$ (in this context $\theta_0$ is a stipulated value) against the alternative that $\theta \neq \theta_0$ at a significance level $\alpha$. Under these two hypotheses, a confidence interval can be constructed that contains the true population parameters with a probability of $\gamma = (1 - \alpha)$. In addition, this interval also contains the value $\theta_0$. Mistakes can be made in rejecting the null hypothesis given above. In hypothesis testing, two types of errors are possible.

1. Type I error: the rejection of the null hypothesis ($H_0$) when it is true. The probability of committing a Type I error is denoted by $\alpha$.
2. Type II error: the failure to reject the null hypothesis ($H_0$) when the alternative hypothesis ($H_1$) is true. The probability of committing a Type II error is denoted by $\beta$.

In either situation, judgment of the null hypothesis $H_0$ is incorrect. Now consider the situations in which correct decisions have been made. In the first case, the null hypothesis is not rejected and the null hypothesis is true. The probability of making this choice is $\gamma = (1 - \alpha)$. This is the same probability associated with the confidence interval for the true population distribution parameters discussed. For the second situation, the probability of correctly rejecting the null hypothesis is the statistical complement of a Type II error, that is, $(1 - \beta)$. In statistics this is known as the power of the test of the null hypothesis. This quantity is used to determine the sample size. Maximizing the probability of making a correct decision requires high values of $(1 - \alpha)$ and $(1 - \beta)$. So decision protocols must be designed so as to minimize the probability of either type of error in an optimal fashion. The probability of a Type I error is controlled by making $\alpha$ a suitably small number, say, 1 in 10 or 1 in 20, or something smaller depending on the consequence of making a Type I error. Minimizing the probability of making a Type II error is not straightforward. $\beta$ is dependent on the alternative hypothesis, on the sample size $n$, and on the true value of the distribution parameters tested. As discussed in the next section, the alternative hypothesis is greatly influenced by the test statistic chosen to help quantify the decision.

Hence when hypothesis testing is applied to the distribution parameters, a statement of equality is made in the null hypothesis $H_0$. Achieving statistical significance for this is akin to accepting that the observed results (the point estimates of the distribution parameters) are plausible values if the null hypothesis is not rejected. The alternative hypothesis does not in general specify particular values for the true population parameters. However, as shown in a section that follows, the alternative hypothesis helps us establish bounds on the true distribution parameters. This is important when a confidence ring is formulated for a parameter distribution pair. The size of the ring can be enlarged or reduced based on two controllable parameters, the significance level $\alpha$ and the sample size $n$.

The Likelihood Ratio Test Statistic

The test statistic used to influence decision making regarding the alternative hypothesis is the natural log of a ratio of two likelihood functions. In simple terms, one likelihood function is associated with a null hypothesis, and the other is associated with an alternative hypothesis.
For a probability density function with a single distribution parameter, the general approaches to testing the null and alternative hypotheses are defined, respectively, as

$$H_0 : \theta = \theta_0 \tag{2}$$

$$H_1 : \theta = \theta_1 \tag{3}$$

Note that in the expression for the alternative hypothesis $H_1$, the fact that $\theta$ equals $\theta_1$ implies that $\theta$ is not equal to $\theta_0$, and the alternative hypothesis is consistent with the discussion in the previous section. As they are used here, the hypotheses describe two complementary notions regarding the distribution parameters, and these notions compete with each other. In this sense the hypotheses can be better described mathematically as

$$H_0 : \theta \in \Theta_0 = ({}^{1}\theta_0, {}^{2}\theta_0, \ldots, {}^{r}\theta_0) \tag{4}$$

$$H_1 : \theta \in \Theta_0^{c} = ({}^{1}\theta_1, {}^{2}\theta_1, \ldots, {}^{r}\theta_1) \tag{5}$$

where $r$ corresponds to the number of parameters in the probability density function. Conceptually $\theta_0$ and $\theta_1$ are scalar values, whereas $\Theta_0$ and its complement $\Theta_0^{c}$ are vectors of distribution parameters. A likelihood function associated with each hypothesis can be formulated,

$$L_0 = \prod_{i=1}^{n} f(x_i \mid \theta \in \Theta_0) \tag{6}$$

for the null hypothesis and

$$L_1 = \prod_{i=1}^{n} f(x_i \mid \theta \in \Theta_0^{c}) \tag{7}$$

for the alternative hypothesis. The likelihood function $L_0$ associated with the null hypothesis is evaluated using the maximum likelihood parameter estimates.

The sample population (i.e., graphite failure data) is assumed to be characterized by a two-parameter Weibull distribution. There are methods to test the validity of this assumption. However, the material strength is characterized by a random variable, so it makes sense to use a minimum extreme-value distribution such as the Weibull distribution. Because this is a proof-of-concept effort focused on likelihood ratio rings, goodness-of-fit tests that can be used to discriminate between alternative underlying population distributions are left to others to pursue. A vector of distribution parameters whose components are the MLE parameter estimates is identified as

$$({}^{1}\theta_0, {}^{2}\theta_0) = (\tilde{\theta}_1, \tilde{\theta}_2) = (\tilde{m}, \tilde{\sigma}_\theta) \tag{8}$$

where:
$\tilde{m}$ = maximum likelihood estimate of the Weibull modulus, and
$\tilde{\sigma}_\theta$ = maximum likelihood estimate of the characteristic strength.

Now

$$H_0 : \theta \in \Theta_0 = (\tilde{m}, \tilde{\sigma}_\theta) \tag{9}$$

that is, $\Theta_0$ contains MLE parameter estimates, and

$$H_1 : \theta \in \Theta_0^{c} \tag{10}$$

with $\Theta_0^{c}$ representing a vector of point estimates that are not MLE parameter estimates. In essence, we are testing the null hypothesis that the true distribution parameters are equal to the MLE parameter estimates, with an alternative hypothesis that the true distribution parameters are not equal to the MLE parameter estimates. The likelihood functions are now expressed as

$$\tilde{L}_0 = \prod_{i=1}^{n} f(x_i \mid \tilde{m}, \tilde{\sigma}_\theta) \tag{11}$$

$$L_1 = \prod_{i=1}^{n} f(x_i \mid m, \sigma_\theta) \tag{12}$$

A test statistic is introduced that is defined as the natural log of the ratio of the likelihood functions,

$$T = -2\ln\left(\frac{L_1}{\tilde{L}_0}\right) \tag{13}$$

The Neyman-Pearson lemma [9] states that this likelihood ratio test is the most powerful test statistic available for testing the null hypothesis. We can rewrite this last expression as

$$T = 2\left(\tilde{\mathcal{L}} - \mathcal{L}\right) \tag{14}$$

where

$$\tilde{\mathcal{L}} = \ln \tilde{L}_0 = \ln\left\{\prod_{i=1}^{n} f(x_i \mid \tilde{m}, \tilde{\sigma}_\theta)\right\} \tag{15}$$

$$\mathcal{L} = \ln L_1 = \ln\left\{\prod_{i=1}^{n} f(x_i \mid m, \sigma_\theta)\right\} \tag{16}$$

The natural log of the likelihood ratio of a null hypothesis to an alternative hypothesis is our test statistic, and its distribution can be determined in the limit as the sample size approaches infinity. The test statistic is then used to form decision regions where the null hypothesis can be accepted or rejected. A convenient result attributable to Wilks [10] indicates that as the sample size $n$ approaches infinity, the value of $T = -2\ln(L_1/\tilde{L}_0)$ will be asymptotically $\chi^2$-distributed for a nested composite hypothesis. If one hypothesis can be derived as a limiting sequence of another, we say that the two hypotheses are nested. In our case the sample $({}^{r}X_1, {}^{r}X_2, \ldots, {}^{r}X_n)$ representing the $r$th sample is drawn from a Weibull distribution under $H_0$. These same samples are used in the alternative hypothesis $H_1$, and because their parent distribution is assumed to be a Weibull distribution under both hypotheses, the two hypotheses are nested and conditions are satisfied for the application of Wilks's theorem.

The test statistic is designed in such a way that the probability of a Type I error does not exceed $\alpha$, a value that we control. Thus the probability of a Type I error is fixed, and we search for the test statistic that maximizes $(1 - \beta)$, where again $\beta$ is the probability of a Type II error.
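The test statistic defined above can be evaluated directly from failure data. The following is a minimal sketch, assuming complete (uncensored) data and the standard two-parameter Weibull density $f(x) = (m/\sigma_\theta)(x/\sigma_\theta)^{m-1}\exp[-(x/\sigma_\theta)^m]$. The function names are hypothetical, and the critical value is computed for one degree of freedom, which is the value this chapter adopts, using the identity that a one-degree-of-freedom $\chi^2$ variable is the square of a standard normal variable.

```python
import math
from statistics import NormalDist


def weibull_log_likelihood(strengths, m, sigma):
    """Natural log of the likelihood for the two-parameter Weibull density."""
    total = 0.0
    for x in strengths:
        ratio = x / sigma
        total += math.log(m / sigma) + (m - 1.0) * math.log(ratio) - ratio ** m
    return total


def lr_statistic(strengths, m_hat, sigma_hat, m, sigma):
    """T = 2 * (ln L at the estimates - ln L at the candidate pair).

    T is nonnegative whenever (m_hat, sigma_hat) are the maximum
    likelihood estimates for the data supplied.
    """
    return 2.0 * (weibull_log_likelihood(strengths, m_hat, sigma_hat)
                  - weibull_log_likelihood(strengths, m, sigma))


def chi2_critical_1dof(confidence):
    """Chi-square critical value for one degree of freedom.

    Uses chi2_1 = Z**2, so the quantile at `confidence` is the square of
    the standard normal quantile at (1 + confidence) / 2.
    """
    z = NormalDist().inv_cdf(0.5 * (1.0 + confidence))
    return z * z


if __name__ == "__main__":
    for gamma in (0.90, 0.95):
        print(gamma, round(chi2_critical_1dof(gamma), 4))
```

A candidate parameter pair would then be rejected at significance level $\alpha = 1 - \gamma$ when `lr_statistic(...)` exceeds `chi2_critical_1dof(gamma)`. The maximum likelihood estimates themselves must come from a separate fitting step, which this sketch does not include.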
Where inferences are being made on parameters from a population characterized by a two-parameter Weibull distribution, the degree of freedom for the $\chi^2$ distribution is one, and the values of the $\chi^2$ distribution are easily calculated. One can compute the likelihood ratio $L_1/\tilde{L}_0$ and compare $T = -2\ln(L_1/\tilde{L}_0)$ to the $\chi^2$ value corresponding to a desired significance level to define a rejection region. This is outlined in the next section. The value of the ratio of the two likelihood functions defined above ($L_1/\tilde{L}_0$) approaches 1 in the optimal critical region (i.e., the value of the test statistic $T$ should be small). This is a result of minimizing $\alpha$ and maximizing $(1 - \beta)$. The ratio is high in the complementary region. A high ratio corresponds to a high probability of a correct decision under $H_0$.

The likelihood ratio test implies that the null hypothesis should be rejected if the value of the ratio is too small. How small is too small depends on the significance level of the test (i.e., on what probability of Type I error is considered tolerable). Lower values of the likelihood ratio mean that the observed result is much less likely to occur under the null hypothesis than under the alternative hypothesis. Higher values of the likelihood ratio mean that the observed outcome is more or equally likely (or nearly as likely) to occur under the null hypothesis, and the null hypothesis cannot be rejected. The likelihood ratio test and its close relationship to the $\chi^2$ test can be used to determine what sample size will provide a reasonable approximation of the true population parameters.

The Likelihood Ratio Ring

The likelihood ratio confidence bounds are based on the inequality

$$T = 2\left(\tilde{\mathcal{L}} - \mathcal{L}\right) = -2\ln\left[\frac{L(m, \sigma_\theta)}{L(\tilde{m}, \tilde{\sigma}_\theta)}\right] \le \chi^2_{\alpha,1} \tag{17}$$

The equality in Eq 17 can be expressed as

$$L(m, \sigma_\theta) = L(\tilde{m}, \tilde{\sigma}_\theta)\exp\left(-\frac{\chi^2_{\alpha,1}}{2}\right) \tag{18}$$

Here, $\tilde{m}$ and $\tilde{\sigma}_\theta$ are maximum likelihood estimates of the distribution parameters based on the data obtained from a sample. These parameter estimates are random variables (they vary from sample to sample), as are the test statistic $T$ and $\chi^2$. Equation 17 stipulates a relationship between random variables. The true distribution parameters $m$ and $\sigma_\theta$ are fixed values, but they are unknown to us unless the population is completely sampled. However, if $\alpha$ is designated, then a value for $\chi^2$ (i.e., a realization) is established.
Once this realization is established for $\chi^2$, a realization for the test statistic $T$ can be established through Eq 17. For a given significance level, confidence bounds $m'$ and $\sigma'_\theta$ can be computed that satisfy Eq 18 (i.e., these bounds satisfy the following expression).

$$L(m', \sigma'_\theta) - L(\tilde{m}, \tilde{\sigma}_\theta)\exp\left(-\frac{\chi^2_{\alpha,1}}{2}\right) = 0 \tag{19}$$

With a given value of $m'$, a pair of values can be found for $\sigma'_\theta$. This procedure is repeated for a range of $m'$ values until there are enough values to produce a smooth ring. These parameters map a contour ring in a plane perpendicular to the log likelihood axis (see Fig. 1). A change in the significance level results in a different-sized ring. From the geometry in Fig. 1 we can see that the true distribution parameters, which are unknown to us, will lie within the ring with the specified level of confidence.

FIG. 1 Log likelihood frequency plot of $\mathcal{L}(m, \sigma_\theta)$ with likelihood confidence ring and associated test statistic $T$.

Aspects of Likelihood Confidence Rings

In order to present aspects of the likelihood confidence rings, Monte Carlo simulation is utilized to obtain test data. Using Monte Carlo simulation allows us the convenience of knowing what the true distribution parameters are for a particular dataset. Here, it is arbitrarily assumed that the Weibull modulus is 17 and the Weibull characteristic strength is 400 MPa. Figure 2 shows the likelihood confidence ring for a 90 % confidence level and a sample size of 10, along with the true distribution parameters and the estimated distribution parameters. If the true distribution parameter pair were unknown, we would be 90 % confident that the true distribution parameters were within the ring. If the Monte Carlo simulation process were continued nine more times (i.e., if we were in possession of ten simulated datasets), then on average one of those datasets would produce a likelihood confidence ring that did not contain the true distribution parameter pair.

FIG. 2 Confidence ring contour for a sample size of 10 ($m = 17$, $\sigma_\theta = 400$).

In Fig. 3 the effect of holding the sample size fixed and varying the confidence level is presented. The result is a series of nested likelihood confidence rings. Here we have one dataset and multiple rings associated with increments of the confidence level from 50 % to 95 %. Note that as the confidence level increases, the size of the likelihood confidence ring expands. For a given number of test specimens in a dataset, the area encompassed by the likelihood confidence ring expands as we become more and more confident that the true distribution parameters are contained in the ring.

FIG. 3 Dependence of likelihood confidence rings on $\gamma$ for a sample size of 30 ($m = 17$, $\sigma_\theta = 400$).
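The ring construction can be sketched numerically. The code below is an illustrative sketch only, not the authors' implementation: it simulates one dataset of 30 strengths (modulus 17, characteristic strength 400 MPa), finds the maximum likelihood estimates, and then, for each trial modulus $m'$, solves the log form of Eq 19, $\ln L(m', \sigma'_\theta) = \ln L(\tilde{m}, \tilde{\sigma}_\theta) - \chi^2_{\alpha,1}/2$, for the pair of $\sigma'_\theta$ values by bisection. All function names, the bisection solver, the number of ring points, and the random seed are assumptions of this example.

```python
import math
import random
from statistics import NormalDist


def log_lik(xs, m, s):
    """Weibull log likelihood ln L(m, s) for complete strength data xs."""
    return sum(math.log(m / s) + (m - 1.0) * math.log(x / s) - (x / s) ** m
               for x in xs)


def sigma_star(xs, m):
    """Characteristic strength that maximizes ln L for a fixed modulus m."""
    x_max = max(xs)  # scaling by x_max keeps the powers from overflowing
    return x_max * (sum((x / x_max) ** m for x in xs) / len(xs)) ** (1.0 / m)


def bisect(f, lo, hi, iterations=100):
    """Root of f on [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo_positive = f(lo) > 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0.0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def weibull_mle(xs):
    """Maximum likelihood estimates (m_hat, sigma_hat) for complete data."""
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)
    top = max(logs)

    def score(m):  # increases with m and is zero at the estimate
        w = [math.exp(m * (v - top)) for v in logs]
        return sum(a * b for a, b in zip(w, logs)) / sum(w) - 1.0 / m - mean_log

    hi = 1.0
    while score(hi) < 0.0:
        hi *= 2.0
    m_hat = bisect(score, 1e-6, hi)
    return m_hat, sigma_star(xs, m_hat)


def chi2_critical_1dof(confidence):
    """Chi-square critical value for one degree of freedom (chi2_1 = Z**2)."""
    return NormalDist().inv_cdf(0.5 * (1.0 + confidence)) ** 2


def trace_ring(xs, confidence, points=25):
    """Return the MLE pair and a list of (m', sigma_low, sigma_high) tuples."""
    m_hat, s_hat = weibull_mle(xs)
    target = log_lik(xs, m_hat, s_hat) - 0.5 * chi2_critical_1dof(confidence)

    def gap(m):  # positive where some sigma puts (m, sigma) inside the ring
        return log_lik(xs, m, sigma_star(xs, m)) - target

    lo = hi = m_hat
    while gap(lo) > 0.0:
        lo *= 0.9
    while gap(hi) > 0.0:
        hi *= 1.1
    m_min, m_max = bisect(gap, lo, m_hat), bisect(gap, m_hat, hi)

    ring = []
    for k in range(1, points):
        m = m_min + (m_max - m_min) * k / points

        def h(s, m=m):  # zero on the ring, positive inside it
            return log_lik(xs, m, s) - target

        center = sigma_star(xs, m)
        a = b = center
        while h(a) > 0.0:
            a *= 0.99
        while h(b) > 0.0:
            b *= 1.01
        ring.append((m, bisect(h, a, center), bisect(h, center, b)))
    return (m_hat, s_hat), ring


if __name__ == "__main__":
    random.seed(2014)
    sample = [random.weibullvariate(400.0, 17.0) for _ in range(30)]
    for gamma in (0.50, 0.75, 0.90, 0.95):
        (m_hat, s_hat), ring = trace_ring(sample, gamma)
        print(f"gamma={gamma:.2f}: m' from {ring[0][0]:.2f} to {ring[-1][0]:.2f}")
```

Plotting the lower and upper $\sigma'_\theta$ values against $m'$ for each confidence level would give a set of nested closed contours, with the extent in $m'$ widening as the confidence level rises. Bisection was chosen here for transparency, and any bracketing root finder would serve equally well.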

The next figure, Fig. 4, depicts the effect of varying the sample size and holding the confidence level fixed at $\gamma$ = 90 %. The sample size was increased from $n = 10$ to $n = 100$. Note that all the likelihood confidence rings encompass the true distribution parameters used to generate each sample. In addition, the area within the rings grows smaller as the sample size increases. As the sample size increases, we gain information on the population and thereby reduce the region that could contain the true distribution parameters for a given level of confidence.

FIG. 4 Likelihood confidence rings for sample sizes ranging from 10 to 100 ($m = 17$, $\sigma_\theta = 400$).

Figure 5 depicts a sampling procedure in which the size of the sample is held fixed (i.e., $n = 10$) and the sampling process and ring generation have been repeated 100 times. For a fixed confidence level of 90 %, one would expect about ten of the rings not to encompass the true distribution parameters. Indeed that is the case. The 90 likelihood confidence rings that encompassed the true distribution parameters are outlined in blue. The ten likelihood confidence rings that did not contain the distribution parameters are outlined in dark orange.

FIG. 5 100 likelihood confidence rings. For all rings, $n = 10$ and $\gamma = 0.9$ ($m = 17$, $\sigma_\theta = 400$).

Confidence Rings and Material Acceptance

The material acceptance approach outlined here depends on several things. First one must have the ability to compute the probability of failure of the component under design. This probability is designated $(P_f)_{\text{component}}$ and is quantified using a hazard rate format; that is, the probability of failure is expressed as a fraction with a numerator of 1. The method for computing this quantity is available in the ASME Boiler and Pressure Vessel Code [7]. The component probability of failure is modeled assuming the underlying strength is characterized by a two-parameter Weibull distribution. Thus a component probability of failure curve can be depicted in an $m$-$\sigma_\theta$ graph as shown in Fig. 6. Points along the curve represent parameter pairs equal to a specified probability of failure. This curve is referred to as a material performance curve. We overlay this graph with point estimates of the Weibull distribution parameters obtained from tensile strength data that a typical material supplier might provide. Point estimates from these data that plot to the right of the material performance curve represent a lower probability of failure. Conversely, point estimates to the left of this curve are associated with performance curves with a higher probability of failure. Thus the material performance curve defines two regions of the $m$-$\sigma_\theta$ space, an acceptable performance region and a rejection region relative to a specified component probability of failure.

FIG. 6 Generic material performance curve.

The material performance curve is easily married to a likelihood confidence ring (discussed in previous sections). This allows the component fabricator to decide whether the material supplier is providing a material with high enough quality predicated on the component design and the failure data. Keep in mind that parameter estimates are estimates of the true distribution parameters of the population, values that are never known in real-life applications. However, through the use of the likelihood confidence ring method we can define a region in some close proximity of the estimated point parameter pair, knowing with some level of assurance that the true distribution parameters are contained within that region. If that region in its entirety falls to the right of the test specimen performance curve, the component fabricator can accept the material with a known level of quality (i.e., the significance level). Not surprisingly, we define this procedure as the quality acceptance criterion.

We have combined the two concepts, the likelihood confidence ring and the material performance curve, in one figure (Fig. 7). Here the material performance curve given in Fig. 6 is overlain with the likelihood confidence ring from Fig. 2. This is a graphical representation of the quality assurance process. Rings that reside completely to the right of the material performance curve would represent acceptable materials. Those rings to the left would represent unacceptable materials and would be rejected. In the specific case presented, the material performance curve cuts through the likelihood confidence ring. In this case there are certain regions of the likelihood confidence ring that produce a safe design space, and there is a region of the likelihood confidence ring that produces an unsafe design space. In this situation we know the distribution parameters, and they are purposely to the right of the material performance curve. But given the sample size, the ring did not reside entirely in the safe region. Moreover, in normal designs we never know the true distribution parameters, so we do not know where the true distribution parameter pair resides inside the likelihood confidence ring.

FIG. 7 Material performance curve with likelihood confidence ring contour, $n = 10$ ($m = 17$, $\sigma_\theta = 400$).

When the likelihood confidence ring resides totally to the left of the performance curve, the choice to reject the material is quite clear. When the likelihood confidence ring lies completely to the right of the material performance curve, then once again, the choice is quite clear: accept the material. When the material performance curve slices through the likelihood confidence ring, we can shift the material performance curve to the left, as depicted in Fig. 8. This shift represents a reduction of component reliability or, alternatively, an increase in the component probability of failure. Alternatively, the confidence bound associated with the likelihood confidence ring can be reduced so that the ring shrinks enough to lie completely to the right of the material performance curve. This is depicted in Fig. 9.

FIG. 8 Two parallel material performance curves with likelihood confidence ring ($m = 17$, $\sigma_\theta = 400$).

FIG. 9 Material performance curves with likelihood confidence rings for changing values of $\gamma$ ($m = 17$, $\sigma_\theta = 400$).

An interesting aspect of this approach is that it seems that the likelihood confidence rings give a good indication of which side of the material performance curve the true distribution parameters lie on. If the material performance curve slices through a likelihood confidence ring for a specified confidence level, then as the ring size is diminished the ring becomes tangent to one side of the curve or the other. When this paper was written it was our experience that the side of the component reliability curve that the ring becomes tangent to matches the side on which the true distribution parameters lie. It is emphasized that this is anecdotal.

An example in which the true distribution parameters were chosen to the left of the material performance curve is depicted in Fig. 10. The true distribution parameters are known because we are conducting a Monte Carlo simulation exercise to produce failure data. As the confidence level decreases in Fig. 10, the rings become tangent to the curve on the rejection side.

FIG. 10 Material performance curves with likelihood confidence rings for changing values of $\gamma$ ($m = 6$, $\sigma_\theta = 350$).

Summary

This effort focused on graphite materials and the details associated with calculating point estimates for the Weibull distribution parameters associated with the tensile strength. One can easily generate point estimates from failure data using maximum likelihood estimators. More information regarding the population (i.e., more failure data) always improves the quality of point estimates; the question becomes how much data is sufficient given the application. The work outlined here speaks directly to this issue. Hypothesis testing and the relationship it maintains with parameter estimation were outlined. A test statistic was adopted that allows one to map out an acceptance region in the $m$-$\sigma_\theta$ parameter distribution space. The theoretical support for the equations used to generate the likelihood rings was outlined.

Inferential statistics allowed us to generate confidence bounds on the true distribution parameters utilizing the test data at hand. These bounds are dependent on the size of the sample used to calculate point estimates. The effort focused on a particular type of confidence bound known as likelihood confidence rings. Component reliability curves were discussed. The concepts of the likelihood confidence rings and the component probability of failure curves were combined graphically. This combination gives rise to a material qualification process. This

18 18 STP 1578 On Graphite Testing for Nuclear Applications process combines information regarding the reliability of the component and the parameter estimates to assess the quality of the material. References [1] ASTM C565-93(2010)e1: Test Methods for Tension Testing of Carbon and Graphite Mechanical Materials, Annual Book of ASTM Standards, ASTM International, West Conshohocken, PA, [2] ASTM C749-08(2010)e1: Test Method for Tensile Stress-Strain of Carbon and Graphite, Annual Book of ASTM Standards, ASTM International, West Conshohocken, PA, [3] ASTM C781-08: Practice for Testing Graphite and Boronated Graphite Materials for Hightemperature Gas-cooled Nuclear Reactor Components, Annual Book of ASTM Standards, ASTM International, West Conshohocken, PA, [4] ASTM D e1: Guide for Measurements on Small Graphite Specimens, Annual Book of ASTM Standards, ASTM International, West Conshohocken, PA, [5] ASTM C651-11: Test Method for Flexural Strength of Manufactured Carbon and Graphite Articles Using Four-point Loading at Room Temperature, Annual Book of ASTM Standards, ASTM International, West Conshohocken, PA, [6] ASTM D : Reporting Uniaxial Strength Data and Estimating Weibull Distribution Parameters for Advanced Graphites, Annual Book of ASTM Standards, ASTM International, West Conshohocken, PA, [7] ASME, Article HHA-II-3000, Section III, Division 5, High Temperature Reactors, Rules for Construction of Nuclear Facility Components, ASME Boiler and Pressure Vessel Code, ASME, New York, [8] Fisher, R. A., Theory of Statistical Estimation, Proc. Cambridge Philos. Soc., Vol. 22, 1925, pp [9] Neyman, J. and Pearson, E., On the Problem of the Most Efficient Tests of Statistical Hypotheses, Philos. Trans. R. Soc. London Series A, Vol. 231, 1933, pp [10] Wilks, S. S., The Large Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, Ann. Math. Stat., Vol. 9, No. 1, 1938, pp
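The qualification process summarized above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation, and it rests on several assumptions that are not taken from the paper. The component is idealized as a single uniformly stressed element, so that its probability of failure is 1 - exp(-(s/σ_θ)^m) at a design stress s. The ring is taken as the set of parameter pairs whose likelihood ratio statistic falls below a chi-square quantile with two degrees of freedom; if the paper's ring equation uses a different quantile, that value should be substituted. The ring is scanned on a grid rather than traced as a contour. The function names, the design stress of 250, and the target failure probability of 1e-3 are hypothetical. Only the true parameters (m = 17, σ_θ = 400) and the sample size (n = 10) are taken from the caption of Fig. 7.

```python
import math
import random


def _power(ratio, m):
    """Return ratio**m, capped so extreme grid points cannot overflow a float."""
    return math.exp(min(m * math.log(ratio), 700.0))


def weibull_mle(data):
    """Maximum likelihood estimates (m_hat, sigma_hat), two-parameter Weibull."""
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / len(data)
    x_max = max(data)

    def score(m):
        # Likelihood equation for m; scaling by x_max keeps the powers <= 1.
        w = [_power(x / x_max, m) for x in data]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / m - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0.0 and hi < 1e6:   # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):                  # plain bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    m_hat = 0.5 * (lo + hi)
    mean_w = sum(_power(x / x_max, m_hat) for x in data) / len(data)
    return m_hat, x_max * mean_w ** (1.0 / m_hat)


def log_likelihood(data, m, sigma):
    """Log-likelihood of the two-parameter Weibull density."""
    n = len(data)
    return (n * math.log(m) - n * m * math.log(sigma)
            + (m - 1.0) * sum(math.log(x) for x in data)
            - sum(_power(x / sigma, m) for x in data))


def failure_probability(m, sigma, design_stress):
    """ASSUMPTION: one uniformly stressed element at design_stress."""
    return -math.expm1(-_power(design_stress / sigma, m))


def ring_verdict(data, gamma, design_stress, pf_target, steps=81):
    """Classify the ring as 'accept', 'reject', or 'indeterminate'."""
    m_hat, s_hat = weibull_mle(data)
    ll_hat = log_likelihood(data, m_hat, s_hat)
    # ASSUMPTION: chi-square quantile with 2 degrees of freedom (closed form).
    threshold = -2.0 * math.log(1.0 - gamma)
    # Grid window sized for moderately large m; widen it for low m.
    points = [(m_hat, s_hat)]
    for i in range(steps):
        for j in range(steps):
            points.append((m_hat * (0.3 + 2.7 * i / (steps - 1)),
                           s_hat * (0.8 + 0.45 * j / (steps - 1))))
    safe = unsafe = 0
    for m, s in points:
        if 2.0 * (ll_hat - log_likelihood(data, m, s)) <= threshold:
            if failure_probability(m, s, design_stress) <= pf_target:
                safe += 1
            else:
                unsafe += 1
    if unsafe == 0:
        return "accept"
    if safe == 0:
        return "reject"
    return "indeterminate"


# Monte Carlo failure data drawn from known "true" parameters (Fig. 7 caption).
random.seed(1)
sample = [random.weibullvariate(400.0, 17.0) for _ in range(10)]
m_hat, s_hat = weibull_mle(sample)
print("estimates:", m_hat, s_hat)
# Shrink the ring by lowering the confidence level, in the spirit of Fig. 9.
for gamma in (0.90, 0.70, 0.50, 0.30, 0.10):
    print(gamma, ring_verdict(sample, gamma, design_stress=250.0, pf_target=1e-3))
```

An "indeterminate" verdict corresponds to the case in which the material performance curve slices through the ring. Lowering the confidence level shrinks the ring, as in Fig. 9, until it falls entirely on one side of the curve.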

Quality Control Using Inferential Statistics In Weibull Based Reliability Analyses
S. F. Duffy (Cleveland State University) and A. Parikh (N & R Engineering)
www.inl.gov
ASTM Symposium on Graphite