Statistical Intervals and the Applications. Hsiuying Wang Institute of Statistics National Chiao Tung University Hsinchu, Taiwan

1. Confidence Interval (CI)
2. Tolerance Interval (TI)
3. Prediction Interval (PI)

Example
A manufacturer wanted to characterize the voltage outputs for a new electronic circuit pack design. Five units were built, and the five measurements are 50.3, 48.3, 49.6, 50.4 and 51.9 volts. The sample size for this study was small because of the high cost of manufacturing such units. The sample mean and sample standard deviation are $\bar{x} = 50.1$ and $s = 1.31$.
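The summary statistics quoted in the example can be checked directly from the five measurements. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original slides.

```python
import statistics

# Voltage measurements from the example (volts).
volts = [50.3, 48.3, 49.6, 50.4, 51.9]

x_bar = statistics.mean(volts)   # sample mean
s = statistics.stdev(volts)      # sample standard deviation (divisor n - 1)

print(f"x_bar = {x_bar:.1f}, s = {s:.2f}")
```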

Statistical intervals and precision
Point estimates provide a concise summary of the sample results, but they give no information about their precision. In the circuit pack example, we may ask how good the estimate is. We need to quantify the uncertainty associated with the estimate. An understanding of this uncertainty is an important input for decision making.

Interval
A two-sided 95% CI for the mean µ is 50.1 ± 1.24(1.31) = (48.5, 51.7).
A two-sided 95% TI to contain at least 99% of the sampled population is 50.1 ± 6.6(1.31) = (41.5, 58.7).
A two-sided 95% PI to contain all of 10 additional circuit packs is 50.1 ± 5.23(1.31) = (43.2, 57.0).

Interval
A two-sided 95% PI to contain the mean of five additional circuit packs is 50.1 ± 1.76(1.31) = (47.8, 52.4).
A two-sided 95% CI for the standard deviation σ is (0.6(1.31), 2.87(1.31)) = (0.8, 3.8).
A two-sided 95% PI to contain the standard deviation of five additional circuit packs is (0.32(1.31), 3.10(1.31)) = (0.4, 4.1).
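Each interval in the example is built from the same two summary statistics and a tabulated factor. The sketch below redoes that arithmetic; the factors are copied from the slides, while the helper names `symmetric` and `scaled` are hypothetical and chosen only for illustration.

```python
# Rounded summary statistics from the circuit pack example.
x_bar, s = 50.1, 1.31

def symmetric(factor):
    """Interval of the form x_bar +/- factor * s, rounded to one decimal."""
    half = factor * s
    return (round(x_bar - half, 1), round(x_bar + half, 1))

def scaled(lower_factor, upper_factor):
    """Interval of the form (lower_factor * s, upper_factor * s)."""
    return (round(lower_factor * s, 1), round(upper_factor * s, 1))

intervals = {
    "95% CI for the mean": symmetric(1.24),
    "95% TI for at least 99% of the population": symmetric(6.6),
    "95% PI for all of 10 future units": symmetric(5.23),
    "95% PI for the mean of 5 future units": symmetric(1.76),
    "95% CI for sigma": scaled(0.6, 2.87),
    "95% PI for the SD of 5 future units": scaled(0.32, 3.10),
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: ({lower}, {upper})")
```

Seen side by side, the intervals differ only in their factors: the tolerance and all-of-10 prediction intervals are several times wider than the confidence interval for the mean.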

Table 2.1: Examples of some statistical intervals

| Characteristic of interest | General purpose of the statistical interval: description | General purpose of the statistical interval: prediction |
|---|---|---|
| Location | CI for a population mean or median, or a specified distribution percentile. | PI for a future sample mean, future sample median, or a particular ordered observation from a future sample. |
| Spread | CI for a population standard deviation. | PI for the standard deviation of a future sample. |
| Enclosure interval | TI to contain at least a specified proportion of a population. | PI to contain all or most of the observations from a future sample. |
| Probability of an event | CI for the probability of being less than (or greater than) some specified value. | PI to contain the proportion of observations in a future sample that exceed a specified limit. |

Model
1. Parametric model (CI, TI, PI)
2. Nonparametric model (distribution-free CI, TI, PI)

CI for the normal distribution
A $100(1-\alpha)\%$ confidence interval for an unknown quantity $\theta$ may be characterized as follows: if one repeatedly calculates such intervals from many independent random samples, $100(1-\alpha)\%$ of the intervals would, in the long run, correctly bracket the true value $\theta$.
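The long-run interpretation can be illustrated by simulation. The sketch below draws many normal samples of size 5, builds a two-sided 95% CI for the mean from each, and counts how often the interval brackets the true mean. The true mean and standard deviation, the seed, and the number of replications are arbitrary assumptions; the critical value 2.776 is the standard table value for 4 degrees of freedom and is valid only for n = 5.

```python
import math
import random

T_CRIT = 2.776  # 97.5th percentile of Student's t with 4 df; valid only for n = 5

def coverage(mu=50.0, sigma=1.3, n=5, reps=20000, seed=1):
    """Fraction of simulated 95% CIs for the mean that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))
        half_width = T_CRIT * s / math.sqrt(n)
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps

rate = coverage()
print(f"Observed coverage: {rate:.3f}")
```

The observed fraction should settle near 0.95 as the number of replications grows, whatever values are chosen for the true mean and standard deviation.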

CI for the normal distribution
A two-sided $100(1-\alpha)\%$ confidence interval for the mean $\mu$ of a normal distribution is
$$(l, u) = \bar{x} \pm t_{(1-\alpha/2;\,n-1)} \frac{s}{\sqrt{n}},$$
where $t_{(\gamma;\,k)}$ is the $100\gamma$th percentile of the Student's t distribution with $k$ degrees of freedom.
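The formula can be evaluated without statistical tables by computing the t percentile numerically. The sketch below is one possible approach, not the method used in the slides: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The function names are hypothetical, and the quantile routine assumes a percentile above 0.5.

```python
import math
import statistics

def t_pdf(t, k):
    """Density of Student's t with k degrees of freedom."""
    c = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return c * (1 + t * t / k) ** (-(k + 1) / 2)

def t_cdf(t, k, steps=2000):
    """CDF at t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t == 0:
        return 0.5
    h = t / steps
    total = t_pdf(0.0, k) + t_pdf(t, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    return 0.5 + total * h / 3

def t_quantile(gamma, k):
    """100*gamma-th percentile for gamma > 0.5, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < gamma:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(data, conf=0.95):
    """Two-sided CI for a normal mean: x_bar +/- t * s / sqrt(n)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    t = t_quantile(1 - (1 - conf) / 2, n - 1)
    half_width = t * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

volts = [50.3, 48.3, 49.6, 50.4, 51.9]
lower, upper = mean_ci(volts)
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```

Applied to the circuit pack data, this reproduces the interval quoted in the example, with the factor 1.24 arising as the t percentile divided by the square root of 5.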

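CI for the normal distribution
$$P\left( \left| \frac{\sqrt{n}\,(\bar{X} - \mu)}{s} \right| \le t_{(1-\alpha/2;\,n-1)} \right) = 1 - \alpha$$
$$\Rightarrow \quad P\left( \mu \in \bar{x} \pm t_{(1-\alpha/2;\,n-1)} \frac{s}{\sqrt{n}} \right) = 1 - \alpha$$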

TI for the normal distribution
A two-sided $100(1-\alpha)\%$ tolerance interval to contain at least a proportion $p$ of a population described by a normal distribution is
$$(T_L, T_U) = (\bar{x} - k_1 s,\ \bar{x} + k_2 s).$$

TI for the normal distribution
Proof (one-sided case). The problem is to find $k$ so that either $\bar{x} + ks$ or $\bar{x} - ks$ is the required tolerance limit. Mathematically, for the upper limit, the problem is to find $k$ such that
$$P_{\bar{x},s}\left( P_X(X \le \bar{x} + ks) \ge p \right) = 1 - \alpha,$$
where $X$ has a normal distribution with mean $\mu$ and standard deviation $\sigma$, and $p$ and $1-\alpha$ are specified probabilities. Define $K_p$ by
$$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{K_p} e^{-t^2/2}\,dt = p,$$
and then
$$P_{\bar{x},s}\left( P_X(X \le \bar{x} + ks) \ge p \right) = P\left( \frac{\bar{x} + ks - \mu}{\sigma} \ge K_p \right).$$
Rewriting once more, this becomes

TI for the normal distribution
$$P\left( \frac{\sqrt{n}\,(\mu - \bar{x})/\sigma + K_p\sqrt{n}}{s/\sigma} \le k\sqrt{n} \right) = 1 - \alpha.$$
This is now in the form of the noncentral t-distribution with $f = n - 1$ degrees of freedom, with $\delta = K_p\sqrt{n}$ and $t = k\sqrt{n}$. Equivalently, this may be written
$$P\left( T_f \le k\sqrt{n} \mid \delta = K_p\sqrt{n} \right) = 1 - \alpha,$$
where $T_f$ has a noncentral t-distribution. Hence the desired quantity $k$ may be computed from the percentage points of the noncentral t-distribution.
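The last step of the proof can be carried out numerically. The sketch below computes the one-sided factor k by writing the noncentral t variable as (Z + δ)/U, where Z is standard normal and U = sqrt(χ²_f / f), integrating over U with the midpoint rule, and solving for k by bisection. This is an illustrative approach under those assumptions, not the tabulation method the slides rely on, and the function names are hypothetical. It yields the one-sided factor, so it is not expected to match the two-sided factor 6.6 used in the earlier example.

```python
import math
from statistics import NormalDist

def nct_cdf(t, f, delta, steps=4000):
    """P(T <= t) for a noncentral t with f df and noncentrality delta.

    Uses P(T <= t) = E[Phi(t * U - delta)], U = sqrt(chi2_f / f),
    with the expectation evaluated by the midpoint rule.
    """
    u_max = 1.0 + 8.0 / math.sqrt(2.0 * f)  # upper tail beyond this is negligible
    h = u_max / steps
    # log of the constant in the density of U: 2 (f/2)^(f/2) / Gamma(f/2)
    log_c = math.log(2.0) + (f / 2.0) * math.log(f / 2.0) - math.lgamma(f / 2.0)
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        density = math.exp(log_c + (f - 1.0) * math.log(u) - f * u * u / 2.0)
        phi = 0.5 * (1.0 + math.erf((t * u - delta) / math.sqrt(2.0)))
        total += phi * density
    return total * h

def one_sided_k(n, p, conf):
    """Solve P(T_f <= k sqrt(n) | delta = K_p sqrt(n)) = conf for k."""
    f = n - 1
    delta = NormalDist().inv_cdf(p) * math.sqrt(n)  # K_p * sqrt(n)
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if nct_cdf(mid * math.sqrt(n), f, delta) < conf:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

k = one_sided_k(n=5, p=0.99, conf=0.95)
print(f"One-sided tolerance factor k = {k:.3f}")
```

An upper 95% tolerance limit for 99% of the population would then be the sample mean plus k times the sample standard deviation.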

PI for the normal distribution
A two-sided $100(1-\alpha)\%$ prediction interval to contain the mean of $m$ future, independently and randomly selected observations, based upon the results of a previous independent random sample of size $n$ from the same population described by a normal distribution, is
$$\bar{x} \pm t_{(1-\alpha/2;\,n-1)} \left( \frac{1}{m} + \frac{1}{n} \right)^{1/2} s.$$
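As a worked instance of this formula, the sketch below computes the prediction interval for the mean of m = 5 future circuit packs from the example data. The t percentile is passed in as a table value (2.776 for 4 degrees of freedom) rather than computed, and the function name is hypothetical.

```python
import math
import statistics

def pi_for_future_mean(data, m, t_crit):
    """PI for the mean of m future observations from a normal population.

    t_crit must be the 100(1 - alpha/2)th t percentile with len(data) - 1 df.
    Returns the factor multiplying s and the interval itself.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    factor = t_crit * math.sqrt(1 / m + 1 / n)
    return factor, (x_bar - factor * s, x_bar + factor * s)

volts = [50.3, 48.3, 49.6, 50.4, 51.9]
factor, (lower, upper) = pi_for_future_mean(volts, m=5, t_crit=2.776)
print(f"factor = {factor:.2f}, PI = ({lower:.1f}, {upper:.1f})")
```

The factor agrees with the 1.76 quoted in the example. As m grows, the term 1/m vanishes and the interval shrinks toward the confidence interval for the population mean.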

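PI for the normal distribution
Assume that $y_1, \ldots, y_m$ are the future observations and $x_1, \ldots, x_n$ are the past observations. The statistic
$$\frac{\bar{y} - \bar{x}}{\sqrt{\left( \frac{1}{m} + \frac{1}{n} \right) s^2}}$$
has a t distribution with $n - 1$ degrees of freedom because $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$ is based only on the $n$ past observations. Thus, we have
$$P\left( \left| \frac{\bar{y} - \bar{x}}{\sqrt{\left( \frac{1}{m} + \frac{1}{n} \right) s^2}} \right| \le t_{(1-\alpha/2;\,n-1)} \right) = 1 - \alpha,$$
that is,
$$P\left( \bar{x} - t_{(1-\alpha/2;\,n-1)} \left( \frac{1}{m} + \frac{1}{n} \right)^{1/2} s < \bar{y} < \bar{x} + t_{(1-\alpha/2;\,n-1)} \left( \frac{1}{m} + \frac{1}{n} \right)^{1/2} s \right) = 1 - \alpha.$$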

Distribution-free Intervals
The distribution-free intervals and bounds do not require one to assume a particular distribution. Let $x_1, x_2, \ldots, x_n$ represent $n$ independent observations from any continuous distribution, and let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ denote the same observations ordered from smallest to largest. The distribution-free CIs, TIs, and PIs discussed here use selected order statistics as interval endpoints.

Distribution-free CI for a percentile
A two-sided distribution-free conservative $100(1-\alpha)\%$ CI for $Y_p$, the $100p$th percentile of the sampled population, is obtained from a sample of size $n$ as
$$[\underline{Y}_p, \overline{Y}_p] = [x_{(l)}, x_{(u)}],$$
where $l$ and $u$ satisfy
$$\sum_{i=l}^{u-1} \binom{n}{i} p^i (1-p)^{n-i} \ge 1 - \alpha$$
and $l$ and $u$ are symmetric, or nearly symmetric, around $i = [np] + 1$.

Distribution-free TI
A two-sided distribution-free conservative $100(1-\alpha)\%$ TI to contain at least $100p\%$ of the sampled population, from a sample of size $n$, is
$$[\underline{T}_p, \overline{T}_p] = [x_{(l)}, x_{(u)}],$$
where $l$ and $u$ satisfy
$$\sum_{i=0}^{u-l-1} \binom{n}{i} p^i (1-p)^{n-i} \ge 1 - \alpha$$
and $l$ and $u$ are symmetric around $(n + 1)/2$.
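The binomial sum in the tolerance interval condition is easy to evaluate directly. The sketch below computes the confidence level attained by a given pair of order statistics; the sample size and proportions in the usage lines are hypothetical values chosen for illustration, not taken from the slides.

```python
from math import comb

def ti_confidence(n, p, l, u):
    """Confidence that [x_(l), x_(u)] contains at least a proportion p.

    Evaluates sum_{i=0}^{u-l-1} C(n, i) p^i (1 - p)^(n - i).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(u - l))

# Hypothetical case: use the sample extremes from n = 100 observations.
conf_90 = ti_confidence(n=100, p=0.90, l=1, u=100)
conf_99 = ti_confidence(n=100, p=0.99, l=1, u=100)
print(f"Confidence for 90% content: {conf_90:.4f}")
print(f"Confidence for 99% content: {conf_99:.4f}")
```

Comparing the two results shows how quickly the attainable confidence drops as the required content p approaches 1, even when the widest possible interval (the sample range) is used.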

Example
A production engineer wants to evaluate the capability of a chemical process to produce a particular compound. Measurements are available from $n = 100$ randomly selected batches from the process. Assume that a distribution-free 95% CI is needed for $Y_{0.5}$, the median of the population. For $n = 100$ and $1 - \alpha = 0.95$, Table A.15g gives $l = 41$ and $u = 61$. A distribution-free conservative 95% CI for $Y_{0.5}$ is
$$(\underline{Y}_{0.5}, \overline{Y}_{0.5}) = (x_{(41)}, x_{(61)}).$$
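The choice of order statistics in this example can be checked against the binomial condition for a distribution-free percentile CI. The sketch below evaluates the attained confidence level for the tabulated l and u; the function name is hypothetical.

```python
from math import comb

def percentile_ci_confidence(n, p, l, u):
    """Confidence that [x_(l), x_(u)] contains the 100p-th percentile.

    Evaluates sum_{i=l}^{u-1} C(n, i) p^i (1 - p)^(n - i).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(l, u))

conf = percentile_ci_confidence(n=100, p=0.5, l=41, u=61)
print(f"Attained confidence level: {conf:.4f}")
```

The attained level is slightly above the nominal 95%, which is why the interval is described as conservative: with discrete order statistics the nominal level usually cannot be hit exactly.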

Assessing distribution normality and dealing with nonnormality
Probability (P-P) and Q-Q plots
Normal probability plots of the data are a simple and effective tool for assessing normality, especially if there are 20 or more observations.

Dealing with Nonnormal Data
Box-Cox transformations:
$$x^{(\gamma)} = \begin{cases} x^{\gamma} & \text{if } \gamma \ne 0 \\ \log(x) & \text{if } \gamma = 0 \end{cases}$$
where $\gamma$ is a parameter that defines the transformation. In practice, one tries to find a value for $\gamma$ that leads to approximate normality.
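The transformation and its inverse are short enough to write out. The sketch below follows the simple power form given on the slide (not the variant that subtracts 1 and divides by γ); the function names are hypothetical, and positive inputs are assumed.

```python
import math

def box_cox(x, gamma):
    """Power transformation from the slide: x**gamma, or log(x) when gamma == 0."""
    if x <= 0:
        raise ValueError("the transformation assumes x > 0")
    return math.log(x) if gamma == 0 else x ** gamma

def inverse_box_cox(y, gamma):
    """Undo box_cox: exp(y) when gamma == 0, otherwise y**(1/gamma) for y > 0."""
    return math.exp(y) if gamma == 0 else y ** (1.0 / gamma)

value = 7.3
for gamma in (0, 0.5, 2):
    transformed = box_cox(value, gamma)
    restored = inverse_box_cox(transformed, gamma)
    print(f"gamma = {gamma}: {value} -> {transformed:.4f} -> {restored:.4f}")
```

The inverse is what allows interval endpoints computed on the transformed scale to be reported in the original units.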

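Motivation for the analysis and its importance
Forecasting trends in life expectancy helps governments and companies build sounder pension systems.
Whether life expectancy is forecast accurately has a major effect on the operations of longevity-related industries (e.g., life insurance).
A life table can be constructed for each year.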

Life Table (Mortality Table)
Abridged life table for Taiwan Province, ROC year 97 (2008), both sexes

Comparison of life expectancy at age 0

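Deaths by year and age group

| Year (ROC) | Deaths | Age 0-14 | Age 15-29 | Age 30-44 | Age 45-59 | Age 60-74 | Age 75-89 | Age 90+ | Mean age at death |
|---|---|---|---|---|---|---|---|---|---|
| 88 | 126,654 | 3,641 | 4,759 | 11,110 | 18,080 | 40,836 | 43,135 | 5,093 | 65.59 |
| 89 | 126,016 | 3,233 | 4,482 | 10,798 | 18,310 | 39,732 | 43,984 | 5,477 | 66.08 |
| 90 | 127,892 | 2,912 | 4,077 | 10,536 | 18,785 | 39,985 | 45,679 | 5,918 | 66.69 |
| 95 | 136,371 | 1,885 | 3,683 | 10,623 | 22,656 | 35,452 | 53,590 | 8,482 | 68.31 |
| 96 | 140,371 | 1,777 | 3,264 | 9,996 | 23,224 | 35,593 | 57,008 | 9,509 | 69.16 |
| 97 | 143,594 | 1,687 | 3,015 | 9,841 | 23,713 | 35,595 | 59,180 | 10,563 | 69.71 |
| Change from year 96 (%) | 2.30 | -5.06 | -7.63 | -1.55 | 2.11 | 0.01 | 3.81 | 11.08 | 0.54 |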

Chart of age at death, ROC year 97 (2008)

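Interval estimation of mean life span
Purpose: to understand the mean life span for the given year.
Warning: the data do not follow a normal distribution.
Method:
(1) Apply a Box-Cox transformation, compute the CI on the transformed scale, then back-transform it to obtain the upper and lower bounds of the interval.
(2) Use a distribution-free statistical interval.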

Box-Cox Transformation
r = 2 seems better.
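The transform-then-back-transform workflow can be sketched as follows. The ages are hypothetical placeholder values, not the mortality data from the slides; the power 2 follows the slide's remark that r = 2 seems better; and a normal percentile is used as a large-sample stand-in for the t percentile, which is an assumption of this sketch. The back-transformed endpoints bound the back-transformed mean of the squared data, which is not exactly the arithmetic mean on the original scale.

```python
import math
import statistics
from statistics import NormalDist

def back_transformed_ci(data, power, conf=0.95):
    """CI computed on the x**power scale, then mapped back with the inverse power.

    Assumes positive data, power != 0, and a lower endpoint that stays positive.
    """
    z = [x ** power for x in data]
    n = len(z)
    centre = statistics.fmean(z)
    crit = NormalDist().inv_cdf(0.5 + conf / 2)  # large-sample stand-in for t
    half_width = crit * statistics.stdev(z) / math.sqrt(n)
    return ((centre - half_width) ** (1 / power),
            (centre + half_width) ** (1 / power))

# Hypothetical, left-skewed ages at death (illustration only).
ages = [45, 58, 63, 67, 70, 72, 74, 76, 78, 80, 82, 84, 85, 87, 89, 91]
lower, upper = back_transformed_ci(ages, power=2)
print(f"Back-transformed 95% interval: ({lower:.1f}, {upper:.1f})")
```

Because the back-transformation is nonlinear, the resulting interval is not symmetric about its center on the original scale.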

1. Histogram or stem-and-leaf plot
2. Check sheet
3. Pareto chart
4. Cause-and-effect diagram (Ishikawa diagram)
5. Flow chart
6. Scatter diagram
7. Control chart

Basis of the Control Chart
The chart contains a center line that represents the average value of the quality characteristic corresponding to the in-control state. Two other horizontal lines, called the upper control limit (UCL) and the lower control limit (LCL), are also shown on the chart.
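A minimal sketch of how the three lines might be computed and used is given below. The slides do not state how the limits are set; placing them three standard deviations from the center line is a common convention and is an assumption here, as are the data values and function names. Practical control charts often estimate the standard deviation from subgroup ranges instead.

```python
import statistics

def control_limits(in_control_values, width=3.0):
    """Return (LCL, center line, UCL) using center +/- width * sample SD."""
    centre = statistics.fmean(in_control_values)
    sigma = statistics.stdev(in_control_values)
    return centre - width * sigma, centre, centre + width * sigma

def out_of_control(values, lcl, ucl):
    """1-based sample numbers whose values fall outside the control limits."""
    return [i for i, v in enumerate(values, start=1) if v < lcl or v > ucl]

# Hypothetical measurements taken while the process was in control.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.0]
lcl, centre, ucl = control_limits(baseline)

# Hypothetical new samples to monitor.
new_samples = [10.0, 10.1, 9.9, 10.9, 10.0]
flagged = out_of_control(new_samples, lcl, ucl)
print(f"LCL = {lcl:.3f}, center = {centre:.3f}, UCL = {ucl:.3f}")
print(f"Out-of-control sample numbers: {flagged}")
```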

[Figure: Process Control Chart. Horizontal lines mark the upper control limit, the process average, and the lower control limit; a point outside the limits is labeled "Out of control". Horizontal axis: sample number, 1 to 10.]

[Figure: Normal Distribution centered at µ = 0, with the horizontal axis marked from -3σ to 3σ; about 95% of the area lies within ±2σ and 99.74% within ±3σ.]
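The areas shown for the normal distribution can be computed from the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `within` is hypothetical. The computed values may differ slightly in the last digit from the rounded labels on the figure.

```python
from statistics import NormalDist

def within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"P(|X - mu| <= {k} sigma) = {within(k):.4f}")
```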