Math 3070. Treibergs Final Exam Name: December 7, 00.

1. In an experiment to see how hypertension is related to smoking habits, the following data were taken on 111 individuals. Test the hypothesis that the proportions of hypertension in the two groups are not equal. Use a .05 level of significance. State the null hypothesis, the test statistic, why your statistic is appropriate, the rejection region, your computation and conclusion.

Smoking Habits     Moderate Smokers   Heavy Smokers   Total
Hypertension              36                30           66
No Hypertension           26                19           45
Total                     62                49          111

Let $X_1$ be the number of hypertensive individuals among $n_1$ moderate smokers and $X_2$ the number of hypertensives among $n_2$ heavy smokers. Thus $X_i \sim \mathrm{Bin}(n_i, p_i)$, where $p_i$ is the population proportion of hypertensives in the $i$-th group. The null and alternative hypotheses are

$$H_0: p_1 = p_2; \qquad H_a: p_1 \ne p_2.$$

Because the proportions are equal under the null hypothesis, the appropriate statistic uses the pooled estimate for $p_i$. Since we have large samples, $n_1\hat p_1 = 36 > n_1\hat q_1 = 26 > 10$ and $n_2\hat p_2 = 30 > n_2\hat q_2 = 19 > 10$, by rule of thumb the z-test is appropriate:

$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p\,\hat q\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \text{where } \hat p_i = \frac{X_i}{n_i}, \quad \hat p = \frac{X_1 + X_2}{n_1 + n_2}, \quad \hat q = 1 - \hat p.$$

The null hypothesis is rejected if $|Z| \ge z_{\alpha/2}$. For $\alpha = .05$, $z_{.025} = 1.960$. We find $\hat p = 66/111$ and

$$Z = \frac{\dfrac{36}{62} - \dfrac{30}{49}}{\sqrt{\dfrac{66}{111}\cdot\dfrac{45}{111}\left(\dfrac{1}{62} + \dfrac{1}{49}\right)}} = -.337.$$

Since $|Z| < z_{\alpha/2}$, we cannot reject the null hypothesis. The evidence is not strong that the proportion of hypertensives among moderate smokers differs from the proportion of hypertensives among heavy smokers.
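The pooled two-proportion computation in Problem 1 can be checked numerically. The following is a minimal sketch, assuming the counts are 36 hypertensives out of 62 moderate smokers and 30 out of 49 heavy smokers; the function name `pooled_two_proportion_z` is a hypothetical helper, not part of the exam solution.

```python
from math import sqrt
from statistics import NormalDist


def pooled_two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (hypothetical helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, so pool the counts.
    p_hat = (x1 + x2) / (n1 + n2)
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


# Assumed counts: 36 of 62 moderate smokers, 30 of 49 heavy smokers.
z = pooled_two_proportion_z(36, 62, 30, 49)

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
reject = abs(z) >= z_crit

print(f"Z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

Under these assumed counts the statistic falls well inside the acceptance region, in agreement with the conclusion above.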
2. The Wanship Widgets Company is interested in evaluating its current inspection procedure on shipments of 50 identical widgets. The procedure is to take a sample of 5 from the 50 and pass the shipment if no more than two are found to be defective. What is the probability that the shipment will be accepted even though actually 20% of the widgets in the shipment are defective?

Let $X$ be the number of defective widgets in the sample of $n = 5$ taken from $N = 50$ without replacement. Suppose the shipment has $M$ defective widgets. Thus $X$ is a hypergeometric variable, $X \sim \mathrm{Hypergeometric}(n, N, M = pN)$, where $p$ is the proportion of defectives in the shipment. If $p = .2$ then $M = 10$. The probability of accepting the shipment is

$$\beta = P(X \le 2) = \mathrm{hyp}(0, 5, 50, 10) + \mathrm{hyp}(1, 5, 50, 10) + \mathrm{hyp}(2, 5, 50, 10) = \frac{\binom{10}{0}\binom{40}{5}}{\binom{50}{5}} + \frac{\binom{10}{1}\binom{40}{4}}{\binom{50}{5}} + \frac{\binom{10}{2}\binom{40}{3}}{\binom{50}{5}} = .952.$$

3. Extensive records of Wanship Widgets Co. indicate that the numbers of days to fill orders for their widget controller were

3 8 0 9 8 7 4 7 0 8 7 3 9

Do the data strongly indicate that such orders take longer than 12 days to fill? State the null and alternative hypotheses. State the test statistic and rejection region. What assumptions are you making about the data and how well are they satisfied? Perform the computation and state your conclusion. [Hint: $\bar X = 13.9$ and $S = 3.354$.]

Let $X$ be the number of days to fill the controller order. Let $\mu$ be the population mean of the number of days to fill the order. The null and alternative hypotheses are

$$H_0: \mu = 12; \qquad H_a: \mu > 12.$$

The test statistic is

$$T = \frac{\bar X - 12}{S/\sqrt{n}}.$$
Since we have a small sample ($n \le 40$) and the population standard deviation $\sigma$ is unknown, we use the t-test. The assumption we make in this problem is that the sample comes from a normal distribution. The normal PP-plot shows that the points line up pretty well, so there is no strong indication that the distribution is not normal, although there is an S shape indicating heavy tails. (In fact the data was generated by a random t-distribution!) The null hypothesis is rejected at the $\alpha$ level of significance if $t \ge t_{\nu,\alpha}$, where $\nu = n - 1 = 19$ is the degrees of freedom. We compute using the hint

$$t = \frac{13.9 - 12}{3.354/\sqrt{20}} = 2.533.$$

From Table A.8, for $\nu = 19$ d.f., $P(T > 2.5) = .011$ and $P(T > 2.6) = .009$. Thus, interpolating, $P = P(T > 2.533) = .010$. From the table of t critical values, $t_{19,.025} = 2.093$ and $t_{19,.01} = 2.539$, so $.01 < P < .025$. Thus at the $\alpha = .05$ significance level, we reject the null hypothesis: there is strong evidence that it takes longer than 12 days to fill the controller order.

4. Let $X$ denote the number of years that a Wanship Widget operates before it ceases to function. Assume that the probability density function (pdf) is

$$f(x) = \begin{cases} \frac{1}{3} e^{-x/3}, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0. \end{cases}$$

Find the cumulative distribution function (cdf) for $X$. Find the probability that the widget operates at least four years. Find the probability that the widget continues to operate at least another four years given that it has been functioning for two years.

Observe that $X$ is an exponential variable, $X \sim \mathrm{Exponential}(\lambda = \frac{1}{3})$. The cdf is by definition

$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \begin{cases} \displaystyle\int_0^x \tfrac{1}{3} e^{-t/3}\,dt = 1 - e^{-x/3}, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0. \end{cases}$$

The probability that the widget lasts at least four years is

$$P(X \ge 4) = 1 - P(X < 4) = 1 - F(4) = 1 - (1 - e^{-4/3}) = .264.$$

The probability that the widget continues to operate at least another four years given that it has been functioning for two years is, by the conditional probability formula,

$$P(X \ge 6 \mid X \ge 2) = \frac{P(\{X \ge 6\} \cap \{X \ge 2\})}{P(X \ge 2)} = \frac{P(X \ge 6)}{P(X \ge 2)} = \frac{1 - P(X < 6)}{1 - P(X < 2)} = \frac{1 - F(6)}{1 - F(2)} = \frac{1 - (1 - e^{-6/3})}{1 - (1 - e^{-2/3})} = e^{-4/3} = .264.$$

Equality in (b.) and (c.) is known as the memoryless property of exponential variables.
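The memoryless property in Problem 4 can be verified directly from the cdf. This is a small sketch, assuming the pdf is read as $\frac{1}{3}e^{-x/3}$ (mean lifetime 3 years); the names `cdf`, `survival` and `MEAN_LIFE` are illustrative only.

```python
from math import exp

MEAN_LIFE = 3.0  # years; assumes the pdf (1/3) * exp(-x/3), i.e. rate 1/3


def cdf(x, mean=MEAN_LIFE):
    """F(x) = 1 - exp(-x/mean) for x >= 0, and 0 for x < 0."""
    return 1.0 - exp(-x / mean) if x >= 0 else 0.0


def survival(x, mean=MEAN_LIFE):
    """P(X >= x) = 1 - F(x); X is continuous, so P(X = x) = 0."""
    return 1.0 - cdf(x, mean)


# Probability of lasting at least four years.
p_at_least_4 = survival(4)

# Probability of lasting four more years given two years already survived.
p_cond = survival(6) / survival(2)

print(f"P(X >= 4)          = {p_at_least_4:.3f}")
print(f"P(X >= 6 | X >= 2) = {p_cond:.3f}")
```

Both quantities equal $e^{-4/3}$ up to floating-point rounding, which is the memoryless property stated above.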
5. A random sample of Dan's customers shows that of 30 shoppers, 21 of them regularly used cents-off coupons. Construct a 99% one-sided upper confidence interval for the proportion of customers who use coupons. Finish the following statement that describes your confidence interval: "There is a 99% chance that..."

Let $X$ be the number of coupon users. $X \sim \mathrm{Binomial}(n, p)$, where $p$ is the population proportion of coupon users. Since we have a small sample, $n\hat q = 9 < 10$, we cannot use the traditional large-sample CI. Instead we use the unrestricted one-sided upper confidence bound

$$p < \frac{\hat p + \dfrac{z_\alpha^2}{2n} + z_\alpha\sqrt{\dfrac{\hat p\hat q}{n} + \dfrac{z_\alpha^2}{4n^2}}}{1 + \dfrac{z_\alpha^2}{n}}.$$
Since $z_{.01} = 2.326$, $\hat p = 21/30 = .7$, $\hat q = .3$, $n = 30$ we find

$$p < \frac{.7 + \dfrac{2.326^2}{60} + 2.326\sqrt{\dfrac{(.7)(.3)}{30} + \dfrac{2.326^2}{3600}}}{1 + \dfrac{2.326^2}{30}} = .851.$$

The following statement describes the interval: "There is a 99% chance that our interval has captured $p$." The statement "There is a 99% chance that $p < .851$" is vague and imprecise. On the one hand it can be interpreted as my statement, but it can also be interpreted as $.99 = P(p < .851)$, which is false because, although in this statement we intend that $.851$ be the random variable, it is a fixed number.

6. Suppose that 10% of all shafts used in widget construction are nonconforming. A group of 200 shafts is randomly selected. What is the exact probability that not more than 30 are nonconforming? Give the formula (as a sum) but do not evaluate. Compute the probability using the best approximation. Why is your approximation valid?

Let $X$ denote the number of nonconforming shafts in the sample of $n = 200$. $X \sim \mathrm{Binomial}(n = 200, p = .10)$, where $p$ is the population proportion of nonconforming shafts. Then the probability that not more than 30 are nonconforming is

$$P = P(X \le 30) = \mathrm{Binomial}(30, 200, .1) = \sum_{i=0}^{30} \binom{200}{i} (.1)^i (.9)^{200-i}.$$

The rule of thumb for Poisson approximation is not satisfied, although the rule for normal approximation is: $nq > np = 200(.1) = 20 > 10$. The best approximation includes the continuity correction

$$P(X \le x) \approx \Phi\left(\frac{x + .5 - np}{\sqrt{npq}}\right).$$

Thus we find

$$P(X \le 30) \approx \Phi\left(\frac{30 + .5 - 20}{\sqrt{200(.1)(.9)}}\right) = \Phi(2.475).$$

Since $\Phi(2.47) = .9932$ and $\Phi(2.48) = .9934$, by interpolation, $\Phi(2.475) = .9933$.

7. To compare two different manufacturers of widget shafts, two samples of sizes $n_1$ and $n_2$ were ordered from the first and second manufacturers. Suppose $X_1$ of the first sample were nonconforming and $X_2$ of the second sample were nonconforming. Show that

$$\hat\theta = \frac{X_1}{n_1} - \frac{X_2}{n_2}$$

is an unbiased estimator for $\theta = p_1 - p_2$, where $p_i$ is the population proportion of nonconforming shafts from manufacturer $i$. Derive the formula for the standard error of $\hat\theta$. What assumptions are you making in your derivation?

The statistic $\hat\theta$ is unbiased if $E(\hat\theta) = \theta$.
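We are given that $X_i \sim \mathrm{Binomial}(n_i, p_i)$ are binomial variables. Hence $E(X_i) = n_i p_i$ and $V(X_i) = n_i p_i q_i$. Using the linearity of expectation,

$$E(\hat\theta) = E\left(\frac{X_1}{n_1} - \frac{X_2}{n_2}\right) = \frac{E(X_1)}{n_1} - \frac{E(X_2)}{n_2} = \frac{n_1 p_1}{n_1} - \frac{n_2 p_2}{n_2} = p_1 - p_2.$$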
The standard error is $\sigma_{\hat\theta} = \sqrt{V(\hat\theta)}$. Let us assume that $X_1$ and $X_2$ are independent. Then the formula for the variance of a linear combination yields

$$\sigma_{\hat\theta}^2 = V\left(\frac{X_1}{n_1} - \frac{X_2}{n_2}\right) = \frac{V(X_1)}{n_1^2} + \frac{V(X_2)}{n_2^2} = \frac{n_1 p_1 q_1}{n_1^2} + \frac{n_2 p_2 q_2}{n_2^2} = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2},$$

so that $\sigma_{\hat\theta} = \sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}$.

8. In the study Vitamin C Retention in Reconstituted Frozen Orange Juice (VPI Department of Human Nutrition and Foods, 97), levels of vitamin C were measured for twelve samples at two different time periods (3, 7 days) between when OJ concentrate was blended and when it was tested. Response is mg/l ascorbic acid. Test whether the data show that there is a difference in mean vitamin levels at different times. State the assumptions. State the null and alternative hypotheses, test statistic, rejection region and your conclusion.

Sample   3 days   7 days     D
------   ------   ------   -----
   1      49.4     42.7     6.7
   2      49.2     48.8     0.4
   3      42.8     40.4     2.4
   4      53.2     47.6     5.6
   5      48.8     49.2    -0.4
   6      44.0     44.0     0.0
   7      44.0     42.0     2.0
   8      42.4     43.2    -0.8
   9      48.0     48.5    -0.5
  10      47.0     43.3     3.7
  11      48.      4.       3.0
  12      49.6     47.6     2.0

Let $X_i$ be the acid level for the $i$-th sample measured at three days and $Y_i$ be the acid level of the same sample measured at seven days. $X_i$ and $Y_i$ are therefore not independent, and we must use the paired test. We have recorded the random variable $D_i = X_i - Y_i$ in the table. The sample is small ($n \le 40$) and $\sigma_D$ is unknown, so we use a paired t-test. We assume that $D_i$ is a random sample taken from a normal distribution, $D_i \sim N(\mu_D, \sigma_D)$. The null and alternative hypotheses are

$$H_0: \mu_D = 0; \qquad H_a: \mu_D \ne 0.$$

The test statistic is

$$T = \frac{\bar D}{S_D/\sqrt{n}}.$$

The null hypothesis is rejected at the $\alpha$ level of significance if $|t| \ge t_{\nu,\alpha/2}$, since it is a two-tailed test, where $\nu = n - 1 = 11$ is the degrees of freedom. The calculator gives $\bar D = 2.008333333$ and $S_D = 2.440364478$. Hence

$$t = \frac{2.008}{2.440/\sqrt{12}} = 2.851.$$

Thus, interpolating Table A.8, $P(T > 2.851) = .008$, so the two-tailed $P$-value is $.016$. Similarly we could have used the table of t critical values: $t_{11,.01} = 2.718$ and $t_{11,.005} = 3.106$, so interpolating, $t_{11,.008} = 2.851$. Thus the mean ascorbic acid level at the two times is significantly different.
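The paired t computation in Problem 8 can be reproduced from the difference column alone. This is a sketch under the assumption that the twelve differences are 6.7, 0.4, 2.4, 5.6, -0.4, 0.0, 2.0, -0.8, -0.5, 3.7, 3.0, 2.0; the critical values are the tabulated ones quoted in the solution, not computed here.

```python
from math import sqrt
from statistics import mean, stdev

# Assumed differences D_i = X_i - Y_i (3-day level minus 7-day level).
d = [6.7, 0.4, 2.4, 5.6, -0.4, 0.0, 2.0, -0.8, -0.5, 3.7, 3.0, 2.0]

n = len(d)
d_bar = mean(d)          # sample mean of the differences
s_d = stdev(d)           # sample standard deviation (n - 1 in the denominator)
t = d_bar / (s_d / sqrt(n))

# Tabulated critical values for 11 degrees of freedom, quoted from the solution.
T_11_01 = 2.718          # upper-tail area .01
T_11_005 = 3.106         # upper-tail area .005

print(f"n = {n}, mean = {d_bar:.6f}, sd = {s_d:.6f}, t = {t:.3f}")
# t between the two critical values means the one-tail area is between .005 and .01,
# so the two-tailed P-value lies between .01 and .02.
print("one-tail area between .005 and .01:", T_11_01 < t < T_11_005)
```

With these assumed differences the mean and standard deviation match the calculator values quoted in the solution, which is a useful check on the table entries.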
9. The Wanship Widgets Company wishes to test whether the average tensile strength of steel wire exceeds the tensile strength of iron wire by at least 10 kilograms. To test the claim, 50 pieces of each type of wire are tested under similar conditions. The steel wire had an average tensile strength of 88.9 kilograms with a standard deviation of 6.28 kilograms, while the iron wire had an average strength of 76.9 kilograms with a standard deviation of 5.61 kilograms. State the null and alternative hypotheses, the test statistic, and the rejection region. Test at the .05 level of significance and state your conclusion. Why is your statistic valid? Compute the probability that the test will reject the null hypothesis assuming that the mean tensile strength of the steel wire is actually 14 kilograms higher than that of the iron. Assume the same standard deviations.

Let $X_i$ be the tensile strength of the steel wire and $Y_j$ the tensile strength of the iron wire. We are assuming that all the $X_i$'s and $Y_j$'s are independent, that $X_i$ comes from a distribution with population mean $\mu_1$ and population standard deviation $\sigma_1$, and $Y_j$ comes from a distribution with population mean $\mu_2$ and population standard deviation $\sigma_2$. Since both samples are large, $n_1 = n_2 = 50 > 40$, we may use the two-sample z-test, even though the $\sigma_i$'s are unknown. We also cannot assume that $\sigma_1 = \sigma_2$, so we must not pool the variance computation. The null and alternative hypotheses are

$$H_0: \mu_1 - \mu_2 = \Delta_0 = 10; \qquad H_a: \mu_1 - \mu_2 > 10.$$

The test statistic is

$$Z = \frac{\bar X - \bar Y - \Delta_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}.$$

The null hypothesis is rejected at the $\alpha$ level of significance if $z \ge z_\alpha$, since it is a one-tailed test. We have for $\alpha = .05$, $z_{.05} = 1.645$. Computing,

$$z = \frac{88.9 - 76.9 - 10.0}{\sqrt{\dfrac{6.28^2}{50} + \dfrac{5.61^2}{50}}} = 1.679.$$

Thus $1.679 > 1.645$, so the null hypothesis is rejected: there is strong evidence that the mean tensile strength of steel wire exceeds the mean tensile strength of iron wire by more than 10 kilograms.
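If the actual difference is $\Delta' = 14$ kilograms, then the probability of accepting the null hypothesis is

$$\beta(\Delta') = P\left(Z < z_\alpha \mid \mu_1 - \mu_2 = \Delta'\right) = P\left(\bar X - \bar Y < \Delta_0 + z_\alpha\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \;\middle|\; \mu_1 - \mu_2 = \Delta'\right)$$

$$= P\left(\frac{\bar X - \bar Y - \Delta'}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} < \frac{\Delta_0 - \Delta'}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} + z_\alpha \;\middle|\; \mu_1 - \mu_2 = \Delta'\right) \approx \Phi\left(\frac{10 - 14}{\sqrt{\dfrac{6.28^2}{50} + \dfrac{5.61^2}{50}}} + 1.645\right) = \Phi(-1.714) = .0432.$$

We have interpolated the standard normal table: $\Phi(-1.71) = .0436$ and $\Phi(-1.72) = .0427$. Hence the probability that the test rejects the null hypothesis is $1 - \beta(14) = .9568$.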
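The test statistic and the type II error probability in Problem 9 can be recomputed as follows. This is a sketch that assumes sample sizes of 50 each and sample standard deviations 6.28 and 5.61, and that treats the standard error as known when evaluating $\beta$, as the solution does. The function name `beta` is illustrative. Because the normal cdf is evaluated exactly rather than by table interpolation, the last digit may differ slightly from the tabulated answer.

```python
from math import sqrt
from statistics import NormalDist

# Assumed summary statistics for the steel (1) and iron (2) wire samples.
x_bar, y_bar = 88.9, 76.9
s1, s2 = 6.28, 5.61
n1 = n2 = 50
delta0 = 10.0        # hypothesized difference under H0
alpha = 0.05

std_normal = NormalDist()
se = sqrt(s1**2 / n1 + s2**2 / n2)          # unpooled standard error
z = (x_bar - y_bar - delta0) / se           # observed test statistic
z_alpha = std_normal.inv_cdf(1 - alpha)     # one-tailed critical value
reject = z >= z_alpha


def beta(delta_true):
    """P(fail to reject H0) when the true mean difference is delta_true."""
    return std_normal.cdf(z_alpha + (delta0 - delta_true) / se)


beta_14 = beta(14.0)
power_14 = 1.0 - beta_14

print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
print(f"beta(14) = {beta_14:.4f}, power = {power_14:.4f}")
```

Evaluating `beta(10.0)` returns $1 - \alpha$, which is a quick sanity check that the function reduces to the acceptance probability under the null hypothesis.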