The Chi-Square Distributions

MATH 183 The Chi-Square Distributions Dr. Neal, WKU

The chi-square distributions can be used in statistics to analyze the standard deviation $\sigma$ of a normally distributed measurement and to test the goodness of fit of various population models on a set of data. A chi-square distribution is based on a parameter known as the degrees of freedom $n$, where $n$ is an integer greater than or equal to 1. Such a random variable is denoted by $X \sim \chi^2(n)$. The $\chi^2(n)$ distribution is defined to be the sum of the squares of $n$ independent standard normal distributions.

For example, suppose $X_1, \ldots, X_n$ are independent normally distributed measurements having mean $\mu_i$ and standard deviation $\sigma_i$ for $i = 1, \ldots, n$. These measurements could be the heights or IQ scores of various groups of people. By subtracting the mean and then dividing by the standard deviation, we convert each measurement into a standard normal distribution:

$$Z_i = \frac{X_i - \mu_i}{\sigma_i} \sim N(0, 1), \quad \text{for } 1 \le i \le n.$$

So $Z_1 \sim N(0, 1)$, and its distribution graph is the common bell-shaped curve that is symmetric about the origin. Then $Z_1^2 \sim \chi^2(1)$. Its plot consists of positive values concentrated near the origin, and it has mean 1 and variance 2.

[Figure: density curves of the standard normal distribution, the $\chi^2(1)$ distribution, the $\chi^2(2)$ distribution, and the $\chi^2(n)$ distribution.]

By standardizing, squaring, and summing random measurements from the respective normal populations, we obtain a chi-square distribution with $n$ degrees of freedom:

$$\chi^2(n) = \left(\frac{X_1 - \mu_1}{\sigma_1}\right)^2 + \left(\frac{X_2 - \mu_2}{\sigma_2}\right)^2 + \cdots + \left(\frac{X_n - \mu_n}{\sigma_n}\right)^2 = Z_1^2 + Z_2^2 + \cdots + Z_n^2.$$

The distribution graphs for $n \ge 3$ are skewed bell-shaped curves, defined on $[0, \infty)$, whose maximum occurs at larger and larger values of $x$ as $n$ increases. The mean is now $n$, the variance is $2n$, and the standard deviation is $\sqrt{2n}$. For $n \ge 3$, the maximum (mode) occurs when $x = n - 2$.

Summary for $X \sim \chi^2(n) = Z_1^2 + Z_2^2 + \cdots + Z_n^2$:
Mean = $n$
Variance = $2n$
Standard Deviation = $\sqrt{2n}$
Mode = $n - 2$ (for $n \ge 3$)
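The definition as a sum of squared standard normals can be checked by simulation. The following is a minimal sketch in Python using only the standard library; the function name `simulate_chi_square`, the seed, the choice of 5 degrees of freedom, and the number of trials are all illustrative assumptions rather than part of the handout.

```python
import random
import statistics

def simulate_chi_square(n, trials=20000, seed=183):
    """Draw `trials` values of Z_1^2 + ... + Z_n^2 with independent Z_i ~ N(0, 1)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(trials)]

n = 5  # degrees of freedom (illustrative choice)
draws = simulate_chi_square(n)
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# The sample mean should land near n and the sample variance near 2n.
print(f"n = {n}: sample mean {sample_mean:.3f} (theory {n}), "
      f"sample variance {sample_var:.3f} (theory {2 * n})")
```

Because the values are simulated, the printed mean and variance only approximate $n$ and $2n$; the agreement tightens as the number of trials grows.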

The theoretical distribution curve is given by

$$f(x) = C_n \, x^{n/2 - 1} e^{-x/2}, \quad \text{for } x \ge 0,$$

where $C_n$ is a constant that depends on $n$, given by

$$C_n = \begin{cases} \dfrac{1}{2^{n/2} \left(\frac{n}{2} - 1\right)!} & \text{for } n \text{ even}, \\[2ex] \dfrac{2^{(n-2)/2} \left(\frac{n-1}{2}\right)!}{(n-1)! \, \sqrt{\pi}} & \text{for } n \text{ odd}. \end{cases}$$

A chi-square curve can be plotted using the built-in χ²pdf( command from the DISTR menu. For example, to graph the $\chi^2(10)$ curve, enter χ²pdf(X, 10) into the Y= screen. To compute $P(a \le X \le b)$ for $X \sim \chi^2(n)$, enter χ²cdf(a, b, n) or Shadeχ²(a, b, n).

Example 1. Let $X \sim \chi^2(10)$. (a) Where does the maximum of the curve occur? (b) Compute $P(6 \le X \le 10)$. Is there symmetry at the outer tails; i.e., does $P(0 \le X \le 6) = P(X \ge 10)$? (c) Find the left and right bounds that contain 90% of the distribution.

Solution. (a) For $X \sim \chi^2(10)$, the maximum (mode) occurs when $x = n - 2 = 8$.

(b) From the TI output, we see that $P(6 \le X \le 10) \approx 0.37477$. Also, the left tail is $P(0 \le X \le 6) \approx 0.1847$, and the right tail is $P(X \ge 10) \approx 0.4405$. So the two tails outside of the inner region $6 \le X \le 10$ are not symmetric.

(c) For there to be 90% in the middle of the distribution, we must have 5% at each tail. The values where these occur (chi-square scores) can be found with the table on the next page. In this case, the values are about 3.940 and 18.31.
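For readers without a TI calculator, the density and the probabilities in Example 1 can be reproduced with a short Python sketch. It assumes the equivalent form $C_n = 1 / (2^{n/2}\,\Gamma(n/2))$ of the constant and evaluates the cumulative probability with the standard power series for the lower incomplete gamma function. The names `chi2_pdf` and `chi2_cdf` are hypothetical helpers, not the calculator commands.

```python
import math

def chi2_pdf(x, n):
    """Density C_n * x^(n/2 - 1) * e^(-x/2), with C_n = 1 / (2^(n/2) * Gamma(n/2))."""
    if x <= 0:
        return 0.0  # outside the support (the single point x = 0 is ignored)
    a = n / 2.0
    return math.exp((a - 1) * math.log(x) - x / 2.0 - a * math.log(2.0) - math.lgamma(a))

def chi2_cdf(x, n, max_terms=10000):
    """P(X <= x) for X ~ chi-square(n), via the lower incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, t = n / 2.0, x / 2.0
    term = 1.0 / a          # k = 0 term of sum_k t^k / (a (a+1) ... (a+k))
    total = term
    for k in range(1, max_terms):
        term *= t / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))

n = 10
left = chi2_cdf(6, n)                      # P(0 <= X <= 6)
middle = chi2_cdf(10, n) - chi2_cdf(6, n)  # P(6 <= X <= 10)
right = 1.0 - chi2_cdf(10, n)              # P(X >= 10)

# Locate the mode on a grid of step 0.01 over (0, 30].
mode = max((k / 100 for k in range(1, 3001)), key=lambda x: chi2_pdf(x, n))

print(f"left tail {left:.4f}, middle {middle:.5f}, right tail {right:.4f}, mode near {mode:.2f}")
```

The series converges for every $x > 0$ but needs more terms as $x$ grows, so this sketch is meant for the moderate values used in the examples rather than as a general-purpose routine.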

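Left and Right Chi-Square Scores for 80%, 90%, 95%, and 98% intervals. (L = Prob. of Left Tail, R = Prob. of Right Tail)

| d.f. | L = 0.01 | L = 0.025 | L = 0.05 | L = 0.10 | R = 0.10 | R = 0.05 | R = 0.025 | R = 0.01 |
|------|----------|-----------|----------|----------|----------|----------|-----------|----------|
| 1 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 |
| 2 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 |
| 3 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.34 |
| 4 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.14 | 13.28 |
| 5 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.07 | 12.83 | 15.09 |
| 6 | 0.872 | 1.237 | 1.635 | 2.204 | 10.64 | 12.59 | 14.45 | 16.81 |
| 7 | 1.239 | 1.690 | 2.167 | 2.833 | 12.02 | 14.07 | 16.01 | 18.48 |
| 8 | 1.646 | 2.180 | 2.733 | 3.490 | 13.36 | 15.51 | 17.54 | 20.09 |
| 9 | 2.088 | 2.700 | 3.325 | 4.168 | 14.68 | 16.92 | 19.02 | 21.67 |
| 10 | 2.558 | 3.247 | 3.940 | 4.865 | 15.99 | 18.31 | 20.48 | 23.21 |
| 11 | 3.053 | 3.816 | 4.575 | 5.578 | 17.28 | 19.68 | 21.92 | 24.72 |
| 12 | 3.571 | 4.404 | 5.226 | 6.304 | 18.55 | 21.03 | 23.34 | 26.22 |
| 13 | 4.107 | 5.009 | 5.892 | 7.042 | 19.81 | 22.36 | 24.74 | 27.69 |
| 14 | 4.660 | 5.629 | 6.571 | 7.790 | 21.06 | 23.68 | 26.12 | 29.14 |
| 15 | 5.229 | 6.262 | 7.261 | 8.547 | 22.31 | 25.00 | 27.49 | 30.58 |
| 16 | 5.812 | 6.908 | 7.962 | 9.312 | 23.54 | 26.30 | 28.84 | 32.00 |
| 17 | 6.408 | 7.564 | 8.672 | 10.08 | 24.77 | 27.59 | 30.19 | 33.41 |
| 18 | 7.015 | 8.231 | 9.390 | 10.86 | 25.99 | 28.87 | 31.53 | 34.80 |
| 19 | 7.633 | 8.907 | 10.12 | 11.65 | 27.20 | 30.14 | 32.85 | 36.19 |
| 20 | 8.260 | 9.591 | 10.85 | 12.44 | 28.41 | 31.41 | 34.17 | 37.57 |
| 21 | 8.897 | 10.28 | 11.59 | 13.24 | 29.62 | 32.67 | 35.48 | 38.93 |
| 22 | 9.542 | 10.98 | 12.34 | 14.04 | 30.81 | 33.92 | 36.78 | 40.29 |
| 23 | 10.20 | 11.69 | 13.09 | 14.85 | 32.01 | 35.17 | 38.08 | 41.64 |
| 24 | 10.86 | 12.40 | 13.85 | 15.66 | 33.20 | 36.42 | 39.36 | 42.98 |
| 25 | 11.52 | 13.12 | 14.61 | 16.47 | 34.38 | 37.65 | 40.65 | 44.31 |
| 26 | 12.20 | 13.84 | 15.38 | 17.29 | 35.56 | 38.88 | 41.92 | 45.64 |
| 27 | 12.88 | 14.57 | 16.15 | 18.11 | 36.74 | 40.11 | 43.19 | 46.96 |
| 28 | 13.56 | 15.31 | 16.93 | 18.94 | 37.92 | 41.34 | 44.46 | 48.28 |
| 29 | 14.26 | 16.05 | 17.71 | 19.77 | 39.09 | 42.56 | 45.72 | 49.59 |
| 30 | 14.95 | 16.79 | 18.49 | 20.60 | 40.26 | 43.77 | 46.98 | 50.89 |
| 40 | 22.16 | 24.43 | 26.51 | 29.05 | 51.80 | 55.76 | 59.34 | 63.69 |
| 50 | 29.71 | 32.36 | 34.76 | 37.69 | 63.17 | 67.50 | 71.42 | 76.15 |
| 60 | 37.48 | 40.48 | 43.19 | 46.46 | 74.40 | 79.08 | 83.30 | 88.38 |
| 70 | 45.44 | 48.76 | 51.74 | 55.33 | 85.53 | 90.53 | 95.02 | 100.4 |
| 80 | 53.54 | 57.15 | 60.39 | 64.28 | 96.58 | 101.9 | 106.6 | 112.3 |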
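Scores for degrees of freedom or tail probabilities not listed in the table can be approximated numerically. The sketch below, in Python with only the standard library, inverts the cumulative distribution by bisection. A left-tail probability L corresponds to p = L, and a right-tail probability R corresponds to p = 1 - R. The helper names `chi2_cdf` and `chi2_score` are assumptions made for illustration.

```python
import math

def chi2_cdf(x, df, max_terms=10000):
    """P(X <= x) for X ~ chi-square(df), via the lower incomplete gamma series."""
    if x <= 0:
        return 0.0
    a, t = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= t / (a + k)
        total += term
        if term < 1e-16 * total:
            break
    return total * math.exp(-t + a * math.log(t) - math.lgamma(a))

def chi2_score(p, df, tol=1e-9):
    """Value x with P(X <= x) = p, located by bisection on the CDF."""
    lo, hi = 0.0, float(df)
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains the score
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Scores for a 90% interval with 10 degrees of freedom: 5% in each tail.
left_score = chi2_score(0.05, 10)
right_score = chi2_score(1 - 0.05, 10)
print(f"L = 0.05 score: {left_score:.3f}, R = 0.05 score: {right_score:.2f}")
```

With 10 degrees of freedom this should agree with the scores 3.940 and 18.31 quoted in Example 1, up to rounding.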

Theorems

I. Let $\{x_1, x_2, \ldots, x_n\}$ denote the collection of all random samples of size $n$ from normally distributed measurements having variance $\sigma^2$. Let

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

be the distribution of all possible sample variances. Then $\dfrac{(n-1)S^2}{\sigma^2}$ is a $\chi^2(n-1)$ distribution. Thus with a normally distributed measurement, we can evaluate $P(a \le S \le b)$ by

$$P(a \le S \le b) = P(a^2 \le S^2 \le b^2) = P\left(\frac{(n-1)a^2}{\sigma^2} \le \frac{(n-1)S^2}{\sigma^2} \le \frac{(n-1)b^2}{\sigma^2}\right) = P\left(\frac{(n-1)a^2}{\sigma^2} \le \chi^2(n-1) \le \frac{(n-1)b^2}{\sigma^2}\right),$$

provided $\sigma$ is known.

II. Let $S^2$ be the sample variance from a random sample of size $n$ of a normally distributed measurement having variance $\sigma^2$. A confidence interval for $\sigma^2$, with level of confidence $r = 1 - \alpha$, is given by

$$\frac{(n-1)S^2}{R} \le \sigma^2 \le \frac{(n-1)S^2}{L},$$

where $L$ and $R$ are the left and right bounds of the $\chi^2(n-1)$ distribution that give probability $r$ in the middle. A confidence interval for $\sigma$ is

$$\sqrt{\frac{(n-1)S^2}{R}} \le \sigma \le \sqrt{\frac{(n-1)S^2}{L}}.$$

III. To test the null hypothesis $H_0: \sigma = M$ for a normally distributed measurement, we obtain the sample deviation $S$ from a random sample of size $n$. The test statistic is then

$$x = \frac{(n-1)S^2}{\sigma^2} = \frac{(n-1)S^2}{M^2},$$

which is compared with the $\chi^2(n-1)$ distribution. Compute the (left-tail) P-value $P(\chi^2(n-1) \le x)$ for the alternative $H_a: \sigma < M$, and compute the (right-tail) P-value $P(\chi^2(n-1) \ge x)$ for the alternative $H_a: \sigma > M$.
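Once the chi-square scores have been read from a table, Theorems II and III reduce to a few lines of arithmetic. The following Python sketch shows one way to organize that arithmetic; the function names are hypothetical. The inputs are illustrative: n = 20 and S = 3.96 with the 98% scores 7.633 and 36.19 for 19 degrees of freedom, and n = 25, S = 13.96, M = 15 for the test statistic.

```python
import math

def sigma_confidence_interval(n, s, left_score, right_score):
    """Interval sqrt((n-1)S^2 / R) <= sigma <= sqrt((n-1)S^2 / L) from Theorem II."""
    scaled = (n - 1) * s ** 2
    return math.sqrt(scaled / right_score), math.sqrt(scaled / left_score)

def variance_test_statistic(n, s, m):
    """Test statistic x = (n-1)S^2 / M^2 from Theorem III."""
    return (n - 1) * s ** 2 / m ** 2

# 98% interval: L and R are the chi-square(19) scores with 1% in each tail.
low, high = sigma_confidence_interval(n=20, s=3.96, left_score=7.633, right_score=36.19)
print(f"{low:.4f} <= sigma <= {high:.4f}")

# Statistic for H0: sigma = 15, to be compared with the chi-square(24) distribution.
x = variance_test_statistic(n=25, s=13.96, m=15)
print(f"test statistic x = {x:.5f}")
```

Note that the right score R gives the lower end of the interval and the left score L gives the upper end, because the scores appear in the denominators.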

Example 2. Random samples of size 46 are taken from a measurement that is $N(100, 15)$. What is $P(13 \le S \le 17)$?

Example 3. From a normally distributed measurement, a sample of size 20 yields $S = 3.96$. Find a 98% confidence interval for the true standard deviation $\sigma$.

Example 4. From a normally distributed measurement, a sample of size 25 yields a sample deviation of 13.96. Is there evidence to reject the hypothesis $H_0: \sigma = 15$?

Solutions

Example 2:

$$P(13 \le S \le 17) = P(13^2 \le S^2 \le 17^2) = P\left(\frac{(n-1)13^2}{\sigma^2} \le \frac{(n-1)S^2}{\sigma^2} \le \frac{(n-1)17^2}{\sigma^2}\right) = P\left(\frac{45 \cdot 169}{225} \le \chi^2(n-1) \le \frac{45 \cdot 289}{225}\right) = P(33.8 \le \chi^2(45) \le 57.8) \approx 0.794$$

(using χ²cdf(33.8, 57.8, 45)).

Example 3: $\sqrt{\dfrac{(n-1)S^2}{R}} \le \sigma \le \sqrt{\dfrac{(n-1)S^2}{L}}$; hence,

$$\sqrt{\frac{19 \cdot 3.96^2}{36.19}} \le \sigma \le \sqrt{\frac{19 \cdot 3.96^2}{7.633}}, \quad \text{or} \quad 2.8693 \le \sigma \le 6.24776.$$

Example 4: For $S = 13.96$, we use the alternative $H_a: \sigma < 15$. The test statistic is

$$x = \frac{(n-1)S^2}{\sigma^2} = \frac{24 \cdot 13.96^2}{15^2} = 20.78737 \sim \chi^2(n-1) = \chi^2(24),$$

and $P(\chi^2(24) \le 20.78737) \approx 0.348765$ (using χ²cdf(0, 20.78737, 24)). If $\sigma = 15$ were true, then there is still a 34.8765% chance of obtaining a sample deviation of 13.96 or lower with a sample of size 25. There is not enough evidence to reject $H_0$.

Exercises

1. Let $X \sim \chi^2(15)$. Find (a) $P(13 \le X \le 17)$, (b) $P(X < 13)$, and (c) $P(X > 17)$. Show a graph for each. (d) Find the bounds that contain 95% of the distribution.

2. Adult heights are found to be normally distributed with mean $\mu = 68$ inches and standard deviation $\sigma = 3.5$ inches. Suppose various random samples of size $n = 26$ are collected. Compute $P(2.8 \le S \le 4.2)$.

3. From a normally distributed measurement, a sample of size 25 yields a sample deviation of 14.85. Find a 95% confidence interval for the true standard deviation.

4. From a normally distributed measurement, a sample of size 16 yields $S = 4.26$. Is there evidence to reject the hypothesis $H_0: \sigma = 3$?

Answers:

1. (a) 0.2834 (b) 0.3977 (c) 0.3189 (d) $L = 6.262$ and $R = 27.49$.

2. $P\left(\dfrac{25 \cdot 2.8^2}{3.5^2} \le \chi^2(25) \le \dfrac{25 \cdot 4.2^2}{3.5^2}\right) = P(16 \le \chi^2(25) \le 36) \approx 0.843$.

3. Use $\sqrt{\dfrac{24 \cdot 14.85^2}{39.36}} \le \sigma \le \sqrt{\dfrac{24 \cdot 14.85^2}{12.40}}$ to obtain $11.6 \le \sigma \le 20.66$.

4. Test stat = 30.246, and $P(\chi^2(15) \ge 30.246) \approx 0.011$. If $\sigma = 3$ were true, then there is only a 1.1% chance of getting an $S$ of 4.26 or higher with a sample of size 16. We can reject $H_0$ in favor of $H_a: \sigma > 3$.