Chapter The Normal Probability Distribution 3 7 Section 7.1 Properties of the Normal Distribution 2010 Pearson Prentice Hall. All rights 2010 reserved Pearson Prentice Hall. All rights reserved 7-2 Illustrating the Uniform Distribution Suppose that United Parcel Service is supposed to deliver a package to your front door and the arrival time is somewhere between 10 am and 11 am. Let the random variable X represent the time from 10 am when the delivery is supposed to take place. The delivery could be at 10 am (x = 0) or at 11 am (x = 60) with all 1-minute interval of times between x = 0 and x = 60 equally likely. That is to say your package is just as likely to arrive between 10:15 and 10:16 as it is to arrive between 10:40 and 10:41. The random variable X can be any value in the interval from 0 to 60, that is, 0 < X < 60. Because any two intervals of equal length between 0 and 60, inclusive, are equally likely, the random variable X is said to follow a uniform probability distribution. 7-3 7-4 1
The graph below illustrates the properties for the time example. Notice the area of the rectangle is one and the graph is greater than or equal to zero for all x between 0 and 60, inclusive. Values of the random variable X less than 0 or greater than 60 are impossible, thus the equation must be zero for X less than 0 or greater than 60. Because the area of a rectangle is height times width, and the width of the rectangle is 60, the height must be 1/60. 7-5 7-6 How do we use density functions to find probabilities of continuous random variables? Area as a Probability The probability of choosing a time that is between 15 and 30 seconds after the minute is the area under the uniform density function. The area under the graph of the density function over an interval represents the probability of observing a value of the random variable in that interval. Area = P(15 < x < 30) = 15/60 = 0.25 15 30 7-7 7-8 2
Relative frequency histograms that are symmetric and bell-shaped are said to have the shape of a normal curve. If a continuous random variable is normally distributed, or has a normal probability distribution, then a relative frequency histogram of the random variable has the shape of a normal curve (bell-shaped and symmetric). mean = median = mode 7-9 7-10 How does the mean and the standard deviation change the position or shape of the normal curve? 7-11 7-12 3
A Normal Random Variable Below is data for the heights (in inches) of a random sample of 50 two-year old males. (a) Draw a histogram of the data using a lower class limit of the first class equal to 31.5 and a class width of 1. (b) Do you think that the variable height of 2-year old males is normally distributed? 7-13 7-14 In the next slide, we have a normal density curve drawn over the histogram. How does the area of the rectangle corresponding to a height between 34.5 and 35.5 inches relate to the area under the curve between these two heights? 7-15 7-16 4
7-17 7-18 Interpreting the Area Under a Normal Curve The weights of giraffes are approximately normally distributed with mean μ = 2200 pounds and standard deviation σ = 200 pounds. (a) Draw a normal curve with the parameters labeled. (b) Shade the area under the normal curve to the left of x = 2100 pounds. (c) Suppose that the area under the normal curve to the left of x = 2100 pounds is 0.3085. Provide two interpretations of this result. (a), (b) (c) The proportion of giraffes whose weight is less than 2100 pounds is 0.3085 The probability that a randomly selected giraffe weighs less than 2100 pounds is 0.3085. 7-19 7-20 5
Relation Between a Normal Random Variable and a Standard Normal Random Variable Section 7.2 The Standard Normal Distribution The weights of giraffes are approximately normally distributed with mean μ = 2200 pounds and standard deviation σ = 200 pounds. Draw a graph that demonstrates the area under the normal curve between 2000 and 2300 pounds is equal to the area under the standard normal curve between the Z- scores of 2000 and 2300 pounds. 7-21 7-22 7-23 7-24 6
The table gives the area under the standard normal curve for values to the left of a specified Z-score, z o, as shown in the figure. 7-25 7-26 Finding the Area Under the Standard Normal Curve Find the area under the standard normal curve to the left of z = -0.38. Finding the Area Under the Standard Normal Curve Area under the normal curve to the right of z o = 1 Area to the left of z o Find the area under the standard normal curve to the right of Z = 1.25. Area left of z = -0.38 is 0.3520. Area right of 1.25 = 1 area left of 1.25 = 1 0.8944 = 0.1056 7-28 7-27 7
Finding the Area Under the Standard Normal Curve Find the area under the standard normal curve between z = -1.02 and z = 2.94. Area between -1.02 and 2.94 = (Area left of z = 2.94) (area left of z = -1.02) = 0.9984 0.1539 = 0.8445 Using technology In TI-83: 2 nd VARS (distr) 2: normalcdf( Enter in left bound, right bound, mean, sd 7-29 7-30 Finding a z-score from a Specified Area to the Left Finding a z-score from a Specified Area to the Right Find the z-score such that the area to the left of the z-score is 0.7157. Find the z-score such that the area to the right of the z-score is 0.3021. The area left of the z-score is 1 0.3021 = 0.6979. The z-score such that the area to the left of the z-score is 0.7157 is z = 0.57. The approximate z-score that corresponds to an area of 0.6979 to the left (0.3021 to the right) is 0.52. Therefore, z = 0.52. 7-31 7-32 8
Finding a z-score from a Specified Area Find the z-scores that separate the middle 80% of the area under the normal curve from the 20% in the tails. The notation z α (prounounced z sub alpha ) is the z-score such that the area under the standard normal curve to the right of z α is α. Area = 0.8 Area = 0.1 Area = 0.1 z 1 is the z-score such that the area left is 0.1, so z 1 = -1.28. z 2 is the z-score such that the area left is 0.9, so z 2 = 1.28. 7-33 7-34 Find the value of z 0.25 Finding the Value of z Notation for the Probability of a Standard Normal Random Variable We are looking for the z-value such that the area to the right of the z-value is 0.25. This means that the area left of the z-value is 0.75. z 0.25 = 0.67 Using technology In TI-83: 2 nd VARS (distr) 3: invnorm( Enter in the area to the left of the desired z-score (or use symmetry) P(a < Z < b) P(Z > a) P(Z < a) represents the probability a standard normal random variable is between a and b represents the probability a standard normal random variable is greater than a. represents the probability a standard normal random variable is less than a. 7-35 7-36 9
Finding Probabilities of Standard Normal Random Variables Find each of the following probabilities: (a) P(Z < -0.23) (b) P(Z > 1.93) (c) P(0.65 < Z < 2.10) For any continuous random variable, the probability of observing a specific value of the random variable is 0. For example, for a standard normal random variable, P(a) = 0 for any value of a. This is because there is no area under the standard normal curve associated with a single value, so the probability must be 0. Therefore, the following probabilities are equivalent: P(a < Z < b) = P(a < Z < b) = P(a < Z < b) = P(a < Z < b) (a) P(Z < -0.23) = 0.4090 (b) P(Z > 1.93) = 0.0268 (c) P(0.65 < Z < 2.10) = 0.2399 So whether there is an equal sign in there or not, it doesn t matter!! (unlike what was happening with binomial probability distributions) 7-37 7-38 Section 7.3 Applications of the Normal Distribution 7-39 7-40 10
Finding the Probability of a Normal Random Variable It is known that the length of a certain steel rod is normally distributed with a mean of 100 cm and a standard deviation of 0.45 cm. * What is the probability that a randomly selected steel rod has a length less than 99.2 cm? 99.2 100 P( X 99.2) P Z 0.45 P Z 1.78 0.0375 Interpretation: If we randomly selected 100 steel rods, we would expect about 4 of them to be less than 99.2 cm. Finding the Probability of a Normal Random Variable It is known that the length of a certain steel rod is normally distributed with a mean of 100 cm and a standard deviation of 0.45 cm. What is the probability that a randomly selected steel rod has a length between 99.8 and 100.3 cm? 99.8 100 100.3 100 P(99.8 X 100.3) P Z 0.45 0.45 P 0.44 Z 0.67 0.4186 Interpretation: If we randomly selected 100 steel rods, we would expect about 42 of them to be between 99.8 cm and 100.3 cm. Using technology: normalcdf(lowerbound, upperbound, mean, sd) * Based upon information obtained from Stefan Wilk. 7-41 7-42 Finding the Percentile Rank of a Normal Random Variable The combined (verbal + quantitative reasoning) score on the GRE is normally distributed with mean 1049 and standard deviation 189. (Source: http://www.ets.org/media/tests/gre/pdf/994994.pdf.) The Department of Psychology at Columbia University in New York requires a minimum combined score of 1200 for admission to their doctoral program. (Source: www.columbia.edu/cu/gsas/departments/psychology/department.html.) What is the percentile rank of a student who earns a combined GRE score of 1300? The area under the normal curve is a probability, proportion, or percentile. Here, the area under the normal curve to the left of 1300 represents the percentile rank of the student. Area left of 1300 = Area left of (z = 1.33) = 0.91 (rounded to two decimal places) Interpretation: The student scored at the 91 st percentile. This means the student Finding the Proportion Corresponding to a Normal Random Variable It is known that the length of a certain steel rod is normally distributed with a mean of 100 cm and a standard deviation of 0.45 cm. Suppose the manufacturer must discard all rods less than 99.1 cm or longer than 100.9 cm. What proportion of rods must be discarded? The proportion is the area under the normal curve to the left of 99.1 cm plus the area under the normal curve to the right of 100.9 cm. Area left of 99.1 + area right of 100.9 = (Area left of z = -2) + (Area right of z = 2) = 0.0228 + 0.0228 = 0.0456 Interpretation: The proportion of rods that must be discarded is 0.0456. If the company manufactured 1000 rods, they would expect to discard about 46 of them. scored better than 91% of the students who took the GRE. 7-43 7-44 11
Finding the Value of a Normal Random Variable The combined (verbal + quantitative reasoning) score on the GRE is normally distributed with mean 1049 and standard deviation 189. (Source: http://www.ets.org/media/tests/gre/pdf/994994.pdf.) What is the score of a student whose percentile rank is at the 85 th percentile? The z-score that corresponds to the 85 th percentile is the z-score such that the area under the standard normal curve to the left is 0.85. This z-score is 1.04. x = µ + zσ = 1049 + 1.04(189) = 1246 Interpretation: We expect a student who is in the 85 th percentile to obtain a GRE score of 1246. 7-45 7-46 Finding the Value of a Normal Random Variable It is known that the length of a certain steel rod is normally distributed with a mean of 100 cm and a standard deviation of 0.45 cm. Suppose the manufacturer wants to accept 90% of all rods manufactured. Determine the length of rods that make up the middle 90% of all steel rods manufactured. z 1 = -1.645 and z 2 = 1.645 Section 7.4 Assessing Normality Area = 0.05 Area = 0.05 x 1 = µ + z 1 σ = 100 + (-1.645)(0.45) = 99.26 cm x 2 = µ + z 2 σ = 100 + (1.645)(0.45) = 100.74 cm Interpretation: The length of steel rods that make up the middle 90% of all steel rods manufactured would have lengths between 99.26 cm and 100.74 cm. 7-47 7-48 12
Suppose that we obtain a simple random sample from a population whose distribution is unknown. Many of the statistical tests that we perform on small data sets (sample size less than 30) require that the population from which the sample is drawn be normally distributed. Up to this point, we have said that a random variable X is normally distributed, or at least approximately normal, provided the histogram of the data is symmetric and bell-shaped. This method works well for large data sets, but the shape of a histogram drawn from a small sample of observations does not always accurately represent the shape of the population. For this reason, we need additional methods for assessing the normality of a random variable X when we are looking at sample data. A normal probability plot plots observed data versus normal scores. A normal score is the expected Z-score of the data value if the distribution of the random variable is normal. The expected Z- score of an observed value will depend upon the number of observations in the data set. 7-49 7-50 The idea behind finding the expected Z-score is that if the data comes from a population that is normally distributed, we should be able to predict the area left of each of the data values. The value of f i represents the expected area left of the i th data value assuming the data comes from a population that is normally distributed. For example, f 1 is the expected area left of the smallest data value, f 2 is the expected area left of the second smallest data value, and so on. We will be content in reading normal probability plots constructed using the statistical software package, MINITAB. In MINITAB, if the normal probability plot is roughly linear and all the points plotted lie within the bounds provided by the software, then we have reason to believe that the sample data comes from a population that is normally distributed. 7-51 7-52 13
Interpreting a Normal Probability Plot The following data represent the time between eruptions (in seconds) for a random sample of 15 eruptions at the Old Faithful Geyser in California. Is there reason to believe the time between eruptions is normally distributed? 728 678 723 735 735 730 722 708 708 714 726 716 736 736 719 The random variable time between eruptions is likely not normal. 7-53 7-54 Interpreting a Normal Probability Plot Suppose that seventeen randomly selected workers at a detergent factory were tested for exposure to a Bacillus subtillis enzyme by measuring the ratio of forced expiratory volume (FEV) to vital capacity (VC). NOTE: FEV is the maximum volume of air a person can exhale in one second; VC is the maximum volume of air that a person can exhale after taking a deep breath. Is it reasonable to conclude that the FEV to VC (FEV/VC) ratio is normally distributed? Source: Shore, N.S.; Greene R.; and Kazemi, H. Lung Dysfunction in Workers Exposed to Bacillus subtillis Enzyme, Environmental Research, 4 (1971), pp. 512-519. Reasonable to believe that FEV/VC is normally distributed. 7-55 7-56 14