Section 5.4
Ken Ueda

Students seem to think that being graded on a curve is a positive thing. I took lasers 101 at Cornell and got a 92 on the exam. The average was a 93. I ended up with a C on the test. -Randall Scalise

Suppose we have 10,000 data points, all numbers between -4.5 and 4.5. The actual numbers aren't really important at the moment. Let's take a look at a histogram that contains these 10,000 data points:

[Figure: histogram of the 10,000 data points with 8 bins]

This histogram has 8 bins (or 8 bars). We can increase the number of bins to 20:

[Figure: the same data with 20 bins]
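We increase the number of bins again to 40:

[Figure: the same data with 40 bins]

And again to 80:

[Figure: the same data with 80 bins]

Notice that as we increase the number of bins, which is to say, make the bar widths smaller, the tops of the bars approach a smooth curve:

[Figure: smooth bell-shaped curve traced by the histogram]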
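The bins-to-curve idea can be tried out in a few lines of Python. This is a minimal sketch under stated assumptions: the notes do not list the 10,000 values, so the data below is simulated with `random.gauss` (hypothetical data, not the data behind the pictures), and `histogram` is a hypothetical helper. Each bar is rescaled to count / (n × width) so its height can be compared with the height of the smooth curve, which Python 3.8+ provides through `statistics.NormalDist`.

```python
import random
from statistics import NormalDist

# Hypothetical data: simulated standard normal draws, kept only if they lie
# between -4.5 and 4.5 (the notes do not give the real values).
random.seed(0)
data = [x for x in (random.gauss(0, 1) for _ in range(10_000)) if -4.5 <= x <= 4.5]

def histogram(values, bins, low=-4.5, high=4.5):
    """Count how many values fall into each of `bins` equal-width bins."""
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - low) / width), bins - 1)  # a value equal to `high` goes in the last bin
        counts[i] += 1
    return counts, width

curve = NormalDist(0, 1)
for bins in (8, 20, 40, 80):
    counts, width = histogram(data, bins)
    # Rescale bars so their total area is 1, then compare the tallest bar
    # with the peak height of the smooth curve.
    heights = [c / (len(data) * width) for c in counts]
    print(bins, round(max(heights), 3), round(curve.pdf(0), 3))
```

With 8 bins the tallest bar averages the curve over a wide interval and comes out well below the curve's peak of about 0.399; as the bins narrow, the tallest bar should land close to that peak (random noise can push it slightly above or below).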
The smooth curve that the histograms approach is the infamous normal distribution curve, or the bell curve, or, if you're really smart, the Gaussian distribution. It's not quite the normal distribution since it is scaled, but it still has the important shape. The normal distribution curve with mean µ and standard deviation σ is given by this function:

$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

We can actually put this in a graphing calculator, but even better, the TI-83/TI-84 already has this function built in (normalpdf( in the DISTR menu). There are a few properties of the normal distribution:

Properties of a Normal Distribution
1. The total area under the curve is 1.
2. The curve is symmetric about µ. In particular, the area under the curve to the right of µ and the area under the curve to the left of µ are both equal to 0.5.
3. The curve extends infinitely in both directions, getting closer to, but never touching, the horizontal axis.

Now why is the normal distribution so important? Why not other distributions? It just so happens that the normal distribution occurs frequently in nature and in our lives: many measurements, such as heights, weights, and test scores, are approximately normally distributed. Take a look at these examples:

Example 1. Draw the normal distribution curves for these statistics:
a. The average weight of adult (20 and above) males is 188.8 pounds with a standard deviation of 33.3 pounds.
b. The standing height of adult (20 and above) females is 162.2 centimeters with a standard deviation of 5.6 centimeters.
Solution.

[Figures for Example 1: (a) normal curve centered at 188.8 pounds with tick marks one standard deviation (33.3 pounds) apart; (b) normal curve centered at 162.2 centimeters with tick marks one standard deviation (5.6 centimeters) apart.]

Now you might be wondering how I knew that the tick marks were going to be in those spots. They sit at the mean plus or minus 1, 2, and 3 standard deviations, which brings us to one of the most useful properties of a normal distribution:

Theorem 1 (Empirical Rule). For any normal distribution, approximately 68% of the data values lie within 1 standard deviation of the mean, 95% within 2 standard deviations of the mean, and 99.7% within 3 standard deviations of the mean.

This can be confirmed using calculus. We know that the area underneath the entire curve is 1 (Property 1). Thus if we find the area between µ - σ and µ + σ, we get approximately 0.68, which converts to our 68%. Between µ - 2σ and µ + 2σ the area is about 0.95, and between µ - 3σ and µ + 3σ it is about 0.997, which is almost 100% of the data.
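The claim that the Empirical Rule can be confirmed with calculus can also be checked numerically. This sketch is an illustration, not the method from the notes: it codes the density formula from this section directly (`normal_pdf`, a hypothetical helper name) and approximates areas with Simpson's rule (`area`, also hypothetical) instead of exact integration. It also lists the tick marks for Example 1, assuming they sit at the mean plus or minus 1, 2, and 3 standard deviations.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve: e^(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def area(a, b, mu=0.0, sigma=1.0, steps=10_000):
    """Approximate the area under the curve from a to b with Simpson's rule (steps is even)."""
    h = (b - a) / steps
    total = normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma)
    for i in range(1, steps):
        # Simpson weights: 4 for odd interior points, 2 for even interior points
        total += (4 if i % 2 else 2) * normal_pdf(a + i * h, mu, sigma)
    return total * h / 3

# Empirical Rule: area within k standard deviations of the mean (standard normal).
for k in (1, 2, 3):
    print(k, round(area(-k, k), 4))

def ticks(mu, sigma):
    """Tick marks at the mean plus or minus 1, 2, 3 standard deviations."""
    return [round(mu + k * sigma, 1) for k in range(-3, 4)]

print(ticks(188.8, 33.3))  # Example 1a: adult male weight (pounds)
print(ticks(162.2, 5.6))   # Example 1b: adult female height (centimeters)
```

The three areas should come out at roughly 0.6827, 0.9545, and 0.9973, and `area(-10, 10)` is essentially 1, matching Property 1.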
Now you will notice that in the picture above, the normal distribution has tick marks spaced 1 apart, centered at 0. We call this normal distribution (mean 0, standard deviation 1) the standard normal distribution (or standardized normal distribution). This normal distribution is special because its scale directly gives the number of standard deviations away from the mean, 0. When discussing the standard normal distribution, we call the number of standard deviations away from 0 z, or the z-score. This will relate later to many other normal distributions. Note that the z-score can be either positive or negative, depending on whether it is to the right or to the left of the mean. Let's do some examples.

Example 2. Find P(0 ≤ Z ≤ 1.37) for the standard normal distribution.

Solution. You might be wondering, what the heck does this mean? Remember, the area underneath the entire standard normal distribution (in fact any normal distribution) is 1, which corresponds to a probability. What we want to know
is the area underneath the curve between 0 and 1.37 standard deviations away. It is very helpful to always draw what you are trying to find. Make sure to include tick marks on the bottom, unlike the picture. Go to p.669 in your book and look at Appendix A. Look at the leftmost column. Notice it has 0.0 to 3.4 listed. Since our value is 1.37, we go down to the row that reads 1.3. We then go to the column that corresponds to 0.07, since 1.37 = 1.3 + 0.07. We then read off the value: 0.4147. This is the probability (the area underneath the curve) for the region we want. Thus P(0 ≤ Z ≤ 1.37) ≈ 0.4147.

Example 3. Find P(Z ≤ 1.42) for the standard normal distribution.

Solution. Before attempting anything, draw the normal curve with tick marks and shade the region we are concerned with. Go to p.669 in your book. Again look at the left column, and go to the row that says 1.4. We then go to the column that has 0.02 and read off 0.4222. Now this doesn't match up with our answer, but that's only because the table only gives areas from 0 to some z value. We also know the area underneath the curve to the left of 0: it is 0.5. Thus P(Z ≤ 1.42) ≈ 0.5 + 0.4222 = 0.9222, the same answer we got with the other method.

Example 4. Find P(-2.59 ≤ Z ≤ -1.1) for the standard normal distribution.

Solution. Let's draw the region we are concerned with:

[Figure: standard normal curve with the region between -2.59 and -1.1 shaded]
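The table lookups in Examples 2 and 3, and the region for Example 4, can be cross-checked in software. This sketch assumes Python 3.8+ and uses `statistics.NormalDist().cdf`, which gives the area to the left of z (similar to the TI-83/TI-84's normalcdf( with a very large negative lower bound). The helper `table_area` is hypothetical: it subtracts 0.5 to mimic the "area from 0 to z" values printed in Appendix A.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution: mean 0, standard deviation 1

def table_area(z):
    """Area from 0 to z (for z >= 0), the quantity listed in Appendix A."""
    return Z.cdf(z) - 0.5

# Example 2: P(0 <= Z <= 1.37)
print(round(table_area(1.37), 4))

# Example 3: P(Z <= 1.42) = 0.5 + (area from 0 to 1.42)
print(round(0.5 + table_area(1.42), 4), round(Z.cdf(1.42), 4))

# Example 4: P(-2.59 <= Z <= -1.1), computed directly on the left side...
left = Z.cdf(-1.1) - Z.cdf(-2.59)
# ...and by symmetry as P(1.1 <= Z <= 2.59), the way the table method does it
right = table_area(2.59) - table_area(1.1)
print(round(left, 4), round(right, 4))
```

The printed values should round to the table answers 0.4147, 0.9222, and 0.1309, and the two Example 4 computations agree because the curve is symmetric about 0.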
How would we verify this using our table? Notice that our region is to the left of the mean, but our table gives us areas to the right. But that's OK! We can exploit the symmetry of the normal distribution curve. This means that P(-2.59 ≤ Z ≤ -1.1) = P(1.1 ≤ Z ≤ 2.59). You can also see that if we get the probability P(0 ≤ Z ≤ 2.59) and subtract off P(0 ≤ Z ≤ 1.1), we will get the region that we want. So we go to our appendix and see that P(0 ≤ Z ≤ 2.59) ≈ 0.4952 and P(0 ≤ Z ≤ 1.1) ≈ 0.3643, and so

$$P(-2.59 \le Z \le -1.1) = P(1.1 \le Z \le 2.59) \approx 0.4952 - 0.3643 = 0.1309.$$

Now we connect the standard normal distribution to all other normal distributions. You might have thought, well, it's great that we can do all of these things for the distribution with mean 0 and standard deviation 1, but things in real life rarely have mean 0 and standard deviation 1. The important thing is that every normal distribution has the same bell shape; only the center (µ) and the spread (σ) change.

Example 5. The height of American females 20 years and older is approximately normally distributed with mean 63.8 inches and standard deviation 2.2 inches. Estimate the percentage of American females 20 years and older who are between 61.6 and 64.9 inches tall.

Solution. This is the important connection we have to make: the normal distribution with mean 63.8 and standard deviation 2.2 corresponds to the curve with mean 0 and standard deviation 1, the standard normal distribution. This means that if we can figure out the z-scores of 61.6 and 64.9, that is, the number of standard deviations each one is away from 63.8, we can get the probability that we want. Now how do we figure out how many standard deviations 64.9 is from 63.8? Well, we know that 64.9 is greater than 63.8, so the number has to be positive. We also know that it isn't quite 1 standard deviation away, since one standard deviation is 2.2, and so 66 = 63.8 + 2.2 would be exactly one standard deviation away.
We can in fact convert a value into the number of standard deviations away from the mean with the formula

$$z = \frac{x - \mu}{\sigma}.$$

So 64.9 corresponds to

$$z = \frac{64.9 - 63.8}{2.2} = \frac{1.1}{2.2} = 0.5.$$

So 64.9 is one half of a standard deviation above the mean. Similarly, 61.6 corresponds to

$$z = \frac{61.6 - 63.8}{2.2} = \frac{-2.2}{2.2} = -1.$$
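The formula z = (x - µ)/σ and the Example 5 probability translate directly into code. A minimal sketch, again assuming Python 3.8+'s `statistics.NormalDist`; `z_score` is a hypothetical helper name. It computes the probability twice, once after standardizing and once directly on the curve with mean 63.8 and standard deviation 2.2, to show that the two agree. The last lines apply the same steps to Example 6 (mean 42, standard deviation 5).

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean: z = (x - mu) / sigma."""
    return (x - mu) / sigma

mu, sigma = 63.8, 2.2              # Example 5: heights in inches
z_low = z_score(61.6, mu, sigma)   # about -1
z_high = z_score(64.9, mu, sigma)  # about 0.5

Z = NormalDist()                   # standard normal
heights = NormalDist(mu, sigma)    # the original (unstandardized) curve

# P(61.6 <= X <= 64.9) after converting to z-scores...
p_standardized = Z.cdf(z_high) - Z.cdf(z_low)
# ...and the same area computed on the original curve
p_direct = heights.cdf(64.9) - heights.cdf(61.6)
print(round(z_low, 2), round(z_high, 2), round(p_standardized, 4), round(p_direct, 4))

# Example 6: P(48 <= X) when the mean is 42 and the standard deviation is 5
p_ex6 = 1 - Z.cdf(z_score(48, 42, 5))
print(round(p_ex6, 4))
```

Both Example 5 probabilities should come out near the table answer 0.5328, and the Example 6 line should give about 0.1151.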
So 61.6 is one standard deviation below the mean. We then look at our table. Our region can be split into two regions, one on each side of 0:

$$\begin{aligned} P(61.6 \le X \le 64.9) &= P(-1 \le Z \le 0.5) \\ &= P(-1 \le Z \le 0) + P(0 \le Z \le 0.5) \\ &= P(0 \le Z \le 1) + P(0 \le Z \le 0.5) \\ &\approx 0.3413 + 0.1915 = 0.5328 \end{aligned}$$

So about 53.28% of American females 20 years and older are between 61.6 and 64.9 inches tall. Here is our important formula, which tells us how many standard deviations a value x is away from the mean:

$$z = \frac{x - \mu}{\sigma}$$

Example 6. Suppose we have a normal distribution where the mean is 42 and the standard deviation is 5. Find P(48 ≤ X).

Solution. First draw the graph. We can find how many standard deviations 48 is away from 42 with our formula:

$$z = \frac{48 - 42}{5} = \frac{6}{5} = 1.2.$$

So 48 is 1.2 standard deviations above 42. We can then find the probability by looking at our table:

$$P(48 \le X) = P(1.2 \le Z) = P(0 \le Z) - P(0 \le Z \le 1.2) \approx 0.5 - 0.3849 = 0.1151$$

Definition 1. The p-th percentile is a number that divides the lower p percent of the values of a distribution from the upper 100 - p percent. For a normally distributed random variable, the p-th percentile is unique.

In math, if there is a forwards, there is often a backwards. In our calculations, we have been getting the probability (the area underneath the curve) from a certain value. But we could just as easily go backwards: start with the probability that we want and find the corresponding value.

Example 7. For the standard normal distribution, find
a. the 80th percentile
b. the 20th percentile

Solution. a. First draw the graph. We want to find the z-score such that P(Z ≤ z) = 0.8. Notice that our table only gives areas from 0 up to values of z greater than or equal to 0. Thus the probability we need to look for in our table is 0.8 - 0.5 = 0.3. Look at Appendix A. Notice the z values whose areas are closest to 0.3 are 0.84 and 0.85. Since the area for 0.84 is a little bit closer to 0.3, our z value is 0.84 (although 0.85 would have been acceptable on a test).

b. First draw the graph. Again we want to find the z-score such that P(Z ≤ z) = 0.20. Notice this region isn't quite the kind of region listed in Appendix A. But since the distribution is symmetric and half of the probability is to the left (or right) of the mean, we can look for the probability 0.5 - 0.2 = 0.3, which from part a we know corresponds to 0.84. Since our region is actually on the left side, z = -0.84.

We can also use our calculator to find this value easily. Go to 2ND DISTR and select 3, invNorm(. Enter invNorm(0.20, 0, 1) (or invNorm(0.20)) and we get -0.8416212, which we round to -0.84. Notice there is no input for which side of -0.84 we want; that is, nothing we enter tells it that our probability is to the left of the z-score. The invNorm function always assumes that 0.20 is the area to the left of the z-value.

Example 8. Scores on the Critical Reading portion of the 2011 SAT were approximately normally distributed with mean 497 and standard deviation 114.
a. Estimate the score that falls at the 80th percentile.
b. Estimate the score that falls at the 40th percentile.

Solution. a. Draw the graph. From the previous example, we know that the z-score corresponding to the 80th percentile is 0.84. Now that we know our z-score, we can use our formula backwards:

$$\begin{aligned} z &= \frac{x - \mu}{\sigma} \\ 0.84 &= \frac{x - 497}{114} \\ 95.76 &= x - 497 \\ 592.76 &= x \end{aligned}$$

So 593 is the score that falls at the 80th percentile.

b. Draw the graph.
We could use the same method as in part a. But instead, we will use the calculator for this part. We simply enter invNorm(0.40, 497, 114) and we get 468.1184, which we round to 468.
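Going backwards with the calculator's invNorm( corresponds, in Python 3.8+, to `statistics.NormalDist.inv_cdf`, which, like invNorm, takes the area to the left. This sketch is an assumption-labeled stand-in for the calculator, not the method the notes require on tests; it reproduces Examples 7 and 8.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

# Example 7: z-scores for the 80th and 20th percentiles (area to the LEFT of z)
z80 = Z.inv_cdf(0.80)
z20 = Z.inv_cdf(0.20)
print(round(z80, 2), round(z20, 2))  # symmetric about 0, so z20 is -z80

# Example 8: SAT Critical Reading, mean 497 and standard deviation 114
sat = NormalDist(497, 114)

# Part a: the formula used backwards with the rounded table value, x = mu + z * sigma
x80_table = 497 + 0.84 * 114
# Part b: the invNorm-style calculation done directly on the original curve
x40 = sat.inv_cdf(0.40)
print(round(x80_table, 2), round(x40, 4))
```

Using the unrounded z-score for part a, `sat.inv_cdf(0.80)` gives about 592.94, which still rounds to 593.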
If you didn't get it the first time, that's OK. This is a hard section. Do lots and lots of problems and check out this link: http://www.mathsisfun.com/data/standard-normal-distribution.html

Make sure that you know how to do both the calculator methods AND the table method. You can indeed check your work with the calculator, but on tests I will expect you to be able to use the table method as well.