
MATH 183 Basic Statistics
Dr. Neal, WKU

Let $\Omega$ be a population under consideration and let $X$ be a specific measurement that we are analyzing. For example, $\Omega$ = All U.S. households and $X$ = Number of children (under age 18) living in the household. To study this scenario, we obtain a set of measurements $\{x_1, x_2, \ldots, x_n\}$, which may be either a census or simply a random sample.

Census

In a census, we assume that we have a measurement from every member in the population under consideration. For small populations, such as students in one particular class or players on one sports team, it is not hard to obtain a census by surveying each person in that population. But for extremely large populations, such as all U.S. households, it is nearly impossible to obtain a real census even when mandated to do so every ten years by the United States Constitution.

But when we do have a census of measurements $X$ from a population, then we can find the true values of the mean $\mu$, the variance $\sigma^2$, the standard deviation $\sigma$, as well as other population parameters.

Mean

Given a set of measurements $\{x_1, x_2, \ldots, x_n\}$, the mean (or average) of these specific values is given by

$$\mu = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

When the values are a census of a specific measurement $X$ from a population $\Omega$, then $\mu$ is the true average value. It is also called the expected value of $X$ and may be denoted by $\mu_X$ or $E[X]$.

Variance

The variance, denoted by $\sigma^2$, is the average squared distance from the mean and is given by

$$\sigma^2 = \frac{(x_1 - \mu)^2 + (x_2 - \mu)^2 + \cdots + (x_n - \mu)^2}{n} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2.$$

Alternately, the variance is the average of the squares minus the square of the average, and can be computed by

$$\sigma^2 = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \mu^2.$$

The variance is sometimes denoted by $\sigma_X^2$ or $\mathrm{Var}(X)$.
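The two variance formulas above can be compared numerically. The following is a minimal sketch in Python rather than on the calculator; the function names and the small data set are made up for illustration and are not part of these notes.

```python
from math import isclose


def population_mean(values):
    """Mean of a census: add up the values and divide by n."""
    return sum(values) / len(values)


def variance_by_definition(values):
    """Average squared distance from the mean."""
    mu = population_mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def variance_by_shortcut(values):
    """Average of the squares minus the square of the average."""
    mu = population_mean(values)
    return sum(x * x for x in values) / len(values) - mu ** 2


# Hypothetical measurements, treated as a census of a tiny population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mu = population_mean(data)
v1 = variance_by_definition(data)
v2 = variance_by_shortcut(data)
print(mu, v1, v2, isclose(v1, v2))
```

For this made-up data set the mean is 5 and both formulas give a variance of 4. With non-integer data the two results may differ by floating-point rounding, which is why the sketch compares them with `isclose` rather than `==`.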

Standard Deviation

We take the square root of the variance to get the standard deviation, denoted by $\sigma$:

$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}.$$

The standard deviation gives a way of measuring the average spread from the mean. A small $\sigma$ means that measurements are consistently close to the average $\mu$.

Median and Mode

When the measurements $\{x_1, x_2, \ldots, x_n\}$ are in increasing order, then the median is the middle value, or the average of the two middle values if there are an even number of measurements. The mode is the measurement (or measurements) that occurs most often.

Example 1. Below are the numbers of credit hours enrolled for this semester for all students in one section of MATH 116. Find the mean, variance, standard deviation, median, and mode of these values. What percentage of these measurements are within one standard deviation of class average?

Credit Hours Taken This Semester

    18  15  14  13  18  14  14  18  17  15  16  18
    18  18  15  15  15  17  18  13  15  18  14  18
    15  18  15  19  18  14  16  17  16  14  15  15

Solution. Let $\Omega$ = this specific MATH 116 class and let $X$ = Number of credit hours enrolled in this semester. Because we have a census of this class, we can find the true mean $\mu$ and the true standard deviation $\sigma$. To do so, we shall enter the data into the calculator, sort it into increasing order, and use the 1-Var Stats command.

[Calculator screens: Enter data into L1; Sort data, then enter 1-Var Stats L1; Output; Scroll down]

The mean is

$$\mu = \frac{18 + 15 + \cdots + 15}{36} = \frac{576}{36} = 16 \text{ credit hrs}.$$

Note: The calculator displays this value as $\bar{x}$, which stands for sample mean. But because we have a census of this class and not merely a sample, we use $\mu$ to represent that we have the real average of $\mu = 16$ credit hours.

The variance is computed by

$$\sigma^2 = \frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n} - \mu^2 = \frac{9324}{36} - 16^2 = 3.$$

Then taking the square root gives us the standard deviation of $\sigma = \sqrt{3} \approx 1.732$. The true standard deviation is displayed as σx on the calculator output, and this value is to be used if we have a census of measurements. So now we can say that the class average is 16 credit hours with an average spread from 16 of $\sigma \approx 1.732$ credit hours.

The median is the middle measurement. But because we have an even number of measurements (36), we must take the average of the middle two measurements. After sorting the 36 values, the middle values are in the 18th and 19th positions. The 18th value is 15 while the 19th value is 16. So the median is (15 + 16)/2 = 15.5, which is also displayed on the TI.

After sorting the values, it is easy to make a frequency chart from which we see that the mode is 18 hours. That is, in this class more students are registered for 18 hours than for any other number of hours.

    # Hours    # Students
    13          2
    14          6
    15         10
    16          3
    17          3
    18         11
    19          1
               N = 36

To find the percentage within one standard deviation of average, we first compute $\mu \pm \sigma = 16 \pm 1.732$, which is about 14.268 to 17.732. So students taking 15, 16, or 17 hours fall in this range. There are 10 + 3 + 3 = 16 students in this range. Thus, 16/36, or 44.44%, of the students in this class are within one standard deviation of class average.

Question: Is this class representative of all students on campus? Representative of just undergraduates? Representative of all students taking a Gen. Ed. math class this semester? Or perhaps representative of just MATH 116 students this semester?

Probably the most we can say is that this class is representative of all MATH 116 students this semester. If you want a sample that is representative of a larger portion of the student body, then you must sample accordingly from among that entire group of students. But you should never take an existing sample and try to say that it is representative of a larger group that was not represented in the sample.
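The values found in Example 1 can be cross-checked in software. The following is a minimal sketch in Python, not the calculator workflow used in these notes; it assumes the standard-library `statistics` module, whose `pvariance` and `pstdev` functions divide by n as we do for a census. The variable names are ours.

```python
import statistics

# Credit hours for all 36 students in the class (a census).
hours = [18, 15, 14, 13, 18, 14, 14, 18, 17, 15, 16, 18,
         18, 18, 15, 15, 15, 17, 18, 13, 15, 18, 14, 18,
         15, 18, 15, 19, 18, 14, 16, 17, 16, 14, 15, 15]

mu = statistics.mean(hours)           # true mean
sigma_sq = statistics.pvariance(hours)  # population variance (divide by n)
sigma = statistics.pstdev(hours)      # population standard deviation
median = statistics.median(hours)     # average of the 18th and 19th sorted values
mode = statistics.mode(hours)         # most frequent value

# Count the measurements within one standard deviation of the mean.
within = sum(1 for x in hours if mu - sigma <= x <= mu + sigma)
percent_within = 100 * within / len(hours)

print(mu, sigma_sq, round(sigma, 3), median, mode)
print(within, round(percent_within, 2))
```

The results agree with the solution above: a mean of 16, a variance of 3, a standard deviation of about 1.732, a median of 15.5, a mode of 18, and 16 of 36 students (44.44%) within one standard deviation of average.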

Sample Mean and Sample Deviation

Often a collection of measurements is just a sample from a larger population. In this case, we cannot find the real average $\mu$. Instead we can only compute the sample mean, denoted by $\bar{x}$. However, $\bar{x}$ is computed the same way as we computed $\mu$, by adding up the values and dividing by $n$; we just denote it now by $\bar{x}$ to specify that we are only working with a sample.

The sample deviation, denoted by $S$, is computed similarly to $\sigma$; however, we use $\bar{x}$ in the formula rather than $\mu$, and we average the squared differences by dividing by $n - 1$ rather than $n$.

For a census:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2}$$

For a sample:

$$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

By dividing by $n - 1$, the sample variance $S^2$ becomes an unbiased estimator of the true unknown variance $\sigma^2$. That is, the average of all possible $S^2$ from all possible samples of size $n$ will equal the true variance $\sigma^2$.

Quartiles and 1.5 IQR

The first quartile $Q_1$ is the median of just the measurements that are below the overall median. The third quartile $Q_3$ is the median of just the measurements that are above the overall median. These values are displayed, along with the minimum, median, and maximum, in the 1-Var Stats output. Together, the values min, $Q_1$, med, $Q_3$, max make up the five-number summary.

The 1.5 IQR (or 1.5 Interquartile Range) is the interval $Q_1 - 1.5 \times (Q_3 - Q_1)$ to $Q_3 + 1.5 \times (Q_3 - Q_1)$. Values from a sample that are outside this range are called outliers and are often excluded from samples so as not to throw off the average too much.

Example 2. Below are data on city mpg from a sample of two-seater cars:

    Model                 City MPG    Model                    City MPG
    Acura NSX                17       Honda Insight               57
    Audi TT Quattro          20       Honda S2000                 20
    Audi TT Roadster         22       Lamborghini Murcielago       9
    BMW M Coupe              17       Mazda Miata                 22
    BMW Z3 Coupe             19       Mercedes-Benz SL500         16
    BMW Z3 Roadster          20       Mercedes-Benz SL600         13
    BMW Z8                   13       Mercedes-Benz SLK230        23
    Chevrolet Corvette       18       Mercedes-Benz SLK320        20
    Chrysler Prowler         18       Porsche 911 GT2             15
    Ferrari 360 Modena       11       Porsche Boxster             19
    Ford Thunderbird         17       Toyota MR2                  25

Use your calculator for the following:

(i) Find the sample mean and sample deviation, the median, the mode, and the five-number summary. What percentage of these mileages are within one sample deviation of sample average?
(ii) Make a histogram with range [5, 60] divided into bins of length 5. Which bin has the most measurements? The second most?
(iii) Give the 1.5 IQR and denote any suspected outliers.

Solution. (i) We first enter the data into a list in the STAT EDIT screen. For this problem we shall use L2. After entering the data, we sort the data with the command SortA(L2. Then we compute the statistics with the command 1-Var Stats L2.

[Calculator screens: Enter data into L2; Sort and compute stats; Output; Scroll down]

Because the data are only a sample of measurements from the population $\Omega$ of all two-seater makes of cars, the value of $\bar{x} \approx 19.59$ is the sample mean. The sample deviation is displayed as $S \approx 9.22$. The minimum value is shown to be 9 while the maximum value is 57.

The median is given as 18.5. That is, 18.5 is the average of the two middle measurements when in increasing order (the 11th and 12th with this data set of even size 22). The 11th value is 18 while the 12th value is 19. So the median is (18 + 19)/2 = 18.5.

The first quartile is $Q_1 = 16$, which is the median of all values below 18.5. And $Q_3 = 20$, which is the median of all values above 18.5. So the five-number summary is 9, 16, 18.5, 20, 57.

By scrolling down the sorted list, we see that the mode is 20, which occurs most often at 4 times.

The range $\bar{x} \pm S$ is $19.59 \pm 9.22$, or 10.37 to 28.81, and it contains 20 out of 22 measurements. So we can say that 90.9% of these mileages are within one sample deviation of sample average.

(ii) Adjust the WINDOW and STAT PLOT settings to see a histogram (3rd type). Press GRAPH, then TRACE, and scroll to see the bin range values. The bin [15, 20) has the most measurements with 9 values, while the bin [20, 25) has the second most with 7 values.

[Calculator screens: Adjust WINDOW; Adjust STAT PLOT; Histogram; TRACE]
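The statistics in part (i) of Example 2 can also be reproduced in software. The following is a minimal sketch in Python, assuming the quartile rule stated in these notes (the median of the lower half and the median of the upper half of the sorted list). The helper `five_number_summary` is our own illustrative function; other software may use a different quartile convention and return slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the
    sorted data; the middle value is left out of both halves when the
    number of measurements is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + (n % 2):]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


# City mpg for the 22 two-seater cars in the sample.
mpg = [17, 20, 22, 17, 19, 20, 13, 18, 18, 11, 17,
       57, 20, 9, 22, 16, 13, 23, 20, 15, 19, 25]

x_bar = statistics.mean(mpg)   # sample mean
s = statistics.stdev(mpg)      # sample deviation (divides by n - 1)
summary = five_number_summary(mpg)
within = sum(1 for x in mpg if x_bar - s <= x <= x_bar + s)

print(round(x_bar, 2), round(s, 2), summary, within)
```

Here `statistics.stdev` divides by n - 1, matching the sample deviation S, whereas `statistics.pstdev` would divide by n as for a census.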

(iii) The 1.5 IQR is the interval $Q_1 - 1.5 \times (Q_3 - Q_1)$ to $Q_3 + 1.5 \times (Q_3 - Q_1)$, where $Q_3 - Q_1 = 20 - 16 = 4$. So the 1.5 IQR is $16 - 1.5 \times 4$ to $20 + 1.5 \times 4$, or 10 to 26. Thus, the outliers are those values outside of this range, which are 9 mpg and 57 mpg.

Frequency Charts

Often measurements are given in a frequency chart that states how many times each measurement occurs.

    Measurement   x_1   x_2   x_3   ...   x_m
    Frequency     k_1   k_2   k_3   ...   k_m

Now we let $n = k_1 + \cdots + k_m$ = total number of measurements. Then the mean $\mu$ is actually a weighted average given by

$$\mu = \frac{k_1 x_1 + \cdots + k_m x_m}{n}.$$

When using the calculator, enter the measurements into one list and enter the frequencies into another list.

Example 3. A survey on the number of children per household was taken throughout a neighborhood. Here are the results from the sample that was obtained.

    Number of children       0    1    2    3    4    5    6
    Number of households    60   42   86   59   22    4    2

(i) Find the mean and deviation, the median, the mode, and the five-number summary for the number of children in this sample of households. What percentage of these households are within a deviation of average?
(ii) Make a histogram with bins of length 1.
(iii) Give the 1.5 IQR and denote the outliers.

Solution. Here $\Omega$ = All households in this neighborhood and $X$ = Number of children in household. We shall use list L3 for the measurements and list L4 for the frequencies, then enter the command 1-Var Stats L3, L4.

(i) Because we have a sample, $\bar{x} \approx 1.86$ children with $S \approx 1.34$; the median is 2 and the mode is 2. The five-number summary is 0, 1, 2, 3, 6.

Next we compute $\bar{x} \pm S = 1.86 \pm 1.34$, which is 0.52 to 3.2. This range includes all households having 1, 2, or 3 children. There are 42 + 86 + 59 = 187 out of 275 such households, or 68%, within a sample deviation of sample average.

The 1.5 IQR is from $1 - 1.5 \times 2$ to $3 + 1.5 \times 2$, or $-2$ to 6; thus, there are no outliers because all measurements are within this range.

Exercise 1. Consider the Verbal ACT scores from a group of English majors at WKU:

    16, 18, 20, 21, 21, 22, 22, 23, 24, 25, 26, 27, 30, 34

(a) Make a histogram with range [15, 36] and bins of length 3. Which bin range has the most scores?
(b) Assuming this group is the entire population under consideration:
    (i) Find the true mean.
    (ii) Find and explain the median and the mode.
    (iii) Find the true standard deviation.
    (iv) Compute the percentage of these students whose Verbal ACT score is within a standard deviation of average.
(c) Assuming this group is only a sample from a larger population $\Omega$:
    (i) Find the sample mean and sample deviation.
    (ii) Give the boundaries of the 1.5 IQR and state the outliers.
    (iii) In this case, what is the appropriate larger population $\Omega$ that this sample could represent?

Exercise 2. A group of WKU freshmen were asked to give the number of hours taken during their first semester. The results were:

    Hrs     13   14   14.5   15   15.5   16   16.5   17   18
    # Fr     4    5     8    14     8    23    12    18    8

(a) Make a histogram with bins of length 1. Which bin range has the most values?
(b) Assuming this group is the entire population under consideration:
    (i) Find the true mean.
    (ii) Find and explain the median and the mode.
    (iii) Find the true standard deviation.
    (iv) Compute the percentage of these students whose number of hours is within a standard deviation of average.
(c) Assuming this group is only a sample from a larger population $\Omega$:
    (i) Find the sample mean and sample deviation.
    (ii) Explain Q1 and Q3.
    (iii) Give the boundaries of the 1.5 IQR and state the outliers.
    (iv) In this case, what is the appropriate larger population $\Omega$ that this sample could represent?

Solutions

1. [Calculator screens: Data in L1; Adjust WINDOW; Adjust STAT PLOT]

(a) [21, 24) has 5 scores.
(b) (i) $\mu = 23.5$
    (ii) Because there are 14 scores, the median is the average of the 7th and 8th scores, which is (22 + 23)/2 = 22.5. The modes are 21 and 22 (both occur twice, and no other score occurs more than once).
    (iii) $\sigma \approx 4.547$
    (iv) $\mu \pm \sigma$ is 18.953 to 28.047 and contains 10/14, or 71.43%, of the scores.
(c) Assuming this group is only a sample from a larger population $\Omega$, then
    (i) $\bar{x} = 23.5$ and $S \approx 4.719$
    (ii) The 1.5 IQR is from $21 - 1.5(26 - 21)$ to $26 + 1.5(26 - 21)$, or 13.5 to 33.5. The only outlier is 34.
    (iii) $\Omega$ = All English majors at WKU.

2. (a) [16, 17) has 35 values.
(b) (i) $\mu = 15.88$ hours.
    (ii) Because there are 100 measurements, the median is the average of the 50th and 51st measurements, which is (16 + 16)/2 = 16. The mode is also 16 hrs because it occurs most often at 23 times.
    (iii) $\sigma \approx 1.1898$.
    (iv) $\mu \pm \sigma$ is 14.69 to 17.07, which contains all students taking 15, 15.5, 16, 16.5, or 17 hours. Thus, there are 75 out of 100, or 75%, of the students within one standard deviation of average.
(c) (i) $\bar{x} = 15.88$ and $S \approx 1.19578$
    (ii) Q1 = 15 is the median of the values below 16. Q3 = 17 is the median of the values above 16.
    (iii) The 1.5 IQR is from $15 - 1.5 \times 2$ to $17 + 1.5 \times 2$, or 12 to 20, which contains all measurements. There are no outliers.
    (iv) $\Omega$ = All WKU Freshmen.
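The frequency-chart answers to Exercise 2 can be cross-checked in software using the weighted-average formula for the mean. The following is a minimal sketch in Python rather than on the calculator; the variable names are ours, and the two lists play the role of the measurement list and the frequency list.

```python
from math import sqrt

# Hours taken and the number of freshmen reporting each value.
hours = [13, 14, 14.5, 15, 15.5, 16, 16.5, 17, 18]
freqs = [4, 5, 8, 14, 8, 23, 12, 18, 8]

n = sum(freqs)  # total number of measurements

# Weighted average: (k_1 x_1 + ... + k_m x_m) / n
mean = sum(k * x for x, k in zip(hours, freqs)) / n

# Weighted sum of squared distances from the mean.
ss = sum(k * (x - mean) ** 2 for x, k in zip(hours, freqs))

sigma = sqrt(ss / n)      # standard deviation if the group is a census
s = sqrt(ss / (n - 1))    # sample deviation if the group is a sample

# Number of students within one standard deviation of the mean.
within = sum(k for x, k in zip(hours, freqs)
             if mean - sigma <= x <= mean + sigma)

print(n, round(mean, 2), round(sigma, 4), round(s, 5), within)
```

Each squared distance is multiplied by its frequency, so the list of 100 individual measurements never has to be written out.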