Chapter […]: Collecting, Displaying, and Analyzing your Data

Section […]: Descriptive Statistics

Sections […] explained how to choose a sample, how to collect and organize data from the sample, and how to display your data. In this section, you will learn how to analyze the data using some basic tools from descriptive statistics. You will not have to perform any complicated calculations by hand; however, it is vital that you understand the concepts of measuring central tendency, position, dispersion, and outliers, because these are tools you can use to analyze the data from your own research. You also need to understand the relationship between these measurements and the shape of a frequency distribution. The measures of central tendency, dispersion, and the shape of a frequency distribution are the three basic building blocks we will use to construct all of the sophisticated concepts (like statistical tests) in Chapter […] and Chapter […].

Measures of Central Tendency

When analyzing a large and diverse population, it is helpful to try to measure characteristics about its center or middle. The two main measures of central tendency are the median and the mean. Suppose that you have a list of scores from a test given in a music class. Then the sample is the class and the number of students in your class is the sample size. The sample size is represented by the variable $n$.

The median is the midpoint (or middle) of the scores when listed from smallest to greatest. That means 50% of the scores are less than or equal to the median and 50% of the scores are greater than or equal to the median. As you can tell from the definition, you can only calculate a median if the data can be ordered or ranked from smallest to greatest. Thus, you can only calculate a median if the variable is quantitative (ordinal, interval, or ratio level). It would not make sense to calculate a median for a nominal variable.

The mode is the score that occurs most frequently. It is possible for there to be more than one mode. For example, suppose there are ten scores: 7[…], 7[…], 8[…], 8[…], 8[…], 89, 9[…], 9[…], 9[…], and 98. Then the scores of 8[…] and 9[…] each occur three times and are the most frequently occurring scores. In this case there are two modes, 8[…] and 9[…].
Unlike the other measures of central tendency, you can calculate the mode for a variable with a nominal level of measurement.

The sample mean is often called the sample average. You probably remember from high school that you calculate the sample mean by adding all of the scores in the sample and then dividing by the number of people in the sample. So, if there are $n$ students in the sample,

\[ \text{sample mean} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} \]

where $x_1$ is the score of the 1st student, $x_2$ is the score of the 2nd student, etc. This is the formula that we will use throughout the book; however, it will be written with slightly different notation. In addition to being shorter to write, the symbols in the notation correspond to the functions used by computers to calculate these statistics.

The symbol for the sample mean (the sample average) is $\bar{X}$. The Greek letter capital sigma, $\Sigma$, means summation (add things together). $n$ (called the sample size) is the number of pieces of data you plan to add together. So the symbol $\sum x$ means $x_1 + x_2 + x_3 + \dots + x_n$, and

\[ \bar{X} = \frac{\sum x}{n} \]

Example: The average of […], […], […], 9[…], 9[…] is $\bar{X} = \frac{\sum x}{n}$ = ([…] + […] + […] + 9[…] + 9[…]) / 5 = […].
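The three measures of central tendency can be sketched in a few lines of Python. This is only an illustration: the ten scores below are hypothetical, chosen so that two values each occur three times, and the variable names are assumptions rather than notation from the text. `statistics.multimode` requires Python 3.8 or later.

```python
import statistics

# Hypothetical test scores (not the book's data): two values occur three times.
scores = [70, 75, 83, 83, 83, 89, 93, 93, 93, 98]
n = len(scores)                        # sample size

mean = sum(scores) / n                 # add all the scores, then divide by n
median = statistics.median(scores)     # midpoint of the sorted scores
modes = statistics.multimode(scores)   # every score tied for most frequent

print("n =", n)
print("mean =", mean)
print("median =", median)
print("modes =", modes)
```

With an even number of scores, `statistics.median` averages the two middle scores, which is one common convention for locating the midpoint.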
In order to calculate a mean, there must be precise measurements between the values of the scores. Thus, you can only calculate the sample mean for variables that have an interval or a ratio level of measurement.

Measures of Position

Suppose you have a list of scores from a test given in a music class. The minimum is the lowest score in the class and the maximum is the highest score in the class. They are abbreviated Min and Max.

The percentile ranking of a score tells you what percentage of the class scored less than that score. Thus, if Suzie scored an 8[…] on a test and that was the 72nd percentile, then 72% of the class scored less than 8[…]. In other words, Suzie scored better than 72% of the people who took the same test.

The 25th percentile is called the first quartile, the 50th percentile is called the second quartile (this is also the median), and the 75th percentile is called the third quartile. The symbols for the 1st, 2nd, and 3rd quartiles are $Q_1$, $Q_2$, and $Q_3$ respectively. 25% of the scores are less than $Q_1$, 25% of the scores are between $Q_1$ and $Q_2$, 25% of the scores are between $Q_2$ and $Q_3$, and 25% of the scores are greater than $Q_3$. Thus, the quartiles partition the scores into four groups with an equal number of students in each group. All of these measures of position only apply to scores that can be ordered or ranked; thus, you can only calculate them for quantitative variables.

Measures of Dispersion

Suppose that you have a list of scores from a test given in a music class. The range of the scores is the distance from the lowest score to the highest score; it measures the overall distance of the spread of scores in the class. The formula is:

\[ \text{Range} = \text{Maximum} - \text{Minimum} \]

The inner quartile range is the distance from the first quartile to the third quartile; it measures the distance across the middle 50% of the data (from the 25th percentile to the 75th percentile). The abbreviation for the inner quartile range is I.Q.R. The formula is:

\[ \text{I.Q.R.} = Q_3 - Q_1 \]

The range calculates the overall distance of the spread of data and is a somewhat crude measurement of the dispersion of the data (how spread out the data is).
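The measures of position, the range, and the I.Q.R. can be sketched in Python as follows. The nine scores are hypothetical, and `percentile_rank` is a helper name introduced here for illustration. Software packages use slightly different conventions for locating quartiles; this sketch assumes the "inclusive" method of `statistics.quantiles` (Python 3.8+), so other tools may report slightly different values for $Q_1$ and $Q_3$.

```python
import statistics

# Hypothetical test scores for nine students (not the book's data).
scores = [52, 60, 65, 70, 75, 80, 85, 90, 100]

minimum, maximum = min(scores), max(scores)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(scores, n=4, method="inclusive")

score_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # I.Q.R. = Q3 - Q1

def percentile_rank(data, score):
    """Percentage of the data that is strictly less than `score`."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

print("Min, Max:", minimum, maximum)
print("Q1, Q2, Q3:", q1, q2, q3)
print("Range:", score_range, "I.Q.R.:", iqr)
print("Percentile rank of 90:", round(percentile_rank(scores, 90), 1))
```

The range depends only on the two most extreme scores, while the I.Q.R. ignores the lowest and highest quarters of the data, which is why the two can tell different stories about the same class.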
The sample variance is a more refined measurement of dispersion. A large variance means that the scores are spread out, with many scores far away from the mean. A small variance means that only a few of the scores are far away from the mean (the scores are more clumped together or clustered around the center). The symbol for sample variance is $S^2$. For example:

the data […], […], […], […], […] has variance zero (there is no variation).
the data […], […], […], […], […] has variance […] (there is very little variation).
the data […], […], […], […], […] has variance […].
the data […], […], […], […], 8 has variance […].

The formula for sample variance is:

\[ S^2 = \frac{\sum (x - \bar{X})^2}{n - 1} \]
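As a rough illustration of how the variance grows as scores spread out, here is a minimal Python sketch. It assumes the usual $n - 1$ divisor for a sample variance; the three data sets and the function name `sample_variance` are hypothetical and are not the book's examples.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    squared_distances = [(x - mean) ** 2 for x in data]
    return sum(squared_distances) / (n - 1)

# Hypothetical data sets, each with mean 4 but increasingly spread out.
no_spread = [4, 4, 4, 4, 4]
some_spread = [2, 4, 4, 4, 6]
more_spread = [1, 2, 4, 6, 7]

for data in (no_spread, some_spread, more_spread):
    by_hand = sample_variance(data)
    # statistics.variance also uses the n - 1 divisor, so the two should agree.
    assert math.isclose(by_hand, statistics.variance(data))
    print(data, "variance =", by_hand)
```

All three data sets share the same mean, so the difference in variance comes entirely from how far the scores sit from that mean.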
In the examples below, each score in the data is blue, the sample size is green, and the sample mean is red.

Using the data […], […], […], […], […], we can see that $n$ = 5 and the average $\bar{X}$ = […].

Variance = $\frac{\sum (x - \bar{X})^2}{n - 1}$ = [([…] − […])² + ([…] − […])² + ([…] − […])² + ([…] − […])² + ([…] − […])²] / (5 − 1) = […].

If we use the data […], […], […], 7, we can see that $n$ = 4 and the average $\bar{X}$ = […].

Variance = $\frac{\sum (x - \bar{X})^2}{n - 1}$ = [([…] − […])² + ([…] − […])² + ([…] − […])² + (7 − […])²] / (4 − 1) = […].

Notice that this is larger than […] because the data in this set is more spread out.

In the formula for variance, $(x - \bar{X})$ is the difference between a score and the average. In the formula, this is squared so that the number will never be negative. Notice that the further away the score is from the average, the larger this number will be. When we add up all of the squared distances between each score and the average, the number will be smaller if the scores are close to the average and larger if the scores are far away from the average. Thus, the variance is smaller if the scores are clumped closer together and larger if the scores are more spread out.

The sample standard deviation is simply the square root of the sample variance. The symbol for sample standard deviation is $S$ (which explains why the symbol for sample variance is $S^2$).

\[ \text{Sample Standard Deviation} = S = \sqrt{\frac{\sum (x - \bar{X})^2}{n - 1}} \]

The range and the I.Q.R. can be applied to any quantitative variable, but both the sample variance and the sample standard deviation have the sample mean in their calculation, so they only apply to variables that have an interval or a ratio level of measurement.

A box & whiskers graph shows the dispersion of the scores by displaying each quartile. The left whisker goes from the minimum to the first quartile, $Q_1$. The box goes from $Q_1$ to $Q_3$. The box has a vertical line inside of it indicating the location of $Q_2$. The right whisker goes from $Q_3$ to the maximum. The range is the distance between the tips of the two whiskers. The I.Q.R. is the length of the box. The lowest 25% of the scores are in the left whisker, the middle 50% of the scores are inside the box (25% on each side of the line), and the highest 25% of the scores are in the right whisker.
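[Figure: box & whiskers graph of test grades, labeled with the minimum, Q1, Q2 = the median, Q3, and the maximum. The Range spans 100% of the grades, the Inner Quartile Range spans the middle 50%, and each whisker and each half of the box holds 25%.]

Range = Maximum − Minimum = […] − […] = 9[…] (the total length of the graph).
I.Q.R. = $Q_3 - Q_1$ = 9[…] − […] = […] (the length of the box).
25% of the grades were between […] and […], 25% of the grades were between […] and […], 25% of the grades were between […] and 9[…], and 25% of the grades were between 9[…] and […].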
If the minimum or the maximum score is really far away from the median, there will be some scores that are considered outliers. The most widely used definition for outlier is any score whose distance from the box is more than 1½ times the Inner Quartile Range. So, if a score is less than $Q_1 - 1.5 \times \text{I.Q.R.}$, it is an outlier. Or, if a score is greater than $Q_3 + 1.5 \times \text{I.Q.R.}$, it is an outlier.

In the graph below, $Q_1$ = […] and $Q_3$ = […], so I.Q.R. = […] − […] = […] and 1.5 × I.Q.R. = 1.5 × […] = […]. $Q_1$ − […] = […] and $Q_3$ + […] = 9[…], so any score less than […] is an outlier and any score more than 9[…] is an outlier. Whenever there are outliers, we draw the whisker to the boundary for outliers. Any scores that are outliers (outside of these boundaries) are drawn as single points.

[Figure: box & whiskers graph with outliers. The left and right boundaries for outliers lie 1.5 × I.Q.R. beyond Q1 and Q3; the minimum and maximum outliers are drawn as single points beyond the whiskers.]

Shapes of Frequency Distributions

The frequency distribution for a variable shows the frequencies for each of its values. Bar graphs are often used to illustrate frequency distributions. The shapes of these distributions tell us many things about our sample. Below are bar graphs illustrating the most common shapes of frequency distributions. Notice that different experiments can result in different distributions.

A die was rolled […] times and the outcomes were recorded. On the right is a frequency graph showing the frequency for each value. While there are fluctuations due to random chance, each value appears (approximately) equally often. This is a uniform frequency distribution.

[Figure: frequency bar graph, Outcome for Rolling a Die]

Fifty persons were given a coin and told to keep flipping the coin until it came up heads. We recorded how many times each person had to flip the coin. The graph on the right shows a decreasing frequency distribution (lower numbered values occur more frequently than high numbered values).

[Figure: frequency bar graph, # of Coin Flips Until a "Head" Appears]
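The two experiments above can be imitated with a short simulation. This is only a sketch: the number of die rolls (600) is an assumed value, the random seed is arbitrary, and `flips_until_head` is a helper name introduced for illustration. The exact frequencies will differ from the book's graphs, but the shapes should be similar: roughly flat for the die and decreasing for the coin.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch gives the same result each run

# Experiment 1: roll a fair die 600 times (600 is an assumed number of rolls).
rolls = [rng.randint(1, 6) for _ in range(600)]
die_freq = Counter(rolls)

def flips_until_head(rng):
    """Flip a fair coin until it comes up heads; return the number of flips."""
    flips = 1
    while rng.random() >= 0.5:   # tails half of the time, so flip again
        flips += 1
    return flips

# Experiment 2: fifty persons each flip until the first head appears.
flip_counts = [flips_until_head(rng) for _ in range(50)]
flip_freq = Counter(flip_counts)

print("Outcome for rolling a die (one # per 10 rolls)")
for value in sorted(die_freq):
    print(value, "#" * (die_freq[value] // 10))

print("Number of coin flips until a head appears (one # per person)")
for value in sorted(flip_freq):
    print(value, "#" * flip_freq[value])
```

Each `#` row plays the role of one bar in a frequency bar graph.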
Fifty persons ran a […]-yard dash and their times were recorded (rounded to the nearest second). The graph on the right shows a bell-shaped and symmetric distribution. It is bell-shaped because values in the middle occur more frequently than lower or higher values. It is symmetric because the mean and mode are close to the median (which is the middle). Many commonly used distributions have this shape, including the most used distribution of all, the normal distribution (see Chapter […]).

[Figure: frequency bar graph, Time for […]-yd Dash (sec)]

A professor was teaching a freshman class in choral music at her college. She noticed that half of her students had attended high schools with choral music programs and half attended schools that did not have a choral music program. She gave a test (worth […] points) during the first week of class and the results are shown on the graph at the right. The tallest bar indicates the mode. Notice that in this case, there are two modes, […] and […]. This distribution is called a bimodal frequency distribution.

[Figure: frequency bar graph, Score on Choral Test]

Ms. Smith gave a test in her music class. The scores have been rounded to the nearest […]%. The graph on the right shows a bell-shaped distribution that is left-skewed. It is bell-shaped because values near the middle occur more frequently than lower or higher values. It is left-skewed because the mean (average score) is to the left of the median (which is the middle). The mean is $\bar{X}$ = […].7 and the median is […]. You can estimate that it is left-skewed because there is a tail to the left of the mode.

[Figure: frequency bar graph, Grade on First Test]

Later, Ms. Smith gave a retake of the test. The scores have been rounded to the nearest […]%. The graph on the right shows a bell-shaped distribution that is right-skewed. It is bell-shaped because values near the middle occur more frequently than lower or higher values. It is right-skewed because the mean (average score) is to the right of the median (which is the middle). The mean is $\bar{X}$ = 8[…] and the median is […]. You can estimate that it is right-skewed because there is a tail to the right of the mode.

[Figure: frequency bar graph, Grade on Retake Test]
The tallest bar indicates the most frequent occurrence. Thus, the mode for the first test was a 7[…] and the mode for the retake test was a […]. If you add up all of the heights of the bars in one class, you will see that there are 8 students in the class. Thus, the sample size is $n$ = 8. On both tests, […] scores are less than […] and […] scores are greater than […]. In each case, […] is the midpoint for the data, so each median = […].

You can eyeball the skewness of a distribution by looking to see if there is a long tail on either side of the mode. If there is a tail to the left (as on the first test), then the distribution is left-skewed. If there is a tail to the right (as on the retake test), then the distribution is right-skewed.

To make a more precise calculation of the skewness of a distribution, you will need to calculate the sample mean. You can calculate the sample mean from the graph by adding up all of the scores and dividing by 8. (Remember that the height of the bar indicates how many of each score you will need to add.) On the first test, the mean was […].7, which is to the left of the median; thus the distribution is left-skewed. On the retake test, the mean was 8[…], which is to the right of the median; thus the distribution for the retake test is right-skewed.

Some distributions are more skewed than others, and if you want to measure the magnitude of the skewness, you can use the formula below.

\[ \text{The Coefficient for Skewness} = \frac{n}{(n-1)(n-2)} \sum \left( \frac{x - \bar{X}}{S} \right)^3 \]

$n$ is the sample size, $S$ is the sample standard deviation, and $\bar{X}$ is the mean. The symbol $\Sigma$ means that we are going to sum up terms of the form $\left( \frac{x - \bar{X}}{S} \right)^3$. Notice that if the score $x$ is less than the mean $\bar{X}$, we will be adding a negative term. If the score $x$ is greater than the mean $\bar{X}$, we will be adding a positive term. Thus, if enough scores are far enough to the left of the mean, this coefficient will be negative, indicating that the distribution is left-skewed. If enough scores are far enough to the right of the mean, this coefficient will be positive, indicating that the distribution is right-skewed. The positive or negative sign indicates the direction of skewness; the size of the number after the sign indicates the degree to which the distribution is skewed.
The more skewed the distribution is, the greater this number will be. In the previous example, the first test had a coefficient for skewness = −[…].7 (left). The retake test had a coefficient for skewness = +[…].9 (right). Thus, the results of the first test were skewed to a greater degree than the results for the retake test. You will not have to memorize this formula or perform these calculations by hand. This function is included in Microsoft Excel, so we will let the computer calculate this for us.

You can think of skewness as pulling your data in either the positive or negative direction. Some statistics are more likely to be affected by the skewing of data than others. For example, the data set […], […], […], […], […] has both the median and mean equal to […]. The data set […], […], […], […], […] still has the median = […], but the mean $\bar{X}$ = […]. Thus, when we added […] to the last score in the first data set, we skewed the data to the right and the mean was skewed out to the right. However, the median is a more resilient measure of central tendency and it did not change.
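The sign of the skewness coefficient and the resilience of the median can both be checked with a small Python sketch. It assumes the coefficient is the sample skewness $\frac{n}{(n-1)(n-2)} \sum \left( \frac{x - \bar{X}}{S} \right)^3$, with $S$ computed using the $n - 1$ divisor; the data sets and the function name `skewness_coefficient` are hypothetical and are not the book's examples.

```python
import statistics

def skewness_coefficient(data):
    """Sample skewness: n / ((n-1)(n-2)) times the sum of cubed standardized distances.

    Needs at least three scores and some variation (S > 0).
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)   # sample standard deviation (n - 1 divisor)
    total = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total

# Hypothetical data sets.
symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 3, 4, 50]    # 45 added to the last score
left_tail = [-44, 2, 3, 4, 5]    # mirror image of right_tail

for data in (symmetric, right_tail, left_tail):
    print(data,
          "median =", statistics.median(data),
          "mean =", statistics.mean(data),
          "skewness =", round(skewness_coefficient(data), 3))
```

In this sketch the median stays at 3 for all three data sets, while the mean follows the extreme score, and the sign of the coefficient matches the side of the long tail.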
This has consequences that affect how we interpret our data. For example, the variable Annual Income has a distribution that is right-skewed (U.S. Census, […]).

[Figure: bar graph of Percentage of Population by Annual Income bracket, from the lowest bracket up to an open-ended top bracket]

This corresponds to the box & whiskers graph shown at the right. The two data points at the far right are only known to be greater than $[…] per year; their precise values are not known.

[Figure: box & whiskers graph of Annual Income]

As you can see from the box & whiskers graph, there are many outliers in the positive direction. In fact, an income of $[…] per year is not an outlier, but anyone earning over $[…] per year is an outlier. These outliers heavily influence the calculation of the mean. The median income is $[…],[…]89 and the mean is $\bar{X}$ = $[…],8[…]. However, […]% of the population makes less than the mean.

What might be a good definition of middle class? One definition is the Inner Quartile Range, because it gives us the middle 50% of the population. People whose income is in the bottom 25% of the population make less than $[…] per year and the people whose income is in the top 25% of the population make more than $8[…] per year. This means that the middle 50% of the population earns between $[…] and $8[…] per year.
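The same reasoning can be tried on a small, made-up income list. The sketch below applies the 1½ × I.Q.R. outlier rule from earlier in this section and compares the mean with the median. The incomes are hypothetical (they are not Census figures), the variable names are assumptions, and the quartiles use the "inclusive" method of `statistics.quantiles`, which is only one of several conventions.

```python
import statistics

# Hypothetical annual incomes in thousands of dollars (not Census data).
incomes = [12, 18, 22, 25, 30, 34, 40, 48, 60, 75, 250]

q1, q2, q3 = statistics.quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

# Boundaries for outliers: 1.5 x I.Q.R. beyond each end of the box.
lower_boundary = q1 - 1.5 * iqr
upper_boundary = q3 + 1.5 * iqr
outliers = [x for x in incomes if x < lower_boundary or x > upper_boundary]

mean = statistics.mean(incomes)
median = statistics.median(incomes)
share_below_mean = sum(1 for x in incomes if x < mean) / len(incomes)

print("Middle 50% earns between", q1, "and", q3)
print("Outlier boundaries:", lower_boundary, "to", upper_boundary)
print("Outliers:", outliers)
print("Median:", median, "Mean:", round(mean, 1))
print("Share of incomes below the mean:", round(share_below_mean, 2))
```

In this made-up list the lower boundary is negative, so no low income can be an outlier, while the single very high income pulls the mean well above the median, which mirrors the pattern described for the income data above.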