Data Analysis Course, Secretariat of the Pacific Community

TOPIC 6: MEASURES OF VARIATION

"If people's eyes tend to blank out tables of figures, you can be darn sure that they blank out the small writing that goes around them."
Alan Graham, 1994

The concept of variation

sometimes averages aren't enough
A measure of the average value can provide a lot of useful information about a set of observations, but in many cases it is not sufficient to tell us everything about the variable. Consider, for example, Figure 6.1 below:

Figure 6.1 Comparison of Two Distributions (Population A and Population B)

the distributions are different yet give the same averages
While the two distributions shown have the same average values, whether measured as a mean, a median, or a mode, we could not say that the distributions were the same. To describe and compare them we need additional information; we need alternative ways of describing the distributions. After the average value, the next most important property of the distribution that we need to measure is the variability of the distribution. From Figure 6.1 we can see that distribution B is much more variable (or spread out) than distribution A. In this section we shall look at different ways of measuring variability.

actual level of variability
We want to measure variability for two main reasons. Firstly, we may be interested in the actual level of variability and in comparing this with another distribution. If we are looking at income distributions, for example, then the government may be interested not only in the average income level, but also in the variability of income level between people and also between different regions of a country. Many policies are designed to help redistribute income from the richest to the poorest (thereby reducing the
variability of income levels), and so we would need to measure variability to see if it changes over time.

variability due to sample variation
The second reason for wanting to measure variability is when we use sampling to compare population values. We then need to take variability into account. We want to be able to distinguish between differences that might have just happened by chance (that is, in the selection of the samples) and those that indicate some real change.

example
Let us look at an example where we are comparing two population distributions.

Figure 6.2 Comparison of Income Levels of Two Populations (Population A and Population B; vertical axis: relative frequency)

income variability is not necessarily reflected in averages
Population A represents the distribution of annual income per household in one region and Population B represents the distribution of annual income of households in another region. Both have the same mean level of income of $1,800 per year, but we cannot say that the two distributions are the same. The distribution of income of Population B is far more spread out than that of Population A. It therefore also has a greater degree of variability.

two different measures of variability
It is clear that we should not only compare measures of location when looking at populations, but also measures of variability. In this topic we shall consider two different measures of variability, one of each of two basic types:
a. measures of the distance between representative values of the population; and
b. measures of the distance of every unit of the population from some specified central value.

range and standard deviation for ungrouped data
As examples of these measures of variability we shall look in this topic at the range and the standard deviation (or variance) for ungrouped data. More complicated techniques (such as finding the standard deviation when the observations are grouped in a frequency distribution) are covered in advanced training.
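As a minimal sketch of the two types of measure just listed, the Python below compares two small income lists. The figures are hypothetical (invented for illustration, not read from Figure 6.2), and the helper name `value_range` is an assumption of this sketch. Both lists have the same mean, yet both kinds of measure show that the second list is more spread out.

```python
from statistics import mean, pstdev

# Hypothetical annual household incomes ($); invented for illustration only.
population_a = [1700, 1750, 1800, 1850, 1900]
population_b = [800, 1300, 1800, 2300, 2800]


def value_range(values):
    """Type (a): distance between two representative values (largest - smallest)."""
    return max(values) - min(values)


for name, incomes in (("A", population_a), ("B", population_b)):
    # pstdev is a type (b) measure: it uses the distance of every value from the mean.
    print(f"Population {name}: mean = {mean(incomes)}, "
          f"range = {value_range(incomes)}, "
          f"standard deviation = {pstdev(incomes):.2f}")
```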
The range

largest - smallest
The simplest way to measure the variation or spread, given a set of observations, is to calculate the range. The range of a set of observations is defined to be the difference between the smallest and the largest values in the set. This is very simple to understand and easy to calculate and so has an obvious appeal. It is used in practice, but is only really useful when the variable under consideration has a fairly even spread of values over the range. It has some obvious drawbacks which tend to restrict its use in
practice; some of the more important disadvantages are:

disadvantages
a. because the range is the difference between the largest and the smallest value, it is very sensitive to very large or very small observations. The inclusion of just one freak (that is, rare or unusual) value will greatly affect the range;
b. the range is dependent on the number of observations. Adding observations can only increase the range or leave it unchanged; it can never make it smaller. This means that it is difficult to compare ranges for two distributions with different numbers of observations;
c. while the range is very easy to calculate, it has the disadvantage that it ignores all the data in between the highest and the lowest values. If, for example, we consider the following three sets of data:
   Set 1: 3, 5, 7, 9, 11, 13, 15, 17, 17, 17, 17
   Set 2: 3, 5, 5, 5, 17, 17, 17, 17, 17, 17, 17
   Set 3: 3, 6, 7, 8, 10, 11, 14, 14, 15, 16, 17
   we see that the range for all three sets is the same (17 - 3 = 14), but the degree of variation is by no means the same;
d. it is difficult to calculate the range for data grouped in a frequency distribution. All we can really do is take the difference between the lower limit of the first class and the upper limit of the last class. This would obviously depend on the definitions of the classes, and is impossible if you have an open-ended class. However, some judgments can be made depending on the knowledge of the subject matter under observation. For practical purposes the open-ended classes are usually closed by guessing a value for the open end.

example
Let us consider another example, the values of imports in various Pacific island countries in 1995.
Table 6.1 Total imports by country, 1995 (in thousand AUD)

Country              Value of imports (A$'000)
Cook Islands         65,363
Fiji                 1,17,05
Kiribati             47,547
Marshall Islands     100,073
Papua New Guinea     1,741,935
Samoa                16,689
Solomon Islands      4,54
Tonga                98,047
Tuvalu               12,535
Vanuatu              14,51

Source: SPESS 14, 1998, Pacific Community, Noumea

method
The range of the import values is the difference between the largest and the smallest value and in this case the range is:
Range = $(1,741,935,000 - 12,535,000) = $1,729 million

do not usually calculate range for grouped data
The range from a grouped frequency distribution is not usually calculated because of the reasons given in the section on disadvantages of the range. However, it can be obtained approximately by taking the difference between the upper limit of the last class and the lower limit of the first class. We must note that it can sometimes be very difficult and at times meaningless if either or both of these classes are open-ended. Let us consider once again the example of annual household cash income in two regions of a country, which are given in the following frequency distributions:

Table 6.2 Comparison of the range of income of two regions: Annual Household Cash Income

Region A                                        Region B
Income ($)                 Frequency            Income ($)                 Frequency
                           (no. of households)                             (no. of households)
Less than 500 (200*)       137                  Less than 1,000 (500*)     86
500-999                    78                   1,000-1,999                137
1,000-1,499                406                  2,000-2,999                64
1,500-1,999                331                  3,000-3,999                47
2,000-4,999                188                  4,000-6,999                130
5,000-9,999                59                   7,000-9,999                6
10,000-19,999              138                  10,000 & over (20,000*)    88
20,000 & over (30,000*)    14
Total                      1,751                Total                      614

Source: Table 5.1 (illustrative data only)
* = Assumed limits

open-ended class intervals
Obviously we cannot calculate the range of income in such cases because of the presence of open-ended class intervals at both ends. However, if we do have to calculate income ranges for the two populations, we will be forced to make some assumptions. These assumptions may be well-founded or ill-founded, but nevertheless, if a decision has to be made, we will have to put some values in the open-ended classes.
In the example above, the assumed values are:

Assumed values ($)           Region A    Region B
Lower limit (first class)    200         500
Upper limit (last class)     30,000      20,000

range
The income ranges for the distributions may now be calculated as follows:

Region A: Income Range = $(30,000 - 200) = $29,800
Region B: Income Range = $(20,000 - 500) = $19,500

clearly state assumptions
In the above example, although we have derived the income ranges as $29,800 for region A and $19,500 for region B, the ranges could be meaningless if it was later realised that the assumed values were incorrect. However, statisticians and planners are often confronted with such problems in their
everyday work, and decisions such as those taken in the case of the ranges of income in regions A and B are the types of decisions which they have to live with. The important thing is that the assumptions applied to generate a result are clearly stated.

use points other than the highest and lowest
We can get around most of the problems of the range as a measure of the variation by using other points in the distribution rather than the two extreme points. One choice would be to measure what we call the quartile deviation or the semi-interquartile range (that is, half the difference between the upper and lower quartiles). For a discussion of upper and lower quartiles refer to Topic 5, More on measures of location. The quartile deviation is not included in these notes, but is covered in the advanced analysis course.

use percentiles
Another alternative is to use the difference between, say, the 10th and the 90th percentile (that is, those values below which 10 per cent and 90 per cent of the observed values lie). As measures of variation, both of these are quite useful. They are not affected by any one or two extreme or rare observations, they are less dependent on the number of observations, and they will tend to differentiate between different sets of observations. In the case of ungrouped frequency distributions, we can nearly always calculate these values. In the case of grouped frequency distributions, a problem occurs when one of the percentile or quartile values falls in an open-ended class.

Standard deviation

standard deviation as a measure of spread
Although the range is a simple measure of variation or spread, it has many disadvantages. We therefore need a measure which will overcome these disadvantages while still providing a good measure of variation. One method is the mean deviation, where we measure the distance of observations from the mean. However, the mean deviation incorporates absolute values and these are difficult to deal with mathematically.
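The mean deviation just mentioned can be sketched in a few lines of Python. The example reuses the three data sets given earlier under the disadvantages of the range; the helper names `value_range` and `mean_deviation` are assumptions of this sketch, not part of the course notes. It illustrates that a measure using every observation separates the three sets even though their ranges are identical.

```python
from statistics import mean

# The three data sets listed under the disadvantages of the range.
data_sets = {
    "Set 1": [3, 5, 7, 9, 11, 13, 15, 17, 17, 17, 17],
    "Set 2": [3, 5, 5, 5, 17, 17, 17, 17, 17, 17, 17],
    "Set 3": [3, 6, 7, 8, 10, 11, 14, 14, 15, 16, 17],
}


def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def mean_deviation(values):
    """Average absolute distance of the observations from their mean."""
    centre = mean(values)
    return sum(abs(v - centre) for v in values) / len(values)


results = {name: (value_range(v), mean_deviation(v)) for name, v in data_sets.items()}
for name, (rng, md) in results.items():
    print(f"{name}: range = {rng}, mean deviation = {md:.2f}")
```

The absolute value inside `mean_deviation` is the part that is awkward to handle algebraically, which is why the notes move on to squared deviations.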
The standard deviation is based on the same principles as the mean deviation, but in this case we eliminate the signs of the deviations from the mean by squaring them.

method
How does the standard deviation work? Like the mean, the standard deviation takes all the observed values into account. If there were no dispersion at all in a distribution, all the observed values would be the same. The mean would also be the same as this repeated value. So if everyone had the same height of 180 cm, the mean would be 180 cm. No observed value would deviate or differ from the mean. But, with dispersion, the observed values do deviate from the mean, some by a lot, some by only a little. Quoting the standard deviation of a distribution is a way of indicating a kind of average amount by which all the values deviate from the mean. The greater the dispersion, the bigger the deviations and the bigger the standard deviation.

principle of standard deviation
The standard deviation is found by adding the squares of the deviations of the individual values from the mean of the distribution, dividing this sum by the number of items in the distribution, and then finding the square root of this number. Let's now explain the procedure for calculating the standard deviation in more detail.

In terms of a population consisting of N values \(x_1, x_2, x_3, \ldots, x_N\) with a mean \(\mu\) (pronounced "mu" or "mew"), the standard deviation of a population is defined as:
Formula

\[ \text{Standard Deviation } (\sigma) = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]

Definition
To describe the formula we will work through the steps to calculate the standard deviation.

First we calculate \(\mu\). It is calculated the same way as \(\bar{x}\) in the previous chapter (i.e. we add up all the numbers and divide by how many numbers there were). We call it \(\mu\) when we are dealing with a population, rather than \(\bar{x}\) when it is a sample.

We subtract \(\mu\) from each x value: \( (x_i - \mu) \)
Square each of these values: \( (x_i - \mu)^2 \)
Sum these values to get the total: \( \sum_{i=1}^{N} (x_i - \mu)^2 \)
Divide by the number of units in the population (N): \( \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \)
Take the square root of everything: \( \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \)

The standard deviation of a population is denoted by \(\sigma\) (the Greek small letter sigma).

variance
The square of the standard deviation is called the variance and is denoted by \(\sigma^2\). When we square the result of a formula which has a square root, the square root sign is cancelled out and disappears. We then have:

formula

\[ \text{Variance } (\sigma^2) = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]

sample variance
If we are dealing with a sample and wish to calculate the sample variance (or sample standard deviation) in order to estimate the value for the population, the formula is changed slightly. In this case \(s^2\) stands for the sample variance, \(\bar{x}\) the sample mean, and n the sample size. The formula for the sample variance is then:

\[ \text{Sample Variance } (s^2) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
and the sample standard deviation is given by:

\[ \text{Sample Standard Deviation } (s) = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]

sample = (n - 1)
These formulae for samples are effectively the same as those for populations, except that we have used the divisor (n - 1) instead of N. The important thing to remember is that when calculating the variance or standard deviation of a sample, divide by (n - 1). When calculating the variance or standard deviation of a population, divide by N.

large standard deviation = large spread
Note that the more the values of individual items differ from the mean, the greater will be the square of these differences and therefore the greater the sum of squares. The greater the sum of squares, the larger s (the standard deviation) will be. Hence, the greater the dispersion, the larger the standard deviation will be.

example
We will now go through the calculation of the standard deviation using the following data.

Table 6.3: 2000 Secondary School Enrolment by Province, PNG

Province    Enrolments    Deviation from mean    Deviations squared
Western     961           -2,470                 6,100,900
Gulf        1,523         -1,908                 3,640,464
NCD         4,854         1,423                  2,024,929
Central     3,344         -87                    7,569
Oro         3,134         -297                   88,209
SHP         1,682         -1,749                 3,059,001
EHP         5,768         2,337                  5,461,569
Simbu       6,182         2,751                  7,568,001
            Mean = 3,431  0                      27,950,642

Source: Illustrative data only

first find the mean
To calculate the standard deviation we first calculate \(\bar{x}\):

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = (961 + 1{,}523 + 4{,}854 + 3{,}344 + 3{,}134 + 1{,}682 + 5{,}768 + 6{,}182) / 8 = 3{,}431 \]

In Column 3 we subtract the mean value from the value for each province. In Column 4 we square the deviations and sum these squared deviations, giving a total of 27,950,642.
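The three-column working of Table 6.3 can be reproduced with a short script. This is only a sketch of the worksheet layout described above; the variable names are assumptions, and the enrolment figures are those of Table 6.3.

```python
# Enrolments by province, as in Table 6.3 (illustrative data only).
enrolments = {
    "Western": 961, "Gulf": 1523, "NCD": 4854, "Central": 3344,
    "Oro": 3134, "SHP": 1682, "EHP": 5768, "Simbu": 6182,
}

# Step 1: the mean of the observations.
mean_enrolment = sum(enrolments.values()) / len(enrolments)

# Steps 2 and 3: deviation from the mean (Column 3) and its square (Column 4).
rows = []
for province, x in enrolments.items():
    deviation = x - mean_enrolment
    rows.append((province, x, deviation, deviation ** 2))

# Step 4: the sum of the squared deviations.
sum_of_squares = sum(row[3] for row in rows)

for province, x, deviation, squared in rows:
    print(f"{province:<8} {x:>6,} {deviation:>8,.0f} {squared:>12,.0f}")
print(f"Mean = {mean_enrolment:,.0f}, sum of squared deviations = {sum_of_squares:,.0f}")
```

The deviations in Column 3 always sum to zero, which is a useful check on the arithmetic before squaring.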
data from a population
If the above data are considered to be from a population, then to derive the standard deviation we divide the sum of the squared deviations by the number of observations (N = 8) and take the square root. In this case we have:

\[ \text{Population Standard Deviation } (\sigma) = \sqrt{\frac{27{,}950{,}642}{8}} = \sqrt{3{,}493{,}830.25} = 1{,}869.18 \]

data from a sample
However, if the data are considered to be a sample from a population, then to derive the standard deviation we divide the sum of the squared deviations by one less than the number of observations (n - 1 = 7) and take the square root. In this case we have:

\[ \text{Sample Standard Deviation } (s) = \sqrt{\frac{27{,}950{,}642}{7}} = \sqrt{3{,}992{,}948.86} = 1{,}998.24 \]

In this example we would probably consider the data to be sample data, so would divide by 7.

awkward with a large set of numbers
Although this is a fairly simple procedure to calculate the standard deviation of a small set of numbers, it is quite a cumbersome procedure for a large set of numbers. First of all we have to determine the mean of the set, then calculate the deviations of each observation from the mean, square these and add them up. Even with the aid of a calculator the operations take quite a lot of time. It is best to use a computer to perform the calculations.

rearrange the formula
We can, however, make the calculation much easier by rearranging the formula for the variance. Thus, for a sample, we have:

Sample formula

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \]

population formula

\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{N} x_i^2 - \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N}}{N} \]

steps for sample variance
Let's run through this formula for the sample variance. For the sample variance we first square each individual x value: \( x_i^2 \)

We then calculate the sum of those squared numbers: \( \sum_{i=1}^{n} x_i^2 \)

Call this total A.

We also calculate the total of the individual x values: \( \sum_{i=1}^{n} x_i \)
We square this total: \( \left(\sum_{i=1}^{n} x_i\right)^2 \)

and divide by n (the number in the sample): \( \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} \)

Call this total B.

We then take A - B and divide by (n - 1):

\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \]

not as complicated as it looks
Although the second formula looks more complicated, it is in fact much easier to use when we are using a calculator. For example, let us consider the following sample values, which are the same observations that we considered in Table 6.3.

example
961  1,523  4,854  3,344  3,134  1,682  5,768  6,182

total and the mean of the observations
a. Calculating the variance of the sample the first way would entail firstly obtaining the total and the mean of the observations. We have:

\( \sum x_i = 27{,}448 \), n = 8, \( \bar{x} = 3{,}431 \)

The deviations from the mean are:

-2,470  -1,908  1,423  -87  -297  -1,749  2,337  2,751

The sum of the squares of the deviations is 27,950,642. Thus, the variance is:

\( s^2 = 27{,}950{,}642 / 7 = 3{,}992{,}948.86 \)

second method
b. Calculating the variance using the second method or formula we need:

\( \sum x_i = 27{,}448 \) and \( \sum x_i^2 = 122{,}124{,}730 \)

\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \left[122{,}124{,}730 - \left\{(27{,}448)^2 / 8\right\}\right] / 7 = (122{,}124{,}730 - 94{,}174{,}088) / 7 \]

\( s^2 = 3{,}992{,}948.86 \)

second method is easier and faster
Thus we see that if we use the memory function in a calculator, the second calculation can be done without having to write any intermediate results. You will also note that the variance derived using either of the two methods is the same (3,992,948.86), except that the second method is easier and faster.
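The two routes to the sample variance can be checked against each other in Python. This is a minimal sketch; the function names `variance_from_deviations` and `variance_shortcut` are assumptions made for illustration, and the totals A and B follow the steps described above.

```python
from math import isclose

# Sample values from Table 6.3.
values = [961, 1523, 4854, 3344, 3134, 1682, 5768, 6182]


def variance_from_deviations(xs):
    """First method: sum the squared deviations from the mean, divide by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """Second method: (A - B) / (n - 1), with A and B as defined in the notes."""
    n = len(xs)
    total_a = sum(x * x for x in xs)   # A: sum of the squared values
    total_b = sum(xs) ** 2 / n         # B: square of the total, divided by n
    return (total_a - total_b) / (n - 1)


first = variance_from_deviations(values)
second = variance_shortcut(values)
print(f"First method:  {first:,.2f}")
print(f"Second method: {second:,.2f}")
print("Methods agree:", isclose(first, second))
```

One caution on the design choice: on a computer, the second formula subtracts two large numbers and can lose precision when the mean is very large relative to the spread, so the first method is usually the safer one to program even though the second is quicker on a calculator.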
Properties of the standard deviation

remember
When using the standard deviation it is important to remember the following points:
o the standard deviation is used only to measure the spread about the mean;
o the standard deviation is never negative;
o the standard deviation is sensitive to extreme values (called outliers). A single outlier can raise the standard deviation a great deal, distorting the picture of spread; and
o the greater the spread, the greater the standard deviation.

Coefficient of variation

the mean adds meaning to the standard deviation
The standard deviation by itself is not very meaningful unless it is considered along with the arithmetic mean. For example, a standard deviation of $100 when the mean income is $10,000 implies a much greater relative variation than a standard deviation of $100 for a mean GDP figure of $10,000,000. Also, comparing the variability of two populations with different units of measurement (for example, income levels in Papua New Guinea (Kina) and Vanuatu (Vatu)) can be very difficult.

interested in variation from the mean
Hence, the variability in a set of observations can usefully be measured relative to a central measure such as the arithmetic mean. Such a measure is provided by the coefficient of variation, which is the ratio of the standard deviation to the arithmetic mean, usually expressed as a percentage, and is given by the formula:

formula

\[ \text{Coefficient of Variation (C.V.)} = (\sigma / \bar{x}) \times 100 \]

(The 100 converts the number to a percentage.)

can compare data
To compare the variability of two sets of figures would therefore involve comparing their respective coefficients of variation. The coefficient of variation allows for comparisons when:
o the means of the distributions being compared are far apart, or
o the data are in different units.

percentage
The units are converted to a common denominator (a percent).

example
If we look at the data in Table 6.3, we can calculate the coefficient of variation as:

C.V.
= \( (\sigma / \bar{x}) \times 100 \) = 1,869.18 / 3,431 × 100 = 54.48%

illustrative example
Let's use some made-up data to illustrate the coefficient of variation. The mean income of homeowners in Australia is $40,000 with a standard deviation of $4,000. In
Kiribati, the mean income of homeowners is $12,000 with a standard deviation of $1,200. (Note that the means are far apart and the standard deviations are different.) Compare and interpret the relative dispersion in the two groups of incomes.

solution
The first impulse is to say that there is more dispersion in the incomes in Australia because the standard deviation is greater. However, when we convert the two measurements to relative terms using the coefficient of variation, we find that the relative dispersion is the same.

Australia: CV(Australia) = \( (\sigma / \bar{x}) \times 100 \) = $4,000 / $40,000 × 100 = 10%
Kiribati: CV(Kiribati) = \( (\sigma / \bar{x}) \times 100 \) = $1,200 / $12,000 × 100 = 10%

similar CV
In summary, the incomes in Australia and Kiribati have the same relative amount of variation.

example
We could also compare two different types of data, such as the incomes and ages of homeowners. We could compare the spread of incomes of homeowners in Australia with, say, the spread of the ages of homeowners. The mean age of homeowners is 40 years with a standard deviation of 10 years.

age
CV(age) = \( (\sigma / \bar{x}) \times 100 \) = (10 / 40) × 100 = 25%
CV(income) = 10%

We can see that there is greater relative dispersion in the ages of the homeowners than in their incomes.

Normal distribution

used extensively
A particular distribution that is used extensively in statistical theory is the normal distribution:
properties
The normal distribution has several key properties:
o it is symmetrical;
o it is bell shaped;
o the mean of the distribution is the peak; and
o the area under the curve is always 1.

always have the four
Normal distributions can have different means and standard deviations, but they always have these four key properties.

everyday examples
Many phenomena in everyday life can be described by the normal curve, for example people's height. A small number of people in the population are very short, a small number are very tall, and the majority of the population fall in some middle range. Many other phenomena are also normally distributed, for example test scores and weights of people. We could discuss the normal distribution extensively, but for now that is all you need to know.

Reference Ranges for a Standard Deviation

analysis of data normally distributed
When analysing normally distributed data, the standard deviation is used with the mean to calculate where the data lie within certain reference ranges. The most important thing to understand about reference ranges is that for any set of normally distributed data:

reference ranges
o about 68% of the data lie in the interval \( \bar{x} - s < x < \bar{x} + s \) (that is, 68% of the data lie in the range from the mean minus the standard deviation to the mean plus the standard deviation)
o about 95% of the data lie in the interval \( \bar{x} - 2s < x < \bar{x} + 2s \)
o about 99% of the data lie in the interval \( \bar{x} - 3s < x < \bar{x} + 3s \)
where \(\bar{x}\) = the mean; and s = the standard deviation.

68% reference range
If we look at the data in Table 6.3, we can calculate the 68% reference range for the data as:

68% Reference range: \( (\bar{x} - s, \bar{x} + s) \)
= (3431 - 1998.24, 3431 + 1998.24)
= (1432.76, 5429.24)

That is, about 68% of the data lie in the range 1,432.76 to 5,429.24.

95% reference range
We can calculate the 95% reference range as:

95% Reference range: \( (\bar{x} - 2s, \bar{x} + 2s) \)
= (3431 - 2(1998.24), 3431 + 2(1998.24))
= (3431 - 3996.48, 3431 + 3996.48)
= (-565.48, 7427.48)

That is, about 95% of the data lie in the range -565.48 to 7,427.48. Since enrolments cannot be negative, the lower limit is in practice zero.

Summary of the measures of variability

RANGE
o is easily calculated, except for frequency distributions, and is well understood;
o is based on the two extreme observations and is thus very unstable;
o is difficult to manipulate mathematically;
o provides no information about the general behaviour of the distribution;
o should only be used as a rough guide to the level of variability.

VARIANCE/STANDARD DEVIATION
o is a measure of variability using information from every observation;
o with some manipulation, the calculations are reasonably straightforward;
o has a central role in mathematical and statistical theory and is very widely used;
o can be affected by extreme values;
o is the most commonly used measure of variability.

COEFFICIENT OF VARIATION
o is independent of the units of observations. Therefore, it is useful in comparing distributions where the units of observations are different;
o a disadvantage of the coefficient of variation is that it is unstable when the arithmetic mean is close to zero.
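The measures summarised above can be collected in one small helper, sketched here in Python for the Table 6.3 enrolments. The function name `variation_summary` and the dictionary keys are assumptions of this sketch. Following the worked examples in these notes, the coefficient of variation uses the population standard deviation and the reference ranges use the sample standard deviation. Results may differ from the hand calculations in the second decimal place because the notes round s to 1,998.24 before doubling it.

```python
from statistics import mean, pstdev, stdev


def variation_summary(values):
    """Range, standard deviations, coefficient of variation and reference ranges."""
    x_bar = mean(values)
    sigma = pstdev(values)   # population standard deviation (divide by N)
    s = stdev(values)        # sample standard deviation (divide by n - 1)
    return {
        "range": max(values) - min(values),
        "population_sd": sigma,
        "sample_sd": s,
        "cv_percent": sigma / x_bar * 100,
        "range_68": (x_bar - s, x_bar + s),
        "range_95": (x_bar - 2 * s, x_bar + 2 * s),
    }


# Enrolments from Table 6.3 (illustrative data only).
enrolments = [961, 1523, 4854, 3344, 3134, 1682, 5768, 6182]
summary = variation_summary(enrolments)

for measure, value in summary.items():
    if isinstance(value, tuple):
        print(f"{measure}: ({value[0]:,.2f}, {value[1]:,.2f})")
    else:
        print(f"{measure}: {value:,.2f}")
```

The reference ranges are only meaningful if the data are roughly normally distributed, which eight observations cannot confirm.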
One final characteristic of a distribution

understand the underlying structure
The objective of summarising a set of data is to make it possible to comprehend the underlying structure and pattern of the distribution of the values of the variable under consideration. The attempt in summarising the data is to reduce them to a few measures which would give us an indication of the central values, variation of the values, concentration of the frequencies and shape of the distribution. The frequency distribution describes the population we are considering, and the measures of location and variation help us to characterise the distribution by simple measures.

skewed distributions asymmetrical
Another way of characterising a distribution is to study its skewness (that is, whether the distribution is not symmetrical and, if not, whether the observations are concentrated in the low or high values). Examples of skewed distributions are income, land holding size and household size. For such distributions, one is interested in finding out the type of skewness: whether there are more units with low values than units with high values, or whether there are more units with high values than units with low values.

right tail
A distribution is said to be positively skewed if large frequency values are concentrated to the left of the distribution and the distribution has small frequency values to the right of the distribution (that is, the distribution has a right tail and has more low values than high values).

left tail
A distribution is said to be negatively skewed if large frequency values are concentrated to the right of the distribution and the distribution has small frequency values to the left of the distribution (that is, the distribution has a left tail and has more high values than low values).

three main features
A distribution can be considered to have three main features which are of interest in studying a population.
These features are:
1 its central values;
2 its variation from the central values;
3 whether the distribution is symmetric about the central values; and if not symmetric, whether it is leaning to the left or right.
Exercises

1. The local bus company employs 10 people. The length of service, in completed years, for each employee is as follows:

8 8 1 4 1 8 8 7 3

(a) Calculate the range.
(b) Calculate the standard deviation (assume the values are sample values).
(c) Calculate the coefficient of variation.
(d) Calculate the reference range which contains approximately 68% of observations.

2. Customs files reveal the ages of persons leaving the country. A sample of ages is: 16, 41, 5, 1, 30, 17, 9, 50, 30 and 39.

(a) Calculate the range.
(b) Assume the values are sample values and calculate the sample variance using the second method of calculating the variance.
(c) Calculate the coefficient of variation.
(d) Calculate the reference range that contains approximately 95% of observations.
3. The local market reported the following number of people buying vegetables for the past 9 days:

81 65 58 47 30 51 9 85 4

(a) Calculate the range.
(b) Calculate the standard deviation (assume the values are sample values).
(c) Calculate the coefficient of variation.
(d) Calculate the reference range that contains approximately 95% of observations.
Self-Review

1. The following data represent the amount spent (in dollars) by a random sample of 14 households on basic food items for one month:

57 34 7 41 5 18 39 33 37 39 38 47 31 4

(a) Calculate the range.
(b) Calculate the sample standard deviation.
(c) Calculate the coefficient of variation.
(d) Calculate the reference range that contains approximately 99% of observations.
Excel functions

More statistical functions
In Topic 5, you were shown how to use the functions related to Measures of Location. In this section, those relevant to Measures of Variation are illustrated. You don't have to use the functions; instead you can set up a worksheet with the three columns (observation, deviation from the mean and deviations squared). See the computer notes for Topic 7 to set up the worksheet to calculate the variance, standard deviation and standard error from sample data. You have to be careful because the way your sample was selected determines how the standard error is calculated. If you have any doubts about the correct formula to use, contact the SPC Statistics Programme for help.

When calculating the variance or standard deviation, it might be more useful to use the worksheet method rather than the Excel function. If you have the columns set up in your worksheet you can see the different components of the equation (x etc), and it would be easier to find out why you had a larger or smaller than expected deviation in your data. You also have to be aware that Excel uses its average function, which includes 0 values in the count of observations (n), and this might not be appropriate in all circumstances.

The range
You don't really need to use a function to calculate the range; use the sort buttons on the Standard toolbar. You can sort from smallest to largest with the ascending sort button, and from largest to smallest with the descending sort button. Be careful when you sort data: either select ALL your data, or click with the mouse in the column you want to sort by. It is very easy to corrupt your data with the sort buttons (you don't get a warning like you do with the sort option on the Data menu).

Population variance
Excel calculates the variance for a POPULATION using the formula:

\[ \frac{n \sum x^2 - \left(\sum x\right)^2}{n^2} \]

which is a different way of writing the one used in your notes.

Format: =varp(cell range)
Example: =varp(A1:A333) will calculate the variance for the POPULATION in cells A1 to A333.
Sample variance
Excel calculates the variance for a SAMPLE using the formula:

\[ \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)} \]

which again is a different way of writing the one used in your notes.

Format: =var(cell range)
Example: =var(A1:A333) will calculate the variance for the SAMPLE in cells A1 to A333.
Population standard deviation
Excel calculates the standard deviation for a POPULATION using the formula:

\[ \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n^2}} \]

which is a different way of writing the one used in your notes.

Format: =stdevp(cell range)
Example: =stdevp(A1:A333) will calculate the standard deviation for the POPULATION in cells A1 to A333.

Sample standard deviation
Excel calculates the standard deviation for a SAMPLE using the formula:

\[ \sqrt{\frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}} \]

which again is a different way of writing the one used in your notes.

Format: =stdev(cell range)
Example: =stdev(A1:A333) will calculate the standard deviation for the SAMPLE in cells A1 to A333.

Confidence interval
You can use Excel to calculate the confidence interval for a mean. You have to type in the standard deviation, so the function is not that user friendly.

Format: =confidence(alpha,standard_dev,size)

where:
o alpha is the significance level used to compute the confidence level. The confidence level equals 100*(1 - alpha)%, or in other words, an alpha of 0.05 indicates a 95 percent confidence level;
o standard_dev is the population standard deviation for the data range and is assumed to be known;
o size is the sample size.

Example: Suppose we observe that, in a sample of 50 commuters, the average length of travel to work is 30 minutes with a population standard deviation of 2.5. We can calculate with 95% confidence that the population mean is in the interval:

=CONFIDENCE(0.05,2.5,50) equals 0.692951.

In other words, the average length of travel to work equals 30 ± 0.692951 minutes, or 29.3 to 30.7 minutes.
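For readers working outside Excel, the same quantities can be obtained with Python's standard `statistics` module. This is a sketch, not part of the Excel notes: `pvariance`, `variance`, `pstdev` and `stdev` play the roles of VARP, VAR, STDEVP and STDEV, while `confidence_half_width` is a hypothetical helper written here on the assumption that CONFIDENCE uses the normal distribution with a known population standard deviation.

```python
from math import sqrt
from statistics import NormalDist, pstdev, pvariance, stdev, variance

# Table 6.3 enrolments, used here in place of a worksheet range such as A1:A333.
data = [961, 1523, 4854, 3344, 3134, 1682, 5768, 6182]

print("Population variance (VARP):", round(pvariance(data), 2))
print("Sample variance (VAR):", round(variance(data), 2))
print("Population standard deviation (STDEVP):", round(pstdev(data), 2))
print("Sample standard deviation (STDEV):", round(stdev(data), 2))


def confidence_half_width(alpha, standard_dev, size):
    """Half-width of a normal-based confidence interval for a mean.

    Hypothetical helper mirroring =CONFIDENCE(alpha, standard_dev, size).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    return z * standard_dev / sqrt(size)


# The commuter example: 50 commuters, mean 30 minutes, population SD 2.5.
half_width = confidence_half_width(0.05, 2.5, 50)
print("Half-width:", round(half_width, 4))
print("Interval:", round(30 - half_width, 1), "to", round(30 + half_width, 1))
```

The half-width may differ from the Excel figure in the sixth decimal place, depending on how accurately each program evaluates the normal quantile.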