Number of fatalities X Sunday 4 Monday 6 Tuesday 2 Wednesday 0 Thursday 3 Friday 5 Saturday 8 Total 28. Day

LECTURE # 8 Mea Deviatio, Stadard Deviatio ad Variace & Coefficiet of variatio Mea Deviatio Stadard Deviatio ad Variace Coefficiet of variatio First, we will discuss it for the case of raw data, ad the we will go o to the case of a frequecy distributio. The first thig to ote is that, whereas the rage as well as the quartile deviatio are two such measures of dispersio which are NOT based o all the values, the mea deviatio ad the stadard deviatio are two such measures of dispersio that ivolve each ad every data-value i their computatio. You must have oted that the rage was measurig the dispersio of the data-set aroud the mid-rage, whereas the quartile deviatio was measurig the dispersio of the data-set aroud the media. How are we to decide upo the amout of dispersio roud the arithmetic mea? It would seem reasoable to compute the DISTANCE of each observed value i the series from the arithmetic mea of the series. Let us do this for a simple data-set show below: The Number of Fatalities i Motorway Accidets i oe Week: Day Number of fatalities Suday 4 Moday 6 Tuesday Wedesday 0 Thursday 3 Friday 5 Saturday 8 Total 8 Let us do this for a simple data-set show below: The Number of Fatalities i Motorway Accidets i oe Week: Day Number of fatalities Suday 4 Moday 6 Tuesday Wedesday 0 Thursday 3 Friday 5 Saturday 8 Total 8

The arithmetic mea umber of fatalities per day is 8 4 7 I order to determie the distaces of the data-values from the mea, we subtract our value of the arithmetic mea from each daily figure, ad this gives us the deviatios that occur i the third colum of the table below: Day Number of fatalities Suday 4 0 Moday 6 + Tuesday Wedesday 0 4 Thursday 3 1 Friday 5 + 1 Saturday 8 + 4 TOTAL 8 0 The deviatios are egative whe the daily figure is less tha the mea (4 accidets) ad positive whe the figure is higher tha the mea. It does seem, however, that our efforts for computig the dispersio of this data set have bee i vai, for we fid that the total amout of dispersio obtaied by summig the (x x) colum comes out to be zero! I fact, this should be o surprise, for it is a basic property of the arithmetic mea that:the sum of the deviatios of the values from the mea is zero. The questio arises: How will we measure the dispersio that is actually preset i our data-set? Our problem might at first sight seem irresolvable, for by this criterio it appears that o series has ay dispersio. Yet we kow that this is absolutely icorrect, ad we must thik of some other way of hadlig this situatio. Surely, we might look at the umerical differece betwee the mea ad the daily fatality figures without cosiderig whether these are positive or egative. Let us deote these absolute differeces by modulus of d or mod d. This is evidet from the third colum of the table below: d d 4 0 0 6 0 4 4 3 1 1 5 1 1 8 4 4 Total 14

By igorig the sig of the deviatios we have achieved a o-zero sum i our secod colum. Averagig these absolute differeces, we obtai a measure of dispersio kow as the mea deviatio. I other words, the mea deviatio is give by the formula: MEAN DEVIATION: M.D. d i As we are averagig the absolute deviatios of the observatios from their mea, therefore the complete ame of this measure is mea absolute deviatio --- but geerally we just say mea deviatio. Applyig this formula i our example, we fid that: The mea deviatio of the umber of fatalities is 14 M.D.. 7 The formula that we have just cosidered is valid i the case of raw data. I case of grouped data i.e. a frequecy distributio, the formula becomes MEAN DEVIATION FOR GROUPED DATA: M.D. f i x i x fi d i As far as the graphical represetatio of the mea deviatio is cocered, it ca be depicted by a horizotal lie segmet draw below the -axis o the graph of the frequecy distributio, as show below:

f Mea Deviatio The approach which we have adopted i the cocept of the mea deviatio is both quick ad simple. But the problem is that we itroduce a kid of artificiality i its calculatio by igorig the algebraic sigs of the deviatios. I problems ivolvig descriptios ad comparisos aloe, the mea deviatio ca validly be applied; but because the egative sigs have bee discarded, further theoretical developmet or applicatio of the cocept is impossible. Mea deviatio is a absolute measure of dispersio. Its relative measure, kow as the co-efficiet of mea deviatio, is obtaied by dividig the mea deviatio by the average used i the calculatio of deviatios i.e. the arithmetic mea. Thus Co-efficiet of M.D: Sometimes, the mea deviatio is computed by averagig the absolute deviatios of the datavalues from the media i.e. M. D. Mea x x~ Mea deviatio Ad whe will we have a situatio whe we will be usig the media istead of the mea?as discussed earlier, the media will be more appropriate tha the mea i those cases where our data-set cotais a few very high or very low values.i such a situatio, the coefficiet of mea deviatio is give by: Co-efficiet of M.D: M.D. Media Let us ow cosider the stadard deviatio --- that statistic which is the most importat ad the most widely used measure of dispersio. The poit that made earlier that from the mathematical poit of view, it is ot very preferable to take the absolute values of the deviatios, This problem is overcome by computig the stadard deviatio. I order to compute the stadard deviatio, rather tha takig the absolute values of the deviatios, we square the deviatios. Averagig these squared deviatios, we obtai a statistic that is kow as the variace.

Variace ( x x) Let us compute this quatity for the data of the above example. Our -values were: 4 6 0 3 5 8 Takig the deviatios of the -values from their mea, ad the squarig these deviatios, we obtai: ( x x ) ( x x ) 4 0 0 6 + 4 4 0 4 16 3 1 1 5 + 1 1 8 + 4 16 4 Obviously, both ( ) ad () equal 4, both ( 4) ad (4) equal 16, ad both ( 1) ad (1) 1.

Hece (x x) 4 is ow positive, ad this positive value has bee achieved without bedig the rules of mathematics. Averagig these squared deviatios, the variace is give by: Variace: ( x x) 4 6 7 The variace is frequetly employed i statistical work, but it should be oted that the figure achieved is i squared uits of measuremet. I the example that we have just cosidered, the variace has come out to be 6 squared fatalities, which does ot seem to make much sese! I order to obtai a aswer which is i the origial uit of measuremet, we take the positive square root of the variace. The result is kow as the stadard deviatio. STANDARD DEVIATION: S ( x x ) Hece, i this example, our stadard deviatio has come out to be.45 fatalities. I computig the stadard deviatio (or variace) it ca be tedious to first ascertai the arithmetic mea of a series, the subtract it from each value of the variable i the series, ad fially to square each deviatio ad the sum. It is very much more straight-forward to use the short cut formula give below: SHORT CUT FORMULA FOR THE STANDARD DEVIATION: S x x I order to apply the short cut formula, we require oly the aggregate of the series ( x) ad the aggregate of the squares of the idividual values i the series ( x). I other words, oly two colums of figures are called for. The umber of idividual calculatios is also cosiderably reduced, as see below:

4 16 6 36 4 0 0 3 9 5 5 8 64 Total 8 154 Therefore S 154 8 7 7 ( 16) 6.45 fatalities The formulae that we have just discussed are valid i case of raw data. I case of grouped data i.e. a frequecy distributio, each squared deviatio roud the mea must be multiplied by the appropriate frequecy figure i.e. STANDARD DEVIATION IN CASE OF GROUPED DATA: S f ( x x ) Ad the short cut formula i case of a frequecy distributio is: SHORT CUT FORMULA OF THE STANDARD DEVIATION IN CASE OF GROUPED DATA: fx fx S Which is agai preferred from the computatioal stadpoit? For example, the stadard deviatio life of a batch of electric light bulbs would be calculated as follows: EAMPLE: Life (i Hudreds of Hours) No. of Bulbs f Midpoit x fx fx 0 5 4.5 10.0 5.0 5 10 9 7.5 67.5 506.5 10 0 38 15.0 570.0 8550.0 0 40 33 30.0 990.0 9700.0 40 ad over 16 50.0 800.0 40000.0

Therefore, stadard deviatio: S 78781.5 437.5 100 100 13.9hudredhours 1390 hours As far as the graphical represetatio of the stadard deviatio is cocered, a horizotal lie segmet is draw below the -axis o the graph of the frequecy distributio --- just as i the case of the mea deviatio. f Stadard deviatio The stadard deviatio is a absolute measure of dispersio. Its relative measure called coefficiet of stadard deviatio is defied as: Coefficiet of S.D: Sta dard Deviatio Mea

Ad, multiplyig this quatity by 100, we obtai a very importat ad well-kow measure called the coefficiet of variatio. Coefficiet of Variatio: S C.V. 100 As metioed earlier, the stadard deviatio is expressed i absolute terms ad is give i the same uit of measuremet as the variable itself. There are occasios, however, whe this absolute measure of dispersio is iadequate ad a relative form becomes preferable. For example, if a compariso betwee the variability of distributios with differet variables is required, or whe we eed to compare the dispersio of distributios with the same variable but with very differet arithmetic meas. To illustrate the usefuless of the coefficiet of variatio, let us cosider the followig two examples: EAMPLE-1 Suppose that, i a particular year, the mea weekly earigs of skilled factory workers i oe particular coutry were $ 19.50 with a stadard deviatio of $ 4, while for its eighborig coutry the figures were Rs. 75 ad Rs. 8 respectively. From these figures, it is ot immediately apparet which coutry has the GREATER VARIABILITY i earigs. The coefficiet of variatio quickly provides the aswer: COEFFICIENT OF VARIATION For coutry No. 1: 4 100 0.5 per cet, 19.5 Ad for coutry No. : 8 100 37.3 per cet. 75 From these calculatios, it is immediately obvious that the spread of earigs i coutry No. is greater tha that i coutry No. 1, ad the reasos for this could the be sought. EAMPLE-: The crop yield from 0 acre plots of wheat-lad cultivated by ordiary methods averages 35 bushels with a stadard deviatio of 10 bushels. The yield from similar lad treated with a ew fertilizer averages 58 bushels, also with a stadard deviatio of 10 bushels. At first glace, the yield variability may seem to be the same, but i fact it has improved (i.e. decreased) i view of the higher average to which it relates. Agai, the coefficiet of variatio shows this very clearly: Coefficiet of Variatio:

Utreated lad: 10 100 8.57 per cet 35 Treated lad: 10 100 17.4 per cet 58 The coefficiet of variatio for the utreated lad has come out to be 8.57 percet, whereas the coefficiet of variatio for the treated lad is oly 17.4 percet.