North Seattle Community College
BUS210 Business Statistics
Chapter 4: Descriptive Statistics

Summary Definitions

- Central tendency: the extent to which the data values group around a central value.
- Variation: the amount of dispersion, or scattering, of values.
- Shape: the pattern of the distribution of values from the lowest value to the highest value.

The Mean

- The arithmetic mean is the most common measure of central tendency.
- Mean = sum of values divided by the number of values.
- Easily affected by extreme values (outliers).
- For a sample of size n, the sample mean (read "x-bar") is

  $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

  where n is the sample size and X_1, X_2, ..., X_n are the observed values.

The Mean (continued)

- Values 1, 2, 3, 4, 5: mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
- Values 1, 2, 3, 4, 10: mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4

[Figure: two dot plots on a 0 to 10 axis, one with mean = 3 and one with mean = 4]

The Median

- The median is the middle value in an ordered array: 50% of data values are above it, 50% are below.
- Little affected by extreme values (if at all).

[Figure: two dot plots on a 0 to 10 axis, each with median = 3]

Locating the Median

- The location of the median in an ordered array:

  Median position = $\frac{n + 1}{2}$ position in the ordered data

- If n is odd, the median is the exact middle number.
- If n is even, the median is the average of the two middle numbers.
- Note: $\frac{n + 1}{2}$ is not the value of the median, but its position within the ranked data.
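The mean formula and the median position rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `mean` and `median` and the two list names are chosen here for the example, and the data are the two five-value sets from the mean example.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Median located through the (n + 1) / 2 position in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2               # 1-based position, not the median itself
    if n % 2 == 1:                       # odd n: the exact middle value
        return ordered[int(position) - 1]
    lower = ordered[n // 2 - 1]          # even n: average the two middle values
    upper = ordered[n // 2]
    return (lower + upper) / 2


without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

print(mean(without_outlier), median(without_outlier))  # 3.0 3
print(mean(with_outlier), median(with_outlier))        # 4.0 3
```

Replacing 5 with 10 moves the mean from 3 to 4 while the median stays at 3, which is the sense in which the median is "little affected" by extreme values.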
The Mode

- The value that occurs the most often.
- There may be no mode, or there may be two modes (bimodal), or there may be several modes (multimodal).
- Used for either numerical or categorical data.
- Not affected by extreme values.

[Figure: dot plot on a 0 to 14 axis with mode = 9; dot plot on a 0 to 6 axis with no mode]

Review Example

House prices:

  $2,000,000
    $500,000
    $300,000
    $100,000
    $100,000
  ----------
  $3,000,000 (sum)

- Mean = $600,000 ($3,000,000 divided by 5)
- Median = $300,000 (middle value of ranked data)
- Mode = $100,000 (most frequent value)

Which Measure to Choose?

- The mean is generally used, unless extreme values (outliers) exist.
- The median is often used when you wish to minimize the impact of extreme values. For example, median home prices may be reported for a region; the median is less sensitive to outliers.
- In some situations it makes sense to report both the mean and the median.

Summary: Central Tendency

- Arithmetic mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
- Median: middle value in the ordered array
- Mode: most frequently observed value

Measures of Variation

- Variation is measured by the range, the variance, the standard deviation, and the coefficient of variation.
- Measures of variation give information on the distribution of the data values: spread, or variability, or dispersion.

[Figure: two distributions with the same center but different variation]

The Range

- Simplest measure of variation.
- Difference between the largest and the smallest values:

  Range = $x_{\text{largest}} - x_{\text{smallest}}$ (or $x_{\max} - x_{\min}$)

- Example (dot plot on a 0 to 14 axis): Range = 13 - 1 = 12
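The review example can be reproduced with Python's standard `statistics` module. This is a sketch, not part of the slides; the variable names are chosen for illustration, and the range of the house prices is an extra calculation that the slides do not show.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)        # sum divided by count
median_price = statistics.median(house_prices)    # middle value of ranked data
modes = statistics.multimode(house_prices)        # list of most frequent values
price_range = max(house_prices) - min(house_prices)

print(mean_price, median_price, modes)
print(price_range)
```

`multimode` returns a list, so it also covers the bimodal and multimodal cases; when every value occurs once it returns all of them, which corresponds to the "no mode" case on the slide.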
The Range Can Be Misleading

- Ignores the way in which data are distributed.

  [Figure: two differently distributed dot plots on a 7 to 12 axis; both have Range = 12 - 7 = 5]

- Sensitive to outliers:

  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
  Range = 5 - 1 = 4

  1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
  Range = 120 - 1 = 119

The Variance

- Average of squared deviations of values from the mean.
- Sample variance:

  $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

  where $\bar{x}$ = arithmetic mean, n = sample size, and $x_i$ = i-th value of the variable x.

The Standard Deviation

- Shows variation from the mean.
- Most commonly used measure of variation.
- Is the square root of the variance.
- Has the same units as the original data.
- Standard deviation (for a sample):

  $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Steps for Computing the Standard Deviation (for a sample)

1) Calculate the mean.
2) Compute the difference between each value and the mean.
3) Square each difference.
4) Total the squared differences.
5) Divide this total by n - 1 to get the sample variance.
6) Take the square root of the sample variance.

Sample Standard Deviation Example

Sample data: 10 12 14 15 17 18 18 24

   x    mean   x - mean   (x - mean)^2
  10     16       -6           36
  12     16       -4           16
  14     16       -2            4
  15     16       -1            1
  17     16        1            1
  18     16        2            4
  18     16        2            4
  24     16        8           64
                       Total: 130

  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.31$

Comparing Standard Deviations

[Figure: three dot plots on an 11 to 21 axis]

- Data A: mean = 15.5, s = 3.338
- Data B: mean = 15.5, s = 0.926
- Data C: mean = 15.5, s = 4.570
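The six steps for the sample standard deviation translate directly into code. The sketch below follows them one by one on the example data (10 12 14 15 17 18 18 24) and then compares the result with `statistics.stdev` from the standard library; the variable names are illustrative only.

```python
import math
import statistics

sample = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(sample)
mean = sum(sample) / n                       # step 1: calculate the mean
deviations = [x - mean for x in sample]      # step 2: difference from the mean
squared = [d ** 2 for d in deviations]       # step 3: square each difference
total = sum(squared)                         # step 4: total the squared differences
variance = total / (n - 1)                   # step 5: divide by n - 1
std_dev = math.sqrt(variance)                # step 6: take the square root

print(mean, total, round(std_dev, 2))        # 16.0 130.0 4.31

# Cross-check against the standard library's sample standard deviation.
print(math.isclose(std_dev, statistics.stdev(sample)))  # True
```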
Comparing Standard Deviations

[Figure: a narrow distribution labeled "smaller standard deviation" and a wide distribution labeled "larger standard deviation"]

Summary Characteristics

- The more the data spread out, the more the range, variance, and standard deviation increase.
- The more the data are concentrated, the more the range, variance, and standard deviation decrease.
- If the data values are all equal to each other, the range, variance, and standard deviation are zero.
- None of these measures is ever negative.

The Coefficient of Variation

- Measures relative variation.
- Shows variation relative to the mean.
- Always stated as a percentage (%).

  $CV = \left( \frac{s}{\bar{x}} \right) \cdot 100\%$

- Useful to compare the variability of multiple data sets when they are measured in different units.

Comparing Coefficients of Variation

- Stock A: average price last year = $50, standard deviation = $5

  $CV_A = \left( \frac{s}{\bar{x}} \right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$

- Stock B: average price last year = $100, standard deviation = $5

  $CV_B = \left( \frac{s}{\bar{x}} \right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$

- Both stocks have the same standard deviation, but stock B is less variable relative to its price.

Locating Extreme Outliers: Z-Score

- The Z-score is the number of standard deviations a data value is from the mean:

  $z = \frac{x - \bar{x}}{s}$

  where x = the data value, $\bar{x}$ = the sample mean, and s = the standard deviation.

- A data value is considered an extreme outlier if its Z-score is < -3.0 or > +3.0.
- The larger the absolute value of the Z-score, the farther the data value is from the mean.

Locating Extreme Outliers: Z-Score (example)

- Assume the mean math SAT score is 490 and the standard deviation is 100.
- Compute the Z-score for a test score of 620:

  $z = \frac{x - \bar{x}}{s} = \frac{620 - 490}{100} = \frac{130}{100} = 1.3$

- 620 is 1.3 standard deviations above the mean and would not be considered an outlier.
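The coefficient of variation and the Z-score are both one-line formulas, so a short sketch is enough to check the two worked examples. The function names and the `threshold` parameter are assumptions made for this illustration; the 3.0 cutoff is the one stated on the slide.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / x-bar) * 100, returned as a percentage."""
    return 100 * std_dev / mean


def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


def is_extreme_outlier(z, threshold=3.0):
    """Slide rule: extreme outlier if the Z-score is below -3.0 or above +3.0."""
    return abs(z) > threshold


cv_a = coefficient_of_variation(std_dev=5, mean=50)    # stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # stock B
z = z_score(value=620, mean=490, std_dev=100)          # SAT example

print(cv_a, cv_b)                 # 10.0 5.0
print(z, is_extreme_outlier(z))   # 1.3 False
```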
Shape of a Distribution

- Describes how data are distributed.
- Measures of shape:

  Left-skewed:   Mean < Median
  Symmetric:     Mean = Median
  Right-skewed:  Median < Mean

General Descriptive Stats Using Microsoft Excel

1. Select Tools.
2. Select Data Analysis.
3. Select Descriptive Statistics and click OK.
4. Enter the cell range.
5. Check the Summary Statistics box.
6. Click OK.

General Descriptive Stats: Excel Output

Excel descriptive statistics output, using the house price data:

  House prices: $2,000,000  $500,000  $300,000  $100,000  $100,000

[Figure: Excel output table, not reproduced]

Numerical Descriptive Measures for a Population

- So far, we have only discussed sample measures: mean [$\bar{x}$], variance [$s^2$], and standard deviation [$s$]. These are known as statistics.
- For a population we have similar summary measures called parameters, which are denoted with Greek letters: $\mu$ [mu] for the mean, $\sigma^2$ [sigma squared] for the variance, and $\sigma$ [sigma] for the standard deviation.

The population mean ($\mu$)

- The sum of the values in the population divided by the population size, N:

  $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$

  where $\mu$ = population mean, N = population size, and $X_i$ = i-th value of the variable X.
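For readers without Excel, a rough Python counterpart of the Descriptive Statistics tool can be put together from the standard `statistics` module. This is only a sketch: the function name `describe`, the dictionary keys, and the choice of measures are assumptions made here, it does not reproduce Excel's actual output table, and the shape label simply applies the mean-versus-median comparison from the shape slide.

```python
import statistics


def describe(values):
    """Summary measures for a sample, plus a mean-vs-median shape label."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean < median:
        shape = "left-skewed"
    elif mean > median:
        shape = "right-skewed"
    else:
        shape = "symmetric"
    return {
        "count": len(values),
        "sum": sum(values),
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
        "mean": mean,
        "median": median,
        "modes": statistics.multimode(values),
        "sample_variance": statistics.variance(values),   # divisor n - 1
        "sample_std_dev": statistics.stdev(values),
        "shape": shape,
    }


house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
summary = describe(house_prices)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For the house price data the median ($300,000) is below the mean ($600,000), so the rule of thumb labels the data right-skewed, consistent with the single very expensive house pulling the mean upward.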
Numerical Descriptive Measures for a Population (continued)

The population variance ($\sigma^2$)

- Average of squared deviations of values from the mean:

  $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

  where $\mu$ = population mean, N = population size, and $X_i$ = i-th value of the variable X.

The population standard deviation ($\sigma$)

- Is the square root of the population variance.
- Shows variation about the mean.
- Most commonly used measure of variation.
- Has the same units as the original data.

  $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Sample statistics vs. population parameters

  Measure              Population Parameter   Sample Statistic
  Mean                 $\mu$                  $\bar{x}$
  Variance             $\sigma^2$             $s^2$
  Standard deviation   $\sigma$               $s$

Note: for the variance and standard deviation of a...
- Sample: you use n - 1 as the divisor.
- Population: you use N as the divisor.

The Empirical Rule

- Approximates the spread of data values in a bell-shaped distribution.
- Approximately 68% of the data lies within one standard deviation of the mean, or $\mu \pm 1\sigma$ (more precisely, 68.27%).
- Approximately 95% of the data lies within two standard deviations of the mean, or $\mu \pm 2\sigma$ (more precisely, 95.45%).
- Approximately 99.7% of the data lies within three standard deviations of the mean, or $\mu \pm 3\sigma$ (more precisely, 99.73%).
- Remember: this is only true for a bell-shaped distribution.

The Empirical Rule (example)

For a population of math SAT scores where the distribution is bell-shaped, the mean is 500, and the standard deviation is 90, we can assume that:

- 68% of test takers scored between 410 and 590 (500 ± 90).
- 95% of test takers scored between 320 and 680 (500 ± 180).
- 99.7% of test takers scored between 230 and 770 (500 ± 270).
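Two points from this page lend themselves to a quick check in code: the divisor difference between population and sample variance, and the empirical rule intervals for the SAT example. The sketch below is illustrative only. Treating the eight values from the earlier standard deviation example as a complete population is an assumption made purely to show the effect of the divisor, and `empirical_rule_intervals` is a helper name invented here.

```python
import statistics

# Assumption for illustration: the same eight values viewed once as a
# sample (divisor n - 1) and once as a whole population (divisor N).
values = [10, 12, 14, 15, 17, 18, 18, 24]

sample_variance = statistics.variance(values)        # 130 / 7
population_variance = statistics.pvariance(values)   # 130 / 8

print(round(sample_variance, 3), population_variance)   # 18.571 16.25


def empirical_rule_intervals(mu, sigma):
    """Intervals mu +/- k*sigma for k = 1, 2, 3 (bell-shaped data only)."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}


intervals = empirical_rule_intervals(mu=500, sigma=90)
print(intervals)   # {1: (410, 590), 2: (320, 680), 3: (230, 770)}
```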
Chebyshev's Rule

- Regardless of how the data are distributed, at least $(1 - 1/k^2) \times 100\%$ of the values will fall within k standard deviations of the mean (for k > 1).
- Examples:
  - For $\mu \pm 2\sigma$, k = 2: $(1 - 1/2^2) \times 100\% = 75\%$. So at least 75% of data values lie within 2 standard deviations of the mean.
  - For $\mu \pm 3\sigma$, k = 3: $(1 - 1/3^2) \times 100\% \approx 89\%$. So at least 89% of data values lie within 3 standard deviations of the mean.

Quartile Measures

- Quartiles split the ranked data into 4 segments with an equal number of values per segment.

  [Figure: ranked data divided at Q1, Q2, and Q3 into four segments of 25% each]

- Q1: the value where 25% of data are smaller and 75% are larger.
- Q2: the value where 50% are smaller and 50% are larger (the median).
- Q3: the value where 75% are smaller and only 25% are larger.

Locating Quartiles

To determine a quartile you need to find its position in the ranked data, where n is the number of observed values:

- First quartile position: Q1 = (.25)(n + 1) ranked value
- Second quartile position: Q2 = (.50)(n + 1) ranked value
- Third quartile position: Q3 = (.75)(n + 1) ranked value

Calculation Rules

When calculating the ranked position, use the following rules:

- If the result is a whole number, use that number as the ranked position.
- If the result is a fractional half (ends in .5), average the data value just below with the data value just above.
- If the result is some other value, round to the nearest integer.
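Chebyshev's rule from earlier on this page is easy to tabulate. The sketch below evaluates the lower bound (1 - 1/k^2) x 100% for a few values of k; the function name `chebyshev_lower_bound` is chosen here for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule applies only for k > 1")
    return (1 - 1 / k ** 2) * 100


for k in (2, 3, 4):
    print(k, round(chebyshev_lower_bound(k), 2))
# 2 75.0
# 3 88.89
# 4 93.75
```

These are guaranteed minimums for any distribution, which is why they are lower than the empirical rule's 95% and 99.7% figures for bell-shaped data.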
Locating Quartiles (example)

Sample data in ordered array (n = 9): 11 12 13 16 16 17 18 21 22

Finding the first quartile:

- For Q1, the position is (.25)(9 + 1) = 2.5 in the data, so we use the average of the 2nd and 3rd values: Q1 = (12 + 13)/2 = 12.5

Finding the remaining quartiles:

- For Q2, the position is (.50)(9 + 1) = 5 in the data, so we use that value: Q2 = median = 16
- For Q3, the position is (.75)(9 + 1) = 7.5, so we average the numbers just below and above: Q3 = (18 + 21)/2 = 19.5

Note: Q1 and Q3 are measures of non-central location; Q2, the median, is a measure of central tendency.
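The position formula and the three calculation rules can be sketched as a single function. This follows the textbook rule used in these slides, which differs from the interpolation methods that spreadsheet and statistics packages use by default, so results may not match those tools. The function name `quartile` is an assumption made here, the data are assumed to be the nine ordered values 11 12 13 16 16 17 18 21 22, and the sketch does not guard against positions that fall outside the data for very small samples.

```python
def quartile(values, fraction):
    """Quartile by the (fraction)(n + 1) position rule; fraction is .25, .50 or .75."""
    ordered = sorted(values)
    n = len(ordered)
    position = fraction * (n + 1)            # 1-based ranked position
    whole = int(position)
    if position == whole:                    # whole number: use that position
        return ordered[whole - 1]
    if position - whole == 0.5:              # ends in .5: average the neighbors
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[round(position) - 1]      # otherwise: round to nearest integer


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]

q1 = quartile(data, 0.25)
q2 = quartile(data, 0.50)
q3 = quartile(data, 0.75)
print(q1, q2, q3)   # 12.5 16 19.5
```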
The Interquartile Range (IQR)

- Also called the midspread.
- Is a measure of variability.
- Measures the spread in the middle 50% of the data.
- Is the range of the inner two quartiles (Q3 - Q1).
- Is not influenced by outliers or extreme values.
- Note: measures like Q1, Q3, and IQR that are not influenced by outliers are called resistant measures.

Interquartile Range (example)

  X_min = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X_max = 70

[Figure: boxplot of these five values; each of the four segments holds 25% of the data. This graphic is called a boxplot.]

- Interquartile range = 57 - 30 = 27

5 Number Summary

- The five numbers (X_min, Q1, Q2, Q3, X_max) help describe the center, spread, and shape of data.

Shape and Boxplot

  Comparison                     Left-Skewed   Symmetric   Right-Skewed
  (Q2 - X_min) vs (X_max - Q2)        >            ≈            <
  (Q1 - X_min) vs (X_max - Q3)        >            ≈            <
  (Q2 - Q1)    vs (Q3 - Q2)           >            ≈            <

[Figure: boxplots for left-skewed, symmetric, and right-skewed data, each marked with Q1, Q2, Q3]

Boxplot Example

Below is a boxplot for the following data:

  0  2  2  2  3  3  4  5  5  9  27

  X_min = 0, Q1 = 2, Q2 = 3, Q3 = 5, X_max = 27

- These data are highly skewed, so we show the outlier value of 27 plotted separately.

[Figure: example boxplot showing an outlier, on a 0 to 30 axis]

- A value is considered an outlier if it is more than 1.5 times the interquartile range below Q1 or above Q3.
- IQR = 5 - 2 = 3
- 5 + (1.5)(3) = 9.5
- Anything beyond 9.5 is an outlier.

The Covariance

- The covariance measures the strength of the linear relationship between two numerical variables.
- For a sample:

  $\text{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$

- Only concerned with the strength of the relationship.
- No cause and effect is implied.
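The 1.5 x IQR outlier rule from the boxplot example can be sketched as follows. The data set is assumed to read 0 2 2 2 3 3 4 5 5 9 27, with Q1 = 2 and Q3 = 5 taken as given from the slide rather than recomputed. The function name `outlier_fences` and the `multiplier` parameter are illustrative, and the lower fence is an extra calculation that the slide does not show.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Return (lower, upper) limits: multiplier * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 9, 27]

lower, upper = outlier_fences(q1=2, q3=5)
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper)   # -2.5 9.5
print(outliers)       # [27]
```

The value 9 falls just inside the upper limit of 9.5, so only 27 is plotted separately as an outlier.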
Interpreting Covariance

- Covariance between two variables:
  - cov(x, y) > 0: x and y tend to move in the same direction.
  - cov(x, y) < 0: x and y tend to move in opposite directions.
  - cov(x, y) = 0: x and y have no linear relationship.
- Has a major flaw: it is not possible to determine the relative strength of the relationship from the size of the covariance.

Coefficient of Correlation

- Measures the relative strength of the linear relationship between two numerical variables.
- Sample coefficient of correlation:

  $r = \frac{\text{cov}(x, y)}{s_x s_y}$

  where cov(x, y) = sample covariance, $s_x$ = standard deviation of x, and $s_y$ = standard deviation of y.

Coefficient of Correlation (continued)

- Symbol:
  - For a population: ρ (Greek letter rho)
  - For a sample: r
- Both have the following features:
  - Unit free.
  - Value ranges between -1 and 1.
  - The closer to 1, the stronger the positive linear relationship.
  - The nearer to 0, the weaker the linear relationship.
  - The closer to -1, the stronger the negative linear relationship.

Coefficient of Correlation Examples

[Figure: scatter plots of Y against X illustrating r = -.96, r = -.63, r = +.94, r = +.3, and r = 0]

Coefficient of Correlation Using Excel

1. Select Tools/Data Analysis.
2. Choose Correlation from the selection menu.
3. Click OK.
4. Input the data range and select appropriate options.
5. Click OK to get output.
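The sketch below computes the sample covariance and the coefficient of correlation from their definitions, and also illustrates the flaw noted for covariance: changing the units of one variable changes the covariance but leaves r untouched. The five data pairs are hypothetical values invented for this example, not data from the slides, and the function names are chosen here for illustration.

```python
import math


def sample_covariance(xs, ys):
    """cov(x, y): sum of (x_i - x-bar)(y_i - y-bar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std_dev(values):
    """Sample standard deviation (divisor n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = cov(x, y) / (s_x * s_y)."""
    return sample_covariance(xs, ys) / (sample_std_dev(xs) * sample_std_dev(ys))


# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]
y_rescaled = [100 * value for value in y]    # same data in different units

print(sample_covariance(x, y), round(correlation(x, y), 2))
print(sample_covariance(x, y_rescaled), round(correlation(x, y_rescaled), 2))
```

The covariance grows by a factor of 100 after rescaling while r stays at 0.9, which is why r, not the covariance, is used to judge relative strength.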
Pitfalls in Numerical Descriptive Measures

- Data analysis is objective:
  - Should report the measures that best describe and communicate the important aspects of the data set.
  - Should document both good and bad results.
- Ethical considerations:
  - Should be done in a fair, neutral, and clear manner.
  - Should not deliberately use inappropriate summary measures to distort facts.

Chapter Summary

- Described measures of central tendency: mean, median, mode.
- Described measures of variation: range, interquartile range, variance and standard deviation, coefficient of variation, Z-scores.
- Illustrated shape of distribution: symmetric, skewed.
- Described data using the 5-number summary: boxplots.
- Discussed covariance and correlation coefficient.
- Addressed pitfalls in numerical descriptive measures and ethical considerations.