Chapter 12
Describing Distributions with Numbers

Quick math overview

$\sum$ = sum

These expressions are algebraically equivalent:
  $\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$

Examples, with x: {2, 3, 5, 6, 6, 8}
  $\sum x = 2 + 3 + 5 + 6 + 6 + 8 = 30$
  $\sum x^2 = 2^2 + 3^2 + 5^2 + 6^2 + 6^2 + 8^2 = 174$
  $(\sum x)^2 = 30^2 = 900$
  $\bar{x} = \frac{\sum x}{n} = \frac{30}{6} = 5$
  $\sum (x - \bar{x}) = (2 - 5) + (3 - 5) + \dots + (8 - 5) = 0$
  $\sum (x - \bar{x})^2 = (-3)^2 + (-2)^2 + 0^2 + 1^2 + 1^2 + 3^2 = 24$
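As a quick check of the identity above, the following minimal Python sketch computes both sides for the example data set. The variable names (`sum_x`, `sum_x_sq`, `lhs`, `rhs`) are illustrative and not from the slides; the right-hand side is taken to be $\sum x^2 - (\sum x)^2 / n$, which is the form consistent with the worked numbers.

```python
# Data set from the worked example.
x = [2, 3, 5, 6, 6, 8]
n = len(x)

sum_x = sum(x)                       # sum of x = 30
sum_x_sq = sum(v ** 2 for v in x)    # sum of x^2 = 174
mean = sum_x / n                     # x-bar = 5.0

# Left side: sum of squared deviations from the mean.
lhs = sum((v - mean) ** 2 for v in x)
# Right side: shortcut form, sum of x^2 minus (sum of x)^2 / n.
rhs = sum_x_sq - sum_x ** 2 / n

print(lhs, rhs)  # 24.0 24.0
```

Both sides give 24, matching 174 - 900/6 = 174 - 150.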
Turning Data Into Information

Center of the data
  mean
  median
  mode
Spread of the data (variability)
  variance
  standard deviation
  range
  interquartile range

Centers of Data

Average: a single data value that represents all of the data
  mean (arithmetic average)
  median
  mode

Mean ($\bar{x}$)

Traditional measure of center
Sum the values and divide by the number of values:
  $\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i$
Median (M)

A resistant measure of the data's center
Median: the center value of the ordered (ranked) data
If n is odd, the median is the middle ordered value
If n is even, the median is the average of the two middle ordered values
Median = the value at the $\frac{1}{2}(n + 1)$th position in the ordered set

Median

Example 1 data: 2 4 6
  Median (M) = 4
Example 2 data: 2 4 6 8
  Median = 5 (average of 4 and 6)
Example 3 data: 6 2 4
  Median $\neq$ 2 (order the values: 2 4 6, so Median = 4)

Example: number of minutes waiting for the PRT (n = 8)

x: {5, 11, 9, 15, 33, 3, 7, 12}
  $\bar{x} = \frac{5 + 11 + 9 + 15 + 33 + 3 + 7 + 12}{8} = 11.875$
Median: RANK DATA FIRST!
  {3, 5, 7, 9, 11, 12, 15, 33}
  The median is at the $\frac{1}{2}(n + 1)$th position: (8 + 1)/2 = 4.5
  Position 4.5 is halfway between 9 and 11: (9 + 11)/2 = 10
  Median = 10
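The ranking rule above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function name `median` and the variable `waits` are chosen here for convenience, and the input is assumed to be a non-empty list of numbers.

```python
def median(values):
    """Median via the (n + 1) / 2 position rule on the ranked data."""
    ranked = sorted(values)          # rank the data first
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]           # odd n: the middle ordered value
    # even n: average of the two middle ordered values
    return (ranked[mid - 1] + ranked[mid]) / 2

waits = [5, 11, 9, 15, 33, 3, 7, 12]   # PRT waiting times, in minutes

print(sum(waits) / len(waits))   # 11.875
print(median(waits))             # 10.0
print(median([6, 2, 4]))         # 4
```

Sorting inside the function guards against the mistake shown in Example 3, where the middle value of the unranked data (2) is not the median.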
Comparing the Mean & Median

The mean and median of data from a symmetric distribution should be close together. The actual (true) mean and median of a symmetric distribution are exactly the same.
In a skewed distribution, the mean is farther out in the long tail than the median is [the mean is pulled in the direction of the possible outlier(s)].

Mean vs. Median

Which should we use?
  Symmetric or approximately symmetric: use the mean
  Significantly skewed: use the median
  The mean $\bar{x}$ is affected by outliers (extreme values)
Outliers?
  If it is a mistake and is documented, we can eliminate it
  If it is not a mistake, do not eliminate it
A statistic is robust if it is not led too far astray by a few outliers. Means (and standard deviations) are not robust.
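To see what "not robust" means in practice, the sketch below recomputes the mean and median of the PRT waiting times with and without the extreme value 33. Dropping the value here is purely an illustration of sensitivity, not a recommendation to eliminate it; the list name `without_33` is an assumption made for this example. It uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

waits = [3, 5, 7, 9, 11, 12, 15, 33]        # ranked PRT waiting times
# Illustration only: remove the extreme value to see how each statistic reacts.
without_33 = [w for w in waits if w != 33]

print(mean(waits), median(waits))            # 11.875 10.0
print(round(mean(without_33), 3), median(without_33))
```

With 33 removed, the mean falls from 11.875 to 62/7 (about 8.86), while the median moves only from 10 to 9. One extreme value shifts the mean by about three minutes and the median by one.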
Mode

Observed value that occurs with the greatest frequency
Note: if there is no mode, write "none", not 0
If two modes: bimodal

Measures of Dispersion

Spread: a general term referring to how spread out or variable a set of numbers is.
  Very large spread: {0, 100, 9999, 100000}
  No spread: {12, 12, 12, 12, 12}

Spread or Variability

If all values are the same, then they all equal the mean. There is no spread.
Variability exists when some values are different from (above or below) the mean.
We will discuss the following measures of spread: range, interquartile range, variance, standard deviation.
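The mode rules above (one mode, two modes, or none) can be sketched with a frequency count. This is a minimal sketch under two assumptions that the slides do not spell out: a data set in which no value repeats is treated as having no mode, and the input is non-empty. The function name `modes` and the bimodal and no-mode sample lists are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means "none"."""
    counts = Counter(values)         # value -> frequency
    top = max(counts.values())       # greatest frequency (assumes non-empty input)
    if top == 1:
        return []                    # no value repeats: report "none", not 0
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 5, 6, 6, 8]))     # [6]
print(modes([1, 1, 2, 4, 4]))        # [1, 4]  (bimodal)
print(modes([3, 5, 7, 9]))           # []      (no mode)
```

Returning an empty list rather than 0 keeps "no mode" distinct from a data set whose mode really is 0.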
Range

One way to measure spread is to give the smallest (minimum) and largest (maximum) values in the data set:
  Range = max - min ("the values range from min to max")
The range is strongly affected by outliers, and is rarely used.

Quartiles

Three numbers that divide the ordered data into four equal-sized groups.
  $Q_1$ has 25% of the data below it.
  $Q_2$ has 50% of the data below it. (Median)
  $Q_3$ has 75% of the data below it.

Obtaining the Quartiles

Order the data.
For $Q_2$, just find the median.
For $Q_1$, look at the lower half of the data values, those to the left of the median; find the median of this lower half.
For $Q_3$, look at the upper half of the data values, those to the right of the median; find the median of this upper half.
Interquartile Range (IQR)

Used to measure dispersion (spread) with the median
Sample IQR = $Q_3 - Q_1$

Number of minutes waiting for the PRT (n = 8): {3, 5, 7, 9, 11, 12, 15, 33}
  Recall: the median is halfway between 9 and 11, so M = 10
  $Q_1$ is halfway between 5 and 7, so $Q_1$ = 6
  $Q_3$ is halfway between 12 and 15, so $Q_3$ = 13.5
  IQR = $Q_3 - Q_1$ = 13.5 - 6 = 7.5

The five-number summary & boxplots

[Figure: boxplot with Min, Q1, M, Q3, and Max labeled]
Five-number summary: Min, Q1, M, Q3, Max
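The quartile procedure (median of the lower half, median of the upper half) can be sketched as follows. The names `median_of_ranked` and `five_number_summary` are illustrative, not from the slides. One assumption is flagged in the code: when n is odd, the median itself is left out of both halves, which is one common textbook convention; statistical software often uses other interpolation rules and may give slightly different quartiles.

```python
def median_of_ranked(ranked):
    """Median of data that is already sorted."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2

def five_number_summary(values):
    """Return (Min, Q1, M, Q3, Max); assumes at least two values."""
    ranked = sorted(values)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]            # values to the left of the median
    # Assumption: for odd n, skip the median itself when forming the upper half.
    upper = ranked[half + n % 2:]    # values to the right of the median
    return (ranked[0], median_of_ranked(lower), median_of_ranked(ranked),
            median_of_ranked(upper), ranked[-1])

waits = [5, 11, 9, 15, 33, 3, 7, 12]   # PRT waiting times, in minutes
mn, q1, m, q3, mx = five_number_summary(waits)

print(mn, q1, m, q3, mx)   # 3 6.0 10.0 13.5 33
print(q3 - q1)             # IQR: 7.5
print(mx - mn)             # range: 30
```

The single long wait of 33 minutes stretches the range to 30, while the IQR of 7.5 describes only the middle half of the data.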
Boxplot (from Five-Number Summary)

The central box spans $Q_1$ and $Q_3$.
A line in the box marks the median M.
Lines extend from the box out to the minimum and maximum.

PRT example: five-number summary and boxplot
  Min = 3, $Q_1$ = 6, M = 10, $Q_3$ = 13.5, Max = 33
[Figure: boxplot of the PRT waiting times on an axis running from 0 to 33 minutes]

Variance and Standard Deviation

When variability exists, each data value has an associated deviation from the mean: $x_i - \bar{x}$
What is a typical deviation from the mean? (standard deviation)
Small values of this typical deviation indicate small spread in the data.
Large values of this typical deviation indicate large spread in the data.
Variance

Find the mean
Find the deviation of each value from the mean
Square the deviations
Sum the squared deviations
Divide the sum by n - 1 (gives the typical squared deviation from the mean)

Variance Formula

  $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Standard Deviation Formula

Typical deviation from the mean
  $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
  $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
[standard deviation = square root of the variance]
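The five variance steps translate directly into code. The sketch below applies them to the data set {2, 3, 5, 6, 6, 8} from the quick math overview, where the sum of squared deviations is 24; the function name `sample_variance` is an illustrative choice, and the input is assumed to hold at least two values.

```python
import math

def sample_variance(values):
    """Sample variance s^2, following the five steps (divide by n - 1)."""
    n = len(values)
    mean = sum(values) / n                         # 1. find the mean
    sq_devs = [(v - mean) ** 2 for v in values]    # 2-3. deviations, squared
    return sum(sq_devs) / (n - 1)                  # 4-5. sum, divide by n - 1

x = [2, 3, 5, 6, 6, 8]
s2 = sample_variance(x)    # 24 / 5
s = math.sqrt(s2)          # standard deviation = square root of the variance

print(s2)                  # 4.8
print(round(s, 3))         # 2.191
```

For comparison, `statistics.variance` and `statistics.stdev` in the Python standard library use the same n - 1 divisor.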
Choosing a Summary

Outliers affect the values of the mean and standard deviation.
The five-number summary should be used to describe center and spread for skewed distributions, or when outliers are present.
Use the mean and standard deviation for reasonably symmetric distributions that are free of outliers.

[Figure: distribution of calories in popular candy bars]

Today's concepts

Numerical Summaries
  Center (mean, median)
  Spread (variance, std. dev., range, IQR)
  Five-number summary & Boxplots
Choosing mean versus median
Choosing standard deviation versus five-number summary