Chapter 3: Data Description

Describe Distribution with Numbers

Example: Birth weights (in lb) of 5 babies born from two groups of women under different care programs.
Group 1: 7, 6, 8, 7, 7
Group 2: 3, 4, 8, 9, 11

Measure of Central Tendency (Describing Center)

Mean: the average value of the data.

If the n values of a sample of observations are denoted by $x_1, x_2, \ldots, x_n$, their sample mean is

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

* If the data were for the whole population, then the result of this calculation would be called the population mean, and the notation for it is $\mu$.

Example: Birth weights (in lb) of 5 babies born from a group of women under a certain diet.
7, 6, 8, 7, 7
Sol: mean = (7 + 6 + 8 + 7 + 7) / 5 = 35/5 = 7 [near the center of the data set]

Example: (number of hysterectomies performed by 15 male doctors)
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28
=> mean = 620/15 = 41.33
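The sample mean formula above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_mean` is hypothetical, and the data lists assume the readings Group 2 = 3, 4, 8, 9, 11 (sum 35) and the 15-doctor data set whose mean is 41.33.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    return sum(values) / len(values)

# Birth weights (lb) from the two care programs.
group_1 = [7, 6, 8, 7, 7]
group_2 = [3, 4, 8, 9, 11]  # assumed reading, chosen so the sum is 35

# Assumed reading of the 15-doctor data set.
doctors = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]

print(sample_mean(group_1))
print(sample_mean(group_2))
print(round(sample_mean(doctors), 2))
```

Both groups have the same mean even though their values are spread very differently, which is why a measure of spread is needed as well.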
Median: of a data set is
- the data value exactly in the middle of its ordered list, if the number of pieces of data is odd;
- the mean of the two middle data values in its ordered list, if the number of pieces of data is even.
[The median is not influenced by outliers and is best for nonsymmetric distributions.]

Example: (number of times visited class website by 15 students)
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28
ordered list => 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86
median = 34

Example: (Birth weights for 6 infants.)
5, 7, 6, 8, 5, 9
ordered list => 5, 5, 6, 7, 8, 9
median = (6 + 7) / 2 = 6.5

Mode: of a data set is the observation that occurs most frequently.

Example 1: (number of times visited class website by 15 students)
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28
ordered list => 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86
Mode = 25

Example 2: (Blood type of 15 students)
A, B, A, A, O, AB, A, A, B, B, O, O, A, A, A
Counts: A 8, B 3, O 3, AB 1
Mode = A

What is a Modal class?

Class         Frequency   Relative Freq.   Cumulative R.F.
90< - 110     2           2/22 = .091      2/22
110< - 130    2           2/22 = .091      4/22
130< - 150    4           4/22 = .182      8/22
150< - 170    2           2/22 = .091      10/22
170< - 190    7           7/22 = .318      17/22
190< - 210    3           3/22 = .136      20/22
210< - 230    1           1/22 = .045      21/22
230< - 250    0           0/22 = .000      21/22
250< - 270    0           0/22 = .000      21/22
270< - 290    1           1/22 = .045      22/22
Total         22          1.000
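The median and mode definitions above can be sketched as follows. The helper names `median` and `modes` are hypothetical, and the website-visit data assume the 15-value reading whose ordered list starts 20, 25, 25, 27.

```python
from collections import Counter

def median(values):
    """Middle value of the ordered list (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """Return every value that attains the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Assumed reading of the website-visit data.
visits = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]
infants = [5, 7, 6, 8, 5, 9]
blood = ["A", "B", "A", "A", "O", "AB", "A", "A", "B", "B", "O", "O", "A", "A", "A"]

print(median(visits), median(infants))
print(modes(visits), modes(blood))
```

`modes` returns a list because a data set can have more than one mode; the mode also works for categorical data such as blood type, where a mean or median is not defined.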
[Figure: a distribution skewed to the right. Where are the mean, median, and mode?]

Midrange

The average of the lowest and the highest values in the data set.

$$\text{Midrange} = \frac{\text{Lowest Value} + \text{Highest Value}}{2}$$

Example: (number of times visited class website by 15 students)
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28
Lowest value = 20, Highest value = 86
Midrange = (20 + 86) / 2 = 53

Weighted Mean

Example: (Grade point average) A student received 3 A's, 5 B's, 2 C's.

Class (grade point, x)   Frequency (weight, w)
4                        3
3                        5
2                        2

Average grade point = (3 x 4 + 5 x 3 + 2 x 2) / (3 + 5 + 2) = 31/10 = 3.1

$$\text{Weighted mean} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_k x_k}{w_1 + w_2 + \cdots + w_k} = \frac{\sum w \cdot x}{\sum w}$$

where $w_1, w_2, \ldots$ are the weights and $x_1, x_2, \ldots$ are the values (or class midpoints, also called class marks).

Mean Estimation

Class         Frequency (w)   Class Mark (x)   w . x
90 - <110     1               100              1 x 100
110 - <130    2               120              2 x 120
130 - <150    3               140              3 x 140
150 - <170    1               160              1 x 160
Total         7                                920

Estimated mean = 920/7 = 131.43
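A minimal sketch of the midrange and weighted mean calculations above. The function names are hypothetical, and the grouped-data frequencies (1, 2, 3, 1) are an assumed reading, chosen because they are the only ones consistent with a total of 7 and an estimated mean of 131.43.

```python
def midrange(values):
    """Average of the lowest and highest values."""
    return (min(values) + max(values)) / 2

def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Assumed reading of the website-visit data.
visits = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]
print(midrange(visits))

# Grade point average: 3 A's (4 points), 5 B's (3 points), 2 C's (2 points).
gpa = weighted_mean([4, 3, 2], [3, 5, 2])
print(round(gpa, 2))

# Mean estimated from grouped data: class marks weighted by class frequencies.
class_marks = [100, 120, 140, 160]
frequencies = [1, 2, 3, 1]  # assumed reading
estimated_mean = weighted_mean(class_marks, frequencies)
print(round(estimated_mean, 2))
```

The grouped estimate treats every observation in a class as if it sat at the class mark, so it only approximates the mean of the raw data.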
Measure of Variation (Describing Spread)

Measure of Spread: Range = largest data value - smallest data value

Sample from group I (diet program I): 7, 6, 8, 7, 7
=> mean = (7 + 6 + 8 + 7 + 7) / 5 = 35/5 = 7
Sample from group II (diet program II): 3, 4, 8, 9, 11
=> mean = (3 + 4 + 8 + 9 + 11) / 5 = 35/5 = 7

Does the mother's diet program affect the birth weights of babies?
Is there any difference between the two samples?

range of sample I = 8 - 6 = 2
range of sample II = 11 - 3 = 8

Variance and Standard Deviation

Measure the spread of the data around the center of the data.

Example: Birth weights (in lb) of 5 babies born from a group of women under diet program II.
3, 4, 8, 9, 11; mean = $\bar{x}$ = 7

Data Value $x_i$   Deviation from mean $x_i - \bar{x}$   Squared Dev. $(x_i - \bar{x})^2$
3                  3 - 7 = -4                            16
4                  4 - 7 = -3                            9
8                  8 - 7 = 1                             1
9                  9 - 7 = 2                             4
11                 11 - 7 = 4                            16
Total 35           0                                     46

Sample Variance = 46/4 = 11.5 lb^2
Sample Standard Deviation = $\sqrt{46/4}$ = 3.39 lb

If n observations are denoted by $x_1, x_2, \ldots, x_n$, their variance and standard deviation are

Sample Variance: $$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
(an unbiased estimator for the variance of an infinite population)

Sample Standard Deviation: $$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Sample Mean: $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
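The deviation table above can be reproduced with a short sketch. The function names are hypothetical, and the diet program II data assume the reading 3, 4, 8, 9, 11 (mean 7).

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

diet_1 = [7, 6, 8, 7, 7]
diet_2 = [3, 4, 8, 9, 11]  # assumed reading

for name, data in (("Diet I", diet_1), ("Diet II", diet_2)):
    spread = max(data) - min(data)  # the range
    print(name, spread, sample_variance(data), round(sample_sd(data), 2))
```

The two samples share the same mean, but the range, variance, and standard deviation all show that diet program II produced far more variable birth weights.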
A Short Cut formula:

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1} = \frac{291 - 35^2/5}{4} = 11.5$$

Data, x    x^2
3          9
4          16
8          64
9          81
11         121
Total 35   291

What is the sample standard deviation of the weights of babies from the sample of mothers who received diet program I?
Diet program I data: 7, 6, 8, 7, 7

Diet I: mean = 7, s = 0.71
Diet II: mean = 7, s = 3.39

Does the mother's diet program affect the birth weights of babies?

About s (sample standard deviation):
- s measures the spread around the mean.
- The larger s is, the more spread out the data are.
- If s = 0, then all the observations must be equal.
- s is strongly influenced by outliers.

Population Parameters

If N observations, denoted by $x_1, x_2, \ldots, x_N$, are all the observations in a finite population, their mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$ are

Population Mean: $$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

Population Variance: $$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

Population Standard Deviation: $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

Notation: For any population whose mean and variance exist, these measures are usually denoted as
Population Mean: $\mu$
Population Variance: $\sigma^2$
Population Standard Deviation: $\sigma$
These are ideal numbers. In practice, we usually don't know these values exactly and wish to estimate them.

The Use of Mean and Standard Deviation
- Describe the distribution
- Understand the center and the spread of the distribution
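As a quick check, the shortcut formula for the sample variance can be compared with the definition based on squared deviations. This is a sketch with hypothetical function names; the data assume the diet program II reading 3, 4, 8, 9, 11, for which the sum of x is 35 and the sum of x^2 is 291.

```python
def variance_by_definition(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def variance_by_shortcut(values):
    """(sum of x^2 - (sum of x)^2 / n) / (n - 1), which needs only two running totals."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

diet_2 = [3, 4, 8, 9, 11]  # assumed reading

print(sum(diet_2), sum(x * x for x in diet_2))
print(variance_by_definition(diet_2), variance_by_shortcut(diet_2))
```

The shortcut is convenient by hand because it avoids computing each deviation. In floating-point code it can lose precision when the mean is large relative to the spread, so the definition is usually the safer choice there.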
Actual Length of foot x4

Store             $\bar{x}$   s
My Lowe's         .5          0.
Homeowner Depot   .0          0.03
Wood Lot          .0          0.9

Profit Margin (97-98)

Company                $\bar{x}$   s
American Water Works   7.6         .68
Brown & Sharpe         7.6         7.39
Campbell Soup          3.65        .05
McDonald's             0.04        .0
Pam America            .98         4.8

In which company should you invest your money?

Measure of Relative Variability

Which of the following data sets has relatively lower variability?
Analyst A: (Slide A) 123, 124, 128, 133, 126, 122, 129
Analyst B: (Slide B) 9, 10, 13, 10, 11, 13, 12

Coefficient of Variation (C.V.): the standard deviation expressed as a percentage of the mean. It is a unit-free measure of dispersion. It provides a measurement for comparing the relative variability of data sets on different scales.

$$\text{C.V.} = \frac{s}{\bar{x}} \times 100\%$$

Example: One wishes to compare the quality of work of two blood cell count analysts. The average from repeated counts on slide A by analyst A was 126.43 with s.d. = 3.87, and the average from analyst B for slide B was 11.14 with s.d. = 1.57.

C.V. (Analyst A) = (3.87/126.43) x 100% = 3.06%
C.V. (Analyst B) = (1.57/11.14) x 100% = 14.1%

Analyst A has lower variability!

Chebyshev's inequality

At least $(1 - 1/k^2)$ of the data in a data set lie within k standard deviations of their mean.
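The coefficient of variation comparison for the two analysts can be sketched as below. The function name is hypothetical, and the summary statistics are an assumed reading of the example (mean 126.43 with s.d. 3.87 for analyst A, mean 11.14 with s.d. 1.57 for analyst B).

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean (unit-free)."""
    return sd / mean * 100

# Assumed reading of the summary statistics in the blood cell count example.
cv_a = coefficient_of_variation(sd=3.87, mean=126.43)
cv_b = coefficient_of_variation(sd=1.57, mean=11.14)

print(round(cv_a, 2), round(cv_b, 1))
print("Analyst A" if cv_a < cv_b else "Analyst B", "has lower relative variability")
```

Analyst A has the larger standard deviation in absolute terms, yet the smaller C.V., because the counts on slide A are roughly ten times larger. This is the situation the C.V. is meant for.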
Example: Heart rates for asthmatic patients in a state of respiratory arrest have a mean of 140 beats per minute and a standard deviation of 35.5 beats per minute. What percentage of the population of this type of patient have heart rates within two standard deviations of the mean in a state of respiratory arrest?
(i.e., between 140 - 2 x 35.5 = 69 and 140 + 2 x 35.5 = 211)

It will be at least 75%, because $(1 - 1/2^2) = 3/4 = 75\%$.

Empirical Rule: Properties of a symmetric and Normal distribution
- the distribution is symmetric about its mean ($\mu$),
- 68% of the area is between $\mu - \sigma$ and $\mu + \sigma$,
- 95% of the area is between $\mu - 2\sigma$ and $\mu + 2\sigma$,
- 99.7% of the area is between $\mu - 3\sigma$ and $\mu + 3\sigma$.

[Figure: normal curve with $\mu - 3\sigma$, $\mu$, and $\mu + 3\sigma$ marked on the horizontal axis]

Approximation with E.R.

Assume that the heart rate for a particular population has a mean of 70 per minute and a standard deviation of 5. If the heart rate for this population is bell-shaped and normally distributed, what percentage of the population have a heart rate between 60 and 80?

About 95%, because the interval is within 2 standard deviations of the mean.

Measure of Position: Standard Score, Quartile, Percentile

Z-score (Standard Score)

If x is an observation from a distribution that has mean $\mu$ and standard deviation $\sigma$, the standardized value of x, the z-score of x, is

$$z = \frac{x - \mu}{\sigma} = \frac{x - \text{mean}}{\text{standard deviation}}$$

If a distribution has a mean of 10 and a s.d. of 2, the value 7 has a z-score of -1.5:
z-score = (7 - 10)/2 = -1.5.

$\mu + 3\sigma$ has a z-score of 3, since it is 3 s.d. from the mean.

[Figure: axis marked 6, 8, 10, 12, 14, showing the value 7 lying 1.5 s.d. below the mean]
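The Chebyshev bound and the z-score used in this part can be sketched together. The function names are hypothetical, and the numbers assume the readings mean 140 and s.d. 35.5 for the asthma example, and mean 10 and s.d. 2 for the z-score example.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def k_sd_interval(mean, sd, k):
    """Endpoints of the interval mean - k*sd to mean + k*sd."""
    return (mean - k * sd, mean + k * sd)

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(k_sd_interval(140, 35.5, 2), chebyshev_lower_bound(2))
print(z_score(7, 10, 2))
print(z_score(60, 70, 5), z_score(80, 70, 5))
```

Chebyshev's bound holds for any distribution, so it is conservative: it guarantees at least 75% within 2 standard deviations, while the Empirical Rule gives about 95% when the distribution is bell-shaped.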
Heart rates for a certain population at a certain condition follow a bell-shaped symmetric distribution with mean 70 and standard deviation 2. What are the standard scores of the value 74 and the value 66?

$Z_{74}$ = (74 - 70)/2 = 2
$Z_{66}$ = (66 - 70)/2 = -2

[Figure: bell curve centered at 70 with 95% of the area between 66 and 74]

Sample z-score

$$z = \frac{x - \bar{x}}{s}$$

Example: If the mean of a random sample is 5 and the standard deviation is 2, what would be the sample z-score of the value 6?
$\bar{x}$ = 5, s = 2, x = 6
z = (6 - 5)/2 = 0.5

Percentile

The percentile corresponding to a given value X is computed by using the following formula.

$$\text{percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100\%$$

Example: A sample of the number of times 15 students visited the class website is the following:
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28.
Find the percentile of the data value 31 in this sample.

Sol: X = 31
Ordered data: 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86
percentile = (5 + 0.5)/15 x 100% = 36.67 = 37 (round to the nearest integer)
The value 31 is the 37th percentile.

Find a Data Value Corresponding to a Given Percentile

Step 1: Sort the data.
Step 2: Compute the position index c:
   c = n x p / 100
   n = total number of values
   p = percentile (for the 90th percentile, p = 90)
Step 3 (find position):
   1) If c is not a whole number, round c up to the next whole number.
   2) If c is a whole number, the percentile is at the position halfway between c and c + 1.

Example: A sample of the number of times 15 students visited the class website is the following:
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28.
Find the 90th percentile of the data in this sample.

Sol: n = 15, p = 90.
Ordered data: 20, 25, 25, 27, 28, 31, 33, 34, 36, 37, 44, 50, 59, 85, 86
c = n x p/100 = 15 x 90 / 100 = 13.5
Round c up to 14. The 14th number in the ordered list is the 90th percentile, and that is 85.
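The two percentile procedures above can be sketched as follows. The function names are hypothetical; the sketch assumes p is an integer strictly between 0 and 100, and that the website-visit data read as the 15 values listed in the code.

```python
def percentile_rank(values, x):
    """Percentile of x: (number of values below x + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) / len(values) * 100

def value_at_percentile(values, p):
    """Data value at the p-th percentile, following Steps 1 to 3 (integer p, 0 < p < 100)."""
    ordered = sorted(values)                 # Step 1: sort
    whole, remainder = divmod(len(ordered) * p, 100)  # Step 2: c = n * p / 100
    if remainder:                            # c is not whole: round up to position whole + 1
        return ordered[whole]                # 1-based position whole + 1 is index whole
    # c is whole: halfway between positions c and c + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

# Assumed reading of the website-visit data.
visits = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]

print(round(percentile_rank(visits, 31)))
print(value_at_percentile(visits, 90))
print(value_at_percentile(visits, 20))
```

Integer arithmetic with `divmod` is used to decide whether c is a whole number, which avoids floating-point rounding surprises. For p = 20, c = 3 is whole, so the result is halfway between the 3rd and 4th ordered values.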
Quartiles

The first quartile, Q1, or 25th percentile, is the median of the lower half of the list of ordered observations (those below the median of the data set).
The third quartile, Q3, or 75th percentile, is the median of the upper half of the list of ordered observations (those above the median of the data set).

Example: [even number of data values]
6, 60,6,63,64,64,65,65,65,66,67,69,7,7,7,7,7,7,7,73,74,75
Q1 = 64, Median = 68, Q3 = 72
Measure of spread: Interquartile range (IQR) = Q3 - Q1
IQR = 72 - 64 = 8

Example: [odd number of data values]
60,6,63,64,64,65,65,65,66,67,69,7,7,7,7,7,7,7,73,74,75
Q1 = 64.5, Median = 69, Q3 = 72
Measure of spread: Interquartile range (IQR) = Q3 - Q1
IQR = 72 - 64.5 = 7.5

Exploratory Data Analysis: Stemplot and Boxplot

Stemplots (or Stem-and-leaf plots)
-- leading digits are called stems
-- final digits are called leaves

Example: (number of hysterectomies performed by 15 male doctors)
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28

Leaves in data order:     Leaves sorted:
2 | 7 5 5 0 8             2 | 0 5 5 7 8
3 | 3 1 7 6 4             3 | 1 3 4 6 7
4 | 4                     4 | 4
5 | 0 9                   5 | 0 9
6 |                       6 |
7 |                       7 |
8 | 6 5                   8 | 5 6
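The quartile rule above (the median of the lower half and the median of the upper half, leaving out the middle value when n is odd) can be sketched as follows. The function name is hypothetical, and the example uses the 15-doctor data under an assumed reading rather than the height data. Library routines may use other conventions and give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2 :]  # values above the median position
    return median(lower), median(ordered), median(upper)

# Assumed reading of the 15-doctor data set.
doctors = [27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28]

q1, med, q3 = quartiles(doctors)
print(q1, med, q3, q3 - q1)
```

For an even number of values the two halves are simply the first and second halves of the ordered list, so the same slicing covers both cases.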
Example: Number of hysterectomies performed by 15 male doctors:
27, 50, 33, 25, 86, 25, 85, 31, 37, 44, 20, 36, 59, 34, 28
Back-to-back stemplot with 10 female doctors, whose numbers are:
5, 7, 10, 14, 18, 19, 25, 29, 31, 33

 (Female) |   | (Male)
      7 5 | 0 |
  9 8 4 0 | 1 |
      9 5 | 2 | 0 5 5 7 8
      3 1 | 3 | 1 3 4 6 7
          | 4 | 4
          | 5 | 0 9
          | 6 |
          | 7 |
          | 8 | 5 6

Example: (Height data with gender; see data sheet)
Female: 60, 63, 64, 65, 65, 65, 66, 67
Male: 6, 64, 69, 7, 7, 7, 7, 7, 7, 7, 73, 74, 75

Back-to-back
   Female |   | Male
   555430 | 6 | 49
          | 7 | 345

Split back-to-back (* => 0-4, # => 5-9)
   Female |    | Male
      430 | 6* | 4
    76555 | 6# | 9
          | 7* | 34
          | 7# | 5

The five-number summary
- Minimum value
- Q1
- Median
- Q3
- Maximum value

Boxplot

Example: (data sheet without outlier 6 )
60,6,63,64,64,65,65,65,66,67,69,7,7,7,7,7,7,7,73,74,75
Min = 60, Q1 = 64.5, Median = 69, Q3 = 72, Max = 75.

[Figure: boxplot of HEIGHT, vertical axis from 50 to 80]

Outliers

Extremely high or extremely low data values when compared with the rest of the data values.
With 6 in the data:
6, 60,6,63,64,64,65,65,65,66,67,69,7,7,7,7,7,7,7,73,74,75
Q1 = 64, Median = 68, Q3 = 72, IQR = 72 - 64 = 8

[Figure: boxplot of HEIGHT, vertical axis from 0 to 80]

How to identify outliers?

Inner and outer fences for outliers

The inner fences are located at a distance of 1.5 IQR below Q1 (lower inner fence = Q1 - 1.5 x IQR) and at a distance of 1.5 IQR above Q3 (upper inner fence = Q3 + 1.5 x IQR).

The outer fences are located at a distance of 3 IQR below Q1 (lower outer fence = Q1 - 3 x IQR) and at a distance of 3 IQR above Q3 (upper outer fence = Q3 + 3 x IQR).

IQR = 72 - 64 = 8; Q1 = 64; Q3 = 72

Lower inner fence (LIF) = 64 - 1.5 x 8 = 52
Upper inner fence (UIF) = 72 + 1.5 x 8 = 84
Lower outer fence (LOF) = 64 - 3 x 8 = 40
Upper outer fence (UOF) = 72 + 3 x 8 = 96

[Figure: boxplot of HEIGHT with the inner fences at 52 and 84 and the outer fences at 40 and 96 marked]

Mild and Extreme outliers

Data values falling between the inner and outer fences are considered mild outliers.
Data values falling outside the outer fences are considered extreme outliers.
When outliers exist, the whiskers extend to the smallest and largest data values within the inner fences.
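The fence calculation and the mild/extreme classification can be sketched as below. The function names are hypothetical, Q1 = 64 and Q3 = 72 are an assumed reading of the height example, and the test values 50 and 30 are made up for illustration. Treating a value that lies exactly on a fence as not beyond it is also an assumption; the text does not specify this case.

```python
def fences(q1, q3):
    """Return the inner and outer fences as (lower, upper) pairs."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    outer = (q1 - 3 * iqr, q3 + 3 * iqr)
    return inner, outer

def classify(value, q1, q3):
    """Label a value as 'extreme', 'mild', or 'not an outlier'."""
    inner, outer = fences(q1, q3)
    if value < outer[0] or value > outer[1]:
        return "extreme"
    if value < inner[0] or value > inner[1]:
        return "mild"
    return "not an outlier"

inner, outer = fences(64, 72)  # assumed reading of Q1 and Q3
print(inner, outer)

# Hypothetical heights, used only to exercise each branch.
for height in (70, 50, 30):
    print(height, classify(height, 64, 72))
```

Because the fences are built from the quartiles, a single extreme value barely moves them, which is what makes this rule usable for detecting that same value.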
Side-by-side Box Plot

[Figure: side-by-side boxplots of HEIGHT by sex (Female, N = 8; Male, N = 13), vertical axis from 50 to 80]

Remarks:
- If the distribution of the data is symmetric, then the mean and median will be about the same.
- The five-number summary and boxplot are best for nonsymmetric data.
- The median and quartiles are not influenced by outliers.
- The mean and standard deviation are most appropriate only if the data are symmetric, because both of these measures are easily influenced by outliers.

Boxplot

For the following data:
3 7 78 40 50 56 50 5 57 69 30 4 5 5
Find the five-number summary and the IQR.
Make a boxplot.