DAY 4 16 Jan 2014
Recap:
- Distribution Shape
- Mean, Median, Mode
- Standard Deviations
Two Important Three-Standard-Deviation Rules
1. Chebychev's Rule: At least 89% of the observations in any data set lie within three standard deviations to either side of the mean.
2. Empirical Rule: Roughly 99.7% of the observations in a bell-shaped data set lie within three standard deviations to either side of the mean.
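The 89% figure follows from the general form of Chebychev's rule: at least 1 - 1/k^2 of the observations lie within k standard deviations of the mean, for any k > 1. A minimal sketch of that arithmetic (the function name is illustrative, not from the course material):

```python
def chebychev_lower_bound(k: float) -> float:
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's rule is informative only for k > 1")
    return 1 - 1 / k**2

# For k = 3 the bound is 8/9, which the slide rounds to 89%.
print(round(chebychev_lower_bound(3), 4))  # 0.8889
```

The bound holds for every data set, which is why it is much weaker than the 99.7% of the empirical rule, which assumes bell-shaped data.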
Objective of the day:
- Quartiles, Five-Number Summary, Boxplot
- Descriptive Measures for Populations; Use of Samples
Section 3.3 The Five-Number Summary; Boxplots
Definition 3.7 Quartiles Arrange the data in increasing order and determine the median. The first quartile is the median of the part of the entire data set that lies at or below the median of the entire data set. The second quartile is the median of the entire data set. The third quartile is the median of the part of the entire data set that lies at or above the median of the entire data set.
Note: The quartiles divide the ordered measurements into four parts of roughly equal size. About twenty-five per cent of the measurements are less than the lower quartile, fifty per cent are less than the median, and seventy-five per cent are less than the upper quartile. So about fifty per cent of the measurements lie between the lower quartile and the upper quartile. The lower quartile, median and upper quartile are often denoted by Q1, Q2 and Q3 respectively.
Quartiles: First Quartile
Arrange the data in increasing order and determine the median. The first quartile is the median of the part of the entire data set that lies at or below the median of the entire data set.
Example: Weekly TV viewing time of 20 people.
Data: 25 66 34 30 41 35 26 38 27 31 32 30 32 15 38 20 43 5 16 21
Ordered: 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Bottom half: 5 15 16 20 21 25 26 27 30 30
First quartile: Q1 = (21 + 25)/2 = 23
Quartiles: Second Quartile
Arrange the data in increasing order and determine the median. The second quartile is the median of the entire data set.
Example: Weekly TV viewing time of 20 people.
Data: 25 66 34 30 41 35 26 38 27 31 32 30 32 15 38 20 43 5 16 21
Ordered: 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Second quartile: Q2 = (30 + 31)/2 = 30.5
Quartiles: Third Quartile
Arrange the data in increasing order and determine the median. The third quartile is the median of the part of the entire data set that lies at or above the median of the entire data set.
Example: Weekly TV viewing time of 20 people.
Data: 25 66 34 30 41 35 26 38 27 31 32 30 32 15 38 20 43 5 16 21
Ordered: 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Top half: 31 32 32 34 35 38 38 41 43 66
Third quartile: Q3 = (35 + 38)/2 = 36.5
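The median-of-halves procedure of Definition 3.7 can be sketched in Python as follows. This is an illustration under one assumption: when the number of observations is odd, the middle observation is placed in both halves, which is one reading of "at or below" and "at or above" the median. Library routines such as `statistics.quantiles` use different interpolation rules and may return slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    # For odd n the middle observation is included in both halves (assumption).
    lower = xs[: n // 2 + n % 2]
    upper = xs[n // 2 :]
    return median(lower), median(xs), median(upper)

# Weekly TV viewing times of 20 people, from the example.
tv_times = [25, 66, 34, 30, 41, 35, 26, 38, 27, 31,
            32, 30, 32, 15, 38, 20, 43, 5, 16, 21]
print(quartiles(tv_times))  # (23.0, 30.5, 36.5)
```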
Definition 3.8 Interquartile Range
The interquartile range, or IQR, is the difference between the first and third quartiles; that is, IQR = Q3 - Q1.
Example: Weekly TV viewing time of 20 people.
Data: 25 66 34 30 41 35 26 38 27 31 32 30 32 15 38 20 43 5 16 21
Ordered: 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Q1 = 23, Q3 = 36.5
IQR = Q3 - Q1 = 36.5 - 23 = 13.5
From the three quartiles, we can obtain a measure of center (the median, Q2) and measures of variation of the two middle quarters of the data: Q2 - Q1 for the second quarter and Q3 - Q2 for the third quarter. But Q1, Q2 and Q3 don't tell us anything about the variation of the first and fourth quarters. To get that information we need the Min and Max values of the data. Variation of the first quarter can be measured by (Q1 - Min) and variation of the fourth quarter by (Max - Q3).
Definition 3.9 Five-Number Summary
The five-number summary of a data set is Min, Q1, Q2, Q3, Max.
Example: Weekly TV viewing time of 20 people.
Ordered: 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Min = 5, Q1 = 23, Q2 = 30.5, Q3 = 36.5, Max = 66
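As a rough sketch, the five-number summary and the spread of each quarter can be computed together. The function name is illustrative, and the quartiles use the median-of-halves method with the middle observation shared between halves when the count is odd (an assumption about how to read the definition).

```python
from statistics import median

def five_number_summary(data):
    """Return (Min, Q1, Q2, Q3, Max) using median-of-halves quartiles."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2 + n % 2]  # middle value shared when n is odd (assumption)
    upper = xs[n // 2 :]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

tv_times = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]
mn, q1, q2, q3, mx = five_number_summary(tv_times)
print(mn, q1, q2, q3, mx)  # 5 23.0 30.5 36.5 66

# Variation of the first, second, third and fourth quarters.
print(q1 - mn, q2 - q1, q3 - q2, mx - q3)  # 18.0 7.5 6.0 29.5
```

The fourth quarter spans 29.5 hours, far more than the two middle quarters, which reflects the single large observation of 66.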
Definition 3.10 Lower and Upper Limits
Lower limit = Q1 - 1.5 · IQR
Upper limit = Q3 + 1.5 · IQR
Example: Weekly TV viewing time of 20 people.
Ordered: 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Q1 = 23, Q3 = 36.5, IQR = 13.5
Lower limit = 23 - 1.5 · 13.5 = 2.75
Upper limit = 36.5 + 1.5 · 13.5 = 56.75
Example: Weekly TV viewing time of 20 people.
Ordered: 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Lower limit = 2.75, Upper limit = 56.75
Only the observation 66 lies outside the limits (66 > 56.75).
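The limit calculation and the check for observations outside the limits can be sketched as below. The function name is illustrative, and the quartiles again follow the median-of-halves method (with the middle value shared between halves for odd counts, an assumption).

```python
from statistics import median

def lower_upper_limits(data):
    """Return (lower limit, upper limit) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2 + n % 2])
    q3 = median(xs[n // 2 :])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

tv_times = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]
lower, upper = lower_upper_limits(tv_times)

# Observations outside the limits are candidates for closer inspection.
outside = [x for x in tv_times if x < lower or x > upper]
print(lower, upper, outside)  # 2.75 56.75 [66]
```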
Procedure 3.1
Example: Weekly TV viewing time of 20 people. 5 15 16 20 21 25 26 27 30 30 31 32 32 34 35 38 38 41 43 66
Why Quartiles? The mean and standard deviation are sensitive to the influence of a few extreme observations, but the quartiles are not.
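A small experiment illustrates the point. Suppose, hypothetically, that the largest viewing time had been recorded as 660 instead of 66 (this altered value is invented for illustration and is not part of the data). The sketch below compares how the mean, the sample standard deviation and the quartiles respond.

```python
from statistics import mean, median, stdev

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    return median(xs[: n // 2 + n % 2]), median(xs), median(xs[n // 2 :])

original = [5, 15, 16, 20, 21, 25, 26, 27, 30, 30,
            31, 32, 32, 34, 35, 38, 38, 41, 43, 66]
# Hypothetical recording error: the largest value becomes 660.
altered = original[:-1] + [660]

print("mean:     ", mean(original), "->", mean(altered))
print("stdev:    ", stdev(original), "->", stdev(altered))
print("quartiles:", quartiles(original), "->", quartiles(altered))
```

The mean and standard deviation both increase sharply, while the three quartiles stay the same because only the position of the largest value matters to them, not its size.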
Section 3.4 Descriptive Measures for Populations; Use of Samples
Definition 3.11
Definition 3.12
Figure 3.13: Population and sample for bolt diameters
Definition 3.13 Parameter and Statistic
Parameter: A descriptive measure for a population.
Statistic: A descriptive measure for a sample.
Definition 3.14 & 3.15 z-score For an observed value of a variable x, the corresponding value of the standardized variable z is called the z-score of the observation. The term standard score is often used instead of z-score.
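Example: for a variable x with μ = 3 and σ = 2, the observation x = 5 has z-score z = (x - μ)/σ = (5 - 3)/2 = 1.
The standardized variable z always has mean μ_z = 0 and standard deviation σ_z = 1.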
Note:
1. A z-score is calculated for a single value and indicates the distance of that value from the mean in units of standard deviations.
2. A positive z-score indicates that the value is above the mean.
3. A negative z-score indicates that the value is below the mean.
4. z can be any whole number or a fraction, so z = 3, z = 1.3 or z = 0.5 are all valid.
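The z-score calculation can be sketched in a few lines. The three-value population below is made up purely to illustrate that standardized values have mean 0 and standard deviation 1. `pstdev` is used because σ denotes a population standard deviation.

```python
from statistics import mean, pstdev

def z_score(x, mu, sigma):
    """Distance of x from the mean mu, in units of the standard deviation sigma."""
    return (x - mu) / sigma

# Slide example: mu = 3, sigma = 2, observation x = 5.
print(z_score(5, 3, 2))  # 1.0

# Hypothetical small population, used only for illustration.
population = [1, 3, 5]
mu, sigma = mean(population), pstdev(population)
zs = [z_score(x, mu, sigma) for x in population]

# Up to floating-point rounding, the z-scores have mean 0 and standard deviation 1.
print(zs, mean(zs), pstdev(zs))
```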
Summary:
- Quartiles, Five-Number Summary, Boxplot
- z-scores
Next Week:
- Sections 4.1 & 4.2
- Quiz 2, Entrance Exam
- Sections 5.1 & 5.2
Thank You