Statistics Lecture Notes

1. Data Analysis

Three types of average:
  Mean
  Median
  Mode

2. A Baseball Example

Runs scored in 30 games played by the University of Arizona Derelicts:

   7   5   9   3   5   4   2   3   0   7
  10   9   2   4   2   0   4   7  12   3
   3   2   4   1   8   6   5   4   3   4

3. Frequency Distribution

  Score   Tally      Freq   Freq × Score
    0     II           2          0
    1     I            1          1
    2     IIII         4          8
    3     IIIII        5         15
    4     IIIII I      6         24
    5     III          3         15
    6     I            1          6
    7     III          3         21
    8     I            1          8
    9     II           2         18
   10     I            1         10
   12     I            1         12
  Total               30        138
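The frequency distribution above can also be tallied by machine. The following is a minimal sketch, assuming the 30 scores are typed in as a Python list; the variable names are illustrative and not part of the notes.

```python
from collections import Counter

# Runs scored in the 30 games, copied from the baseball example above.
runs = [7, 5, 9, 3, 5, 4, 2, 3, 0, 7,
        10, 9, 2, 4, 2, 0, 4, 7, 12, 3,
        3, 2, 4, 1, 8, 6, 5, 4, 3, 4]

# Counter maps each score to the number of games with that score (the Freq column).
freq = Counter(runs)

# Column totals: number of games, and the sum of the Freq x Score column.
total_games = sum(freq.values())
total_runs = sum(score * count for score, count in freq.items())

# Print one row per score, in increasing order of score.
for score in sorted(freq):
    print(f"{score:>5} {freq[score]:>5} {score * freq[score]:>13}")
print(f"Total {total_games:>5} {total_runs:>13}")
```

The two totals are the only numbers needed to compute the mean of the data.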
4. The Mean

What is the average number of runs?

Answer Number 1: They scored a total of 138 runs in 30 games, so this works out to be

  $\frac{138}{30} = 4.6$ runs per game.

This number is called the mean.

5. Another Average

Answer Number 2: Arrange the runs in order, smallest to largest:

  Position:   1   2   3   4   5   6   7   8   9  10
  Runs:       0   0   1   2   2   2   2   3   3   3

  Position:  11  12  13  14  15  16  17  18  19  20
  Runs:       3   3   4   4   4   4   4   4   5   5

  Position:  21  22  23  24  25  26  27  28  29  30
  Runs:       5   6   7   7   7   8   9   9  10  12

Notice that four is right in the middle of the data.

6. Median and Mode

Split the ordered list in half: the lower 15 games all have 4 or fewer runs, and the upper 15 games all have 4 or more runs. The number in the middle, 4, is called the median.

Answer Number 3: The team scored 4 runs on 6 different occasions, more than any other score. Viewed this way, the number 4 is called the mode.

7. Formal Definitions

Arrange n data points $X_1, X_2, X_3, \ldots, X_n$ in order, from smallest to largest.

The mean of the data is the sum of all the X-values divided by n. The Greek letter mu (µ) is often used to represent the mean of a set of data.

The median is given by
  (a) $X_{(n+1)/2}$ if n is odd
  (b) the average of $X_{n/2}$ and $X_{(n/2)+1}$ if n is even

A mode is a data value that occurs with the highest frequency. (There may be more than one mode.)

8. A Golf Example

  Hole     1   2   3   4   5   6   7   8   9   Total
  Pooh     6   4   7   5   5   3   5   4   5     44
  Tigger   4   3   6   4   3   3   4   3  15     45

Pooh: My average is 44/9 = 4.89, while Tigger's is 45/9 = 5.00, so I'm the better golfer.

Tigger: I beat Pooh on 7 holes, tied Pooh on 1 hole, and only lost 1 out of 9 holes, so I'm the better golfer.

Who's right?

9. Pooh's Game

Frequency Distribution

  Score   Tally   Freq   Freq × Score
    3     I         1          3
    4     II        2          8
    5     IIII      4         20
    6     I         1          6
    7     I         1          7
  Total             9         44

10. Tigger's Game

Frequency Distribution
  Score   Tally   Freq   Freq × Score
    3     IIII      4         12
    4     III       3         12
    6     I         1          6
   15     I         1         15
  Total             9         45

11. Pooh versus Tigger

Calculation of Mean:
  Pooh:   $\mu_{Pooh} = 44/9 = 4.89$
  Tigger: $\mu_{Tigger} = 45/9 = 5.00$

Calculation of Median (with 9 scores in order, the median is the 5th value):
  Pooh:   3, 4, 4, 5, 5, 5, 5, 6, 7      median = 5
  Tigger: 3, 3, 3, 3, 4, 4, 4, 6, 15     median = 4

12. Note on Calculating Median

If there is an even number of data points, average the two middle values. For example, if there are 10 data points,

  $X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8, X_9, X_{10}$,

arranged in order, then the median is $(X_5 + X_6)/2$, the average of the two in the middle.

13. Calculation of Mode

  Pooh:   most common score is 5
  Tigger: most common score is 3

14. Comparison

  Average   Pooh   Tigger
  mean      4.89    5.00
  median    5       4
  mode      5       3
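The comparison table can be reproduced with Python's standard `statistics` module. This is a sketch only; the helper name `three_averages` is made up for illustration, and `statistics.multimode` (Python 3.8 or later) is used because it returns every mode when there is a tie.

```python
import statistics

# Hole-by-hole scores from the golf example above.
pooh = [6, 4, 7, 5, 5, 3, 5, 4, 5]
tigger = [4, 3, 6, 4, 3, 3, 4, 3, 15]

def three_averages(scores):
    """Return (mean rounded to 2 places, median, list of modes)."""
    mean = round(statistics.mean(scores), 2)
    median = statistics.median(scores)      # sorts the data internally
    modes = statistics.multimode(scores)    # all values with the highest frequency
    return mean, median, modes

for name, scores in [("Pooh", pooh), ("Tigger", tigger)]:
    mean, median, modes = three_averages(scores)
    print(f"{name}: mean={mean} median={median} mode(s)={modes}")
```

Tigger's single score of 15 pulls his mean above Pooh's, while his median and mode stay lower. That is why the two golfers can each claim to be better.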
15. A Salary Puzzler

The mean earnings of geology majors who attended North Carolina University in 1983 were considerably higher than the mean earnings for any other science major. Can you guess why?

16. Deviation

Two types of deviation:
  Standard Deviation
  Quartile Deviation

17. Aspirin Example

You buy 10 bottles of Bare Aspirin (50 count) and 10 bottles of MM Brand Aspirin (50 count) and record the exact number of aspirin in each bottle. The results are:

18. Bare Aspirin Distribution

Frequency Distribution: Bare Aspirin

  Number   Freq   Freq × Num
    47       1         47
    48       1         48
    49       2         98
    50       3        150
    51       1         51
    52       1         52
    54       1         54
  Total     10        500
19. MM Aspirin Distribution

Frequency Distribution: MM Aspirin

  Number   Freq   Freq × Num
    20       1         20
    31       1         31
    39       1         39
    45       1         45
    50       2        100
    57       1         57
    64       1         64
    66       1         66
    78       1         78
  Total     10        500

20. Comparison

  Average   Bare   MM
  mean       50    50
  median     50    50
  mode       50    50

21. Bare Aspirin Deviation

Bare Aspirin

    X    Freq   X − µ   (X − µ)²   Freq × (X − µ)²
   47      1     −3        9              9
   48      1     −2        4              4
   49      2     −1        1              2
   50      3      0        0              0
   51      1      1        1              1
   52      1      2        4              4
   54      1      4       16             16
  Total   10                             36

Standard Deviation $= \sqrt{\frac{\text{Total}}{10}} = \sqrt{\frac{36}{10}} \approx 1.90$
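The deviation table translates directly into a short calculation. The sketch below assumes each frequency table is stored as a dictionary from bottle count to frequency; the function name `population_std` is illustrative. As in the notes, it divides by n, the number of data points, rather than by n − 1.

```python
import math

def population_std(freq_table):
    """Standard deviation of data given as {value: frequency}, dividing by n."""
    n = sum(freq_table.values())
    mu = sum(x * f for x, f in freq_table.items()) / n
    # Last column of the deviation table: Freq x (X - mu)^2, summed over all rows.
    total = sum(f * (x - mu) ** 2 for x, f in freq_table.items())
    return math.sqrt(total / n)

# Frequency tables from the two aspirin distributions above.
bare = {47: 1, 48: 1, 49: 2, 50: 3, 51: 1, 52: 1, 54: 1}
mm = {20: 1, 31: 1, 39: 1, 45: 1, 50: 2, 57: 1, 64: 1, 66: 1, 78: 1}

print(f"Bare: {population_std(bare):.2f}")
print(f"MM:   {population_std(mm):.2f}")
```

Both brands have the same mean, median, and mode, so the standard deviation is the number that separates them.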
22. Sigma

The Greek letter sigma (σ) is often used to represent the standard deviation of a set of data.

23. MM Aspirin Deviation

MM Aspirin

    X    Freq   X − µ   (X − µ)²   Freq × (X − µ)²
   20      1    −30      900            900
   31      1    −19      361            361
   39      1    −11      121            121
   45      1     −5       25             25
   50      2      0        0              0
   57      1      7       49             49
   64      1     14      196            196
   66      1     16      256            256
   78      1     28      784            784
  Total   10                           2692

For the MM Aspirin,

Standard Deviation $= \sqrt{\frac{\text{Total}}{10}} = \sqrt{\frac{2692}{10}} \approx 16.4$

24. Comparison

Expected number of aspirin per bottle:
  Bare Brand: 50 ± 1.9
  MM Brand:   50 ± 16.4

25. Quartile Deviation

Given a data set $\{X_1, X_2, \ldots, X_n\}$ listed in order from smallest to largest (repetitions allowed), compute the quartile deviation as follows.

Step 1. Find the median of the data set and call this median $Q_2$.
$Q_2$ is also called the second quartile.

Step 2. Use $Q_2$ to divide the original ordered data set into two parts. The lower 50% consists of those values $\le Q_2$; the upper 50% consists of those values $\ge Q_2$.

Note: when n is odd, include $Q_2$ in both the lower and upper data sets.

26. Quartile Deviation Cont'd

Step 3.
  $Q_1$ = the median of the lower data set
  $Q_3$ = the median of the upper data set

$Q_1$ is called the first quartile. $Q_3$ is called the third quartile.

Step 4. $\frac{Q_3 - Q_1}{2}$ is called the quartile deviation.

27. Example 1.

Test Scores: {49, 54, 59, 60, 62, 65, 65, 68, 77, 83, 84, 89, 90}

The number of data points is n = 13.

$Q_2$ = the median of the data set = the 7th data point = 65.

28. Example 1. Cont'd

{49, 54, 59, 60, 62, 65, 65, 68, 77, 83, 84, 89, 90}

Lower data set: {49, 54, 59, 60, 62, 65, 65}
$Q_1$ = the median of the lower data set = the 4th data point = 60.

Upper data set: {65, 68, 77, 83, 84, 89, 90}
$Q_3$ = the median of the upper data set = the 4th data point = 83.

Quartile deviation $= \frac{Q_3 - Q_1}{2} = \frac{83 - 60}{2} = 11.5$.
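Steps 1 through 4 can be sketched in code as follows. This follows the splitting rule used in these notes (the data is split by position, and the middle point goes into both halves when n is odd); statistical libraries often use other quartile conventions and may give slightly different values. The function names are illustrative.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartile_deviation(values):
    """(Q3 - Q1) / 2, splitting the ordered data by position at the median."""
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 1:
        lower = data[:half + 1]   # n odd: the middle point joins the lower half...
        upper = data[half:]       # ...and the upper half
    else:
        lower = data[:half]
        upper = data[half:]
    q1 = median(lower)
    q3 = median(upper)
    return (q3 - q1) / 2

# Test scores from Example 1 (n = 13).
scores = [49, 54, 59, 60, 62, 65, 65, 68, 77, 83, 84, 89, 90]
print(quartile_deviation(scores))
```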
29. Example 2.

Test Scores: {49, 54, 59, 60, 62, 65, 68, 77, 83, 84, 89, 90}

The number of data points is n = 12.

$Q_2$ = the median of the data set = the average of $X_6$ and $X_7$ = (65 + 68)/2 = 66.5.

30. Example 2. Cont'd

{49, 54, 59, 60, 62, 65, 68, 77, 83, 84, 89, 90}

Lower data set: {49, 54, 59, 60, 62, 65}
$Q_1$ = the median of the lower data set = (59 + 60)/2 = 59.5.

Upper data set: {68, 77, 83, 84, 89, 90}
$Q_3$ = the median of the upper data set = (83 + 84)/2 = 83.5.

Quartile deviation $= \frac{Q_3 - Q_1}{2} = \frac{83.5 - 59.5}{2} = 12$.

31. Exercise Puzzler

Suppose you jog for one mile at a speed of 5 mph. For the one-mile return trip, you walk at a speed of 3 mph. What's your average speed for the 2 miles?

  (a) 3.75 mph
  (b) 4 mph
  (c) 4.25 mph

32. Distance Time Rate Formula

The important formula is

  distance = rate × time

Equivalently,

  $\text{time} = \frac{\text{distance}}{\text{rate}}$
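For example, the time required to drive 120 miles if your speed is 60 mph is

  $t = \frac{120}{60} = 2$ hours

33. Jogging One Mile

  d = 1 mile
  r = 5 mph
  $t = \frac{1 \text{ mile}}{5 \text{ mile/hr}} = \frac{1}{5} \text{ hr} \cdot \frac{60 \text{ min}}{1 \text{ hr}} = 12$ min

34. Walking One Mile

  d = 1 mile
  r = 3 mph
  $t = \frac{1 \text{ mile}}{3 \text{ mile/hr}} = \frac{1}{3} \text{ hr} \cdot \frac{60 \text{ min}}{1 \text{ hr}} = 20$ min

35. Puzzler Solved

Jogging One Mile and Walking One Mile:

  d = 2 miles
  t = 12 + 20 = 32 min
  $r = \frac{d}{t} = \frac{2 \text{ miles}}{32 \text{ min}} \cdot \frac{60 \text{ min}}{1 \text{ hr}} = \frac{60 \text{ miles}}{16 \text{ hr}}$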
  $= \frac{15}{4} \frac{\text{miles}}{\text{hr}} = 3.75$ mph

36. Find the Missing Data Point

The mean of four numbers is 75. Three of the numbers are 79, 62, and 71. What is the fourth number?

Solution: Call the fourth number X. Then

  $\mu = \frac{79 + 62 + 71 + X}{4} = 75$

Multiply by 4:

  $79 + 62 + 71 + X = 4 \cdot 75 = 300$
  $212 + X = 300$
  $X = 300 - 212 = 88$
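Both of the last two problems reduce to a line or two of arithmetic. The sketch below checks them; the function names are illustrative, and `Fraction` is used only so that the speed comes out as exactly 15/4 with no floating-point rounding.

```python
from fractions import Fraction

def average_speed(legs):
    """Average speed over several legs, each given as (distance_miles, speed_mph).

    Average speed is total distance divided by total time,
    not the mean of the individual speeds.
    """
    total_distance = sum(Fraction(d) for d, _ in legs)
    total_time = sum(Fraction(d) / Fraction(r) for d, r in legs)  # time = distance / rate
    return total_distance / total_time

def missing_value(mean, known_values):
    """The one unknown data point, given the mean of all points and the known ones."""
    n = len(known_values) + 1            # the known points plus the missing one
    return mean * n - sum(known_values)  # (sum of all points) - (sum of known points)

# Jog 1 mile at 5 mph, then walk 1 mile at 3 mph.
speed = average_speed([(1, 5), (1, 3)])
print(speed, "=", float(speed), "mph")

# The mean of four numbers is 75; three of them are 79, 62, and 71.
print(missing_value(75, [79, 62, 71]))
```

The jogger spends more time at the slower speed, so the average speed for the trip is less than 4 mph, the simple mean of 5 and 3.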