What is statistics? Statistics is the science of collecting information, organizing and summarizing the information collected, and analyzing the information collected in order to draw conclusions.
Two Types of Statistics: Descriptive statistics organizes and summarizes the information collected. Inferential statistics draws conclusions from the information collected.
Chapter 1 Exploring Data
Lesson 1-1, Displaying Distributions with Graphs: Bar Graphs and Pie Charts
Data: Individuals are the objects described by a set of data. Individuals may be people, animals, or things. A variable is any characteristic of an individual. A variable can take different values for different individuals.
Types of Variables: A categorical variable places each individual into one of several groups, or categories, based on some attribute or characteristic. A quantitative variable takes numerical values that measure some characteristic of each individual.
Example, Page 7, #1.2 Data from a medical study contain values of many variables for each of the people who were subjects of the study. Which of the following variables are categorical and which are quantitative?
Example, Page 7, #1.2 a) Gender (female or male): categorical b) Age (years): quantitative c) Race (Asian, black, white, or other): categorical d) Smoker (yes or no): categorical e) Systolic blood pressure (millimeters of mercury): quantitative f) Level of calcium in the blood (micrograms per milliliter): quantitative
Distribution: The distribution of a variable tells us what values the variable takes and how often it takes each value.
Displaying Distributions: For categorical variables, use bar graphs or pie charts. For quantitative variables, use dotplots, stemplots, or histograms.
Example Page 11, #1.6 In 1997 there were 92,353 deaths from accidents in the United States. Among these were 42,340 deaths from motor vehicle accidents, 11,858 from falls, 10,163 from poisoning, 4,051 from drowning, and 3,601 from fires. A) Find the percent of accidental deaths from each of these causes, rounded to the nearest percent. What percent of accidental deaths were due to other causes?
Example Page 11, #1.6
Accident | Number | Percentage
Motor vehicle | 42,340 | 42,340 / 92,353 × 100 ≈ 45.8 ≈ 46%
Falls | 11,858 | 13%
Poisoning | 10,163 | 11%
Drowning | 4,051 | 4%
Fires | 3,601 | 4%
Other causes | 20,340 (= 92,353 − 72,013) | 22%
Total | 92,353 | 100%
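The percentage column is the same calculation repeated for each cause. A minimal Python sketch of it, assuming nothing beyond the counts given in the problem (the variable names are illustrative only), derives the "other causes" count by subtraction and rounds each share to the nearest percent:

```python
# 1997 US accidental deaths, copied from the example above.
TOTAL = 92_353
counts = {
    "Motor vehicle": 42_340,
    "Falls": 11_858,
    "Poisoning": 10_163,
    "Drowning": 4_051,
    "Fires": 3_601,
}

# "Other causes" is whatever remains after the five listed causes.
counts["Other causes"] = TOTAL - sum(counts.values())

# Percent of all accidental deaths, rounded to the nearest whole percent.
percents = {cause: round(100 * n / TOTAL) for cause, n in counts.items()}

for cause, n in counts.items():
    print(f"{cause:<14} {n:>7,} {percents[cause]:>3}%")
```

Rounded percentages do not always add to exactly 100%; for these data they happen to.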
Example Page 11, #1.6 B) Make a well-labeled bar graph of the distribution of causes of accidental deaths. Be sure to include an "other causes" bar.
Example Page 11, #1.6 [Bar graph: US Accidental Deaths, 1997. Vertical axis: percentage of accidental deaths (0 to 50); horizontal axis: causes of accidental deaths (MV, Falls, Poison, Drown, Fires, OC).]
Example Page 11, #1.6 C) Would it also be correct to use a pie chart to display these data? If so, construct the pie chart. If not, explain why not. Yes, because the categories are parts of a whole: together they account for all 92,353 accidental deaths.
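Example Page 11, #1.6 For the pie chart, each slice's angle is its percentage of 360°.
Accident | Number | Percentage | Angle
MV | 42,340 | 46% | 0.46 × 360 = 165.6 ≈ 166°
Falls | 11,858 | 13% | 47°
Poisoning | 10,163 | 11% | 40°
Drowning | 4,051 | 4% | 14°
Fires | 3,601 | 4% | 14°
OC | 20,340 | 22% | 79°
Total | 92,353 | 100% | 360°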
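The angle column can be checked the same way. This is a small sketch, assuming the rounded percentages from the table and rounding each angle to the nearest whole degree:

```python
# Rounded percentages from the table above.
percents = {
    "Motor vehicle": 46,
    "Falls": 13,
    "Poisoning": 11,
    "Drowning": 4,
    "Fires": 4,
    "Other causes": 22,
}

# A slice's central angle is its share of the full 360 degrees,
# rounded to the nearest whole degree.
angles = {cause: round(p * 360 / 100) for cause, p in percents.items()}

for cause, angle in angles.items():
    print(f"{cause:<14} {angle:>3} degrees")
print("Total:", sum(angles.values()), "degrees")
```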
Example Page 11, #1.6 [Pie chart: US Accidental Deaths, 1997. Motor Vehicle 46%, Falls 13%, Poisoning 11%, Drowning 4%, Fires 4%, Other Causes 22%.]
Lesson 1-1, Displaying Distributions with Graphs: Dot Plots and Stem-and-Leaf Plots
Overall Pattern of a Distribution (Quantitative Variables): Center: the value that divides the data in half. Spread: the range from the smallest to the largest value. Shape: the symmetry or skewness of the data. Outliers: values that fall outside the overall pattern.
Example Page 16, #1.8 Are you driving a gas guzzler? Table 1.3 displays the highway gas mileage for 32 model year 2000 midsize cars. A) Make a dot plot of these data.
Example Page 16, #1.8 [Dot plot: Highway Gas Mileage, horizontal axis from 21 to 33 mpg.]
Example Page 16, #1.8 B) Describe the shape, center, and spread of the distribution of gas mileages. Are there any potential outliers? The distribution is skewed to the left, with a major peak at 28 mpg and a minor peak at 24 mpg. The spread is relatively narrow (21 to 32 mpg). The two observations at 21 mpg and the observation at 32 mpg appear to be outliers. The center is 28 mpg.
Example Page 35, #1.28 In 1798 the English scientist Henry Cavendish measured the density of the earth by careful work with a torsion balance. The variable recorded was the density of the earth as a multiple of the density of water. Here are Cavendish's 29 measurements: 5.50 5.61 4.88 5.07 5.26 5.55 5.36 5.29 5.58 5.65 5.57 5.53 5.62 5.29 5.44 5.34 5.79 5.10 5.27 5.39 5.42 5.47 5.63 5.34 5.46 5.30 5.75 5.68 5.85
Example Page 35, #1.28 Present these 29 measurements graphically in a stemplot. Discuss the shape, center, and spread of the distribution. Are there any outliers? What is your estimate of the density of the earth based on these measurements?
Example Page 35, #1.28 Density of the Earth
48 | 8
49 |
50 | 7
51 | 0
52 | 6 7 9 9
53 | 0 4 4 6 9
54 | 2 4 6 7
55 | 0 3 5 7 8
56 | 1 2 3 5 8
57 | 5 9
58 | 5
Key: 48 | 8 = 4.88
The distribution is roughly symmetric, with one possible outlier at 4.88 that is somewhat low. The measurements range from 4.88 to 5.85. The center of the distribution is between 5.4 and 5.5. Based on the plot, we would estimate the Earth's density to be about halfway between 5.4 and 5.5, that is, about 5.45 times the density of water.
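Building a stemplot is mechanical once the leaf unit is chosen. The sketch below is one way to do it in Python; it assumes nonnegative values recorded to the leaf unit (here 0.01), and `stemplot` is a hypothetical helper written for this illustration, not a library function:

```python
import statistics
from collections import defaultdict

# Cavendish's 29 measurements of the earth's density (multiples of water's density).
density = [5.50, 5.61, 4.88, 5.07, 5.26, 5.55, 5.36, 5.29, 5.58, 5.65,
           5.57, 5.53, 5.62, 5.29, 5.44, 5.34, 5.79, 5.10, 5.27, 5.39,
           5.42, 5.47, 5.63, 5.34, 5.46, 5.30, 5.75, 5.68, 5.85]

def stemplot(values, leaf_unit=0.01):
    """Build stemplot rows. Assumes nonnegative values recorded to the leaf unit:
    each value becomes a whole number of leaf units, its last digit is the leaf
    and the remaining digits are the stem."""
    scaled = sorted(round(v / leaf_unit) for v in values)
    leaves = defaultdict(list)
    for s in scaled:
        leaves[s // 10].append(s % 10)
    first, last = scaled[0] // 10, scaled[-1] // 10
    # Include empty stems (such as 49) so gaps in the data stay visible.
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves[stem])}".rstrip()
            for stem in range(first, last + 1)]

for row in stemplot(density):
    print(row)
print("Key: 48 | 8 = 4.88")
print("Median:", statistics.median(density))
```

The median printed at the end (the 15th of the 29 ordered values) supports the estimate of a center between 5.4 and 5.5.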
Lesson 1-1, Displaying Distributions with Graphs: Histograms and Relative Frequency Graphs
Frequency (Count) Histograms and the Number of Categories [Two histograms of data on n = 92 Spring 1998 Stat 250 students: "Age of Spring 1998 Stat 250 Students" (age in years), which uses too few categories, and "GPAs of Spring 1998 Stat 250 Students" (GPA), which uses too many categories.]
Example Histogram Suppose you are considering investing in a Roth IRA. You collect the following data, which represent the three-year rate of return (in percent) for 40 small-capitalization growth mutual funds. 27.4 12.7 22.6 32.1 18.2 23.7 18.4 14.7 16.7 28.5 29.6 47.7 32.0 14.7 21.3 37.0 10.8 22.2 11.6 10.9 25.5 12.8 27.0 19.2 24.1 18.4 45.9 18.4 23.7 31.1 19.6 18.5 35.9 17.4 16.6 23.3 38.1 21.9 18.5 29.1
Example Histogram A) Construct a histogram to display these data. Record your class intervals and counts. Step 1: Find the class intervals. Locate the smallest value (10.8) and the largest value (47.7). Start the first class at a lower class limit of 10.0 and use a class width of 5.
Example Histogram
3-yr Rate of Return (%) | Frequency
10.0–14.9 | 7
15.0–19.9 | 11
20.0–24.9 | 8
25.0–29.9 | 6
30.0–34.9 | 3
35.0–39.9 | 3
40.0–44.9 | 0
45.0–49.9 | 2
Total | 40
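The class counts can be checked by assigning each return to its class. A minimal sketch, assuming classes of width 5 starting at 10.0 as in Step 1 (the constant names are illustrative):

```python
# Three-year rates of return (%) for the 40 small-cap growth funds.
returns = [27.4, 12.7, 22.6, 32.1, 18.2, 23.7, 18.4, 14.7, 16.7, 28.5,
           29.6, 47.7, 32.0, 14.7, 21.3, 37.0, 10.8, 22.2, 11.6, 10.9,
           25.5, 12.8, 27.0, 19.2, 24.1, 18.4, 45.9, 18.4, 23.7, 31.1,
           19.6, 18.5, 35.9, 17.4, 16.6, 23.3, 38.1, 21.9, 18.5, 29.1]

START, WIDTH, N_CLASSES = 10.0, 5.0, 8  # classes 10.0-14.9, 15.0-19.9, ..., 45.0-49.9

counts = [0] * N_CLASSES
for r in returns:
    # Index of the class whose interval [lower, lower + WIDTH) contains r.
    counts[int((r - START) // WIDTH)] += 1

for k, c in enumerate(counts):
    lower = START + k * WIDTH
    print(f"{lower:.1f}-{lower + WIDTH - 0.1:.1f}: {c}")
print("Total:", sum(counts))
```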
Example Histogram Step 2: Graph it on the TI calculator. Set up the plot with STAT PLOT (2nd Y=), then adjust the WINDOW.
Example Histogram Press GRAPH, then TRACE to check each class and its count.
Example Histogram [Histogram: 3 Year Rate of Return of Mutual Funds. Horizontal axis: rate of return (%), from 10 to 50 in steps of 5; vertical axis: frequency.]
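Example Histogram B) Describe the distribution of the 3-year rate of return. The distribution is skewed to the right, with its peak in the 15.0%–19.9% class. The center (median) falls in the 20.0%–24.9% class, since the 20th and 21st ordered values lie there. The two funds in the 45.0%–49.9% class (45.9% and 47.7%) appear to be outliers. The returns range from about 10% to 50%.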
Shape of a Distribution: uniform (symmetric), bell-shaped (symmetric), skewed right, skewed left
Uniform Distribution
Symmetric, Bell-Shaped
Skewed Right
Skewed Left
Example Relative Cumulative Frequency Return to the Roth IRA data: the three-year rates of return (in percent) for the same 40 small-capitalization growth mutual funds used in the histogram example.
Example Relative Cumulative Frequency
Class | Freq | Rel Freq | Cum Freq | Rel Cum Freq
10.0–14.9 | 7 | 7/40 = 0.175 | 7 | 0.175
15.0–19.9 | 11 | 0.275 | 18 | 0.45
20.0–24.9 | 8 | 0.20 | 26 | 0.65
25.0–29.9 | 6 | 0.15 | 32 | 0.80
30.0–34.9 | 3 | 0.075 | 35 | 0.875
35.0–39.9 | 3 | 0.075 | 38 | 0.95
40.0–44.9 | 0 | 0 | 38 | 0.95
45.0–49.9 | 2 | 0.05 | 40 | 1
Total | 40 | 1
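Each column of the table follows from the frequency column: divide by n for relative frequency, keep a running total for cumulative frequency, and divide that running total by n for relative cumulative frequency. A short sketch, assuming the class frequencies from the histogram example:

```python
from itertools import accumulate

# Class frequencies from the histogram example (class width 5, starting at 10.0).
upper_limits = [14.9, 19.9, 24.9, 29.9, 34.9, 39.9, 44.9, 49.9]
freq = [7, 11, 8, 6, 3, 3, 0, 2]
n = sum(freq)

rel_freq = [f / n for f in freq]          # share of all funds in each class
cum_freq = list(accumulate(freq))         # funds at or below each upper limit
rel_cum_freq = [c / n for c in cum_freq]  # the ogive's y-values

for upper, f, rf, cf, rcf in zip(upper_limits, freq, rel_freq, cum_freq, rel_cum_freq):
    print(f"<= {upper:4.1f}: freq {f:2d}  rel {rf:.3f}  cum {cf:2d}  rel cum {rcf:.3f}")
```

Plotting `rel_cum_freq` against `upper_limits` gives the relative cumulative frequency graph (ogive).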
Example Relative Cumulative Frequency
Class | Freq | Rel Freq | Cum Freq | Rel Cum Freq
20.0–24.9 | 8 | 0.2 | 26 | 0.65
45.0–49.9 | 2 | 0.05 | 40 | 1
26 of the 40 mutual funds had a 3-year rate of return of 24.9% or less, so 65% of the mutual funds had a 3-year rate of return of 24.9% or less. A mutual fund with a 3-year rate of return of 45% or higher is outperforming 95% of its peers.
Example Relative Cumulative Frequency Enter the upper class limits in list L3 and the relative cumulative frequencies in list L4.
Example Relative Cumulative Frequency [Ogive: 3 Year Rate of Return for Small Capitalization Mutual Funds. Horizontal axis: rate of return, marked at 10 and the upper class limits 14.9, 19.9, 24.9, 29.9, 34.9, 39.9, 44.9, 49.9; vertical axis: cumulative relative frequency, from 0 to 1.]
Lesson 1-2, Describing Distributions with Numbers: Measuring the Center
Mean: To find the sample mean, add up all of the observations and divide by the number of observations: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum x_i$. The mean is affected by unusual values called outliers.
Median: The median is the midpoint of a distribution, such that half the observations are smaller and the other half are larger. It is also called the 50th percentile. The median is resistant: it is not strongly affected by outliers.
Center and Distribution: Mean < median usually indicates a skewed-left distribution; mean = median, a symmetric distribution; mean > median, a skewed-right distribution.
Measuring the Spread: range, quartiles, boxplots, standard deviation, variance
Range: The range is the difference between the largest and smallest observations: $R = x_{\max} - x_{\min}$.
Quartiles: The quartiles divide the ordered observations into four equal parts: 25% of the data lie between the smallest data value and Q1, 25% between Q1 and Q2 (the median), 25% between Q2 and Q3, and 25% between Q3 and the largest data value.
Interquartile Range (IQR): The interquartile range is the distance between the first and third quartiles: $\text{IQR} = Q_3 - Q_1$.
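Outliers: Upper cutoff $= Q_3 + 1.5(\text{IQR})$; lower cutoff $= Q_1 - 1.5(\text{IQR})$. An observation above the upper cutoff or below the lower cutoff is flagged as a suspected outlier.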
Five-Number Summary: smallest observation (minimum), Q1, Q2 (median), Q3, largest observation (maximum)
Example Page 41, #1.32 The Survey of Study Habits and Attitudes (SSHA) is a psychological test that evaluates college students' motivation, study habits, and attitudes toward school. A private college gives the SSHA to a sample of 18 of its incoming first-year women students. Their scores are 154 109 137 115 152 140 154 178 101 103 126 126 137 165 165 129 200 148
Example Page 41, #1.32 A) Make a stemplot of these data. The overall shape of the distribution is irregular, as often happens when only a few observations are available. Are there any potential outliers? About where is the center of the distribution (the score with half the scores above it and half below)? What is the spread of the scores (ignoring any outliers)? (On the TI-83, enter the data with STAT, EDIT, 1:Edit.)
Example Page 41, #1.32
10 | 1 3 9
11 | 5
12 | 6 6 9
13 | 7 7
14 | 0 8
15 | 2 4 4
16 | 5 5
17 | 8
18 |
19 |
20 | 0
Key: 10 | 1 = 101
200 is a potential outlier. The center is approximately 140. The spread (excluding 200) is 178 − 101 = 77.
Example Page 41, #1.32 B) Find the mean. $\bar{x} = 2539/18 \approx 141.06$. C) Find the median of these scores. Which is larger: the median or the mean? Explain why. The median is 138.5. The mean is larger than the median because the outlier at 200 pulls the mean toward the long right tail of the distribution.
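One way to see why the median is called resistant is to drop the outlier and watch both measures of center change. A minimal sketch using Python's standard `statistics` module on the women's SSHA scores (dropping 200 is only an illustration, not part of the exercise):

```python
import statistics

# SSHA scores for the 18 first-year women.
women = [154, 109, 137, 115, 152, 140, 154, 178, 101,
         103, 126, 126, 137, 165, 165, 129, 200, 148]

mean = statistics.mean(women)
median = statistics.median(women)
print(f"mean = {mean:.2f}, median = {median}")

# Remove the high outlier (200) and see which measure of center moves more.
without_outlier = [x for x in women if x != 200]
mean_drop = mean - statistics.mean(without_outlier)
median_drop = median - statistics.median(without_outlier)
print(f"mean drops by {mean_drop:.2f}, median drops by {median_drop}")
```

Removing the single outlier lowers the mean by about 3.5 points but the median by only 1.5.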
Example Page 47, #1.36 Here are the scores on the Survey of Study Habits and Attitudes (SSHA) for 18 first-year college women: 154 109 137 115 152 140 154 178 101 103 126 126 137 165 165 129 200 148 and for 20 first-year college men: 108 140 114 91 180 115 126 92 169 146 109 132 75 88 113 151 70 115 187 104 A) Make side-by-side boxplots to compare the distributions.
Example Page 47, #1.36 [Side-by-side boxplots: SSHA scores for women and men, on a scale from 0 to 200.]
Example Page 47, #1.36 B) Compute the numerical summaries for these two distributions.
Group | Mean | Min | Q1 | Median | Q3 | Max
Women | 141.06 | 101 | 126 | 138.5 | 154 | 200
Men | 121.25 | 70 | 98 | 114.5 | 143 | 187
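The summaries above, and the 1.5 × IQR outlier rule, can be reproduced with a short sketch. It assumes the quartile convention the examples appear to use (Q1 and Q3 are the medians of the lower and upper halves, leaving out the overall median when n is odd); many software packages use other quartile rules and can report slightly different values. The helper names are hypothetical:

```python
import statistics

women = [154, 109, 137, 115, 152, 140, 154, 178, 101,
         103, 126, 126, 137, 165, 165, 129, 200, 148]
men = [108, 140, 114, 91, 180, 115, 126, 92, 169, 146,
       109, 132, 75, 88, 113, 151, 70, 115, 187, 104]

def five_number_summary(data):
    """Min, Q1, median, Q3, max. Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data; when n is odd the overall median is left
    out of both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + len(xs) % 2:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

def suspected_outliers(data):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

for name, scores in (("Women", women), ("Men", men)):
    summary = five_number_summary(scores)
    print(name, f"mean={statistics.mean(scores):.2f}", summary,
          "IQR =", summary[3] - summary[1], "outliers:", suspected_outliers(scores))
```

For the women, the upper cutoff is 154 + 1.5(28) = 196, so the score of 200 is flagged; no man's score is flagged.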
Example Page 47, #1.36 C) Write a paragraph comparing SSHA scores for men and women. All the displays and descriptions reveal that women generally score higher than men. The men's scores (IQR = 143 − 98 = 45) are more spread out than the women's (IQR = 154 − 126 = 28), even if we don't ignore the women's outlier at 200. The shapes of the distributions are reasonably similar, with each displaying right skewness.
Describing Distributions with Numbers: Standard Deviation and Variance
Standard Deviation: The standard deviation (s) measures, roughly, the average distance of the observations from their mean.
Example, Page 52, #1.40 The levels of various substances in the blood influence our health. Here are measurements of the level of phosphate in the blood of a patient, in milligrams of phosphate per deciliter of blood, made on 6 consecutive visits to a clinic. 5.6 5.2 4.6 4.9 5.7 6.4
Example, Page 52, #1.40 5.6 5.2 4.6 4.9 5.7 6.4 A) Find the mean. $\bar{x} = \frac{5.6 + 5.2 + 4.6 + 4.9 + 5.7 + 6.4}{6} = \frac{32.4}{6} = 5.4$
Example, Page 52, #1.40 Deviations from the mean:
Observation $x_i$ | Deviation $x_i - \bar{x}$
5.6 | 5.6 − 5.4 = 0.2
5.2 | 5.2 − 5.4 = −0.2
4.6 | 4.6 − 5.4 = −0.8
4.9 | 4.9 − 5.4 = −0.5
5.7 | 5.7 − 5.4 = 0.3
6.4 | 6.4 − 5.4 = 1.0
Sum | 0
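Example, Page 52, #1.40 [Number line from 4.5 to 6.5 marking $\bar{x} = 5.4$, the observation $x = 4.6$ at a distance of 0.8 below the mean, and the observation $x = 6.4$ at a distance of 1 above the mean.]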
Example, Page 52, #1.40
Observation $x_i$ | Deviation $x_i - \bar{x}$ | Squared deviation $(x_i - \bar{x})^2$
5.6 | 0.2 | 0.04
5.2 | −0.2 | 0.04
4.6 | −0.8 | 0.64
4.9 | −0.5 | 0.25
5.7 | 0.3 | 0.09
6.4 | 1.0 | 1.00
Sum | 0 | 2.06
Example Page 52, #1.40 B) Find the standard deviation (s) from its definition. $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2 = \frac{2.06}{6-1} = \frac{2.06}{5} = 0.412$, so $s = \sqrt{s^2} = \sqrt{0.412} \approx 0.64187 \approx 0.6419$.
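The definition translates directly into code: compute the deviations, square them, divide their sum by n − 1, and take the square root. A sketch of that calculation on the phosphate data, cross-checked against `statistics.stdev` from Python's standard library:

```python
import math
import statistics

phosphate = [5.6, 5.2, 4.6, 4.9, 5.7, 6.4]  # mg/dl, 6 clinic visits
n = len(phosphate)

mean = sum(phosphate) / n
deviations = [x - mean for x in phosphate]      # these sum to 0 (up to rounding)
squared = [d ** 2 for d in deviations]
variance = sum(squared) / (n - 1)               # divide by n - 1, not n
s = math.sqrt(variance)

print(f"mean = {mean:.1f}, sum of squared deviations = {sum(squared):.2f}")
print(f"s^2 = {variance:.3f}, s = {s:.4f}")
print(f"statistics.stdev agrees: {statistics.stdev(phosphate):.4f}")
```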
Example Page 52, #1.40 C) Use your TI-83 to find $\bar{x}$ and s. Do the results agree with part B? (Press STAT, then choose 1-Var Stats from the CALC menu.)
Standard Deviation: The standard deviation (s) is the square root of the variance (s²). It is measured in the original units. It measures spread about the mean and should be used only when the mean is chosen as the measure of center. s = 0 only when there is no spread, that is, when all observations have the same value. As s gets larger, the observations are more spread out. s is highly affected by outliers and works best for roughly symmetric data.
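Variance: The variance (s²) measures, roughly, the average squared deviation of the observations from the mean (the sum of squared deviations is divided by n − 1). Its units are the squares of the original units. It is highly affected by outliers.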
How to Choose? For a skewed distribution or one with outliers, use the five-number summary. For a roughly symmetric distribution with no outliers, use the mean and standard deviation.
Homework HW, page 52, #1.41, 1.43 Read pages 53–61
Linear Transformation: A linear transformation changes the original variable $x$ into the new variable $x_{\text{new}}$ given by an equation of the form $x_{\text{new}} = a + bx$. Adding the constant a shifts all values of x upward or downward by the same amount. Multiplying by the positive constant b changes the size of the unit of measurement.
Example Page 56, #1.44 Maria measures the lengths of 5 cockroaches that she finds at school. Here are her results, in inches: 1.4 2.2 1.1 1.6 1.2 A) Find the mean and standard deviation.
Example Page 56, #1.44 A) $\bar{x} = 7.5/5 = 1.5$ inches and $s \approx 0.436$ inches.
Example Page 56, #1.44 B) Maria's science teacher is furious to discover that she has measured the cockroaches' lengths in inches rather than centimeters. (There are 2.54 cm in 1 inch.) Find the mean and standard deviation of the 5 cockroaches in centimeters. With $\bar{x} = 1.5$ in and $s \approx 0.436$ in, the mean is $1.5(2.54) = 3.81$ cm and the standard deviation is $0.436(2.54) \approx 1.107$ cm.
Example Page 56, #1.44 C) Considering the 5 cockroaches that Maria found as a small sample from the population of all cockroaches at her school, what would you estimate as the average length of the population of cockroaches? How sure of your estimate are you? The average cockroach length can be estimated by the mean length of the 5 sampled cockroaches, 1.5 inches. This is a questionable estimate, because the sample is so small.
Example Page 63, #1.56 A change of units that multiplies each observation by b, such as the change $x_{\text{new}} = 0 + 2.54x$ from inches $x$ to centimeters $x_{\text{new}}$, multiplies our usual measures of spread by b. This is true of the IQR and the standard deviation. What happens to the variance when we change units this way? The variance is multiplied by $b^2$, here $2.54^2 = 6.4516$.
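A quick numerical check of these rules on Maria's data, assuming the inches-to-centimeters transformation with a = 0 and b = 2.54: the mean and standard deviation are multiplied by b, while the variance is multiplied by b².

```python
import statistics

inches = [1.4, 2.2, 1.1, 1.6, 1.2]   # Maria's cockroach lengths
a, b = 0, 2.54                       # x_new = a + b * x converts inches to cm
cm = [a + b * x for x in inches]

for label, data in (("inches", inches), ("cm", cm)):
    print(f"{label}: mean = {statistics.mean(data):.3f}, "
          f"s = {statistics.stdev(data):.3f}, "
          f"variance = {statistics.variance(data):.4f}")

# Mean and s are multiplied by b; the variance is multiplied by b**2.
print(statistics.variance(cm) / statistics.variance(inches))
```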
Homework HW, Page 56, #1.45 HW, Page 63, #1.55
Lesson 1-2, Describing Distributions with Numbers: Comparing Distributions
Example Page 59, #1.48 The table below gives the distribution of grades earned by students taking the Calculus AB and Statistics exams in 2000.
Grade | Calculus AB | Statistics
5 | 16.8% | 9.8%
4 | 23.2% | 21.5%
3 | 23.5% | 22.4%
2 | 19.6% | 20.5%
1 | 16.8% | 25.8%
A) Make a graphical display to compare the AP exam grades for Calculus AB and Statistics.
Example Page 59, #1.48 [Side-by-side bar graph: 2000 AP Exam. Horizontal axis: grade on exam (1 to 5); vertical axis: % of students earning grade (0 to 30); bars for Calculus AB and Statistics.]
Example Page 59, #1.48 B) Write a few sentences comparing the two distributions of exam grades. Do you now know which exam is easier? Why or why not? The distributions are very similar for grades 2, 3, and 4. The major differences occur for grades 1 and 5: a larger proportion of Statistics students received a grade of 1 (25.8% vs. 16.8%), and a smaller proportion received a grade of 5 (9.8% vs. 16.8%). This suggests that the Statistics exam is harder, in the sense that students are more likely to get a poor grade on the Statistics exam than on the Calculus AB exam.
Example Page 63, 1.54 The mean $\bar{x}$ and standard deviation s measure center and spread but are not a complete description of a distribution. Data sets with different shapes can have the same mean and standard deviation. To demonstrate this fact, use your calculator to find $\bar{x}$ and s for the following two small data sets. Then make a stemplot of each and comment on the shape of each distribution. Data A: 9.14 8.14 8.74 8.77 9.26 8.10 6.13 3.10 9.13 7.26 4.74 Data B: 6.58 5.76 7.71 8.84 8.47 7.04 5.25 5.56 7.91 6.89 12.50
Example Page 63, 1.54
Set A
3 | 1
4 | 7
5 |
6 | 1
7 | 2
8 | 1 1 7 7
9 | 1 1 2
Set B
5 | 2 5 7
6 | 5 8
7 | 0 7 9
8 | 4 8
9 |
10 |
11 |
12 | 5
Key for both plots: 3 | 1 = 3.1
Example Page 63, 1.54 The means and standard deviations are essentially the same ($\bar{x} \approx 7.50$ and $s \approx 2.03$ for both sets). Set A is skewed to the left, while Set B has a high outlier at 12.50.
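The calculator step can be mirrored in Python to confirm that the two sets agree on mean and standard deviation while differing in shape. This sketch also prints each set's median, which is an addition for illustration: it falls above the mean for the left-skewed Set A and below the mean for Set B, whose mean is pulled up by the outlier.

```python
import statistics

set_a = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
set_b = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 5.56, 7.91, 6.89, 12.50]

for name, data in (("Set A", set_a), ("Set B", set_b)):
    print(f"{name}: mean = {statistics.mean(data):.2f}, "
          f"s = {statistics.stdev(data):.2f}, "
          f"min = {min(data)}, median = {statistics.median(data)}, max = {max(data)}")
```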
Homework HW, Page 59, #1.47, #1.49 HW, Page 62, #1.51, 1.57