(c) Epstein, Carter and Bollinger 2016. Chapter 5: Exploring Data: Distributions

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS

5.1 Creating Histograms

Individuals are the objects described by a set of data. They may be people, animals or things.

A variable is any characteristic of an individual. A variable can take on different values for different individuals. Some variables are numeric and others are not.

Exploratory data analysis is the process of looking at data to describe the main features. Begin by looking at each variable and then the relationships between the variables. Graphs and numerical summaries are useful.

The distribution of a variable tells us what values the variable takes and how often it takes these values (or intervals of values).

A histogram is a graph of the distribution of outcomes for a single numerical variable. The height of each bar is the number of observations in the class of outcomes covered by the base of the bar. All classes should have the same width and each observation must fall into exactly one class (interval).

Display the data below in a histogram:

Value   Count
6       8
7       11
8       9
9       5
10      0
11      2
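As a rough illustration of how the table above turns into a histogram, the sketch below prints one row of asterisks per value, so each bar's length equals its count. The function name `text_histogram` is an illustrative choice, not something from the notes, and a sideways text plot is only a stand-in for a drawn histogram.

```python
# Counts copied from the table above: value -> number of observations.
counts = {6: 8, 7: 11, 8: 9, 9: 5, 10: 0, 11: 2}

def text_histogram(counts):
    """Return one line per value, with one '*' per observation."""
    return [f"{value:>2} | {'*' * n}" for value, n in sorted(counts.items())]

for line in text_histogram(counts):
    print(line)
```

The value 10 still gets its own (empty) row, which mirrors the requirement that every class appears even when its count is zero.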

Histogram Bin Width

56, 63, 65, 66, 70, 71, 74, 75, 75, 76, 76, 79, 80, 80, 81, 82, 82, 85, 87, 90, 90, 90, 91, 92, 93, 93, 94, 96, 97, 98, 98, 103, 104, 104, 105, 109, 110, 115, 127, 132

[Figure: histograms of the data above drawn with 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30 and 40 bins]

Steps to Creating a Histogram:

1. Choose the classes. Divide the range of the data into a reasonable number of classes of equal width.
2. Count the number of individuals in each class (frequency).
3. Draw the histogram. The vertical axis is the count in each class. The horizontal axis represents the classes.

The following data is the recorded daily high temperature (in °F) in College Station for March 2006. Display this data in a histogram.

86 86 85 83 83 82 82 81 81 80 79 77 77 77 76 76 75 74 74 73 72 72 72 69 69 69 67 65 61 58 51

Classes (Size: ____)    Frequency
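Steps 1 and 2 above can be sketched in a few lines of code. The class width of 5 and the starting point of 50 are assumptions chosen for illustration (the notes leave the class size blank), and `class_frequencies` is a hypothetical helper name. Each class includes its left endpoint and excludes its right endpoint, so every observation falls into exactly one class.

```python
temps = [86, 86, 85, 83, 83, 82, 82, 81, 81, 80, 79, 77, 77, 77, 76, 76,
         75, 74, 74, 73, 72, 72, 72, 69, 69, 69, 67, 65, 61, 58, 51]

def class_frequencies(data, start, width):
    """Count whole-number observations in classes [start, start + width), ...

    Assumes start <= min(data) so that every class index is non-negative.
    """
    n_classes = (max(data) - start) // width + 1
    freq = [0] * n_classes
    for x in data:
        freq[(x - start) // width] += 1
    # Each entry is (lower limit, upper limit, frequency).
    return [(start + i * width, start + (i + 1) * width, f)
            for i, f in enumerate(freq)]

for low, high, f in class_frequencies(temps, start=50, width=5):
    print(f"{low}-{high - 1}: {f}")
```

A different width or starting point gives different frequencies, so the resulting histogram is one reasonable display of the data rather than the only one.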

5.2 Interpreting Histograms

In any graph of data, look for patterns and deviations from the patterns. Ask yourself questions such as: Does the graph have one or more peaks? Is the graph symmetric or skewed? Are there outliers? Where is the center? Is most of the data spread out or close together?

A graph is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

A graph is skewed to the right if the longer tail is on the right side. This is also called positively skewed.

A graph is skewed to the left if the longer tail is on the left side. This is also called negatively skewed.

An outlier is an individual data value that falls outside the overall pattern.

Comment on the shapes of the histograms below:

[Figure: histograms]

5.3 Creating Stemplots

A stemplot is a display of the distribution of a variable that attaches the final digits of the observations as leaves on stems made up of all but the final digit.

To Make A Stemplot:

1. Separate each observation into a stem (consisting of all but the rightmost digit) and a leaf (the rightmost digit).
2. Write the stems in a vertical column with the smallest at the top. Include all the stem values from smallest to largest, even if some are not used. Draw a vertical line at the right of this column.
3. Write each leaf in the row to the right of its stem in increasing order.
4. Provide a key.

Display the following data in a stemplot:

2 5 7 9 11 15 16 18 18 23 23 25 25 28 29 29 34 35 37 39 40 43 44 45 45 57 70
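The four steps above translate fairly directly into code. The sketch below assumes non-negative whole-number data, and the function name `stemplot` is an illustrative choice. Integer division by 10 gives the stem and the remainder gives the leaf.

```python
data = [2, 5, 7, 9, 11, 15, 16, 18, 18, 23, 23, 25, 25, 28, 29, 29,
        34, 35, 37, 39, 40, 43, 44, 45, 45, 57, 70]

def stemplot(data):
    """Return the rows of a stemplot for non-negative whole numbers."""
    stems = {}
    for x in sorted(data):                           # step 3: leaves in increasing order
        stems.setdefault(x // 10, []).append(x % 10)  # step 1: stem and leaf
    rows = []
    for stem in range(min(stems), max(stems) + 1):   # step 2: include unused stems
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}")
    return rows

for row in stemplot(data):
    print(row)
print("Key: 2 | 3 means 23")                         # step 4: provide a key
```

Stem 6 has no leaves but still appears as a row, which keeps the gap between 57 and 70 visible.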

Round the following data to the nearest 10, drop the ending zero and display the result in a stemplot.

118 122 160 161 203 210 216 247 250 266 301 302 304 313 316 321 328 333 334 335 349 393 403 411 605 1111

5.4 Describing Center: Mean and Median

To find the mean, $\bar{x}$, of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$ then

$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

To find the median, M, of a set of observations,

1. **Arrange all the observations in increasing order.**
2. If the number of observations is odd, the median is the observation in the center of the ordered list.
3. If the number of observations is even, the median is the mean of the two center observations in the ordered list.

The mode of a set of observations is the observation that occurs the most frequently. You can have no mode or, if there is a tie, you can have multiple modes.

All of these measurements will have the same units as the data values.

The following are scores on an honors exam from a class of 17 students:

32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100

What is the average score on the honors exam? Are there any outliers? If so, if they are removed, how will that affect the average score?
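The definitions of mean, median and mode from section 5.4 can be checked against the exam scores with a short sketch. The function names are illustrative, and returning an empty list when no value repeats is one possible reading of "no mode". The last line recomputes the mean without the lowest score only to show how sensitive the mean is to a single extreme value.

```python
from collections import Counter

scores = [32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100]

def mean(xs):
    """Add the values and divide by the number of observations."""
    return sum(xs) / len(xs)

def median(xs):
    """Middle value of the ordered list, or the mean of the two middle values."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def modes(xs):
    """All values tied for most frequent; empty if nothing repeats."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top) if top > 1 else []

print(mean(scores), median(scores), modes(scores))
print(mean(scores[1:]))  # mean with the lowest score (32) left out
```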

5.5 Describing Spread: The Quartiles

The range is a measure of spread of a set of observations. It is obtained by subtracting the smallest observation (minimum) from the largest (maximum).

The median cuts the data into two groups: half of the data is below the median and half is above the median.

Quartiles cut the data into four groups. The median is also called the second quartile, Q2.

A fourth of the data is below the first quartile, Q1. Q1 is the median of the data below Q2.

A fourth of the data is above the third quartile, Q3. Q3 is the median of the data above Q2.

The interquartile range (or IQR) is Q3 - Q1.

[Diagram: number line marked, in order, Min, Q1, M = Q2, Q3, Max]

The following are scores on an honors exam from a class of 17 students:

32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100

What is the minimum score? Maximum score? What is the range of scores? What is Q1? What is Q2? What is Q3? What is the IQR?
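A minimal sketch of the quartile definition above, applied to the exam scores. It takes "the data below Q2" to mean the lower half of the ordered list by position, leaving out the middle observation when the count is odd. That reading is an assumption, and many calculators and libraries use slightly different quartile conventions that can give different values. The name `quartiles` is illustrative.

```python
from statistics import median

scores = [32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100]

def quartiles(xs):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    s = sorted(xs)
    half = len(s) // 2
    lower = s[:half]            # observations below the median position
    upper = s[len(s) - half:]   # observations above the median position
    return median(lower), median(s), median(upper)

q1, q2, q3 = quartiles(scores)
print("range:", max(scores) - min(scores))
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", q3 - q1)
```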

The following data is the recorded daily high temperature (in °F) in College Station for March 2006.

86, 86, 85, 83, 83, 82, 82, 81, 81, 80, 79, 77, 77, 77, 76, 76, 75, 74, 74, 73, 72, 72, 72, 69, 69, 69, 67, 65, 61, 58, 51

Find the following for the given data:

Min:
Max:
Range:
Mean:
Median:
Mode:
Q1:
Q2:
Q3:
IQR:

Reminder: The numbers given in these examples were already placed in numerical order. If your data is not given to you in numerical order, you must first put the data in numerical order before calculating the median and the quartiles.

If the mean, median and mode of the salary of all cultural geography graduates from UNC are given, how will these be changed (if at all) knowing that Michael Jordan is amongst these graduates? What effect will this have on the histogram?

5.6 The Five-Number Summary and Boxplots

The five-number summary of a distribution consists of the following:

Minimum
Q1
Median
Q3
Maximum

A boxplot is a graph of the five-number summary.

Display the exam scores and high temperatures (from previous examples) in a boxplot.

Scores:

Temps:

One definition of an outlier is a data value that is either less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR. Are there outliers in the exam grades?
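The outlier rule above can be applied mechanically once the five-number summary is known. The sketch below assumes the same quartile convention as section 5.5 (medians of the lower and upper halves of the ordered data), and the function names are illustrative.

```python
from statistics import median

scores = [32, 71, 72, 77, 77, 83, 84, 85, 87, 89, 90, 92, 95, 96, 98, 99, 100]

def five_number_summary(xs):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(xs)
    half = len(s) // 2
    return s[0], median(s[:half]), median(s), median(s[len(s) - half:]), s[-1]

def outliers(xs):
    """Values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR."""
    _, q1, _, q3, _ = five_number_summary(xs)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in xs if x < low or x > high]

print(five_number_summary(scores))
print(outliers(scores))
```

This is only one definition of an outlier, as the notes say, so a value flagged here still deserves a look at the graph before it is set aside.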

The CDC reports that the number of new AIDS cases per year from 1990 to 2004 in Iowa are:

75, 118, 156, 104, 110, 104, 97, 76, 60, 78, 79, 80, 75, 76, 69

Show this data in a boxplot, with labels. Are there outliers?

The CDC reports that the number of new Lyme disease cases per year from 1990 to 2005 in Iowa are:

16, 22, 33, 8, 17, 16, 19, 8, 27, 24, 34, 54, 65, 72, 49, 88

Show this data in a boxplot, with labels. Are there any outliers?

Six numbers have min = 5, max = 19, mode = 17 and med = 16. Find a possible data set for these results.

5.7 Describing Spread: The Standard Deviation

Data set         Mean   Std. Dev.
3, 3, 3, 3, 3    3      0
2, 2, 3, 4, 4    3      1
0, 0, 5, 5, 5    3      2.7

On average, how far away are the measurements from the mean? The standard deviation answers this question.

The table below gives the average monthly temperature (in °F) of two different cities for four different months. Find the mean temperature and standard deviation for each city, and determine which city's temperature varies the most.

            Jan   Apr   Jul   Oct
San Diego   65    68    76    75
Chicago     29    59    84    64

The formula for standard deviation, s, is:

$s = \sqrt{\text{variance}} = \sqrt{\dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}}$
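The formula for s can be written out step by step in code: find the mean, square each deviation from it, divide the total by n - 1, and take the square root. The sketch below uses the three small data sets and the city temperatures from this section. The name `sample_std` is an illustrative choice.

```python
from math import sqrt

def sample_std(xs):
    """Standard deviation with n - 1 in the denominator, as in the formula for s."""
    n = len(xs)
    x_bar = sum(xs) / n
    variance = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    return sqrt(variance)

for data in ([3, 3, 3, 3, 3], [2, 2, 3, 4, 4], [0, 0, 5, 5, 5]):
    print(data, round(sample_std(data), 1))

san_diego = [65, 68, 76, 75]   # Jan, Apr, Jul, Oct
chicago = [29, 59, 84, 64]
print("San Diego:", sum(san_diego) / 4, sample_std(san_diego))
print("Chicago:", sum(chicago) / 4, sample_std(chicago))
```

The city with the larger standard deviation is the one whose monthly temperatures sit farther from its own mean on average.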

5.8 Normal Distributions

When you have data for a single numerical variable you should

1. Graph the data as a histogram, stemplot or dotplot.
2. Look at the shape, center and spread of the graph.
3. Calculate numerical summary numbers such as the five-number summary or the mean and standard deviation.
4. Is the distribution so regular that it could be described by a smooth curve? If so, is the curve bell shaped?

One hundred fair coins were flipped and the number of heads was counted. This experiment was repeated 114 times and the results are shown in the histogram below.

[Figure: histogram of the number of heads in each of the 114 repetitions]

How many times were 45 or fewer heads observed? What proportion of the time were 45 or fewer heads observed? What proportion of the time were more than 50 heads observed?

Can we approximate this with a bell curve? The data has a mean of 50.28 heads and a standard deviation of 5.1 heads. This generates a bell curve with the shape as shown.

[Figure: histogram with a bell curve overlaid; horizontal axis from 30 to 70 heads, vertical axis from 0 to 12]

Can we use the smooth curve to find the probabilities?

Some general information about bell curves.

1. Referred to as the NORMAL CURVE or NORMAL DIST.
2. The location of the peak depends on where the mean is located. Typically use the symbol μ (mu) for the mean of the distribution.
3. The curve is symmetric about the mean.

[Figure: normal curve, horizontal axis from -3 to 3]

4. The spread is determined by the standard deviation. Typically use the symbol σ (sigma) for the standard deviation of the distribution.

[Figure: normal curve labeled A, horizontal axis from -3 to 3]

To find when you are one standard deviation away from the mean, look for where the curvature changes.

[Figure: normal curves labeled B and C, horizontal axis from -2 to 2]

The normal curve with a mean of 0 and a standard deviation of 1 is known as the standard normal curve.

For all normal curves, the first quartile is located about 0.67 standard deviations below the mean and the third quartile is located about 0.67 standard deviations above the mean.

Where are the first and third quartiles located on
(a) the standard normal curve?
(b) the normal curve with μ = 10, σ = 2?
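The 0.67 figure can be sanity-checked with Python's standard library, whose `statistics.NormalDist` class gives the point on the standard normal curve with 75% of the area to its left. The helper `normal_quartiles` is a hypothetical name, and it uses the rounded 0.67 from the notes rather than the more precise value.

```python
from statistics import NormalDist

def normal_quartiles(mu, sigma, z=0.67):
    """Approximate (Q1, Q3) of a normal curve: mu - 0.67 sigma and mu + 0.67 sigma."""
    return mu - z * sigma, mu + z * sigma

# More precise location of the third quartile on the standard normal curve.
exact_z = NormalDist().inv_cdf(0.75)
print(round(exact_z, 4))

print(normal_quartiles(0, 1))    # (a) the standard normal curve
print(normal_quartiles(10, 2))   # (b) the normal curve with mu = 10, sigma = 2
```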

A certain type of washing machine has a useful life with a mean of 12 years and a standard deviation of 2 years.

(a) Draw a normal curve with the mean and standard deviation located correctly.
(b) If a washing machine lasts 10 years, how many standard deviations below the mean is that washing machine? What if it lasted 15 years?

The standard score (or z-score) of a measurement X is how many standard deviations (σ) the measurement is away from the mean (μ). To calculate the standard score, Z, or to find X with a given Z value, use the formula

$Z = \dfrac{X - \mu}{\sigma}$ or $X = \mu + Z\sigma$

A normal distribution has a mean of 50 and a standard deviation of 8.

(a) Find the z-scores for the following values of X:
X = 42, Z =
X = 38, Z =
X = 60, Z =
(b) If the z-score of a measurement is 1.5, what is the value of X?
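Both directions of the standard-score relationship, Z = (X - μ)/σ and X = μ + Zσ, fit in two one-line functions. The sketch below applies them to the numbers in the examples above. The function names are illustrative.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

def value_from_z(z, mu, sigma):
    """Measurement that sits z standard deviations from the mean."""
    return mu + z * sigma

# Normal distribution with mean 50 and standard deviation 8.
for x in (42, 38, 60):
    print(x, z_score(x, mu=50, sigma=8))
print(value_from_z(1.5, mu=50, sigma=8))

# Washing machines: mean 12 years, standard deviation 2 years.
print(z_score(10, mu=12, sigma=2), z_score(15, mu=12, sigma=2))
```

A negative z-score means the measurement is below the mean and a positive one means it is above.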

5.9 The 68-95-99.7 Rule

In any normal distribution,

About 68% of the data is within one standard deviation of the mean.
About 95% of the data is within 2 standard deviations of the mean.
About 99.7% of the data is within 3 standard deviations of the mean.

The length of tape on a roll of a certain type of masking tape is normally distributed with a mean of 25 meters and a standard deviation of 50 centimeters.

(a) What is the range of lengths of most (99.7%) of the rolls?
(b) What percent of the rolls are longer than 26 meters?
(c) What lengths bracket the middle 50% of the rolls of tape?
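One way to work with the rule is to store the three percentages and use symmetry to split the leftover area equally between the two tails. The sketch below applies that idea to the masking tape example, after converting the standard deviation to meters (50 cm = 0.5 m). The names `RULE`, `within_k_sd` and `percent_above` are illustrative, and the middle 50% uses the approximate 0.67 quartile multiplier from section 5.8.

```python
# Approximate percent of data within k standard deviations of the mean.
RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def within_k_sd(mu, sigma, k):
    """Interval from k standard deviations below the mean to k above."""
    return mu - k * sigma, mu + k * sigma

def percent_above(k):
    """Approximate percent more than k standard deviations above the mean."""
    return (100 - RULE[k]) / 2   # the two tails are equal by symmetry

mu, sigma = 25, 0.5              # meters

print(within_k_sd(mu, sigma, 3))     # (a) range covering about 99.7% of rolls
print(percent_above(2))              # (b) 26 m is 2 standard deviations above the mean
print(within_k_sd(mu, sigma, 0.67))  # (c) quartiles bracket the middle 50%
```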

A class of 1000 students will be graded based on the normal curve with

A grade of A assigned to students who are more than two standard deviations above the mean.
A grade of B assigned to students who are between 1 and 2 standard deviations above the mean.
A grade of C assigned to the students within one standard deviation of the mean.
A grade of D assigned to students between 1 and 2 standard deviations below the mean.
A grade of F assigned to students more than two standard deviations below the mean.

How many students receive each grade?

If the mean grade in the class is 76, with a standard deviation of 12, what are the grade cut-offs?
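The grading scheme above is a direct application of the 68-95-99.7 rule, so the counts and cut-offs can be sketched as follows. The approach assumes the rule's rounded percentages (68% and 95%) and splits each leftover band equally between the two sides of the mean. The function names are illustrative.

```python
def grade_counts(n):
    """Approximate number of students per grade under the 68-95-99.7 rule."""
    within_1, within_2 = 68, 95                   # percents from the rule
    c = n * within_1 / 100                        # within 1 sd of the mean
    b = d = n * (within_2 - within_1) / 200       # between 1 and 2 sd, each side
    a = f = n * (100 - within_2) / 200            # beyond 2 sd, each side
    return {"A": a, "B": b, "C": c, "D": d, "F": f}

def grade_cutoffs(mu, sigma):
    """Scores that separate neighboring grades."""
    return {"A/B": mu + 2 * sigma, "B/C": mu + sigma,
            "C/D": mu - sigma, "D/F": mu - 2 * sigma}

print(grade_counts(1000))
print(grade_cutoffs(76, 12))
```

The five counts add up to the class size, which is a quick check that no band was counted twice.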