How spread out is the data? Are all the numbers fairly close to General Education Statistics

Similar documents
Chapter. Numerically Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

Algebra 2. Outliers. Measures of Central Tendency (Mean, Median, Mode) Standard Deviation Normal Distribution (Bell Curves)

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved

3.1 Measure of Center

Section 2.3: One Quantitative Variable: Measures of Spread

What are the mean, median, and mode for the data set below? Step 1

Describing distributions with numbers

Using Tables and Graphing Calculators in Math 11

Aim #92: How do we interpret and calculate deviations from the mean? How do we calculate the standard deviation of a data set?

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Lecture 11. Data Description Estimation

Ch. 17. DETERMINATION OF SAMPLE SIZE

1.3: Describing Quantitative Data with Numbers

Background to Statistics

WISE Regression/Correlation Interactive Lab. Introduction to the WISE Correlation/Regression Applet

Chapter 5. Understanding and Comparing. Distributions

Range The range is the simplest of the three measures and is defined now.

Describing distributions with numbers

Chapter 1: Exploring Data

Elementary Statistics

Recall, we solved the system below in a previous section. Here, we learn another method. x + 4y = 14 5x + 3y = 2

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

MAT Mathematics in Today's World

Introduction to Statistics

Unit 1: Statistical Analysis. IB Biology SL

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Chapter 6 The Standard Deviation as a Ruler and the Normal Model

Chapter 5. Piece of Wisdom #2: A statistician drowned crossing a stream with an average depth of 6 inches. (Anonymous)

The empirical ( ) rule

CHAPTER 2: Describing Distributions with Numbers

Experiment 1: The Same or Not The Same?

Intermediate Algebra Summary - Part I

Resistant Measure - A statistic that is not affected very much by extreme observations.

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

CHAPTER 1. Introduction

Reteach 2-3. Graphing Linear Functions. 22 Holt Algebra 2. Name Date Class

1/11/2011. Chapter 4: Variability. Overview

The Celsius temperature scale is based on the freezing point and the boiling point of water. 12 degrees Celsius below zero would be written as

Accumulation with a Quadratic Function

8.1 Frequency Distribution, Frequency Polygon, Histogram page 326

Chapter 2: Tools for Exploring Univariate Data

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population

Statistics for Managers using Microsoft Excel 6 th Edition

MATH 1150 Chapter 2 Notation and Terminology

1. AN INTRODUCTION TO DESCRIPTIVE STATISTICS. No great deed, private or public, has ever been undertaken in a bliss of certainty.

Measures of Dispersion

Sampling Distributions of the Sample Mean Pocket Pennies

Summarising numerical data

Reminder: Univariate Data. Bivariate Data. Example: Puppy Weights. You weigh the pups and get these results: 2.5, 3.5, 3.3, 3.1, 2.6, 3.6, 2.

Objectives. Materials

Measurements of a Table

Mt. Douglas Secondary

Contents. 13. Graphs of Trigonometric Functions 2 Example Example

Describing Data: Numerical Measures GOALS. Why a Numeric Approach? Chapter 3 Dr. Richard Jerz

Chapter 3. Measuring data

UNIT 3 CONCEPT OF DISPERSION

Numerical Measures of Central Tendency

Marquette University Executive MBA Program Statistics Review Class Notes Summer 2018

Connecticut Common Core Algebra 1 Curriculum. Professional Development Materials. Unit 8 Quadratic Functions

Error Analysis in Experimental Physical Science Mini-Version

Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz

Section 1.1: Patterns in Division

Sampling, Frequency Distributions, and Graphs (12.1)

Chapter 4. Displaying and Summarizing. Quantitative Data

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

Chapter 2. Mean and Standard Deviation

Name. The data below are airfares to various cities from Baltimore, MD (including the descriptive statistics).

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

MPM2D - Practice Mastery Test #5

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Let the x-axis have the following intervals:

Using a graphic display calculator

Percentile: Formula: To find the percentile rank of a score, x, out of a set of n scores, where x is included:

Describing Distributions with Numbers

MINI LESSON. Lesson 2a Linear Functions and Applications

1.3.1 Measuring Center: The Mean

Expressions and Formulas 1.1. Please Excuse My Dear Aunt Sally

Lesson 4 Linear Functions and Applications

Electrical Circuits Question Paper 1

STT 315 This lecture is based on Chapter 2 of the textbook.

GRE Quantitative Reasoning Practice Questions

are the objects described by a set of data. They may be people, animals or things.

SESSION 5 Descriptive Statistics

Statistical Concepts. Constructing a Trend Plot

Probability Distributions

Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

After completing this chapter, you should be able to:

Density Curves and the Normal Distributions. Histogram: 10 groups

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Continuous Random Variables

WELCOME!! LABORATORY MATH PERCENT CONCENTRATION. Things to do ASAP: Concepts to deal with:

Exploring Graphs of Polynomial Functions

Section 7.1 Properties of the Normal Distribution

Lesson 3 Average Rate of Change and Linear Functions

14-3. Measures of Variability. OBJECTIVES Find the interquartile range, the semiinterquartile

Continuous random variables

Lecture 1: Descriptive Statistics

Instructor: Doug Ensley Course: MAT Applied Statistics - Ensley

Transcription:

How spread out is the data? Are all the numbers fairly close to General Education Statistics each other or not? So what? Class Notes Measures of Dispersion: Range, Standard Deviation, and Variance (Section 3.2) Once we know the mean of a set of data, we might be interested in knowing how close the actual values are to that mean. Are the values spread out or close together and all gathered around the mean? We have a few ways to describe what we will call dispersion. All of these measures require that the data be quantitative. The simplest (and quickest) is the range. Definition: The range, R, of a variable is the difference between the largest data value and the smallest data value. That is, Range = R = Largest Data Value Smallest Data Value expl 1: Let s try this out. The following data represent the travel times to work (in minutes) for all seven employees of a start-up web development company. Find the range of these numbers. 23, 36, 23, 18, 5, 26, 43 Another very useful measure is the standard deviation. Its definition below is a little daunting but the standard deviation can be thought of as the average distance each value is from the mean. Definition: The population standard deviation of a variable is the square root of the sum of squared deviations about the population mean divided by the number of observations in the population, N. That is, it is the square root of the mean of the squared deviations about the population mean. The population standard deviation is symbolically represented by σ (lowercase Greek sigma). Wow, that s a mouthful. Here is the formula. We take each value and find its distance to the mean. We then square those distances, add them up, and divide by N. That gives us variance which we will see later. Square root it and we get the standard deviation. 1

That is well and good, but you may also see an equivalent (computational) formula for the population standard deviation that is sometimes used. It follows. We square each value and add those up. We separately add all of the values and square that, and then divide by N. Subtract those two results and divide by N again. Finally, square root. Wow! Now, the above standard deviation concerned data gotten from an entire population. However, often we have sample data. Here, we see similar but slightly different formulas for the sample standard deviation. Definition: The sample standard deviation, s, of a variable is the square root of the sum of squared deviations about the sample mean divided by n 1, where n is the sample size. We do almost the same as if it was population data, but we divide by one less than the sample size. The computational formula follows. Notice how we use s to denote the standard deviation for a sample and σ for that of a population. Is it resistant?: Since the range is found by subtracting the minimum value from the maximum value, it is affected by extreme values. So we say the range is not resistant. On the other hand, the standard deviation uses all of the values in its calculation. However, it is still affected by extreme values, so it is not considered resistant either. 2

expl 2: Complete the table to find the standard deviation of this sample data set. Data Set 12 16 18 20 24 26 The mean is 19.3333. Notice how this squared difference gets larger as the value gets farther from the mean. What is n 1? total = = = = = 1 = The formula is = 1 Doing this on the calculator is the same as the process we used for finding the mean and median. Enter the data values in column L1 in the STAT editor. We do this by pressing the STAT button and then selecting EDIT > 1: Edit from the menu. If necessary, clear out any data in L1 by arrowing up to the column heading and pressing CLEAR. When you arrow back down, any data should be gone. Enter the values of the salaries in L1, pressing ENTER after each one. Then press the STAT button again. But this time, arrow over to select CALC > 1: 1-Var Stats. That will put this expression on the home screen. Press ENTER and the calculator will fill with many statistics. (Some newer calculators will have an intermediate screen, where you need to select L1 for List and clear out any entry in the FreqList: row. Arrow down and select Calculate.) Look for Sx and record it here. (You will also see σx which is the population standard deviation. Again, the calculator does not know if the data is from a population or sample. You must decide which standard deviation to record.) 3

Definition: The variance of a variable is the square of the standard deviation. The population variance is σ 2 and the sample variance is s 2. Notice the table in the last example has us calculate the variance on the way to the standard deviation. The units of the variance are squared units (for instance, if the variable is in feet, the variance would be in square feet). That makes interpreting the variance a little more challenging. We will see the variance later in inferential statistics. Definition: We call n 1 the degrees of freedom because the first n 1 observations have freedom to be whatever value they wish, but the n th value has no freedom. It must be whatever value forces the sum of the deviations about the mean to equal zero. This does not play a large part now but we may see degrees of freedom later in inferential statistics. Worksheet: Comparison of two data sets with same mean: We will investigate two data sets with the same mean. One data set is more spread out than the other. How do you think their standard deviations will be related? The Empirical Rule: If a distribution is roughly bell shaped, then Approximately 68% of the data will lie within 1 standard deviation of the mean. That is, approximately 68% of the data lie between μ 1σ and μ + 1σ. Approximately 95% of the data will lie within 2 standard deviations of the mean. That is, approximately 95% of the data lie between μ 2σ and μ + 2σ. Approximately 99.7% of the data will lie within 3 standard deviations of the mean. That is, approximately 99.7% of the data lie between μ 3σ and μ + 3σ. Note: We also use the Empirical Rule based on sample data with and s used in place of µ and σ. 4

Here is a picture that illustrates the Empirical Rule. Notice how µ is positioned in the exact middle, showing the center of the data. Let s see this in action to get a better understanding. expl 3: The following data represent the serum HDL cholesterol of the 54 female patients of a family doctor. 41 48 43 38 35 37 44 44 44 62 75 77 58 82 39 85 55 54 67 69 69 70 65 72 74 74 74 60 60 60 61 62 63 64 64 64 54 54 55 56 56 56 57 58 59 45 47 47 48 48 50 52 52 53 a) Compute the population mean and standard deviation. Use a calculator. b) Draw a histogram to verify the data is bell-shaped. c.) Draw a quick sketch of a bell-shaped curve, labeling the mean in the middle. Then mark the various standard deviations (plus or minus 1, 2, and 3) from the mean using a reasonable scale. You must calculate these values and label them on the horizontal axis. Go on to the next page. 5

We complete example 3 here. a) Compute the population mean and standard deviation. Use a calculator. If you want, you can verify this information. The calculator tells us that µ = 57.4 and σ = 11.7. b) Draw a histogram to verify the data is bell-shaped. Parts a and b are done for us. Does the histogram look roughly bellshaped? c.) Draw a quick sketch of a bell-shaped curve for this data, labeling the mean (µ = 57.4) in the middle. Then mark the various standard deviations (plus or minus 1, 2, and 3) from the mean using a reasonable scale. Calculate these values and label them on the horizontal axis. We will use this graph to answer several questions. 6

expl 3 additional questions: d.) What is the percentage of all patients that have serum HDL within 1, 2, and 3 standard deviations of the mean according to the Empirical Rule? within 1 standard deviation from mean: within 2 standard deviations from mean: within 3 standard deviations from mean: e.) Determine the percentage of all patients that have serum HDL between 22.3 and 92.5 according to the Empirical Rule. Refer to the gaph you produced on the previous page and your answer to part d. f.) Determine the percentage of all patients that have serum HDL between 45.7 and 69.1 according to the Empirical Rule. Refer to the gaph you produced on the previous page and your answer to part d. g.) Determine the actual percentage of patients that have serum HDL between 45.7 and 69.1. Do this by looking at the actual data. Compare this to the answer in part f. h.) Determine the percentage of all patients that have serum HDL between 34 and 69.1 according to the Empirical Rule. Refer to the gaph you produced on the previous page and your answer to part d. You will also use the fact that this bell-shaped graph is symmetrical. 7