Determining the Spread of a Distribution

Similar documents
Determining the Spread of a Distribution

Determining the Spread of a Distribution Variance & Standard Deviation

Chapter 2: Tools for Exploring Univariate Data

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

TOPIC: Descriptive Statistics Single Variable

AP Final Review II Exploring Data (20% 30%)

Chapter 4. Displaying and Summarizing. Quantitative Data

Estimation and Confidence Intervals

are the objects described by a set of data. They may be people, animals or things.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

Describing distributions with numbers

A is one of the categories into which qualitative data can be classified.

Elementary Statistics

Introduction to Statistics

STAT 200 Chapter 1 Looking at Data - Distributions

2011 Pearson Education, Inc

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Section 3. Measures of Variation

CIVL 7012/8012. Collection and Analysis of Information

Lecture 2 and Lecture 3

Test 1 Review. Review. Cathy Poliak, Ph.D. Office in Fleming 11c (Department Reveiw of Mathematics University of Houston Exam 1)

Describing distributions with numbers

MATH 1150 Chapter 2 Notation and Terminology

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Chapter 3. Data Description

Chapter 3 Data Description

Unit 2. Describing Data: Numerical

Chapter 1 - Lecture 3 Measures of Location

Statistics I Chapter 2: Univariate data analysis

Example 2. Given the data below, complete the chart:

Lecture 11. Data Description Estimation

Percentile: Formula: To find the percentile rank of a score, x, out of a set of n scores, where x is included:

Statistics I Chapter 2: Univariate data analysis

Math Sec 4 CST Topic 7. Statistics. i.e: Add up all values and divide by the total number of values.

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Exercises from Chapter 3, Section 1

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

CHAPTER 1. Introduction

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

P8130: Biostatistical Methods I

MATH 117 Statistical Methods for Management I Chapter Three

Section 3.2 Measures of Central Tendency

Sampling, Frequency Distributions, and Graphs (12.1)

Chapter 3. Measuring data

CHAPTER 2 Modeling Distributions of Data

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Lecture Slides. Elementary Statistics Eleventh Edition. by Mario F. Triola. and the Triola Statistics Series 3.1-1

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

Descriptive Univariate Statistics and Bivariate Correlation

Descriptive Statistics-I. Dr Mahmoud Alhussami

Review for Exam #1. Chapter 1. The Nature of Data. Definitions. Population. Sample. Quantitative data. Qualitative (attribute) data

Chapter 5: Exploring Data: Distributions Lesson Plan

Math 7 /Unit 5 Practice Test: Statistics

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

Preliminary Statistics course. Lecture 1: Descriptive Statistics

The Normal Distribuions

Density Curves & Normal Distributions

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

4.2 The Normal Distribution. that is, a graph of the measurement looks like the familiar symmetrical, bell-shaped

Recap: Ø Distribution Shape Ø Mean, Median, Mode Ø Standard Deviations

IB Questionbank Mathematical Studies 3rd edition. Grouped discrete. 184 min 183 marks

6 THE NORMAL DISTRIBUTION

Course ID May 2017 COURSE OUTLINE. Mathematics 130 Elementary & Intermediate Algebra for Statistics

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

3.1 Measure of Center

Mean/Average Median Mode Range

Describing Distributions With Numbers

MEASURING THE SPREAD OF DATA: 6F

Math 14 Lecture Notes Ch Percentile

The Normal Distribution. Chapter 6

GRACEY/STATISTICS CH. 3. CHAPTER PROBLEM Do women really talk more than men? Science, Vol. 317, No. 5834). The study

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Review Packet for Test 8 - Statistics. Statistical Measures of Center: and. Statistical Measures of Variability: and.

Lecture 1: Descriptive Statistics

Continuous Distributions

Math Section SR MW 1-2:30pm. Bekki George: University of Houston. Sections

MgtOp 215 Chapter 3 Dr. Ahn

Chapter 6 Assessment. 3. Which points in the data set below are outliers? Multiple Choice. 1. The boxplot summarizes the test scores of a math class?

Describing Distributions

Math 140 Introductory Statistics

Math 140 Introductory Statistics

additionalmathematicsstatisticsadditi onalmathematicsstatisticsadditionalm athematicsstatisticsadditionalmathem aticsstatisticsadditionalmathematicsst

SESSION 5 Descriptive Statistics

Chapter 1. Looking at Data

Math 6 Common Core. Mathematics Prince George s County Public Schools

2.1 Measures of Location (P.9-11)

Units. Exploratory Data Analysis. Variables. Student Data

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved

GRAPHS AND STATISTICS Central Tendency and Dispersion Common Core Standards

Practice Questions for Exam 1

1. Exploratory Data Analysis

Using Dice to Introduce Sampling Distributions Written by: Mary Richardson Grand Valley State University

Transcription:

Determining the Spread of a Distribution 1.3-1.5 Cathy Poliak, Ph.D. cathy@math.uh.edu Department of Mathematics University of Houston Lecture 3-2311 Lecture 3-2311 1 / 58

Outline 1 Describing Quantitative Variables 2 Measurements of Spread 3 Percentiles 4 Quartiles 5 The 1.5IQR Rule 6 Understanding Standard Deviation 7 Calculating The Standard Deviation 8 Coefficient of Variation Lecture 3-2311 2 / 58

Describing distributions of quantitative variables The distribution of a variable tells us what values it takes and how often it takes these values. There are four main characteristics to describe a distribution: 1. Shape 2. Center 3. Spread 4. Outliers Lecture 3-2311 3 / 58

Describing distributions An initial view of the distribution and the characteristics can be shown through the graphs. Then we use numerical descriptions to get a better understanding of the distributions characteristics. Lecture 3-2311 4 / 58

Parameters and Statistics A parameter is a number that describes the population. A parameter is a fixed number, but in practice we usually do not know its value. A statistic is a number that describes a sample. The value of a statistic is known when we have taken a sample, but it can change from sample to sample. We often use a statistic to estimate an unknown parameter. The purpose of sampling or experimentation is usually to use statistics to make statements about unknown parameters, this is called statistical inference. Lecture 3-2311 5 / 58

Notation of Parameters and Statistics Name Statistic Parameter mean x µ mu standard deviation s σ sigma correlation r ρ rho regression coefficient b β beta proportion ˆp p Lecture 3-2311 6 / 58

Describing Center with Numbers Recap Mean: Sum all of the numbers then divide by number of values in the data set. Median: The center value of ordered data. Mode: The value that has the highest frequency in the data. Lecture 3-2311 7 / 58

Example The following are ages of automobiles. Find the mean, median and mode of the age. 8 3 6 5 5 2 10 9 8 2 3 2 2 Lecture 3-2311 8 / 58

Example: Test Scores The test scores of a class of 20 students have a mean of 71.6 and the test scores of another class of 14 students have a mean of 78.4. Find the mean of the combined group. Cathy Poliak, Ph.D. cathy@math.uh.edu (Department of Mathematics 1.2 & 1.5 University of Houston ) Lecture 2 33 / 36

Example: Conclusions A businesswoman calculates that the median cost of the five business trips that she took in a month is $600 and concludes that the total cost must have been $3000. Explain why the conclusion drawn is not valid. Cathy Poliak, Ph.D. cathy@math.uh.edu (Department of Mathematics 1.2 & 1.5 University of Houston ) Lecture 2 34 / 36

Average Test Scores? What is the mean and median for each of these sections test scores? Section A Section B 65 42 66 54 67 58 68 62 71 67 73 77 74 77 77 85 77 93 77 100 Lecture 3-2311 9 / 58

Types of Measurements for the Spread Range Percentiles Quartiles IQR; Interquartile range Variance Standard deviation Coefficient of Variation Lecture 3-2311 10 / 58

The Range The range is the difference between the highest and lowest values. Section A: Range = 77-65 = 12 Section B: Range = 100-42 = 58 Lecture 3-2311 11 / 58

Percentiles The pth percentile of data is the value such that p percent of the observations fall at or below it. The use of percentiles to report spread when the median is our measure of center. If you are looking for the measurement that has a desired percentile rank, the 100P th percentile, is the measurement with rank (or position in the list) of np + 0.5, where n represents the number of data values in the sample. Lecture 3-2311 12 / 58

The 90th percentile of Section A test scores 1. Arrange the scores in order from lowest to highest. 65 66 67 68 71 73 74 77 77 77 2. n = 10, P = 0.90, so the 90 th percentile for this list is at np + 0.5 = 10(0.9) + 0.5 = 9.5, the mean of the 9th and 10th place values. 3. The 90th percentile is 77+77 2 = 77 Find the 35th percentile. Find the 75th percentile. Lecture 3-2311 13 / 58

Determine the 25th percentile of the Course Scores Another way to determine percentiles is using the cumulative frequency polygon to estimate percentiles. Cumulative Frequency Chart Cumulative Proportion 0.0 0.2 0.4 0.6 0.8 1.0 0 20 40 60 80 100 120 Scores Lecture 3-2311 14 / 58

Determining Percentiles Suppose you know the position (order) of a value and want to know what percentile it is ranked at. If you have n data measurements, x i represents the 100(i 0.5)/n th percentile. Example: Determine the percentile of the 4 th order statistic for a sample size of n = 15. Lecture 3-2311 15 / 58

Examples of percentiles Suppose you want to know what percentile you are in a certian class. You know there are 200 students in this class and that 20 of the students have scores above you. What is your percentile? Suppose your percentile came out to be 90th percentile, how many students scored the same as or below you? What about at the 50th percentile? Lecture 3-2311 16 / 58

The Quartiles The first quartile is 25th percentile, Q 1. The second quartile is the median and the 50th percentile, Q 2. The third quartile is the 75th percentile, Q 3. Lecture 3-2311 17 / 58

Determining Q 1 for Basketball Shoe Prices Arrange in order n = 15. 100 110 120 120 140 140 140 150 185 185 215 215 250 250 290 Q 1 : P = 0.25 np + 0.5 = 15(0.25) + 0.5 = 4.25. Since we do not get an integer, we find the mean of the 4th and 5th element in the ordered dataset. Q 1 = 120+140 2 = 130. Lecture 3-2311 18 / 58

Determine Q 2 for Basketball Shoe Prices Arrange in order n = 15. 100 110 120 120 140 140 140 150 185 185 215 215 250 250 290 Q 2 : P = 0.5 np + 0.5 = 15(0.5) + 0.5 = 8. So Q 2 is the 8th element of the ordered data. Q 2 = 150. Lecture 3-2311 19 / 58

Determine Q 3 for Basketball Shoe Prices Arrange in order n = 15. 100 110 120 120 140 140 140 150 185 185 215 215 250 250 290 Q 3 : P = 0.75 np + 0.5 = 15(0.75) + 0.5 = 11.75. Again since we did not get and integer, the third quartile is the mean of the 11th and 12th elements in the ordred data. Q 3 = 215+215 2 = 215. Lecture 3-2311 20 / 58

R-code for finding Q 1, Q 2, & Q 3 The values: Minimum, Q 1, Median (Q 2 ), Q 3, and Maximum are called the Five Number Summary > shoeprice=c(100,110,120,120,140,140,140,150, 185,185,215,215,250,250,290) > fivenum(shoeprice) [1] 100 130 150 215 290 Lecture 3-2311 21 / 58

Interquartile Range Interquartile range, IQR, is the difference between Q 3 and Q 1 IQR = Q 3 Q 1 Lecture 3-2311 22 / 58

Example Twelve babies spoke for the first time at the following ages (in months): 8 9 10 11 12 13 15 15 18 20 20 26 Find Q 1, Q 2, Q 3, the range and the IQR. Lecture 3-2311 23 / 58

Find the Five Number Summary of the Course Scores > stem(grades$score,scale=0.5) The decimal point is 1 digit(s) to the right of the 0 827 2 2 4 0391 6 78825 8 0134445701238 10 1114 Lecture 3-2311 24 / 58

Detecting Outliers: 1.5IQR Rule An outlier is an observation that is "distant" from the rest of the data. Outliers can occur by chance or by measurement errors. Any point that falls outside the interval calculated by Q 1 1.5(IQR) and Q 3 + 1.5(IQR) is considered an outlier. Lecture 3-2311 25 / 58

Outliers for Basketball Shoe Prices? Recall: Q 1 = 130, Q 3 = 215, So IQR = 215-130 = 85. Q 1 1.5(IQR) = 130 1.5(85) = 2.5 Q 3 + 1.5(IQR) = 215 + 1.5(85) = 342.5 Any price that is below $2.50 or above $342.50 is considered an outlier. Lecture 3-2311 26 / 58

Outliers? The following is information from 91 pairs of basketball shoes: > fivenum(shoes$price) [1] 40 75 90 120 250 The highest four numbers in the dataset is..., 170, 225, 250, 250. Are there any prices that are considered an outlier? Lecture 3-2311 27 / 58

A Graph of the Five Number Summary: Boxplot A central box spans the quartiles. A line inside the box marks the median. Lines extend from the box out to the smallest and largest observations. Asterisks represents any values that are considered to be outliers. Boxplots are most useful for side-by-side comparison of several distributions. Rcode: boxplot(dataset name$variable name) Lecture 3-2311 28 / 58

Boxplot of Prices 50 100 150 200 250 boxplot(shoes$price,horizontal = T) Lecture 3-2311 29 / 58

Boxplot of Course Scores 20 40 60 80 100 Lecture 3-2311 30 / 58

Boxplot of Course Scores by Session Fal15 Sp16 Sum16 20 40 60 80 100 boxplot(grades$score~grades$session,horizontal=true) Lecture 3-2311 31 / 58

Question about the Graphs Given the first type of plot indicated in each pair, which of the second plots could not always be generated from it? a) dot plot, histogram b) stem and leaf, dot plot c) histogram, stem and leaf d) dot plot, box plot Lecture 3-2311 32 / 58

Measuring Spread: The Standard Deviation Measures spread by looking at how far the observations are from their mean. Most common numerical description for the spread of a distribution. A larger standard deviation implies that the values have a wider spread from the mean. Denoted s when used with a sample. This is the one we calculate from a list of values. Denoted σ when used with a population. This is the "idealized" standard deviation. The standard deviation has the same units of measurements as the original observations. Lecture 3-2311 33 / 58

Definition of the Standard Deviation The standard deviation is the average distance each observation is from the mean. Using this list of values from a sample: 3, 3, 9, 15, 15 The mean is 9. By definition, the average distance each of these values are from the mean is 6. So the standard deviation is 6. Lecture 3-2311 34 / 58

Definition of the Standard Deviation The standard deviation is the average distance each observation is from the mean. Using this list of values from a sample: 3, 3, 9, 15, 15 The mean is 9. By definition, the average distance each of these values are from the mean is 6. So the standard deviation is 6. Lecture 3-2311 34 / 58

Definition of the Standard Deviation The standard deviation is the average distance each observation is from the mean. Using this list of values from a sample: 3, 3, 9, 15, 15 The mean is 9. By definition, the average distance each of these values are from the mean is 6. So the standard deviation is 6. Lecture 3-2311 34 / 58

Definition of the Standard Deviation The standard deviation is the average distance each observation is from the mean. Using this list of values from a sample: 3, 3, 9, 15, 15 The mean is 9. By definition, the average distance each of these values are from the mean is 6. So the standard deviation is 6. Lecture 3-2311 34 / 58

Values of the Standard Deviation The standard deviation is a value that is greater than or equal to zero. It is equal to zero only when all of the observations have the same value. By the definition of standard deviation determine s for the following list of values. 2, 2, 2, 2 : standard deviation = 0 125, 125, 125, 125, 125: standard deviation = 0 Lecture 3-2311 35 / 58

Values of the Standard Deviation The standard deviation is a value that is greater than or equal to zero. It is equal to zero only when all of the observations have the same value. By the definition of standard deviation determine s for the following list of values. 2, 2, 2, 2 : standard deviation = 0 125, 125, 125, 125, 125: standard deviation = 0 Lecture 3-2311 35 / 58

Values of the Standard Deviation The standard deviation is a value that is greater than or equal to zero. It is equal to zero only when all of the observations have the same value. By the definition of standard deviation determine s for the following list of values. 2, 2, 2, 2 : standard deviation = 0 125, 125, 125, 125, 125: standard deviation = 0 Lecture 3-2311 35 / 58

Values of the Standard Deviation The standard deviation is a value that is greater than or equal to zero. It is equal to zero only when all of the observations have the same value. By the definition of standard deviation determine s for the following list of values. 2, 2, 2, 2 : standard deviation = 0 125, 125, 125, 125, 125: standard deviation = 0 Lecture 3-2311 35 / 58

Values of the Standard Deviation The standard deviation is a value that is greater than or equal to zero. It is equal to zero only when all of the observations have the same value. By the definition of standard deviation determine s for the following list of values. 2, 2, 2, 2 : standard deviation = 0 125, 125, 125, 125, 125: standard deviation = 0 Lecture 3-2311 35 / 58

Values of the Standard Deviation The standard deviation is a value that is greater than or equal to zero. It is equal to zero only when all of the observations have the same value. By the definition of standard deviation determine s for the following list of values. 2, 2, 2, 2 : standard deviation = 0 125, 125, 125, 125, 125: standard deviation = 0 Lecture 3-2311 35 / 58

Values of the Standard Deviation The standard deviation is a value that is greater than or equal to zero. It is equal to zero only when all of the observations have the same value. By the definition of standard deviation determine s for the following list of values. 2, 2, 2, 2 : standard deviation = 0 125, 125, 125, 125, 125: standard deviation = 0 Lecture 3-2311 35 / 58

Adding or Subtracting a Value to the Observations Adding or subtracting the same value to all the original observations does not change the standard deviation of the list. Using this list of values: 3, 3, 9, 15, 15 mean = 9, standard deviation = 6. If we add 4 to all the values: 7, 7, 13, 19, 19 mean = 13, standard deviation = 6 Lecture 3-2311 36 / 58

Adding or Subtracting a Value to the Observations Adding or subtracting the same value to all the original observations does not change the standard deviation of the list. Using this list of values: 3, 3, 9, 15, 15 mean = 9, standard deviation = 6. If we add 4 to all the values: 7, 7, 13, 19, 19 mean = 13, standard deviation = 6 Lecture 3-2311 36 / 58

Adding or Subtracting a Value to the Observations Adding or subtracting the same value to all the original observations does not change the standard deviation of the list. Using this list of values: 3, 3, 9, 15, 15 mean = 9, standard deviation = 6. If we add 4 to all the values: 7, 7, 13, 19, 19 mean = 13, standard deviation = 6 Lecture 3-2311 36 / 58

Adding or Subtracting a Value to the Observations Adding or subtracting the same value to all the original observations does not change the standard deviation of the list. Using this list of values: 3, 3, 9, 15, 15 mean = 9, standard deviation = 6. If we add 4 to all the values: 7, 7, 13, 19, 19 mean = 13, standard deviation = 6 Lecture 3-2311 36 / 58

Multiplying or Dividing a Value to the Observations Multiplying or dividing the same value to all the original observations will change the standard deviation by that factor. Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6. If we double all the values: 6, 6, 18, 30, 30 mean = 18, standard deviation = 12 Lecture 3-2311 37 / 58

Multiplying or Dividing a Value to the Observations Multiplying or dividing the same value to all the original observations will change the standard deviation by that factor. Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6. If we double all the values: 6, 6, 18, 30, 30 mean = 18, standard deviation = 12 Lecture 3-2311 37 / 58

Multiplying or Dividing a Value to the Observations Multiplying or dividing the same value to all the original observations will change the standard deviation by that factor. Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6. If we double all the values: 6, 6, 18, 30, 30 mean = 18, standard deviation = 12 Lecture 3-2311 37 / 58

Multiplying or Dividing a Value to the Observations Multiplying or dividing the same value to all the original observations will change the standard deviation by that factor. Using this list of values: 3, 3, 9, 15, 15: mean = 9, standard deviation = 6. If we double all the values: 6, 6, 18, 30, 30 mean = 18, standard deviation = 12 athy Poliak, Ph.D. cathy@math.uh.edu (Department of Mathematics 1.3-1.5University of Houston ) Lecture 3-2311 37 / 58

Population Variance and Standard Deviation If N is the number of values in a population with mean mu, and x i represents each individual in the population, the the population variance is found by: σ 2 = N i=1 (x i µ) 2 N and the population standard deviation is the square root, σ = σ 2. Lecture 3-2311 38 / 58

Sample Variance and Standard Deviation Most of the time we are working with a sample instead of a population. So the sample variance is found by: s 2 = n i=1 (x i x) 2 n 1 and the sample standard deviation is the square root, s = s 2. Where n is the number of observations (samples), x i is the value for the i th observation and x is the sample mean. Lecture 3-2311 39 / 58

Calculating the Standard Deviation By Hand When calculating by hand we will calculate s. 1. Find the mean of the observations x. 2. Calculate the difference between the observations and the mean for each observation x i x. This is called the deviations of the observations. 3. Square the deviations for each observation (x i x) 2. 4. Add up the squared deviations together n i=1 (x i x) 2. 5. Divide the sum of the squared deviations by one less than the number of observations n 1. This is the variance s 2 = 1 n 1 n (x i x) 2 i=1 Lecture 3-2311 40 / 58

Calculating the Standard Deviation By Hand When calculating by hand we will calculate s. 1. Find the mean of the observations x. 2. Calculate the difference between the observations and the mean for each observation x i x. This is called the deviations of the observations. 3. Square the deviations for each observation (x i x) 2. 4. Add up the squared deviations together n i=1 (x i x) 2. 5. Divide the sum of the squared deviations by one less than the number of observations n 1. This is the variance s 2 = 1 n 1 n (x i x) 2 i=1 athy Poliak, Ph.D. cathy@math.uh.edu (Department of Mathematics 1.3-1.5University of Houston ) Lecture 3-2311 40 / 58

Calculating the Standard Deviation By Hand When calculating by hand we will calculate s. 1. Find the mean of the observations x. 2. Calculate the difference between the observations and the mean for each observation x i x. This is called the deviations of the observations. 3. Square the deviations for each observation (x i x) 2. 4. Add up the squared deviations together n i=1 (x i x) 2. 5. Divide the sum of the squared deviations by one less than the number of observations n 1. This is the variance s 2 = 1 n 1 n (x i x) 2 i=1 athy Poliak, Ph.D. cathy@math.uh.edu (Department of Mathematics 1.3-1.5University of Houston ) Lecture 3-2311 40 / 58

Calculating the Standard Deviation By Hand When calculating by hand we will calculate s. 1. Find the mean of the observations x. 2. Calculate the difference between the observations and the mean for each observation x i x. This is called the deviations of the observations. 3. Square the deviations for each observation (x i x) 2. 4. Add up the squared deviations together n i=1 (x i x) 2. 5. Divide the sum of the squared deviations by one less than the number of observations n 1. This is the variance s 2 = 1 n 1 n (x i x) 2 i=1 athy Poliak, Ph.D. cathy@math.uh.edu (Department of Mathematics 1.3-1.5University of Houston ) Lecture 3-2311 40 / 58

Calculating the Standard Deviation By Hand When calculating by hand we will calculate s. 1. Find the mean of the observations x. 2. Calculate the difference between the observations and the mean for each observation x i x. This is called the deviations of the observations. 3. Square the deviations for each observation (x i x) 2. 4. Add up the squared deviations together n i=1 (x i x) 2. 5. Divide the sum of the squared deviations by one less than the number of observations n 1. This is the variance s 2 = 1 n 1 n (x i x) 2 i=1 athy Poliak, Ph.D. cathy@math.uh.edu (Department of Mathematics 1.3-1.5University of Houston ) Lecture 3-2311 40 / 58

Step 6: Standard Deviation 6. Find the square root of the variance. This is the standard deviation s = 1 n (x i x) n 1 2 i=1 Lecture 3-2311 41 / 58

Example: Section A Determine the sample standard deviation of the test scores for Section A. Section A Scores (X i ) 65 66 67 68 71 73 74 77 77 77 Lecture 3-2311 42 / 58

Step 1: Calculate the Mean The sample mean is x = 71.5. Lecture 3-2311 43 / 58

Use Table To Calculate Standard Deviation Variable Deviations Deviations Squared Score (X i ) X i X (X i X) 2 65 66 67 68 71 73 74 77 77 77 sum Lecture 3-2311 44 / 58

Step 2: Calculate Deviations For All Values Variable Deviations Deviations Squared Score (X i ) X i X (X i X) 2 65 65 71.5 = 6.5 66 66 71.5 = 5.5 67 67 71.5 = 4.5 68 68 71.5 = 3.5 71 71 71.5 = 0.5 73 73 71.5 = 1.5 74 74 71.5 = 2.5 77 77 71.5 = 5.5 77 77 71.5 = 5.5 77 77 71.5 = 5.5 sum Lecture 3-2311 45 / 58

Step 3: Calculate Squared Deviations Variable Deviations Deviations Squared Score (X i ) X i X (X i X) 2 65 65 71.5 = 6.5 ( 6.5) 2 = 42.25 66 66 71.5 = 5.5 ( 5.5) 2 = 30.25 67 67 71.5 = 4.5 ( 4.5) 2 = 20.25 68 68 71.5 = 3.5 ( 3.5) 2 = 12.25 71 71 71.5 = 0.5 ( 0.5) 2 = 0.25 73 73 71.5 = 1.5 1.5 2 = 2.25 74 74 71.5 = 2.5 2.5 2 = 6.25 77 77 71.5 = 5.5 5.5 2 = 30.25 77 77 71.5 = 5.5 5.5 2 = 30.25 77 77 71.5 = 5.5 5.5 2 = 30.25 sum Lecture 3-2311 46 / 58

Step 4: Calculate the Sum of the Squared Deviations Variable Deviations Deviations Squared Score(X i ) X i X (X i X) 2 65 65 71.5 = 6.5 ( 6.5) 2 = 42.25 66 66 71.5 = 5.5 ( 5.5) 2 = 30.25 67 67 71.5 = 4.5 ( 4.5) 2 = 20.25 68 68 71.5 = 3.5 ( 3.5) 2 = 12.25 71 71 71.5 = 0.5 ( 0.5) 2 = 0.25 73 73 71.5 = 1.5 1.5 2 = 2.25 74 74 71.5 = 2.5 2.5 2 = 6.25 77 77 71.5 = 5.5 5.5 2 = 30.25 77 77 71.5 = 5.5 5.5 2 = 30.25 77 77 71.5 = 5.5 5.5 2 = 30.25 sum n i=1 (X i X) 2 = 204.5 Lecture 3-2311 47 / 58

Step 5: Calculate the Variance variance = s 2 = 1 n 1 n (x i x) 2 i=1 = 204.5 9 = 22.7222 Lecture 3-2311 48 / 58

Step 6: Take the Square Root of the Variance standard deviation = s = 1 n 1 = 22.7222 = 4.77 n (x i x) 2 i=1 Lecture 3-2311 49 / 58

Sample Standard Deviation of Section A test scores Sample standard deviation is s = 4.77. This implies that from the sample of the 10 students from section A the tests scores has a spread, on average, of 4.77 points from the mean of 71.50 points. Lecture 3-2311 50 / 58

Example A statistics teacher wants to decide whether or not to curve an exam. From her class of 300 students, she chose a sample of 10 students and their grade were: 72, 88, 85, 81, 60, 54, 70, 72, 63, 43 Determine the sample mean. What is the variance? What is the standard deviation? Lecture 3-2311 51 / 58

Add 10 Suppose the statistics instructor decides to curve the grade by adding 10 points to each score. What is the new mean, variance and standard deviation? Lecture 3-2311 52 / 58

Multiply by 2 For the following dataset the mean is x = 4.5, the variance is s 2 = 3.5 and the standard deviation is s = 1.870829. 3, 6, 2, 7, 4, 5 Now, multiply each value by 2. What is the new variance and the new standard deviation? Lecture 3-2311 53 / 58

Calculating Standard Deviation For larger data sets use a calculator or computer software. Each calculator is different if you cannot determine how to compute standard deviation from your calculator ask your instructor. For this course we will be using R as the software. The function for the sample standard deviation in R is sd(data name$variable name). Lecture 3-2311 54 / 58

Coefficient of Variation This is to compare the variation between two groups. The coefficient of variation (cv) is the ratio of the standard deviation to the mean. cv = sd mean A smaller ratio will indicate less variation in the data. Lecture 3-2311 55 / 58

CV of test scores Section A Section B Sample Size 10 10 Sample Mean 71.5 71.5 Sample Standard Deviation 4.770 18.22 4.77 CV 71.5 = 0.066 18.22 71.5 = 0.2548 Lecture 3-2311 56 / 58

CV Example The following statistics were collected on two different groups of stock prices: Portfolio A Portfolio B Sample size 10 15 Sample mean $52.65 $49.80 Sample standard deviation $6.50 $2.95 What can be said about the variability of each portfolio? Lecture 3-2311 57 / 58

Things to do before Thursday 1. Try to download R and R-studio 2. Start working on homework 1 3. Work on quiz 1. Lecture 3-2311 58 / 58