Sections 2.3 and 2.4

Similar documents
1.3: Describing Quantitative Data with Numbers

Description of Samples and Populations

Describing Distributions with Numbers

Chapter 4. Displaying and Summarizing. Quantitative Data

CHAPTER 2: Describing Distributions with Numbers

Section 2.3: One Quantitative Variable: Measures of Spread

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Resistant Measure - A statistic that is not affected very much by extreme observations.

Chapter 1: Exploring Data

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Section 2.4. Measuring Spread. How Can We Describe the Spread of Quantitative Data? Review: Central Measures

Review: Central Measures

1.3.1 Measuring Center: The Mean

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Units. Exploratory Data Analysis. Variables. Student Data

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Describing distributions with numbers

Describing Distributions

Describing distributions with numbers

Chapter2 Description of samples and populations. 2.1 Introduction.

Elementary Statistics

MATH 1150 Chapter 2 Notation and Terminology

Chapter 2 Class Notes Sample & Population Descriptions Classifying variables

CHAPTER 2 Description of Samples and Populations

2.1 Measures of Location (P.9-11)

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

Measures of the Location of the Data

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

2011 Pearson Education, Inc

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II)

STAT 200 Chapter 1 Looking at Data - Distributions

Chapter 3. Measuring data

Chapter 2: Tools for Exploring Univariate Data

Describing Distributions With Numbers

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Section 3. Measures of Variation

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

Lecture 2 and Lecture 3

Descriptive Univariate Statistics and Bivariate Correlation

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

are the objects described by a set of data. They may be people, animals or things.

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

The Normal Distribution. Chapter 6

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

Chapter 5. Understanding and Comparing. Distributions

Descriptive Statistics

Let's Do It! What Type of Variable?

1 Measures of the Center of a Distribution

Chapter 3. Data Description

Unit 2. Describing Data: Numerical

Instructor: Doug Ensley Course: MAT Applied Statistics - Ensley

Histograms allow a visual interpretation

BNG 495 Capstone Design. Descriptive Statistics

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays

MATH 117 Statistical Methods for Management I Chapter Three

CIVL 7012/8012. Collection and Analysis of Information

P8130: Biostatistical Methods I

Full file at

Percentile: Formula: To find the percentile rank of a score, x, out of a set of n scores, where x is included:

Describing Distributions With Numbers Chapter 12

Types of Information. Topic 2 - Descriptive Statistics. Examples. Sample and Sample Size. Background Reading. Variables classified as STAT 511

Describing Data: Two Variables

1. Exploratory Data Analysis

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

Stats Review Chapter 3. Mary Stangler Center for Academic Success Revised 8/16

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

3.1 Measure of Center

Chapter 1. Looking at Data

Let's Do It! What Type of Variable?

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

CHAPTER 1. Introduction

STT 315 This lecture is based on Chapter 2 of the textbook.

Exercises from Chapter 3, Section 1

Stat 101 Exam 1 Important Formulas and Concepts 1

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

Chapter 11: Analysis of variance

Describing Center: Mean and Median Section 5.4

Numerical Measures of Central Tendency

CHAPTER 1 Exploring Data

After completing this chapter, you should be able to:

Chapter 6 Group Activity - SOLUTIONS

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability

AP Final Review II Exploring Data (20% 30%)

Performance of fourth-grade students on an agility test

UCLA STAT 10 Statistical Reasoning - Midterm Review Solutions Observational Studies, Designed Experiments & Surveys

Recap: Ø Distribution Shape Ø Mean, Median, Mode Ø Standard Deviations

Measures of disease spread

Exploratory data analysis: numerical summaries

Summarising numerical data

Chapter 9 Prerequisite Practice MATH NOTES ETHODS AND MEANINGS. Describing Shape (of a Data Distribution)

Statistics I Chapter 2: Univariate data analysis

Statistics for Managers using Microsoft Excel 6 th Edition

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

Lecture 1: Descriptive Statistics

Statistics and parameters

Transcription:

1 / 24 Sections 2.3 and 2.4 Note made by: Dr. Timothy Hanson Instructor: Peijie Hou Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences

2 / 24 Descriptive statistics For continuous data, a histogram (or dotplot) provides a snapshot of the data. This snapshot can be augmented with a few numbers to give a brief quantitative description of the data. These numbers (mean, median, mode, standard deviation, interquartile range, etc.) are called sample statistics.

3 / 24 Sample median The median is a number that splits the data into two groups. Half the observations are smaller than the median, and half are larger. Need to order the data first, then find middle observation. This is unique if n is odd. Take average of middle two if n even.

4 / 24 Example 2.3.1: weight gain in lambs n = 6 lambs weight gain (lbs) recorded over two weeks. The ordered values are: The sample median is 1, 2, 10, 11, 13, 19. ỹ = 10 + 11 2 = 10.5 lbs. 3 obs. larger than median & 3 smaller:

5 / 24 Sample mean The sample mean is ȳ = y 1 + y 2 + + y n n = 1 n n y i, i=1 where the y i s are the observations in the sample and n is the sample size. The sample mean is the average of the n data values. Has interpretation as point of balance. If every observation has the same weight, then ȳ is fulcrum of balance.

6 / 24 Example 2.3.1: weight gain in lambs Sample mean is ȳ = 1 + 2 + 10 + 11 + 13 + 19 6 = 56 6 = 9.33 lbs. Median causes see-saw to tip

7 / 24 Example 2.3.1: weight gain in lambs Sample mean is ȳ = 1 + 2 + 10 + 11 + 13 + 19 6 = 56 6 = 9.33 lbs. Mean balances see-saw

8 / 24 Mean versus median Median is robust to outliers, mean is not. What happens with lamb weight gain when we replace the largest value 19 by 100? Original data: ỹ = 10.5 and ȳ = 9.33 lbs. New data: ỹ = 10.5 and ȳ = 22.83 lbs. 1, 2, 10, 11, 13, 100, Mean is also pulled in direction of skew further than median.

9 / 24 Example 2.3.1: Cricket singing times Male Mormon crickets sing to attract mates. The song duration from n = 51 crickets was measured in minutes.

10 / 24 R code: cricket music R code to get mean and median: times=c(4.3,3.9,17.4,2.3,0.8,1.5,0.7,3.7,24.1,9.4,5.6,3.7,5.2,3.9,4.2, 3.5,6.6,6.2,2.0,0.8,2.0,3.7,4.7,7.3,1.6,3.8,0.5,0.7,4.5,2.2,4.0,6.5, 1.2,4.5,1.7,1.8,1.4,2.6,0.2,0.7,11.5,5.0,1.2,14.1,4.0,2.7,1.6,3.5,2.8, 0.7,8.6) mean(times) median(times) Output: > mean(times) [1] 4.335294 > median(times) [1] 3.7

11 / 24 Example 2.3.1: Cricket singing times Figure : n = 51 cricket singing times; mean pulled toward right tail the direction of skew more than median.

12 / 24 Mean versus median Median may make more sense for skewed data, i.e. may be more typical. Mean annual U.S. household income in 2004 is $60,500. Median is $43,300. The millionaires pull the mean higher than the median. Also the median can be computed in some situations where the mean cannot. Example: survival times. The median can be computed as soon as half the experimental units are dead. The mean needs all units dead.

13 / 24 Quartiles The median cuts the data in half; half the observations are smaller and half larger. If we look at the lower half of the data, the first quartile Q 1 cuts the lower half in two. The third quartile Q 3 cuts the upper half in two. Q 1, the median, and Q 3 cut the data into four parts with roughly equal numbers of observations. Q 1 is the median of the lower half; Q 3 is the median of the upper half.

14 / 24 Example 2.4.2 Pulses n = 12 college student pulses were measured (beats per minute) 62 64 68 70 70 74 74 76 76 78 78 80 Since n is even, the median is given by median = 74+74 2 = 74. Q 1 is the median of the lower half 62 64 68 70 70 74 Q 1 = 68+70 2 = 69. Q 3 is the median of the upper half Q 3 = 76+78 2 = 77. 74 76 76 78 78 80

15 / 24 The interquartile range The interquartile range, IQR, is IQR = Q 3 Q 1. For the pulse data, IQR = 77 69 = 8 bpm. The IQR gives the length of an interval containing the middle 50% of the data. It measures how spread out the data are. Half of the 12 students pulses lie in an interval of length 8 bpm.

16 / 24 Minimum, maximum, and five number summary The maximum of the sample, max, is the largest value. The minimum of the sample, min, is the smallest value. The five number summary is min, Q 1, median, Q 3, max. The range of the data is max-min. For the pulses, the five number summary is min = 62, Q 1 = 69, median = 74, Q 3 = 77, max = 80. The range of the pulses is 80 62 = 18 bpm.

17 / 24 Boxplots The five number summary can be placed on an x-axis to give a snapshot of the data. A boxplot simply places a box around Q 1 to Q 3 and draws lines or whiskers from Q 1 to the min, and Q 3 to the max. Gives a visual representation of a typical value (the median), the spread of the middle 50% (the box) and the spread of the whole data set (the whiskers) all at once.

18 / 24 Example 2.4.3: Radish growth The length (mm) of n = 14 radish shoots grown in total darkness over three days from seeds is The five number summary (p. 48) is min = 8, Q 1 = 15, median = 21, Q 3 = 30, max = 37.

19 / 24 Example 2.4.3: Radish growth Figure : Boxplot of radishes grown in darkness

20 / 24 Outliers & modified boxplots Outliers are observations that are really small or really large and far away from the bulk of the data. Many data sets do not have outliers; many do. We formally define an outlier to be any observation that is Smaller than Q 1 1.5 IQR, or larger than Q 3 + 1.5 IQR. These numbers are called the lower and upper fences. A modified boxplot plots outliers separately and only extends the whiskers as far out as the largest and smallest non-outlying observations. The default boxplot in R is a modified boxplot.

21 / 24 Example 2.4.5: Radish growth, constant light n = 14 radishes were also grown in constant light over three days. Their lengths are min 3 5 5 Q 1 7 7 8 9 ỹ = 9.5 10 10 10 Q 3 10 14 20 max 21 Compute IQR = 10 7 = 3. The lower fence is Q 1 1.5 IQR = 7 1.5(3) = 2.5. The upper fence is Q 3 + 1.5 IQR = 10 + 1.5(3) = 14.5. There are no observations smaller than 2.5 but there are two larger than 14.5: 20 and 21. 20 and 21 are outliers

22 / 24 Modified boxplot for radishes grown in constant light Figure : Dotplot & boxplot of radishes grown in constant light

23 / 24 Example 2.4.1: Radish growth Figure : Radishes grown in constant light; boxplot and modified boxplot.

24 / 24 R code > r=c(3,5,5,7,7,8,9,10,10,10,10,14,20,21) > boxplot(r) 5 10 15 20