Describing Distributions With Numbers Chapter 12

Similar documents
Describing Distributions With Numbers

Describing Distributions

P8130: Biostatistical Methods I

Section 2.3: One Quantitative Variable: Measures of Spread

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Chapter 3. Measuring data

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Chapter 6 The Standard Deviation as a Ruler and the Normal Model

Chapter 1 - Lecture 3 Measures of Location

The Normal Distribution. Chapter 6

CHAPTER 2: Describing Distributions with Numbers

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Elementary Statistics

Chapter 1: Exploring Data

Describing distributions with numbers

Unit 2. Describing Data: Numerical

Chapter 5. Understanding and Comparing. Distributions

Chapter 1. Looking at Data

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

STAT 200 Chapter 1 Looking at Data - Distributions

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

1. Exploratory Data Analysis

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

Describing distributions with numbers

Chapter 6 Group Activity - SOLUTIONS

Week 1: Intro to R and EDA

Chapters 1 & 2 Exam Review

Chapter 1:Descriptive statistics

Descriptive Statistics-I. Dr Mahmoud Alhussami

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

BNG 495 Capstone Design. Descriptive Statistics

Measures of the Location of the Data

Statistics and parameters

Performance of fourth-grade students on an agility test

Lecture 2 and Lecture 3

Chapter 2: Tools for Exploring Univariate Data

Data: the pieces of information that have been observed and recorded, from an experiment or a survey

Descriptive Univariate Statistics and Bivariate Correlation

Stat 20: Intro to Probability and Statistics

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

Section 3.2 Measures of Central Tendency

Unit 2: Numerical Descriptive Measures

MATH 117 Statistical Methods for Management I Chapter Three

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

Percentile: Formula: To find the percentile rank of a score, x, out of a set of n scores, where x is included:

1.3: Describing Quantitative Data with Numbers

The empirical ( ) rule

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

Descriptive statistics

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Example 2. Given the data below, complete the chart:

Descriptive Data Summarization

STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II)

GRAPHS AND STATISTICS Central Tendency and Dispersion Common Core Standards

Describing Data: Two Variables

Chapter 3: The Normal Distributions

Revision Topic 13: Statistics 1

Lecture 1 : Basic Statistical Measures

Chapter 5: Exploring Data: Distributions Lesson Plan

Continuous random variables

MEASURING THE SPREAD OF DATA: 6F

Introduction to Statistics for Traffic Crash Reconstruction

3.1 Measure of Center

Math 140 Introductory Statistics

Math 140 Introductory Statistics

A C E. Answers Investigation 4. Applications

1 Measures of the Center of a Distribution

Statistics I Chapter 2: Univariate data analysis

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

Lecture 2. Descriptive Statistics: Measures of Center

MgtOp 215 Chapter 3 Dr. Ahn

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

Math 2311 Sections 4.1, 4.2 and 4.3

CHAPTER 1 Exploring Data

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

M 225 Test 1 B Name SHOW YOUR WORK FOR FULL CREDIT! Problem Max. Points Your Points Total 75

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation

Statistics I Chapter 2: Univariate data analysis

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

Stat 101 Exam 1 Important Formulas and Concepts 1

6.1 Normal Distribution

Chapter 4.notebook. August 30, 2017

MATH4427 Notebook 4 Fall Semester 2017/2018

Stat Lecture Slides Exploring Numerical Data. Yibi Huang Department of Statistics University of Chicago

Section 3. Measures of Variation

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Determining the Spread of a Distribution Variance & Standard Deviation

2011 Pearson Education, Inc

Announcements. Lecture 1 - Data and Data Summaries. Data. Numerical Data. all variables. continuous discrete. Homework 1 - Out 1/15, due 1/22

Chapter 4. Displaying and Summarizing. Quantitative Data

CHAPTER 1. Introduction

AMS 5 NUMERICAL DESCRIPTIVE METHODS

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

Describing Data with Numerical Measures

ECLT 5810 Data Preprocessing. Prof. Wai Lam

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:

CHAPTER 1 Univariate data

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

CHAPTER 5: EXPLORING DATA DISTRIBUTIONS. Individuals are the objects described by a set of data. These individuals may be people, animals or things.

Transcription:

Describing Distributions With Numbers Chapter 12 May 1, 2013 What Do We Usually Summarize? Measures of Center. Percentiles. Measures of Spread. A Summary.

1.0 What Do We Usually Summarize? source: Prof. Morita in STAT 221 For one Quantitative Variable (a) Center Value The center of the distribution. ``typical value in a certain sense'' (b) Spread of distribution How variable are the values from one another? Measure how many/what proportion of values are above / below a given value.

2.0 Measures of Center Here are current G.P.A.s of 15 students from one section. 3.2 3.7 3.6 3.7 3.3 3.3 3.8 3.2 3.0 3.5 2.5 3.3 3.5 2.3 3.5

2.1 The Arithmetic Mean The mean x (x-bar) of a set of observations is their average. To find the mean of n observations, add the values and divide by n. x = sum of observations n For the G.P.A. data calculate the arithmetic mean of the G.P.A. scores. 3.2 + 3.7 + 3.6 + + 3.5 x =, 15 = 49.4 15, = 3.29. The mean has the same units as the data points.

2.2 Example Averages can summarize large quantities of data effectively.

2.3 Example The figure below shows age-specific average diastolic blood pressure for men age 20 and over in HANES (2003-2004). True or false? As men age, their diastolic BP increases until age 45 or so and then decreases. If false, how do you explain the pattern?

2.4 The Median The median is the mid-point of the distribution, the number such that (at least) half of the observations are at the median or bigger and (at least) half are at the median or smaller. For the G.P.A. data calculate the median of the G.P.A. scores: 1. Order the observations from smallest to largest: 2.3 2.5 3.0 3.2 3.2 3.3 3.3 3.3 3.5 3.5 3.5 3.6 3.7 3.7 3.8 2. Find the data point that has at least 7.5 observations above and below it. The median has the same units as the data points.

2.5 Means, Medians and Histograms List: 1, 2, 2, 3 50% median mean 0% 1 2 3 4 5 6 7 50% List: 1, 2, 2, 5 0% 1 2 3 4 5 6 7 50% List: 1, 2, 2, 7 0% 1 2 4 5 6 7

2.6 The Mode The mode is the most frequently occurring value in the data set. Here are G.P.A.s of 15 students from one section. 3.2 3.7 3.6 3.7 3.3 3.3 3.8 3.2 3.0 3.5 2.5 3.3 3.5 2.3 3.5 What is the mode? The mode has the same units as the data points.

3.0 Percentiles Definition The cth percentile of a distribution is defined so that (at least) c% of the observations are at or below it and (at least) (100-c)% of the observations are at or above it. Ex. SAT Score: If you scored in the 83rd percentile, this means? The median is the 50th percentile of a distribution. The 25th percentile of a distribution is called the lower quartile Q1. The 75th percentile of a distribution is called the upper quartile Q3.

3.1 Calculating Percentiles Back to the G.P.A. 2.3 2.5 3.0 3.2 3.2 3.3 3.3 3.3 3.5 3.5 3.5 3.6 3.7 3.7 3.8 Median = 3.3 (shown in box). Q1 =? Q3 =?

3.2 The Five Number Summary Definition The five number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. These five numbers offer a complete summary of a distribution. It is typically represented as a box-and-whisker plot.

3.3 The Box-and-Whisker Plot X A central box spans the quartiles. A line in the box marks the median. Sometimes the mean is marked by a cross. Lines extend from the box to the smallest and largest observation. Or they can extend to some other percentile (say 2.5th and 95th).

3.4 Learning from Box Plots source: W. Gray Number of Hurricanes in in wet and dry years in W. Africa 14 hurricanes 12 10 8 6 4 x x Compare the medians. Assess skewness. Assess spread of middle 50% of data. 2 0 dry west.africa wet Does the data support the hypothesis that wet years tend to have more hurricanes?

4.0 Measures of Spread Range = Maximum - Minimum. Inter-quartile range (I.Q.R.) = Q3 - Q1. The I.Q.R. is the range of the middle 50% of a distribution. Some people call a data point an outlier if it is more than 1.5 times I.Q.R below Q1 or above Q3. 1.5 I.Q.R. rule Standard Deviation (S.D.) Definition The standard deviation measures the average distance (or deviation) of the observations from their arithmetic mean.

4.1 Calculating Standard Deviations Find the S.D. for this list of numbers: 2, -6, 12, 4, 3. Step 1: Find the average for the list of numbers. The answer is 3. Step 2: Find the deviation of each value from this average: -1, -9, 9, 1, 0. Step 3: The S.D. tells the average size of a deviation. Step 3.1: Square each deviation: 1, 81, 81, 1, 0. square Step 3.2: Calculate the average of this list but dividing by (n 1) instead of n: The answer is 41. mean Step 3.3: Take the square-root of 41. The answer is 6.4. root The standard deviation is 6.4. has the same units as the list of numbers

4.2 Interpreting Standard Deviations The standard deviation (S.D.) says how far numbers on a list are from their average (or mean). A majority (about 50% or more) of entries will be somewhere around one S.D. from the average. Very few will be more than two or three S.D.s away. Majority of observations Ave- 1 S.D Ave Ave+ 1 S.D. Almost all the observations Ave-2 S.D.s Ave Ave+ 2 S.D.s

4.3 Guesstimating Standard Deviations Each of the following lists has an average of 50. For which one is the standard deviation the biggest? smallest? 1. 0, 20, 40, 50, 60, 80, 100. 2. 0, 48, 49, 50, 51, 52, 100. 3. 0, 1, 2, 50, 98, 99, 100.

4.4 Example Below are sketches of histograms for three lists of numbers. Match the sketch with the description that fits. (i) ave 3.5, S.D. 1 (ii) ave 3.5, S.D. 0.5 iii) ave 3.5, S.D. 2 (iv) ave 2.5, S.D. 1 (v) ave 2.5, S.D. 0.5 (vi) ave 4.5, S.D. 0.5 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 (a) (b) (c)

4.5 Example Household size in the U.S. has a mean of 2.5 people approximately. Which of these numbers would be a good guess for the standard deviation? 0.14, 1.4 or 2?

4.6 Example The Public Health Service found that for boys age 11 in HANES2, the average height was 146 cm and the SD was 8 cm. 1. One boy was 170cm tall. He was above average by SDs. 2. If a boy was within 2 SDs of average height, the shortest he could have been is cm and the tallest is cm. 3. Here are the heights of 3 boys: 150cm, 130 cm and 165cm. Match the heights with the descriptions. A description may be used twice. unusually short about average unusually tall

4.7 A Quick and Dirty Calculation Consider a list with only two different numbers, a big one and a small one. (Each number can be repeated many times). In this case, the S.D. can be estimated using: ( big number small number ) fraction with fraction with big number small number. Find the S.D. of the list of numbers: 1, -2, -2. Find the S.D. of the list of numbers: -1, -1, -1, 1. Can you use the short cut to calculate the standard deviation of the list: 1, 2, 3, 4?

4.8 A Summary To report the average and standard deviation, use the following language: In our data, the variable name here tends to be around average here, give or take standard deviation here. Advice Use means and standard deviations to summarize distributions that are roughly symmetric and with no outliers. Use the five number summary otherwise.