STOR 155 Introductory Statistics. Lecture 4: Displaying Distributions with Numbers (II)


The University of North Carolina at Chapel Hill
STOR 155 Introductory Statistics
Lecture 4: Displaying Distributions with Numbers (II)
9/8/09

Numerical Summary for Distributions
Center: mean, median, mode
Spread: quartiles, IQR, five-number summary and boxplot; standard deviation

Examples: 2004 Two-Seater Cars
Highway mileages of the 21 two-seater cars:
13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32 66
Q1 = 18, Q3 = 28, IQR = Q3 - Q1 = 10
1.5 × IQR = 15, Q3 + 1.5 × IQR = 43, Q1 - 1.5 × IQR = 3
66 is a suspected outlier.
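The quartile and 1.5 × IQR arithmetic on this slide can be checked with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when n is odd; the slides do not spell out the rule, but this convention reproduces Q1 = 18 and Q3 = 28. The function name `quartiles` is illustrative only.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for odd n the overall median belongs to neither half.
    """
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[-half:])


mileages = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23, 23,
            24, 25, 25, 26, 28, 28, 28, 29, 32, 66]

q1, q3 = quartiles(mileages)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations beyond either fence are flagged as suspected outliers.
suspected = [x for x in mileages if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)               # 18.0 28.0 10.0
print(lower_fence, upper_fence)  # 3.0 43.0
print(suspected)                 # [66]
```

Other software may use a different quartile convention and give slightly different Q1 and Q3 for the same data.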

The five-number summary
To get a quick summary of both center and spread, use the following five-number summary:
Minimum, Q1, M, Q3, Maximum

Example: Highway Gas Mileage of 2004 Two-seater/Minicompact Cars
Two-seater five-number summary: 13, 18, 23, 27, 32
Minicompact five-number summary: 19, 23, 26, 29, 32
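As a rough check, the two-seater summary can be recomputed in Python. This sketch assumes the summary was computed on the 20 gasoline-powered two-seaters, that is, the mileage list without the 66 mpg observation; that assumption reproduces the values on the slide. The minicompact data are not given in the lecture, so they are not recomputed. `five_number_summary` is an illustrative name.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles are the medians of the lower and upper halves,
    excluding the overall median when n is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[-half:]), data[-1])


# Assumed subset: the 21 two-seaters without the 66 mpg car.
gasoline_two_seaters = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23,
                        23, 24, 25, 25, 26, 28, 28, 28, 29, 32]

print(five_number_summary(gasoline_two_seaters))
# (13, 18.0, 23.0, 27.0, 32)
```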

Boxplots
A boxplot is a visual representation of the five-number summary. It consists of:
A central box that spans the quartiles Q1 and Q3.
A line inside the box that marks the median M.
Lines that extend from the box out to the smallest and largest observations.

[Figure: boxplots of highway/city gas mileages (two-seaters/minicompacts)]

Pros and cons of Boxplots
The location of the median line in the box indicates symmetry or asymmetry.
Best used for side-by-side comparison of more than one distribution at a glance.
Less detailed than histograms or stemplots.
The box focuses attention on the central half of the data.

[Figure: income for different education levels]

Modified Boxplot
The basic boxplot cannot reveal possible outliers. To modify it:
The two lines extend out from the central box only to the smallest and largest observations that are not suspected outliers.
Observations more than 1.5 × IQR outside the box are plotted as individual points.
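The pieces a modified boxplot draws can be sketched in Python as follows, using the 21 two-seater highway mileages from earlier in the lecture. The function name and dictionary keys are hypothetical, and the quartile convention (medians of the two halves, excluding the overall median for odd n) is an assumption.

```python
from statistics import median


def modified_boxplot_parts(values, k=1.5):
    """Compute the box, whisker ends, and outlier points of a modified boxplot."""
    data = sorted(values)
    half = len(data) // 2
    q1, q3 = median(data[:half]), median(data[-half:])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    # Whiskers stop at the most extreme observations that are not outliers.
    inside = [x for x in data if low <= x <= high]
    # Points more than k * IQR outside the box are drawn individually.
    outliers = [x for x in data if x < low or x > high]

    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median(data),
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


mileages = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23, 23,
            24, 25, 25, 26, 28, 28, 28, 29, 32, 66]
parts = modified_boxplot_parts(mileages)
print(parts["whisker_low"], parts["whisker_high"], parts["outliers"])
# 13 32 [66]
```

For these data the upper whisker ends at 32 rather than 66, and 66 is drawn as a separate point.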

[Figure: call length (seconds)]

[Figure: HG for count in a given time interval]

Sample Variance $s^2$
Deviation from the mean: the difference $x_i - \bar{x}$ between an observation and the sample mean.
Sample variance $s^2$: the average of the squares of the deviations of the observations from their mean,
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Sample Standard Deviation $s$
Sample standard deviation $s$: the square root of the sample variance,
$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Toy Examples
Data: -2, -1, 0, 1, 2
What are the sample variance and the standard deviation?
How about this? 40, 40, 40, 40, 40
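One way to work the toy examples is to code the definition directly. The sketch below divides the sum of squared deviations by n - 1, as in the definition of the sample variance; the function names are illustrative.

```python
from math import sqrt


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))


first = [-2, -1, 0, 1, 2]
second = [40, 40, 40, 40, 40]

print(sample_variance(first))       # 2.5
print(round(sample_sd(first), 2))   # 1.58
print(sample_variance(second))      # 0.0
print(sample_sd(second))            # 0.0
```

The second data set has no spread at all, so both its variance and its standard deviation are 0.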

Remarks on the definition of Standard Deviation (S.D.)
The sum of the deviations of the observations from their mean is always 0.
Why square the deviations rather than use absolute deviations?
The mean is a natural center under squaring.
S.D. is a natural measure of spread for the normal distributions.

Remarks on S.D.
Why S.D. rather than variance?
S.D. is natural for measuring spread for normal distributions.
S.D. is in the original scale.
Why n-1 rather than n?
Intuitively speaking, S.D. is not defined for n = 1.
The sum of the deviations is always 0, which means that if we know (n-1) of them, we know the last one.
Only (n-1) deviations can change freely.
n-1: degrees of freedom.
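The fact that the deviations always sum to zero follows from the definition of the sample mean. A short derivation, where $d_i = x_i - \bar{x}$ is shorthand introduced here (not in the slides) for the $i$-th deviation:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})
  &= \sum_{i=1}^{n} x_i - n\bar{x}
   = n\bar{x} - n\bar{x}
   = 0, \\
d_n &= -\,(d_1 + d_2 + \cdots + d_{n-1}).
\end{align*}
```

Once any n-1 deviations are known, the last one is fixed, which is the sense in which only n-1 of them can vary freely.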

Properties of the standard deviation (S.D.) s
s measures the spread about the mean.
s should be used only when the mean is chosen to measure the center.
s = 0 if and only if there is no spread. When?
s > 0 almost always, and it increases with more spread.
s, like the mean, is not resistant, i.e. it is sensitive to outliers.

Examples: 2004 Two-seater Cars
Highway mileages of the 21 two-seater cars:
13 15 16 16 17 19 20 22 23 23 23 24 25 25 26 28 28 28 29 32 66
Gasoline-powered cars: Mean = 22.6, S.D. = 5.3
All cars: Mean = 24.7, S.D. = 10.8
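These figures can be reproduced with Python's standard `statistics` module, whose `stdev` uses the n - 1 divisor. The sketch assumes the gasoline-powered cars are the 20 observations left after dropping the 66 mpg car; that assumption matches the mean and S.D. on the slide.

```python
from statistics import mean, stdev

all_cars = [13, 15, 16, 16, 17, 19, 20, 22, 23, 23, 23,
            24, 25, 25, 26, 28, 28, 28, 29, 32, 66]

# Assumed subset: every car except the 66 mpg outlier.
gasoline = [x for x in all_cars if x != 66]

gas_mean, gas_sd = round(mean(gasoline), 1), round(stdev(gasoline), 1)
all_mean, all_sd = round(mean(all_cars), 1), round(stdev(all_cars), 1)

print(gas_mean, gas_sd)  # 22.6 5.3
print(all_mean, all_sd)  # 24.7 10.8
```

A single extreme observation moves the mean by about 2 mpg and roughly doubles the standard deviation, which illustrates that neither measure is resistant.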

Three measures of spread
The range is the spread of all the observations.
The interquartile range is the spread of (roughly) the middle 50% of the observations.
S.D. is a measure of distance from the sample mean. It can be regarded as a typical distance of the observations from their mean.

The five-number summary vs Mean and S.D.
The five-number summary is preferred for a skewed distribution or a distribution with strong outliers.
$\bar{x}$ and $s$ are preferred for reasonably symmetric distributions that are free of outliers.
Always plot your data first. Use boxplots.

Changing the unit of measurement
The same variable can be recorded in different units of measurement.
Distance: miles (US) vs kilometers (elsewhere)
1 mile = 1.6 km, 1 km = ? mile
Temperature: Fahrenheit (US) vs Celsius (elsewhere)
0 °F = -17.8 °C, 100 °F = 37.8 °C, 212 °F = 100 °C

Boiled Billy
An Australian student, Billy, was recently on a trip to the States. Soon after he arrived, he caught a cold and had a fever. He went to see Doctor Z. Doctor Z measured his body temperature and told Billy, "Just relax! No big deal! It's only a little above 100 degrees!" "100!!!" Billy yelled. "How can you say it's not a big deal? I am boiled!"

Linear Transformation
A linear transformation changes the original variable $x$ into a new variable $x_{\text{new}}$ according to the equation
$$x_{\text{new}} = a + bx.$$
Temperature: Celsius vs Fahrenheit. With $x$ in Celsius and $x_{\text{new}}$ in Fahrenheit,
$$x_{\text{new}} = 32 + \frac{9}{5}x.$$
How about the inverse transformation?
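A minimal Python sketch of the Celsius-to-Fahrenheit transformation and its inverse. The inverse is obtained by solving $x_{\text{new}} = a + bx$ for $x$, which gives $x = (x_{\text{new}} - a)/b$; the function names are illustrative.

```python
def celsius_to_fahrenheit(c):
    """x_new = a + b * x with a = 32 and b = 9/5."""
    return c * 9 / 5 + 32


def fahrenheit_to_celsius(f):
    """Inverse transformation: x = (x_new - 32) * 5/9."""
    return (f - 32) * 5 / 9


print(celsius_to_fahrenheit(100))            # 212.0
print(round(fahrenheit_to_celsius(100), 1))  # 37.8
print(round(fahrenheit_to_celsius(0), 1))    # -17.8
```

The inverse is itself a linear transformation, with intercept -160/9 and slope 5/9. Billy's "100" is in Fahrenheit, about 37.8 °C.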

Effects of Linear Transformation
The shape of a distribution remains unchanged, except that the direction of the skewness might change. When?
Measures of center and spread change:
Multiplying each observation by a positive number b multiplies both measures of center and measures of spread by b.
Adding the same number a to each observation adds a to measures of center and to percentiles, but does not change measures of spread.

Example: Salary Raise
A sample was taken of the salaries of 20 employees of a large company. Suppose everyone receives a $3000 increase. How will the standard deviation of the salaries change? How about the mean? How about the median? How about Q1 and Q3?

What is the effect of $x_{\text{new}} = a + bx$?
mean of X_new = a + b × (mean of X)
median of X_new = a + b × (median of X)
SD of X_new = |b| × (SD of X)
Variance of X_new = b^2 × (Variance of X)
IQR of X_new = |b| × (IQR of X)
(For b > 0, |b| = b.)
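These rules can be checked numerically. The sketch below uses a small set of hypothetical salaries (made up for illustration, not from the lecture) and tests two cases: a = 3000, b = 1, which mirrors a flat $3000 raise, and a negative b, where spread is multiplied by the absolute value of b. The quartile convention in `iqr` is an assumption, and both function names are illustrative.

```python
from math import isclose
from statistics import mean, median, stdev, variance


def iqr(values):
    """Q3 - Q1, with quartiles taken as medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[-half:]) - median(data[:half])


def check_linear_rules(values, a, b):
    """Compare each summary of a + b*x with the value the rule predicts."""
    new = [a + b * x for x in values]
    return {
        "mean": isclose(mean(new), a + b * mean(values)),
        "median": isclose(median(new), a + b * median(values)),
        "sd": isclose(stdev(new), abs(b) * stdev(values)),
        "variance": isclose(variance(new), b ** 2 * variance(values)),
        "iqr": isclose(iqr(new), abs(b) * iqr(values)),
    }


# Hypothetical salaries in dollars, for illustration only.
salaries = [31000, 35500, 42000, 48000, 52500, 61000, 90000]

raise_check = check_linear_rules(salaries, a=3000, b=1)
flip_check = check_linear_rules(salaries, a=10, b=-2.5)
print(raise_check)
print(flip_check)
```

With b = 1, the mean, median, and quartiles shift by a, while the SD, variance, and IQR stay the same.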

Take Home Message
Boxplot, modified boxplot, side-by-side boxplot
Sample variance and sample standard deviation
Remarks on the definition of S.D.: Why squaring? Why S.D. instead of variance? Why n-1?
Properties of the sample standard deviation
Three measures of spread
Five-number summary vs mean and S.D.
Linear transformation and its effects on shape, center and spread