Chapter 1 Descriptive Statistics

MICHIGAN STATE UNIVERSITY
STT 351 SECTION 2, FALL 2008
LECTURE NOTES

Chapter 1 Descriptive Statistics

Nao Mimoto

Contents
1 Overview
2 Pictorial Methods in Descriptive Statistics
  2.1 Different Kinds of Plots
  2.2 How to draw Stem-and-leaf plot, dot plot and histogram
  2.3 Shapes of histogram
3 Measures of Location
  3.1 Mean and Median
  3.2 Quantiles, percentiles
4 Measures of Variability
  4.1 Sample Variance
  4.2 Five number summary and Boxplots

Lecture notes for Devore 7ed. Chapter 1

1 Overview

Population: our body of interest.
Sample: a subset of the population chosen in some random manner.
Data: a collection of facts, numbers, and measurements.
  Univariate, bivariate, and multivariate data.
  Discrete and continuous variables.
Inferential Statistics: generalizes the information gained from a sample to a population.
Descriptive Statistics: summarizes and describes important features of the data.
  Stem-and-leaf plot
  Dotplot
  Scatter plot
  Histograms
  Boxplots
  Mean
  Median
  Quantiles, percentiles, trimmed means
  Outlier
  Sample variance

2 Pictorial Methods in Descriptive Statistics

2.1 Different Kinds of Plots

(Example 1.2 from p.5) Material strength investigations. Flexural strength of high performance concrete (in megapascals)

5.9 7.2 7.3 6.3 8.1 6.8 7.0 7.6 6.8
6.5 7.0 6.3 7.9 9.0 8.2 8.7 7.8 9.7
7.4 7.7 9.7 7.8 7.7 11.6 11.3 11.8 10.7

Population:
Sample:
Data: Univariate, continuous variable.

Stem-and-leaf plot

 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368

[Figure: Dotplot]

[Figure: Scatter plot (Chapter 12)]

[Figure: Histogram]

[Figure: Box plot]

2.2 How to draw Stem-and-leaf plot, dot plot and histogram

Raw Data:

5.9 7.2 7.3 6.3 8.1 6.8 7.0 7.6 6.8
6.5 7.0 6.3 7.9 9.0 8.2 8.7 7.8 9.7
7.4 7.7 9.7 7.8 7.7 11.6 11.3 11.8 10.7

1. Sort the data.

Sorted Data:

5.9 6.3 6.3 6.5 6.8 6.8 7.0 7.0 7.2
7.3 7.4 7.6 7.7 7.7 7.8 7.8 7.9 8.1
8.2 8.7 9.0 9.7 9.7 10.7 11.3 11.6 11.8

2. Decide on class intervals (bin width), and group accordingly.

5.9
6.3 6.3 6.5 6.8 6.8
7.0 7.0 7.2 7.3 7.4 7.6 7.7 7.7 7.8 7.8 7.9
8.1 8.2 8.7
9.0 9.7 9.7
10.7
11.3 11.6 11.8

3. Format them.

 5 | 9
 6 | 33588
 7 | 00234677889
 8 | 127
 9 | 077
10 | 7
11 | 368

(The idea is the same for a dot plot or a histogram.)
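The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original notes: the function name `stem_and_leaf` is made up here, and it assumes data recorded to one decimal place so that the integer part is the stem and the first decimal is the leaf.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Return (stem, leaves) rows for data recorded to one decimal place.

    Hypothetical helper for illustration: stem = integer part, leaf = first decimal.
    """
    rows = defaultdict(list)
    for x in sorted(data):                      # step 1: sort the data
        stem, leaf = divmod(round(x * 10), 10)  # step 2: group by integer part
        rows[stem].append(leaf)
    # step 3: format each group as a string of leaves
    return [(stem, "".join(str(leaf) for leaf in leaves))
            for stem, leaves in sorted(rows.items())]

# Flexural strength data from Section 2.1
strength = [5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8,
            6.5, 7.0, 6.3, 7.9, 9.0, 8.2, 8.7, 7.8, 9.7,
            7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7]

for stem, leaves in stem_and_leaf(strength):
    print(f"{stem:>2} | {leaves}")
```

Working with `round(x * 10)` rather than subtracting the integer part avoids floating-point surprises such as 7.3 - 7 not being exactly 0.3.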

Example (Problem 1-13 on p.20) Tensile ultimate strength (ksi) of metallic aerospace vehicles

122.2 124.2 124.3 125.6 126.3 126.5 126.5 127.2 127.3 127.5
127.9 128.6 128.8 129.0 129.2 129.4 129.6 130.2 130.4 130.8
131.8 131.4 131.4 131.5 131.6 131.6 131.8 131.8 132.3 132.4
132.4 132.5 132.5 132.5 132.5 132.6 132.7 132.9 133.0 133.1
133.1 133.1 133.1 133.2 133.2 133.2 133.3 133.3 133.5 133.5
133.5 133.8 133.9 134.0 134.0 134.0 134.0 134.1 134.2 134.3
134.4 134.4 134.6 134.7 134.7 134.7 134.8 134.8 134.8 134.9
134.9 135.2 135.2 135.2 135.3 135.3 135.4 135.5 135.5 135.6
135.6 135.7 135.8 135.8 135.8 135.8 135.8 135.9 135.9 135.9
135.9 136.0 136.0 136.1 136.2 136.2 136.3 136.4 136.4 136.6
136.8 136.9 136.9 137.0 137.1 137.2 137.6 137.6 137.8 137.8
137.8 137.9 137.9 138.2 138.2 138.3 138.3 138.4 138.4 138.4
138.5 138.5 138.6 138.7 138.7 139.0 139.1 139.5 139.6 139.8
139.8 140.0 140.0 140.7 140.7 140.9 140.9 141.2 141.4 141.5
141.6 142.9 143.4 143.5 143.6 143.8 143.8 143.9 144.1 144.5
144.5 147.7 147.7

1. Sort the data: The data are already sorted.

2. Decide on bin width, group accordingly: Let's say we divide the range 122 to 148 into intervals of equal width 2. We have 13 intervals. To get the relative frequency, we divide each frequency by the total number of observations, 153.

$$\text{relative frequency} = \frac{\text{frequency}}{\text{total number of observations}}$$

class interval   frequency   relative frequency
122 - <124           1           0.0065
124 - <126           3           0.0196
126 - <128           7           0.0458
128 - <130           6           0.0392
130 - <132          11           0.0719
132 - <134          29           0.1895
134 - <136          36           0.2353
136 - <138          20           0.1307
138 - <140          20           0.1307
140 - <142           8           0.0523
142 - <144           7           0.0458
144 - <146           3           0.0196
146 - <148           2           0.0131
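The binning and relative-frequency step can be sketched as follows. To keep the listing short, the sketch uses the 27 flexural strength values from Section 2.1 with an assumed bin width of 1 starting at 5, rather than the 153 tensile strength values; the function name `frequency_table` and its parameters are made up for illustration.

```python
def frequency_table(data, start, width, n_bins):
    """Count observations in intervals [start + k*width, start + (k+1)*width).

    Hypothetical helper: returns rows of (lower, upper, frequency, relative frequency).
    """
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)   # index of the interval containing x
        counts[k] += 1
    total = len(data)
    return [(start + k * width, start + (k + 1) * width, c, c / total)
            for k, c in enumerate(counts)]

# Flexural strength data from Section 2.1
strength = [5.9, 7.2, 7.3, 6.3, 8.1, 6.8, 7.0, 7.6, 6.8,
            6.5, 7.0, 6.3, 7.9, 9.0, 8.2, 8.7, 7.8, 9.7,
            7.4, 7.7, 9.7, 7.8, 7.7, 11.6, 11.3, 11.8, 10.7]

table = frequency_table(strength, start=5, width=1, n_bins=7)
for lower, upper, freq, rel in table:
    print(f"{lower:>2} - <{upper:<2}  {freq:>3}  {rel:.4f}")
```

The frequencies agree with the leaf counts of the stem-and-leaf plot, and the relative frequencies sum to 1, which is a quick check worth doing on any hand-built table.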

3. Format them: You can draw your histogram using frequency or relative frequency. The two histograms below are the same except for the scale on the Y-axis.

[Figure: two histograms of the tensile strength data, one with frequency and one with relative frequency on the Y-axis]

4. If you use different class intervals: All three of these histograms are drawn using the same data. Note how the choice of class interval affects the shape of the histogram.

[Figure: three histograms of the same data drawn with different class intervals]

2.3 Shapes of histogram

There are names to describe the general shape of a histogram: unimodal, multimodal, symmetric, positively skewed, negatively skewed.

3 Measures of Location

3.1 Mean and Median

For a sample of size $n$, $x_1, x_2, x_3, \ldots, x_n$, we wish to represent the location of the data by one simple number. We can use the sample mean, which is just the average of the observations:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}.$$

Or we can use the sample median, which is the middle value of the ordered observations:

$$\tilde{x} = \begin{cases} \left(\frac{n+1}{2}\right)^{\text{th}} \text{ ordered observation} & \text{if } n \text{ is odd} \\[4pt] \text{average of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ and } \left(\frac{n}{2}+1\right)^{\text{th}} \text{ ordered observations} & \text{if } n \text{ is even} \end{cases}$$

That is, if $n = 9$, then $\tilde{x}$ is the 5th ordered observation. If $n = 10$, then the sample median is the average of the 5th and 6th ordered observations.

Why do we have both the mean and the median? One reason is that the mean is very sensitive to outliers. In other words, just one big number can change the mean by a lot. The median, on the other hand, is insensitive to outliers. Another reason is that the mean is sometimes not a good measure of the average or middle observation in the data. Below is an example of that.

Example (Problem 1-27 on p.24) Study on the life distribution of microdrills. Number of holes that a drill machines before it breaks.

11 14 20 23 31 36 39 44 47 50
59 61 65 67 68 71 74 76 78 79
81 84 85 89 91 93 96 99 101 104
105 105 112 118 123 136 139 141 148 158
161 168 184 206 248 263 289 322 388 513

So we have

$n = 50$
$\bar{x} = 119.26$
$\tilde{x}$ = average of the 25th and 26th ordered observations = $(91 + 93)/2 = 92$

1. The mean is sensitive to outliers

Imagine somebody typed 5013 instead of 513 by mistake. Now your mean is 209.26, but the median remains unchanged.

2. The mean is not always a typical value

In some cases, it may be somewhat misleading to use the mean as the one number that represents your data. According to our data, only 16 drills out of 50 drilled more than the mean of 119.26 holes. On the other hand, by definition, half of our sampled drills machined fewer than the median of 92 holes, and half of them machined more than 92.
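The numbers in the microdrill example can be checked with Python's standard `statistics` module. This is a small sketch added for illustration; the variable names are arbitrary.

```python
from statistics import mean, median

# Number of holes machined before breaking (Problem 1-27)
holes = [11, 14, 20, 23, 31, 36, 39, 44, 47, 50,
         59, 61, 65, 67, 68, 71, 74, 76, 78, 79,
         81, 84, 85, 89, 91, 93, 96, 99, 101, 104,
         105, 105, 112, 118, 123, 136, 139, 141, 148, 158,
         161, 168, 184, 206, 248, 263, 289, 322, 388, 513]

x_bar = mean(holes)        # sample mean
x_tilde = median(holes)    # average of the 25th and 26th ordered values

# Simulate the typing mistake: 5013 entered instead of 513
holes_typo = holes[:-1] + [5013]
x_bar_typo = mean(holes_typo)
x_tilde_typo = median(holes_typo)

# How many drills machined more holes than the mean?
above_mean = sum(1 for h in holes if h > x_bar)

print(x_bar, x_tilde)             # mean and median of the original data
print(x_bar_typo, x_tilde_typo)   # the mean shifts, the median does not
print(above_mean)                 # count of observations above the mean
```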

3.2 Quantiles, percentiles

The 1st quartile is the median of the smaller half of the data. Include the median in the half if $n$ is odd. The 1st quartile is also called the lower fourth, or the 25th percentile.

The 2nd quartile is the same as the median. The median is also called the 50th percentile.

The 3rd quartile is the median of the larger half of the data. Include the median in the half if $n$ is odd. The 3rd quartile is also called the upper fourth, or the 75th percentile.

Example

If the data look like

1 2 3 4 5 6 7 8 9 10 11 12,

with 12 observations, the median is 6.5. Now we break the data into two halves and get

smaller half: 1 2 3 4 5 6
larger half:  7 8 9 10 11 12

The 1st quartile is the median of the smaller half, which is 3.5. The 3rd quartile is the median of the larger half, which is 9.5.

Example

If the data look like

1 2 3 4 5 6 7 8 9 10 11 12 13,

with 13 observations, the median is 7. Since we have an odd number of observations, we include 7 in both the smaller half and the larger half.

smaller half: 1 2 3 4 5 6 7
larger half:  7 8 9 10 11 12 13

Then the 1st quartile is the median of the smaller half, which is 4. The 3rd quartile is the median of the larger half, which is 10.
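The median-of-halves rule can be written as a short function. This is a sketch of the rule as stated in these notes (the median is included in both halves when $n$ is odd); the function name `fourths` is chosen here for illustration, and statistical software often uses a different, interpolating definition of quartiles.

```python
from statistics import median

def fourths(data):
    """Return (lower fourth, upper fourth) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2          # size of each half; includes the median if n is odd
    smaller_half = xs[:half]
    larger_half = xs[n - half:]
    return median(smaller_half), median(larger_half)

print(fourths(range(1, 13)))   # 12 observations: 1, ..., 12
print(fourths(range(1, 14)))   # 13 observations: 1, ..., 13
```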

4 Measures of Variability

4.1 Sample Variance

Now we wish to represent the spread or variability of the data by a number. To do that we use the sample variance,

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{S_{xx}}{n-1}.$$

The numerator is called the sum of squared deviations,

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

Notice that we are dividing by $n - 1$ instead of $n$. The sample standard deviation is defined as

$$s = \sqrt{s^2}.$$

Example

 i    x_i    x_i - x̄    (x_i - x̄)^2
 1     87    -26.25       689.06
 2    103    -10.25       105.06
 3    130     16.75       280.56
 4    160     46.75      2185.56
 5    129     15.75       248.06
 6    105     -8.25        68.06
 7     99    -14.25       203.06
 8     93    -20.25       410.06

$\bar{x} = 113.25$, $\sum_{i=1}^{n} (x_i - \bar{x})^2 = 4189.5$

In this case $n = 8$. Therefore, the sample variance and sample standard deviation are

$$s^2 = \frac{4189.5}{8-1} = 598.5, \qquad s = \sqrt{598.5} = 24.464.$$

There is another formula for $S_{xx}$ that is easier to compute if you are using a hand-held calculator:

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}.$$

Example

 i    x_i    x_i^2
 1     87     7569
 2    103    10609
 3    130    16900
 4    160    25600
 5    129    16641
 6    105    11025
 7     99     9801
 8     93     8649

$\sum_{i=1}^{n} x_i = 906$, $\sum_{i=1}^{n} x_i^2 = 106794$

The sum of squared deviations can be calculated as

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 106794 - \frac{(906)^2}{8} = 4189.5.$$

Therefore, the sample variance and sample standard deviation are

$$s^2 = \frac{4189.5}{8-1} = 598.5, \qquad s = \sqrt{598.5} = 24.464.$$
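Both formulas for $S_{xx}$ can be checked numerically on the eight observations of the example. The sketch below is illustrative only; the variable names are arbitrary, and the last line compares against `statistics.variance`, which also divides by $n - 1$.

```python
from math import sqrt
from statistics import variance

x = [87, 103, 130, 160, 129, 105, 99, 93]
n = len(x)
x_bar = sum(x) / n

# Definition: sum of squared deviations from the mean
sxx_definition = sum((xi - x_bar) ** 2 for xi in x)

# Shortcut: sum of squares minus (sum)^2 / n
sxx_shortcut = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n

s2 = sxx_definition / (n - 1)   # sample variance
s = sqrt(s2)                    # sample standard deviation

print(x_bar, sxx_definition, sxx_shortcut)
print(s2, round(s, 3))
print(variance(x))              # library value, for comparison
```

The shortcut formula is convenient by hand, but in floating-point arithmetic it can lose precision when the mean is large relative to the spread, so the definitional form is usually the safer choice in code.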

4.2 Five number summary and Boxplots

A boxplot is another way to summarize data pictorially. A boxplot is drawn using the five number summary, which consists of the minimum observation, the lower fourth, the median, the upper fourth, and the maximum observation.

Example (Problem 1-54 on p.40) Shear strength (MPa) of a joint

4.4 16.4 22.2 30.0 33.1 36.6 40.4 66.7 73.7 81.5 109.9

There are 11 observations. The minimum is 4.4. The maximum is 109.9. The median is 36.6.

The lower fourth is the median of the smaller half, {4.4 16.4 22.2 30.0 33.1 36.6}, so it is (22.2 + 30.0)/2 = 26.1.

The upper fourth is the median of the larger half, {36.6 40.4 66.7 73.7 81.5 109.9}, so it is (66.7 + 73.7)/2 = 70.2.

So our five number summary looks like

Min    lower fourth    median    upper fourth    Max
4.4        26.1         36.6         70.2       109.9

The box width $f_x$ is defined as

$$f_x = \text{upper fourth} - \text{lower fourth}.$$

Now we can draw our boxplot using those five numbers.

[Figure: boxplot of the shear strength data]
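The five number summary for the shear strength data can be reproduced with a short sketch. The function name `five_number_summary` is made up for illustration, and the fourths are computed with the median-of-halves rule from Section 3.2 rather than with a library quantile function.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, lower fourth, median, upper fourth, max)."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2                  # the median joins both halves if n is odd
    lower_fourth = median(xs[:half])
    upper_fourth = median(xs[n - half:])
    return xs[0], lower_fourth, median(xs), upper_fourth, xs[-1]

shear = [4.4, 16.4, 22.2, 30.0, 33.1, 36.6, 40.4, 66.7, 73.7, 81.5, 109.9]

summary = five_number_summary(shear)
box_width = summary[3] - summary[1]      # upper fourth minus lower fourth

print([round(v, 1) for v in summary])
print(round(box_width, 1))
```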

Boxplot with outliers

An observation farther than 1.5 box widths from the closest fourth is an outlier. If it is more than 3 box widths from the nearest fourth, it is called an extreme outlier. Otherwise it is called a mild outlier.

Example (Ex. 1.14 on p.28)

2.0 2.4 2.5 2.6 2.7 2.7 2.8 3.0 3.1 3.2 3.3 3.3
3.4 3.4 3.6 3.6 3.6 3.7 4.4 4.6 4.7 4.8 5.3 10.1

We have 24 observations. The mean is 3.7.

Min      lower fourth    median    upper fourth    Max
2.000       2.775         3.350       3.875       10.100

The fourths in this table (2.775 and 3.875) are linearly interpolated quartiles; the median-of-halves rule from Section 3.2 gives 2.75 and 4.05 for the same data. Either way, 10.1 lies more than 3 box widths above the upper fourth, so it is an extreme outlier.
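The outlier rule can be sketched as a small classifier. This is an illustrative sketch, not the notes' own procedure: the function name `classify_outliers` is made up, and the fourths are computed with the median-of-halves rule from Section 3.2, which gives 2.75 and 4.05 for this data set (slightly different from interpolated quartiles).

```python
from statistics import median

def classify_outliers(data):
    """Return (mild, extreme) outliers using the 1.5 and 3 box-width rule."""
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2
    lower_fourth = median(xs[:half])
    upper_fourth = median(xs[n - half:])
    box_width = upper_fourth - lower_fourth
    mild, extreme = [], []
    for x in xs:
        # distance beyond the nearest fourth (negative if x lies inside the box)
        distance = max(lower_fourth - x, x - upper_fourth)
        if distance > 3 * box_width:
            extreme.append(x)
        elif distance > 1.5 * box_width:
            mild.append(x)
    return mild, extreme

data = [2.0, 2.4, 2.5, 2.6, 2.7, 2.7, 2.8, 3.0, 3.1, 3.2, 3.3, 3.3,
        3.4, 3.4, 3.6, 3.6, 3.6, 3.7, 4.4, 4.6, 4.7, 4.8, 5.3, 10.1]

mild, extreme = classify_outliers(data)
print("mild outliers:", mild)
print("extreme outliers:", extreme)
```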