
Topic 2 - Descriptive Statistics
STAT 511, Professor Bruce Craig

Types of Information
Variables are classified as:
- Categorical (qualitative): the variable classifies an individual into one of several groups or categories.
  - Ordinal: letter grade, fitness level
  - Not ordinal: eye color, types of damage
- Quantitative: the variable takes on numerical values for which arithmetic operations make sense.
  - Continuous: height, serum creatinine level
  - Discrete: number of defects or successes
An ordinal variable can often be converted to a quantitative one, and vice versa.

Background Reading
Devore: Sections 1.2-1.4

Examples
Variables are denoted by upper-case letters (X) and observations by lower-case letters (x).

The number of accidents on Interstate 65 in the month of December was 10.
- Individual: Interstate 65
- Variable Y: the number of accidents in December
- Observation y: 10

The following table (WSJ, 1997) summarizes a week's beer advertising on TV.

Advertiser  | Show (Network)              | Date (Time)       | % Viewers < 21
Coors Light | Hit List (BET)              | Sept 2 (8:00 pm)  | 51
Molson      | Singled Out (MTV)           | Sept 2 (7:00 pm)  | 52
Molson Ice  | Beavis and Butthead (MTV)   | Sept 2 (11:30 pm) | 48
Foster's    | Singled Out (MTV)           | Sept 3 (11:00 pm) | 46
Molson      | Real World (MTV)            | Sept 3 (8:30 pm)  | 45
Foster's    | Melrose Place (E!)          | Sept 2 (7:00 pm)  | 41
Miller      | Unreal (BET)                | Sept 5 (8:00 pm)  | 65
Schlitz     | Yo MTV (MTV)                | Sept 5 (10:00 pm) | 50
Molson      | Beavis and Butthead (MTV)   | Sept 6 (10:30 pm) | 69
Budweiser   | Video Music Awards (MTV)    | Sept 7 (8:30 pm)  | 46

Describe the variables in these examples.

Sample and Sample Size
If a data set is collected under identical conditions, or can be considered to be drawn from a population, the data set is called a sample. The sample size is the number of individuals in the sample and is usually denoted by n.

Example: twenty-five rolls of a fair die
- Individual: a roll of the die
- Variable: the face value of the die
- Observation: 1, 2, 3, 4, 5, or 6
- Sample size: n = 25
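The die example can be mimicked in a few lines of Python. This is a minimal sketch that assumes the rolls are simulated with the standard-library `random` module; the helper name `roll_die_sample` and the seed value are illustrative choices, not part of the notes. Each list element is one observation x of the variable X (face value), and the list length is the sample size n.

```python
import random


def roll_die_sample(n: int, seed: int = 511) -> list[int]:
    """Simulate n rolls of a fair six-sided die (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [rng.randint(1, 6) for _ in range(n)]


# Variable X = face value of a roll; each element is one observation x.
sample = roll_die_sample(25)
n = len(sample)  # sample size

print(f"n = {n}")
print("observations:", sample)
```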

Frequency Distributions
To understand a data set, we must first be able to explore and summarize the information it contains. The frequency distribution of a variable describes the possible values of the variable and how frequently each value occurs as an observation. The distribution can be summarized in tabular or graphical form.

Graphical Summaries
- Provide a visual display of the distribution
- Allow one to examine the shape of the data
- Allow one to compare data sets
- Can be used to check the assumptions of statistical tests
- Are easier to read than a table or text summary

Graphs for a Categorical Variable
These display the categories and their frequencies (counts).
- Bar chart (vertical): the categories are listed along the horizontal axis, and each bar extends vertically to represent the count.
- Pareto chart: a bar chart with the categories ordered from most frequent to least frequent.
- Pie chart: each pie section (wedge) represents the count for a specific category.

Example: complete the following table and construct the graphs.

Type of Books Purchased at Bookstore
Type        | Count | Percent
Textbooks   | 1200  |
Non-Fiction | 100   |
Fiction     |       |
Children's  | 100   |
Total       | 2000  |
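Completing the bookstore table is simple arithmetic: the missing Fiction count is the total minus the known counts, and each percent is count / total × 100. The sketch below (plain Python, with illustrative variable names) also sorts the categories from most to least frequent, which is the ordering a Pareto chart would use; ties keep their original table order because Python's sort is stable.

```python
# Bookstore example: counts by type of book purchased (Fiction unknown).
counts = {"Textbooks": 1200, "Non-Fiction": 100, "Fiction": None, "Children's": 100}
total = 2000

# The missing count is whatever remains of the total.
known = sum(c for c in counts.values() if c is not None)
counts["Fiction"] = total - known

# Percent of all purchases in each category.
percents = {book_type: 100 * c / total for book_type, c in counts.items()}

# Pareto ordering: most frequent to least frequent (stable for ties).
pareto = sorted(counts.items(), key=lambda item: item[1], reverse=True)

for book_type, c in pareto:
    print(f"{book_type:<12}{c:>6}{percents[book_type]:>7.1f}%")
```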

Graphs for a Quantitative Variable
The observations could be grouped into ordered categories based on
- cut-points of interest, or
- the scale of the data.
The categories are often called classes, and the resulting graph is called a histogram. Other descriptive graphs include the dot plot and the stem-and-leaf diagram.

Constructing a Histogram
1. Specify non-overlapping classes.
   - Every observation should appear in a class.
   - Specify either the number of classes or the class width; classes are usually of equal width.
   - Range = largest obs - smallest obs
   - Number of classes ≈ Range / Class width, rounded up so that every observation falls in a class
   - Number of classes ≈ $\sqrt{n}$ (a common rule of thumb)
   - A class is specified by $[a, b)$, i.e., $a \le x < b$.
2. Count the number of observations in each class.

Types of Histograms
- Frequency: height of bar = number of occurrences in $[a, b)$
- Relative frequency or percent: height of bar = frequency / n
- Cumulative frequency: height of bar = number of occurrences $< b$
- Density: area of bar = relative frequency, so height of bar = relative frequency / class width. A density histogram is appropriate for unequal class widths and allows comparison with specific distributions.

Example
Collect the daily number of bike accidents requiring urgent care over a three-week period.

Week | Mon | Tue | Wed | Thu | Fri | Sat | Sun
1    | 4   | 3   | 2   | 5   | 6   | 8   | 2
2    | 4   | 1   | 5   | 3   | 2   | 7   | 4
3    | 5   | 2   | 3   | 3   | 1   | 4   | 7

Decide on a class width of 2 accidents. The smallest observation is 1 and the largest is 8, so (8 - 1)/2 = 3.5 and we need 4 classes.

Class  | Freq | Rel. Freq | Cum. Freq | Density
[1, 3) | 6    | .286      |           |
[3, 5) | 8    | .381      |           |
[5, 7) | 4    | .190      |           |
[7, 9) | 3    | .143      |           |
Total  | 21   | 1.000     |           |
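The remaining columns of the bike-accident table follow directly from the histogram definitions. This is a minimal Python sketch, assuming classes of the form [a, b) that start at the smallest observation; `frequency_table` is a hypothetical helper, and the number of classes is computed as floor(range / width) + 1 so that the largest observation always lands in a class (4 here, matching the notes).

```python
# Daily bike-accident counts (weeks 1-3, Mon-Sun) from the example table.
weeks = [
    [4, 3, 2, 5, 6, 8, 2],
    [4, 1, 5, 3, 2, 7, 4],
    [5, 2, 3, 3, 1, 4, 7],
]
data = [x for week in weeks for x in week]


def frequency_table(data, start, width, num_classes):
    """Return (a, b, freq, rel_freq, cum_freq, density) for classes [a, b)."""
    n = len(data)
    rows, cum = [], 0
    for k in range(num_classes):
        a = start + k * width
        b = a + width
        freq = sum(1 for x in data if a <= x < b)  # occurrences in [a, b)
        cum += freq                                # occurrences < b
        rel = freq / n
        rows.append((a, b, freq, rel, cum, rel / width))  # density = rel / width
    return rows


width = 2
num_classes = (max(data) - min(data)) // width + 1
table = frequency_table(data, min(data), width, num_classes)

print("Class   Freq  Rel.Freq  Cum.Freq  Density")
for a, b, freq, rel, cum, dens in table:
    print(f"[{a}, {b})  {freq:>4}  {rel:>8.3f}  {cum:>8}  {dens:>7.3f}")
```

This gives cumulative frequencies 6, 14, 18, 21 and densities of about .143, .190, .095, .071; multiplying each density by the class width recovers the relative frequencies, so the bar areas sum to 1.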

Stem-and-Leaf Display
A stem-and-leaf display is a histogram that retains the data values. Each observation is broken into a stem and a leaf, for example:
- stem = ten's digit and leaf = one's digit, or
- stem = one's digit and leaf = tenth's digit.
The stems can be broken down further if desired.

Bike accident example (each stem split into leaves 0-4 and leaves 5-9):
Stem | Leaves
0    | 11222233334444
0    | 5556778

From a stem-and-leaf display one can easily identify:
- a typical value
- gaps in the data
- the number and locations of peaks
- the presence of outlying values
- the extent of symmetry

Shape of a Graphical Summary
Describing shape is appropriate when the variable is ordered (quantitative).
- If there is only one mode or peak, the distribution is unimodal.
- If there are two peaks, it is bimodal.
- If it is similar on each side of the middle, it is termed symmetric.
- If one tail is stretched out more than the other, it is skewed.
[Figures: example distribution shapes]
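A split-stem display like the bike accident example can be produced with a short Python sketch. It assumes non-negative integer data with stem = ten's digit and leaf = one's digit, and splits each stem into a low line (leaves 0-4) and a high line (leaves 5-9); `split_stem_and_leaf` is a hypothetical helper name. Empty lines between the first and last stems are kept so that gaps in the data remain visible.

```python
# Bike accident data, ordered from smallest to largest.
data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 7, 8]


def split_stem_and_leaf(data):
    """Split-stem display for non-negative integers (stem = tens, leaf = ones)."""
    rows = {}
    for x in sorted(data):
        # x // 5 indexes half-stems: even -> leaves 0-4, odd -> leaves 5-9.
        rows.setdefault(x // 5, []).append(str(x % 10))
    lines = []
    for idx in range(min(rows), max(rows) + 1):  # include empty rows to show gaps
        stem = idx // 2  # equals x // 10 for every x in this row
        lines.append(f"{stem} | {''.join(rows.get(idx, []))}")
    return lines


for line in split_stem_and_leaf(data):
    print(line)
```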

Numerical Summaries
Numerical summaries describe characteristics of the distribution's shape. Each summary is known as a statistic. Each statistic is itself a variable: its observed value depends on the sample.
- Measures of center or location, e.g., the mean and median
- Measures of spread or dispersion, e.g., the standard deviation and range

Measures of Location
Mean: the arithmetic average, $\bar{x} = \sum_{i=1}^{n} x_i / n$. The mean is the center of gravity, or point of balance, of the data: $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.

Median: the middle observation.
- 50% of the observations are at or above the median.
- 50% of the observations are at or below the median.
- If n is odd, the median is the $0.5(n+1)$-th ordered observation; otherwise, it is the average of the $0.5n$-th and $(0.5n+1)$-th ordered observations.

Mode: the most frequent observation(s).

Quantiles/quartiles: divide the data into groups using percentiles. Quartiles divide the data into four equal parts.

Trimmed mean: remove a certain percentage of the smallest and largest observations, then compute the mean of the remaining observations. Decide on the percentage prior to inspecting the data.

Bicycle Accident Data
Ordered data set (smallest to largest), n = 21 observations:
1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 6 7 7 8

What are the measures of location?
- mean: $\bar{x} = (1 + 1 + 2 + \dots + 7 + 8)/21 = 81/21 = 3.86$
- median: since n is odd, the middle (11th) observation, 4
- mode: 2, 3, and 4 each occur 4 times
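The measures of location for the bicycle accident data can be checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). The `trimmed_mean` helper is hypothetical and assumes one common convention, dropping floor(p·n) observations from each end; other texts and packages round differently. The 10% trimming level is an illustrative choice.

```python
import math
import statistics

# Bicycle accident data, ordered from smallest to largest (n = 21).
data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 7, 8]


def trimmed_mean(data, proportion):
    """Mean after dropping floor(proportion * n) observations from each end."""
    xs = sorted(data)
    k = math.floor(proportion * len(xs))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


mean = statistics.mean(data)
median = statistics.median(data)     # 11th ordered observation since n is odd
modes = statistics.multimode(data)   # every value tied for most frequent
tmean = trimmed_mean(data, 0.10)     # drops 2 observations from each end

# Balance-point property: deviations from the mean sum to (numerically) zero.
dev_sum = sum(x - mean for x in data)

print(f"mean = {mean:.2f}, median = {median}, modes = {modes}")
print(f"10% trimmed mean = {tmean:.2f}, sum of deviations = {dev_sum:.1e}")
```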

Visualizing Numerical Statistics
If the distribution is symmetric, the median and mean are equal (approximately, for sample data). If the distribution is skewed, the mean and median differ, with the mean pulled further toward the longer tail.

Bicycle Accident Data: what if the largest observation were accidentally recorded as 18?
- mean: $\bar{x} = (1 + 1 + 2 + \dots + 7 + 18)/21 = 91/21 = 4.33$
- median: since n is odd, the middle (11th) observation, still 4
- mode: still 2, 3, and 4

Comparison of Measures of Center
- Resistance (insensitivity to changes in the data set): the mean is more sensitive to extreme observations than the median and the trimmed mean, so the median is more resistant than the mean.
- Efficiency (ability to use all the information): the mean is more efficient.

Measures of Spread
- Range: the difference between the largest and smallest observations, maximum - minimum.
- Interquartile range (IQR): the difference between the third and first quartiles; the spread of the middle 50% of the observations.
- Variance: the deviation of an observation is defined as $x_i - \bar{x}$. Because the mean of the deviations is always zero, we instead average the squared deviations, commonly dividing by $n - 1$ instead of $n$:
  $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1}$
- Standard deviation: the square root of the variance, $s = \sqrt{s^2}$; it is measured in the same units as the observations.
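The effect of the recording error on resistance can be reproduced directly. This short sketch uses only Python's `statistics` module; the list name `typo` is illustrative. Replacing the 8 with 18 raises the mean from 81/21 to 91/21 while the median stays at 4.

```python
import statistics

data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 7, 8]
typo = data[:-1] + [18]  # largest observation mistakenly recorded as 18

mean_orig, mean_typo = statistics.mean(data), statistics.mean(typo)
median_orig, median_typo = statistics.median(data), statistics.median(typo)

# The single bad value moves the mean but leaves the median unchanged.
print(f"original: mean = {mean_orig:.2f}, median = {median_orig}")
print(f"with 18:  mean = {mean_typo:.2f}, median = {median_typo}")
```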

Bicycle Accident Data
Ordered data set (smallest to largest), n = 21 observations:
1 1 2 2 2 2 3 3 3 3 4 4 4 4 5 5 5 6 7 7 8

What are the measures of spread?
- range = 8 - 1 = 7
- $Q_1$: position $0.25(21 + 1) = 5.5$, so $Q_1 = (2 + 2)/2 = 2$
- $Q_3$: position $0.75(21 + 1) = 16.5$, so $Q_3 = (5 + 5)/2 = 5$
- interquartile range = $Q_3 - Q_1 = 5 - 2 = 3$
- standard deviation, using the computational formula
  $s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n - 1}}$
  with $\sum x_i^2 = 1 + 1 + 4 + \dots + 64 = 391$:
  $s = \sqrt{(391 - 81^2/21)/20} = 1.98$

Another Example
Compute the following numerical summaries: mean, median, range, standard deviation, and IQR.

Life Expectancy in Specific Countries (source and year unknown)
Country     | Life Expectancy
Kenya       | 59
Japan       | 80
Singapore   | 76
Fiji        | 72
Germany     | 76
France      | 77
Switzerland | 78
Taiwan      | 74
Canada      | 78
U.S.        | 77
New Zealand | 76
Brunei      | 74
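The spread calculations can be sketched in Python under one stated assumption: quartiles use the $p(n+1)$ position rule with linear interpolation, as in the worked example. Statistical packages use several different quartile conventions, so their IQR can differ slightly. The helpers `quantile` and `spread_summary` are hypothetical names; the same function is applied to the life expectancy exercise data.

```python
import math


def quantile(sorted_xs, p):
    """p-th quantile using the (n + 1)p position rule with linear interpolation."""
    n = len(sorted_xs)
    pos = min(max(p * (n + 1), 1), n)  # 1-based position, clamped to the data
    lower = math.floor(pos)
    frac = pos - lower
    if lower == n:
        return float(sorted_xs[-1])
    return sorted_xs[lower - 1] + frac * (sorted_xs[lower] - sorted_xs[lower - 1])


def spread_summary(data):
    """Range, quartiles, IQR, and s (via the computational formula for S_xx)."""
    xs = sorted(data)
    n = len(xs)
    q1, q3 = quantile(xs, 0.25), quantile(xs, 0.75)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return {"range": xs[-1] - xs[0], "q1": q1, "q3": q3,
            "iqr": q3 - q1, "s": math.sqrt(sxx / (n - 1))}


bike = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 7, 8]
life = [59, 80, 76, 72, 76, 77, 78, 74, 78, 77, 76, 74]

bike_spread = spread_summary(bike)
life_spread = spread_summary(life)
print("bike:", bike_spread)
print("life:", life_spread)
```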

Box Plot
A box plot is a graphical summary of 5 statistics: minimum, first quartile, median, third quartile, and maximum.
- The box is defined by the quartiles (first and third).
- The median is represented as a line in the box.
- The whiskers extend to the minimum and maximum.

Modified Box Plot
- The lower whisker extends no further than $Q_1 - 1.5\,\text{IQR}$.
- The upper whisker extends no further than $Q_3 + 1.5\,\text{IQR}$.
- Observations outside these fences are displayed as dots.

Visualizing Statistics
- The range is roughly the width of the histogram.
- It is common to look at the number of observations within $\bar{x} \pm ks$. For a fairly symmetric, unimodal distribution, roughly
  - 68% of observations are within $\pm s$ of the mean,
  - 95% of observations are within $\pm 2s$ of the mean,
  - 99.7% (nearly all) of observations are within $\pm 3s$ of the mean.
[Figures: example distributions]

Comparison of Spread Statistics
- The range and standard deviation are very non-resistant.
- The interquartile range is resistant.
- The standard deviation is the most efficient.
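Both the modified-box-plot fences and the $\bar{x} \pm ks$ counts can be computed with Python's `statistics` module. This sketch relies on `statistics.quantiles`, whose default `method="exclusive"` places quartiles at positions $p(n+1)$ and so reproduces $Q_1 = 2$ and $Q_3 = 5$ for the bicycle data; the helper names are hypothetical. It also shows that the mistyped value 18 falls outside the upper fence.

```python
import statistics

data = [1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 7, 7, 8]
typo = data[:-1] + [18]  # largest observation mistakenly recorded as 18


def box_plot_fences(xs):
    """Return (lower fence, upper fence, points outside the fences)."""
    # Default method="exclusive" places quartiles at positions p(n + 1).
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outside = [x for x in xs if x < lower or x > upper]
    return lower, upper, outside


def share_within_k_sd(xs, k):
    """Fraction of observations within mean +/- k sample standard deviations."""
    mean, s = statistics.mean(xs), statistics.stdev(xs)
    return sum(1 for x in xs if abs(x - mean) <= k * s) / len(xs)


print("fences (original):", box_plot_fences(data))
print("fences (with 18): ", box_plot_fences(typo))
for k in (1, 2, 3):
    print(f"within {k}s: {share_within_k_sd(data, k):.1%}")
```

For these 21 observations the shares are about 71%, 95%, and 100%, which is reasonably close to the empirical rule for a small, slightly skewed sample.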

Measure of Spread Relative to Location: Coefficient of Variation
Spread often increases with the mean, for example:
- spending relative to income
- weight gain relative to initial weight
- response relative to dosage level

Coefficient of variation: $CV = s/\bar{x}$
- s and $\bar{x}$ are measured in the same units, so the CV is unit-less: a ratio of spread to center.
- It expresses the standard deviation as a percentage of the mean.
- It allows comparison of data sets with different means.

Linear Transformation of a Variable
Consider the linear transformation of X given by $aX + b$. How do the numerical and graphical summaries change?

Examples: Y = 0.5X and Z = X - 2

obs | x  | x - x̄ | (x - x̄)² | y  | y - ȳ | (y - ȳ)² | z  | z - z̄ | (z - z̄)²
1   | 12 | -4    | 16       | 6  | -2    | 4        | 10 | -4    | 16
2   | 14 | -2    | 4        | 7  | -1    | 1        | 12 | -2    | 4
3   | 18 | 2     | 4        | 9  | 1     | 1        | 16 | 2     | 4
4   | 20 | 4     | 16       | 10 | 2     | 4        | 18 | 4     | 16
Sum | 64 | 0     | 40       | 32 | 0     | 10       | 56 | 0     | 40

$\bar{x} = 16$, $\bar{y} = 8$, $\bar{z} = 14$
$s_x^2 = 40/3$, $s_y^2 = 10/3$, $s_z^2 = 40/3$ (sample variances, dividing by n - 1 = 3)

Changes?
Measures of location: the transform changes the statistic just as it changes the variable; for $Y = aX + b$, $\bar{y} = a\bar{x} + b$. Quartiles also change in a similar manner (for a > 0).
- Additive transformation ($X \to Z$): shifts each observation an equal distance to the left or right; the distance between observations remains the same.
- Multiplicative transformation ($X \to Y$): the distance between values increases or decreases, i.e., a change of scale.

Measures of spread measure the spread between observations, so only the multiplicative transform affects spread: $s_Y = |a|\,s_X$.
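The linear-transformation example and the coefficient of variation can be verified numerically. This is a minimal sketch using Python's `statistics` module; `summarize` is a hypothetical helper. It shows that $\bar{y} = 0.5\bar{x}$ and $s_y = 0.5\,s_x$, that the shift to Z leaves s unchanged, and that the CV is unaffected by pure rescaling but does change under a shift.

```python
import statistics


def summarize(xs):
    """Return (mean, sample variance, standard deviation, coefficient of variation)."""
    mean = statistics.mean(xs)
    var = statistics.variance(xs)  # divides by n - 1
    sd = var ** 0.5
    return mean, var, sd, sd / mean


x = [12, 14, 18, 20]
y = [0.5 * v for v in x]  # Y = 0.5X: multiplicative (change of scale)
z = [v - 2 for v in x]    # Z = X - 2: additive (shift)

for name, xs in (("x", x), ("y", y), ("z", z)):
    mean, var, sd, cv = summarize(xs)
    print(f"{name}: mean = {mean:g}, s^2 = {var:.3f}, s = {sd:.3f}, CV = {cv:.3f}")
```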