Descriptive Statistics

Descriptive Statistics: Summarizing a Single Variable
P. Hammett
Reference Material: Prob-stats-review.doc (see Sections 1 & 2)
Lecture Exercise: desc-stats.xls

Topics
I. Discrete and Continuous Measurements
II. Samples Versus Population
III. Types of Descriptive Statistics
   A. Location: Mean, Median, Mean Bias
   B. Dispersion: Range, Standard Deviation, Variance
IV. Computing Statistics Using Software

I. Discrete Vs. Continuous Variables
Discrete variables vary by whole units: # of students in a class, # of errors in a report, sum of rolling two dice.
Continuous variables vary to any degree, limited only by the precision of the measurement system: height of students in a class, length of an object, time to complete a task.
Precision of Measurement System Concept: continuous variables may always be broken down further with greater measurement precision. For example, a time could be recorded as 10 sec, 10.0 sec, 10.01 sec, or 10.008 sec.
Note: all variable measurements have units!

Attributes (Categorical Data) Vs. Variables
For attributes (e.g., defective / not defective), we typically use counts and percentages to communicate (e.g., 30% defective).
For discrete or continuous variables, we typically use descriptive statistics to communicate or summarize (e.g., the average time to process a loan is 41 days). Note: the average is a descriptive statistic.
In using descriptive statistics, we must recognize whether we are summarizing a population or a sample (of some sample size).

II. Samples Vs. Population
When describing a variable, we often collect a sample of data from a population.
Population: all items in a set (obtained via a census). Describe populations using parameters such as the population mean (µ) or the population standard deviation (σ).
Sample: a subset of the population. Estimate parameters using statistics, such as the sample mean (X-bar).
Example: suppose you close 8000 loans. You might measure a sample of 100 from among these 8000 (the population) to assess whether you are meeting your requirements for time to process a loan.

Population Example (all possible outputs are known)
What is the population for all possible combinations of the sum of rolling two dice?

Combination                          Sum  Frequency
(1,1)                                2    1
(1,2) (2,1)                          3    2
(1,3) (3,1) (2,2)                    4    3
(1,4) (4,1) (2,3) (3,2)              5    4
(1,5) (5,1) (2,4) (4,2) (3,3)        6    5
(1,6) (6,1) (2,5) (5,2) (3,4) (4,3)  7    6
(2,6) (6,2) (3,5) (5,3) (4,4)        8    5
(3,6) (6,3) (4,5) (5,4)              9    4
(4,6) (6,4) (5,5)                    10   3
(5,6) (6,5)                          11   2
(6,6)                                12   1
Total                                     36
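The population table for two dice can be reproduced by brute-force enumeration. The sketch below is illustrative only; the variable names are not from the lecture.

```python
from collections import Counter
from itertools import product

# Enumerate all 36 ordered outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes produce each sum (the population frequencies).
population = Counter(a + b for a, b in outcomes)

for total in sorted(population):
    print(total, population[total])
```

Because every outcome is listed, these counts describe the whole population rather than a sample.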

Understanding Samples and Populations
Suppose you roll two dice 10 times (10 samples) and observe the following sums. Is this sample representative of the population?
[Figure: histogram of Frequency vs. Sum of Rolling Pair of Dice (2 to 12), 10 rolls]

Understanding Samples and Populations
Suppose you roll 50 samples. Have you observed every possible value? Now, is this sample representative?
[Figure: histogram of Frequency vs. Sum of Rolling a Pair of Dice (2 to 12), 50 rolls]

Understanding Samples and Populations
As you increase the sample size, you will eventually obtain a representative sample of a population. The challenge is knowing how many samples are needed to be representative!
[Figure: histogram of Frequency vs. Sum of Rolling Pair of Dice (2 to 12), n = 3000 samples]
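The effect of sample size can be explored with a small simulation. This is a sketch under stated assumptions: the function names, the fixed seed, and the "largest gap" measure of representativeness are illustrative choices, not part of the lecture.

```python
import random
from collections import Counter

# Population proportion of each sum of two fair dice: (6 - |s - 7|) / 36.
POPULATION = {s: (6 - abs(s - 7)) / 36 for s in range(2, 13)}

def roll_sums(n, seed=0):
    """Simulate n rolls of a pair of dice and count each observed sum."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n))

def max_gap(n, seed=0):
    """Largest difference between sample and population proportions."""
    counts = roll_sums(n, seed)
    return max(abs(counts[s] / n - p) for s, p in POPULATION.items())

for n in (10, 50, 3000):
    print(n, round(max_gap(n), 3))
```

With small samples the gap is typically large and some sums may not appear at all; the gap generally shrinks as n grows.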

Population Example: Continuous Data (all possible combinations are unknown)
Usually, all possible combinations are not known. Suppose you monitor the time to complete orders (min), but you do not keep track of every order. Instead, you take samples.

Samples from Continuous Populations
Suppose you take a set of 3 samples from the population, with the following order times (min): 1219.1, 1220.1, 1220.5.

Samples from Continuous Populations
If you take another set of 3 samples from this population, you will likely get a different set of values.
[Figure: Sample Set 1 and Sample Set 2 plotted together; labeled values 1219.5, 1220.25, 1218.5]

Samples from Continuous Populations
As the total number of samples becomes large, the values will likely converge to a pattern (if the population does NOT change). This pattern is known as the underlying distribution.
[Figure: an underlying distribution that is a normal distribution]

Sample Size and Population Representation (Variable Data)
Determining the number of samples needed to identify the underlying distribution is an advanced skill and requires several assumptions.
For a normal distribution, confidence in estimating the distribution variance jumps significantly from ~10 to ~30 samples, begins to level off around ~100, and more than ~300 samples is usually unnecessary.
[Figure: lower 95% confidence limit of the variance (true variance = 1) vs. Sample Size, with N = 10, N = 30, N = 100, and N = 300 marked]

Key Sampling Concepts
You don't need to measure every observation to understand a population.
Knowledge of a population increases with the number of samples, BUT eventually the value of additional information diminishes.
The notion that we may understand populations by measuring only samples drives the field of statistics.

Fields of Statistics
Descriptive Statistics: summarize or describe important features of a data set without attempting to infer conclusions. Describe data samples using items such as X-bar (sample mean) and S (sample standard deviation). These statistics are used to estimate the population mean (µ) and the population standard deviation (σ).
Inferential Statistics: use a sample of data to draw conclusions (make inferences).
Example: suppose you compare order times from 2 processes. Process A averages 12.10 min and Process B averages 12.22 min. We may use inferential statistics to assess whether the two processes have significantly different averages.

III. Descriptive Statistics
The most commonly used descriptive statistics measure either location or dispersion.
Location statistics. Examples: Mean, Median, Mean Bias.
Dispersion statistics. Examples: Range, Standard Deviation, Variance.

Location and Dispersion
Location ~ central tendency
Dispersion ~ spread of the distribution
Classic example to demonstrate these concepts: playing darts. A set of throws can be on or off location, with low or high dispersion.

Lecture Exercise: Identify On/Off Target and High/Low Dispersion for each
[Figure: four panels labeled A, B, C, and D]

Location and Dispersion
The four combinations:
High Dispersion, Off Location
High Dispersion, On Location
Low Dispersion, Off Location
Low Dispersion, On Location

Quality Problem Solving: A General Approach
Address problems in order of importance. Prioritize features that have a strong cause-effect relationship with customer satisfaction.
In addressing problems, typically first try to reduce variation, then shift the mean as necessary to meet end-customer needs: stabilize the process, then center the process as necessary.

Executing the Quality Problem Solving Approach
In solving quality problems, we need useful estimates of location (e.g., mean) and dispersion (e.g., variation).

A. Measures of Location: Mean, Median, Mean Bias

Mean
The mean (also known as the average) is a measure of the center of a distribution. The typical notation for the mean of a sample of data is X-bar; the Greek letter µ represents the mean of a population.
$\bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N}$
Example: suppose five students take a test and their scores are 70, 68, 71, 69, and 98. Mean = (70 + 68 + 71 + 69 + 98) / 5 = 75.2
Excel: =AVERAGE(array)
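As a cross-check outside Excel, the same mean can be computed in Python, both by hand and with the standard library's `statistics.mean`. The variable names are illustrative.

```python
from statistics import mean

scores = [70, 68, 71, 69, 98]

# By hand: sum of the observations divided by the number of observations.
by_hand = sum(scores) / len(scores)

print(by_hand, mean(scores))
```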

Median
The median (also known as the 50th percentile) is the middle observation in a data set. Rank the data set and select the middle value.
If there is an odd number of observations, the middle value is observation number [N + 1] / 2.
If there is an even number of observations, the middle value is interpolated as midway between observation numbers N / 2 and [N / 2] + 1.
Prior data values, ranked: 68, 69, 70, 71, and 98. The median is 70. If another student with a score of 60 were included, the new median would be 69.5, i.e., (69 + 70) / 2.
Excel: =MEDIAN(array)
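The ranking rule above can be written out directly. This is a minimal sketch; `median_by_rank` is a hypothetical helper name, and the result is compared against Python's built-in `statistics.median`.

```python
from statistics import median

def median_by_rank(values):
    """Median using the rank rule: sort, then pick or average the middle."""
    ranked = sorted(values)
    n = len(ranked)
    if n % 2 == 1:
        # Odd count: observation number (N + 1) / 2, converted to a 0-based index.
        return ranked[(n + 1) // 2 - 1]
    # Even count: midway between observation numbers N / 2 and N / 2 + 1.
    lower = ranked[n // 2 - 1]
    upper = ranked[n // 2]
    return (lower + upper) / 2

scores = [70, 68, 71, 69, 98]
print(median_by_rank(scores), median(scores))
print(median_by_rank(scores + [60]), median(scores + [60]))
```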

Mean Vs. Median
Which is a better measure of location for the following set of test scores? 68, 70, 69, 71, and 98
Mean = 75.2
Median = 70.0
Be careful with the mean if extreme values are present (e.g., the high score of 98!).

Mean Bias
Mean bias is the absolute deviation of the mean from a target or nominal value.
Mean Bias = |Mean - Target|
Example: if the average order time = 1219.7 min and the target = 1220 min, then Mean Bias = |1219.7 - 1220| = 0.3 min.
Note: mean bias is non-directional. For instance, in the above example, if the mean were 1220.3, the bias would also be 0.3 min.
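A short sketch of the mean bias calculation follows. The function name `mean_bias` is illustrative; results are rounded only to hide floating-point noise.

```python
def mean_bias(mean_value, target):
    """Absolute (non-directional) deviation of the mean from the target."""
    return abs(mean_value - target)

# Both a low mean and a high mean give the same bias against a 1220 target.
print(round(mean_bias(1219.7, 1220), 3))
print(round(mean_bias(1220.3, 1220), 3))
```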

B. Measures of Dispersion: Range, Standard Deviation, Variance

Range
The range is the maximum value in a data set minus the minimum value.
Example: Test scores: 70, 68, 71, 69, and 98. Range = 98 - 68 = 30.
Note: the range is often preferred over the standard deviation for small data sets (e.g., if the number of observations in a sample data set is < 10).

Standard Deviation
The standard deviation (StDev, sigma, S) measures the dispersion of the individual observations from the mean.
For a sample data set, the standard deviation is also referred to as the sample standard deviation or the root-mean-square ($S_{rms}$):
$S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
or
$S = \sqrt{\frac{n \sum_{i=1}^{n} X_i^2 - \left( \sum_{i=1}^{n} X_i \right)^2}{n (n - 1)}}$
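The sample standard deviation can be computed either from the squared deviations about the mean (divided by n - 1) or from an algebraically equivalent shortcut that uses only the sum of the values and the sum of their squares. The sketch below implements both and compares them with `statistics.stdev`; the function names are illustrative.

```python
import math
from statistics import stdev

def stdev_definition(xs):
    """Square root of the summed squared deviations divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

def stdev_computational(xs):
    """Shortcut form: sqrt((n * sum(x^2) - (sum x)^2) / (n * (n - 1)))."""
    n = len(xs)
    return math.sqrt((n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1)))

scores = [70, 68, 71, 69, 98]
print(stdev_definition(scores), stdev_computational(scores), stdev(scores))
```

The shortcut form can lose precision in floating point when the values are large relative to their spread, so the definition form is usually the safer choice in code.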

Standard Deviation (Sigma)
The standard deviation is very useful in describing the variation about the mean if the data are normally distributed.
[Figure: normal curve with the axis marked at -3σ, -2σ, -1σ, +1σ, +2σ, +3σ]
+/- 1σ = 68.27%
+/- 2σ = 95.45%
+/- 3σ = 99.73%
For a normally distributed variable, we expect 99.73% of all values to fall within +/- 3 standard deviations of the mean.
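The coverage percentages for a normal distribution can be checked with the error function, since the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). The helper name `coverage` is illustrative. Computed this way, the values round to 68.27%, 95.45%, and 99.73%; tables carried to fewer digits often quote 68.26% and 95.46%.

```python
import math

def coverage(k):
    """Probability that a normal variable lies within k std devs of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k} sigma: {coverage(k) * 100:.2f}%")
```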

Order Time Example (1000 Measurements)
If the mean = 1220 min and the standard deviation = 0.5 min, then 99.73% of all values are expected to fall between 1218.5 and 1221.5 min (+/- 3σ).

Effects of Extreme Values
Test scores: 70, 68, 71, 69, and 98. The sample standard deviation is 12.79.
Suppose you exclude the score of 98: the sample standard deviation is reduced to 1.3!
The standard deviation may be severely influenced by extreme values in a sample data set (note: they may not necessarily be outliers).
We may reduce the effects of individual observations by increasing the sample size.
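The influence of the single high score is easy to reproduce. This sketch uses `statistics.stdev`, which divides by n - 1; the variable names are illustrative.

```python
from statistics import stdev

scores = [70, 68, 71, 69, 98]
without_extreme = [s for s in scores if s != 98]

# Sample standard deviation with and without the extreme score.
print(round(stdev(scores), 2))
print(round(stdev(without_extreme), 2))
```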

Variance
Variance is the square of the standard deviation. It represents the average squared deviation of each observation from the sample mean (with n - 1 in the denominator).
$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Prior example: the standard deviation is 12.79 (12.7945 before rounding), so Variance = (12.7945)^2 ≈ 163.70.

Variance Additive Property
Variance is often used instead of standard deviation because of its additive properties when combining multiple sources of variation.
Suppose you have process times from two independent processes, Proc A and Proc B. For the overall process AB:
$\sigma_{AB}^2 = \sigma_A^2 + \sigma_B^2$
$\sigma_{AB} = \sigma_A + \sigma_B$ is NOT true!
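A small numeric illustration of the additive property, assuming the two processes are independent. The sigma values 3.0 and 4.0 are hypothetical numbers chosen for easy arithmetic, and `combined_sigma` is an illustrative helper name.

```python
import math

def combined_sigma(*sigmas):
    """Overall standard deviation of independent sources: add the variances."""
    return math.sqrt(sum(s ** 2 for s in sigmas))

sigma_a, sigma_b = 3.0, 4.0  # hypothetical process standard deviations

print(combined_sigma(sigma_a, sigma_b))  # variances add: sqrt(9 + 16)
print(sigma_a + sigma_b)                 # adding sigmas directly overstates it
```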

IV. Using Software to Calculate Descriptive Statistics
In practice, we rarely calculate statistics by hand. So, let us explore some useful Excel functions.
Count (N)   =COUNT(array)
Mean        =AVERAGE(array)
Median      =MEDIAN(array)
Std Dev     =STDEV(array)
Variance    =VAR(array)
Range       =MAX(array)-MIN(array)
Or, we may use QETools for calculations.
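For readers working outside Excel, the same six statistics can be computed with Python's standard library. This is a sketch, not part of the lecture materials; it uses the five test scores from the earlier slides. `statistics.stdev` and `statistics.variance` are the sample (n - 1) versions, matching Excel's STDEV and VAR.

```python
import statistics

scores = [68, 70, 69, 71, 98]

summary = {
    "Count (N)": len(scores),
    "Mean": statistics.mean(scores),
    "Median": statistics.median(scores),
    "Std Dev": statistics.stdev(scores),
    "Variance": statistics.variance(scores),
    "Range": max(scores) - min(scores),
}

for name, value in summary.items():
    print(f"{name:10s} {value:.2f}")
```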

Example Using Excel
Given our prior test scores in cells B2:B6, we can compute the mean (average) by using the formula =AVERAGE(B2:B6).

     A            B      C
1    Observation  Score
2    1            68
3    2            70
4    3            69
5    4            71
6    5            98
7    Average      75.2   =AVERAGE(B2:B6)

Lecture Exercise: Compute Descriptive Statistics
Given the Excel file desc-stats.xls, compute the following statistics for the test scores: Count (N), Mean, Median, Range, Std Dev, Variance.

Or, Use QETools
Excel file: desc-stats.xls
Sample results from QETools for Score:

Statistic  Sample
N          16
Mean       82.78
Median     83.50
StDev      9.17
Variance   84.07
Min        63.00
Max        95.00
Range      32.00