DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

Similar documents
Unit 2. Describing Data: Numerical

2011 Pearson Education, Inc

After completing this chapter, you should be able to:

Describing distributions with numbers

Section 3. Measures of Variation

Range The range is the simplest of the three measures and is defined now.

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

P8130: Biostatistical Methods I

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

Describing distributions with numbers

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Descriptive Univariate Statistics and Bivariate Correlation

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Statistics for Managers using Microsoft Excel 6 th Edition

3.1 Measure of Center

MgtOp 215 Chapter 3 Dr. Ahn

Measures of the Location of the Data

Chapter 3 Data Description

Chapter 3. Measuring data

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

Introduction to Statistics

Chapter 3. Data Description

Section 3.2 Measures of Central Tendency

Unit 2: Numerical Descriptive Measures

Chapter. Numerically Summarizing Data Pearson Prentice Hall. All rights reserved

Statistics I Chapter 2: Univariate data analysis

Chapter 1:Descriptive statistics

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Measures of Location. Measures of position are used to describe the relative location of an observation

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability

MATH 117 Statistical Methods for Management I Chapter Three

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

Chapter. Numerically Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

Statistics I Chapter 2: Univariate data analysis

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore

Topic-1 Describing Data with Numerical Measures

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

Chapter 2: Tools for Exploring Univariate Data

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

= n 1. n 1. Measures of Variability. Sample Variance. Range. Sample Standard Deviation ( ) 2. Chapter 2 Slides. Maurice Geraghty

Sampling, Frequency Distributions, and Graphs (12.1)

STAT 200 Chapter 1 Looking at Data - Distributions

The Normal Distribution. Chapter 6

Lecture 2 and Lecture 3

Measures of Dispersion

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

Chapter 4. Displaying and Summarizing. Quantitative Data

Chapter 5. Understanding and Comparing. Distributions

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

CHAPTER 1. Introduction

Further Mathematics 2018 CORE: Data analysis Chapter 2 Summarising numerical data

Exercises from Chapter 3, Section 1

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Chapter 4.notebook. August 30, 2017

Exam: practice test 1 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data

CIVL 7012/8012. Collection and Analysis of Information

Chapter 1. Looking at Data

Lecture Notes for BUSINESS STATISTICS - BMGT 571. Chapters 1 through 6. Professor Ahmadi, Ph.D. Department of Management

Elementary Statistics

Descriptive Data Summarization

TOPIC: Descriptive Statistics Single Variable

Vocabulary: Samples and Populations

Chapter 6 Group Activity - SOLUTIONS

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

Describing Data: Numerical Measures GOALS. Why a Numeric Approach? Chapter 3 Dr. Richard Jerz

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Stats Review Chapter 3. Mary Stangler Center for Academic Success Revised 8/16

1.3: Describing Quantitative Data with Numbers

STT 315 This lecture is based on Chapter 2 of the textbook.

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

1 Measures of the Center of a Distribution

Descriptive Statistics-I. Dr Mahmoud Alhussami

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

2.1 Measures of Location (P.9-11)

additionalmathematicsstatisticsadditi onalmathematicsstatisticsadditionalm athematicsstatisticsadditionalmathem aticsstatisticsadditionalmathematicsst

Chapter2 Description of samples and populations. 2.1 Introduction.

Describing Distributions with Numbers

1.3.1 Measuring Center: The Mean

FSA Algebra I End-of-Course Review Packet

Determining the Spread of a Distribution Variance & Standard Deviation

Math 14 Lecture Notes Ch Percentile

STA 218: Statistics for Management

1. Exploratory Data Analysis

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Chapter 1: Exploring Data

Chapter 6 The Standard Deviation as a Ruler and the Normal Model

Lecture 1: Descriptive Statistics

Guidelines for comparing boxplots

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

Summarising numerical data

Resistant Measure - A statistic that is not affected very much by extreme observations.

Lecture 2. Descriptive Statistics: Measures of Center

MATH 1150 Chapter 2 Notation and Terminology

ECLT 5810 Data Preprocessing. Prof. Wai Lam

equal to the of the. Sample variance: Population variance: **The sample variance is an unbiased estimator of the

Transcription:

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS Introduction to Business Statistics QM 120 Chapter 3 Spring 2008

Measures of central tendency for ungrouped data 2 Graphs are very helpful to describe the basic shape of a data distribution; a picture is worth a thousand words. There are limitations, however to the use of graphs. One way to overcome graph problems is to use numerical measures, which can be calculated for either a sample or a population. Numerical descriptive measures associated with a population of measurements are called parameters; those computed from sample measurements are called statistics. A measure of central tendency gives the center of a histogram or frequency distribution curve. The measures are: mean, median,and d mode.

Measures of central tendency for ungrouped data 3 Data that give information on each member of the population or sample individually are called ungrouped data, whereas grouped data are presented in the form of a frequency distribution table. Mean The mean (average) is the most frequently used measure of central tendency. The mean for ungrouped data is obtained by dividing idi the sum of all values by the number if values in the data set. Thus, x Mean for population : µ = N x Mean for sample: x = n

Measures of central tendency for ungrouped data 4 Example: The following table gives the 2002 total payrolls of five MLB teams. MLB team Anaheim Angels Atlanta Braves 2002 total payroll (millions of dollars) 62 93 New York Yankees 126 St. Louis Cardinals Tampa Bay Devil Rays 75 34 The Mean is a balancing point 34 62 75 93 126

Measures of central tendency for ungrouped data 5 Sometimes a data set may contain a few very small or a few very large values. Such values are called outliers or extreme values. We should be very cautious when using the mean. It may not always be the best measure of central tendency. Example: The following table lists the 2000 populations (in thousands) of five Pacific states. State Washington Population (thousands) 5894 Excluding California 5894 + 3421 + 627 + 1212 Mean = = 4 2788.5 Oregon 3421 Including California Alaska 627 An 5894+ 3421+ 627+ 1212+ 33,872 Hawaii outlier 1212 Mean = = 9005. 2 California 33,872 5

Measures of central tendency for ungrouped data 6 Weighted Mean Sometimes we may assign weight (importance) to each observation before we calculate the mean. A mean computed in this manner is refereed to as a weighted mean and it is given by x = W x i i W i where x i is the value of observation i and W i is weight for observation i.

Measures of central tendency for ungrouped data 7 Example: Consider the following sample of four purchases of one stock in the KSE. Find the average cost of the stock. Purchase Price Quantity 1.300 5,000 2.325 15,000 3.350 10,000 4.295 20,000

Measures of central tendency for ungrouped data 8 Median The median is the value of the middle term in a data set has been ranked in increasing order. n + 1 Median = Value of the 2 th term in a ranked data set The value.5(n + 1) indicates the position in the ordered data set. If n is even, we choose a value halfway between the two middle observations Example: Find the median for the following two sets of measurements 2, 9, 11, 5, 6 2, 9, 11, 5, 6, 27

Measures of central tendency for ungrouped data 9 Mode Themodeisthevaluethatoccurswiththehighest frequency inadataset. A data with each value occurring only once has no mode. A data set with only one value occurring with highest frequency has only one mode, it is said to be unimodal. A data set with two values that occurs with the same (highest) frequency has two modes, it is said to be bimodal. If more than two values in a data set occur with the same (highest) h frequency, it issaidid to be multimodal. l

Measures of central tendency for ungrouped data 10 Example: You are given 8 measurements: 3, 5, 4, 6, 12, 5, 6, 7. Find a) The mean. b) The median. c) The mode. Relationships among the mean, median, and Mode Symmetric histograms when Right skewed histograms when Left skewed histograms when Mean = Median = Mode Mean > Median > Mode Mean < Median < Mode

Measures of dispersion for ungrouped data 11 Range Data sets may have same center but look different because of the way the numbers are spread out from center. Example: Company 1: 47 38 35 40 36 45 39 Company 2: 70 33 18 52 27 Measure of variability can help us to create a mental picture of the spread of the data. The range for ungrouped data Range = Largest value Smallest value The range, like the mean, is highly influenced by outliers. The rangeisbased on two values only.

Measures of dispersion for ungrouped data 12 Variance and standard deviation The standard deviation is the most used measure of dispersion. It tells us how closely the values of a data set are clustered around the mean. In general, larger values of standard deviation indicate that values of that data set are spread over a relatively larger range around the mean and vice versa. 2 Population : σ = x 2 ( x ) ( 2 ) 2 x x 2 N N,and sample : s 2 = n 1 n 2 σ = σ and s = s 2 Standard deviation is always non negative negative

Measures of dispersion for ungrouped data 13 Example: Find the standard deviation of the data set in table. MLB team 2002 payroll (millions of dollars) Anaheim Angels 62 Atlanta Braves New York Yankees St. Louis Cardinals Tampa Bay Devil Rays 93 126 75 34

Measures of dispersion for ungrouped data 14 In some situations we may be interested in a descriptive statistics that indicates how large is the standard deviation compared to the mean. It is very useful when comparing two different samples with different means and standard deviations. It is given by: CV σ = 100 % µ

Mean, variance, and standard deviation for grouped data 15 Mean for grouped data Once we group the data, we no longer know the values of individual observations. Thus, we find an approximation for the sum of these values. mf Mean for population lti : µ = N mf Mean for sample: x = n Where m is the midpoint and f is the frequency of a class.

Mean, variance, and standard deviation for grouped data 16 Variance and standard deviation for grouped data 2 Population : σ = Sample : s 2 = m m 2 f ( ) 2 mf 2 f N N ( mf ) n 1 n 2 Where m is the midpoint and f is the frequency of a class. 2 σ = σ and s = s 2

Mean, variance, and standard deviation for grouped data 17 Example: The table below gives the frequency distribution of the daily commuting times (in minutes) from home to CBA for all 25 students in QMIS 120. Calculate the mean and the standard deviation of the daily commuting times. Daily commuting time (min) 0 to less than 10 10 to less than 20 9 20 to less than 30 30 to less than 40 40 to less than 50 Total f 4 6 4 2 25

Use of standard deviation 18 Z Scores We often are interested in the relative location of a data point x i with respect to the mean (how far or how close). The z scores (standardized value) can be used to find the relative location of a data point x i compared to the center of the data. It is given by Z i = x i x s The z score can be interpreted as the number of standard deviations x i is from the mean.

Use of standard deviation 19 Example: Find the z scores for the following data where the mean is 44 and the standard deviation is 8. Number of students in a class 46 54 42 46 32

Use of standard deviation 20 Chebyshevʹs Theorem Chebyshevʹs theorem allows you to understand how the value of a standard deviation can be applied to any data set. Chebyshevʹs s theorem: The fraction of any data set lying within k standard deviations of the mean is at least 1 1 k 2 where k = a number greater than 1. This theorem applies to all data sets, which include a sample or a population. Chebyshev s theorem is very conservative but it can be applied to a data set with any shape.

Use of standard deviation 21 k = 1 k = 2 k = 3

Use of standard deviation 22 Example: The average systolic blood pressure for 4000 women who were screened for high blood pressure was found to be 187 with standard deviation of 22. Using Chebyshev s theorem, find at least what percentage of women in this group have a systolic blood pressure between 143 and 231.

Use of standard deviation 23 Empirical Rule The empirical rule gives more precise information about a data set than the Chebyshevʹs Theorem, however it only applies to a data set that is bell shaped. Theorem: 1 68% of the observations lie within one standard deviation of the mean. 2 95% of the observations lie within ihi two standard d deviations i of the mean. 3 99.7% of the observations lie within three standard deviations of the mean.

Use of standard deviation 24 Example: The age distribution of a sample of 5000 persons is bell shaped with a mean of 40 years and a standard deviation of 12 years. Determine the approximate percentage of people who are16to64yearsold.

Use of standard deviation 25 Detecting outliers Sometimes a data set may have one or more observation that is unusually small or large value. This extreme value is called an outlier and cam be detected using the z score and the empirical rule for data with bell shape distribution. An experienced statistician may face the following situations and need to takeanaction ti Outlier A data value that was incorrectly recorded A data value that was incorrectly included Action Correct it before any further analysis Remove it before any further analysis A datavalue that t belongs to the data set and Keep it! correctly recorded

Measure of position 26 Quartiles and interquartile range A measure of position which determines the rank of a single value in relation to other values in a sample or population. Quartiles are three measures that divide a ranked data set into four equal parts

Measure of position 27 Calculating the quartiles The second quartile is the median of a data set. The first quartile, Q 1, is the value of the middle term among observations s that are less than the median. The third quartile, Q 3, is the value of the middle term among observations that are greater than the median. Interquartile range (IQR) is the difference between the third quartile and the first quartile for a data set. IQR = Interquartile range = Q 3 Q 1

Measure of position 28 Example: For the following data 75.3 82.2 85.8 88.7 94.1 102.1 79.0 97.1 104.2 119.3 81.3 77.1 Find The values of the three quartiles. The interquartile range. Where does the 104.2 fall in relation to these quartiles.

Measure of position 29 Percentile and percentile rank Percentiles are the summery measures that divide a ranked data set into 100 equal parts. The k th percentile is denoted by P k, where k is an integer in the range 1 to 99. Theapproximatevalueofthek th percentile is P k = Value of the kn 100 th term in a ranked data set

Measure of position 30 Example: For the following data 75.3 82.2 2 85.88 88.7 94.1 102.11 79.0 97.1 104.2 119.3 81.3 77.1 Find the value of the 40 th percentile. Solution:

Box-and-whisker plot 31 Box and whisker plot gives a graphic presentation of data using five measures: Q 1, Q 2, Q 3, smallest, and largest values*. Can help to visualize the center, the spread, and the skewness of a data set. Can help epin detecting ee goutliers. Very good tool of comparing more than a distribution. Detecting an outlier: Lower fence: Q 1 1.5(IQR) Upper fence: Q 3 + 1.5(IQR) If a data point is larger than the upper fence or smaller than the lower fence, it is considered to be an outlier.

Box-and-whisker plot 32 To construct a box plot Draw a horizontal line representing the scale of the measurements. Calculate the median, the upper and lower quartiles, and the IQR for the data set and mark them on the line. Form a box just above the line with the right and left ends at Q1 and Q3. Draw a vertical line through the box at the location of the median. Mark any outliers with an asterisk (*) ()on the graph. Extend horizontal lines called Whiskers from the ends of the box to the smallest and largest observation that are not outliers. * * Lower fence Q 1 Q 2 Q 3 Upper fence

Box-and-whisker plot 33 Example: Construct a box plot for the following data. 340 300 470 340 320 290 260 330

Measures of association between two variables 34 Sofarwehavestudiednumericalmethodstodescribedata with one variable. Often decision makers are interested in the relationship between two variables. To do so, we will use descriptive measure that is called covariance. Covariance assigns a numerical value to the linear relationship between two variables (see scatter diagram). It is given by ( x x)( y y) i i Sample covariance:sxy = n 1 x µ x Population covariance: σ xy = N ( )( y ) i i µ y

Measures of association between two variables 35 A big disadvantage of the covariance is that it depends on the units of measurement for x and y. For the same data set, we will have two different covariance values depending on the units (i.e. height in meters or centimeters will make a big difference). Pearson s correlation coefficient is a good remedy to that problem as it can go only from 1 to1. It is given by Population correlation coefficient : σ Sample correlation coefficient : r xy xy Sxy = S S x = σ xy σ σ y, x y

Measures of association between two variables 36 Example: A golfer is interested in investigating the relationship, if any, between driving distance and 18 hole score. Average Driving Distance (meters) Average 18 Hole Score 277.6 69 259.5 71 269.1 70 267.0 70 255.6 71 272.9 69

Measures of association between two variables 37 x y 277.6 259.55 269.1 267.0 255.6 272.9 69 71 70 70 71 69

Measures of association between two variables 38 Example: Find the covariance and Pearson s correlation coefficient for the following data Week Number of commercials (x) Sales in $ (y) 1 2 50 2 5 57 3 1 41 4 3 54 5 4 54 6 1 38 7 5 63 8 3 48 9 4 59 10 2 46