Measures of Central Tendency and their dispersion and applications. Acknowledgement: Dr Muslima Ejaz



LEARNING OBJECTIVES:
- Compute and distinguish between the uses of the measures of central tendency: mean, median and mode.
- Compute and list some uses for the measures of variation (dispersion): range, variance and standard deviation.
- Understand the distinction between the population mean and the sample mean.
- Learn the empirical rule and its application.

REFERENCES:
- Kuzma, Jan W., and Stephen E. Bohnenblust. Basic Statistics for the Health Sciences. Mayfield Publishing Company, 2001.
- Ott, Lyman. An Introduction to Statistical Methods and Data Analysis. PWS-Kent Publishing Company, 1988.

9/24/2013

- The average speed of a car crossing midtown Manhattan during the day is 5.3 miles/hr.
- The average time an American father of a 4-year-old spends alone with his child each day is 42 minutes.
- The average American man is 5 feet 9 inches tall, and the average woman is 5 feet 3.6 inches tall.
- The average American man is sick in bed seven days a year, missing 5 days of work.

Measures of Central Tendency (center of the distribution)
- Find a single score that is most typical or most representative of the entire group.
- Helpful in comparing groups.
- No single measure is representative in every situation, so there are three ways of determining central tendency:
  - Mean
  - Median
  - Mode

Mean
- Also called the arithmetic mean or average.
- The sum of all scores divided by the number of scores:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Sample Mean
- Add up all the observations in the data, then divide by the sample size (n).
- The sample size n is the number of observations.

Example: Mean
Systolic blood pressures (mmHg), n = 5:
X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95

Example: Mean

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

Mean systolic blood pressure: $\bar{X} = \frac{495}{5} = 99$ mmHg
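The calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are illustrative and do not come from the slides.

```python
# Systolic blood pressures (mmHg) from the worked example.
pressures = [120, 80, 90, 110, 95]


def sample_mean(values):
    """Return the sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


mean_bp = sample_mean(pressures)
print(mean_bp)  # 99.0
```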

Pros and Cons of the Mean
Pros:
- It is the mathematical center of a distribution: the deviations of the scores above it balance the deviations of the scores below it.
- It does not ignore any information.
Cons:
- It is influenced by extreme scores and skewed distributions.
- One data point can change the sample mean considerably.

Example
Systolic blood pressures (mmHg), n = 5:
X1 = 120, X2 = 180, X3 = 90, X4 = 110, X5 = 95
Mean systolic blood pressure: $\bar{X} = \frac{595}{5} = 119$ mmHg
Changing the single value X2 from 80 to 180 raises the mean from 99 to 119.

Population Versus Sample Mean
Population
- The entire group you want information about.
- For example: the blood pressure of all 18-year-old male medical college students at AKU.

Population Versus Sample Mean (cont.)
Sample
- A part of the population from which we actually collect information and draw conclusions about the whole population.
- For example: a sample of the blood pressures of five (n = 5) 18-year-old male college students at AKU.

Mean
Population mean (mu):

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

- $\sum X_i$ (sigma): the sum of X; add up all scores.
- N: the total number of scores in the population.

Sample mean (X bar):

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

- $\sum X_i$ (sigma): the sum of X; add up all scores.
- n: the total number of scores in the sample.

The Median
- The score that divides the distribution exactly in half when the observations are ordered.
- The 50th percentile (50%).
- Goal: determine the exact midpoint.
- With the scores arranged in rank order, the median is the middle score, at position (n + 1)/2.

Example: Median
Observations: 110, 90, 80, 95, 120
Ordered: 80, 90, 95, 110, 120
- The median is the middle value when the observations are ordered.
- To find the middle, count in (n + 1)/2 scores when the observations are ordered from lowest to highest.
- Median systolic BP: position (5 + 1)/2 = 3, so the median is the 3rd ordered value, 95 mmHg.

Finding the median with an even number of scores
- With an even number of scores, the median is the average of the two middle observations when the observations are ordered.
- Example: 80, 90, 95, 110, 120, 125 gives (95 + 110)/2 = 102.5

Example: Median
80, 90, 95, 110, 220
Median = 95: the extreme value 220 does not change the middle score.
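The three median examples (odd count, even count, and the data set with the extreme value 220) can be reproduced with a short sketch. The helper `median` below is written out by hand for illustration; the names are assumptions, not part of the slides.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # zero-based index of position (n + 1)/2 when n is odd
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([110, 90, 80, 95, 120])        # 95
even_case = median([80, 90, 95, 110, 120, 125])  # 102.5

# With the extreme value 220 the median stays at 95, while the mean rises to 119.
with_outlier = [80, 90, 95, 110, 220]
outlier_median = median(with_outlier)
outlier_mean = sum(with_outlier) / len(with_outlier)
```

Comparing `outlier_median` with `outlier_mean` shows why the median is described as not influenced by extreme scores.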

Pros and Cons of the Median
Pros:
- Not influenced by extreme scores or skewed distributions.
- Easier to compute than the mean.
Cons:
- Doesn't take the actual values into account. Because its value is determined solely by rank, it provides no information about any of the other values within the distribution.

The Mode
- The most frequently occurring score (the score with the highest frequency).
- Applicable to both qualitative and quantitative data.
- A distribution can be bimodal or multimodal.

Central Tendency Example: Mode
75, 76, 90, 90, 95, 99, 100, 120, 120, 135, 135, 155, 170, 186, 196, 205, 220
- Mode: the most frequent observation.
- Mode(s) for blood pressure: 90, 120, 135
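As a quick check of the multimodal example, the sketch below uses `statistics.multimode` from the Python standard library (available from Python 3.8), which returns every value tied for the highest frequency. The variable names are illustrative.

```python
import statistics

blood_pressures = [75, 76, 90, 90, 95, 99, 100, 120, 120,
                   135, 135, 155, 170, 186, 196, 205, 220]

# multimode returns all values that share the highest frequency,
# in the order they first appear in the data.
modes = statistics.multimode(blood_pressures)
print(modes)  # [90, 120, 135]
```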

Pros and Cons of the Mode
Pros:
- Easiest to compute and understand.
- The score comes from the data set.
Cons:
- Ignores most of the information in a distribution.
- Small samples may not have a mode.

Using different measures of central tendency
Two factors are important in deciding which measure of central tendency should be used:
- The scale of measurement (ordinal or numerical).
- The shape of the distribution of observations. A distribution can be symmetric, skewed to the right (positively skewed), or skewed to the left (negatively skewed).

Using different measures of central tendency
In a normal distribution, the mean, median, and mode are the same.
[Figure: normal curve f(x) against x, with the mean, median, and mode all located at µ]

The effect of skew on the average
In a skewed distribution, the mean is pulled toward the tail.

Using different measures of central tendency
The following guidelines help the researcher decide which measure is best for a given set of data:
- The mean is used for numerical data and for symmetric distributions.
[Figure: symmetric distribution; axes: Frequency against Values, from -4 to 4]

Using different measures of central tendency
The following guidelines help the researcher decide which measure is best for a given set of data:
- The median is used for ordinal data or for numerical data whose distribution is skewed.

Using different measures of central tendency
The following guidelines help the researcher decide which measure is best for a given set of data:
- The mode is used primarily for nominal or ordinal data, or for numerical data with a bimodal distribution.
[Figure: frequency distribution; axes: Frequency against Stress Rating]

Measures of Variation (or Measures of Dispersion)

Measures of Variability
- A single summary figure that describes the spread of observations within a distribution.
- Distributions can be centered at the same value on the horizontal axis but have substantially different amounts of variability.

Measures of Variability
Consider the following two data sets on the ages of all patients suffering from bladder cancer (BC) and prostatic cancer (PC).

BC: 47, 38, 35, 40, 36, 45, 39
PC: 70, 33, 18, 52, 27

The mean age of both groups is 40 years. If we do not know the ages of the individual patients and are told only that the mean age is the same in the two groups, we may assume that the patients in the two groups have a similar age distribution. However, the variation in the patients' ages is very different in the two groups: the ages of the prostatic cancer patients vary much more than the ages of the bladder cancer patients.
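The point of this slide can be illustrated numerically. The sketch below assumes the flattened table reads as seven bladder cancer ages and five prostatic cancer ages, which is the only split of the listed values under which both means equal exactly 40; the variable names are illustrative.

```python
# Assumed column assignment (reconstructed so that both means equal 40).
bladder_ages = [47, 38, 35, 40, 36, 45, 39]
prostate_ages = [70, 33, 18, 52, 27]


def mean(values):
    return sum(values) / len(values)


def value_range(values):
    """Difference between the maximum and minimum observations."""
    return max(values) - min(values)


bc_mean, pc_mean = mean(bladder_ages), mean(prostate_ages)
bc_range, pc_range = value_range(bladder_ages), value_range(prostate_ages)
print(bc_mean, pc_mean)    # 40.0 40.0
print(bc_range, pc_range)  # 12 52
```

The two groups share the same mean, but the range of the prostatic cancer ages is more than four times that of the bladder cancer ages.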

Measures of Variability
- Measure the spread in the data.
- Some important measures:
  - Range
  - Mean deviation
  - Variance
  - Standard deviation
  - Coefficient of variation

Variability
The purpose of the majority of medical, behavioural and social science research is to explain or account for variance or differences among individuals or groups.
Examples:
1. What factors account for the variance (or difference) in IQ among individuals?
2. What factors account for the variance in treatment compliance among different groups of patients?

Range
- The range tells us the span over which the data are distributed, and is only a very rough measure of variability.
- Range: the difference between the maximum and minimum scores.
- Example: for 80, 90, 95, 110, 120, Range = 120 - 80 = 40

Range
- The range is the simplest measure of dispersion.
- It depends entirely on the extreme scores and doesn't take into consideration the bulk of the observations.

Variation

   X    $X - \bar{X}$
   5     0.00
   5     0.00
   5     0.00
   5     0.00
   5     0.00

$\sum X = 25$, n = 5, $\bar{X} = 5$
This is an example of data with no (i.e., zero) variability.

Variation

   X    $X - \bar{X}$
   6    +1.00
   4    -1.00
   6    +1.00
   5     0.00
   4    -1.00

$\sum X = 25$, n = 5, $\bar{X} = 5$
This is an example of data with low variability.

Variation

   X    $X - \bar{X}$
   8    +3.00
   1    -4.00
   9    +4.00
   5     0.00
   2    -3.00

$\sum X = 25$, n = 5, $\bar{X} = 5$
This is an example of data with high variability.

Mean deviation
The best measures of dispersion should:
- take into account all the scores in the distribution, and
- describe the average deviation of all observations from the mean.
Normally, to find an average deviation we would sum all the deviations from the mean and then divide by n. Because the signed deviations always sum to zero, the absolute deviations are used:

$$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{n}$$

Mean Deviation

   X    $|X - \bar{X}|$
   3    |3 - 5.50| = 2.50
   5    |5 - 5.50| = 0.50
   9    |9 - 5.50| = 3.50
   2    |2 - 5.50| = 3.50
   8    |8 - 5.50| = 2.50
   6    |6 - 5.50| = 0.50

n = 6; $\sum X = 33$; $\bar{X} = \sum X / n = 33/6 = 5.50$
$\sum |X - \bar{X}| = 13$
Mean Deviation = 13/6 = 2.167
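The mean deviation in this example can be reproduced with the sketch below. It takes the absolute value of each deviation before averaging, which is what the tabulated distances (2.50, 0.50, 3.50, ...) imply; the names are illustrative.

```python
scores = [3, 5, 9, 2, 8, 6]

mean = sum(scores) / len(scores)                     # 33 / 6 = 5.5
absolute_deviations = [abs(x - mean) for x in scores]
sum_abs_dev = sum(absolute_deviations)               # 13.0
mean_deviation = sum_abs_dev / len(scores)

print(round(mean_deviation, 3))  # 2.167
```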

Variance & Standard Deviation
- The signed deviations from the mean always sum to zero. However, if we square each of the deviations from the mean, we obtain a sum that is not equal to zero (unless all the scores are identical).
- This is the basis for the variance and the standard deviation, the two most common measures of variability (or dispersion) of data.

Variance & Standard Deviation (cont.)

   X    $X - \bar{X}$    $(X - \bar{X})^2$
   8    +3.00             9.00
   1    -4.00            16.00
   9    +4.00            16.00
   5     0.00             0.00
   2    -3.00             9.00

$\sum X = 25$, $\sum (X - \bar{X}) = 0.00$, $\sum (X - \bar{X})^2 = 50.00$
Note: $\sum (X - \bar{X})^2$ is called the Sum of Squares.

Steps to calculate Variance
1. Compute the mean.
2. Subtract the mean from each observation.
3. Square each of the deviations.
4. Find the sum of the squares.
5. Divide the sum by N to get the population variance (for a sample variance, divide by n - 1).
6. Take the square root of the variance to get the standard deviation.
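The steps above can be sketched in Python as follows, using the five scores from the sum-of-squares table (8, 1, 9, 5, 2). This is an illustrative walk-through that divides by N, so it treats the scores as a whole population; the variable names are assumptions.

```python
import math

scores = [8, 1, 9, 5, 2]

mean = sum(scores) / len(scores)                     # step 1: compute the mean
deviations = [x - mean for x in scores]              # step 2: subtract the mean
squared_deviations = [d ** 2 for d in deviations]    # step 3: square each deviation
sum_of_squares = sum(squared_deviations)             # step 4: sum the squares
population_variance = sum_of_squares / len(scores)   # step 5: divide by N
population_sd = math.sqrt(population_variance)       # step 6: take the square root

print(sum_of_squares, population_variance)  # 50.0 10.0
print(round(population_sd, 3))              # 3.162
```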

Few Facts
- The square root of the variance gives the standard deviation (SD); conversely, the square of the SD gives the variance.
- The variance is the average of the squared distances of the values from the mean.
- Why the squared distances and not the actual ones? The sum of the signed distances from the mean is always zero; squaring each distance eliminates the negative signs.
- Why take the square root? Since the distances were squared, the units of the variance are the squares of the units of the original raw data. Taking the square root of the variance puts the SD in the same units as the raw data, i.e., the standard deviation expresses variability in the same units as the data.

Sample Variance
The sum of squared deviations from the mean divided by n - 1 (an estimate of the population variance):

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Variance of a Population
The sum of squared deviations from the mean divided by the number of scores (sigma squared):

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

Standard Deviation Formulas
Population standard deviation:

$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Sample standard deviation:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

A standard deviation computed from a sample with n in the denominator usually underestimates the population standard deviation. Using n - 1 in the denominator corrects for this and gives a better estimate of the population standard deviation.
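To see the effect of the two denominators, the sketch below compares `statistics.pstdev` (divides by N) with `statistics.stdev` (divides by n - 1) from the Python standard library. The five scores are borrowed from the earlier sum-of-squares table purely for illustration.

```python
import statistics

scores = [8.0, 1.0, 9.0, 5.0, 2.0]  # sum of squared deviations = 50

population_sd = statistics.pstdev(scores)  # sqrt(50 / 5)
sample_sd = statistics.stdev(scores)       # sqrt(50 / 4)

print(round(population_sd, 3), round(sample_sd, 3))  # 3.162 3.536
```

Dividing by n - 1 always gives the larger of the two values, and the gap shrinks as the sample size grows.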

Sometimes it is of interest to compare the degree of variability in the distribution of a factor between two different populations, or of two different variables within the same population. For example:
- SBP (the factor) among children and among adults (two different populations).
- Among adults, whether the distribution of SBP has more spread than that of DBP (two variables in the same population).

Coefficient of variation
- Expresses the SD as a percentage of the mean.
- It is a dimensionless measure of relative variation.
- Constructed by dividing the standard deviation by the mean and multiplying by 100: CV = (SD / mean) × 100
- It depicts the size of the standard deviation relative to its mean.
- Used to compare the variability in one data set with that in another when a direct comparison of standard deviations is not appropriate.

Coefficient of variation
The formula is: $CV = (s / \bar{X}) \times 100$
Suppose two samples of human males yield the following results:

              Mean age   Mean wt   SD       CV
   Adults     25 yrs     145 lbs   10 lbs    6.9%
   Children   11 yrs      80 lbs   10 lbs   12.5%
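The two CV values in the table can be verified with a small sketch; the function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return sd / mean * 100


cv_adults = coefficient_of_variation(10, 145)   # weights in lbs
cv_children = coefficient_of_variation(10, 80)

print(round(cv_adults, 1), round(cv_children, 1))  # 6.9 12.5
```

Both samples have the same SD of 10 lbs, yet relative to their means the children's weights are nearly twice as variable as the adults'.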

Using different measures of dispersion
The following guidelines help investigators decide which measure of dispersion is most appropriate for a given set of data:
- The standard deviation is used when the mean is used, i.e., with symmetric distributions of numerical data.
- The range is used with numerical data when the purpose is to emphasize extreme values.
- The coefficient of variation is used when the intent is to compare two numerical distributions measured on different scales.

Empirical Rule
- Specifies the proportion of the spread in terms of the standard deviation.
- It applies to the normal (symmetric, bell-shaped) distribution.
- Approximately 68% of the data values will fall within 1 SD of the mean.
- Approximately 95% of the data values will fall within 2 SD of the mean.
- Approximately 99.7% of the data values will fall within 3 SD of the mean.

Empirical Rule
[Figure: approximate percentage of area within given standard deviations of the mean: 68%, 95%, 99.7%]
Assumes the distribution of the underlying variable is symmetric and bell shaped (Normal).

Example
Scores on a National Achievement Exam have a mean of 480 and an SD of 90. If these scores are normally distributed, then:
- approximately 68% will fall between 390 and 570
- approximately 95% will fall between 300 and 660
- approximately 99.7% will fall between 210 and 750
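The three intervals in this example follow directly from mean ± k × SD for k = 1, 2, 3. A minimal sketch, with an illustrative helper name `empirical_intervals`:

```python
def empirical_intervals(mean, sd):
    """Return the mean ± 1, 2 and 3 SD intervals keyed by approximate coverage."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {label: (mean - k * sd, mean + k * sd) for k, label in coverage.items()}


intervals = empirical_intervals(480, 90)
for label, (low, high) in intervals.items():
    print(f"approximately {label} of scores fall between {low} and {high}")
```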

Application of the Empirical Rule
Women participating in a three-day experimental diet regime have been demonstrated to have normally distributed weight loss, with a mean of 600 g and a standard deviation of 200 g.
a) What percentage of these women will have a weight loss between 400 and 800 g?
b) What percentage of women will lose weight too quickly on the diet (where too much weight loss is defined as > 1000 g)?

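a) X ~ Normal(mean 600, SD 200): about 68% of the women lose between 400 and 800 g (within 1 SD of the mean).
[Figure: normal curve for X, horizontal axis from 0 to 1200 g, area labelled ~68%]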

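b) X ~ Normal(mean 600, SD 200): about 2.3% of the women lose more than 1000 g (more than 2 SD above the mean). The empirical rule gives (100% - 95%)/2 = 2.5%; the exact normal tail area is about 2.3%.
[Figure: normal curve for X, horizontal axis from 0 to 1200 g, area labelled 2.3%]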
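Both answers can be cross-checked against the exact normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+) and assumes, as the problem states, a mean of 600 g and an SD of 200 g; the variable names are illustrative.

```python
from statistics import NormalDist

weight_loss = NormalDist(mu=600, sigma=200)

# a) proportion losing between 400 and 800 g (within 1 SD of the mean)
p_between = weight_loss.cdf(800) - weight_loss.cdf(400)

# b) proportion losing more than 1000 g (more than 2 SD above the mean)
p_too_fast = 1 - weight_loss.cdf(1000)

print(f"{p_between:.1%}")   # 68.3%
print(f"{p_too_fast:.1%}")  # 2.3%
```

The exact values are close to the 68% and 2.5% that the empirical rule gives, which is why the rule is a convenient approximation.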