Chapter 1:Descriptive statistics

Similar documents
P8130: Biostatistical Methods I

Chapter 3. Data Description

Chapter 3. Measuring data

CHAPTER 2: Describing Distributions with Numbers

CIVL 7012/8012. Collection and Analysis of Information

Describing Distributions

Chapter 1: Exploring Data

Unit 2. Describing Data: Numerical

Resistant Measure - A statistic that is not affected very much by extreme observations.

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

3.1 Measure of Center

Class 11 Maths Chapter 15. Statistics

1. Exploratory Data Analysis

STAT 200 Chapter 1 Looking at Data - Distributions

Example 2. Given the data below, complete the chart:

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

1.3.1 Measuring Center: The Mean

Preliminary Statistics course. Lecture 1: Descriptive Statistics

Chapter 7: Statistics Describing Data. Chapter 7: Statistics Describing Data 1 / 27

Describing Distributions With Numbers

Chapter 1. Looking at Data

Describing Distributions with Numbers

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

additionalmathematicsstatisticsadditi onalmathematicsstatisticsadditionalm athematicsstatisticsadditionalmathem aticsstatisticsadditionalmathematicsst

Descriptive Statistics

Chapter 1 - Lecture 3 Measures of Location

ST Presenting & Summarising Data Descriptive Statistics. Frequency Distribution, Histogram & Bar Chart

MATH 1150 Chapter 2 Notation and Terminology

Descriptive Univariate Statistics and Bivariate Correlation

Describing Distributions With Numbers Chapter 12

Statistics for Managers using Microsoft Excel 6 th Edition

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Instructor: Doug Ensley Course: MAT Applied Statistics - Ensley

= n 1. n 1. Measures of Variability. Sample Variance. Range. Sample Standard Deviation ( ) 2. Chapter 2 Slides. Maurice Geraghty

Descriptive Data Summarization

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

After completing this chapter, you should be able to:

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Unit 2: Numerical Descriptive Measures

Overview. INFOWO Statistics lecture S1: Descriptive statistics. Detailed Overview of the Statistics track. Definition

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Statistics I Chapter 2: Univariate data analysis

Probabilities and Statistics Probabilities and Statistics Probabilities and Statistics

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Descriptive Statistics-I. Dr Mahmoud Alhussami

Section 3. Measures of Variation

Describing distributions with numbers

2011 Pearson Education, Inc

Statistics I Chapter 2: Univariate data analysis

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data

Chapter 4. Displaying and Summarizing. Quantitative Data

THE ROYAL STATISTICAL SOCIETY 2002 EXAMINATIONS SOLUTIONS ORDINARY CERTIFICATE PAPER II

Let's Do It! What Type of Variable?

TOPIC: Descriptive Statistics Single Variable

1 Measures of the Center of a Distribution

CHAPTER 4 VARIABILITY ANALYSES. Chapter 3 introduced the mode, median, and mean as tools for summarizing the

Statistics Add Ins.notebook. November 22, Add ins

STATISTICS. 1. Measures of Central Tendency

What is statistics? Statistics is the science of: Collecting information. Organizing and summarizing the information collected

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:

Sets and Set notation. Algebra 2 Unit 8 Notes

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Let's Do It! What Type of Variable?

Data: the pieces of information that have been observed and recorded, from an experiment or a survey

Variables, distributions, and samples (cont.) Phil 12: Logic and Decision Making Fall 2010 UC San Diego 10/18/2010

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Chapter (3) Describing Data Numerical Measures Examples

ECLT 5810 Data Preprocessing. Prof. Wai Lam

Overview of Dispersion. Standard. Deviation

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation

Units. Exploratory Data Analysis. Variables. Student Data

Stat 101 Exam 1 Important Formulas and Concepts 1

Chapter 2: Tools for Exploring Univariate Data

Introduction to Statistics

Visualizing and summarizing data

Measures of Central Tendency

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Essentials of Statistics and Probability

SESSION 5 Descriptive Statistics

Statistics and parameters

Chapter2 Description of samples and populations. 2.1 Introduction.

Exercises from Chapter 3, Section 1

Elementary Statistics

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Sources and Methods for the Analysis of International Data

1.3: Describing Quantitative Data with Numbers

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Data presentation and descriptive statistics

Measures of the Location of the Data

Introduction to Statistics

Lecture 1: Description of Data. Readings: Sections 1.2,

Figure 1: Conventional labelling of axes for diagram of frequency distribution. Frequency of occurrence. Values of the variable

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

The science of learning from data.

Lecture 11. Data Description Estimation

Topic 3: Introduction to Statistics. Algebra 1. Collecting Data. Table of Contents. Categorical or Quantitative? What is the Study of Statistics?!

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

CHAPTER 1. Introduction

Describing distributions with numbers

Transcription:

Slide 1.1 Chapter 1:Descriptive statistics Descriptive statistics summarises a mass of information. We may use graphical and/or numerical methods Examples of the former are the bar chart and XY chart, examples of the latter are averages and standard deviations

Slide 1.2 Graphical techniques Education and employment data Higher education Advance d level Other qualifications No qualifications Total In work 8 541 5 501 10 702 2 260 27 004 Unemployed 232 247 758 309 1 546 Inactive 1 024 1 418 3 150 2 284 7 876 9 797 7 166 14 610 4 853 36 426

Slide 1.3 The bar chart 12 000 Number of people (000s) 10 000 8 000 6 000 4 000 2 000 Higher education Advanced level Other qualifications No qualifications Note: the height of each bar is determined by the associated frequency. The first bar is 8541 units high, the second is 5501, and so on. The ordering of the bars could be reversed ( no qualifications becoming the first category) without altering the message. Figure 1.1 Educational qualifications of people in work in the UK, 2006

Slide 1.4 A multiple bar chart Number of people (000s) 12 000 10 000 8 000 6 000 4 000 2 000 In work Unemployed Inactive Higher education Advanced level Other qualifications No qualifications Figure 1.2 Educational qualifications by employment category

Slide 1.5 The stacked bar chart Number of people (000s) 16 000 14 000 12 000 10 000 8 000 6 000 4 000 2 000 Inactive Unemployed In work Higher education Advanced level Other qualifications No qualifications Figure 1.3 Stacked bar chart of educational qualifications and employment status

Slide 1.6 A stacked bar chart (percentages) 100% 80% 60% 40% Inactive Unemployed In work 20% 0% Higher education Advanced level Other qualifications No qualifications Figure 1.4 Percentages in each employment category, by educational qualification

Slide 1.7 The pie chart No qualifications 8% Higher education 32% Other qualifications 40% Advanced level 20% Figure 1.5 Educational qualifications of those in work

Slide 1.8 Data on wealth in the UK Class interval ( ) Number (000s) 0 9,999 2,448 10,000 24,999 1,823 25,000 39,999 1,375 40,000 49,999 480 50,000 59,999 665 60,000 79,999 1,315 80,000 99,999 1,640 100,000 149,999 2,151 150,000 199,000 2,215 200,000 299,000 1,856 300,000 499,999 1,057 500,000 999,999 439 1,000,000 1,999,999 122 2,000,000 or more 50 Total 17,636 Table 1.3 The distribution of wealth, UK, 2003

Slide 1.9 A (misleading!) bar chart 3 000 2 500 2 000 1 500 1 000 500 10 000 25 000 40 000 Number of individuals 50 000 60 000 80 000 100 000 150 000 200 000 300 000 500 000 1000 000 2000 000 0 Income class (lower boundary) Figure 1.7 Bar chart of the distribution of wealth in the UK, 2003

Slide 1.10 The histogram the correct picture Class widths squeezed 0 10 25 40 50 60 80 100 150 200 Wealth ( 000) Figure 1.9 Histogram of the distribution of wealth in the UK, 2003

Slide 1.11 Histogram vs bar chart The bar chart gives the wrong picture because of varying class widths Class interval ) Numbers (thousands) 10,000 24,999 1,823 : : 200,000 299,000 1,856 These two classes have similar frequencies (similar heights for bar chart) but the second is over six times wider. Adjusting for width, its frequency should be 278 (1,856 15/100)

Slide 1.12 The frequency density Applying this principle leads to calculation of the frequency densities: Class interval ( ) Number (000s) Frequency density 0 9,999 2,448 0.2448 10,000 24,999 1,823 0.1215 25,000 39,999 1,375 0.0917 40,000 49,999 480 0.0480 50,000 59,999 665 0.0665

Slide 1.13 Numerical techniques We examine measures of Location Dispersion Skewness

Slide 1.14 Measures of location Mean strictly the arithmetic mean, the well known average Median the wealth of the person in the middle of the distribution Mode the level of wealth that occurs most often These different measures can give different answers

Slide 1.15 The mean of the wealth distribution Range x f fx 0-5.0 2,448 12,240.0 10000-17.5 1,823 31,902.5 25000-32.5 1,375 44,687.5 40000-45.0 480 21,600.0 50000-55.0 665 36,575.0 60000-70.0 1,315 92,050.0 80000-90.0 1,640 147,600.0 100000-125.0 2,151 268,875.0 150000-175.0 2,215 387,625.0 200000-250.0 1,856 464,000.0 300000-400.0 1,057 422,800.0 500000-750.0 439 329,250.0 1000000-1500.0 122 183,000.0 2000000-3000.0 50 150,000.0 Total 17,636 2,592,205.0 = µ fx f 2, 592, 205 = 17, 636 = 146. 984

Slide 1.16 Locating the mean mean 0 10 25 40 50 60 80 100 150 200 Wealth ( 000)

Slide 1.17 The median The wealth of the middle person i.e. the one located halfway through the distribution Poorest Richest This person s wealth The median is little affected by outliers, unlike the mean

Slide 1.18 Calculating the median 17,636 observations, hence person 8,818 in rank order has the median wealth This person is somewhere in the 60-80k interval Range Frequency Cumulative frequency 0-2,448 2,448 10000-1,823 4,271 25000-1,375 5,646 40000-480 6,126 50000-665 6,791 60000-1,315 8,106 80000-1,640 9,746 100000-2,151 11,897 150000-2,215 14,112 Number with wealth less than 80k Number with wealth less than 100k

Slide 1.19 Calculating the median (continued) To find the precise median, use x L + = 80 + ( x x ) U L N 2 Median wealth is 90,829 f F 17,636,000 8,106,000 1,640,000 ( 100 80) 2 = 90. 829

Slide 1.20 The mode The mode is the observation with the highest frequency Size Sales 8 7 10 25 12 36 14 11 16 3 18 1 Modal dress size = 12

Slide 1.21 The mode (continued) For grouped data, the mode corresponds to the interval with greatest frequency density Class interval ( ) Number (000s) Frequency density 0 9,999 2,448 0.2448 10,000 24,999 1,823 0.1215 25,000 39,999 1,375 0.0917 40,000 49,999 480 0.0480 50,000 59,999 665 0.0665 Modal class Mode = 0 10,000

Slide 1.22 Differences between mean, median and mode Mode Median Mean 0 10 25 40 50 60 80 100 150 200 Wealth ( 000) Figure 1.12 The histogram with the mean, median and mode marked

Slide 1.23 Measures of dispersion The range the difference between smallest and largest observation. Not very informative for wealth Inter-quartile range contains the middle half of the observations Variance based on all observations in the sample

Slide 1.24 Inter-quartile range First quartile one quarter of the way through the distribution, person ranked 4,409 4, 409 4, 271 Q 1 = 25, 000 + = 1375, ( 40, 000 25, 000) 26, 505. 5 Third quartile three quarters of the way through the distribution, person ranked 13,227 hence Q 3 = 180,022.6 IQR = Q 3 Q 1 = 180,023 26.505 = 153,517

Slide 1.25 Box and whiskers plot Wealth ( 000) x Outlier 400 300 200 IQR 100 Third quartile Median First quartile

Slide 1.26 The variance The variance is the average of all squared deviations from the mean: σ = ( x ) 2 f µ 2 f The larger this value, the greater the dispersion of the observations

Slide 1.27 The variance (continued) Small variance Large variance

Slide 1.28 σ 2 Calculation of the variance Range midpoint f x- µ (x- µ ) 2 f (x- µ ) 2 0 5.0 2,448-142.0 20,159.38 49,350,158.77 10,000 17.5 1,823-129.5 16,766.04 30,564,482.57 25,000 32.5 1,375-114.5 13,106.52 18,021,469.99 40,000 45.0 480-102.0 10,400.68 4,992,326.62 50,000 55.0 665-92.0 8,461.01 5,626,568.95 60,000 70.0 1,315-77.0 5,926.49 7,793,339.80 80,000 90.0 1,640-57.0 3,247.15 5,325,317.93 100,000 125.0 2,151-22.0 483.28 1,039,544.38 150,000 175.0 2,215 28.0 784.91 1,738,579.16 200,000 250.0 1,856 103.0 10,612.35 19,696,526.45 300,000 400.0 1,057 253.0 64,017.23 67,666,217.05 500,000 750.0 439 603.0 363,628.63 159,632,966.88 1,000,000 1500.0 122 1,353.0 1,830,653.04 223,339,670.45 2,000,000 3000.0 50 2,853.0 8,139,701.86 406,985,092.85 Totals 17,636 1,001,772,261.83 = ( x ) 2 f µ f 1, 001, 772, 261. 83 = 17, 636 = 56, 802. 69

Slide 1.29 The standard deviation The variance is measured squared s (because we used squared deviations) Hence take the square root to get back to s This gives the standard deviation σ = 56, 802. 69 = 238. 333 or 238,333

Slide 1.30 Sample measures For sample data, use ( ) 2 2 = f x x s n 1 To calculate the sample variance This gives an unbiased estimate of the population variance Take the square root of this for the sample standard deviation

Slide 1.31 Measuring skewness Right skewed CS > 0 Left skewed CS < 0 ( x µ ) f Coefficient of skewness = 3 Nσ 2

Slide 1.32 Skew of the wealth distribution Range x f x- µ (x- µ ) 3 f (x- µ ) 3 0 5.0 2,448-142.0-2,862,304-7,006,919,444 10,000 17.5 1,823-129.5-2,170,929-3,957,603,101 25,000 32.5 1,375-114.5-1,500,484-2,063,165,040 40,000 45.0 480-102.0-1,060,700-509,136,073 50,000 55.0 665-92.0-778,275-517,552,779 60,000 70.0 1,315-77.0-456,244-599,960,339 80,000 90.0 1,640-57.0-185,034-303,456,461 100,000 125.0 2,151-22.0-10,624-22,853,059 150,000 175.0 2,215 28.0 21,990 48,708,509 200,000 250.0 1,856 103.0 1,093,245 2,029,062,756 300,000 400.0 1,057 253.0 16,197,402 17,120,654,081 500,000 750.0 439 603.0 219,273,979 96,261,276,819 1,000,000 1500.0 122 1,353.0 2,476,903,349 302,182,208,638 2,000,000 3000.0 50 2,853.0 23,222,701,860 1,161,135,092,991 Total 17,636 4,457.2 25,927,167,232 1,563,796,357,499 f ( x µ ) 3 Nσ 3 1563,, 796, 357, 499 = 17, 636 13, 537, 964 = 6. 550

Slide 1.33 Summary We can use graphical and numerical measures to summarise data The aim is to simplify without distorting the message Measures of location, dispersion and skewness provide a good description of the data