More on Variability. Overview. The Variance is sensitive to outliers. Marriage Data without Nevada

Similar documents
The Empirical Rule, z-scores, and the Rare Event Approach

Descriptive Statistics C H A P T E R 5 P P

Unit 2. Describing Data: Numerical

P8130: Biostatistical Methods I

MgtOp 215 Chapter 3 Dr. Ahn

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

STT 315 This lecture is based on Chapter 2 of the textbook.

Describing distributions with numbers

After completing this chapter, you should be able to:

TOPIC: Descriptive Statistics Single Variable

1. Exploratory Data Analysis

Math 223 Lecture Notes 3/15/04 From The Basic Practice of Statistics, bymoore

A is one of the categories into which qualitative data can be classified.

3 Lecture 3 Notes: Measures of Variation. The Boxplot. Definition of Probability

BNG 495 Capstone Design. Descriptive Statistics

Statistics in medicine

Descriptive Univariate Statistics and Bivariate Correlation

Stat 101 Exam 1 Important Formulas and Concepts 1

Describing distributions with numbers

Preliminary Statistics course. Lecture 1: Descriptive Statistics

(quantitative or categorical variables) Numerical descriptions of center, variability, position (quantitative variables)

CIVL 7012/8012. Collection and Analysis of Information

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

Example 2. Given the data below, complete the chart:

What is Statistics? Statistics is the science of understanding data and of making decisions in the face of variability and uncertainty.

F78SC2 Notes 2 RJRC. If the interest rate is 5%, we substitute x = 0.05 in the formula. This gives

FREC 608 Guided Exercise 9

Section 3. Measures of Variation

are the objects described by a set of data. They may be people, animals or things.

SESSION 5 Descriptive Statistics

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Chapter 1. Looking at Data

Last Lecture. Distinguish Populations from Samples. Knowing different Sampling Techniques. Distinguish Parameters from Statistics

GRACEY/STATISTICS CH. 3. CHAPTER PROBLEM Do women really talk more than men? Science, Vol. 317, No. 5834). The study

Tastitsticsss? What s that? Principles of Biostatistics and Informatics. Variables, outcomes. Tastitsticsss? What s that?

Statistics for Managers using Microsoft Excel 6 th Edition

Chapter 3. Data Description

AP Final Review II Exploring Data (20% 30%)

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

2/2/2015 GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY MEASURES OF CENTRAL TENDENCY CHAPTER 3: DESCRIPTIVE STATISTICS AND GRAPHICS

Elementary Statistics

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Introduction to Statistics

Chapter 1:Descriptive statistics

Chapter 3: Displaying and summarizing quantitative data p52 The pattern of variation of a variable is called its distribution.

Chapter 3. Measuring data

QUANTITATIVE DATA. UNIVARIATE DATA data for one variable

Exercises from Chapter 3, Section 1

Descriptive Statistics Example

Chapter 2 Solutions Page 15 of 28

Determining the Spread of a Distribution Variance & Standard Deviation

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

Shape, Outliers, Center, Spread Frequency and Relative Histograms Related to other types of graphical displays

Module 1. Identify parts of an expression using vocabulary such as term, equation, inequality

Chapter 4. Displaying and Summarizing. Quantitative Data

Numerical Measures of Central Tendency

Sections OPIM 303, Managerial Statistics H Guy Williams, 2006

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

MATH 117 Statistical Methods for Management I Chapter Three

Contents. Acknowledgments. xix

Describing Data: Numerical Measures

Quantitative Methods Chapter 0: Review of Basic Concepts 0.1 Business Applications (II) 0.2 Business Applications (III)

BIOL 51A - Biostatistics 1 1. Lecture 1: Intro to Biostatistics. Smoking: hazardous? FEV (l) Smoke

Unit 1: Statistics. Mrs. Valentine Math III

Descriptive Statistics

3.1 Measures of Central Tendency: Mode, Median and Mean. Average a single number that is used to describe the entire sample or population

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

Descriptive Statistics-I. Dr Mahmoud Alhussami

2011 Pearson Education, Inc

Sets and Set notation. Algebra 2 Unit 8 Notes

Vocabulary: Data About Us

Objective A: Mean, Median and Mode Three measures of central of tendency: the mean, the median, and the mode.

Review of Multiple Regression

Finding Quartiles. . Q1 is the median of the lower half of the data. Q3 is the median of the upper half of the data

Midrange: mean of highest and lowest scores. easy to compute, rough estimate, rarely used

Bemidji Area Schools Outcomes in Mathematics Algebra 2A. Based on Minnesota Academic Standards in Mathematics (2007) Page 1 of 6

ALGEBRA I CURRICULUM OUTLINE

DEPARTMENT OF QUANTITATIVE METHODS & INFORMATION SYSTEMS QM 120. Spring 2008

Perhaps the most important measure of location is the mean (average). Sample mean: where n = sample size. Arrange the values from smallest to largest:

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Unit 2: Numerical Descriptive Measures

Chapter 1: Exploring Data

Evaluate the expression if x = 2 and y = 5 6x 2y Original problem Substitute the values given into the expression and multiply

Excel and Statistical Analysis Prof. Jonathan B. Hill University of North Carolina - Chapel Hill

Summarising numerical data

Topic-1 Describing Data with Numerical Measures

The empirical ( ) rule

IB Questionbank Mathematical Studies 3rd edition. Grouped discrete. 184 min 183 marks

Revised: 2/19/09 Unit 1 Pre-Algebra Concepts and Operations Review

MATH 10 INTRODUCTORY STATISTICS

Chapter 2: Descriptive Analysis and Presentation of Single- Variable Data

Harbor Creek School District. Algebra II Advanced. Concepts Timeframe Skills Assessment Standards Linear Equations Inequalities

a) The runner completes his next 1500 meter race in under 4 minutes: <

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

CURRICULUM UNIT MAP 1 ST QUARTER. COURSE TITLE: Applied Algebra 1 GRADE: 9

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 2 Descriptive Statistics

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation

Describing Distributions with Numbers

Transcription:

Overview More on Variability Dr Tom Ilvento Department of Food and Resource Economics Continue the discussion of the variance and standard deviation Introduce the Coefficient of Variation (CV) Revisit Box Plots The variance of a proportion A brief introduction to Covariance 2 The Variance is sensitive to outliers Marriage Data without Nevada The variance and the standard deviation are very sensitive to outliers (extreme values) When you square large numbers you get much larger numbers 5 2 = 25 500 2 = 250,000 Look what happens when we remove Nevada from the Marriage Rate data 3 Calculate the Variance/Standard Deviation s 2 = [3240.79 - (380.69) 2 /50]/(50-1) s 2 = [3240.79-2898.50]/(49) s 2 = [342.29]/(49) s 2 = 6.99 s = 2.64 Stem & Leaf of Marriage Rate Count 4 2 7 2 5 0 5 5 8 9 9 6 6 1 1 1 3 3 4 5 6 6 7 7 8 9 9 14 7 0 0 0 0 3 3 3 4 4 4 7 9 12 8 1 1 2 3 3 4 6 8 9 9 10 9 4 5 2 10 3 5 2 11 0 12 6 1 13 0 14 0 15 0 16 0 17 0 18 0 19 0 20 0 21 0 22 5 1 4 2 = 4.2 n = 50 Sum(x) = 380.69 Sum(x^2) = 3240.79 4

Comparisons with and without Nevada Statistic W Nevada W/O Nevada Sum(x) 441.73 380.69 Sum(x^2) 6967.24 3240.79 Mean 8.66 7.31 Median 7.02 7.00 Mode 7.00 7.00 Minimum 4.20 4.20 Maximum 61.00 22.50 Range 56.80 18.30 Variance 62.82 6.99 Std Dev 7.93 2.64 Excel Commands for Measures of Central Tendency and Variance Sum =SUM(B5:B104) 3,699.40 Count =COUNT(B5:B104) 100.00 Mean =AVERAGE(B5:b104) 36.99 Minimum =MIN(B5:B104) 30.00 Maximum =MAX(B5:B104) 44.90 Median =MEDIAN(B5:B104) 37.00 Mode =MODE(B5:B104) 37.00 Range subtract the max and min 14.90 First Quartile =QUARTILE(B5:B104,1) 35.68 Third Quartile =QUARTILE(B5:B104,3) 38.33 Inter-Quartile Range subtract Q3 minus Q1 2.65 Variance =VAR(B5:B104) 5.85 Std Deviation =STDEV(B5:B104) 2.42 5 6 Descriptive Statistics of Marriage Rate data using Excel The Standard Deviation and the Range In Office 2003 Tools Data Analysis Descriptive Statistics In Office 2007 Data Data Analysis Descriptive Statistics Marriage Rate Mean 8.66 Standard Error 1.11 Median 7.02 Mode #N/A Standard Deviation 7.93 Sample Variance 62.82 Kurtosis 40.05 Skewness 6.10 Range 56.85 Minimum 4.19 Maximum 61.04 Sum 441.73 Count 51 A quick approximation for the standard deviation is the range divided by 4 It is a crude approximation, but in a symmetric, moundshaped distribution, it is reasonable For the marriage rate With Nevada 56.80/4 = 14.2 compared with 7.93 Without Nevada 18.30/4 = 4.58 compared with 2.64 It is just an approximation!!! 7 8

Coefficient of Variation Let s work out a complete example: Body Mass Index for 24 subjects The Coefficient of Variation The ratio of the standard deviation to the absolute value of the mean, usually expressed as a percentage (multiply by 100) By taking a ratio, we express the std dev relative to the mean For the Marriage Rate data, the CV = 7.93/8.66* 100 = 91.57 The higher the CV, the more variability in the variable. 9 Mean Median Mode Minimum Maximum Range Variance Standard Deviation CV BMI Stem and Leaf Plot STEM LEAF 17 7 18 19 6 6 20 6 21 4 5 22 0 7 23 2 5 8 8 24 5 6 25 2 2 4 26 27 5 8 28 1 29 1 9 30 31 4 32 33 5 Stem is the whole number Sum X 593.6 Sum X^2 15040.4 10 Let s work out a complete example: Body Mass Index for 24 subjects Mean = 593.6/24 = 24.73 Median is average of 12th and 13th positions = (23.8 +24.5)/2 = 24.15 Mode is either 19.6, 23.8, or 25.2 Minimum = 17.7 Maximum = 33.5 Range = 33.5-17.7 = 15.80 Variance = (15040.4-(593.6) 2 /24)/23 = 15.60 Standard Deviation = SQRT(15.60) = 3.95 CV = 3.95/24.73*100 = 15.97 11 Let s Revisit Box Plots Box plots are a way to show the distribution of a variable relative to the median, showing shape, skew and outliers Box plots highlight extreme values in data Can be graphed for a small or large sample size Five number summary Minimum Q1 Median Q3 Maximum This gives us the extremes, the middle, the range, and the Inter-Quartile Range 12

Box Plot Fundamentals Box Plots often appear horizontal Let s look at what a Box Plot is, step by step. * * Q1 Q3 Median, Q2 13 14 Stem and Leaf Plot of Health Data - the WEIGHT of the subject Mintab and JMP Box Plots Stem & Leaf Plot of Weight Stem unit: 10 This is Bi-Modal Statistics 9 4 Sample Size 80 10 1 5 7 8 9 Mean 159.385 11 0 2 3 5 9 9 Median 161 12 3 4 4 4 7 Std. Deviation 34.87689 13 0 0 5 5 6 6 7 7 9 Minimum 94.3 14 1 4 4 8 9 Maximum 255.9 15 0 1 1 2 3 6 6 16 0 0 2 2 2 3 3 4 5 6 7 9 17 0 0 3 3 4 5 5 6 7 9 18 0 1 2 7 8 9 19 1 4 6 8 20 1 5 9 21 3 4 9 22 1 23 7 8 24 25 6 15 16

Box Plots of Stock Returns by Level of Risk Dealing with the Mean and Variance of a Proportion 17 Sometimes our data deals with a dichotomous variable or No Male or Female Treatment or Control If we code the variable as a zero/one dichotomy, it is called a dummy variable. The mean of the dummy variable is the proportion of the attribute coded as one And the variance is very easy to compute 18 Coding Strategy, Let 1=, 0 = No Just to be clear, this is what I mean by using a coding strategy I will code the response as dummy variable 1 = 0 = No Do you support candidate A? Response Code 1 1 19 Proportions Let p = Number of Successes/Total Example: # /n = 4/10 =.4 And q = (1-p) The mean = p!x/n = (1+1+0+0+1+0+0+0+1+0)/10 =.4 The variance of a proportion is given by s 2 = p*q s = (p*q).5 Do you support Response Code s 2 =.4*.6 =.24 s =.4899 20

Covariance looks at how two variables vary about their means together, on average (divided by n) The Variance is the covariance of a variable with itself! Covariance The Scatterplot is a picture of Covariance Bivariate Fit of SALES By SALARY SALES 6000 5000 4000 3000 2000 1000 0 0 100000 SALARY Fit Mean Linear Fit 21 22 Summary The Variance and the Standard Deviation are influenced by outliers The Coefficient of Variation allows us to compare the variability of different variables Box and Whisker Plots allow us to see the spread of data and compare different groups Proportions via dummy variables Covariance is related to the variance - it shows how two variables vary about their means together 23