Histograms, Central Tendency, and Variability

Similar documents
Chapter 3 Statistics for Describing, Exploring, and Comparing Data. Section 3-1: Overview. 3-2 Measures of Center. Definition. Key Concept.

Chapter 3. Measuring data

Lecture Slides. Elementary Statistics Tenth Edition. by Mario F. Triola. and the Triola Statistics Series. Slide 1

Chapter 2: Tools for Exploring Univariate Data

Business Statistics. Lecture 10: Course Review

Preliminary Statistics course. Lecture 1: Descriptive Statistics

Chapter 5. Understanding and Comparing. Distributions

3.1 Measure of Center

Chapter 3. Data Description

Statistical Methods: Introduction, Applications, Histograms, Ch

Elementary Statistics

Meelis Kull Autumn Meelis Kull - Autumn MTAT Data Mining - Lecture 03

Chapter 4. Displaying and Summarizing. Quantitative Data

1.0 Continuous Distributions. 5.0 Shapes of Distributions. 6.0 The Normal Curve. 7.0 Discrete Distributions. 8.0 Tolerances. 11.

MATH 10 INTRODUCTORY STATISTICS

Descriptive Univariate Statistics and Bivariate Correlation

Big Data Analysis with Apache Spark UC#BERKELEY

Introduction to Matrix Algebra and the Multivariate Normal Distribution

Section 5.4. Ken Ueda

APPENDIX 1 BASIC STATISTICS. Summarizing Data

BNG 495 Capstone Design. Descriptive Statistics

2.0 Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 3.1- #

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

2011 Pearson Education, Inc

Histograms allow a visual interpretation

Math 141. Quantile-Quantile Plots. Albyn Jones 1. jones/courses/ Library 304. Albyn Jones Math 141

Introduction to Statistics

Introduction to statistics

MA 1125 Lecture 15 - The Standard Normal Distribution. Friday, October 6, Objectives: Introduce the standard normal distribution and table.

Density Curves and the Normal Distributions. Histogram: 10 groups

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Continuous distributions

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Sampling, Frequency Distributions, and Graphs (12.1)

Statistical Concepts. Constructing a Trend Plot

1-1. Chapter 1. Sampling and Descriptive Statistics by The McGraw-Hill Companies, Inc. All rights reserved.

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1

STT 315 This lecture is based on Chapter 2 of the textbook.

Descriptive Statistics-I. Dr Mahmoud Alhussami

Chapter 4.notebook. August 30, 2017

Introduction to Statistics for Traffic Crash Reconstruction

ADMS2320.com. We Make Stats Easy. Chapter 4. ADMS2320.com Tutorials Past Tests. Tutorial Length 1 Hour 45 Minutes

Stat 101 Exam 1 Important Formulas and Concepts 1

Variables, distributions, and samples (cont.) Phil 12: Logic and Decision Making Fall 2010 UC San Diego 10/18/2010

Lecture 2. Quantitative variables. There are three main graphical methods for describing, summarizing, and detecting patterns in quantitative data:

Lecture 30. DATA 8 Summer Regression Inference

STP 420 INTRODUCTION TO APPLIED STATISTICS NOTES

Confidence Intervals. - simply, an interval for which we have a certain confidence.

Density Curves & Normal Distributions

Essentials of Statistics and Probability

Business Statistics. Lecture 5: Confidence Intervals

Review of the Normal Distribution

Interactietechnologie

Chapter. Numerically Summarizing Data. Copyright 2013, 2010 and 2007 Pearson Education, Inc.

Summarizing and Displaying Measurement Data/Understanding and Comparing Distributions

STAT 200 Chapter 1 Looking at Data - Distributions

Measures of Central Tendency. Mean, Median, and Mode

Continuous distributions

Unit 1: Statistical Analysis. IB Biology SL

Lecture 2. Descriptive Statistics: Measures of Center

Unit 2: Numerical Descriptive Measures

Dover- Sherborn High School Mathematics Curriculum Probability and Statistics

Chapter 6 Group Activity - SOLUTIONS

Description of Samples and Populations

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur

Describing distributions with numbers

Units. Exploratory Data Analysis. Variables. Student Data

Sections 6.1 and 6.2: The Normal Distribution and its Applications

Introduction to Statistics

Sampling Populations limited in the scope enumerate

Chapter 6. The Standard Deviation as a Ruler and the Normal Model 1 /67

Note that we are looking at the true mean, μ, not y. The problem for us is that we need to find the endpoints of our interval (a, b).

Chapter 6. Estimation of Confidence Intervals for Nodal Maximum Power Consumption per Customer

Chapter 23: Inferences About Means

Unit 4 Probability. Dr Mahmoud Alhussami

Lecture 27. DATA 8 Spring Sample Averages. Slides created by John DeNero and Ani Adhikari

P8130: Biostatistical Methods I

Averages How difficult is QM1? What is the average mark? Week 1b, Lecture 2

Practitioner's Guide To Statistical Sampling: Part 1

Statistics and Quantitative Analysis U4320. Segment 5: Sampling and inference Prof. Sharyn O Halloran

Chapter 1. Looking at Data

Measures of center. The mean The mean of a distribution is the arithmetic average of the observations:

Numerical Methods Lecture 7 - Statistics, Probability and Reliability

Chapter 11 Sampling Distribution. Stat 115

Lesson Plan. Answer Questions. Summary Statistics. Histograms. The Normal Distribution. Using the Standard Normal Table

Statistics lecture 3. Bell-Shaped Curves and Other Shapes

Describing distributions with numbers

Econometrics in a nutshell: Variation and Identification Linear Regression Model in STATA. Research Methods. Carlos Noton.

Lecture 26: Chapter 10, Section 2 Inference for Quantitative Variable Confidence Interval with t

Unit Two Descriptive Biostatistics. Dr Mahmoud Alhussami

EXPERIMENT 2 Reaction Time Objectives Theory

Visualizing and summarizing data

4.12 Sampling Distributions 183

Section 3. Measures of Variation

Lecture 3B: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

a table or a graph or an equation.

Lecture 6: Chapter 4, Section 2 Quantitative Variables (Displays, Begin Summaries)

Slide 1. Slide 2. Slide 3. Pick a Brick. Daphne. 400 pts 200 pts 300 pts 500 pts 100 pts. 300 pts. 300 pts 400 pts 100 pts 400 pts.

Transcription:

The Economist, September 6, 214 1 Histograms, Central Tendency, and Variability Lecture 2 Reading: Sections 5 5.6 Includes ALL margin notes and boxes: For Example, Guided Example, Notation Alert, Just Checking, Optional Math Boxes, What Can Go Wrong? and Ethics in Action 2 Histogram graphically describes how a single variable containing interval data is distributed Range of data divided into non-overlapping and equal width classes (bins) that cover range of values Histogram Fraction n = 174 countries 2 4 6 Inflation Rate, 211 How many bins? Width of bins? http://data.worldbank.org/indicator/fp.cpi.totl.zg 3 Lecture 2, Page 1 of 11

8 n = 174 countries n = 174 countries Frequency 6 4 2 Fraction 2 4 6 Inflation Rate, 211 2 4 6 Inflation Rate, 211.8.6 n = 174 countries 2 4 6 Inflation Rate, 211 Frequency histogram: Bar height number of observations in bin Relative frequency histogram: Bar height fraction of obs. in bin histogram: Bar area measures the fraction of observations in bin 4 2 4 6 Inflation Rate, 211 5 2 4 6 Inflation Rate, 211.5 2 4 6 Inflation Rate, 211 1.5 1.5 Number of bins changes the appearance of the histogram Sturges formula: # of bins = 1 + 3*log(n) [Note log base 1.] 2 4 6 Inflation Rate, 211 OECD inflation: 1 + 3*log(34) = 6.5 6 [but STATA picked 5] 6 Lecture 2, Page 2 of 11

Shape of Things Histogram gives overview of a variable with a single picture Can make informal inferences about the shape of population Symmetric: If draw an imaginary line at center, have mirror image on each side Positively skewed: long tail to right (aka skewed to the right) Negatively skewed: long tail to left (aka skewed to the left) Modality: # major peaks Bell/Normal/Gaussian 7 Four Perfectly Symmetric Histograms.5 1 2 3 4 1 2 3 4 5 2 3 4 5 6.5 5 5 2 4 6 8 8 Four Positively Skewed Histograms.5 5 5 1 15-6 -4-2 2 4.6.8-11 -1-9 -8-7 -6 5 1 15 2 25 9 Lecture 2, Page 3 of 11

Four Negatively Skewed Histograms.5 5 5 1 15 2-15 -1-5.6.8-15 -14-13 -12-11 -1 25 3 35 4 45 5 1.5 5 5.5 5 5 Four Bimodal Histograms 2 4 6 8 2 4 6 8 1.6.8.5 5 1 2 3 5 1 15 11 Two Perfectly Bell Shaped Histograms -4-2 2 4 1 15 2 25 3 Are these symmetric? Skewed? Unimodal? Take a good look because this is one of the last times you will see perfect bell shaped (normal) histograms 12 Lecture 2, Page 4 of 11

.5 n = 185 countries; Source CIA Fraction 2 4 6 8 1 GDP per capita (PPP), 212 est. https://www.cia.gov/library/publications/the-world-factbook/rankorder/rawdata_24.txt 13 Samples vs. Populations Sample is a random subset of population Sampling noise: Chance differences between population and a random sample Driven by the sample size, not sample size relative to the population size, which is assumed infinite (pp. 3 31, The Sample Size is What Matters ) Informal inference: consider sample size (n) Never see the perfect forms (Plato): statements about shape always approximate Nearly Normal Condition 14 5 5 Population, N = 1,,.5 Sample 1; n = 1.5 5 1 15 8 9 1 11.5 Sample 2; n = 1 How many samples would you have in real life? 5.5 8 9 1 11 12 Why aren t these samples perfectly Bell shaped? Sample 3; n = 1 6 8 1 12 15 Lecture 2, Page 5 of 11

5 5.5 Population, N = 1,, 5 1 15 5 5.5 Sample 1; n = 3 Why are there more bins than last slide? 6 8 1 12 14 Sample 2; n = 3 Sample 3; n = 3 7 8 9 1 11 12 6 7 8 9 1 11 16 5 5.5 Population, N = 1,, 5 1 15 Sample 1; n = 1 6 8 1 12 14 Sample 2; n = 1 Sample 3; n = 1 6 8 1 12 14 5 1 15 17 What to Conclude About Shape? 5 5.5 n: 3-4 -2 2 4 X n: 5.8.6 1 11 12 Y 13 14 Is the graph on the left symmetric? Bell shaped? Is the graph on the right symmetric? Bell shaped? Bi-modal? 18 Lecture 2, Page 6 of 11

Hsieh and Olken (214) JEP The Missing Missing Middle Summer 214 http://pubs.aeaweb.org/doi/pdfplus/1257/jep8.89 19 Appendix, http://www.aeaweb.org/jep/app/283/28389_app.pdf There is a clear bimodality in the distribution of valueadded/capital for the large firms. However, the capital questionnaire for large firms was ambiguous as to whether the results were to be entered in thousands of Rupiah or Rupiah. Our best guess is that approximately half the firms used thousands and half used millions. 2 Summary Statistics Statistics (i.e. summary statistics) give a concise idea of what data look like For a single variable, statistics can give numeric measures of: Central tendency: mean and median Variability: range, variance, standard deviation, coefficient of variation, R Relative standing: percentiles For two variables, also measure relationship 21 Lecture 2, Page 7 of 11

Mean and Median Population mean, a parameter: μ = N Sample mean, a statistic: X = n n Which is subject to sampling error? i=1 x i N i=1 x i Median is the middle obs. after sorting if even # of obs., average 2 middle ones Fraction.5 mean = 3, median = 3 2 4 6 Inflation Rate, 211 22 5 5.5 Normal Distribution Population mu=1., med=1. 5 1 15 Is X (a statistic) equal to µ (a parameter)? Sample, n=49 X-bar=13.9, med=13. 6 8 1 12 14 Why is the population mean equal to the population median? Why isn t the sample mean equal to the sample median? 23 A Symmetric Distribution (Uniform).8.6 Population mu=5., med=5. 2 4 6 8 1 Book Rating Sample, n=41 X-bar=55, med=62 5.5 2 4 6 8 1 Book Rating Why is the population mean equal to the population median? Why is the sample median different from the population median? 24 Lecture 2, Page 8 of 11

Fraction n = 174 countries mean = 6.6, median = 5. Why is the mean greater than the median? 2 4 6 Inflation Rate, 211 25 Measures of Variability (Spread) Range: max min Variance: N i=1 σ 2 = x i μ 2 N s 2 = n x 2 i=1 i X n 1 Standard deviation: s = variance Coefficient of variation (textbook) min = -, max = 6.5 var = 1.6, sd = 1 2 4 6 Inflation Rate, 211 26 Breaking Down Variance Numerator: total sum of squares (TSS) If all sampled countries have 3% inflation (x i = 3 for all i), what would TSS & s 2 be? Denominator: ν ( nu ) Only n 1 free obs left after calculate mean Units of variance? How about s.d.? s 2 = n x 2 i=1 i X n 1 n TSS = x i X 2 i=1 degrees of freedom: ν = n 1 27 Lecture 2, Page 9 of 11

Empirical Rule (Normal/Bell) If a random sample is drawn from a Normal population then about: 68% of observations will lie within 1 s.d. of the mean (i.e. between X s and X + s) 95% of observations will lie within 2 s.d. of the mean (i.e. between X 2s and X + 2s) 99.7% of observations will lie within 3 s.d. of the mean (i.e. between X 3s and X + 3s) Empirical Rule only applies if normal 28 n = 246, mean = 99, sd = 14.5 6 8 1 12 14 29.5 Histogram #1, n = 94 8 9 1 11 12 X.8.6 Histogram #2, n = 154 8 9 1 11 12 X Histogram #3, n = 298 5 Histogram #4, n = 521 5 5.e-4.5 5 1 15 X 9 95 1 15 11 X Approximately, what is the s.d. of the variable in each histogram? 3 Lecture 2, Page 1 of 11

Chebysheff s Theorem At least 1*(1 1/k 2 )% of observations lie within k s.d. s of the mean for k>1 At least 75% of obs. lie within 2 s.d. of mean 1 1/2 2 = 3/4 At least 89% of obs. lie within 3 s.d. of mean 1-1/3 2 = 8/9 Can be applied to all samples no matter how population is distributed What about within one s.d.? 31.5 n = 185 countries mean = 14955, sd = 16243. Fraction 2 4 6 8 1 GDP per capita (PPP), 212 est. 32 Lecture 2, Page 11 of 11