Chapter 2 Descriptive Statistics


Statistics
Most commonly, statistics refers to numerical data. Statistics may also refer to the process of collecting, organizing, presenting, analyzing and interpreting numerical data for the purpose of making decisions.
[Figure labels: population, parameter, sampling, sample, statistic, descriptive statistics, inferential statistics (estimation, prediction, hypothesis testing)]

Descriptive Statistics
Descriptive statistics is the science of organizing and summarizing large data sets in ways that make it possible to discern their meaning.

Measure of Location
A measure of location identifies the center or middle of the sample.
1. Arithmetic mean
2. Geometric mean
3. Median
4. Mode

Measure of Dispersion
Dispersion is defined as the variability around the central location.
1. Range
2. Quantiles
3. Variance and standard deviation

Measure of Location: Arithmetic Mean
The arithmetic mean is the sum of all the observations divided by the number of observations.

Population (arithmetic) mean:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where $N$ is the number of observations in the population.

Sample (arithmetic) mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where $n$ is the number of observations in the sample.

The arithmetic mean is the most widely used measure of location and has the following properties:
- The arithmetic mean is unique.
- The arithmetic mean is the only measure of location for which the sum of the deviations from it is zero:
$$\sum_{i=1}^{N}(x_i - \mu) = 0, \qquad \sum_{i=1}^{n}(x_i - \bar{x}) = 0$$
- If $y_i = a x_i + b$, $i = 1, \ldots, n$, then $\bar{y} = a\bar{x} + b$.
- The arithmetic mean is oversensitive to extreme values in the sample.
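The properties above can be checked numerically. The following is a minimal sketch using an illustrative sample; the constants `a` and `b` and the outlier value 360 are arbitrary choices for demonstration, not values from the slides.

```python
from statistics import fmean

x = [12, 24, 36, 25, 17, 19, 24, 11]   # illustrative sample
x_bar = fmean(x)

# Property: deviations from the mean sum to zero (up to rounding error).
deviation_sum = sum(xi - x_bar for xi in x)

# Property: if y_i = a*x_i + b, then y_bar = a*x_bar + b.
a, b = 2.0, 5.0                        # arbitrary illustrative constants
y_bar = fmean(a * xi + b for xi in x)

# Property: one extreme value pulls the mean strongly.
x_outlier = [360 if xi == 36 else xi for xi in x]

print(x_bar, deviation_sum, y_bar, fmean(x_outlier))
```

Replacing the single value 36 with 360 moves the mean from 21 to 61.5, which illustrates the sensitivity to extreme values.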

Measure of Location: Geometric Mean
The general formula for the geometric mean, $G$, is as follows:
$$G = \left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$$
$$\ln G = \frac{1}{n}\ln(x_1 x_2 \cdots x_n) = \frac{1}{n}\sum_{i=1}^{n} \ln x_i$$
There are two important properties of a geometric mean:
- In order to calculate a geometric mean, all of the values in the data set must be positive.
- For the same set of numbers, the geometric mean is always smaller than the arithmetic mean, with one exception: when all values are equal, the two means coincide.
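The logarithmic form of the formula suggests a direct way to compute the geometric mean. The sketch below assumes a hypothetical helper named `geometric_mean` and illustrative data; it rejects non-positive values, in line with the first property.

```python
import math

def geometric_mean(values):
    """Geometric mean via the log form: ln G = (1/n) * sum(ln x_i)."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

data = [2.0, 8.0]                      # illustrative values
g = geometric_mean(data)               # square root of 2 * 8
arithmetic = sum(data) / len(data)

print(g, arithmetic)
```

For these two values the geometric mean is 4 and the arithmetic mean is 5; for a set of equal values such as 3, 3, 3 both means are 3.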

Measure of Location: Median and Mode
The median is the value of the middle point of the sample when the observations are arranged in ascending order.
Median = the $[(n+1)/2]$th largest observation if $n$ is odd;
Median = the average of the $(n/2)$th and $(n/2+1)$th largest observations if $n$ is even.
The mode is the most frequently occurring value among all the observations in a sample. It is the most probable value that would be obtained if one data point were selected at random from a population.

Measure of Location: Median and Mode
Calculate the median and mode of the following data: 12, 24, 36, 25, 17, 19, 24, 11
Sorted data: 11, 12, 17, 19, 24, 24, 25, 36
$$\text{Median} = \frac{19 + 24}{2} = 21.5, \qquad \text{Mode} = 24$$
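The worked example can be reproduced in a few lines. This is a sketch: `median_by_rank` is a hypothetical helper that follows the odd/even rule stated earlier, and the standard-library `statistics` functions are used only as a cross-check.

```python
from statistics import median, mode

data = [12, 24, 36, 25, 17, 19, 24, 11]

def median_by_rank(values):
    """Median from sorted ranks: middle value if n is odd, else mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    if n % 2 == 1:
        return s[(n + 1) // 2 - 1]          # the [(n+1)/2]-th value, 1-based
    return (s[n // 2 - 1] + s[n // 2]) / 2  # average of the (n/2)-th and (n/2+1)-th

print(sorted(data))
print(median_by_rank(data), median(data), mode(data))
```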

Measure of Location
The mean is influenced by outliers, whereas the median is largely unaffected by them. The mode is very unstable: minor fluctuations in the data can change it substantially, and for this reason it is seldom calculated.
[Figure: a bimodal distribution with two modes, and a distribution in which Mean = Median = Mode]

Symmetry and Skewness in Distribution
When the shape of a distribution to the left and to the right is a mirror image of each other, the distribution is symmetrical.
[Figure: examples of symmetrical distributions]
A skewed distribution is a distribution that is not symmetrical.
[Figure: examples of skewed distributions, positively skewed and negatively skewed]


Measure of Dispersion: Range and Mean Absolute Deviation (MAD)
The range is the simplest measure of dispersion. It is simply the difference between the largest and smallest observations in a sample.
$$\text{Range} = x_{\max} - x_{\min}$$
The mean absolute deviation is the average of the absolute values of the deviations of individual observations from the arithmetic mean.
$$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$$
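As a small numerical illustration of the two definitions, the sketch below computes the range and the mean absolute deviation for the eight-value sample used in the median example. Reusing that sample here is an assumption made for continuity.

```python
x = [12, 24, 36, 25, 17, 19, 24, 11]   # sample reused from the median example
n = len(x)
x_bar = sum(x) / n                      # arithmetic mean

data_range = max(x) - min(x)            # largest minus smallest observation
mad = sum(abs(xi - x_bar) for xi in x) / n   # mean absolute deviation about the mean

print(data_range, mad)
```

Here the range is 36 - 11 = 25 and the absolute deviations from the mean of 21 sum to 50, giving MAD = 50/8 = 6.25.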

Measure of Dispersion: Quantiles
Quantile (percentile) is the general term for a value at or below which a stated proportion of the data in a distribution lies.
The $p$th percentile is the value $V_p$ such that $p\%$ of the sample points are less than or equal to $V_p$. Let $k = np/100$, where $n$ is the sample size, and count positions in the sample sorted in ascending order.
- If $k = np/100$ is not an integer, $V_p$ is the $(\lfloor k \rfloor + 1)$th largest sample point, where $\lfloor k \rfloor$ is the largest integer less than $k$.
- If $k = np/100$ is an integer, $V_p$ is the average of the $k$th and $(k+1)$th largest observations.
Quartiles: $p$ = 25, 50, 75.
Quintiles: $p$ = 20, 40, 60, 80.
Deciles: $p$ = 10, 20, 30, ..., 90.
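The percentile rule can be sketched as a short function. This assumes k = np/100 with n the sample size and positions counted in the ascending-sorted sample; `percentile` is a hypothetical helper, and statistical libraries often use different interpolation rules, so their results may differ.

```python
import math

def percentile(values, p):
    """p-th percentile by the rank rule: k = n*p/100 on the ascending-sorted sample."""
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    s = sorted(values)
    k = len(s) * p / 100
    if k == int(k):                     # k is an integer: average the k-th and (k+1)-th values
        k = int(k)
        return (s[k - 1] + s[k]) / 2
    return s[math.floor(k)]             # (floor(k)+1)-th value, which is index floor(k) in 0-based terms

data = [12, 24, 36, 25, 17, 19, 24, 11]
quartiles = [percentile(data, p) for p in (25, 50, 75)]
print(quartiles)
```

For this sample n = 8, so k is 2, 4 and 6 for the three quartiles, and each quartile is the average of two neighbouring sorted values: 14.5, 21.5 and 24.5.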

Measure of Dispersion: Variance and Standard Deviation
The variance is a measure of how spread out a distribution is. It is computed as the average squared deviation of each number from its mean (for a sample, the sum of squared deviations is divided by $n-1$). The standard deviation is the square root of the variance. It is the most commonly used measure of spread.

Sample variance:
$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Sample standard deviation:
$$s_x = \sqrt{s_x^2}$$
If $y_i = a x_i + b$, $i = 1, \ldots, n$, then $s_y^2 = a^2 s_x^2$ and $s_y = \lvert a \rvert s_x$ (which is $a s_x$ when $a > 0$).

Example
The price-earnings ratios of the stocks of five companies in an industry are as follows: 10%, 12%, 14%, 14%, 50%
Calculate the arithmetic mean, variance, and standard deviation of the price-earnings ratios for these five companies.
$$\bar{X} = \frac{1}{5}\sum_{i=1}^{5} x_i = \frac{100}{5} = 20$$
$$s_x^2 = \frac{1}{5-1}\sum_{i=1}^{5} (x_i - \bar{X})^2 = \frac{1136}{4} = 284$$
$$s_x = \sqrt{284} \approx 16.85$$
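The arithmetic in this example can be verified step by step. The sketch below uses the five values from the example and cross-checks against the standard-library `statistics.variance`, which also uses the n - 1 divisor. The transformed series `y` with a = -2 and b = 3 is an illustrative assumption, added to show the effect of a linear transformation on the standard deviation.

```python
from statistics import mean, stdev, variance

pe = [10, 12, 14, 14, 50]               # values from the example
n = len(pe)

x_bar = mean(pe)                                    # 100 / 5
sum_sq = sum((x - x_bar) ** 2 for x in pe)          # sum of squared deviations
s2 = sum_sq / (n - 1)                               # sample variance
s = s2 ** 0.5                                       # sample standard deviation

# Linear transformation y = a*x + b with illustrative a and b:
a, b = -2, 3
y = [a * x + b for x in pe]
s_y = stdev(y)                                      # should equal |a| * s

print(x_bar, sum_sq, s2, round(s, 2), variance(pe), s_y)
```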

Measure of Dispersion: Relative Dispersion and the Coefficient of Variation
A direct comparison of two or more measures of dispersion may be difficult because of differences in their means. Relative dispersion is the amount of variability in a distribution relative to a reference point or benchmark. A common measure of relative dispersion is the coefficient of variation.
$$CV = 100\% \times \frac{s_x}{\bar{x}}$$
This measure remains the same regardless of what units are used.
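The unit-invariance claim can be illustrated with a quick check. The height data and the helper name `coefficient_of_variation` are assumptions for this sketch. Note that the invariance covers rescaling (for example centimetres to metres); adding a constant to every value changes the mean but not the standard deviation, so it does change the CV.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV in percent: 100 * sample standard deviation / mean."""
    return 100 * stdev(values) / mean(values)

heights_cm = [150.0, 160.0, 170.0, 180.0]       # illustrative data
heights_m = [h / 100 for h in heights_cm]       # same data in different units
shifted = [h + 100 for h in heights_cm]         # a shift, not a change of units

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
cv_shifted = coefficient_of_variation(shifted)

print(cv_cm, cv_m, cv_shifted)
```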

Grouped Data
Unorganized raw quantitative data are simply a collection of numbers that can appear confusing and devoid of meaning. For example, suppose an analyst wants to describe how the price-to-earnings ratios (P/E) of the common stocks of companies within an industry are distributed. The analyst might compile the price-to-earnings ratios of 96 publicly traded stocks of companies in the industry.
P/E ratio: a stock's price divided by its earnings per share, which indicates how much investors are paying for a company's earning power.

Grouped Data
A frequency distribution is a tabular presentation of statistical data. Frequency distributions summarize statistical data by assigning them to specified groups, or intervals. The data employed with a frequency distribution may be measured using any type of measurement scale.
Step 1: Define the intervals. The range of values for each interval must have a lower and an upper limit, and the intervals must be all-inclusive and non-overlapping.
Step 2: Count the observations.
Step 3: Display the intervals and frequencies in a table.
[Table: Interval | Frequency]

Grouped Data
The relative frequency is another useful way to present data. The relative frequency is calculated by dividing the absolute frequency of each interval by the total number of observations. Simply stated, relative frequency is the percentage of total observations falling within each interval.
[Table: Interval | Frequency | Relative Frequency]
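The three steps and the relative-frequency calculation can be sketched as follows. The data, the interval limits in `edges`, and the helper name `frequency_table` are all illustrative assumptions, since the original table values are not available. Intervals are treated as closed on the left and open on the right, except the last one, which also includes its upper limit, so that they are all-inclusive and non-overlapping.

```python
def frequency_table(values, edges):
    """Rows of (lower, upper, frequency, relative frequency) for consecutive intervals."""
    k = len(edges) - 1
    counts = [0] * k
    for v in values:
        if not edges[0] <= v <= edges[-1]:
            raise ValueError(f"{v} falls outside the intervals")
        # First interval whose upper limit exceeds v; the top limit goes to the last interval.
        j = next((i for i in range(k) if v < edges[i + 1]), k - 1)
        counts[j] += 1
    n = len(values)
    return [(edges[j], edges[j + 1], counts[j], counts[j] / n) for j in range(k)]

data = [12, 24, 36, 25, 17, 19, 24, 11]     # illustrative observations
edges = [10, 20, 30, 40]                    # Step 1: interval limits (assumed)

table = frequency_table(data, edges)        # Step 2: count the observations
for lower, upper, freq, rel in table:       # Step 3: display the table
    print(f"{lower}-{upper}: {freq} ({rel:.1%})")
```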

Graphic Methods: Bar Graph (Histogram)
A bar graph is simply a bar chart of data that have been classified into a frequency distribution. The attractive feature of a bar graph is that it allows us to quickly see where most of the observations are concentrated.
[Figure: histogram of Frequency by Interval]

A demonstration of the effect of bin width on histograms
For large bin widths, the bimodal nature of the dataset is hidden, and for small bin widths the plot reduces to a spike at each data point. What bin width do you think provides the best picture of the underlying data?
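The effect can be reproduced with a small text histogram. The dataset below is an invented bimodal sample, not the one from the original figure, and `bin_counts` is a hypothetical helper that groups values into bins of a given width starting at 0.

```python
import math
from collections import Counter

def bin_counts(values, width, start=0):
    """Map each bin's lower edge to the number of values in [edge, edge + width)."""
    counts = Counter(start + math.floor((v - start) / width) * width for v in values)
    return dict(sorted(counts.items()))

# Invented bimodal sample: one cluster around 4 and another around 14.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12, 13, 13, 14, 14, 14, 15, 15, 16]

for width in (20, 2, 1):
    print(f"bin width {width}")
    for edge, count in bin_counts(data, width).items():
        print(f"  {edge:>2} {'#' * count}")
```

With a width of 20 every value falls into one bin and the two clusters are invisible; with a width of 2 the two peaks are clearly separated by empty bins; with a width of 1 the plot shows one bar per distinct value.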

Graphic Methods: Stem-and-Leaf Plot
The stem-and-leaf plot simply sorts the data in numerical order and displays them. The procedure is based on a decision as to which digits in each data value will be used as the 'leading (stem) digits'; the rest will be the 'trailing (leaf) digits'. The sorting of the data should be done on the basis of the leading digits.
[Figure: stem-and-leaf plot with columns stem | leaf]
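A minimal sketch of the procedure, assuming non-negative integers with the tens digit (and above) as the stem and the units digit as the leaf. The data and the helper name `stem_and_leaf` are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted non-negative integers by stem (value // 10) with leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

data = [12, 24, 36, 25, 17, 19, 24, 11]     # illustrative data
plot = stem_and_leaf(data)

for stem in sorted(plot):
    print(f"{stem} | {' '.join(str(leaf) for leaf in plot[stem])}")
```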

Graphic Methods: Box Plot
The box plot is a summary plot based on the median and the interquartile range (IQR), which contains 50% of the values. Whiskers extend from the box to the highest and lowest values, excluding outliers. A line across the box indicates the median.
$$IQR = Q_3 - Q_1$$
$$\text{MIN} = Q_1 - 1.5\,IQR, \qquad \text{MAX} = Q_3 + 1.5\,IQR$$
[Figure: box plot with MIN and MAX marked]
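The quantities behind a box plot can be computed without drawing anything. The sketch below assumes that observations outside the interval from MIN to MAX are the outliers the text refers to, and it takes quartiles with the rank rule k = np/100 on the sorted sample. The data, including the added value 60, are illustrative.

```python
import math

def percentile(values, p):
    """p-th percentile by the rank rule k = n*p/100 on the ascending-sorted sample."""
    s = sorted(values)
    k = len(s) * p / 100
    if k == int(k):
        k = int(k)
        return (s[k - 1] + s[k]) / 2
    return s[math.floor(k)]

data = [11, 12, 17, 19, 24, 24, 25, 36, 60]     # illustrative sample with one extreme value

q1, q3 = percentile(data, 25), percentile(data, 75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr                    # MIN in the formulas above
upper_fence = q3 + 1.5 * iqr                    # MAX in the formulas above

inside = [v for v in data if lower_fence <= v <= upper_fence]
outliers = [v for v in data if v < lower_fence or v > upper_fence]
whiskers = (min(inside), max(inside))           # whiskers stop at the most extreme non-outliers

print(q1, q3, iqr, lower_fence, upper_fence, whiskers, outliers)
```

For this sample Q1 = 17 and Q3 = 25, so IQR = 8 and the limits are 5 and 37. The value 60 is flagged as an outlier and the whiskers run from 11 to 36.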