Chapter 2 Descriptive Statistics
Statistics
Most commonly, statistics refers to numerical data. Statistics may also refer to the process of collecting, organizing, presenting, analyzing and interpreting numerical data for the purpose of making decisions.
[Diagram labels: population, parameter, sampling, sample, statistic, descriptive statistics, estimation, prediction, inferential statistics, hypothesis testing]
Descriptive Statistics
Descriptive statistics is the science of organizing and summarizing large data sets in ways that make it possible to discern their meaning.
Measure of Location
A measure of location identifies the center or middle of the sample.
1. Arithmetic mean
2. Geometric mean
3. Median
4. Mode
Measure of Dispersion
Dispersion is defined as the variability around the central location.
1. Range
2. Quantiles
3. Variance and standard deviation
Measure of Location: Arithmetic Mean
The arithmetic mean is the sum of all the observations divided by the number of observations.
Population (arithmetic) mean: $\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$, where $N$ is the number of observations in the population.
Sample (arithmetic) mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, where $n$ is the number of observations in the sample.
The arithmetic mean is the most widely used measure of location and has the following properties:
The arithmetic mean is unique.
The arithmetic mean is the only measure of location for which the sum of the deviations from it is zero: $\sum_{i=1}^{N} (x_i - \mu) = 0$ and $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
If $y_i = a x_i + b$, $i = 1, \ldots, n$, then $\bar{y} = a\bar{x} + b$.
The arithmetic mean is oversensitive to extreme values in the sample.
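The properties listed above can be checked numerically. The following is a minimal Python sketch; the data values and the constants `a` and `b` are chosen only for illustration and are not part of the slide.

```python
from statistics import mean

# Illustrative sample (values chosen for this sketch; the last one is extreme).
x = [10, 12, 14, 14, 50]
x_bar = mean(x)

# Property: the deviations from the mean sum to zero.
deviation_sum = sum(xi - x_bar for xi in x)

# Property: if y_i = a * x_i + b, then y_bar = a * x_bar + b.
a, b = 2, 3  # arbitrary constants
y = [a * xi + b for xi in x]
y_bar = mean(y)

# Property: sensitivity to extreme values. Dropping the value 50
# moves the mean a long way.
x_bar_without_extreme = mean(x[:-1])

print(x_bar, deviation_sum, y_bar, x_bar_without_extreme)
```

With these values, removing the single extreme observation shifts the mean from 20 to 12.5, which is the sensitivity the last property describes.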
Measure of Location: Geometric Mean
The general formula for the geometric mean, $G$, is as follows:
$G = \left( \prod_{i=1}^{n} x_i \right)^{1/n} = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$
$\ln G = \frac{1}{n} (\ln x_1 + \cdots + \ln x_n) = \frac{1}{n} \sum_{i=1}^{n} \ln x_i$
Two properties of the geometric mean are important:
To calculate a geometric mean, all of the values in the data set must be positive.
For the same set of numbers, the geometric mean is always smaller than the arithmetic mean, with one exception: when all values are equal, the two means coincide.
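The logarithmic form of the geometric mean translates directly into code. This is a sketch only: the function name `geometric_mean` and the sample values are assumptions made for illustration.

```python
import math

def geometric_mean(values):
    """Geometric mean computed through logarithms: exp(mean of ln x_i)."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

data = [1.0, 4.0, 16.0]            # hypothetical values
g = geometric_mean(data)           # cube root of 1 * 4 * 16 = 64
arith = sum(data) / len(data)      # arithmetic mean of the same values
print(g, arith)
```

Working with logarithms avoids forming a very large product when the sample is long, and the positivity check mirrors the first property on the slide.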
Measure of Location: Median and Mode
The median is the value of the middle point of the sample when the observations are arranged in ascending order.
Median = the $[(n+1)/2]$th largest observation if $n$ is odd
       = the average of the $(n/2)$th and $(n/2+1)$th largest observations if $n$ is even.
The mode is the most frequently occurring value among all the observations in a sample. It is the most probable value that would be obtained if one data point were selected at random from a population.
Measure of Location: Median and Mode
Calculate the median and mode of the following data: 12, 24, 36, 25, 17, 19, 24, 11
Sorted data: 11, 12, 17, 19, 24, 24, 25, 36
Median = $\frac{19 + 24}{2} = 21.5$, Mode = 24
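The worked example can be reproduced with a short Python sketch. The helper names `median` and `modes` are hypothetical; `modes` returns every most-frequent value so that a sample with more than one mode is handled as well.

```python
from collections import Counter

def median(sample):
    """Middle value of the sorted sample (average of the two middle values if n is even)."""
    xs = sorted(sample)
    n = len(xs)
    if n % 2 == 1:
        return xs[(n + 1) // 2 - 1]          # [(n+1)/2]th observation, 1-based
    return (xs[n // 2 - 1] + xs[n // 2]) / 2  # average of (n/2)th and (n/2+1)th

def modes(sample):
    """All values that occur with the highest frequency, in ascending order."""
    counts = Counter(sample)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [12, 24, 36, 25, 17, 19, 24, 11]
print(median(data), modes(data))
```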
Measure of Location
The mean is influenced by outliers, whereas the median is not.
The mode is very unstable. Minor fluctuations in the data can change it substantially; for this reason it is seldom calculated.
[Figure: a bimodal distribution with two modes; a distribution with Mean = Median = Mode]
Symmetry and Skewness in Distribution
When the shape of a distribution to the left and to the right of its center is a mirror image, the distribution is symmetrical.
[Figure: examples of symmetrical distributions]
A skewed distribution is a distribution that is not symmetrical.
[Figure: examples of skewed distributions, labeled "Positively skewed" and "Negatively skewed"]
Measure of Dispersion: Range and Mean Absolute Deviation (MAD)
The range is the simplest measure of dispersion. It is the difference between the largest and smallest observations in a sample.
Range = $x_{\max} - x_{\min}$
The mean absolute deviation is the average of the absolute values of the deviations of individual observations from the arithmetic mean.
MAD = $\frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$
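Both measures take only a few lines of Python. The sketch below reuses the eight observations from the median and mode example; the function names are assumptions made for illustration.

```python
def data_range(sample):
    """Largest observation minus smallest observation."""
    return max(sample) - min(sample)

def mean_absolute_deviation(sample):
    """Average absolute deviation from the arithmetic mean."""
    n = len(sample)
    x_bar = sum(sample) / n
    return sum(abs(x - x_bar) for x in sample) / n

data = [12, 24, 36, 25, 17, 19, 24, 11]
print(data_range(data), mean_absolute_deviation(data))
```

For these data the mean is 21, the absolute deviations sum to 50, and the MAD is therefore 50 / 8 = 6.25.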
Measure of Dispersion: Quantiles
Quantile (percentile) is the general term for a value at or below which a stated proportion of the data in a distribution lies.
The $p$th percentile is the value $V_p$ such that $p\%$ of the sample points are less than or equal to $V_p$. Let $k = np/100$, with ranks counted from the smallest observation.
If $k$ is not an integer, $V_p$ is the $(\lfloor k \rfloor + 1)$th largest sample point, where $\lfloor k \rfloor$ is the largest integer less than $k$.
If $k$ is an integer, $V_p$ is the average of the $k$th and $(k+1)$th largest observations.
Quartiles: $p$ = 25, 50, 75.
Quintiles: $p$ = 20, 40, 60, 80.
Deciles: $p$ = 10, 20, 30, ..., 90.
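The percentile rule can be sketched in Python as follows. The sketch assumes that k = np/100 with n the sample size, that ranks are counted from the smallest observation, and that p is an integer strictly between 0 and 100. The function name `percentile` is hypothetical. Other textbooks and libraries use different interpolation rules, so results may differ from theirs.

```python
def percentile(sample, p):
    """p-th percentile by the rank rule: k = n * p / 100 (assumed reading)."""
    if not sample:
        raise ValueError("sample must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    xs = sorted(sample)
    n = len(xs)
    q, r = divmod(n * p, 100)      # k = q + r/100, so q = floor(k)
    if r != 0:
        # k is not an integer: take the (floor(k) + 1)th observation,
        # which is index q in 0-based indexing.
        return xs[q]
    # k is an integer: average the k-th and (k+1)-th observations.
    return (xs[q - 1] + xs[q]) / 2

data = [12, 24, 36, 25, 17, 19, 24, 11]
quartiles = [percentile(data, p) for p in (25, 50, 75)]
print(quartiles)
```

Using `divmod` on integers avoids floating-point error when deciding whether k is a whole number. The 50th percentile agrees with the median of the same data.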
Measure of Dispersion: Variance and Standard Deviation
The variance is a measure of how spread out a distribution is. It is computed as the average squared deviation of each number from its mean; for a sample, the sum of squared deviations is divided by $n - 1$.
The standard deviation is the square root of the variance. It is the most commonly used measure of spread.
Sample variance: $s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample standard deviation: $s_x = \sqrt{s_x^2}$
If $y_i = a x_i + b$, $i = 1, \ldots, n$, then $s_y^2 = a^2 s_x^2$ and $s_y = |a| s_x$ (which equals $a s_x$ when $a > 0$).
Example
The price-earnings ratios of the stocks of five companies in an industry are as follows: 10%, 12%, 14%, 14%, 50%
Calculate the arithmetic mean, variance, and standard deviation of the price-earnings ratios for these five companies.
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{100}{5} = 20$
$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{X})^2 = \frac{1136}{5-1} = 284$
$s_x = \sqrt{284} \approx 16.85$
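The calculation in this example can be verified with a short Python sketch. The function name `sample_variance` and the constants `a` and `b` in the linear-transformation check are assumptions made for illustration; a negative `a` is used to show that the standard deviation scales by the absolute value of `a`.

```python
import math

def sample_variance(sample):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(sample)
    x_bar = sum(sample) / n
    return sum((x - x_bar) ** 2 for x in sample) / (n - 1)

pe = [10, 12, 14, 14, 50]          # the five ratios from the example
var_x = sample_variance(pe)        # 1136 / 4
sd_x = math.sqrt(var_x)

# Linear transformation y_i = a * x_i + b with arbitrary constants.
a, b = -2, 3
y = [a * x + b for x in pe]
var_y = sample_variance(y)         # expected: a**2 * var_x
sd_y = math.sqrt(var_y)            # expected: abs(a) * sd_x

print(var_x, round(sd_x, 2), var_y, round(sd_y, 2))
```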
Measure of Dispersion: Relative Dispersion, Coefficient of Variation
A direct comparison of two or more measures of dispersion may be difficult because of differences in their means.
Relative dispersion is the amount of variability in a distribution relative to a reference point or benchmark.
A common measure of relative dispersion is the coefficient of variation:
$CV = 100\% \times \frac{s_x}{\bar{x}}$
This measure remains the same regardless of what units are used.
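The unit-invariance claim can be illustrated with a small Python sketch. It reuses the five price-earnings ratios from the earlier example and models a change of units as multiplication by a positive constant, which is an assumption of this sketch. The function name is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(sample):
    """Sample standard deviation as a percentage of the sample mean."""
    return 100 * stdev(sample) / mean(sample)

pe = [10, 12, 14, 14, 50]
cv = coefficient_of_variation(pe)

# Re-express the same data in different units (scale factor chosen arbitrarily).
cv_rescaled = coefficient_of_variation([x * 100 for x in pe])

print(round(cv, 2), round(cv_rescaled, 2))
```

Because both the standard deviation and the mean are multiplied by the same factor, the ratio is unchanged.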
Grouped Data
Unorganized raw quantitative data are simply a collection of numbers that can appear confusing and devoid of meaning.
For example, suppose an analyst wants to describe how the price-to-earnings (P/E) ratios of the common stocks of companies within an industry are distributed. The analyst might compile the price-to-earnings ratios of 96 publicly traded stocks of companies in the industry.
P/E ratio: a stock's price divided by its earnings per share, which indicates how much investors are paying for a company's earning power.
Grouped Data
A frequency distribution is a tabular presentation of statistical data. Frequency distributions summarize statistical data by assigning them to specified groups, or intervals. The data employed with a frequency distribution may be measured using any type of measurement scale.
Step 1: Define the intervals. The range of values for each interval must have a lower and an upper limit, and the intervals must be all-inclusive and nonoverlapping.
Step 2: Count the observations.
Step 3: Display the intervals and frequencies in a table.
[Table columns: Interval, Frequency]
Grouped Data
The relative frequency is another useful way to present data. The relative frequency is calculated by dividing the absolute frequency of each interval by the total number of observations. Simply stated, relative frequency is the percentage of total observations falling within each interval.
[Table columns: Interval, Frequency, Relative Frequency]
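Building a frequency distribution with relative frequencies can be sketched in Python as follows. The interval boundaries, the half-open convention [lower, upper), the function name `frequency_table`, and the data are all assumptions made for illustration; the slides do not specify them.

```python
def frequency_table(data, lower, width, n_intervals):
    """Return (lower, upper, frequency, relative frequency) for each interval.

    Intervals are half-open, [lower, upper), so they do not overlap.
    """
    counts = [0] * n_intervals
    for x in data:
        idx = int((x - lower) // width)
        if not 0 <= idx < n_intervals:
            # Intervals must be all-inclusive: reject values they do not cover.
            raise ValueError(f"{x} falls outside the intervals")
        counts[idx] += 1
    total = len(data)
    rows = []
    for i, count in enumerate(counts):
        lo = lower + i * width
        rows.append((lo, lo + width, count, count / total))
    return rows

data = [12, 24, 36, 25, 17, 19, 24, 11]   # illustrative observations
table = frequency_table(data, lower=10, width=10, n_intervals=3)
for lo, hi, freq, rel in table:
    print(f"[{lo}, {hi})  {freq}  {rel:.3f}")
```

The relative frequencies sum to 1, which is a quick check that every observation was assigned to exactly one interval.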
Graphic Methods: Bar Graph (Histogram)
A bar graph is simply a bar chart of data that has been classified into a frequency distribution. The attractive feature of a bar graph is that it allows us to quickly see where most of the observations are concentrated.
[Figure: bar graph of Frequency by Interval]
A demonstration of the effect of bin width on histograms
For large bin widths, the bimodal nature of the dataset is hidden, and for small bin widths the plot reduces to a spike at each data point. What bin width do you think provides the best picture of the underlying data?
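The effect can be reproduced with a text-only histogram. The sketch below uses a small hypothetical bimodal data set and three arbitrary bin widths; it does not reproduce the data behind the slide's figure.

```python
from collections import Counter

def bin_counts(data, width, start=0):
    """Count observations in consecutive bins [start + i*width, start + (i+1)*width)."""
    counts = Counter(int((x - start) // width) for x in data)
    n_bins = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_bins)]

# Hypothetical bimodal data: one cluster near 3, another near 13.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 11, 12, 12, 13, 13, 13, 14, 14, 15]

for width in (20, 2, 0.5):
    print(f"bin width = {width}")
    for i, count in enumerate(bin_counts(data, width)):
        print(f"  {i * width:>5}: {'#' * count}")
```

With a width of 20 everything falls into one bar, with a width of 2 the two clusters show up as two peaks, and with a width of 0.5 most bins are empty and the rest are isolated spikes.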
Graphic Methods: Stem-and-Leaf Plot
The stem-and-leaf plot simply sorts the data in numerical order and displays them. The procedure is based on a decision as to which digits in each data value will be used as the 'leading (stem) digits'; the rest will be the 'trailing (leaf) digits'. The data should be sorted on the basis of the leading digits.
[Figure: stem-and-leaf plot with columns "stem" and "leaf"]
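A stem-and-leaf plot for non-negative integers can be sketched as follows, assuming the tens digits form the stem and the units digit forms the leaf. That split, the function name, and the data are choices made for illustration.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group sorted values by stem (tens and above), listing leaves (units digit)."""
    plot = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [12, 24, 36, 25, 17, 19, 24, 11]   # illustrative observations
plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted before grouping, both the stems and the leaves within each stem appear in ascending order.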
Graphic Methods: Box Plot
The box plot is a summary plot based on the median and the interquartile range (IQR), which contains 50% of the values. Whiskers extend from the box to the highest and lowest values, excluding outliers. A line across the box indicates the median.
$IQR = Q_3 - Q_1$
$MIN = Q_1 - 1.5 \times IQR$, $MAX = Q_3 + 1.5 \times IQR$
[Figure: box plot with MIN and MAX marked]
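The quantities behind a box plot can be computed with a short Python sketch. The quartiles here follow the rank rule k = np/100 with ranks counted from the smallest observation, which is one of several conventions and an assumption of this sketch. The function names and the data are illustrative.

```python
def quartile(sample, p):
    """p-th percentile (p = 25 or 75) by the rank rule k = n * p / 100."""
    xs = sorted(sample)
    q, r = divmod(len(xs) * p, 100)
    return xs[q] if r else (xs[q - 1] + xs[q]) / 2

def box_plot_summary(sample):
    """Quartiles, IQR, the MIN/MAX limits, whisker ends and outliers."""
    q1, q3 = quartile(sample, 25), quartile(sample, 75)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in sample if low <= x <= high]
    return {
        "Q1": q1, "Q3": q3, "IQR": iqr,
        "MIN": low, "MAX": high,
        # Whiskers stop at the most extreme observations that are not outliers.
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in sample if x < low or x > high],
    }

pe = [10, 12, 14, 14, 50]   # illustrative observations with one extreme value
summary = box_plot_summary(pe)
print(summary)
```

For these values the limits are 9 and 17, so the observation 50 is flagged as an outlier and the upper whisker ends at 14.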