Elementary Statistics M. Ghamsary, Ph.D. Spring 2004 Chap 0
Descriptive Statistics

Raw Data: When data are collected in original form, they are called raw data. The following are the scores on the first test of the statistics class in fall of 2000.

0 9 66 8 9 60 6 66 68 69 6 99 88 94 7 93 76 88 78 79 7 90 7 97 78 79 76 80 8 86 86 63 89 8 83 73 8 88 70 76 89 86 96

Group Data: raw data that is organized into a frequency distribution.

Frequency Distribution: the organizing of raw data in table form, using classes and frequencies.

Class        50-59   60-69   70-79   80-89   90-99
Frequency      5       8      12      13       7

Class: the number of classes in the above table is 5.

Class Limits: represent the smallest and largest data values in each class.

Lower Class: the lowest number in each class. In the table above, 50 is the lower class limit of the 1st class, 60 is the lower class limit of the 2nd class, etc.

Upper Class: the highest number in each class. In the table above, 59 is the upper class limit of the 1st class, 69 is the upper class limit of the 2nd class, etc.

Class Width: for a class in a frequency distribution, class width is found by subtracting the lower (or upper) class limit of the previous class from the lower (or upper) class limit of the class. In the table above, the class width is 10 (60-50, 69-59, etc.).
Class Boundaries: are used to separate the classes so that there are no gaps in the frequency distribution.

Class     Class Boundaries   Frequency
50-59     49.5-59.5              5
60-69     59.5-69.5              8
70-79     69.5-79.5             12
80-89     79.5-89.5             13
90-99     89.5-99.5              7

Cumulative Frequency: the frequency of a class added to the frequencies of all classes before it.

Relative Frequency: the frequency of a class divided by the total number of data values, N.

Class     Frequency   Cumulative Frequency   Relative Frequency
50-59         5         5                      5/45 = 0.11
60-69         8         8 + 5 = 13             8/45 = 0.18
70-79        12        12 + 13 = 25           12/45 = 0.27
80-89        13        13 + 25 = 38           13/45 = 0.29
90-99         7         7 + 38 = 45            7/45 = 0.16
                                              N = 45

The most commonly used graphs in statistics are:
The Histogram
The Frequency Polygon
The Cumulative Frequency Graph
The Bar Chart
Pie Chart
Pareto Charts
Dot Plot
Stem-Leaf
Time Series Graph
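The boundary, cumulative-frequency, and relative-frequency columns can be checked with a short script. This is only a sketch in Python (the notes themselves use no programming language); it assumes the class frequencies 5, 8, 12, 13, 7 for the classes 50-59 through 90-99, and the function name `frequency_table` is a hypothetical helper.

```python
# Class limits and frequencies for the test-score distribution
# (frequencies 5, 8, 12, 13, 7 are assumed from the frequency table).
classes = [(50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
freqs = [5, 8, 12, 13, 7]


def frequency_table(classes, freqs):
    """Return one row per class with boundaries, cumulative and relative frequency."""
    n = sum(freqs)          # total number of data values, N
    running = 0             # running total for the cumulative frequency
    rows = []
    for (lower, upper), f in zip(classes, freqs):
        running += f
        rows.append({
            "class": f"{lower}-{upper}",
            # Boundaries sit half a unit outside the class limits.
            "boundaries": (lower - 0.5, upper + 0.5),
            "freq": f,
            "cum_freq": running,
            "rel_freq": round(f / n, 2),
        })
    return rows


table = frequency_table(classes, freqs)
for row in table:
    print(row)
```

The last cumulative frequency equals N, which is a quick sanity check on any frequency table.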
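1. Histogram
[Figure: histogram of the test scores; x-axis: Test Scores, y-axis: Frequency]

2. The Frequency Polygon
[Figure: frequency polygon of the test scores; x-axis: Test Score, y-axis: Frequency]

3. The Cumulative Frequency Graph
[Figure: cumulative frequency graph; x-axis: Test Score, y-axis: Cumulative Frequency]

4. The Bar Chart
[Figure: bar chart of frequency by grade (F, D, C, B, A); x-axis: Test Scores, y-axis: Frequency]

5. Pie Chart
[Figure: pie chart of the grades A, B, C, D, F]

6. Pareto Charts
[Figure: Pareto chart of count by grade, bars in the order B, C, D, A, F; x-axis: grade, y-axis: Count]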
7. Dot Plot
[Figure: dot plot of the test scores; axis: Test Scores, from 50 to 100]

8. Stem-Leaf
[Figure: stem-and-leaf plot of the test scores, stems 5 through 9]

Types of Distributions: There are several different types of distributions, but the following are the most commonly used in statistics.
Symmetric, Normal, or Bell shape
Positively skewed, Right tail, or Skewed to the right side
Negatively skewed, Left tail, or Skewed to the left side
Uniform
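Symmetric
[Figure: symmetric distribution; y-axis: Frequency]

Positively Skewed
[Figure: positively skewed distribution; x-axis: Test Scores, y-axis: Frequency]

Negatively Skewed
[Figure: negatively skewed distribution; x-axis: Test Scores, y-axis: Frequency]

Uniform
[Figure: uniform distribution; x-axis: Test Scores, y-axis: Frequency]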
9. Time Series Graph

Month: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Price of AOL and Price of MSFT: 6 60 8 6 0 48 7 0 48 40 0 0 00 9 90 8 7 80 60 0 40

[Figure: time series graph of the MSFT and AOL prices; x-axis: Month, y-axis: Price]
Elementary Statistics M. Ghamsary, Ph.D. Spring 00 Lecture #3
Statistic: value(s) or measure(s) obtained from a sample.

Parameter: value(s) or measure(s) obtained from a specific population.

Measures of Central Tendency: are Mean, Median, and Mode.

Mean: is defined to be the sum of the scores in the data set divided by the total number of scores.

Sample Mean: is denoted by $\bar{x}$, and it is defined by $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, or simply $\bar{x} = \frac{\sum x}{n}$.

Population Mean: is denoted by µ, and it is defined by $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, or simply $\mu = \frac{\sum x}{N}$.

Example 1: Find the mean of 10, 7, 3, 12, 18.
$\bar{x} = \frac{10 + 7 + 3 + 12 + 18}{5} = \frac{50}{5} = 10$.

Example 2: Find the mean of 0, 7, 3,, 8, 3, 7,,, and 30.
0 + 7 + 3 + + 8 + 3 + 7 + + + 30 0 0 0

Example 3: Find the mean of the temperatures in data set one.
6 + 4 + ... + 8 + 4 0 9. 38
Example 4: Find the mean of temperatures for males and females in data set one.
0.7 Male
8.4 Female

Note: The sample mean, $\bar{x}$, is an unbiased estimate of the population mean, µ.

Median: is defined to be the midpoint of the data set that is arranged from smallest to largest.

Example 5: Find the median of 0, 7, 3,,. First we must sort the data set as follows: 3, 7, 0,,. The median is 0.

Example 6: Find the median of 0, 7, 3,,, 0. After we sort we get: 3, 7, 0,,, 0. As we observe, there are 2 middle observations. So to find the median we average these 2 values, namely: Median(0+)/.

Example 7: Find the median of temperatures in data set one. Median

Example 8: Find the median of temperatures for males and females in data set one.
Median. for Males
Median 8. for Females

Mode: is defined to be the value in the data set that occurs the most frequently.

Example 9A: Find the mode of 0, 7, 3,,, 3. Mode is 3.
Example 9B: Find the mode of 0, 7, 3, 0,, 3. Modes are 3 and 0.

Example 9C: Find the mode of 0, 7, 3, 0, 0, 3. Mode is 0.

Example 9D: Find the mode of 0, 7, 3, 0, 7, 3. There is no mode, since all values occur with the same frequency.

Example 9E: Find the mode of 0, 7, 3,,, 8. There is no mode, since no value occurs more than once.

Example 10: Find the mean, the median, and the mode of the data set:
0, 7, 3,,, 8, 0, 7, 4, 6, 3, 8,, 7, 3,,, 8, 0, 0
Solution: First we must sort the data set:
0, 0, 0,,, 3, 4,,, 6, 7, 7, 7, 8, 0,, 3, 8, 8, 3
Mean: the sum of the 20 values divided by 20, which gives 7.6.
Median: $\frac{6 + 7}{2} = 6.5$, since there are 2 middle observations.
Mode: 0, 7

Example 11: Find the mean for the following group data.

Class        50-59   60-69   70-79   80-89   90-99
Frequency      2       4      18       6      10
Solution: First we need to find the class marks (midpoints) and then we use the following formula,
$\bar{x} = \frac{\sum x \cdot f}{n}$,
where x is the midpoint or class mark, f is the frequency, and n is the number of data points.

Class     Frequency f   Class mark x   x·f
50-59          2           54.5         109
60-69          4           64.5         258
70-79         18           74.5        1341
80-89          6           84.5         507
90-99         10           94.5         945
          Σf = 40                   Σx·f = 3160

So the mean is $\bar{x} = \frac{\sum x \cdot f}{n} = \frac{3160}{40} = 79$.

Weighted Mean: The formula above is also called weighted average or weighted mean. It can also be written as follows:
$\bar{x} = \frac{\sum w \cdot x}{\sum w}$, where w is the weight and x is the score.

Example 12: Find the GPA of John, who has the following courses with the corresponding units and grades.
English   5 units with the grade of A
Math      3 units with the grade of F
Spanish   2 units with the grade of D
Solution: In this problem, x will be the value of the grades and w is the number of units:
$\bar{x} = \frac{\sum w \cdot x}{\sum w} = \frac{4(5) + 0(3) + 1(2)}{5 + 3 + 2} = \frac{20 + 0 + 2}{10} = \frac{22}{10} = 2.2$.

Example 13: A teacher is teaching 3 classes. There are 30 students in the first class with the average of 70 on the final exam. The second class has 40 students with the average of 60 on the final exam. The 3rd class has 20 students with the average of 80 on the final exam. Find the weighted average of the three classes combined together.
Solution: Let x be the class average and w be the number of students.
$\bar{x} = \frac{\sum w \cdot x}{\sum w} = \frac{70(30) + 60(40) + 80(20)}{30 + 40 + 20} = \frac{2100 + 2400 + 1600}{90} = \frac{6100}{90} = 67.78$.

Example 14: What is the approximate mean in the following graphs?
a. Mean is about: 50, 75, 100, 125, 150.
b. Mean is about: 50, 75, 100, 125, 150.
c. Mean is about: 50, 75, 100, 125, 150.
d. Mean is about: 50, 75, 100, 125, 150.
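The weighted-mean formula covers the grouped-data mean, the GPA, and the combined class average alike, so one small function can check all three. The following is a Python sketch under stated assumptions: `weighted_mean` is a hypothetical helper name, the grade points A = 4, D = 1, F = 0 follow the usual 4-point scale, and the unit counts (5, 3, 2), class sizes (30, 40, 20), and grouped frequencies (2, 4, 18, 6, 10) are the values used in the worked examples.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Grouped data: class marks weighted by class frequencies.
grouped_mean = weighted_mean([54.5, 64.5, 74.5, 84.5, 94.5], [2, 4, 18, 6, 10])

# GPA: grade points (A=4, F=0, D=1) weighted by units.
gpa = weighted_mean([4, 0, 1], [5, 3, 2])

# Three classes: class averages weighted by class sizes.
combined_average = weighted_mean([70, 60, 80], [30, 40, 20])

print(grouped_mean, gpa, round(combined_average, 2))
```

Note that the simple average of 70, 60, and 80 would be 70; weighting by class size pulls the result toward the largest class, whose average is 60.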
Example 15: Locate the approximate mean, median, and mode in the following graphs.
a. b. c. d.

Example 16: For students at LLU, which is larger:
a. Average age or median age?
b. Average income or median income?
c. Average blood pressure or median blood pressure?
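The definitions of mean, median, and mode given earlier can be tried out with Python's standard `statistics` module. This is a sketch, not part of the original notes: the helper `modes` is hypothetical and encodes the convention used in these notes that a data set has no mode when every value occurs equally often, and the small lists other than 10, 7, 3, 12, 18 are illustrative data chosen here.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return all most-frequent values, or [] when every value ties.

    Assumed convention: if all distinct values occur equally often
    (and there is more than one distinct value), report no mode.
    """
    counts = Counter(data)
    top = max(counts.values())
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


scores = [10, 7, 3, 12, 18]
print(mean(scores))            # sum of scores divided by their count
print(median(scores))          # middle value after sorting (odd count)
print(median([3, 7, 10, 12]))  # even count: average of the 2 middle values
print(modes(scores))           # no value repeats, so no mode
print(modes([4, 4, 6, 9, 9]))  # two values tie for most frequent
```

For skewed data the mean is pulled toward the long tail more than the median, which is the idea behind comparing average and median income.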
Special Means: There are several other means that are used in different areas of life such as business, economics, and physics. The most commonly used are the following.
Harmonic Mean
Geometric Mean
The Root-Mean Square
Mean Deviation

Harmonic Mean: Given n non-zero scores, find the sum of their reciprocals, divide it by the number of scores, and take the reciprocal of the result. In other words, the harmonic mean is the reciprocal of the average of the reciprocals of the observations in the data set:
$H = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$, or $\frac{1}{H} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{x_i}$, which is easier to remember.

Example 17: Find the harmonic mean of 10, 7, 3, 12, 18.
Solution: $H = \frac{5}{1/10 + 1/7 + 1/3 + 1/12 + 1/18} = \frac{5}{0.7151} = 6.99$

Q. What is the interpretation of this number?
A. To answer this question, look at the following example.

Example 18: Suppose you drive to work at the speed of 20 miles per hour and on return your speed is 40 miles per hour. What is the average speed for the round trip?
Solution: The answer is not (40 + 20)/2 = 30. Because these values are rates, the most appropriate mean is the harmonic mean, which is
$H = \frac{2}{1/40 + 1/20} = 26.7$ mph.
As we observe, this value is far from 30.

Q. I do not get it. Why is this mean the correct one?
A. To answer this question, let us assume the distance to work is 10 miles.
Time to go to work = d/r = 10/20 = 0.50 hours.
Time to return home = d/r = 10/40 = 0.25 hours.
So the average speed is
Total Distance / Total Time = (10 + 10) miles / (0.50 + 0.25) hours = 20/0.75 = 26.7 mph,
which is the harmonic mean, the value we found above.

Geometric Mean: Given a set of n positive numbers $x_1, x_2, \ldots, x_n$, their geometric mean is defined to be the nth root of their product:
$GM = \sqrt[n]{x_1 x_2 \cdots x_n}$

Example 19: Find the geometric mean of 10, 7, 3, 12, 18.
$GM = \sqrt[5]{(10)(7)(3)(12)(18)} = \sqrt[5]{45360} = 8.54$

For a large data set it is easier to use the following formula to find the geometric mean:
$\log GM = \frac{\log x_1 + \log x_2 + \cdots + \log x_n}{n}$
So in the above example we have:
$\log GM = \frac{\log 10 + \log 7 + \log 3 + \log 12 + \log 18}{5} = 0.9313$
Now use the antilog to find the geometric mean:
GM = antilog(0.9313) = 8.54
Note: We can use either the natural logarithm (ln) or the common logarithm (log).

Q. What is the interpretation of this number?
A. To answer this question, look at the following example.

Example 20: The growth rate of MGM for the next 4 years is 8%, 0%, %, and %. Find the average growth rate.
Solution: To do this we compute the geometric mean of .8, .0, ., and ..
GM = 4th root of ( 8. )( 0. )(. )(. ) = . 94.
So the average growth rate is % over the 4 years.

The Root Mean Square: Given the numbers $x_1, x_2, \ldots, x_n$, we define the root mean square as follows:
$RMS = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}}$

Example 21: Find the root mean square of 10, 7, 3, 12, and 18.
$RMS = \sqrt{\frac{10^2 + 7^2 + 3^2 + 12^2 + 18^2}{5}} = \sqrt{\frac{100 + 49 + 9 + 144 + 324}{5}} = \sqrt{125.2} = 11.19$

The Mean Deviation: Given the numbers $x_1, x_2, \ldots, x_n$, the mean deviation is defined by
$MD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
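The four special means defined above can be computed directly from their formulas. The sketch below is in Python with illustrative function names chosen here; it uses the data set 10, 7, 3, 12, 18 and, for the harmonic mean, the round trip at 20 mph and 40 mph. The geometric mean is computed through logarithms, mirroring the log formula (natural logs are used, which gives the same result as common logs).

```python
import math


def harmonic_mean(xs):
    """n divided by the sum of reciprocals (all values must be non-zero)."""
    return len(xs) / sum(1 / x for x in xs)


def geometric_mean(xs):
    """nth root of the product, computed as exp(mean of logs); values must be positive."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def root_mean_square(xs):
    """Square root of the mean of the squared values."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def mean_deviation(xs):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


data = [10, 7, 3, 12, 18]
print(round(harmonic_mean(data), 2))
print(round(geometric_mean(data), 2))
print(round(root_mean_square(data), 2))
print(mean_deviation(data))

# Average speed over equal distances driven at 20 mph and 40 mph.
print(round(harmonic_mean([20, 40]), 1))
```

For positive data that are not all equal, these results always satisfy harmonic mean < geometric mean < arithmetic mean < root mean square, which is a handy check on hand calculations.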
Example 22: Find the mean deviation of 10, 7, 3, 12, 18.
From Example 1, the mean of 10, 7, 3, 12, 18 is 10, then we have
$MD = \frac{|10 - 10| + |7 - 10| + |3 - 10| + |12 - 10| + |18 - 10|}{5} = \frac{0 + 3 + 7 + 2 + 8}{5} = \frac{20}{5} = 4$

Measures of Variation
Range
Variance
Standard Deviation

The Range: is defined to be the highest value minus the lowest value in the data set.

The Variance: is defined by the following.
For the sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, which is the same as $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$ (short cut formula of the sample variance).
For the population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$, which is the same as $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ (short cut formula of the population variance).
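The definitional and short cut formulas give the same variance, which can be confirmed numerically. The following Python sketch uses function names chosen here for illustration and the small data set 3, 0, 7, 5, 15 as an illustrative input.

```python
import math


def sample_variance(xs):
    """Definitional formula: sum of squared deviations over n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def sample_variance_shortcut(xs):
    """Short cut formula: (sum of squares - (sum)^2 / n) over n - 1."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


def population_variance(xs):
    """Population version: divide by N instead of n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


data = [3, 0, 7, 5, 15]
data_range = max(data) - min(data)            # highest minus lowest value
s2 = sample_variance(data)
s2_short = sample_variance_shortcut(data)
s = math.sqrt(s2)                              # sample standard deviation

print(data_range, s2, s2_short, round(s, 2), population_variance(data))
```

The short cut form needs only the running totals of x and x squared, which is why it was preferred for hand and calculator work; with floating-point data of large magnitude the definitional form is usually the numerically safer choice.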
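Standard Deviation: is the positive square root of the variance.
Standard deviation = $\sqrt{\text{Variance}}$
For the sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$, and
for the population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$

Example 1: Find the range, variance, and the standard deviation of the following data set: 3, 0, 7, 5, 15.
Solution:
Range: Largest - Smallest = 15 - 0 = 15
Variance: If we use $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, first we need to find the sample mean. So
$\bar{x} = \frac{3 + 0 + 7 + 5 + 15}{5} = \frac{30}{5} = 6$,
then we substitute in the above formula and we get
$s^2 = \frac{(3 - 6)^2 + (0 - 6)^2 + (7 - 6)^2 + (5 - 6)^2 + (15 - 6)^2}{5 - 1}$,
$s^2 = \frac{(-3)^2 + (-6)^2 + (1)^2 + (-1)^2 + (9)^2}{4}$,
$s^2 = \frac{9 + 36 + 1 + 1 + 81}{4}$,
$s^2 = \frac{128}{4} = 32$.
So the variance is $s^2 = 32$.
Alternatively, we can use the short cut formula $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$.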
First we need to find their sum ($\sum x$) and their sum of squares ($\sum x^2$).
$\sum x = 3 + 0 + 7 + 5 + 15 = 30$
$\sum x^2 = 3^2 + 0^2 + 7^2 + 5^2 + 15^2 = 9 + 0 + 49 + 25 + 225 = 308$
Then we have
$s^2 = \frac{308 - \frac{(30)^2}{5}}{5 - 1} = \frac{308 - \frac{900}{5}}{4} = \frac{308 - 180}{4} = \frac{128}{4} = 32$,
which is exactly the same as above.

Standard deviation: As we know, the standard deviation is the positive square root of the variance.
Standard deviation = $\sqrt{\text{Variance}} = \sqrt{32} = 5.66$

Computer outputs:
In MINITAB, after we type the data we go to Stat, then Basic Stat, and we will get the following output.

Descriptive Statistics
Variable   N   Mean   Median   TrMean   StDev   SE Mean
X          5   6.00   5.00     6.00     5.66    2.53

Variable   Minimum   Maximum   Q1     Q3
X          0.00      15.00     1.50   11.00

In SPSS, after we type the data we go to Descriptive Statistics.

                     N   Range   Minimum   Maximum   Mean     Std. Deviation   Variance
X                    5   15.00   .00       15.00     6.0000   5.6569           32.000
Valid N (listwise)   5

In SAS we type the following:
    DATA one;
    input x @@;
    cards;
    3 0 7 5 15
    ;
    proc univariate;
    var x;
    run;

The following is a portion of the output of the above program.

Basic Statistical Measures
Location                Variability
Mean     6.000000       Std Deviation          5.65685
Median   5.000000       Variance              32.00000
Mode     .              Range                 15.00000
                        Interquartile Range    4.00000

Example 2: Find the range, variance, and the standard deviation of the following data set.
0, 7, 3,,, 8, 0, 7, 4, 6, 3, 8,, 7, 3,,, 8, 0, 0
Solution: This is done all by computer.
In MINITAB, after we type the data we go to Stat, then Basic Stat, and we will get the following output.

Descriptive Statistics
Variable   N    Mean   Median   TrMean   StDev   SE Mean
X          20   7.60   6.50     7.06     6.73    1.51

Variable   Minimum   Maximum   Q1   Q3
X 0.00 3.00..0

In SPSS, after we type the data we go to Descriptive Statistics.

N   Range   Minimum   Maximum   Mean   Std. Deviation   Variance
X 0.00 0.00 3.00 7.6000 6.7309 4.30
Valid N 20
(listwise)

In SAS we type the following:
    DATA one;
    input x @@;
    cards;
    0 7 3 8 0 7 4 6 3 8 7 3 8 0 0
    ;
    proc univariate;
    var x;
    run;

The following is a portion of the output of the above program.

Basic Statistical Measures
Location                Variability
Mean     7.60000        Std Deviation          6.73092
Median   6.50000        Variance              45.30526
Mode     0.00000        Range                   .00000
                        Interquartile Range    8.50000

Example 3: Find the standard deviation for the following group data.

Class     Frequency
50-59         2
60-69         4
70-79        18
80-89         6
90-99        10

Solution: First we will modify the above formula for the variance. But we need to find the class marks (midpoints) and then we use the following formula,
$s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f_i}{n - 1}$, or simply $s^2 = \frac{\sum (x - \bar{x})^2 \cdot f}{n - 1}$,
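where x is the midpoint or class mark, f is the frequency, and n is the number of data points.
We already know the mean: $\bar{x} = \frac{\sum x \cdot f}{n} = \frac{3160}{40} = 79$.

Class     f     x      x·f     (x - x̄)²                 (x - x̄)²·f
50-59      2    54.5    109    (54.5 - 79)² = 600.25     1200.50
60-69      4    64.5    258    (64.5 - 79)² = 210.25      841.00
70-79     18    74.5   1341    (74.5 - 79)² =  20.25      364.50
80-89      6    84.5    507    (84.5 - 79)² =  30.25      181.50
90-99     10    94.5    945    (94.5 - 79)² = 240.25     2402.50
     Σf = 40      Σx·f = 3160                  Σ(x - x̄)²·f = 4990

After substitution in $s^2 = \frac{\sum (x - \bar{x})^2 \cdot f}{n - 1}$ we get $s^2 = \frac{4990}{40 - 1} = 127.95$, and hence the standard deviation will be $s = \sqrt{127.95} = 11.31$.

If we use the short cut formula $s^2 = \frac{\sum x^2 \cdot f - \frac{(\sum x \cdot f)^2}{n}}{n - 1}$, we need the following table.

Class     f     x      x·f     x²·f
50-59      2    54.5    109    (54.5)²·2  =  5940.5
60-69      4    64.5    258    (64.5)²·4  = 16641.0
70-79     18    74.5   1341    (74.5)²·18 = 99904.5
80-89      6    84.5    507    (84.5)²·6  = 42841.5
90-99     10    94.5    945    (94.5)²·10 = 89302.5
     Σf = 40      Σx·f = 3160       Σx²·f = 254630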
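The column totals in the two tables above can be reproduced with a few lines of Python. This is a sketch under the assumption that the class marks are 54.5, 64.5, 74.5, 84.5, 94.5 with frequencies 2, 4, 18, 6, 10; the variable names are chosen here for illustration.

```python
import math

midpoints = [54.5, 64.5, 74.5, 84.5, 94.5]   # class marks x
freqs = [2, 4, 18, 6, 10]                     # class frequencies f

n = sum(freqs)                                            # number of data points
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))     # sum of x*f
mean = sum_xf / n

# Definitional form: sum of f * (x - mean)^2
ss_definition = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs))

# Short cut form: sum of x^2 * f minus (sum of x*f)^2 / n
sum_x2f = sum(f * x * x for x, f in zip(midpoints, freqs))
ss_shortcut = sum_x2f - sum_xf ** 2 / n

variance = ss_definition / (n - 1)
std_dev = math.sqrt(variance)

print(n, sum_xf, mean)
print(ss_definition, sum_x2f, ss_shortcut)
print(round(variance, 2), round(std_dev, 2))
```

Both forms give the same numerator, so the choice between them is a matter of convenience. Because every observation is replaced by its class mark, the result is an approximation to the variance of the raw data.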
$s^2 = \frac{254630 - \frac{(3160)^2}{40}}{40 - 1} = \frac{254630 - \frac{9985600}{40}}{39} = \frac{254630 - 249640}{39} = \frac{4990}{39} = 127.95$,
which is the same as the above result.

Q1. What will happen to the mean, median, mode, range, and standard deviation if we add a fixed number, c, to all values in the data set?
A1. The mean, median, and mode will increase by c units, but the range and standard deviation will not change.

Example 1: Consider the data set 5, 3, 5, 5, 12, which has the following measures:
Mean = 6, Median = 5, Mode = 5, Range = 9, and Standard deviation = 3.464.
Let us add 7 to all of the values in the above data set. We get 12, 10, 12, 12, 19. And the new values of the above measures are
Mean = 6 + 7 = 13
Median = 5 + 7 = 12
Mode = 5 + 7 = 12
Range = 9
Standard deviation = 3.464

Q2. What will happen to the mean, median, mode, range, and standard deviation if we subtract a fixed number, c, from all values in the data set?
A2. The mean, median, and mode will decrease by c units, but the range and standard deviation will not change.

Example 2: Let us subtract 7 from all of the values in the above data set. We get -2, -4, -2, -2, 5. And the new values of the above measures are
Mean = 6 - 7 = -1
Median = 5 - 7 = -2
Mode = 5 - 7 = -2
Range = 9
Standard deviation = 3.464
Q3. What will happen to the mean, median, mode, range, and standard deviation if we multiply all values in the data set by a fixed number, c?
A3. The mean, median, and mode will be multiplied by c, and so will the range and the standard deviation (by |c| if c is negative).

Example 3: Let us multiply all of the values in the above data set by 7. We get 35, 21, 35, 35, and 84. And the new values of the above measures are
Mean = 6(7) = 42
Median = 5(7) = 35
Mode = 5(7) = 35
Range = 9(7) = 63
Standard deviation = 3.464(7) = 24.248

In general, if Y = aX + b, then we have
Mean of Y = a·[Mean of X] + b, or $\bar{y} = a\bar{x} + b$
Standard deviation of Y = |a|·[Standard deviation of X], or $S_Y = |a| S_X$
Variance of Y = a²·[Variance of X], or $S_Y^2 = a^2 S_X^2$
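The rules for Y = aX + b can be verified numerically. The sketch below uses Python's standard `statistics` module on the data set 5, 3, 5, 5, 12; the helper names `summarize` and `transform` are hypothetical, and the case a = -2, b = 1 is an extra illustration added here to show why the absolute value of a is needed for the standard deviation.

```python
import math
from statistics import mean, median, mode, stdev


def summarize(xs):
    """Mean, median, mode, range, and sample standard deviation of xs."""
    return {
        "mean": mean(xs),
        "median": median(xs),
        "mode": mode(xs),
        "range": max(xs) - min(xs),
        "sd": stdev(xs),
    }


def transform(xs, a, b):
    """Apply y = a*x + b to every value."""
    return [a * x + b for x in xs]


x = [5, 3, 5, 5, 12]
base = summarize(x)
shifted = summarize(transform(x, 1, 7))    # add 7 to every value
scaled = summarize(transform(x, 7, 0))     # multiply every value by 7
flipped = summarize(transform(x, -2, 1))   # negative multiplier

print(base)
print(shifted)
print(scaled)
print(flipped)
```

Adding a constant moves the measures of central tendency but leaves the spread alone; multiplying rescales everything, with the spread scaled by |a|.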