ENGI 4421 Probability and Statistics
Faculty of Engineering and Applied Science
Problem Set 1 Solutions
Descriptive Statistics

1. If, in the set of values { 1, 2, 3, 4, 5, 6, 7 }, an error causes the value 5 to be replaced by 50,

(a) what effect will this change have on the median value?

None at all! The middle (fourth) ordered value is 4 both before and after the error.

(b) what effect will this change have on the mean value?

The mean will increase. The arithmetic mean is quite sensitive to outliers. Compared to the other values in the set, 50 is an extreme outlier and can be expected to pull the value of the mean up by a considerable amount. It is easy to see, from symmetry, that the pre-error mean is 4. A quick calculation shows that the post-error mean is 73/7, just over 10.

(c) what effect will this change have on the mode?

None at all! The mode is ill-defined both before and after the error. All seven data values are unique.
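The effect of the recording error in Question 1 can be checked numerically. The following is a minimal sketch, assuming the data set { 1, 2, 3, 4, 5, 6, 7 } with the value 5 recorded as 50; the variable and function names are illustrative only.

```python
from fractions import Fraction
from statistics import median

original = [1, 2, 3, 4, 5, 6, 7]
# The error: the value 5 is recorded as 50.
corrupted = [50 if v == 5 else v for v in original]


def mean(values):
    """Exact arithmetic mean, returned as a fraction."""
    return Fraction(sum(values), len(values))


median_before, median_after = median(original), median(corrupted)
mean_before, mean_after = mean(original), mean(corrupted)

# Every value occurs exactly once, so no single mode stands out.
all_unique = len(set(corrupted)) == len(corrupted)

print("median:", median_before, "->", median_after)
print("mean:  ", mean_before, "->", mean_after, "=", float(mean_after))
print("all values unique:", all_unique)
```

The median is unchanged while the mean moves from 4 to 73/7, which illustrates why the median is the more robust measure of location here.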

(d) which of mean and median is the better measure of location for this changed data set and why?

In this case, the median. Usually, the arithmetic mean is preferred as a measure of central location, because it uses all data values and the methods of calculus are much easier to apply. However, in this case, the presence of the extreme outlier renders the mean a poor indicator of central location.

2. The total scores obtained on a pair of biased ("loaded") dice when they were thrown 100 times are summarized in the frequency table below:

Score x   Frequency f      Score x   Frequency f
   2           1              8           7
   3           0              9          11
   4           1             10          20
   5           1             11          37
   6           2             12          15
   7           5           Total:       100

(a) Display this information on a bar chart.

Bar charts produced both by Minitab and Excel are shown below. The source files are available as a Minitab project file (Version 14) and an Excel worksheet (97-2003 compatible).

[Figure, from Minitab: "Bar chart of total score x on dice pair"; horizontal axis: x, vertical axis: Frequency.]

[Figure, from Excel: bar chart.]

(b) Identify the mode.

The mode is the "most fashionable" value of x; that is, the value of x for which the frequency is greatest. A brief inspection of the table (or the bar chart) shows that the maximum frequency, 37, occurs at

mode = 11

(c) Construct the cumulative frequency table and hence find the median.
(d) Find the arithmetic mean.
(e) Find the sample variance.

The frequency table is extended to provide solutions to parts (c), (d) and (e) together. The cumulative frequency in the i-th class, $c_i$, can be found iteratively: $c_i = c_{i-1} + f_i$, where $f_i$ is the frequency in the i-th class, together with $c_1 = f_1$.

Score x   Frequency f   Cum. freq. c      x f      x^2 f
    2          1              1             2          4
    3          0              1             0          0
    4          1              2             4         16
    5          1              3             5         25
    6          2              5            12         72
    7          5             10            35        245
    8          7             17            56        448
    9         11             28            99        891
   10         20             48           200       2000
   11         37             85           407       4477
   12         15            100           180       2160
Total:       100                         1000      10338

(c) The median of 100 ordered values is the average (arithmetic mean) of the 50th and 51st largest values. From the cumulative frequency table, it is clear that the 49th through 85th values are all 11. Therefore

median = 11

(d) Arithmetic mean:

$$\bar{x} = \frac{\sum x f}{\sum f} = \frac{1000}{100} = 10$$
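The extended-table calculations for parts (c) to (e) of Question 2 can be reproduced with a short script. This is a sketch only, assuming the frequencies 1, 0, 1, 1, 2, 5, 7, 11, 20, 37, 15 for the scores 2 to 12; the helper `ordered_value` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import accumulate

scores = list(range(2, 13))                      # possible totals 2..12
freqs = [1, 0, 1, 1, 2, 5, 7, 11, 20, 37, 15]    # assumed frequency table

n = sum(freqs)
# Cumulative frequencies: c_i = c_(i-1) + f_i, starting from c_1 = f_1.
cum = list(accumulate(freqs))


def ordered_value(k):
    """Return the k-th smallest observation (1-based) using the cumulative table."""
    return next(x for x, c in zip(scores, cum) if c >= k)


# n is even, so the median averages the two middle ordered values.
median = Fraction(ordered_value(n // 2) + ordered_value(n // 2 + 1), 2)

sum_xf = sum(x * f for x, f in zip(scores, freqs))
sum_x2f = sum(x * x * f for x, f in zip(scores, freqs))

mean = Fraction(sum_xf, n)
# Computational form of the sample variance for grouped data.
sample_var = Fraction(n * sum_x2f - sum_xf ** 2, n * (n - 1))

print("cumulative:", cum)
print("median:", median, " mean:", mean)
print("sample variance:", sample_var, "=", round(float(sample_var), 3))
```

Exact fractions are used so that the variance comes out as 338/99 rather than a rounded decimal.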

(e) Sample variance:

$$s^2 = \frac{n \sum x^2 f - \left( \sum x f \right)^2}{n (n - 1)} = \frac{1033800 - 1000000}{9900} = \frac{338}{99} \approx 3.41$$

(f) Comment on any evidence for skew in these data.

From the bar chart in part (a), there is clearly a longer tail on the left than there is on the right. There is evidence for strong negative skew.

3. The grades received by an engineering class in a certain course are as shown in the frequency table below:

Grade   Frequency
  A        34
  B        47
  C        50
  D         8
  F         6

Display this information graphically in the form of
(a) a bar chart
(b) a pie chart
Show the calculation for the angle of any two segments of the pie chart.

Each angle in the pie chart is calculated from the frequency using the formula

$$\text{Pie chart angle} = \frac{\text{Frequency}}{\text{Total frequency}} \times 360^\circ$$

The calculations for all five angles are displayed in the associated Excel file, which also produced the bar and pie charts below.
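The pie chart formula translates directly into code. The sketch below uses hypothetical grade counts, chosen only so that the arithmetic is easy to follow; they are not the class frequencies from the question, and `pie_angles` is an illustrative name.

```python
def pie_angles(frequencies):
    """Map each category to its pie-chart angle in degrees."""
    total = sum(frequencies.values())
    return {label: 360.0 * f / total for label, f in frequencies.items()}


# Hypothetical counts (total 90), used purely to illustrate the formula.
example = {"A": 10, "B": 20, "C": 30, "D": 25, "F": 5}
angles = pie_angles(example)

for grade, angle in angles.items():
    print(f"{grade}: {angle:.1f} degrees")
```

Whatever the counts, the angles always sum to 360 degrees, which is a convenient check on a hand calculation.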

[Figure: bar chart of the grade frequencies.]

[Figure: pie chart of the grade frequencies.]

In questions 4 to 7 below, use Minitab (or some other software package) to answer the questions. If you do not use Minitab, then state what software package you have used.

4. For the following data set (also available as a plain text file),

.035.545 6.3796 0.6863.498 9.400 8.008 9.3688 7.084.353 7.674.0376.3456.4693.637 3.8840 3.436.4395 9.060 0.385.345 9.0963 9.9664 0.0884 0.689 0.857.53 8.98 8.8498 0.54.3870 7.876 0.64 0.064 7.938 9.403.544 8.3797.705 9.957

(a) create a printout of Minitab's standard Descriptive Statistics output, including the default bar chart with superimposed normal graph and the default boxplot (as was demonstrated in the Minitab tutorial), or provide equivalent information from some other software package.
(b) What evidence do you see for skewness in these data?

(a) From the associated Minitab project file:

Descriptive Statistics: Data

Variable    N   N*   Mean   SE Mean   StDev   Minimum      Q1   Median    Q3   Maximum
Data       40    0    0.7     0.265   1.678     6.380   9.069   10.335   .30    13.884

[Figure: "Histogram (with Normal Curve) of Data"; horizontal axis: Data, vertical axis: Frequency.]

[Figure: "Boxplot of Data"; vertical axis: Data, from 6 to 14.]

(b) The mean and median are nearly equal, the whiskers of the boxplot are approximately equal, there are no outliers and the median is near the centre of the box. There is no clear evidence of skew.

[In fact, these data were generated by Minitab from a normal distribution of mean 10 and standard deviation 2. Normal distributions have zero skew, although random samples drawn from them may be somewhat skewed by chance.]

5. For the following data set of 100 values (also available as a plain text file),

.8679 3.03009 6.40883 4.33369 0.63779 0.5385 0.4579 3.079.38530 4.67676.7304.7739 0.854.85599.8534.7757.8583 0.65357 0.4.97.47675.7943 0.66736.5375 3.759.8378 0.790.60064.8358.67403.03660 0.50900.0876.59330 0.969 0.760.6550 0.53473.4 0.67745 3.68679 5.63466 4.460 0.63746.00497.4397.05.760.394.5488.758.878.0864.436.549.36957 3.34404 4.357 0.8697.300 0.66336 3.653.769.94.6554.56736 0.84466 0.4495.48484 4.6585 5.37489.8596.67463 0.87603.675.57 0.68.85488 3.8630 0.6538 0.7766 0.970.0063 0.99977.6056.0060.06657.938 0.8605.809.9997.944.58438 0.94377 0.33508.94735.83459.8873.7406.6448

(a) create a printout of Minitab's standard Descriptive Statistics output (or provide equivalent information from some other software package).

From the associated Minitab project file,

Descriptive Statistics: Data

Variable     N   N*    Mean   SE Mean   StDev   Minimum      Q1   Median      Q3   Maximum
Data       100    0   1.936       0.7      .7       0.3   0.948    1.707   2.609     6.409

(b) construct a standard boxplot, oriented horizontally, with gridlines at intervals of 0.5 units.

[Figure: "ENGI 4421 Problem Set 1 Question 5"; horizontal boxplot of Data, with gridlines every 0.5 units from 0.0 to 6.5.]

5 (c) identify any outliers (list their values).

The boxplot clearly displays three outliers, all at the upper end (and none of them extreme). From a sorted list of values in the Minitab project files, the outliers are the values

5.37489, 5.63466 and 6.40883

(d) construct a histogram, using as class boundaries the consecutive integers, from 0 to the next integer above the largest observed value.

[Figure: "ENGI 4421 Problem Set 1 Question 5"; histogram of Data with class boundaries at the integers 0 to 7; vertical axis: Density.]

(e) What evidence do you see for skewness in these data?

All of the above illustrate a strong positive skew. The boxplot gives the clearest indication of positive skew.

[In fact, these data were generated by Minitab from a gamma distribution, with parameters α = 2, β = 1 and therefore mean = variance = 2.]

6. For the following data set of 30 values (also available as a plain text file),

0.957438 0.66777 0.69579 0.53556 0.989805 0.740677 0.837656 0.8593 0.97656 0.789 0.930773 0.945 0.96407 0.99488 0.90530 0.98569 0.658793 0.88450 0.978 0.99899 0.93477 0.905575 0.856455 0.7894 0.836906 0.89483 0.5985 0.848346 0.90458 0.96747

(a) create a printout of Minitab's standard Descriptive Statistics output (or provide equivalent information from some other software package).

From the associated Minitab project file,

Descriptive Statistics: Data

Variable    N   N*     Mean   SE Mean    StDev   Minimum       Q1   Median       Q3   Maximum
Data       30    0   0.8467    0.0239   0.1309     0.536   0.7771   0.8979   0.9404    0.9990

(b) construct a standard boxplot and add a symbol to indicate the location of the arithmetic mean.

[Figure: "ENGI 4421 Problem Set 1 Question 6"; boxplot of Data with a symbol marking the mean; vertical axis: Data, from 0.5 to 1.0.]

6 (c) identify any outliers (list their values).

Two outliers are present, both at the lower end. At the top of the column of sorted data in the worksheet, we find that the outliers are the values

0.53556 and 0.5985

The interquartile range is

$$\text{IQR} = x_U - x_L \approx 0.9404 - 0.7771 \approx 0.1634$$

so the lower outer fence is at

$$x = x_L - 3\,(\text{IQR}) \approx 0.7771 - 3 \times 0.1634 = 0.2869$$

Both outliers are therefore mild; indeed, they are barely below the lower inner fence!

[The lower inner fence is at $x = x_L - 1.5\,(\text{IQR}) \approx 0.7771 - 1.5 \times 0.1634 = 0.5320$.]

(d) construct a histogram, class widths of 0.1, from 0 to 1.

[Figure: "ENGI 4421 Problem Set 1 Question 6"; histogram of Data with class width 0.1 from 0.0 to 1.0; vertical axis: Density.]

(e) What evidence do you see for skewness in these data?

The boxplot and the histogram both illustrate clearly a strong negative skew. In the boxplot, both outliers are below the box, the mean is below the median, the lower whisker is much longer than the upper whisker and the box is not symmetric about the median, with the lower quartile much farther away from the median than the upper quartile.

[In fact, these data were generated by Minitab from a [negatively skewed] beta distribution, with parameters α = 4, β = 1 and therefore mean = 4/5 and variance = 2/75.]
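The fence calculation used in part (c) of Question 6 can be sketched as a pair of small functions. The quartile values Q1 = 0.7771 and Q3 = 0.9404 are assumed here (rounded to four decimals, so fences may differ from hand-calculated values in the last digit), the test values 0.53, 0.25 and 0.60 are hypothetical, and the names `fences` and `classify` are illustrative.

```python
def fences(q1, q3):
    """Inner fences lie 1.5 IQR beyond the quartiles; outer fences lie 3 IQR beyond."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }


def classify(value, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    f = fences(q1, q3)
    if value < f["lower_outer"] or value > f["upper_outer"]:
        return "extreme outlier"
    if value < f["lower_inner"] or value > f["upper_inner"]:
        return "mild outlier"
    return "not an outlier"


q1, q3 = 0.7771, 0.9404          # assumed rounded quartiles
f = fences(q1, q3)
print({name: round(x, 4) for name, x in f.items()})
for v in (0.53, 0.25, 0.60):     # hypothetical test values
    print(v, "->", classify(v, q1, q3))
```

A value is mild when it lies between the inner and outer fences, which is the situation described for both outliers in this question.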

7. For the following data set of 60 values (also available as a plain text file),

7 6 43 54 54 48 48 59 55 6 50 55 30 66 4 55 48 57 6 48 46 6 30 50 66 73 54 48 66 6 45 57 48 70 68 43 5 50 46 64 46 50 50 50 48 37 45 53 64 50 39 3 66 68 4 70 48 73 39 43

(a) construct a frequency bar chart, with classes of width 5 and centres at { 32, 37, 42, 47, ..., 67, 72 }.

From the associated Minitab project file,

[Figure: "ENGI 4421 Problem Set 1 Question 7"; frequency bar chart of Data with class centres 32, 37, 42, 47, 52, 57, 62, 67, 72; vertical axis: Frequency.]

(b) create a printout of Minitab's standard Descriptive Statistics output, but display only the number count, mean, standard deviation, median and quartiles (or provide equivalent information from some other software package).

Descriptive Statistics: Data

Variable    N   Mean   StDev      Q1   Median     Q3
Data       60   5.93     0.7   46.00    50.00   6.00

7 (c) identify the modal class and the median class from your bar chart.

The tallest bar, height = frequency = 13, is the bar centred on x = 47. Therefore

The modal class is [44.5, 49.5)

From the summary statistics in part (b), the median is x = 50. This value falls in the class centred on x = 52. Therefore

The median class is [49.5, 54.5)

(d) use the grouped data (from the bar chart) to calculate the mean, the population standard deviation and the sample standard deviation (you may find this easier to do in a spreadsheet program such as Microsoft Excel).

From the associated Excel file,

$$\bar{x} = \frac{\sum x f}{\sum f} = \frac{3190}{60} \approx 53.17$$

[compare this to the mean in part (b) above],

$$\sigma^2 = \frac{N \sum x^2 f - \left( \sum x f \right)^2}{N^2} = \frac{60 \times 176420 - 3190^2}{60^2} = \frac{409100}{3600} \approx 113.639 \quad\Rightarrow\quad \sigma \approx \sqrt{113.639} \approx 10.660$$

and

$$s^2 = \frac{n \sum x^2 f - \left( \sum x f \right)^2}{n (n - 1)} = \frac{60 \times 176420 - 3190^2}{60 \times 59} = \frac{409100}{3540} \approx 115.565 \quad\Rightarrow\quad s \approx \sqrt{115.565} \approx 10.750$$

[compare this to the standard deviation in part (b) above].

[Note: if these 60 data are a sample drawn from a larger population, then the sample standard deviation is the appropriate form to use as a measure of spread. Only if these 60 values constitute the entire population should the formula for σ be used instead.]

(e) Why are the mean and standard deviation that you calculated in part (d) different from the Minitab values?

The difference arises from the loss of precise information caused by grouping the data together into classes.
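The grouped-data calculation in part (d) of Question 7 needs only three totals. The sketch below assumes the grouped totals n = 60, Σxf = 3190 and Σx²f = 176420 (taken as given here, since the individual class frequencies are not listed in the text); the variable names are illustrative.

```python
from math import sqrt

# Assumed grouped totals for the 60 observations.
n = 60
sum_xf = 3190       # sum of (class centre x frequency)
sum_x2f = 176420    # sum of (class centre squared x frequency)

# Common numerator of both variance formulas.
numerator = n * sum_x2f - sum_xf ** 2

grouped_mean = sum_xf / n
pop_var = numerator / n ** 2               # divide by N^2: population variance
sample_var = numerator / (n * (n - 1))     # divide by n(n-1): sample variance

pop_sd = sqrt(pop_var)
sample_sd = sqrt(sample_var)

print(f"mean      = {grouped_mean:.2f}")
print(f"sigma^2   = {pop_var:.3f}, sigma = {pop_sd:.3f}")
print(f"s^2       = {sample_var:.3f}, s     = {sample_sd:.3f}")
```

The two standard deviations differ only through the divisor, so the gap between them shrinks as the number of observations grows.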

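8. Problem Set 1 Bonus Question, Descriptive Statistics

Prove that, for any real constant a,

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - a)^2$$

Hint: Use the identities $\sum_{i=1}^{n} k = nk$ (for any constant k) and $\sum_{i=1}^{n} x_i = n\bar{x}$.

Rearrange the inequality into the form

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 - \sum_{i=1}^{n} (x_i - a)^2 \le 0$$

Manipulate the left hand side of this inequality:

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 - \sum_{i=1}^{n} (x_i - a)^2 = \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) - \sum_{i=1}^{n} \left( x_i^2 - 2 x_i a + a^2 \right)$$

$$= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 - \sum_{i=1}^{n} x_i^2 + 2a \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} a^2$$

$$= -2n\bar{x}^2 + n\bar{x}^2 + 2an\bar{x} - na^2 = -n \left( \bar{x}^2 - 2a\bar{x} + a^2 \right) = -n (\bar{x} - a)^2 \le 0 \quad \forall a, \ \forall x_i$$

Therefore

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - a)^2 \quad \forall a, \ \forall x_i$$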
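The result of Question 8 can be spot-checked numerically: the gap between the two sums of squares should equal n(x̄ − a)² for every choice of a. The sketch below uses a small hypothetical sample and exact fractions; `sum_sq_dev` is an illustrative helper name.

```python
from fractions import Fraction


def sum_sq_dev(xs, a):
    """Sum of squared deviations of the values xs from the constant a."""
    return sum((x - a) ** 2 for x in xs)


# Hypothetical sample, chosen only for illustration.
xs = [Fraction(v) for v in (2, 3, 5, 7, 11)]
n = len(xs)
xbar = sum(xs) / n

# For each trial constant a, record the gap and the predicted value n(xbar - a)^2.
trial_constants = (Fraction(0), Fraction(4), xbar, Fraction(10))
gaps = {a: sum_sq_dev(xs, a) - sum_sq_dev(xs, xbar) for a in trial_constants}
predicted = {a: n * (xbar - a) ** 2 for a in trial_constants}

for a in trial_constants:
    print(f"a = {a}: gap = {gaps[a]}, n(xbar - a)^2 = {predicted[a]}")
```

The gap is zero only when a equals the sample mean, which is the equality case of the inequality.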

Additional Note for Question 8:

It then follows that, for any random sample of size n drawn from a population of true mean µ,

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - \mu)^2$$

(with equality only in the very unlikely event that $\bar{x} = \mu$).

Recall that

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

(where there are N members in the entire population).

One can then speculate [correctly] that, on average,

$$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 < \sigma^2$$

$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$ is said to be a biased estimate of $\sigma^2$, in that it underestimates the true value of $\sigma^2$ on average. The bias disappears when this variance formula is replaced by the sample variance

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

In the section on estimators we shall see a proof that $s^2$ is an unbiased estimate of $\sigma^2$.

Partial solutions using Matlab are available as m-files for the following questions: Question 2, Question 3, Question 4, Question 5, Question 6, Question 7.
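The bias described in the additional note for Question 8 can be seen exactly on a toy example. The sketch below assumes a small hypothetical population and enumerates every ordered sample of size 3 drawn with replacement (so the draws are independent); none of these numbers come from the problem set.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population, used only to illustrate the bias.
population = [Fraction(v) for v in (1, 2, 3, 4, 6)]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N

n = 3
# Every ordered sample of size n, drawn with replacement (equally likely).
samples = list(product(population, repeat=n))


def sum_sq(sample):
    """Sum of squared deviations of a sample about its own mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)


# Average each estimator over all possible samples.
avg_biased = sum(sum_sq(s) / n for s in samples) / len(samples)
avg_unbiased = sum(sum_sq(s) / (n - 1) for s in samples) / len(samples)

print("sigma^2               =", sigma2)
print("average of (1/n) form =", avg_biased)
print("average of s^2        =", avg_unbiased)
```

Dividing by n underestimates the population variance by the factor (n − 1)/n on average, while dividing by n − 1 recovers it exactly in this enumeration.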