ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Solutions Descriptive Statistics

1. If, in the set of values { 1, 2, 3, 4, 5, 6, 7 }, an error causes the value 5 to be replaced by 50,

(a) what effect will this change have on the median value?

None at all! The median is 4 both before and after the error.

(b) what effect will this change have on the mean value?

The mean will increase. The arithmetic mean is quite sensitive to outliers. Compared to the other values in the set, 50 is an extreme outlier and can be expected to pull the value of the mean up by a considerable amount. It is easy to see, from symmetry, that the pre-error mean is 4. A quick calculation shows that the post-error mean is 73/7, just over 10.

(c) what effect will this change have on the mode?

None at all! The mode is ill-defined both before and after the error: all seven data values are unique.
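The answers to parts (a) to (c) can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the variable names `original` and `corrupted` are illustrative choices, not part of the problem set.

```python
from statistics import mean, median, multimode

# Data set from question 1 and the same set after the error (5 replaced by 50).
original = [1, 2, 3, 4, 5, 6, 7]
corrupted = [50 if x == 5 else x for x in original]

for label, data in (("before", original), ("after", corrupted)):
    # multimode returns every value that attains the highest frequency;
    # when all values are unique it returns the whole data set,
    # which is why the mode is ill-defined here.
    print(label,
          "median =", median(data),
          "mean =", round(mean(data), 2),
          "modes =", multimode(data))
```

The median is unchanged, the mean moves well away from the bulk of the data, and `multimode` returns all seven values in both cases.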

(d) which of mean and median is the better measure of location for this changed data set and why?

In this case, the median. Usually, the arithmetic mean is preferred as a measure of central location, because it uses all data values and the methods of calculus are much easier to apply. However, in this case, the presence of the extreme outlier renders the mean a poor indicator of central location.

2. The total scores obtained on a pair of biased ("loaded") dice when they were thrown 100 times are summarized in the frequency table below:

Score x    Frequency f
   2            1
   3            0
   4            1
   5            1
   6            2
   7            5
   8            7
   9           11
  10           20
  11           37
  12           15
Total:        100

(a) Display this information on a bar chart.

Bar charts were produced by both Minitab and Excel. The source files are available as a Minitab project file (Version 14) and an Excel worksheet (97-2003 compatible).

(a) (continued)

From Minitab: [Figure: bar chart of total score x on dice pair; horizontal axis x, vertical axis Frequency]

From Excel: [Figure: bar chart]

(b) Identify the mode.

The mode is the most fashionable value of x; that is, the value of x for which the frequency is greatest. A brief inspection of the table (or the bar chart) shows that the maximum frequency, 37, occurs at mode = 11.

(c) Construct the cumulative frequency table and hence find the median.
(d) Find the arithmetic mean.
(e) Find the sample variance.

The frequency table is extended to provide solutions to parts (c), (d) and (e) together. The cumulative frequency in the i-th class, $c_i$, can be found iteratively: $c_i = c_{i-1} + f_i$, where $f_i$ is the frequency in the i-th class, together with $c_1 = f_1$.

Score    Frequency    Cum. freq.
  x          f            c          x·f       x²·f
  2          1            1            2          4
  3          0            1            0          0
  4          1            2            4         16
  5          1            3            5         25
  6          2            5           12         72
  7          5           10           35        245
  8          7           17           56        448
  9         11           28           99        891
 10         20           48          200       2000
 11         37           85          407       4477
 12         15          100          180       2160
Total:     100                      1000      10338

(c) The median of 100 ordered values is the average (arithmetic mean) of the 50th and 51st values. From the cumulative frequency table, it is clear that the 49th through 85th values are all 11. Therefore, median = 11.

(d) Arithmetic mean:

$$ \bar{x} = \frac{\sum x f}{\sum f} = \frac{1000}{100} = 10 $$
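The extended-table calculation for parts (c) to (e) can be sketched in a few lines of Python. This is an illustrative sketch, not the Minitab or Excel workflow used for the posted solutions; the frequencies are entered by hand from the table for question 2, and the helper `kth_value` is a hypothetical name introduced here.

```python
from itertools import accumulate

# Frequency table for question 2 (score -> frequency), entered by hand.
freq = {2: 1, 3: 0, 4: 1, 5: 1, 6: 2, 7: 5, 8: 7, 9: 11, 10: 20, 11: 37, 12: 15}

scores = sorted(freq)
# Cumulative frequencies: c_i = c_(i-1) + f_i, with c_1 = f_1.
cum = list(accumulate(freq[x] for x in scores))
n = cum[-1]

def kth_value(k):
    """Return the k-th smallest observation (1-based) from the cumulative counts."""
    for x, c in zip(scores, cum):
        if k <= c:
            return x
    raise ValueError("k exceeds the number of observations")

# n is even here, so the median is the average of the two middle values.
median = (kth_value(n // 2) + kth_value(n // 2 + 1)) / 2

sum_xf = sum(x * f for x, f in freq.items())
sum_x2f = sum(x * x * f for x, f in freq.items())
mean = sum_xf / n
# Computational form of the sample variance for grouped data.
variance = (n * sum_x2f - sum_xf ** 2) / (n * (n - 1))
mode = max(freq, key=freq.get)

print(n, mode, median, mean, sum_x2f, round(variance, 2))
```

The same loop over the cumulative counts works for any frequency table of discrete values; only the median line assumes an even number of observations.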

(e) Sample variance (with $n = \sum f = 100$):

$$ s^2 = \frac{n \sum x^2 f - \left( \sum x f \right)^2}{n (n - 1)} = \frac{1033800 - 1000000}{100 \times 99} = \frac{33800}{9900} = \frac{338}{99} \approx 3.41 $$

(f) Comment on any evidence for skew in these data.

From the bar chart in part (a), there is clearly a longer tail on the left than there is on the right. There is evidence for strong negative skew.

3. The grades received by an engineering class in a certain course are as shown in the frequency table below:

Grade    Frequency
  A         34
  B         47
  C         50
  D         8
  F         6

Display this information graphically in the form of
(a) a bar chart
(b) a pie chart
Show the calculation for the angle of any two segments of the pie chart.

Each angle in the pie chart is calculated from the frequency using the formula

$$ \text{Pie chart angle} = \frac{\text{Frequency}}{\text{Total frequency}} \times 360^\circ $$

The calculations for all five angles are displayed in the associated Excel file, which also produced the bar and pie charts.
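As an illustration of the pie chart angle formula, here is a minimal Python sketch. The function name `pie_angles` and the grade counts in `example` are hypothetical and chosen only to make the arithmetic easy to follow; they are not the frequencies from the question.

```python
def pie_angles(freqs):
    """Map each category to its pie chart angle in degrees."""
    total = sum(freqs.values())
    return {category: 360 * f / total for category, f in freqs.items()}

# Hypothetical grade counts (total 80), for illustration only.
example = {"A": 10, "B": 20, "C": 30, "D": 15, "F": 5}
angles = pie_angles(example)

for grade, angle in angles.items():
    print(grade, angle)
```

Whatever the frequencies, the angles must sum to 360 degrees, which is a quick check on a hand calculation.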

Question 3 (continued)

Bar chart: [Figure: bar chart of grade frequencies]

Pie chart: [Figure: pie chart of grade frequencies]

In questions 4 to 7 below, use Minitab (or some other software package) to answer the questions. If you do not use Minitab, then state what software package you have used.

4. For the following data set (also available as a plain text file),

.035.545 6.3796 0.6863.498 9.400 8.008 9.3688 7.084.353 7.674.0376.3456.4693.637 3.8840 3.436.4395 9.060 0.385.345 9.0963 9.9664 0.0884 0.689 0.857.53 8.98 8.8498 0.54.3870 7.876 0.64 0.064 7.938 9.403.544 8.3797.705 9.957

(a) create a printout of Minitab's standard Descriptive Statistics output, including the default bar chart with superimposed normal graph and the default boxplot (as was demonstrated in the Minitab tutorial), or provide equivalent information from some other software package.

(b) What evidence do you see for skewness in these data?

(a) From the associated Minitab project file:

Descriptive Statistics: Data
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3
Data 40 0 0.7 0.65.678 6.380 9.069 0.335.30
Variable Maximum
Data 3.884

[Figure: histogram (with normal curve) of Data; horizontal axis Data, vertical axis Frequency]

4 (a) (continued)

[Figure: boxplot of Data]

(b) The mean and median are nearly equal, the whiskers of the boxplot are approximately equal, there are no outliers and the median is near the centre of the box. There is no clear evidence of skew.

[In fact, these data were generated by Minitab from a normal distribution of mean 10 and standard deviation 2. Normal distributions have zero skew, although random samples drawn from them may be somewhat skewed by chance.]

5. For the following data set of 100 values (also available as a plain text file),

.8679 3.03009 6.40883 4.33369 0.63779 0.5385 0.4579 3.079.38530 4.67676.7304.7739 0.854.85599.8534.7757.8583 0.65357 0.4.97.47675.7943 0.66736.5375 3.759.8378 0.790.60064.8358.67403.03660 0.50900.0876.59330 0.969 0.760.6550 0.53473.4 0.67745 3.68679 5.63466 4.460 0.63746.00497.4397.05.760.394.5488.758.878.0864.436.549.36957 3.34404 4.357 0.8697.300 0.66336 3.653.769.94.6554.56736 0.84466 0.4495.48484 4.6585 5.37489.8596.67463 0.87603.675.57 0.68.85488 3.8630 0.6538 0.7766 0.970.0063 0.99977.6056.0060.06657.938 0.8605.809.9997.944.58438 0.94377 0.33508.94735.83459.8873.7406.6448

(a) create a printout of Minitab's standard Descriptive Statistics output (or provide equivalent information from some other software package).

From the associated Minitab project file,

Descriptive Statistics: Data
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3
Data 00 0.936 0.7.7 0.3 0.948.707.609
Variable Maximum
Data 6.409

(b) construct a standard boxplot, oriented horizontally, with gridlines at intervals of 0.5 units.

[Figure: horizontal boxplot, "ENGI 4421 Problem Set 1 Question 5"; horizontal axis Data, gridlines every 0.5 units from 0.0 to 6.5]

5 (c) identify any outliers (list their values).

The boxplot clearly displays three outliers, all at the upper end (and one of them extreme). From a sorted list of values in the Minitab project file, the outliers are the values 5.37489, 5.63466 and 6.40883.

(d) construct a histogram, using as class boundaries the consecutive integers, from 0 to the next integer above the largest observed value.

[Figure: histogram, "ENGI 4421 Problem Set 1 Question 5"; horizontal axis Data (class boundaries 0 to 7), vertical axis Density]

(e) What evidence do you see for skewness in these data?

All of the above illustrate a strong positive skew. The boxplot gives the clearest indication of positive skew.

[In fact, these data were generated by Minitab from a gamma distribution, with parameters α = 2, β = 1 and therefore mean = variance = 2.]

6. For the following data set of 30 values (also available as a plain text file),

0.957438 0.66777 0.69579 0.53556 0.989805 0.740677 0.837656 0.8593 0.97656 0.789 0.930773 0.945 0.96407 0.99488 0.90530 0.98569 0.658793 0.88450 0.978 0.99899 0.93477 0.905575 0.856455 0.7894 0.836906 0.89483 0.5985 0.848346 0.90458 0.96747

(a) create a printout of Minitab's standard Descriptive Statistics output (or provide equivalent information from some other software package).

From the associated Minitab project file,

Descriptive Statistics: Data
Variable N N* Mean SE Mean StDev Minimum Q1 Median Q3
Data 30 0 0.8467 0.039 0.309 0.536 0.777 0.8979 0.9404
Variable Maximum
Data 0.9990

(b) construct a standard boxplot and add a symbol to indicate the location of the arithmetic mean.

[Figure: boxplot with mean symbol, "ENGI 4421 Problem Set 1 Question 6"; vertical axis Data, from 0.5 to 1.0]

6 (c) identify any outliers (list their values).

Two outliers are present, both at the lower end. At the top of the column of sorted data in the worksheet, we find that the outliers are the values 0.53556 and 0.5985.

The interquartile range is

$$ \text{IQR} = x_U - x_L \approx 0.9404 - 0.7771 \approx 0.1634 $$

so the lower outer fence is at

$$ x = x_L - 3\,(\text{IQR}) \approx 0.7771 - 3 \times 0.1634 = 0.2869 $$

Both outliers are therefore mild; indeed, they are barely below the lower inner fence!

[The lower inner fence is at $x = x_L - 1.5\,(\text{IQR}) \approx 0.7771 - 1.5 \times 0.1634 = 0.5320$.]

(d) construct a histogram, class widths of 0.1, from 0 to 1.

[Figure: histogram, "ENGI 4421 Problem Set 1 Question 6"; horizontal axis Data from 0.0 to 1.0 in classes of width 0.1, vertical axis Density]

(e) What evidence do you see for skewness in these data?

The boxplot and the histogram both illustrate clearly a strong negative skew. In the boxplot, both outliers are below the box, the mean is below the median, the lower whisker is much longer than the upper whisker and the box is not symmetric about the median, with the lower quartile much farther away from the median than the upper quartile.

[In fact, these data were generated by Minitab from a [negatively skewed] beta distribution, with parameters α = 4, β = 1 and therefore mean = 4/5 and variance = 2/75.]
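The fence calculation used to grade outliers as mild or extreme can be sketched as follows. This is a minimal illustration, not Minitab's own routine: the function names `fences` and `classify` are hypothetical, the quartiles 0.7771 and 0.9404 are assumed as read from the summary output for question 6, and the three test values passed to `classify` are made up for illustration.

```python
def fences(q1, q3):
    """Return the inner and outer boxplot fences for the given quartiles."""
    iqr = q3 - q1
    return {
        "lower_outer": q1 - 3.0 * iqr,
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(x, q1, q3):
    """Label a value as an extreme outlier, a mild outlier, or neither."""
    f = fences(q1, q3)
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "extreme outlier"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "mild outlier"
    return "not an outlier"

# Quartiles assumed from the question 6 summary (rounded to four decimals).
q1, q3 = 0.7771, 0.9404
limits = fences(q1, q3)
print({name: round(value, 4) for name, value in limits.items()})

# Hypothetical values, chosen only to exercise each branch.
for value in (0.52, 0.20, 0.85):
    print(value, classify(value, q1, q3))
```

Because the quartiles here are rounded, the computed fences can differ from the quoted ones in the fourth decimal place.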

7. For the following data set of 60 values (also available as a plain text file),

7 6 43 54 54 48 48 59 55 6 50 55 30 66 4 55 48 57 6 48 46 6 30 50 66 73 54 48 66 6 45 57 48 70 68 43 5 50 46 64 46 50 50 50 48 37 45 53 64 50 39 3 66 68 4 70 48 73 39 43

(a) construct a frequency bar chart, with classes of width 5 and centres at { 32, 37, 42, 47, ..., 67, 72 }.

From the associated Minitab project file,

[Figure: frequency bar chart, "ENGI 4421 Problem Set 1 Question 7"; horizontal axis Data with class centres 32 to 72, vertical axis Frequency]

(b) create a printout of Minitab's standard Descriptive Statistics output, but display only the number count, mean, standard deviation, median and quartiles (or provide equivalent information from some other software package).

Descriptive Statistics: Data
Variable N Mean StDev Q1 Median Q3
Data 60 5.93 0.7 46.00 50.00 6.00

7 (c) identify the modal class and the median class from your bar chart.

The tallest bar, height = frequency = 13, is the bar centred on x = 47. Therefore the modal class is [44.5, 49.5).

From the summary statistics in part (b), the median is 50. This value falls in the class centred on x = 52. Therefore the median class is [49.5, 54.5).

(d) use the grouped data (from the bar chart) to calculate the mean, the population standard deviation and the sample standard deviation (you may find this easier to do in a spreadsheet program such as Microsoft Excel).

From the associated Excel file,

$$ \bar{x} = \frac{\sum x f}{\sum f} = \frac{3190}{60} \approx 53.17 $$

[compare this to the Minitab mean in part (b) above],

$$ \sigma^2 = \frac{N \sum x^2 f - \left( \sum x f \right)^2}{N^2} = \frac{60 \times 176420 - 3190^2}{60^2} = \frac{409100}{3600} \approx 113.638 \quad\Rightarrow\quad \sigma \approx \sqrt{113.638} \approx 10.660 $$

and

$$ s^2 = \frac{n \sum x^2 f - \left( \sum x f \right)^2}{n (n - 1)} = \frac{60 \times 176420 - 3190^2}{60 \times 59} = \frac{409100}{3540} \approx 115.565 \quad\Rightarrow\quad s \approx \sqrt{115.565} \approx 10.750 $$

[compare this to the Minitab standard deviation in part (b) above].

[Note: if these 60 data are a sample drawn from a larger population, then the sample standard deviation is the appropriate form to use as a measure of spread. Only if these 60 values constitute the entire population should the formula for σ be used instead.]

(e) Why are the mean and standard deviation that you calculated in part (d) different from the Minitab values?

The difference arises from the loss of precise information caused by grouping the data together into classes.
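The grouped-data formulas in part (d) depend only on three totals: the number of observations, the sum of x·f and the sum of x²·f. A minimal Python sketch follows; the function name `grouped_summary` is hypothetical, and the totals 60, 3190 and 176420 are assumed to be the ones from the spreadsheet for question 7.

```python
from math import sqrt

def grouped_summary(n, sum_xf, sum_x2f):
    """Return (mean, population SD, sample SD) from grouped-data totals."""
    mean = sum_xf / n
    # Shared numerator of both variance formulas: n * sum(x^2 f) - (sum(x f))^2.
    numerator = n * sum_x2f - sum_xf ** 2
    sigma = sqrt(numerator / (n * n))        # divide by N^2: population form
    s = sqrt(numerator / (n * (n - 1)))      # divide by n(n-1): sample form
    return mean, sigma, s

# Totals assumed from the grouped data of question 7.
mean, sigma, s = grouped_summary(60, 3190, 176420)
print(round(mean, 2), round(sigma, 3), round(s, 3))
```

The two standard deviations differ only through the divisor, so the sample value is always slightly larger than the population value.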

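8. Problem Set 1 Bonus Question, Descriptive Statistics

Prove that, for any real constant $a \neq \bar{x}$,

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - a)^2 $$

Hint: Use the identities $\sum_{i=1}^{n} k = nk$ (for any constant $k$) and $\sum_{i=1}^{n} x_i = n\bar{x}$.

Rearrange the inequality into the form

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 - \sum_{i=1}^{n} (x_i - a)^2 < 0 $$

Manipulate the left hand side of this inequality:

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 - \sum_{i=1}^{n} (x_i - a)^2
&= \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \bar{x} + \bar{x}^2 \right) - \sum_{i=1}^{n} \left( x_i^2 - 2 x_i a + a^2 \right) \\
&= \sum_{i=1}^{n} x_i^2 - 2\bar{x} \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} \bar{x}^2 - \sum_{i=1}^{n} x_i^2 + 2a \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} a^2 \\
&= -2n\bar{x}^2 + n\bar{x}^2 + 2an\bar{x} - na^2 \\
&= -n\bar{x}^2 + 2an\bar{x} - na^2 \\
&= -n (\bar{x} - a)^2 < 0 \quad \text{for all } a \neq \bar{x}.
\end{aligned}
$$

Therefore

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 < \sum_{i=1}^{n} (x_i - a)^2 $$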
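The key step of the proof is that the two sums of squares differ by exactly $n(\bar{x} - a)^2$. A quick numerical check of that identity is sketched below, using exact rational arithmetic so that no rounding is involved. The data set and the trial values of a are arbitrary choices for illustration, and `ss_about` is a hypothetical helper name.

```python
from fractions import Fraction

def ss_about(data, a):
    """Sum of squared deviations of the data about the constant a."""
    return sum((Fraction(x) - a) ** 2 for x in data)

# Arbitrary illustrative data; any real values would do.
data = [2, 4, 5, 8, 5, 11, 7]
n = len(data)
xbar = Fraction(sum(data), n)

gaps = {}
for a in (Fraction(0), Fraction(5), Fraction(13, 2)):
    # Difference between the sum of squares about a and about the mean.
    gaps[a] = ss_about(data, a) - ss_about(data, xbar)
    print(a, gaps[a], n * (xbar - a) ** 2)
```

For every trial value the gap equals $n(\bar{x} - a)^2$, which is positive unless a coincides with the mean.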

Additional Note for Question 8:

It then follows that, for any random sample of size $n$ drawn from a population of true mean $\mu$,

$$ \sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - \mu)^2 $$

(with equality only in the very unlikely event that $\bar{x} = \mu$).

Recall that

$$ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 $$

(where there are $N$ members in the entire population).

One can then speculate [correctly] that, on average,

$$ \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 < \sigma^2 $$

$\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$ is said to be a biased estimate of $\sigma^2$, in that it underestimates the true value of $\sigma^2$ on average. The bias disappears when this variance formula is replaced by the sample variance

$$ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

In the section on estimators we shall see a proof that $s^2$ is an unbiased estimate of $\sigma^2$.

Partial solutions using Matlab are available as m-files for the following questions: Question 2, Question 3, Question 4, Question 5, Question 6, Question 7.
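The claim about bias can be checked exactly on a small example by enumerating every possible sample. The sketch below makes two assumptions that are not in the original note: the population is the six faces of a fair die, and samples of size 3 are drawn independently with replacement, so that all 216 ordered samples are equally likely. The helper name `average_over_samples` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed toy population: the faces of a fair die.
population = [1, 2, 3, 4, 5, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N   # population variance

def average_over_samples(n, divisor):
    """Average of sum((x - xbar)^2) / divisor over all ordered samples of size n."""
    total = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((x - xbar) ** 2 for x in sample) / divisor
        count += 1
    return total / count

n = 3
biased = average_over_samples(n, n)        # divisor n
unbiased = average_over_samples(n, n - 1)  # divisor n - 1

print(sigma2, biased, unbiased)
```

Under these assumptions the divisor-n average comes out at (n - 1)/n times the population variance, while the divisor-(n - 1) average matches it exactly.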