Princeton University
Department of Operations Research and Financial Engineering

ORF 245 Fundamentals of Engineering Statistics
Final Exam
May 22, 2008, 7:30pm-10:30pm

PLEASE DO NOT TURN THIS PAGE AND START THE EXAM UNTIL YOU ARE TOLD TO DO SO.

Instructions: This exam is open book and open notes. Calculators are allowed, but not computers or the use of statistical software packages. Write all your work in the space provided after each question. There are questions on both sides of each page. Explain as thoroughly and as clearly as possible all your steps in answering each question. Full or partial credit can only be granted if intermediate steps are clearly indicated.

Name:

Pledge: I pledge my honor that I have not violated the honor code during this examination.

Signature:

1: (12)    6: (15)   11: (12)
2: (06)    7: (20)   12: (10)
3: (10)    8: (10)   13: (10)
4: (05)    9: (20)   14: (12)
5: (05)   10: (08)   15: (20)
Total: (175)
Descriptive Statistics:

1) Let $\bar{x}_n$ and $s_n^2$ denote the sample mean and variance for the sample $x_1, \dots, x_n$, and let $\bar{x}_{n+1}$ and $s_{n+1}^2$ denote these quantities when an additional observation $x_{n+1}$ is added to the sample.

a) (4 pts.) Show how $\bar{x}_{n+1}$ can be computed from $\bar{x}_n$ and $x_{n+1}$.

b) (8 pts.) Show that

$$n s_{n+1}^2 = (n-1) s_n^2 + \frac{n}{n+1} \left( x_{n+1} - \bar{x}_n \right)^2$$

so that $s_{n+1}^2$ can be computed from $x_{n+1}$, $\bar{x}_n$, and $s_n^2$.
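The updating identity in problem 1 can be checked numerically before attempting the proof. The following is a minimal sketch, not a derivation: the function name `update_mean_var` and the sample values are hypothetical, and the mean update used in the code is the usual weighted-average form, which is what part a) asks to be shown.

```python
from statistics import mean, variance


def update_mean_var(n, mean_n, var_n, x_new):
    """Return (mean, sample variance) of n + 1 points from the n-point summary."""
    # Weighted average of the old mean and the new observation.
    mean_next = (n * mean_n + x_new) / (n + 1)
    # n * s_{n+1}^2 = (n - 1) * s_n^2 + n / (n + 1) * (x_{n+1} - mean_n)^2
    var_next = ((n - 1) * var_n + n / (n + 1) * (x_new - mean_n) ** 2) / n
    return mean_next, var_next


data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical sample, not from the exam
x_new = 9.0                       # hypothetical additional observation

m, v = update_mean_var(len(data), mean(data), variance(data), x_new)
direct_mean = mean(data + [x_new])
direct_var = variance(data + [x_new])  # sample variance, divisor n

print(m, direct_mean)
print(v, direct_var)
```

The recursive and the direct computations should agree up to floating-point rounding; agreement on one sample is only a sanity check and does not replace the algebra.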
2) Consider the following histogram that shows the time in months that articles submitted to a certain scientific journal in 2002 took to be reviewed for publication.

a) (3 pts.) Which class interval contains the median review time?

b) (3 pts.) Which class interval contains the third quartile of the review times?
Probability:

3) Items are inspected for flaws by two quality inspectors. If a flaw is present, it will be detected by the first inspector with probability 0.9, and by the second inspector with probability 0.7. Assume that the inspectors function independently.

a) (4 pts.) If an item has a flaw, what is the probability that it will be found by at least one of the inspectors?

b) (6 pts.) Assume that both inspectors inspect every item and that if an item has no flaw, then neither inspector will detect a flaw. Assume also that the probability that an item has a flaw is 0.10. If an item is passed by both inspectors, what is the probability that it actually has a flaw?
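The two parts of problem 3 combine the complement rule with Bayes' rule. As a hedged sketch of that arithmetic, with exact fractions and variable names chosen here for illustration only:

```python
from fractions import Fraction

p_flaw = Fraction(1, 10)      # P(item has a flaw)
p_detect_1 = Fraction(9, 10)  # P(inspector 1 detects | flaw)
p_detect_2 = Fraction(7, 10)  # P(inspector 2 detects | flaw)

# Independence of the inspectors: both miss a flawed item.
p_both_miss = (1 - p_detect_1) * (1 - p_detect_2)
p_at_least_one = 1 - p_both_miss

# An item passes both inspectors if it is flawed and both miss it,
# or if it has no flaw (an unflawed item is never flagged).
p_pass = p_flaw * p_both_miss + (1 - p_flaw)
p_flaw_given_pass = p_flaw * p_both_miss / p_pass

print(p_at_least_one, p_flaw_given_pass)
```

Using `Fraction` keeps the conditional probability exact, which makes it easy to compare against a hand computation.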
4) (5 pts.) An urn contains 3 red balls and 7 black balls. Players A and B withdraw balls from the urn consecutively until a red ball is selected. Namely, A draws the first ball, then B draws the second one, then A again, and so on, until one of them draws a red ball. If the drawn balls are not replaced, find the probability that A selects the red ball.

Random Variables:

5) (5 pts.) Two types of coins are produced at a factory: a fair coin and a biased one that comes up heads 55 percent of the time. We have a coin from this factory but do not know whether it is a fair coin or a biased one. In order to ascertain which type of coin we have, we will perform the following statistical test: we will toss the coin 1000 times. If the coin lands on heads 525 or more times, then we will conclude that it is a biased coin, whereas if it lands on heads fewer than 525 times, then we will conclude that it is the fair coin. If the coin is actually fair, what is the probability that we will reach a false conclusion? [Hint: use the Normal approximation with continuity correction.]
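For problem 5, the hint points to the Normal approximation with continuity correction. The sketch below, offered only as a cross-check, evaluates that approximation for a fair coin and compares it with the exact binomial tail; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p_fair, cutoff = 1000, 0.5, 525

# Mean and standard deviation of the number of heads for a fair coin.
mu = n * p_fair
sigma = sqrt(n * p_fair * (1 - p_fair))

# Continuity correction: P(X >= 525) is approximated by P(Normal > 524.5).
approx = 1 - NormalDist(mu, sigma).cdf(cutoff - 0.5)

# Exact binomial tail for comparison (every outcome has probability 1 / 2^n).
exact = sum(comb(n, k) for k in range(cutoff, n + 1)) / 2 ** n

print(approx, exact)
```

The exact sum is feasible here only because Python integers have arbitrary precision; on the exam itself, where software is not allowed, the approximation is the intended route.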
6) (15 pts.) A bus travels between two cities A and B, which are 100 miles apart. If the bus has a breakdown, the distance from the breakdown to city A has a uniform distribution over (0, 100). There is a bus service station in city A, in B, and in the center of the route between A and B. It is suggested that it would be more efficient to have the three stations located 25, 50, and 75 miles, respectively, from A. Do you agree? Why? [Hint: compare the expected distance that the bus would have to be towed, from the breakdown point to the nearest service station.]
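The hint in problem 6 asks for the expected distance to the nearest station under each layout. A minimal numerical sketch, assuming a midpoint-rule average over the route (the function name `expected_tow_distance` and the grid size are choices made here, not part of the exam):

```python
def expected_tow_distance(stations, length=100.0, steps=10_000):
    """Approximate E[min over stations of |U - s|] for U ~ Uniform(0, length)."""
    width = length / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width  # midpoint of the i-th subinterval
        total += min(abs(x - s) for s in stations)
    # Averaging over equally spaced midpoints approximates the uniform expectation.
    return total / steps


current = expected_tow_distance([0.0, 50.0, 100.0])
proposed = expected_tow_distance([25.0, 50.0, 75.0])

print(current, proposed)
```

Because the distance to the nearest station is piecewise linear, the same two expectations can be obtained by hand as sums of triangle areas divided by 100, which is the form an exam answer would take.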
Joint Probability Distributions:

7) Choose a number $X$ at random from the set of numbers $\{1, 2, 3, 4, 5\}$. Now choose a number at random from the subset no larger than $X$, that is, from $\{1, \dots, X\}$. Call this second number $Y$.

a) (10 pts.) Find the joint probability mass function of $X$ and $Y$.

b) (7 pts.) Find the expected value and the variance of $Y$.

c) (3 pts.) Are $X$ and $Y$ independent? Explain.
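The two-stage experiment in problem 7 can be tabulated by enumeration. The sketch below assumes "at random" means uniformly at each stage and builds the joint pmf with exact fractions; the dictionary names are illustrative.

```python
from fractions import Fraction

values = range(1, 6)

# P(X = x, Y = y) = P(X = x) * P(Y = y | X = x) = (1/5) * (1/x) for y <= x.
joint = {
    (x, y): Fraction(1, 5) * Fraction(1, x)
    for x in values
    for y in range(1, x + 1)
}

# Marginal pmf of Y: sum the joint pmf over x.
p_y = {y: sum(p for (x, yy), p in joint.items() if yy == y) for y in values}

e_y = sum(y * p for y, p in p_y.items())
var_y = sum(y * y * p for y, p in p_y.items()) - e_y ** 2

# Independence would require P(X=1, Y=1) = P(X=1) * P(Y=1).
product_at_1_1 = Fraction(1, 5) * p_y[1]

print(e_y, var_y)
print(joint[(1, 1)], product_at_1_1)
```

Pairs with $y > x$ are simply absent from the dictionary, which corresponds to probability zero in the joint table.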
Statistical Estimation:

8) (10 pts.) Maximum likelihood estimates possess the property of functional invariance, which means that if $\hat{\theta}$ is the MLE of $\theta$, and $h(\theta)$ is any function of $\theta$, then $h(\hat{\theta})$ is the MLE of $h(\theta)$. Given a random sample $X_1, \dots, X_n$ from a geometric distribution with parameter $p$, find the MLE of the odds ratio $p/(1-p)$.
Confidence Intervals:

9) Let $X$ represent the number of events that are observed to occur in $n$ units of time or space, and assume that $X \sim \text{Poisson}(n\lambda)$, where $\lambda$ is the mean number of events that occur in one unit of time or space. Assume that $X$ is large, so that $X \sim N(n\lambda, n\lambda)$. A suitable estimator of $\lambda$ is given by $\hat{\lambda} = X/n$, with standard error $SE(\hat{\lambda}) = \sqrt{\lambda/n}$.

a) (4 pts.) Assuming that $X$ is large, what is the distribution of $\hat{\lambda}$? (Name the distribution and give the values of its parameters.)

b) (4 pts.) Use the distribution found in the previous item and the fact that $SE(\hat{\lambda}) \approx \sqrt{\hat{\lambda}/n}$ to derive an expression for the $100(1-\alpha)\%$ confidence interval for $\lambda$.

c) (4 pts.) A 5 ml sample of a certain suspension is found to contain 300 particles. The mean number of particles per ml in the suspension is ______, give or take ______.

d) (4 pts.) After 4 minutes, a geologist counted 256 particles emitted from a certain radioactive rock. Find a 95% confidence interval for the rate of emissions in units of particles per minute.
e) (4 pts.) For how many minutes should particles be counted so that the 95% confidence interval specifies the rate to within ± 1 particle per minute?

10) A sample of seven concrete blocks had their compressive strength measured in MPa. The results were 1367.6, 1411.5, 1318.7, 1193.6, 1406.2, 1425.7, and 1572.4. Ten thousand bootstrap samples were generated from these data, and the bootstrap sample means were arranged in order. Refer to the smallest mean as $Y_1$, the second smallest as $Y_2$, and so on, with the largest being $Y_{10000}$. Assume that

$Y_{50} = 1283.4$,     $Y_{51} = 1283.4$,
$Y_{100} = 1291.5$,    $Y_{101} = 1291.5$,
$Y_{250} = 1305.5$,    $Y_{251} = 1305.5$,
$Y_{500} = 1318.5$,    $Y_{501} = 1318.5$,
$Y_{9500} = 1449.7$,   $Y_{9501} = 1449.7$,
$Y_{9750} = 1462.1$,   $Y_{9751} = 1462.1$,
$Y_{9900} = 1476.2$,   $Y_{9901} = 1476.2$,
$Y_{9950} = 1483.8$,   $Y_{9951} = 1483.8$.

a) (4 pts.) Compute the 95% bootstrap confidence interval for the mean compressive strength.

b) (4 pts.) Was this a parametric or a nonparametric bootstrap procedure? Explain.
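The procedure described in problem 10 can be sketched as follows. This is an illustration under stated assumptions: the seed is arbitrary, resampling is done with replacement from the seven observations, and the interval endpoints average the 250th/251st and 9750th/9751st ordered means, which is one common percentile convention. A different seed gives slightly different endpoints, so the output is not expected to reproduce the values quoted in the problem.

```python
import random

strengths = [1367.6, 1411.5, 1318.7, 1193.6, 1406.2, 1425.7, 1572.4]
n = len(strengths)
sample_mean = sum(strengths) / n

rng = random.Random(245)  # arbitrary seed, chosen only for reproducibility
B = 10_000                # number of bootstrap samples, as in the problem

# Each bootstrap sample draws n observations with replacement from the data.
boot_means = sorted(sum(rng.choices(strengths, k=n)) / n for _ in range(B))

# Percentile interval: Y_250, Y_251 and Y_9750, Y_9751 in 1-based indexing.
lower = (boot_means[249] + boot_means[250]) / 2
upper = (boot_means[9749] + boot_means[9750]) / 2

print(sample_mean, lower, upper)
```

No distributional family is fitted to the data in this sketch; the resampling uses only the observed values.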
Tests of Hypothesis:

11) An article by Abdel-Aty et al. in the Journal of Transportation Engineering presents a tabulation of types of car crashes by the age of the driver over a three-year period in Florida. Here is the table:

Age of drivers                   15-24 years    25-64 years
Total # of accidents                  82,486        219,170
# of accidents in driveways            4,243         10,701

a) (4 pts.) The difference between the proportions of driveway accidents for drivers aged 15-24 and drivers aged 25-64 is ______ %, give or take ______ %.

b) (4 pts.) Can you conclude that driveway accidents among 15-24 year-olds in FL are indeed likely to be proportionately higher than driveway accidents among 25-64 year-old Floridians? State the hypotheses clearly and answer this question using the P-value.

c) (4 pts.) Assuming that young drivers in Florida do present a higher proportion of driveway accidents than older drivers, does this mean that younger Floridian drivers should be required to take a special course on how to drive on driveways, but not older drivers? Explain.
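The quantities asked for in parts a) and b) of problem 11 come from a large-sample comparison of two proportions. A minimal sketch, assuming the unpooled standard error for the "give or take" figure and the pooled standard error for the one-sided test (both are common textbook choices, and the course may prefer one over the other):

```python
from math import sqrt
from statistics import NormalDist

n_young, n_older = 82_486, 219_170  # total accidents per age group
x_young, x_older = 4_243, 10_701    # driveway accidents per age group

p_young = x_young / n_young
p_older = x_older / n_older
diff = p_young - p_older

# Unpooled standard error of the difference (used for the "give or take").
se_diff = sqrt(
    p_young * (1 - p_young) / n_young + p_older * (1 - p_older) / n_older
)

# Pooled standard error under H0: the two proportions are equal.
p_pool = (x_young + x_older) / (n_young + n_older)
se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_young + 1 / n_older))

# One-sided test of H1: the proportion for 15-24 year-olds is higher.
z = diff / se_pool
p_value = 1 - NormalDist().cdf(z)

print(100 * diff, 100 * se_diff)  # in percent
print(z, p_value)
```

With samples this large, the pooled and unpooled standard errors are nearly identical, so the choice between them does not change the conclusion.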
12) An engineer claims that a new type of hard disk for laptops lasts longer than the old type. Independent random samples of 75 of each of the two types are chosen, and the sample means and standard deviations of their lifetimes are computed:

New: $\bar{X}_1 = 4387$ h, $s_1 = 252$ h
Old: $\bar{X}_2 = 4260$ h, $s_2 = 231$ h

a) (4 pts.) Can you conclude that the mean lifetime of new hard disks is greater than that of the old hard disks? State the hypotheses clearly and answer this question at the 1% significance level.

b) (4 pts.) If the new hard disks do have a mean lifetime 40 h longer than the old ones, what is the probability ($\beta$) that the test performed in the previous item will result in a type II error (that is, failing to reject $H_0$)?

c) (2 pts.) Recompute the probability of a type II error for the case in which the new hard disks have a mean lifetime 80 h longer than the old ones.
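Problem 12 pairs a one-sided large-sample test with a type II error calculation. The sketch below assumes a z test in which the sample standard deviations stand in for the population values, which is a large-sample approximation; the helper name `type_ii_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

n = 75
mean_new, sd_new = 4387.0, 252.0
mean_old, sd_old = 4260.0, 231.0
alpha = 0.01

std_normal = NormalDist()

# Standard error of the difference in sample means (independent samples).
se = sqrt(sd_new ** 2 / n + sd_old ** 2 / n)

# H0: mu_new - mu_old <= 0 versus H1: mu_new - mu_old > 0.
z = (mean_new - mean_old) / se
z_crit = std_normal.inv_cdf(1 - alpha)
reject_h0 = z > z_crit


def type_ii_error(true_diff):
    """P(fail to reject H0) when the true mean difference is true_diff."""
    # Under the alternative, the test statistic is shifted by true_diff / se.
    return std_normal.cdf(z_crit - true_diff / se)


print(z, z_crit, reject_h0)
print(type_ii_error(40.0), type_ii_error(80.0))
```

The type II error shrinks as the true difference grows because the shift `true_diff / se` moves the distribution of the test statistic further past the critical value.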
Correlation and Linear Regression:

13) A chemical engineer is studying the effect of temperature and stirring rate on the yield of a certain product. The process is run 16 times, at the settings indicated in the following table. The units for yield are percent of a theoretical maximum.

The matrix of sample correlation coefficients among the variables in question is as follows:

a) (5 pts.) Based on the analysis of sample correlation above, would you try to fit a multiple linear regression model in which the yield is the response variable and temperature and stirring rate are the covariates? Explain.
b) (5 pts.) Find the 95% confidence interval for the coefficient of correlation between the stirring rate and the yield. What assumptions did you make in order to compute this confidence interval?

14) The chemical engineer from the previous question has decided to calibrate a simple linear regression model with the yield as the response variable ($Y$) and stirring rate as the covariate ($X$). The results of the calibration obtained through Excel are:

a) (2 pts.) What proportion of the observed variation in yield can be attributed to the simple linear regression relationship between yield and stirring rate?

b) (5 pts.) Can you say that an increase of 10 rpm in the stirring rate will produce an increase in yield of at least 2%? State the hypotheses clearly and answer this question at the 5% significance level.
c) (5 pts.) Construct the 95% confidence interval for the prediction of the yield percentage that corresponds to a stirring rate of 55 rpm. In order to compute this interval, you may need the following additional information:
Multiple Linear Regression:

15) A study was made in which data was obtained to relate $y$ = specific surface area (cm$^3$/g) to $x_1$ = % NaOH used as a pretreatment chemical and $x_2$ = treatment time (min) for a batch of pulp. The following R output resulted from a request to fit the model $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$.

a) (6 pts.) Fill in the blanks in the tables above by computing the following values: the regular and adjusted coefficients of determination, the regression sum of squares, the regression and residual mean squares, and the value of the F statistic. Show your computations.

b) (2 pts.) What proportion of observed variation in specific surface area can be explained by the model relationship?
c) (4 pts.) Does the chosen model appear to specify a useful relationship between the response and the covariates? Explain.

d) (4 pts.) Provided that % NaOH remains in the model, would you suggest that the covariate treatment time be eliminated? Explain.

e) (4 pts.) Calculate a 95% confidence interval for the expected change in specific surface area associated with an increase of 1 % in NaOH when treatment time is held fixed.