Princeton University
Department of Operations Research and Financial Engineering
ORF 245 Fundamentals of Engineering Statistics
Final Exam
May 15, 2009, 1:30pm-4:30pm

PLEASE DO NOT TURN THIS PAGE AND START THE EXAM UNTIL YOU ARE TOLD TO DO SO.

Instructions: This exam is open book and open notes. Calculators are allowed, but computers and statistical software packages are not. Write all your work in the space provided after each question. Use the other side of the page, if necessary. Explain all your steps in answering each question as thoroughly and as clearly as possible. Full or partial credit can only be granted if intermediate steps are clearly indicated.

Name:
Pledge: I pledge my honor that I have not violated the honor code during this examination.
Pledge:
Signature:

1: (12)   5: (10)   9: (14)
2: (12)   6: (15)   10: (10)
3: (10)   7: (10)   11: (20)
4: (10)   8: (10)   Total: (133)
Descriptive Statistics:
1) Let $\bar{x}$ and $s_x^2$ denote the sample mean and variance of a random sample $x_1, \ldots, x_n$ of a random variable $X$ with expected value $\mu$.
a) (3 pts.) For what value of $c$ is the quantity $\sum_{i=1}^{n} (x_i - c)^2$ minimized?
b) (3 pts.) Using the result of part (a), which of the two quantities $\sum_{i=1}^{n} (x_i - \bar{x})^2$ and $\sum_{i=1}^{n} (x_i - \mu)^2$ will be smaller than the other (assuming $\bar{x} \neq \mu$)?
c) (3 pts.) Let $a$ and $b$ be constants and let $y_i = a x_i + b$ for $i = 1, 2, \ldots, n$. What are the relationships between $\bar{x}$ and $\bar{y}$ and between $s_x^2$ and $s_y^2$? Show how you obtained the results.
d) (3 pts.) A sample of temperatures for initiating a certain chemical reaction yielded a sample average (°C) of 87.3 and a sample standard deviation of 1.04. What are the sample average and standard deviation measured in °F? [Hint: $F = \frac{9}{5}C + 32$.]
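The effect of a linear change of units in parts (c) and (d) can be checked numerically. The following is a minimal Python sketch, not a model answer: the list `celsius` is a made-up sample (hypothetical values, not exam data), and the helper name `transformed_summary` is an illustrative assumption. It compares the summary statistics of the transformed data with $a\bar{x} + b$ and $|a| s_x$.

```python
import statistics


def transformed_summary(xs, a, b):
    """Return (mean, sample standard deviation) of y_i = a * x_i + b."""
    ys = [a * x + b for x in xs]
    return statistics.mean(ys), statistics.stdev(ys)


# Hypothetical temperatures in degrees Celsius (illustration only).
celsius = [85.9, 86.4, 87.1, 88.0, 89.1]

# Celsius-to-Fahrenheit conversion: F = (9/5) C + 32.
a, b = 9 / 5, 32

mean_c = statistics.mean(celsius)
sd_c = statistics.stdev(celsius)
mean_f, sd_f = transformed_summary(celsius, a, b)

# The shift b moves the mean but not the spread; the scale a affects both.
print(mean_f, a * mean_c + b)
print(sd_f, abs(a) * sd_c)
```

Each printed pair should agree up to floating-point rounding, whatever sample is substituted for `celsius`.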
2) Consider the following stem and leaf plot that shows the GPAs of 30 students recently admitted to the graduate program in IEOR at UC Berkeley.
a) (3 pts.) Calculate the sample median $\tilde{x}$.
b) (3 pts.) Knowing that the sample mean is given by $\bar{x} = 3.707$ and the sample standard deviation is given by $s = 0.1457$, determine the proportion of the data values that lies within $\bar{x} \pm 1.5s$.
c) (3 pts.) Determine the proportion of the data values that lies within $\bar{x} \pm s$.
d) (3 pts.) Do the data appear to be approximately normal? Explain. [Hint: use the results obtained in parts (b) and (c).]
Probability:
3) Suppose that distinct integer values are written on each of 3 cards. Suppose you are to be offered these cards in a random order. When you are offered a card you must either immediately accept it or reject it. If you accept a card, the process ends. If you reject a card, then the next card (if a card remains) is offered. If you reject the first two cards offered, then you must accept the final card.
a) (4 pts.) If you plan to accept the first card offered, what is the probability that you will accept the highest-valued card?
b) (6 pts.) If you plan to reject the first card offered, and to then accept the second card if and only if its value is greater than the value of the first card, what is the probability that you will accept the highest-valued card?
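Because only the relative order of the three cards matters, both strategies can be checked by listing all six equally likely orderings. The sketch below is one possible way to do that in Python. The card values 1, 2, 3 and the function names are illustrative assumptions, and the enumeration is a cross-check rather than the written argument the exam asks for.

```python
from fractions import Fraction
from itertools import permutations


def win_probability(strategy, values=(1, 2, 3)):
    """Probability that `strategy` ends up holding the highest card."""
    orders = list(permutations(values))  # all orderings are equally likely
    wins = sum(1 for order in orders if strategy(order) == max(values))
    return Fraction(wins, len(orders))


def accept_first(order):
    """Part (a): take whatever card is offered first."""
    return order[0]


def reject_first_then_compare(order):
    """Part (b): reject the first card; take the second only if it beats the first."""
    first, second, third = order
    return second if second > first else third


p_a = win_probability(accept_first)
p_b = win_probability(reject_first_then_compare)
print(p_a, p_b)
```

Using `Fraction` keeps the probabilities exact, which makes them easy to compare with a hand calculation.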
4) Each of 2 balls is painted black or gold and then placed in an urn. Suppose that each ball is colored black with probability 1/2, and that these events are independent.
a) (6 pts.) Suppose that you obtain information that the gold paint has been used (and thus at least one of the balls is painted gold). Compute the conditional probability that both balls are painted gold.
b) (4 pts.) Suppose, now, that the urn tips over and 1 ball falls out. It is painted gold. What is the probability that both balls are gold in this case? Explain.
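The two conditioning events in this problem are easy to confuse, and an explicit enumeration makes the difference visible. The sketch below assumes two balls, each independently black or gold with probability 1/2, so that the four colourings are equally likely. It also assumes that the ball that falls out is equally likely to be either ball. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumption: two balls, each black ("B") or gold ("G") with probability 1/2,
# independently, so all four colourings are equally likely.
colourings = list(product("BG", repeat=2))
both_gold = ("G", "G")

# (a) Condition on "at least one ball is gold".
with_gold = [c for c in colourings if "G" in c]
p_a = Fraction(sum(c == both_gold for c in with_gold), len(with_gold))

# (b) Condition on "a randomly chosen ball falls out and it is gold".
# Assumption: either ball is equally likely to fall out, so every
# (colouring, ball index) pair is equally likely.
draws = [(c, i) for c in colourings for i in range(2)]
gold_draws = [(c, i) for c, i in draws if c[i] == "G"]
p_b = Fraction(sum(c == both_gold for c, _ in gold_draws), len(gold_draws))

print(p_a, p_b)
```

The two answers differ because observing a specific gold ball carries more information than merely knowing that gold paint was used.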
Random Variables:
5) A large metropolitan transit authority has a fleet of 1000 buses, and it is observed that, on average, there is a 0.4% chance that a bus will break down on any one day.
a) (5 pts.) If the maintenance department of the transit organization has sufficient capacity to cope with 6 breakdowns on any day, calculate, correct to four significant digits, the probability that on any day there will be insufficient staff to attend to all the breakdowns occurring on that day.
b) (5 pts.) If the probability of having insufficient staff on any day were actually 11.1%, let $Y$ be the number of days that would have to go by for the organization to experience 3 days of insufficient maintenance capacity (note that $Y \geq 3$). What are the expected value and the standard deviation of $Y$?
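One way to set up the computation is sketched below in Python, using the figures as printed in the problem. It assumes that buses break down independently, so the daily count is Binomial(1000, 0.004), which is well approximated by a Poisson distribution with mean 4. It also assumes that days are independent, so $Y$ follows a negative binomial distribution counted in days. The function and variable names are illustrative, and this is a sketch rather than the intended solution.

```python
import math


def binomial_tail(n, p, k):
    """P(X > k) for X ~ Binomial(n, p)."""
    cdf = sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))
    return 1 - cdf


def poisson_tail(lam, k):
    """P(X > k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k + 1))
    return 1 - cdf


# (a) Figures as printed: 1000 buses, 0.4% daily breakdown chance, capacity 6.
n_buses, p_break, capacity = 1000, 0.004, 6
p_exact = binomial_tail(n_buses, p_break, capacity)
p_poisson = poisson_tail(n_buses * p_break, capacity)

# (b) Y = number of days until the 3rd shortage, with daily shortage
# probability 0.111: negative binomial, E[Y] = r/p, Var[Y] = r(1 - p)/p^2.
r, p_short = 3, 0.111
mean_y = r / p_short
sd_y = math.sqrt(r * (1 - p_short)) / p_short

print(p_exact, p_poisson, mean_y, sd_y)
```

The Poisson tail is close to the 11.1% quoted in part (b), which suggests that part (b) builds on the answer to part (a).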
6) Each of two detergents, A and B, is marketed in cartons which nominally contain 5 oz of powder. However, it is observed that 15% of the cartons of A contain less than 4.5 oz and 6% more than 6 oz of detergent. On the other hand, 5% of the cartons of B contain less than 4 oz and 8% more than 6.5 oz. Assume that the weight distributions for both detergents are normal and that they are independent of each other. Assume also that the price and the quality of the detergents are the same.
a) (5 pts.) Determine which detergent gives, on average, better value. [Hint: compare the average weights of the cartons.]
b) (5 pts.) Find the probability that detergent B gives better value than detergent A.
c) (5 pts.) Find the probabilities that the contents of randomly selected cartons of A and B will be less than the nominal weight.
Joint Probability Distributions:
7) (10 pts.) Five individuals, including A and B, take seats around a circular table in a completely random fashion. Suppose the seats are numbered $1, \ldots, 5$. Let $X$ = A's seat number and $Y$ = B's seat number. If A sends a written message around the table to B in the direction in which they are closest, how many individuals (including A and B) would you expect to handle the message?
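The expectation can be cross-checked by enumerating the joint distribution of $(X, Y)$. The Python sketch below assumes that random seating makes all 20 ordered pairs of distinct seats equally likely. It counts the people along the shorter arc from A to B, endpoints included. The helper name `handlers` is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations

SEATS = 5


def handlers(seat_a, seat_b, seats=SEATS):
    """Number of people who touch the message, including A and B."""
    gap = abs(seat_a - seat_b)
    hand_offs = min(gap, seats - gap)  # steps along the shorter way round
    return hand_offs + 1


# Assumption: (X, Y) is uniform over ordered pairs of distinct seats.
pairs = list(permutations(range(1, SEATS + 1), 2))
expected = Fraction(sum(handlers(x, y) for x, y in pairs), len(pairs))

print(expected)
```

The same enumeration works for other table sizes by changing `SEATS`.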
Statistical Estimation:
8) (10 pts.) A binomial population ($X_1$) gives rise to observations in two distinct classes A and a with probabilities $1 - \theta$ and $\theta$ respectively, where $\theta$ is an unknown parameter such that $0 < \theta < 1$. A second binomial population ($X_2$), independent of the first, gives rise to observations in the same classes, but with probabilities $1 - \theta^2$ and $\theta^2$ respectively. Random samples of sizes $n_1$ and $n_2$ are taken from the first and second populations, respectively, and the observed frequencies in the A and a classes are:

              A         a         Total
1st sample    $x_{11}$  $x_{12}$  $x_{11} + x_{12} = n_1$
2nd sample    $x_{21}$  $x_{22}$  $x_{21} + x_{22} = n_2$

Derive the equation for $\hat{\theta}$, the joint maximum-likelihood estimator of $\theta$ obtained from the two samples. [Hint: first find the joint sample likelihood function.]
Confidence Intervals and Tests of Hypothesis:
9) A certain professor designed a Java program to randomly pop a quiz with probability $p$. During the course the popper was used times and in 6 of those a pop quiz did take place. Assume that each usage of the popper was independent of all the others.
a) (4 pts.) Your estimate of the probability of a quiz is ______, give or take ______.
b) (4 pts.) Find the 95% confidence interval for the probability $p$. [Hint: use the Agresti-Coull interval estimation.]
c) (2 pts.) What sample size would be needed to obtain a 95% confidence interval with width ±0.1?
d) (4 pts.) Can you say at the 10% significance level that the probability $p$ is different from 1/3? Clearly state your hypotheses and answer the question by computing a P-value.
10) Consider an experiment in which half of the individuals in a group of 50 postmenopausal overweight women were randomly assigned to a particular vegan diet, and the other half received a diet based on the National Cholesterol Education Program guidelines. The sample mean decrease in body weight, as well as the sample standard deviations, for both diet subgroups are given in the following table:

Vegan Diet:     $\bar{X} = 5.8$ kg    $s_1 = 3.2$ kg
Control Diet:   $\bar{Y} = 3.8$ kg    $s_2 = 2.8$ kg

Assume that both populations are normal with $\sigma_1 = \sigma_2$.
a) (5 pts.) Estimate the difference between the true average weight losses for the two diets with a 95% confidence interval.
b) (5 pts.) Does it appear that the true average weight loss for the vegan diet exceeds that for the control diet by more than 1 kg? Carry out an appropriate test of hypothesis at the significance level 0.05 based on calculating a P-value.
Correlation and Linear Regression:
11) The grades in the midterm exam of a certain class were not good. So the professor decided to offer the students the chance to retake the same exam at home. He then averaged the in-class grade with the home-retake grade in order to obtain the final midterm grade. His claim was that the final midterm grades would be highly linearly correlated with the in-class grades and thus this would be a fair, but less arbitrary, way of curving the initial grades. Let $x$ represent the in-class grades and $y$ represent the final midterm grades. The following R output shows many of the results of fitting the model $Y = \beta_0 + \beta_1 x + \varepsilon$ to the 16 $(x_i, y_i)$ data points.
a) (6 pts.) Fill in the blanks in the tables above by computing the following values: the t-statistic associated with the estimate $\hat{\beta}_0$, the standard error associated with the estimate $\hat{\beta}_1$, the coefficient of determination, the regression sum of squares, the mean sums of squares for regression and residuals, and the value of the F statistic. Show your computations.
b) (2 pts.) What is the sample correlation between the in-class grades and the final midterm grades?
c) (4 pts.) Is there substantial evidence that the population correlation between in-class and final midterm grades is at least 0.9? [Hint: this is a hypothesis test.]
d) (2 pts.) What were the assumptions made about the errors $\varepsilon_i$ in the linear model $Y = \beta_0 + \beta_1 x + \varepsilon$ above?
e) (2 pts.) Based on the plot of residuals versus fitted values below, do you think that the assumptions stated in part (d) hold for the data in question? Explain.
f) (4 pts.) Suppose that you wanted to use the calibrated model above to predict the final midterm grade corresponding to an in-class grade of 50. Find the 90% prediction interval for the final midterm grade. Note that $\bar{x} = 4.913$.