STAT-UB.0103 Exam APRIL.11 SQUARE Version Solutions


STAT-UB.0103 Exam 01.APRIL.11 SQUARE Version Solutions

S1. Jason Harter is a professional fund raiser for charities. He's currently working with the Pets-R-Luv animal shelter. The operating account for Pets-R-Luv currently is at $80,000. It seems reasonable to assume that the daily changes in this account will be random with mean $3,000 and with standard deviation $2,400. The daily changes involve both donations (positive) and expenses, including Jason's fees (negative). What is the probability that in exactly ten days of fund-raising, this account will reach or exceed $100,000?

(a) [5 points] Please state any assumptions that are helpful in doing this work.

(b) [15 points] Find the actual probability, using the numbers above and your assumptions in (a), that the account will reach or exceed $100,000 in ten days.

SOLUTION: For (a) there are two critical assumptions. You'll need to assume that the daily values are independent of each other. You'll also need to assume that the daily changes come from a normal population, because a sample of n = 10 is not officially large enough to invoke the Central Limit Theorem.

For (b), let T be the total change over the 10 days. Let R_0 = $80,000 be the current account balance, and let R_10 be the balance after 10 days. Observe that R_10 = R_0 + T. The mean of the distribution of T is 10 × $3,000 = $30,000, and the standard deviation of the distribution of T is $2,400 × √10 ≈ $7,589. Then

\[
\begin{aligned}
P[R_{10} \ge \$100{,}000] &= P[R_0 + T \ge \$100{,}000] = P[T \ge \$20{,}000] \\
&= P\left[\frac{T - \$30{,}000}{\$7{,}589} \ge \frac{\$20{,}000 - \$30{,}000}{\$7{,}589}\right] \\
&\approx P[Z \ge -1.32] = P[Z \ge 0] + P[0 \le Z \le 1.32] = 0.5000 + 0.4066 = 0.9066.
\end{aligned}
\]

Software will get the slightly more precise answer 0.9062. This charity has a very good chance of reaching the $100,000 objective.

S2. The output below, based on a sample of used cars, shows the regression of mileage on year of manufacture. Commas were inserted in large numbers to improve readability. Several of the positions are blank. Please supply the missing values.

gs 01
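Returning to S1: the probability in part (b) can be reproduced in a few lines of Python. This is a sketch that relies on the same assumptions stated in part (a), namely independent and normal daily changes. The helper name `normal_upper_tail` is ours, and the only library call it uses is `math.erfc` from the standard library.

```python
import math

def normal_upper_tail(z: float) -> float:
    """P[Z >= z] for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# S1 inputs (daily changes assumed independent and normal, as in part (a)).
current = 80_000       # R_0, the current balance in dollars
target = 100_000       # fund-raising goal
days = 10
mu_daily = 3_000
sd_daily = 2_400

mean_T = days * mu_daily                  # mean of the 10-day total T
sd_T = sd_daily * math.sqrt(days)         # standard deviation of T
z = (target - current - mean_T) / sd_T    # standardized cutoff for T >= 20,000
prob = normal_upper_tail(z)

print(f"sd_T = {sd_T:,.2f}, z = {z:.4f}, P = {prob:.4f}")
```

The table lookup rounds z to −1.32 and gives 0.9066. Using the unrounded z of about −1.3176, the sketch prints the software value of about 0.9062.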

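Regression Analysis: Mileage versus Year

The regression equation is
Mileage = 20,063,597 + ________ Year                                  (a)

Predictor          Coef      SE Coef          T      P
Constant     20,063,597    1,046,…
Year          -10,015.2                ________                       (b)

S = ________   R-Sq = ________   R-Sq(adj) = ________                 (c) (d) (e)

Analysis of Variance

Source             DF                SS                MS          F   P
Regression          1   195,870,000,000   195,870,000,000   ________      (f)
Residual Error  _____   _______________       536,183,115                 (g) (h)
Total             172   287,557,000,000

SOLUTION:

(a) You can copy the slope from the Year row of the predictor table. It's −10,015.2.

(b) Since t = Coef / SE Coef, this is −19.11.

(c) The value in this position is s_ε, and it's recovered as

\[
s_\varepsilon = \sqrt{\text{MS Residual}} = \sqrt{536{,}183{,}115} \approx 23{,}156.
\]

(d) The value is found as

\[
R^2 = \frac{\text{SS Regression}}{\text{SS Total}} = \frac{195{,}870{,}000{,}000}{287{,}557{,}000{,}000} \approx 68.1\%.
\]

(e) The R²_adj value can be found either as

\[
R^2_{\text{adj}} = 1 - \left(\frac{s_\varepsilon}{s_Y}\right)^2
\qquad \text{or as} \qquad
R^2_{\text{adj}} = 1 - \frac{n-1}{n-k-1}\,(1 - R^2).
\]

To use the first formula, you'll need

\[
s_Y = \sqrt{\frac{\text{SS Total}}{n-1}} = \sqrt{\frac{287{,}557{,}000{,}000}{172}} \approx 40{,}888,
\qquad
R^2_{\text{adj}} = 1 - \left(\frac{23{,}156}{40{,}888}\right)^2 \approx 67.93\%.
\]

To use the second formula, note that (n − 1)/(n − k − 1) = 172/171 in this simple regression (k = 1, n = 173). This gives

\[
R^2_{\text{adj}} = 1 - \frac{172}{171}\,(1 - 0.6812) \approx 67.93\%.
\]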
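Every missing entry in the S2 output comes from the same few ANOVA numbers, so the arithmetic for parts (c) through (h) can be checked together. The Python sketch below takes SS Total = 287,557,000,000 and DF Total = 172, as the solution does. The value t = −19.11 for part (b) is copied from the solution rather than recomputed, because recomputing it would require the SE Coef for Year. All variable names are ours.

```python
import math

# Entries read from the S2 output.
ss_regression = 195_870_000_000
ss_total = 287_557_000_000
ms_residual = 536_183_115
df_total = 172
n = df_total + 1          # 173 cars
k = 1                     # one predictor (Year)
t_year = -19.11           # part (b), as reported in the solution

df_residual = n - k - 1                              # (g)
s_eps = math.sqrt(ms_residual)                       # (c)
r_sq = ss_regression / ss_total                      # (d)
s_y = math.sqrt(ss_total / (n - 1))                  # standard deviation of Mileage
r_sq_adj_1 = 1 - (s_eps / s_y) ** 2                  # (e), first formula
r_sq_adj_2 = 1 - (n - 1) / (n - k - 1) * (1 - r_sq)  # (e), second formula
f_stat = (ss_regression / k) / ms_residual           # (f)
ss_res_by_difference = ss_total - ss_regression      # (h), SS Total - SS Regression
ss_res_by_ms = ms_residual * df_residual             # (h), MS Residual x DF Residual

print(f"(c) S = {s_eps:,.1f}")
print(f"(d) R-Sq = {r_sq:.1%}")
print(f"(e) R-Sq(adj) = {r_sq_adj_1:.2%} or {r_sq_adj_2:.2%}")
print(f"(f) F = {f_stat:.2f}; t^2 = {t_year ** 2:.2f}")
print(f"(g) DF = {df_residual}")
print(f"(h) SS = {ss_res_by_difference:,} or {ss_res_by_ms:,}")
```

The two routes to R²_adj, and the two routes to SS Residual, disagree only in the far decimal places. The reason is that MS Residual in the listing was itself rounded.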

(f) Find

\[
F = \frac{\text{MS Regression}}{\text{MS Residual}} = \frac{195{,}870{,}000{,}000}{536{,}183{,}115} \approx 365.3.
\]

Since this is a one-predictor regression, you can also get this as t² = (−19.11)² ≈ 365.2.

(g) This is 171, the residual degrees of freedom: DF Total − DF Regression = 172 − 1.

(h) This is obtained directly as the difference

\[
\text{SS Total} - \text{SS Regression} = 287{,}557{,}000{,}000 - 195{,}870{,}000{,}000 = 91{,}687{,}000{,}000.
\]

You could also get this from the relation MS Residual = SS Residual / DF Residual. Thus

\[
\text{SS Residual} = 536{,}183{,}115 \times 171 = 91{,}687{,}312{,}665.
\]

The complete listing follows. Some of the results differ slightly because values were reconstructed from rounded information.

Regression Analysis: Mileage versus Year

The regression equation is
Mileage = 20,063,597 - 10,015.2 Year

Predictor          Coef      SE Coef         T      P
Constant     20,063,597    1,046,…           …      …
Year          -10,015.2            …    -19.11  0.000

S = 23,155.6   R-Sq = 68.1%   R-Sq(adj) = 67.9%

Analysis of Variance

Source             DF                SS                MS        F      P
Regression          1   195,870,000,000   195,870,000,000   365.30  0.000
Residual Error    171    91,687,312,665       536,183,115
Total             172   287,557,000,000

S3. Frank Tanner is the lab manager at TazerTek, a firm that develops games for cell phones, smart phones, iPads, and similar devices. He has been asked to estimate the solution time for a potential new game, Angry Pigs. The subjects will be female high school students, for whom the typical solution time for games of this style is 6.4 minutes, with a standard deviation of 1.8 minutes. How many students should Frank request if he'd like his sample average to be within 1/3 of a minute (20 seconds) of the true average with a probability of at least 80%?

SOLUTION: This is governed by the formula

\[
n \ge \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2.
\]

Frank should use E = 1/3 (target error limit), σ = 1.8 (plausible standard deviation), and z_{α/2} = 1.28 (corresponding to 1 − α = 0.80 and α/2 = 0.10). This gets to

\[
n \ge \left(\frac{1.28 \times 1.8}{1/3}\right)^2 = 6.912^2 \approx 47.8.
\]

So it looks like Frank should ask for 48 subjects.
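Here is a Python sketch of the S3 sample-size rule. The function name `required_n` is ours. It rounds up with `math.ceil`, because n must be a whole number at least as large as the bound. `statistics.NormalDist().inv_cdf(0.90)` gives the unrounded z value of about 1.2816 in place of the table value 1.28.

```python
import math
from statistics import NormalDist

def required_n(z: float, sigma: float, error: float) -> int:
    """Smallest whole n satisfying n >= (z * sigma / error) ** 2."""
    return math.ceil((z * sigma / error) ** 2)

E = 1 / 3                               # target error: 20 seconds, in minutes
z_table = 1.28                          # table value for alpha/2 = 0.10
z_exact = NormalDist().inv_cdf(0.90)    # unrounded value, about 1.2816

print(required_n(z_table, 1.8, E))      # sigma = 1.8 minutes, table z
print(required_n(z_exact, 1.8, E))      # sigma = 1.8 minutes, unrounded z
```

Both calls give 48. Calling `required_n(1.28, 2.5, 1/3)` returns 93, which shows how quickly the request grows with a more conservative σ.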

It's possible that Frank may want to be a little more conservative about the assessed standard deviation. After all, the 1.8 minutes was obtained with slightly different games. Using σ = 2.5, say, would lead to a request for 93 subjects. In any case, it's very easy to find subjects and inexpensive to run them through the experiment.

S4. The regression of Z on H gave this Minitab output:

Regression Analysis: Z versus H
[Coefficient table, S, and analysis of variance table; R-Sq = 33.6%, R-Sq(adj) = 32.1%]

Please answer T (true) or F (false) to each of the following.

(a) The regression slope would be regarded as not statistically significant.
(b) The correlation between the variables H and Z is negative.
(c) Most of the residuals are between −5 and +5.
(d) The total sum of squares, namely 1,097.15, provides solid evidence of the effect of regression to mediocrity.
(e) It is somewhat surprising that R²_adj < R².
(f) The data provide convincing evidence that increasing H by one unit will cause a decrease in Z of 1.0 units.
(g) The fit would be appraised as poor, because SS Residual Error > SS Regression.
(h) A new data point with H_new = 10 would lead to a prediction of Z_new = … = ….
(i) If all the data values (meaning all the H_i's and all the Z_i's) were doubled, the slope would remain unchanged.
(j) If all the data values (meaning all the H_i's and all the Z_i's) were doubled, then the F statistic would be ….

SOLUTION:

(a) is false. The t statistic for the regression is −4.77, which is easily significant.

(b) is true. The correlation coefficient has the same sign as b_1, the slope estimate. Since b_1 = −1.0, we know that the correlation coefficient must also be negative.
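Statements (b), (i), and (j) can be explored on a small data set. The H and Z values behind the S4 output are not available, so the numbers below are hypothetical. The sketch fits a least-squares line, then refits after doubling every value.

- The slope, the correlation, and the F statistic all come out the same after doubling.
- The slope and the correlation share a sign, because both have S_xy as their numerator over a positive denominator.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y on x; returns (slope, correlation r, F statistic)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    slope = sxy / sxx                  # sign of the slope is the sign of sxy
    r = sxy / math.sqrt(sxx * syy)     # same numerator, positive denominator
    ss_reg = sxy ** 2 / sxx
    ss_res = syy - ss_reg
    f_stat = ss_reg / (ss_res / (n - 2))
    return slope, r, f_stat

# Hypothetical data, NOT the values behind the S4 output.
h = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [9.0, 8.5, 6.0, 6.5, 4.0, 3.5]

slope, r, f_stat = simple_regression(h, z)
slope_d, r_d, f_d = simple_regression([2 * v for v in h], [2 * v for v in z])

print(f"original: slope = {slope:.4f}, r = {r:.4f}, F = {f_stat:.3f}")
print(f"doubled:  slope = {slope_d:.4f}, r = {r_d:.4f}, F = {f_d:.3f}")
```

Doubling both variables multiplies S_xx, S_yy, and S_xy by 4. That factor cancels in every ratio above.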

(c) is true. The residuals, as a set of numeric values, have mean zero and standard deviation s_ε = 4.03. Thus about 2/3 of the residuals will be in (−4.03, 4.03). It follows that considerably more than 2/3 of the residuals are in (−5, 5).

(d) is false. The two issues here are completely unrelated.

(e) is false. There's no surprise. R²_adj is always smaller than R² (unless R² = 1), so this is what always happens.

(f) is false. There is absolutely nothing here to suggest causation.

(g) is false. The relative sizes of these, as measured through the F statistic, are what really matter.

(h) is true. That's exactly how a regression is to be used.

(i) is true. The slope would not change.

(j) is false. The F statistic would not change.

S5. The following printout concerns a best subsets regression. Questions follow.

Response is ZP
139 cases used.
[Best subsets table with columns Vars, R-Sq, R-Sq(adj), C-p, S, Armor, Bakery, Clay, Deprect, Expert]

(a) What is the dependent variable?
(b) Give the names of the independent variables.
(c) If you had to select the best two-predictor model, which predictors would you use?
(d) Which is the best set of predictors to use? Indicate your reasons.
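Part (d) relies on the C_p criterion. As a reminder of how that column is computed, here is a Python sketch of Mallows' C_p:

C_p = SSE_p / MSE_full − (n − 2p)

Here p counts the intercept. Only n = 139 comes from the printout, because its numeric rows are not available. The MSE and SSE values below are invented for illustration, and the function name `mallows_cp` is ours. A subset looks attractive when its C_p is small and close to p, and the full model always has C_p = p exactly.

```python
def mallows_cp(sse_p: float, p: int, mse_full: float, n: int) -> float:
    """Mallows' C_p for a subset model with p parameters (intercept included)."""
    return sse_p / mse_full - (n - 2 * p)

# Hypothetical numbers for illustration; they are NOT the values from the S5 printout.
n = 139                      # cases used, as in the printout
mse_full = 4.0               # assumed MSE of the model with all five predictors
sse_by_size = {              # assumed SSE of the best subset of each size
    2: 700.0,
    3: 560.0,
    4: 533.0,
    5: 532.0,                # full model: 532.0 / 4.0 = 133 = n - 6 residual DF
}

for k, sse in sse_by_size.items():
    p = k + 1                # slopes plus intercept
    cp = mallows_cp(sse, p, mse_full, n)
    print(f"{k} predictors: C_p = {cp:.2f} (compare with p = {p})")
```

With these made-up inputs, the four-predictor subset has C_p = 4.25, below its p = 5. That is the pattern that makes a subset look preferable under this criterion.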

SOLUTIONS:

(a) The dependent variable is ZP.

(b) There are five independent variables: Armor, Bakery, Clay, Deprect, and Expert.

(c) The best two-predictor model uses Armor and Clay.

(d) The best model, using the C_p criterion, clearly uses Armor, Clay, Deprect, and Expert (all but Bakery). If you focus on R²_adj or on s_ε, you could make a case for Armor, Clay, and Deprect.

S6. Each workday, Monday through Friday, a truckload of scrap metal is delivered from Round Two Recycling to Metals Plus. The payment is based on the weights delivered. Based on a long history, the daily weights have a mean of 14.6 tons, with a standard deviation of 1.6 tons. The mean of the previous 40 deliveries was 15.0 tons. What is the probability that, by chance alone, the mean of 40 deliveries would exceed 15.0 tons?

SOLUTION: Let X_1, X_2, …, X_40 be random variables representing the daily delivery weights. We should assume that these values are independent, each from a population with mean µ = 14.6 and σ = 1.6. It's not critical to assume that the population values follow a normal distribution, since with n = 40 the Central Limit Theorem applies. The sample average X̄ will then have a distribution which is approximately normal with mean 14.6 and with standard deviation 1.6/√40 ≈ 0.2530. It follows that

\[
P[\bar{X} \ge 15.0] = P\left[\frac{\bar{X} - 14.6}{1.6/\sqrt{40}} \ge \frac{15.0 - 14.6}{1.6/\sqrt{40}}\right] \approx P[Z \ge 1.58] = 0.50 - P[0 \le Z < 1.58] = 0.50 - 0.4429 = 0.0571.
\]

You could get a slightly more refined answer from software (or from a calculator that has a cumulative normal function); this would be 0.0569.

For each of the following, please respond either "impossible" or "could happen." You may use the abbreviations I and CH. As illustrations:

In a sample of size 15, the correlation between X and Y was 1.2.   [impossible]
The sample x_1, x_2, …, x_n produced an average x̄ = ….   [could happen]

(a) In a multiple regression with four predictors, it was found that R²_adj > R².
(b) In a multiple regression with four predictors, it was found that SS Regression < SS Residual Error.
(c) In the regression of W on Z, the fitted regression line passed through the point (Z̄, W̄).

(d) Variables H and W have a correlation r_HW = 0.48, and the linear regression of H on W produced a slope of ….
(e) In a simple linear regression of Y on X, the value of s_ε can be a larger number than the standard deviation of the independent variable X.
(f) The regression of Y on {G, H, J} had an R² value that is less than the R² value from the regression of Y on {G, H}.
(g) In a linear regression with n = 3, all the residuals were negative.
(h) In a very strong multiple regression, the F statistic was found as F = 1,810.
(i) In a regression of Y on X, the slope was b_1 = 0.8, the correlation was r = 0.2, and the standard deviation of the x-values was 1,80….
(j) In a set of n = 59 values, the correlation was r_{X,Y} = 0.9. Steve did the regression of Y on X and got the slope b_1(Y on X) = 0.5, while Angela did the regression of X on Y and got the slope b_1(X on Y) = 2.

SOLUTION:

(a) Impossible. It always happens that R²_adj < R² (when R² < 1). This can be seen in the formula

\[
R^2_{\text{adj}} = 1 - \frac{n-1}{n-k-1}\,(1 - R^2).
\]

Since (n − 1)/(n − k − 1) > 1 for k ≥ 1, we get 1 − R²_adj > 1 − R², which shows that R²_adj < R².

(b) Could happen. Since SS Total = SS Regression + SS Residual Error, the statement is equivalent to R² < 1/2.

(c) Could happen. In fact, it has to happen as a simple consequence of b_0 = W̄ − b_1 Z̄.

(d) Impossible. The slope and correlation must have the same sign.

(e) Could happen. Of course s_ε and s_X can only be meaningfully compared in those cases in which Y and X are in the same units.

(f) Impossible. As the set of predictors is enlarged, R² can never decrease.

(g) Impossible. The residuals must sum to zero.

(h) Could happen. You will actually see monster values for F now and then.

(i) Could happen. There are no contradictions in these statements.

(j) Impossible. Observe that

\[
b_1(Y \text{ on } X) = \frac{S_{xy}}{S_{xx}}, \qquad b_1(X \text{ on } Y) = \frac{S_{xy}}{S_{yy}}.
\]

These are not reciprocals of each other, except in the very special case S_xy² = S_xx S_yy, that is, r² = 1. But having r² = 1.00 means that either r = +1 or r = −1, which is not the story here.
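Two of the calculations in this stretch can be checked numerically:

- the S6 tail probability;
- the identity behind part (j), namely that the two regression slopes multiply to r².

The helper name `normal_upper_tail` is ours. The x and y values in the second part are hypothetical and have no connection to the exam data.

```python
import math

def normal_upper_tail(z: float) -> float:
    """P[Z >= z] for a standard normal Z, via math.erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# S6: average of 40 independent deliveries, mu = 14.6 tons, sigma = 1.6 tons.
se = 1.6 / math.sqrt(40)
z = (15.0 - 14.6) / se
p_s6 = normal_upper_tail(z)
print(f"S6: SE = {se:.4f}, z = {z:.2f}, P = {p_s6:.4f}")

# Part (j): b(Y on X) * b(X on Y) = Sxy^2 / (Sxx * Syy) = r^2 (hypothetical data).
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.0, 3.0, 3.5, 6.0, 6.5]
xbar, ybar = sum(x) / len(x), sum(y) / len(y)
sxx = sum((a - xbar) ** 2 for a in x)
syy = sum((b - ybar) ** 2 for b in y)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
b_y_on_x = sxy / sxx
b_x_on_y = sxy / syy
r = sxy / math.sqrt(sxx * syy)
print(f"(j): product of slopes = {b_y_on_x * b_x_on_y:.4f}, r^2 = {r ** 2:.4f}")
```

Because |r| < 1 here, the product of the two slopes is less than 1, so the slopes cannot be reciprocals of each other.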

STAT-UB.0103 Exam 01.APRIL.11 OVAL Version Solutions

O1. Frank Tanner is the lab manager at BioVigor, a firm that runs studies for agricultural food supplements. He has been asked to design a protocol for