Stat 401B Final Exam
Fall 2015

I have neither given nor received unauthorized assistance on this exam.

Name Signed: ________________________    Date: ____________

Name Printed: ________________________

ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit. Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit.

SHOW YOUR WORK/EXPLAIN YOURSELF!
1. Some 1 inch finished hex nuts have weights that are normally distributed with mean 17gm and standard deviation .6gm.

4 pts  a) What fraction of these nuts have weights above 17.4gm?

6 pts  b) These nuts are packaged by weight. A package (intended to hold at least 100 of these hex nuts) will be filled with a weight of nuts that is at least 1710gm. Approximate the probability that 99 nuts have a total weight of at least 1710gm (so that the actual count of nuts is less than the desired number). (Hint: What would the average weight of these 99 nuts have to be for this to happen?)

2. A data base is segmented as below in terms of "type of record" and "completeness of record." Suppose that one will select a single record at random from this data base.

              Type A         Type B         Type C
Complete      00 records     350 records    450 records
Incomplete    50 records     50 records     100 records

a) Evaluate P Type B Incomplete.

b) Are the events "Type B" and "Incomplete" independent? Say why or why not.
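The kinds of quantities problem 2 works with (marginal, joint, and conditional probabilities, plus an independence check) can be sketched in R as below. This is only an illustration: the counts are hypothetical placeholders chosen for the example, not the table from the exam, and all object names (`counts`, `p_B`, and so on) are assumptions made here.

```r
# Hypothetical 2 x 3 table of record counts (NOT the exam's values).
# matrix() fills column by column, so each pair is one record type.
counts <- matrix(c(30, 10,    # Type A: complete, incomplete
                   45, 15,    # Type B: complete, incomplete
                   75, 25),   # Type C: complete, incomplete
                 nrow = 2,
                 dimnames = list(c("Complete", "Incomplete"),
                                 c("A", "B", "C")))

total <- sum(counts)

# Probabilities for a single record selected at random.
p_B            <- sum(counts[, "B"]) / total            # marginal
p_incomplete   <- sum(counts["Incomplete", ]) / total   # marginal
p_B_and_incomp <- counts["Incomplete", "B"] / total     # joint

# Conditional probability of Type B given Incomplete.
p_B_given_incomp <- p_B_and_incomp / p_incomplete

# Two events are independent exactly when the joint probability equals
# the product of the marginals (equivalently, P(B | Incomplete) = P(B)).
independent <- isTRUE(all.equal(p_B_and_incomp, p_B * p_incomplete))
```

With these made-up counts every column has the same complete-to-incomplete ratio, so the check comes out independent; changing a single cell would typically break that.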
3. Some experimental data on the page "Using Central Composite Design for Process Optimization" on the weibull.com web site concern the tensile strength of welds made on steel. We will use these data in various ways in this problem. First, $n = 7$ welds made at a standard set of process conditions produced $\bar{y} = 6611$ kgf and $s = 145$ kgf.

a) Give 95% confidence limits for the mean strength of steel welds made under standard process conditions. (Plug in completely, but you need not do arithmetic.)

b) Interpret your interval from a). (Say carefully what is meant by the "95%" figure.)

c) A weld is made at a non-standard set of process conditions and its strength tested. The value $y = 5830$ kgf is observed. Give 95% confidence limits for the difference in mean strengths for the two sets of welding process conditions. (Plug in completely, but you need not do arithmetic.)

d) The non-standard set of process conditions referred to in part c) actually differs from the standard set only in the electrical current applied. Coded values of the current are $x = -1$ for the non-standard conditions and $x = 0$ for the standard conditions. A plot of the data is below. What model assumptions (be complete in stating them) would you make in order to support a prediction interval for the strength of a weld made with coded current $x = -.5$ (and all other process conditions standard)?
e) In fact, for the situation of d), $s_{LF} = 159$ kgf, $\bar{x} = -.125$, and $\sum_{i=1}^{8} (x_i - \bar{x})^2 = .875$. Give 95% prediction limits for the next $y$ at $x = -.5$ under your assumptions of d). (The least squares line should be obvious to you from the plot in d) and the information given at the beginning of the problem.) (Plug in completely, but you need not do arithmetic.)

f) There was also a weld made at coded electrical current $x = 1$ that had strength $y = 6210$. A plot of this data point, the 7 data points mentioned at the start of the problem, and the one mentioned in part c) is below. If strength is a linear function of current, $(\mu_{1} - \mu_{0}) - (\mu_{0} - \mu_{-1}) = 0$. Give 95% confidence limits for $(\mu_{1} - \mu_{0}) - (\mu_{0} - \mu_{-1})$. Is it plausible that strength is linear over this range of $x$?

The entire data set was used to fit a model for strength, $y$, as a linear function of coded current, $x_1$, coded voltage, $x_2$, coded "stick out", $x_3$, and coded angle, $x_4$. (See the R output in the "R Code and Output" section at the end of this exam.)

g) What fraction of the raw variability in $y$ is accounted for using $x_1$, $x_2$, $x_3$, and $x_4$ as predictor variables?

h) What is the meaning of $b_3 = -305.56$? (Interpret this fitted coefficient.)
i) There is R output for the fit of a full quadratic model for $y$ in the predictor variables $x_1$, $x_2$, $x_3$, and $x_4$ in the "R Code and Output" section at the end of this exam (the second set of summary and anova output). Give the value of an F statistic and degrees of freedom for testing whether the quadratic model is a statistically significant improvement over the linear model in $x_1$, $x_2$, $x_3$, and $x_4$ for explaining $y$.

F = ____________    df = ______ , ______

4. Lab #10 used balanced $2^3$ factorial data of Example 4 of Chapter 8 of Vardeman and Jobe. We will here continue use of that scenario.

a) Below is the ANOVA table produced in Lab #10. Suppose that one runs the regsubsets() function from the leaps package using the 7 dummy variables created in the lab. Which model with 4 predictors will be identified as best, and what value of $R^2$ is associated with it?

b) Why would it be impossible to answer part a) based on only the table above if the data were not balanced?

c) CVSSE values for the best (in terms of $R^2$) models with $k = 1, 2, \ldots, 7$ factorial effects were computed using 8-fold cross validation. A plot of these is below. In light of this plot and the ANOVA table above, what "few effects" model for Power appears best? How does its CVSSE compare to the SSE that would be obtained fitting it?
5. Below is a toy data set consisting of $N = 5$ pairs $(x, y)$. Find the LOO cross-validation SSE for 1-nn prediction. (You don't need to do arithmetic, but write out a complete numerical expression.)

y    .5     4.0    4.5    7.5    8.0
x    1.0    2.0    2.5    3.5    4.0

6. Below are fake regression trees that you may assume come from $B = 3$ bootstrap samples of a large number, $N$, of $(x_{1i}, x_{2i}, y_i)$ data points.

a) What is the random forest prediction at $(x_1, x_2) = (.5, .5)$? (Assume that the forest includes only the $B = 3$ trees represented above.)

b) Suppose that in fact $(x_1, x_2, y) = (.5, .5, 3)$ was part of the bootstrap samples that were used to produce trees #1 and #2, but not #3. What is this data point's contribution to an OOB error sum of squares?

7. Consider the ordinary least squares predictor $\hat{y}^{\text{OLS}}(x)$ (from standard multiple linear regression) and a lasso predictor $\hat{y}^{\text{Lasso}}(x)$ (for some $\lambda$) computed for the same data. Is it true that the SSE for $\hat{y}^{\text{Lasso}}(x)$ is always at least as big as that for $\hat{y}^{\text{OLS}}(x)$? Say why or why not.
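For problem 5, the mechanics of a leave-one-out SSE for 1-nearest-neighbor prediction can be sketched in R as follows. This is a minimal sketch under stated assumptions: the function name `loo_sse_1nn` is made up here, the data are hypothetical (not the exam's toy data set), and ties for nearest neighbor are resolved by averaging the tied responses, which is one possible convention rather than anything the exam specifies.

```r
# LOO cross-validation SSE for 1-nearest-neighbor prediction with a single
# predictor x. Each case is predicted from its nearest neighbor among the
# remaining cases, and the squared prediction errors are summed.
loo_sse_1nn <- function(x, y) {
  n <- length(x)
  sq_err <- numeric(n)
  for (i in seq_len(n)) {
    d <- abs(x[-i] - x[i])            # distances from case i to the others
    nearest <- which(d == min(d))     # assumption: average over any ties
    y_hat <- mean(y[-i][nearest])     # 1-nn prediction with case i held out
    sq_err[i] <- (y[i] - y_hat)^2
  }
  sum(sq_err)
}

# Hypothetical data (NOT the exam's values).
x <- c(0, 1, 3, 6)
y <- c(1, 2, 4, 8)
loo_sse <- loo_sse_1nn(x, y)
```

For these made-up values the held-out predictions are 2, 1, 2, and 4, so the sum is (1-2)^2 + (2-1)^2 + (4-2)^2 + (8-4)^2 = 22.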
R Code and Output

> Welds
   current voltage stickout angle strength
1       -1      -1       -1    -1     4730
2        1      -1       -1    -1     4990
3       -1       1       -1    -1     4240
4        1       1       -1    -1     7320
5       -1      -1        1    -1     7130
6        1      -1        1    -1     4920
7       -1       1        1    -1     4110
8        1       1        1    -1     5020
9       -1      -1       -1     1     5560
10       1      -1       -1     1     4910
11      -1       1       -1     1     5330
12       1       1       -1     1     7490
13      -1      -1        1     1     6820
14       1      -1        1     1     4030
15      -1       1        1     1     3690
16       1       1        1     1     4210
17      -1       0        0     0     5830
18       1       0        0     0     6210
19       0      -1        0     0     6230
20       0       1        0     0     6530
21       0       0       -1     0     6370
22       0       0        1     0     5510
23       0       0        0    -1     6390
24       0       0        0     1     6110
25       0       0        0     0     6550
26       0       0        0     0     6650
27       0       0        0     0     6750
28       0       0        0     0     6610
29       0       0        0     0     6340
30       0       0        0     0     6600
31       0       0        0     0     6780

> summary(lm(strength~.,Welds))

Call:
lm(formula = strength ~ ., data = Welds)

Residuals:
    Min      1Q  Median      3Q     Max
-1740.7 -1023.5   312.6   803.2  1607.1

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5805.16     199.69  29.070   <2e-16 ***
current        92.22     262.06   0.352    0.728
voltage       -76.67     262.06  -0.293    0.772
stickout     -305.56     262.06  -1.166    0.254
angle         -38.89     262.06  -0.148    0.883
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1112 on 26 degrees of freedom
Multiple R-squared:  0.05766,  Adjusted R-squared:  -0.08731
F-statistic: 0.3977 on 4 and 26 DF,  p-value: 0.8084
> anova(lm(strength~.,Welds))
Analysis of Variance Table

Response: strength
          Df   Sum Sq Mean Sq F value Pr(>F)
current    1   153089  153089  0.1238 0.7277
voltage    1   105800  105800  0.0856 0.7722
stickout   1  1680556 1680556  1.3595 0.2542
angle      1    27222   27222  0.0220 0.8832
Residuals 26 32141108 1236196

> summary(lm(strength~.,data.frame(welds)))

Call:
lm(formula = strength ~ ., data = data.frame(welds))

Residuals:
    Min      1Q  Median      3Q     Max
  -6.13  -14.13    5.84  104.91  350.64

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      6544.16      69.59  94.045  < 2e-16 ***
current           190.00     165.87   1.145 0.268853
voltage           150.00     165.87   0.904  0.37936
stickout         -305.56      55.29  -5.526 4.60e-05 ***
angle             -38.89      55.29  -0.703 0.491936
current2         -445.68     145.61  -3.061 0.007469 **
voltage2          -85.68     145.61  -0.588 0.564470
stickout2        -525.68     145.61  -3.610 0.002348 **
angle2           -215.68     145.61  -1.481 0.157979
currentvoltage    753.75      58.64  12.853 7.56e-10 ***
currentstickout  -526.25      58.64  -8.974  1.1e-07 ***
currentangle     -110.00     175.93  -0.625  0.54064
voltagestickout  -628.75      58.64 -10.722 1.03e-08 ***
voltageangle     -255.00     175.93  -1.449 0.166533
stickoutangle    -277.50      58.64  -4.732 0.000226 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 234.6 on 16 degrees of freedom
Multiple R-squared:  0.9742,  Adjusted R-squared:  0.9516
F-statistic: 43.13 on 14 and 16 DF,  p-value: 5.149e-10

> anova(lm(strength~.,data.frame(welds)))
Analysis of Variance Table

Response: strength
                Df  Sum Sq Mean Sq  F value    Pr(>F)
current          1  153089  153089   2.7822  0.114767
voltage          1  105800  105800   1.9228 0.1845697
stickout         1 1680556 1680556  30.5418 4.601e-05 ***
angle            1   27222   27222   0.4947 0.4919356
current2         1 8379097 8379097 152.2787 1.371e-09 ***
voltage2         1  554530  554530  10.0778 0.0058839 **
stickout2        1  990677  990677  18.0042  0.000600 ***
angle2           1  120721  120721   2.1939 0.1579786
currentvoltage   1 9090225 9090225 165.2025 7.560e-10 ***
currentstickout  1 4431025 4431025  80.5279   1.1e-07 ***
currentangle     1   21511   21511   0.3909  0.540641
voltagestickout  1 6325225 6325225 114.9524 1.033e-08 ***
voltageangle     1  115600  115600   2.1009 0.1665326
stickoutangle    1 1232100 1232100  22.3917 0.0002256 ***
Residuals       16  880396   55025
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
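
A comparison of two nested linear models, as in problem 3 i), rests on the two error sums of squares and their degrees of freedom, which can be read from the "Residuals" rows of anova output like that above. A minimal R sketch of the general computation follows. The helper name `nested_f` and the numbers passed to it are hypothetical illustrations, not values taken from this exam.

```r
# Partial F statistic for testing a reduced model against a full model that
# contains it, computed from the error sums of squares (SSE) and error
# degrees of freedom of the two fits.
nested_f <- function(sse_reduced, df_reduced, sse_full, df_full) {
  df_num <- df_reduced - df_full       # number of extra terms in the full model
  f_stat <- ((sse_reduced - sse_full) / df_num) / (sse_full / df_full)
  p_value <- pf(f_stat, df_num, df_full, lower.tail = FALSE)
  list(F = f_stat, df = c(df_num, df_full), p_value = p_value)
}

# Hypothetical sums of squares and degrees of freedom (NOT the exam's values).
res <- nested_f(sse_reduced = 500, df_reduced = 20,
                sse_full = 200, df_full = 15)
```

With these made-up inputs the statistic is (300/5)/(200/15) = 4.5 on 5 and 15 degrees of freedom. When both fitted lm objects are available, calling anova() with the reduced and the full fit as its two arguments reports the same test.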