Stat 401B Final Exam Fall 2015


I have neither given nor received unauthorized assistance on this exam.

Name Signed ____________________ Date ____________
Name Printed ____________________

ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit. Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit. SHOW YOUR WORK/EXPLAIN YOURSELF!

1. Some 1 inch finished hex nuts have weights that are normally distributed with mean 17gm and standard deviation .6gm.

a) (4 pts) What fraction of these nuts have weights above 17.4gm?

b) (6 pts) These nuts are packaged by weight. A package (intended to hold at least 100 of these hex nuts) will be filled with a weight of nuts that is at least 1710gm. Approximate the probability that 99 nuts have a total weight of at least 1710gm (so that the actual count of nuts is less than the desired number). (Hint: What would the average weight of these 99 nuts have to be for this to happen?)

2. A data base is segmented as below in terms of "type of record" and "completeness of record." Suppose that one will select a single record at random from this data base.

             Type A        Type B        Type C
Complete     200 records   350 records   450 records
Incomplete   50 records    50 records    100 records

a) Evaluate P(Type B | Incomplete).

b) Are the events "Type B" and "Incomplete" independent? Say why or why not.
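Both parts of Problem 1 reduce to normal tail probabilities. The R sketch below is an illustration under stated assumptions, not an official solution: the helper names `upper_tail` and `prob_total_at_least` are made up for this example, and the mean (17 gm) and standard deviation (.6 gm) are taken as printed in the problem statement. Part b) follows the hint: 99 nuts weigh at least 1710 gm in total exactly when their average weight is at least 1710/99 gm, and that average has standard deviation sigma/sqrt(99).

```r
# P(X > threshold) for X ~ Normal(mu, sigma)
upper_tail <- function(threshold, mu, sigma) {
  pnorm(threshold, mean = mu, sd = sigma, lower.tail = FALSE)
}

# P(total weight of n independent nuts >= total), via the sample mean
prob_total_at_least <- function(total, n, mu, sigma) {
  upper_tail(total / n, mu, sigma / sqrt(n))
}

mu    <- 17    # assumed: mean weight as printed in the problem (gm)
sigma <- 0.6   # assumed: standard deviation as printed in the problem (gm)

p_a <- upper_tail(17.4, mu, sigma)                 # part a)
p_b <- prob_total_at_least(1710, 99, mu, sigma)    # part b)
print(c(part_a = p_a, part_b = p_b))
```

If the intended mean or standard deviation differs from these values, only the two assignments to `mu` and `sigma` need to change.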

3. Some experimental data on the page "Using Central Composite Design for Process Optimization" on the weibull.com web site concern the tensile strength of welds made on steel. We will use these data in various ways in this problem. First, $n = 7$ welds made at a standard set of process conditions produced $\bar{y} = 6611$ kgf and $s = 145$ kgf.

a) Give 95% confidence limits for the mean strength of steel welds made under standard process conditions. (Plug in completely, but you need not do arithmetic.)

b) Interpret your interval from a). (Say carefully what is meant by the "95%" figure.)

c) A weld is made at a non-standard set of process conditions and its strength tested. The value $y = 5830$ kgf is observed. Give 95% confidence limits for the difference in mean strengths for the two sets of welding process conditions. (Plug in completely, but you need not do arithmetic.)

d) The non-standard set of process conditions referred to in part c) actually differs from the standard set only in the electrical current applied. Coded values of the current are $x = -1$ for the non-standard conditions and $x = 0$ for the standard conditions. A plot of the data is below. What model assumptions (be complete in stating them) would you make in order to support a prediction interval for the strength of a weld made with coded current $x = .5$ (and all other process conditions standard)?
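The summary statistics quoted at the start of Problem 3 can be checked against the seven strengths recorded at all-zero coded settings in the Welds listing. The R sketch below shows one way to set up the intervals in parts a) and c); it is a hedged illustration, not the exam key. The interval in c) assumes normal strengths with the same standard deviation at both sets of conditions, which is an assumption of this sketch (a single observation gives no variance estimate of its own).

```r
# Strengths (kgf) of the 7 welds made at standard conditions
y_std <- c(6550, 6650, 6750, 6610, 6340, 6600, 6780)

n      <- length(y_std)
ybar   <- mean(y_std)            # about 6611
s      <- sd(y_std)              # about 145
t_mult <- qt(0.975, df = n - 1)  # two-sided 95% multiplier, 6 df

# a) 95% confidence limits for the mean strength at standard conditions
ci_mean <- ybar + c(-1, 1) * t_mult * s / sqrt(n)

# c) 95% limits for (standard mean) - (non-standard mean), with one
#    non-standard observation and an assumed common standard deviation
y_new   <- 5830
ci_diff <- (ybar - y_new) + c(-1, 1) * t_mult * s * sqrt(1 / n + 1 / 1)

print(round(c(ybar = ybar, s = s), 1))
print(ci_mean)
print(ci_diff)
```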

e) In fact, for the situation of d), $s_{LF} = 159$ kgf, $\bar{x} = -.125$, and $\sum_{i=1}^{8} (x_i - \bar{x})^2 = .875$. Give 95% prediction limits for the next $y$ at $x = .5$ under your assumptions of d). (The least squares line should be obvious to you from the plot in d) and the information given at the beginning of the problem.) (Plug in completely, but you need not do arithmetic.)

f) There was also a weld made at coded electrical current $x = 1$ that had strength $y = 6210$. A plot of this data point, the 7 data points mentioned at the start of the problem, and the one mentioned in part c) is below. If strength is a linear function of current, $(\mu_{-1} - \mu_0) - (\mu_0 - \mu_1) = 0$. Give 95% confidence limits for $(\mu_{-1} - \mu_0) - (\mu_0 - \mu_1)$. Is it plausible that strength is linear over this range of $x$?

The entire data set was used to fit a model for strength, $y$, as a linear function of coded current, $x_1$, coded voltage, $x_2$, coded "stick out", $x_3$, and coded angle, $x_4$. (See the R output beginning on page 7.)

g) What fraction of the raw variability in $y$ is accounted for using $x_1$, $x_2$, $x_3$, and $x_4$ as predictor variables?

h) What is the meaning of $b_3 = -305.56$? (Interpret this fitted coefficient.)
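The quantities in part e) can be reproduced from the eight observations involved (seven welds at coded current 0 and one at coded current -1). The R sketch below fits the line and forms a prediction interval both with `predict()` and by hand. It is a hedged illustration: the variable names are made up here, the prediction point `x0` is set to the value as printed in the problem (change its sign if the intended point lies between the two observed currents), and the residual standard deviation computed from these eight points need not equal the $s_{LF}$ quoted in the problem, so the quoted value should be substituted when doing the exam's arithmetic.

```r
# Seven welds at coded current 0 and one at coded current -1
x <- c(rep(0, 7), -1)
y <- c(6550, 6650, 6750, 6610, 6340, 6600, 6780, 5830)

fit  <- lm(y ~ x)              # least squares line through the two group means
xbar <- mean(x)                # -0.125
sxx  <- sum((x - xbar)^2)      # 0.875

x0 <- 0.5                      # assumed prediction point, as printed in the problem

# 95% prediction limits from predict()
pred <- predict(fit, newdata = data.frame(x = x0),
                interval = "prediction", level = 0.95)

# The same half-width by hand: t * s * sqrt(1 + 1/n + (x0 - xbar)^2 / Sxx)
n    <- length(x)
s_lf <- summary(fit)$sigma     # computed from the data, not the quoted s_LF
half <- qt(0.975, df = n - 2) * s_lf * sqrt(1 + 1 / n + (x0 - xbar)^2 / sxx)

print(pred)
print(half)
```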

i) There is R output for the fit of a full quadratic model for $y$ in the predictor variables $x_1$, $x_2$, $x_3$, and $x_4$ beginning on page 8. Give the value of an F statistic and degrees of freedom for testing whether the quadratic model is a statistically significant improvement over the linear model in $x_1$, $x_2$, $x_3$, and $x_4$ for explaining $y$.

F = ________   d.f. = ________ , ________

4. Lab #10 used balanced $2^3$ factorial data of Example 4 of Chapter 8 of Vardeman and Jobe. We will here continue use of that scenario.

a) Below is the ANOVA table produced in Lab #10. Suppose that one runs the regsubsets() function from the leaps package using the 7 dummy variables created in the lab. Which model with 4 predictors will be identified as best, and what value of $R^2$ is associated with it?

b) Why would it be impossible to answer part a) based on only the table above if the data were not balanced?

c) CVSSE values for the best (in terms of $R^2$) models with $k = 1, 2, \ldots, 7$ factorial effects were computed using 8-fold cross validation. A plot of these is below. In light of this plot and the ANOVA table above, what "few effects" model for Power appears best? How does its CVSSE compare to the SSE that would be obtained fitting it?

5. Below is a toy data set consisting of $N = 5$ $(x, y)$ pairs. Find the LOO cross-validation SSE for 1-nn prediction. (You don't need to do arithmetic, but write out a complete numerical expression.)

y   2.5   4.0   4.5   7.5   8.0
x   1.0   2.0   2.5   3.5   4.0

6. Below are fake regression trees that you may assume come from $B = 3$ bootstrap samples of a large number, $N$, of $(x_{1i}, x_{2i}, y_i)$ data points.

a) What is the random forest prediction at $(x_1, x_2) = (.5, .5)$? (Assume that the forest includes only the $B = 3$ trees represented above.)

b) Suppose that in fact $(x_1, x_2, y) = (.5, .5, 3)$ was part of the bootstrap samples that were used to produce trees #1 and #2, but not #3. What is this data point's contribution to an OOB error sum of squares?

7. Consider the ordinary least squares predictor $\hat{y}^{OLS}(x)$ (from standard multiple linear regression) and a lasso predictor $\hat{y}^{Lasso}(x)$ (for some $\lambda$) computed for the same data. Is it true that the SSE for $\hat{y}^{Lasso}(x)$ is always at least as big as that for $\hat{y}^{OLS}(x)$? Say why or why not.
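For Problem 5, leave-one-out cross-validation with 1-nn prediction means predicting each $y_i$ by the $y$ of the nearest remaining $x$ and summing the squared errors. The R sketch below is a minimal illustration; the function name `loo_1nn_sse` is made up here, and the five $(x, y)$ pairs are assumed to be (1.0, 2.5), (2.0, 4.0), (2.5, 4.5), (3.5, 7.5), (4.0, 8.0). Ties in distance, which do not occur for these points, would be broken by taking the first nearest neighbor.

```r
# Assumed toy data: five (x, y) pairs
x <- c(1.0, 2.0, 2.5, 3.5, 4.0)
y <- c(2.5, 4.0, 4.5, 7.5, 8.0)

# Leave-one-out SSE for 1-nearest-neighbor prediction
loo_1nn_sse <- function(x, y) {
  errs <- sapply(seq_along(x), function(i) {
    d       <- abs(x[-i] - x[i])   # distances to the other points
    nearest <- which.min(d)        # index (within x[-i]) of the closest point
    y[i] - y[-i][nearest]          # held-out error for case i
  })
  sum(errs^2)
}

cv_sse <- loo_1nn_sse(x, y)
print(cv_sse)
```

Written out, the sum has one squared difference per case, for example $(2.5 - 4.0)^2$ for the first pair, whose nearest remaining neighbor is at $x = 2.0$.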

R Code and Output

> Welds
   current voltage stickout angle strength
1       -1      -1       -1    -1     4730
2        1      -1       -1    -1     4990
3       -1       1       -1    -1     4240
4        1       1       -1    -1     7320
5       -1      -1        1    -1     7130
6        1      -1        1    -1     4920
7       -1       1        1    -1     4110
8        1       1        1    -1     5020
9       -1      -1       -1     1     5560
10       1      -1       -1     1     4910
11      -1       1       -1     1     5330
12       1       1       -1     1     7490
13      -1      -1        1     1     6820
14       1      -1        1     1     4030
15      -1       1        1     1     3690
16       1       1        1     1     4210
17      -1       0        0     0     5830
18       1       0        0     0     6210
19       0      -1        0     0     6230
20       0       1        0     0     6530
21       0       0       -1     0     6370
22       0       0        1     0     5510
23       0       0        0    -1     6390
24       0       0        0     1     6110
25       0       0        0     0     6550
26       0       0        0     0     6650
27       0       0        0     0     6750
28       0       0        0     0     6610
29       0       0        0     0     6340
30       0       0        0     0     6600
31       0       0        0     0     6780

> summary(lm(strength~.,Welds))

Call:
lm(formula = strength ~ ., data = Welds)

Residuals:
    Min      1Q  Median      3Q     Max
-1740.7 -1023.5   312.6   803.2  1607.1

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  5805.16     199.69  29.070   <2e-16 ***
current        92.22     262.06   0.352    0.728
voltage       -76.67     262.06  -0.293    0.772
stickout     -305.56     262.06  -1.166    0.254
angle         -38.89     262.06  -0.148    0.883
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1112 on 26 degrees of freedom
Multiple R-squared:  0.05766,  Adjusted R-squared:  -0.08731
F-statistic: 0.3977 on 4 and 26 DF,  p-value: 0.8084

> anova(lm(strength~.,Welds))
Analysis of Variance Table

Response: strength
          Df   Sum Sq Mean Sq F value Pr(>F)
current    1   153089  153089  0.1238 0.7277
voltage    1   105800  105800  0.0856 0.7722
stickout   1  1680556 1680556  1.3595 0.2542
angle      1    27222   27222  0.0220 0.8832
Residuals 26 32141108 1236196

> summary(lm(strength~.,data.frame(welds)))

Call:
lm(formula = strength ~ ., data = data.frame(welds))

Residuals:
Min 1Q Median 3Q Max
-6.13 -14.13 5.84 104.91 350.64

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)      6544.16      69.59  94.045  < 2e-16 ***
current           190.00     165.87   1.145 0.268853
voltage           150.00     165.87   0.904 0.37936
stickout         -305.56      55.29  -5.526 4.60e-05 ***
angle             -38.89      55.29  -0.703 0.491936
current2         -445.68     145.61  -3.061 0.007469 **
voltage2          -85.68     145.61  -0.588 0.564470
stickout2        -525.68     145.61  -3.610 0.002348 **
angle2           -215.68     145.61  -1.481 0.157979
currentvoltage    753.75      58.64  12.853 7.56e-10 ***
currentstickout  -526.25      58.64  -8.974 1.1e-07 ***
currentangle     -110.00     175.93  -0.625 0.54064
voltagestickout  -628.75      58.64 -10.722 1.03e-08 ***
voltageangle     -255.00     175.93  -1.449 0.166533
stickoutangle    -277.50      58.64  -4.732 0.000226 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 234.6 on 16 degrees of freedom
Multiple R-squared:  0.9742,  Adjusted R-squared:  0.9516
F-statistic: 43.13 on 14 and 16 DF,  p-value: 5.149e-10

> anova(lm(strength~.,data.frame(welds)))
Analysis of Variance Table

Response: strength
                Df  Sum Sq Mean Sq  F value    Pr(>F)
current          1  153089  153089   2.7822 0.114767
voltage          1  105800  105800   1.9228 0.1845697
stickout         1 1680556 1680556  30.5418 4.601e-05 ***
angle            1   27222   27222   0.4947 0.4919356
current2         1 8379097 8379097 152.2787 1.371e-09 ***
voltage2         1  554530  554530  10.0778 0.0058839 **
stickout2        1  990677  990677  18.0042 0.000600 ***
angle2           1  120721  120721   2.1939 0.1579786
currentvoltage   1 9090225 9090225 165.2025 7.560e-10 ***
currentstickout  1 4431025 4431025  80.5279 1.1e-07 ***
currentangle     1   21511   21511   0.3909 0.540641
voltagestickout  1 6325225 6325225 114.9524 1.033e-08 ***
voltageangle     1  115600  115600   2.1009 0.1665326
stickoutangle    1 1232100 1232100  22.3917 0.0002256 ***
Residuals       16  880396   55025
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
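The comparison asked for in Problem 3 i) is a nested-model F test built from the two residual sums of squares. The R sketch below is a hedged illustration rather than the exam key. It rebuilds the 31-run data set as a data frame (16 factorial runs, 8 axial runs, 7 center runs, in the order of the Welds listing), refits the linear model, and then forms the F statistic using the quadratic-model residual sum of squares 880396 on 16 degrees of freedom read from the last ANOVA table. Any strength entry that is incomplete in the listing is filled here with the value that reproduces the printed linear fit, which is an assumption of this sketch; the object names are also made up for the example.

```r
# Coded design: factorial runs vary 'current' fastest, as in the listing
fact <- expand.grid(current = c(-1, 1), voltage = c(-1, 1),
                    stickout = c(-1, 1), angle = c(-1, 1))
axial <- data.frame(current  = c(-1, 1,  0, 0,  0, 0,  0, 0),
                    voltage  = c( 0, 0, -1, 1,  0, 0,  0, 0),
                    stickout = c( 0, 0,  0, 0, -1, 1,  0, 0),
                    angle    = c( 0, 0,  0, 0,  0, 0, -1, 1))
centre <- data.frame(current = rep(0, 7), voltage = 0, stickout = 0, angle = 0)
Welds <- rbind(fact, axial, centre)

# Strengths (kgf); entries assumed where the listing is incomplete
Welds$strength <- c(4730, 4990, 4240, 7320, 7130, 4920, 4110, 5020,
                    5560, 4910, 5330, 7490, 6820, 4030, 3690, 4210,
                    5830, 6210, 6230, 6530, 6370, 5510, 6390, 6110,
                    6550, 6650, 6750, 6610, 6340, 6600, 6780)

# Reduced (linear) model
lin     <- lm(strength ~ ., data = Welds)
sse_lin <- sum(resid(lin)^2)
df_lin  <- df.residual(lin)    # 31 - 5 = 26

# Full (quadratic) model: values read from the printed ANOVA table
sse_quad <- 880396
df_quad  <- 16

# F = [(SSE_reduced - SSE_full) / (df_reduced - df_full)] / (SSE_full / df_full)
F_stat <- ((sse_lin - sse_quad) / (df_lin - df_quad)) / (sse_quad / df_quad)
p_val  <- pf(F_stat, df_lin - df_quad, df_quad, lower.tail = FALSE)

print(round(coef(lin), 2))
print(c(F = F_stat, df1 = df_lin - df_quad, df2 = df_quad, p = p_val))
```

The numerator degrees of freedom equal the number of added terms (4 squares and 6 cross products), so the statistic is referred to an F distribution with 10 and 16 degrees of freedom.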