Stat 401XV Final Exam Spring 2017

I have neither given nor received unauthorized assistance on this exam.

Name Signed                    Date

Name Printed

ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit. Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit. SHOW YOUR WORK/EXPLAIN YOURSELF!

1. (6 pts) A so-called "k out of n system" will function provided at least k of its n components function. Consider a "4 out of 5 system" with independent components that each have reliability (probability of functioning) p. I need to know how large p must be in order to have overall system reliability (probability of functioning) .99. Set up an equation you could solve in order to find this p for me.

2. Customers arrive at a service counter with inter-arrival times (times between consecutive arrivals) modeled as independent exponential random variables with mean min.

a) Under this model, what fraction of inter-arrival times are less than .5 min?

b) (7 pts) Under this model, approximate the probability that less than 50 customers arrive in a particular 60 minute period. (Hint: This is the probability that the sum of 50 inter-arrival times is larger than 60.)
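One way to set up Problems 1 and 2 numerically is sketched below in R. This is an illustrative sketch under stated assumptions, not the exam's official solution. The target reliability is read as 0.99, and because the mean inter-arrival time did not survive in this transcription, `mu` is a hypothetical placeholder (set to 1 minute only so that the code runs). The function names `system_reliability`, `frac_below`, and `prob_fewer_than` are made up for illustration.

```r
# Problem 1: a 4-out-of-5 system works when at least 4 of its 5 independent
# components work, so with X ~ Binomial(5, p):
#   P(X >= 4) = 5 p^4 (1 - p) + p^5
system_reliability <- function(p) pbinom(3, size = 5, prob = p, lower.tail = FALSE)

# Solve system_reliability(p) = 0.99 (target ASSUMED to be 0.99).
p_needed <- uniroot(function(p) system_reliability(p) - 0.99,
                    interval = c(0.5, 1), tol = 1e-10)$root

# Problem 2: exponential inter-arrival times with mean mu minutes.
mu <- 1  # HYPOTHETICAL value; the exam's mean is missing from this transcript

# (a) Fraction of inter-arrival times below t minutes: 1 - exp(-t / mu).
frac_below <- function(t, mu) pexp(t, rate = 1 / mu)

# (b) Fewer than n arrivals in `window` minutes means the sum of n
# inter-arrival times exceeds `window`. By the central limit theorem the
# sum is roughly Normal with mean n * mu and standard deviation sqrt(n) * mu.
prob_fewer_than <- function(n, window, mu) {
  pnorm(window, mean = n * mu, sd = sqrt(n) * mu, lower.tail = FALSE)
}

p_needed
frac_below(0.5, mu)
prob_fewer_than(50, 60, mu)
```

Replacing `mu`, the cutoff in `frac_below()`, and the counts in `prob_fewer_than()` with the exam's actual values gives the corresponding answers; the structure of the calculation does not change.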

3. A student project concerned measurement of resistivity of a type of copper wire at two different temperatures. Seven pieces of this wire were used in the study, and measured resistances at 0.0 C and at 8.8 C are in the following table. (Units are 0 Ω m.)

Wire   C Resistivity   C Resistivity

a) Give and interpret a 95% lower confidence bound for the mean increase in resistivity of this wire associated with an increase in temperature from 0.0 C to .8 C. (PLUG IN COMPLETELY, but there is no need to simplify. Say what the "95%" means.)

b) (5 pts) Give a two-sided interval that you are "95% sure" will bracket 99% of measured increases in resistivity of this wire associated with an increase of temperature 0.0 C to .8 C. (PLUG IN COMPLETELY, but there is no need to simplify.)

In a second study concerning resistivity of this wire, two different meters were both used in measuring resistance at .8 C for the same n = 70 specimens. For 50 of the 70 specimens/trials, meter A produced a higher reading than did meter B.

c) (6 pts) Give a p-value for assessing whether there is clear evidence that the fraction of specimens for which meter A produces a higher reading than meter B exceeds .5.
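The calculations behind parts a) and c) of Problem 3 can be sketched in R as follows. The resistivity table did not survive in this transcription, so the `cold` and `warm` vectors are hypothetical numbers invented only to make the code runnable. The counts in part c) (50 of 70) are taken at face value from the text above, even though digits may have been lost there too.

```r
# HYPOTHETICAL paired resistivity readings for 7 wires (NOT the exam's data).
cold <- c(1.72, 1.69, 1.75, 1.71, 1.73, 1.70, 1.74)
warm <- c(1.79, 1.77, 1.80, 1.78, 1.82, 1.76, 1.81)
d <- warm - cold          # per-wire increases (paired differences)
n <- length(d)

# (a) 95% lower confidence bound for the mean increase (paired t method):
#     dbar - t_{0.95, n-1} * s_d / sqrt(n)
lower_bound <- mean(d) - qt(0.95, df = n - 1) * sd(d) / sqrt(n)

# (c) One-sided p-value for H0: p = 0.5 versus Ha: p > 0.5, where p is the
# fraction of specimens on which meter A reads higher than meter B.
z <- (50 / 70 - 0.5) / sqrt(0.5 * 0.5 / 70)    # large-sample z statistic
p_value_normal <- pnorm(z, lower.tail = FALSE)

# Exact binomial version of the same one-sided test.
p_value_exact <- binom.test(50, 70, p = 0.5, alternative = "greater")$p.value

lower_bound
c(normal = p_value_normal, exact = p_value_exact)
```

Part b) asks for a tolerance interval, which needs a tabled two-sided tolerance factor in place of the t quantile; that factor is not computed here.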

4. Beginning on Page 8 there is R analysis of a partially replicated factorial experiment due to R. Snee treated in Engineering Statistics by Hogg and Ledolter. It concerned the effects of factors

Factor                       Levels
A - Polymer Type             Standard () vs New (But Expensive) ()
B - Polymer Concentration    .0% () vs .04% ()
C - Amount of an Additive    lb () vs lb ()

on y = percentage impurity produced by a chemical process. Use that in the following questions.

a) (5 pts) Give "margins of error" based on 95% two-sided confidence limits to associate with the 8 sample means in the study. (Some of these "sample means" are based on only one observation.)

Where n combination = :

Where n combination = :

b) (6 pts) Give the value of an F statistic and degrees of freedom for testing the hypothesis that all 8 experimental combinations produce the same mean purity.

F =          d.f. =      ,

c) Based on the last 3 runs of the lm() routine with these data, what model for y in terms of the experimental variables do you judge to be best? (Name and interpret values of detectable effects and say what other effects are not detectable.)

d) For the first case, the predicted value produced in the final lm() run is .895. If it were printed out, what would be the corresponding value for the next-to-final run? If it is .895 say why. If it is not .895 say why not.
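The ideas behind parts a), b), and d) of Problem 4 can be illustrated with a small R sketch. Everything numeric here is an assumption: the impurity values are simulated, and the choice of which three combinations are replicated is made up so that there are 11 runs, 8 combinations, and 3 error degrees of freedom, as in the printout. The sketch shows that a one-way "cell means" fit and the full three-factor factorial fit are the same model, and how a margin of error follows from the pooled standard deviation.

```r
# Eight factor combinations; three of them run twice (11 runs in total).
# WHICH combinations are replicated is an ASSUMPTION for illustration.
design <- expand.grid(PolyType = factor(1:2),
                      PolyConc = factor(1:2),
                      AddAmount = factor(1:2))
combo <- c(1, 2, 2, 3, 4, 5, 5, 6, 7, 7, 8)
dat <- design[combo, ]
dat$Type <- factor(combo)

set.seed(401)
dat$Impurity <- rnorm(nrow(dat), mean = 1, sd = 0.2)   # SIMULATED responses

cell_fit <- lm(Impurity ~ Type, data = dat)
fact_fit <- lm(Impurity ~ PolyType * PolyConc * AddAmount, data = dat)

# Both models have 8 mean parameters for 8 combinations, so their fitted
# values (and hence the pooled standard deviation) agree.
same_fit <- all.equal(unname(fitted(cell_fit)), unname(fitted(fact_fit)))

# Margin of error for a sample mean based on n observations:
#   t_{0.975, df} * s_pooled / sqrt(n), with df = 11 - 8 = 3 here.
s_pooled <- summary(cell_fit)$sigma
df_error <- df.residual(cell_fit)
margin <- function(n) qt(0.975, df_error) * s_pooled / sqrt(n)
c(n1 = margin(1), n2 = margin(2))

# Overall F statistic with its degrees of freedom, for testing that all
# 8 combinations share one mean.
summary(cell_fit)$fstatistic
```

A reduced model such as `Impurity ~ PolyType + PolyConc` has fewer mean parameters, so its fitted values generally differ from those of the two saturated fits above.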

5. There is a dataset on the UCI Machine Learning Data Set Repository that provides -0 quality ratings by experts (y) for wine samples and corresponding results of chemical analyses (x = (x_1, x_2, ..., x_11)). This problem concerns data analysis for 599 red wine samples. Beginning on Page 10 there is relevant R code and output.

Consider first a SLR analysis of the variable quality using the predictor variable alcohol. Below is a scatterplot for these variables and the least squares line through the data pairs. (The plotting locations have been randomly "jittered" slightly to minimize the visual effects of over-plotting.)

a) Say what the plot suggests about the appropriateness of the Gaussian simple linear regression model (particularly the modeling of "errors" ε_i).

b) Would you be willing to use a 95% prediction interval for the expert quality rating, y, of a new specimen with alcohol content x = based on these data and the Gaussian SLR model? Explain.

c) (5 pts) Is there definitive evidence that average quality rating increases with alcohol content? Provide quantitative support for your answer based on the R output.
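The quantities that parts b) and c) of Problem 5 refer to can be pulled out of an `lm()` fit as sketched below. The wine data are not reproduced here, so the sketch uses simulated integer-valued ratings; the data frame `sim`, the coefficients used to simulate it, and the new alcohol value of 10 are all hypothetical stand-ins.

```r
set.seed(2017)
n <- 200
alcohol <- runif(n, 8.5, 14)                                # SIMULATED predictor
quality <- round(2 + 0.35 * alcohol + rnorm(n, sd = 0.7))   # SIMULATED integer ratings
sim <- data.frame(alcohol, quality)

fit <- lm(quality ~ alcohol, data = sim)
slope <- coef(summary(fit))["alcohol", ]

# (c) One-sided test of H0: slope = 0 against Ha: slope > 0, using the
# t statistic that summary() reports for the alcohol coefficient.
t_stat <- slope[["t value"]]
p_one_sided <- pt(t_stat, df = df.residual(fit), lower.tail = FALSE)

# (b) A Gaussian 95% prediction interval is a continuous range, while the
# ratings themselves are integers, which is one reason for caution.
new_x <- data.frame(alcohol = 10)   # HYPOTHETICAL alcohol content
predict(fit, newdata = new_x, interval = "prediction", level = 0.95)

c(t = t_stat, p_one_sided = p_one_sided)
```

When the estimated slope is positive, the one-sided p-value is half of the two-sided `Pr(>|t|)` value printed by `summary()`.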

Suppose that one suspends any concerns about model assumptions and adopts the usual MLR model

y = β_0 + β_1 x_1 + β_2 x_2 + ... + β_11 x_11 + ε

for quality rating as a function of the chemical analysis results.

d) Interpret the fitted regression coefficient for x_2 = grams of acetic acid per cubic decimeter.

e) (8 pts) Give the value of an F statistic and degrees of freedom for judging whether after accounting for x_11 = alcohol content, the other 10 chemical analysis results add detectably to one's ability to predict quality rating.

F =          d.f. =      ,

Consider now only the issue of building an effective predictor of y = quality rating. (Leave behind Gaussian model assumptions.)

f) Below is a table of some summaries for several linear predictors fit by least squares. Which linear predictor (set of chemical analysis terms) is most attractive and why?

Chemical Analysis Terms    R    MSE    CV-RMSPE
through
,3,5,6,7,9,0,
,5,6,7,9,0,
,5,7,9,0,
,5,7,0,
,7,0,
,0,
,
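The F statistic asked for in part e) compares a reduced model (alcohol only) with the full 11-predictor model. A sketch of that partial F computation is given below on simulated data, since the wine data and the numeric output are not legible in this transcription. The helper name `partial_F`, the simulated data frame `X`, and its coefficients are assumptions made for illustration.

```r
# Partial F statistic for dropping q predictors from a full model:
#   F = ((SSE_reduced - SSE_full) / q) / (SSE_full / df_full)
partial_F <- function(sse_reduced, sse_full, q, df_full) {
  ((sse_reduced - sse_full) / q) / (sse_full / df_full)
}

set.seed(11)
n <- 300
X <- as.data.frame(matrix(rnorm(n * 11), ncol = 11))   # SIMULATED predictors
names(X) <- paste0("x", 1:11)
X$y <- 5 + 0.4 * X$x11 - 0.3 * X$x2 + rnorm(n)         # SIMULATED response

full <- lm(y ~ ., data = X)        # all 11 predictors
reduced <- lm(y ~ x11, data = X)   # x11 plays the role of alcohol

# deviance() of an lm fit is its residual (error) sum of squares.
F_by_hand <- partial_F(deviance(reduced), deviance(full),
                       q = 10, df_full = df.residual(full))

# The same statistic from R's nested-model comparison.
F_anova <- anova(reduced, full)$F[2]

c(by_hand = F_by_hand, anova = F_anova, df1 = 10, df2 = df.residual(full))
```

The same statistic can be written in terms of the two R-squared values as ((R2_full - R2_reduced) / q) / ((1 - R2_full) / df_full), which is convenient when only `summary()` output is available.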

g) Searching for an elastic net predictor for y = quality rating based on the predictors, the best CV-RMSPE available seems to be about .650 for α .00 and λ .03. The predictions it produces are not much different from ordinary MLR. Why is this not surprising given the elastic net parameters and what you know about the MLR model from part f)?

h) There is code and output from train() in caret for k-nearest-neighbor and random forest predictors for y = quality rating based on the predictors. What value of "k" is best for the former and what value of "mtry" is best for the latter? How do these predictors compare to each other and to MLR predictors in terms of performance? (Give numerical support for your latter answer.)

i) The printout presents a scatterplot matrix and correlations between y and MLR, knn, and random forest predictions. It seems impossible to improve much on the best of these predictors using a linear combination of them. Based on the information available to you, give rationale for this happening.

j) (6 pts) Rather than predict y, one could instead use a classification tree to identify chemical analysis vectors (x_1, x_2, ..., x_11) that produce y ≥ 7. The tree below was fit using rpart (and cp = .0). What is the misclassification rate for this tree on the training set? Describe in simple terms what chemical analysis results it associates with a quality score of 7 or more. ("" is the "y ≥ 7" class and "to the left" is "the condition holds" circumstance.)
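For part j), a training-set misclassification rate is the fraction of cases whose predicted class differs from the true class. The tree and its node counts are not reproduced in this transcription, so the short R sketch below uses two hypothetical label vectors in place of the real classes and tree predictions. With an actual rpart fit, the predicted classes would normally come from `predict()` with `type = "class"`.

```r
# HYPOTHETICAL true classes and tree predictions (1 = "y >= 7", 0 = otherwise).
truth     <- factor(c(0, 0, 0, 0, 0, 0, 1, 1, 1, 0), levels = c(0, 1))
predicted <- factor(c(0, 0, 0, 1, 0, 0, 1, 0, 1, 0), levels = c(0, 1))

# Confusion table: rows are true classes, columns are predicted classes.
confusion <- table(truth = truth, predicted = predicted)

# Misclassification rate = 1 - (correctly classified cases) / (all cases).
misclass_rate <- 1 - sum(diag(confusion)) / sum(confusion)

confusion
misclass_rate
```

When reading the rate off a plotted tree instead, the same number is obtained by adding up the minority-class counts in the terminal nodes and dividing by the total number of training cases.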

R Code and Output for Chemical Process Analyses

Type<-c(,,,3,4,5,5,6,7,7,8)
PolyType<-c(,,,,,,,,,,)
PolyConc<-c(,,,,,,,,,,)
AddAmount<-c(rep(,5),rep(,6))
Impurity<-c(,,.,.,.5,.9,.7,.,.,.3,.5)
cbind(Type,PolyType,PolyConc,AddAmount,Impurity)
Type PolyType PolyConc AddAmount Impurity
[1,] .0
[2,] .0
[3,]
[4,]
[5,]
[6,]
[7,]
[8,] 6.
[9,]
[10,]
[11,]

aggregate(Impurity,by=list(Type),FUN="mean")
Group.1 x

aggregate(Impurity,by=list(Type),FUN="sd")
Group.1 x
NA NA NA NA NA

Type<-as.factor(Type)
PolyType<-as.factor(PolyType)
PolyConc<-as.factor(PolyConc)
AddAmount<-as.factor(AddAmount)
Snee<-data.frame(Type,PolyType,PolyConc,AddAmount,Impurity)
summary(Snee)
Type PolyType PolyConc AddAmount Impurity
: :6 :6 :5 Min. : : :5 :5 :6 1st Qu.: : Median : : Mean : : 3rd Qu.: : Max. :.000 (Other):

options(contrasts = rep("contr.sum", ))
Snee.out<-lm(Impurity~Type,data=Snee)
summary(Snee.out)

Call:
lm(formula = Impurity ~ Type, data = Snee)

Residuals:
e e-0.000e-0.343e e-7.000e e e e e e-8

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) ***
Type Type Type * * Type Type5 Type * Type *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5 on 3 degrees of freedom
Multiple R-squared: 0.967, Adjusted R-squared:
F-statistic: .6 on 7 and 3 DF, p-value:

Snee.out<-lm(Impurity~PolyType*PolyConc*AddAmount,data=Snee)
summary(Snee.out)

Call:
lm(formula = Impurity ~ PolyType * PolyConc * AddAmount, data = Snee)

Residuals:
e e-0.000e e e-8.000e e e e e-0.6e-7

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) ***
PolyType
PolyConc
AddAmount **
PolyType:PolyConc
PolyType:AddAmount
PolyConc:AddAmount
PolyType:PolyConc:AddAmount
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5 on 3 degrees of freedom
Multiple R-squared: 0.967, Adjusted R-squared:
F-statistic: .6 on 7 and 3 DF, p-value:

Snee.out3<-lm(Impurity~PolyType+PolyConc,data=Snee)
summary(Snee.out3)

Call:
lm(formula = Impurity ~ PolyType + PolyConc, data = Snee)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) e-08 ***
PolyType **
PolyConc e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 8 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on and 8 DF, p-value: 8.569e-06

predict(Snee.out3)

R Code and Output for the Wines Data

wines<-read.clipboard(header=TRUE,sep=",")
wines$quality<-as.numeric(wines$quality)
Good<-rep(0,599)
for (i in :599) if (wines$quality[i] > 6) Good[i]<-
GoodF<-as.factor(Good)
Wines<-data.frame(wines,GoodF)
summary(Wines)
fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
Min. : 4.60 Min. :0.00 Min. :0.000 Min. : Min. :0.000
1st Qu.: 7.0 1st Qu.: 1st Qu.:0.090 1st Qu.:.900 1st Qu.:
Median : 7.90 Median :0.500 Median :0.60 Median :.00 Median :
Mean : 8.3 Mean :0.578 Mean :0.7 Mean :.539 Mean :
3rd Qu.: 9.0 3rd Qu.: 3rd Qu.:0.40 3rd Qu.:.600 3rd Qu.:
Max. :5.90 Max. :.5800 Max. :.000 Max. :5.500 Max. :0.600
free.sulfur.dioxide total.sulfur.dioxide density pH sulphates
Min. :.00 Min. : 6.00 Min. :0.990 Min. :.740 Min. :
1st Qu.: 7.00 1st Qu.:.00 1st Qu.: 1st Qu.:3.0 1st Qu.:
Median :4.00 Median : Median : Median :3.30 Median :0.600
Mean :5.87 Mean : Mean : Mean :3.3 Mean :
3rd Qu.:.00 3rd Qu.: 3rd Qu.: 3rd Qu.: 3rd Qu.:
Max. :7.00 Max. :89.00 Max. :.0037 Max. :4.00 Max. :.0000
alcohol quality GoodF
Min. : 8.40 Min. : :38
1st Qu.: 9.50 1st Qu.:5.000 : 7
Median :0.0 Median :6.000
Mean :0.4 Mean :
3rd Qu.:.0 3rd Qu.:6.000
Max. :4.90 Max. :8.000

Wines<-Wines[,:]
summary(lm(quality~alcohol,data=Wines))

Call:
lm(formula = quality ~ alcohol, data = Wines)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
alcohol <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 597 degrees of freedom
Multiple R-squared: 0.67, Adjusted R-squared: 0.63
F-statistic: on and 597 DF, p-value: < 2.2e-16

summary(lm(quality~.,data=Wines))

Call:
lm(formula = quality ~ ., data = Wines)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) .97e+0.9e
fixed.acidity .499e-0.595e
volatile.acidity -.084e+00.e <2e-16 ***
citric.acid -.86e-0.47e
residual.sugar .633e-0.500e
chlorides -.874e e e-06 ***
free.sulfur.dioxide 4.36e-03.7e *
total.sulfur.dioxide -3.65e e e-06 ***
density -.788e+0.63e
pH e-0.96e *
sulphates 9.63e-0.43e e-5 ***
alcohol .76e-0.648e <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: on 587 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: 8.35 on and 587 DF, p-value: < 2.2e-16

lmtune<-train(y=wines[,],
+ x=wines[,:],
+ method="lm",
+ preProcess = c("center","scale"),
+ trControl=trainControl(method="repeatedcv",repeats=00,number=0))
lmtune

Linear Regression

599 samples
predictor

Pre-processing: centered (), scaled ()
Resampling: Cross-Validated (0 fold, repeated 00 times)
Summary of sample sizes: 440, 439, 439, 438, 438, 44,...
Resampling results:

RMSE Rsquared

Tuning parameter 'intercept' was held constant at a value of TRUE

lmpred<-predict(lmtune)
knntune<-train(y=wines[,],
+ x=wines[,:],
+ method="knn",
+ preProcess = c("center","scale"),
+ tuneGrid=data.frame(.k=:5),
+ trControl=trainControl(method="repeatedcv",repeats=00,number=0))
knntune

k-nearest Neighbors

599 samples
predictor

Pre-processing: centered (), scaled ()
Resampling: Cross-Validated (0 fold, repeated 00 times)

Summary of sample sizes: 439, 440, 439, 439, 438, 438,...
Resampling results across tuning parameters:

k RMSE Rsquared

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was k = 8.

knnpred<-predict(knntune)
ForestTune<-train(y=Wines[,],
+ x=wines[,:],
+ tuneGrid=data.frame(mtry=:),
+ method="rf",ntree=000,
+ trControl=trainControl(method="oob"))
ForestTune

Random Forest

599 samples
predictor

No pre-processing
Resampling results across tuning parameters:

mtry RMSE Rsquared

RMSE was used to select the optimal model using the smallest value.
The final value used for the model was mtry = 4.

rfpred<-predict(ForestTune)
round(cor(cbind(quality,lmpred,knnpred,rfpred)),)
quality lmpred knnpred rfpred
quality
lmpred
knnpred
rfpred

pairs(cbind(quality,lmpred,knnpred,rfpred))


More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

This document contains 3 sets of practice problems.

This document contains 3 sets of practice problems. P RACTICE PROBLEMS This document contains 3 sets of practice problems. Correlation: 3 problems Regression: 4 problems ANOVA: 8 problems You should print a copy of these practice problems and bring them

More information

IE 361 Exam 3 Fall I have neither given nor received unauthorized assistance on this exam.

IE 361 Exam 3 Fall I have neither given nor received unauthorized assistance on this exam. IE 361 Exam 3 Fall 2012 I have neither given nor received unauthorized assistance on this exam. Name Date 1 1. I wish to measure the density of a small rock. My method is to read the volume of water in

More information

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report

Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Real Estate Price Prediction with Regression and Classification CS 229 Autumn 2016 Project Final Report Hujia Yu, Jiafu Wu [hujiay, jiafuwu]@stanford.edu 1. Introduction Housing prices are an important

More information

No other aids are allowed. For example you are not allowed to have any other textbook or past exams.

No other aids are allowed. For example you are not allowed to have any other textbook or past exams. UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Sample Exam Note: This is one of our past exams, In fact the only past exam with R. Before that we were using SAS. In

More information

Lecture 10. Factorial experiments (2-way ANOVA etc)

Lecture 10. Factorial experiments (2-way ANOVA etc) Lecture 10. Factorial experiments (2-way ANOVA etc) Jesper Rydén Matematiska institutionen, Uppsala universitet jesper@math.uu.se Regression and Analysis of Variance autumn 2014 A factorial experiment

More information

Assignment 2: K-Nearest Neighbors and Logistic Regression

Assignment 2: K-Nearest Neighbors and Logistic Regression Assignment 2: K-Nearest Neighbors and Logistic Regression SDS293 - Machine Learning Due: 4 Oct 2017 by 11:59pm Conceptual Exercises 4.4 parts a-d (p. 168-169 ISLR) When the number of features p is large,

More information

Stat 502X Exam 2 Spring 2014

Stat 502X Exam 2 Spring 2014 Stat 502X Exam 2 Spring 2014 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed This exam consists of 12 parts. I'll score it at 10 points per problem/part

More information

Statistics 100 Exam 2 March 8, 2017

Statistics 100 Exam 2 March 8, 2017 STAT 100 EXAM 2 Spring 2017 (This page is worth 1 point. Graded on writing your name and net id clearly and circling section.) PRINT NAME (Last name) (First name) net ID CIRCLE SECTION please! L1 (MWF

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Holdout and Cross-Validation Methods Overfitting Avoidance

Holdout and Cross-Validation Methods Overfitting Avoidance Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest

More information

Linear Probability Model

Linear Probability Model Linear Probability Model Note on required packages: The following code requires the packages sandwich and lmtest to estimate regression error variance that may change with the explanatory variables. If

More information

Conditions for Regression Inference:

Conditions for Regression Inference: AP Statistics Chapter Notes. Inference for Linear Regression We can fit a least-squares line to any data relating two quantitative variables, but the results are useful only if the scatterplot shows a

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

SMAM 314 Exam 42 Name

SMAM 314 Exam 42 Name SMAM 314 Exam 42 Name Mark the following statements True (T) or False (F) (10 points) 1. F A. The line that best fits points whose X and Y values are negatively correlated should have a positive slope.

More information

Linear Modelling: Simple Regression

Linear Modelling: Simple Regression Linear Modelling: Simple Regression 10 th of Ma 2018 R. Nicholls / D.-L. Couturier / M. Fernandes Introduction: ANOVA Used for testing hpotheses regarding differences between groups Considers the variation

More information

Statistical Prediction

Statistical Prediction Statistical Prediction P.R. Hahn Fall 2017 1 Some terminology The goal is to use data to find a pattern that we can exploit. y: response/outcome/dependent/left-hand-side x: predictor/covariate/feature/independent

More information

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section: Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 You have until 10:20am to complete this exam. Please remember to put your name,

More information

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb Stat 42/52 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat52.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra

More information

STAT FINAL EXAM

STAT FINAL EXAM STAT101 2013 FINAL EXAM This exam is 2 hours long. It is closed book but you can use an A-4 size cheat sheet. There are 10 questions. Questions are not of equal weight. You may need a calculator for some

More information

1 The Classic Bivariate Least Squares Model

1 The Classic Bivariate Least Squares Model Review of Bivariate Linear Regression Contents 1 The Classic Bivariate Least Squares Model 1 1.1 The Setup............................... 1 1.2 An Example Predicting Kids IQ................. 1 2 Evaluating

More information

Regression Analysis IV... More MLR and Model Building

Regression Analysis IV... More MLR and Model Building Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression

More information

Lecture 20: Multiple linear regression

Lecture 20: Multiple linear regression Lecture 20: Multiple linear regression Statistics 101 Mine Çetinkaya-Rundel April 5, 2012 Announcements Announcements Project proposals due Sunday midnight: Respsonse variable: numeric Explanatory variables:

More information

Introduction to Linear Regression

Introduction to Linear Regression Introduction to Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Introduction to Linear Regression 1 / 46

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 7, 2017 Figure captions are below the Figures they refer to. LowCalorie LowFat LowCarbo Control 8 2 3 2 9 4 5 2 6 3 4-1 7 5 2 0 3 1 3 3 Figure

More information

We d like to know the equation of the line shown (the so called best fit or regression line).

We d like to know the equation of the line shown (the so called best fit or regression line). Linear Regression in R. Example. Let s create a data frame. > exam1 = c(100,90,90,85,80,75,60) > exam2 = c(95,100,90,80,95,60,40) > students = c("asuka", "Rei", "Shinji", "Mari", "Hikari", "Toji", "Kensuke")

More information

Sociology 593 Exam 2 Answer Key March 28, 2002

Sociology 593 Exam 2 Answer Key March 28, 2002 Sociology 59 Exam Answer Key March 8, 00 I. True-False. (0 points) Indicate whether the following statements are true or false. If false, briefly explain why.. A variable is called CATHOLIC. This probably

More information

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO

More information

Bootstrap, Jackknife and other resampling methods

Bootstrap, Jackknife and other resampling methods Bootstrap, Jackknife and other resampling methods Part VI: Cross-validation Rozenn Dahyot Room 128, Department of Statistics Trinity College Dublin, Ireland dahyot@mee.tcd.ie 2005 R. Dahyot (TCD) 453 Modern

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer

Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer Solutions to Exam in 02402 December 2012 Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer 3 1 5 2 5 2 3 5 1 3 Exercise IV.2 IV.3 IV.4 V.1

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Lecture 2. The Simple Linear Regression Model: Matrix Approach

Lecture 2. The Simple Linear Regression Model: Matrix Approach Lecture 2 The Simple Linear Regression Model: Matrix Approach Matrix algebra Matrix representation of simple linear regression model 1 Vectors and Matrices Where it is necessary to consider a distribution

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

St 412/512, D. Schafer, Spring 2001

St 412/512, D. Schafer, Spring 2001 St 412/512, D. Schafer, Spring 2001 Midterm Exam Your name:_solutions Your lab time (Circle one): Tues. 8:00 Tues 11:00 Tues 2:00 This is a 50-minute open-book, open-notes test. Show work where appropriate.

More information