Stat 401B Final Exam Fall 2016


Page 1

I have neither given nor received unauthorized assistance on this exam.

Name Signed:                Date:
Name Printed:

ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will receive NO partial credit. Correct numerical answers to difficult questions unaccompanied by supporting reasoning may not receive full credit. SHOW YOUR WORK/EXPLAIN YOURSELF!

Page 2

1. A manufacturing process produces cylindrical parts.

pts a) Suppose that parts are $L$ inches long with diameter $D$ (also in inches). Part volume is then $V = L D^2 \pi / 4$. Suppose that $L$ and $D$ can be modeled as independent continuous random variables, L ~ U(9.99,0.0) and D ~ U(.99,.00). The mean and standard deviation of part volume ($\mathrm{E}V$ and $\sqrt{\mathrm{Var}V}$ respectively) are of interest and simulation will be used to evaluate these. Provide a few lines of R code that will do this. (The R syntax for density, cdf, random variable, etc. calls associated with the U(a,b) distribution is "_unif(..,min=a,max=b,..)".)

8 pts b) Suppose that the parts are steel and as manufactured have weights with mean 0. oz and standard deviation . oz. Use the central limit theorem and approximate the probability that such parts have a total weight above oz. (Hint: Rephrase the question in terms of sample average weight.)

2. The 2013 International Journal of Microbiology article "Design and Optimization of a Process for Sugarcane Molasses Fermentation by Saccharomyces cerevisiae Using Response Surface Methodology" by El-Gendy, Madian, and Amr, presents results of a study made to optimize the performance of a bioethanol production process. This question employs some results in that paper. Under a first single set of process conditions, n = runs of the process produce yields (in gm/l) of bioethanol with sample mean $\bar{y}$ = .9 and sample standard deviation s = .03.

a) Give 95% two-sided confidence limits for the standard deviation of process yield under these conditions. (Plug in completely, but you need not simplify.)
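The simulation requested in part a) of the cylindrical-parts problem can be sketched as follows. This is only one reasonable way to do it: the endpoints of the two uniform distributions, the number of simulated parts, and the seed are illustrative assumptions, not values taken from the exam.

```r
# Monte Carlo sketch for the mean and standard deviation of part volume.
# ASSUMPTIONS: the uniform endpoints, n_sim, and the seed are placeholders.
set.seed(401)                                # arbitrary seed for reproducibility
n_sim <- 100000                              # number of simulated parts (assumed)

L <- runif(n_sim, min = 9.99, max = 10.01)   # simulated lengths (assumed range)
D <- runif(n_sim, min = 1.99, max = 2.01)    # simulated diameters (assumed range)

V <- L * D^2 * pi / 4                        # volume of a cylinder

mean_V <- mean(V)                            # estimates EV
sd_V   <- sd(V)                              # estimates sqrt(Var V)
c(mean = mean_V, sd = sd_V)
```

Because L and D are drawn independently, the sample mean and sample standard deviation of the simulated volumes estimate the two quantities of interest directly; a larger n_sim tightens both estimates.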

Page 3

b) Provide a number, #, such that you are 95% sure that 99% of all process yields under these conditions are at least the value #. (Plug in completely, but you need not simplify.)

c) Suppose that a single additional run of the process made under a second set of conditions produces a yield of y = . Assuming that the standard deviation of yields is the same for both sets of conditions, significance testing will be used to evaluate whether this new setup has a different mean yield than the first. Give the value of an appropriate test statistic and name an appropriate reference distribution to be used in finding a p-value.

Value of the test statistic:
Reference distribution:

pts d) Suppose that several process conditions of interest differ only in the value of incubation period (x in h). What model assumptions for a set of (x, y) data pairs are needed to make confidence limits for the rate of change of mean y with respect to x?

Beginning on Page 8 there is some R code and output for a MLR analysis of n = process runs. These are potentially useful for describing yield, y (in gm/l), as a function of the process variables

x1 = incubation period (h)
x2 = initial pH
x3 = incubation temperature (°C)
x4 = molasses concentration (wt %)
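The confidence limits for a standard deviation and the one-new-observation test in this problem both rest on standard normal-theory formulas. A minimal sketch follows; the sample size, sample mean, sample standard deviation, new yield, and confidence level are all hypothetical stand-ins, not the exam's numbers.

```r
# Normal-theory sketch: chi-square limits for sigma and a t statistic for
# comparing one new observation to an earlier sample with a common sigma.
# ASSUMPTIONS: every numeric value below is hypothetical.
n     <- 5       # number of runs under the first conditions (assumed)
ybar  <- 250     # sample mean yield (assumed)
s     <- 2       # sample standard deviation (assumed)
y_new <- 255     # single yield under the second conditions (assumed)
level <- 0.95    # two-sided confidence level (assumed)

alpha <- 1 - level
# Limits for sigma: s * sqrt((n - 1) / chi-square quantile), n - 1 df.
sigma_lower <- s * sqrt((n - 1) / qchisq(1 - alpha / 2, df = n - 1))
sigma_upper <- s * sqrt((n - 1) / qchisq(alpha / 2, df = n - 1))

# Pooled two-sample t with sample sizes n and 1: the pooled standard
# deviation is just s, and the reference distribution is t with n - 1 df.
t_stat  <- (y_new - ybar) / (s * sqrt(1 + 1 / n))
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

c(sigma_lower = sigma_lower, sigma_upper = sigma_upper,
  t_stat = t_stat, p_value = p_value)
```

The second sample contributes no degrees of freedom for estimating the common standard deviation, which is why the t distribution here has only n - 1 degrees of freedom.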

Page 4

e) What is estimated by the value "Period 0.43" reported in the table on the output? (What does the value .43 represent in the context of the problem?)

f) Does a model linear in the predictors x1, x2, x3, and x4 provide useful ability to predict yield? Provide some quantitative support for your answer.

YES or NO (circle one)

As it turns out, a MLR with the (14) predictors $x_1, x_2, x_3, x_4, x_1^2, x_2^2, x_3^2, x_4^2, x_1x_2, x_1x_3, x_1x_4, x_2x_3, x_2x_4, x_3x_4$ has $R^2$ = .98 and a LOOCV RMSPE .8. The (quadratic) model fit by least squares predicts (an optimal) yield for x1 = , x2 = ., x3 = 40, and x4 = 8.. This set of processing conditions has $\hat{y}$ = 9. and $se_{\hat{y}}$ = .

pts g) Based on the information above and the R printout, fill out the ANOVA table for computing the overall F for the quadratic model and provide $s_{SF}$ for the quadratic model.

ANOVA Table
Source       SS    df    MS    F
Regression
Error
Total

$s_{SF}$ =

h) What do comparisons between s = .03 (from the bottom of Page 2), $s_{SF}$ from above, and the LOOCV RMSPE of .8 suggest about this situation?

s versus $s_{SF}$:

$s_{SF}$ versus LOOCV RMSPE:
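The ANOVA table in part g) can be assembled from the total sum of squares (the sum of the "Sum Sq" column in any anova() printout for the same response) and the R-squared of the model of interest. The sketch below shows the arithmetic; the sample size, R-squared, and total sum of squares are hypothetical values chosen only for illustration.

```r
# Building an overall-F ANOVA table from R-squared and the total sum of squares.
# ASSUMPTIONS: n, r_squared, and ss_total are hypothetical.
n         <- 30      # number of process runs (assumed)
k         <- 14      # number of predictors in the quadratic model
r_squared <- 0.90    # R-squared of the model (assumed)
ss_total  <- 2000    # total sum of squares (assumed)

ss_reg <- r_squared * ss_total      # SSR = R^2 * SSTot
ss_err <- ss_total - ss_reg         # SSE = (1 - R^2) * SSTot
df_reg <- k
df_err <- n - k - 1
ms_reg <- ss_reg / df_reg
ms_err <- ss_err / df_err
f_stat <- ms_reg / ms_err           # overall F statistic
s_sf   <- sqrt(ms_err)              # surface-fitting estimate of sigma

anova_table <- data.frame(
  Source = c("Regression", "Error", "Total"),
  SS     = c(ss_reg, ss_err, ss_total),
  df     = c(df_reg, df_err, n - 1),
  MS     = c(ms_reg, ms_err, NA),
  F      = c(f_stat, NA, NA)
)
anova_table
```

The total sum of squares does not depend on which predictors are in the model, so it can be carried over from the linear fit to the quadratic one.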

Page 5

i) Based on the (quadratic) MLR model, give 95% prediction limits for the next yield at process conditions x1 = , x2 = ., x3 = 40, and x4 = 8.. (Plug in completely, but do not simplify.)

j) Before recommending adoption of process conditions from part i), what steps would you take, and why?

3. The "Appendicitis data set" on the KEEL website provides measured values of 7 medical variables for N = 106 patients and values of a 0-1 variable indicating whether the patient had an appendicitis. Beginning on Page 9 there is R code and output for a logistic regression analysis of these data.

a) Which of the medical variables seems least helpful in predicting whether or not a patient has an appendicitis? Why?

Using bestglm() in the bestglm package and cross validation (presumably on the log likelihood criterion) it is possible to identify a good "reduced" logistic regression model as one with the two predictor variables At3 and At4. There is some R code and output for this model included.

b) Give two-sided 95% confidence limits for the log odds ratio of the probability that a patient with At3 = .0 and At4 = 0 has an appendicitis. (Plug in completely, but do not simplify.)
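Two interval formulas are needed on this page: prediction limits for a new response in MLR, and confidence limits for a log odds in logistic regression. The sketch below writes each as a small helper. The function names and every numeric input are hypothetical, and the log-odds version assumes the fitted value and standard error are on the link (logit) scale.

```r
# Interval helpers (hypothetical names) for the two kinds of limits.

# Prediction limits: y_hat +/- t * sqrt(s_SF^2 + se_fit^2), t on error df.
prediction_limits <- function(y_hat, se_fit, s_sf, df_error, level = 0.95) {
  t_mult <- qt(1 - (1 - level) / 2, df = df_error)
  half_width <- t_mult * sqrt(s_sf^2 + se_fit^2)
  c(lower = y_hat - half_width, upper = y_hat + half_width)
}

# Large-sample limits for a log odds: eta_hat +/- z * se(eta_hat).
log_odds_limits <- function(eta_hat, se_eta, level = 0.95) {
  z_mult <- qnorm(1 - (1 - level) / 2)
  c(lower = eta_hat - z_mult * se_eta, upper = eta_hat + z_mult * se_eta)
}

# ASSUMPTION: all inputs below are made-up numbers for illustration.
pl <- prediction_limits(y_hat = 250, se_fit = 3, s_sf = 4, df_error = 15)
lo <- log_odds_limits(eta_hat = -1.2, se_eta = 0.4)

pl
lo
plogis(lo)   # the same limits mapped to the probability scale
```

The prediction limits are wider than confidence limits for the mean response because they add the variance of a single new observation to the variance of the fitted value.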

Page 6

4. Beginning on Page 10 there is R code and output for a factorial analysis of some experimental data on the charge lives of batteries made of 3 materials at 3 different temperatures taken from an experimental design book of Montgomery. (Though the temperature is clearly quantitative, here treat both factors as qualitative.) The data for this study comprise a balanced 3 × 3 factorial data set.

a) Are there statistically detectable interactions between Material and Temperature? Explain.

YES or NO (circle one)

b) Give 95% two-sided confidence limits for the difference between the Material and Material main effects. (Plug in completely, but you need not simplify.)

c) Find the fitted/predicted value of battery life for a battery of Material under Temperature for a "main effects only" model of life. (If this is not possible based on the given information, say why.)

5. There is a famous "Boston Housing" data set on the UCI ML Data Repository. It concerns the median home price in counties around Boston in the late 1970's as predicted by 13 measures of community composition. This question concerns use of data from 4 counties with complete records and prediction of "MEDV". There is R code and output provided (in pretty much the same format as for Lab #) beginning on Page 11. (As a baseline, MLR of MEDV on 13 predictors produces $s_{SF}$ = 4.3 and $R^2$ = .404.) Use the output to answer the following questions.

a) Which of the predictors of MEDV do you like the best, and why?
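For the "main effects only" fitted value in the battery problem, a useful fact about balanced two-way data is that the additive-model fit equals (row mean) + (column mean) - (grand mean). The sketch below checks this numerically. The data are simulated, so the factor levels, replicate count, and responses are assumptions and have nothing to do with the actual battery lives.

```r
# Balanced two-way layout: additive-model fitted values from marginal means.
# ASSUMPTION: the data are simulated; 4 replicates per cell is an assumed count.
set.seed(1)
d <- expand.grid(rep = 1:4,
                 Material = factor(1:3),
                 Temp = factor(1:3))
d$Life <- rnorm(nrow(d), mean = 100, sd = 20)    # fabricated responses

# Fitted value for Material 1 at Temp 2 from the no-interaction model.
fit <- lm(Life ~ Material + Temp, data = d)
new_cell <- data.frame(Material = factor(1, levels = 1:3),
                       Temp = factor(2, levels = 1:3))
pred_lm <- unname(predict(fit, newdata = new_cell))

# The same value from marginal means (valid because the design is balanced).
pred_means <- mean(d$Life[d$Material == "1"]) +
  mean(d$Life[d$Temp == "2"]) -
  mean(d$Life)

c(from_lm = pred_lm, from_means = pred_means)
```

With a printout of the interaction model only, the marginal means have to be recovered from the fitted cell means, which is possible in a balanced design because each marginal mean is a simple average of cell means.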

Page 7

b) What (if anything) convinces you that you can do better here than MLR for prediction purposes?

c) Why/how is it obvious that the ordinary MLR (OLS) predictor differs very little from the elastic net predictor in this case? What about the particular elastic net fit (chosen by repeated cross-validation) makes this similarity unsurprising?

c) What is the origin of the "vertical stripes" appearance of the plots in the "Tree" column of the matrix of scatterplots of predicted values?

d) Below is a schematic of the tree predictor (plotting is "condition TRUE to the LEFT"). Give a simple description of the conditions (values of the predictors) producing the largest predicted MEDV.

e) If you were going to "stack" two of the predictors here, which two would you consider and why?

Page 8

R Code and Output for Bioethanol Analyses

Biofuels
Period InitialpH Temp Conc Yield

summary(lm(Yield~.,data=Biofuels))

Call:
lm(formula = Yield ~ ., data = Biofuels)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)
Period
InitialpH
Temp
Conc

Residual standard error: 8.8 on degrees of freedom
Multiple R-squared: 0.009, Adjusted R-squared: -0.3
F-statistic: on 4 and DF, p-value: 0.99

anova(lm(Yield~.,data=Biofuels))

Analysis of Variance Table

Response: Yield
          Df Sum Sq Mean Sq F value Pr(>F)
Period
InitialpH
Temp
Conc
Residuals

Page 9

R Code and Output for the Appendicitis Data Analyses

Appendicitis[:0,]
At1 At2 At3 At4 At5 At6 At7 Class

summary(glm(Class~.,data=Appendicitis))

Call:
glm(formula = Class ~ ., data = Appendicitis)

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) At e-08 *** ** At At3 At At At At
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.038)

Null deviance: .840 on 0 degrees of freedom
Residual deviance: 0.09 on 98 degrees of freedom
AIC: 3.8

Number of Fisher Scoring iterations:

summary(glm(Class~At3+At4,data=Appendicitis))

Call:
glm(formula = Class ~ At3 + At4, data = Appendicitis)

Deviance Residuals:
Min 1Q Median 3Q Max

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) e- ***
At3 e-0 ***
At4 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.883)

Null deviance: .84 on 0 degrees of freedom
Residual deviance: .4 on 03 degrees of freedom
AIC: 9.98

Number of Fisher Scoring iterations:

Page 10

Predlogit<-predict(glm(Class~At3+At4,data=Appendicitis),se.fit=TRUE)$fit
SEPredlogit<-predict(glm(Class~At3+At4,data=Appendicitis),
  se.fit=TRUE)$se.fit
cbind(Appendicitis$At3,Appendicitis$At4,Appendicitis$Class,
  round(Predlogit,3),round(SEPredlogit,3))[:0,]
[,1] [,2] [,3] [,4] [,5]

R Code and Output for Battery Life Study

Batteries
Life MaterialA TempB

Page 11

summary(lm(Life~MaterialA*TempB,data=Batteries))

Call:
lm(formula = Life ~ MaterialA * TempB, data = Batteries)

Residuals:
Min 1Q Median 3Q Max

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept) < e- ***
MaterialA **
MaterialA
TempB e-0 ***
TempB
MaterialA:TempB
MaterialA:TempB
MaterialA:TempB **
MaterialA:TempB
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: .98 on degrees of freedom
Multiple R-squared: 0., Adjusted R-squared: 0.9
F-statistic: on 8 and DF, p-value: 9.4e-0

anova(lm(Life~MaterialA*TempB,data=Batteries))

Analysis of Variance Table

Response: Life
                Df Sum Sq Mean Sq F value Pr(>F)
MaterialA **
TempB e-0 ***
MaterialA:TempB *
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R Code and Output for Boston Housing Data Analysis

Boston[:,]
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSAT MEDV

summary(Boston)

CRIM ZN INDUS CHAS NOX
Min. :0.003 Min. : 0.00 Min. : 0.4 Min. : Min. :0.380
1st Qu.: 1st Qu.: 0.00 1st Qu.: 4.93 1st Qu.: 1st Qu.:0.440
Median :0.903 Median : 0.00 Median : 8.4 Median : Median :0.90
Mean :.4083 Mean :. Mean :0. Mean :0.043 Mean :
3rd Qu.:.4 3rd Qu.: 3rd Qu.:8.0 3rd Qu.: 3rd Qu.:0.00
Max. :9.94 Max. :00.00 Max. :.4 Max. : Max. :0.80

RM AGE DIS RAD TAX
Min. :3. Min. :.90 Min. :. Min. :.000 Min. :8.0
1st Qu.:.9 1st Qu.: 40.9 1st Qu.:.3 1st Qu.: 1st Qu.:.8
Median :.9 Median :.80 Median : 3.0 Median :.000 Median :.0
Mean :.344 Mean :. Mean : Mean :.83 Mean :3.4
3rd Qu.:.3 3rd Qu.: 9. 3rd Qu.:.40 3rd Qu.:.000 3rd Qu.:4.0
Max. :8.80 Max. :00.00 Max. :. Max. :4.000 Max. :.0

PTRATIO B LSAT MEDV
Min. :.0 Min. : 0.3 Min. :. Min. :.
1st Qu.:.80 1st Qu.:3. 1st Qu.:.88 1st Qu.:8.0
Median :8.0 Median :39.08 Median :0.0 Median :.9
Mean :8. Mean :39.83 Mean :.44 Mean :3.
3rd Qu.:0.0 3rd Qu.:39. 3rd Qu.:.0 3rd Qu.:.0
Max. :.00 Max. :39.90 Max. :34.40 Max. :0.00

Page 12

#k= for knn prediction is chosen by repeated CV
sqrt(knn.reg(train=Boston,y=Boston[,14],k=)$PRESS/4)
[1] 4.3
knnpred<-knn.reg(train=Boston,y=Boston[,14],k=)$pred
cbind(Boston$MEDV[1:10],knnpred[1:10])
[,1] [,2]
[1,] 4.0 .
[2,]
[3,]
[4,]
[5,]
[6,] 8. .4
[7,] .9 .
[8,] . 8.3
[9,] . .0
[10,]

#alpha=.00 and lambda=.3 for the elastic net are chosen by repeated CV
#producing CV RMSPE
x<-as.matrix(Boston[,1:13])
y<-as.matrix(Boston[,14])
BostonNet<-glmnet(x,y,family="gaussian",alpha=.00,lambda=.3)
ENetPred<-predict(BostonNet,newx=x)
cbind(Boston$MEDV[1:10],ENetPred[1:10])
[,1] [,2]
[1,]
[2,]
[3,]
[4,]
[5,]
[6,]
[7,] .9 .49
[8,]
[9,]
[10,]

#cp=.003 is a good choice of regression tree complexity parameter
#chosen by CV and producing RMSPE
BestTree<-rpart(MEDV~.,data=Boston,method="anova",
  control=rpart.control(cp=.003))
cbind(Boston$MEDV[1:10],predict(BestTree)[1:10])
[,1] [,2]

#mtry= is a good choice for random forest parameter, chosen by CV on OOB error
BostonRf<-randomForest(MEDV~.,data=Boston,
  type="regression",ntree=000,mtry=)
sqrt(BostonRf$mse[000])
[1] 3.009

Page 13

comppred<-cbind(y,lm(MEDV~.,data=Boston)$fitted.values,knnpred,ENetPred,
  predict(BestTree),BostonRf$predicted)
colnames(comppred)<-c("MEDV","OLS","NN","ENET","Tree","RF")
pairs(comppred,panel=function(x,y,...){
  points(x,y)
  abline(0,1)},xlim=c(0,0),ylim=c(0,0))
round(cor(as.matrix(comppred)),)
     MEDV OLS NN ENET Tree RF
MEDV
OLS
NN
ENET
Tree
RF

Stat 401B Final Exam Fall 2015

Stat 401B Final Exam Fall 2015 Stat 401B Final Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

Stat 401XV Final Exam Spring 2017

Stat 401XV Final Exam Spring 2017 Stat 40XV Final Exam Spring 07 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

Stat 401B Exam 3 Fall 2016 (Corrected Version)

Stat 401B Exam 3 Fall 2016 (Corrected Version) Stat 401B Exam 3 Fall 2016 (Corrected Version) I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied

More information

Stat 401B Exam 2 Fall 2015

Stat 401B Exam 2 Fall 2015 Stat 401B Exam Fall 015 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning

More information

Stat 401B Exam 2 Fall 2016

Stat 401B Exam 2 Fall 2016 Stat 40B Eam Fall 06 I have neither given nor received unauthorized assistance on this eam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will

More information

Stat 401B Exam 2 Fall 2017

Stat 401B Exam 2 Fall 2017 Stat 0B Exam Fall 07 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed ATTENTION! Incorrect numerical answers unaccompanied by supporting reasoning will

More information

Multiple Regression Part I STAT315, 19-20/3/2014

Multiple Regression Part I STAT315, 19-20/3/2014 Multiple Regression Part I STAT315, 19-20/3/2014 Regression problem Predictors/independent variables/features Or: Error which can never be eliminated. Our task is to estimate the regression function f.

More information

Stat 5102 Final Exam May 14, 2015

Stat 5102 Final Exam May 14, 2015 Stat 5102 Final Exam May 14, 2015 Name Student ID The exam is closed book and closed notes. You may use three 8 1 11 2 sheets of paper with formulas, etc. You may also use the handouts on brand name distributions

More information

ST430 Exam 2 Solutions

ST430 Exam 2 Solutions ST430 Exam 2 Solutions Date: November 9, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textbook are permitted but you may use a calculator. Giving

More information

R Output for Linear Models using functions lm(), gls() & glm()

R Output for Linear Models using functions lm(), gls() & glm() LM 04 lm(), gls() &glm() 1 R Output for Linear Models using functions lm(), gls() & glm() Different kinds of output related to linear models can be obtained in R using function lm() {stats} in the base

More information

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box.

(ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. FINAL EXAM ** Two different ways to submit your answer sheet (i) Use MS-Word and place it in a drop-box. (ii) Scan your answer sheets INTO ONE FILE only, and submit it in the drop-box. Deadline: December

More information

STK 2100 Oblig 1. Zhou Siyu. February 15, 2017

STK 2100 Oblig 1. Zhou Siyu. February 15, 2017 STK 200 Oblig Zhou Siyu February 5, 207 Question a) Make a scatter box plot for the data set. Answer:Here is the code I used to plot the scatter box in R. library ( MASS ) 2 pairs ( Boston ) Figure : Scatter

More information

HW1 Roshena MacPherson Feb 1, 2017

HW1 Roshena MacPherson Feb 1, 2017 HW1 Roshena MacPherson Feb 1, 2017 This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code. Question 1: In this question we will consider some real

More information

STAT 510 Final Exam Spring 2015

STAT 510 Final Exam Spring 2015 STAT 510 Final Exam Spring 2015 Instructions: The is a closed-notes, closed-book exam No calculator or electronic device of any kind may be used Use nothing but a pen or pencil Please write your name and

More information

Logistic Regression - problem 6.14

Logistic Regression - problem 6.14 Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Week 7 Multiple factors. Ch , Some miscellaneous parts

Week 7 Multiple factors. Ch , Some miscellaneous parts Week 7 Multiple factors Ch. 18-19, Some miscellaneous parts Multiple Factors Most experiments will involve multiple factors, some of which will be nuisance variables Dealing with these factors requires

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

Logistic Regression 21/05

Logistic Regression 21/05 Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Stat 602 Exam 1 Spring 2017 (corrected version)

Stat 602 Exam 1 Spring 2017 (corrected version) Stat 602 Exam Spring 207 (corrected version) I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed This is a very long Exam. You surely won't be able to

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

ST430 Exam 1 with Answers

ST430 Exam 1 with Answers ST430 Exam 1 with Answers Date: October 5, 2015 Name: Guideline: You may use one-page (front and back of a standard A4 paper) of notes. No laptop or textook are permitted but you may use a calculator.

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Stat 231 Final Exam Fall 2011

Stat 231 Final Exam Fall 2011 Stat 3 Final Exam Fall 0 I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed . An experiment was run to compare the fracture toughness of high purity 8%

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

CAS MA575 Linear Models

CAS MA575 Linear Models CAS MA575 Linear Models Boston University, Fall 2013 Midterm Exam (Correction) Instructor: Cedric Ginestet Date: 22 Oct 2013. Maximal Score: 200pts. Please Note: You will only be graded on work and answers

More information

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/ Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Exercise 5.4 Solution

Exercise 5.4 Solution Exercise 5.4 Solution Niels Richard Hansen University of Copenhagen May 7, 2010 1 5.4(a) > leukemia

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Biostatistics 380 Multiple Regression 1. Multiple Regression

Biostatistics 380 Multiple Regression 1. Multiple Regression Biostatistics 0 Multiple Regression ORIGIN 0 Multiple Regression Multiple Regression is an extension of the technique of linear regression to describe the relationship between a single dependent (response)

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

DISCRIMINANT ANALYSIS: LDA AND QDA

DISCRIMINANT ANALYSIS: LDA AND QDA Stat 427/627 Statistical Machine Learning (Baron) HOMEWORK 6, Solutions DISCRIMINANT ANALYSIS: LDA AND QDA. Chap 4, exercise 5. (a) On a training set, LDA and QDA are both expected to perform well. LDA

More information

Swarthmore Honors Exam 2012: Statistics

Swarthmore Honors Exam 2012: Statistics Swarthmore Honors Exam 2012: Statistics 1 Swarthmore Honors Exam 2012: Statistics John W. Emerson, Yale University NAME: Instructions: This is a closed-book three-hour exam having six questions. You may

More information

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph.

Regression, Part I. - In correlation, it would be irrelevant if we changed the axes on our graph. Regression, Part I I. Difference from correlation. II. Basic idea: A) Correlation describes the relationship between two variables, where neither is independent or a predictor. - In correlation, it would

More information

Data Mining Techniques. Lecture 2: Regression

Data Mining Techniques. Lecture 2: Regression Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 2: Regression Jan-Willem van de Meent (credit: Yijun Zhao, Marc Toussaint, Bishop) Administrativa Instructor Jan-Willem van de Meent Email:

More information

Unit 6 - Introduction to linear regression

Unit 6 - Introduction to linear regression Unit 6 - Introduction to linear regression Suggested reading: OpenIntro Statistics, Chapter 7 Suggested exercises: Part 1 - Relationship between two numerical variables: 7.7, 7.9, 7.11, 7.13, 7.15, 7.25,

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics January, 2018 Work all problems. 60 points needed to pass at the Masters level, 75 to pass at the PhD

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

Stat 5303 (Oehlert): Randomized Complete Blocks 1

Stat 5303 (Oehlert): Randomized Complete Blocks 1 Stat 5303 (Oehlert): Randomized Complete Blocks 1 > library(stat5303libs);library(cfcdae);library(lme4) > immer Loc Var Y1 Y2 1 UF M 81.0 80.7 2 UF S 105.4 82.3 3 UF V 119.7 80.4 4 UF T 109.7 87.2 5 UF

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Midterm Exam (Solutions) Duration: 1 hours Instructions: STATS216v Introduction to Statistical Learning Stanford University, Summer 2017 Remember the university honor code. Midterm Exam (Solutions) Duration: 1 hours Write your name and SUNet ID

More information

Inference for Regression

Inference for Regression Inference for Regression Section 9.4 Cathy Poliak, Ph.D. cathy@math.uh.edu Office in Fleming 11c Department of Mathematics University of Houston Lecture 13b - 3339 Cathy Poliak, Ph.D. cathy@math.uh.edu

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

STAT 572 Assignment 5 - Answers Due: March 2, 2007

STAT 572 Assignment 5 - Answers Due: March 2, 2007 1. The file glue.txt contains a data set with the results of an experiment on the dry sheer strength (in pounds per square inch) of birch plywood, bonded with 5 different resin glues A, B, C, D, and E.

More information

On the Inference of the Logistic Regression Model

On the Inference of the Logistic Regression Model On the Inference of the Logistic Regression Model 1. Model ln =(; ), i.e. = representing false. The linear form of (;) is entertained, i.e. ((;)) ((;)), where ==1 ;, with 1 representing true, 0 ;= 1+ +

More information

Reaction Days

Reaction Days Stat April 03 Week Fitting Individual Trajectories # Straight-line, constant rate of change fit > sdat = subset(sleepstudy, Subject == "37") > sdat Reaction Days Subject > lm.sdat = lm(reaction ~ Days)

More information

Design & Analysis of Experiments 7E 2009 Montgomery

Design & Analysis of Experiments 7E 2009 Montgomery Chapter 5 1 Introduction to Factorial Design Study the effects of 2 or more factors All possible combinations of factor levels are investigated For example, if there are a levels of factor A and b levels

More information

GRAD6/8104; INES 8090 Spatial Statistic Spring 2017

GRAD6/8104; INES 8090 Spatial Statistic Spring 2017 Lab #5 Spatial Regression (Due Date: 04/29/2017) PURPOSES 1. Learn to conduct alternative linear regression modeling on spatial data 2. Learn to diagnose and take into account spatial autocorrelation in

More information

IE 361 EXAM #3 FALL 2013 Show your work: Partial credit can only be given for incorrect answers if there is enough information to clearly see what you were trying to do. There are two additional blank

More information

R Hints for Chapter 10

R Hints for Chapter 10 R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

MODELS WITHOUT AN INTERCEPT

MODELS WITHOUT AN INTERCEPT Consider the balanced two factor design MODELS WITHOUT AN INTERCEPT Factor A 3 levels, indexed j 0, 1, 2; Factor B 5 levels, indexed l 0, 1, 2, 3, 4; n jl 4 replicate observations for each factor level

More information

Modeling Overdispersion

Modeling Overdispersion James H. Steiger Department of Psychology and Human Development Vanderbilt University Regression Modeling, 2009 1 Introduction 2 Introduction In this lecture we discuss the problem of overdispersion in

More information

Introduction to the Generalized Linear Model: Logistic regression and Poisson regression

Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013

ANOVA, ANCOVA and MANOVA as sem

Robin Beaumont 2017 Hoyle Chapter 24 Handbook of Structural Equation Modeling (2015 paperback), Examples converted to R and Onyx SEM diagrams. This workbook duplicates some

Stat 412/512 TWO WAY ANOVA. Charlotte Wickham. stat512.cwick.co.nz. Feb

Stat 412/512 TWO WAY ANOVA Feb 6 25 Charlotte Wickham stat512.cwick.co.nz Roadmap DONE: Understand what a multiple regression model is. Know how to do inference on single and multiple parameters. Some extra

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

Autumn Semester 2017-18, 2 hours. Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

UNIVERSITY OF TORONTO SCARBOROUGH Department of Computer and Mathematical Sciences Midterm Test, October 2013

STAC67H3 Regression Analysis Duration: One hour and fifty minutes Last Name: First Name: Student

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423

Comparing Nested Models

ST 370 Two regression models are called nested if one contains all the predictors of the other, and some additional predictors. For example, the first-order model in two independent

Regression Methods for Survey Data

Professor Ron Fricker, Naval Postgraduate School, Monterey, California. 3/26/13. Reading: Lohr chapter 11. Goals for this Lecture: Linear regression; Review of linear

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

Statistical Prediction

P.R. Hahn Fall 2017 1 Some terminology The goal is to use data to find a pattern that we can exploit. y: response/outcome/dependent/left-hand-side x: predictor/covariate/feature/independent

SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester

RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: "Statistics Tables" by H.R. Neave PAS 371 SCHOOL OF MATHEMATICS AND STATISTICS Autumn Semester 2008-9 Linear

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

List of Figures: 1. NBA attendance data 2. Regression model for NBA attendances

Booklet of Code and Output for STAD29/STA 1007 Midterm Exam

List of Figures: 1. Packages 2. Hospital infection risk data (some)

Statistics 203 Introduction to Regression Models and ANOVA Practice Exam

Prof. J. Taylor You may use your 4 single-sided pages of notes This exam is 7 pages long. There are 4 questions, first 3 worth 10

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Regression models Generalized linear models in R Dr Peter K Dunn http://www.usq.edu.au Department of Mathematics and Computing University of Southern Queensland ASC, July 00 The usual linear regression

STAT 350: Summer Semester Midterm 1: Solutions

Name: Student Number: STAT 350: Summer Semester 2008 Midterm 1: Solutions 9 June 2008 Instructor: Richard Lockhart Instructions: This is an open book test. You may use notes, text, other books and a calculator.

Lecture 10. Factorial experiments (2-way ANOVA etc)

Jesper Rydén Matematiska institutionen, Uppsala universitet jesper@math.uu.se Regression and Analysis of Variance autumn 2014 A factorial experiment

Unit 6 - Simple linear regression

Sta 101: Data Analysis and Statistical Inference Dr. Çetinkaya-Rundel LO 1. Define the explanatory variable as the independent variable (predictor), and the response variable

Interactions in Logistic Regression

> # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/

Consider fitting a model using ordinary least squares (OLS) regression:

Example 1: Mating Success of African Elephants In this study, 41 male African elephants were followed over a period of 8 years. The age of the elephant at the beginning of the study and the number of successful

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM

STAT 301, Fall 2011 Name Lec 4: Ismor Fischer Discussion Section: Please circle one! TA: Sheng Zhgang... 341 (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan... 345 (W 1:20) / 346 (Th

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

(Semester II: 2010-2011) April/May, 2011 Time Allowed: 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

Stat 231 Exam 2 Fall 2013

I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed 1. Some IE 361 students worked with a manufacturer on quantifying the capability

STA 101 Final Review

Statistics 101 Thomas Leininger June 24, 2013 Announcements All work (besides projects) should be returned to you and should be entered on Sakai. Office Hour: 2-3pm today (Old Chem

Module 4: Regression Methods: Concepts and Applications

Example Analysis Code Rebecca Hubbard, Mary Lou Thompson July 11-13, 2018 Install R Go to http://cran.rstudio.com/ Click

cor(dataset$measurement1, dataset$measurement2, method="pearson") cor.test(datavector1, datavector2, method="pearson")

Tutorial 7: Correlation and Regression Correlation Used to test whether two variables are linearly associated. A correlation coefficient (r) indicates the strength and direction of the association. A correlation

Regression and the 2-Sample t

James H. Steiger Department of Psychology and Human Development Vanderbilt University

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women's Role in Society, and Colonic Polyps

STAT420 Midterm Exam. University of Illinois Urbana-Champaign October 19 (Friday), 2018 3:00-4:15p. SOLUTIONS (Yellow)

STAT420 Midterm Exam University of Illinois Urbana-Champaign October 19 (Friday), 2018 3:00-4:15p SOLUTIONS (Yellow) Question 1 (15 points) 2 (10 points) 3 (50 points) extra (2 points) Total (77 points) Points

IE 316 Exam 1 Fall 2011

I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed 1. Suppose the actual diameters x in a batch of steel cylinders are normally

IE 361 Exam 3 Fall I have neither given nor received unauthorized assistance on this exam.

IE 361 Exam 3 Fall 2012 Name Date 1. I wish to measure the density of a small rock. My method is to read the volume of water in

STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 6:00 PM

STAT212_E3 KING FAHD UNIVERSITY OF PETROLEUM & MINERALS DEPARTMENT OF MATHEMATICS & STATISTICS Term 171 STAT 212: BUSINESS STATISTICS II Third Exam Tuesday Dec 12, 2017 @ 6:00 PM Name: ID #:

Multiple Regression: Example

Cobb-Douglas Production Function The Cobb-Douglas production function for observed economic data i = 1, ..., n may be expressed as where O_i is output, l_i is labour input, c

Sample solutions. Stat 8051 Homework 8

Problem 1: Faraway Exercise 3.1 A plot of the time series reveals kind of a fluctuating pattern: Trying to fit Poisson regression models yields a quadratic model if

Simple Linear Regression

MATH 282A Introduction to Computational Statistics University of California, San Diego Instructor: Ery Arias-Castro http://math.ucsd.edu/ eariasca/math282a.html

Lab 3 A Quick Introduction to Multiple Linear Regression Psychology The Multiple Linear Regression Model

Psychology 310 Instructions. Work through the lab, saving the output as you go. You will be submitting your assignment as an R Markdown document.

Dealing with Heteroskedasticity

James H. Steiger Department of Psychology and Human Development Vanderbilt University

Checking the Poisson assumption in the Poisson generalized linear model

The Poisson regression model is a generalized linear model (glm) satisfying the following assumptions: The responses y_i are independent

STATISTICS 110/201 PRACTICE FINAL EXAM

Questions 1 to 5: There is a downloadable Stata package that produces sequential sums of squares for regression. In other words, the SS is built up as each variable

PAPER 206 APPLIED STATISTICS

MATHEMATICAL TRIPOS Part III Thursday, 1 June, 2017 9:00 am to 12:00 pm Attempt no more than FOUR questions. There are SIX questions in total. The questions carry equal weight.