
STAT 509 2016 Fall HW8-5 Solution
Instructor: Shiwen Shen
Collection Day: November 9

1. There is a gala dataset in the faraway package. It concerns the number of species of tortoise on the various Galapagos Islands. There are 30 cases (islands) and 7 variables in the dataset, including:

Species    The number of species of tortoise found on the island
Endemics   The number of endemic species
Elevation  The highest elevation of the island (m)
Nearest    The distance from the nearest island (km)
Scruz      The distance from Santa Cruz island (km)
Adjacent   The area of the adjacent island (km^2)

Fit a simple linear regression model with Species as the response and Elevation as the explanatory variable. Show me the output.

> library(faraway)  # provides the gala dataset
> fit <- lm(Species ~ Elevation, data=gala)
> summary(fit)

Call:
lm(formula = Species ~ Elevation, data = gala)

Residuals:
     Min       1Q   Median       3Q      Max
-218.319  -30.721  -14.690    4.634  259.180

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.33511   19.20529   0.590     0.56
Elevation    0.20079    0.03465   5.795 3.18e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 78.66 on 28 degrees of freedom
Multiple R-squared:  0.5454,    Adjusted R-squared:  0.5291
F-statistic: 33.59 on 1 and 28 DF,  p-value: 3.177e-06

(a) Calculate Ŷ (a vector) and Ȳ (a number).

> yhat <- predict(fit)
> yhat
      Baltra    Bartolome     Caldwell     Champion      Coamano Daphne.Major
    80.80921     33.22146     34.22542     20.57155     26.79611     35.22938
Daphne.Minor       Darwin         Eden      Enderby     Espanola   Fernandina
    30.00879     45.06820     25.59136     33.82384     51.09197    311.31865
    Gardner1     Gardner2     Genovesa      Isabela     Marchena       Onslow
    21.17393     56.91494     26.59532    354.08739     80.20684     16.35492
       Pinta       Pinzon   Las.Plazas       Rabida SanCristobal  SanSalvador
   167.35065    103.29794     30.20958     85.02585    155.10232    193.25284
   SantaCruz      SantaFe   SantaMaria      Seymour      Tortuga         Wolf
   184.81957     63.34029    139.84212     40.85157     48.68246     62.13554
> ybar <- mean(gala$Species)
> ybar
[1] 85.23333

(b) Calculate SSTO, SSE, and SSR.

> SSTO <- sum((gala$Species - ybar)^2)
> SSTO
[1] 381081.4
> SSE <- sum((gala$Species - yhat)^2)
> SSE
[1] 173253.9
> SSR <- sum((yhat - ybar)^2)
> SSR
[1] 207827.5

(c) Draw the scatter plot (with the regression line) and residual plot. Do you think the equal variance assumption holds?

Both the scatter plot and the residual plot indicate that the variance of the error term ε increases as the fitted value (Ŷ) increases, so the equal variance assumption is clearly violated.

plot(gala$Elevation, gala$Species, xlab="Elevation", ylab="Species", pch=16)
abline(fit)
plot(yhat, residuals(fit), xlab="Fitted Value", ylab="Residuals")
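As a quick consistency check on part (b), the three sums of squares should satisfy SSTO = SSE + SSR, and the statistics printed by summary(fit) can be rebuilt from them. The sketch below uses only the rounded values reported above and does not refit the model, so agreement is expected only up to rounding. The variable names (decomp_gap, r_squared, rse, f_stat) are illustrative and not part of the original solution.

```r
# Values copied from the output of part (b); they are rounded, so checks are approximate.
SSTO <- 381081.4
SSE  <- 173253.9
SSR  <- 207827.5
n    <- 30            # number of islands
df_resid <- n - 2     # residual degrees of freedom in simple linear regression

# Decomposition of the total sum of squares: SSTO = SSE + SSR
decomp_gap <- SSTO - (SSE + SSR)

# Quantities reported by summary(fit), rebuilt from the sums of squares
r_squared <- SSR / SSTO                     # Multiple R-squared
rse       <- sqrt(SSE / df_resid)           # Residual standard error
f_stat    <- (SSR / 1) / (SSE / df_resid)   # F-statistic on 1 and 28 DF

print(c(decomp_gap = decomp_gap, r_squared = r_squared, rse = rse, f_stat = f_stat))
```

The rebuilt values can be compared with the 0.5454, 78.66, and 33.59 shown in the summary output.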

(d) Use a Q-Q plot to check whether the normality assumption holds.

The tails of the Q-Q plot clearly fail the fat-pencil test, so we suspect the normality assumption does not hold well here.

qqnorm(residuals(fit))
qqline(residuals(fit))

(e) Re-fit the model with the transformation log Y, and draw the scatter plot, residual plot, and Q-Q plot. Comment on each plot. Does the transformation make your model better?

After the transformation, the scatter plot looks better: the points are no longer concentrated in one corner (the log transformation shrinks the extremely large values of Species), and the spread is more nearly constant. The residual plot confirms this observation. Overall, the model is better than the original one.

> fit2 <- lm(log(Species) ~ Elevation, data=gala)
> summary(fit2)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.5913986  0.2879198   9.000 9.33e-10 ***
Elevation   0.0024895  0.0005194   4.793 4.88e-05 ***

Residual standard error: 1.179 on 28 degrees of freedom
Multiple R-squared:  0.4507,    Adjusted R-squared:  0.4311
F-statistic: 22.97 on 1 and 28 DF,  p-value: 4.885e-05

plot(gala$Elevation, log(gala$Species), xlab="Elevation", ylab="log-species", pch=16)
abline(fit2)
plot(predict(fit2), residuals(fit2), xlab="fit 2 Fitted Value", ylab="Residuals")

(f) Re-fit the model with the transformation √Y, and draw the scatter plot, residual plot, and Q-Q plot. Comment on each plot. Does the transformation make your model better?

The megaphone shape of the variance is still visible in the scatter plot and the residual plot: the variance grows as the fitted value grows. The square root transformation doesn't make the model any better.

> fit3 <- lm(sqrt(Species) ~ Elevation, data=gala)
> summary(fit3)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.954656   0.874615   4.522 0.000102 ***
Elevation   0.009753   0.001578   6.181 1.13e-06 ***

Residual standard error: 3.582 on 28 degrees of freedom
Multiple R-squared:  0.5771,    Adjusted R-squared:  0.562
F-statistic: 38.21 on 1 and 28 DF,  p-value: 1.125e-06
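The coefficients of the two transformed models in parts (e) and (f) are on the log and square-root scales, so they cannot be read directly as changes in Species. The sketch below shows one way to interpret them using only the coefficients printed in the three summaries. The elevation of 500 m is a hypothetical value chosen for illustration, the back-transformation is a plain inversion with no bias correction, and the variable names are not from the original solution.

```r
# Coefficients copied from summary(fit), summary(fit2), and summary(fit3).
b_lin  <- c(intercept = 11.33511,  slope = 0.20079)     # Species ~ Elevation
b_log  <- c(intercept = 2.5913986, slope = 0.0024895)   # log(Species) ~ Elevation
b_sqrt <- c(intercept = 3.954656,  slope = 0.009753)    # sqrt(Species) ~ Elevation

# In the log model, a 100 m increase in elevation multiplies the
# back-transformed fitted value by exp(100 * slope).
ratio_per_100m <- exp(100 * b_log[["slope"]])

# Naive back-transformed fitted values at a hypothetical elevation of 500 m
# (simple inversion of each transformation; no bias correction applied).
elev <- 500
pred_lin  <- b_lin[["intercept"]] + b_lin[["slope"]] * elev
pred_log  <- exp(b_log[["intercept"]] + b_log[["slope"]] * elev)
pred_sqrt <- (b_sqrt[["intercept"]] + b_sqrt[["slope"]] * elev)^2

print(round(c(ratio_per_100m = ratio_per_100m,
              linear = pred_lin, log = pred_log, sqrt = pred_sqrt), 3))
```

Placing the three predictions side by side shows how much the choice of transformation changes the fitted value on the original Species scale.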

plot(gala$Elevation, sqrt(gala$Species), xlab="Elevation", ylab="sqrt-species", pch=16)
abline(fit3)
plot(predict(fit3), residuals(fit3), xlab="fit 3 Fitted Value", ylab="Residuals")

(g) Compare the coefficient of determination in the original regression model and the model with the √Y transformation. Make comments.

From summary(fit), the coefficient of determination in the original model is 0.5454; from summary(fit3), the one in the square-root-transformed model is 0.5771. Even though the square root transformation doesn't solve the unequal variance problem, it slightly increases R^2. The interpretation for the transformed model is: Elevation explains 57.71% of the variability in √Species.

Note: if you have a problem loading the faraway package, download the gala dataset from the course webpage and save it on the D drive. Run the following code to load it.

gala <- read.table("d:/galadata.txt", sep="\t")
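The R^2 values compared in part (g) can be cross-checked against the F-statistics printed in the same summaries. With a single predictor and residual degrees of freedom df, F = df * R^2 / (1 - R^2), which rearranges to R^2 = F / (F + df). The sketch below applies this to the rounded values reported for the three fits. It is a consistency check, not part of the original solution, and the variable names are illustrative.

```r
# F-statistics and R-squared values copied from the three summaries (rounded).
f_stat   <- c(original = 33.59, log = 22.97, sqrt = 38.21)
r2_shown <- c(original = 0.5454, log = 0.4507, sqrt = 0.5771)
df_resid <- 28   # residual degrees of freedom in every fit

# With one predictor, F = df * R^2 / (1 - R^2), so R^2 = F / (F + df).
r2_from_f <- f_stat / (f_stat + df_resid)

print(round(cbind(r2_shown, r2_from_f, diff = r2_from_f - r2_shown), 4))
```

Each R^2 here is measured on its own response scale (Species, log Species, or √Species), so differences between them should be read with some caution.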