HOMEWORK ANALYSIS #3 - WATER AVAILABILITY (DATA FROM WEISBERG 2014)

1. In your own words, summarize the overarching problem and any specific questions that need to be answered using the water data. Discuss how statistical modeling will be able to answer the posed questions.

To receive full credit for this question, students should:
(a) (2.5pts) State that we need to predict how much water runoff will result from snowfall.
(b) (2.5pts) State that statistical modeling allows us to use the linear relationship to get the prediction.
(c) (Example) California is experiencing a drought, so being able to predict how much water will be available from runoff from the Sierra Nevada Mountains would be very useful for determining what restrictions on water consumption need to be put in place to make sure there is enough water. If we can determine that a linear relationship exists between snowfall and runoff, we should be able to model that relationship and use snowfall to predict runoff.

2. Use the data to assess if a simple linear regression (SLR) model is suitable to analyze the water data. Justify your answer using any necessary graphics and relevant summary statistics that would suggest an SLR model would be successful at achieving the goals of the study. Provide discussion on why an SLR model is or is not appropriate.

To receive full credit for this question, students should:
(a) (3pts) Draw a scatterplot like the one below (make sure the x and y axes are labeled appropriately).
(b) (2pts) Comment that the scatterplot looks roughly linear with a correlation of r = 0.94, so a linear model is appropriate.
(c) (Example) If we plot the data on a scatterplot we can see that there seems to be a linear relationship between snowfall and runoff. In fact, with r = 0.94 we know that the linear relationship is quite strong. Therefore, an SLR model looks like an appropriate method to analyze the data.
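As a rough gauge of what r = 0.94 implies, the standard sample correlation formula and a standard identity for simple linear regression can be written out. This is not part of the original key; the symbols x_i (snowfall in year i) and y_i (runoff in year i) are notation assumed here. In SLR the squared correlation equals the model's R^2:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
r^2 = 0.94^2 \approx 0.88
```

This agrees with the R^2 = 0.88 reported in Problem 5.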
Code for Problem 2:
# Column names Runoff and Precip are assumed here; adjust to match the columns in water
plot(water$Precip, water$Runoff, pch=19, xlab="Snowfall", ylab="Runoff")
cor(water$Precip, water$Runoff)

[Figure: scatterplot of Runoff versus Snowfall]

3. Write out (in mathematical form with Greek letters) a justifiable SLR model that would help answer the questions in the problem. Clearly state any assumptions you are using in your model. Provide an interpretation of each mathematical term (β parameter) included in your model. Using the mathematical form, discuss how your model, after fitting it to the data, will be able to answer the questions in this problem.

The model that I used is $\text{Runoff}_i = \beta_0 + \beta_1 \text{Snow}_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$. If they used a transformation or centered their x's, that is fine, but I didn't think it was necessary.

To receive full credit for this question, students should:
(a) (1pt) Write out the model $\text{Runoff}_i = \beta_0 + \beta_1 \text{Snow}_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$. Students need all parts (including the $\epsilon$ and the normal distribution piece).
(b) (2pts) Interpret $\beta_0$ and $\beta_1$ appropriately.
(c) (2pts) Mention the assumptions of linearity, independence, normality, and equal variance.
(d) (Example) We will use the following model: $\text{Runoff}_i = \beta_0 + \beta_1 \text{Snow}_i + \epsilon_i$, where $\epsilon_i \sim N(0, \sigma^2)$.
    i. $\beta_0$ - The y-intercept. For all years with no snow, runoff will be $\beta_0$ on average.
    ii. $\beta_1$ - The slope. As snow increases by one unit, runoff will increase by $\beta_1$, on average.
    After fitting this model to the data we will be able to predict runoff for a given year by plugging the amount of snowfall for that year into the equation. To use this model we are assuming linearity, independence, normality, and equal variance.

4. Fit your model in #3 to the water data and summarize the results by displaying the fitted model in equation form (do NOT just provide a screen shot of the R or SAS output). Interpret each of the fitted parameters in the context of the problem. Provide a plot of the data with a fitted regression line.

My model was estimated to be $\widehat{\text{Runoff}} = 27014 + 3752.5\,\text{Snow}$, but if they used a transformation that is OK.
(a) (1pt) State their estimated equation in equation form (not a snapshot of R output).
(b) (2pts) Interpret each estimate in context.
(c) (2pts) Provide a plot of the fitted regression line like the one below.
(d) (Example) By fitting our data to the model we get the following equation: $\widehat{\text{Runoff}} = 27014 + 3752.5\,\text{Snow}$.
    i. 27014 is our estimate of $\beta_0$; for a year with no snow we would expect a runoff of 27014 acre-feet.
    ii. 3752.5 is our estimate of $\beta_1$; for every one unit increase in snow, runoff increases by 3752.5 acre-feet, on average.

[Figure: scatterplot of Runoff versus Snowfall with the fitted regression line]

Code for Problem 4:
# The ~ in the model formula was lost in extraction; Runoff ~ Precip is assumed
my.lm <- lm(Runoff ~ Precip, data=water)
plot(water$Precip, water$Runoff, pch=19, xlab="Snowfall", ylab="Runoff")
abline(my.lm$coef[1], my.lm$coef[2], col="red", lwd=3)

5. Justify your model assumptions using appropriate graphics or summary statistics. Assess the fit and predictive capability of your model. Discuss at the level of your target audience (e.g. interpret your model R^2).
(a) (1pt) State their model R^2 (I got R^2 = 0.88).
(b) (2pts) Interpret R^2 in the context of the problem. E.g. the percent of variability in runoff explained by snowfall is R^2.
(c) (2pts) Look at plots like those below and appropriately justify the assumptions of linearity, independence, normality, and equal variance.
(d) (Example) We believe all four assumptions (linearity, independence, normality, and equal variance) are met. Looking at the scatterplot of the data we can see that there is a linear relationship. We can reasonably assume that runoff from any one year does not affect the runoff of any other year, so the observations are independent. The histogram of standardized residuals looks normal. Finally, based on the Residuals vs. Fitted Values plot we can see that the residuals vary relatively equally about the line.

The R^2 for this data was 0.88. This means that 88% of the variability in runoff is explained by snowfall. Because such a high percentage of variability is explained by our model, we are happy with how well our model fits the data.

After performing a cross validation procedure using 6 randomly selected observations as our test data, we calculated a bias of -4212. This tells us that on average our predictions are too low. We also calculated the RPMSE to be 9218, telling us that our predictions miss the mark by an average of 9218 acre-feet. Considering the scale of the data, 9218 acre-feet isn't a very significant error, and the bias of -4212 isn't very large either. Therefore, we feel that our model predicts well enough to rely on it for predicting runoff for snowfall levels that are within the range of our data.
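The bias and RPMSE quoted above can be written out explicitly. The notation is an assumption made here for illustration ($\hat{y}_i$ is the predicted runoff and $y_i$ the observed runoff for test observation $i$, with $n_{\text{test}} = 6$), chosen to match the prediction-minus-observed sign convention in the Problem 5 code:

```latex
\text{bias} = \frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} (\hat{y}_i - y_i),
\qquad
\text{RPMSE} = \sqrt{\frac{1}{n_{\text{test}}} \sum_{i=1}^{n_{\text{test}}} (\hat{y}_i - y_i)^2}
```

With this convention a negative bias means the predictions fall below the observed values on average, which is why -4212 is read as "too low". Because the test observations are drawn at random, both numbers will change from run to run unless a seed is set.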

[Figures: standardized residuals (Std. Resids) versus Fitted Values; histogram of standardized residuals (sres) on a density scale]

Code for Problem 5:
library(MASS)  # provides stdres()
sres <- stdres(my.lm)
plot(my.lm$fitted.values, sres, pch=19, xlab="Fitted Values", ylab="Std. Resids")
abline(h=0, lwd=3, col="red")
hist(sres, freq=FALSE)
lines(seq(-3,3,length=100), dnorm(seq(-3,3,length=100)), col="red", lwd=3)
# Cross validation: hold out 6 randomly selected observations as test data
n.test <- 6
test.obs <- sample(1:nrow(water), n.test)
test.data <- water[test.obs,]
train.data <- water[-test.obs,]
train.data
# Column names Runoff and Precip are assumed, as in the Problem 4 code
train.lm <- lm(Runoff ~ Precip, data=train.data)
summary(train.lm)
preds <- predict(train.lm, newdata=test.data)
bias <- mean(preds - test.data$Runoff)
pmse <- mean((preds - test.data$Runoff)^2)
rpmse <- sqrt(pmse)

6. Carry out a test that there is no relationship between snowfall and runoff (e.g., write out the hypotheses, report an appropriate p-value and conclude in context).
(a) (1pt) Null and alternative hypotheses written correctly.
(b) (2pts) Report a p-value of approximately zero.
(c) (2pts) State that they reject the null hypothesis and conclude that there is a relationship between snowfall and runoff.
(d) (Example) We ran a hypothesis test on the slope with $H_0: \beta_1 = 0$ and $H_a: \beta_1 \neq 0$. That resulted in a p-value that was approximately 0, so we reject the null hypothesis and conclude that there is a significant relationship between snow and runoff.

Code for Problem 6:
summary(my.lm)
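To show where the slope p-value sits in the `summary()` output, here is a minimal sketch. It uses synthetic data generated on the spot, NOT the water data, and the names `x`, `y`, `fit`, `coef.table`, `t.stat`, `p.value`, and `p.manual` are illustrative assumptions.

```r
# Synthetic data for illustration only; this is NOT the water data.
set.seed(1)
x <- 1:20
y <- 2 + 3 * x + rnorm(20)

fit <- lm(y ~ x)

# The coefficient table has one row per parameter; the "x" row is the slope.
coef.table <- coef(summary(fit))
t.stat  <- coef.table["x", "t value"]
p.value <- coef.table["x", "Pr(>|t|)"]

# The same two-sided p-value computed by hand from the t statistic,
# using the residual degrees of freedom (n - 2 in simple linear regression).
p.manual <- 2 * pt(abs(t.stat), df = fit$df.residual, lower.tail = FALSE)

print(coef.table)
```

For the water model, the analogous row would be the one for the snowfall variable in `coef(summary(my.lm))`.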

7. Construct 95% confidence intervals for the slope and intercept parameters and interpret these intervals in the context of the problem.
(a) (3pts) Report 95% confidence intervals for $\beta_0$ as (20513, 33515) and for $\beta_1$ as (3316, 4188). NOTE: if the students did a transformation or centered their x's then their intervals will be slightly different.
(b) (2pts) Interpret these intervals in context. E.g. I am 95% confident that as snowfall increases by 1, the runoff will increase by between 3316 and 4188 acre-feet, on average.
(c) (Example) Because our estimates of $\beta_0$ and $\beta_1$ are just point estimates of the parameters, we calculated 95% confidence intervals for each. The 95% confidence interval for $\beta_0$ is (20513, 33515). This means we are 95% confident that when snow is 0, average runoff is between 20513 and 33515 acre-feet. The 95% confidence interval for $\beta_1$ is (3316, 4188). This means that we are 95% confident that as snowfall increases by one unit, runoff will increase by between 3316 and 4188 acre-feet, on average.

Code for Problem 7:
confint(my.lm)

8. In the winter of 2013-2014, the site only received 4.5 inches of snowfall. What do you predict will be the associated runoff? Provide a 95% predictive interval and interpret the interval in the context of the problem. Do you have any hesitations performing this prediction (hint: you should)? Describe these hesitations and their potential impact on your prediction.

To receive full credit, students must:
(a) (2pts) Predict for 4.5 inches of snowfall and present a 95% prediction interval. My prediction for 4.5 inches of snowfall is 43900.77 with a 95% prediction interval of (25254.2, 62547.34).
(b) (2pts) Interpret the prediction interval in the context of the problem.
(c) (1pt) Mention that 4.5 inches of snowfall is outside of the range of the data so this is an extrapolation; therefore, this prediction might be off.
(d) (Example) If there is only 4.5 inches of snowfall we predict a runoff of 43900.77 acre-feet. A 95% prediction interval for this estimate is (25254.2, 62547.34). Therefore, we are 95% confident that for a single year with 4.5 inches of snowfall the resulting runoff will be between 25254.2 and 62547.34 acre-feet. However, this prediction may not be very accurate: 4.5 inches is outside the range of our collected data, so predicting runoff at that snowfall level is an extrapolation.

Code for Problem 8:
# Column name Precip is assumed, as in the Problem 4 code
predict.lm(my.lm, newdata=data.frame(Precip=c(4.5)), interval="prediction")
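A quick consistency check on the Problem 8 numbers, sketched in R using only values reported in this key. The variable names are illustrative assumptions, and the small gap between the two results presumably comes from rounding the coefficients in the fitted equation.

```r
# Rounded estimates from the fitted equation in Problem 4.
b0 <- 27014
b1 <- 3752.5
snow <- 4.5

# Point prediction from the rounded coefficients.
pred.rounded <- b0 + b1 * snow
print(pred.rounded)  # 43900.25

# The 95% prediction interval reported in Problem 8 is symmetric about
# the point prediction, so its midpoint recovers the reported prediction.
pi.lower <- 25254.2
pi.upper <- 62547.34
pi.mid <- (pi.lower + pi.upper) / 2
print(pi.mid)  # 43900.77
```

The midpoint check is also a convenient way to catch transcription slips in an interval endpoint.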