ASSIGNMENT 3 SIMPLE LINEAR REGRESSION. Old Faithful

Similar documents
determine whether or not this relationship is.

Module 8: Linear Regression. The Applied Research Center

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

Applied Regression Analysis

BIOSTATS Intermediate Biostatistics Spring 2015 Class Activity Unit 1: Review of BIOSTATS 540

STAT 135 Lab 11 Tests for Categorical Data (Fisher s Exact test, χ 2 tests for Homogeneity and Independence) and Linear Regression

Basic Statistics Exercises 66

SLR output RLS. Refer to slr (code) on the Lecture Page of the class website.

LAB 5 INSTRUCTIONS LINEAR REGRESSION AND CORRELATION

Eruptions of the Old Faithful geyser

Inferences for Regression

IT 403 Practice Problems (2-2) Answers

Chapter 3: Examining Relationships

Linear Regression Communication, skills, and understanding Calculator Use

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Lab #10 Atomic Radius Rubric o Missing 1 out of 4 o Missing 2 out of 4 o Missing 3 out of 4

Regression analysis is a tool for building mathematical and statistical models that characterize relationships between variables Finds a linear

Chapter 11. Correlation and Regression

Stat 101 L: Laboratory 5

ECON 497 Midterm Spring

Math 243 OpenStax Chapter 12 Scatterplots and Linear Regression OpenIntro Section and

Unit 6 - Introduction to linear regression

Any of 27 linear and nonlinear models may be fit. The output parallels that of the Simple Regression procedure.

AP Statistics Unit 6 Note Packet Linear Regression. Scatterplots and Correlation

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Example: Data from the Child Health and Development Study

Do Now 18 Balance Point. Directions: Use the data table to answer the questions. 2. Explain whether it is reasonable to fit a line to the data.

Chapter 7. Linear Regression (Pt. 1) 7.1 Introduction. 7.2 The Least-Squares Regression Line

Six Sigma Black Belt Study Guides

y response variable x 1, x 2,, x k -- a set of explanatory variables

1 Introduction to Minitab

School of Mathematical Sciences. Question 1

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Inference for Regression Simple Linear Regression

SIMPLE REGRESSION ANALYSIS. Business Statistics

Chapter 16. Simple Linear Regression and Correlation

Exam Empirical Methods VU University Amsterdam, Faculty of Exact Sciences h, February 12, 2015

Scatterplots. 3.1: Scatterplots & Correlation. Scatterplots. Explanatory & Response Variables. Section 3.1 Scatterplots and Correlation

Inference for Regression Inference about the Regression Model and Using the Regression Line

Least Squares Regression

CHAPTER 10. Regression and Correlation

Hypothesis Tests and Estimation for Population Variances. Copyright 2014 Pearson Education, Inc.

Inference for the Regression Coefficient

1. Define the following terms (1 point each): alternative hypothesis

Unit 6 - Simple linear regression

UNIT 12 ~ More About Regression

1 Correlation and Inference from Regression

Ecn Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman. Midterm 2. Name: ID Number: Section:

Simple Linear Regression

appstats8.notebook October 11, 2016

AMS 7 Correlation and Regression Lecture 8

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Overview. Overview. Overview. Specific Examples. General Examples. Bivariate Regression & Correlation

ALGEBRA 2 MIDTERM REVIEW. Simplify and evaluate the expression for the given value of the variable:

Review of Multiple Regression

Important note: Transcripts are not substitutes for textbook assignments. 1

appstats27.notebook April 06, 2017

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

2: SIMPLE HARMONIC MOTION

Chapter 16. Simple Linear Regression and dcorrelation

Reteach 2-3. Graphing Linear Functions. 22 Holt Algebra 2. Name Date Class

THE PEARSON CORRELATION COEFFICIENT

Watch TV 4 7 Read 5 2 Exercise 2 4 Talk to friends 7 3 Go to a movie 6 5 Go to dinner 1 6 Go to the mall 3 1

Analyzing Lines of Fit

IF YOU HAVE DATA VALUES:

SCHOOL OF MATHEMATICS AND STATISTICS

23. Inference for regression

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

28. SIMPLE LINEAR REGRESSION III

STAT5044: Regression and Anova. Inyoung Kim

The response variable depends on the explanatory variable.

The following formulas related to this topic are provided on the formula sheet:

Correlation and Regression (Excel 2007)

HOMEWORK ANALYSIS #3 - WATER AVAILABILITY (DATA FROM WEISBERG 2014)

Lab 1 Uniform Motion - Graphing and Analyzing Motion

How to Run the Analysis: To run a principal components factor analysis, from the menus choose: Analyze Dimension Reduction Factor...

SMAM 314 Practice Final Examination Winter 2003

Retrieve and Open the Data

Correlation and Regression

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

Introduction to Simple Linear Regression

The Model Building Process Part I: Checking Model Assumptions Best Practice

Error Analysis, Statistics and Graphing Workshop

Chapter 9. Correlation and Regression

Tutorial 6: Linear Regression

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)

Please, read the following instructions carefully:

Chapter 4 Describing the Relation between Two Variables

Chapter 3: Examining Relationships

MODELING. Simple Linear Regression. Want More Stats??? Crickets and Temperature. Crickets and Temperature 4/16/2015. Linear Model

SMAM 314 Exam 49 Name. 1.Mark the following statements true or false (10 points-2 each)

Chapter 2: Looking at Data Relationships (Part 3)

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

Results and Analysis 10/4/2012. EE145L Lab 1, Linear Regression

Correlation and Regression

Contents. Acknowledgments. xix

WISE Regression/Correlation Interactive Lab. Introduction to the WISE Correlation/Regression Applet

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

Transcription:

ASSIGNMENT 3 SIMPLE LINEAR REGRESSION In the simple linear regression model, the mean of a response variable is a linear function of an explanatory variable. The model and associated inferential tools are valid as long as the assumptions of independence, normality, and constant variance are not violated. In this assignment, you will use linear regression tools in SPSS to predict the time between eruptions from the duration and estimated height of the previous eruption for the Old Faithful Geyser in Yellowstone National Park, Wyoming. Two simple linear regression models will be examined and compared. Old Faithful Old Faithful Geyser, which throws about ten thousand gallons of water and steam up to one hundred seventy feet in the air, derives its name from the regularity of its eruptions. Old Faithful is not, in fact, the largest geyser in the park, but it has become a popular destination because it erupts more frequently than any of the other big geysers. Every geyser consists of a reservoir for water storage, an opening through which steam and hot water can be released, a heat source, and underground channels to replenish the water supply after an eruption. The huge amounts of water which erupt from geysers, such as Old Faithful, deplete water from the geyser s two reservoirs. Since a long eruption depletes more water than a short eruption, it takes more time to refill the reservoir after a longer eruption. In other words, a longer time interval is needed until the next eruption. Thus, there is a relationship between the duration of an eruption and the waiting time until its next eruption. In the lab assignment, you will explore the relationship and you will use the relationship to predict times of eruption for a geyser. Accurate predictions of eruption times allow the Park Service to inform visitors of the approximate time of the next eruption. The data from July 1995 eruptions are available in the SPSS file lab3.sav located on the STAT 252 Laboratories web site at http://www.stat.ualberta.ca/statslabs/index.htm (click Stat 252 link, and Data for Lab 3). The data are not to be printed in your submission. Note some values are missing in the data file. Variable Name DAY DURATION INTERVAL HEIGHT Description of Variable Day of July 1995 on which data were collected, Length of the previous eruption in minutes, Length of time (in minutes) from the end of the previous eruption to the beginning of the next eruption (waiting time until next eruption), Estimated height of previous eruption (in feet). 1

1. In this part, you will use the Line Chart feature in the Graphs menu to examine the pattern in the mean duration and the mean height of Old Faithful s eruptions over the 31-day period. Obtain a line chart of the mean eruption height over the 31-day period. Paste the chart into your report. Is there any trend? What is the typical mean height? How does the mean height vary over the period? What factor might distort the height measurements? Obtain a line chart of mean duration over the 31-day period. Paste the chart into your report. Is there any trend? What is a typical duration of an eruption? 2. Obtain a scatterplot of the interval between eruptions (or waiting time) and the duration of the previous eruption. Paste the scatterplot with the title and the names of the axes (Duration, Interval) into your report. Use the scatterplot to describe the distributions of interval between eruptions and duration of eruption. Specifically, answer the following four questions: What is the shape of each distribution, its center, and variability? How long do Old Faithful s eruptions last? How do intervals between eruptions vary? What does the distribution of the interval between eruptions tell you about the regularity of Old Faithful s eruptions? Describe the overall pattern of the relationship involving the interval between eruptions and the duration of eruptions. Is the relationship reasonably strong or quite weak? Is it linear? Is the association positive, negative, or neither? Are there any outliers in the plot? 3. Now you will use the Correlate feature in the Analyze menu to quantify the strength of the relationship involving the interval between eruptions and the duration of eruptions. Calculate the Pearson s correlation coefficient (Pearson) involving the interval between eruptions and the duration of eruptions. Do the sign and magnitude of the coefficient confirm the conclusions you reached in part of Question 2? Explain briefly. 4. Imagine that you have just arrived at Yellowstone National Park and just missed the most recent eruption of Old Faithful Geyser. How long would you have to wait until the next eruption of Old Faithful? In this part, you will use simple linear regression to make predictions about waiting time until the next eruption given the duration of the previous eruption. Define a simple linear regression model using INTERVAL as the response variable and DURATION as the explanatory variable. State the model assumptions. Then use the Regression tool in SPSS to perform the regression and use the SPSS output to answer the following questions: What is the equation of the least-squares regression line? What is the meaning of the slope of the regression line? Insert the line into the corresponding scatterplot and paste it into your report. Does the line provide a good fit? Are there any outliers? What percent of the variation in the time between eruptions can be explained by the duration of the previous eruption? How useful is regression to make predictions about the time interval between eruptions? 2

(d) (e) (f) (g) (h) Based on the estimated regression line, predict the amount of time between the current eruption and the next eruption, given that the duration of the current eruption is 2 minutes. Obtain a 95% confidence interval for the mean interval between eruptions when the duration of the current eruption is 2 minutes. What is a 95% prediction interval for the interval between eruptions when duration is 2 minutes? Is linear regression on duration of any value in explaining interval between eruptions? Define the null and alternative hypotheses about the slope of the population regression line, obtain the value of the test statistic and its P-value from the output, and give your conclusion. What is the distribution of the test statistic under the null hypothesis? What are the sums of squares of residuals for the model with the intercept only (equal means model) and the regression model specified in part? Obtain the plot of standardized residuals versus standardized predicted values for the regression model and paste it into your report. Describe the pattern of the residuals. Do the residuals appear to be randomly and uniformly scattered about a horizontal line at zero? What are the consequences for violating the assumption being checked here? Does the assumption of normality for the response variable seem appropriate? Answer the question by obtaining a normal P-P plot of the standardized residuals and paste the plot into your report. Comment about the plot. Is there any evidence that the relationship between INTERVAL and DURATION may have changed through time? Answer the question by obtaining a sequence plot of residuals vs. the time order of observations (Sequence feature in the Graph menu). Paste the chart into your report. Is there any trend? Comment briefly. 5. In the previous question, you have used linear regression to predict the time interval between eruptions using duration of the last eruption as the explanatory variable. However, height of eruption may also be useful in predicting the time interval between eruptions of a geyser. In particular, the product of height and duration provides some information about the amount of water that is expelled from the geyser during eruption. Obtain a new variable which is the product of duration and height. Then obtain a scatterplot of the length of time between eruptions vs. the product of height and duration. Paste the scatterplot with a title and the names of the axes (Interval, Duration*Height) into your report. Describe the overall pattern of the relationship involving the time interval between eruptions and the product of height and duration. Is the relationship reasonably strong or quite weak? Is it linear? Is the association positive, negative, or neither? Are there any outliers in the plot? Fit a new regression model to predict the time interval between eruptions by using the product of height and duration as the explanatory variable and report the estimated regression line. Moreover, report the R 2 value for the new model. In order to verify the regression assumptions in this case, examine the plot of standardized residuals and normal probability plot and comment. You are not required to paste the plots into your report. Is the model better than the model discussed in Question 3? Explain briefly. 3

LAB 3 ASSIGNMENT: MARKING SCHEMA Proper Header: 10 points Question 1 (12) Line chart of mean height: 3 points Trend, typical mean height, mean height variation, height distortion: 4 points Line chart of mean duration: 3 points Trend, typical duration: 2 points Question 2 (12) Scatterplot: 3 points Shape, center, and variability of interval between eruptions: 2 points Shape, center, and variability of duration of eruptions: 2 points Regularity of Old Faithful s eruptions: 2 points Question 3 (5) Pearson s correlation coefficient: 3 points Comparison with scatterplot in Question 2: 2 points Question 4 (49) (d) (e) Simple regression model: 2 points Model assumptions: 2 points Equation of the least-squares line: 3 points Meaning of the slope: 1 point Scatterplot with the least-squares line: 4 points Quality of the fit: 1 point Percent of the variation: 2 points Utility of the linear regression: 1 point Point estimate: 2 points 95% confidence interval: 2 points 95% prediction interval: 2 points Null and alternative hypotheses: 2 points Test statistic: 1 point P-value: 1 point Conclusion: 1 point Null distribution: 2 points Sum of squares residuals: 2 points (1 point for each model) 4

(f) (g) (h) Plot of residuals: 3 points Consequences of violating the equal variance assumption: 2 points Normal probability plot: 3 points Comments on the plot: 2 points Plot of residuals versus time: 3 points Trend: 2 points Question 5 (12) Scatterplot: 3 points Estimated regression line: 2 points R 2 value: 2 points Model comparison: 2 points TOTAL = 10 + 12 + 12 + 5 + 49 + 12 = 100 5