
Homework for Lecture 14-15: Regression Analysis, Sections 9.1-9.4

Consider the following data set, labeled Gala, which describes the number of species of tortoise on the various Galapagos Islands. There are 30 cases (islands) and 7 variables in the dataset:

Species: the number of species of tortoise found on the island
Endemics: the number of endemic species
Elevation: the highest elevation of the island (m)
Nearest: the distance from the nearest island (km)
Scruz: the distance from Santa Cruz island (km)
Adjacent: the area of the adjacent island (km^2)

The data were presented by Johnson and Raven (1973) and also appear in Weisberg (1985).

1. In the following analysis, we consider the linear relationship between Elevation and Endemics.

a) What is the sample correlation between these two variables? Would you say this is a strong correlation?
r = 0.7929. Since r is quite close to 1, there is a fairly strong positive correlation between the two variables.

b) From the JMP output, report the 95% confidence interval for the population correlation between Elevation and Endemics. Is this correlation statistically significant? Why or why not?
The 95% CI for the population correlation, ρ, is (0.6056, 0.8979). The correlation is statistically significant (ρ ≠ 0) because 0 does NOT fall in the 95% CI.
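The interval in part (b) is read off the JMP output; as a sanity check, a 95% CI for ρ can be approximated by hand with the Fisher z-transform (a sketch of the standard approximation, not necessarily the exact method JMP uses):

```python
import math

def correlation_ci_95(r, n):
    """Approximate 95% CI for a population correlation via the Fisher z-transform."""
    z = math.atanh(r)                 # Fisher transform of the sample correlation
    se = 1 / math.sqrt(n - 3)         # standard error of z
    margin = 1.96 * se                # two-sided 95% normal critical value
    # Back-transform the endpoints to the correlation scale.
    return math.tanh(z - margin), math.tanh(z + margin)

lo, hi = correlation_ci_95(0.7929, 30)
print(round(lo, 4), round(hi, 4))  # close to the JMP interval (0.6056, 0.8979)
```

Small differences from the JMP interval are expected, since the reported r is rounded and the z-interval is only an approximation.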

c) According to the scatterplot, do the data appear to be linear? Which variable is the dependent variable and which is the independent?
The data appear to be linear; however, the variance (spread) in the data seems to get larger as Elevation (x) increases. Elevation is the independent variable, while Endemics is the dependent variable.

d) State the meaning of the coefficient of determination in the context of the problem.
About 63% of the variation in Endemics can be explained by Elevation (according to this dataset).
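The 63% in part (d) is simply the square of the sample correlation from part (a); a one-line check:

```python
r = 0.7929                        # sample correlation from part (a)
r_squared = r ** 2                # coefficient of determination
print(round(100 * r_squared, 1))  # → 62.9, i.e. about 63%
```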

e) Using the JMP output, do a 6-step hypothesis test for the slope parameter, using a significance level of 0.05.
(a) H0: β1 = 0 vs. H1: β1 ≠ 0, where β1 is the population expected change in Endemics when Elevation increases by 1 unit.
(b) Level of significance: α = 0.05
(c) Test statistic: t = (b1 − 0)/s_b1 = 0.05/0.007 (sampling distribution under H0 is t with 30 − 2 = 28 df)
(d) t = 6.89 and p-value < 0.001
(e) Reject H0 because p-value < 0.05.
(f) Conclude that there is a statistically significant linear relationship between Endemics and Elevation.

f) Provide the regression equation for predicting Endemics from Elevation.
Endemics = 7.18 + 0.05 × Elevation

g) Using the regression equation, predict the Endemics value for the following Elevations: 250, 600, and 1200. What is the error associated with these predictions?
7.18 + 0.05 × 250 = 19.68
7.18 + 0.05 × 600 = 37.18
7.18 + 0.05 × 1200 = 67.18
The associated error is given by the root mean square error: 16.95.
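The predictions in part (g) follow directly from the fitted line; a quick sketch using the rounded coefficients reported in the solution (the RMSE comes from the JMP output, not from this snippet):

```python
b0, b1 = 7.18, 0.05   # intercept and slope of the fitted line (rounded)
s_b1 = 0.007          # standard error of the slope (rounded, from JMP)

# Test statistic for H0: beta1 = 0. This gives about 7.14 rather than the
# reported 6.89 because the printed estimates are rounded.
t = (b1 - 0) / s_b1

def predict(elevation):
    """Predicted Endemics at a given Elevation, from the fitted line."""
    return b0 + b1 * elevation

preds = [round(predict(x), 2) for x in (250, 600, 1200)]
print(preds)  # [19.68, 37.18, 67.18]
```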

h) Do the residuals look healthy? Why or why not?
The residuals do not look healthy, as they appear to fan out.

i) Do any assumptions of the model appear to be violated? If so, which one(s)?
We can see that the variance in the y's increases as x increases, so the assumption that the variance is constant is violated.

2. Consider the following dataset, known as stronx. These data were collected in an experiment to study the interaction of certain kinds of elementary particles on collision with proton targets. The experiment was designed to test certain theories about the nature of the strong interaction. The cross-section (crossx) variable is believed to be linearly related to the inverse of the energy (energy, which has already been inverted). At each level of the momentum, a very large number of observations were taken, so that it was possible to accurately estimate the standard deviation of the response (sd).

In the following analysis, we consider the linear relationship between crossx and momentum.

a) What is the sample correlation between these two variables? Would you say this is a strong correlation?
r = −0.6477. We could say that there is a moderately strong negative correlation.

b) From the JMP output, report the 95% confidence interval for the population correlation between crossx and momentum. Is this correlation statistically significant? Why or why not?
The 95% CI for the population correlation, ρ, is (−0.9073, −0.0306). The correlation is statistically significant (ρ ≠ 0) because 0 does NOT fall in the 95% CI.

c) According to the scatterplot, do the data appear to be linear? Which variable is the dependent variable and which is the independent?
No, the data do not appear to be linear. Momentum is the independent variable and crossx is the dependent variable.

d) State the meaning of the coefficient of determination in the context of the problem.
About 42% of the variation in crossx can be explained by Momentum.

e) Using the JMP output, do a 6-step hypothesis test for the slope parameter, using a significance level of 0.01.
(a) H0: β1 = 0 vs. H1: β1 ≠ 0, where β1 is the population expected change in crossx when Momentum increases by 1 unit.
(b) Level of significance: α = 0.01
(c) Test statistic: t = (b1 − 0)/s_b1 (sampling distribution under H0 is t with 8 df)
(d) t = −2.40 and p-value = 0.0429
(e) Fail to reject H0 because p-value = 0.0429 > α = 0.01.
(f) We conclude that there is not a statistically significant linear relationship between crossx and Momentum.

f) Provide the regression equation for predicting crossx from momentum.
crossx = 281.20 − 0.79 × momentum

g) Using the regression equation, predict the crossx value for the following momentum values: 20, 75, and 120. What is the error associated with these predictions?
We should not predict values of crossx, since the model is not statistically significant.

h) Do the residuals look healthy? Why or why not?
No, the residuals look unhealthy; they show a clear pattern.

i) Do any assumptions of the model appear to be violated? If so, which one(s)?
The data are not linear. In addition, the errors do not look independent; there appears to be an association among them.
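The fail-to-reject decision in part (e) can also be reached with the critical-value form of the test; a sketch assuming the standard two-sided t critical value t(0.005, df = 8) ≈ 3.355 from a t table:

```python
t_stat = -2.40   # test statistic (negative, matching the negative slope)
t_crit = 3.355   # two-sided critical value t(0.005, df = 8) for alpha = 0.01

# Reject H0: beta1 = 0 only if |t| exceeds the critical value.
decision = "reject H0" if abs(t_stat) > t_crit else "fail to reject H0"
print(decision)  # fail to reject H0
```

This agrees with the p-value approach: 0.0429 > 0.01, so the slope is not significant at the 1% level even though it was significant at 5%.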