
Stat 328 Fall 2004 Key to Homework 3

Exercise 3.10

a. There is evidence of a linear trend: winning times appear to decrease with year. A straight-line model for predicting winning times from year is: Winning time = β0 + β1 × year + error, where the errors are assumed to be normally distributed with mean zero and some unknown variance σ². The slope of the line will be negative because time decreases as year increases.

b. Similar response.

c. In absolute value, the slope associated with women will be larger, because times appear to decrease more rapidly for women than for men. For example, winning times between 1980 and 2000 appear to have decreased by approximately 10 minutes for men but by over 30 minutes for women.

d. It might be reasonable to use the model for men, but for women the predicted times 20 years out would appear to be too low. It is quite possible that winning times for women will decrease more slowly in the future, and thus the straight-line model would probably under-predict actual winning times in 2020.

Exercise 3.13

a. The least squares estimates of the intercept and slope are (from the JMP output): b0 = 6.25, b1 = -0.0023.

b. The results indicate the following: the sweetness index of orange juice is negatively associated with the water-soluble pectin content of the juice. When pectin increases by 1 ppm, the sweetness index is expected to decrease by 0.0023 units. The scale of the data indicates that pectin content ranges from about 200 to about 400 ppm in orange juice. Thus a clearer interpretation of the results is that, for example, every 10 ppm increase in pectin is expected to result in a 0.023-unit decrease in the sweetness index. When pectin is not present (0 ppm in the juice), sweetness can be expected to be as high as 6.25 units.
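The slope interpretation and point predictions from the fitted line sweetness = 6.25 − 0.0023 × pectin can be checked with a short Python snippet (Python is used here only as an illustrative calculator; the course software is JMP/SAS):

```python
# Fitted simple linear regression for the orange juice data:
# sweetness = b0 + b1 * pectin, with b0 and b1 taken from the JMP output.
b0 = 6.25      # estimated intercept
b1 = -0.0023   # estimated slope (sweetness units per ppm of pectin)

def predicted_sweetness(pectin_ppm: float) -> float:
    """Point prediction of the sweetness index at a given pectin level."""
    return b0 + b1 * pectin_ppm

# Expected change in sweetness for a 10 ppm increase in pectin:
print(round(10 * b1, 3))                     # -0.023
# Predicted sweetness at 300 ppm pectin:
print(round(predicted_sweetness(300), 2))    # 5.56
```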

c. In juice with 300 ppm pectin content, the expected sweetness is 6.25 - 0.0023 x 300 = 5.56 units.

Exercise 3.14

The SAS program that was used to answer the questions in this problem is given below. The path in the infile statement needs to be the appropriate path for your system.

data birds;
   infile 'C:\Stat 328\Text Data\Data Files\Chap3\Condor2.dat';
   input sample nkill nestoccup;
run;

symbol1 v = dot i = none h = 2;
axis1 order = (0 to 5 by 1) label = (f = triplex a = 90 h = 2 'No. Flycatchers killed')
      minor = none length = 60 pct;
axis2 order = (20 to 70 by 5) label = (f = triplex h = 2 '% nest box tit occupancy')
      minor = none length = 75 pct;

proc gplot data = birds;
   plot nkill * nestoccup / vaxis = axis1 haxis = axis2 frame;
   title f = triplex h = 2 'Birds data';
run;

proc reg data = birds;
   model nkill = nestoccup;
run;

a. The scatter plot of the data obtained from SAS is below. The number of flycatchers killed, while larger when a greater proportion of nest boxes is occupied by tits, seems to be only modestly linearly associated with occupancy. A curve (rather than a straight line) might be a better model for these data. A straight line might be used as a first rough model.
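Behind the scenes, PROC REG computes the estimates from the usual least squares formulas b1 = Sxy/Sxx and b0 = ybar − b1·xbar. A minimal Python sketch of that computation follows; the data below are made up for illustration only (the condor data file is not reproduced here):

```python
# Least squares slope and intercept for simple linear regression,
# from the textbook formulas b1 = Sxy/Sxx and b0 = ybar - b1*xbar.
# These x and y values are made up for illustration; they are NOT the condor data.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 8.0]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
Sxx = sum((xi - xbar) ** 2 for xi in x)

b1 = Sxy / Sxx          # slope
b0 = ybar - b1 * xbar   # intercept
print(b1, b0)           # 1.9 0.0
```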

b. See the SAS output below. The estimated intercept and slope are, respectively, -3.05 and 0.108. Those values are interpreted as follows: the number of flycatchers killed can be expected to increase by about 0.108 when nest box tit occupancy increases by 1%. When nest box tit occupancy is 0%, the number of flycatchers killed can be expected to be -3.05. This of course is nonsensical from a subject-matter point of view and suggests that a curve rather than a straight line might be a better model.

                                          12:19 Friday, October 1, 2004   1
Birds data

The REG Procedure
Model: MODEL1
Dependent Variable: nkill

                        Analysis of Variance

                                 Sum of        Mean
Source             DF           Squares      Square    F Value    Pr > F
Model               1          19.11669    19.11669      16.03    0.0018
Error              12          14.31188     1.19266
Corrected Total    13          33.42857

Root MSE           1.09209    R-Square    0.5719
Dependent Mean     1.42857    Adj R-Sq    0.5362
Coeff Var         76.44618

                        Parameter Estimates

                         Parameter      Standard
Variable       DF         Estimate         Error    t Value    Pr > |t|
Intercept       1         -3.04686        1.1553      -2.64      0.0217
nestoccup       1          0.10766       0.02689       4.00      0.0018

Exercise 3.16

a. With n = 9, we have n - 2 = 7 degrees of freedom for error. Therefore, S² = SSE/(n - 2) = 0.219 / 7 = 0.0313.

b. S = 0.177 (the square root of S²). The value S is a measure of the variability of the observations around the fitted regression line. The tighter the observations cluster around the line, the smaller the value of S. We expect about 95% of the observations to lie within +/- 2S of their predicted values. In this example, we expect that most observations will lie within about +/- 0.35 of their predicted value.

Exercise 3.20

See the JMP output below.

a. The least squares line is: predicted heat transfer = 0.21 + 2.43 x unflooded area.

b. See the Regression Plot at the top of the JMP output.

c. SSE is 4.53 and S² is 0.206.

d. S is 0.454. We expect about 95% of the observations to lie within +/- 2S of their predicted value. Thus, most of our observations would be expected to lie within +/- 0.91 of their predicted value.
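The pieces of a regression printout like the PROC REG output for the birds data are tied together by a few identities: the sums of squares add up, R² = SSR/SST, Root MSE = sqrt(MSE), t = estimate/SE, and in simple regression F = t². A quick Python check using the printed values (so the identities hold only up to rounding):

```python
import math

# Quantities as reported in the PROC REG output for the birds data
ss_model, ss_error, ss_total = 19.11669, 14.31188, 33.42857
df_error = 12
est_slope, se_slope = 0.10766, 0.02689

mse = ss_error / df_error                    # mean squared error
t_slope = est_slope / se_slope               # t statistic for the slope
f_stat = (ss_model / 1) / mse                # ANOVA F statistic

print(round(ss_model + ss_error, 5))         # 33.42857  (SS decomposition)
print(round(ss_model / ss_total, 4))         # 0.5719    (R-square)
print(round(math.sqrt(mse), 5))              # 1.09209   (Root MSE)
print(round(t_slope, 2), round(f_stat, 2))   # 4.0 16.03 (and F = t**2 up to rounding)
```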

[JMP output for Exercise 3.20, Response H-transf, Whole Model. The Regression Plot, Leverage Plot, Actual by Predicted Plot, and Residual by Predicted Plot are not reproduced here; the Actual by Predicted Plot reports P < .0001, RSq = 0.84, RMSE = 0.454.]

Summary of Fit
RSquare                         0.837041
RSquare Adj                     0.829634
Root Mean Square Error          0.453826
Mean of Response                4.775
Observations (or Sum Wgts)     24

Analysis of Variance
Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
Model        1         23.273921        23.2739    113.0023      <.0001
Error       22          4.531079         0.2060
C. Total    23         27.805000

Lack Of Fit
Source         DF    Sum of Squares    Mean Square    F Ratio    Prob > F
Lack Of Fit    19         3.5210791        0.18532     0.5505      0.8217
Pure Error      3         1.0100000        0.33667
Total Error    22         4.5310791                   Max RSq      0.9637

Parameter Estimates
Term              Estimate    Std Error    t Ratio    Prob>|t|
Intercept           0.2139       0.4349       0.49      0.6317
unflooded area      2.4269       0.2283      10.63      <.0001

Effect Tests
Source            Nparm    DF    Sum of Squares    F Ratio    Prob > F
unflooded area        1     1         23.273921    113.0023     <.0001
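When there are replicated x values, JMP splits SSE into a lack-of-fit piece and a pure-error piece; the lack-of-fit F ratio and the "Max RSq" line follow directly from those sums of squares. A short Python check of the decomposition, using the sums of squares from the JMP report:

```python
# Lack-of-fit decomposition for the heat-transfer fit:
# SSE splits into lack-of-fit and pure-error sums of squares.
ss_lof, df_lof = 3.5210791, 19     # lack of fit
ss_pe,  df_pe  = 1.0100000, 3      # pure error
ss_total       = 27.805000         # corrected total SS

ss_error = ss_lof + ss_pe                       # total error SS
f_lof = (ss_lof / df_lof) / (ss_pe / df_pe)     # lack-of-fit F ratio
max_rsq = 1 - ss_pe / ss_total                  # best attainable R-square

print(round(ss_error, 7))   # 4.5310791
print(round(f_lof, 4))      # 0.5505
print(round(max_rsq, 4))    # 0.9637
```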

Exercise 3.22

There is sufficient evidence to conclude that sale price increases with assessed value for homes in Tampa. The slope is estimated to be 0.8845, and the test of the hypothesis that the slope is equal to zero versus the alternative that the slope is different from zero leads to rejecting H0 in favor of Ha at the 0.01 level (in fact, the p-value for the test is below 0.001). The 95% confidence interval for the true slope is 0.834 to 0.93. In English this can be interpreted as follows: with high probability, the interval 0.834 to 0.93 covers the true value of the slope. Thus, we are 95% confident that the true slope is no smaller than 0.834 and no larger than 0.93. The interval does not include the value zero, and thus we are quite confident that the true slope is not equal to zero. A narrower confidence interval can be obtained in one of two ways:

- By decreasing the coverage level from 95% to perhaps 90%, but that would decrease our confidence in our conclusions.
- By increasing the sample size to include data on more homes. If n increases, the standard error of the slope is likely to decrease, and thus the 95% CI for the true slope can be expected to be narrower.

Exercise 3.23

A 90% CI for the true slope is given by b1 +/- t(α/2, n-2) x S(b1). In the orange juice example, b1 = -0.0023 and S(b1) = 0.000905. For α = 0.10 and n - 2 = 22 degrees of freedom, we get: -0.0023 +/- 1.717 x 0.000905 = (-0.0038 to -0.00074). The 90% confidence interval does not cover 0. Thus we conclude that we have evidence that pectin content in orange juice is linearly associated with sweetness: the higher the pectin content, the lower the sweetness as measured by the index.
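The interval arithmetic for Exercise 3.23 can be reproduced with a short Python snippet (the critical value t(0.05, 22) = 1.717 is taken from the t table, as in the solution above):

```python
# 90% confidence interval for the slope in the orange juice regression:
# b1 +/- t(alpha/2, n-2) * S(b1)
b1 = -0.0023        # estimated slope from the JMP output
se_b1 = 0.000905    # standard error of the slope
t_crit = 1.717      # t(0.05, 22) from the t table

half_width = t_crit * se_b1
lower, upper = b1 - half_width, b1 + half_width

print(round(lower, 5), round(upper, 5))   # -0.00385 -0.00075
print(upper < 0)                          # True: the interval excludes zero
```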