Models with qualitative explanatory variables p216


Example: gen = 1 for female.

Row   gpa   hsm  gen
  1   3.32   10   0
  2   2.26    6   0
  3   2.35    8   0
  4   2.08    9   0
  5   3.38    8   0
  6   3.29   10   0
  7   3.21    8   0
  8   2.00    3   0
  9   3.18    9   0
 10   2.34    7   0
 11   3.08    9   0
...
218   2.86    9   1
219   3.32   10   1
220   2.07    9   1
221   0.85    7   1
222   1.86    7   1
223   2.59    5   1
224   2.28    9   1

The regression equation is
gpa = 0.903 + 0.207 hsm + 0.0269 gen

Predictor   Coef      SE Coef   T      P
Constant    0.9029    0.2447    3.69   0.000
hsm         0.20704   0.02885   7.18   0.000
gen         0.02693   0.09874   0.27   0.785

S = 0.7043   R-Sq = 19.1%   R-Sq(adj) = 18.3%

Analysis of Variance
Source           DF    SS        MS       F      P
Regression        2    25.847   12.923   26.06  0.000
Residual Error  221   109.616    0.496
Total           223   135.463
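Because gen is a 0/1 dummy, its coefficient (0.0269, t = 0.27, p = 0.785) estimates the female-minus-male difference in mean gpa at any fixed hsm; here there is no evidence of a gender difference once hsm is accounted for. A minimal sketch of the same fit in Python with statsmodels, added for reference (only the rows shown above are typed in, so the estimates match the output only when all 224 rows are used):

    import pandas as pd
    import statsmodels.formula.api as smf

    # A few of the rows shown above (the full data set has 224 rows).
    df = pd.DataFrame({
        "gpa": [3.32, 2.26, 2.35, 2.08, 3.38, 2.86, 3.32, 2.07, 1.86],
        "hsm": [10, 6, 8, 9, 8, 9, 10, 9, 7],
        "gen": [0, 0, 0, 0, 0, 1, 1, 1, 1],
    })

    # gen is already coded 0/1, so it enters the model like any other
    # predictor; with all 224 rows this reproduces
    # gpa = 0.903 + 0.207 hsm + 0.0269 gen.
    fit = smf.ols("gpa ~ hsm + gen", data=df).fit()
    print(fit.summary())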

If the qualitative variable has more than two levels (say, l levels), introduce l - 1 dummy variables.

Example
length  = length of stay in hospital (days)
nnurses = number of nurses
region  = one of 4 regions: NC, NE, S and W (W is the baseline, so NC, NE
          and S are the three dummy variables; a W row has all three 0)

Row  length  nnurses  region  NC  NE  S
  1    7.13     241     W      0   0  0
  2    8.82      52     NC     1   0  0
  3    8.34      54     S      0   0  1
  4    8.95     148     W      0   0  0
  5   11.20     151     NE     0   1  0
  6    9.76     106     NC     1   0  0
  7    9.68     129     S      0   0  1
  8   11.18     360     NC     1   0  0
  9    8.67     118     S      0   0  1
...
109   11.80     469     NC     1   0  0
110    9.50      46     S      0   0  1
111    7.70     136     W      0   0  0
112   17.94     407     NE     0   1  0
113    9.41      22     S      0   0  1

The regression equation is
length = 7.52 + 0.00401 nnurses + 1.42 NC + 2.80 NE + 1.03 S

Predictor   Coef       SE Coef    T       P
Constant    7.5218     0.4272     17.61   0.000
nnurses     0.004010   0.001083    3.70   0.000
NC          1.4178     0.4869      2.91   0.004
NE          2.8028     0.4988      5.62   0.000
S           1.0256     0.4744      2.16   0.033

S = 1.585   R-Sq = 33.7%   R-Sq(adj) = 31.3%

Analysis of Variance
Source           DF    SS        MS       F      P
Regression        4   138.000   34.500   13.74  0.000
Residual Error  108   271.211    2.511
Total           112   409.210
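Each dummy coefficient (e.g. NE = 2.80) estimates the difference in mean length of stay between that region and the baseline W, at a fixed number of nurses. A sketch of the same l - 1 dummy coding in Python with pandas, added for reference:

    import pandas as pd

    region = pd.Series(["W", "NC", "S", "W", "NE", "NC", "S"])

    # pd.get_dummies makes one 0/1 column per level; dropping W leaves
    # the l - 1 = 3 dummies NC, NE and S, so W is the baseline region.
    dummies = pd.get_dummies(region).drop(columns="W").astype(int)
    print(dummies)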

Predicted values at nnurses = 150, NC = 1, NE = 0, S = 0:

Fit     StDev Fit   95.0% CI            95.0% PI
9.541   0.283       (8.981, 10.102)     (6.350, 12.732)

MINITAB Commands
[The MINITAB menu steps shown at this point did not survive the transcription.]
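A sketch of reproducing the fit, confidence interval and prediction interval in Python with statsmodels, added for reference (only the rows typed in above are used, so the intervals match the output only with all 113 hospitals):

    import pandas as pd
    import statsmodels.formula.api as smf

    # The first rows shown above (the full data set has 113 hospitals).
    df = pd.DataFrame({
        "length":  [7.13, 8.82, 8.34, 8.95, 11.20, 9.76, 9.68, 11.18, 8.67],
        "nnurses": [241, 52, 54, 148, 151, 106, 129, 360, 118],
        "NC":      [0, 1, 0, 0, 0, 1, 0, 1, 0],
        "NE":      [0, 0, 0, 0, 1, 0, 0, 0, 0],
        "S":       [0, 0, 1, 0, 0, 0, 1, 0, 1],
    })

    fit = smf.ols("length ~ nnurses + NC + NE + S", data=df).fit()

    new = pd.DataFrame({"nnurses": [150], "NC": [1], "NE": [0], "S": [0]})
    pred = fit.get_prediction(new)

    # mean_ci_* columns: 95% CI for the mean response; obs_ci_* columns:
    # 95% PI for one new hospital. With all 113 rows this reproduces
    # Fit = 9.541, CI (8.981, 10.102) and PI (6.350, 12.732).
    print(pred.summary_frame(alpha=0.05))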


Example
The dummy variables Blue, Green and Lemon code the trap color; rows with
all three equal to 0 are the baseline color.

Row  Blue  Green  Lemon  Insects trapped
  1    0     0      1        45
  2    0     0      1        59
  3    0     0      1        48
  4    0     0      1        46
  5    0     0      1        38
  6    0     0      1        47
  7    0     0      0        21
  8    0     0      0        12
  9    0     0      0        14
 10    0     0      0        17
 11    0     0      0        13
 12    0     0      0        17
 13    0     1      0        37
 14    0     1      0        32
 15    0     1      0        15
 16    0     1      0        25
 17    0     1      0        39
 18    0     1      0        41
 19    1     0      0        16
 20    1     0      0        11
 21    1     0      0        20
 22    1     0      0        21
 23    1     0      0        14
 24    1     0      0         7

Test whether some colors are more attractive than others to beetles.

The regression equation is
Insects trapped = 15.7 - 0.83 Blue + 15.8 Green + 31.5 Lemon

Predictor   Coef     SE Coef   T       P
Constant    15.667   2.770      5.66   0.000
Blue        -0.833   3.917     -0.21   0.834
Green       15.833   3.917      4.04   0.001
Lemon       31.500   3.917      8.04   0.000

S = 6.784   R-Sq = 82.1%   R-Sq(adj) = 79.4%

Analysis of Variance
Source           DF    SS       MS       F      P
Regression        3   4218.5   1406.2   30.55  0.000
Residual Error   20    920.5     46.0
Total            23   5139.0
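This dummy-variable regression is a one-way ANOVA in disguise: the intercept 15.667 is the mean count for the baseline color, each dummy coefficient is that color's mean minus the baseline mean, and the overall regression F = 30.55 is exactly the ANOVA F for H0: all four color means are equal. A sketch in Python, added for reference (the original does not name the baseline color, so it is labeled "Base" here):

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    trapped = [45, 59, 48, 46, 38, 47,    # Lemon, rows 1-6
               21, 12, 14, 17, 13, 17,    # baseline color, rows 7-12
               37, 32, 15, 25, 39, 41,    # Green, rows 13-18
               16, 11, 20, 21, 14, 7]     # Blue, rows 19-24
    color = ["Lemon"]*6 + ["Base"]*6 + ["Green"]*6 + ["Blue"]*6
    df = pd.DataFrame({"trapped": trapped, "color": color})

    # Treatment('Base') makes the unnamed baseline color the reference
    # level, reproducing -0.833 (Blue), 15.833 (Green), 31.5 (Lemon).
    fit = smf.ols("trapped ~ C(color, Treatment('Base'))", data=df).fit()
    print(fit.params)
    print(anova_lm(fit))   # F = 30.55 on (3, 20) df, as in the output above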

A test for comparing nested models p231

Definition: Two models are nested if one model contains all the terms of
the other and at least one additional term. The model with more terms is
called the complete (or full) model; the model with fewer terms is called
the reduced (or restricted) model.

Example 4.10 p223 (data from Table 4.4 p214)

Row    wt   distance   cost   wt*dist     wt**2   dist**2
  1  5.90       47      2.6    277.3    34.8100     2209
  2  3.20      145      3.9    464.0    10.2400    21025
  3  4.40      202      8.0    888.8    19.3600    40804
  4  6.60      160      9.2   1056.0    43.5600    25600
  5  0.75      280      4.4    210.0     0.5625    78400
  6  0.70       80      1.5     56.0     0.4900     6400
  7  6.50      240     14.5   1560.0    42.2500    57600
  8  4.50       53      1.9    238.5    20.2500     2809
  9  0.60      100      1.0     60.0     0.3600    10000
 10  7.50      190     14.0   1425.0    56.2500    36100
 11  5.10      240     11.0   1224.0    26.0100    57600
 12  2.40      209      5.0    501.6     5.7600    43681
 13  0.30      160      2.0     48.0     0.0900    25600
 14  6.20      115      6.0    713.0    38.4400    13225
 15  2.70       45      1.1    121.5     7.2900     2025
 16  3.50      250      8.0    875.0    12.2500    62500
 17  4.10       95      3.3    389.5    16.8100     9025
 18  8.10      160     12.1   1296.0    65.6100    25600
 19  7.00      260     15.5   1820.0    49.0000    67600
 20  1.10       90      1.7     99.0     1.2100     8100

a) Fit a complete second-order model.

The regression equation is
cost = 0.827 - 0.609 wt + 0.00402 dist + 0.00733 wt*dist + 0.0898 wt**2
       + 0.000015 dist**2

Predictor   Coef         SE Coef      T       P
Constant    0.8270       0.7023        1.18   0.259
wt          -0.6091      0.1799       -3.39   0.004
dist        0.004021     0.007998      0.50   0.623
wt*dist     0.0073271    0.0006374    11.49   0.000
wt**2       0.08975      0.02021       4.44   0.001
dist**2     0.00001507   0.00002243    0.67   0.513

S = 0.4428   R-Sq = 99.4%   R-Sq(adj) = 99.2%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression        5   449.341   89.868   458.39  0.000
Residual Error   14     2.745    0.196
Total            19   452.086

Source    DF  Seq SS
wt         1  270.553
dist       1  143.631
wt*dist    1   31.268
wt**2      1    3.800
dist**2    1    0.088

b) Test the hypothesis that the terms wt**2 and dist**2 can be dropped
from the model.

The regression equation is
cost = -0.141 + 0.019 wt + 0.00772 distance + 0.00780 wt*dist

Predictor   Coef        SE Coef     T       P
Constant    -0.1405     0.6481     -0.22    0.831
wt          0.0191      0.1582      0.12    0.905
distance    0.007721    0.003906    1.98    0.066
wt*dist     0.0077957   0.0008977   8.68    0.000

S = 0.6439   R-Sq = 98.5%   R-Sq(adj) = 98.3%

Analysis of Variance
Source           DF    SS       MS       F       P
Regression        3   445.45   148.48   358.15  0.000
Residual Error   16     6.63     0.41
Total            19   452.09

Exercises: 5.14 p270, 5.15 p271
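The two outputs above contain everything needed for the nested-model (partial) F test; this worked computation is added for completeness. The complete model has SSE = 2.745 on 14 df (so MSE = 0.196) and the reduced model has SSE = 6.63 on 16 df:

F = [(SSE_reduced - SSE_complete) / (number of terms dropped)] / MSE_complete
  = [(6.63 - 2.745) / 2] / 0.196
  = 1.94 / 0.196
  ≈ 9.9

Since 9.9 exceeds the F(2, 14) critical value of about 3.74 at the 5% level, we reject H0 and conclude that wt**2 and dist**2 should not both be dropped (the evidence comes mainly from wt**2, whose individual t is 4.44).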

Examples (STA221 Apr 98 Final Exam)

The regression equation is
Weight = 0.0265 - 0.0729 Diameter + 0.0628 diam**2

Predictor   Coef       SE Coef    T       P
Constant    0.02652    0.02133     1.24   0.240
Diameter    -0.07287   0.01553    -4.69   0.001
diam**2     0.062755   0.002609   24.06   0.000

S = 0.01117   R-Sq = 99.9%   R-Sq(adj) = 99.9%

Analysis of Variance
Source           DF    SS        MS        F        P
Regression        2   1.32300   0.66150   5299.38  0.000
Residual Error   11   0.00137   0.00012
Total            13   1.32437

Source     DF  Seq SS
Diameter    1  1.25077
diam**2     1  0.07223

The test of significance for the contribution of the second-order term in
diameter has an F-value of (to the nearest 50)
A) 7600   B) 5300   C) 2650   D) 600   E) 350
(a worked check follows the next output block)

The regression equation is
Weight = -0.237 + 0.447 Diameter - 0.150 Height

Predictor   Coef       SE Coef    T       P
Constant    -0.23658   0.06340    -3.73   0.003
Diameter    0.44689    0.03921    11.40   0.000
Height      -0.15043   0.03622    -4.15   0.002

S = 0.05104   R-Sq = 97.8%   R-Sq(adj) = 97.4%

Analysis of Variance
Source           DF    SS        MS        F       P
Regression        2   1.29571   0.64786   248.65  0.000
Residual Error   11   0.02866   0.00261
Total            13   1.32437
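Worked check for the multiple-choice question above (added): when diam**2 is the last term entered, its partial F is its Seq SS divided by the MSE, F = 0.07223 / (0.00137/11) ≈ 580, which is also the square of its t-ratio (24.06² ≈ 579). To the nearest 50 this is 600, choice D.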

The regression equation is
Weight = 0.0216 - 0.151 Diameter + 0.0467 Height + 0.0721 diam**2
         - 0.00290 ht**2

Predictor   Coef        SE Coef    T       P
Constant    0.02156     0.03595     0.60   0.563
Diameter    -0.15141    0.04792    -3.16   0.012
Height      0.04666     0.03762     1.24   0.246
diam**2     0.072104    0.006467   11.15   0.000
ht**2       -0.002898   0.004179   -0.69   0.505

S = 0.01057   R-Sq = 99.9%   R-Sq(adj) = 99.9%

Analysis of Variance
Source           DF    SS        MS        F        P
Regression        4   1.32336   0.33084   2958.44  0.000
Residual Error    9   0.00101   0.00011
Total            13   1.32437

Source     DF  Seq SS
Diameter    1  1.25077
Height      1  0.04494
diam**2     1  0.02760
ht**2       1  0.00005

[Figure: Residuals versus Weight (response is Weight)]

[Figure: Residuals versus Height (response is Weight)]

[Figure: Residuals versus the fitted values (response is Weight)]

[Figure: Normal probability plot of the residuals (response is Weight)]

[Figure: Histogram of the residuals (response is Weight)]

The regression equation is
Weight = 0.117 + 0.0982 Diameter - 0.159 Height + 0.0513 diam*ht

Predictor   Coef       SE Coef    T       P
Constant    0.11742    0.08189     1.43   0.182
Diameter    0.09820    0.07567     1.30   0.224
Height      -0.15942   0.02090    -7.63   0.000
diam*ht     0.05133    0.01063     4.83   0.001

S = 0.02934   R-Sq = 99.4%   R-Sq(adj) = 99.2%

Analysis of Variance
Source           DF    SS        MS        F       P
Regression        3   1.31577   0.43859   509.61  0.000
Residual Error   10   0.00861   0.00086
Total            13   1.32437

7) Which of the following are true? (Worked checks for I and II follow
below.)
I) If we test the extra contribution of both height and height squared to
the model with only diameter and diameter squared, the calculated
F-statistic would be less than 2.
II) If we test the extra contribution of adding both height squared and
diameter squared to the first-order model with just height and diameter,
the calculated F-statistic is less than 200.
III) If we assume the appropriateness of the model with diameter, height
and their product, we see that the effect on dry weight of an increase in
diameter is not independent of the height of the trees.
IV) Residual plots indicate problems with the second-order model
containing diameter, height and their respective squares.
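Worked checks for statements I and II (added), using the partial F statistic and the outputs above. The full second-order model has SSE = 0.00101 on 9 df, so its MSE is about 0.00011.

I) Adding Height and ht**2 to the model in Diameter and diam**2
   (SSE = 0.00137 on 11 df):
   F = [(0.00137 - 0.00101) / 2] / (0.00101 / 9) ≈ 1.6,
   which is less than 2, so I is true.

II) Adding diam**2 and ht**2 to the first-order model in Diameter and
    Height (SSE = 0.02866 on 11 df):
    F = [(0.02866 - 0.00101) / 2] / (0.00101 / 9) ≈ 123,
    which is less than 200, so II is true.

III) is also true: the diam*ht interaction is significant (t = 4.83,
p = 0.001), so the effect of diameter on weight changes with height.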

Sequential Sums of Squares

Regression Analysis: cost versus wt, distance, wt*dist, wt**2, dist**2

The regression equation is
cost = 0.827 - 0.609 wt + 0.00402 distance + 0.00733 wt*dist
       + 0.0898 wt**2 + 0.000015 dist**2

Predictor   Coef         SE Coef      T       P
Constant    0.8270       0.7023        1.18   0.259
wt          -0.6091      0.1799       -3.39   0.004
distance    0.004021     0.007998      0.50   0.623
wt*dist     0.0073271    0.0006374    11.49   0.000
wt**2       0.08975      0.02021       4.44   0.001
dist**2     0.00001507   0.00002243    0.67   0.513

S = 0.442778   R-Sq = 99.4%   R-Sq(adj) = 99.2%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression        5   449.341   89.868   458.39  0.000
Residual Error   14     2.745    0.196
Total            19   452.086

Source     DF  Seq SS
wt          1  270.553
distance    1  143.631
wt*dist     1   31.268
wt**2       1    3.800
dist**2     1    0.088

Regression Analysis: cost versus wt

The regression equation is
cost = 0.28 + 1.49 wt

Predictor   Coef     SE Coef   T      P
Constant    0.276    1.368     0.20   0.842
wt          1.4932   0.2883    5.18   0.000

S = 3.17571   R-Sq = 59.8%   R-Sq(adj) = 57.6%

Analysis of Variance
Source           DF    SS       MS       F      P
Regression        1   270.55   270.55   26.83  0.000
Residual Error   18   181.53    10.09
Total            19   452.09

Regression Analysis: cost versus wt, distance

The regression equation is
cost = -4.67 + 1.29 wt + 0.0369 distance

Predictor   Coef       SE Coef    T       P
Constant    -4.6728    0.8911    -5.24    0.000
wt          1.2924     0.1378     9.38    0.000
distance    0.036936   0.004602   8.03    0.000

S = 1.49314   R-Sq = 91.6%   R-Sq(adj) = 90.6%

Analysis of Variance
Source           DF    SS       MS       F      P
Regression        2   414.18   207.09   92.89  0.000
Residual Error   17    37.90     2.23
Total            19   452.09

Regression Analysis: cost versus wt, distance, wt*dist

The regression equation is
cost = -0.141 + 0.019 wt + 0.00772 distance + 0.00780 wt*dist

Predictor   Coef        SE Coef     T       P
Constant    -0.1405     0.6481     -0.22    0.831
wt          0.0191      0.1582      0.12    0.905
distance    0.007721    0.003906    1.98    0.066
wt*dist     0.0077957   0.0008977   8.68    0.000

S = 0.643880   R-Sq = 98.5%   R-Sq(adj) = 98.3%

Analysis of Variance
Source           DF    SS       MS       F       P
Regression        3   445.45   148.48   358.15  0.000
Residual Error   16     6.63     0.41
Total            19   452.09

Regression Analysis: cost versus wt, distance, wt*dist, wt**2

The regression equation is
cost = 0.475 - 0.578 wt + 0.00908 distance + 0.00726 wt*dist + 0.0867 wt**2

Predictor   Coef        SE Coef     T       P
Constant    0.4747      0.4585      1.04    0.317
wt          -0.5782     0.1707     -3.39    0.004
distance    0.009078    0.002654    3.42    0.004
wt*dist     0.0072587   0.0006176  11.75    0.000
wt**2       0.08674     0.01934     4.49    0.000

S = 0.434604   R-Sq = 99.4%   R-Sq(adj) = 99.2%

Analysis of Variance
Source           DF    SS       MS       F       P
Regression        4   449.25   112.31   594.62  0.000
Residual Error   15     2.83    0.19
Total            19   452.09

Regression Analysis: cost versus wt, distance, wt*dist, wt**2, dist**2

The regression equation is
cost = 0.827 - 0.609 wt + 0.00402 distance + 0.00733 wt*dist
       + 0.0898 wt**2 + 0.000015 dist**2

Predictor   Coef         SE Coef      T       P
Constant    0.8270       0.7023        1.18   0.259
wt          -0.6091      0.1799       -3.39   0.004
distance    0.004021     0.007998      0.50   0.623
wt*dist     0.0073271    0.0006374    11.49   0.000
wt**2       0.08975      0.02021       4.44   0.001
dist**2     0.00001507   0.00002243    0.67   0.513

S = 0.442778   R-Sq = 99.4%   R-Sq(adj) = 99.2%

Analysis of Variance
Source           DF    SS        MS       F       P
Regression        5   449.341   89.868   458.39  0.000
Residual Error   14     2.745    0.196
Total            19   452.086
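The five fits above verify the sequential sums of squares (added check): each Seq SS in the full model equals the increase in the Regression SS when its term is added to the terms already in the model:

distance:  414.18  - 270.55 = 143.63   (Seq SS 143.631)
wt*dist:   445.45  - 414.18 =  31.27   (Seq SS  31.268)
wt**2:     449.25  - 445.45 =   3.80   (Seq SS   3.800)
dist**2:   449.341 - 449.25 ≈   0.09   (Seq SS   0.088; the small
                                        discrepancy is rounding)

and the Seq SS for wt, 270.553, is simply the Regression SS of the first fit.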