Simple Linear Regression


Steps for Regression
- Make a scatterplot. Does it make sense to fit a line?
- Check the residual plot (residuals vs. X). Are there any patterns?
- Check the histogram of residuals. Is it approximately Normal?
- Check model utility.
- Make inferences.

Example
In order to design an efficient incinerator of municipal waste, information about the energy content of types of waste is necessary.

Data

% plastics   Energy content   % plastics   Energy content
by weight    (kcal/kg)        by weight    (kcal/kg)
18.69         947             18.28        1334
19.43        1407             21.41        1155
19.24        1452             25.11        1453
22.64        1553             21.04        1278
16.54         989             17.99        1153
21.44        1162             18.73        1225
19.53        1466             18.49        1237
23.97        1656             22.08        1327
21.45        1254             14.28        1229
20.34        1336             17.74        1205
17.03        1097             20.54        1221
21.03        1266             18.25        1138
20.49        1401             19.09        1295
20.45        1223             21.25        1391
18.81        1216             21.62        1372

[Figure: Scatterplot]

[Figure: Residuals vs. X]
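The least squares line for this example can be computed directly from the data table. The following is a minimal sketch in plain Python, not the Minitab procedure the slides use; the helper name `fit_line` is made up for illustration. It returns the intercept and slope and also builds the residuals that the residual plot and histogram are based on.

```python
# Data copied from the table above (30 observations, read row by row).
plastics = [18.69, 18.28, 19.43, 21.41, 19.24, 25.11, 22.64, 21.04, 16.54, 17.99,
            21.44, 18.73, 19.53, 18.49, 23.97, 22.08, 21.45, 14.28, 20.34, 17.74,
            17.03, 20.54, 21.03, 18.25, 20.49, 19.09, 20.45, 21.25, 18.81, 21.62]
energy = [947, 1334, 1407, 1155, 1452, 1453, 1553, 1278, 989, 1153,
          1162, 1225, 1466, 1237, 1656, 1327, 1254, 1229, 1336, 1205,
          1097, 1221, 1266, 1138, 1401, 1295, 1223, 1391, 1216, 1372]


def fit_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(plastics, energy)

# Residual = observed value minus fitted value; these feed the residual plots.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(plastics, energy)]

print(f"energy = {intercept:.1f} + {slope:.2f} * plastics")
```

The printed coefficients should agree, up to rounding, with the regression equation reported in the Minitab output for this example.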

[Figure: Histogram of Residuals]

[Figure: Residual Normal Probability Plot]

Regression Analysis: Check Model Utility

The regression equation is
Energy Content (kcal/kg) = 469 + 40.8 % plastics by weight

Predictor    Coef     StDev    T       P
Constant     468.9    211.6    2.22    0.035
% plasti     40.82    10.57    3.86    0.001

S = 126.8    R-Sq = 34.8%    R-Sq(adj) = 32.4%

Check Model Utility (cont.)

Analysis of Variance

Source          DF    SS        MS        F        P
Regression       1    239735    239735    14.92    0.001
Resid. Error    28    449975     16071
Total           29    689710

Inference
- What is the proportion of variation of energy content explained by the percentage of plastic in the waste?
- What is the correlation between energy content and percentage of plastic in the waste?
- For every percent increase in plastic in municipal waste, how much of an increase or decrease do you expect in energy content?

Inference (cont.)
- Can we make inferences about the energy content of waste that is 75% plastic by weight?
- What is the average energy content expected at 22% plastic by weight? (Give interval.)
- What is the expected energy content of the next observation that contains 22% plastic waste? (Give interval.)
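Most of these questions can be answered from the printed output alone. The sketch below works only from the reported numbers and is one possible way to do the arithmetic, not the slides' own solution. Two inputs are assumptions that do not appear in the output: the mean of the x values (about 19.90, computed from the data table) and the t table value 2.048 for 28 degrees of freedom at the 95% level.

```python
import math

# Values copied from the Minitab output.
b0, b1 = 468.9, 40.82               # intercept and slope
se_b1 = 10.57                       # standard error of the slope
s = 126.8                           # residual standard deviation
ss_reg, ss_total = 239735, 689710   # from the ANOVA table
n = 30

# Assumptions (not printed in the output):
x_bar = 19.90     # mean % plastics, computed from the data table
t_crit = 2.048    # t table value, upper-tail area 0.025, 28 df

r_squared = ss_reg / ss_total                 # proportion of variation explained
r = math.copysign(math.sqrt(r_squared), b1)   # correlation takes the sign of the slope
s_xx = (s / se_b1) ** 2                       # because se_b1 = s / sqrt(Sxx)

x0 = 22.0
fit = b0 + b1 * x0
se_fit = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)   # for the mean response
se_pred = math.sqrt(s ** 2 + se_fit ** 2)                  # for a single new observation

ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)

print(f"R-sq = {r_squared:.3f}, r = {r:.3f}, slope = {b1} kcal/kg per % plastic")
print(f"fit at 22%: {fit:.1f}")
print(f"95% CI for the mean: ({ci[0]:.0f}, {ci[1]:.0f})")
print(f"95% PI for the next observation: ({pi[0]:.0f}, {pi[1]:.0f})")
```

The question about 75% plastic is different in kind: the observed percentages run only from 14.28 to 25.11, so 75% lies far outside the data and any prediction there would be an extrapolation.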

Transformed x's and y's (Section 13.2)

What if it doesn't pass the residual test?
- We can transform the x's or the y's and fit a linear regression line to the new x and new y.
- A function relating y to x is intrinsically linear if, by means of a transformation on x and/or y, the function can be expressed as $y' = \beta_0 + \beta_1 x'$, where $y'$ is the transformed dependent variable and $x'$ is the transformed independent variable.

Types of Transformations

Function                     Transformation(s)                Linear form
$y = \alpha e^{\beta x}$     $y' = \ln(y)$                    $y' = \ln(\alpha) + \beta x$
$y = \alpha x^{\beta}$       $y' = \ln(y)$, $x' = \ln(x)$     $y' = \ln(\alpha) + \beta x'$
$y = \alpha + \beta \log(x)$ $x' = \log(x)$                   $y = \alpha + \beta x'$
$y = \alpha + \beta (1/x)$   $x' = 1/x$                       $y = \alpha + \beta x'$
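As a short worked derivation, the first two rows of the table follow from taking natural logarithms of both sides. This is standard algebra rather than anything specific to these slides; the primes mark transformed variables.

```latex
\begin{align*}
y = \alpha e^{\beta x}
  &\;\Longrightarrow\; \ln y = \ln\alpha + \beta x
  &&\text{so } y' = \ln y,\ \text{intercept } \ln\alpha,\ \text{slope } \beta \\
y = \alpha x^{\beta}
  &\;\Longrightarrow\; \ln y = \ln\alpha + \beta \ln x
  &&\text{so } y' = \ln y,\ x' = \ln x,\ \text{intercept } \ln\alpha,\ \text{slope } \beta
\end{align*}
```

In both cases the fitted intercept estimates $\ln\alpha$, so $\alpha$ is recovered as $e^{\text{intercept}}$.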

Notes
1. We can estimate the $\hat\beta$ values by using the same least squares regression formulas.
2. The $r^2$ refers to the proportion of variation in the new y's that is explained by the new x's.
3. To make a CI and PI, the transformed errors need to be approximately Normal.

Use trial and error or theory to find the appropriate transformation.

Example
A tortilla chip maker would like to make the optimal tortilla chip. They would like to have the chip that has the most appealing texture.
X = frying time (sec)
Y = % moisture content

X     5     10    15    20    25    30    45    60
Y    16.3   9.7   8.1   4.2   3.4   2.9   1.9   1.3

[Figure: Scatterplot (original x's and y's)]

[Figure: Residuals]

[Figure: Scatterplot (ln(x), y)]

[Figure: Scatterplot (1/x, y)]

[Figure: Scatterplot (x, ln(y))]

[Figure: Scatterplot (ln(x), ln(y))]

[Figure: Scatterplot (log(x), log(y))]

Decision Time
Two reasonable choices:
- (ln(x), ln(y))
- (1/x, y)
Look at the Normal probability plots.

Regression Output

Regression Analysis: ln(%) versus ln(fry)

The regression equation is
ln(%) = 4.64 - 1.05 ln(fry)

Predictor    Coef        SE Coef    T         P
Constant      4.6384     0.2110      21.98    0.000
ln(fry)      -1.04920    0.06786    -15.46    0.000

S = 0.1449    R-Sq = 97.6%    R-Sq(adj) = 97.1%

Output (cont.)

Analysis of Variance

Source      DF    SS        MS        F         P
Regres       1    5.0199    5.0199    239.06    0.000
Resi Err     6    0.1260    0.0210
Total        7    5.1458
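The log-log fit can be reproduced from the tortilla chip data with a few lines of plain Python. This is a sketch under the assumption that natural logs are used on both variables, as the output labels suggest; the helper `predict_moisture` is a made-up name for illustration, and it returns point predictions only, with no interval attached.

```python
import math

# Tortilla chip data from the example: frying time (sec) and % moisture content.
fry_time = [5, 10, 15, 20, 25, 30, 45, 60]
moisture = [16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 1.9, 1.3]

# Transform both variables, then use the ordinary least squares formulas.
ln_x = [math.log(t) for t in fry_time]
ln_y = [math.log(m) for m in moisture]

n = len(ln_x)
x_bar = sum(ln_x) / n
y_bar = sum(ln_y) / n
slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(ln_x, ln_y))
         / sum((a - x_bar) ** 2 for a in ln_x))
intercept = y_bar - slope * x_bar


def predict_moisture(seconds):
    """Back-transform the fitted line: y = exp(intercept) * seconds ** slope."""
    return math.exp(intercept + slope * math.log(seconds))


print(f"ln(%) = {intercept:.2f} + ({slope:.2f}) ln(fry)")
print(f"point prediction at 61 sec: {predict_moisture(61):.2f}% moisture")
```

Predictions are made on the log scale and then exponentiated, so any interval built on the log scale has to be back-transformed endpoint by endpoint in the same way.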

Predictions
Predict the moisture content when the frying time is 61 sec. and 62 sec. You should be at least 94% confident in all of the statements.

Polynomial Regression (Section 13.3)

What do we do if the scatterplot is not linear?
- In Section 13.2, we fixed this with intrinsically linear transformations.
- If the plot has any peaks or valleys, the transformations will not work.
- You can now try to fit a polynomial with $X^2$ and $X^3$ terms.
- You could fit higher order terms, but it is strongly discouraged!

Equation
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \dots + e$
- The errors have to have a mean of zero and a constant variance.
- The Beta terms are estimated using the method of least squares.
- We will always use Minitab to fit the curve.

Example
A company wants to improve the fermentation process of their malt liquor. Below is the data.
X = fermentation time (days)
Y = glucose concentration

X     1     2     3     4     5     6     7     8
Y    74    54    52    51    52    53    58    71

[Figure: Scatterplot]
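The slides fit the polynomial in Minitab. As a rough stand-in, the sketch below estimates the Beta terms for the fermentation data by solving the least squares normal equations in plain Python. The function name `fit_polynomial` is an illustrative choice, and solving the normal equations directly is adequate for a small, low-degree problem like this one but is not how statistical packages usually do it.

```python
# Fermentation data from the example: time (days) and glucose concentration.
days = [1, 2, 3, 4, 5, 6, 7, 8]
glucose = [74, 54, 52, 51, 52, 53, 58, 71]


def fit_polynomial(x, y, degree):
    """Least squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    k = degree + 1
    # Augmented matrix [X'X | X'y], built from power sums of x.
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)]
         + [sum((xi ** i) * yi for xi, yi in zip(x, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [v - factor * p for v, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


b0, b1, b2 = fit_polynomial(days, glucose, 2)
print(f"y = {b0:.1f} + ({b1:.1f}) x + {b2:.2f} x^2")
```

Calling `fit_polynomial(days, glucose, 3)` would add the cubic term in the same way, which makes it easy to compare fits with and without it.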

[Figure: Residual Plot]

Fixes
- Try fitting a squared term.
- If that doesn't work, add a cubic term.
- Keep going until you have a good fit.
- You want to have the fewest terms possible.

[Figure: Fitted Line with Squared Term]

[Figure: Fitted Line with Cubed Term]

What is $R^2$ and $R^2$ (adjusted)?
- $R^2$ measures how much of the variation of y is explained by all of the x terms.
- $R^2$ never decreases when you add a term, so $R^2$ with a cubic term would appear to be automatically better than with only a squared term.
- However, simplicity is better, so $R^2$ (adjusted) takes out the automatic inflation.
- Moral: If you have multiple x's, use $R^2$ (adjusted). If you have a single x, use $R^2$.

Regression Output

Regression Analysis: y versus x, xsquared

The regression equation is
y = 84.5 - 15.9 x + 1.77 xsquared

Predictor    Coef       SE Coef    T        P
Constant      84.482    4.904      17.23    0.000
x            -15.875    2.500      -6.35    0.001
xsquared       1.7679   0.2712      6.52    0.001

S = 3.515    R-Sq = 89.5%    R-Sq(adj) = 85.3%

Model utility test?
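The adjustment can be checked by hand. The slides do not print the formula, so the sketch below assumes the usual textbook definition, which penalizes $R^2$ by the ratio of total to error degrees of freedom, with n observations and k predictor terms. The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k predictor terms (intercept not counted)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Quadratic fit from the output: R-Sq = 89.5%, n = 8 observations, k = 2 terms.
adj = adjusted_r_squared(0.895, n=8, k=2)
print(f"R-Sq(adj) = {adj:.1%}")
```

With these inputs the result matches the 85.3% that Minitab reports, which illustrates how much of the 89.5% is credited to simply having a second term in a data set of only eight points.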

ANOVA Table

Analysis of Variance

Source         DF    SS        MS        F        P
Regress         2    525.11    262.55    21.25    0.004
Resid Error     5     61.77     12.35
Total           7    586.87

Source      DF    Seq SS
x            1      0.05
xsquared     1    525.05

You have to use this table now to conduct the model utility test.

Inferences
Test Statistic Formula:
Test for the Beta Terms:

Confidence Intervals
Confidence Interval Formula For the Beta Terms:
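The formulas on these slides did not survive in the transcription. For reference, the standard textbook forms for a model with k predictor terms and n observations are sketched below; they are assumed to be what the slides showed, not recovered from them.

```latex
\begin{align*}
\text{Model utility test of } H_0\colon \beta_1 = \dots = \beta_k = 0:\quad
  & F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/\bigl(n-(k+1)\bigr)} = \frac{\mathrm{MSR}}{\mathrm{MSE}} \\
\text{Test of } H_0\colon \beta_i = 0:\quad
  & t = \frac{\hat\beta_i}{s_{\hat\beta_i}}, \qquad \mathrm{df} = n-(k+1) \\
\text{Confidence interval for } \beta_i:\quad
  & \hat\beta_i \pm t_{\alpha/2,\,n-(k+1)} \cdot s_{\hat\beta_i}
\end{align*}
```

These are consistent with the printed output: for the squared term, 1.7679 / 0.2712 is about 6.52, and 262.55 / 12.35 agrees with the reported F of 21.25 up to rounding.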

Intervals
Confidence Interval Formula:
Prediction Interval Formula:

Additional Output

Obs    x       y        Fit      SE Fit    Residual    St Resid
1      1.00    74.00    70.38    2.96       3.62        1.91
2      2.00    54.00    59.80    1.86      -5.80       -1.95
3      3.00    52.00    52.77    1.69      -0.77       -0.25
4      4.00    51.00    49.27    1.86       1.73        0.58
5      5.00    52.00    49.30    1.86       2.70        0.90
6      6.00    53.00    52.88    1.69       0.12        0.04
7      7.00    58.00    59.98    1.86      -1.98       -0.66
8      8.00    71.00    70.62    2.96       0.38        0.20

One Additional Note
- Many statisticians believe that it is good practice to center the x's before you fit the equations.
- This helps tremendously with round-off error.
- You then have to be really careful to transform everything back to its original state.
- We are not going to do an example of this.
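The Fit and SE Fit columns are enough to build both kinds of interval by hand. The sketch below assumes the usual forms, fit ± t × (SE Fit) for the mean response and fit ± t × sqrt(S² + (SE Fit)²) for a new observation. The value 2.571 is the t table value for 95% confidence with 5 error degrees of freedom, and S = 3.515 is taken from the regression output for the quadratic model. The function name is illustrative.

```python
import math

s = 3.515        # residual standard deviation from the quadratic fit
t_crit = 2.571   # assumption: t table value, upper-tail area 0.025, 5 df


def intervals(fit, se_fit):
    """Return (confidence interval, prediction interval) at one x value."""
    ci_half = t_crit * se_fit
    pi_half = t_crit * math.sqrt(s ** 2 + se_fit ** 2)
    return (fit - ci_half, fit + ci_half), (fit - pi_half, fit + pi_half)


# Observation 1 in the additional output: x = 1, Fit = 70.38, SE Fit = 2.96.
ci, pi = intervals(70.38, 2.96)
print(f"95% CI for the mean at x = 1: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for a new y at x = 1:  ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always the wider of the two because it adds the variability of a single new observation on top of the uncertainty in the fitted curve.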