Steps for Regression: Simple Linear Regression

1. Make a scatterplot. Does it make sense to fit a line?
2. Check the residual plot (residuals vs. x). Are there any patterns?
3. Check the histogram of residuals. Is it normal?
4. Check model utility.
5. Make inferences.

Example

In order to design an efficient incinerator for municipal waste, information about the energy content of different types of waste is necessary.

Data (x = % plastics by weight, y = energy content in kcal/kg)

% plastics   Energy (kcal/kg)   % plastics   Energy (kcal/kg)
18.69         947               18.28        1334
19.43        1407               21.41          11
19.24         142                2.11         143
22.64          13               21.04        1278
16.4          989               17.99         113
21.44        1162               18.73         122
19.3         1466               18.49        1237
23.97         166               22.08        1327
21.4          124               14.28        1229
20.34        1336               17.74         120
17.03        1097               20.4         1221
21.03        1266               18.2         1138
20.49        1401               19.09         129
20.4         1223               21.2         1391
18.81        1216               21.62        1372

[Figures: Scatterplot; Residuals vs. X]
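Steps 2 and 3 both start from the least-squares line and its residuals. The Python sketch below computes them from two lists of numbers. `fit_line` and `residuals` are hypothetical helper names chosen for illustration (the slides do the fitting in Minitab), and the usage lines rely on made-up toy numbers rather than the waste data.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i); plot these against x for step 2."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Toy points lying exactly on y = 1 + 2x (illustrative only).
    b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(b0, b1, residuals([1, 2, 3, 4], [3, 5, 7, 9], b0, b1))
```

For the toy points, `fit_line` returns `(1.0, 2.0)` and every residual is zero. With real data the residuals always sum to zero, so step 2 looks for patterns in them rather than at their total.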

[Figures: Histogram of Residuals; Residual Normal Probability Plot]

Regression Analysis: Check Model Utility

The regression equation is
Energy Content (kcal/kg) = 469 + 40.8 % plastics by weight

Predictor    Coef    StDev    T      P
Constant     468.9   211.6    2.22   0.035
% plasti     40.82   10.57    3.86   0.001

Analysis of Variance
Source        DF   SS       MS       F       P
Regression     1   239735   239735   14.92   0.001
Resid. Error  28   449975   16071
Total         29   689710

S = 126.8   R-Sq = 34.8%   R-Sq(adj) = 32.4%

Inference
What proportion of the variation in energy content is explained by the percentage of plastic in the waste?
What is the correlation between energy content and percentage of plastic in the waste?
For every one-percentage-point increase in plastic in municipal waste, how much of an increase or decrease in energy content do you expect?
Can we make inferences about the energy content of waste that is 7% plastic by weight?
What is the average energy content expected at 22% plastic by weight? (Give an interval.)
What is the expected energy content of the next observation that contains 22% plastic? (Give an interval.)
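Several of the inference questions can be answered directly from the printed output. A minimal Python sketch, assuming the Minitab values (Regression SS 239735, Error SS 449975, Total SS 689710 on 28 and 29 df, coefficients 468.9 and 40.82). It computes R², the correlation r (which takes the sign of the slope), adjusted R², s, the F statistic, and a point estimate. The variable names and `predict` are illustrative, not Minitab output.

```python
import math

# Values as printed by Minitab for the energy-content regression.
b0, b1 = 468.9, 40.82                    # intercept and slope (kcal/kg per % plastic)
ss_reg, ss_err, ss_tot = 239735, 449975, 689710
df_err, df_tot = 28, 29

r_sq = ss_reg / ss_tot                                  # proportion of variation explained
r = math.copysign(math.sqrt(r_sq), b1)                  # correlation has the slope's sign
r_sq_adj = 1 - (ss_err / df_err) / (ss_tot / df_tot)    # adjusted R-squared
s = math.sqrt(ss_err / df_err)                          # estimate of sigma (root MSE)
f_stat = (ss_reg / 1) / (ss_err / df_err)               # equals t**2 for the slope


def predict(x):
    """Point estimate of mean energy content (kcal/kg) at x percent plastics."""
    return b0 + b1 * x


if __name__ == "__main__":
    print(round(r_sq, 3), round(r, 3), round(r_sq_adj, 3), round(s, 1), round(f_stat, 2))
    print(predict(22))
```

Here R² is about 0.348, r about 0.590, and each extra percentage point of plastic adds about 40.8 kcal/kg on average, so `predict(22)` gives 1366.94 kcal/kg. The two interval questions also need x̄, Sxx and a t critical value, which the printed output does not show. `predict(7)` will return a number, but the fitted line is only supported over the range of % plastics actually observed.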

Transformed x's and y's (Section 13.2)

What if the model doesn't pass the residual test? We can transform the x's or the y's and fit a linear regression line to the new x and new y.

A function relating y to x is intrinsically linear if, by means of a transformation on x and/or y, the function can be expressed as $y' = \beta_0 + \beta_1 x'$, where $x'$ is the transformed independent variable and $y'$ is the transformed dependent variable.

Types of transformations (function; transformation; linear form):
1. $y = \alpha e^{\beta x}$; $y' = \ln(y)$; $y' = \ln(\alpha) + \beta x$
2. $y = \alpha x^{\beta}$; $y' = \ln(y)$, $x' = \ln(x)$; $y' = \ln(\alpha) + \beta x'$
3. $y = \alpha + \beta \ln(x)$; $x' = \ln(x)$; $y = \alpha + \beta x'$
4. $y = \alpha + \beta \cdot \frac{1}{x}$; $x' = \frac{1}{x}$; $y = \alpha + \beta x'$

Notes
1. The $\hat{\beta}$ values are estimated with the same least-squares formulas, applied to the transformed data.
2. $r^2$ is the proportion of variation in the new (transformed) y's explained by the new x's.
3. To construct confidence intervals (CIs) and prediction intervals (PIs), the errors of the transformed model need to be approximately normal.

Example: Use trial and error or theory to find the appropriate transformation.

Example

A tortilla chip maker would like to make the optimal tortilla chip: the one with the most appealing texture.
x = frying time (sec), y = % moisture content

x    5      10    15    20    25    30    45    60
y    16.3   9.7   8.1   4.2   3.4   2.9   1.9   1.3

[Figure: Scatter Plot (original x's and y's)]
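Note 1 says the usual least-squares formulas carry over unchanged once the data are transformed. As a sketch of that idea, the Python below applies the power-model row ($y' = \ln y$, $x' = \ln x$) to the tortilla chip data. It also fits the $(1/x, y)$ candidate with the same function. The helper name `fit_line` is hypothetical. The ln-ln coefficients should land close to the Minitab output for ln(%) versus ln(fry) later in these notes (about 4.6384 and -1.0492).

```python
import math

# Tortilla chip example: x = frying time (sec), y = % moisture content.
fry = [5, 10, 15, 20, 25, 30, 45, 60]
moisture = [16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 1.9, 1.3]


def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = b0 + b1 * xs."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Power model y = alpha * x**beta: take logs of both variables, then fit a line.
ln_fry = [math.log(x) for x in fry]
ln_moisture = [math.log(y) for y in moisture]
b0, b1 = fit_line(ln_fry, ln_moisture)

# Reciprocal candidate y = alpha + beta * (1/x): same function, different x'.
c0, c1 = fit_line([1 / x for x in fry], moisture)

if __name__ == "__main__":
    print(f"ln(%) = {b0:.4f} + ({b1:.4f}) ln(fry)")
    print(f"%     = {c0:.4f} + ({c1:.4f}) (1/fry)")
```

Choosing between the two candidates still comes down to the residual and normal probability plots. The code only shows that the fitting step itself is unchanged.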

[Figures: Residuals; Scatterplot (ln(x), y); Scatterplot (1/x, y); Scatterplot (x, ln(y)); Scatterplot (ln(x), ln(y)); Scatterplot (log(x), log(y))]

Decision Time

Two reasonable choices: (ln(x), ln(y)) and (1/x, y). Look at the normal probability plots.

Regression Output
Regression Analysis: ln(%) versus ln(fry)

The regression equation is
ln(%) = 4.64 - 1.05 ln(fry)

Predictor   Coef       SE Coef    T        P
Constant    4.6384     0.2110     21.98    0.000
ln(fry)     -1.04920   0.06786    -15.46   0.000

S = 0.1449   R-Sq = 97.6%   R-Sq(adj) = 97.1%

Analysis of Variance
Source      DF   SS       MS       F        P
Regression   1   5.0199   5.0199   239.06   0.000
Resid Err    6   0.1260   0.0210
Total        7   5.1458

Polynomial Regression (Section 13.3)

What do we do if the scatterplot is not linear? In Section 13.2, we fixed this with intrinsically linear transformations. If the plot has any peaks or valleys, those transformations will not work. You can now try to fit a polynomial with x² and x³ terms. You could fit higher-order terms, but this is strongly discouraged.

Equation:
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \cdots + \epsilon$

The errors $\epsilon$ must have a mean of zero and a constant variance. The β terms are estimated by the method of least squares. We will always use Minitab to fit the curve.
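The ANOVA quantities for the transformed fit are ordinary sums of squares computed on the ln scale. The self-contained Python sketch below assumes the tortilla data as listed in the example. It recomputes SSR, SSE, R², and F for ln(%) versus ln(fry), then back-transforms the fitted line into the power curve $y = \alpha x^{\beta}$ with $\alpha = e^{b_0}$ and $\beta = b_1$. `anova_simple` is an illustrative name, not a Minitab command.

```python
import math

fry = [5, 10, 15, 20, 25, 30, 45, 60]                  # frying time (sec)
moisture = [16.3, 9.7, 8.1, 4.2, 3.4, 2.9, 1.9, 1.3]   # % moisture content


def anova_simple(xs, ys):
    """Least-squares line plus its ANOVA decomposition SST = SSR + SSE."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)   # total sum of squares
    ssr = b1 * sxy                            # regression sum of squares
    sse = sst - ssr                           # error sum of squares
    f = (ssr / 1) / (sse / (n - 2))           # F with 1 and n - 2 df
    return {"b0": b0, "b1": b1, "ssr": ssr, "sse": sse, "sst": sst,
            "r_sq": ssr / sst, "f": f}


fit = anova_simple([math.log(x) for x in fry], [math.log(y) for y in moisture])
alpha = math.exp(fit["b0"])   # back-transformed constant of y = alpha * x**beta
beta = fit["b1"]

if __name__ == "__main__":
    print({k: round(v, 4) for k, v in fit.items()})
    print(f"moisture ~ {alpha:.2f} * fry ** ({beta:.4f})")
```

Per note 2, this R² describes how much of the variation in ln(moisture) is explained by ln(frying time), not variation on the original percentage scale.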

Example

A company wants to improve the fermentation process of its malt liquor. The data are below.
x = fermentation time (days), y = glucose concentration

x    1    2    3    4    5    6    7    8
y    74   54   52   51   52   53   58   71

[Figures: Scatterplot; Residual Plot]

Fixes
Try fitting a squared term. If that doesn't work, add a cubic term. Keep going until you have a good fit, but use the fewest terms possible.

[Figures: Fitted Line with Squared Term; Fitted Line with Cubed Term]
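The "try a squared term, then a cubic term" recipe is ordinary least squares with extra columns x², x³, and so on. The slides fit these curves in Minitab. Purely as an illustration, the Python sketch below solves the normal equations directly with Gaussian elimination. `poly_fit` is a hypothetical helper, and this approach is fine for low degrees but becomes numerically fragile as the degree grows. On the fermentation data, the quadratic coefficients should match the Minitab output for y versus x, xsquared.

```python
days = [1, 2, 3, 4, 5, 6, 7, 8]             # fermentation time (days)
glucose = [74, 54, 52, 51, 52, 53, 58, 71]  # glucose concentration


def poly_fit(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] from the normal equations."""
    k = degree + 1
    # Augmented matrix [X'X | X'y], where X has columns 1, x, x^2, ..., x^degree.
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(k)]
         + [float(sum(y * x ** i for x, y in zip(xs, ys)))]
         for i in range(k)]
    # Forward elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (a[i][k] - sum(a[i][j] * b[j] for j in range(i + 1, k))) / a[i][i]
    return b


quadratic = poly_fit(days, glucose, 2)
line = poly_fit(days, glucose, 1)

if __name__ == "__main__":
    print("quadratic:", [round(c, 4) for c in quadratic])
    print("straight line:", [round(c, 4) for c in line])
```

The straight line (`degree=1`) has a slope of only about 0.036 even though glucose clearly falls and then rises. This is the kind of peak-or-valley shape that calls for the squared term.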

What are R² and R² (adjusted)?

R² measures how much of the variation in y is explained by all of the x terms together. R² never decreases when you add a term, so a model with a cubic term will always appear at least as good as one with only a squared term. However, simpler is better, so R² (adjusted) removes this automatic inflation by penalizing each added term.

Moral: if you have multiple x terms, use R² (adjusted). If you have a single x, use R².

Regression Output
Regression Analysis: y versus x, xsquared

The regression equation is
y = 84.5 - 15.9 x + 1.77 xsquared

Predictor   Coef      SE Coef   T       P
Constant    84.482    4.904     17.23   0.000
x           -15.875   2.500     -6.35   0.001
xsquared    1.7679    0.2712    6.52    0.001

S = 3.515   R-Sq = 89.5%   R-Sq(adj) = 85.3%

Model utility test?

Analysis of Variance
Source           DF   SS       MS       F       P
Regression        2   525.11   262.55   21.25   0.004
Residual Error    5   61.77    12.35
Total             7   586.87

Source     DF   Seq SS
x           1   0.05
xsquared    1   525.05

Inferences
Test statistic formula:
Test for the β terms:
You now have to use this table to conduct the model utility test.

Confidence Intervals
Confidence interval formula for the β terms:
Prediction interval formula:
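The model utility test, R², adjusted R², and the t-based intervals can all be computed from the printed table. The Python sketch below assumes the standard textbook forms: $F = \text{MSR}/\text{MSE}$ with $k$ and $n-(k+1)$ degrees of freedom, $t = \hat{\beta}_i / s_{\hat{\beta}_i}$, and $\hat{\beta}_i \pm t_{\alpha/2,\,n-(k+1)}\, s_{\hat{\beta}_i}$. The critical value 2.571 is $t_{0.025,5}$ read from a t table and is hard-coded rather than computed. `t_stat`, `conf_int`, and `T_CRIT_95_DF5` are illustrative names.

```python
n, k = 8, 2                       # observations; predictors (x and xsquared)
ss_reg, ss_err = 525.11, 61.77    # from the Analysis of Variance table
ss_tot = ss_reg + ss_err

ms_reg = ss_reg / k                   # MSR
ms_err = ss_err / (n - (k + 1))       # MSE
f_stat = ms_reg / ms_err              # model utility test statistic, df (k, n - (k + 1))
r_sq = ss_reg / ss_tot
r_sq_adj = 1 - (1 - r_sq) * (n - 1) / (n - (k + 1))

T_CRIT_95_DF5 = 2.571                 # t_{0.025, 5}, taken from a t table (assumption)


def t_stat(coef, se):
    """t statistic for H0: beta_i = 0."""
    return coef / se


def conf_int(coef, se, t_crit):
    """Two-sided confidence interval coef +/- t_crit * se."""
    half_width = t_crit * se
    return coef - half_width, coef + half_width


if __name__ == "__main__":
    print(round(f_stat, 2), round(r_sq, 3), round(r_sq_adj, 3))
    print(round(t_stat(1.7679, 0.2712), 2), conf_int(1.7679, 0.2712, T_CRIT_95_DF5))
```

With these numbers F is about 21.25, and the 95% interval for the xsquared coefficient is roughly (1.07, 2.47). Because that interval excludes zero, the squared term appears worth keeping. A prediction interval would additionally need the standard error of the fitted value at the chosen x, which this table does not print.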