Chapter 11: Simple Linear Regression and Correlation

11-1 Empirical Models
11-2 Simple Linear Regression
11-3 Properties of the Least Squares Estimators
11-4 Hypothesis Tests in Simple Linear Regression
11-4.1 Use of t-tests
11-4.2 Analysis of variance approach to test significance of regression
11-5 Confidence Intervals
11-5.1 Confidence intervals on the slope and intercept
11-5.2 Confidence interval on the mean response
11-6 Prediction of New Observations
11-7 Adequacy of the Regression Model
11-7.1 Residual analysis
11-7.2 Coefficient of determination (R²)
11-8 Correlation
11-9 Regression on Transformed Variables
11-10 Logistic Regression

Chapter Learning Objectives
After careful study of this chapter, you should be able to:
1. Use simple linear regression to build empirical models of engineering and scientific data
2. Understand how the method of least squares is used to estimate the parameters in a linear regression model
3. Analyze residuals to determine whether the regression model is an adequate fit to the data and whether any underlying assumptions are violated
4. Test statistical hypotheses and construct confidence intervals on the regression model parameters
5. Use the regression model to predict a future observation and construct an appropriate prediction interval on that observation
6. Apply the correlation model
7. Use simple transformations to achieve a linear regression model

Empirical Models
Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis is a statistical technique that is very useful for these types of problems. For example, in a chemical process, suppose that the yield of the product is related to the process-operating temperature. Regression analysis can be used to build a model to predict yield at a given temperature level.

Empirical Model - Example Data (Table 11-1)

Empirical Model - Example Plot
Figure 11-1: Scatter diagram of oxygen purity versus hydrocarbon level from Table 11-1.

Simple Linear Regression
Based on the scatter diagram, it is probably reasonable to assume that the mean of the random variable Y is related to x by the following straight-line relationship:

E(Y | x) = β0 + β1 x

where the slope and intercept of the line are called regression coefficients. The simple linear regression model is given by

Y = β0 + β1 x + ε

where ε is the random error term.

Variance of Y = Variance of ε
We think of the regression model as an empirical model. Suppose that the mean and variance of ε are 0 and σ², respectively. Then

E(Y | x) = β0 + β1 x

and the variance of Y given x is

V(Y | x) = V(β0 + β1 x + ε) = σ²

Model of the True Regression Line
The true regression model is a line of mean values,

μ(Y | x) = β0 + β1 x

where β1 can be interpreted as the change in the mean of Y for a unit change in x (the slope of the line). The variability of Y at a particular value of x is determined by the error variance σ². This implies there is a distribution of Y-values at each x and that the variance of this distribution is the same at each x.

Distribution of Y along the Line
Figure 11-2: The distribution of Y for a given value of x for the oxygen purity-hydrocarbon data.

Predictor and Response Variables
The case of simple linear regression considers a single regressor or predictor x and a dependent or response variable Y. At each level of x, Y is a random variable with expected value

E(Y | x) = β0 + β1 x

We assume that each observation, Y, can be described by the model

Y = β0 + β1 x + ε

Method of Least Squares
Suppose that we have n pairs of observations (x1, y1), (x2, y2), ..., (xn, yn). The method of least squares is used to estimate the parameters β0 and β1 by minimizing the sum of the squares of the vertical deviations.

Figure 11-3: Deviations of the data from the estimated regression model.

Sum of Squared Deviations
Since the n observations in the sample can be expressed as

yi = β0 + β1 xi + εi,   i = 1, 2, ..., n

the sum of the squares of the deviations (errors) of the observations from the true regression line is

L = Σ εi² = Σ (yi − β0 − β1 xi)²

Least Squares Normal Equations
Differentiating L with respect to β0 and β1 and setting the derivatives to zero gives the least squares normal equations:

n β̂0 + β̂1 Σ xi = Σ yi
β̂0 Σ xi + β̂1 Σ xi² = Σ xi yi

Simple Linear Regression Coefficients
Solving the normal equations gives the least squares estimates

β̂1 = [Σ xi yi − (Σ xi)(Σ yi)/n] / [Σ xi² − (Σ xi)²/n]
β̂0 = ȳ − β̂1 x̄

Fitted Regression Line
The fitted (estimated) regression line is

ŷ = β̂0 + β̂1 x

Sums of Squares
The following notation may also be used:

S_xx = Σ (xi − x̄)² = Σ xi² − (Σ xi)²/n
S_xy = Σ (xi − x̄)(yi − ȳ) = Σ xi yi − (Σ xi)(Σ yi)/n

Then

β̂1 = S_xy / S_xx        (11-10)
β̂0 = ȳ − β̂1 x̄           (11-11)
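To make Equations 11-10 and 11-11 concrete, here is a minimal Python sketch of the least squares computation. The x and y arrays are hypothetical stand-ins (the Table 11-1 data are not reproduced above), so the printed estimates are for illustration only.

```python
import numpy as np

# Hypothetical data standing in for Table 11-1 (x = hydrocarbon level, y = purity).
x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23])
y = np.array([90.0, 89.1, 91.4, 93.7, 96.7, 94.4, 87.6, 91.8])
n = len(x)

# Sums of squares (the shortcut forms shown above).
S_xx = np.sum(x**2) - np.sum(x)**2 / n             # = sum((x - xbar)^2)
S_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n   # = sum((x - xbar)(y - ybar))

# Least squares estimates (Equations 11-10 and 11-11).
beta1_hat = S_xy / S_xx
beta0_hat = y.mean() - beta1_hat * x.mean()

print(f"beta1_hat = {beta1_hat:.4f}, beta0_hat = {beta0_hat:.4f}")
```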

Simple Linear Regression - Example
Example 11-1

Example 11-1 (continued)

Example 11-1 (continued)
Figure 11-4: Scatter plot of oxygen purity y versus hydrocarbon level x and regression model ŷ = 74.20 + 14.97x.

Computing σ²
The error sum of squares is

SS_E = Σ ei² = Σ (yi − ŷi)²

It can be shown that the expected value of the error sum of squares is E(SS_E) = (n − 2)σ². An unbiased estimator of σ² is therefore

σ̂² = SS_E / (n − 2)        (11-13)

where SS_E can be easily computed using

SS_E = SS_T − β̂1 S_xy,   with SS_T = Σ (yi − ȳ)²
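A short sketch of the shortcut SS_E = SS_T − β̂1 S_xy and of the estimator σ̂² = SS_E/(n − 2), again on hypothetical stand-in data; the assertion checks that the shortcut agrees with summing the squared residuals directly.

```python
import numpy as np

x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23])  # hypothetical
y = np.array([90.0, 89.1, 91.4, 93.7, 96.7, 94.4, 87.6, 91.8])  # hypothetical
n = len(x)

S_xx = np.sum((x - x.mean())**2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))
SS_T = np.sum((y - y.mean())**2)

beta1 = S_xy / S_xx
beta0 = y.mean() - beta1 * x.mean()

SS_E = SS_T - beta1 * S_xy      # shortcut form
sigma2_hat = SS_E / (n - 2)     # unbiased estimator of sigma^2

# Equivalent direct computation from the residuals:
residuals = y - (beta0 + beta1 * x)
assert np.isclose(SS_E, np.sum(residuals**2))
print(f"SS_E = {SS_E:.3f}, sigma2_hat = {sigma2_hat:.3f}")
```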

Excel Data Analysis Tool Regression Output

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.937
R Square            0.877
Adjusted R Square   0.871
Standard Error      1.087
Observations        20

ANOVA
              df    SS        MS        F         Significance F
Regression     1    152.127   152.127   128.862   0.000
Residual      18    21.250    1.181
Total         19    173.377

               Coefficients   Standard Error   t Stat    P-value
Intercept      74.283         1.593            46.617    0.000
X Variable 1   14.947         1.317            11.352    0.000
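For readers working in Python rather than Excel, the same summary can be produced with the statsmodels ordinary least squares routine. This is a minimal sketch; the hydrocarbon and purity arrays below are hypothetical stand-ins, since the 20 observations of Table 11-1 are not reproduced above. With the actual data, the summary should agree with the Excel output (R² = 0.877, intercept 74.283, slope 14.947, F = 128.862).

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical stand-ins for the Table 11-1 columns.
hydrocarbon = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23])
purity = np.array([90.0, 89.1, 91.4, 93.7, 96.7, 94.4, 87.6, 91.8])

X = sm.add_constant(hydrocarbon)   # adds the intercept column
model = sm.OLS(purity, X).fit()
print(model.summary())             # R-squared, ANOVA F, coefficients,
                                   # standard errors, t statistics, p-values
```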

Properties of Least Squares Estimators
Mean and variance of the slope estimator:

E(β̂1) = β1                    (11-15)
V(β̂1) = σ² / S_xx             (11-16)

Mean and variance of the intercept estimator:

E(β̂0) = β0,   V(β̂0) = σ² [1/n + x̄²/S_xx]     (11-17)

Estimated Standard Errors
In simple linear regression, the estimated standard errors of the slope and the intercept are

se(β̂1) = √(σ̂² / S_xx)   and   se(β̂0) = √(σ̂² [1/n + x̄²/S_xx])

respectively, where the estimated variance σ̂² is computed using Equation 11-13.

Hypothesis Test for the Slope
If we wish to test whether the slope equals some value β1,0:

H0: β1 = β1,0    versus    H1: β1 ≠ β1,0        (11-18)

An appropriate test statistic is

T0 = (β̂1 − β1,0) / √(σ̂² / S_xx) = (β̂1 − β1,0) / se(β̂1)        (11-19)

We would reject the null hypothesis if

|t0| > t(α/2, n−2)        (11-20)

Hypothesis Test for the Intercept
If we wish to test whether the intercept equals some value β0,0:

H0: β0 = β0,0    versus    H1: β0 ≠ β0,0        (11-21)

An appropriate test statistic is

T0 = (β̂0 − β0,0) / √(σ̂² [1/n + x̄²/S_xx]) = (β̂0 − β0,0) / se(β̂0)        (11-22)

We would reject the null hypothesis if |t0| > t(α/2, n−2).
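As a concrete illustration of Equations 11-18 to 11-20, here is a short sketch that tests H0: β1 = 0 using the slope estimate and standard error from the Excel output above (β̂1 = 14.947, se(β̂1) = 1.317, n = 20).

```python
from scipy import stats

beta1_hat = 14.947   # slope estimate from the regression output above
se_beta1 = 1.317     # its standard error
beta1_0 = 0.0        # hypothesized value (significance of regression)
n = 20

t0 = (beta1_hat - beta1_0) / se_beta1
p_value = 2 * stats.t.sf(abs(t0), df=n - 2)      # two-sided p-value
t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)     # reject H0 if |t0| > t_crit

print(f"t0 = {t0:.2f}, critical value = {t_crit:.3f}, p = {p_value:.2e}")
# t0 is about 11.35, matching the t Stat reported in the output above.
```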

Significance of Regression
An important special case of these hypotheses is

H0: β1 = 0    versus    H1: β1 ≠ 0        (11-23)

Failure to reject H0 is equivalent to concluding that there is no linear relationship between x and Y. In other words, if we conclude the slope could be 0, the information on x tells us nothing about the variation in the response, Y.

Figure 11-5: The hypothesis H0: β1 = 0 is not rejected.
Figure 11-6: The hypothesis H0: β1 = 0 is rejected.

Hypothesis Testing - Example
Example 11-2

Analysis of Variance (ANOVA)
The analysis of variance identity is

SS_T = SS_R + SS_E,   i.e.,   Σ (yi − ȳ)² = Σ (ŷi − ȳ)² + Σ (yi − ŷi)²

If the null hypothesis H0: β1 = 0 is true, the statistic

F0 = (SS_R / 1) / (SS_E / (n − 2)) = MS_R / MS_E

follows the F(1, n−2) distribution, and we would reject H0 if f0 > f(α, 1, n−2).
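Using the sums of squares reported in the ANOVA table of the Excel output above (SS_R = 152.127, SS_E = 21.250, n = 20), the F test for significance of regression can be carried out as follows; the F distribution is taken from scipy.

```python
from scipy import stats

SS_R, SS_E, n = 152.127, 21.250, 20   # from the ANOVA table above

MS_R = SS_R / 1
MS_E = SS_E / (n - 2)
f0 = MS_R / MS_E                      # about 128.86, matching the output

f_crit = stats.f.ppf(1 - 0.05, dfn=1, dfd=n - 2)   # reject H0: beta1 = 0 if f0 > f_crit
p_value = stats.f.sf(f0, dfn=1, dfd=n - 2)
print(f"f0 = {f0:.2f}, f_crit = {f_crit:.2f}, p = {p_value:.2e}")
```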

The ANOVA Table
The quantities MS_R and MS_E are called the mean squares for regression and error, respectively. The analysis of variance (ANOVA) table is laid out as:

Source of variation   Sum of squares    Degrees of freedom   Mean square   F0
Regression            SS_R = β̂1 S_xy    1                    MS_R          MS_R / MS_E
Error                 SS_E              n − 2                MS_E
Total                 SS_T              n − 1

Analysis of Variance - Example
Example 11-3

Equivalence of t-Tests and ANOVA
For testing significance of regression, the square of the slope t statistic equals the ANOVA F statistic (t0² = f0), so the two procedures lead to the same conclusion.

Confidence Intervals on Regression Model Parameters
Under the assumption of normally distributed errors, 100(1 − α)% confidence intervals on the slope and intercept of a regression model are

β̂1 ± t(α/2, n−2) √(σ̂² / S_xx)
β̂0 ± t(α/2, n−2) √(σ̂² [1/n + x̄²/S_xx])
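A minimal Python sketch of the slope confidence interval, using the slope estimate and standard error reported in the Excel output above (β̂1 = 14.947, se(β̂1) = 1.317, n = 20); up to rounding, it reproduces the interval quoted in Example 11-4 below.

```python
from scipy import stats

beta1_hat, se_beta1, n = 14.947, 1.317, 20   # from the regression output above
alpha = 0.05

t_val = stats.t.ppf(1 - alpha / 2, df=n - 2)   # t(alpha/2, n-2)
lower = beta1_hat - t_val * se_beta1
upper = beta1_hat + t_val * se_beta1
print(f"{lower:.3f} <= beta1 <= {upper:.3f}")  # approximately 12.18 <= beta1 <= 17.71
```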

Example 11-4 (Confidence Interval on the Slope)
For the oxygen purity data, the 95% confidence interval on the slope is 12.181 ≤ β1 ≤ 17.713.

Confidence Interval on the Mean Response
The point estimate of the mean response at a given x0 is

μ̂(Y | x0) = β̂0 + β̂1 x0

The 100(1 − α)% confidence interval on the mean response is then

μ̂(Y | x0) ± t(α/2, n−2) √(σ̂² [1/n + (x0 − x̄)²/S_xx])

Example 11-5 (Confidence Interval on the Mean Response)

Example 11-5 (continued)

Example 11-5 (continued)
Figure 11-7: Scatter diagram of oxygen purity data from Example 11-1 with fitted regression line and 95% confidence limits on μ(Y | x0).

Prediction of New Observations
The point estimate of a new observation Y0 at x0 is

ŷ0 = β̂0 + β̂1 x0

The 100(1 − α)% prediction interval for the new response Y0 is then

ŷ0 ± t(α/2, n−2) √(σ̂² [1 + 1/n + (x0 − x̄)²/S_xx])
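The two interval formulas differ only in the extra "1" under the square root, which is why the prediction interval is always wider than the confidence interval on the mean response at the same x0. A self-contained sketch, again on hypothetical stand-in data (x0 = 1.00 is an arbitrary illustration point):

```python
import numpy as np
from scipy import stats

# Hypothetical data standing in for Table 11-1 (not the actual values).
x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23])
y = np.array([90.0, 89.1, 91.4, 93.7, 96.7, 94.4, 87.6, 91.8])
n = len(x)

S_xx = np.sum((x - x.mean())**2)
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / S_xx
beta0 = y.mean() - beta1 * x.mean()
sigma2 = np.sum((y - beta0 - beta1 * x)**2) / (n - 2)   # sigma^2 estimate

x0 = 1.00                          # point at which to estimate / predict
y0_hat = beta0 + beta1 * x0
t_val = stats.t.ppf(0.975, df=n - 2)

# 95% confidence interval on the mean response at x0
half_ci = t_val * np.sqrt(sigma2 * (1/n + (x0 - x.mean())**2 / S_xx))
# 95% prediction interval on a new observation at x0
half_pi = t_val * np.sqrt(sigma2 * (1 + 1/n + (x0 - x.mean())**2 / S_xx))

print(f"mean response CI:    {y0_hat - half_ci:.2f} to {y0_hat + half_ci:.2f}")
print(f"prediction interval: {y0_hat - half_pi:.2f} to {y0_hat + half_pi:.2f}")
```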

Example 11-6 (Prediction Interval)

Example 11-6 (continued)

Example 11-6 (continued)
Figure 11-8: Scatter diagram of oxygen purity data from Example 11-1 with fitted regression line, 95% prediction limits (outer lines), and 95% confidence limits on μ(Y | x0).

Adequacy of Regression Models
Fitting a regression model requires several assumptions:
1. The errors are uncorrelated random variables with mean zero;
2. The errors have constant variance; and
3. The errors are normally distributed.
The analyst should always consider the validity of these assumptions to be doubtful and conduct analyses to examine the adequacy of the model.

Residual (Error) Analysis
The residuals from a regression model are ei = yi − ŷi, where yi is an actual observation and ŷi is the corresponding fitted value from the regression model. Analysis of the residuals is frequently helpful in checking the assumption that the errors are approximately normally distributed with constant variance, and in determining whether additional terms in the model would be useful.

Residual Plots
Figure 11-9: Patterns for residual plots. (a) satisfactory, (b) funnel, (c) double bow, (d) nonlinear.
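A sketch of the two diagnostic plots described here, a normal probability plot of the residuals and a plot of residuals against fitted values, using matplotlib and scipy on hypothetical stand-in data; with the Example 11-7 data these would correspond to Figures 11-10 and 11-11 below.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical data standing in for Table 11-1.
x = np.array([0.99, 1.02, 1.15, 1.29, 1.46, 1.36, 0.87, 1.23])
y = np.array([90.0, 89.1, 91.4, 93.7, 96.7, 94.4, 87.6, 91.8])

beta1, beta0 = np.polyfit(x, y, 1)   # least squares fit (slope, intercept)
y_hat = beta0 + beta1 * x
residuals = y - y_hat                # e_i = y_i - yhat_i

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
stats.probplot(residuals, plot=ax1)  # normal probability plot (cf. Figure 11-10)
ax1.set_title("Normal probability plot of residuals")

ax2.scatter(y_hat, residuals)        # residuals versus fitted values (cf. Figure 11-11)
ax2.axhline(0, linestyle="--")
ax2.set_xlabel("Fitted value")
ax2.set_ylabel("Residual")
ax2.set_title("Residuals versus fitted values")
plt.tight_layout()
plt.show()
```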

Residual Analysis - Example
Example 11-7

Example 11-7 (continued)

Example 11-7 (continued)
Figure 11-10: Normal probability plot of residuals, Example 11-7.

Example 11-7 (continued)
Figure 11-11: Plot of residuals versus predicted oxygen purity, ŷ, Example 11-7.

Coefficient of Determination (R²)
The quantity

R² = SS_R / SS_T = 1 − SS_E / SS_T,   with 0 ≤ R² ≤ 1        (11-34)

is called the coefficient of determination and is often used to judge the adequacy of a regression model. We often refer (loosely) to R² as the amount of variability in the data explained or accounted for by the regression model.

R² Computation - Example
For the oxygen purity regression model,
R² = SS_R / SS_T = 152.13/173.38 = 0.877
Thus, the model accounts for 87.7% of the variability in the data.
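A small check of Equation 11-34 using the sums of squares from the ANOVA table above; both forms give the same value.

```python
SS_R, SS_E, SS_T = 152.127, 21.250, 173.377   # from the oxygen purity ANOVA table

R2_from_regression = SS_R / SS_T    # explained variability
R2_from_error = 1 - SS_E / SS_T     # equivalent form
print(round(R2_from_regression, 3), round(R2_from_error, 3))   # both about 0.877
```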

Regression on Transformed Variables
In many cases a plot of the response variable, y, against the predictor variable, x, may show that the relationship is not linear. Performing a linear regression would then lead to a poor fit, and residual analysis would show that the model is inadequate. However, we can often transform the predictor variable first. The transformed variable, x', may have a linear relationship with y.

Therefore, we can perform a linear regression between x' and y. Note, however, that any use of the new equation for prediction requires first applying the same transformation to the desired value of x. Transformations can take many forms. Typical ones include:
x' = logarithm(x)
x' = square root(x)
x' = inverse(x) = 1/x

Example 11-9
An engineer has collected data on the DC output from a windmill under different wind speed conditions. He wishes to develop a model describing output in terms of wind speed. The table below shows the data collected for output, y, as the response and wind speed, x, as the predictor. The final column shows the transformed value x' = 1/x.

Obs.   Output (y)   Velocity (x)   x' = 1/x
1      1.582        5.00           0.200
2      1.822        6.00           0.167
3      1.057        3.40           0.294
4      0.5          2.70           0.370
5      2.236        10.00          0.100
6      2.386        9.70           0.103
7      2.294        9.55           0.105
8      0.558        3.05           0.328
9      2.166        8.15           0.123
10     1.866        6.20           0.161
11     0.653        2.90           0.345
12     1.93         6.35           0.157
13     1.562        4.60           0.217
14     1.737        5.80           0.172
15     2.088        7.40           0.135
16     1.137        3.60           0.278
17     2.179        7.85           0.127
18     2.112        8.80           0.114
19     1.8          7.00           0.143
20     1.501        5.45           0.183
21     2.303        9.10           0.110
22     2.31         10.20          0.098
23     1.194        4.10           0.244
24     1.144        3.95           0.253
25     0.123        2.45           0.408

Example 11-9 (continued)
[Figure: scatter plot of DC output versus wind velocity, x, with fitted line.]
Regression equation (original data): ŷ = 0.1309 + 0.2411x,  R² = 0.875
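Both fits in this example can be reproduced directly from the table above. A minimal sketch using numpy's polyfit as the least squares routine; the second line of output corresponds to the transformed-variable fit reported on the next slide.

```python
import numpy as np

# Windmill data from Example 11-9: DC output (y) and wind velocity (x).
y = np.array([1.582, 1.822, 1.057, 0.5, 2.236, 2.386, 2.294, 0.558, 2.166,
              1.866, 0.653, 1.93, 1.562, 1.737, 2.088, 1.137, 2.179, 2.112,
              1.8, 1.501, 2.303, 2.31, 1.194, 1.144, 0.123])
x = np.array([5.00, 6.00, 3.40, 2.70, 10.00, 9.70, 9.55, 3.05, 8.15, 6.20,
              2.90, 6.35, 4.60, 5.80, 7.40, 3.60, 7.85, 8.80, 7.00, 5.45,
              9.10, 10.20, 4.10, 3.95, 2.45])

def fit_and_r2(predictor, response):
    """Least squares fit; returns (intercept, slope, R^2)."""
    slope, intercept = np.polyfit(predictor, response, 1)
    fitted = intercept + slope * predictor
    ss_e = np.sum((response - fitted)**2)
    ss_t = np.sum((response - response.mean())**2)
    return intercept, slope, 1 - ss_e / ss_t

print(fit_and_r2(x, y))       # roughly y = 0.1309 + 0.2411 x,  R^2 about 0.87
print(fit_and_r2(1 / x, y))   # roughly y = 2.9789 - 6.9345 x', R^2 about 0.98
```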

Example 11-9 (continued)
[Figure: scatter plot of DC output versus transformed wind velocity, 1/x, with fitted line.]
Regression equation (transformed data): ŷ = 2.9789 − 6.9345x',  R² = 0.980

THE END OF ENGG 319 CLASS NOTES