Chapter 13: Multiple Regression

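13.1 Developing the Multiple-Regression Model

The general model with k independent variables can be described as:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$$

It simplifies for two independent variables:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$$

The sample fit parameters $b_0$, $b_1$, and $b_2$ are used to estimate the population parameters $\beta_0$, $\beta_1$, and $\beta_2$ using the following regression equation:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$$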

Using Data Analysis - Regression for the first 3 columns of Potato.xls: The fit equation now allows us to predict estimated values for the different experimental conditions. For example, a pH of 4 and a pressure of 15 lead to:

$$\hat{Y} = 3.8162 + 2.8437 \cdot 4 - 0.2805 \cdot 15 \approx 10.984$$

Our textbook now refers to Minitab for information about a confidence interval estimate of the average as well as a prediction interval estimate for a future individual value.

The Coefficient of Multiple Determination ($R^2_{Y.12}$)

$$R^2_{Y.12} = \frac{SSR}{SST} = \frac{41.5697}{99.977} = 0.4158$$

This essentially means that 41.58% of the variation in the solid content can be explained by the effect of pH and pressure. The remaining 58.42% is due to random scattering of the data. Sometimes an adjusted $R^2$ is suggested, which takes the number of data points (n) and the number of explanatory variables (k) into account:

$$R^2_{adj} = 1 - \left[ \left(1 - R^2_{Y.12}\right) \frac{n-1}{n-k-1} \right] = 1 - \left[ (1 - 0.4158) \frac{54-1}{54-2-1} \right] = 0.3929$$
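The arithmetic of the worked example can be checked with a few lines of Python. This is only a sketch: the coefficient values are rounded readings of the Excel regression output for the potato data, and the function and variable names are illustrative choices rather than part of the course material.

```python
# Rounded coefficients read from the Excel output for the potato data
# (intercept, pH slope, lower-pressure slope).
b0, b_ph, b_pressure = 3.8162, 2.8437, -0.2805


def predict(ph, pressure):
    """Predicted solid content from the two-variable fit equation."""
    return b0 + b_ph * ph + b_pressure * pressure


# Sums of squares and sample dimensions quoted in the notes.
ssr, sst = 41.5697, 99.977
n, k = 54, 2  # observations, explanatory variables

r2 = ssr / sst                                   # coefficient of multiple determination
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # adjusted for n and k
y_hat = predict(4, 15)

print(f"prediction: {y_hat:.3f}, R^2: {r2:.4f}, adjusted R^2: {r2_adj:.4f}")
```

The adjusted value is always somewhat smaller than the unadjusted one because the factor (n - 1)/(n - k - 1) is larger than 1 whenever k is at least 1.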

13.2 Residual Analysis

By plotting the difference between the actual data point value and the predicted value (the residual), one can check the data for possible trends. The residuals are plotted against the various parameters, $X_1$ and $X_2$, as well as against the data point values, Y, or the time.

13.3 Testing for Significance of the Multiple Regression Model

Essentially testing $H_0: \beta_1 = \beta_2 = 0$ (is there a slope with respect to any of the parameters?), using the sums of squares:

$$SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

$$SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

$$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$

13.4 Confidence Interval Estimate for each Slope

The standard error for each regression coefficient is provided in the PhStat analysis. The test statistic for the significance of each (slope) regression coefficient is:

$$t = \frac{b_j}{S_{b_j}}$$

and the confidence interval estimate for the slope is:

$$b_j \pm t_{n-k-1} S_{b_j}$$
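The tests of sections 13.3 and 13.4 can be sketched numerically with the values from the Excel output for the potato data (pH and lower pressure). The critical value 2.0076 for 51 degrees of freedom is an assumption looked up from a t-table, not something computed here, and the variable names are illustrative.

```python
# Sums of squares from the ANOVA table of the two-variable fit.
ssr, sse = 41.56971, 58.40733
n, k = 54, 2

# Overall F-test: mean square regression over mean square error.
msr = ssr / k
mse = sse / (n - k - 1)
f_stat = msr / mse

# t statistic and 95% confidence interval for the pH slope.
b_ph, se_ph = 2.84370, 0.554451
t_stat = b_ph / se_ph
t_crit = 2.0076  # assumed table value for a two-sided 95% interval, 51 d.f.
ci_low = b_ph - t_crit * se_ph
ci_high = b_ph + t_crit * se_ph

print(f"F = {f_stat:.2f}, t = {t_stat:.2f}, CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Because the interval does not contain zero, it leads to the same conclusion as the t-test: the pH slope differs significantly from zero at the 5% level.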

13.5 Testing Portions of the Multiple Regression Model

Which parameters are really important? One runs the data analysis first with all parameters included and then a second time with all parameters except the one being tested. The difference in the regression sum of squares between the two runs is the contribution of that specific (excluded) parameter. The following example demonstrates this for the pH parameter in potato.xls.

Analysis including pH and Lower Pressure:

Regression Statistics
Multiple R           0.64482
R Square             0.415793
Adjusted R Square    0.392882
Standard Error       1.07016
Observations         54

ANOVA
              df    SS          MS          F           Significance F
Regression     2    41.56971    20.78485    18.14888    1.12E-06
Residual      51    58.40733     1.145242
Total         53    99.97704

                  Coefficients    Standard Error    t Stat      P-value
Intercept          3.81625        2.339343           1.631334   0.108981
PH                 2.84370        0.554451           5.128858   4.56E-06
Lower Pressure    -0.280452       0.074543          -3.76228    0.000436

Result with pH being excluded:

Regression Statistics
Multiple R           0.338327
R Square             0.114465
Adjusted R Square    0.097436
Standard Error       1.304822
Observations         54

ANOVA
              df    SS          MS          F          Significance F
Regression     1    11.44391    11.44391    6.72159    0.012339
Residual      52    88.53313     1.70256
Total         53    99.97704

                  Coefficients    Standard Error    t Stat      P-value
Intercept         14.4666         1.313456          11.01415    3.33E-15
Lower Pressure    -0.23389        0.09021           -2.5926     0.012339

The significance of pH is now determined through the F-ratio of the mean square contributed by pH to the mean square error of the full model:

$$F = \frac{MSR(X_{pH} \mid X_{pressure})}{MSE} = \frac{41.57 - 11.44}{58.41 / 51} = 26.31$$

The F cut-off is 4.04 for 1 and 51 degrees of freedom at the α = 5% level. Thus, pH significantly improves the regression fit.
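The partial F-test for pH can be reproduced from the two ANOVA tables as follows. This is a minimal sketch with illustrative variable names; the cut-off 4.04 is the value quoted in the notes rather than one computed from the F-distribution.

```python
# Regression sums of squares with and without pH (potato data).
ssr_full = 41.56971      # pH and lower pressure
ssr_reduced = 11.44391   # lower pressure only
sse_full = 58.40733
n, k = 54, 2

# Contribution of pH given that lower pressure is already in the model.
ssr_ph_given_pressure = ssr_full - ssr_reduced

# One parameter is tested, so the numerator has 1 degree of freedom.
mse_full = sse_full / (n - k - 1)
f_partial = ssr_ph_given_pressure / mse_full

f_crit = 4.04  # cut-off quoted in the notes for 1 and 51 d.f., alpha = 5%
ph_is_significant = f_partial > f_crit

# For a single coefficient, the partial F equals the square of its t statistic.
t_ph = 2.84370 / 0.554451

print(f"partial F = {f_partial:.2f}, t^2 = {t_ph ** 2:.2f}, significant: {ph_is_significant}")
```

The agreement between the partial F and the squared t statistic is a useful cross-check: both tests answer the same question about a single coefficient.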

Coefficient of Partial Determination

Similar to the complete analysis of $R^2$ as described in 13.1, one can also analyze how much variation in the data can be explained by a specific parameter as follows:

$$R^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)}$$

In our current example:

$$R^2_{Y1.2} = \frac{30.1258}{99.977 - 41.5697 + 30.1258} = 0.3403 \quad \text{(pH)}$$

$$R^2_{Y2.1} = \frac{16.2106}{99.977 - 41.5697 + 16.2106} = 0.2172 \quad \text{(lower pressure)}$$

For a fixed lower pressure, 34.03% of the variation can be explained by the variation in pH. Alternatively, for a fixed pH, 21.72% of the variation can be explained by the variation of lower pressure.
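The two coefficients of partial determination can be recomputed in a few lines. In this sketch the contribution of pH is derived from the two ANOVA tables, while the contribution of lower pressure (16.2106) is taken as quoted in the notes, since the pH-only fit is not shown; the function name is an illustrative choice.

```python
sst = 99.97704
ssr_full = 41.56971          # pH and lower pressure together
ssr_pressure_only = 11.44391

# SSR(pH | pressure) follows from the two fits; SSR(pressure | pH) is quoted.
ssr_ph_given_pressure = ssr_full - ssr_pressure_only
ssr_pressure_given_ph = 16.2106


def partial_r2(ssr_contribution):
    """Share of the otherwise unexplained variation explained by one variable."""
    return ssr_contribution / (sst - ssr_full + ssr_contribution)


r2_ph = partial_r2(ssr_ph_given_pressure)
r2_pressure = partial_r2(ssr_pressure_given_ph)

print(f"pH: {r2_ph:.4f}, lower pressure: {r2_pressure:.4f}")
```

Note that the denominator is SSE of the full model plus the contribution being tested, which is why the two partial coefficients do not add up to the overall coefficient of multiple determination.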

13.6 The Quadratic Curvilinear Regression Model

Essentially, the testing procedure and the interpretation of the results are the same as for the linear regression analysis. However, the second parameter is now the square of the first parameter. In a very similar way, the regression models can be expanded to cubic or any other higher-order relations. The predicted Y-values are:

$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{1i}^2$$

The significance of the quadratic curvilinear model is given by the F-ratio between the mean square regression and the mean square error:

$$F = MSR / MSE$$

as well as by the coefficient of multiple determination:

$$R^2_{Y.12} = SSR / SST$$

The significance of individual regression coefficients is tested using the t-test with

$$t = b / s_b$$

where $s_b$ is the standard error of the corresponding parameter in the Excel data analysis output. Details about the calculation of the standard error of a parameter are outside the scope of this class, but the systematic calculation is described on page 417 in Applied Statistics and Probability for Engineers, 3rd edition, by Montgomery and Runger. The t-distribution is to be used with n − (number of fit parameters) degrees of freedom.

13.7 Dummy-Variable Models

Categorical variables can be substituted with dummy variables. For example, operator A is assigned a value of 0 and operator B is assigned a value of 1 (or wet = 0 and dry = 1). However, a single dummy variable of this kind should be used only when the categorical variable has exactly two values. We can then further evaluate whether the slope of the linear regression for our other parameters is affected by the dummy variable by adding an interaction term, the product of the numerical variable $X_1$ and the dummy variable $X_2$:

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$
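The quadratic model of section 13.6 is still linear in its coefficients, so it can be fitted by ordinary least squares with the squared values as a second column. The sketch below uses only the Python standard library and solves the normal equations directly; the data are hypothetical and noise-free, and the helper names `solve` and `fit` are illustrative, not taken from Excel or PhStat.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(columns, y):
    """Least-squares coefficients from the normal equations; an intercept is added."""
    rows = [[1.0] + list(values) for values in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical data generated from Y = 1 + 2 X + 3 X^2 without noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0 + 2.0 * v + 3.0 * v ** 2 for v in x]

# The second "variable" is simply the square of the first.
b0, b1, b2 = fit([x, [v ** 2 for v in x]], y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```

The same `fit` helper would handle the dummy-variable model of section 13.7 by passing the numerical variable, the 0/1 dummy, and their product as three columns.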

13.8 Using Transformations in Regression Models

Similarly to what was already described in section 13.6, we can use all other kinds of transformations and regression models. For example:

Square root transformation

$$Y = \beta_0 + \beta_1 \sqrt{X_1} + \beta_2 \sqrt{X_2} + \varepsilon$$

Multiplicative model

$$Y = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \varepsilon$$

transforms to

$$\ln Y = \ln \beta_0 + \beta_1 \ln X_1 + \beta_2 \ln X_2 + \ln \varepsilon$$

Exponential model

$$Y = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2} \varepsilon$$

transforms to

$$\ln Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \ln \varepsilon$$

In short, we can use any mathematical relation as long as we can describe the model as a combination of linear factors.

13.9 Collinearity

Collinearity describes a situation where factors are highly correlated. It then becomes difficult to separate the individual effects from the cross-correlated effects, and the effectiveness of a specific model can fluctuate strongly depending on which parameters are included. One way to evaluate collinearity is to use the variance inflationary factor:

$$VIF_j = \frac{1}{1 - R_j^2}$$

with $R_j^2$ being the coefficient of determination obtained when $X_j$ is regressed on all other X-variables. Values close to 1 indicate uncorrelated variables, whereas values of 5 or higher are considered significant. In other words, a large VIF value indicates that two (or more) variables are actually closely related. The two variables are not independent of each other; rather, they are two measurements of the same effect. For example, when modeling the bouncing height of a ball, the speed of the falling ball 5 cm before hitting the ground (variable 1) and the height from which the ball was initially released (variable 2) are highly correlated variables. There is no need to include both variables in a model.
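For a model with only two explanatory variables, the coefficient of determination in the VIF formula of section 13.9 reduces to the squared correlation between the two variables. The sketch below illustrates this with hypothetical numbers for the bouncing-ball example; the free-fall relation used to generate the speeds and all names are assumptions made for illustration only.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two explanatory variables."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical release heights (m) and the speed 5 cm above the ground,
# generated from free fall: v = sqrt(2 g (h - 0.05)).
heights = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
speeds = [math.sqrt(2 * 9.81 * (h - 0.05)) for h in heights]
vif_ball = vif_two_predictors(heights, speeds)

# Two uncorrelated variables for comparison.
vif_independent = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])

print(f"ball example: VIF = {vif_ball:.1f}, uncorrelated: VIF = {vif_independent:.1f}")
```

With more than two explanatory variables, each $R_j^2$ would have to come from a separate regression of $X_j$ on the remaining variables instead of from a single correlation.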

13.10 Model Building

We would like to achieve a model that includes the fewest possible variables.

First, we eliminate highly correlated variables. We could then use various combinations of variables and see whether all of them show significance above a certain threshold in our regression model. In a more systematic way, we can use all possible combinations of variables. The following figure shows the result for the potato-processing data. We can now evaluate the adjusted $R^2$ values of the different models. The larger the adjusted $R^2$ value is, the more of the data scattering is explained by the model.

An alternative approach uses the so-called $C_p$ statistic to evaluate whether a certain model should be considered. Here, models where $C_p$ is less than the number of variables used in the model + 1 are considered for further evaluation. Thereafter, we again evaluate each model in detail and see whether all variables show a significant effect (P-values below a given cut-off value). Nevertheless, choosing an appropriate model is a highly subjective process!
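The notes do not spell out how $C_p$ is calculated. A commonly used textbook form is $C_p = (1 - R_k^2)(n - T)/(1 - R_T^2) - [n - 2(k + 1)]$, where k is the number of variables in the candidate model and T is the number of parameters (including the intercept) in the full model. The sketch below applies it under the simplifying assumption that the two-variable fit (pH and lower pressure) is the full model, which is not the case for the complete potato-processing data; the function name is illustrative.

```python
def mallows_cp(r2_k, r2_full, n, k, t):
    """C_p for a candidate model with k variables, given the full model's R^2.

    t is the number of parameters in the full model, intercept included.
    """
    return (1 - r2_k) * (n - t) / (1 - r2_full) - (n - 2 * (k + 1))


n = 54
r2_full = 0.415793           # pH and lower pressure (assumed to be the full model)
r2_pressure_only = 0.114465  # lower pressure alone

cp_pressure_only = mallows_cp(r2_pressure_only, r2_full, n, k=1, t=3)
cp_full = mallows_cp(r2_full, r2_full, n, k=2, t=3)

# Screening rule from the notes: keep models with C_p below k + 1.
keep_pressure_only = cp_pressure_only < 1 + 1

print(f"pressure only: C_p = {cp_pressure_only:.2f}, keep: {keep_pressure_only}")
print(f"full model:    C_p = {cp_full:.2f}")
```

Under this assumption the pressure-only model has a $C_p$ far above k + 1 = 2 and would be dropped, in line with the partial F-test result for pH. For the full model the formula always returns T, so the statistic is only informative for the reduced models.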

13.11 Pitfalls in Multiple Regression

The regression coefficient for one particular variable is interpreted for the case that all other variables are held constant.
We need residual plots for each independent variable.
Interaction plots are needed for each parameter when using dummy variables.
VIF evaluation is needed to decide on the parameters to be included in a model.
Examine several alternative subsets for models.
The sample size should be at least 10 times the number of variables in the model.
Evaluate a sufficiently wide range for each variable.
Stability over time is a key necessity for using a fitted model for predictive purposes.