A discussion on multiple regression models

In our previous discussion of simple linear regression, we focused on a model in which one independent or explanatory variable X was used to predict the value of a dependent or response variable Y. Such a model is very useful in establishing a calibration curve for an analytical instrument, where the X variable is the standard solution concentration and Y is the instrument response. However, in many other scientific studies we encounter several independent variables that affect the final outcome of the experiment. Hence, we now want to model the dependent variable with several independent variables. For example, we may want to carry out a drug chemical synthesis experiment and predict its yield (g) from its reaction temperature (°C) and the vessel pressure exerted (psi). In this case, we have two independent variables to predict the value of the dependent variable, i.e. the drug yield. With these two independent variables in the multiple regression model, a scatter diagram of the drug yield as the outcome variable can be plotted on a three-dimensional graph.

[Figure: three-dimensional scatter plot of product yield against reaction temperature and vessel pressure]

In general, if Y is the dependent or outcome variable and X1, X2, ..., Xk are k independent variables, then the multiple regression model has the general form

Y = β0 + β1X1 + β2X2 + ... + βkXk + ε

where

β0 is the y-intercept, and βk is the slope of Y with respect to the variable Xk when the other X's are held constant. The part E(Y) = β0 + β1X1 + β2X2 + ... + βkXk is the deterministic portion of the model, and the ε term is the random error in Y.

Before further discussion, we have to take note of the assumptions made in multiple regression. The assumptions of simple regression discussed earlier also hold true for multiple regression. They are mainly:

1. For any given set of values of the independent variables, the random error ε has a normal distribution with mean 0 and standard deviation σ;
2. The random errors are independent.

Furthermore, as soon as we use more than one independent variable, we have to worry about multi-collinearity. This means that none of our independent variables (also called predictors) should correlate highly with any other independent variable. In particular, no variable can be a linear combination of other variables, i.e. we cannot include as predictors the independent variables A, B and A+B. Predictor variables that are highly correlated tend to explain much the same variance in an outcome variable, blurring the relationship of each individual predictor with the outcome. (A quick numerical check for this is sketched below.)

Let's see how the first-order model works in estimating and interpreting the parameters involved. As in the case of simple linear regression, we adopt the sample regression coefficients (b0, b1 and b2) as estimates of the population parameters (β0, β1 and β2). Therefore, the regression equation for a multiple linear regression model with two explanatory variables is expressed as follows:

y = b0 + b1x1 + b2x2
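As a quick, informal check of the multi-collinearity concern above, one can inspect the pairwise correlations (or variance inflation factors) among the predictors before fitting. A minimal sketch in Python using only numpy; the array here simply holds the pressure and temperature settings of the experiment described below, and the variable names are mine:

```python
import numpy as np

# Predictor matrix: each column is one independent variable
# (here: vessel pressure in psi, reaction temperature in °C).
X = np.array([[10, 40], [20, 40], [30, 40],
              [10, 60], [20, 60], [30, 60],
              [10, 80], [20, 80], [30, 80]], dtype=float)

# Pairwise correlation matrix of the predictors; off-diagonal values
# near +1 or -1 warn of multi-collinearity.
print(np.corrcoef(X, rowvar=False))

# Variance inflation factor for each predictor: regress it on the
# others and compute 1 / (1 - R^2). Values well above ~5-10 are a
# common warning sign.
for j in range(X.shape[1]):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid.var() / X[:, j].var()
    print(f"VIF for predictor {j}: {1 / (1 - r2):.2f}")
```

For the balanced design used in the illustration below, pressure and temperature are uncorrelated, so both VIFs come out to exactly 1.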

For illustration, a series of experiments was carried out to synthesize a drug chemical by studying its total yield against two independent variables, reaction temperature and vessel pressure. The test results are summarized below:

Pressure, psi    Temperature, °C    Yield, g
10               40                 25
20               40                 30
30               40                 33
10               60                 32
20               60                 38
30               60                 40
10               80                 45
20               80                 47
30               80                 48

By applying Excel through Data -> Data Analysis -> Regression, we get the following Excel output with the values of the coefficients shown:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.985
R Square             0.971
Adjusted R Square    0.961
Standard Error       1.602
Observations         9

ANOVA
              df    SS        MS        F        Significance F
Regression    2     510.833   255.417   99.585   2.501E-05
Residual      6     15.389    2.565
Total         8     526.222

              Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept     5.222          2.417            2.161    0.074     -0.692      11.137
Pressure      0.317          0.065            4.843    0.003     0.157       0.477
Temperature   0.433          0.033            13.256   0.000     0.353       0.513

The estimated regression equation therefore is

Yield = 5.22 + 0.317 Pressure + 0.433 Temperature
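The same coefficients can be reproduced outside Excel with an ordinary least-squares fit. A minimal sketch in Python using only numpy; the data are the nine runs tabulated above, and the variable names are mine:

```python
import numpy as np

# Nine experimental runs: pressure (psi), temperature (°C), yield (g).
pressure = np.array([10, 20, 30, 10, 20, 30, 10, 20, 30], dtype=float)
temperature = np.array([40, 40, 40, 60, 60, 60, 80, 80, 80], dtype=float)
yield_g = np.array([25, 30, 33, 32, 38, 40, 45, 47, 48], dtype=float)

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones(len(yield_g)), pressure, temperature])

# Ordinary least-squares estimates of (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, yield_g, rcond=None)
print(f"Yield = {b[0]:.3f} + {b[1]:.3f} Pressure + {b[2]:.3f} Temperature")
# Expected: Yield = 5.222 + 0.317 Pressure + 0.433 Temperature
```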

This estimated equation tells us that adding one psi of vessel pressure adds 0.317 g to the drug yield, and raising the reaction temperature by one degree increases the drug yield by 0.433 g, in each case with the other variable held constant. For example, if the reaction temperature were fixed at 80 °C and the vessel pressure at 20 psi, the predicted drug chemical yield would be:

Yield = 5.22 + 0.317 (20) + 0.433 (80) = 46.2 g

We can make a few inferences from the parameters obtained (each is scripted in the short sketch after this list), such as:

1. Estimating the confidence interval on β1, the coefficient of the pressure variable in the above example. In general, the confidence interval for βi is bi ± t(α/2) × standard error of bi. The degrees of freedom for the t-value are n − (k+1), where n = sample size and (k+1) = the number of beta parameters in the model. Hence, in this case, we have 9 − (2+1) = 6 degrees of freedom and a critical t(0.05/2) = 2.447. The 95% confidence interval for the pressure coefficient β1 is therefore 0.317 ± 2.447 × 0.065 = 0.317 ± 0.159, or (0.157, 0.477).

2. Testing the hypothesis H0: β2 = 0, using the t-statistic

t = (b2 − 0) / (standard error of b2)

As b2 = 0.433 and the standard error of b2 is 0.033, the t-statistic is 13.256, which far exceeds the critical t(0.05/2) = 2.447, with a corresponding p-value ≈ 0.000; the null hypothesis H0: β2 = 0 is therefore rejected.

One should be cautious about conducting several t-tests on the betas, because if each t-test is conducted at α = 0.05, the actual alpha covering all the tests simultaneously is considerably larger than 0.05. For example, if significance tests are conducted on five betas, each at α = 0.05, then even when all the β parameters (except β0) equal zero, we will incorrectly reject the null hypothesis at least once approximately 1 − 0.95^5 ≈ 0.226, or 22.6%, of the time and conclude that some β parameter differs from zero. With 10 betas, we will make such a mistake approximately 40% of the time!

3. The adjusted coefficient of determination, R²(adj)

The coefficient of determination R² is also an important measure not to be ignored. The adjusted coefficient of determination R²(adj) adjusts R² to take into account the sample size and the number of independent variables. It is defined in terms of R² as follows:

R²(adj) = 1 − (1 − R²) × (n − 1) / (n − (k + 1))

In this drug experiment, we found R² = 0.971 and R²(adj) = 0.961, or 96.1%. Do take note that the adjusted R² does not have the same interpretation as R²: whereas R² is a measure of goodness of fit, the adjusted R² is a comparative measure of the suitability of alternative nested sets of independent variables.
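The inference calculations in items 1 to 3 are easy to script. A minimal Python sketch using scipy.stats for the t distribution; all coefficient, standard-error and R² values are taken from the rounded Excel output above, so the printed results may differ slightly in the last digit:

```python
from scipy import stats

n, k = 9, 2                              # sample size, number of predictors
df = n - (k + 1)                         # 9 - 3 = 6 degrees of freedom
t_crit = stats.t.ppf(1 - 0.05 / 2, df)   # two-sided 95% critical value, ~2.447

# Item 1: 95% confidence interval for the pressure coefficient beta_1.
b1, se1 = 0.317, 0.065
print(f"beta_1 CI: ({b1 - t_crit * se1:.3f}, {b1 + t_crit * se1:.3f})")

# Item 2: t-test of H0: beta_2 = 0 for the temperature coefficient.
b2, se2 = 0.433, 0.033
t_stat = (b2 - 0) / se2
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.2f}, p = {p_value:.6f}")

# Familywise error when testing m betas separately at alpha = 0.05.
for m in (5, 10):
    print(f"{m} tests: P(at least one false rejection) = {1 - 0.95**m:.3f}")

# Item 3: adjusted R-squared from R-squared, n and k.
r2 = 0.971
r2_adj = 1 - (1 - r2) * (n - 1) / (n - (k + 1))
print(f"adjusted R-squared = {r2_adj:.3f}")   # ~0.961, matching Excel
```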