Single and multiple linear regression analysis


Single and multiple linear regression analysis
Marike Cockeran, 2017

Introduction: Outline of the session

- Simple linear regression analysis
- SPSS example of simple linear regression analysis
- Additional topics in multiple linear regression analysis: adjusted R-squared, standardised regression coefficients, multicollinearity, and a note on categorical predictors (dummy variables)

Introduction: Simple linear regression (1)

Suppose we collect data on two variables: waist-to-hip ratio (X) and cholesterol (Y). For each participant we now have a pair of observations (x_i, y_i).

ID | Cholesterol (Y) | Ratio (X)
1  | 203 | 3.60
2  | 165 | 6.90
3  | 228 | 6.20
4  | 78  | 6.50
5  | 249 | 8.90

Introduction: Simple linear regression (2)

We want to fit a straight line that describes the linear relationship between X and Y.
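As an aside for readers who want to follow along outside SPSS, the example rows can be held in a small Python data frame. This is a minimal sketch (Python is an illustration choice, not part of the original SPSS material) that the later snippets reuse:

```python
import pandas as pd

# The five rows shown on the slide. The full study dataset used for
# the SPSS output later in the session is larger.
data = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "cholesterol": [203, 165, 228, 78, 249],  # response Y
    "ratio": [3.60, 6.90, 6.20, 6.50, 8.90],  # explanatory X (waist-to-hip ratio)
})
print(data)
```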

Introduction: Simple linear regression (3)

Simple linear regression is a technique used to explore the nature of the relationship between two variables. Regression analysis enables us to investigate the change in one variable, called the response (dependent variable), that corresponds to a given change in the other, known as the explanatory variable (independent variable). The ultimate objective of regression analysis is to predict or estimate the value of the response associated with a fixed value of the explanatory variable.

Simple Linear Regression Model (1)

Population model: Y_i = β_0 + β_1 X_i + ε_i

[figure: scatter diagram showing the observed value of Y at X_i, the predicted value of Y at X_i, the random error ε_i for that X value, the slope β_1 and the intercept β_0]

Simple Linear Regression Model (2)

Population model: Y_i = β_0 + β_1 X_i + ε_i, where Y_i is the dependent variable, β_0 the population intercept, β_1 the population slope coefficient, X_i the independent variable, and ε_i the random error term. β_0 + β_1 X_i is the linear component and ε_i the random error component.

Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line: Ŷ_i = b_0 + b_1 X_i, where Ŷ_i is the estimated (or predicted) value of Y for observation i, b_0 the estimate of the regression intercept, b_1 the estimate of the regression slope, and X_i the value of X for observation i.

Method of least squares

b_0 and b_1 are obtained by finding the values that minimise the sum of the squared differences between Y_i and Ŷ_i: min Σ (Y_i − Ŷ_i)².

Interpretation of intercept term

b_0 is the estimated average value of Y when the value of X is zero. b_0 has practical meaning only if X = 0 lies in the range of observed X values.
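To make the minimisation concrete, here is a sketch that computes b_0 and b_1 from the closed-form least-squares solution, using only the five illustrative rows (so the numbers will not match the full-dataset SPSS output shown later):

```python
import numpy as np

# Closed-form least-squares estimates:
#   b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
x = np.array([3.60, 6.90, 6.20, 6.50, 8.90])      # waist-to-hip ratio (X)
y = np.array([203.0, 165.0, 228.0, 78.0, 249.0])  # cholesterol (Y)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```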

Interpretation of slope term

b_1 estimates the change in the average value of Y that results from a one-unit increase in X.

Coefficient of Determination (R²)

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. The coefficient of determination is also called R-squared and is denoted R²:

R² = SSR / SST = regression sum of squares / total sum of squares, with 0 ≤ R² ≤ 1.

Coefficient of Determination (R²)

SST = Σ (Y_i − Ȳ)² and SSR = Σ (Ŷ_i − Ȳ)².

Examples of approximate R² values (1)

R² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.

Examples of approximate R² values (2)

0 < R² < 1: weak linear relationship between X and Y; some but not all of the variation in Y is explained by variation in X.

Examples of approximate R² values (3)

R² = 0: no linear relationship between X and Y; the value of Y does not depend on X. None of the variation in Y is explained by the variation in X.
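To connect the definitions above with numbers, this sketch computes SST, SSR and R² for the illustrative rows:

```python
import numpy as np

# R^2 from its definition: R^2 = SSR / SST.
x = np.array([3.60, 6.90, 6.20, 6.50, 8.90])
y = np.array([203.0, 165.0, 228.0, 78.0, 249.0])

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                    # predicted values

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
print(f"R^2 = {ssr / sst:.3f}")        # lies between 0 and 1
```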

Assumptions of simple linear regression models

Linearity: the relationship between X and Y is linear.
Independence of errors: error values are statistically independent.
Normality of error: error values are normally distributed for any given value of X.
Equal variance (homoscedasticity): the probability distribution of the errors has constant variance.

Test the assumptions after fitting the model by making use of the residuals.

Residual analysis

The residual for observation i, e_i = Y_i − Ŷ_i, is the difference between the observed and predicted values. Check the assumptions of regression by examining the residuals: the linearity assumption, the independence assumption, the normal distribution assumption, and constant variance for all levels of X (homoscedasticity). Graphical analysis of residuals: plot the residuals against X.
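The plot types shown on the next slides can be reproduced outside SPSS; here is a sketch drawing a residuals-versus-X plot and a normal Q-Q plot for the illustrative rows:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Fit the line and compute residuals (same illustrative rows as before).
x = np.array([3.60, 6.90, 6.20, 6.50, 8.90])
y = np.array([203.0, 165.0, 228.0, 78.0, 249.0])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
ax1.scatter(x, residuals)                          # linearity / equal variance check
ax1.axhline(0, linestyle="--")
ax1.set(xlabel="waist-to-hip ratio (X)", ylabel="residuals")
stats.probplot(residuals, dist="norm", plot=ax2)   # normality check (Q-Q plot)
plt.tight_layout()
plt.show()
```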

Residual analysis for linearity: [figure: residual plots against X, one showing a curved (not linear) pattern and one showing a patternless (linear) spread]

Residual analysis for normality: [figure: histogram of the dependent variable and normal Q-Q plots, contrasting a non-normal with a normal distribution]

Residual analysis for equal variance: [figure: residual plots contrasting non-constant variance with constant variance]

SPSS example: The data

Suppose we collect data on two variables: waist-to-hip ratio (X) and cholesterol (Y). For each participant we now have a pair of observations (x_i, y_i).

ID | Cholesterol (Y) | Ratio (X)
1  | 203 | 3.60
2  | 165 | 6.90
3  | 228 | 6.20
4  | 78  | 6.50
5  | 249 | 8.90

Steps in SPSS: Scatter plot and prediction line

Graphs → Legacy Dialogs → Scatter/Dot

SPSS output: Scatter plot and prediction line

[figure: scatter plot of cholesterol against waist-to-hip ratio with the fitted prediction line; the slide annotates the values 0.970 and 0.387]

Steps in SPSS: Simple linear regression model

Analyze → Regression → Linear

Steps in SPSS: Statistics tab

Steps in SPSS: Save tab

SPSS output: Regression equation

Predicted cholesterol = 171.233 + 0.970 × (waist-to-hip ratio)
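The same model can be fitted outside the SPSS dialogs; this sketch uses statsmodels on the five illustrative rows, so its coefficients will not match the full-dataset values above (171.233 and 0.970):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([3.60, 6.90, 6.20, 6.50, 8.90])
y = np.array([203.0, 165.0, 228.0, 78.0, 249.0])

X = sm.add_constant(x)        # adds the intercept column
model = sm.OLS(y, X).fit()    # ordinary least squares, as in the SPSS dialog
print(model.params)           # [intercept b0, slope b1]
```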

SPSS output: R² values

1.5% of the variance of cholesterol could be explained by waist-to-hip ratio.

SPSS output: Inferences about the slope

H_0: β_1 = 0 versus H_1: β_1 ≠ 0. Waist-to-hip ratio is a significant predictor of the dependent variable cholesterol, p = 0.013.
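The same inference quantities can be read off the fitted statsmodels results (a sketch on the illustrative rows, so the values will differ from the slide's full-dataset output):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([3.60, 6.90, 6.20, 6.50, 8.90])
y = np.array([203.0, 165.0, 228.0, 78.0, 249.0])
model = sm.OLS(y, sm.add_constant(x)).fit()

print(model.rsquared)    # proportion of variance in Y explained by X
print(model.tvalues[1])  # t statistic for the slope
print(model.pvalues[1])  # two-sided p-value for H0: beta1 = 0
```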

SPSS output: Residual analysis for normality

The residuals are normally distributed.

SPSS output: Residual analysis for equal variance

The residuals have constant variance.

SPSS output: Outliers

If the absolute standardised residual value is larger than 3, the observation is considered an outlier.

SPSS output: Influential cases

Analyze → Regression → Linear → Save tab

DFBETA: the difference between the estimated regression coefficient based on all cases and the regression coefficient obtained when the case is omitted. If the absolute DFBETA value exceeds the cut-off value, the case can be viewed as influential.
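The quantities SPSS saves via the Save tab have close analogues in statsmodels' influence object; a sketch (using the illustrative rows):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([3.60, 6.90, 6.20, 6.50, 8.90])
y = np.array([203.0, 165.0, 228.0, 78.0, 249.0])
model = sm.OLS(y, sm.add_constant(x)).fit()

influence = model.get_influence()
std_resid = influence.resid_studentized_internal  # standardised residuals
dfbetas = influence.dfbetas  # scaled coefficient change when a case is dropped

print(np.abs(std_resid) > 3)  # flag observations treated as outliers
print(dfbetas)                # one row per case, one column per coefficient
```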

Introduction: Multiple linear regression analysis

In the preceding section, we saw how simple linear regression can be used to explore the nature of the relationship between two variables. If knowing the value of a single explanatory variable improves our ability to predict the response, we might expect that additional explanatory variables could be used to our advantage. To investigate the more complicated relationship among a number of different variables, we use a natural extension of simple linear regression analysis known as multiple linear regression analysis.

Introduction: Multiple linear regression analysis

Multiple linear regression equation: Ŷ = b_0 + b_1 X_1 + b_2 X_2 + … + b_k X_k. The regression coefficients are still estimated by using the method of least squares. The independent variables can be continuous or categorical variables. In the case of categorical variables we need to use dummy variables.
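A sketch of a multiple regression with one continuous and one categorical predictor follows; the 'sex' column is a hypothetical addition purely for illustration, not part of the slide's dataset, and the formula interface handles the dummy coding automatically:

```python
import pandas as pd
import statsmodels.formula.api as smf

# 'sex' is a hypothetical categorical predictor added for illustration.
df = pd.DataFrame({
    "cholesterol": [203, 165, 228, 78, 249],
    "ratio": [3.60, 6.90, 6.20, 6.50, 8.90],
    "sex": ["F", "M", "F", "M", "M"],
})
# C(sex) tells the formula interface to dummy-code the categorical variable.
model = smf.ols("cholesterol ~ ratio + C(sex)", data=df).fit()
print(model.params)  # intercept, dummy coefficient for sex, slope for ratio
```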

Adjusted R-squared

The inclusion of an additional variable in a model can never cause R² to decrease. To get around this problem, we can use a second measure, called the adjusted R², that compensates for the added complexity of a model. The adjusted R² increases when the inclusion of a variable improves our ability to predict the response and decreases when it does not.

Standardised coefficients

Unstandardised coefficients: the value of an unstandardised coefficient depends on the units of measurement of the variables, so it is not possible to compare the relative magnitude of coefficients.

Standardised coefficients: the value of a standardised coefficient does not depend on the units of measurement of the variables, so it is possible to compare the relative magnitude of coefficients.

How to standardise: multiply each coefficient by the ratio of the standard deviation of its predictor to the standard deviation of the response, beta_j = b_j × (s_Xj / s_Y).
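Both measures can be computed directly; this sketch reuses the hypothetical two-predictor setup ('age' is an illustrative addition, not part of the slide's dataset):

```python
import pandas as pd
import statsmodels.api as sm

# 'age' is a hypothetical second predictor added for illustration.
df = pd.DataFrame({
    "cholesterol": [203, 165, 228, 78, 249],
    "ratio": [3.60, 6.90, 6.20, 6.50, 8.90],
    "age": [45, 38, 52, 29, 61],
})
X = sm.add_constant(df[["ratio", "age"]])
model = sm.OLS(df["cholesterol"], X).fit()

print(model.rsquared_adj)  # adjusted R-squared, penalised for model size

# Standardised coefficients: beta_j = b_j * (s_Xj / s_Y).
betas = (model.params[["ratio", "age"]]
         * df[["ratio", "age"]].std() / df["cholesterol"].std())
print(betas)               # unit-free, so magnitudes are comparable
```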

Multicollinearity

Multicollinearity exists when there is a strong correlation between two or more predictors (independent variables) in a regression model. High levels of collinearity increase the probability that a good predictor of the outcome variable will be found non-significant and rejected from the model (a Type II error). A VIF (variance inflation factor) greater than 10 indicates a potential problem. A tolerance below 0.2 indicates a potential problem.

SPSS output: Multicollinearity

Analyze → Regression → Linear → Statistics
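The collinearity diagnostics SPSS reports can be reproduced with statsmodels; a sketch using the same hypothetical two-predictor data as above:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Same hypothetical two-predictor data as in the previous sketch.
df = pd.DataFrame({
    "ratio": [3.60, 6.90, 6.20, 6.50, 8.90],
    "age": [45, 38, 52, 29, 61],
})
X = sm.add_constant(df)
for i, name in enumerate(X.columns[1:], start=1):  # skip the constant
    vif = variance_inflation_factor(X.values, i)
    print(f"{name}: VIF = {vif:.2f}, tolerance = {1 / vif:.2f}")
```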