Six Sigma Black Belt Study Guides

www.pmtutor.org. Powered by POeT Solvers Limited.

Analyze: Correlation and Regression Analysis

Variables and relationships

When dealing with two or more variables, correlation and regression analysis is used to study the relationships between or among them (if any exist) and the impact of one variable on the others. For simplicity, consider two variables X and Y. The objective is to find out whether any relation exists between them. If a relation Y = F(X) exists, we then have to determine F in the best possible way. X is called the explanatory or independent variable, and Y is called the response or dependent variable.

Scatter diagram

A scatter diagram is a graphical representation that depicts the relationship between two variables. It provides a visual means of judging the direction and strength of a relationship, which the correlation coefficient then quantifies.

Construction of a scatter diagram

1. Collect the data on both variables (preferably a sample size of 20 or more).
2. Plot the data points on an XY plane, with variable 1 along the X axis and variable 2 along the Y axis.
3. Identify the linear relationship between them, if it exists.
4. Classify the strength of the linear relationship as strong positive, weak positive, no relationship, weak negative, or strong negative.

Correlation coefficient

The correlation coefficient is a measure of the strength of the linear relationship between two variables. It is denoted by r and its range is $-1 \le r \le 1$.

- If r is 1 or close to 1, the two variables have a strong positive relationship.
- If r is -1 or close to -1, they have a strong negative relationship.
- If r is 0, no linear relationship exists.

E.g. Consider a process in ABC Services Pvt. Ltd. The team leader of the process decides to find out the impact of training on performance. The following data is collected:

Hours of Training   Defects   Hours of Training   Defects
 4                  44        44                  20
 8                  39        48                  17
12                  38        52                  15
16                  35        56                  11
20                  31        60                   8
24                  28        64                   5
28                  26        68                   3
32                  25        72                   1
36                  25        76                   0
40                  22        80                   0

To fulfill the requirement, we need to draw the scatter plot and identify the relationship between the variables (training hours and number of defects).

Minitab steps

1. Copy the data into the Minitab worksheet as it is given.
2. Select Graph > Scatter plots and then select Simple.
3. Select hours of training as the X variable and number of defects as the Y variable.

[Minitab output: scatter plot of number of defects versus hours of training]

Interpretation

From the scatter plot, it is clear that the hours of training and the number of defects made by the process executives (PE) have a strong negative relationship. To find the correlation coefficient we use the Excel function CORREL (shown in a screenshot on the original slide). The correlation coefficient is r = -0.995.

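Calculation of sample correlation coefficient (r)

The sample correlation coefficient (r) is computed as

\[ r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}} \]

where

\[ S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \qquad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2. \]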
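The calculation can be checked by hand on the training data from the ABC Services example. The following is a minimal sketch in plain Python; the function name `corrected_sums` and the variable names are illustrative choices, not part of the original slides.

```python
import math

# Training data from the ABC Services example (hours of training, defects).
hours = list(range(4, 84, 4))  # 4, 8, ..., 80
defects = [44, 39, 38, 35, 31, 28, 26, 25, 25, 22,
           20, 17, 15, 11, 8, 5, 3, 1, 0, 0]


def corrected_sums(x, y):
    """Return the corrected sums of squares and cross-products (S_xx, S_yy, S_xy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xx, s_yy, s_xy


s_xx, s_yy, s_xy = corrected_sums(hours, defects)
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))  # agrees with the Excel CORREL value quoted in the slides
```

The sign of r comes entirely from the cross-product sum, which is negative here because defects fall as training hours rise.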

Hypothesis Test for the Correlation Coefficient

Null hypothesis $H_0: \rho = 0$
Alternative hypothesis $H_1: \rho \neq 0$

Test statistic

\[ T = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} \]

where T has the t-distribution with (n - 2) degrees of freedom.

NB: The sample correlation coefficient (r) is used to estimate the population correlation coefficient (ρ).
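As an illustration, the test of ρ = 0 can be applied to the training data from the earlier example. This is a sketch only: the helper `pearson_r` is an illustrative name, and the critical value 2.101 is the two-sided 5% point of the t-distribution with 18 degrees of freedom taken from standard t tables, since the Python standard library has no t-distribution.

```python
import math

# Training data from the ABC Services example (hours of training, defects).
hours = list(range(4, 84, 4))
defects = [44, 39, 38, 35, 31, 28, 26, 25, 25, 22,
           20, 17, 15, 11, 8, 5, 3, 1, 0, 0]


def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


n = len(hours)
r = pearson_r(hours, defects)

# Test statistic for H0: rho = 0, with n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Assumed table value: two-sided 5% critical point of t with 18 df.
T_CRIT_18 = 2.101
reject_h0 = abs(t_stat) > T_CRIT_18
print(round(t_stat, 2), reject_h0)
```

Because |T| is far beyond the critical value, the hypothesis of no linear correlation between training hours and defects is rejected at the 5% level.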

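Hypothesis Test for the Correlation Coefficient ρ (general case)

Fisher's transformation of r: if

\[ W = \frac{1}{2} \ln\frac{1 + r}{1 - r} \]

then W is approximately normally distributed with

\[ \text{mean} = \frac{1}{2} \ln\frac{1 + \rho}{1 - \rho} \qquad \text{and} \qquad \text{variance} = \frac{1}{n - 3}. \]

Null hypothesis $H_0: \rho = \rho_0$
Alternative hypothesis $H_1: \rho \neq \rho_0$

Test statistic:

\[ Z = \left( W - \frac{1}{2} \ln\frac{1 + \rho_0}{1 - \rho_0} \right) \sqrt{n - 3} \]

which is approximately standard normal under $H_0$.

NB: This is a general test that can be used to test ρ against any specified value $\rho_0$, including non-zero values.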
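A short sketch of the Fisher test in Python follows. The inputs r = -0.995 and n = 20 come from the training example; the hypothesised value ρ0 = -0.9 is an assumption chosen purely for illustration, and `fisher_z_test` is a hypothetical helper name. The code relies on the identity atanh(r) = ½ ln((1 + r)/(1 - r)).

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n, rho0):
    """Two-sided test of H0: rho = rho0 using Fisher's transformation.

    Returns the approximate standard-normal statistic and its p-value.
    """
    w = math.atanh(r)        # W = 0.5 * ln((1 + r) / (1 - r))
    mu0 = math.atanh(rho0)   # mean of W under H0
    z = (w - mu0) * math.sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# r and n from the training example; rho0 = -0.9 is an illustrative assumption.
z, p_value = fisher_z_test(r=-0.995, n=20, rho0=-0.9)
print(round(z, 2), p_value < 0.05)
```

With these inputs the statistic is strongly negative, so the data suggest a correlation even more negative than -0.9.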

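Confidence interval for the Correlation Coefficient

An approximate 100(1-α)% confidence interval for ρ, based on Fisher's transformation $W = \frac{1}{2} \ln\frac{1 + r}{1 - r}$, is given by

\[ \tanh\left( W - \frac{z_{\alpha/2}}{\sqrt{n - 3}} \right) \le \rho \le \tanh\left( W + \frac{z_{\alpha/2}}{\sqrt{n - 3}} \right) \]

where $z_{\alpha/2}$ is the upper α/2 point of the standard normal distribution and $\tanh(u) = (e^{2u} - 1)/(e^{2u} + 1)$.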
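The interval can be sketched in a few lines of Python using the standard Fisher-transformation construction: build a normal interval for W = atanh(r) with standard error 1/√(n - 3), then map both ends back with tanh. The function name `correlation_ci` is illustrative, and the inputs are the r and n of the training example.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval for rho via Fisher's W."""
    w = math.atanh(r)                              # Fisher transformation of r
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 normal point
    half_width = z_crit / math.sqrt(n - 3)
    # Transform the interval for W back to the correlation scale.
    return math.tanh(w - half_width), math.tanh(w + half_width)


lower, upper = correlation_ci(r=-0.995, n=20)
print(round(lower, 3), round(upper, 3))
```

The interval is not symmetric about r, because tanh compresses values near -1 and 1.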

Simple linear regression model

The scatter plot and the correlation coefficient detect the strength of the linear relationship, if any relationship exists. The next step is to determine the linear model, which can be used in forecasting. A linear model is defined as a relation between two variables in which a change in one variable produces a proportionate change in the other. Mathematically, a linear model is expressed as Y = a + bX, where a and b are constants.

When two variables do not have a linear relationship (as is the case in many practical situations), their relationship can often be converted into a linear model with a suitable transformation.

E.g. Suppose two variables X and Y are related as $Y = ab^X$. Taking the log on both sides gives log Y = log a + X log b. This can be written as Y' = a' + b'X, where Y' = log Y, a' = log a and b' = log b, which is a linear model in X and Y'. Further analysis can then be done using this linear model.
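The log transformation can be illustrated with a small sketch. The data below are synthetic and noise-free, generated from Y = a·b^X with the assumed values a = 2 and b = 1.5, so that the recovered constants can be checked; `fit_line` is an illustrative helper that fits a straight line by least squares.

```python
import math

# Synthetic, noise-free data generated from Y = a * b**X with a = 2, b = 1.5
# (illustrative values, not from the slides).
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * 1.5 ** x for x in xs]


def fit_line(x, y):
    """Least-squares intercept and slope of the line y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


# Fit the linear model log Y = log a + X log b, then undo the logs.
log_ys = [math.log(y) for y in ys]
a_prime, b_prime = fit_line(xs, log_ys)
a_hat, b_hat = math.exp(a_prime), math.exp(b_prime)
print(round(a_hat, 6), round(b_hat, 6))
```

With real, noisy data the fit minimises squared error on the log scale, which is not the same as minimising it on the original scale.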

Method of least squares

The method of least squares fits the linear model to the observed sample in the best possible way. Consider two variables X and Y, for which the samples $(x_i, y_i)$, i = 1, 2, ..., n are collected. We want to find a simple linear regression model Y = a + bX + ϵ, where ϵ is the error.

For a set of paired data $\{(x_i, y_i) : i = 1, 2, \ldots, n\}$, the least squares estimates of the regression coefficients are the values a and b for which

\[ \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2 \]

is a minimum. Note that $y_i - (a + b x_i)$ is the deviation of the point $(x_i, y_i)$ from the fitted line, which is illustrated graphically in the next slide. The least squares method is the technique of minimizing the sum of the squared deviations to best fit the linear model.
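For reference, the minimizing values of a and b have a closed form. The sketch below is the standard textbook derivation rather than something stated in the slides; the symbol Q and the shorthand $S_{xy}$, $S_{xx}$ for the corrected sums are introduced here for convenience.

```latex
% Sum of squared deviations to be minimised
Q(a, b) = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial Q}{\partial a} = -2 \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right] = 0
\quad\Longrightarrow\quad \sum y_i = n a + b \sum x_i

\frac{\partial Q}{\partial b} = -2 \sum_{i=1}^{n} x_i \left[ y_i - (a + b x_i) \right] = 0
\quad\Longrightarrow\quad \sum x_i y_i = a \sum x_i + b \sum x_i^2

% Solving the two equations for a and b
b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}},
\qquad
a = \bar{y} - b\,\bar{x}
```

The second result shows that the fitted line always passes through the point of means $(\bar{x}, \bar{y})$.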

[Figure: scatter of the points $(x_i, y_i)$ with the fitted line, Y on the vertical axis and X on the horizontal axis. Arrows mark the deviation of each point from the fitted line.]

Residuals

Response - Fitted response = Residual from the fit

For a set of data points $(x_i, y_i)$, if the fitted regression line is $\hat{y} = a + bx$, then the residuals are given by $y_i - \hat{y}_i$.

Goodness of Fit

The total variation in the responses is given by $S_{yy} = \sum (y_i - \bar{y})^2$, which splits into two parts:

\[ S_{yy} = \sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2 = SS_{RES} + SS_{REG} \]

$SS_{REG}$ summarizes the variability explained by the model. $SS_{RES}$ summarizes the variability between the responses and their fitted values (unexplained by the model). The proportion of the variation explained by the model is a measure of goodness of fit.
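The decomposition can be verified numerically. The sketch below fits a least-squares line to the training data from the correlation example and checks that the residual and regression sums of squares add up to the total. The variable names are illustrative, and the ratio SS_REG / S_yy is the coefficient of determination listed in the summary slide.

```python
# Training data from the ABC Services example (hours of training, defects).
hours = list(range(4, 84, 4))
defects = [44, 39, 38, 35, 31, 28, 26, 25, 25, 22,
           20, 17, 15, 11, 8, 5, 3, 1, 0, 0]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(defects) / n

# Least-squares slope and intercept.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, defects))
s_xx = sum((x - x_bar) ** 2 for x in hours)
b = s_xy / s_xx
a = y_bar - b * x_bar

fitted = [a + b * x for x in hours]
residuals = [y - f for y, f in zip(defects, fitted)]

s_yy = sum((y - y_bar) ** 2 for y in defects)     # total variation
ss_res = sum(e ** 2 for e in residuals)           # unexplained by the model
ss_reg = sum((f - y_bar) ** 2 for f in fitted)    # explained by the model
r_squared = ss_reg / s_yy                         # coefficient of determination

print(round(s_yy, 2), round(ss_res + ss_reg, 2), round(r_squared, 3))
```

The first two printed values coincide, and the residuals sum to zero up to rounding error, as expected for a least-squares line with an intercept.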

Simple linear regression hypothesis testing

Example: The following data refers to the number of claims (X) received by a motor insurance company in a week and the number of settlements (Y) of these claims in the following week, during 10 randomly selected weeks in a year.

X   100   110   120   130   140   150   160   170   180   190
Y    45    51    54    61    66    70    74    78    85    89

A regression model Y = a + bX + ϵ is to be fitted to the above data.

- Display the data in a scatter plot and comment on the selection of a linear model for regression.
- Test the hypothesis b = 0 against $b \neq 0$.

1. Select Graph > Scatter plots.
2. Select Simple and then click OK.

[Minitab output: scatter plot of the number of settlements (Y) versus the number of claims (X)]

The scatter diagram shows that there is a strong relationship between the number of claims and the number of settlements. The assumption of the straight line model Y = a + bX + ϵ appears to be reasonable.

Regression analysis (Fit a linear model)

1. Select Stat > Regression > Regression.
2. Select X as the predictor variable and Y as the response variable.

Minitab output

Regression Analysis: Y versus X

The regression equation is
Y = -2.74 + 0.483 X

Predictor   Coef      SE Coef   T       P
Constant    -2.739    1.546     -1.77   0.114
X           0.48303   0.01046   46.17   0.000

S = 0.950279   R-Sq = 99.6%   R-Sq(adj) = 99.6%

Analysis of Variance

Source           DF   SS       MS       F         P
Regression        1   1924.9   1924.9   2131.57   0.000
Residual Error    8      7.2      0.9
Total             9   1932.1
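The coefficient table in the Minitab output can be reproduced from the raw claims data with the least-squares formulas. The following is a minimal sketch in plain Python; the variable names are illustrative, and the standard-error formulas are the usual textbook ones for simple linear regression rather than anything stated in the slides.

```python
import math

# Claims (X) and settlements (Y) for the 10 sampled weeks.
claims = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
settlements = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]

n = len(claims)
x_bar = sum(claims) / n
y_bar = sum(settlements) / n
s_xx = sum((x - x_bar) ** 2 for x in claims)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(claims, settlements))

# Least-squares estimates of slope and intercept.
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual standard deviation S, on n - 2 degrees of freedom.
ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(claims, settlements))
s = math.sqrt(ss_res / (n - 2))

# Standard errors and t-ratios of the two coefficients.
se_b = s / math.sqrt(s_xx)
se_a = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)
t_b = b / se_b
t_a = a / se_a

print(f"Constant {a:.3f}  SE {se_a:.3f}  T {t_a:.2f}")
print(f"X        {b:.5f}  SE {se_b:.5f}  T {t_b:.2f}")
print(f"S = {s:.6f}")
```

The printed values can be compared line by line with the Coef, SE Coef, T and S entries of the Minitab output.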

Test the hypothesis b = 0 against $b \neq 0$

The following ANOVA table is needed.

Analysis of Variance

Source           DF   SS       MS       F         P
Regression        1   1924.9   1924.9   2131.57   0.000
Residual Error    8      7.2      0.9
Total             9   1932.1

The table value of F(1, 8) at the 5% level of significance is 5.32. Since the calculated value F = 2131.57 exceeds the table value, we reject the hypothesis b = 0 and conclude that the slope is significantly different from zero.
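The ANOVA entries and the F test can be recomputed from the claims data as a check. This sketch uses plain Python; the names are illustrative, and the critical value 5.32 is the table value of F(1, 8) at the 5% level quoted in the slide.

```python
# Claims (X) and settlements (Y) for the 10 sampled weeks.
claims = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
settlements = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]

n = len(claims)
x_bar = sum(claims) / n
y_bar = sum(settlements) / n
s_xx = sum((x - x_bar) ** 2 for x in claims)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(claims, settlements))
s_yy = sum((y - y_bar) ** 2 for y in settlements)

ss_total = s_yy                 # Total, n - 1 = 9 df
ss_reg = s_xy ** 2 / s_xx       # Regression, 1 df
ss_res = ss_total - ss_reg      # Residual Error, n - 2 = 8 df

ms_reg = ss_reg / 1
ms_res = ss_res / (n - 2)
f_stat = ms_reg / ms_res

F_CRIT_1_8 = 5.32               # 5% table value of F(1, 8), as quoted in the slide
reject_b_zero = f_stat > F_CRIT_1_8

print(round(ss_reg, 1), round(ss_res, 1), round(ss_total, 1))
print(round(f_stat, 2), reject_b_zero)
```

In simple linear regression this F statistic equals the square of the t-ratio for the slope, so the F test and the two-sided t test of b = 0 always agree.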

Multiple linear regression

In this case there is more than one explanatory or independent variable. The general form of the multiple linear regression model is

\[ y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + \epsilon \]

The coefficients can be determined by the same method (the method of least squares) as used in the simple linear regression model.
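As a sketch of how least squares extends to several predictors, the code below builds the normal equations (X'X) b = X'y and solves them by Gaussian elimination. Everything here is illustrative: the function names are hypothetical, and the data are synthetic, generated without noise from y = 1 + 2·x1 + 3·x2 so that the recovered coefficients can be checked. In practice a statistics package would be used instead.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_multiple_regression(rows, y):
    """Least-squares estimates [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Synthetic, noise-free data from y = 1 + 2*x1 + 3*x2 (illustrative only).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_multiple_regression(rows, y)
print([round(c, 6) for c in coefficients])
```

With k = 1 the normal equations reduce to the two equations of the simple linear regression case.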

Conclusion

- Variables and relationships
- Correlation coefficient
- Simple linear regression model
- Method of least squares
- Residuals
- Simple linear regression hypothesis testing (test for correlation coefficient, slope parameter)
- Sources of variation
- Coefficient of determination
- Multiple linear regression