Applied Regression
Chapter 2: Simple Linear Regression
Hongcheng Li
April 6, 2013
Outline
1 Introduction of simple linear regression
2 Scatter plot
3 Simple linear regression model
4 Test of Hypothesis
1 Introduction of simple linear regression
Subsections: Linearity; Covariance and correlation coefficients; Matrix of correlation coefficients; Nonlinear relationship
Linearity
Scatter plot OR correlation coefficient
Before applying linear regression, make sure that X and Y have a linear relationship.
Figure: Scatter plot to identify a linear relationship
Linearity
Figure: Scatter plot to identify a linear relationship (Minutes versus Units)
Covariance and correlation coefficient I
1 Sample mean
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Covariance and correlation coefficient II
2 Sample S.D.
$$s_x = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}, \qquad s_y = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
Covariance and correlation coefficient III
3 Covariance
$$\mathrm{Cov}(Y, X) = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{n - 1}$$
4 Cov(Y, X) only tells us the direction of the linear relationship between X and Y. It does not indicate the strength of such a relationship. Its magnitude is affected by changes in the units of measurement.
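The point about units can be checked numerically. The following is a minimal sketch in plain Python; the data and the helper names (`mean`, `sample_sd`, `sample_cov`) are made up for illustration and do not come from the textbook.

```python
from math import sqrt

def mean(v):
    """Sample mean."""
    return sum(v) / len(v)

def sample_sd(v):
    """Sample standard deviation with divisor n - 1."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def sample_cov(y, x):
    """Sample covariance Cov(Y, X) with divisor n - 1."""
    my, mx = mean(y), mean(x)
    return sum((b - my) * (a - mx) for b, a in zip(y, x)) / (len(x) - 1)

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_original = sample_cov(y, x)

# Re-express y in other units (for example hours -> minutes, factor 60).
y_rescaled = [60 * b for b in y]
cov_rescaled = sample_cov(y_rescaled, x)

print("means:", mean(x), mean(y))
print("S.D.s:", sample_sd(x), sample_sd(y))
print("Cov(Y, X):", cov_original, "after rescaling Y:", cov_rescaled)
```

The sign of the covariance is unchanged by the rescaling, but its magnitude is multiplied by 60, which is why the covariance alone cannot measure the strength of the relationship.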
Correlation table
Figure: Covariance and coefficients of correlation
Correlation coefficients I
$$\mathrm{Cor}(Y, X) = \frac{1}{n - 1} \sum_{i=1}^{n} \left(\frac{y_i - \bar{y}}{s_y}\right) \left(\frac{x_i - \bar{x}}{s_x}\right) = \frac{\mathrm{Cov}(Y, X)}{s_y s_x} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2 \sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
Properties
1 $-1 \le \mathrm{Cor}(Y, X) \le 1$
2 What does it mean if Cor(Y, X) = 0? (It means there is no linear relationship; X and Y may still be related nonlinearly.)
3 Always look at the scatter plot!
4 The correlation coefficient can be substantially influenced by one or a few outliers in the data (ref. P24, Anscombe's Quartet).
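A small sketch of properties 1 and 2, again in plain Python. The data are made up, and `cor` is a hypothetical helper that implements the last form of the correlation formula.

```python
from math import sqrt

def cor(y, x):
    """Correlation coefficient Cor(Y, X) from sums of squares and products."""
    n = len(x)
    my, mx = sum(y) / n, sum(x) / n
    sxy = sum((b - my) * (a - mx) for b, a in zip(y, x))
    syy = sum((b - my) ** 2 for b in y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sqrt(syy * sxx)

x = [-2, -1, 0, 1, 2]  # made-up values, symmetric about zero

# Y is an exact (nonlinear) function of X, yet the correlation is zero.
y_quadratic = [a ** 2 for a in x]
r_quadratic = cor(y_quadratic, x)

# An exact linear relationship gives correlation 1 ...
y_linear = [3 + 2 * a for a in x]
r_linear = cor(y_linear, x)

# ... and changing the units of Y does not change the correlation.
r_rescaled = cor([100 * b for b in y_linear], x)

print(r_quadratic, r_linear, r_rescaled)
```

A zero correlation here hides a perfect quadratic relationship, which is one reason to inspect the scatter plot as well.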
Nonlinear relationship
Figure: Scatter plot to identify a linear relationship
2 Scatter plot
Subsection: Computer repair data
Example 2.3 (P26): Computer repair data
1 The relationship between the length of service calls and the number of electronic components in the computer that must be repaired or replaced.
2 Scatter plot
3 Cor(Y, X) = 0.996
3 Simple linear regression model
Subsection: Parameter estimation
The simple linear regression model I
1 The relationship between a response variable Y and a predictor variable X is postulated as a linear model:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Each observation in the data set can be written as
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n.$$
The simple linear regression model II
2 For the relationship between the length of service calls and the number of electronic components in the computer that must be repaired or replaced, the model is
$$\text{Minutes} = \beta_0 + \beta_1 \, \text{Units} + \varepsilon$$
Parameter estimation I
1 The least squares method gives the line that minimizes the sum of squares of the vertical distances from each point to the line. The vertical distances are the errors in the response variable:
$$\varepsilon_i = y_i - (\beta_0 + \beta_1 x_i), \qquad i = 1, 2, \ldots, n.$$
Parameter estimation II
2 The sum of squares of the vertical distances
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \bigl(y_i - (\beta_0 + \beta_1 x_i)\bigr)^2$$
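For reference, a short sketch of how the minimizer of $S(\beta_0, \beta_1)$ can be obtained. This is the standard calculus argument, written out here as a worked step rather than taken from the slides. Setting both partial derivatives to zero gives the normal equations:

```latex
\begin{aligned}
\frac{\partial S}{\partial \beta_0} &= -2 \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0, \\
\frac{\partial S}{\partial \beta_1} &= -2 \sum_{i=1}^{n} x_i \bigl(y_i - \beta_0 - \beta_1 x_i\bigr) = 0.
\end{aligned}
```

The first equation gives $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$. Substituting this into the second equation and using $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$:

```latex
\hat\beta_1
= \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})}
= \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.
```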
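Formula I
1 Least squares estimates
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(x_i - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{Cov}(Y, X)}{\mathrm{Var}(X)} = \mathrm{Cor}(Y, X) \, \frac{s_y}{s_x}$$
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$$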
Formula II
2 Fitted values
$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i, \qquad i = 1, 2, \ldots, n.$$
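The two formulas translate directly into code. Below is a minimal sketch in plain Python; `fit_simple_linear` is a hypothetical helper name and the data are made up for illustration.

```python
def fit_simple_linear(x, y):
    """Return the least squares estimates (b0, b1) for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx             # slope estimate
    b0 = y_bar - b1 * x_bar    # intercept estimate
    return b0, b1

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_linear(x, y)
fitted = [b0 + b1 * a for a in x]                  # fitted values
residuals = [b - f for b, f in zip(y, fitted)]     # observed minus fitted

print("b0 =", b0, "b1 =", b1)
print("fitted:", fitted)
```

With an intercept in the model, the residuals sum to zero (up to rounding), which is a quick sanity check on any implementation.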
4 Test of Hypothesis
Test of Hypothesis I
The linear relationship between Y and X can be checked by:
1 Scatter plot
2 Correlation coefficients
3 A formal way of measuring the usefulness of X as a predictor of Y is to test the hypothesis $\beta_1 = 0$.
Assumptions to perform the test of the hypotheses I
1 For every fixed value of X, the $\varepsilon$'s are assumed to be independent random quantities, normally distributed with mean zero and a common variance $\sigma^2$.
2 Under the above assumption, $\hat\beta_0$ and $\hat\beta_1$ are unbiased estimators of $\beta_0$ and $\beta_1$.
Assumptions to perform the test of the hypotheses II
3 Their variances are (see P33-37):
$$\mathrm{Var}(\hat\beta_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right] \qquad \text{(Var-0)}$$
and
$$\mathrm{Var}(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad \text{(Var-1)}$$
The sampling distributions of the least squares estimates $\hat\beta_0$ and $\hat\beta_1$ are normal with means $\beta_0$ and $\beta_1$ and variances as given in (Var-0) and (Var-1), respectively.
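Assumptions to perform the test of the hypotheses III
4 An unbiased estimator of $\sigma^2$, the variance of the $\varepsilon$'s, is
$$\hat\sigma^2 = \frac{\sum_{i=1}^{n} e_i^2}{n - 2} = \frac{\mathrm{SSE}}{n - 2}$$
where $e_i = y_i - \hat{y}_i$ are the residuals.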
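Standard deviation and standard error (s.e.)
1 An estimate of the standard deviation of an estimator is called the standard error of the estimator.
2 The s.e. of $\hat\beta_0$ and $\hat\beta_1$ (P33) are obtained by replacing $\sigma$ with $\hat\sigma$ in the square roots of their variances:
$$\mathrm{s.e.}(\hat\beta_0) = \hat\sigma \sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}, \qquad \mathrm{s.e.}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$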
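As an illustration, the estimate of $\sigma^2$ and the two standard errors can be computed as follows. This is a sketch on made-up data; the variable names are chosen here and are not from the textbook.

```python
from math import sqrt

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Least squares estimates.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - fitted value, and SSE.
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Unbiased estimate of sigma^2 uses n - 2 degrees of freedom.
sigma2_hat = sse / (n - 2)

# Standard errors: square roots of the variances with sigma^2 estimated.
se_b0 = sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))
se_b1 = sqrt(sigma2_hat / sxx)

print("sigma2_hat =", sigma2_hat)
print("s.e.(b0) =", se_b0, "s.e.(b1) =", se_b1)
```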
Testing $H_0: \beta_1 = \beta_1^0$
1 Test statistic
$$t_1 = \frac{\hat\beta_1 - \beta_1^0}{\mathrm{s.e.}(\hat\beta_1)}$$
2 (P35) $H_0$ is to be rejected at the significance level $\alpha$ if
$$|t_1| \ge t_{(n-2,\, \alpha/2)} \quad \text{OR} \quad p(|t| \ge |t_1|) \le \alpha$$
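Testing $H_0: \beta_0 = \beta_0^0$
1 Test statistic
$$t_0 = \frac{\hat\beta_0 - \beta_0^0}{\mathrm{s.e.}(\hat\beta_0)}$$
2 (P35) $H_0$ is to be rejected at the significance level $\alpha$ if
$$|t_0| \ge t_{(n-2,\, \alpha/2)} \quad \text{OR} \quad p(|t| \ge |t_0|) \le \alpha$$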
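A sketch of both t-tests on made-up data. The helper `t_statistics` and its parameters are hypothetical names. To keep the example free of external libraries, the critical value is taken from a t-table ($t_{(3,\,0.025)} = 3.182$) instead of being computed.

```python
from math import sqrt

def t_statistics(x, y, beta0_null=0.0, beta1_null=0.0):
    """Return (t0, t1) for H0: beta0 = beta0_null and H0: beta1 = beta1_null."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    sigma_hat = sqrt(sse / (n - 2))
    se_b0 = sigma_hat * sqrt(1 / n + x_bar ** 2 / sxx)
    se_b1 = sigma_hat / sqrt(sxx)
    return (b0 - beta0_null) / se_b0, (b1 - beta1_null) / se_b1

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

t0, t1 = t_statistics(x, y)

# Table value t_(n-2, alpha/2) for n - 2 = 3 and alpha = 0.05.
T_CRIT = 3.182

reject_beta0 = abs(t0) >= T_CRIT
reject_beta1 = abs(t1) >= T_CRIT
print("t0 =", t0, "reject H0:", reject_beta0)
print("t1 =", t1, "reject H0:", reject_beta1)
```

With only five observations, neither hypothesis is rejected at the 5% level for these data, even though the fitted slope is positive.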
Testing $H_0: \rho = 0$
1 Test statistic
$$t_1 = \frac{\mathrm{Cor}(Y, X) \sqrt{n - 2}}{\sqrt{1 - [\mathrm{Cor}(Y, X)]^2}}$$
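The statistic for $H_0: \rho = 0$ carries the same label $t_1$ as the statistic for $H_0: \beta_1 = 0$ because the two are algebraically identical. A quick numerical check on made-up data (all names are illustrative):

```python
from math import sqrt

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
syy = sum((b - y_bar) ** 2 for b in y)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# t statistic computed from the correlation coefficient.
r = sxy / sqrt(sxx * syy)
t_from_cor = r * sqrt(n - 2) / sqrt(1 - r ** 2)

# t statistic for H0: beta1 = 0 computed from the fitted slope.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
se_b1 = sqrt(sse / (n - 2)) / sqrt(sxx)
t_from_slope = b1 / se_b1

print(t_from_cor, t_from_slope)
```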
Confidence intervals of $\beta_0$ and $\beta_1$
1 $\hat\beta_0 \pm t_{(n-2,\, \alpha/2)} \; \mathrm{s.e.}(\hat\beta_0)$
2 $\hat\beta_1 \pm t_{(n-2,\, \alpha/2)} \; \mathrm{s.e.}(\hat\beta_1)$
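A sketch of the two intervals at the 95% level on made-up data. The critical value $t_{(3,\,0.025)} = 3.182$ is read from a t-table, and the variable names are illustrative.

```python
from math import sqrt

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
sigma_hat = sqrt(sse / (n - 2))
se_b0 = sigma_hat * sqrt(1 / n + x_bar ** 2 / sxx)
se_b1 = sigma_hat / sqrt(sxx)

# Table value t_(n-2, alpha/2) for n - 2 = 3 and alpha = 0.05.
T_CRIT = 3.182

ci_b0 = (b0 - T_CRIT * se_b0, b0 + T_CRIT * se_b0)
ci_b1 = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
print("95% CI for beta0:", ci_b0)
print("95% CI for beta1:", ci_b1)
```

The interval for $\beta_1$ contains zero exactly when the two-sided t-test of $\beta_1 = 0$ at the same level does not reject, so the two procedures give consistent answers.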
Prediction
Prediction I
1 (P38) To predict the value of the response variable Y that corresponds to any chosen value $x_0$ of the predictor variable, the predicted value $\hat{y}_0$ is
$$\hat{y}_0 = \hat\beta_0 + \hat\beta_1 x_0.$$
The standard error of this prediction is
$$\mathrm{s.e.}(\hat{y}_0) = \hat\sigma \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
Prediction II
The confidence limits for the predicted value with confidence level $(1 - \alpha)$ are given by
$$\hat{y}_0 \pm t_{(n-2,\, \alpha/2)} \; \mathrm{s.e.}(\hat{y}_0)$$
Prediction III
2 The estimate of the mean response $\mu_0$ when $X = x_0$ is
$$\hat\mu_0 = \hat\beta_0 + \hat\beta_1 x_0$$
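The predicted value and the estimated mean response share the same point estimate but have different standard errors. The sketch below uses made-up data and a hypothetical $x_0 = 4$. The standard error of $\hat\mu_0$, $\hat\sigma \sqrt{1/n + (x_0 - \bar{x})^2 / \sum (x_i - \bar{x})^2}$, is the standard result and is assumed here; it is not stated on the slides.

```python
from math import sqrt

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x0 = 4  # hypothetical value of the predictor, inside the observed range

x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((a - x_bar) ** 2 for a in x)
sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
sigma_hat = sqrt(sse / (n - 2))

# Same point estimate for a single new response and for the mean response.
y0_hat = b0 + b1 * x0

# s.e. for predicting a single new observation (extra "1 +" term).
se_pred = sigma_hat * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
# s.e. for estimating the mean response (assumed standard formula).
se_mean = sigma_hat * sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

T_CRIT = 3.182  # table value t_(3, 0.025)
pred_limits = (y0_hat - T_CRIT * se_pred, y0_hat + T_CRIT * se_pred)
mean_limits = (y0_hat - T_CRIT * se_mean, y0_hat + T_CRIT * se_mean)
print("prediction limits:", pred_limits)
print("mean response limits:", mean_limits)
```

The limits for a single prediction are always wider, because a new observation carries its own error $\varepsilon$ in addition to the uncertainty in the fitted line.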
Prediction IV
3 Care should be taken when using fitted regression lines for prediction far outside the range of the observations.
Measuring the quality of fit
1 Test of hypotheses
2 Scatter plot of Y versus X
3 Scatter plot of Y versus $\hat{Y}$
4 Coefficient of determination $R^2$
$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}$$
$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = [\mathrm{Cor}(Y, X)]^2 = [\mathrm{Cor}(Y, \hat{Y})]^2$$
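The decomposition and the equalities for $R^2$ can be verified numerically. This is a sketch on made-up data; `cor` is a hypothetical helper.

```python
from math import sqrt

def cor(u, v):
    """Correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)

# Made-up illustrative data (NOT the computer repair data).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * a for a in x]

sst = sum((b - y_bar) ** 2 for b in y)               # total sum of squares
ssr = sum((f - y_bar) ** 2 for f in fitted)          # regression sum of squares
sse = sum((b - f) ** 2 for b, f in zip(y, fitted))   # residual sum of squares

r2 = ssr / sst
print("SST =", sst, "SSR + SSE =", ssr + sse)
print("R^2 =", r2, 1 - sse / sst, cor(y, x) ** 2, cor(y, fitted) ** 2)
```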
Regression through the origin
1 The so-called no-intercept model is
$$Y = \beta_1 X + \varepsilon \qquad \text{(no-intercept)}$$
as opposed to the model with an intercept,
$$Y = \beta_0 + \beta_1 X + \varepsilon \qquad \text{(intercept)}$$
2 The $R^2$ value obtained from the (no-intercept) model is not strictly comparable with the $R^2$ value obtained from the (intercept) model. Unlike the $R^2$ from the (intercept) model, the $R^2$ from the (no-intercept) model may be negative.
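To see how a negative value can arise, the sketch below fits both models to made-up data whose responses sit far from the origin. Two assumptions are made here that are not on the slides: the no-intercept slope is the least squares estimate $\sum x_i y_i / \sum x_i^2$, and $R^2$ is computed as $1 - \mathrm{SSE}/\mathrm{SST}$ with SST centered at $\bar{y}$. The uncentered version $\sum \hat{y}_i^2 / \sum y_i^2$ is included as one commonly used alternative for the no-intercept model.

```python
# Made-up illustrative data: Y stays near 10 regardless of X.
x = [1, 2, 3, 4, 5]
y = [10, 11, 10, 9, 10]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
sst = sum((b - y_bar) ** 2 for b in y)  # centered total sum of squares

# Model with intercept.
b1 = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
      / sum((a - x_bar) ** 2 for a in x))
b0 = y_bar - b1 * x_bar
sse_intercept = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
r2_intercept = 1 - sse_intercept / sst

# No-intercept model: least squares slope through the origin (assumption).
b1_origin = sum(a * b for a, b in zip(x, y)) / sum(a ** 2 for a in x)
fitted_origin = [b1_origin * a for a in x]
sse_origin = sum((b - f) ** 2 for b, f in zip(y, fitted_origin))

# Same centered formula applied to the no-intercept fit: can be negative.
r2_origin = 1 - sse_origin / sst

# Uncentered alternative (assumption): always between 0 and 1.
r2_origin_uncentered = (sum(f ** 2 for f in fitted_origin)
                        / sum(b ** 2 for b in y))

print("R^2 (intercept):", r2_intercept)
print("R^2 (no-intercept, centered SST):", r2_origin)
print("R^2 (no-intercept, uncentered):", r2_origin_uncentered)
```

The line forced through the origin fits worse than the horizontal line at $\bar{y}$, so SSE exceeds the centered SST and the centered $R^2$ becomes negative.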
H.W.
1 P45: 2.3, 2.4, 2.6, 2.7, 2.9
2 2.10, 2.12