Chapter 12 - Lecture 2 Inferences about regression coefficient

April 19th, 2010

Outline: Facts about slope; Test statistic; Confidence interval; Hypothesis testing; Test using ANOVA table

Facts about slope
In previous lectures we saw that the regression coefficient $\beta_1$ is a parameter that can be estimated from a sample. In previous chapters we saw that a sample can be used to make statistical inferences about a parameter. Together, this means we can use the fitted regression line to make inferences about the regression slope, which is the topic of this lecture.

Facts about slope
$E(\hat\beta_1) = \beta_1$
$\mathrm{Var}(\hat\beta_1) = \sigma^2 / S_{xx}$
What can we say about the distribution of $\hat\beta_1$ when $n$ is large?
Using these facts, we can build a test statistic to make inferences about the slope of the regression line. What test statistic can we use? What is a problem with that test statistic?
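As a rough numerical check of the two facts about the slope estimator, the sketch below refits the least-squares slope on many simulated samples and compares the average and variance of the estimates with $\beta_1$ and $\sigma^2 / S_{xx}$. Every number in it (the true intercept and slope, $\sigma$, the $x$ values, the number of replications, the seed) is made up for illustration and does not come from the lecture.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical true model: y = beta0 + beta1 * x + noise, noise ~ N(0, sigma^2)
beta0, beta1, sigma = 2.0, 1.5, 3.0
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x_bar = statistics.fmean(x)
s_xx = sum((xi - x_bar) ** 2 for xi in x)


def slope_estimate(y):
    """Least-squares slope S_xy / S_xx for the fixed design x."""
    y_bar = statistics.fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx


# Refit the line on many simulated samples that share the same x values.
estimates = []
for _ in range(20000):
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    estimates.append(slope_estimate(y))

mean_of_estimates = statistics.fmean(estimates)
var_of_estimates = statistics.pvariance(estimates)
theoretical_var = sigma ** 2 / s_xx

print(f"mean of slope estimates:     {mean_of_estimates:.4f}  (beta1 = {beta1})")
print(f"variance of slope estimates: {var_of_estimates:.4f}  (sigma^2 / S_xx = {theoretical_var:.4f})")
```

The two printed pairs should be close but not identical, since the simulation has its own sampling noise.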

Test statistic
The test statistic is:
$$T = \frac{\hat\beta_1 - \beta_1}{S / \sqrt{S_{xx}}} = \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}}$$
Can you find the distribution of $T$?

Constructing a confidence interval
Starting from the fact that
$$P\left(-t_{n-2,\alpha/2} < \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} < t_{n-2,\alpha/2}\right) = 1 - \alpha,$$
we get the following $(1-\alpha)100\%$ confidence interval for $\beta_1$:
$$\hat\beta_1 \pm t_{n-2,\alpha/2}\, S_{\hat\beta_1}$$
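The step from the probability statement to the interval is a rearrangement of the inequality inside the probability. A short sketch of that algebra, assuming only that $S_{\hat\beta_1} > 0$:

```latex
\begin{align*}
-t_{n-2,\alpha/2} < \frac{\hat\beta_1 - \beta_1}{S_{\hat\beta_1}} < t_{n-2,\alpha/2}
&\iff -t_{n-2,\alpha/2}\, S_{\hat\beta_1} < \hat\beta_1 - \beta_1 < t_{n-2,\alpha/2}\, S_{\hat\beta_1} \\
&\iff \hat\beta_1 - t_{n-2,\alpha/2}\, S_{\hat\beta_1} < \beta_1 < \hat\beta_1 + t_{n-2,\alpha/2}\, S_{\hat\beta_1}
\end{align*}
```

Since the events are the same, the last interval also contains $\beta_1$ with probability $1 - \alpha$.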

Hypothesis test
Null hypothesis: $H_0: \beta_1 = \beta_{10}$
Test statistic: $t = \dfrac{\hat\beta_1 - \beta_{10}}{s_{\hat\beta_1}} \sim t_{n-2}$
Rejection regions:
$t \ge t_{n-2,\alpha}$ if $H_A: \beta_1 > \beta_{10}$
$t \le -t_{n-2,\alpha}$ if $H_A: \beta_1 < \beta_{10}$
$t \le -t_{n-2,\alpha/2}$ or $t \ge t_{n-2,\alpha/2}$ if $H_A: \beta_1 \ne \beta_{10}$

Hypothesis test using ANOVA
In Chapter 6 we saw that if a random variable $U \sim t_v$, then $U^2 \sim F_{1,v}$. Last lecture, I showed you how to use SSR and SSE to construct an ANOVA table. The F test statistic in that table (see also the next slide) is the square of a special case of the t statistic from the previous slide. So the ANOVA table gives another way to carry out the test, but only when $\beta_{10} = 0$, that is, when the null hypothesis is $H_0: \beta_1 = 0$ and the alternative is two-sided, $H_A: \beta_1 \ne 0$. This case is considered the most useful test and is also called the model utility test. Why do you think this case is so important?
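The link between the two statistics can be written out in a few lines. The sketch below uses the lecture's notation, with $s^2 = \mathrm{SSE}/(n-2)$, and one identity not stated on the slide: $\mathrm{SSR} = \hat\beta_1^2 S_{xx}$, which follows from $\hat y_i - \bar y = \hat\beta_1 (x_i - \bar x)$.

```latex
\begin{align*}
t &= \frac{\hat\beta_1 - 0}{s / \sqrt{S_{xx}}}
  && \text{(t statistic with } \beta_{10} = 0\text{)} \\
t^2 &= \frac{\hat\beta_1^2\, S_{xx}}{s^2} \\
\mathrm{SSR} &= \sum_{i=1}^{n} (\hat y_i - \bar y)^2
  = \hat\beta_1^2 \sum_{i=1}^{n} (x_i - \bar x)^2
  = \hat\beta_1^2\, S_{xx} \\
t^2 &= \frac{\mathrm{SSR}}{s^2} = F
\end{align*}
```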

ANOVA table
Source of variation   df      Sum of squares   Mean squares          F
Regression            1       SSR              SSR                   SSR / s^2
Error                 n - 2   SSE              s^2 = SSE / (n - 2)
Total                 n - 1   SSTo

I want to find the regression line that relates the scores on the two midterms in Stat 319. I randomly select five students. Their Midterm 1 scores are 50, 70, 75, 80, 95 and, in the same order, their Midterm 2 scores are 40, 65, 95, 90, 100.
Find a 95% confidence interval for the regression slope.
Use a t-test to check whether there is a relationship between the scores of the two midterms at significance level 0.02.
Use an F-test to check whether there is a relationship between the two scores at significance level 0.02.
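One way to check a hand calculation for this example is the short script below. It is a sketch, not the official solution: the variable names are ours, and the critical values are copied from a standard t table rather than computed. The F critical value uses the relation $F_{1,v} = t_v^2$ from the ANOVA slide.

```python
import math

# Data from the example: Midterm 1 (x) and Midterm 2 (y) scores.
x = [50, 70, 75, 80, 95]
y = [40, 65, 95, 90, 100]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta1_hat = s_xy / s_xx             # least-squares slope
ssr = s_xy ** 2 / s_xx              # regression sum of squares
sse = s_yy - ssr                    # error sum of squares (SSTo = S_yy)
s2 = sse / (n - 2)                  # estimate of sigma^2
se_beta1 = math.sqrt(s2 / s_xx)     # S / sqrt(S_xx)

# Critical values for n - 2 = 3 degrees of freedom, copied from a standard
# t table (an assumption of this sketch; they are not computed here).
T_3_025 = 3.182   # t_{3, 0.025}, for the 95% confidence interval
T_3_010 = 4.541   # t_{3, 0.01}, for the two-sided test at alpha = 0.02

# 95% confidence interval for the slope.
ci = (beta1_hat - T_3_025 * se_beta1, beta1_hat + T_3_025 * se_beta1)

# t-test of H0: beta1 = 0 against HA: beta1 != 0.
t_stat = beta1_hat / se_beta1
reject_t = abs(t_stat) >= T_3_010

# F-test from the ANOVA table; the critical value is the squared t value.
f_stat = ssr / s2
f_crit = T_3_010 ** 2
reject_f = f_stat >= f_crit

print(f"slope = {beta1_hat:.4f}, standard error = {se_beta1:.4f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.3f}, reject H0 at 0.02: {reject_t}")
print(f"F = {f_stat:.3f}, critical value = {f_crit:.3f}, reject H0 at 0.02: {reject_f}")
```

The t-test and the F-test always reach the same decision here, because the F statistic is the square of the t statistic and the F critical value is the square of the t critical value.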

Correlation between two random variables
In Stat 318, we defined the correlation coefficient $\rho$ as a measure of how strongly two random variables $X$ and $Y$ are related. The formula was:
$$\rho = \rho(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$
$\rho$ takes values between $-1$ and $1$. The closer the value is to $1$, the stronger the positive relationship. The closer the value is to $-1$, the stronger the negative relationship. The closer it is to $0$, the weaker the relationship.

Estimating correlation from a sample
Let's assume we want to know the correlation between the height and weight of male students at PSU. To compute it exactly, we would need to ask all 25000 male students their height and weight, find the covariance of the two random variables and their variances, and then calculate the correlation. It is much easier to take a sample and estimate the correlation. In other words, $\rho$ as we learned it in Chapter 5 is a population parameter. To estimate it from a sample, the formula used is:
$$\hat\rho = r = \frac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}$$
The square of this estimator, $r^2$, equals the coefficient of determination we saw last lecture. Equivalently, $|r|$ is its square root, and $r$ has the same sign as the estimated slope.

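Hypothesis testing
The following test is valid only for testing the null hypothesis $H_0: \rho = 0$.
Test statistic: $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$
Rejection regions:
$t \ge t_{n-2,\alpha}$ if $H_A: \rho > 0$
$t \le -t_{n-2,\alpha}$ if $H_A: \rho < 0$
$t \le -t_{n-2,\alpha/2}$ or $t \ge t_{n-2,\alpha/2}$ if $H_A: \rho \ne 0$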

I want to find the regression line that relates the scores on the two midterms in Stat 319. I randomly select five students. Their Midterm 1 scores are 50, 70, 75, 80, 95 and, in the same order, their Midterm 2 scores are 40, 65, 95, 90, 100.
Perform a hypothesis test to check whether there is significant evidence of a positive relationship between the two scores at significance level 0.05.
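The sample correlation and the test statistic for $H_0: \rho = 0$ against $H_A: \rho > 0$ can be checked with a few lines of code. This is a sketch with our own variable names; the critical value $t_{3,0.05}$ is copied from a standard t table rather than computed.

```python
import math

# Data from the example: Midterm 1 (x) and Midterm 2 (y) scores.
x = [50, 70, 75, 80, 95]
y = [40, 65, 95, 90, 100]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy).
r = s_xy / math.sqrt(s_xx * s_yy)

# Test statistic for H0: rho = 0, which follows t_{n-2} under H0.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# One-sided critical value t_{3, 0.05}, copied from a standard t table
# (an assumption of this sketch).
T_3_05 = 2.353
reject = t_stat >= T_3_05

print(f"r = {r:.4f}, r^2 = {r ** 2:.4f}")
print(f"t = {t_stat:.3f}, reject H0 at 0.05: {reject}")
```

The value of $t$ is the same as the slope t statistic for $H_0: \beta_1 = 0$ on the same data; only the alternative and the significance level differ in this example.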

Extending the test to more cases
The last test we saw for $\rho$ can be used only for the null hypothesis $H_0: \rho = 0$. What happens if we want to test the null hypothesis $H_0: \rho = \rho_0$ when $\rho_0 \ne 0$? We will use the Fisher transformation and the random variable:
$$V = \frac{1}{2}\log\left(\frac{1+R}{1-R}\right)$$

Distribution
The random variable $V$ defined on the previous slide approximately follows a normal distribution:
$$V \sim N\left(\mu_V = \frac{1}{2}\log\left(\frac{1+\rho}{1-\rho}\right),\ \sigma_V^2 = \frac{1}{n-3}\right) \quad \text{(approximately)}$$

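Hypothesis testing
Null hypothesis: $H_0: \rho = \rho_0$
Test statistic:
$$z = \frac{\frac{1}{2}\log\left(\frac{1+r}{1-r}\right) - \frac{1}{2}\log\left(\frac{1+\rho_0}{1-\rho_0}\right)}{1/\sqrt{n-3}} \sim N(0, 1)$$
Rejection regions:
$z \ge z_{\alpha}$ if $H_A: \rho > \rho_0$
$z \le -z_{\alpha}$ if $H_A: \rho < \rho_0$
$z \le -z_{\alpha/2}$ or $z \ge z_{\alpha/2}$ if $H_A: \rho \ne \rho_0$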

Confidence interval for $\mu_V$
Based on the previous results, it is easy to create a confidence interval for $\mu_V$. A $(1-\alpha)100\%$ confidence interval for $\mu_V$ is given by:
$$V \pm \frac{z_{\alpha/2}}{\sqrt{n-3}}$$

Confidence interval for $\rho$
Our objective is not a confidence interval for $\mu_V$ but a confidence interval for $\rho$. A $(1-\alpha)100\%$ confidence interval for $\rho$ is given by:
$$\left(\frac{e^{2c_1} - 1}{e^{2c_1} + 1},\ \frac{e^{2c_2} - 1}{e^{2c_2} + 1}\right)$$
where $c_1$ is the lower endpoint of the interval for $\mu_V$ and $c_2$ is the upper endpoint of the interval for $\mu_V$.

I want to find the regression line that relates the scores on the two midterms in Stat 319. I randomly select five students. Their Midterm 1 scores are 50, 70, 75, 80, 95 and, in the same order, their Midterm 2 scores are 40, 65, 95, 90, 100.
Perform a hypothesis test at significance level 0.05 to check whether there is significant evidence that the correlation coefficient is different from 0.5.
Find a 99% confidence interval for $\rho$.
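The Fisher-transformation test and the interval for $\rho$ can be sketched as follows. The variable names and the helper `back_transform` are ours. The normal quantiles come from Python's standard-library `statistics.NormalDist`, used here in place of a z table.

```python
import math
from statistics import NormalDist

# Data from the example: Midterm 1 (x) and Midterm 2 (y) scores.
x = [50, 70, 75, 80, 95]
y = [40, 65, 95, 90, 100]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
r = s_xy / math.sqrt(s_xx * s_yy)

# Fisher transformation of r and of the hypothesized value rho_0 = 0.5.
v = 0.5 * math.log((1 + r) / (1 - r))
rho0 = 0.5
mu0 = 0.5 * math.log((1 + rho0) / (1 - rho0))

# z statistic: difference divided by the standard deviation 1 / sqrt(n - 3).
z_stat = (v - mu0) * math.sqrt(n - 3)

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(1 - 0.05 / 2)   # z_{0.025} for the two-sided test
reject = abs(z_stat) >= z_crit

# 99% confidence interval for mu_V, then mapped back to rho.
z_99 = std_normal.inv_cdf(1 - 0.01 / 2)     # z_{0.005}
c1 = v - z_99 / math.sqrt(n - 3)
c2 = v + z_99 / math.sqrt(n - 3)


def back_transform(c):
    """Inverse of the Fisher transformation: (e^{2c} - 1) / (e^{2c} + 1)."""
    return (math.exp(2 * c) - 1) / (math.exp(2 * c) + 1)


ci_rho = (back_transform(c1), back_transform(c2))

print(f"r = {r:.4f}, V = {v:.4f}")
print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}, reject H0 at 0.05: {reject}")
print(f"99% CI for rho: ({ci_rho[0]:.3f}, {ci_rho[1]:.3f})")
```

With only five observations the standard deviation $1/\sqrt{n-3}$ is large, so the interval for $\rho$ comes out very wide. The normal approximation for $V$ is also rough at such a small $n$.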

Section 12.3, page 609: 31, 32, 33, 34, 35, 36, 37, 38, 41
Section 12.5, page 623: 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67