From last time... The equations

Size: px

Start display at page:

Download "From last time... The equations"

Bertram Houston
6 years ago
Views:

1 This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this site. Copyright 2006, The Johns Hopkins University and Karl W. Broman. All rights reserved. Use of these materials permitted only in accordance with license rights granted. Materials provided AS IS ; no representations or warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for obtaining permissions for use from third parties as needed.

2 From last time corr = 0.78 Father s height (inches) Father s span (inches) The equations Regression of y on x (for predicting y from x) Slope = r SD(y) SD(x) Goes through the point ( x, ȳ) ŷ ȳ = r SD(y) SD(x) (x x) ŷ = ˆβ 0 + ˆβ 1 x where ˆβ 1 = r SD(y) SD(x) and ˆβ 0 = ȳ ˆβ 1 x Regression of x on y (for predicting x from y) Slope = r SD(x) SD(y) ˆx x = r SD(x) SD(y) (y ȳ) Goes through the point (ȳ, x) ˆx = ˆβ 0 + ˆβ 1 y where ˆβ 1 = r SD(x) SD(y) and ˆβ 0 = x ˆβ 1 ȳ

3 Histograms Spans mean = 68.7 SD = span (inches) mean = 67.7 SD = 2.7 Heights height (inches) Error in prediction Having no information about x, Predict y as ȳ Typical prediction error: SD(y) For predicting height, SD(y) 2.73 Having been told about x, Predict y using the regression line: ŷ = ˆβ 0 + ˆβ 1 x Typical prediction error: SD(y) 1 r 2 For predicting height from span, SD(y) 1 r

4 Back to David Sullivans data... pf3d Y = X 0.30 OD H2O2 concentration Model: y i = β 0 + β 1 x i + ǫ i where ǫ i iid Normal(0,σ 2 ) Estimates: ˆβ1 = i (x i x) (y i ȳ)/ i (x i x) 2 ˆβ 0 = ȳ ˆβ 1 x ˆσ = i (y i ŷ i ) 2 /(n 2) Parameter estimates We already know that and in particular (n 2) ˆσ2 σ 2 χ2 n 2 E(ˆσ 2 ) = σ 2 What about ˆβ 0 and ˆβ 1?

5 Parameter estimates (2) One can show that E( ˆβ 0 ) = β 0 E( ˆβ 1 ) = β 1 ( Var( ˆβ 0 ) = σ 2 1 n + x ) 2 Var( ˆβ 1 ) = σ2 Cov( ˆβ 0, ˆβ 1 ) = σ 2 x Cor( ˆβ 0, ˆβ 1 ) = x x 2 + /n Note: We re thinking of the x s as fixed. Parameter estimates (3) One can even show that the distribution of ˆβ 0 and ˆβ 1 is a bivariate normal distribution! ( ) ˆβ0 ˆβ 1 N(β, Σ) where β = ( β0 β 1 ) and Σ = σ 2 1 n + x 2 x x 1

6 OD H 2 O OD H 2 O 2

7 slope y intercept Confidence intervals We know that ˆβ 0 N ( (β 0, σ 2 1 n + x )) 2 ˆβ 1 N ( β 1, σ 2 ) We can use those distributions for hypothesis testing and to construct confidence intervals!

8 Statistical inference We want to test: H 0 : β 1 = β 1 versus H a : β 1 β 1 Generally, β 1 is 0. We use t = ˆβ 1 β 1 se( ˆβ 1 ) t n 2 where se( ˆβ 1 ) = ˆσ 2 Also, [ ˆβ1 t (1 α 2 ),n 2 se( ˆβ 1 ), ˆβ 1 + t (1 α 2 ),n 2 se( ˆβ ] 1 ) is a (1 α) 100% confidence interval for β 1. Results The calculations in the test H 0 : β 0 = β0 versus H a : β 0 β0 are analogous, except that we have to use ( ) se( ˆβ 1 0 ) = ˆσ 2 n + x2 For the pf3d7 data we get the 95% confidence intervals (0.342, 0.364) for the intercept ( , ) for the slope Testing whether the intercept (slope) is equal to zero, we obtain 70.7 ( 22.0) as test statistic. This corresponds to a p-value of ( ).

9 Now how about that Testing for the slope being equal to zero, we use t = ˆβ 1 se( ˆβ 1 ) For the squared test statistic we get t 2 = ( ˆβ1 se( ˆβ 1 ) ) 2 = ˆβ 2 1 ˆσ 2 / = ˆβ 2 1 ˆσ 2 = (SYY RSS)/1 RSS/n 2 = MS reg MSE = F The squared t statistic is the same as the F statistic from the ANOVA! Joint confidence region A 95% joint confidence region for the two parameters is the set of all values (β 0,β 1 ) that fulfill ( ) T ( β0 n i x ) ( ) i β 1 i x β0 i i x2 i β 1 2ˆσ 2 F (0.95),2,n-2 where β 0 = β 0 ˆβ 0 and β 1 = β 1 ˆβ 1.

10 β^1 β^0 Notation Assume we have n observations: (x 1, y 1 ),...,(x n, y n ). We previously defined = i (x i x) 2 = i x 2 i n( x) 2 SYY = i (y i ȳ) 2 = i y 2 i n(ȳ) 2 SXY = i (x i x)(y i ȳ) = i x i y i n xȳ We also define r XY = SXY SYY (called the sample correlation)

11 Coefficient of determination In the previous lecture we wrote Define SS reg = SYY RSS = (SXY)2 R 2 = SS reg SYY = 1 RSS SYY R 2 is often called the coefficient of determination. Notice that R 2 = SS reg SYY = (SXY) 2 SYY = r2 XY Back to the Sullivan data David Sullivan was actually interested in the slopes when one re-scales the y-axis so that the y-intercept is at 1. y = β 0 + β 1 x + ǫ becomes y/β 0 = 1 + (β 1 /β 0 )x + ǫ So we re really interested in β 1 /β 0. We d estimate that by ˆβ 1 / ˆβ 0, but what about its standard error?

12 First-order Taylor expansion Consider f(x, y) = x/y. A first-order Taylor expansion to approximate the function would be f(x, y) f(x 0, y 0 ) + (x x 0 ) f + (y y 0 ) f x y (x0,y 0 ) (x0,y 0 ) Since f/ x = 1/y and f/ y = x/y 2, we obtain the following: x/y x 0 /y 0 + (x x 0 )/y 0 (y y 0 )x 0 /y 2 0 = (x 0 /y 0 )[1 + (x x 0 )/x 0 + (y y 0 )/y 0 ] How do we use this? We use the first-order Taylor expansion of ˆβ 1 / ˆβ 0 around β 1 and β 0. Variance of a ratio Remember that β 1 and β 0 are fixed, while ˆβ 1 and ˆβ 0 are random. Add the fact that var(x+y) = var(x) + var(y) + 2 cov(x,y) var{ ˆβ 1 / ˆβ 0 } var{(β 1 /β 0 )[1 + ( ˆβ 1 β 1 )/β 1 + ( ˆβ 0 β 0 )/β 0 ]} = (β 1 /β 0 ) 2 {var( ˆβ 1 )/β var( ˆβ 0 )/β cov( ˆβ 1, ˆβ 0 )/(β 1 β 0 )} We then replace β 1 and β 0 in this formula with our estimates of them, ˆβ 1 and ˆβ 0. Further, we replace the variances and covariance with our estimates. var{ ˆ ˆβ 1 / ˆβ 0 } = ( ˆβ 1 / ˆβ 0 ) 2 { var( ˆ ˆβ 1 )/ ˆβ var( ˆ ˆβ 0 )/ ˆβ cov( ˆ ˆβ 1, ˆβ 0 )/( ˆβ 1 ˆβ0 )} The estimated SE is then SE{ ˆ ˆβ 1 / ˆβ 0 } = ˆβ 1 / ˆβ 0 [ SE( ˆ ˆβ 1 )/ ˆβ 1 ] 2 + [ SE( ˆ ˆβ 0 )/ ˆβ 0 ] cov( ˆ ˆβ 1, ˆβ 0 )/( ˆβ 1 ˆβ0 )

13 Results pf3d7: ˆβ 0 = 0.353(0.005) ˆβ1 = (0.0002) ˆ cov( ˆβ 1, ˆβ 0 ) = ˆβ 1 / ˆβ = 1.10 (SE = 0.07). estimate SE bhem pgalnoel pgal pyoelii pf3d pviv pknow pov pbr pfhz

Simple Linear Regression. John McGready Johns Hopkins University

Simple Linear Regression. John McGready Johns Hopkins University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this