Animal Studies of Side Effects

Simple Linear Regression: Basic Ideas

In simple linear regression there is an approximately linear relation between two variables, say y = pressure in the pancreas and x = dose:

    x:   0     5    10    15    20    25
    y: 14.6  24.5  21.8  34.5  35.1  43.0

The model is

    y = α + βx + ε,

where x and y are observed, α and β are unknown, and ε is a random error with mean 0.

[Figure 1: A Scatterplot of y versus x]

Note: x is a design variable, set by the experimenter.

Slides 1–2

From the Coleman Report

y = average 6th-grade verbal score, x = mother's education (yrs):

    x: 12.38  10.34  14.08  14.20  12.3   11.46
    y: 37.01  26.51  36.51  40.70  37.1   33.40

[Figure 2: A Scatterplot of verbal score versus mother's education]

Note: here x is a covariate, measured along with y.

Drawing the Line: Least Squares Estimators

The Problem: Given (x_1, y_1), …, (x_n, y_n), find a and b to minimize

    SS(a, b) = Σ (y_i − a − b·x_i)².

The Solution:

    b = s_xy / s_xx,    a = ȳ − b·x̄,

where x̄ and ȳ are the sample means of x_1, …, x_n and y_1, …, y_n, and

    s_xy = Σ (x_i − x̄)(y_i − ȳ),    s_xx = Σ (x_i − x̄)².

Slides 3–4
The Details

Recall:

    SS(a, b) = Σ (y_i − a − b·x_i)².

Differentiate:

    ∂SS/∂a = −2 Σ (y_i − a − b·x_i),
    ∂SS/∂b = −2 Σ (y_i − a − b·x_i)·x_i.

Solve:

    ∂SS/∂a = 0,    ∂SS/∂b = 0.

Now ∂SS/∂a = −2n(ȳ − a − b·x̄), so

    a = ȳ − b·x̄.

Substituting this into the second equation,

    ∂SS/∂b = −2 Σ [y_i − ȳ − b(x_i − x̄)]·x_i
           = −2 { Σ (y_i − ȳ)·x_i − b Σ (x_i − x̄)·x_i }.

Slides 5–6

Since Σ (x_i − x̄) = 0 = Σ (y_i − ȳ),

    Σ (y_i − ȳ)·x_i = Σ (y_i − ȳ)(x_i − x̄) = s_xy, say, and
    Σ (x_i − x̄)·x_i = Σ (x_i − x̄)² = s_xx, say.

So ∂SS/∂b = −2{ s_xy − b·s_xx }, and therefore

    b = s_xy / s_xx,    a = ȳ − b·x̄.

The Least Squares Line: y = a + bx.

Example: Coleman data: a = .1312, b = 2.8149.

[Figure 3: A Scatterplot & Least Squares Line (Verbal Score vs. Mother's Ed)]

Slides 7–8
Calculating a and b

By Machine: using Excel, for example.

By Hand: Recall

    s_xx = Σ (x_i − x̄)² = Σ x_i² − n·x̄²,

and similarly

    s_xy = Σ (x_i − x̄)(y_i − ȳ) = Σ x_i·y_i − n·x̄·ȳ.

Dose Response:

       x      y      xy      x²
       0    14.6      0       0
       5    24.5    122.5    25
      10    21.8    218     100
      15    34.5    517.5   225
      20    35.1    702     400
      25    43.0   1075     625
    Sums    75    173.5   2635    1375

So a and b can be calculated from the sums of the x_i, y_i, x_i·y_i, and x_i².

Slides 9–10

The Calculations:

    x̄ = 75/6 = 12.5,    ȳ = 173.5/6 = 28.92,
    s_xy = 2635 − 6 × 12.5 × 28.92 = 466.25,
    s_xx = 1375 − 6 × (12.5)² = 437.5,
    b = 466.25 / 437.5 = 1.066,
    a = 28.92 − 1.066 × 12.5 = 15.595.

[Figure 4: A Scatterplot & Least Squares Line (Pressure vs. Dose; a = 15.595, b = 1.066)]

Slides 11–12
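As a check on the hand calculation, the same sums-of-products recipe can be written in a few lines of Python (a minimal sketch added here, not part of the original slides; it uses only the dose-response data above):

```python
# Least-squares slope and intercept from raw sums, as on the slides:
# s_xy = sum(x*y) - n*xbar*ybar, s_xx = sum(x^2) - n*xbar^2.
x = [0, 5, 10, 15, 20, 25]
y = [14.6, 24.5, 21.8, 34.5, 35.1, 43.0]
n = len(x)

xbar = sum(x) / n                                                # 12.5
ybar = sum(y) / n                                                # 28.92 (rounded)
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar    # 466.25
s_xx = sum(xi ** 2 for xi in x) - n * xbar ** 2                  # 437.5

b = s_xy / s_xx        # slope: about 1.066
a = ybar - b * xbar    # intercept: about 15.595
```

The two one-pass formulas avoid a second sweep over the data once the four sums are tabulated, which is exactly why the slide's table carries the xy and x² columns.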
Some Terminology

Least Squares Estimators:

    b = s_xy / s_xx,    a = ȳ − b·x̄.

Fitted Values (AKA Predicted Values):

    ŷ_i = a + b·x_i = ȳ + b(x_i − x̄).

Residuals:

    e_i = y_i − ŷ_i = y_i − ȳ − b(x_i − x̄).

Regression Sum of Squares:

    SSR = Σ (ŷ_i − ȳ)² = b²·s_xx.

Error Sum of Squares (AKA Residual Sum of Squares):

    SSE = Σ e_i².

Total Sum of Squares:

    s_yy = Σ (y_i − ȳ)².

Then s_yy = SSR + SSE, and

    R² = SSR / s_yy,    100·R² = % explained variation.

Note: SSE = s_yy − SSR = s_yy − b²·s_xx.

Slides 13–14

Derivation of s_yy = SSR + SSE:

    s_yy = Σ (y_i − ŷ_i + ŷ_i − ȳ)²
         = Σ e_i² + 2 Σ e_i(ŷ_i − ȳ) + Σ (ŷ_i − ȳ)².

Here the 1st term = SSE, the 3rd term = SSR, and the

    2nd term = 2 Σ [(y_i − ȳ) − b(x_i − x̄)]·b(x_i − x̄)
             = 2[b·s_xy − b²·s_xx] = 0.

Inference

Model: Now suppose

    y_i = α + β·x_i + ε_i,

where ε_1, …, ε_n are independent Normal[0, σ²].

Notes:
a) −∞ < α, β < ∞ and σ² > 0 are unknown.
b) If x_1, …, x_n are covariates, then the conditions must hold conditionally given x_1, …, x_n.
c) The y_i ~ Normal[α + β·x_i, σ²] are (conditionally) independent.

Slides 15–16
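The decomposition s_yy = SSR + SSE and the value of R² can be verified numerically for the dose-response data; the Python check below is an added illustration, not part of the slides:

```python
# Verify s_yy = SSR + SSE and compute R^2 for the dose-response fit.
x = [0, 5, 10, 15, 20, 25]
y = [14.6, 24.5, 21.8, 34.5, 35.1, 43.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
s_xx = sum((xi - xbar) ** 2 for xi in x)
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = ybar - b * xbar

yhat = [a + b * xi for xi in x]               # fitted values
e = [yi - yh for yi, yh in zip(y, yhat)]      # residuals
SSE = sum(ei ** 2 for ei in e)
SSR = sum((yh - ybar) ** 2 for yh in yhat)    # equals b^2 * s_xx
s_yy = sum((yi - ybar) ** 2 for yi in y)
R2 = SSR / s_yy                               # about 0.915
```

So roughly 92% of the variation in pressure is explained by dose for this data set.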
The Likelihood Function

The likelihood function is

    Π [1/√(2πσ²)] exp[−(y_i − α − β·x_i)² / (2σ²)]
      = [1/√(2πσ²)]ⁿ exp[−Σ (y_i − α − β·x_i)² / (2σ²)],

so the log-likelihood function is

    l(α, β, σ²; x, y) = −(1/(2σ²)) Σ (y_i − α − β·x_i)² − (n/2)[log(σ²) + log(2π)].

Maximum Likelihood Estimators: α̂ and β̂ must minimize the sum of squares, so MLE = LSE. That is,

    β̂ = b = s_xy / s_xx,    α̂ = a = ȳ − b·x̄.

The Profile Likelihood Function:

    l(α̂, β̂, σ²; x, y) = −SSE/(2σ²) − (n/2)[log(σ²) + log(2π)].

The MLE of σ²: setting

    ∂l/∂σ² = SSE/(2σ⁴) − n/(2σ²) = 0

gives

    σ̂² = SSE/n.

Also define MSE = SSE/(n − 2).

Slides 17–18

Means and Variances of the Estimators

Unbiasedness: α̂ and β̂ are unbiased; that is,

    E(β̂) = β,    E(α̂) = α.

Variances:

    σ²_β̂ = σ²/s_xx,    σ²_α̂ = [1/n + x̄²/s_xx]·σ².

Derivation for β̂: First,

    s_xy = Σ (x_i − x̄)(y_i − ȳ) = Σ (x_i − x̄)·y_i,

so

    E(β̂) = (1/s_xx)·E(s_xy) = (1/s_xx) Σ (x_i − x̄)·E(y_i)
          = (1/s_xx) Σ (x_i − x̄)(α + β·x_i)
          = (1/s_xx) Σ (x_i − x̄)·β(x_i − x̄)
          = (1/s_xx)·β·s_xx = β,

since Σ (x_i − x̄) = 0.

Slides 19–20
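For the dose-response fit, the two scale estimates σ̂² = SSE/n and MSE = SSE/(n − 2) differ only in their divisors; a short Python comparison (an added illustration, not from the slides):

```python
# Compare the MLE of sigma^2 (divisor n) with the unbiased MSE (divisor n - 2)
# for the dose-response data.
x = [0, 5, 10, 15, 20, 25]
y = [14.6, 24.5, 21.8, 34.5, 35.1, 43.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
s_xx = sum((xi - xbar) ** 2 for xi in x)
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b = s_xy / s_xx
a = ybar - b * xbar

SSE = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
sigma2_mle = SSE / n       # MLE: about 7.66 (biased low)
MSE = SSE / (n - 2)        # unbiased estimate: about 11.49
```

With n = 6 the gap between the two is sizable; it shrinks as n grows.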
Similarly,

    σ²_β̂ = (1/s_xx²)·Var[Σ (x_i − x̄)·y_i]
          = (1/s_xx²) Σ (x_i − x̄)²·σ²
          = σ²/s_xx.

Notes: Unbiasedness requires (only) E(ε_i) = 0. The variance formula requires also E(ε_i²) = σ². Similarly for α̂.

Sampling Distributions

    α̂ ~ Normal[α, σ²_α̂],    β̂ ~ Normal[β, σ²_β̂],
    SSE/σ² ~ χ²_{n−2};    and (α̂, β̂) is independent of SSE.

Corollary: MSE is unbiased; that is, E(MSE) = σ².

Note: The proof is similar to that of the independence of X̄ and S² in the one-sample normal problem.

Note: Unbiasedness of MSE requires only E(ε_i) = 0 and E(ε_i²) = σ².

Slides 21–22

Studentization and Confidence Intervals

Studentization: Let

    σ̂²_β̂ = MSE/s_xx,    T = (β̂ − β)/σ̂_β̂.

Then T ~ t_{n−2}, and if c is the 97.5th percentile of t_{n−2}, for example, then

    P[−c ≤ T ≤ c] = .95.

Moreover, −c ≤ T ≤ c iff

    β̂ − c·σ̂_β̂ ≤ β ≤ β̂ + c·σ̂_β̂.

Confidence Interval for β: β̂ ± c·σ̂_β̂ is a 95% confidence interval for β.

Confidence Interval for α: Similarly, α̂ ± c·σ̂_α̂ is a 95% confidence interval for α.

Slides 23–24
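The unbiasedness and variance formulas for β̂ can also be checked by simulation. The sketch below is an added Monte Carlo check, not part of the slides; the "true" values α = 2, β = 0.5, σ = 1 are chosen arbitrarily for illustration, and the x's are the dose design points:

```python
# Monte Carlo check: mean of beta-hat near beta, variance near sigma^2 / s_xx.
import random

random.seed(0)
alpha, beta, sigma = 2.0, 0.5, 1.0         # assumed "true" parameters
x = [0, 5, 10, 15, 20, 25]
n = len(x)
xbar = sum(x) / n
s_xx = sum((xi - xbar) ** 2 for xi in x)   # 437.5

reps = 20000
bs = []
for _ in range(reps):
    y = [alpha + beta * xi + random.gauss(0.0, sigma) for xi in x]
    ybar = sum(y) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    bs.append(s_xy / s_xx)

mean_b = sum(bs) / reps                                # should be near beta
var_b = sum((bi - mean_b) ** 2 for bi in bs) / reps    # near sigma^2 / s_xx
```

Only E(ε_i) = 0 is used for the mean, and only E(ε_i²) = σ² in addition for the variance, matching the note above.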
Example: United Data Services

n = 14, x = units serviced, y = time.

    c = 2.180,    β̂ = 15.509,    s_xx = 114,    MSE = 29.074,
    σ̂²_β̂ = 29.074/114 = (.505)²,
    β̂ ± c·σ̂_β̂ = 15.509 ± 2.18 × .505 = 15.51 ± 1.10.

[Figure 5: A Scatterplot (Time vs. Units)]

Slides 25–26

Testing H₀: β = 0

From the Confidence Interval: Accept if

    β̂ − c·σ̂_β̂ ≤ 0 ≤ β̂ + c·σ̂_β̂.

Equivalently, reject if

    |T₀| = |β̂|/σ̂_β̂ > c.

Example: Dose Response: β̂ ± c·σ̂_β̂ = 1.066 ± .449, which excludes 0; therefore H₀ is rejected.

Note: This is the GLRT (as in the one-sample problem).

Review

Simple Linear Regression: Y = α + βX + ε.
- Least squares estimators a, b
- Properties of the estimators
- Sampling distributions
- Confidence intervals
- Testing

Today
- Estimating expected response
- Predicting a future value

Slides 27–28
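The United Data Services interval follows directly from the summary numbers on the slide; here is a minimal Python check (added for illustration, not from the slides):

```python
# 95% CI for the slope from the United Data Services summaries.
import math

n, c = 14, 2.180                 # c = 97.5th percentile of t with n - 2 = 12 df
b_hat, s_xx, MSE = 15.509, 114.0, 29.074

se_b = math.sqrt(MSE / s_xx)     # estimated SD of beta-hat: about 0.505
half_width = c * se_b            # about 1.10
ci = (b_hat - half_width, b_hat + half_width)   # 15.51 +/- 1.10
```

Since 0 is far outside this interval, the test of H₀: β = 0 rejects decisively for these data as well.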
Estimating Expected Response

    µ(x) = α + β·x = E(Y | x).

Fix an x₀ and let µ₀ = µ(x₀) = α + β·x₀, so that

    µ(x) = µ₀ + β(x − x₀);

that is, µ₀ plays the role of α when x is replaced by x − x₀. The natural estimator is

    µ̂₀ = α̂ + β̂·x₀,

with

    E(µ̂₀) = µ₀,    σ²_µ̂₀ = [1/n + (x₀ − x̄)²/s_xx]·σ²,

estimated by

    σ̂²_µ̂₀ = [1/n + (x₀ − x̄)²/s_xx]·MSE.

Then µ̂₀ ± c·σ̂_µ̂₀ is a 95% confidence interval for µ₀.

Slides 29–30

Predicting a Future Value

Now let Y₀ ~ Normal[µ₀, σ²], independent of the data, and set Ŷ₀ = µ̂₀. Let

    Δ := Y₀ − Ŷ₀ = (Y₀ − µ₀) − (µ̂₀ − µ₀).

Then E(Δ) = 0 and

    σ²_Δ = σ² + σ²_µ̂₀ = [1 + 1/n + (x₀ − x̄)²/s_xx]·σ²,

so Y₀ − Ŷ₀ ~ Normal[0, σ²_Δ]. With

    σ̂²_Δ = [1 + 1/n + (x₀ − x̄)²/s_xx]·MSE,

we have Δ/σ̂_Δ ~ t_{n−2}, so

    P[−c ≤ Δ/σ̂_Δ ≤ c] = .95.

Since −c ≤ Δ/σ̂_Δ ≤ c iff

    Ŷ₀ − c·σ̂_Δ ≤ Y₀ ≤ Ŷ₀ + c·σ̂_Δ,

it follows that

    P[Ŷ₀ − c·σ̂_Δ ≤ Y₀ ≤ Ŷ₀ + c·σ̂_Δ] = .95.

The interval Ŷ₀ ± c·σ̂_Δ is called a 95% prediction interval for Y₀.

Slides 31–32
Example: United Data Services

Take x₀ = 4, so µ₀ = α + 4β. With α̂ = 4.162 and β̂ = 15.509,

    µ̂₀ = 4.162 + 4 × 15.509 = 66.198.

Next, with x̄ = 6,

    σ̂²_µ̂₀ = [1/14 + (4 − 6)²/114] × 29.074 = (1.76)².

So

    µ̂₀ ± c·σ̂_µ̂₀ = 66.198 ± 2.18 × 1.76 = 66.20 ± 3.84.

The Prediction Interval

For this example,

    Ŷ₀ = µ̂₀ = 66.198,

and the estimated variance of Y₀ − Ŷ₀ is

    σ̂²_Δ = [1 + 1/14 + (4 − 6)²/114] × 29.074 = (5.672)²,

so

    Ŷ₀ ± c·σ̂_Δ = 66.198 ± 2.18 × 5.672 = 66.20 ± 12.36.

Note: average response versus individual response.

Slides 33–34
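Both intervals in this example can be reproduced from the slide's summary numbers; the Python check below is an added illustration, not part of the slides:

```python
# Confidence interval for the mean response and prediction interval at x0 = 4,
# using the United Data Services summaries.
import math

n, c = 14, 2.180
a_hat, b_hat = 4.162, 15.509
xbar, s_xx, MSE = 6.0, 114.0, 29.074
x0 = 4.0

mu0_hat = a_hat + b_hat * x0                                      # 66.198
se_mu = math.sqrt((1 / n + (x0 - xbar) ** 2 / s_xx) * MSE)        # about 1.76
se_pred = math.sqrt((1 + 1 / n + (x0 - xbar) ** 2 / s_xx) * MSE)  # about 5.672

ci = (mu0_hat - c * se_mu, mu0_hat + c * se_mu)        # mean response: 66.20 +/- 3.84
pi = (mu0_hat - c * se_pred, mu0_hat + c * se_pred)    # individual value: 66.20 +/- 12.36
```

The prediction interval is much wider than the confidence interval because it must cover the noise in a single future observation, not just the uncertainty in the fitted line.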