Chapters 5 and 13: REGRESSION AND CORRELATION. Univariate data: x, Bivariate data (x,y).

(Sections 5.5 and 13.5 are omitted.)

Example: x: number of years students studied Spanish; y: score on a proficiency test. For each student there is one (x, y) observation:

x (years)   3   4   4   2   5   3   4   5   3   2
y (score)  57  78  72  58  89  63  73  84  75  48

We are interested in investigating the relationship between the x and y variables.
x: independent variable, predictor, explanatory variable
y: dependent variable, response variable

Scatterplot:

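[Figure: score (y, 0 to 100) against years (x, 0 to 6)]

The best fitting line: $\hat{y} = a + bx$ (a: y-intercept, b: slope)

It minimizes the sum of squared deviations:
$$\sum \left[ y - (a + bx) \right]^2$$

Calculation:
slope: $$b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$$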

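y-intercept: $a = \bar{y} - b\bar{x}$

Predicted values: $\hat{y} = a + bx$

Notation:
$$S_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$S_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$S_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

So we have $b = \dfrac{S_{xy}}{S_{xx}}$.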
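As a quick check on the notation, here is a minimal Python sketch (standard library only) that computes $S_{xx}$, $S_{yy}$, $S_{xy}$ with the shortcut formulas and then $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$. The helper names are hypothetical, and the four toy points are chosen to lie exactly on $y = 1 + 2x$, so the fit should recover $a = 1$ and $b = 2$.

```python
def sums_of_squares(xs, ys):
    """Return (S_xx, S_yy, S_xy) via the computational (shortcut) formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


def least_squares_line(xs, ys):
    """Return (a, b) for the best fitting line y-hat = a + b*x."""
    n = len(xs)
    s_xx, _, s_xy = sums_of_squares(xs, ys)
    b = s_xy / s_xx                        # slope: b = S_xy / S_xx
    a = sum(ys) / n - b * (sum(xs) / n)    # intercept: a = y-bar - b * x-bar
    return a, b


# Toy points lying exactly on y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]
print(sums_of_squares(xs, ys))     # (5.0, 20.0, 10.0)
print(least_squares_line(xs, ys))  # (1.0, 2.0)
```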

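Calculations:

   x     y    x²      y²     xy
   3    57     9    3249    171
   4    78    16    6084    312
   4    72    16    5184    288
   2    58     4    3364    116
   5    89    25    7921    445
   3    63     9    3969    189
   4    73    16    5329    292
   5    84    25    7056    420
   3    75     9    5625    225
   2    48     4    2304     96
  35   697   133   50085   2554   (totals)

$S_{xx} = 133 - 35^2/10 = 10.5$
$S_{yy} = 50085 - 697^2/10 = 1504.1$
$S_{xy} = 2554 - (35)(697)/10 = 114.5$
$b = 114.5/10.5 \approx 10.905$
$a = \bar{y} - b\bar{x} = 69.7 - (114.5/10.5)(3.5) \approx 31.533$

Best fitting line: $\hat{y} = 31.533 + 10.905x$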
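The same table can be reproduced with a short Python sketch. It simply redoes the column totals and the shortcut formulas for the ten students, so it is a cross-check of the hand calculation rather than a required step; the variable names are illustrative.

```python
# Spanish-proficiency example: x = years of study, y = proficiency score.
years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
scores = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(years)

# Column totals of the calculation table (x, y, x^2, y^2, xy).
sum_x = sum(years)
sum_y = sum(scores)
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in scores)
sum_xy = sum(x * y for x, y in zip(years, scores))

# Shortcut formulas for S_xx, S_yy, S_xy.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

b = s_xy / s_xx                  # slope
a = sum_y / n - b * sum_x / n    # intercept

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)             # 35 697 133 50085 2554
print(round(s_xx, 4), round(s_yy, 4), round(s_xy, 4))  # 10.5 1504.1 114.5
print(f"y-hat = {a:.3f} + {b:.3f}x")                     # y-hat = 31.533 + 10.905x
```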

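Pearson's Sample Correlation r:
$$r = \frac{\sum z_x z_y}{n - 1} = \frac{S_{xy}}{\sqrt{S_{xx}}\sqrt{S_{yy}}}$$

Properties of r:
1. r does not depend on the unit of measurement.
2. r is symmetric in the x and y variables.
3. $-1 \le r \le 1$.
4. r describes the strength of the linear relationship: $0 \le |r| \le .5$ weak; $.5 < |r| \le .8$ moderate; $.8 < |r| \le 1$ strong.
5. $r = +1$: all points lie on a line with positive slope; $r = -1$: all points lie on a line with negative slope; $r \approx 0$: the relationship is not linear, or there is no relationship.

Coefficient of determination: $r^2$ gives the proportion of variation in y that can be attributed to the linear relationship between x and y.

(Spearman's rank correlation coefficient: omitted.)

Calculation, example continued: $r = \dfrac{114.5}{\sqrt{10.5}\sqrt{1504.1}} \approx 0.911$ (strong), and $r^2 \approx 0.830$, so about 83% of the variation in scores can be attributed to the linear relationship with years of study.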
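A sketch of both forms of r in Python, assuming sample standard deviations (divisor $n - 1$) in the z-scores. The function names are illustrative. The last two lines check properties 1 and 2 numerically: measuring x in months instead of years, or swapping x and y, leaves r unchanged.

```python
import math


def pearson_r(xs, ys):
    """r = S_xy / (sqrt(S_xx) * sqrt(S_yy)), using deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


def pearson_r_from_z(xs, ys):
    """Same r via z-scores: sum(z_x * z_y) / (n - 1), sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
scores = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]

r = pearson_r(years, scores)
print(round(r, 3), round(r ** 2, 3))                         # 0.911 0.83
print(round(pearson_r_from_z(years, scores), 3))             # 0.911
print(round(pearson_r([12 * x for x in years], scores), 3))  # 0.911 (x in months)
print(round(pearson_r(scores, years), 3))                    # 0.911 (x and y swapped)
```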

Assessing the Fit of the Line:
Predicted values: $\hat{y}_i = a + bx_i$, $i = 1, \dots, n$
Residuals (errors): $y_i - \hat{y}_i$, $i = 1, \dots, n$

Regression output for the example:

Regression Statistics
  Multiple R           0.9111
  R Square             0.8301
  Adjusted R Square    0.8089
  Standard Error       5.6514
  Observations         10

ANOVA
                df   SS          MS          F         Significance F
  Regression     1   1248.5952   1248.5952   39.0942   0.000245
  Residual       8    255.5048     31.9381
  Total          9   1504.1

                 Coefficients   Standard Error   t Stat
  Intercept      31.5333        6.3604           4.9577
  X Variable 1   10.9048        1.7441           6.2525

RESIDUAL OUTPUT
  Observation   Predicted Y   Residuals
  1             64.2476       -7.2476
  2             75.1524        2.8476
  3             75.1524       -3.1524
  4             53.3429        4.6571
  5             86.0571        2.9429
  6             64.2476       -1.2476
  7             75.1524       -2.1524
  8             86.0571       -2.0571
  9             64.2476       10.7524
  10            53.3429       -5.3429

[Figure: residual plot, residuals (about -10 to 15) against X variable (0 to 6)]

Definitions:
Total Sum of Squares:
$$\text{SSTo} = (y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \dots + (y_n - \bar{y})^2 = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$
Residual Sum of Squares:
$$\text{SSResid} = (y_1 - \hat{y}_1)^2 + (y_2 - \hat{y}_2)^2 + \dots + (y_n - \hat{y}_n)^2 = \sum (y - \hat{y})^2 = \sum y^2 - a\sum y - b\sum xy$$
An alternative way of calculating $r^2$:
$$r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$$
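The residual output and the two sums of squares can be recomputed from the raw data with a short sketch (plain Python, hypothetical variable names). It checks the shortcut form of SSResid against the definition and confirms that $1 - \text{SSResid}/\text{SSTo}$ reproduces the R Square of the output. As a side check, least-squares residuals always sum to zero.

```python
years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
scores = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(years)

# Least-squares fit (deviation form).
x_bar, y_bar = sum(years) / n, sum(scores) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, scores))
b = s_xy / s_xx
a = y_bar - b * x_bar

predicted = [a + b * x for x in years]                          # y-hat_i
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]  # y_i - y-hat_i

ss_to = sum((y - y_bar) ** 2 for y in scores)   # SSTo
ss_resid = sum(e ** 2 for e in residuals)       # SSResid from the definition
# Shortcut: SSResid = sum(y^2) - a*sum(y) - b*sum(xy)
ss_resid_short = (sum(y * y for y in scores) - a * sum(scores)
                  - b * sum(x * y for x, y in zip(years, scores)))
r_squared = 1 - ss_resid / ss_to

# [-7.25, 2.85, -3.15, 4.66, 2.94, -1.25, -2.15, -2.06, 10.75, -5.34]
print([round(e, 2) for e in residuals])
print(round(ss_to, 2), round(ss_resid, 2), round(ss_resid_short, 2))  # 1504.1 255.5 255.5
print(round(r_squared, 4))                                             # 0.8301
```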

The Model of Simple Linear Regression:
$$y = \alpha + \beta x + e$$
α: y-intercept; β: slope; e: random error (deviation).

We assume that
1. the mean of e is zero,
2. its standard deviation is σ, which does not depend on x,
3. e is a normal random variable,
4. random errors at different x values are independent.

The standard deviation σ is estimated by $s_e$:
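To make the four assumptions concrete, here is a small simulation sketch, assuming hypothetical parameter values α = 30, β = 10, σ = 5 (not estimates from the notes) and reusing the x values from the Spanish example. It draws many samples from $y = \alpha + \beta x + e$ with independent normal errors and fits b to each; the average of the slopes should land near β and their spread near $\sigma/\sqrt{S_{xx}}$, the facts about the distribution of b used for inference about β.

```python
import math
import random
import statistics

# Hypothetical "true" parameters, chosen only for illustration.
ALPHA, BETA, SIGMA = 30.0, 10.0, 5.0
XS = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]   # fixed x values (from the Spanish example)


def simulate_ys(rng, xs):
    """One sample from y = alpha + beta*x + e, with e independent Normal(0, sigma)."""
    return [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in xs]


def fitted_slope(xs, ys):
    """Least-squares slope b = S_xy / S_xx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / s_xx


rng = random.Random(12345)   # fixed seed so the run is reproducible
slopes = [fitted_slope(XS, simulate_ys(rng, XS)) for _ in range(5000)]

s_xx = sum((x - statistics.mean(XS)) ** 2 for x in XS)   # 10.5
print(statistics.mean(slopes), "vs beta =", BETA)
print(statistics.stdev(slopes), "vs sigma/sqrt(S_xx) =", SIGMA / math.sqrt(s_xx))
```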

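$$s_e = \sqrt{\frac{\text{SSResid}}{n - 2}}$$
Example: $s_e = \sqrt{255.5048/8} \approx 5.651$, the Standard Error reported in the regression output.

Point estimators:

  Unknown          Estimator
  α                a
  β                b
  σ                $s_e$
  α + βx*          a + bx*
  α + βx* + e      a + bx*

The four assumptions above make inference possible.

Inference about the parameter β:
1. The mean of the distribution of b is β.
2. The standard deviation of the distribution of b is $\sigma_b = \dfrac{\sigma}{\sqrt{S_{xx}}}$.
3. The estimator b has a normal distribution.

We estimate $\sigma_b$ by $s_b = \dfrac{s_e}{\sqrt{S_{xx}}}$.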
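For the Spanish example, $s_e$ and $s_b$ can be computed with a minimal sketch that refits the line and divides SSResid by $n - 2$. The values should match the Standard Error of the regression and the Standard Error of the slope in the output.

```python
import math

years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
scores = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(years)

# Refit the least-squares line.
x_bar, y_bar = sum(years) / n, sum(scores) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, scores))
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(years, scores))

s_e = math.sqrt(ss_resid / (n - 2))   # estimates sigma, n - 2 d.f.
s_b = s_e / math.sqrt(s_xx)           # estimates sigma_b = sigma / sqrt(S_xx)
print(round(s_e, 4), round(s_b, 4))   # 5.6514 1.7441
```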

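$$t = \frac{b - \beta}{s_b}$$
has a t-distribution with $n - 2$ d.f.

$(1-\alpha)100\%$ Confidence Interval for β:
$$b \pm t_{\alpha/2}\, s_b$$
Example:

Hypothesis Test:
1. Null Hypothesis: $H_0: \beta = \beta_0$
2. Alternative Hypothesis: $H_a: \beta > \beta_0$, or $\beta < \beta_0$, or $\beta \ne \beta_0$
3. Test statistic: $T = \dfrac{b - \beta_0}{s_b}$ (d.f. $= n - 2$)
4. P-value: $P(T > \text{observed } t)$, or $P(T < \text{observed } t)$, or $P(|T| > |\text{observed } t|)$, respectively, for the three alternatives.

Example: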
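A sketch of the interval and the test for β in the Spanish example, assuming a 95% confidence level and $H_0: \beta = 0$ against $H_a: \beta \ne 0$. The Python standard library has no t-distribution, so the critical value $t_{0.025} = 2.306$ for 8 d.f. is hard-coded from a t table; an exact P-value would need a statistics library. Since $t^2 = F$ in simple linear regression, the Significance F in the regression output is the two-sided P-value of this test.

```python
import math

T_CRIT = 2.306   # t_{0.025} with n - 2 = 8 d.f., from a t table (assumed 95% level)

years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
scores = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(years)

# Fit the line and compute s_e, s_b.
x_bar, y_bar = sum(years) / n, sum(scores) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, scores))
b = s_xy / s_xx
a = y_bar - b * x_bar
s_e = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(years, scores)) / (n - 2))
s_b = s_e / math.sqrt(s_xx)

# 95% confidence interval for beta: b +/- t_{alpha/2} * s_b
lower, upper = b - T_CRIT * s_b, b + T_CRIT * s_b

# Test H0: beta = 0 against Ha: beta != 0 at the 5% level.
beta_0 = 0.0
t_obs = (b - beta_0) / s_b
reject = abs(t_obs) > T_CRIT

print(f"95% CI for beta: ({lower:.2f}, {upper:.2f})")  # 95% CI for beta: (6.88, 14.93)
print(f"t = {t_obs:.2f}, reject H0: {reject}")          # t = 6.25, reject H0: True
```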

Inferences Based on the Estimated Regression Line

Recall (x* is a selected value of the predictor):

  Unknown          Estimator
  α                a
  β                b
  σ                $s_e$
  α + βx*          a + bx*
  α + βx* + e      a + bx*

Now we shall review inference for the last two estimation problems.

$\alpha + \beta x^*$ is the expected value of y at $x = x^*$. Its estimator $a + bx^*$
1. is unbiased,
2. has a normal sampling distribution,
3. has standard deviation
$$\sigma_{a+bx^*} = \sigma\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$

When σ is not known, it is estimated by

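$$s_{a+bx^*} = s_e\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{S_{xx}}}$$
and inference is based on
$$T = \frac{a + bx^* - (\alpha + \beta x^*)}{s_{a+bx^*}},$$
which has $n - 2$ d.f. So a $(1-\alpha)100\%$ Confidence Interval for $\alpha + \beta x^*$ is:
$$a + bx^* \pm t_{\alpha/2}\, s_{a+bx^*}$$

$\alpha + \beta x^* + e$ is the value of y at $x = x^*$ (a single new observation). A $(1-\alpha)100\%$ prediction interval for $\alpha + \beta x^* + e$ is:
$$a + bx^* \pm t_{\alpha/2}\sqrt{s_e^2 + s_{a+bx^*}^2}$$

Example: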
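A sketch of both intervals at a hypothetical $x^* = 4$ years, again assuming 95% confidence and the tabled $t_{0.025} = 2.306$ for 8 d.f. The prediction interval comes out wider because it adds $s_e^2$ for the new observation's own random error e.

```python
import math

T_CRIT = 2.306   # t_{0.025} with 8 d.f. (assumed 95% level)
X_STAR = 4       # hypothetical choice of x*: four years of study

years = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]
scores = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]
n = len(years)

# Fit the line and estimate sigma by s_e.
x_bar, y_bar = sum(years) / n, sum(scores) / n
s_xx = sum((x - x_bar) ** 2 for x in years)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, scores))
b = s_xy / s_xx
a = y_bar - b * x_bar
s_e = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(years, scores)) / (n - 2))

y_hat_star = a + b * X_STAR
s_fit = s_e * math.sqrt(1 / n + (X_STAR - x_bar) ** 2 / s_xx)   # s_{a+bx*}

# Confidence interval for the mean response alpha + beta*x*
ci = (y_hat_star - T_CRIT * s_fit, y_hat_star + T_CRIT * s_fit)
# Prediction interval for a single new y = alpha + beta*x* + e
half_width = T_CRIT * math.sqrt(s_e ** 2 + s_fit ** 2)
pi = (y_hat_star - half_width, y_hat_star + half_width)

print(round(y_hat_star, 2))             # 75.15
print(tuple(round(v, 2) for v in ci))  # (70.57, 79.74)
print(tuple(round(v, 2) for v in pi))  # (61.34, 88.97)
```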

Example: Ten father-son pairs of mature men were selected at random and their heights recorded. Let x refer to the father's height and y to the son's height (both in inches).

Data: n = 10

  Pair   1   2   3   4   5   6   7   8   9  10
  x     68  69  69  67  70  71  70  66  68  65
  y     69  70  72  68  72  72  69  67  66  64

$\sum x = 683$, $\sum y = 689$, $\sum x^2 = 46{,}681$, $\sum y^2 = 47{,}539$, $\sum xy = 47{,}098$
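A short sketch that checks the summary sums against the listed pairs and carries the example through the chapter's formulas for the slope, intercept, and r. The notes stop at the data, so these results are computed here from the sums rather than quoted; the variable names are illustrative.

```python
import math

# Father-son heights (inches): x = father, y = son.
fathers = [68, 69, 69, 67, 70, 71, 70, 66, 68, 65]
sons = [69, 70, 72, 68, 72, 72, 69, 67, 66, 64]
n = len(fathers)

sum_x, sum_y = sum(fathers), sum(sons)
sum_x2 = sum(x * x for x in fathers)
sum_y2 = sum(y * y for y in sons)
sum_xy = sum(x * y for x, y in zip(fathers, sons))
print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)   # 683 689 46681 47539 47098

s_xx = sum_x2 - sum_x ** 2 / n      # 32.1
s_yy = sum_y2 - sum_y ** 2 / n      # 66.9
s_xy = sum_xy - sum_x * sum_y / n   # 39.3

b = s_xy / s_xx                     # slope
a = sum_y / n - b * sum_x / n       # intercept
r = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation
print(f"y-hat = {a:.2f} + {b:.4f}x, r = {r:.3f}")  # y-hat = -14.72 + 1.2243x, r = 0.848
```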