PubH 7405: REGRESSION ANALYSIS. SLR: INFERENCES, Part II

We cover the topic of inference in two sessions. The first session focused on inferences concerning the slope and the intercept; this one continues with estimating the mean response, and more. Applications concerning the slope and the intercept are based on the following four (4) theorems.

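SAMPLING DISTRIBUTION OF SLOPE
Theorem #1A: Under the "Normal Error Regression Model"
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$
the sampling distribution of the estimated slope $b_1$ is Normal with mean and variance
$$E(b_1) = \beta_1, \qquad \sigma^2(b_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}.$$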

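IMPLICATION
$$\frac{b_1 - \beta_1}{\sigma(b_1)} \text{ is distributed as } N(0, 1), \qquad \frac{(n-2)\, s^2(b_1)}{\sigma^2(b_1)} \text{ is distributed as } \chi^2 \text{ with } df = n-2,$$
where $s^2(b_1) = MSE / \sum (x_i - \bar{x})^2$.
Theorem #1B: $\dfrac{b_1 - \beta_1}{s(b_1)}$ is distributed as "t" with $n-2$ degrees of freedom.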

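CONFIDENCE INTERVALS
Theorem #1B: $(b_1 - \beta_1)/s(b_1)$ is distributed as "t" with $n-2$ degrees of freedom.
The $(1-\alpha)100\%$ Confidence Interval for $\beta_1$ is:
$$b_1 \pm t(1-\alpha/2;\, n-2)\, s(b_1),$$
where $t(1-\alpha/2;\, n-2)$ is the $(1-\alpha/2)100$ percentile of the "t" distribution with $n-2$ degrees of freedom.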
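The sketch below computes the slope interval in Python. It uses the birth weight data of Example #1. The t percentile is typed in from a t table, t(0.975; 10) = 2.228, rather than computed. `fit_slr` and `slope_ci` are illustrative helper names, not part of any library.

```python
import math


def fit_slr(x, y):
    """Least-squares fit of y = b0 + b1*x; returns (b0, b1, MSE, x_bar, Sxx)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse / (n - 2), x_bar, sxx  # MSE has n - 2 degrees of freedom


def slope_ci(x, y, t_crit):
    """b1 +/- t(1 - alpha/2; n - 2) * s(b1); t_crit is supplied by the caller."""
    _, b1, mse, _, sxx = fit_slr(x, y)
    s_b1 = math.sqrt(mse / sxx)  # s(b1) = sqrt(MSE / sum (x - x_bar)^2)
    return b1, s_b1, (b1 - t_crit * s_b1, b1 + t_crit * s_b1)


# Birth weight data (Example #1): x = birth weight (oz), y = growth (% of BW)
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]
b1, s_b1, (lo, hi) = slope_ci(x, y, t_crit=2.228)  # t(0.975; 10) = 2.228
print(f"b1 = {b1:.3f}, s(b1) = {s_b1:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
# prints: b1 = -1.737, s(b1) = 0.1877, 95% CI = (-2.155, -1.319)
```

The whole interval lies below zero, so these data point to a negative slope.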

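SAMPLING DISTRIBUTION OF INTERCEPT
Theorem #2A: Under the "Normal Error Regression Model"
$$Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2),$$
the sampling distribution of the estimated intercept $b_0$ is Normal with mean and variance
$$E(b_0) = \beta_0, \qquad \sigma^2(b_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right].$$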

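IMPLICATION
$$\frac{b_0 - \beta_0}{\sigma(b_0)} \text{ is distributed as } N(0, 1), \qquad \frac{(n-2)\, s^2(b_0)}{\sigma^2(b_0)} \text{ is distributed as } \chi^2 \text{ with } df = n-2,$$
where $s^2(b_0) = MSE \left[ 1/n + \bar{x}^2 / \sum (x_i - \bar{x})^2 \right]$.
Theorem #2B: $\dfrac{b_0 - \beta_0}{s(b_0)}$ is distributed as "t" with $n-2$ degrees of freedom.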

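CONFIDENCE INTERVALS
Theorem #2B: $(b_0 - \beta_0)/s(b_0)$ is distributed as "t" with $n-2$ degrees of freedom.
The $(1-\alpha)100\%$ Confidence Interval for $\beta_0$ is:
$$b_0 \pm t(1-\alpha/2;\, n-2)\, s(b_0),$$
where $t(1-\alpha/2;\, n-2)$ is the $(1-\alpha/2)100$ percentile of the "t" distribution with $n-2$ degrees of freedom.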
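The intercept interval can be computed from summary statistics alone. The sketch below uses the Toluca figures of Example #3 (n = 25, MSE = 2,384, mean of X = 70.0, SS of X = 19,800) and an assumed 90% level with t(0.95; 23) = 1.714 from a t table. `intercept_ci` is a hypothetical helper name.

```python
import math


def intercept_ci(b0, mse, n, x_bar, sxx, t_crit):
    """b0 +/- t(1 - alpha/2; n - 2) * s(b0), computed from summary statistics."""
    # s^2(b0) = MSE * [1/n + x_bar^2 / sum (x - x_bar)^2]
    s_b0 = math.sqrt(mse * (1.0 / n + x_bar ** 2 / sxx))
    return s_b0, (b0 - t_crit * s_b0, b0 + t_crit * s_b0)


# Toluca summary statistics (Example #3); t(0.95; 23) = 1.714 gives a 90% interval
s_b0, (lo, hi) = intercept_ci(b0=62.366, mse=2384, n=25, x_bar=70.0,
                              sxx=19800, t_crit=1.714)
print(f"s(b0) = {s_b0:.2f}, 90% CI for beta0 = ({lo:.2f}, {hi:.2f})")
```

The term $\bar{x}^2$ in $s^2(b_0)$ is why the intercept is estimated poorly when the observed X values lie far from zero.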

THE MEAN RESPONSE
$E(Y \mid X = x) = \beta_0 + \beta_1 x$. A common objective in regression analysis is to estimate the mean response. Here are two examples:
- We may want to know the average blood pressure of women at a certain age, and estimate it using the relationship between SBP and Age.
- In a study of the relationship between level of pay (salary, X) and worker productivity (Y), a company may be particularly interested in the mean productivity at high, medium, and low levels of pay.

POINT ESTIMATE
The Mean Response: $E(Y \mid X = x) = \beta_0 + \beta_1 x$. Let $x_h$ denote the level of X for which we wish to estimate the mean response, i.e. $E(Y \mid X = x_h)$. This may be a value which occurred in the sample, or some other value of the predictor variable within the scope of the model. The point estimate of the mean response is:
$$\hat{Y}_h = b_0 + b_1 x_h.$$

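SAMPLING DISTRIBUTION
Theorem #3A: Under the "Normal Error Regression Model" $Y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$, the sampling distribution of the estimated mean response $\hat{Y}_h$ is Normal with mean and variance
$$E(\hat{Y}_h) = \beta_0 + \beta_1 x_h, \qquad \sigma^2(\hat{Y}_h) = \sigma^2 \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$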

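$$b_1 = \sum k_i y_i, \quad k_i = \frac{x_i - \bar{x}}{\sum (x_j - \bar{x})^2}; \qquad b_0 = \bar{y} - b_1 \bar{x} = \sum \left( \frac{1}{n} - \bar{x} k_i \right) y_i;$$
$$\hat{Y}_h = b_0 + b_1 x_h = \sum \left[ \frac{1}{n} + (x_h - \bar{x}) k_i \right] y_i.$$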

The sampling distribution of $\hat{Y}_h$ is normal for two reasons:
- Like the intercept and the slope, the estimated mean response is a linear combination of the observations $y_i$.
- Under the normal error regression model, the distribution of each observation is normal.

The estimated mean response is unbiased because the estimated intercept and estimated slope are both unbiased:
$$E(\hat{Y}_h) = E(b_0) + E(b_1)\, x_h = \beta_0 + \beta_1 x_h = E(Y \mid X = x_h).$$

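Because the $y_i$ are independent, each with variance $\sigma^2$, and because $\sum k_i = 0$ and $\sum k_i^2 = 1/\sum (x_i - \bar{x})^2$:
$$\sigma^2(\hat{Y}_h) = \sum \left[ \frac{1}{n} + (x_h - \bar{x}) k_i \right]^2 \sigma^2 = \sigma^2 \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$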
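The derivation above can be checked numerically. In the sketch below, the weights $c_i = 1/n + (x_h - \bar{x}) k_i$ are built for the birth weight data at an assumed $x_h = 95$. They reproduce $\hat{Y}_h$, and $\sum c_i^2$ matches the closed-form factor multiplying $\sigma^2$. `mean_response_weights` is a hypothetical helper name.

```python
def mean_response_weights(x, x_h):
    """Weights c_i such that Y_hat_h = sum(c_i * y_i).

    c_i = 1/n + (x_h - x_bar) * k_i, with k_i = (x_i - x_bar) / Sxx.
    """
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    c = [1.0 / n + (x_h - x_bar) * (xi - x_bar) / sxx for xi in x]
    return c, x_bar, sxx


x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]
x_h = 95
c, x_bar, sxx = mean_response_weights(x, x_h)

y_hat_h = sum(ci * yi for ci, yi in zip(c, y))  # equals b0 + b1 * x_h
# Independent y_i with common variance sigma^2: Var(Y_hat_h) = sigma^2 * sum(c_i^2)
var_factor = sum(ci ** 2 for ci in c)
closed_form = 1 / len(x) + (x_h - x_bar) ** 2 / sxx
print(round(y_hat_h, 2), round(var_factor, 6), round(closed_form, 6))
```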

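Replacing $\sigma^2$ by MSE:
$$s^2(\hat{Y}_h) = MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$
Taking the square root gives the Standard Error:

$$SE(\hat{Y}_h) = s(\hat{Y}_h) = \sqrt{MSE \left[ \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]}.$$
Implication: our estimates are less precise toward the ends, because the standard error grows as $x_h$ moves away from $\bar{x}$.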
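To see the "less precise toward the ends" effect, the sketch below evaluates $s(\hat{Y}_h)$ at several $x_h$. It uses the Toluca summary statistics of Example #3, and the grid of $x_h$ values is an arbitrary choice. The standard error is smallest at $\bar{x} = 70$ and symmetric around it.

```python
import math


def se_mean_response(x_h, mse, n, x_bar, sxx):
    """s(Y_hat_h) = sqrt(MSE * [1/n + (x_h - x_bar)^2 / Sxx])."""
    return math.sqrt(mse * (1.0 / n + (x_h - x_bar) ** 2 / sxx))


# Toluca summary statistics (Example #3): MSE = 2,384, n = 25, x_bar = 70, Sxx = 19,800
se = {x_h: se_mean_response(x_h, 2384, 25, 70.0, 19800) for x_h in (30, 50, 70, 90, 110)}
for x_h, s in se.items():
    print(f"x_h = {x_h:3d}: s(Y_hat_h) = {s:6.2f}")
```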

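MORE ON SAMPLING DISTRIBUTION
$$\frac{\hat{Y}_h - E(Y_h)}{\sigma(\hat{Y}_h)} \text{ is distributed as } N(0, 1), \qquad \frac{(n-2)\, s^2(\hat{Y}_h)}{\sigma^2(\hat{Y}_h)} \text{ is distributed as } \chi^2 \text{ with } df = n-2,$$
where $E(Y_h) = E(Y \mid X = x_h)$.
Theorem #3B: $\dfrac{\hat{Y}_h - E(Y_h)}{s(\hat{Y}_h)}$ is distributed as "t" with $n-2$ degrees of freedom.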

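CONFIDENCE INTERVALS
Theorem #3B: $(\hat{Y}_h - E(Y_h))/s(\hat{Y}_h)$ is distributed as "t" with $n-2$ degrees of freedom.
The $(1-\alpha)100\%$ Confidence Interval for $E(Y_h)$ is:
$$\hat{Y}_h \pm t(1-\alpha/2;\, n-2)\, s(\hat{Y}_h),$$
where $t(1-\alpha/2;\, n-2)$ is the $(1-\alpha/2)100$ percentile of the "t" distribution with $n-2$ degrees of freedom.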

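EXAMPLE #1: Birth weight data
x = birth weight (oz); y = growth between days 70 and 100, as % of birth weight (BW)
x (oz): 112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94
y (%):  63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91
n = 12; Intercept 255.97; Slope -1.737; MSE 75.98; Mean of X 100.58; SS of X 2,156.92
For children with birth weight of 95 ounces, the point estimate and 95% Confidence Interval for the mean growth between days 70 and 100, as % of BW, are:
$$\hat{Y}_h = 255.97 - 1.737(95) \approx 90.95\%$$
$$s^2(\hat{Y}_h) = 75.98 \left[ \frac{1}{12} + \frac{(95 - 100.58)^2}{2{,}156.92} \right] = 7.43$$
$$90.95 \pm 2.228\sqrt{7.43} = 90.95 \pm 6.07 \quad\Rightarrow\quad (84.88\%,\ 97.02\%)$$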
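The sketch below recomputes Example #1 directly from the twelve listed (x, y) pairs. As on the slide, t(0.975; 10) = 2.228 is typed in from a t table. Working from the raw data should give an intercept of about 255.97 and a 95% interval of roughly (84.88%, 97.02%).

```python
import math

# Birth weight data (Example #1)
x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]  # birth weight (oz)
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]      # growth, % of BW

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

x_h, t_crit = 95, 2.228                        # t(0.975; 10) from a t table
y_hat = b0 + b1 * x_h
s2 = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # s^2(Y_hat_h)
half = t_crit * math.sqrt(s2)
lo, hi = y_hat - half, y_hat + half
print(f"b0 = {b0:.2f}, b1 = {b1:.3f}, MSE = {mse:.2f}, Sxx = {sxx:.2f}")
print(f"Y_hat = {y_hat:.2f}%, s^2 = {s2:.2f}, 95% CI = ({lo:.2f}%, {hi:.2f}%)")
```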

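EXAMPLE #2: Age and SBP
x = Age (years); y = SBP
Age: 42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75
SBP: 130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165
n = 15; Intercept 99.958; Slope 0.705; MSE 278.554; Mean of X 65.6; SS of X 3,403.6
For 60-year-old women, the point estimate and 95% Confidence Interval for the mean SBP are:
$$\hat{Y}_h = 99.958 + 0.705(60) = 142.26$$
$$s^2(\hat{Y}_h) = 278.554 \left[ \frac{1}{15} + \frac{(60 - 65.6)^2}{3{,}403.6} \right] = 21.14$$
$$142.26 \pm 2.160\sqrt{21.14} = 142.26 \pm 9.93 \quad\Rightarrow\quad (132.33,\ 152.19)$$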
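The sketch below reproduces Example #2 from the 15 listed (Age, SBP) pairs, taking t(0.975; 13) = 2.160 from a t table. It carries full precision instead of the rounded coefficients 99.958 and 0.705, so the last digit can differ slightly from the hand calculation: about 142.25 and (132.32, 152.18). `mean_response_ci` is a hypothetical helper name.

```python
import math

age = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
sbp = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]


def mean_response_ci(x, y, x_h, t_crit):
    """Point estimate, s^2(Y_hat_h), and CI for E(Y | X = x_h) in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    y_hat = b0 + b1 * x_h
    s2 = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)
    half = t_crit * math.sqrt(s2)
    return y_hat, s2, (y_hat - half, y_hat + half)


y_hat, s2, (lo, hi) = mean_response_ci(age, sbp, x_h=60, t_crit=2.160)  # t(0.975; 13)
print(f"Y_hat = {y_hat:.2f}, s^2 = {s2:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```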

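EXAMPLE #3: Toluca Company Data
x = Lot Size (units); y = Work Hours
Lot Size:   80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90, 20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70
Work Hours: 399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160, 252, 389, 113, 435, 420, 212, 268, 377, 421, 273, 468, 244, 342, 323
n = 25; Intercept 62.366; Slope 3.570; MSE 2,384; Mean of X 70.0; SS of X 19,800
For lots of size 65 units, the point estimate and 90% Confidence Interval for the mean Work Hours are:
$$\hat{Y}_h = 62.37 + 3.570(65) = 294.4$$
$$s^2(\hat{Y}_h) = 2{,}384 \left[ \frac{1}{25} + \frac{(65 - 70.0)^2}{19{,}800} \right] = 98.37$$
$$294.4 \pm 1.714\sqrt{98.37} = 294.4 \pm 17.0 \quad\Rightarrow\quad (277.4,\ 311.4)$$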

In regression analysis, besides estimating the mean response, one may sometimes want to estimate (predict) a new individual response. Here are two examples:
- In addition to estimating the average blood pressure of women at a certain age from the relationship between SBP and Age, we may be interested in estimating the SBP of a particular woman (patient) at that age.
- In a study of the relationship between pay (salary, X) and worker productivity (Y), the interest may focus on the productivity of one particular worker.

POINT ESTIMATE
Let $x_h$ denote the level of X under investigation, at which the mean response is $E(Y \mid X = x_h) = \beta_0 + \beta_1 x_h$. Let $Y_{h(new)}$ be the value of the new individual response of interest. This new observation of Y is often viewed as the result of a new trial, independent of the trials on which the regression line is based. The point estimate is the same as that of the mean response:
$$\hat{Y}_{h(new)} = b_0 + b_1 x_h = \hat{Y}_h.$$

VARIANCE
The point estimates of the mean response and of an individual response are the same, but their variances are different. In estimating an individual response, there are two layers of variation:
(a) the variation in the position of the distribution, that is, of the mean response; and
(b) the variation within that distribution, that is, from the individual response to the mean response.

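Theorem #4A: Under the "Normal Error Regression Model", the sampling distribution of the prediction error $Y_{h(new)} - \hat{Y}_h$ is normal, with variance
$$\sigma^2(\text{pred}) = Var(Y_{h(new)}) + Var(\hat{Y}_h) = \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$
The two variances add because the new observation is independent of the data used to fit the line.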

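Replacing $\sigma^2$ by MSE:
$$s^2(\text{pred}) = MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right].$$
Taking the square root gives the Standard Error.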

MORE ON SAMPLING DISTRIBUTION
Inferences on a new individual response are based on the following result:
Theorem #4B: $\dfrac{Y_{h(new)} - \hat{Y}_h}{s(\text{pred})}$ is distributed as "t" with $n-2$ degrees of freedom.

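$$SE(\text{pred}) = s(\text{pred}) = \sqrt{MSE \left[ 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right]},$$
so the $(1-\alpha)100\%$ prediction interval for $Y_{h(new)}$ is $\hat{Y}_h \pm t(1-\alpha/2;\, n-2)\, s(\text{pred})$. Again: our estimates are less precise toward the ends.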
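The sketch below puts the two intervals side by side, using the Toluca summary statistics of Example #3 at $x_h = 65$ and t(0.95; 23) = 1.714 from a t table. The choice of $x_h$ and the 90% level simply follow Example #3. The prediction interval is much wider because $s^2(\text{pred}) = MSE + s^2(\hat{Y}_h)$.

```python
import math

# Toluca summary statistics (Example #3); 90% intervals use t(0.95; 23) = 1.714
b0, b1, mse, n, x_bar, sxx, t_crit = 62.366, 3.570, 2384, 25, 70.0, 19800, 1.714
x_h = 65

y_hat = b0 + b1 * x_h
s2_mean = mse * (1 / n + (x_h - x_bar) ** 2 / sxx)  # s^2(Y_hat_h): mean response
s2_pred = mse + s2_mean  # s^2(pred) = MSE * [1 + 1/n + (x_h - x_bar)^2 / Sxx]

ci = (y_hat - t_crit * math.sqrt(s2_mean), y_hat + t_crit * math.sqrt(s2_mean))
pi = (y_hat - t_crit * math.sqrt(s2_pred), y_hat + t_crit * math.sqrt(s2_pred))
print(f"Y_hat = {y_hat:.1f}")
print(f"90% CI for the mean response: ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"90% prediction interval:      ({pi[0]:.1f}, {pi[1]:.1f})")
```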

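Normal Error Regression Model: $Y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$. The residuals $\{e_i\}$ form a sample with mean zero, and $s^2 = MSE$.
Theorem #5: $SSE/\sigma^2 = (n-2)\, MSE/\sigma^2$ is distributed as $\chi^2$ with $df = n-2$; hence $E(MSE) = \sigma^2$.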

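THE TEST FOR INDEPENDENCE
The Mean Response: $E(Y \mid X = x) = \beta_0 + \beta_1 x$. Null hypothesis $H_0: \beta_1 = 0$, i.e. the mean response does not depend on X.
"t" test at $n-2$ degrees of freedom:
$$t = \frac{b_1}{s(b_1)},$$
which is identical to the test using "r":
$$t = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}.$$
The method we use most often is this Test for Independence, which we now approach in a different way: ANOVA.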
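The sketch below checks that the two forms of the t statistic agree, using the birth weight data of Example #1. Algebraically both reduce to $S_{xy}\sqrt{n-2}/\sqrt{S_{xx}\, SSE}$, so they should match up to floating-point error.

```python
import math

x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
t_from_slope = b1 / math.sqrt(mse / sxx)                  # t = b1 / s(b1)

r = sxy / math.sqrt(sxx * syy)                            # sample correlation
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)   # t from r
print(f"t = {t_from_slope:.3f} (from b1), {t_from_r:.3f} (from r)")
```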

COMPONENTS OF VARIATION
The variation in Y is conventionally measured in terms of the deviations $(y_i - \bar{y})$. The total variation, denoted by SST, is the sum of squared deviations:
$$SST = \sum (y_i - \bar{y})^2.$$
For example, SST = 0 when all observations are the same. SST is the numerator of the sample variance of Y: the greater SST, the greater the variation among the Y values. In regression analysis, the variation in Y is decomposed into two components:
$$(y_i - \bar{y}) = (y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{y}).$$

DECOMPOSITION OF SST
In the decomposition $(y_i - \bar{y}) = (y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{y})$, the first term on the right-hand side reflects the variation around the regression line. This is the part that cannot be explained by the regression itself, and its sum of squares is the error sum of squares $SSE = \sum (y_i - \hat{Y}_i)^2$. The difference between the two sums of squares,
$$SSR = SST - SSE = \sum (\hat{Y}_i - \bar{y})^2,$$
is called the regression sum of squares. SSR may be considered a measure of the variation in Y associated with, or explained by, the regression line.

Regression helps to improve the estimate of Y: from $\bar{y}$ (without any information) to $\hat{Y}$ (with the information provided by knowing X).

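$$SST = \sum \left[ (y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{y}) \right]^2 = SSE + SSR + 2\sum e_i (\hat{Y}_i - \bar{y}) = SSE + SSR,$$
since the cross-product term is 0: the residuals satisfy $\sum e_i = 0$ and $\sum e_i \hat{Y}_i = 0$.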
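The sketch below confirms numerically that the cross-product term vanishes and that SST = SSE + SSR, using the Age and SBP data of Example #2.

```python
age = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
sbp = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

n = len(age)
x_bar, y_bar = sum(age) / n, sum(sbp) / n
sxx = sum((x - x_bar) ** 2 for x in age)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, sbp)) / sxx
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * x for x in age]
resid = [y - f for y, f in zip(sbp, fitted)]

sst = sum((y - y_bar) ** 2 for y in sbp)       # total
sse = sum(e ** 2 for e in resid)               # around the line
ssr = sum((f - y_bar) ** 2 for f in fitted)    # explained by the line
cross = 2 * sum(e * (f - y_bar) for e, f in zip(resid, fitted))
print(f"SST = {sst:.1f}, SSE = {sse:.1f}, SSR = {ssr:.1f}, cross term = {cross:.1e}")
```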

ANALYSIS OF VARIANCE
SST measures the total variation in the sample of values of the dependent variable, with $n-1$ degrees of freedom, where n is the sample size. It is decomposed as SST = SSE + SSR:
- SSE measures the variation that cannot be explained by the regression, with $n-2$ degrees of freedom.
- SSR measures the variation in Y associated with, or explained by, the regression line, with 1 degree of freedom (representing the slope).

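Since the fitted line passes through $(\bar{x}, \bar{y})$,
$$\hat{Y}_i - \bar{y} = (b_0 + b_1 x_i) - (b_0 + b_1 \bar{x}) = b_1 (x_i - \bar{x}) \quad\Rightarrow\quad SSR = b_1^2 \sum (x_i - \bar{x})^2.$$
Using $E(X^2) = Var(X) + \{E(X)\}^2$ with $X = b_1$:
$$E(MSR) = E(SSR) = \sum (x_i - \bar{x})^2 \left[ Var(b_1) + \{E(b_1)\}^2 \right] = \sum (x_i - \bar{x})^2 \left[ \frac{\sigma^2}{\sum (x_i - \bar{x})^2} + \beta_1^2 \right] = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2.$$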

ANOVA TABLE
The breakdowns of the total sum of squares and its associated degrees of freedom are displayed in the form of an analysis of variance table (ANOVA table) for regression analysis as follows:

  Source of Variation   SS    df     MS    F Statistic   p-value
  Regression            SSR   1      MSR   MSR/MSE
  Error                 SSE   n-2    MSE
  Total                 SST   n-1

Recall: MSE, the error mean square, serves as an estimate of the constant variance $\sigma^2$ stipulated by the regression model.
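The sketch below fills in the ANOVA table for the birth weight data of Example #1. It stops at the F statistic and omits the p-value, since that would need an F-distribution routine, for example from a statistics package. `anova_table` is a hypothetical helper name.

```python
def anova_table(x, y):
    """Rows (source, SS, df, MS) and the F statistic MSR/MSE for simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sst - sse
    msr, mse = ssr / 1, sse / (n - 2)
    rows = [("Regression", ssr, 1, msr), ("Error", sse, n - 2, mse), ("Total", sst, n - 1, None)]
    return rows, msr / mse


x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]
rows, f_stat = anova_table(x, y)
print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}")
for source, ss, df, ms in rows:
    ms_text = f"{ms:10.2f}" if ms is not None else ""
    print(f"{source:<12}{ss:10.2f}{df:5d}{ms_text}")
print(f"F = MSR/MSE = {f_stat:.2f} on (1, {rows[1][2]}) df")
```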

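$$E(MSE) = \sigma^2, \qquad E(MSR) = \sigma^2 + \beta_1^2 \sum (x_i - \bar{x})^2.$$
Under the Null Hypothesis $H_0: \beta_1 = 0$, E(MSE) = E(MSR), so that F = MSR/MSE is expected to be near 1.0.
Theorem #6: Under $H_0$, F is distributed as $F(1, n-2)$, following a theorem by Cochran.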

THE F-TEST
The test statistic F for the above analysis of variance approach compares MSR and MSE; a value near 1 supports the null hypothesis of independence. In fact, $F = t^2$, where t is the test statistic for testing whether or not $\beta_1 = 0$. The F-test is therefore equivalent to the two-sided t-test when referred to the F-table in Appendix B (Table B.4) with (1, n-2) degrees of freedom.
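Because $MSR = b_1^2 \sum (x_i - \bar{x})^2$ and $s^2(b_1) = MSE/\sum (x_i - \bar{x})^2$, the identity $F = t^2$ is exact. The sketch below checks it on the Age and SBP data of Example #2, whose ANOVA F is about 6.07.

```python
import math

age = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
sbp = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]

n = len(age)
x_bar, y_bar = sum(age) / n, sum(sbp) / n
sxx = sum((x - x_bar) ** 2 for x in age)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, sbp)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(age, sbp)) / (n - 2)

msr = b1 ** 2 * sxx / 1             # SSR = b1^2 * sum (x - x_bar)^2, with 1 df
f_stat = msr / mse                  # ANOVA F statistic
t_stat = b1 / math.sqrt(mse / sxx)  # t = b1 / s(b1)
print(f"F = {f_stat:.3f}, t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f}")
```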

THE TEST FOR INDEPENDENCE
Null Hypothesis $H_0: \beta_1 = 0$. Two identical choices:
(1) "t" test at $n-2$ degrees of freedom: $t = b_1/s(b_1)$, which is identical to the test using "r": $t = r\sqrt{n-2}/\sqrt{1-r^2}$;
(2) "F" test at $(1, n-2)$ degrees of freedom: $F = MSR/MSE$.

COEFFICIENT OF DETERMINATION
We can express the coefficient of determination (the square of the coefficient of correlation r) as:
$$r^2 = \frac{SSR}{SST}.$$
That is, $r^2$ is the portion of the total variation attributable to regression. Regression improves the estimate of Y from $\bar{y}$ (without any information) to $\hat{Y}$ (with the information provided by knowing X), reducing the total variation by $(100 r^2)\%$.
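The sketch below checks on the birth weight data of Example #1 that SSR/SST equals the square of the sample correlation coefficient. It also shows the resulting percentage reduction in total variation. For simple linear regression, $SSR = S_{xy}^2/S_{xx}$.

```python
import math

x = [112, 111, 107, 119, 92, 80, 81, 84, 118, 106, 103, 94]
y = [63, 66, 72, 52, 75, 118, 120, 114, 42, 72, 90, 91]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)  # this is SST
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / math.sqrt(sxx * syy)            # sample correlation coefficient
ssr = sxy ** 2 / sxx                      # SSR = b1^2 * Sxx = Sxy^2 / Sxx
r2_from_anova = ssr / syy                 # SSR / SST
print(f"r = {r:.4f}, r^2 = {r ** 2:.5f}, SSR/SST = {r2_from_anova:.5f}")
print(f"Knowing X reduces the total variation by {100 * r2_from_anova:.1f}%")
```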

EXAMPLE #1: Birth Weight Data (x in oz, y in %; same data as above)
SUMMARY OUTPUT
Regression Statistics
  R Square        0.89546
  Observations    12
ANOVA
  Source        df    SS       MS       F       Significance F
  Regression     1    6508     6508     85.66   3.216E-06
  Residual      10    759.8    75.98
  Total         11    7268

EXAMPLE #2: AGE & SBP (same data as above)
SUMMARY OUTPUT
Regression Statistics
  R Square        0.3183
  Observations    15
ANOVA
  Source        df    SS       MS       F       Significance F
  Regression     1    1691     1691     6.07    0.02845
  Residual      13    3621     278.6
  Total         14    5312

EXAMPLE #3: Toluca Company Data (same data as above)
SUMMARY OUTPUT
Regression Statistics
  R Square        0.8215
  Observations    25
ANOVA
  Source        df    SS         MS         F        Significance F
  Regression     1    252,378    252,378    105.88   4.45E-10
  Residual      23    54,825     2,384
  Total         24    307,203

Normal Error Regression Model: $Y = \beta_0 + \beta_1 x + \varepsilon$, $\varepsilon \sim N(0, \sigma^2)$. The normal regression model assumes that the X values are known constants; we do not impose any kind of distribution on the X values.

CORRELATION MODEL
In many cases, this assumption is not true. For example, if we study the relationship between a person's height and weight, we take a sample of persons, and both measurements are random. Correlation data are often cross-sectional or observational. Rather than a regression model, one should consider a correlation model; the most widely used is the Bivariate Normal Distribution with density:
$$f(x, y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left( \frac{x-\mu_x}{\sigma_x} \right)^2 - 2\rho \left( \frac{x-\mu_x}{\sigma_x} \right) \left( \frac{y-\mu_y}{\sigma_y} \right) + \left( \frac{y-\mu_y}{\sigma_y} \right)^2 \right] \right\}$$
Here $\sigma_{xy} = Cov(X, Y) = E[(X - \mu_x)(Y - \mu_y)]$ is the covariance, and $\rho = \sigma_{xy}/(\sigma_x \sigma_y)$ is the coefficient of correlation between the two random variables X and Y. $\rho$ is estimated by the sample coefficient of correlation r.

The coefficient of correlation $\rho$ between the two random variables X and Y is estimated by the sample coefficient of correlation r, but the sampling distribution of r is far from normal. Confidence intervals for $\rho$ are therefore obtained by first applying Fisher's z transformation,
$$z = \frac{1}{2} \ln \frac{1+r}{1-r}.$$
The distribution of z is approximately normal if the sample size is not too small.
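The sketch below follows the Fisher z approach. It uses the standard large-sample result that z is approximately normal with mean $\frac{1}{2}\ln\frac{1+\rho}{1-\rho}$ and variance $1/(n-3)$. The interval is built on the z scale and transformed back with tanh, the inverse of $z = \operatorname{atanh}(r)$. The Age and SBP data of Example #2 serve as an illustration. `fisher_z_ci` is a hypothetical helper name, and `statistics.NormalDist` requires Python 3.8 or later.

```python
import math
from statistics import NormalDist


def fisher_z_ci(r, n, conf=0.95):
    """Approximate CI for rho via Fisher's z = atanh(r) = (1/2) ln[(1 + r)/(1 - r)].

    z is treated as normal with standard error 1/sqrt(n - 3); the limits are
    transformed back to the correlation scale with tanh.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + conf / 2)  # about 1.96 for 95%
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


age = [42, 46, 42, 71, 80, 74, 70, 80, 85, 72, 64, 81, 41, 61, 75]
sbp = [130, 115, 148, 100, 156, 162, 151, 156, 162, 158, 155, 160, 125, 150, 165]
n = len(age)
x_bar, y_bar = sum(age) / n, sum(sbp) / n
sxx = sum((x - x_bar) ** 2 for x in age)
syy = sum((y - y_bar) ** 2 for y in sbp)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, sbp))
r = sxy / math.sqrt(sxx * syy)

lo, hi = fisher_z_ci(r, n)
print(f"r = {r:.3f}, 95% CI for rho = ({lo:.3f}, {hi:.3f})")
```

The resulting interval is not symmetric around r, which reflects the skewed sampling distribution of r.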

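CONDITIONAL DISTRIBUTION
Theorem: The conditional distribution of Y for any given X = x is normal, with mean and standard deviation
$$E(Y \mid X = x) = \beta_0 + \beta_1 x, \quad \beta_1 = \rho \frac{\sigma_y}{\sigma_x}, \quad \beta_0 = \mu_y - \beta_1 \mu_x; \qquad \sigma_{y|x} = \sigma_y \sqrt{1 - \rho^2},$$
that is,
$$f(y \mid x) = \frac{1}{\sqrt{2\pi}\, \sigma_y \sqrt{1-\rho^2}} \exp\left\{ -\frac{1}{2} \left[ \frac{y - \beta_0 - \beta_1 x}{\sigma_y \sqrt{1-\rho^2}} \right]^2 \right\}.$$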
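The theorem can be illustrated by simulation. The sketch below draws pairs from a bivariate normal whose population parameters are hypothetical, chosen only for illustration. It then fits the least-squares line of Y on X and compares the fitted slope, intercept, and MSE with $\rho\sigma_y/\sigma_x$, $\mu_y - \beta_1\mu_x$, and $\sigma_y^2(1-\rho^2)$. The seed and sample size are arbitrary. Agreement is approximate, up to sampling error.

```python
import math
import random


def simulate_and_fit(mu_x, mu_y, sd_x, sd_y, rho, n, seed=7405):
    """Draw n pairs from a bivariate normal, regress Y on X, return (b0, b1, MSE)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xs.append(mu_x + sd_x * z1)
        # Y shares z1 with X, which makes corr(X, Y) = rho
        ys.append(mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2))
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    mse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return b0, b1, mse


# Hypothetical population parameters, for illustration only
mu_x, mu_y, sd_x, sd_y, rho = 65.0, 146.0, 14.0, 19.0, 0.56
beta1 = rho * sd_y / sd_x              # theoretical slope of E(Y | X = x)
beta0 = mu_y - beta1 * mu_x            # theoretical intercept
cond_var = sd_y ** 2 * (1 - rho ** 2)  # Var(Y | X) = (1 - rho^2) Var(Y)

b0, b1, mse = simulate_and_fit(mu_x, mu_y, sd_x, sd_y, rho, n=50_000)
print(f"slope:     theory {beta1:.3f}, fitted {b1:.3f}")
print(f"intercept: theory {beta0:.2f}, fitted {b0:.2f}")
print(f"Var(Y|X):  theory {cond_var:.1f}, MSE {mse:.1f}")
```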

Again, since $Var(Y \mid X) = (1 - \rho^2)\, Var(Y)$, $\rho^2$ is both a measure of linear association and a measure of the reduction in the variance of Y associated with knowledge of X. That is why we call $r^2$, an estimate of $\rho^2$, the coefficient of determination.

Readings & Exercises
Readings: A thorough reading of the text's sections 2.4-2.5 (pp. 5-6), 2.7 (pp. 63-7), and 2.11 (pp. 78-8) is highly recommended.
Exercises: The following exercises are good for practice, all from chapter 2 of the text: .3, .3, .4, .8, and .9.

Due as Homework
#9. Refer to the dataset "Cigarettes", with Y = Cotinine and X = CPD (cigarettes per day):
(a) Obtain the 95% confidence interval for the mean Cotinine level of subjects who consumed X = 30 cigarettes per day, and give your interpretation.
(b) Obtain the 95% confidence interval for the Cotinine level of a subject who consumed 30 cigarettes per day. Why is the result different from (a)?
(c) Plot the residuals against X. What would be your conclusion about their possible linear relationship? What would be the average residual?
(d) Set up the ANOVA table and test whether or not a linear association exists between Cotinine and CPD.
#9. Answer the 4 questions of Exercise 9. using the dataset "Vital Capacity", with X = Age and Y = 100(Vital Capacity); use X = 35 years for questions (a) and (b).