Applied Statistics II - Categorical Data Analysis
Data analysis using Genstat - Exercise 2 Logistic regression

Analysis 2.1 Logistic regression for a 2 x k table.

The table below shows the number of aphids alive and dead after spraying with four concentrations of solutions of sodium oleate. The questions are
(a) Are concentration and mortality independent?
(b) Is there a relationship between concentration and mortality?
(c) How well does the model in (b) fit?

             Concentration of sodium oleate (%)
             0.65    1.1     1.6     2.1
    Dead     55      62      100     72
    Alive    22      13      12      5

Read in the data into Genstat as follows:

    units [4]
    read conc_lev, conc, dead, alive
    1 0.65 55 22
    2 1.1 62 13
    3 1.6 100 12
    4 2.1 72 5 :
    groups [redefine=yes] conc_lev
    calc n=dead+alive
    calc prop=dead/n

For the logistic regression model of the number dead, specify
- that the distribution is the binomial
- that the logit link is being used
- that the total count is dead+alive=n

    model [distribution=binomial;link=logit] dead;nbinomial=n

Fitting the Independence model.

To answer (a), first fit the independence model using the statement below, which just fits a single constant

    fit

This gives the output

                  d.f.    deviance    mean deviance    deviance ratio
    Regression    0       0.00        *
    Residual      3       16.63       5.544
    Total         3       16.63       5.544
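As a cross-check on the deviance of 16.63 on 3 d.f. reported for the independence model, the likelihood-ratio chi-square G^2 = 2 * sum(O * log(O/E)) can be computed directly from the 2 x 4 table. The following is a minimal Python sketch, not Genstat code. The function and variable names are illustrative assumptions, and the closed-form tail probability used here is valid only for 3 degrees of freedom.

```python
import math

# Aphid counts at concentrations 0.65, 1.1, 1.6 and 2.1 (from the table above)
dead = [55, 62, 100, 72]
alive = [22, 13, 12, 5]


def g2_independence(dead, alive):
    """Likelihood-ratio chi-square for independence in a 2 x k table."""
    totals = [d + a for d, a in zip(dead, alive)]
    p = sum(dead) / sum(totals)  # pooled proportion dead (ML estimate under independence)
    g2 = 0.0
    for d, a, n in zip(dead, alive, totals):
        g2 += d * math.log(d / (n * p))        # observed dead vs expected dead
        g2 += a * math.log(a / (n * (1 - p)))  # observed alive vs expected alive
    return 2.0 * g2


def chi2_upper_tail_3df(x):
    """Upper tail probability of a chi-square variable with exactly 3 d.f."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


g2 = g2_independence(dead, alive)
p_value = chi2_upper_tail_3df(g2)
print(f"G2 = {g2:.2f} on 3 d.f., P = {p_value:.5f}")
```

The pooled proportion used for the expected counts is 289/341, the same value that the constant-only logistic model recovers through the inverse logit.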
In the independence-model output, the value 16.63 is the chi-square (G^2) value in a test for independence between rows and columns (perform this test using the chi-square procedure used in exercise session 1). It has 3 df and its P value is < 0.001, indicating that the independence model is not adequate. Only a single parameter, the constant, is fitted here, giving the model for the data as

$E(n_{i1}) = n_{i.}\,p$, $E(n_{i2}) = n_{i.}\,(1-p)$ and $\log\frac{p}{1-p} = \alpha$

    *** Estimates of parameters ***
                                                      antilog of
                estimate     s.e.       t(*)          estimate
    Constant    1.715        0.151      11.39         5.558

This shows that the ML estimate is $\hat{\alpha} = 1.715$, and the logit formula can be inverted to give the ML estimate of p

$\hat{p} = \frac{e^{1.715}}{1 + e^{1.715}} = 0.847$

This is the ML estimate of the proportion of dead aphids in the data. The total number of aphids in the trial is 341 with 289 dead, which gives a proportion of 0.847, the same as obtained above. Viewed as a simple binomial with 341 trials this would have been the ML estimate of p, so it is reassuring that the two agree.

Fitting the Saturated model.

    " fit the saturated model using the factor conc_lev"
    fit [fprob=yes] conc_lev

    ***** Regression Analysis *****
    *** Summary of analysis ***
                                          mean        deviance    approx
                  d.f.    deviance    deviance       ratio        chi pr
    Regression    3       16.63       5.544          5.54         <.001
    Residual      0       0.00        *
    Total         3       16.63       5.544

Here the value 16.63 in the last line is the chi-square (G^2) value in a test for independence between rows and columns, as before. This is then partitioned into a component due to the saturated model (which fits three parameters (4-1) in addition to a constant to represent the four probabilities, one for each level of the factor conc_lev) and the remainder unexplained by the model. For the saturated model all the variation is explained (chi-square 16.63 with 3 degrees of freedom) as it gives a perfect fit, and the remainder is zero with zero degrees of freedom. The model fitted is as follows

$E(n_{i1}) = n_{i.}\,p_i$, $E(n_{i2}) = n_{i.}\,(1-p_i)$
and

$\log\frac{p_1}{1-p_1} = \alpha$, $\log\frac{p_i}{1-p_i} = \alpha + \beta_i$ for $i > 1$

This gives 4 parameters $\alpha$, $\beta_2$, $\beta_3$ and $\beta_4$, where $\alpha$ is the logit of $p_1$ and $\beta_i$ is the difference between the logit of $p_i$ and the logit of $p_1$. The following table estimates these parameters.

    *** Estimates of parameters ***
                                                       antilog of
                  estimate     s.e.       t(*)         estimate
    Constant      0.916        0.252      3.63         2.500
    conc_lev 2    0.646        0.396      1.63         1.908
    conc_lev 3    1.204        0.396      3.04         3.333
    conc_lev 4    1.751        0.527      3.32         5.760

The final column above gives the antilog of the estimate. For the three parameters for conc_lev this gives the ML estimate of the OR between the appropriate level and the first level. Thus 3.333 is the estimate of the OR between level 1 and level 3. Check this with the 2 x 2 table formed by taking data just for levels 1 and 3 in the data. This subtable is

             Level 3    Level 1
             1.6        0.65
    Dead     100        55
    Alive    12         22

where the OR is estimated as (100*22)/(12*55) = 3.333. The following command predicts proportions from the model for each of the levels of conc_lev as

$\hat{p}_1 = \frac{e^{\hat{\alpha}}}{1 + e^{\hat{\alpha}}} = \frac{e^{0.916}}{1 + e^{0.916}} = 0.714$

$\hat{p}_2 = \frac{e^{\hat{\alpha} + \hat{\beta}_2}}{1 + e^{\hat{\alpha} + \hat{\beta}_2}} = \frac{e^{0.916 + 0.646}}{1 + e^{0.916 + 0.646}} = 0.827$

and similarly for levels 3 and 4.

    predict conc_lev

                Prediction    S.E.
    conc_lev
    1.00        0.7143        0.0515
    2.00        0.8267        0.0437
    3.00        0.8929        0.0292
    4.00        0.9351        0.0281

    * MESSAGE: S.E.s are approximate, since model is not linear.
    * MESSAGE: S.E.s are based on dispersion parameter with value 1

    " The following command prints the proportions calculated from the data for comparison with those predicted from the model"
    print conc_lev,prop

    conc_lev    prop
    1.000       0.7143
    2.000       0.8267
    3.000       0.8929
    4.000       0.9351

Note that for the saturated model the proportions predicted from the model are identical to those calculated from the data at each concentration level regarded as a simple binomial. As the saturated model fits a separate parameter for each group, this is what we would expect.

Fitting the Linear Logistic regression model.

This provides the answers to questions (b) and (c) at the start of this exercise.

    " fit a linear logistic regression logit(p) = a + b Conc"
    fit [fprob=yes] conc

                                          mean        deviance    approx
                  d.f.    deviance    deviance       ratio        chi pr
    Regression    1       16.55822    16.55822       16.56        <.001
    Residual      2       0.07459     0.03729
    Total         3       16.63281    5.54427

    * MESSAGE: ratios are based on dispersion parameter with value 1

Here the value 16.63 in the last line is the chi-square (G^2) value in a test for independence between rows and columns, as before. This is then partitioned into a component due to the linear logistic regression model (which fits one parameter in addition to a constant to represent the relationship between the four probabilities and the variate conc, which is the actual concentration of sodium oleate in the treatments) and the remainder unexplained by the model. The chi-square for the regression part is 16.558 with 1 df (P<0.001). The remainder is 0.075 with 2 df, which is not at all significant. This indicates that the regression model gives a good representation of the data. The model fitted is as follows

$E(n_{i1}) = n_{i.}\,p_i$, $E(n_{i2}) = n_{i.}\,(1-p_i)$ and $\log\frac{p_i}{1-p_i} = \alpha + \beta\,\mathrm{conc}_i$

This gives 2 parameters, $\alpha$ and $\beta$. The following table gives estimates of these parameters.

                                                     antilog of
                estimate     s.e.       t(*)         estimate
    Constant    0.152        0.399      0.38         1.164
    conc        1.226        0.315      3.89         3.406
The second entry in the final column of the table of estimates gives the antilog of the regression coefficient and estimates the OR due to increasing concentration level by one unit as 3.406. This means that the number of deaths per survivor is 3.4 times greater at concentration 1.7 than at concentration 0.7. The following commands give the predicted proportion of dead at the concentrations of sodium oleate nominated in the variate xconc (0.7, 1.3 and 2).

    variate [nvalues=3;values=.7,1.3,2] xconc
    predict conc; levels=xconc

            Prediction    S.E.
    conc
    0.70    0.7329        0.0418
    1.30    0.8513        0.0202
    2.00    0.9310        0.0195

    * MESSAGE: S.E.s are approximate, since model is not linear.
    * MESSAGE: S.E.s are based on dispersion parameter with value 1
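To see where the estimates, odds ratio, residual deviance and predicted proportions for the linear logistic model come from, the fit can be reproduced outside Genstat. The following is a minimal Python sketch using Newton-Raphson (Fisher scoring) on the binomial log-likelihood. It is not Genstat's own algorithm, all function names are illustrative assumptions, and the standard errors of the predictions use the delta method, which is assumed here to correspond to the approximate S.E.s that Genstat reports.

```python
import math

# Aphid data from the exercise
conc = [0.65, 1.1, 1.6, 2.1]
dead = [55, 62, 100, 72]
alive = [22, 13, 12, 5]
total = [d + a for d, a in zip(dead, alive)]


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def score_and_information(a, b):
    """Score vector and Fisher information for logit(p) = a + b*conc."""
    s0 = s1 = i00 = i01 = i11 = 0.0
    for x, y, n in zip(conc, dead, total):
        p = inv_logit(a + b * x)
        w = n * p * (1.0 - p)  # binomial variance, used as the weight
        s0 += y - n * p
        s1 += (y - n * p) * x
        i00 += w
        i01 += w * x
        i11 += w * x * x
    return (s0, s1), (i00, i01, i11)


def fit(iterations=25):
    """Newton-Raphson updates starting from a = b = 0."""
    a = b = 0.0
    for _ in range(iterations):
        (s0, s1), (i00, i01, i11) = score_and_information(a, b)
        det = i00 * i11 - i01 * i01
        a += (i11 * s0 - i01 * s1) / det
        b += (i00 * s1 - i01 * s0) / det
    _, (i00, i01, i11) = score_and_information(a, b)
    det = i00 * i11 - i01 * i01
    cov = (i11 / det, -i01 / det, i00 / det)  # var(a), cov(a, b), var(b)
    return a, b, cov


def residual_deviance(a, b):
    """Deviance of the fitted model against the saturated model."""
    dev = 0.0
    for x, y, n in zip(conc, dead, total):
        p = inv_logit(a + b * x)
        dev += y * math.log(y / (n * p))
        dev += (n - y) * math.log((n - y) / (n * (1.0 - p)))
    return 2.0 * dev


def predict(a, b, cov, x):
    """Predicted proportion at concentration x and its delta-method s.e."""
    p = inv_logit(a + b * x)
    var_eta = cov[0] + 2.0 * x * cov[1] + x * x * cov[2]
    return p, p * (1.0 - p) * math.sqrt(var_eta)


a_hat, b_hat, cov = fit()
se_a, se_b = math.sqrt(cov[0]), math.sqrt(cov[2])
odds_ratio = math.exp(b_hat)  # OR for a one-unit increase in concentration
deviance = residual_deviance(a_hat, b_hat)
predictions = {x: predict(a_hat, b_hat, cov, x) for x in (0.7, 1.3, 2.0)}

print(f"alpha = {a_hat:.3f} (s.e. {se_a:.3f}), beta = {b_hat:.3f} (s.e. {se_b:.3f})")
print(f"odds ratio per unit of concentration = {odds_ratio:.3f}")
print(f"residual deviance = {deviance:.5f} on 2 d.f.")
for x, (p, se) in predictions.items():
    print(f"conc {x:.2f}: predicted proportion {p:.4f}, s.e. {se:.4f}")
```

Because the odds ratio is the antilog of the slope, it is the same for any one-unit increase in concentration, for example from 0.7 to 1.7.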
ANSWER SHEET - TO BE SUBMITTED FOR GRADING

Name of student
Date

Analysis 2.2 A survey of businesses provided the data below on the salary and educational achievement of the chief executives.

                     Educational Level
    Salary           < Third    Third
    <35000           25         0
    35000 - 45000    30         29
    45000 - 55000    24         35
    55000 - 65000    9          36
    65000            3

(a) Is the proportion of respondents with third level education constant over salary classes? Complete the following table.

    Chi-square        Df        Significance

(b) Assign scores 3, 4, 5, 6 and 7 to the income categories and fit a linear logistic model to the proportion of chief executives with third level education as related to the score.

    Intercept and its SE
    Coefficient and its SE
    Significance of coefficient
    What is the model?

(c) Is there evidence of lack of fit with the model? Support your answer with a suitable test of significance.

    Test type
    Test value
    Degrees of freedom
    Significance
(d) Predict the proportion of chief executives with third level education for those having an income of 50000 per annum and compute an approximate 95% confidence interval for your answer.

    Proportion
    95% CI    Lower        Upper

(e) Estimate the OR associated with increasing income by 20000 and compute a 95% confidence interval for it.

    OR estimate
    95% CI    Lower        Upper