BULLETIN of the MALAYSIAN MATHEMATICAL SCIENCES SOCIETY
Bull. Malays. Math. Sci. Soc. (2) 27 (2004), 207-215

Significance Testing in Exact Logistic Multiple Regression

MEZBAHUR RAHMAN AND SHUVRO CHAKROBARTTY
Minnesota State University, Mankato, MN 56001, USA
e-mail: mezbahur.rahman@mnsu.edu and shuvro.chakrobartty@mnsu.edu

Abstract. Exact logistic regression is discussed for multiple regressors. The exact significance of a regressor is computed, which can be used in simplifying the model and/or in computing the significance of a variable or a set of variables in the model. Bilingual education data are analyzed using the procedure described in this paper.

Mathematics Subject Classification: 62J

1. Introduction

Following Cox ([1], Ch. 4), Tritchler [8] implemented an algorithm for the exact logistic regression analysis of a single regressor. Tritchler [8] used the Fourier transformation algorithm given by [7] to compute the p-value for testing the significance of the regression parameter. Mehta et al. [4] gave an efficient Monte Carlo method by networking the possible regressor values.

We assume that we have a set of independent variables (regressors) that are to be used for prediction in each of the following situations: (1) predicting whether a company's dealer will soon be mired in dire financial straits, (2) predicting if a person is likely to develop heart disease, (3) predicting whether a hospital patient will survive until being discharged, or (4) predicting whether a person has achieved a desirable level of competency in learning. In such scenarios, the response (dependent) variable is binary. Due to their wide range of applications, binary response models are studied explicitly. For the latest developments in the area, the reader is referred to [6], [2] and the references therein.

Let $X_1, X_2, \ldots, X_p$ be separate regressors and $Y$ be a response (dependent) variable. $Y$ can only take the values 1 for success and 0 for failure. A random sample of $n$ data points is taken from a phenomenon. A general binary model is assumed as

$$P(Y_i = 1) = \pi_i = E(Y_i \mid X_{i1}, X_{i2}, \ldots, X_{ip}), \quad i = 1, 2, \ldots, n, \qquad (1)$$

where $0 \le \pi_i \le 1$ and $P(Y_i = 0) = 1 - \pi_i$. We define the logistic model as
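$$\pi(x_i) = \frac{\exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)}{1 + \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)} = \frac{\exp(x_i \beta)}{1 + \exp(x_i \beta)}, \qquad (2)$$

where $\beta_0, \beta_1, \ldots, \beta_p$ are unknown constants and $x_i$ is the row vector $(1, x_{i1}, \ldots, x_{ip})$. Notice that there is no error term on the right side of (2) because the left side is a function of $E(Y_i \mid X_{i1}, X_{i2}, \ldots, X_{ip})$, instead of $Y_i$, which serves to remove the error term.

If $y_1, y_2, \ldots, y_n$ is an observed binary sequence of size $n$, then

$$P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_n = y_n \mid x_{11}, \ldots, x_{1p}, \ldots, x_{n1}, \ldots, x_{np}) = \frac{\exp\left(\beta_0 s + \sum_{j=1}^{p} \beta_j t_j\right)}{\prod_{i=1}^{n} \left(1 + \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)\right)}, \qquad (3)$$

where $s = \sum_{i=1}^{n} y_i$ and $t_j = \sum_{i=1}^{n} y_i x_{ij}$ for $j = 1, 2, \ldots, p$. Following Cox (1970), inference is based on the sufficient statistics $S = \sum_{i=1}^{n} Y_i$ and $T_j = \sum_{i=1}^{n} Y_i X_{ij}$ for $j = 1, 2, \ldots, p$, whose joint distribution is obtained by summing over all binary sequences generating each realization of $s, t_1, t_2, \ldots, t_p$. Thus

$$P(S = s, T_1 = t_1, T_2 = t_2, \ldots, T_p = t_p) = \frac{C(s, t_1, \ldots, t_p) \exp\left(\beta_0 s + \sum_{j=1}^{p} \beta_j t_j\right)}{\prod_{i=1}^{n} \left(1 + \exp\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}\right)\right)}, \qquad (4)$$

where the $C$'s are the numbers of distinct binary sequences yielding the values $s$ and $t_j$'s for the sufficient statistics. Exact inferences about the $\beta_j$'s may be based on the conditional distribution $P(T_1 = t_1, T_2 = t_2, \ldots, T_p = t_p \mid S = s)$. Because of properties of the exponential family of distributions, the critical region defined by the upper tail values of $t_j$ within the conditional reference set provides a uniformly most powerful unbiased test of $H_0: \beta_j = 0$ versus $H_a: \beta_j > 0$ (Lehmann 1959, p. 136). The conditional distribution used to test hypotheses concerning $\beta_j$ is

$$P(T_j = t_j \mid S = s) = \frac{C_j \exp(\beta_j t_j)}{\sum_{l} C_{jl} \exp(\beta_j t_{jl})}, \qquad (5)$$

where $l$ is an index ranging over all the values $t_{jl}$ taken by $T_j$, $C_{jl}$ is the number of binary sequences with $S = s$ and $T_j = t_{jl}$, and $C_j$ is the count for the observed value $t_j$.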
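For integer-valued regressors, the counts $C$ and the conditional distribution (5) can be obtained by a small dynamic program over the observations. The following is a minimal Python sketch under that assumption, not the authors' implementation; the function names (`count_sequences`, `conditional_pmf`, `upper_tail_p`) and the four-point toy data are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from math import exp


def count_sequences(x, s):
    """Return {t: C} where C is the number of binary sequences y with
    sum(y) == s and sum(y_i * x_i) == t (x must hold integers)."""
    # table[k][t] = number of ways to choose k observations whose x-values sum to t
    table = [defaultdict(int) for _ in range(s + 1)]
    table[0][0] = 1
    for xi in x:
        # iterate k downward so each observation is used at most once
        for k in range(s, 0, -1):
            for t, c in list(table[k - 1].items()):
                table[k][t + xi] += c
    return dict(table[s])


def conditional_pmf(counts, beta=0.0):
    """Distribution of T given S = s in the form of equation (5)."""
    weights = {t: c * exp(beta * t) for t, c in counts.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def upper_tail_p(counts, t_obs, beta=0.0):
    """P(T >= t_obs | S = s); beta = 0 corresponds to H0: beta_j = 0."""
    pmf = conditional_pmf(counts, beta)
    return sum(p for t, p in pmf.items() if t >= t_obs)


# Hypothetical toy data: four observations, one regressor.
x = [1, 2, 3, 4]
y = [0, 1, 0, 1]
s = sum(y)                                   # s = 2
t_obs = sum(yi * xi for yi, xi in zip(y, x))  # t = 6
counts = count_sequences(x, s)
print(round(upper_tail_p(counts, t_obs), 4))  # 0.3333
```

With two successes among four observations there are six equally likely sequences under $\beta_j = 0$; two of them give $t \ge 6$, hence the tail probability of one third. The dynamic program avoids listing all sequences, which matters once $n$ grows beyond a few dozen.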
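2. Tests of significance

In one-sided tests, such as $H_0: \beta_j = 0$ versus $H_a: \beta_j > 0$, the p-value can be computed as

$$P(T_j \ge t_j \mid S = s) = \frac{\sum_{l:\, t_{jl} \ge t_j} C_{jl} \exp(\beta_j t_{jl})}{\sum_{l} C_{jl} \exp(\beta_j t_{jl})}, \qquad (6)$$

which tests for significance of the particular variate in the model. Here $l$ indexes the values $t_{jl}$ taken by $T_j$ and $C_{jl}$ is the number of binary sequences with $S = s$ and $T_j = t_{jl}$. Due to the pattern of products of frequencies and exponentiation of a linear function, the p-values are easy to compute for the significance of a set of variates. For example, $H_0: \beta_j = \beta_{j0}$ and $\beta_k = \beta_{k0}$ versus $H_a$: at least one of the $\beta$'s is not equal to the specified value, can be tested by computing the p-value as

$$P(T_j \in R_j \text{ or } T_k \in R_k \mid S = s) = 1 - P(T_j \notin R_j \text{ and } T_k \notin R_k \mid S = s)$$
$$= 1 - P(T_j \notin R_j \mid S = s)\, P(T_k \notin R_k \mid S = s)$$
$$= 1 - \bigl(1 - P(T_j \ge t_j \mid S = s)\bigr)\bigl(1 - P(T_k \ge t_k \mid S = s)\bigr), \qquad (7)$$

where $R_j$ and $R_k$ are the respective critical regions. The computations of such p-values are demonstrated using a data set in Section 5. In equation (7), '$\ge$' will be replaced by '$\le$' in one or both places depending on which tail of the distributions of $T_j$ and $T_k$ the observed $t_j$ and/or $t_k$ lie in. Similar processes can be applied for testing a set of more than two parameters.

3. Inferences based on maximum likelihood

Maximum likelihood parameter estimation is studied extensively. Here we review the method as given in [5]. The log-likelihood function for the logit model (2) can be written as

$$\log L(\beta) = \sum_{i=1}^{n} \bigl\{ y_i \log \pi(X_i) + (1 - y_i) \log[1 - \pi(X_i)] \bigr\},$$

where $\beta$ is vector valued and $X_i$ is the row vector $(1, X_{i1}, \ldots, X_{ip})$. For models with more than one parameter, the first-order conditions require that we simultaneously solve the $p + 1$ equations

$$u_j = \frac{\partial \log L(\beta)}{\partial \beta_j} = 0, \quad j = 0, 1, \ldots, p,$$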
for which

$$h_{jk} = \frac{\partial^2 \log L(\beta)}{\partial \beta_j\, \partial \beta_k} < 0, \quad (j = 0, 1, \ldots, p), \ (k = 0, 1, \ldots, p).$$

For the first-order conditions for the logit model (2), the likelihood expressions are written as

$$\frac{\partial \log L(\beta)}{\partial \beta} = U(\beta) = \sum_{i=1}^{n} [y_i - \pi(X_i)]\, X_i^{\top}$$

and the negative of the second derivatives are

$$I(\beta) = -\frac{\partial^2 \log L(\beta)}{\partial \beta\, \partial \beta^{\top}} = \sum_{i=1}^{n} \pi(X_i)[1 - \pi(X_i)]\, X_i^{\top} X_i,$$

where $U(\beta)$ is a $(p + 1) \times 1$ vector and $I(\beta)$ is a $(p + 1) \times (p + 1)$ matrix. The matrix $I(\beta)$ plays a key role in the estimation procedure and yields the estimated variances and covariances of the estimates as a by-product. The asymptotic variances and covariances of the logit estimates are obtained by inverting the Hessian (or expected Hessian) matrix, that is, the information matrix $I(\beta)$. The Newton-Raphson iterative solution of a system of equations can then be used to obtain the solutions for the $\beta$'s. At the $t$th iteration, estimates are obtained as

$$\hat{\beta}^{(t)} = \hat{\beta}^{(t-1)} + \bigl[I(\hat{\beta}^{(t-1)})\bigr]^{-1} U(\hat{\beta}^{(t-1)}).$$

The least squares estimates of the $\beta$'s are often used as the initial estimates. The quantity $\hat{\beta}_k / \sqrt{D_{kk}}$ has an asymptotic normal distribution, where $D_{kk}$ is the $k$th diagonal element of $[I(\hat{\beta})]^{-1}$, and can be used in testing and in forming confidence intervals for a particular parameter $\beta_k$. Also, any subset of parameters can be tested using the following asymptotic $\chi^2$ statistic:

$$\chi^2 = 2[\log L_1 - \log L_0], \qquad (9)$$

which has an approximate $\chi^2$ distribution with $(p + 1) - (q + 1) = p - q$ degrees of freedom, where $q + 1$ is the number of unknown parameters in the model under the null hypothesis, $\log L_1$ is the maximized log likelihood under the full model, and $\log L_0$ is the maximized log likelihood under the null hypothesis.
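The Newton-Raphson iteration and the likelihood ratio statistic (9) can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: the helper names (`solve`, `probs`, `log_lik`, `fit_logit`), the six-point toy data, and the zero starting value (the text mentions least squares estimates as the usual choice) are all assumptions made for the example.

```python
from math import erfc, exp, log, sqrt


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def probs(beta, X):
    """Success probabilities pi(X_i) of the logit model for each row of X."""
    return [1.0 / (1.0 + exp(-sum(b * v for b, v in zip(beta, xi)))) for xi in X]


def log_lik(beta, X, y):
    return sum(yi * log(p) + (1 - yi) * log(1 - p) for yi, p in zip(y, probs(beta, X)))


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson: beta <- beta + I(beta)^{-1} U(beta), started at zero."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        pi = probs(beta, X)
        score = [sum((yi - p) * xi[j] for xi, yi, p in zip(X, y, pi)) for j in range(k)]
        info = [[sum(p * (1 - p) * xi[j] * xi[l] for xi, p in zip(X, pi))
                 for l in range(k)] for j in range(k)]
        step = solve(info, score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta


# Hypothetical toy data with overlapping outcomes, so the MLE is finite.
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
X_full = [[1.0, float(v)] for v in x]   # intercept and one regressor
X_null = [[1.0] for _ in x]             # intercept only (null model)
beta_full = fit_logit(X_full, y)
beta_null = fit_logit(X_null, y)
lr = 2.0 * (log_lik(beta_full, X_full, y) - log_lik(beta_null, X_null, y))
p_value = erfc(sqrt(max(lr, 0.0) / 2.0))  # chi-square tail, 1 degree of freedom
print(beta_full, lr, p_value)
```

The p-value line uses the identity that a $\chi^2$ variable with one degree of freedom is a squared standard normal, so its upper tail equals `erfc(sqrt(x / 2))`; for more degrees of freedom a different tail function would be needed.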
4. Motivation

Exact logistic regression is not a new phenomenon, but in the wake of computational convenience it is attracting more attention than before. Specialized software packages are not popular because they do not communicate effectively with their users. The algorithms used in such software and suggested by different authors are approximations, often through Fourier transformations. Here we give algorithms to compute exact p-values without any approximation. A data set is used to apply the exact logistic regression procedures. The exact p-values help to determine whether a particular factor is significant more accurately than the approximate t statistic for the maximum likelihood estimate. When particular factors are found to be significant, the maximum likelihood method should be used in estimating or predicting the success probabilities. Often, the method of discriminant analysis (see [2], Section 1.5) gives a higher rate of successful predictions but lacks properties like unbiasedness, consistency, and efficiency.

Table 1. Estimated Variance-Covariance Matrix (MLE)

Statistics        $\hat{\beta}_0$   $\hat{\beta}_1$   $\hat{\beta}_2$   $\hat{\beta}_3$
$\hat{\beta}_0$   .8144             .44               .1553             .18
$\hat{\beta}_1$   .44               .3                .7                .5
$\hat{\beta}_2$   .1553             .7                .395              .7
$\hat{\beta}_3$   .18               .5                .7                .4

5. Application

The population studied is thirteen schools of the Salinas City Elementary School district in the County of Monterey in California. Participating students were fifth and sixth graders of limited English proficiency. This study was undertaken in 1996 to take an in-depth look at the data gathered for the population of limited English proficient students and its role in redesignating students to fluent English proficiency status. Information on 257 participating students was recorded and is displayed in Table 3. In Table 3, E represents the English score, S represents the Spanish score, Y represents the number of years in the program and B represents the redesignation in the bilingual status. In the variable B, 1 indicates success of the participant in the program and 0 indicates failure. In the variables E and S, a higher score means higher proficiency. Redesignation in the program is done by the evaluator after considering the three variables E, S and Y, and the personal judgement of the evaluator. Here we model the data using logistic regression models. The goal is to give a rule by which one can be redesignated.
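The assumption is that the redesignation in the present data is done by an expert and that future redesignation is possible by an automated rule with the help of the present analysis. The logistic model given in (2) is estimated using the maximum likelihood method (MLE) given in Powers ([5], Section 3.3.3), as described in Section 3, as

$$\hat{\pi} = \frac{\exp(1.999 + .17\,\mathrm{Eng} - .1816\,\mathrm{Span} - .474\,\mathrm{Yrs})}{1 + \exp(1.999 + .17\,\mathrm{Eng} - .1816\,\mathrm{Span} - .474\,\mathrm{Yrs})}.$$

The variance-covariance matrix for the estimates of the parameters is computed using the delta method as $[I(\hat{\beta})]^{-1}$ in Section 3 and is displayed in Table 1. The maximum likelihood estimates and the corresponding p-values are displayed in Table 2. In Table 2, the likelihood ratio statistic as in (9) is represented as $\chi^2$, Est. represents the MLE estimates of the parameters, Z is the studentized statistic, z p is the p-value for the Z statistic, $\chi^2$ p is the p-value using the $\chi^2$ statistic, and Exact p is the p-value using the exact method as described in Section 2. In the Exact p column, $R_j$ denotes the tail region of $T_j$ at its observed value ($t_1 = 1$, $t_2 = 553$, $t_3 = 673$), as in equation (7).

Table 2. Estimates and p-values

Parameter         Est.     Z        z p     $\chi^2$   $\chi^2$ p   Exact p
$\hat{\beta}_0$   1.999    1.188    .9
$\hat{\beta}_1$   .17      .891     .99     .78        .996         $P(T_1 \in R_1 \mid S = 138) = .84$
$\hat{\beta}_2$   -.1816   -.9137   .369    .841       .3591        $P(T_2 \in R_2 \mid S = 138) = .3889$
$\hat{\beta}_3$   -.474    -.7314   .4645   .541       .46          $P(T_3 \in R_3 \mid S = 138) = .4177$

In testing $H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one of $\beta_1$ and $\beta_2$ is non-zero,

$$\text{p-value} = 1 - \bigl(1 - P(T_1 \in R_1 \mid S = 138)\bigr)\bigl(1 - P(T_2 \in R_2 \mid S = 138)\bigr) = .89.$$

Similarly, the p-values for testing significance of the other two subsets of parameters, $(\beta_1, \beta_3)$ and $(\beta_2, \beta_3)$, are .8954 and .644, respectively. An asymptotic method such as the likelihood ratio $\chi^2$ gives the following three p-values for the respective pairs of parameters: .6551, .786, and .4789. The p-value for testing significance of all three parameters $\beta_1$, $\beta_2$ and $\beta_3$ using the likelihood ratio test is .668, and using the exact method it is

$$1 - \bigl(1 - P(T_1 \in R_1 \mid S = 138)\bigr)\bigl(1 - P(T_2 \in R_2 \mid S = 138)\bigr)\bigl(1 - P(T_3 \in R_3 \mid S = 138)\bigr) = .9361.$$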
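The combination rule of equation (7) is a one-line product and can be checked directly against the reported figures. The sketch below is only an illustration; the function name `combined_exact_p` is hypothetical. It takes the two exact tail probabilities reported for $T_2$ and $T_3$ (.3889 and .4177) and recovers the .644 reported for the pair $(\beta_2, \beta_3)$.

```python
def combined_exact_p(tail_probs):
    """Return 1 - prod(1 - p_j), the form of equation (7) for any number of parameters."""
    keep = 1.0
    for p in tail_probs:
        keep *= 1.0 - p  # probability that T_j falls outside its critical region
    return 1.0 - keep


# Exact tail probabilities reported for T_2 and T_3 given S = 138.
p_23 = combined_exact_p([0.3889, 0.4177])
print(round(p_23, 3))  # 0.644
```

Because the rule multiplies the complements, the combined value is never smaller than the largest individual tail probability, which is why the exact p-values for sets of parameters in this section exceed those of the individual parameters.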
6. Conclusion

In Section 5, we notice that the exact p-values for the individual parameters are comparable with the asymptotic p-values except for $\beta_1$, where the difference is noticeable. But for a set of two or three parameters the differences in p-values are high. In asymptotic computations, the chi-square inferences are based on independence of the parameter estimates, but in reality they are not independent, as can be seen in Table 1. The differences in p-values are clear even though the data set is large and the covariances among the estimates are very small.

References

1. D.R. Cox, Analysis of Binary Data, London, UK: Methuen, 1970.
2. D.W. Hosmer Jr. and S. Lemeshow, Applied Logistic Regression, Second edition. New York: Wiley, 2000.
3. E.L. Lehmann, Testing Statistical Hypotheses, New York: John Wiley, 1959.
4. C.R. Mehta, N.R. Patel, and P. Senchaudhuri, Efficient Monte Carlo methods for conditional logistic regression, Journal of the American Statistical Association 95 (449) (2000), 99-108.
5. D.A. Powers, Statistical Methods for Categorical Data Analysis, London, UK: Academic Press, 2000.
6. T.P. Ryan, Modern Regression Methods, New York: Wiley, 1997.
7. R.C. Singleton, An algorithm for computing the mixed-radix fast Fourier transformation, IEEE Transactions on Audio and Electroacoustics 17 (1969), 93-103.
8. D. Tritchler, An algorithm for exact logistic regression, Journal of the American Statistical Association 79 (1984), 709-711.

Keywords: Binary response model; Discriminant analysis; Goodness-of-fit; Least square estimate; Maximum likelihood estimate.
Table 3. Bilingual Data

ID# E S Y B ID# E S Y B ID# E S Y B ID# E S Y B
1 1 4 7 1 66 1 4 7 1 131 3 6 4 196 1 5 3 4 4 1 67 3 4 5 1 13 3 5 5 1 197 3 4 3 3 4 6 1 68 3 4 1 133 1 4 7 198 1 3 1 4 1 4 5 1 69 1 5 5 134 1 4 7 1 199 3 5 1 3 6 7 1 4 7 135 1 4 7 1 3 4 3 6 1 3 5 1 71 1 4 4 1 136 1 5 4 1 1 1 4 7 1 3 7 1 7 3 4 3 137 3 4 7 1 1 4 7 8 1 3 7 73 4 4 1 138 1 3 7 1 3 1 4 3 9 1 4 7 1 74 1 5 5 139 3 4 7 1 4 1 4 7 1 1 5 7 75 1 4 7 14 1 4 7 1 5 1 4 5 11 1 5 7 76 1 4 4 1 141 1 4 7 1 6 1 4 5 1 1 1 4 7 1 77 3 4 1 14 1 4 7 1 7 3 4 5 13 1 5 7 1 78 4 5 143 1 5 7 1 8 1 4 7 14 1 4 7 79 1 5 5 144 1 4 7 9 3 4 6 15 1 4 6 1 8 1 4 7 145 1 4 7 1 1 1 4 5 16 1 4 7 81 1 4 5 1 146 1 4 7 11 1 5 7 17 1 4 7 8 3 4 147 1 4 7 1 1 1 4 7 18 1 4 7 83 4 6 148 1 3 7 1 13 1 4 6 19 3 3 7 84 1 4 7 149 1 4 5 14 1 4 6 1 4 7 85 4 15 1 4 7 1 15 1 5 5 1 1 4 6 1 86 4 4 3 151 3 4 1 16 4 6 1 3 6 1 87 1 4 6 1 15 1 3 7 1 17 1 4 7 3 1 4 7 88 4 5 1 153 1 4 5 1 18 1 4 6 4 1 3 4 1 89 1 3 6 1 154 1 4 6 1 19 1 4 4 5 1 4 6 9 1 4 5 1 155 3 6 1 4 5 6 1 4 7 1 91 1 4 5 1 156 4 6 1 1 3 4 1 7 1 4 7 9 4 4 6 157 1 4 6 1 4 3 1 8 1 4 7 93 1 5 6 1 158 1 4 6 3 4 4 1 9 1 4 7 94 1 5 5 159 3 4 6 1 4 1 5 5 1 3 1 4 7 95 1 4 6 1 16 5 7 5 1 3 31 1 4 1 96 1 4 6 1 161 3 5 4 6 1 4 1 3 1 4 7 97 1 4 7 16 1 4 5 1 7 1 4 1 33 1 4 6 98 1 4 6 1 163 1 4 5 8 3 4 1 34 1 4 7 1 99 1 4 5 1 164 1 5 4 9 1 4 1 35 1 4 7 1 1 1 5 5 1 165 1 5 3 3 4 36 1 4 6 11 5 3 1 166 1 5 1 31 1 4 1 37 1 4 7 1 1 5 4 1 1 167 1 4 6 1 3 1 4 38 1 4 7 13 1 4 4 1 168 1 5 7 33 3 3 39 1 4 5 1 14 1 3 4 1 169 5 6 34 5 4 5 4 1 4 7 15 3 4 1 17 5 5 6 35 1 4 5 41 3 4 5 1 16 4 4 1 171 1 5 4 36 1 3 5 1 4 1 4 6 1 17 1 3 6 1 17 1 5 1 37 3 3 1 43 1 4 7 18 3 5 1 173 1 5 5 38 3 4 44 1 4 7 1 19 1 3 7 174 1 5 6 39 5 7 1 45 1 4 7 11 1 5 5 1 175 1 5 4 4 3 5 4 1 46 1 3 7 111 1 4 6 1 176 3 4 6 41 1 4 5 1 47 1 5 7 11 1 4 6 1 177 1 4 7 4 1 5 4 48 1 4 6 1 113 1 4 178 4 5 7 1 43 1 5 1 49 1 4 5 1 114 1 3 179 1 5 7 44 1 4 5 1 5 1 3 6 1 115 1 4 6 1 18 5 6 45 1 4 6 1
Table 3 (cont'd)

51 1 3 5 1 116 1 4 4 1 181 5 5 1 46 3 6 1 5 1 5 6 1 117 1 3 7 18 1 5 5 1 47 4 6 1 53 1 5 6 1 118 1 3 7 183 1 5 48 1 4 6 1 54 1 4 6 1 119 1 5 7 1 184 1 5 5 1 49 1 4 6 1 55 1 5 6 1 1 3 4 1 185 1 5 7 5 3 4 6 1 56 3 3 6 1 11 1 4 1 186 1 5 4 51 3 4 4 1 57 1 4 1 1 1 3 4 4 1 187 1 4 7 5 1 4 1 58 1 4 5 13 1 3 188 4 5 3 1 53 1 4 1 59 1 3 1 1 14 1 4 1 189 1 5 3 54 1 4 7 1 6 1 4 7 15 1 4 1 19 3 6 55 4 4 1 61 1 4 6 1 16 3 4 191 1 4 1 56 3 4 5 1 6 3 4 5 1 17 1 3 4 1 19 1 3 1 57 1 4 5 1 63 1 4 7 18 3 4 1 193 3 64 1 4 7 1 19 3 4 4 1 194 4 3 65 1 4 6 1 13 1 5 5 1 195 1 4 1