Testing categorized bivariate normality with two-stage. polychoric correlation estimates

Testing ctegorized bivrite normlity with two-stge polychoric correltion estimtes Albert Mydeu-Olivres Dept. of Psychology University of Brcelon Address correspondence to: Albert Mydeu-Olivres. Fculty of Psychology. University of Brcelon. P. Vlle de Hebrón, 7. 08035 Brcelon (Spin). E-mil: mydeu@psi.ub.es. This reserch ws supported by grnt BSO2000-066 of the Spnish Ministry of Science nd Technology. I'm indebted to Krl Jöreskog for his encourgement to write this mnuscript nd for his comments to previous drft.

2 Abstrct We show tht when the thresholds nd the polychoric correltion re estimted in two stges, neither Person's X 2 nor the likelihood rtio G 2 goodness of fit test sttistics re symptoticlly chi-squre. We propose new sttistic tht is symptoticlly chi-squre in this sitution. However, our experience -illustrted with numericl exmple- nd smll simultion study using 5! 5 tble suggest tht in finite smples our new test is not ble to outperform X 2. Both mtch rther well reference chi-squre distribution even in smll smples. G 2, on the other hnd, ws found to be conservtive in smll smples. Thus, in prctice it is justified to use X 2 to test bivrite normlity when the prmeters re estimted in two stges, even in smll smples. Lrger smples my be needed to use G 2. We ttribute our empiricl results to the very smll numericl differences found when X 2 nd G 2 re computed using two or one-stge estimtes, where in the ltter cse it is theoreticlly justified to use these sttistics. Keywords: LISREL, LISCOMP, MPLUS, ctegoricl dt nlysis, pseudo-mximum likelihood estimtion, multinomil models, sprse tbles.

3. Introduction Consider bivrite stndrd norml density ctegorized ccording to (I - ) nd (J - ) thresholds, respectively. Within mximum likelihood frmework, Olsson (979) considered one nd two-stge pproches to estimte the q = (I - ) + (J - ) + prmeters of this model from the observed I! J contingency tble. In the one-stge pproch ll prmeters re estimted simultneously. In the two-stge pproch, the thresholds re estimted seprtely from ech univrite mrginl, then the polychoric correltion is estimted from the bivrite tble using the thresholds estimted in the first stge. Of course, fter estimting the prmeters one must test the model (Muthén, 993). To this end, one my employ Person's X 2 test sttistic or the likelihood rtio sttistic G 2. From stndrd theory (e.g, Agresti, 990), when the one-stge pproch is employed both sttistics re symptoticlly distributed s chi-squre with r = IJ - q - = IJ - I - J degrees of freedom. However, ever since Olsson (979) concluded tht very similr results re obtined with the computtionlly simpler two-stge pproch, this pproch hs become the stndrd procedure for estimting this model. As such, it is the procedure implemented in computer progrms such s LISCOMP (Muthén, 987), PRELIS/LISREL (Jöreskog & Sörbom, 993) nd MPLUS (Muthén & Muthén, 998). Yet, the distribution of X 2 nd G 2 is presently unknown when the two-stge pproch is employed. This pper fills in this gp using lrge smple theory. 2. Asymptotic distribution of prmeter estimtes Consider I! J contingency tble. Let! % "!,!,!,!,!,!,!,!,!,!,! J i ij I IJ denote its IJ vector of probbilities nd p its ssocited vector of smple proportions. $

4 Furthermore, let! % $ "!!!!! nd! % "!!!,,,, i I,,,, $!! denote the vectors of 2 j univrite mrginl probbilities, nd p nd p 2 the vectors of its ssocited smple proportions. We note tht, J illustrted here for I = 2 nd J = 3! % T!! % T!, () 2 2 & ' $ 0 3 3$( T % ) 0 $ $( ( )* 3 3 + " T % I I 2 3 3 where 3 nd 0 3 denote three-dimensionl column vectors of 's nd 0's respectively. Now, ssume the following model for, ij, ij " 2 i " 2 j %.. " z dz! (2) " " i- 2 j- * * 2 where " %-/," %-/," %/," %/ nd 0 n ( ) denotes n-vrite stndrd norml 2 2 0 0 I J density function. Thus, z * hs men zero nd correltion mtrix & $ '. In prticulr, )* $ ( + " 2 i * *! " " % z dz i. " i- In the sequel, let ", 2 " 2 j * *! % j. z dz (3) 2 2 " 2 j- " % " " $, where " nd " 2 denote the (I ) nd (J ) dimensionl vectors of thresholds implied by the model. Finlly, let % "", $ $. We now provide the symptotic distribution of the one nd two-stge prmeter estimtes using stndrd results for mximum likelihood estimtors for multinomil models. Agresti (990) is good source for the relevnt theory. Let! nd p be C-dimensionl vectors of multinomil probbilities nd smple

5 proportions, respectively, nd let N denote smple size. Consider prmetric structure for!,!($), with Jcobin mtrix! % %, nd suppose we estimte $ by mximizing $ $ C L" $ % N2 p ln! " $. (4) c% Then, under typicl regulrity conditions, it follows tht c c d N " p-! 3N " 0, & & % D -!!$ (5) where % Dig "! N N D, % " % denotes symptotic equlity. " $ - $ % N " -! B p (6) " " d - N," $ $ -$ 3 0 % D % - (7) - - - B % $ D % % $ D, 3 d denotes convergence in distribution, nd 2. One-stge estimtion D d, Akin to (5) we write, N " -! 3N " & % Dig! " p 0, where & % D -!! $ nd. Then, when ll the prmeters re estimted simultneously by I J mximizing L "",", % N ln ",, i j 22p " " ij ij i j $! $, it follows from (7) i% j% N " " d - N," $ - 3 0 % D % - (8)! & ' )!! % % %) $ )* " $ $ ( + where nd ll necessry derivtives cn be found in Olsson (979). 2.2 Two-stge estimtion Consider now the following sequentil estimtor for (Olsson, 979):

6 First stge: Estimte the thresholds for ech vrible seprtely by mximizing "" ln "" I J % 2! L i i i "" N p ln 2 2 "" j j j i% j% L N p Second stge: Estimte the polychoric correltion by mximizing % 2! (9) L " $," N p ln! " $," (0) I J % 22 i% j% We shll now provide n lterntive derivtion of Olsson's results for this estimtor closely following Jöreskog's (994). We first notice tht " nd " re mximum likelihood estimtes, s (9) is the kernel of the log-likelihood function for estimting the thresholds from the univrite mrginls of the contingency tble. Similrly, (0) is the kernel of the log-likelihood function for estimting the polychoric correltion from the bivrite contingency tble given the estimted thresholds. Tht is, $ is pseudo-mximum likelihood estimte in the terminology of Gong nd Smniego (98). To obtin the symptotic distribution of the two-stge estimtes we first pply (6) to the first stge estimtes to obtin ij ij i j N "" -" % B N " p -! N "" -" % N " -! 2 2 2 2 B p () where " B % % $ D - % - % $ D -, % Dig "!! D, % % " $, nd so on. These derivtives cn lso be found in Olsson (979). Now, letting B & B T ' ( %, by () nd () we get ) B T ( * 2+ N "" -"% N " -! B p. () Similrly, direct ppliction of (6) to the second stge estimtes yields - % % - - % $ % $ 22 22 22 22 where " " N " $ - $ %' N p -! " $, " (3) 22! B D D, nd % %. Note tht B 22 nd % 22 re row nd $ 22

7 column vector, respectively, despite the nottion. Now, we need the symptotic distribution " of N p -! " $, " to proceed. In Appendix, we show tht N " p -! " $, " %" I-% B N " p -! (4) 2! where % %. Putting together (3) nd (4) we obtin 2 " $ N " - % B " I-% B N " p -! Finlly, putting together () nd (5) we obtin $ $. (5) 22 2 N " -% G N " p -! & B ' ( G % ) B I- B (. (6) " % )* 22 2 + nd since s shown in the Appendix, G! % 0, (7) N " d N ", $ - 3 0GD G (8) where G nd D re to be evluted t the true popultion vlues. 3. Goodness of fit testing We wish to investigte the symptotic distribution of Person's X 2 sttistic, X 2 I J % N22 " p -! " ij ij! " i% j% ij 2 nd the likelihood rtio sttistic I J p 2 ij G % N22 p ln % 2L " ij! i% j% ij, where by convention when p ij = 0, p ij pij ln 0! " %. ij From stndrd theory (e.g., Agresti, 990), when the model prmeters re estimted d 2 2 2 simultneously, G % X 3%. When they re estimted in two stges, it lso holds tht - - G 2 2 IJ I J % X (see Agresti, 990: p. 434). Thus, we only consider the symptotic distribution of

8 X 2. To do so, we first consider the symptotic distribution of the unstndrdized residuls N e :% N " p -! " when re two- stge prmeter estimtes. In the Appendix it is shown tht " " Ne% I-K N p -! (9)! $ where % % %"% % 2 22 nd K % % G. Thus, by (9) nd (7), d N e3 N " 0,( ( %" I-K D " I-K $ -!! $. (20) Now, we note tht X 2 cn be written s X % NeD $ e. Using stndrd results 2 - (Agresti, 990: p. 432), X NeD e NeD e. (2) 2 - - % $ % $ A necessry nd sufficient condition for (2) to be symptoticlly chi-squre distribution is (e.g., Schott, 997: Theorem 9.0) ( D ( D ( % ( D ( (22) - - - In the Appendix we show tht - " 2 - ( " " " ( " " D % I-K I-JK I-J -C4 D % I-K I-J -C (23) where J % D K$ D nd - C %!! $ D. Thus, (22) is not stisfied. Neither X 2 nor G 2 re - symptoticlly chi-squred. Rther, these sttistics converge in distribution to mixture of r = IJ - I - J independent chi-squre vribles with one degree of freedom (Box, 954: Theorem 2.). Nevertheless, following Moore (977) it is esy to construct n lterntive qudrtic form in the residuls (20) tht is symptoticlly chi-squre. Consider ( -, mtrix tht converges in probbility to ( - -, generlized inverse of ( stisfying (( ( % (. Then, it redily follows tht

9 d W % N e$ ( - e 3% (24) 2 r where W is Wld sttistic. To consistently estimte ( we evlute ll derivtive mtrices in % nd G t the two-stge estimtes nd evlute ll probbilities t the estimted vlues. Then, ( - cn be redily obtined from ( by numericl methods. In closing this section, we note tht with one-stge prmeter estimtes X 2 = W s D is generlized inverse of the symptotic covrince mtrix of the unstndrdized - " residuls N e :% N p -! ". Next, we present numericl exmple nd very smll simultion study to illustrte the smll smple performnce of X 2, G 2, nd W. 4. Numericl Results 4. Soft drink dt Agresti (992) sked 6 respondents to compre the tste of Coke, Clssic Coke nd Pepsi using five point preference scle in pired comprison design {Coke vs. Clssic Coke, Coke vs. Pepsi, Clssic Coke vs. Pepsi}. The ctegories were {"Strong preference for i", "Mild preference for i", "Indiference", "Mild preference for i ", nd "Strong preference for i ",}. For ech pir of vribles, we shll test the ssumption tht the observed 5! 5 tble rises by ctegorizing stndrd bivrite norml density. Tht is, we re interested in testing substntive hypothesis of normlly distributed continuous preferences for the soft drinks in the popultion. In Tble we provide the thresholds nd polychoric correltion for ech pir of vribles estimted in two-stges, the symptotic stndrd errors of these prmeters, nd the goodness of fit results. The stndrd errors for the prmeter estimtes were obtined s the squre root of the digonl of (8) which ws consistently estimted by evluting ll derivtive mtrices nd probbilities t the estimted prmeter vlues.

0 ------------------------------------------- Insert Tbles nd 2 bout here ------------------------------------------- As cn be seen in this tble for ll three bivrite tbles the test sttistics suggest tht the ssumption of ctegorized bivrite normlity is resonble. We lso observe tht the W nd X 2 vlues obtined re extremely close. It is our experience tht this is lwys the cse. Thus, lthough X 2 is not symptoticlly chi-squred when the model prmeters re estimted in two stges, its finite smple distribution is so close to tht of the symptoticlly correct W sttistic tht it my be used in its plce. We believe this is becuse the expected probbilities obtined using the two-stge estimtes re very close to those obtined when ll model prmeters re jointly estimted. To illustrte this point, in Tble 2 we provide the one-stge estimtes for these three bivrite tbles, long with goodness of fit sttistics. We note tht when estimting the prmeters jointly, the thresholds estimted from different bivrite tbles need not be the sme cross tbles. We see in this tble tht the prmeter estimtes nd their stndrd errors re very similr to those obtined using the two stge pproch. Furthermore, the X 2 vlues re lso essentilly identicl to those obtined using the two stge pproch. We lso note in Tbles nd 2 tht the estimted G 2 sttistics re lrger thn thew nd X 2 sttistics. This is becuse we purposely chose numericl exmple with very smll smple size reltive to the size of the tble to highlight the differences between the sttistics. The estimted G 2 sttistics re lrger becuse there re some empty cells in the observed bivrite tble nd these re not included in the computtion of G 2. To investigte more systemticlly this effect we performed simultion study using " %"-, -0.5, 0.5, $, " %"-, -0.5, 0.5, $ nd 5 = 0.3. The results for N = 50, N = 00, nd N = 000 cross 2 000 replictions re presented in Tble 3.

------------------------------------------- Insert Tble 3 bout here ------------------------------------------- As cn be seen in this tble, G 2 yields conservtive results when N = 50 nd N = 00, lthough its behvior in the criticl region {% to 5%} is cceptble when N = 00. We lso see tht the differences between the W nd X 2 sttistics re negligible, so tht the computtionlly more complex W sttistic is not relly needed. The X 2 sttistic hs n cceptble behvior even when N = 50 in 5! 5 contingency tbles. This is remrkble. 5. Discussion The purpose of this reserch ws to investigte whether it ws theoreticlly justified to use X 2 nd G 2 to test ctegorized bivrite normlity when the model prmeters re estimted in two stges. We hve found tht it is not theoreticlly justified. Consequently, we hve proposed n lterntive test, W, tht is symptoticlly chi-squred when two-stge estimtion is employed. However, our experience -illustrted with our smll numericl exmple- nd our very smll simultion study suggest tht there re very smll numericl differences when X 2 nd G 2 re computed using one nd two-stge estimtes. In the former cse it is theoreticlly justified their use. Thus, in prctice it seems justified to use X 2 nd G 2 with two stge estimtes. It is not the purpose of the present reserch to investigte the reltive performnce of X 2 nd G 2 cross vriety of smple sizes nd true prmeter vlues. However, our limited simultion results suggest tht when there re empty cells G 2 is too conservtive nd tht X 2 is to be preferred to G 2. We hve lso found tht W does not outperform X 2 in prctice, where the ltter is to be preferred on computtionl grounds.

References Agresti, A. (990). Ctegoricl dt nlysis. New York: Wiley. Agresti, A. (992). Anlysis of ordinl pired comprison dt. Applied Sttistics, 4, 287-297. Box, G.E.P. (954). Some theorems on qudrtic forms pplied in the study of nlysis of vrince problems: I. Effect of inequlity of vrince in the one-wy clssifiction. Annls of Mthemticl Sttistics, 6, 769-77. Gong, G., & Smniego, F.J. (98). Pseudo mximum likelihood estimtion: Theory nd pplictions. Annls of Sttistics, 9, 86-869. Jöreskog, K.G. (994). On the estimtion of polychoric correltions nd their symptotic covrince mtrix. Psychometrik, 59, 38-390. Jöreskog, K.G. & Sörbom, D. (993). LISREL 8. User's reference guide. Chicgo, IL: Scientific Softwre. Moore, D.S. (977). Generlized inverses, Wld's method, nd the construction of chisqured tests of fit. Journl of the Americn Sttisticl Assocition, 72, 3-37. Muthén, B. (987). LISCOMP: Anlysis of liner structurl equtions using comprehensive mesurement model. Mooresville, IN: Scientific Softwre. Muthén, B. (993). Goodness of fit with ctegoricl nd other non norml vribles. In K.A. Bollen & J.S. Long [Eds.] Testing structurl eqution models (pp. 205-234). Newbury Prk, CA: Sge. Muthén, L. & Muthén, B. (998). Mplus. Los Angeles, CA: Muthén & Muthén. Olsson, U. (979). Mximum likelihood estimtion of the polychoric correltion coefficient. Psychometrik, 44, 443-460. Schott, J.R. (997). Mtrix nlysis for sttistics. New York: Wiley.

3 TABLE Two stge prmeter estimtes, estimted stndrd errors, nd goodness of fit tests for Agrestis soft drink dt Thresholds 2 3 4 vr. -0.796 (0.80) 0.062 (0.6) 0.539 (0.69).50 (0.248) vr. 2-0.94 (0.87) -0.062 (0.6) 0.446 (0.66).202 (0.2) vr. 3 -.202 (0.2) -0.492 (0.68) 0.03 (0.6) 0.853 (0.84) Correltions nd Test Sttistics Vrs. Corr. W p-vlue G 2 p-vlue X 2 p-vlue (2,) 0.03 (0.40) 6.478 0.35 2.286 0.8 6.478 0.35 (3,) -0.347 (0.9) 4.352 0.499 8.484 0.238 4.358 0.499 (3,2) 0.005 (0.4) 5.898 0.389 8.569 0.234 5.898 0.389 Notes: N = 6; stndrd errors in prentheses; 5 d.f.

4 TABLE 2 Joint prmeter estimtes, estimted stndrd errors nd goodness of fit tests for Agrestis soft drink dt Thresholds 2 3 4 vr. 2-0.95 (0.87) -0.063 (0.6) 0.445 (0.66).202 (0.2) vr. -0.797 (0.80) 0.06 (0.6) 0.539 (0.6).5 (0.249) vr. 3 -.20 (0.2) -0.484 (0.67) 0.04 (0.60) 0.849 (0.84) vr. -0.800 (0.80) 0.064 (0.60) 0.538 (0.60).50 (0.250) vr. 3 -.202 (0.2) -0.492 (0.68) 0.03 (0.6) 0.853 (0.84) vr. 2-0.94 (0.87) -0.062 (0.6) 0.446 (0.66).202 (0.2) Correltions nd Test Sttistics Vrs. Corr. G 2 p-vlue X 2 p-vlue (2,) 0.03 (0.44) 2.29 0.8 6.476 0.35 (3,) -0.347 (0.) 8.48 0.238 4.37 0.498 (3,2) 0.005 (0.52) 8.57 0.234 5.95 0.388 Notes: N = 6; stndrd errors in prentheses; 5 d.f.

5 TABLE 3 Simultion results Nominl rtes N Stt. Men Vr. % 5% 0% 20% 30% 40% 50% 60% 70% 80% 90% W 5.33 27.37.2 4.6 9.0 8.8 3.53 42.5 52.3 62.4 73. 84.9 94.6 50 G 2 7.55 29.27.8 9.4 7.6 34.5 47.8 59.3 68.9 78.6 87.6 94. 97.8 X 2 5.37 27.64.4 4.6 9.2 9.0 32.0 43.0 52.6 62.8 73.3 84.8 94.6 W 4.90 24.59 0.2 3. 8.6 8.4 28.6 38.8 5.5 63.2 7.7 8.6 9.9 00 G 2 6.75 30.2.2 7.8 4.9 28.9 43.3 56.0 65.6 72.2 80.8 89. 94.5 X 2 4.93 24.69 0.2 3.2 8.7 8.5 28.8 38.9 52.0 63.4 7.9 8.7 9.9 W 4.65 28.32 0.8 4.3 9. 8.0 27.4 37.5 49.2 56.9 67.2 79.4 88.6 000 G 2 4.8 29.4.0 4.8 9.7 8.6 28.7 38.9 49.6 57.3 68.2 79.8 89.2 X 2 4.67 28.42 0.8 4.5 9.2 8.0 27.6 37.7 48.3 56.9 67.3 79.5 88.8 Notes: 000 replictions; 5 d.f.; " %"-, -0.5, 0.5, $, " %"-, -0.5, 0.5, $, 5 = 0.3. 2

6 Appendix: Proofs of key results Proof of Eqution (4): A first order Tylor expnsion of! " $, " round " = " 0 yields! " $," %! " $," 6% "" -", 2! where % %. Thus, N " 2 "! $, " -! %% "" - "%% B N 2 2 " p -!, where " $ the lst symptotic equlity follows from (). Now, N " p -! " $," % N " p -! - N! " $," -! % I-% B N p -! " " " 2 Proof of Eqution (7): B T! % 0 becuse - - % $ D T! % % $ D! % 0. B T! % 0 becuse 2 - - % $ D T! % % $ D! % 0. Thus, B! % 0. 2 2 2 2 Also, B! % 0 becuse 22 % $ D! % 0, so the proof is complete - 22 Proof of Eqution (9): A first order Tylor expnsion of! " round = 0 yields! where % % $! " %! " 6% " -,. Thus, N "! -! %%" -%% N " -! symptotic equlity follows from (6). Now, G p, where the lst " p -! % " p -! - "! -! %" I-% G " p -! N N N N

7 Proof of Eqution (23): We write ( D - %" I-K" I-J - C % A-C where K % % G, J % D G$ $ D nd % - C D. Then, " 2 %!! $ - ( D - % A -AC-CA6C. Using the stndrd equlities, 2 2 D! % %$ % 0 - $D $ % % 0 $ (25) -! % $ nd (7) we first notice tht JC % CJ % 0 KC % CK % 0 (26) Thus, AC = C nd CA = C. Furthermore, using (25) nd $! % ( sclr), C 2 = C. ( D - % A -C, but noting tht Thus, " 2 2 J 2 = J K 2 = K (27) we find tht 2 2 %"" - " - %" - " 6 " - A I K I J I K I JK I J. Therefore, " 2 ( D 4 ( D. - -