General I x J Contingency Tables

We now generalize our previous results from the prospective, retrospective and cross-sectional studies and the Poisson sampling case to $I \times J$ contingency tables. For such tables, the test of hypothesis for

- a prospective study is $H_0: \pi_{j|1} = \pi_{j|2} = \cdots = \pi_{j|I}$ for all $j$
- a retrospective study is $H_0: P(X=i \mid Y=1) = P(X=i \mid Y=2) = \cdots = P(X=i \mid Y=J)$ for all $i$
- the cross-sectional study, which tests independence, is $H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for $i = 1, \ldots, I$ and $j = 1, \ldots, J$

Example: As part of the 1991 General Social Survey (GSS), a sample of 980 subjects were classified according to gender and political party identification, with the following results:

                          PARTY IDENTIFICATION
          Independent       Democrat          Republican        Total
Female    $n_{11} = 73$     $n_{12} = 279$    $n_{13} = 225$    $n_{1+} = 577$
Male      $n_{21} = 47$     $n_{22} = 165$    $n_{23} = 191$    $n_{2+} = 403$
Total     $n_{+1} = 120$    $n_{+2} = 444$    $n_{+3} = 416$    $n = 980$ (fixed)

Are the variables Gender and Party Identification independent? Alternatively, we might ask: does gender have an effect on political party identification?

Let $\pi_{ij}$ = probability that an individual is of gender $i$ ($i = 1, 2$) and identifies with party $j$ ($j = 1, 2, 3$).

$H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ for $i = 1, 2$ and $j = 1, 2, 3$

For the case of $2 \times 2$ contingency tables, the expected frequencies were given by
$$m_{ij} = n\pi_{ij} = n\pi_{i+}\pi_{+j}$$

and we used the $m_{ij}$ (estimated under $H_0$ by $n_{i+}n_{+j}/n$) to test the null hypothesis above using either Pearson's

$$X^2 = \sum_{i,j} \frac{(n_{ij} - m_{ij})^2}{m_{ij}}$$

or the log-likelihood ratio statistic

$$G^2 = 2\sum_{i,j} n_{ij} \log\frac{n_{ij}}{m_{ij}}$$

with both having $df = (I-1)(J-1) = (2-1)(2-1) = 1$. A SAS program follows:

SAS PROGRAM (data internal)

    /* The first variable name was lost in the source listing; GENDER is an assumed name. */
    data politics;
      input GENDER $ PARTYID $ count;
      cards;
    female indpdt 73
    female democrat 279
    female republic 225
    male indpdt 47
    male democrat 165
    male republic 191
    ;
    proc freq order=data;
      tables GENDER*PARTYID / chisq expected cellchi2;
      weight count;
    run;

The values $X^2 = 7.01$ (p-value 0.030) and $G^2 = 7.00$ (p-value 0.030), both approximately chi-square with $df = (2-1)(3-1) = 2$, indicate strong evidence that gender and party identification are not independent.

Now consider ways to investigate the association between two variables in a general $I \times J$ table.
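The two test statistics can also be checked by hand outside SAS. The following is a minimal Python sketch, not part of the original notes: the function names `independence_tests` and `chi2_pvalue_df2` are hypothetical, and the p-value helper relies on the fact that the chi-square upper tail with 2 degrees of freedom equals $e^{-x/2}$, so it applies only to this $2 \times 3$ example.

```python
import math

# Observed counts from the 1991 GSS example in the text.
# Rows: Female, Male. Columns: Independent, Democrat, Republican.
observed = [[73, 279, 225],
            [47, 165, 191]]


def independence_tests(table):
    """Return (X2, G2, df, expected) for an I x J table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Estimated expected frequencies under independence: n_i+ * n_+j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = 0.0
    g2 = 0.0
    for obs_row, exp_row in zip(table, expected):
        for n_ij, m_ij in zip(obs_row, exp_row):
            x2 += (n_ij - m_ij) ** 2 / m_ij
            if n_ij > 0:  # a zero cell contributes nothing to G2
                g2 += 2.0 * n_ij * math.log(n_ij / m_ij)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, g2, df, expected


def chi2_pvalue_df2(stat):
    """Upper-tail chi-square probability; valid only when df = 2."""
    return math.exp(-stat / 2.0)


x2, g2, df, expected = independence_tests(observed)
print(f"X2 = {x2:.2f}, p-value = {chi2_pvalue_df2(x2):.4f}")
print(f"G2 = {g2:.2f}, p-value = {chi2_pvalue_df2(g2):.4f}")
print("df =", df)
```

For tables with other degrees of freedom, a general chi-square tail function from a statistics library would be needed in place of `chi2_pvalue_df2`.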
1. Standardized Residuals

Standardized residuals under $H_0$ are given by

$$e_{ij} = \frac{n_{ij} - m_{ij}}{\sqrt{m_{ij}}}.$$

These standardized residuals can be compared to standard normal percentiles (such as $\pm 1.96$). Since the asymptotic variance of $e_{ij}$ is less than 1, they provide conservative indications of cells that have a lack of fit under the null hypothesis.

For example, we obtain the following as the standardized residuals (computed under the assumption of $H_0$) for our data set:

                       PARTY IDENTIFICATION
          Independent         Democrat            Republican
Female    $e_{11} = 0.28$     $e_{12} = 1.09$     $e_{13} = -1.27$
Male      $e_{21} = -0.34$    $e_{22} = -1.30$    $e_{23} = 1.52$

These standardized residuals do not appear unusual. (Note: $e_{23} = 1.52$ appears to indicate that the number of males who identify as Republican is higher than what would be expected if gender and party identification were independent, but this standardized residual is still not very large.)

2. Adjusted Residuals

Consider instead the residual

$$n_{ij} - m_{ij} = n(p_{ij} - \hat{\pi}_{ij}) = n(p_{ij} - p_{i+}p_{+j}) \quad (\text{since } \hat{\pi}_{ij} = p_{i+}p_{+j} \text{ under } H_0)$$

$$= n\left[p_{ij} - (p_{i1} + p_{i2} + \cdots + p_{iJ})(p_{1j} + p_{2j} + \cdots + p_{Ij})\right].$$

To get the asymptotic variance of $n_{ij} - m_{ij}$, we resort to using the delta method. Set

$$g(p) = \frac{n_{ij} - m_{ij}}{n} = p_{ij} - (p_{i1} + p_{i2} + \cdots + p_{iJ})(p_{1j} + p_{2j} + \cdots + p_{Ij})$$

and

$$g(\pi) = \pi_{ij} - (\pi_{i1} + \pi_{i2} + \cdots + \pi_{iJ})(\pi_{1j} + \pi_{2j} + \cdots + \pi_{Ij}).$$
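Then the asymptotic variance of $\sqrt{n}\,[g(p) - g(\pi)]$ is

$$\sum_{a,b} \pi_{ab}\phi_{ab}^2 - \Big(\sum_{a,b} \pi_{ab}\phi_{ab}\Big)^2, \quad \text{where } \phi_{ab} = \frac{\partial g}{\partial \pi_{ab}}.$$

For the fixed cell $(i, j)$, the partial derivatives are $\phi_{ij} = 1 - \pi_{i+} - \pi_{+j}$, $\phi_{ib} = -\pi_{+j}$ for $b \neq j$, $\phi_{aj} = -\pi_{i+}$ for $a \neq i$, and $\phi_{ab} = 0$ otherwise.

Under $H_0: \pi_{ij} = \pi_{i+}\pi_{+j}$ we can simplify the expression:

$$\sum_{a,b} \pi_{ab}\phi_{ab} = -\pi_{i+}\pi_{+j}, \qquad \sum_{a,b} \pi_{ab}\phi_{ab}^2 = \pi_{i+}\pi_{+j}\,(1 - \pi_{i+} - \pi_{+j} + 2\pi_{i+}\pi_{+j}).$$

Using $\pi_{ij} = \pi_{i+}\pi_{+j}$, the asymptotic variance of $\sqrt{n}\,[g(p) - g(\pi)]$ becomes

$$\pi_{i+}\pi_{+j}(1 - \pi_{i+})(1 - \pi_{+j})$$

and thus the asymptotic variance of $n\,g(p) = n_{ij} - m_{ij}$ is

$$n\,\pi_{i+}\pi_{+j}(1 - \pi_{i+})(1 - \pi_{+j}),$$

which can be estimated by

$$n\,p_{i+}p_{+j}(1 - p_{i+})(1 - p_{+j}) = m_{ij}(1 - p_{i+})(1 - p_{+j}),$$

where $\hat{\pi}_{i+} = p_{i+} = n_{i+}/n$ and $\hat{\pi}_{+j} = p_{+j} = n_{+j}/n$.

We define the adjusted residuals under $H_0$ as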
$$r_{ij} = \frac{n_{ij} - m_{ij}}{\sqrt{m_{ij}(1 - p_{i+})(1 - p_{+j})}}.$$

These have an asymptotically standard normal distribution, so this will allow us to detect abnormally large (in absolute value) adjusted residuals. For our data, we obtain the following adjusted residuals:

                       PARTY IDENTIFICATION
          Independent         Democrat            Republican
Female    $r_{11} = 0.46$     $r_{12} = 2.29$     $r_{13} = -2.62$
Male      $r_{21} = -0.46$    $r_{22} = -2.29$    $r_{23} = 2.62$

The adjusted residuals indicate that the number of females who identify as Democrat is higher than would be expected if gender and party identification are independent (i.e. $r_{12} = 2.29$). They also indicate that the number of males who identify as Republican is higher than would be expected under the assumption that gender and party identification are independent (i.e. $r_{23} = 2.62$).
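Both kinds of residuals follow directly from the observed counts and the table margins. The sketch below is an illustration rather than the notes' own program; the function name `residuals` and the variable `shrink` are hypothetical. It computes $e_{ij}$ first and then divides by $\sqrt{(1 - p_{i+})(1 - p_{+j})}$, which is algebraically the same as the definition of $r_{ij}$.

```python
import math

# Observed counts from the 1991 GSS example in the text.
# Rows: Female, Male. Columns: Independent, Democrat, Republican.
observed = [[73, 279, 225],
            [47, 165, 191]]


def residuals(table):
    """Return (standardized, adjusted) residual tables under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    standardized, adjusted = [], []
    for i, row in enumerate(table):
        e_row, r_row = [], []
        for j, n_ij in enumerate(row):
            m_ij = row_totals[i] * col_totals[j] / n
            e_ij = (n_ij - m_ij) / math.sqrt(m_ij)
            # (1 - p_i+)(1 - p_+j): the factor by which Var(e_ij) falls below 1
            shrink = (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            e_row.append(e_ij)
            r_row.append(e_ij / math.sqrt(shrink))
        standardized.append(e_row)
        adjusted.append(r_row)
    return standardized, adjusted


standardized, adjusted = residuals(observed)
for label, e_row, r_row in zip(["Female", "Male"], standardized, adjusted):
    print(label,
          "e:", [round(e, 2) for e in e_row],
          "r:", [round(r, 2) for r in r_row])
```

Because the shrink factor is always below 1, each adjusted residual is larger in absolute value than the corresponding standardized residual, which is why cells that look unremarkable on the $e_{ij}$ scale can exceed 1.96 on the $r_{ij}$ scale.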
Partitioning the Likelihood Ratio Chi-Square Statistic ($G^2$)

Another way to study the association between two variables in an $I \times J$ table involves partitioning the likelihood ratio chi-square statistic so that the components represent certain aspects of the association. Partitioning may reveal that an association primarily reflects differences between certain categories or groupings of categories.

Consider our data and look now at the $2 \times 2$ subtable based on the first two columns of the original $2 \times 3$ table:

                 PARTY IDENTIFICATION
          Independent       Democrat          Total
Female    $n_{11} = 73$     $n_{12} = 279$    $n_{1+} = 352$
Male      $n_{21} = 47$     $n_{22} = 165$    $n_{2+} = 212$
Total     $n_{+1} = 120$    $n_{+2} = 444$    $n = 564$

A test of independence on this table yields $G^2 = 0.16$ with $df = 1$ (associated p-value of 0.689). Thus, based on this subsample of the full dataset, we do not have sufficient evidence to refute the null hypothesis of independence of gender and party identification (being Independent or Democrat). This means that, of those subjects who identify either as Independent or Democrat, there does not appear to be much evidence of a difference between females and males in the relative numbers in the two categories.

Suppose we form a different, second $2 \times 2$ table as follows (based on the above result, we have collapsed the Independent and Democrat groups into a combined grouping):

                 PARTY IDENTIFICATION
          Ind/Dem           Republican        Total
Female    $n_{11} = 352$    $n_{12} = 225$    $n_{1+} = 577$
Male      $n_{21} = 212$    $n_{22} = 191$    $n_{2+} = 403$
Total     $n_{+1} = 564$    $n_{+2} = 416$    $n = 980$

A test of independence on this table yields $G^2 = 6.84$ with $df = 1$ (associated p-value of 0.0089). Here we have strong evidence against the null hypothesis of independence. Thus there is strong evidence of a difference between females and males in the relative numbers identifying as Republican instead of Independent or Democrat.

Note that, summing the two $G^2$ values above, we obtain $0.16 + 6.84 = 7.00$, i.e. the sum of the two $G^2$ components equals $G^2$ for the complete $2 \times 3$ table. (The same holds for the degrees of freedom: $1 + 1 = 2$.)
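The partition can be verified numerically. The following Python sketch is an illustration with hypothetical names (`g2_statistic`, `chi2_pvalue_df1`); it uses the identity that the chi-square upper tail with 1 degree of freedom equals `erfc(sqrt(x / 2))`, so the p-value helper applies only to the two $2 \times 2$ subtables.

```python
import math

# Full 2 x 3 table from the text.
# Rows: Female, Male. Columns: Independent, Democrat, Republican.
full = [[73, 279, 225],
        [47, 165, 191]]


def g2_statistic(table):
    """Likelihood ratio statistic G2 for independence in an I x J table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            if n_ij > 0:  # a zero cell contributes nothing to G2
                m_ij = row_totals[i] * col_totals[j] / n
                g2 += 2.0 * n_ij * math.log(n_ij / m_ij)
    return g2


def chi2_pvalue_df1(stat):
    """Upper-tail chi-square probability; valid only when df = 1."""
    return math.erfc(math.sqrt(stat / 2.0))


# Subtable 1: Independent vs Democrat (first two columns).
ind_vs_dem = [[row[0], row[1]] for row in full]
# Subtable 2: Independent and Democrat collapsed, vs Republican.
ind_dem_vs_rep = [[row[0] + row[1], row[2]] for row in full]

g2_first = g2_statistic(ind_vs_dem)
g2_second = g2_statistic(ind_dem_vs_rep)
g2_full = g2_statistic(full)

print(f"Ind vs Dem:       G2 = {g2_first:.2f}, p-value = {chi2_pvalue_df1(g2_first):.4f}")
print(f"Ind/Dem vs Rep:   G2 = {g2_second:.2f}, p-value = {chi2_pvalue_df1(g2_second):.4f}")
print(f"Sum of components = {g2_first + g2_second:.2f}, full-table G2 = {g2_full:.2f}")
```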
It might seem natural to compute $G^2$ for separate $2 \times 2$ tables that pair each column with a particular one, say the first or the last. However, the resulting component statistics are not independent and thus do not sum to $G^2$ for the complete table. Certain rules must be followed to ensure that $G^2$ for the complete table partitions into independent components.

Agresti indicates that Goodman (1968, 1969, 1971), Irwin (1949), Iverson (1979) and Lancaster (1949) discuss rules that help in determining subtables for which the components of $G^2$ are independent. Among these rules are the following necessary conditions:

1) The degrees of freedom for the subtables must sum to the degrees of freedom for the original table.
2) Each cell count in the original table must be a cell count in one and only one subtable.
3) Each marginal total of the original table must be a marginal total for one and only one subtable.
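To see why the choice of subtables matters, the sketch below compares the partition used earlier with the "natural" alternative that pairs each column with the first. It is only an illustration under stated assumptions: subtables are described by groups of column indices, the helper names (`collapse`, `g2_statistic`, `singleton_column_uses`) are hypothetical, and only condition 2 is checked, by counting how often each original column appears uncollapsed.

```python
import math
from collections import Counter

# Full 2 x 3 table from the text.
# Rows: Female, Male. Columns: Independent, Democrat, Republican.
full = [[73, 279, 225],
        [47, 165, 191]]


def collapse(table, column_groups):
    """Build a subtable whose columns are sums over the given column groups."""
    return [[sum(row[c] for c in group) for group in column_groups]
            for row in table]


def g2_statistic(table):
    """Likelihood ratio statistic G2 for independence in an I x J table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(2.0 * n_ij * math.log(n_ij * n / (row_totals[i] * col_totals[j]))
               for i, row in enumerate(table)
               for j, n_ij in enumerate(row) if n_ij > 0)


def singleton_column_uses(partition):
    """Count how often each original column appears uncollapsed (condition 2)."""
    uses = Counter()
    for column_groups in partition:
        for group in column_groups:
            if len(group) == 1:
                uses[group[0]] += 1
    return uses


# Partition from the text: (Ind vs Dem), then (Ind+Dem vs Rep).
valid = [[(0,), (1,)], [(0, 1), (2,)]]
# Pairing each column with the first: (Ind vs Dem), (Ind vs Rep).
invalid = [[(0,), (1,)], [(0,), (2,)]]

g2_full = g2_statistic(full)
sum_valid = sum(g2_statistic(collapse(full, groups)) for groups in valid)
sum_invalid = sum(g2_statistic(collapse(full, groups)) for groups in invalid)

print(f"full table G2      = {g2_full:.2f}")
print(f"valid partition    = {sum_valid:.2f}", dict(singleton_column_uses(valid)))
print(f"invalid partition  = {sum_invalid:.2f}", dict(singleton_column_uses(invalid)))
```

In the second partition the Independent column is used as an uncollapsed column twice, so its cell counts appear in two subtables and condition 2 fails, even though the degrees of freedom still sum to 2. This shows that the conditions are necessary individually; satisfying condition 1 alone does not guarantee that the components sum to the full-table $G^2$.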