SPECIAL CONSIDERATIONS FOR VOLUMETRIC Z-TESTS FOR PROPORTIONS

One's instinctive reaction to the question of whether two percentages are significantly different from each other is to treat them as if they were proportions in which the denominator is the sample size and the numerator has a binomial distribution, and then apply the standard statistical test for significant difference of proportions. But in fact that is not the case.

Consider the following example. A respondent is asked how many bottles of each of 5 soft drinks he consumes in a week. His data then get summarized into 5 percentages, namely his consumption percentage of each of the soft drinks. The average of these percentages is calculated across a random sample of respondents, and the question is whether the average percentage of Coke is significantly different from that of Pepsi.

Now consider a second example. A random sample of respondents is asked four questions: (1) Have you ever eaten at the Ritz-Carleton restaurant? (2) Have you ever eaten at the Four Seasons restaurant? (3) What fast-food restaurants have you visited in the last month? (4) For each of these restaurants, how many bottles of Coke did you buy? For each of the respondents who ever ate at the Ritz-Carleton the total number of bottles of Coke purchased is calculated for each fast-food restaurant. Similarly, for each of the respondents who ever ate at the Four Seasons the total number of bottles of Coke purchased is calculated for each fast-food restaurant. (Of course, there are respondents who ate at both the Ritz-Carleton and the Four Seasons.) From these one calculates, for example, the percentage of Coke bottles purchased at McDonald's by respondents who ate at the Ritz-Carleton and by those who ate at the Four Seasons. The statistical question is whether the percentage of Coke purchased at McDonald's is significantly different for Ritz-Carleton and Four Seasons patrons.

Now consider a third example. A random sample of respondents is asked two questions: (1) What fast-food restaurants have you visited in the last month? (2) For each of these restaurants, how many bottles of each of a list of soft drinks did you buy? The total number of bottles of soft drinks, as well as the total number of bottles of each soft drink, are calculated for each fast-food restaurant. From these one calculates, for example, the percentage of Coke bottles purchased at McDonald's and the percentage of Coke bottles purchased at Burger King. The statistical question is whether the consumption percentage of Coke in McDonald's is significantly different from the consumption percentage of Coke in Burger King.

These three examples are qualitatively different. In the first example the data about brand A are not independent of the data about brand B, as they are based on the same respondent (and in addition, since the percentages have to sum to 1, the higher the brand A percentage the lower will be the brand B percentage). In the second example, a subset of the respondents will have been patrons of both the Ritz-Carleton and Four Seasons restaurants, and so their Coke consumption in McDonald's is identical! In the third example, for those respondents who frequented both McDonald's and Burger King the Coke consumptions in the two fast-food restaurants are correlated.
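The dependence described for the first example can be made concrete with a small simulation. The sketch below is illustrative only: the gamma-distributed bottle volumes, the function names, and the sample size are assumptions and not part of the original analysis. It shows that shares forced to sum to 1 are negatively correlated, so a test that treats two brands' percentages as independent understates the variance of their difference.

```python
import random

def simulate_shares(n_respondents, n_brands, seed=0):
    """Simulate each respondent's consumption shares; every row sums to 1.

    The gamma-distributed volumes are a hypothetical stand-in for bottle counts.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_respondents):
        volumes = [rng.gammavariate(2.0, 1.0) for _ in range(n_brands)]
        total = sum(volumes)
        rows.append([v / total for v in volumes])
    return rows

def mean(values):
    return sum(values) / len(values)

def covariance(a, b):
    """Sample covariance with an (n - 1) divisor."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

shares = simulate_shares(n_respondents=2000, n_brands=5)
brand_a = [row[0] for row in shares]
brand_b = [row[1] for row in shares]

var_a = covariance(brand_a, brand_a)
var_b = covariance(brand_b, brand_b)
cov_ab = covariance(brand_a, brand_b)

# Variance of the per-respondent difference, computed two ways.
naive_var = var_a + var_b                 # pretends the two shares are independent
paired_var = var_a + var_b - 2 * cov_ab   # accounts for the dependence

print(f"covariance of shares: {cov_ab:.5f}")
print(f"naive variance:       {naive_var:.5f}")
print(f"paired variance:      {paired_var:.5f}")
```

Because the covariance is negative, the paired variance exceeds the naive one; ignoring the dependence would make differences look more significant than they are.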
What makes them similar, though, is that what drives these percentages is the underlying volumetric data that form the basis for the percentage calculation. It is the distribution of these data that determines the distribution of the percentages and, in turn, the proper method of testing these percentages for significant differences.

EXAMPLE 1: AVERAGES OF MULTINOMIAL PERCENTAGES

Suppose a respondent is asked to determine his percentage allocation across p products (e.g., what fraction of his dollar expenditure in a given category does he spend on each product in the category?). A sample of n respondents is drawn and the average percentage is calculated for each product. One now wants to know if the percentage for product 1 is significantly different from the percentage for product 2.

We know first of all that, for each respondent, the p percentages are correlated, because they are required to sum to 1. If $p_{i1}$ is the percentage for product 1 and $p_{i2}$ is the percentage for product 2 for the i-th respondent, then the variance of $p_{i1} - p_{i2}$ is estimated by

\[ v_i = p_{i1}(1 - p_{i1}) + p_{i2}(1 - p_{i2}) + 2 p_{i1} p_{i2} = p_{i1} + p_{i2} - (p_{i1} - p_{i2})^2 . \]

The variance of the average of the difference of these proportions is therefore estimated by

\[ v = \frac{1}{n^2} \sum_{i=1}^{n} v_i = \frac{1}{n^2} \sum_{i=1}^{n} \left[ p_{i1} + p_{i2} - (p_{i1} - p_{i2})^2 \right] . \]

Now suppose that we have $n_1$ observations only on product 1, $n_2$ observations only on product 2, and $n_{12}$ observations on the pair of products. In this case the average for product 1 is

\[ \bar{p}_1 = \frac{\sum_i p_{i1}}{n_1 + n_{12}} \]

and the average for product 2 is

\[ \bar{p}_2 = \frac{\sum_i p_{i2}}{n_2 + n_{12}} , \]

where each sum runs over the respondents with an observation on that product. The variance of the difference of these two averages is

\[ \frac{\sum_i p_{i1}(1 - p_{i1})}{(n_1 + n_{12})^2} + \frac{\sum_i p_{i2}(1 - p_{i2})}{(n_2 + n_{12})^2} + \frac{2 \sum_i p_{i1} p_{i2}}{(n_1 + n_{12})(n_2 + n_{12})} , \]

where the last sum runs over the $n_{12}$ respondents observed on both products.

EXAMPLE 2: DEPENDENT PAIRED/OVERLAP (MULTI)

Suppose we wanted to compare the percent that respondents with a given attribute contribute to a total of all respondents on that attribute. For example, suppose column 1 records the number of bottles of Coke consumed in a week by people who have ever eaten at the Ritz-Carleton, column 2 records the number of bottles of Coke consumed in a week by people who have ever eaten at the Four Seasons, the total row contains the total consumption for Coke in the respective columns, and row 1 contains the consumption for Coke of those respondents that had a specific attribute (e.g., Cokes purchased by respondents whose age was between 18 and 35). The percentages in question here are the percentage that Coke purchases make up of the total volume of soft drinks purchased among those 18-35 year old respondents who have ever eaten at the Ritz-Carleton restaurant and that for those who have ever eaten at the Four Seasons restaurant. The possible paired/overlap situation is that there are respondents who have eaten at both restaurants. Here is what such a table would look like:

Volume of soft drinks purchased by respondents aged 18 to 35 who have ever eaten at

                      Ritz-Carleton       4 Seasons           Hyatt
                      -------------       ---------           -----
Total                 566    100.0%       5384   100.0%       568    100.0%
Coke                  547    0.5%         564    0.6%         603    0.7%
Pepsi                 5459   0.6%         5058   9.5%         59     9.4%
Seven Up              53     9.9%         5664   0.6%         5566   9.9%
Sprite                5307   0.3%         5555   0.4%         66     0.9%
Fanta                 4535   8.8%         4638   8.7%         4897   8.7%
Dr. Pepper            543    0.0%         5368   0.%          5745   0.%
Diet Pepsi            5063   9.8%         53     9.8%         55     9.8%
Diet Coke             547    0.6%         57     0.8%         6066   0.8%
Dr. Brown Cherry      4783   9.3%         480    9.%          59     9.%
Dr. Brown Cel Ray     534    0.3%         5505   0.4%         5907   0.5%

Let us begin with the attribute measures that make up the numerator of the percentage. Let us partition the respondents so that the first n respondents provide data for both columns 1 and 2 (e.g., are between 18 and 35 and have eaten at both the Ritz-Carleton and Four Seasons restaurants), the next m respondents provide data only for column 1 (e.g., are between 18 and 35 and have only eaten at the Ritz-Carleton), and the last p respondents provide data only for column 2 (e.g., are between 18 and 35 and have only eaten at the Four Seasons). (There may be still other respondents that provided data on some, if not all, of the other banner items, but not on items 1 or 2. These will be disregarded in this analysis.)

Let us denote by $x_i$ the observed measurement for both columns 1 and 2 for respondent i (i = 1, 2, ..., n), by $y_i$ the observed measurement for respondent i (i = n+1, n+2, ..., n+m), and by $z_i$ the observed measurement for respondent i (i = n+m+1, n+m+2, ..., n+m+p). (I assign each of these measurements different letter names for clarity of exposition; the data are really a set of n+m+p observations.) The total of the measurements for that attribute for those responding to column 1 is given by

\[ \sum_{i=1}^{n} x_i + \sum_{i=n+1}^{n+m} y_i \]

and the total of the measurements for that attribute for those responding to column 2 is given by

\[ \sum_{i=1}^{n} x_i + \sum_{i=n+m+1}^{n+m+p} z_i . \]

Let $T_1$ be the total of the measurements for those responding to column 1 across all attributes (e.g., the total Coke consumption among respondents of all ages who ever ate at the Ritz-Carleton) and $T_2$ be the total of the measurements for those responding to column 2 across all attributes (e.g., the total Coke consumption among respondents of all ages who ever ate at the Four Seasons). Then the percentages under consideration are

\[ p_1 = \frac{\sum_{i=1}^{n} x_i + \sum_{i=n+1}^{n+m} y_i}{T_1} , \qquad p_2 = \frac{\sum_{i=1}^{n} x_i + \sum_{i=n+m+1}^{n+m+p} z_i}{T_2} . \]

The difference of the two percentages is given by

\[ d = p_1 - p_2 = \left( \frac{1}{T_1} - \frac{1}{T_2} \right) n \bar{x} + \frac{1}{T_1} m \bar{y} - \frac{1}{T_2} p \bar{z} , \]

where $\bar{x}$ is the mean of the measurements for column 1 among those who qualified for both columns 1 and 2, $\bar{y}$ is the mean of the measurements among those who qualified only for column 1, and $\bar{z}$ is the mean of the measurements among those who qualified only for column 2. Therefore the variance of the difference of the two percentages, conditional on the totals $T_1$ and $T_2$, is given by

\[ \sigma_d^2 = \left( \frac{1}{T_1} - \frac{1}{T_2} \right)^2 n \sigma_x^2 + \left( \frac{1}{T_1} \right)^2 m \sigma_y^2 + \left( \frac{1}{T_2} \right)^2 p \sigma_z^2 , \]

where $\sigma_x^2$ is the variance of the measurements in column 1 of those respondents who qualified for both columns 1 and 2, $\sigma_y^2$ is the variance of the measurements in column 1 of those respondents who only qualified for column 1, and $\sigma_z^2$ is the variance of the measurements in column 2 of those respondents who only qualified for column 2. The estimate of the variance of the difference of the two percentages is given by

\[ s_d^2 = n \left( \frac{1}{T_1} - \frac{1}{T_2} \right)^2 \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} + \frac{m}{T_1^2} \, \frac{\sum_{i=n+1}^{n+m} (y_i - \bar{y})^2}{m - 1} + \frac{p}{T_2^2} \, \frac{\sum_{i=n+m+1}^{n+m+p} (z_i - \bar{z})^2}{p - 1} . \]

EXAMPLE 3: DEPENDENT PAIRED/OVERLAP (LOC+)

Suppose we wanted to compare the percent that respondents with a given attribute contribute to a total of all respondents on that attribute. For example, suppose column 1 records the number of bottles of each of a number of soft drinks consumed in a week by people who ate at McDonald's, column 2 records the number of bottles of each of a number of soft drinks consumed in a week by people who ate at Burger King, the total row contains the total consumption of soft drinks in the respective columns, and row 1 contains the consumption for Coke in each of the two restaurants. The percentages in question here are the percentage of the total McDonald's soft drink consumption that is attributable to Coke and the percentage of the total Burger King soft drink consumption that is attributable to Coke. The possible paired/overlap situation is that there are respondents who purchased soft drinks (not necessarily Coke) at both restaurants.

Volume of soft drinks purchased at

                      McDonald's          Burger King         Al's
                      ----------          -----------         ----
Total                 70090  100.0%       49366  100.0%       60373  100.0%
Coke                  7595   0.8%         484    9.8%         638    0.3%
Pepsi                 658    9.3%         507    0.3%         5743   9.5%
Seven Up              6874   9.8%         499    0.%          608    0.0%
Sprite                7330   0.5%         54     0.6%         7088   .7%
Fanta                 609    8.7%         4846   9.8%         69     .4%
Dr. Pepper            73     0.%          489    9.8%         6409   0.6%
Diet Pepsi            769    0.%          469    9.5%         658    0.%
Diet Coke             7707   .0%          4770   9.7%         559    9.%
Dr. Brown Cherry      6404   9.%          505    0.%          567    9.3%
Dr. Brown Cel Ray     779    0.4%         5067   0.3%         465    7.7%

Let us begin with the attribute measures that make up the numerator of the percentage. Let us partition the respondents so that the first n respondents provide data for both columns 1 and 2, the next m respondents provide data only for column 1, and the last p respondents provide data only for column 2. (There may be still other respondents that provided data on some, if not all, of the other banner items, but not on items 1 or 2. These will be disregarded in this analysis.)

Let us denote by $x_{i1}$ the observed measurement for column 1 for respondent i (i = 1, 2, ..., n), by $x_{i2}$ the observed measurement for column 2 for respondent i (i = 1, 2, ..., n), by $y_i$ the observed measurement for respondent i (i = n+1, n+2, ..., n+m), and by $z_i$ the observed measurement for respondent i (i = n+m+1, n+m+2, ..., n+m+p). (I assign each of these measurements different letter names for clarity of exposition; the data are really a set of n+m+p observations.) The total of the measurements for that attribute for those responding to column 1 is given by

\[ \sum_{i=1}^{n} x_{i1} + \sum_{i=n+1}^{n+m} y_i \]

and the total of the measurements for that attribute for those responding to column 2 is given by

\[ \sum_{i=1}^{n} x_{i2} + \sum_{i=n+m+1}^{n+m+p} z_i . \]

Let $T_1$ be the total of the measurements for those responding to column 1 across all attributes and $T_2$ be the total of the measurements for those responding to column 2 across all attributes. Then the percentages under consideration are

\[ p_1 = \frac{\sum_{i=1}^{n} x_{i1} + \sum_{i=n+1}^{n+m} y_i}{T_1} , \qquad p_2 = \frac{\sum_{i=1}^{n} x_{i2} + \sum_{i=n+m+1}^{n+m+p} z_i}{T_2} . \]

The difference of the two percentages is given by

\[ d = p_1 - p_2 = n \left( \frac{\bar{x}_1}{T_1} - \frac{\bar{x}_2}{T_2} \right) + \frac{1}{T_1} m \bar{y} - \frac{1}{T_2} p \bar{z} , \]

where $\bar{x}_j$ is the mean of the measurements for column j (j = 1, 2) among those who qualified for both columns 1 and 2, $\bar{y}$ is the mean of the measurements among those who qualified only for column 1, and $\bar{z}$ is the mean of the measurements among those who qualified only for column 2. Therefore the variance of the difference of the two percentages, conditional on the totals $T_1$ and $T_2$, is given by

\[ \sigma_d^2 = n \left[ \frac{\sigma_{x_1}^2}{T_1^2} + \frac{\sigma_{x_2}^2}{T_2^2} - \frac{2 r \sigma_{x_1} \sigma_{x_2}}{T_1 T_2} \right] + \left( \frac{1}{T_1} \right)^2 m \sigma_y^2 + \left( \frac{1}{T_2} \right)^2 p \sigma_z^2 , \]

where $\sigma_{x_1}^2$ is the variance of the measurements in column 1 of those respondents who qualified for both columns 1 and 2, $\sigma_{x_2}^2$ is the variance of the measurements in column 2 of those respondents who qualified for both columns 1 and 2, r is the correlation between the measurements in column 1 and column 2 of those respondents who qualified for both columns 1 and 2, $\sigma_y^2$ is the variance of the measurements in column 1 of those respondents who only qualified for column 1, and $\sigma_z^2$ is the variance of the measurements in column 2 of those respondents who only qualified for column 2. The estimate of the variance of the difference of the two percentages is given by

\[ s_d^2 = n \left[ \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}{T_1^2 (n - 1)} + \frac{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2}{T_2^2 (n - 1)} - \frac{2 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)}{T_1 T_2 (n - 1)} \right] + \frac{m}{T_1^2} \, \frac{\sum_{i=n+1}^{n+m} (y_i - \bar{y})^2}{m - 1} + \frac{p}{T_2^2} \, \frac{\sum_{i=n+m+1}^{n+m+p} (z_i - \bar{z})^2}{p - 1} \]

\[ \phantom{s_d^2} = \frac{n}{n - 1} \sum_{i=1}^{n} \left\{ \frac{x_{i1} - \bar{x}_1}{T_1} - \frac{x_{i2} - \bar{x}_2}{T_2} \right\}^2 + \frac{m}{T_1^2 (m - 1)} \sum_{i=n+1}^{n+m} (y_i - \bar{y})^2 + \frac{p}{T_2^2 (p - 1)} \sum_{i=n+m+1}^{n+m+p} (z_i - \bar{z})^2 . \]

EXAMPLE 2: WEIGHTED DEPENDENT PAIRED/OVERLAP (MULTI)

Let us denote by $x_i$ the observed measurement for both columns 1 and 2 for respondent i (i = 1, 2, ..., n), by $y_i$ the observed measurement for respondent i (i = n+1, n+2, ..., n+m), and by $z_i$ the observed measurement for respondent i (i = n+m+1, n+m+2, ..., n+m+p), with $w_i$ denoting the weight attached to respondent i. (I assign each of these measurements different letter names for clarity of exposition; the data are really a set of n+m+p observations.) The weighted total of the measurements for that attribute for those responding to column 1 is given by

\[ \sum_{i=1}^{n} w_i x_i + \sum_{i=n+1}^{n+m} w_i y_i \]

and the weighted total of the measurements for that attribute for those responding to column 2 is given by

\[ \sum_{i=1}^{n} w_i x_i + \sum_{i=n+m+1}^{n+m+p} w_i z_i . \]

Let $T_1$ be the weighted total of the measurements for those responding to column 1 across all attributes (e.g., the total Coke consumption among respondents of all ages who ever ate at the Ritz-Carleton) and $T_2$ be the weighted total of the measurements for those responding to column 2 across all attributes (e.g., the total Coke consumption among respondents of all ages who ever ate at the Four Seasons). Then the percentages under consideration are

\[ p_1 = \frac{\sum_{i=1}^{n} w_i x_i + \sum_{i=n+1}^{n+m} w_i y_i}{T_1} , \qquad p_2 = \frac{\sum_{i=1}^{n} w_i x_i + \sum_{i=n+m+1}^{n+m+p} w_i z_i}{T_2} . \]

The difference of the two percentages is given by

\[ d = p_1 - p_2 = \frac{\sum_{i=1}^{n} w_i x_i + \sum_{i=n+1}^{n+m} w_i y_i}{T_1} - \frac{\sum_{i=1}^{n} w_i x_i + \sum_{i=n+m+1}^{n+m+p} w_i z_i}{T_2} . \]
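The weighted totals, percentages, and difference for the MULTI paired/overlap case can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names, the argument layout (separate lists for the overlap group, the column-1-only group, and the column-2-only group), and the numbers in the usage example are all assumptions.

```python
def weighted_total(values, weights):
    """Sum of weight * measurement over one group of respondents."""
    return sum(w * v for v, w in zip(values, weights))

def weighted_overlap_difference(x, wx, y, wy, z, wz, t1, t2):
    """Weighted percentages for columns 1 and 2 and their difference.

    x, wx : measurements and weights of respondents in both columns
    y, wy : measurements and weights of respondents in column 1 only
    z, wz : measurements and weights of respondents in column 2 only
    t1, t2: weighted column totals across all attributes (taken as given)
    """
    sx = weighted_total(x, wx)
    sy = weighted_total(y, wy)
    sz = weighted_total(z, wz)
    p1 = (sx + sy) / t1   # overlap respondents count toward column 1 ...
    p2 = (sx + sz) / t2   # ... and toward column 2
    return p1, p2, p1 - p2

# Hypothetical data: two overlap respondents, one column-1-only, two column-2-only.
p1, p2, d = weighted_overlap_difference(
    x=[4, 6], wx=[1.0, 2.0],
    y=[3], wy=[1.5],
    z=[5, 1], wz=[0.5, 1.0],
    t1=100.0, t2=80.0,
)
print(p1, p2, d)
```

The overlap group's weighted sum enters both numerators, which is why the two percentages cannot be treated as independent.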
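Therefore the variance of the difference of the two percentages, conditional on the weighted totals $T_1$ and $T_2$, is given by

\[ \sigma_d^2 = \left( \frac{1}{T_1} - \frac{1}{T_2} \right)^2 \sigma_x^2 \sum_{i=1}^{n} w_i^2 + \left( \frac{1}{T_1} \right)^2 \sigma_y^2 \sum_{i=n+1}^{n+m} w_i^2 + \left( \frac{1}{T_2} \right)^2 \sigma_z^2 \sum_{i=n+m+1}^{n+m+p} w_i^2 , \]

where $\sigma_x^2$ is the variance of the measurements in column 1 of those respondents who qualified for both columns 1 and 2, $\sigma_y^2$ is the variance of the measurements in column 1 of those respondents who only qualified for column 1, and $\sigma_z^2$ is the variance of the measurements in column 2 of those respondents who only qualified for column 2. The estimate of the variance of the difference of the two percentages is given by

\[ s_d^2 = \left( \frac{1}{T_1} - \frac{1}{T_2} \right)^2 \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \sum_{i=1}^{n} w_i^2 + \frac{1}{T_1^2} \, \frac{\sum_{i=n+1}^{n+m} (y_i - \bar{y})^2}{m - 1} \sum_{i=n+1}^{n+m} w_i^2 + \frac{1}{T_2^2} \, \frac{\sum_{i=n+m+1}^{n+m+p} (z_i - \bar{z})^2}{p - 1} \sum_{i=n+m+1}^{n+m+p} w_i^2 . \]

EXAMPLE 3: WEIGHTED DEPENDENT PAIRED/OVERLAP (LOC+)

Let us denote by $x_{i1}$ the observed measurement for column 1 for respondent i (i = 1, 2, ..., n), by $x_{i2}$ the observed measurement for column 2 for respondent i (i = 1, 2, ..., n), by $y_i$ the observed measurement for respondent i (i = n+1, n+2, ..., n+m), and by $z_i$ the observed measurement for respondent i (i = n+m+1, n+m+2, ..., n+m+p). (I assign each of these measurements different letter names for clarity of exposition; the data are really a set of n+m+p observations.) The weighted total of the measurements for that attribute for those responding to column 1 is given by

\[ \sum_{i=1}^{n} w_i x_{i1} + \sum_{i=n+1}^{n+m} w_i y_i \]

and the weighted total of the measurements for that attribute for those responding to column 2 is given by

\[ \sum_{i=1}^{n} w_i x_{i2} + \sum_{i=n+m+1}^{n+m+p} w_i z_i . \]

Let $T_1$ be the weighted total of the measurements for those responding to column 1 across all attributes and $T_2$ be the weighted total of the measurements for those responding to column 2 across all attributes. Then the percentages under consideration are

\[ p_1 = \frac{\sum_{i=1}^{n} w_i x_{i1} + \sum_{i=n+1}^{n+m} w_i y_i}{T_1} , \qquad p_2 = \frac{\sum_{i=1}^{n} w_i x_{i2} + \sum_{i=n+m+1}^{n+m+p} w_i z_i}{T_2} . \]

The difference of the two percentages is given by

\[ d = p_1 - p_2 = \frac{\sum_{i=1}^{n} w_i x_{i1} + \sum_{i=n+1}^{n+m} w_i y_i}{T_1} - \frac{\sum_{i=1}^{n} w_i x_{i2} + \sum_{i=n+m+1}^{n+m+p} w_i z_i}{T_2} . \]

Therefore the variance of the difference of the two percentages, conditional on the totals $T_1$ and $T_2$, is given by

\[ \sigma_d^2 = \left[ \frac{\sigma_{x_1}^2}{T_1^2} + \frac{\sigma_{x_2}^2}{T_2^2} - \frac{2 r \sigma_{x_1} \sigma_{x_2}}{T_1 T_2} \right] \sum_{i=1}^{n} w_i^2 + \frac{\sigma_y^2}{T_1^2} \sum_{i=n+1}^{n+m} w_i^2 + \frac{\sigma_z^2}{T_2^2} \sum_{i=n+m+1}^{n+m+p} w_i^2 , \]

where $\sigma_{x_1}^2$ is the variance of the measurements in column 1 of those respondents who qualified for both columns 1 and 2, $\sigma_{x_2}^2$ is the variance of the measurements in column 2 of those respondents who qualified for both columns 1 and 2, r is the correlation between the measurements in column 1 and column 2 of those respondents who qualified for both columns 1 and 2, $\sigma_y^2$ is the variance of the measurements in column 1 of those respondents who only qualified for column 1, and $\sigma_z^2$ is the variance of the measurements in column 2 of those respondents who only qualified for column 2. The estimate of the variance of the difference of the two percentages is given by

\[ s_d^2 = \left[ \frac{\sum_{i=1}^{n} (x_{i1} - \bar{x}_1)^2}{T_1^2 (n - 1)} + \frac{\sum_{i=1}^{n} (x_{i2} - \bar{x}_2)^2}{T_2^2 (n - 1)} - \frac{2 \sum_{i=1}^{n} (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)}{T_1 T_2 (n - 1)} \right] \sum_{i=1}^{n} w_i^2 + \frac{\sum_{i=n+1}^{n+m} (y_i - \bar{y})^2}{T_1^2 (m - 1)} \sum_{i=n+1}^{n+m} w_i^2 + \frac{\sum_{i=n+m+1}^{n+m+p} (z_i - \bar{z})^2}{T_2^2 (p - 1)} \sum_{i=n+m+1}^{n+m+p} w_i^2 \]

\[ \phantom{s_d^2} = \frac{1}{n - 1} \sum_{i=1}^{n} \left\{ \frac{x_{i1} - \bar{x}_1}{T_1} - \frac{x_{i2} - \bar{x}_2}{T_2} \right\}^2 \sum_{i=1}^{n} w_i^2 + \frac{\sum_{i=n+1}^{n+m} (y_i - \bar{y})^2}{T_1^2 (m - 1)} \sum_{i=n+1}^{n+m} w_i^2 + \frac{\sum_{i=n+m+1}^{n+m+p} (z_i - \bar{z})^2}{T_2^2 (p - 1)} \sum_{i=n+m+1}^{n+m+p} w_i^2 . \]

COMPARISON WITH TOTAL

Here the situation is compounded by the fact that, when one calculates a percentage based on a total for a row of a table, that total contains the total for the column which is being compared to the total column. There is therefore a built-in part/whole correlation between the two percentages being compared.

EXAMPLE 2: COMPARISON WITH TOTAL (MULTI) UNWEIGHTED & WEIGHTED

Let us denote by $x_i$ the observed measurement for column 1 for respondent i (i = 1, 2, ..., n) and by $y_i$ the observed measurement for respondent i (i = n+1, n+2, ..., n+m). The total of the measurements for that attribute for those responding to column 1 is given by

\[ \sum_{i=1}^{n} x_i \]

and the total of the measurements for that attribute for those responding to the total is given by

\[ \sum_{i=1}^{n} x_i + \sum_{i=n+1}^{n+m} y_i . \]

Let $T_1$ be the total of the measurements for those responding to column 1 across all attributes (e.g., the total Coke consumption among respondents of all ages who ever ate at the Ritz-Carleton) and $T$ be the total of the measurements for all respondents across all attributes (e.g., the total Coke consumption among respondents of all ages). Then the percentages under consideration are

\[ p_1 = \frac{\sum_{i=1}^{n} x_i}{T_1} , \qquad p_T = \frac{\sum_{i=1}^{n} x_i + \sum_{i=n+1}^{n+m} y_i}{T} . \]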
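The part/whole overlap between a column percentage and the total percentage is easy to see numerically. The following sketch is an assumption-laden illustration (function name, argument names, and all numbers are hypothetical): it computes the two percentages and shows that their difference can be regrouped so that the column-1 measurements appear once, with coefficient (1/T_1 - 1/T), rather than in two separate terms.

```python
def column_vs_total_difference(x, y, t1, t_all):
    """Column-1 percentage, total percentage, and their difference.

    x     : measurements of respondents who qualified for column 1
    y     : measurements of respondents who count toward the total only
    t1    : column-1 total across all attributes (taken as given)
    t_all : total-column total across all attributes (taken as given)
    """
    sum_x, sum_y = sum(x), sum(y)
    p1 = sum_x / t1
    p_total = (sum_x + sum_y) / t_all       # the x's are part of the total too
    d = p1 - p_total
    # Same difference with the shared x's collected into a single term.
    regrouped = (1.0 / t1 - 1.0 / t_all) * sum_x - sum_y / t_all
    return p1, p_total, d, regrouped

# Hypothetical data: three column-1 respondents, two others.
p1, p_total, d, regrouped = column_vs_total_difference(
    x=[4, 6, 2], y=[5, 3], t1=100.0, t_all=250.0
)
print(p1, p_total, d, regrouped)
```

Collecting the shared measurements this way is what lets the variance be written as a sum over two non-overlapping groups of respondents.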
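The difference of the two percentages is given by

\[ d = p_1 - p_T = \frac{\sum_{i=1}^{n} x_i}{T_1} - \frac{\sum_{i=1}^{n} x_i + \sum_{i=n+1}^{n+m} y_i}{T} . \]

Therefore the variance of the difference of the two percentages, conditional on the totals $T_1$ and $T$, is given by

\[ \sigma_d^2 = \left( \frac{1}{T_1} - \frac{1}{T} \right)^2 n \sigma_x^2 + \left( \frac{1}{T} \right)^2 m \sigma_y^2 , \]

where $\sigma_x^2$ is the variance of the measurements in column 1 of those respondents who qualified for column 1 and $\sigma_y^2$ is the variance of the measurements of those respondents who contributed to the total but did not qualify for column 1. The estimate of the variance of the difference of the two percentages is given by

\[ s_d^2 = n \left( \frac{1}{T_1} - \frac{1}{T} \right)^2 \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} + \frac{m}{T^2} \, \frac{\sum_{i=n+1}^{n+m} (y_i - \bar{y})^2}{m - 1} . \]

If the differences are weighted, then

\[ d_w = p_{1w} - p_{Tw} = \frac{\sum_{i=1}^{n} w_i x_i}{T_{1w}} - \frac{\sum_{i=1}^{n} w_i x_i + \sum_{i=n+1}^{n+m} w_i y_i}{T_w} , \]

where $T_{1w}$ is the weighted total of the measurements for those responding to column 1 across all attributes and $T_w$ is the weighted total of the measurements for all respondents across all attributes. Then the variance of the difference of the two weighted percentages, conditional on the totals $T_{1w}$ and $T_w$, is given by

\[ \sigma_{d_w}^2 = \left( \frac{1}{T_{1w}} - \frac{1}{T_w} \right)^2 \sigma_x^2 \sum_{i=1}^{n} w_i^2 + \left( \frac{1}{T_w} \right)^2 \sigma_y^2 \sum_{i=n+1}^{n+m} w_i^2 . \]

The estimate of the variance of the difference of the two weighted percentages is given by

\[ s_{d_w}^2 = \left( \frac{1}{T_{1w}} - \frac{1}{T_w} \right)^2 \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \sum_{i=1}^{n} w_i^2 + \left( \frac{1}{T_w} \right)^2 \frac{\sum_{i=n+1}^{n+m} (y_i - \bar{y})^2}{m - 1} \sum_{i=n+1}^{n+m} w_i^2 . \]

EXAMPLE 3: COMPARISON WITH TOTAL (LOC+) UNWEIGHTED & WEIGHTED

To deal with the comparison of a column volumetric percentage with a total volumetric percentage we will need a bit of extra notation. Let n be the number of respondents and c be the number of columns in the table on which the total is based. Define $\delta_{ij}$ as 1 if respondent i answered item j and as 0 if respondent i did not answer item j, for i = 1, 2, ..., n and j = 1, 2, ..., c. Let us denote by $x_{ij}$ the observed measurement for column j for respondent i. (As you can see, the $\delta_{ij}$ are used to keep track of the "no answers" in the data.) The total of the measurements for that attribute for those responding to column 1 is given by

\[ \sum_{i=1}^{n} \delta_{i1} x_{i1} \]

and the total of the measurements for that attribute for all respondents is given by

\[ \sum_{j=1}^{c} \sum_{i=1}^{n} \delta_{ij} x_{ij} . \]

Let $T_1$ be the total of the measurements for those responding to column 1 across all attributes and $T$ be the total of the measurements for those across columns across all attributes. Then the percentages under consideration are

\[ p_1 = \frac{\sum_{i=1}^{n} \delta_{i1} x_{i1}}{T_1} , \qquad p_T = \frac{\sum_{j=1}^{c} \sum_{i=1}^{n} \delta_{ij} x_{ij}}{T} . \]

The difference of the two percentages is given by

\[ d = p_1 - p_T = \left( \frac{1}{T_1} - \frac{1}{T} \right) \sum_{i=1}^{n} \delta_{i1} x_{i1} - \frac{1}{T} \sum_{j=2}^{c} \sum_{i=1}^{n} \delta_{ij} x_{ij} . \]

Therefore the variance of the difference of the two percentages, conditional on the totals $T_1$ and $T$, is given by

\[ \sigma_d^2 = \left( \frac{1}{T_1} - \frac{1}{T} \right)^2 \sigma_1^2 \sum_{i=1}^{n} \delta_{i1} + \left( \frac{1}{T} \right)^2 \sum_{j=2}^{c} \sigma_j^2 \sum_{i=1}^{n} \delta_{ij} - 2 \left( \frac{1}{T_1} - \frac{1}{T} \right) \left( \frac{1}{T} \right) \sum_{j=2}^{c} r_{1j} \sigma_1 \sigma_j \sum_{i=1}^{n} \delta_{i1} \delta_{ij} , \]

where $\sigma_j^2$ is the variance of the measurements in column j and $r_{1j}$ is the correlation between the measurements in column 1 and column j of those respondents who qualified for both columns 1 and j. The estimate of the variance of the difference of the two percentages is given by

\[ s_d^2 = \left( \frac{1}{T_1} - \frac{1}{T} \right)^2 \sum_{i=1}^{n} \delta_{i1} (x_{i1} - \bar{x}_1)^2 + \left( \frac{1}{T} \right)^2 \sum_{j=2}^{c} \sum_{i=1}^{n} \delta_{ij} (x_{ij} - \bar{x}_j)^2 - 2 \left( \frac{1}{T_1} - \frac{1}{T} \right) \left( \frac{1}{T} \right) \sum_{j=2}^{c} \sum_{i=1}^{n} \delta_{i1} \delta_{ij} (x_{i1} - \bar{x}_1)(x_{ij} - \bar{x}_j) , \]

where

\[ n_j = \sum_{i=1}^{n} \delta_{ij} \qquad \text{and} \qquad \bar{x}_j = \frac{1}{n_j} \sum_{i=1}^{n} \delta_{ij} x_{ij} . \]

When the data are weighted, then the total of the measurements for that attribute for those responding to column 1 is given by

\[ \sum_{i=1}^{n} \delta_{i1} w_i x_{i1} \]

and the total of the measurements for that attribute for all respondents is given by

\[ \sum_{j=1}^{c} \sum_{i=1}^{n} \delta_{ij} w_i x_{ij} . \]

Let $T_{1w}$ be the weighted total of the measurements for those responding to column 1 across all attributes and $T_w$ be the total of the measurements for those across columns across all attributes. Then the percentages under consideration are

\[ p_{1w} = \frac{\sum_{i=1}^{n} \delta_{i1} w_i x_{i1}}{T_{1w}} , \qquad p_{Tw} = \frac{\sum_{j=1}^{c} \sum_{i=1}^{n} \delta_{ij} w_i x_{ij}}{T_w} . \]

The difference of the two percentages is given by

\[ d_w = p_{1w} - p_{Tw} = \left( \frac{1}{T_{1w}} - \frac{1}{T_w} \right) \sum_{i=1}^{n} \delta_{i1} w_i x_{i1} - \frac{1}{T_w} \sum_{j=2}^{c} \sum_{i=1}^{n} \delta_{ij} w_i x_{ij} . \]

Therefore the variance of the difference of the two percentages, conditional on the totals $T_{1w}$ and $T_w$, is given by

\[ \sigma_{d_w}^2 = \left( \frac{1}{T_{1w}} - \frac{1}{T_w} \right)^2 \sigma_1^2 \sum_{i=1}^{n} \delta_{i1} w_i^2 + \left( \frac{1}{T_w} \right)^2 \sum_{j=2}^{c} \sigma_j^2 \sum_{i=1}^{n} \delta_{ij} w_i^2 - 2 \left( \frac{1}{T_{1w}} - \frac{1}{T_w} \right) \left( \frac{1}{T_w} \right) \sum_{j=2}^{c} r_{1j} \sigma_1 \sigma_j \sum_{i=1}^{n} \delta_{i1} \delta_{ij} w_i^2 . \]

The estimate of the variance of the difference of the two percentages is given by

\[ s_{d_w}^2 = \left( \frac{1}{T_{1w}} - \frac{1}{T_w} \right)^2 \sum_{i=1}^{n} \delta_{i1} w_i^2 (x_{i1} - \bar{x}_1)^2 + \left( \frac{1}{T_w} \right)^2 \sum_{j=2}^{c} \sum_{i=1}^{n} \delta_{ij} w_i^2 (x_{ij} - \bar{x}_j)^2 - 2 \left( \frac{1}{T_{1w}} - \frac{1}{T_w} \right) \left( \frac{1}{T_w} \right) \sum_{j=2}^{c} \sum_{i=1}^{n} \delta_{i1} \delta_{ij} w_i^2 (x_{i1} - \bar{x}_1)(x_{ij} - \bar{x}_j) . \]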
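To make the unweighted column-versus-total (LOC+) comparison concrete, here is a minimal sketch. It rests on several assumptions that should be checked against the intended formula: that the variance estimate combines squared deviations of column 1 with coefficient (1/T_1 - 1/T)^2, squared deviations of every other column with coefficient (1/T)^2, and cross-products of column 1 with each other column with coefficient -2(1/T_1 - 1/T)(1/T), with no (n - 1) corrections and no cross-terms among the other columns. The function name, the use of `None` for "no answer", and the data are hypothetical.

```python
import math

def column_vs_total_z(data, t1, t_all):
    """Difference, variance estimate, and z for column 1 versus the total column.

    data  : one row per respondent, one entry per column; None marks "no answer"
    t1    : column-1 total across all attributes (treated as fixed)
    t_all : total across all columns and all attributes (treated as fixed)
    Assumes every column has at least one answer.
    """
    n_cols = len(data[0])

    # Answered values, totals, and means for each column.
    answered = [[row[j] for row in data if row[j] is not None] for j in range(n_cols)]
    totals = [sum(col) for col in answered]
    means = [sum(col) / len(col) for col in answered]

    a = 1.0 / t1 - 1.0 / t_all   # coefficient on the column-1 total
    b = 1.0 / t_all              # coefficient on every other column total

    d = totals[0] / t1 - sum(totals) / t_all

    def sum_sq(j):
        return sum((v - means[j]) ** 2 for v in answered[j])

    def sum_cross(j):
        # Only respondents who answered both column 1 and column j contribute.
        return sum(
            (row[0] - means[0]) * (row[j] - means[j])
            for row in data
            if row[0] is not None and row[j] is not None
        )

    var_d = (
        a ** 2 * sum_sq(0)
        + b ** 2 * sum(sum_sq(j) for j in range(1, n_cols))
        - 2 * a * b * sum(sum_cross(j) for j in range(1, n_cols))
    )
    return d, var_d, d / math.sqrt(var_d)

# Hypothetical data: 4 respondents, 3 columns (restaurants), one attribute row.
rows = [
    [4, 2, None],
    [6, None, 3],
    [None, 5, 1],
    [2, 3, None],
]
d, var_d, z = column_vs_total_z(rows, t1=100.0, t_all=250.0)
print(d, var_d, z)
```

The resulting z would be compared with standard normal critical values; with so few respondents the example is only meant to show the bookkeeping.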