By Rudy A. Gdeon The Unvesty of Montana The Geatest Devaton Coelaton Coeffcent and ts Geometcal Intepetaton The Geatest Devaton Coelaton Coeffcent (GDCC) was ntoduced by Gdeon and Hollste (987). The GDCC was shown to possess all the popetes of a ank based coelaton coeffcent, a new ted value pocedue was ntoduced, the null dstbuton fo ndependence was gven, and a populaton ntepetaton was gven. Ths pape expands the exact dstbuton up to sample szes 5, gves a moe ntutve defnton based on the gaph of the data, and demonstates the geometcal unqueness of the defnton by 9 otatons of the gaph. An analogous gaph s used to gve a geometc defnton of the populaton o theoetcal value of GDCC. Fou coelaton coeffcents, GDCC, Kendall s, Speaman s, and an absolute value ank coelaton coeffcent ae computed on the scatteplot of the anked data by smple countng methods. The sgnfcance of GDCC as a obust coelaton coeffcent s llustated and the use of the asymptotc dstbuton s demonstated.. The Mathematcal Defnton of GDCC Ths pape bngs togethe components of the GDCC n ode to make t a useful tool fo studyng the elatonshps between vaables. We stat wth the basc o mathematcal defnton. The method wll be llustated wth the YMCA data that appeaed n the 987 atcle. The data came as the anks ( won-lost ecod vesus Spotsmanshp) of 6 teams of 4 th and 5 th gades n a YMCA basketball league. The Spotsmanshp ank was a summay tally of a weekly evaluaton by team coaches and offcals. The data s epoduced hee n Table. TABLE : YMCA team ankng n Spotsmanshp and won-lost ecod p 4 6 2 2 3 7 9 3 8 5 6 4 5 2 3 4 5 6 7 8 9 2 3 4 5 6 The Won-Lost ank data s the lowe ow and can be consdeed the ndependent vaable whle the uppe ow s the Spotsmanshp ank and s taken as the dependent vaable. By vewng the gaph of ths data ( see Gaph) one mmedately notces that (4,2) and (3,5) ae unusual data ponts. We wte the data n the fom {, p },,2,3,..., 6 ; that s, numecally odeed by the won-lost ankngs. The geneal defnton of the GDCC, wth sample sze n, s max n I( n + p > ) max n 2 n I( p > ) ()
2 whee the denomnato s the geatest ntege functon evaluated at n/2, and I s the ndcato functon. The ndcato functon s s the expesson s tue and f false. Fo small sample szes ths s easy to compute by hand, and the layout s shown fo the YMCA data n Table 2. Thee columns ae necessay, {, p, n + p} whee the fst column of the ognal data must be n numecal ode. Fom these columns the elements n the summatons n the fomula above ae computed. Ths computatonal opeaton follows: Column conssts of values of,2,3 6 and column 2 s the coespondng values of p (Spotsmanshp ank). In ode to compute I( p > ), labeled I-sum-R n column 3, a smple pocedue s used. Fo ow ( ) smply count how many p-anks (the p column) thee ae n ows and above that ae geate than. Snce p 4, thee s only one, namely 4 tself, so the 3 d column enty s. Fo ow 2 ( 2), p 2 and both 4 and (p and p 2 ) exceed 2 and hence the coespondng column 3 value s 2. Contnue down the column. One last example: fo ow 7 ( 7) we examne the numbe of entes fom {p,p 2,...,p 7 } {4,,6,2,3} whch exceed 7. Thee ae fve. In the exact same manne, columns and 4 (the evese column) ae used to calculate the values n column 5 whose fomula s the left hand sde of fomula (). The maxmums of columns 3 and 5, 6 and 3, ae convenently put on the bottom of columns 3 and 5. Then max( I sum L) max( I sum R) 3 6 3. 6 8 8 2 TABLE 2: Calculaton of the Geatest Devaton Coelaton Coeffcent column column 2 column 3 column 4 column 5 p I-sum-R n+-p I-sum-L 4 3 2 2 6 2 3 6 3 4 2 3 5 2 5 2 4 5 2 6 3 5 4 7 7 5 2 8 9 6 8 2 9 6 7 2 3 5 4 2 8 4 9 2 2 3 6 3 3 5 3 2 3 4 6 2 2 5 4 3 6 5 2 maxmum 6 3
3 2. The Geometcal Defnton of GDCC We now make the tanston fom the fomula calculaton of to a gaphcal countng technque. The scatteplot of the YMCA data {, p },,2,3,..., 6 s gven n fgue. The lnes wth slopes ± have been added to facltate the countng. Fst, the patal sum I( p > ) s meely the count among the fst p -anks of those that ae stctly geate than ;.e., the data ponts n the scatteplot stctly above and to the left of the pont (,) on the + dagonal. Ponts on the ght hand sde of the defnng ectangle ae ncluded n the count; thus, the countng egon s closed on the ght but open on the bottom. A maxmum of 6 s acheved at 9, and the bodes of ths ectangula egon ae shown on the gaph. Note that the pont (8,9) on the lowe s not counted because of the stct nequalty wheeas the pont (9,) s counted. Thus, thee ae 6 ponts fo the maxmum. If (8,8) s used to defne a countng egon, the maxmum of 6 s also acheved. Second, the patal sum I( n + p > ) I( p < n + ) s the count among the fst p -anks of those that ae stctly less than n+- 7-;.e., those ponts that ae on the vetcal o to the left and stctly below the pont (, 7-) on the dagonal. A maxmum of 3 s acheved at 2 wth the egon contanng the 3 ponts shown n the fgue. The maxmum s also acheved at 3, as s easly seen n the fgue. It s clea that the two pats of the numeato of can be calculated by tavesng the ± dagonals wth movng ectangles and detemnng whch ectangula aeas contan the maxmal numbe of data ponts. The numbe of ponts n each ectangula egon,2,3,... 6 wll coespond exactly to the numbes n the I-sum-R and I- sum-l columns of Table 2. Geometcally, t may appea that some nfomaton s lost o a dffeent value of could be obtaned by tavesng the ± dagonals on the othe sdes. The next secton elates ths queston to the geneal popetes of of ths secton was used n obtanng the asymptotc dstbuton of 3. Geometcal Unquenes of. The geometcal featue. Refeence? The nfomaton n the scatteplot should eman the same f the gaph s otated 9 and s calculated ove on ths new oentaton. When the gaph s otated 9, let the axes be elabeled n the usual left to ght and vetcal manne and ecompute.
4 Ths 9 otaton wll be done thee consecutve tmes. A fouth otaton bngs back the ognal gaph. Let the notaton (, + ) denote the elatonshp of the countng ectangles to the ognal oentaton of the axes. Thus, (, + ) means lowe ectangle fo the dagonal and uppe ectangle fo + dagonal, the scheme n fgue. We now explan what happens to the data unde thee consecutve 9 otatons; the oentaton of the countng ectangles elatve to the ognal fgue, and the new value of. In Table 3 the fst ow s the data afte a otaton, the ow the oentaton of the countng ectangles elatve to the ognal gaph, the thd ow s the value of, and the fouth ow s what s effectvely beng calculated as elated to the ognal data, x, won lost anks, and y p, the anks of spotsmanshp. Table 3:Rotatonal Effects on Fgue ognal fst otaton second otaton thd otaton data {,p } {7-p,} {7-,7-p } {p,7-} dag sdes (, + ) (, + ) (, + ) (, + ) value -3/8 3/8-3/8 3/8 (.,.) (x,y) (x,-y) (-x,-y) (-x,y) Not only does mantan the same absolute value, but the countng ectangles whle tavesng the ± dagonals gve exactly the same sequences of numbes. Thus, the I- sum-r and I-sum-L columns of the otatons ae exactly the same as fo the ognal data. Howeve, they may be n evese ode. Ths connects the geomety to popety 4 of Secton 3 and fact (d) on page 654 of the [987] pape. The eade s nvted to pefom these calculatons by otatng the fgue, elabelng the axes and ecomputng. 4. Populaton Defnton of GDCC Ths secton epeats the populaton ntepetaton of n the 987 pape and ntepets t elatve to the geometcal deas of the sample defnton. Let ( X, Y ) be a contnuous bvaate andom vaable wth magnal dstbuton functons F and G, espectvely. Let ( U, V ) ( F ( X ), G( Y )), and defne the Copula functon, Nelson (999), P( U t, V < t) C( t, t) P( U t, V > t) C( t,) C( t, t). A gaph of the densty of ( U, V ) s gven n whch the bvaate nomal fom whch t came has means and standad devatons of 2 and 32, and covaance 3. The populaton value of s 2sup C( t, t) 2sup[ C( t, ) C( t, t)]. (2) < t< < t<
5 The ght hand sde of the populaton value of coesponds to the ght hand sde of the statstc, max I ( p > ). The left hand sde sup C( t, t) coesponds to max I ( n + p > ). The dagonal lnes n the ( U, V ) plane, slopes ±, ae used n exactly the same manne to move ectangles along and detemne the ectangle wth the lagest volume unde the densty of ( U, V ). Rathe than count ponts the populaton value seeks the ectangles wth maxmum volume unde the densty of ( U, V ). Fo the specfc bvaate nomal fo the fgue, the coelaton coeffcent s ρ 3 ( 2 32) 3 8. In fomula (2) the supemums, as explaned n the 987 pape occu at t/2 and C (, ) + sn 2 2 4 2π 3.25 +.679.379, and 8 3 C(,) C(, ) ( + sn ).25.679.8882. 2 2 2 2 4 2π 8 Thus 2(.379)-2(.8882).62235.37764.2447. Fo the bvaate Nomal, the populaton value of s smplfed to 2 sn π 2 3 ρ. So, dectly, sn. 2447. π 8 The volumes.38 and.888 ae the volumes of the ectangla egons along the dagonals emanatng fom the pont (/2,/2) unde the (U,V) densty. In the fgue of the (U,V) densty the volume.32 s the egon unde the ased pat of the densty, nea the(,) pont. The.888 quantty came fom the volume unde that pat of the densty doppng off towad zeo, nea the pont (,). 5. The Null Dstbuton of The GDCC The null dstbuton fo n the 987 pape was gven exactly up to sample sze n. Table 5 ncludes ths ognal table and expands to lst the pobablty dstbuton up to sample sze n 5. The symmety of the Null Dstbuton s used so that only pobabltes of postve outcomes and zeo ae gven. A ctcal value table was gven n the ognal pape fo hypothess testng fo alpha values of.,.5, and.. An ntepolaton tem was gven so that a andomzed test could be used to obtan almost exact alpha levels. Wth the exact dstbutons fo n to 5, the smulaton estmates can be eplaced by moe exact values. Fo example f n 5 and a two-sded test of ndependence s desed, then fom the ognal pape, to obtan a two-sded test wth α. 5, eect H o : ndependence, f 4 7 and eect wth pobablty.399 f 3 7. The old pobablty was.3664. Thus,
6 ( eect tue) P( 4 7) + (.399) P( 3 7) P H o.6474 + (.399).8839.5 Table 4: Null Dstbuton: Geatest Devaton Coelaton Coeffcent fequency of outcomes, symmetc about zeo sample sze outcome n3 4 5 6 7 8 9 4 6 6 256 2848 6 6372 4624 475496 /5 562932 666468 ¼ 772 2366 /3 96 5 2/5 4792 528736 ½ 3 5 248 8992 3/5 36672 73228 2/3 35 595 ¾ 399 6927 4/5 4623 879 TOTAL 3! 4! 5! 6! 7! 8! 9!!!
7 sample sze outcome n 2 3 8323892 449824256 /6 4629788 599388 /3 3223332 633876372 ½ 946768 2633376 2/3 727632 28456272 5/6 53823 94863 TOTAL 2! 3! sample sze outcome n 4 5 3683667456 335549456 /7 73656736 2692784448 2/7 364344 585248588 3/7 274856 57739685652 4/7 758459648 9645298832 5/7 2688 4989392 6/7 627263 967359 TOTAL 4! 5! Ths s ust beng saved fo now: Column s headed by, the won-lost ankng, Column 2 by p, the Spotmanshp anks, and column 4 by n+-p, called the evese anks. The computatons ae shown n columns 3 and 5 and ae the tems I( p > ), labeled I-sum-R, and the tems I( n + p > ), labeled I-sum-L, espectvely. The hand, eye, ban algothm fo ceatng column 3 fom columns and 2 s exactly the same fo column 5 usng columns and 4. Put fnges unde the of column and unde the coespondng postons on columns 3 and 5 and count the y anks above that ae stctly geate than, and ente the esult. Move down o ncease and epeat. Wth ust a lttle pactce, t s easy to pefom ths opeaton wthout eally thnkng of these fomulae. the maxmums used fo GDCC ae gven at the bottom, 6 and 3 fo ths data. Thus -3/8.