World Academy of Science, Engineering and Technology
International Journal of Mathematical and Computational Sciences Vol:5, No:4, 2011
Digital Open Science Index, Mathematical and Computational Sciences Vol:5, No:4, 2011, waset.org/publication/380

Approximations to the Distribution of the Sample Correlation Coefficient

John N. Haddad and Serge B. Provost

John Haddad is Associate Professor in the Department of Mathematics and Statistics at Notre Dame University - Louaize, Zouk Mosbeh, Lebanon (e-mail: johnhaddad@ndu.edu.lb). Serge Provost is Professor of Statistics in the Department of Statistical & Actuarial Sciences at The University of Western Ontario, London, Canada N6A 5B7 (corresponding author's e-mail address: provost@stats.uwo.ca).

Abstract: Given a bivariate normal sample of correlated variables, $(X_i, Y_i)$, $i=1,\dots,n$, an alternative estimator of Pearson's correlation coefficient is obtained in terms of the ranges, $|X_i-Y_i|$. An approximate confidence interval for $\rho_{XY}$ is then derived, and a simulation study reveals that the resulting coverage probabilities are in close agreement with the set confidence levels. As well, a new approximant is provided for the density function of $R$, the sample correlation coefficient. A mixture involving the proposed approximate density of $R$, denoted by $h_R(r)$, and a density function determined from a known approximation due to R. A. Fisher is shown to accurately approximate the distribution of $R$. Finally, nearly exact density approximants are obtained on adjusting $h_R(r)$ by a 7th degree polynomial.

Keywords: Sample correlation coefficient, density approximation, confidence intervals.

I. INTRODUCTION

CORRELATION between two variables is generally understood to imply a certain departure from stochastic independence. For a discussion of the concept of correlation and certain of its misinterpretations, the reader is referred to [1] and the references therein. The most common measure of correlation between the random variables $X$ and $Y$ is Pearson's product-moment correlation coefficient,

$$\rho_{XY}=\frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sqrt{E[(X-\mu_X)^2]\,E[(Y-\mu_Y)^2]}},$$

where $\mu_X=E(X)$ and $\mu_Y=E(Y)$. Given a random sample $(X_i,Y_i)$, $i=1,\dots,n$, from a bivariate normal distribution, $\rho_{XY}$ is customarily estimated by the sample correlation coefficient

$$R=\frac{\sum_{i=1}^n(X_i-\bar X)(Y_i-\bar Y)}{(n-1)\,S_XS_Y},$$

where $\bar X=\sum_{i=1}^n X_i/n$, $\bar Y=\sum_{i=1}^n Y_i/n$, $S_X^2=\sum_{i=1}^n(X_i-\bar X)^2/(n-1)$ and $S_Y^2=\sum_{i=1}^n(Y_i-\bar Y)^2/(n-1)$. R. A. Fisher obtained the following representation of the exact density function of $R$ in [2]:

$$f_R(r)=\frac{2^{n-3}}{\pi\,(n-3)!}\,(1-\rho^2)^{(n-1)/2}\,(1-r^2)^{(n-4)/2}\sum_{i=0}^{\infty}\Gamma^2\!\left(\frac{n+i-1}{2}\right)\frac{(2\rho r)^i}{i!} \tag{1}$$

for $-1<r<1$, where $\rho\equiv\rho_{XY}$. However, this series representation converges very slowly. Fisher's Z-transform is a well-known transformation of $R$ whose associated approximate normal distribution possesses some shortcomings, especially when the sample size is small and $\rho_{XY}$ is large, in which case the distribution of $R$ is markedly skewed. It was shown in [3] that the normal approximation requires large sample sizes to be valid. Moreover, as mentioned in [4], the variance of $R$ changes with the mean. In the case of bivariate normal vectors, it is known that the asymptotic variance of Fisher's Z statistic does not depend on $\rho_{XY}$. However, as was pointed out, for instance, by [5] and [6], this property does not necessarily carry over to non-normally distributed vectors.

When $X$ and $Y$ follow a bivariate normal distribution with zero means, unit variances and correlation coefficient $\rho$, and a random sample of size $n$ is available, the statistics being utilized to make inferences about the population correlation coefficient are usually expressed in terms of the products $X_iY_i$, $i=1,\dots,n$. It would appear that, as an alternative, the set of ordered pairs $(Z_{1i},Z_{2i})$, $i=1,\dots,n$, where $Z_{1i}=\mathrm{Min}(X_i,Y_i)$ and $Z_{2i}=\mathrm{Max}(X_i,Y_i)$, has yet to be fully exploited for drawing inferences about $\rho$. It is shown in Section 2 that one can indeed make inferences about $\rho$ from the ranges of the pairs $(X_i,Y_i)$ or, equivalently, the absolute values of the differences, $|X_i-Y_i|=Z_{2i}-Z_{1i}\equiv D_i$. Approximate confidence intervals for $\rho$ which are based on $(X_i-Y_i)^2$ and $(X_i+Y_i)^2$ are derived in Section 3. Section 4 proposes two approximations to the density function of $R$ which turn out to be more accurate than that determined from Fisher's Z statistic.

II. A RANGE-BASED ESTIMATOR

Assuming that $(X,Y)$ follows a bivariate normal distribution
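with zero means, unit variances and correlation coefficient $\rho$, the joint probability density function of $(X,Y)$ is given by

$$f_{XY}(x,y)=\frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{x^2-2\rho xy+y^2}{2(1-\rho^2)}\right\} \tag{2}$$

for all real values of $x$ and $y$. The joint density function of the order statistics $(Z_1,Z_2)$ is then

$$g_{Z_1Z_2}(z_1,z_2)=2!\,f_{XY}(z_1,z_2),\qquad -\infty<z_1<z_2<\infty,$$

with $Z_1=\mathrm{Min}(X,Y)$ and $Z_2=\mathrm{Max}(X,Y)$, that is,

$$g_{Z_1Z_2}(z_1,z_2)=\frac{1}{\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{z_1^2-2\rho z_1z_2+z_2^2}{2(1-\rho^2)}\right\}. \tag{3}$$

The density function of $D=Z_2-Z_1$ can be obtained as follows. Letting $D=Z_2-Z_1$ and $Z=Z_1$, one has $Z_1=Z$ and $Z_2=Z+D$.

International Scholarly and Scientific Research & Innovation 5(4) 2011

The Jacobian of this inverse transformation being 1, the joint density of $D$ and $Z$ is

$$f_{DZ}(d,z)=\frac{1}{\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{2(1-\rho)(z^2+dz)+d^2}{2(1-\rho^2)}\right\},$$

where $-\infty<z<\infty$ and $0\le d<\infty$. The marginal density of $D$ is then obtained by integrating out $Z$ as follows:

$$f_D(d)=\int_{-\infty}^{\infty}f_{DZ}(d,z)\,dz=\frac{1}{\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{d^2}{2(1-\rho^2)}\right\}\int_{-\infty}^{\infty}\exp\left\{-\frac{z^2+dz}{1+\rho}\right\}dz,$$

where

$$\int_{-\infty}^{\infty}\exp\left\{-\frac{z^2+dz}{1+\rho}\right\}dz=\exp\left\{\frac{d^2}{4(1+\rho)}\right\}\int_{-\infty}^{\infty}\exp\left\{-\frac{(z+d/2)^2}{1+\rho}\right\}dz=\sqrt{\pi(1+\rho)}\,\exp\left\{\frac{d^2}{4(1+\rho)}\right\},$$

since the integrand $\exp\{-(z+d/2)^2/(1+\rho)\}$ is proportional to a $N(-d/2,(1+\rho)/2)$ density function, $N(\mu,\theta)$ denoting a normal distribution with mean $\mu$ and variance $\theta$. As $-\frac{d^2}{2(1-\rho^2)}+\frac{d^2}{4(1+\rho)}=-\frac{d^2}{4(1-\rho)}$, the density of $D$ is therefore

$$f_D(d)=\frac{1}{\sqrt{\pi(1-\rho)}}\exp\left\{-\frac{d^2}{4(1-\rho)}\right\}\ \text{ for } 0\le d<\infty,\ \text{ and } f_D(d)=0 \text{ for } d<0. \tag{4}$$

Thus, on the basis of the ranges $d_i=|x_i-y_i|$, $i=1,\dots,n$, determined from the pairs of observations $(x_1,y_1),\dots,(x_n,y_n)$, the likelihood function is

$$L(\mathbf d;\rho)=\left(\pi(1-\rho)\right)^{-n/2}\exp\left\{-\frac{\sum_{i=1}^n d_i^2}{4(1-\rho)}\right\}, \tag{5}$$

where $\mathbf d=(d_1,\dots,d_n)$, the loglikelihood function being

$$l=\log L=-\frac n2\log\left(\pi(1-\rho)\right)-\frac{\sum_{i=1}^n d_i^2}{4(1-\rho)}.$$

On setting $dl/d\rho$ equal to zero, one has

$$\frac{n}{2(1-\rho)}-\frac{\sum_{i=1}^n d_i^2}{4(1-\rho)^2}=0,$$

and the mle of $\rho$ is given by $\hat\rho=1-\sum_{i=1}^n d_i^2/(2n)$, that is,

$$\hat\rho=1-\frac{\overline{d^2}}{2}, \tag{6}$$

where $\overline{d^2}=\sum_{i=1}^n d_i^2/n$. In order to determine the exact distribution of the corresponding estimator, one may use the fact that $\sum_{i=1}^n D_i^2=\sum_{i=1}^n(X_i-Y_i)^2$ is the sum of the squares of $n$ independent $N(0,2(1-\rho))$ random variables.

Once a random sample from a bivariate normal distribution whose means and variances are unknown is secured, the variables can be standardized by letting $X_i^*=(X_i-\bar X)/S_X$ and $Y_i^*=(Y_i-\bar Y)/S_Y$. Then, on substituting these standardized variables in Equation (6), one obtains the following representation of the estimator:

$$\begin{aligned}\hat\rho_s&=1-\frac{1}{2n}\sum_{i=1}^n\left(X_i^*-Y_i^*\right)^2=1-\frac{1}{2n}\sum_{i=1}^n\left(\frac{X_i-\bar X}{S_X}-\frac{Y_i-\bar Y}{S_Y}\right)^2\\&=1-\frac{1}{2n}\left[\frac{\sum(X_i-\bar X)^2}{S_X^2}+\frac{\sum(Y_i-\bar Y)^2}{S_Y^2}-\frac{2\sum(X_i-\bar X)(Y_i-\bar Y)}{S_XS_Y}\right]\\&=1-\frac{1}{2n}\left[(n-1)\frac{S_X^2}{S_X^2}+(n-1)\frac{S_Y^2}{S_Y^2}-2(n-1)R\right]\\&=1-\frac{n-1}{n}(1-R)=R+\frac{1-R}{n}.\end{aligned}\tag{7}$$

Consequently, as $n$ increases, $\hat\rho_s$ will tend to $R$ and share its distributional properties. Fisher showed that $E(R)\approx\rho-\rho(1-\rho^2)/(2n)$. Accordingly, an approximate expression for $E(\hat\rho_s)$ can be obtained as follows:

$$E(\hat\rho_s)=\frac1n+\frac{n-1}{n}E(R)\approx\frac1n+\frac{n-1}{n}\left(\rho-\frac{\rho(1-\rho^2)}{2n}\right)=\rho+\frac{2-3\rho+\rho^3}{2n}+\frac{\rho-\rho^3}{2n^2}.$$

Thus, $(2-3\rho+\rho^3)/(2n)+(\rho-\rho^3)/(2n^2)$ is the approximate bias associated with $\hat\rho_s$.

Note that, under the initial distributional assumptions, $(X_i-Y_i)^2/(2(1-\rho))\sim\chi^2_1$ as $X_i-Y_i\sim N(0,2(1-\rho))$. This implies that $\sum_{i=1}^n D_i^2=\sum_{i=1}^n(X_i-Y_i)^2\sim 2(1-\rho)\,\chi^2_n$.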
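As a minimal sketch of Equations (6) and (7), the Python code below computes the range-based estimate from raw pairs and its standardized version, using only the standard library. The function names and the simulation helper `bivariate_normal_sample` are illustrative assumptions, not part of the paper. The last lines check numerically the identity $\hat\rho_s=R+(1-R)/n$, which holds when $S_X$ and $S_Y$ use the divisor $n-1$, as `statistics.stdev` does.

```python
import math
import random
import statistics


def range_mle(x, y):
    """Equation (6): rho_hat = 1 - sum (x_i - y_i)^2 / (2n).

    Assumes zero means and unit variances, so that d_i = |x_i - y_i| are the ranges.
    """
    n = len(x)
    sum_d2 = math.fsum((a - b) ** 2 for a, b in zip(x, y))
    return 1.0 - sum_d2 / (2.0 * n)


def range_mle_standardized(x, y):
    """Equation (7): apply (6) to (x_i - xbar)/S_X and (y_i - ybar)/S_Y (divisor n - 1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return range_mle([(a - mx) / sx for a in x], [(b - my) / sy for b in y])


def sample_correlation(x, y):
    """Pearson's sample correlation coefficient R."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = math.fsum((a - mx) ** 2 for a in x)
    syy = math.fsum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bivariate_normal_sample(n, rho, seed=None):
    """Hypothetical helper: n pairs with zero means, unit variances and correlation rho."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x.append(z1)
        y.append(rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return x, y


x, y = bivariate_normal_sample(50, 0.5, seed=1)
n, r = len(x), sample_correlation(x, y)
# The last two numbers agree up to rounding, as stated by (7).
print(range_mle(x, y), range_mle_standardized(x, y), r + (1.0 - r) / n)
```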
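For reference when comparing the approximations, Fisher's series (1) can be evaluated numerically. The sketch below, which assumes $|\rho|<1$ and $n\ge 4$, sums the terms on the log scale with `math.lgamma` to avoid overflow and stops once a term is negligible relative to the partial sum. The stopping rule, the tolerance and the midpoint-rule check are choices made for this sketch only. When $\rho r<0$ the terms alternate in sign, so some accuracy is lost to cancellation for large $n$, a practical side of the slow convergence noted in the Introduction.

```python
import math


def fisher_exact_density(r, rho, n, rel_tol=1e-16, max_terms=100_000):
    """Fisher's series (1) for the exact density of R; assumes |rho| < 1 and n >= 4."""
    if not -1.0 < r < 1.0:
        return 0.0
    x = 2.0 * rho * r
    # log of 2^(n-3) (1-rho^2)^((n-1)/2) (1-r^2)^((n-4)/2) / (pi (n-3)!)
    log_const = ((n - 3) * math.log(2.0) - math.log(math.pi) - math.lgamma(n - 2)
                 + 0.5 * (n - 1) * math.log1p(-rho * rho)
                 + 0.5 * (n - 4) * math.log1p(-r * r))
    terms, running = [], 0.0
    for i in range(max_terms):
        # log of Gamma^2((n+i-1)/2) |2 rho r|^i / i!
        log_term = log_const + 2.0 * math.lgamma((n + i - 1) / 2.0) - math.lgamma(i + 1)
        if i > 0:
            log_term += i * math.log(abs(x))
        term = math.exp(log_term)
        if x < 0.0 and i % 2 == 1:
            term = -term  # the series alternates when rho * r < 0
        terms.append(term)
        running += term
        # term magnitudes rise and then fall, so stop once a term is negligible
        if x == 0.0 or abs(term) < rel_tol * abs(running):
            break
    return math.fsum(terms)


def integrate(f, a=-1.0, b=1.0, m=4000):
    """Composite midpoint rule on (a, b)."""
    h = (b - a) / m
    return h * math.fsum(f(a + (k + 0.5) * h) for k in range(m))


print(integrate(lambda t: fisher_exact_density(t, 0.5, 10)))
```

For $\rho=0$ only the first term survives and (1) reduces to the familiar null density of $R$; for instance, it equals $1/2$ on $(-1,1)$ when $n=4$.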
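Thus, since $E(\chi^2_n)=n$ and $\mathrm{Var}(\chi^2_n)=2n$, one has $E(\hat\rho)=1-\frac{2n(1-\rho)}{2n}=\rho$ and $\mathrm{Var}(\hat\rho)=\frac{4(1-\rho)^2\cdot 2n}{4n^2}=\frac{2(1-\rho)^2}{n}$. Observe that the variance will be larger when $\rho$ is negative. This suggests making use of an estimator that is expressed in terms of $\sum(X_i+Y_i)^2$ when $\rho$ is negative, where $Y_i$ denotes the second component of the $i$th pair of negatively correlated random variables in a sample of size $n$. This will result in a variance given in terms of $(1+\rho)^2$, where $\rho$ denotes the negative correlation coefficient. Such an estimator can be derived as follows. Let $X$ and $Y$ be negatively correlated with correlation coefficient $\rho$, $X\sim N(0,1)$ and $Y\sim N(0,1)$, and let $Y'=-Y$ and $\rho'=-\rho$; then, given a sample $(X_i,Y_i)$, $i=1,\dots,n$, of negatively correlated variables, one can form a sample $(X_i,Y_i')$ of positively correlated variables and make use of the estimator $\hat\rho'=1-\sum(X_i-Y_i')^2/(2n)$, which can be re-expressed as $1-\sum(X_i+Y_i)^2/(2n)$. Then, $\rho$ is taken to be

$$\tilde\rho=-\hat\rho'=\frac{\sum_{i=1}^n(X_i+Y_i)^2}{2n}-1.$$

Since $X_i+Y_i\sim N(0,2(1+\rho))$, it follows that $E(\tilde\rho)=\rho$ and $\mathrm{Var}(\tilde\rho)=2(1+\rho)^2/n$.

III. APPROXIMATE CONFIDENCE INTERVALS FOR ρ

If one assumes that the random vector $(X_i,Y_i)$ follows a bivariate normal distribution with zero means, equal variances $\sigma^2$ and correlation coefficient $\rho$, then, on noting that $X_i+Y_i\sim N(0,2\sigma^2(1+\rho))$, $X_i-Y_i\sim N(0,2\sigma^2(1-\rho))$, and that $X_i+Y_i$ and $X_i-Y_i$ are independently distributed (being jointly normal with covariance $\sigma^2-\sigma^2=0$) for $i=1,\dots,n$, one has

$$\frac{\sum_{i=1}^n (X_i+Y_i)^2/\left(2\sigma^2(1+\rho)\right)}{\sum_{i=1}^n (X_i-Y_i)^2/\left(2\sigma^2(1-\rho)\right)}\sim F_{n,n}. \tag{8}$$

Letting $\mathcal S=\sum_{i=1}^n(X_i+Y_i)^2$ and $D=\sum_{i=1}^n(X_i-Y_i)^2$, a $100(1-\alpha)\%$ confidence interval for $\rho$ can be determined as follows from the pivotal quantity given in the left-hand side of (8), which equals $(\mathcal S/D)(1-\rho)/(1+\rho)$. First, one has

$$\Pr\left(F_{n,n;\alpha/2}<\frac{\mathcal S}{D}\,\frac{1-\rho}{1+\rho}<F_{n,n;1-\alpha/2}\right)=1-\alpha,$$

or

$$\Pr\left(\frac{D}{\mathcal S}F_{n,n;\alpha/2}<\frac{1-\rho}{1+\rho}<\frac{D}{\mathcal S}F_{n,n;1-\alpha/2}\right)=1-\alpha.$$

Then, letting $\theta_1=(D/\mathcal S)F_{n,n;\alpha/2}$ and $\theta_2=(D/\mathcal S)F_{n,n;1-\alpha/2}$, where $\theta_1$ and $\theta_2$ are greater than zero, and noting that $\theta_1<(1-\rho)/(1+\rho)$ is equivalent to $\rho<(1-\theta_1)/(1+\theta_1)$, it follows that

$$\rho<\frac{1-(D/\mathcal S)F_{n,n;\alpha/2}}{1+(D/\mathcal S)F_{n,n;\alpha/2}}.$$

Similarly, $\theta_2>(1-\rho)/(1+\rho)$ leads to

$$\rho>\frac{1-(D/\mathcal S)F_{n,n;1-\alpha/2}}{1+(D/\mathcal S)F_{n,n;1-\alpha/2}},$$

so that

$$\Pr\left(\frac{\mathcal S-D\,F_{n,n;1-\alpha/2}}{\mathcal S+D\,F_{n,n;1-\alpha/2}}<\rho<\frac{\mathcal S-D\,F_{n,n;\alpha/2}}{\mathcal S+D\,F_{n,n;\alpha/2}}\right)=1-\alpha.$$

Thus, with $\mathcal S$ and $D$ evaluated at the observed sample,

$$\left(\frac{\mathcal S-D\,F_{n,n;1-\alpha/2}}{\mathcal S+D\,F_{n,n;1-\alpha/2}},\ \frac{\mathcal S-D\,F_{n,n;\alpha/2}}{\mathcal S+D\,F_{n,n;\alpha/2}}\right) \tag{9}$$

is a $100(1-\alpha)\%$ confidence interval for $\rho$, $F_{\ell,m;\gamma}$ denoting the $100\gamma$th percentile of an $F$ distribution having $\ell$ and $m$ degrees of freedom. A simulation study confirmed that the coverage probabilities of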
this confidence interval are consistently in close agreement with the set confidence levels. Samples of size $n=50$ were generated assuming that $\rho=0.5$. The coverage probabilities can be readily deduced from the results presented in Table I.

TABLE I
NUMBER OF TIMES ρ = 0.5 LIES OUTSIDE THE CI'S FOR n = 50

No. of CI's | α = 5% | α = 1%
10,000 | … | …
100,000 | 5048 | 1037

In practice, it is seldom the case that one will encounter a bivariate data set whose underlying distribution satisfies the assumptions initially made in Section 2. Nevertheless, in terms of the standardized variables $X_i^*$ and $Y_i^*$, one has that

$$\frac{\sum_{i=1}^n(X_i^*+Y_i^*)^2/\left(2(n-1)(1+\rho)\right)}{\sum_{i=1}^n(X_i^*-Y_i^*)^2/\left(2(n-1)(1-\rho)\right)} \tag{10}$$

is approximately distributed as an $F_{n-1,n-1}$ random variable for sufficiently large $n$. This distributional result can be justified as follows. Observe that, as $n$ gets large, $\mathrm{Var}(X_i^*)\approx 1$ and $\mathrm{Var}(Y_i^*)\approx 1$. Then, approximately, $X_i^*+Y_i^*\sim N(0,2(1+\rho))$ and $X_i^*-Y_i^*\sim N(0,2(1-\rho))$, and, their covariance $\mathrm{Var}(X_i^*)-\mathrm{Var}(Y_i^*)$ being approximately zero, $X_i^*+Y_i^*$ and $X_i^*-Y_i^*$ are nearly independently
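distributed. Thus, $\sum_{i=1}^n(X_i^*+Y_i^*)^2/(2(1+\rho))\sim\chi^2_{n-1}$ approximately, one degree of freedom being lost since the mean of $X+Y$ is estimated by $\bar X+\bar Y$. Similarly, $\sum_{i=1}^n(X_i^*-Y_i^*)^2/(2(1-\rho))\sim\chi^2_{n-1}$ approximately. Accordingly, the ratio given in (10) has approximately an $F_{n-1,n-1}$ distribution.

It follows from Equation (7) that

$$\sum_{i=1}^n(X_i^*-Y_i^*)^2=\frac{\sum(X_i-\bar X)^2}{S_X^2}+\frac{\sum(Y_i-\bar Y)^2}{S_Y^2}-\frac{2\sum(X_i-\bar X)(Y_i-\bar Y)}{S_XS_Y}=2(n-1)(1-R).$$

Similarly,

$$\sum_{i=1}^n(X_i^*+Y_i^*)^2=\frac{\sum(X_i-\bar X)^2}{S_X^2}+\frac{\sum(Y_i-\bar Y)^2}{S_Y^2}+\frac{2\sum(X_i-\bar X)(Y_i-\bar Y)}{S_XS_Y}=2(n-1)(1+R).$$

Thus, one has

$$\frac{\sum(X_i^*+Y_i^*)^2/(1+\rho)}{\sum(X_i^*-Y_i^*)^2/(1-\rho)}=\frac{(1-\rho)(1+R)}{(1+\rho)(1-R)}, \tag{11}$$

which is approximately distributed as an $F_{n-1,n-1}$ random variable. A derivation analogous to that employed for obtaining the confidence interval given in (9) leads to the following approximate confidence interval for $\rho$ at confidence level $1-\alpha$:

$$\left(\frac{1-(D/\mathcal S)F_{n-1,n-1;1-\alpha/2}}{1+(D/\mathcal S)F_{n-1,n-1;1-\alpha/2}},\ \frac{1-(D/\mathcal S)F_{n-1,n-1;\alpha/2}}{1+(D/\mathcal S)F_{n-1,n-1;\alpha/2}}\right), \tag{12}$$

where $\mathcal S=1+R$ and $D=1-R$.

In a small-scale simulation study, 10,000 and 100,000 samples of size $n=50$ were generated assuming that $\rho=0.5$. The resulting coverage probabilities can be deduced from the results included in Table II.

TABLE II
NUMBER OF TIMES ρ = 0.5 LIES OUTSIDE THE CI'S FOR n = 50

No. of CI's | α = 5% | α = 1%
10,000 | 464 | …
100,000 | … | …

IV. ALTERNATIVE DENSITY APPROXIMATIONS FOR R

Two approximations to the density function of $R$ are proposed in this section. The first one is obtained by applying the change of variable technique to the quantity specified by Equation (11). Let $u(\rho)=(1-\rho)/(1+\rho)$ and $x=u(\rho)(1+r)/(1-r)$, which, as explained in the previous section, is approximately distributed as an $F_{n-1,n-1}$ random variable. Since the probability density function of the $F_{m,m}$ distribution is

$$f_m(x)=\frac{\Gamma(m)}{\Gamma^2(m/2)}\,\frac{x^{m/2-1}}{(1+x)^{m}},\qquad x>0,$$

one has

$$f_{n-1}(x)=\frac{\Gamma(n-1)}{\Gamma^2\!\left(\frac{n-1}{2}\right)}\,\frac{x^{(n-3)/2}}{(1+x)^{n-1}}.$$

Noting that $dx/dr=2u(\rho)/(1-r)^2$, the resulting approximation to the density function of $R$ is given by

$$h_R(r)=\frac{2\,\Gamma(n-1)}{\Gamma^2\!\left(\frac{n-1}{2}\right)}\,\frac{u(\rho)^{(n-1)/2}\,(1-r^2)^{(n-3)/2}}{\left(1-r+u(\rho)(1+r)\right)^{n-1}},\qquad -1<r<1. \tag{13}$$

Alternatively, an approximate density function can be derived as follows from Fisher's Z-transform, that is, $Z=\frac12\ln[(1+R)/(1-R)]$. Let $z=\frac12\ln[(1+r)/(1-r)]$, so that $r=(e^{2z}-1)/(e^{2z}+1)$. Then

$$\frac{dr}{dz}=\frac{4e^{2z}}{(e^{2z}+1)^2}=1-r^2,\qquad 1-r+u(\rho)(1+r)=\frac{2\left(1+u(\rho)e^{2z}\right)}{e^{2z}+1},$$

and the density of $Z$ is

$$g_Z(z)=\frac{2\,\Gamma(n-1)}{\Gamma^2\!\left(\frac{n-1}{2}\right)}\,\frac{u(\rho)^{(n-1)/2}\,e^{(n-1)z}}{\left(1+u(\rho)e^{2z}\right)^{n-1}}.$$

Since $u(\rho)=(1-\rho)/(1+\rho)$, the density of $Z$ can also be expressed as follows:

$$g_Z(z)=\frac{2\,\Gamma(n-1)}{\Gamma^2\!\left(\frac{n-1}{2}\right)}\,\frac{\exp\left\{(n-1)\left[z-\frac12\ln\frac{1+\rho}{1-\rho}\right]\right\}}{\left\{1+\exp\left[2z-\ln\frac{1+\rho}{1-\rho}\right]\right\}^{n-1}},\qquad -\infty<z<\infty. \tag{14}$$

Clearly, as defined above, $Z$ is not normally distributed, which is consistent with a remark made by [7]. Nevertheless,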
we observed that $Z$ tends to a normal distribution with mean $\frac12\ln[(1+\rho)/(1-\rho)]$ and variance $1/(n-1)$. For comparison purposes, $Z=\frac12\ln[(1+R)/(1-R)]$, that is, Fisher's Z-transform applied to $R$, is known to be asymptotically distributed as a $N\left(\frac12\ln[(1+\rho)/(1-\rho)],\,1/(n-3)\right)$ random variable. Upon inversion via the change of variable technique, an approximation to the density function of $R=(e^{2Z}-1)/(e^{2Z}+1)$ can be obtained as follows. Since $z=\frac12\ln[(1+r)/(1-r)]$, $dz/dr=1/(1-r^2)$, and given that the approximate density of $Z$ is

$$f(z)=\sqrt{\frac{n-3}{2\pi}}\exp\left\{-\frac{n-3}{2}\left(z-\frac12\ln\frac{1+\rho}{1-\rho}\right)^2\right\},$$

one has the following approximate density function for $R$:

$$g(r)=\sqrt{\frac{n-3}{2\pi}}\,\frac{1}{1-r^2}\exp\left\{-\frac{n-3}{2}\left(\frac12\ln\frac{1+r}{1-r}-\frac12\ln\frac{1+\rho}{1-\rho}\right)^2\right\},\qquad -1<r<1. \tag{15}$$

Interestingly, an equal mixture of the approximate densities given in (13) and (15) provides more accurate approximations than either one of them, as (15) overestimates the variance while (13) underestimates it. This is graphically illustrated in Figures 1 and 2.

Fig. 1 Exact density of R from Equation (1) (solid line) and two approximate densities, from Equation (13) (long dashes) and Equation (15) (dashed line), for ρ = …, 0.5 and 0.9 (left to right) and two sample sizes, n = … and … (top and bottom graphs).

Fig. 2 Exact density of R from Equation (1) (dotted line) and mixture of the approximate density functions specified by (13) and (15) (dashed line), for ρ = …, 0.5 and 0.9 (left to right) and two sample sizes, n = … and … (top and bottom graphs).

Another approximation to the density of $R$ is now obtained by multiplying the proposed approximate density $h_R(r)$ by $p_d(r)$, a polynomial of degree $d$, so that the resulting density

$$hp_d(r)=h_R(r)\,p_d(r) \tag{16}$$

integrates to one and its first $d$ moments coincide with those of $R$. This approach is discussed, for instance, in [8]. Letting $p_d(r)=\sum_{j=0}^{d}\xi_jr^j$, the coefficients $\xi_j$ are determined as follows, assuming a polynomial adjustment of degree $d=7$. Let $m_i$ denote the $i$th moment of the distribution specified by $h_R(r)$ (with $m_0=1$) and $\mu_i$ the $i$th exact moment of $R$. Since the $k$th moment of $hp_7(r)$ is $\sum_{j=0}^{7}\xi_jm_{k+j}$, equating it to $\mu_k$ for $k=0,\dots,7$ yields eight linear equations in $\xi_0,\dots,\xi_7$. Accordingly, we evaluate the $8\times 8$ matrix $M$ whose $j$th row is $(m_{j-1},m_j,\dots,m_{j+6})$, $j=1,\dots,8$, as well as its inverse $M^{-1}$. We then multiply $M^{-1}$ by $(\mu_0,\dots,\mu_7)'$, the vector of exact moments of $R$, in order to determine the polynomial coefficients $(\xi_0,\dots,\xi_7)'$. The resulting approximate density, that is,

$$hp_7(r)=h_R(r)\sum_{j=0}^{7}\xi_jr^j,$$

is plotted in Figure 3 for certain values of $\rho$ and $n$. Manifestly, this approximation proves to be remarkably accurate.

Fig. 3 Exact density of R from Equation (1) (dotted line) and the approximate density from Equation (16) with d = 7 (dashed line), for ρ = …, 0.5 and 0.9 (left to right) and two sample sizes, n = … and … (top and bottom graphs).

ACKNOWLEDGMENT

The second author wishes to acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada.
REFERENCES

[1] A. M. Mathai, "The concept of correlation and misinterpretations," International Journal of Mathematical and Statistical Sciences, 1998, 7: 157-167.
[2] R. A. Fisher, "Distribution of the values of the correlation coefficient in samples from an indefinitely large population," Biometrika, 1915, 10: 507-521.
[3] A. Winterbottom, "A note on the derivation of Fisher's transformation of the correlation coefficient," The American Statistician, 1979, 33: 142-143.
[4] H. Hotelling, "New light on the correlation coefficient and its transforms," Journal of the Royal Statistical Society, Ser. B, 1953, 15: 193-232.
[5] A. K. Gayen, "The frequency distribution of the product-moment correlation coefficient in random samples of any size drawn from non-normal universes," Biometrika, 1951, 38: 219-247.
[6] D. L. Hawkins, "Using U statistics to derive the asymptotic distribution of Fisher's Z statistic," The American Statistician, 1989, 43: 235-237.
[7] S. Konishi, "An approximation to the distribution of the sample correlation coefficient," Biometrika, 1978, 65: 654-656.
[8] H.-T. Ha and S. B. Provost, "A viable alternative to resorting to statistical tables," Communications in Statistics - Simulation and Computation, 2007, 36: 1135-1151.