PROBABILITY AND STATISTICS - Vol. III - Correlation Analysis - V. Nollau (Encyclopedia of Life Support Systems (EOLSS))

CORRELATION ANALYSIS

V. Nollau
Institute of Mathematical Stochastics, Technical University of Dresden, Germany

Keywords: Random vector, multivariate normal distribution, simple correlation, partial correlation, multiple correlation, canonical correlation.

Contents

1. Correlation Between Two Random Variables (Simple Correlation)
2. Partial Correlation
3. Multiple Correlation
4. Canonical Correlation
Acknowledgements
Glossary
Bibliography
Biographical Sketch

Summary

Correlation analysis is one of the most important aspects of multivariate statistical theory. Based on the different definitions of correlation coefficients (ordinary, partial, multiple and canonical), which (generally) measure the linear association between random variables or groups of random variables, a statistical analysis makes it possible to explore the joint behavior of the variables and to determine the effect of each of these variables in the presence of the others.

1. Correlation Between Two Random Variables (Simple Correlation)

Let $X = (X_1, X_2)^T$ be a 2-dimensional random vector with the expectation $EX = \mu = (\mu_1, \mu_2)^T$ (that means $EX_i = \mu_i$, $i = 1, 2$) and the covariance matrix

\[ \Gamma_X = \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}. \]

Then the (simple or ordinary) correlation coefficient of $X_1$ and $X_2$ is defined by

\[ \rho = \rho_{X_1, X_2} = \frac{\operatorname{cov}(X_1, X_2)}{\sqrt{\operatorname{var}(X_1)\,\operatorname{var}(X_2)}}. \tag{1} \]
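Here the covariance and the variances are given by

\[ \operatorname{cov}(X_1, X_2) = E\big[(X_1 - EX_1)(X_2 - EX_2)\big] \tag{2} \]

and

\[ \operatorname{var}(X_i) = E(X_i - EX_i)^2 > 0, \quad i = 1, 2. \tag{3} \]

This correlation coefficient is a quantitative measure of the (linear) association, called correlation, between the random variables $X_1$ and $X_2$. It has the following properties:

1. $-1 \le \rho \le 1$. The cases $\rho > 0$, $\rho < 0$ and $|\rho| = 1$ are called positive, negative and maximal correlation, respectively.
2. $|\rho| = 1$ (maximal correlation) if and only if there exist real constants $a_1$, $a_2$, $b$ with $(a_1, a_2) \ne (0, 0)$ such that $a_1 X_1 + a_2 X_2 + b = 0$ holds with probability one.
3. If one introduces new random variables $Y_1$ and $Y_2$ by $Y_1 = aX_1 + b$ ($a > 0$, $b$ real) and $Y_2 = cX_2 + d$ ($c > 0$, $d$ real), then the correlation coefficient between $Y_1$ and $Y_2$ is the same as the correlation coefficient between $X_1$ and $X_2$: $\rho_{Y_1, Y_2} = \rho_{X_1, X_2}$. (This property especially shows that the correlation coefficient is a quantitative measure of the linear association between two random variables.)

More generally, a random $d$-dimensional vector $X = (X_1, \dots, X_d)^T$ has the covariance matrix

\[ \Gamma_X = \Sigma = (\sigma_{jk})_{j = 1, \dots, d;\; k = 1, \dots, d} \tag{4} \]

with

\[ \sigma_{jk} = \operatorname{cov}(X_j, X_k), \quad \operatorname{var}(X_j) = \sigma_{jj} > 0 \quad (j, k = 1, \dots, d). \tag{5} \]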
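The definitions (1) to (3) and the properties listed above can be checked numerically. The following is a minimal sketch, assuming a random vector that takes five points with equal probability; the points, the probabilities and the function names are illustrative assumptions, not part of the original text.

```python
import math

def expectation(values, probs):
    """Expectation of a discrete random variable."""
    return sum(p * v for v, p in zip(values, probs))

def correlation(points, probs):
    """Correlation coefficient (1) of a discrete random vector (X1, X2)."""
    x1 = [a for a, _ in points]
    x2 = [b for _, b in points]
    m1 = expectation(x1, probs)
    m2 = expectation(x2, probs)
    # Covariance as in Eq. (2), variances as in Eq. (3)
    cov = expectation([(a - m1) * (b - m2) for a, b in points], probs)
    var1 = expectation([(a - m1) ** 2 for a in x1], probs)
    var2 = expectation([(b - m2) ** 2 for b in x2], probs)
    return cov / math.sqrt(var1 * var2)

# Hypothetical distribution: five equally likely points
points = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
probs = [0.2] * 5

rho = correlation(points, probs)

# Property 3: Y1 = 3*X1 + 7 and Y2 = 0.5*X2 - 2 leave rho unchanged
rho_scaled = correlation([(3.0 * a + 7.0, 0.5 * b - 2.0) for a, b in points], probs)

# Property 2: an exact linear relation 2*X1 + X2 - 1 = 0 gives |rho| = 1
rho_line = correlation([(a, -2.0 * a + 1.0) for a, _ in points], probs)

print(rho, rho_scaled, rho_line)
```

For this distribution the coefficient is 0.8 up to rounding, the positively rescaled variables give the same value, and the exact linear relation with negative slope gives the value -1.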
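For a $d$-dimensional random vector $X$ with covariance matrix $\Sigma = (\sigma_{jk})$,

\[ \rho_{jk} = \rho_{X_j, X_k} = \frac{\operatorname{cov}(X_j, X_k)}{\sqrt{\operatorname{var}(X_j)\,\operatorname{var}(X_k)}} = \frac{\sigma_{jk}}{\sqrt{\sigma_{jj}\,\sigma_{kk}}} \tag{6} \]

is the correlation coefficient between two components of $X$, say, $X_j$ and $X_k$.

Given a (mathematical) sample $\mathbf{X}_1, \dots, \mathbf{X}_n$ with $\mathbf{X}_i = (X_{1i}, X_{2i})^T$, $i = 1, \dots, n$ (independent observations of $X$), the correlation coefficient $\rho = \rho_{X_1, X_2}$ is estimated by the (ordinary) sample correlation coefficient

\[ \hat\rho = \frac{\sum_{i=1}^{n} (X_{1i} - \bar X_1)(X_{2i} - \bar X_2)}{\sqrt{\sum_{i=1}^{n} (X_{1i} - \bar X_1)^2 \, \sum_{i=1}^{n} (X_{2i} - \bar X_2)^2}} \]

with

\[ \bar X_1 = \frac{1}{n} \sum_{i=1}^{n} X_{1i} \quad \text{and} \quad \bar X_2 = \frac{1}{n} \sum_{i=1}^{n} X_{2i}. \]

If $X = (X_1, X_2)^T$ is normally distributed with the expectation $\mu$ and the covariance matrix

\[ \Gamma_X = \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}, \tag{7} \]

then the density of $X$ is

\[ f_X(x) = \frac{1}{2\pi \sqrt{\det \Sigma}} \, \exp\!\Big( -\tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \Big), \quad x = (x_1, x_2)^T, \; -\infty < x_1, x_2 < \infty. \tag{8} \]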
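The matrix form (8) of the density can be evaluated directly in the 2-dimensional case. The following is a minimal sketch, assuming a positive definite covariance matrix and using the explicit inverse of a 2 x 2 matrix; the function name and the argument layout are illustrative assumptions.

```python
import math

def bivariate_normal_density(x, mu, sigma):
    """Evaluate the density (8) for d = 2.

    x, mu: pairs (x1, x2) and (mu1, mu2)
    sigma: covariance matrix ((s11, s12), (s21, s22))
    """
    (s11, s12), (s21, s22) = sigma
    det = s11 * s22 - s12 * s21
    if s11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1 = x[0] - mu[0]
    d2 = x[1] - mu[1]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu) via the explicit 2x2 inverse
    q = (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))

# Standard case: zero mean, identity covariance, evaluated at the mean
peak = bivariate_normal_density((0.0, 0.0), (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
print(peak)
```

For a diagonal covariance matrix the value returned is the product of the two one-dimensional normal densities, which gives a simple plausibility check of the implementation.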
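Written out in components, the density of the normally distributed random vector $X = (X_1, X_2)^T$ has the following form:

\[ f_X(x_1, x_2) = \frac{1}{2\pi \sqrt{\sigma_{11}\sigma_{22} - \sigma_{12}^2}} \, \exp\!\left( -\frac{\sigma_{22}(x_1 - \mu_1)^2 - 2\sigma_{12}(x_1 - \mu_1)(x_2 - \mu_2) + \sigma_{11}(x_2 - \mu_2)^2}{2(\sigma_{11}\sigma_{22} - \sigma_{12}^2)} \right) \tag{9} \]

or

\[ f_X(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \, \exp\!\left( -\frac{1}{2(1 - \rho^2)} \left[ \frac{(x_1 - \mu_1)^2}{\sigma_1^2} - 2\rho\, \frac{(x_1 - \mu_1)(x_2 - \mu_2)}{\sigma_1 \sigma_2} + \frac{(x_2 - \mu_2)^2}{\sigma_2^2} \right] \right) \tag{10} \]

with

\[ \mu_i = E(X_i), \quad \sigma_i^2 = \operatorname{var}(X_i) = \sigma_{ii} > 0 \quad (i = 1, 2) \tag{11} \]

and

\[ \rho = \frac{\operatorname{cov}(X_1, X_2)}{\sqrt{\operatorname{var}(X_1)\,\operatorname{var}(X_2)}} = \frac{\sigma_{12}}{\sigma_1 \sigma_2}. \tag{12} \]

In this case

\[ \hat\mu_1 = \bar X_1 = \frac{1}{n} \sum_{i=1}^{n} X_{1i}, \tag{13} \]

\[ \hat\mu_2 = \bar X_2 = \frac{1}{n} \sum_{i=1}^{n} X_{2i}, \tag{14} \]

\[ \hat\sigma_{11} = \frac{1}{n} \sum_{i=1}^{n} (X_{1i} - \bar X_1)^2, \tag{15} \]

\[ \hat\sigma_{22} = \frac{1}{n} \sum_{i=1}^{n} (X_{2i} - \bar X_2)^2 \tag{16} \]

and

\[ \hat\sigma_{12} = \frac{1}{n} \sum_{i=1}^{n} (X_{1i} - \bar X_1)(X_{2i} - \bar X_2) \tag{17} \]

are the so-called maximum likelihood estimators of $\mu_1$, $\mu_2$, $\sigma_{11}$, $\sigma_{22}$ and $\sigma_{12}$, respectively (compare Statistical Inference).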
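The estimators (13) to (17) translate directly into code. The following is a minimal sketch on a small hypothetical data set; the function names and the data are illustrative assumptions.

```python
import math

def ml_estimates(sample):
    """Maximum likelihood estimates (13)-(17) for a list of (x1, x2) pairs.

    Returns (mu1, mu2, s11, s22, s12); all sums are divided by n, not n - 1.
    """
    n = len(sample)
    mu1 = sum(x1 for x1, _ in sample) / n
    mu2 = sum(x2 for _, x2 in sample) / n
    s11 = sum((x1 - mu1) ** 2 for x1, _ in sample) / n
    s22 = sum((x2 - mu2) ** 2 for _, x2 in sample) / n
    s12 = sum((x1 - mu1) * (x2 - mu2) for x1, x2 in sample) / n
    return mu1, mu2, s11, s22, s12

def sample_correlation(sample):
    """Sample correlation coefficient computed from the ML estimates."""
    _, _, s11, s22, s12 = ml_estimates(sample)
    return s12 / math.sqrt(s11 * s22)

# Hypothetical observations of (X1, X2)
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
mu1, mu2, s11, s22, s12 = ml_estimates(data)
r = sample_correlation(data)
print(mu1, mu2, s11, s22, s12, r)
```

Because the factor $1/n$ cancels in the ratio $\hat\sigma_{12} / \sqrt{\hat\sigma_{11}\hat\sigma_{22}}$, the result coincides with the (ordinary) sample correlation coefficient, whichever normalization of the sums is used.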
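That these estimators are maximum likelihood estimators means that

\[ L(\mathbf{X}_1, \dots, \mathbf{X}_n; \hat\mu_1, \hat\mu_2, \hat\sigma_{11}, \hat\sigma_{22}, \hat\sigma_{12}) = \max_{(\mu_1, \mu_2, \sigma_{11}, \sigma_{22}, \sigma_{12})} L(\mathbf{X}_1, \dots, \mathbf{X}_n; \mu_1, \mu_2, \sigma_{11}, \sigma_{22}, \sigma_{12}) \tag{18} \]

where the maximum is taken over all admissible parameter values, with the likelihood function $L$:

\[ L(\mathbf{X}_1, \dots, \mathbf{X}_n; \mu_1, \mu_2, \sigma_{11}, \sigma_{22}, \sigma_{12}) = \prod_{i=1}^{n} f_X(X_{1i}, X_{2i}) = \frac{1}{(2\pi)^n (\sigma_{11}\sigma_{22} - \sigma_{12}^2)^{n/2}} \, \exp\!\left( -\frac{\sum_{i=1}^{n} \big[ \sigma_{22}(X_{1i} - \mu_1)^2 - 2\sigma_{12}(X_{1i} - \mu_1)(X_{2i} - \mu_2) + \sigma_{11}(X_{2i} - \mu_2)^2 \big]}{2(\sigma_{11}\sigma_{22} - \sigma_{12}^2)} \right). \tag{19} \]

Regarded as a function of $x_1, \dots, x_n$, $L(x_1, \dots, x_n; \mu_1, \mu_2, \sigma_{11}, \sigma_{22}, \sigma_{12})$ is the density function of the $2n$-dimensional random vector $(\mathbf{X}_1^T, \mathbf{X}_2^T, \dots, \mathbf{X}_n^T)^T$.

Furthermore, it holds

\[ E\hat\mu = \begin{pmatrix} E\hat\mu_1 \\ E\hat\mu_2 \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mu, \tag{20} \]

\[ \Gamma_{\hat\mu} = E(\hat\mu - \mu)(\hat\mu - \mu)^T = \begin{pmatrix} \operatorname{var}(\hat\mu_1) & \operatorname{cov}(\hat\mu_1, \hat\mu_2) \\ \operatorname{cov}(\hat\mu_1, \hat\mu_2) & \operatorname{var}(\hat\mu_2) \end{pmatrix}. \tag{21} \]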
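Explicitly, the covariance matrix of the estimator $\hat\mu = (\hat\mu_1, \hat\mu_2)^T$ is

\[ \Gamma_{\hat\mu} = \frac{1}{n} \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}, \]

and the sample covariance matrix

\[ \hat\Gamma = \hat\Sigma = \begin{pmatrix} \hat\sigma_{11} & \hat\sigma_{12} \\ \hat\sigma_{21} & \hat\sigma_{22} \end{pmatrix} \]

has the (probability) density

\[ f_{\hat\Gamma}(s_{11}, s_{12}, s_{22}) = \begin{cases} \dfrac{n^{\,n-1} (s_{11}s_{22} - s_{12}^2)^{\frac{n-4}{2}}}{4\pi\, \Gamma(n-2)\, (\sigma_{11}\sigma_{22} - \sigma_{12}^2)^{\frac{n-1}{2}}} \, \exp\!\left( -\dfrac{n(\sigma_{22}s_{11} - 2\sigma_{12}s_{12} + \sigma_{11}s_{22})}{2(\sigma_{11}\sigma_{22} - \sigma_{12}^2)} \right) & \text{if } s_{11} > 0,\ s_{22} > 0 \text{ and } s_{12}^2 < s_{11}s_{22}, \\[2ex] 0 & \text{elsewhere} \end{cases} \tag{22} \]

with the Gamma function $\Gamma$:

\[ \Gamma(p) = \int_0^\infty t^{p-1} e^{-t} \, dt \quad (p > 0). \]

This implies the (probability) density $f_{\hat\rho}$ of the sample correlation coefficient $\hat\rho$:

\[ f_{\hat\rho}(r) = \begin{cases} \dfrac{n-2}{\pi} (1 - \rho^2)^{\frac{n-1}{2}} (1 - r^2)^{\frac{n-4}{2}} \displaystyle\int_0^1 \frac{x^{n-2}}{(1 - \rho r x)^{n-1} \sqrt{1 - x^2}} \, dx & \text{if } |r| < 1, \\[2ex] 0 & \text{elsewhere.} \end{cases} \tag{23} \]

In particular, for $\rho = 0$ the density reduces to

\[ f_{\hat\rho}(r) = \begin{cases} \dfrac{\Gamma\!\big(\frac{n-1}{2}\big)}{\sqrt{\pi}\, \Gamma\!\big(\frac{n-2}{2}\big)} (1 - r^2)^{\frac{n-4}{2}} & \text{if } |r| < 1, \\[2ex] 0 & \text{elsewhere,} \end{cases} \tag{24} \]

and the sample function (statistic)

\[ T = \sqrt{n-2}\; \frac{\hat\rho}{\sqrt{1 - \hat\rho^2}} \tag{25} \]

is $t$-distributed with $n - 2$ degrees of freedom.

Thus to test the hypothesis $H_0: \rho = 0$ (versus the alternative $H_A: \rho \ne 0$) one uses the statistic (25). The problem is somewhat more difficult if one wishes to test the hypothesis $H_0: \rho = \rho_0$, where $\rho_0$ ($-1 < \rho_0 < 1$) is specified, against a two-sided alternative (hypothesis).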
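This alternative is $H_A: \rho \ne \rho_0$. (That means, under $H_0$ the correlation coefficient is assumed to be equal to a given value $\rho_0$.) In this case R.A. Fisher (1921) (cf. Nollau, V. and Srivastava, M.S. and Carter, E.M.) suggested a transformation (Fisher's Z-transformation, cf. Eq. (74)):

\[ Z = \frac{1}{2} \ln \frac{1 + \hat\rho}{1 - \hat\rho} \tag{26} \]

with

\[ EZ \approx \frac{1}{2} \ln \frac{1 + \rho}{1 - \rho} \quad \text{and} \quad \operatorname{var}(Z) \approx \frac{1}{n - 3}. \tag{27} \]

With

\[ \zeta = \frac{1}{2} \ln \frac{1 + \rho}{1 - \rho} \quad (-1 < \rho < 1) \tag{28} \]

Fisher's Z-transformation has asymptotically a normal distribution $N\big(\zeta, \frac{1}{n-3}\big)$, if the sample size $n$ tends to infinity. Hence, under the hypothesis $H_0: \rho = \rho_0$ the test statistic

\[ (Z - \zeta_0)\sqrt{n - 3} \tag{29} \]

with

\[ Z = \frac{1}{2} \ln \frac{1 + \hat\rho}{1 - \hat\rho} \quad \text{and} \quad \zeta_0 = \frac{1}{2} \ln \frac{1 + \rho_0}{1 - \rho_0} \quad \text{(cf. eq. (27))} \tag{30} \]

is asymptotically standardized normally distributed. The asymptotic distribution of $Z$ also implies that an asymptotic confidence interval for $\rho$ is given by

\[ P\left( \tanh\!\Big( Z - \frac{z_{1-\alpha/2}}{\sqrt{n-3}} \Big) < \rho < \tanh\!\Big( Z + \frac{z_{1-\alpha/2}}{\sqrt{n-3}} \Big) \right) \approx 1 - \alpha \tag{31} \]

for a given confidence level $1 - \alpha$ ($0 < \alpha < 1$), where $z_{1-\alpha/2}$ denotes the $(1 - \alpha/2)$-quantile of the standardized normal distribution.

Moreover, an asymptotic test for comparing the correlation coefficients $\rho_1$ and $\rho_2$ of two normally distributed random vectors $X$ and $Y$ can also be constructed by Fisher's transformation.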
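The one-sample procedures described so far, namely the statistic (25), the test statistic (29) and the confidence interval (31), can be sketched in Python as follows. This is a minimal sketch using only the standard library; the function names, the default level `alpha=0.05` and the example values are illustrative assumptions. The rejection rule in `z_test` (reject when the absolute value of the statistic exceeds the normal quantile $z_{1-\alpha/2}$) is the usual two-sided rule for an asymptotically standard normal statistic and is assumed here.

```python
import math
from statistics import NormalDist

def t_statistic(r, n):
    """Statistic (25) for H0: rho = 0, computed from a sample correlation r."""
    return math.sqrt(n - 2) * r / math.sqrt(1.0 - r * r)

def fisher_z(r):
    """Fisher's Z-transformation (26); mathematically equal to atanh(r)."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

def z_test(r, n, rho0, alpha=0.05):
    """Test statistic (29) for H0: rho = rho0 and a two-sided decision."""
    stat = (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return stat, abs(stat) > critical

def confidence_interval(r, n, alpha=0.05):
    """Asymptotic confidence interval (31) for rho at level 1 - alpha."""
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) / math.sqrt(n - 3)
    z = fisher_z(r)
    return math.tanh(z - half_width), math.tanh(z + half_width)

# Hypothetical example: sample correlation 0.8 from n = 28 observations
stat, reject = z_test(0.8, 28, rho0=0.5)
lower, upper = confidence_interval(0.8, 28)
print(t_statistic(0.8, 28), stat, reject, lower, upper)
```

The value of `t_statistic` has to be compared with a quantile of the t-distribution with $n - 2$ degrees of freedom, which the Python standard library does not provide; a table or a statistics package is needed for that step.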
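For this comparison, let

\[ \mathbf{X}_1 = \begin{pmatrix} X_{11} \\ X_{21} \end{pmatrix}, \; \mathbf{X}_2 = \begin{pmatrix} X_{12} \\ X_{22} \end{pmatrix}, \dots, \mathbf{X}_{n_1} = \begin{pmatrix} X_{1n_1} \\ X_{2n_1} \end{pmatrix} \quad (n_1 \ge 4) \]

and

\[ \mathbf{Y}_1 = \begin{pmatrix} Y_{11} \\ Y_{21} \end{pmatrix}, \; \mathbf{Y}_2 = \begin{pmatrix} Y_{12} \\ Y_{22} \end{pmatrix}, \dots, \mathbf{Y}_{n_2} = \begin{pmatrix} Y_{1n_2} \\ Y_{2n_2} \end{pmatrix} \quad (n_2 \ge 4) \tag{32} \]

be independent random samples from two two-dimensional normal populations $N(\mu^{(1)}, \Sigma_1)$ and $N(\mu^{(2)}, \Sigma_2)$, with the expectation vectors

\[ E\mathbf{X}_i = \mu^{(1)} \; (i = 1, \dots, n_1), \quad E\mathbf{Y}_i = \mu^{(2)} \; (i = 1, \dots, n_2), \]

the covariance matrices

\[ \Gamma_{\mathbf{X}_i} = \Sigma_1 = \begin{pmatrix} \sigma_{X_1}^2 & \rho_1 \sigma_{X_1} \sigma_{X_2} \\ \rho_1 \sigma_{X_1} \sigma_{X_2} & \sigma_{X_2}^2 \end{pmatrix} \; (i = 1, \dots, n_1), \quad \Gamma_{\mathbf{Y}_i} = \Sigma_2 = \begin{pmatrix} \sigma_{Y_1}^2 & \rho_2 \sigma_{Y_1} \sigma_{Y_2} \\ \rho_2 \sigma_{Y_1} \sigma_{Y_2} & \sigma_{Y_2}^2 \end{pmatrix} \; (i = 1, \dots, n_2) \]

and the correlation coefficients

\[ \rho_{X_{1i}, X_{2i}} = \rho_1 \; (i = 1, \dots, n_1), \quad \rho_{Y_{1i}, Y_{2i}} = \rho_2 \; (i = 1, \dots, n_2). \]

Under the hypothesis $H_0: \rho_1 = \rho_2$ (the correlation coefficients of both populations are equal) the (test) statistic is

\[ T = \frac{Z_1 - Z_2}{\sqrt{\dfrac{1}{n_1 - 3} + \dfrac{1}{n_2 - 3}}}. \tag{33} \]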
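The statistic (33) can be computed from the two sample correlation coefficients and the two sample sizes. The following is a minimal sketch, assuming that $Z_1$ and $Z_2$ are the Fisher Z-transformations (26) of the two sample correlation coefficients and that $T$ is treated as approximately standard normal under $H_0$; the function name, the default level and the example values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def compare_correlations(r1, n1, r2, n2, alpha=0.05):
    """Two-sample statistic (33) and a two-sided decision at level alpha.

    r1, r2: sample correlation coefficients of the two samples
    n1, n2: sample sizes (both at least 4, so that n - 3 is positive)
    """
    if n1 < 4 or n2 < 4:
        raise ValueError("both sample sizes must be at least 4")
    z1 = math.atanh(r1)  # Fisher's Z-transformation of r1
    z2 = math.atanh(r2)  # Fisher's Z-transformation of r2
    t = (z1 - z2) / math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return t, abs(t) > critical

# Hypothetical example: two samples of 53 observations each
t, reject = compare_correlations(0.8, 53, 0.5, 53)
print(t, reject)
```

With equal sample correlation coefficients the statistic is exactly zero and the hypothesis is never rejected, which is a quick sanity check for an implementation.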
In the test statistic $T$ of (33), the quantities are

\[ Z_1 = \frac{1}{2} \ln \frac{1 + \hat\rho_1}{1 - \hat\rho_1} \quad \text{and} \quad Z_2 = \frac{1}{2} \ln \frac{1 + \hat\rho_2}{1 - \hat\rho_2}, \tag{34} \]

\[ \hat\rho_1 = \frac{\sum_{i=1}^{n_1} (X_{1i} - \bar X_1)(X_{2i} - \bar X_2)}{\sqrt{\sum_{i=1}^{n_1} (X_{1i} - \bar X_1)^2 \, \sum_{i=1}^{n_1} (X_{2i} - \bar X_2)^2}} \quad \text{and} \quad \hat\rho_2 = \frac{\sum_{i=1}^{n_2} (Y_{1i} - \bar Y_1)(Y_{2i} - \bar Y_2)}{\sqrt{\sum_{i=1}^{n_2} (Y_{1i} - \bar Y_1)^2 \, \sum_{i=1}^{n_2} (Y_{2i} - \bar Y_2)^2}}, \tag{35} \]

\[ \bar X_j = \frac{1}{n_1} \sum_{i=1}^{n_1} X_{ji} \quad \text{and} \quad \bar Y_j = \frac{1}{n_2} \sum_{i=1}^{n_2} Y_{ji} \quad (j = 1, 2). \]

Under $H_0$ the statistic $T$ is asymptotically standardized normally distributed. Thus the hypothesis is to be rejected if, for a realization $t$ of $T$ based on concrete samples (cf. eq. (32)),

\[ |t| > z_{1-\alpha/2} \tag{36} \]

holds with respect to a given significance level $\alpha$ ($0 < \alpha < 1$).

Bibliography

Johnson N.L. and Kotz S. (1970), (1972). Distributions in Statistics (Continuous Univariate Distributions-1, 2; Continuous Multivariate Distributions). New York: John Wiley & Sons. [This has been a very important standard work for statistical research and applications for three decades].

Müller P.H. (ed.) (198?). Lexikon der Stochastik. 5. Auflage. Berlin: Akademie-Verlag. [This is a dictionary for all fields of stochastics with a comprehensive description of correlation analysis].

Muirhead R.J. (1982). Aspects of Multivariate Statistical Theory. New York: John Wiley & Sons. [This
book presents all aspects of modern multivariate statistics, especially correlation theory including canonical correlation].

Nollau V. (1979). Statistische Analysen. 2. Auflage. Basel und Stuttgart: Birkhäuser. [An important chapter of this book presents simple, partial and multiple correlation analysis].

Röhr M. (1987). Kanonische Korrelationsanalyse. Berlin: Akademie-Verlag. [This book presents a very comprehensive study of canonical correlation analysis with many applications].

Seber G.A.F. (1984). Multivariate Observations. New York: John Wiley & Sons. [This monograph deals with multivariate distributions, inference for the multivariate normal distribution, dimensional reductions and discriminant analysis, cluster analysis and MANOVA (multivariate analysis of variance and covariance)].

Srivastava M.S. and Carter E.M. (1983). An Introduction to Applied Multivariate Statistics. New York, Amsterdam, Oxford: North Holland. [This is a textbook with the main topics: multivariate techniques such as ANOVA, multivariate regression, discrimination and correlation].

Biographical Sketch

V. Nollau was born in 1941 and studied mathematics and theoretical physics at the Technical University of Dresden (Germany). He graduated in 1964 and obtained his doctorate in 1966 and his habilitation (Dr. habil.) in the early 1970s. From 1969 he was assistant professor at TU Dresden. His main research topics were operator theory, stochastic processes and random search. In the early 1970s he made his first contributions to stochastic optimization and the theory of decision processes. Since 1990 the author has been professor for stochastic analysis and control. He has written several textbooks including "Statistische Analysen" (Linear Models in Statistics). The author is dean of the faculty of mathematics in Dresden.