Construction of Composite Indices in Presence of Outliers

SK Mishra
Dept. of Economics
North-Eastern Hill University
Shillong (India)

I. Introduction: Oftentimes we require constructing composite indices by a linear combination of a number of indicator variables. If we denote the indicator variables by $X = [x_1, x_2, \ldots, x_m]$, where each $x_j$ has $n$ observations (cases), and the weights assigned to those variables by $w = [w_1, w_2, \ldots, w_m]'$, then the composite index $I = Xw$ obtains a single value for each case, or $I_i = \sum_{j=1}^{m} x_{ij} w_j;\ i = 1, 2, \ldots, n$. The weights may be determined subjectively or objectively by certain considerations extraneous to the dataset $X$, or alternatively they may be determined endogenously by the statistical information obtained from the dataset $X$ itself. Endogenous weights are frequently obtained by a statistical technique called Principal Components Analysis (PCA), which maximizes the sum of squared coefficients of (the product moment) correlation between the derived composite index and the indicator variables, $X$, or stated differently, $I = Xw$ such that $\sum_{j=1}^{m} r^2(I, x_j)$ is maximum.

In the presence of sizeable outliers in the data variables, $X$, we cannot expect the product moment correlation coefficients to remain unaffected. The outliers distort the mean, the standard deviation and the covariance structure of the indicator variables, leading to distortion in the coefficient of correlation (Hampel, 2001). It may be desirable, therefore, to devise a technique that would minimize the influence of outliers on the composite index. Our objective in this paper is to propose a new technique to construct such a composite index. We also demonstrate the effectiveness of the proposed technique by a simulation experiment.

II. The Coefficient of Correlation in the Median Family: It is well known that the median as a measure of central tendency is (normally) unaffected by the presence of outliers in the data. The median is an analogue of the (arithmetic) mean; it minimizes the sum of probability-weighted absolute deviations of data points from itself ($\min_c \left(\sum_i |x_i - c|^L p_i\right)^{1/L}$ for $L = 1$), while the arithmetic mean minimizes the probability-weighted sum of squared deviations of data points from itself (that implies $\min_c \left(\sum_i |x_i - c|^L p_i\right)^{1/L}$ for $L = 2$).
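The contrast between the two minimizers can be checked numerically. The following is a minimal sketch, assuming equal probability weights $p_i = 1/n$ and a small made-up sample; the function name `lp_loss` and the data are illustrative only and do not come from the paper.

```python
import numpy as np

def lp_loss(c, x, L):
    """Probability-weighted L-norm deviation of x from c, with p_i = 1/n (assumed)."""
    return np.mean(np.abs(x - c) ** L) ** (1.0 / L)

# Illustrative sample (hypothetical data, not from the paper).
x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])

# Brute-force search for the minimizing c on a fine grid.
grid = np.linspace(0.0, 12.0, 1201)
c_L1 = grid[np.argmin([lp_loss(c, x, 1) for c in grid])]  # should sit at the median
c_L2 = grid[np.argmin([lp_loss(c, x, 2) for c in grid])]  # should sit at the mean

# Replace the largest value by a gross outlier: the median stays put,
# while the mean is dragged towards the outlier.
x_out = x.copy()
x_out[-1] = 1100.0
median_shift = abs(np.median(x_out) - np.median(x))
mean_shift = abs(np.mean(x_out) - np.mean(x))
```

On this sample the L = 1 minimizer coincides with the median and the L = 2 minimizer with the mean, and only the mean moves when the outlier is introduced.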
Bradley (1985) showed that if $(u_i, v_i);\ i = 1, 2, \ldots, n$ are $n$ pairs of values such that the variables $u$ and $v$ have the same median $= 0$ and the same mean deviation (from the median), or $(1/n)\sum_{i=1}^{n} |u_i| = (1/n)\sum_{i=1}^{n} |v_i| = d \neq 0$, both of which conditions may be met by any pair of variables when suitably transformed, then the absolute correlation may be defined as

$$\rho(u, v) = \frac{\sum_{i=1}^{n} \left(|u_i + v_i| - |u_i - v_i|\right)}{\sum_{i=1}^{n} \left(|u_i| + |v_i|\right)}.$$

III. Construction of a Composite Index Using Bradley's Correlation: Bradley's coefficient of correlation (that belongs to the median family) is an analogue of Pearson's product moment correlation coefficient (in the family of the arithmetic mean). It appears therefore that one may construct a composite index by maximization of the sum of absolute values of Bradley's coefficient of correlation between the composite index and the indicator variables (although any other measure of correlation, e.g. Shevlyakov 1997, may also be used). This is to say that we can obtain $I_1 = Xw$ such that $\sum_{j=1}^{m} |\rho(I_1, x_j)|$ is maximal. This composite index, $I_1$, will be analogous to the PCA-based index, $I_2$, that maximizes the sum of squared Pearson's coefficients of correlation between the composite index and the indicator variables, or $I_2 = Xw : \max \sum_{j=1}^{m} r^2(I_2, x_j)$.

IV. Issues Relating to Maximization: Obtaining the PCA-based composite index is simpler since it has a closed form formula. The (Pearson's) correlation matrix, $R$, is constructed from $X$ such that $R = (1/n) X'X$, where each $x_j \in X$ has zero mean and unit standard deviation. The largest eigenvalue ($\lambda$) and the associated eigenvector ($e$) of $R$ are obtained. The eigenvector is normalized so that $\lVert e \rVert = 1$. The normalized eigenvector is used as the weight, $w$, to obtain $I_2 = Xw$. It is possible, nevertheless, to directly obtain the composite index, $I_2$, by maximizing $\sum_{j=1}^{m} r^2(I_2, x_j) : I_2 = Xw$. There is no closed form formula for obtaining $I_1 = Xw$ such that $\sum_{j=1}^{m} |\rho(I_1, x_j)|$ is maximal. Hence, one has to directly obtain it by solving the intricate maximization problem.

V. Nonlinear Optimization by Differential Evolution: The method of Differential Evolution (DE) is one of the most powerful self-organizing, evolutionary, population-based and stochastic global optimization methods. It is an outgrowth of the Genetic Algorithms. The crucial idea behind DE is a scheme for generating trial parameter vectors. Initially, a population of points ($p$ in $d$-dimensional space) is generated and evaluated (i.e. $f(p)$ is obtained) for their fitness. Then for each point ($p_i$) three different points ($p_a$, $p_b$ and $p_c$) are randomly chosen from the population.
A new point ($p_z$) is constructed from those three points by adding the weighted difference between two points ($w(p_b - p_c)$) to the third point ($p_a$). Then this new point ($p_z$) is subjected to a crossover with the current point ($p_i$) with a probability of crossover ($c_r$), yielding a candidate point, say $p_u$. This point, $p_u$, is evaluated and if found better than $p_i$ then it replaces $p_i$, else $p_i$ remains. Thus we obtain a new vector in which all points are either better than or as good as the current points. This new vector is used for the next iteration. This process makes the differential evolution scheme completely self-organizing. This method has been successfully applied for optimizing extremely nonlinear and multimodal functions (Mishra, 2007a, 2007b and 2007c).
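The scheme described above, applied to the maximization of the sum of absolute values of Bradley's correlation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's Fortran program: the "suitable transformation" is taken to be centring on the median and scaling by the mean absolute deviation, the population size, number of iterations, weight factor (0.5) and crossover probability (0.9) are arbitrary choices, and the test data are synthetic. All function names are hypothetical.

```python
import numpy as np

def center_scale(x):
    """Assumed transformation: median 0 and unit mean absolute deviation."""
    d = x - np.median(x)
    return d / np.mean(np.abs(d))

def bradley_corr(x, y):
    """Absolute correlation: sum(|u+v| - |u-v|) / sum(|u| + |v|)."""
    u, v = center_scale(x), center_scale(y)
    return np.sum(np.abs(u + v) - np.abs(u - v)) / np.sum(np.abs(u) + np.abs(v))

def objective(w, X):
    """Sum over indicators of |rho(I, x_j)| for the index I = Xw."""
    index = X @ w
    return sum(abs(bradley_corr(index, X[:, j])) for j in range(X.shape[1]))

def de_maximize(f, dim, pop_size=20, n_iter=60, weight=0.5, cr=0.9, seed=0):
    """Basic differential evolution; control parameters are assumptions."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    fit = np.array([f(p) for p in pop])
    for _ in range(n_iter):
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(pop_size):
            # Three distinct points other than the current one.
            others = [k for k in range(pop_size) if k != i]
            a, b, c = rng.choice(others, size=3, replace=False)
            # p_z = p_a + w (p_b - p_c)
            mutant = pop[a] + weight * (pop[b] - pop[c])
            # Crossover with the current point; keep at least one mutant coordinate.
            cross = rng.random(dim) < cr
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            # The candidate replaces the current point only if it is better.
            if f_trial > fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = int(np.argmax(fit))
    return pop[best], fit[best]

# Synthetic data: six indicators driven by one common factor (hypothetical).
rng = np.random.default_rng(42)
factor = rng.normal(size=30)
signs = np.array([1.0, 1.0, 1.0, -1.0, 1.0, -1.0])
X = factor[:, None] * signs + 0.5 * rng.normal(size=(30, 6))

w_best, f_best = de_maximize(lambda w: objective(w, X), dim=6)
w_best = w_best / np.linalg.norm(w_best)  # the objective does not depend on the scale of w
```

Because the transformation rescales the index before the correlation is computed, the objective is invariant to the length of the weight vector, which is why the weights are normalized only after the search.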
VI. A Simulation Experiment: We have conducted a simulation experiment to examine the effectiveness of our proposed method. We have generated a matrix, $X$, of six variables, each in 30 observations. The correlation matrix of these variables is given in Table-1. Using these variables, we have obtained two composite indices by direct optimization: the one ($I_{10}$) relating to the method proposed by us and the other ($I_{20}$) relating to the PCA. Both of these indices are standardized by using the relationship $I_i \leftarrow [I_i - \min(I)] / [\max(I) - \min(I)];\ i = 1, 2, \ldots, n$ so as to make the index values lie between zero and unity. These composite indices serve as reference since $X$ does not contain outliers. It is interesting to note (see Table-1) that $I_{10}$ and $I_{20}$ are highly correlated ($r = 0.998$), although the Bradley weights ($w$) and correlation coefficients ($\rho$) are uniformly smaller (in magnitude) than the Pearson weights ($w$) and correlation coefficients ($r$).

Next, we introduce outliers to $X$. Three outliers (ranging between -0 to 0) have been added to each indicator variable ($x_j;\ j = 1, 2, \ldots, 6$) at random locations. Then, using these (contaminated) variables, the two composite indices ($I_1$ and $I_2$) have been obtained. The indices have been standardized as before to lie between zero and unity. The results are presented in Table-2. All derived composite indices are presented in Table-3. The root-mean-square $\mathrm{RMS}_1 = \left[(1/n)\sum_{i=1}^{n} (I_{1i} - I_{10i})^2\right]^{1/2} = 0.0608$ for our proposed method vis-à-vis $\mathrm{RMS}_2 = \left[(1/n)\sum_{i=1}^{n} (I_{2i} - I_{20i})^2\right]^{1/2} = 0.07306$ obtained for the PCA-based index suggests that in the presence of outliers our proposed method will perform better. As shown in the graph (Fig. 1), the fluctuations in $I_2$ appear to be more than those in $I_1$.

Table 1: Correlation Coefficients and Weights for the Reference Indicator Variables (Without Outliers)

Variables                     X1        X2        X3        X4        X5        X6      I_10      I_20
X1                       1.00000       0.9   0.79774  -0.80408   0.90597   -0.8839    0.9833   0.97609
X2                           0.9   1.00000     0.658   -0.7037    0.8905  -0.76986     0.998    0.9074
X3                       0.79774     0.658   1.00000   -0.7699    0.6645   -0.7764    0.8477   0.84445
X4                      -0.80408   -0.7037   -0.7699   1.00000    -0.874    0.6984  -0.86607   -0.8794
X5                       0.90597    0.8905    0.6645    -0.874   1.00000  -0.78670    0.9443   0.93406
X6                       -0.8839  -0.76986   -0.7764    0.6984  -0.78670   1.00000  -0.88785   -0.9049
I_10                      0.9833     0.998    0.8477  -0.86607    0.9443  -0.88785   1.00000     0.998
I_20                     0.97609    0.9074   0.84445   -0.8794   0.93406   -0.9049     0.998   1.00000
Bradley weights          0.45546     0.376    0.3684    -0.943   0.35443    -0.693
Bradley correlation       0.8974    0.7579    0.7083  -0.68475     0.783  -0.75640
Pearson weights          0.54837   0.56794    0.7076  -0.80485    0.5640  -0.58643
Pearson correlation      0.97609    0.9074   0.84445   -0.8794   0.93406   -0.9049

I_10 = Composite Index by maximization of the sum of absolute Bradley's Correlation Coefficients
I_20 = Composite Index by maximization of the sum of squared Pearson's Correlation Coefficients
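The standardization to the unit interval and the root-mean-square comparison used in the experiment can be sketched for the PCA-based index, which has a closed form. This is only an illustration under assumptions: the data are synthetic, the outlier range of -10 to 10 is an arbitrary choice for the sketch (it is not taken from the paper), the eigenvector sign is fixed by a hypothetical convention so that the clean and contaminated indices are comparable, and all function names are made up. The Bradley-based index would additionally require a numerical optimizer.

```python
import numpy as np

def pca_weights(X):
    """Standardize X and return (Z, w), w being the unit eigenvector of
    R = (1/n) Z'Z associated with the largest eigenvalue."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    R = (Z.T @ Z) / Z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
    w = eigvecs[:, -1]
    if w[0] < 0:                           # assumed sign convention
        w = -w
    return Z, w

def rescale_unit(index):
    """[I - min(I)] / [max(I) - min(I)], so that values lie in [0, 1]."""
    lo, hi = index.min(), index.max()
    return (index - lo) / (hi - lo)

def rms_diff(a, b):
    """Root-mean-square difference between two index vectors."""
    return np.sqrt(np.mean((a - b) ** 2))

# Synthetic reference data: 30 cases, 6 indicators (hypothetical).
rng = np.random.default_rng(0)
n, m = 30, 6
factor = rng.normal(size=n)
signs = np.array([1.0, 1.0, 1.0, -1.0, 1.0, -1.0])
X_clean = factor[:, None] * signs + 0.5 * rng.normal(size=(n, m))

# Add three outliers to each indicator at random locations (range assumed).
X_out = X_clean.copy()
for j in range(m):
    rows = rng.choice(n, size=3, replace=False)
    X_out[rows, j] += rng.uniform(-10.0, 10.0, size=3)

Z0, w0 = pca_weights(X_clean)
Z1, w1 = pca_weights(X_out)
I_ref = rescale_unit(Z0 @ w0)   # reference index, no outliers
I_out = rescale_unit(Z1 @ w1)   # index from contaminated data
rms = rms_diff(I_out, I_ref)
```

A smaller `rms` means that the index built from contaminated data stays closer to the reference index, which is the sense in which the two methods are compared in the text.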
Table 2: Correlation Coefficients and Weights for the Reference Indicator Variables (With three Outliers between -0 and 0)

Variables                     X1        X2        X3        X4        X5        X6       I_1       I_2
X1                       1.00000    0.6890   0.63464  -0.60439    0.8649  -0.74930   0.96985    0.9635
X2                        0.6890   1.00000   0.53335    -0.374    0.6300   -0.4538   0.73477    0.7478
X3                       0.63464   0.53335   1.00000     -0.87   0.48497  -0.45498    0.6536    0.7046
X4                      -0.60439    -0.374     -0.87   1.00000   -0.6073   0.45490  -0.57758  -0.65697
X5                        0.8649    0.6300   0.48497   -0.6073   1.00000  -0.60940    0.9400     0.898
X6                      -0.74930   -0.4538  -0.45498   0.45490  -0.60940   1.00000   -0.7637  -0.78645
I_1                      0.96985   0.73477    0.6536  -0.57758    0.9400   -0.7637   1.00000    0.9853
I_2                       0.9635    0.7478    0.7046  -0.65697     0.898  -0.78645    0.9853   1.00000
Bradley weights          0.35778    0.0945    0.3863    0.0485    0.5405    -0.586
Bradley correlation      0.87477    0.6553   0.56840   -0.5093   0.80043   -0.6808
Pearson weights          0.45695    0.4839     0.557  -0.47088    0.5366   -0.4539
Pearson correlation       0.9635    0.7478    0.7047  -0.65696     0.898  -0.78645

I_1 = Composite Index by maximization of the sum of absolute Bradley's Correlation Coefficients
I_2 = Composite Index by maximization of the sum of squared Pearson's Correlation Coefficients
Table 3: Composite Indices with (-0, 0 range) Outliers and Without Outliers

           Without Outliers      With Outliers                Without Outliers      With Outliers
Sl. No.      I_10      I_20       I_1       I_2    Sl. No.      I_10      I_20       I_1       I_2
 1        0.00000      0.03     0.066   0.05730      16        0.045      0.08   0.00000   0.00000
 2          0.348    0.4609     0.966    0.7855      17       0.5309    0.5573    0.5343   0.57499
 3        0.88073   0.84975    0.9008    0.8784      18      0.63358   0.65675    0.6446     0.676
 4        0.68067   0.67673    0.6788    0.5797      19        0.774   0.70344    0.7556     0.739
 5         0.7654   0.78795     0.886    0.9680      20      0.65483    0.6435   0.66060    0.6780
 6        0.38436   0.37895    0.3050   0.34575      21        0.379    0.3374     0.389    0.4899
 7         0.0063   0.00000   0.07506    0.0755      22          0.6     0.633    0.7385      0.73
 8         0.3555    0.3465   0.35433    0.3673      23       0.4573   0.46566    0.4880    0.4906
 9           0.64     0.559     0.455     0.554      24       0.3696    0.9988   0.39360   0.37343
10         0.4863   0.47765   0.49373   0.50036      25       0.7854    0.7967   0.69088   0.56360
11         0.6808   0.69665    0.6697    0.7403      26        0.454    0.4897   0.45679   0.47503
12         0.3875    0.3640    0.4909    0.3784      27      0.40770   0.37683    0.5886     0.460
13        0.56575    0.5739   0.57338     0.595      28       0.9677   0.87900    0.9738   0.84678
14         0.4006     0.405    0.3900     0.406      29      0.99074   1.00000   0.85489    0.8848
15        1.00000   0.98508   1.00000   1.00000      30      0.67744   0.67370   0.69074    0.6983

References

Bradley, C. (1985) The Absolute Correlation, The Mathematical Gazette, 69(447), pp. 12-17.
Hampel, F. (2001) Robust Statistics: a Brief Introduction and Overview, ftp://ftp.stat.math.ethz.ch/research-reports/94.pdf
Mishra, S.K. (2007a) Performance of Differential Evolution Method in Least Squares Fitting of Some Typical Nonlinear Curves, Journal of Quantitative Economics, 5(1), pp. 140-177.
Mishra, S.K. (2007b) Least Squares Estimation of Joint Production Functions by the Differential Evolution Method of Global Optimization, Economics Bulletin, 3(51), pp. 1-13.
Mishra, S.K. (2007c) Construction of an Index by Maximization of the Sum of its Absolute Correlation Coefficients with the Constituent Variables, SSRN: http://ssrn.com/abstract=989088
Shevlyakov, G.L. (1997) On Robust Estimation of a Correlation Coefficient, Journal of Mathematical Sciences, 83(3), pp. 434-438.

Note: A Fortran computer program to compute Composite Indices using Bradley's absolute correlation and PCA by direct maximization is available on http://www.webng.com/economics