International Conference on Manufacturing Science and Engineering (ICMSE 2015)

Identification of the multivariable outliers using T² eclipse chart based on the improved Partial Least Squares regression

Liu Yunlian,a Xi Yanhui,b Liu Jianhua3,c Wu Tiebin,d* Li Xinjun,e

1 Hunan University of Humanities, Science and Technology, Loudi, Hunan, 417000, China
2 Electrical and Information Engineering College, Changsha University of Science & Technology, Changsha, Hunan 410077, China
3 College of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou, Hunan 412007, China

a luyunlan85@63.com, b 804775693@qq.com, c jhlu065@63.com, d* wutebn8@63.com, e lxnjun80@63.com
* Corresponding author: Wu Tiebin

Keywords: Multivariable outliers; Partial Least Squares regression; T² ellipse chart

Abstract. When a sample contains multiple variables, samples that obviously disturb the relationships among the variables are called outlier samples. However, the presence of an extremely significant outlier sample tends to conceal other outlier samples, which makes multivariable outliers difficult to identify. On this basis, a method of identifying multivariable outliers in the T² ellipse chart based on the improved Partial Least Squares regression (PLSR) is proposed. Some outlier samples fail to be identified because significant outlier samples inflate the variance used in the T² chart. To solve this problem, a fuzzy variance computing method is put forward. The improved PLSR-based T² chart can well overcome the masking effect in outlier identification.

Introduction

The detection of multivariable outliers has been regarded as a difficult problem. Although each single variable appears normal, some samples are found to disturb the relationships among the variables. In particular, when extremely significant outlier samples are present, other outlier samples tend to be concealed, which makes multivariable outliers very difficult to detect.
Principal component analysis (PCA) and PLSR have been widely investigated and applied in fault detection and outlier identification, as they can extract the principal components of multiple variables, reduce or eliminate the coupling among variables, and decrease the dimension of the variables [1]. PCA usually utilizes the Q statistic and the Hotelling T² statistic to monitor outliers or faults in process data [2]. As PCA considers only the features of the independent variables [3] and rarely concerns the association between independent and dependent variables, it is prone to identification mistakes in the detection of outliers and faults. S. Wold and C. Albano et al. proposed the PLSR method [4]. This method not only integrates the idea of PCA for extracting useful information from the explanatory variables but also considers the explanatory effect of the input variables on the output variables. It can therefore reflect the relationship between dependent and independent variables, and is particularly suitable for detecting samples with multivariable outliers and faults. However, when an extremely significant outlier sample is present, it also makes identification errors. To solve this problem, this research proposes a method of detecting multivariable outliers in the T² ellipse chart based on the improved PLSR.

© 2015. The authors - Published by Atlantis Press

The identifying method of the multivariable outliers in T² chart based on the improved PLSR

It is assumed that $m$ components are extracted from $n$ samples using PLSR. The contribution ratio of the $i$-th sample ($i = 1, 2, \ldots, n$) to the $h$-th component $t_h$ is $T_{hi}^2$ [5]:

$$T_{hi}^2 = \frac{t_{hi}^2}{(n-1)\,s_h^2} \tag{1}$$

where $s_h^2$ is the variance of $t_h$. Based on equation (1), the contribution ratio of the $i$-th sample to $t_1, \ldots, t_m$ is calculated as

$$T_i^2 = \frac{1}{n-1}\sum_{h=1}^{m}\frac{t_{hi}^2}{s_h^2} \tag{2}$$

Deviation tends to be produced in the analysis if $T_i^2$ is too large. The statistic proposed by Tracy et al. is usually used:

$$\frac{n^2(n-m)}{m(n^2-1)}\,T_i^2 \sim F(m,\, n-m) \tag{3}$$

When

$$T_i^2 \ge \frac{m(n^2-1)}{n^2(n-m)}\,F_\alpha(m,\, n-m) \tag{4}$$

the $i$-th sample has a large contribution ratio and may be an outlier, where $\alpha$ indicates the significance level. Based on Eqs. (2) and (4), it is obtained that

$$\sum_{h=1}^{m}\frac{t_{hi}^2}{s_h^2} \ge \frac{m(n^2-1)(n-1)}{n^2(n-m)}\,F_\alpha(m,\, n-m) \tag{5}$$

Let

$$c = \frac{m(n^2-1)(n-1)}{n^2(n-m)}\,F_\alpha(m,\, n-m) \tag{6}$$

In the case of $m = 2$, we obtain

$$\frac{t_{1i}^2}{s_1^2} + \frac{t_{2i}^2}{s_2^2} = \frac{2(n^2-1)(n-1)}{n^2(n-2)}\,F_\alpha(2,\, n-2) = c \tag{7}$$

For the T² ellipse defined in equation (7), if all samples lie within the ellipse, they are considered to be distributed uniformly without outlier points; otherwise, samples that lie outside the ellipse or near its boundary are possible outlier points. However, extremely significant outlier samples are likely to cause a sharp increase of the variances $s_1^2$
and $s_2^2$ in equation (7), so that part of the outlier samples is concealed. To deal with this defect, the computing method of the variance is improved. Assuming there is a data sequence $x = (x_1, x_2, \ldots, x_n)$, the improved sample variance $s_I^2$ is written as

$$s_I^2 = \frac{1}{n-1}\sum_{i=1}^{n}\beta_i\,(x_i - \bar{x})^2 \tag{8}$$

where $\bar{x}$ is the average value of the sequence $x$ and $\beta_i$ is a fuzzy parameter. The farther $x_i$ is from the average value, the smaller its proportion in the calculation of the variance. $\beta_i$ is given by equation (9):

$$\beta_i = e^{\gamma_i} \tag{9}$$

where $\gamma_i$ is a coefficient, as shown in equation (10):

$$\gamma_i = \begin{cases} \ldots, & \text{if } \dfrac{|x_i - x_m|}{\sigma_d} \le \ldots \\[2mm] \tau_1, & \text{if } \ldots < \dfrac{|x_i - x_m|}{\sigma_d} < \kappa \\[2mm] \tau_2, & \text{if } \dfrac{|x_i - x_m|}{\sigma_d} \ge \kappa \end{cases} \tag{10}$$

where $x_m$ denotes the median obtained by arranging the data sequence $x$ in increasing order; $\sigma_d$ is the standard deviation; $\tau_1$ and $\tau_2$ are coefficients ($\tau_1 \in (0, 1]$, $\tau_2 \in (0, 1]$ and $\tau_1 \ldots \tau_2$), while $\kappa$ is a parameter ($\kappa \in [\ldots, \ldots]$). The variance computed in this way shows good anti-interference capability. The improved T² chart is obtained by using the improved variance in equation (7).

The analysis of simulated results

The social economic indicators and electricity consumption of a county in Hunan province, China, from 1990 to 2002 are listed in table 1 [6]. The electricity consumption values in 1995 and 2002 are outlier samples. However, all samples appear normal when PCA is used to identify outliers, which indicates an obvious error of PCA [6].
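The weighted variance of equation (8) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `fuzzy_variance`, the three weight levels `beta_near`, `beta_mid` and `beta_far`, and the thresholds `lower` and `kappa` are hypothetical stand-ins for the values that equations (9) and (10) produce, and $\sigma_d$ is assumed to be the ordinary sample standard deviation of the sequence.

```python
from statistics import mean, median, stdev, variance


def fuzzy_variance(x, beta_near=1.0, beta_mid=0.6, beta_far=0.3,
                   lower=1.0, kappa=2.0):
    """Weighted sample variance in the spirit of equation (8).

    All default weights and thresholds are illustrative assumptions,
    not values taken from the paper.
    """
    n = len(x)
    x_bar = mean(x)       # average value of the sequence
    x_med = median(x)     # median x_m of the sorted sequence
    sigma_d = stdev(x)    # assumed: ordinary sample standard deviation

    total = 0.0
    for xi in x:
        d = abs(xi - x_med) / sigma_d  # normalized distance from the median
        if d <= lower:
            beta = beta_near           # points close to the median
        elif d < kappa:
            beta = beta_mid            # moderately distant points
        else:
            beta = beta_far            # distant points get the smallest weight
        total += beta * (xi - x_bar) ** 2
    return total / (n - 1)


# Synthetic scores: one extreme value inflates the ordinary variance
# much more than the weighted one.
scores = [-1.0, 1.0] * 6 + [8.0]
print(variance(scores), fuzzy_variance(scores))
```

When no point is far from the median, every weight equals `beta_near` and the function reduces to the ordinary sample variance, so the weighting only takes effect when extreme values are present.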
Table 1 The social economic indicators and electricity consumption of a county in Hunan province, China, in 1990 to 2002

       Social economic indicator
Years  Primary industry  Secondary industry  Tertiary industry  Per capita  Electricity consumption
       / 10,000 yuan     / 10,000 yuan       / 10,000 yuan                  / kW.h
1990   3                 4733                79                 948
1991   3307              66                  8043               989         3985
1992   3544              797                 0660               086         604
1993   4565              4343                5076               385         76
1994   6507              3783                33364              860         0407
1995   7966              43670               4767               47          56
1996   939               5648                58407              985         377
1997   9383              70764               7060               3408        76
1998   9679              79775               7989               367         778
1999   9684              8457                86434              3883        558
2000   9846              8786                95667              4098        979
2001   0548              0059                090                54          774
2002   07900             434                 47                 5705        3607

Figure 1 shows the result of using the common T² ellipse chart to detect outliers. Since the 13th sample point (2002) lies outside the T² ellipse, it is an outlier point, while the 6th point lies within the T² ellipse and fails to be detected, which shows that the 13th point exerts a certain masking effect on the 6th point.

Figure 1 T² ellipse chart

The outliers detected in the improved T² chart are illustrated in figure 2.
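The ellipse check of equation (7), which underlies a chart like figure 1, can be sketched in Python as follows. This is a hedged illustration under stated assumptions: the scores `t1` and `t2` are taken to be mean-centred PLSR component scores, the significance level $\alpha = 0.05$ is assumed because the text does not state the value used, and all function names are hypothetical. For two numerator degrees of freedom the upper quantile of the F distribution has the closed form $F_\alpha(2, \nu) = \frac{\nu}{2}\left(\alpha^{-2/\nu} - 1\right)$, so only the standard library is needed.

```python
from statistics import variance


def f_upper_quantile_2(nu, alpha):
    """Upper-alpha quantile of F(2, nu), from P(F > x) = (1 + 2x/nu)**(-nu/2)."""
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)


def ellipse_threshold(n, alpha=0.05):
    """Constant c of equation (7) for m = 2 components and n samples."""
    f_crit = f_upper_quantile_2(n - 2, alpha)
    return 2.0 * (n * n - 1) * (n - 1) / (n * n * (n - 2)) * f_crit


def outside_ellipse(t1, t2, alpha=0.05, var=variance):
    """Return 1-based indices of samples on or outside the T2 ellipse.

    `var` is the variance estimator; swapping in a robust estimator is
    how an improved chart would differ from the common one.
    """
    n = len(t1)
    s1, s2 = var(t1), var(t2)
    c = ellipse_threshold(n, alpha)
    return [i + 1 for i in range(n)
            if t1[i] ** 2 / s1 + t2[i] ** 2 / s2 >= c]


# Synthetic, mean-centred scores (not the data of table 1): sample 13 is extreme.
t1 = [-0.5] * 12 + [6.0]
t2 = [1.0, -1.0] * 6 + [0.0]
print(ellipse_threshold(13), outside_ellipse(t1, t2))
```

Because the extreme sample also enters the variances `s1` and `s2`, it enlarges the ellipse for every other sample, which is the mechanism behind the masking effect described above.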
Figure 2 The improved T² ellipse chart

As shown in figure 2, the improved method can recognize the 6th and 13th outlier points. The first sample point, which is close to the edge of the ellipse, is a mutation in power load. The results indicate that the improved T² chart can well detect outliers and deal with the masking effect in the identification of outliers.

Conclusions

For samples comprising multiple variables, no apparent anomaly may be found in any single variable of a sample, yet some samples disturb the relationships among the variables and are considered outlier samples. When an extremely significant outlier sample is present, the variance used in the PLSR-based T² chart is influenced, which leads to failure in detecting some outlier samples. To solve this problem, this work puts forward a fuzzy variance computing method. The improved PLSR-based T² chart is able to overcome the masking effect in identifying multivariable outliers.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (NO. 65033, NO. 65033), the Science and Technology Department of Loudi city, and the Scientific Research Fund of Hunan Provincial Education Department (NO.4B097, NO. 5C07).

References

[1] Nie Yan Fang. The anomaly detection based on PCA and improved nearest neighbor rule. Computer Engineering and Design, 2008, 9 (0): 50-503.
[2] Zhang Xinrong, Xiong Weili, Xu Baoguo. Fault detection algorithm based on Q statistics [J]. Computer and Applied Chemistry, 2008, 5 (): 537-54.
[3] Zhao Xiaoqiang, Wang Xinming, Wang Yingxiang. Based on PCA and KPCA TE process fault detection application research [J]. Automation and Instrumentation, 0, 3 (): 8-.
[4] Wold H. Partial Least Squares. In: Encyclopedia of Statistical Sciences [M]. New York: John Wiley & Sons, 1985.
[5] Wang Huiwen. Partial least squares regression method and its application [M]. Beijing: National Defense Industry Press, 1999.
[6] Mao Li Fan. Research on the technology of long-term load forecasting in power network planning [D]. Changsha: Hunan University, 0.