

International Conference on Manufacturing Science and Engineering (ICMSE 2015)

Identification of the multivariable outliers using T² ellipse chart based on the improved Partial Least Squares regression

Liu Yunlian,a Xi Yanhui,b Liu Jianhua3,c Wu Tiebin,d* Li Xinjun,e

1 Hunan University of Humanities, Science and Technology, Loudi, Hunan, 47000, China
2 Electrical and Information Engineering College, Changsha University of Science & Technology, Changsha, Hunan 40077, China
3 College of Electrical and Information Engineering, Hunan University of Technology, Zhuzhou, Hunan 4007, China

a luyunlan85@63.com, b 804775693@qq.com, c jhlu065@63.com, d* wutebn8@63.com, e lxnjun80@63.com

* Corresponding author: Wu Tiebin

Keywords: Multivariable outliers; Partial Least Squares regression; T² ellipse chart

Abstract. When a sample contains multiple variables, samples that obviously disturb the relationships among the variables are called outlier samples. However, the presence of an extremely significant outlier sample tends to conceal other outlier samples, which makes the identification of multivariable outliers very challenging. On this basis, a method of identifying multivariable outliers in the T² ellipse chart based on the improved Partial Least Squares regression (PLSR) is proposed. Some outlier samples fail to be identified because significant outlier samples strongly influence the variance used in the T² chart. To solve this problem, a fuzzy variance computing method is put forward. The improved PLSR-based T² chart can well overcome the masking effect in outlier identification.

Introduction

The detection of multivariable outliers has long been regarded as a difficult problem. Although each single variable appears normal, some samples are found to clearly disturb the relationships among variables. In particular, when extremely significant outlier samples are present, other outlier samples tend to be concealed, which makes the detection of multivariable outliers very difficult.
Principal component analysis (PCA) and PLSR have been widely investigated and applied in fault detection and outlier identification, as they can extract the principal components of multiple variables, reduce or eliminate the coupling among variables, and decrease the dimension of the variables [1]. PCA usually uses the Q statistic and the Hotelling T² statistic to monitor outliers or faults in process data [2]. As PCA considers only the features of the independent variables [3] and rarely concerns the association between independent and dependent variables, it is prone to identification mistakes in the detection of outliers and faults. S. Wold, C. Albano et al. proposed the PLSR method [4]. This method not only integrates the idea of PCA for extracting useful information from the explanatory variables but also considers the explanatory effect of the input variables on the output variables. It can therefore reflect the relationship between dependent and independent variables, and is particularly suited to detecting samples with multivariable outliers and faults. However, when an extremely significant outlier sample is present, it also makes identification errors. To solve this problem, this research proposes a method of detecting multivariable outliers in the T² ellipse chart based on the improved PLSR.

2015. The authors - Published by Atlantis Press

The identifying method of the multivariable outliers in T² chart based on the improved PLSR

It is assumed that m components are extracted from n samples using PLSR. The contribution ratio of the i-th sample (i = 1, 2, ..., n) to the h-th component $t_h$ is $T_{hi}^2$ [5]:

$$T_{hi}^2 = \frac{t_{hi}^2}{(n-1)\, s_h^2} \tag{1}$$

where $s_h^2$ is the variance of $t_h$. Based on equation (1), the contribution ratio of the i-th sample to $t_1, \ldots, t_m$ is calculated as

$$T_i^2 = \frac{1}{n-1} \sum_{h=1}^{m} \frac{t_{hi}^2}{s_h^2} \tag{2}$$

Deviation tends to be produced in the analysis if $T_i^2$ is too large. The statistic proposed by Tracy et al. is usually used:

$$\frac{n^2 (n-m)}{m (n^2-1)}\, T_i^2 \sim F(m,\, n-m) \tag{3}$$

When

$$T_i^2 \ge \frac{m (n^2-1)}{n^2 (n-m)}\, F_\alpha(m,\, n-m) \tag{4}$$

the i-th sample has a large contribution ratio and may be an outlier, where $\alpha$ indicates the significance level. Based on Eqs. (2) and (4), it is obtained that

$$\sum_{h=1}^{m} \frac{t_{hi}^2}{s_h^2} \ge \frac{m (n^2-1)(n-1)}{n^2 (n-m)}\, F_\alpha(m,\, n-m) \tag{5}$$

With the right-hand side of (5) written as

$$c = \frac{m (n^2-1)(n-1)}{n^2 (n-m)}\, F_\alpha(m,\, n-m) \tag{6}$$

in the case of m = 2 we obtain

$$\frac{t_{1i}^2}{s_1^2} + \frac{t_{2i}^2}{s_2^2} = \frac{2 (n-1)(n^2-1)}{n^2 (n-2)}\, F_\alpha(2,\, n-2) = c \tag{7}$$

Equation (7) defines the T² ellipse. If all samples lie within the ellipse, they are considered to be distributed uniformly without outlier points; otherwise, samples that lie outside the ellipse or near its boundary are possibly outlier points. However, extremely significant outlier samples are likely to result in a sharp increase of the variances $s_1^2$

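and $s_2^2$ in equation (7), which in turn conceals part of the outlier samples. To deal with this defect, the method of computing the variance is improved. Assuming there is a data sequence $x = (x_1, x_2, \ldots, x_n)$, the improved sample variance $s_I^2$ is written as

$$s_I^2 = \frac{1}{n-1} \sum_{i=1}^{n} \beta_i (x_i - \bar{x})^2 \tag{8}$$

where $\bar{x}$ is the average value of the sequence $x$ and $\beta_i$ is a fuzzy parameter. The further $x_i$ is from the average value, the smaller the proportion $\beta_i$ it receives in the calculation of the variance. $\beta_i$ is an exponential function of a coefficient $\gamma_i$, as presented in equation (9), whose exponent is [illegible] apart from $\gamma_i$. The coefficient $\gamma_i$ is given by equation (10):

$$\gamma_i = \begin{cases} \text{[illegible]}, & \text{if } \dfrac{|x_i - x_m|}{\sigma_d} \le \text{[illegible]} \\[2ex] \tau_1, & \text{if } \text{[illegible]} < \dfrac{|x_i - x_m|}{\sigma_d} < \kappa \\[2ex] \tau_2, & \text{if } \dfrac{|x_i - x_m|}{\sigma_d} \ge \kappa \end{cases} \tag{10}$$

where $x_m$ denotes the median obtained by arranging the data sequence $x$ in increasing order; $\sigma_d$ is the standard deviation; $\tau_1$ and $\tau_2$ are coefficients ($\tau_1 \in (0, 1]$, $\tau_2 \in (0, 1]$, with an ordering between $\tau_1$ and $\tau_2$ that is [illegible]), while $\kappa$ is a parameter ($\kappa \in$ [illegible]). The variance computed in this way shows good anti-interference capability. The improved T² chart is obtained by using the improved variance in the equation of the T² ellipse.

The analysis of simulated results

The social economic indicators and electricity consumption of a county in Hunan province, China, from 1990 to 2002 are listed in Table 1 [6]. The electricity consumption values in 1995 and 2002 are known to be outlier samples. However, all samples appear normal when PCA is used to identify outliers, which indicates an obvious error of PCA [6].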
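The weighted variance of equations (8) to (10) can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the authors' implementation: the weight is taken as `exp(-gamma)`, samples within `near` standard deviations of the median are given `gamma = 0` (full weight), the divisor is `n - 1`, and the values of `tau1`, `tau2`, `near` and `kappa` are illustrative defaults. The function and parameter names are hypothetical.

```python
import math
import statistics


def fuzzy_weights(x, tau1=0.5, tau2=1.0, near=1.0, kappa=2.0):
    """Weight each value by its distance from the median, measured in
    units of the standard deviation of the sequence.

    Assumed form (not confirmed by the source): weight = exp(-gamma), with
    gamma = 0 close to the median, tau1 at intermediate distance and
    tau2 at distance kappa or more. Requires a non-constant sequence.
    """
    median = statistics.median(x)
    sigma_d = statistics.stdev(x)
    weights = []
    for xi in x:
        d = abs(xi - median) / sigma_d
        if d <= near:
            gamma = 0.0   # assumption: nearby samples keep full weight
        elif d < kappa:
            gamma = tau1
        else:
            gamma = tau2
        weights.append(math.exp(-gamma))
    return weights


def fuzzy_variance(x, **params):
    """Weighted sample variance in the spirit of equation (8)."""
    mean = statistics.mean(x)
    weights = fuzzy_weights(x, **params)
    total = sum(w * (xi - mean) ** 2 for w, xi in zip(weights, x))
    return total / (len(x) - 1)   # assumption: divisor n - 1


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]   # made-up values, one extreme sample
    print(statistics.variance(data), fuzzy_variance(data))
```

With these assumed settings the extreme value is down-weighted, so the weighted variance is smaller than the ordinary sample variance. That reduction is the property the improved chart relies on to limit masking.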

Table 1 The social economic indicators and electricity consumption of a county in Hunan province, China, in 1990 to 2002

Columns: Year; social economic indicators (Primary industry / 10,000 yuan; Secondary industry / 10,000 yuan; Tertiary industry / 10,000 yuan; Per capita); Electricity consumption / kW·h

1990  3  4733  79  948
1991  3307  66  8043  989  3985
1992  3544  797  0660  086  604
1993  4565  4343  5076  385  76
1994  6507  3783  33364  860  0407
1995  7966  43670  4767  47  56
1996  939  5648  58407  985  377
1997  9383  70764  7060  3408  76
1998  9679  79775  7989  367  778
1999  9684  8457  86434  3883  558
2000  9846  8786  95667  4098  979
2001  0548  0059  090  54  774
2002  07900  434  47  5705  3607

The T² ellipse chart obtained when the common T² ellipse chart is used to detect outliers is shown in Figure 1. Since the 13th sample point (2002) lies outside the T² ellipse, it is an outlier point, while the 6th point lies within the T² ellipse and fails to be detected, which shows that the 13th point exerts a certain masking effect on the 6th point.

Figure 1 T² ellipse chart

The outliers detected in the improved T² chart are illustrated in Figure 2.
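The ellipse test of equation (7) can be sketched in Python as follows. This is a minimal illustration under assumptions rather than the authors' code: the significance level `alpha = 0.05` is assumed (the text does not state it), the score lists and variances are supplied by the caller, and all function names are hypothetical. For two numerator degrees of freedom the upper-α point of F(2, ν) has the closed form (ν/2)(α^(−2/ν) − 1), so no statistics library is needed.

```python
def f_crit_2(alpha, nu):
    """Upper-alpha critical value of F(2, nu).

    Closed form that holds only for 2 numerator degrees of freedom.
    """
    return (nu / 2.0) * (alpha ** (-2.0 / nu) - 1.0)


def ellipse_constant(n, alpha=0.05):
    """Constant c of equation (7) for m = 2 components and n samples (n > 2)."""
    scale = 2.0 * (n - 1) * (n * n - 1) / (n * n * (n - 2))
    return scale * f_crit_2(alpha, n - 2)


def outside_ellipse(t1, t2, var1, var2, alpha=0.05):
    """Flag samples whose scores (t1[i], t2[i]) fall outside the T2 ellipse.

    var1 and var2 are the variances of the two score vectors; passing a
    down-weighted variance instead of the ordinary one shrinks the ellipse.
    """
    c = ellipse_constant(len(t1), alpha)
    return [a * a / var1 + b * b / var2 > c for a, b in zip(t1, t2)]


if __name__ == "__main__":
    print(ellipse_constant(13))   # 13 samples, as in Table 1
```

For n = 13 and the assumed α = 0.05 the constant is about 8.64. Because the test divides each squared score by its variance, a single extreme sample that inflates `var1` or `var2` moves every other sample toward the centre of the ellipse, which is the masking effect described for the 6th and 13th points.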

Figure 2 The improved T² ellipse chart

As shown in Figure 2, the improved method can recognize the 6th and 13th outlier points. The first sample point, which is close to the edge of the ellipse, is a mutation in power load. The results indicate that the improved T² chart can well detect outliers and deal with the masking effect in the identification of outliers.

Conclusions

When samples comprise multiple variables, no apparent anomaly may be found in any single variable contained in the samples. Samples that disturb the relationships among variables are considered outlier samples. When an extremely significant outlier sample is present, the variance of the PLSR-based T² chart is influenced, which leads to failure in the detection of some outlier samples. To solve this problem, this work puts forward a fuzzy variance computing method. The improved PLSR-based T² chart is able to overcome the masking effect in identifying multivariable outliers.

Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (NO. 65033, NO. 65033), the Science and Technology Department of Loudi city, and the Scientific Research Fund of Hunan Provincial Education Department (NO. 4B097, NO. 5C07).

References

[1] Nie Yanfang. Anomaly detection based on PCA and improved nearest neighbor rule. Computer Engineering and Design, 2008, 9 (0): 50-503.
[2] Zhang Xinrong, Xiong Weili, Xu Baoguo. Fault detection algorithm based on Q statistics [J]. Computer and Applied Chemistry, 2008, 5 (): 537-54.
[3] Zhao Xiaoqiang, Wang Xinming, Wang Yingxiang. Application research on TE process fault detection based on PCA and KPCA [J]. Automation and Instrumentation, 0, 3 (): 8-.
[4] Wold H. Partial Least Squares, in Encyclopedia of Statistical Sciences [M]. New York: John Wiley & Sons, 1985.
[5] Wang Huiwen. Partial least squares regression method and its application [M]. Beijing: National Defense Industry Press, 1999.
[6] Mao Lifan. Research on the technology of long-term load forecasting in power network planning [D]. Changsha: Hunan University, 0.