Method for Optimizing the Number and Precision of Interval-Valued Parameters in a Multi-Object System


AMAURY CABALLERO*, KANG YEN*, JOSE L. ABREU**, ALDO PARDO***
*Department of Electrical & Computer Engineering, Florida International University, 10555 W. Flagler Street, Miami, Florida 33174, USA
**CJ Tech, Miami, Florida 33145, USA
***University of Pamplona, Colombia
caballer@fiu.edu, yen@fiu.edu, cjtechno@yahoo.com, apardo13@unipamplona.edu.co

Abstract: In any decision-making process, it is necessary to evaluate the merit of the different parameters and establish the effect they have on the solution in order to optimize it. If the criteria are mathematically quantifiable, a mathematical model can be created for the evaluation process. Rough sets, neural networks, fuzzy logic, and information theory are widely used mathematical tools for this type of problem. In this work two tasks are addressed: (1) minimizing the number of parameters (attributes) required to discriminate different objects (classes) when the information collected for a parameter overlaps between classes (interval-valued information); and (2) calculating the minimum accuracy the selected attributes need in order to discriminate all of the objects. This approach is useful both for reducing the cost of the communication channels and for eliminating unnecessarily stored information.

Key Words: Information Systems, Classification, Databases, Diffuse Information.

1 Introduction

Automated extraction and discovery of information in databases has been growing due to factors such as the introduction in the factory of wired and wireless communication channels, PLCs, and accurate, inexpensive sensors. Frequently, more than one attribute is needed to differentiate objects or conditions. As the number of objects increases and their properties become more similar, the number of differentiating parameters also increases. Because of disturbances, the collected data may vary within some interval. Several works have been devoted to the solution of this kind of problem [1], [2], [3], [4]. Among the different solutions, one method employing information theory provides very simple results.

Let us adopt the following definitions:

A_k ---- attribute number k
U_i, U_j ---- objects (classes) i and j, respectively
l_i^k, l_j^k ---- minimum values of attribute k for classes i and j, respectively
u_i^k, u_j^k ---- maximum values of attribute k for classes i and j, respectively

The region of coincidence (or misclassification rate) of the two class intervals for an attribute may vary from 0, when there is no coincidence at all, to 1, when the two intervals coincide completely. The probability that objects in class U_i are misclassified into class U_j according to attribute A_k, modified from [5], is given below:

α_ij^k = 0, if [l_i^k, u_i^k] ∩ [l_j^k, u_j^k] = φ  (1)

α_ij^k = min{(u_i^k − l_j^k)/(u_i^k − l_i^k), (u_j^k − l_i^k)/(u_j^k − l_j^k), 1}, if [l_i^k, u_i^k] ∩ [l_j^k, u_j^k] ≠ φ  (2)

where α_ij^k is the probability that objects in class U_i are misclassified into class U_j according to attribute A_k. Note that in general α_ij^k ≠ α_ji^k.
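As a hedged illustration of equations (1) and (2), the short sketch below (our own, not code from the paper; the function name misclassification_prob is an assumption) computes the misclassification probability for a pair of class intervals:

```python
def misclassification_prob(li, ui, lj, uj):
    """alpha_ij^k: probability that objects in class i, whose attribute-k values lie
    in [li, ui], are misclassified into class j, whose values lie in [lj, uj]."""
    if ui < lj or uj < li:                      # disjoint intervals, Eq. (1)
        return 0.0
    return min((ui - lj) / (ui - li),           # overlapping intervals, Eq. (2)
               (uj - li) / (uj - lj),
               1.0)

# Two overlapping intervals
print(misclassification_prob(0.93, 1.73, 1.49, 2.57))   # 0.3
print(misclassification_prob(1.49, 2.57, 0.93, 1.73))   # 0.3 here; in general the two directions differ
```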

From the previous result, the maximum mutual classification error between classes U_i and U_j for attribute A_k is defined as

β_ij^k = max{α_ij^k, α_ji^k}  (3)

where β_ij^k = β_ji^k. It is logical to think that when two classes coincide completely for some parameter, the information obtained from this parameter for discriminating between classes i and j is 0, and that this information increases as the coincidence diminishes. Following the Shannon and Hartley definitions, this information is represented on a logarithmic scale; the logarithm provides the additivity property for independent uncertainties. Using base-10 logarithms,

I_ij^k = −log β_ij^k  [hartley]  (4)

Similarly, the minimum information required for the classification between two classes i and j by any attribute is given by

I_α = L = −log α  [hartley]  (5)

where α is the permissible misclassification error between classes for any attribute. This value must be defined at the start of the classification and should be larger than zero. If I_ij^k ≥ I_α, the two classes can be separated using attribute A_k with the classification error given by α. If (l_i^k − l_j^k) ≥ 0 and (u_j^k − u_i^k) ≥ 0 for some i, j combination and some attribute k (i.e., the interval of class i is contained in that of class j), make I_ij^k = 0.

2 Proposed Method for Parameter Minimization

The proposed method was developed by the same authors in [6]. The following elements are defined for the computer program:

L ---- location for [−log10 α]
T1 ---- table of α_ij^k
T2 ---- table of calculated I_ij^k
SA ---- location in T2 holding the sum of all I_ij^k in each row
SO ---- location in T2 holding the sum of all I_ij^k in each column
Vector ATU ---- attributes that discriminate the selected objects without any uncertainty for the accepted error α
Vector ATD ---- attributes that discriminate the selected objects with uncertainty for the accepted error α
Vector UD ---- objects discriminated without any uncertainty for the accepted error α
Vector NUD ---- objects discriminated with some uncertainty for the accepted error α
Vector ND ---- objects that cannot be discriminated for the accepted error α

The generalized block diagram is presented in Figure 1. Applying this method, a minimum set of attributes is selected that classifies the objects under the permitted misclassification error α.

3 Application of the Method: Iris Classification

This example is proposed because of its simplicity. The original database presented by R. Fisher [7] has been tested with different methods [8], [9]. In this example the value of α has been arbitrarily selected as 0.1, so the corresponding value of L is −log10(0.1) = 1 hartley. From [7], the attribute statistics for each class (object) have been calculated and are presented in Table 1.

Table 1. Attributes for Each Class (mean x̄, standard deviation σ, minimum l, maximum u)

Setosa
  Attribute   x̄      σ      l      u
  SL          5.0    0.35   4.3    5.7
  SW          3.42   0.38   2.66   4.18
  PL          1.45   0.11   1.23   1.67
  PW          0.24   0.11   0.1    0.6

Versicolor
  Attribute   x̄      σ      l      u
  SL          5.94   0.52   4.9    6.98
  SW          2.77   0.31   2.15   3.39
  PL          4.26   0.47   3.32   5.2
  PW          1.33   0.20   0.93   1.73

Virginica
  Attribute   x̄      σ      l      u
  SL          6.59   0.64   5.31   7.87
  SW          2.91   0.32   2.27   3.55
  PL          5.55   0.55   4.45   6.65
  PW          2.03   0.27   1.49   2.57
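Continuing the sketch above under the same caveat (this is our illustration, not the authors' program), the following self-contained example computes β_ij^k and I_ij^k for every class pair and attribute from the Table 1 intervals, capping each value at L as the text prescribes. Some entries, notably for sepal length, differ from Table 2, presumably because the original used slightly different interval limits for those attributes.

```python
import math

def alpha_ij(ci, cj):
    """Eqs. (1)-(2): probability of misclassifying class interval ci into class interval cj."""
    (li, ui), (lj, uj) = ci, cj
    if ui < lj or uj < li:
        return 0.0
    return min((ui - lj) / (ui - li), (uj - li) / (uj - lj), 1.0)

# Class intervals [l, u] per attribute, taken from Table 1
intervals = {
    "Setosa":     {"SL": (4.3, 5.7),   "SW": (2.66, 4.18), "PL": (1.23, 1.67), "PW": (0.10, 0.60)},
    "Versicolor": {"SL": (4.9, 6.98),  "SW": (2.15, 3.39), "PL": (3.32, 5.20), "PW": (0.93, 1.73)},
    "Virginica":  {"SL": (5.31, 7.87), "SW": (2.27, 3.55), "PL": (4.45, 6.65), "PW": (1.49, 2.57)},
}
alpha = 0.1
L = -math.log10(alpha)                          # Eq. (5): 1 hartley

attrs = ["SL", "SW", "PL", "PW"]
classes = list(intervals)
for i, ci in enumerate(classes):
    for cj in classes[i + 1:]:
        row = {}
        for a in attrs:
            beta = max(alpha_ij(intervals[ci][a], intervals[cj][a]),
                       alpha_ij(intervals[cj][a], intervals[ci][a]))   # Eq. (3)
            info = L if beta == 0 else min(-math.log10(beta), L)       # Eq. (4), capped at L
            row[a] = round(info, 2)
        print(ci, "vs", cj, row)
```

A pair of classes is separable by a single attribute whenever the corresponding entry reaches L.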

The three classes to be differentiated are: U_1 -- Setosa; U_2 -- Versicolor; U_3 -- Virginica. The attributes taken into consideration are: A_1: Sepal Length (SL); A_2: Sepal Width (SW); A_3: Petal Length (PL); A_4: Petal Width (PW). Table 2 shows, for each pair of classes and each attribute, the classification information I_ij^k expressed in hartleys.

Table 2. Classification Error between Classes, Expressed as Information I_ij^k (hartleys)

I_ij^k    A_1    A_2    A_3    A_4    Σ (SA)
U_12      0.22   0.23   1      1
U_13      0.22   0.16   1      1
U_23      0      0.05   0.4    0.52   0.97
Σ (SO)    0.44   0.43   2.11   2.52

In this table the following conventions are observed:
1) If I_ij^k ≥ L, the calculated value is replaced by the value of L.
2) If I_ij^k is less than L, the calculated value is placed in the corresponding cell of Table 2.

For each attribute, all the values in the corresponding column and row are added, and the sum is recorded in the last row or column, labeled Σ. The total number of rows is given by C(n, 2) = n(n − 1)/2, where n is the number of classes.

Attributes A_3 and A_4 permit differentiating U_1 from the other two classes. A_4 is selected because it offers more information in total. From here the first rule is obtained:

Rule 1: IF A_4 is in the interval [0.1, 0.6] THEN the object is U_1 (Setosa).

As can be seen from Table 2, differentiation between classes 2 and 3 is not possible for α = 0.1, because even if the three remaining attributes are all included, the information obtained is less than the information needed. In this case there are two possible solutions:
1. Select a larger α-value, which affects the precision of the whole solution.
2. Find, for the selected attributes, the minimum necessary precision.

The second approach has been selected because the precision of the whole model is not affected and it is not necessary to act on all the parameters; the new calculation affects only the selected attributes. Under certain conditions it is possible to reduce the standard deviation of the values obtained for all the different attributes. Doing so improves object discrimination but increases the cost of the whole system. If the standard deviation is reduced only for the parameters that permit discriminating all the objects, the cost is kept to a minimum. In the present case, the standard-deviation reduction need only be applied to parameter A_4. In general, it is possible to find the standard deviation on selected parameters that permits discriminating all the objects.

Figure 2 shows the region of coincidence of objects (classes) i and j for attribute A_k. Taking the maximum and minimum values of any attribute as x̄ + 2σ and x̄ − 2σ respectively, where x̄ represents the average value for the class, the following assumptions are accepted: (1) the standard deviation σ is the same for classes i and j; (2) the average values of the two classes under analysis do not change; (3) the variations follow the normal distribution.

From equations (2) and (3), under these assumptions, the value β_ij^k is calculated as

β_ij^k = (u_i − l_j)/(u_i − l_i) = [(x̄_i + 2σ) − (x̄_j − 2σ)]/[(x̄_i + 2σ) − (x̄_i − 2σ)] = [(x̄_i − x̄_j) + 4σ]/4σ

But (x̄_j − x̄_i) = K (constant), so

β_ij^k = (4σ − K)/4σ  (6)

It is necessary to reduce the standard deviation to a new value σ_F that complies with the condition

α = (4σ_F − K)/4σ_F  (7)

Comparing equations (6) and (7):

σ_F = σ(1 − β_ij^k)/(1 − α)

If the new maximum and minimum values are needed, they can be obtained from

u_i = x̄_i + 2σ_F,  l_j = x̄_j − 2σ_F
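As a small numerical check of equations (6) and (7) (again our own sketch, not part of the original program; the function name required_sigma is an assumption), the reduced standard deviation σ_F needed to reach the permissible error α can be computed as follows; the worked value for the Iris example appears in the text below.

```python
def required_sigma(sigma, beta, alpha):
    """sigma_F = sigma * (1 - beta) / (1 - alpha), from comparing Eqs. (6) and (7).

    sigma : current standard deviation of the attribute
    beta  : current mutual classification error beta_ij^k between the two classes
    alpha : permissible misclassification error (must be smaller than beta for a reduction)
    """
    return sigma * (1.0 - beta) / (1.0 - alpha)

# Versicolor vs. Virginica on petal width (A_4): beta_23^4 = 0.3, sigma = 0.27, alpha = 0.1
sigma_f = required_sigma(0.27, 0.3, 0.1)
print(round(sigma_f, 2))   # 0.21, matching the value derived in the text that follows
```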

Solving for the presented example, the new maximum allowable standard deviation σ_F is

σ_F = σ(1 − β_23^4)/(1 − α) = 0.27(1 − 0.3)/(1 − 0.1) = 0.21

4 Conclusions

An algorithm has been presented for attribute reduction in a classification process, using the concept of information (or loss of information) in diffuse databases. A logarithmic form has been used to express the uncertainty in the differentiation between two classes. One advantage of the method is that it is easy to determine how far the solution is from the established misclassification error, as well as to find the reduct in a simple way. Another advantage is that Table 2 shows directly whether or not it is possible to discriminate between two classes. In cases where not all the objects (classes) can be discriminated, it is possible to calculate the minimum necessary standard deviation for certain selected attributes under the established conditions of acceptable error, which permits discriminating all the objects. This is possible only when there are technical reasons creating the overlap, for example noise in the communication channel or lack of precision in the equipment. When the overlap is due to other factors, as in the case of surveys or other social situations, this approach is not useful. Applying this procedure after the number of attributes has been minimized results in savings of resources. The obtained rules serve as a model, and the database can be used for training purposes to discriminate further information received from objects in the same process, under the same conditions.

References

[1] T. Ying-Chieh, et al., Entropy-Based Fuzzy Rough Classification Approach for Extracting Classification Rules, Expert Systems with Applications, No. 31, 2006, pp. 436-443.
[2] R. Battiti, Using Mutual Information for Selecting Features in Supervised Neural Net Learning, IEEE Transactions on Neural Networks, Vol. 5, No. 4, 1994, pp. 537-550.
[3] E. Granger, et al., Classification of Incomplete Data Using the Fuzzy ARTMAP Neural Network, Technical Report, Boston University, Center for Adaptive Systems and Dept. of Adaptive Neural Systems, January 2000.
[4] Hu Ming-li and Lin-li, Research on the Interval-Valued Multiple Attribute Decision Model Based on the α-β Dominance Relation, National Natural Science Foundation, IEEE, 2013.
[5] Y. Leung, et al., A Rough Set Approach for the Discovery of Classification Rules in Interval-Valued Information Systems, International Journal of Approximate Reasoning, No. 47, 2007, pp. 233-246.
[6] A. Caballero, et al., A Practical Solution for the Classification in Interval-Valued Information Systems, WSEAS Transactions on Systems and Control, Issue 9, Vol. 5, September 2010, pp. 735-744.
[7] R. Fisher, The Use of Multiple Measurements in Taxonomic Problems, Annals of Eugenics, Cambridge University Press, Vol. 7, 1936, pp. 179-188.
[8] Shyi-Ming Chen and Yao-De Fang, A New Approach for Handling the Iris Data Classification Problem, International Journal of Applied Science and Engineering, Vol. 3, No. 1, 2005, pp. 37-49.
[9] A. Caballero, et al., Classification with Diffuse or Incomplete Information, WSEAS Transactions on Systems and Control, Issue 6, Vol. 3, June 2008, pp. 617-626.

Figure 1. Generalized Block Diagram

Objects cannot be discriminated:
- Find the values in table T2 (column SA) that are smaller than L.
- Locate the objects U_ij corresponding to the rows identified above in vector ND and write 0 in all these rows.
- If all the rows are 0, the discrimination is finished. If not, move to the next step.

Rows containing only one value equal to L:
- Select the attribute A_k corresponding to each such cell and place it in vector ATU. Locate the objects U_ij complying with this condition in vector UD and write 0 in all these rows.

Rows with the value L in more than one cell:
- For each row, select the attribute A_k containing the value L whose cell SO is largest, and place it in vector ATU. If this value appears in more than one cell SO, select the first such attribute encountered in the row.
- Locate the objects U_ij that possess the value L for this attribute in vector UD, and write 0 in all these rows.
- If all the rows are 0, the discrimination is finished. If not, move to the next step.

More than one attribute is necessary for object discrimination:
- For each row, select the cell with the maximum I_ij^k, add this value to the cell with the nearest value, and check whether the resulting sum is at least L. Repeat the process until this condition is met. Place the corresponding attributes in vector ATD and the objects U_ij complying with this condition in vector NUD. For discriminating among objects that remain in this condition it becomes necessary to apply some other method.

(A code-level sketch of these steps is given after Figure 2.)

Figure 2. Region of Coincidence of Classes i and j for Attribute A_k (interval limits l_i, u_i, l_j, u_j).
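To make the flow of Figure 1 concrete, the sketch below is our own reading of the block diagram (not the authors' program); the variable names mirror the vectors defined in Section 2, and the T2 values are those of Table 2. Run as-is, it selects A_4 for the two pairs involving Setosa and places the Versicolor/Virginica pair in ND, consistent with the example in the text.

```python
L = 1.0
T2 = {
    "U12": {"A1": 0.22, "A2": 0.23, "A3": 1.0, "A4": 1.0},
    "U13": {"A1": 0.22, "A2": 0.16, "A3": 1.0, "A4": 1.0},
    "U23": {"A1": 0.0,  "A2": 0.05, "A3": 0.4, "A4": 0.52},
}

ATU, ATD, UD, NUD, ND = set(), set(), [], [], []
SO = {a: sum(row[a] for row in T2.values()) for a in next(iter(T2.values()))}  # column sums

for pair, row in T2.items():
    SA = sum(row.values())                           # row sum
    certain = [a for a, v in row.items() if v >= L]  # attributes reaching L for this pair
    if SA < L:                                       # pair cannot be discriminated at error alpha
        ND.append(pair)
    elif len(certain) == 1:                          # exactly one attribute reaches L
        ATU.add(certain[0]); UD.append(pair)
    elif len(certain) > 1:                           # several reach L: prefer the largest column sum SO
        ATU.add(max(certain, key=lambda a: SO[a])); UD.append(pair)
    else:                                            # combine attributes, largest values first
        chosen, total = [], 0.0
        for a, v in sorted(row.items(), key=lambda kv: kv[1], reverse=True):
            chosen.append(a); total += v
            if total >= L:
                break
        if total >= L:
            ATD.update(chosen); NUD.append(pair)
        else:
            ND.append(pair)

print("ATU:", ATU, "UD:", UD)    # discriminated without uncertainty
print("ATD:", ATD, "NUD:", NUD)  # discriminated with some uncertainty
print("ND:", ND)                 # not discriminable at the accepted error
```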