Australian Journal of Basic and Applied Sciences, 6(…): 158-163, 2012
ISSN 1991-8178

Application of Rough Set Theory in Performance Analysis

1Mahnaz Mirbolouki, 2Mohammad Hassan Behzadi, 1Leila Karamali
1Department of Mathematics, Shahre-Rey Branch, Islamic Azad University, Tehran, Iran.
2Department of Statistics, Science and Research Branch, Islamic Azad University, Tehran.

Abstract: Data envelopment analysis (DEA) is a mathematical technique based on linear programming for evaluating the efficiency of a set of decision making units (DMUs). Each DMU uses several inputs to produce several outputs. In order to derive meaningful efficiency values from DEA models, it is conventionally assumed that the number of units is at least three times the total number of input and output factors. However, some practical problems involve a large number of such factors, so a method that can reduce their number is needed. This paper reviews some preliminary concepts of Rough Set Theory (RST) and discusses applications of RST in data envelopment analysis, one of which is reducing the number of inputs and outputs. A numerical example is provided to show the implementation of this method.

Key words: Data envelopment analysis (DEA), Performance evaluation, Rough set theory (RST).

INTRODUCTION

DEA, a mathematical technique based on linear programming first introduced by Charnes et al. (1978), is a way of determining the efficiency of a group of decision making units (DMUs) measured over a set of multiple input and output variables. For a given set of input and output variables, DEA produces a single comprehensive measure of performance (efficiency score) for each DMU. Rough set theory (RST) was initiated in the early 1980s by Pawlak (1982). This theory deals with the analysis of data tables, which are called information systems. The data can be quantitative or qualitative.
In real problems, some of the available information is usually inaccurate, incomplete or unreliable. Therefore, to derive suitable conclusions from such information, it must first be processed. Rough set theory and fuzzy set theory are two well-known tools for processing different types of inaccurate, unreliable and ambiguous data; however, they are based on different concepts. RST differs from classical (crisp) and fuzzy set theories mainly in how membership is defined. In classical set theory, a set is uniquely determined by its members, and the membership function of the elements of the reference set takes only the values zero or one. In fuzzy set theory, which deals with uncertainty, the membership function takes values in the interval [0, 1]. In rough set theory, by contrast, membership is not a primitive concept. Rough sets offer a different approach to vagueness and uncertainty: the definition of a set in RST depends on the available information and on the relations between the data in the information system. Members are specified relative to the available information and properties, so two different members of a set may not be clearly distinguishable. A rough set can thus be viewed as a framework for discovering facts from imperfect data. Rough set analysis yields categories or decision rules derived from a set of samples. The main purpose of RST analysis is to obtain approximations of concepts from the acquired data: the approximation of a crisp set consists of a pair of sets, called the lower and the upper approximation of the original set. In the standard version of RST, the lower- and upper-approximation sets are crisp sets, but in other variations the approximating sets may be fuzzy sets. RST is a powerful mathematical tool for reasoning under uncertainty that helps to gain insight into the problem at hand by analyzing the constructed model.
RST provides procedures for removing redundant information and irrelevant knowledge from a database. This process eliminates data that are unnecessary for the main task of the system without losing essential data, and the reduced data set makes decision making much easier. Therefore, given the explosive growth of data volume, RST can be very effective in decision support systems (see e.g. Pawlak, 1991; Polkowski and Skowron, 1998; Polkowski, 2002).

Counterintuitively, using too many inputs and outputs in DEA is less helpful: as the number of inputs and outputs increases, more DMUs tend to obtain an efficiency score of 1, because they become too specialized to be evaluated with respect to the other units. A rule of thumb is that there should be at least three DMUs per input and output when implementing a DEA model (Bowlin, 1998; Raab and Lichty, 2002). Thus, for practical reasons, the number of inputs and outputs needs to be limited. In this paper, considering the properties of rough set theory in classifying criteria and identifying minimal sets of criteria, RST is proposed for input and output reduction, and its implementation is illustrated with an example. With this approach, the unnecessary inputs and outputs can be identified.

Corresponding Author: Mahnaz Mirbolouki, Department of Mathematics, Shahre-Rey Branch, Islamic Azad University, Tehran, Iran. E-mail: m.mirbolouki@srbiau.ac.ir.

The remainder of this paper is organized as follows: the preliminary concepts of RST are described in Section 2; in Section 3, the application of RST to input and output reduction is presented through a numerical example; Section 4 concludes the paper.

2- RST Preliminaries:
This section explains the basic framework of rough set theory, along with some of its key definitions.

2-1 Information System:
Let $I = (\mathbb{U}, \mathbb{A})$ be an information system (attribute-value system), where $\mathbb{U} = \{x_1, x_2, \ldots, x_m\}$ is a non-empty finite set of objects (the universe) and $\mathbb{A}$ is a non-empty finite set of attributes such that $a: \mathbb{U} \to V_a$ for every $a \in \mathbb{A}$. $V_a$ is the set of values that attribute $a$ may take. The information table assigns a value $a(x) \in V_a$ to each attribute $a$ and object $x$ in the universe. Any $P \subseteq \mathbb{A}$ has an associated equivalence relation $\mathrm{IND}(P)$, the indiscernibility relation:
$$\mathrm{IND}(P) = \{(x, y) \in \mathbb{U}^2 \mid \forall a \in P,\ a(x) = a(y)\}.$$
The partition of $\mathbb{U}$ formed by all equivalence classes of $\mathrm{IND}(P)$ is denoted $\mathbb{U}/\mathrm{IND}(P)$, and the class containing $x$ is written $[x]_P$. If $(x, y) \in \mathrm{IND}(P)$, then $x$ and $y$ are indiscernible (indistinguishable) by the attributes in $P$.

Let $X \subseteq \mathbb{U}$ be a target set that we want to represent using the attribute subset $P$. In general, $X$ cannot be expressed precisely, because an equivalence class may contain both objects inside and objects outside $X$, and these objects are indistinguishable on the basis of the attributes in $P$. However, $X$ can be approximated using the information contained in $P$ by defining its lower and upper approximations:
$$\underline{P}X = \{x \mid [x]_P \subseteq X\}, \qquad \overline{P}X = \{x \mid [x]_P \cap X \neq \emptyset\}. \qquad (1)$$
The pair $\langle \underline{P}X, \overline{P}X \rangle$ is called a rough set; thus, a rough set is composed of two crisp sets. The accuracy of the rough-set representation of $X$ is defined as
$$\alpha_P(X) = \frac{|\underline{P}X|}{|\overline{P}X|}. \qquad (2)$$
Generally, the upper and lower approximations are not equal.
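To make definitions (1) and (2) concrete, here is a minimal Python sketch, assuming an information system stored as a dictionary that maps each object to its attribute values. The function names (`partition`, `approximations`, `accuracy`) and the five-object table are hypothetical illustrations, not data or code from this study.

```python
from collections import defaultdict
from fractions import Fraction

def partition(table, attrs):
    """Equivalence classes of IND(attrs): objects with equal values on every attribute in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return [frozenset(c) for c in classes.values()]

def approximations(table, attrs, target):
    """Lower and upper approximations of the target set, as in equation (1)."""
    lower, upper = set(), set()
    for cls in partition(table, attrs):
        if cls <= target:   # [x]_P is contained in X
            lower |= cls
        if cls & target:    # [x]_P meets X
            upper |= cls
    return lower, upper

def accuracy(table, attrs, target):
    """alpha_P(X) = |lower| / |upper|, as in equation (2); X must be a non-empty subset of the universe."""
    if not target:
        raise ValueError("accuracy is undefined for an empty target set")
    lower, upper = approximations(table, attrs, target)
    return Fraction(len(lower), len(upper))

# Hypothetical information system with attributes a and b.
table = {
    "x1": {"a": 1, "b": 1},
    "x2": {"a": 1, "b": 1},
    "x3": {"a": 1, "b": 2},
    "x4": {"a": 2, "b": 2},
    "x5": {"a": 2, "b": 2},
}
X = {"x1", "x2", "x3", "x4"}
lower, upper = approximations(table, ["a", "b"], X)
# lower == {"x1", "x2", "x3"}; upper contains all five objects; accuracy is 3/5
```

Because x4 and x5 are indiscernible on {a, b} but only x4 belongs to X, the two approximations differ and X is only roughly definable on these attributes.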
If the upper and lower approximations are not equal, the target set $X$ is indefinable, or roughly definable, on the attribute set $P$; when they are equal, $X$ is definable on $P$.

2-2 Reduct and Core:
Reducts and the core indicate which attributes of the information system are more important than others for the knowledge represented in the equivalence-class structure. A reduct is a subset of attributes that fully characterizes the knowledge in the database. Formally, a reduct is a subset of attributes $RED \subseteq P$ such that $[x]_{RED} = [x]_P$; that is, the equivalence classes induced by the reduced attribute set $RED$ are the same as those induced by the full attribute set $P$. The attribute set $RED$ is minimal, in the sense that $[x]_{(RED - \{a\})} \neq [x]_P$ for any attribute $a \in RED$; in other words, no attribute can be removed from $RED$ without changing the equivalence classes $[x]_P$. A reduct can therefore be considered a sufficient set of features to represent the category structure. The intersection of all reducts is called the core. The core consists of the attributes possessed by every legitimate reduct, and therefore of the attributes that cannot be removed from the information system without collapsing the equivalence-class structure. The core may be thought of as the set of attributes necessary for the category structure to be represented.
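The definitions of reduct and core can be checked by brute force on small tables. The sketch below assumes the same dictionary representation of a hypothetical information system; it enumerates attribute subsets, keeps those that reproduce the partition induced by all attributes and are minimal in the single-removal sense defined above, and intersects them to obtain the core. Enumerating all subsets is exponential in the number of attributes, so this is only an illustration of the definitions, not an efficient reduct algorithm.

```python
from itertools import combinations

def partition(table, attrs):
    """Set of equivalence classes of IND(attrs)."""
    classes = {}
    for obj, row in table.items():
        classes.setdefault(tuple(row[a] for a in attrs), set()).add(obj)
    return {frozenset(c) for c in classes.values()}

def reducts(table, attrs):
    """All minimal subsets RED of attrs with [x]_RED = [x]_attrs for every object x."""
    full = partition(table, attrs)
    found = []
    for k in range(len(attrs) + 1):
        for subset in combinations(attrs, k):
            if partition(table, subset) != full:
                continue
            # Minimality: dropping any single attribute must change the partition.
            if all(partition(table, [a for a in subset if a != b]) != full for b in subset):
                found.append(frozenset(subset))
    return found

def core(reduct_list):
    """Intersection of all reducts."""
    return frozenset.intersection(*reduct_list) if reduct_list else frozenset()

# Hypothetical table: every pair of objects is distinguished by {a, b, c}.
table = {
    "x1": {"a": 1, "b": 1, "c": 1},
    "x2": {"a": 1, "b": 2, "c": 2},
    "x3": {"a": 2, "b": 1, "c": 1},
    "x4": {"a": 2, "b": 2, "c": 2},
}
found = reducts(table, ["a", "b", "c"])
# found contains {a, b} and {a, c}; core(found) == {a}
```

Here {b, c} is not a reduct because x1 and x3 agree on both b and c, so attribute a is needed in every reduct and forms the core.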
2-3 Attribute Dependency:
One of the most important aspects of database analysis or data acquisition is the identification of attribute dependencies, that is, the detection of variables that are strongly related to other variables. For this purpose, let $[x]_Q = \{Q_1, Q_2, \ldots, Q_N\}$, where each $Q_i$ is an equivalence class of the equivalence-class structure induced by the attribute set $Q$. The dependency of attribute set $Q$ on attribute set $P$, $\gamma_P(Q)$, is then
$$\gamma_P(Q) = \frac{\sum_{i=1}^{N} |\underline{P}Q_i|}{|\mathbb{U}|} \le 1.$$
That is, for each equivalence class $Q_i$ in $[x]_Q$, the size of its lower approximation by the attributes in $P$ is added up. Summed across all equivalence classes in $[x]_Q$, the numerator is the total number of objects that, based on attribute set $P$, can be positively categorized according to the classification induced by the attributes $Q$, and the dependency ratio is the proportion of such classifiable objects. In other words, $\gamma_P(Q)$ is the proportion of objects in the information system for which it suffices to know the values of the attributes in $P$ to determine the values of the attributes in $Q$.

3- Inputs and Outputs Reduction by RST:
In this section, a numerical example is provided to show how RST can be used to reduce the number of input and output components. Data envelopment analysis was initiated by Charnes et al. (1978), and the first model was called the CCR model. This model evaluates the relative efficiency of a set of DMUs, $\mathrm{DMU}_j$, $j = 1, \ldots, n$, where $\mathrm{DMU}_j$ uses a vector of inputs $x_j = (x_{1j}, \ldots, x_{mj}) \in \mathbb{R}^m$ to produce a vector of outputs $y_j = (y_{1j}, \ldots, y_{sj}) \in \mathbb{R}^s$. The input-oriented CCR model for evaluating the efficiency of $\mathrm{DMU}_o$, $o \in \{1, \ldots, n\}$, is:
$$\begin{aligned} \min\ & \theta \\ \text{s.t.}\ & \sum_{j=1}^{n} \lambda_j x_{ij} \le \theta x_{io}, && i = 1, \ldots, m, \\ & \sum_{j=1}^{n} \lambda_j y_{rj} \ge y_{ro}, && r = 1, \ldots, s, \\ & \lambda_j \ge 0, && j = 1, \ldots, n. \end{aligned} \qquad (3)$$
In model (3), the optimal value $\theta^*$ is the efficiency score, and it lies between zero and one. If $\theta^* = 1$, the DMU under assessment is efficient.
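Model (3) is a linear program that in general requires an LP solver. In the special case of a single input and a single output (m = s = 1), however, it has a closed-form solution that makes the meaning of θ* easy to see: if DMU_k has the largest output/input ratio, setting λ_k = y_o/y_k is feasible and gives θ = (y_o/x_o)/(y_k/x_k), and no feasible λ can do better, because Σλ_j x_j ≥ Σλ_j y_j/(y_k/x_k) ≥ y_o/(y_k/x_k). The Python sketch below covers only this special case, with hypothetical data and a hypothetical function name; it is not a solver for the multi-input, multi-output model used in the banking example.

```python
from fractions import Fraction

def ccr_theta_single(x, y, o):
    """theta* of the input-oriented CCR model (3) for DMU o, valid only when m = s = 1.

    x[j] and y[j] are the (positive) single input and single output of DMU j.
    The best reference is the DMU with the largest output/input ratio, so
    theta* = (y_o / x_o) / max_j (y_j / x_j).
    """
    ratios = [Fraction(yj) / Fraction(xj) for xj, yj in zip(x, y)]
    return ratios[o] / max(ratios)

# Hypothetical data for three DMUs.
x = [2, 4, 5]
y = [4, 6, 10]
scores = [ccr_theta_single(x, y, o) for o in range(len(x))]
# scores == [1, 3/4, 1]: DMU 1 and DMU 3 are efficient, DMU 2 is not.
```

Every score lies in (0, 1], and the DMUs attaining the maximum ratio receive θ* = 1, which matches the interpretation given for model (3).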
As noted in the Introduction, using too many inputs and outputs in DEA is counterproductive: as their number grows, more DMUs tend to obtain an efficiency score of 1, because they become too specialized to be evaluated with respect to the other units. A rule of thumb calls for at least three DMUs per input and output (Bowlin, 1998; Raab and Lichty, 2002), so for practical reasons the number of inputs and outputs must be limited.

Here, we consider an application in the banking industry involving 20 bank branches with 4 inputs and 16 outputs. The inputs and outputs are listed in Table 1 and Table 2, respectively. Since the number of inputs and outputs is high relative to the number of DMUs (20 branches, against the 3 × (4 + 16) = 60 required by the rule of thumb), all units are identified as efficient when evaluated by model (3). To solve this problem, we applied RST to the input and output sets. In this approach, the DMUs are the objects and the inputs and outputs (components) are the attributes. The class of each component is taken as its attribute value, $V_a$; these classes are based on uniform distributions. The classification of the inputs and outputs into groups is given in Table 3.

The discernibility relation between DMUs (objects) can be determined from Table 3 by comparing the classes of the DMUs' inputs and outputs. For example, DMUs 5, 6, 8, 9 and 11, among others, fall in the same class with respect to the input set (note that here we consider only the input attributes, in order to find the reducts and core of the inputs). DMUs that are not in the same class can be discerned from each other. The discernibility matrix is defined as $D = [d_{ij}]_{n \times n}$, where
$$d_{ij} = \{ I_k \mid \mathrm{DMU}_i \text{ can be discerned from } \mathrm{DMU}_j \text{ by } I_k,\ k = 1, \ldots, 4 \}.$$
For example, $d_1$, the first column of $D$, is as follows:
1,,, 1,,, 1,

Table 1: Inputs
U 1 U1 1..5 111.99 111.99 U 9..68 856.15 856.15 U.6.9 119.8 119.8 U 69.. 11.1 11.1 U5 5..68 556.1 556.1 U6 16..61 16.69 16.69 U 9.51.8 199. 199. U8 6.6.6 168.6 168.6 U9.1. 1.9 1.9 U1 68.8. 566.8 566.8 U11 11.6.19 669.98 669.98 U1 1.65.9 698.1 698.1 U1 6.11.6 1.8 1.8 U1 18. 1.1 865.9 865.9 U15 5.9 1.5 9.8 9.8 U16 58.1.9 19.9 19.9 U1 15.. 186.1 186.1 U18 11.9 1.6 6515.61 6515.61 U19.1.5 89.56 89.56 U 1.6 1.6 98. 98.

Table 2: Outputs
U U19 U18 U1 U16 U15 U1 U1 U1 U11 U1 U9 U8 U U6 U5 U U U U1 U 69 1 6 1 6 9 9 16 9 58 5 1 89 8 O1 8 89 1 1 1 8 9 6 9 1 9 8 5 6 15 58 1 O 1 11 15 58 51 18 6 1 118 68 1 6 9 16 5 69 9 1 O 5 51 1 68 59 5 9 1 1 5 8 85 89 6 6 66 O 98 8 6516 18 1 56 199 1 119 1 865 11 698 6 18 1685 556 11 6 856 11 O5 85 51 1 9 1 1 5 8 8 5 165 6 86 11 1 199 6 O6 1 81 591 9 6 11 8 1 55 11 15 69 6 55 1 1 19 9 O 9 158 55 918 6 9 155 81 8 119 68 696 588 66 6 1661 6 O8 8 11 66 6 1 59 6 1 18 65 169 9 1 O9 85 8 8 5 16 6 198 696 99 59 5 5 6 516 69 85 96 881 5 6 O1 6 11 999 11 1 5 55 51 565 9 5 8 16 1 18 181 166 O11 95 196 18 9 9 6 8 1 81 6195 665 155 1111 66 61 189 8 5 89 81 915 1 O1 6 65 1 11 5 69 18 16 68 11 8 5 1 8 59 9 O1 6 16 19 6 66 O1 1 9 15 5 86 68 5865 51 8 55 596 1511 1598 861 958 885 88 1 5 955 56 11 8 5 1 9999 8 68 69 96 6 6 11 15 9 5 8 O15 9 9 8685 O16

In order to find the reducts and the core, the discernibility function must be constructed. This function, which can be computed from the discernibility matrix, has the form
$$f(A) = f_1(A) \wedge f_2(A) \wedge \cdots \wedge f_n(A),$$
where $f_i(A)$ is the discernibility function associated with column $i$. For example,
$f_1(A)$ is obtained from $d_1$ by the operations $\wedge$ (conjunction) and $\vee$ (disjunction) as defined for rough sets (Walczak and Massart, 1999): the attributes within each entry are joined by $\vee$, and the entries are joined by $\wedge$. The total discernibility function of the inputs can be obtained similarly; here it equals $f_1(A)$. Therefore, the inputs have two reducts, each consisting of two inputs, and the input shared by both reducts is the core. This means that either reduct can be used instead of the full set of inputs, and the core is the most important input. The reducts of the outputs are obtained in the same way: there are four output reducts, each containing five outputs (two of them include O8 and one includes O16), and the core of the outputs consists of three outputs, one of which is O11.

Table 3: Classification of input and output components
1 O1 O O O O5 O6 O O8 O9 O1 O11 O1 O1 O1 O15 O16 U1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 U 8 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 9 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 U 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 15 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 16 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 U 18 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 19 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 U 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Table 4: Computational results of model (3) for the input and output reducts
{ O, O, O, O, O } { O, O, O, O, O } 11 8 {, } 11 16 {, } U1 1. 1. 1. 1. U.85.9.8 1. U.6.6.65.65 U.518.6.1.8 U5 1. 1. 1..661 U6.898.59.61.66 U.866.866.86.86 U8.996 1..9.956 U9 1. 1. 1. 1. U1.99.99.55. U11 1. 1. 1. 1. U1.6655.6655.6655.6655 U1 1. 1. 1. 1. U1.91.91.91.8 U15.116.58.8.116 U16.666.15.665.86 U1 1. 1. 1. 1. U18.6.6.6.6 U19 1. 1. 1. 1. U 1. 1. 1. 1.
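The procedure of this section (classify each component, build the discernibility matrix, and read the reducts and core off the discernibility function) can be sketched in Python as follows. Several details are assumptions: the classification "based on uniform distributions" is read here as equal-width intervals between the minimum and maximum of each component; the discernibility function is taken as the conjunction, over all pairs of objects, of the disjunction of the attributes in each non-empty entry, whose prime implicants are exactly the minimal attribute sets that intersect every such entry; and the core is read off as the attributes that appear as singleton entries. The function names and the four-object table are hypothetical.

```python
from itertools import combinations

def classify(values, groups):
    """Equal-width classes 1..groups over [min, max] (an assumed reading of 'uniform')."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1] * len(values)
    labels = []
    for v in values:
        idx = int((v - lo) / (hi - lo) * groups)
        labels.append(min(idx, groups - 1) + 1)  # the maximum goes to the top class
    return labels

def discernibility_matrix(table, attrs):
    """d[(i, j)] = attributes whose values differ between objects i and j (i < j)."""
    objs = sorted(table)
    return {
        (i, j): frozenset(a for a in attrs if table[i][a] != table[j][a])
        for i, j in combinations(objs, 2)
    }

def reducts_from_matrix(matrix, attrs):
    """Prime implicants of the discernibility function: minimal attribute sets
    that share at least one attribute with every non-empty matrix entry."""
    clauses = {d for d in matrix.values() if d}
    found = []
    for k in range(len(attrs) + 1):
        for subset in map(frozenset, combinations(attrs, k)):
            if any(r <= subset for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if all(subset & clause for clause in clauses):
                found.append(subset)
    return found

def core_from_matrix(matrix):
    """Attributes that are the only way to discern some pair (singleton entries)."""
    return frozenset(a for d in matrix.values() if len(d) == 1 for a in d)

# Hypothetical raw values of one component, grouped into 2 classes.
b_classes = classify([3.0, 7.5, 4.2, 9.9], 2)  # [1, 2, 1, 2]

# Hypothetical classified table with attributes a, b, c (column b as above).
table = {
    "x1": {"a": 1, "b": 1, "c": 1},
    "x2": {"a": 1, "b": 2, "c": 2},
    "x3": {"a": 2, "b": 1, "c": 1},
    "x4": {"a": 2, "b": 2, "c": 2},
}
attrs = ["a", "b", "c"]
matrix = discernibility_matrix(table, attrs)
found = reducts_from_matrix(matrix, attrs)
# found contains {a, b} and {a, c}; core_from_matrix(matrix) == {a}
```

This agrees with the definition in Section 2-2: removing a from either reduct would leave x1 and x3 (and likewise x2 and x4) indiscernible, which is why a alone forms the core.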
In Table 4, each column corresponds to one of the four output reducts combined with one of the two input reducts. The results in Table 4 show that, once the input and output components are reduced to a reduct, the CCR model no longer rates every DMU as efficient, so the difficulty caused by the large number of components is resolved. It is also noteworthy that the differences between the efficiency scores obtained with the different reducts are insignificant.

4- Conclusion:
In this paper, considering the properties of rough set theory in classifying criteria and identifying minimal sets of criteria, RST is proposed for input and output reduction, and its implementation is illustrated with an example. With this approach, the necessary and unnecessary inputs and outputs can be identified. This paper presents elementary connections between RST and DEA. RST can also be used to assess the performance of units with qualitative inputs and outputs; this topic is suggested for future research.
ACKNOWLEDGMENT
The authors would like to thank the anonymous referees. This research was supported by a grant from the Shahre-Rey Branch, Islamic Azad University.

REFERENCES
Bowlin, W.F., 1998. Measuring Performance: An Introduction to Data Envelopment Analysis (DEA). Journal of Cost Analysis, (1): -8.
Charnes, A., W.W. Cooper and E. Rhodes, 1978. Measuring the Efficiency of Decision Making Units. European Journal of Operational Research, 2(6): 429-444.
Pawlak, Z., 1982. Rough Sets. International Journal of Computer Information Science, 11: 341-356.
Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Dordrecht: Kluwer.
Polkowski, L. and A. Skowron, 1998. Rough Sets in Knowledge Discovery. Heidelberg: Physica-Verlag.
Polkowski, L., 2002. Rough Sets: Mathematical Foundations. Heidelberg: Physica-Verlag.
Raab, R. and R. Lichty, 2002. Identifying Sub-areas that Comprise a Greater Metropolitan Area: The Criterion of County Relative Efficiency. Journal of Regional Science, 42: 579-594.
Walczak, B. and D.L. Massart, 1999. Rough sets theory. Chemometrics and Intelligent Laboratory Systems, 47: 1-16.