High Frequency Rough Set Model based on Database Systems


Kartik Vaithyanathan (kvaithya@gmail.com)
T. Y. Lin (tylin@cs.sjsu.edu)
Department of Computer Science, San Jose State University, San Jose, CA 94403, USA

Abstract - Rough sets theory was proposed by Pawlak in the 1980s and has been applied successfully in many domains. One of the key concepts of the rough sets model is the computation of core and reduct. It has been shown that finding the minimal reduct is an NP-hard problem, and its computational complexity has implicitly restricted effective applications to small and clean data sets. In order to improve the efficiency of computing core attributes and reducts, many novel approaches have been developed, some of which attempt to integrate database technologies. This paper proposes a novel approach to computing reducts, called high frequency value reducts, using database system concepts. The method deals directly with generating value reducts and also prunes the decision table by placing a lower bound on the frequency of equivalence values in the decision table.

I. INTRODUCTION

Rough sets theory was proposed by Pawlak [8,9] in the 1980s and has been applied successfully in many domains. One of the key concepts of the rough sets model is the computation of core and reduct. Multiple approaches to improving the efficiency of finding core attributes and reducts have been developed [2], including the algorithms presented in [5], which largely improve the generation of the discernibility relation by sorting the objects. Some authors have proposed approaches to reduce data size using relational database system techniques [4] and have developed rough-set based data mining systems that integrate RDBMS capabilities [3]. Another approach redefined concepts of rough set theory such as core attributes and reducts by leveraging set-oriented database operations [7]. The current approach extends the extraction of interconnected Pawlak information systems of various sizes [1] while leveraging existing relational database concepts and operations. An in-depth example illustrating the nuances of this approach is also provided.

II. APPROACH

A decision table such as the one shown in Table I may have more than one value reduct. Any one of them can be used to replace the original table. Finding all the value reducts by eliminating unnecessary attributes from a decision table is NP-hard [6]. Attributes that are redundant given other attributes are perceived as unnecessary. Table I shows a database table of 12 cars with information about the Weight, Door, Size, Cylinder and Mileage. Weight, Door, Size and Cylinder are the condition attributes (represented as C) and Mileage is the decision attribute (represented as D). The attribute Tuple_ID is provided for theoretical understanding only.

TABLE I
12 CARS WITH ATTRIBUTES WEIGHT, DOOR, SIZE, CYLINDER AND MILEAGE

Tuple_ID  Weight  Door  Size     Cylinder  Mileage
t1        low     2     compact  4         high
t2        low     4     sub      6         low
t3        medium  4     compact  4         high
t4        high    2     compact  6         low
t5        high    4     compact  4         low
t6        low     4     compact  4         high
t7        high    4     sub      6         low
t8        low     2     sub      6         low
t9        medium  2     compact  4         high
t10       medium  4     sub      4         high
t11       medium  2     compact  4         low
t12       medium  4     sub      4         low
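The decision table maps directly onto a single relational table. The following is a minimal sketch that loads Table I into a table named CARS; the table name and column types are illustrative assumptions, not part of the original formulation, and the later SQL statements in this paper can be tried against it.

    -- Hypothetical relational encoding of Table I (name and types are assumptions).
    CREATE TABLE CARS (
        Tuple_ID  VARCHAR(4),   -- kept for readability only, as in Table I
        Weight    VARCHAR(8),   -- condition attribute
        Door      INT,          -- condition attribute
        Size      VARCHAR(8),   -- condition attribute (may need quoting in some dialects)
        Cylinder  INT,          -- condition attribute
        Mileage   VARCHAR(8)    -- decision attribute
    );

    INSERT INTO CARS VALUES
        ('t1',  'low',    2, 'compact', 4, 'high'),
        ('t2',  'low',    4, 'sub',     6, 'low'),
        ('t3',  'medium', 4, 'compact', 4, 'high'),
        ('t4',  'high',   2, 'compact', 6, 'low'),
        ('t5',  'high',   4, 'compact', 4, 'low'),
        ('t6',  'low',    4, 'compact', 4, 'high'),
        ('t7',  'high',   4, 'sub',     6, 'low'),
        ('t8',  'low',    2, 'sub',     6, 'low'),
        ('t9',  'medium', 2, 'compact', 4, 'high'),
        ('t10', 'medium', 4, 'sub',     4, 'high'),
        ('t11', 'medium', 2, 'compact', 4, 'low'),
        ('t12', 'medium', 4, 'sub',     4, 'low');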
The traditional approaches (including [7]) perform a two-step process in identifying the value reducts: the first step involves obtaining the minimal attribute reducts by eliminating unnecessary attributes without sacrificing the accuracy of the classification model, and the second step is to generate the value reducts for each attribute reduct. Most approaches to computing core and reduct assume that (a) a tuple in the decision table always contributes to the classification model and is not an outlier, and (b) all tuples in the decision table are consistent.

There are two aspects to the new approach, explained below. The first aspect states that only tuples that occur above a certain lower bound threshold contribute to the classification model (decision). As a result, only high frequency rules that contribute to a decision are short-listed. The trivial case of the lower bound (= 1) is equivalent to the traditional approaches to computing value reducts. The high frequency rule prunes the decision table data and is the first key differentiator of this approach. An algorithm is outlined to ensure that the decision table is consistent before applying the proposed high frequency rule to the tuples in the decision table (a SQL sketch of such a check is shown below). The second novel step in this approach is to directly generate the value reducts instead of a two-step process of first identifying the attribute reducts and then subsequently generating the value reducts.
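As one illustration of the consistency check mentioned above, the sketch below (assuming the hypothetical CARS encoding of Table I) flags the tuples that agree with some other tuple on all four condition attributes but disagree on Mileage.

    -- Sketch: find tuples that make the full decision table inconsistent
    -- (same Weight, Door, Size, Cylinder as another tuple, but a different Mileage).
    SELECT *
    FROM CARS c1
    WHERE EXISTS (
        SELECT 1
        FROM CARS c2
        WHERE c2.Weight   = c1.Weight
          AND c2.Door     = c1.Door
          AND c2.Size     = c1.Size
          AND c2.Cylinder = c1.Cylinder
          AND c2.Mileage <> c1.Mileage
    );
    -- For Table I this returns t9 and t11 (medium, 2, compact, 4) and
    -- t10 and t12 (medium, 4, sub, 4), which share condition attribute
    -- values but have different Mileage values.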

III. CONSISTENT HIGH FREQUENCY DECISION RULES

Given a decision table with m rows and n condition attributes, the number of possible decision rules is m(2^n - 1). The high frequency pruning will eliminate some of the decision rules. Every decision rule can then be analyzed to determine the existence of a value reduct (minimal decision rule). The generation of a decision rule set DR(X,D) can be expressed using SQL statements of the form

    SELECT * FROM (SELECT X, D FROM T)                                  (1)

where X is a subset of C. There are 2^n - 1 possible values of X. There are m rows in each decision rule set. The inconsistent tuples in a decision rule set DR(X,D) are obtained using the following SQL:

    SELECT * FROM DR(X,D) DR1
    WHERE EXISTS (SELECT * FROM DR(X,D) DR2
                  WHERE (DR1.X = DR2.X) AND (DR1.D != DR2.D))           (2)

A consistent decision rule set DR(X,D) is obtained by removing the tuples in DR1(X,D) above from the original set DR(X,D) in (1):

    SELECT * FROM DR(X,D)
    MINUS
    SELECT * FROM DR1(X,D)                                              (3)

The running time for (2) is O(m^2) and for (3) is O(m), where m is the number of tuples (rows) in the decision table. The overall running time for weeding out inconsistent tuples is O(m^2). The high frequency pruning is executed on the consistent decision rule set DR(X,D) from the previous step and can be expressed in SQL as

    SELECT X, D, COUNT(*) AS Frequency
    FROM DR(X,D)
    GROUP BY X, D
    HAVING COUNT(*) >= MIN_FREQ                                         (4)

where MIN_FREQ represents the minimum value of the high frequency rule. The sorting process (i.e. the GROUP BY) takes O(m log m) time and the counting and pruning takes O(m) time, where m is the number of tuples (rows) in the decision table. Thus, the running time for the high frequency pruning is O(m log m). The worst-case running time for obtaining a consistent, high frequency decision rule set DR(X,D) is therefore a polynomial function of the number of rows (m) in the decision table. The overall running time for the creation of all possible consistent, high frequency decision rules for a decision table is O(2^n * m^2), where m is the number of rows and n is the number of attributes (columns) in the decision table.
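To make statements (1)-(4) concrete, the following sketch runs the whole pipeline for the single subset X = {Weight, Door} against the hypothetical CARS table introduced in Section II, with MIN_FREQ = 2. EXCEPT is used where the text uses MINUS (a dialect choice), and Tuple_ID is carried through so that the set difference does not collapse duplicate rows before they are counted.

    -- Sketch: consistent high frequency decision rules for X = {Weight, Door}.
    WITH DR AS (
        SELECT Tuple_ID, Weight, Door, Mileage FROM CARS        -- statement (1)
    ),
    CONSISTENT AS (                                             -- statements (2) and (3)
        SELECT * FROM DR
        EXCEPT
        SELECT * FROM DR d1
        WHERE EXISTS (SELECT 1 FROM DR d2
                      WHERE d2.Weight  = d1.Weight
                        AND d2.Door    = d1.Door
                        AND d2.Mileage <> d1.Mileage)
    )
    SELECT Weight, Door, Mileage, COUNT(*) AS Frequency         -- statement (4)
    FROM CONSISTENT
    GROUP BY Weight, Door, Mileage
    HAVING COUNT(*) >= 2;                                       -- MIN_FREQ = 2
    -- For Table I only the group (high, 4, low) with Frequency 2 survives,
    -- matching decision rule set (2.1) in Section V.
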
IV. VALUE REDUCTS

The goal is to find the value reducts for each tuple in a decision rule set DR(X,D). If DR(X,D) is a k-attribute decision rule set (k <= n), there are 2^k - 2 shorter decision rules for each tuple in the decision rule set, each formed from a proper, non-empty subset of the k attribute values of the original tuple. Each of these decision rules is analyzed for consistency, and one or more minimal decision rules (consistent decision rules with the least number of attributes) are chosen for every tuple; these comprise the value reducts for that tuple. The generation of value reducts is explained in detail with an example in Section V.

V. ILLUSTRATIVE EXAMPLE

The representative example in Table I is broken down into its subsets along with the high frequency values for illustration below. Each subset is analyzed for two high frequencies: (a) MIN_FREQ = 1, which is equivalent to the traditional approach to computing value reducts, and (b) MIN_FREQ = 2, to illustrate the high frequency pruning applied to computing value reducts. All tuples that are inconsistent or do not meet the high frequency criterion are discarded. The inconsistent tuples are noted below the respective decision rule set. The value reducts are also provided for each of these decision rule sets.

TABLE II
1-ATTRIBUTE HIGH FREQUENCY DECISION RULES: {WEIGHT}, {DOOR}, {SIZE}, AND {CYLINDER} WITH MIN_FREQ = 1 AND MIN_FREQ = 2

(1.1) Decision Rules

Tuple_ID     Weight   Mileage  Frequency
t1, t6       low      high     2
t2, t8       low      low      2
t3, t9, t10  medium   high     3
t4, t5, t7   high     low      3
t11, t12     medium   low      2

(1.2) Decision Rules

Tuple_ID         Door  Mileage  Frequency
t1, t9           2     high     2
t2, t5, t7, t12  4     low      4
t3, t6, t10      4     high     3
t4, t8, t11      2     low      3

(1.3) Decision Rules

Tuple_ID         Size     Mileage  Frequency
t1, t3, t6, t9   compact  high     4
t2, t7, t8, t12  sub      low      4
t4, t5, t11      compact  low      3
t10              sub      high     1

(1.4) Decision Rules

Tuple_ID             Cylinder  Mileage  Frequency
t1, t3, t6, t9, t10  4         high     5
t2, t4, t7, t8       6         low      4
t5, t11, t12         4         low      3

TABLE III
2-ATTRIBUTE HIGH FREQUENCY DECISION RULES: {WEIGHT, DOOR}, {WEIGHT, SIZE}, {WEIGHT, CYLINDER}, {DOOR, SIZE}, {DOOR, CYLINDER}, {SIZE, CYLINDER} WITH MIN_FREQ = 1 AND MIN_FREQ = 2

(2.1) Decision Rules

MIN_FREQ = 1 (9 possible decision rules)

Tuple_ID  Weight  Door  Mileage
t1        low     2     high
t2        low     4     low
t3        medium  4     high
t4        high    2     low
t5        high    4     low
t6        low     4     high
t7        high    4     low
t8        low     2     low
t9        medium  2     high
t10       medium  4     high
t11       medium  2     low
t12       medium  4     low

MIN_FREQ = 2 (3 possible decision rules)

Tuple_ID  Weight  Door  Mileage  Frequency
t1        low     2     high     1
t2        low     4     low      1
t3, t10   medium  4     high     2
t4        high    2     low      1
t5, t7    high    4     low      2
t6        low     4     high     1
t8        low     2     low      1
t9        medium  2     high     1
t11       medium  2     low      1
t12       medium  4     low      1

{t1, t8}, {t2, t6}, {t9, t11} and {{t3, t10}, t12} are eliminated due to inconsistency.

(a) Consistent Decision Rule: ({high, 4}(W,D) → {low}(M))
    Minimal Decision Rules: ({high}(W) → {low}(M)), ({4}(D) → {low}(M))
    [{4}(D) could imply {high}(M), hence not a value reduct]

(2.2) Decision Rules

MIN_FREQ = 1 (21 possible decision rules)

Tuple_ID  Weight  Size     Mileage
t1        low     compact  high
t2        low     sub      low
t3        medium  compact  high
t4        high    compact  low
t5        high    compact  low
t6        low     compact  high
t7        high    sub      low
t8        low     sub      low
t9        medium  compact  high
t10       medium  sub      high
t11       medium  compact  low
t12       medium  sub      low

Value Reduct: ({low, compact}(W,S) → {high}(M)), ({low, sub}(W,S) → {low}(M)), ({high}(W) → {low}(M))

MIN_FREQ = 2 (9 possible decision rules)

Tuple_ID  Weight  Size     Mileage  Frequency
t1, t6    low     compact  high     2
t2, t8    low     sub      low      2
t3, t9    medium  compact  high     2
t4, t5    high    compact  low      2
t7        high    sub      low      1
t10       medium  sub      high     1
t11       medium  compact  low      1
t12       medium  sub      low      1

{{t3, t9}, t11} and {t10, t12} are eliminated due to inconsistency.

(a) Consistent Decision Rule: ({low, compact}(W,S) → {high}(M))
    Minimal Decision Rules: ({low}(W) → {high}(M)), ({compact}(S) → {high}(M))
    Both are not valid [{low}(W) could imply {low}(M) and {compact}(S) could imply {low}(M)]
    Value Reduct: ({low, compact}(W,S) → {high}(M))

(b) Consistent Decision Rule: ({low, sub}(W,S) → {low}(M))
    Minimal Decision Rules: ({low}(W) → {low}(M)), ({sub}(S) → {low}(M))
    Both are not valid [{low}(W) could imply {high}(M) and {sub}(S) could imply {high}(M)]
    Value Reduct: ({low, sub}(W,S) → {low}(M))

(c) Consistent Decision Rule: ({high, compact}(W,S) → {low}(M))
    Minimal Decision Rules: ({high}(W) → {low}(M)), ({compact}(S) → {low}(M))
    [{compact}(S) could imply {high}(M), hence not a value reduct]

(2.3) Decision Rules

MIN_FREQ = 1 (21 possible decision rules)

Tuple_ID  Weight  Cylinder  Mileage
t1        low     4         high
t2        low     6         low
t3        medium  4         high
t4        high    6         low
t5        high    4         low
t6        low     4         high
t7        high    6         low
t8        low     6         low
t9        medium  4         high
t10       medium  4         high
t11       medium  4         low
t12       medium  4         low

Value Reduct: ({low, 4}(W,C) → {high}(M)), ({6}(C) → {low}(M)), ({high}(W) → {low}(M))

MIN_FREQ = 2 (9 possible decision rules)

Tuple_ID     Weight  Cylinder  Mileage  Frequency
t1, t6       low     4         high     2
t2, t8       low     6         low      2
t3, t9, t10  medium  4         high     3
t4, t7       high    6         low      2
t5           high    4         low      1
t11, t12     medium  4         low      2

{{t3, t9, t10}, {t11, t12}} is eliminated due to inconsistency.

(a) Consistent Decision Rule: ({low, 4}(W,C) → {high}(M))
    Minimal Decision Rules: ({low}(W) → {high}(M)), ({4}(C) → {high}(M))
    Both are not valid.
    Value Reduct: ({low, 4}(W,C) → {high}(M))

(b) Consistent Decision Rule: ({low, 6}(W,C) → {low}(M))
    Minimal Decision Rules: ({low}(W) → {low}(M)), ({6}(C) → {low}(M))

(c) Consistent Decision Rule: ({high, 6}(W,C) → {low}(M))
    Minimal Decision Rules: ({high}(W) → {low}(M)), ({6}(C) → {low}(M))

(2.4) Decision Rules

MIN_FREQ = 1 (3 possible decision rules)

Tuple_ID  Door  Size     Mileage
t1        2     compact  high
t2        4     sub      low
t3        4     compact  high
t4        2     compact  low
t5        4     compact  low
t6        4     compact  high
t7        4     sub      low
t8        2     sub      low
t9        2     compact  high
t10       4     sub      high
t11       2     compact  low
t12       4     sub      low

Value Reduct: ({2, sub}(D,S) → {low}(M))

MIN_FREQ = 2 (no possible decision rules)

Tuple_ID     Door  Size     Mileage  Frequency
t1, t9       2     compact  high     2
t2, t7, t12  4     sub      low      3
t3, t6       4     compact  high     2
t4, t11      2     compact  low      2
t5           4     compact  low      1
t8           2     sub      low      1
t10          4     sub      high     1

{{t1, t9}, {t4, t11}}, {{t2, t7, t12}, t10}, {{t3, t6}, t5} are eliminated due to inconsistency.

(2.5) Decision Rules

MIN_FREQ = 1 (12 possible decision rules)

Tuple_ID  Door  Cylinder  Mileage
t1        2     4         high
t2        4     6         low
t3        4     4         high
t4        2     6         low
t5        4     4         low
t6        4     4         high
t7        4     6         low
t8        2     6         low
t9        2     4         high
t10       4     4         high
t11       2     4         low
t12       4     4         low

MIN_FREQ = 2 (4 possible decision rules)

Tuple_ID     Door  Cylinder  Mileage  Frequency
t1, t9       2     4         high     2
t2, t7       4     6         low      2
t3, t6, t10  4     4         high     3
t4, t8       2     6         low      2
t5, t12      4     4         low      2
t11          2     4         low      1

{{t1, t9}, t11}, {{t3, t6, t10}, {t5, t12}} are eliminated due to inconsistency.

(a) Consistent Decision Rule: ({4, 6}(D,C) → {low}(M))
    Minimal Decision Rules: ({4}(D) → {low}(M)), ({6}(C) → {low}(M))

(b) Consistent Decision Rule: ({2, 6}(D,C) → {low}(M))
    Minimal Decision Rules: ({2}(D) → {low}(M)), ({6}(C) → {low}(M))

(2.6) Decision Rules

MIN_FREQ = 1 (15 possible decision rules)

Tuple_ID  Size     Cylinder  Mileage
t1        compact  4         high
t2        sub      6         low
t3        compact  4         high
t4        compact  6         low
t5        compact  4         low
t6        compact  4         high
t7        sub      6         low
t8        sub      6         low
t9        compact  4         high
t10       sub      4         high
t11       compact  4         low
t12       sub      4         low

Value Reduct: ..., ({compact, 4}(S,C) → {low}(M))

MIN_FREQ = 2 (3 possible decision rules)

Tuple_ID        Size     Cylinder  Mileage  Frequency
t1, t3, t6, t9  compact  4         high     4
t2, t7, t8      sub      6         low      3
t4              compact  6         low      1
t5              compact  4         low      1
t10             sub      4         high     1
t11             compact  4         low      1
t12             sub      4         low      1

{{t1, t3, t6, t9}, t11}, {t10, t12} are eliminated due to inconsistency.

(a) Consistent Decision Rule: ({sub, 6}(S,C) → {low}(M))
    Minimal Decision Rules: ({sub}(S) → {low}(M)), ({6}(C) → {low}(M))

TABLE IV
3-ATTRIBUTE HIGH FREQUENCY DECISION RULES: {WEIGHT, DOOR, SIZE}, {DOOR, SIZE, CYLINDER}, {WEIGHT, SIZE, CYLINDER}, {WEIGHT, DOOR, CYLINDER} WITH MIN_FREQ = 1 AND MIN_FREQ = 2

(3.1) Decision Rules

MIN_FREQ = 1 (56 possible decision rules)

Tuple_ID  Weight  Door  Size     Mileage
t1        low     2     compact  high
t2        low     4     sub      low
t3        medium  4     compact  high
t4        high    2     compact  low
t5        high    4     compact  low
t6        low     4     compact  high
t7        high    4     sub      low
t8        low     2     sub      low
t9        medium  2     compact  high
t10       medium  4     sub      high
t11       medium  2     compact  low
t12       medium  4     sub      low

Value Reduct: ({low, compact}(W,S) → {high}(M)), ({low, sub}(W,S) → {low}(M)), ({medium, 4, compact}(W,D,S) → {high}(M)), ({high}(W) → {low}(M)), ({2, sub}(D,S) → {low}(M))

MIN_FREQ = 2 (no possible decision rules)

Tuple_ID  Weight  Door  Size     Mileage  Frequency
t1        low     2     compact  high     1
t2        low     4     sub      low      1
t3        medium  4     compact  high     1
t4        high    2     compact  low      1
t5        high    4     compact  low      1
t6        low     4     compact  high     1
t7        high    4     sub      low      1
t8        low     2     sub      low      1
t9        medium  2     compact  high     1
t10       medium  4     sub      high     1
t11       medium  2     compact  low      1
t12       medium  4     sub      low      1

{t9, t11} and {t10, t12} are eliminated due to inconsistency.

(3.2) Decision Rules

MIN_FREQ = 1 (28 possible decision rules)

Tuple_ID  Door  Size     Cylinder  Mileage
t1        2     compact  4         high
t2        4     sub      6         low
t3        4     compact  4         high
t4        2     compact  6         low
t5        4     compact  4         low
t6        4     compact  4         high
t7        4     sub      6         low
t8        2     sub      6         low
t9        2     compact  4         high
t10       4     sub      4         high
t11       2     compact  4         low
t12       4     sub      4         low

Value Reduct: ({2, sub}(D,S) → {low}(M)), ({6}(C) → {low}(M))

MIN_FREQ = 2 (7 possible decision rules)

Tuple_ID  Door  Size     Cylinder  Mileage  Frequency
t1, t9    2     compact  4         high     2
t2, t7    4     sub      6         low      2
t3, t6    4     compact  4         high     2
t4        2     compact  6         low      1
t5        4     compact  4         low      1
t8        2     sub      6         low      1
t10       4     sub      4         high     1
t11       2     compact  4         low      1
t12       4     sub      4         low      1

{{t1, t9}, t11}, {{t3, t6}, t5} and {t10, t12} are eliminated due to inconsistency.

(a) Consistent Decision Rule: ({4, sub, 6}(D,S,C) → {low}(M))
    Minimal Decision Rules: ({4, sub}(D,S) → {low}(M)), ({sub, 6}(S,C) → {low}(M)), ({4, 6}(D,C) → {low}(M))
    Minimal Decision Rules: ({sub}(S) → {low}(M)), ({6}(C) → {low}(M)), ({4}(D) → {low}(M)), ({6}(C) → {low}(M))

(3.3) Decision Rules

MIN_FREQ = 1 (49 possible decision rules)

Tuple_ID  Weight  Size     Cylinder  Mileage
t1        low     compact  4         high
t2        low     sub      6         low
t3        medium  compact  4         high
t4        high    compact  6         low
t5        high    compact  4         low
t6        low     compact  4         high
t7        high    sub      6         low
t8        low     sub      6         low
t9        medium  compact  4         high
t10       medium  sub      4         high
t11       medium  compact  4         low
t12       medium  sub      4         low

Value Reduct: ({low, compact}(W,S) → {high}(M)), ({low, 4}(W,C) → {high}(M)), ({low, sub}(W,S) → {low}(M)), ({6}(C) → {low}(M)), ({high}(W) → {low}(M))

MIN_FREQ = 2 (14 possible decision rules)

Tuple_ID  Weight  Size     Cylinder  Mileage  Frequency
t1, t6    low     compact  4         high     2
t2, t8    low     sub      6         low      2
t3, t9    medium  compact  4         high     2
t4        high    compact  6         low      1
t5        high    compact  4         low      1
t7        high    sub      6         low      1
t10       medium  sub      4         high     1
t11       medium  compact  4         low      1
t12       medium  sub      4         low      1

{{t3, t9}, t11} and {t10, t12} are eliminated due to inconsistency.

(a) Consistent Decision Rule: ({low, compact, 4}(W,S,C) → {high}(M))
    Minimal Decision Rules: ({low, compact}(W,S) → {high}(M)), ({compact, 4}(S,C) → {high}(M)), ({low, 4}(W,C) → {high}(M))
    Minimal Decision Rules: ({low}(W) → {high}(M)), ({compact}(S) → {high}(M)), ({low}(W) → {high}(M)), ({4}(C) → {high}(M))

    Value Reduct: ({low, compact}(W,S) → {high}(M)), ({low, 4}(W,C) → {high}(M))

(b) Consistent Decision Rule: ({low, sub, 6}(W,S,C) → {low}(M))
    Minimal Decision Rules: ({low, sub}(W,S) → {low}(M)), ({sub, 6}(S,C) → {low}(M)), ({low, 6}(W,C) → {low}(M))
    Minimal Decision Rules: ({low}(W) → {low}(M)), ({sub}(S) → {low}(M)), ({sub}(S) → {low}(M)), ({6}(C) → {low}(M)), ({low}(W) → {low}(M)), ({6}(C) → {low}(M))
    Value Reduct: ({low, sub}(W,S) → {low}(M)), ({6}(C) → {low}(M))

(3.4) Decision Rules

MIN_FREQ = 1 (49 possible decision rules)

Tuple_ID  Weight  Door  Cylinder  Mileage
t1        low     2     4         high
t2        low     4     6         low
t3        medium  4     4         high
t4        high    2     6         low
t5        high    4     4         low
t6        low     4     4         high
t7        high    4     6         low
t8        low     2     6         low
t9        medium  2     4         high
t10       medium  4     4         high
t11       medium  2     4         low
t12       medium  4     4         low

Value Reduct: ({low, 4}(W,C) → {high}(M)), ({6}(C) → {low}(M)), ({high}(W) → {low}(M))

MIN_FREQ = 2 (no possible decision rules)

Tuple_ID  Weight  Door  Cylinder  Mileage  Frequency
t1        low     2     4         high     1
t2        low     4     6         low      1
t3, t10   medium  4     4         high     2
t4        high    2     6         low      1
t5        high    4     4         low      1
t6        low     4     4         high     1
t7        high    4     6         low      1
t8        low     2     6         low      1
t9        medium  2     4         high     1
t11       medium  2     4         low      1
t12       medium  4     4         low      1

{{t3, t10}, t12} and {t9, t11} are eliminated due to inconsistency.

The final list of all value reducts for Table I is documented as follows:

Value Reducts (MIN_FREQ = 1)                  High Frequency Value Reducts (MIN_FREQ = 2)
({high}(W) → {low}(M))                        ({high}(W) → {low}(M))
({6}(C) → {low}(M))                           ({6}(C) → {low}(M))
({low, compact}(W,S) → {high}(M))             ({low, compact}(W,S) → {high}(M))
({low, sub}(W,S) → {low}(M))                  ({low, sub}(W,S) → {low}(M))
({low, 4}(W,C) → {high}(M))                   ({low, 4}(W,C) → {high}(M))
({compact, 4}(S,C) → {low}(M))
({2, sub}(D,S) → {low}(M))
({medium, 4, compact}(W,D,S) → {high}(M))

CONCLUSION

In summary, a new approach has been proposed to directly generate high frequency value reducts without any knowledge of attribute reducts for a given decision table. The running time is a polynomial function of the number of rows (m) and an exponential function of the number of columns (n) in the decision table. The approach combines value reduct generation with high frequency pruning of equivalence values while leveraging set-oriented database operations. The number of iterations needed for high frequency value reducts is smaller than that of the traditional approach to creating value reducts, which considers every tuple in the decision table. Our future work will involve the application of this approach to large data sets stored in database systems, as well as knowledge discovery in very large data sets.

REFERENCES

[1] Tsau Young Lin, "Rough Set Theory in Very Large Databases," Symposium on Modeling, Analysis and Simulation, CESA'96 IMACS Multiconference (Computational Engineering in Systems Applications), Lille, France, July 9-12, 1996, Vol. 2 of 2, pp. 936-941.
[2] Bazan, J., Nguyen, H., Nguyen, S., Synak, P., Wroblewski, J., "Rough set algorithms in classification problems," in Rough Set Methods and Applications: New Developments in Knowledge Discovery in Information Systems, L. Polkowski, T. Y. Lin, and S. Tsumoto (eds.), Physica-Verlag, Heidelberg, Germany, 2000, pp. 49-88.
[3] Fernandez-Baizan, A., Ruiz, E., Sanchez, J., "Integrating RDMS and Data Mining Capabilities Using Rough Sets," Proc. IPMU, Granada, Spain, 1996.
[4] Kumar, A., "New Techniques for Data Reduction in Database Systems for Knowledge Discovery Applications," Journal of Intelligent Information Systems, 10(1), pp. 31-48, 1998.
[5] Nguyen, H., Nguyen, S., "Some efficient algorithms for rough set methods," Proc. IPMU, Granada, Spain, 1996, pp. 1451-1456.
[6] Skowron, A., Rauszer, C., "The discernibility matrices and functions in information systems," in Decision Support by Experience - Application of the Rough Sets Theory, R. Slowinski (ed.), Kluwer Academic Publishers, 1992, pp. 331-362.
[7] Hu, X., Lin, T. Y., Han, J., "A New Rough Sets Model Based on Database Systems," Fundamenta Informaticae, 59(2-3), pp. 135-152, April 2004.
[8] Pawlak, Z., "Rough Sets," International Journal of Information and Computer Science, 11(5), pp. 341-356, 1982.
[9] Pawlak, Z., Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, 1992.