CPDA Based Fuzzy Association Rules for Learning Achievement Mining


2009 International Conference on Machine Learning and Computing, IPCSIT vol. 3 (2011), IACSIT Press, Singapore

Jr-Shian Chen 1, Hung-Lieh Chou 2,3, Ching-Hsue Cheng 2, Jen-Ya Wang 1
1 Department of Computer Science and Information Management, HUNGKUANG University
2 Department of Information Management, National Yunlin University of Science and Technology
3 Department of Computer Center, St. Joseph's Hospital
Corresponding author. Tel.: +886-4-26318652 x5420; fax: +886-4-26521921. E-mail address: jschen@sunrise.hk.edu.tw.

Abstract. This paper proposes a fusion model to reinforce fuzzy association rules, which contains two main procedures: (1) employing the cumulative probability distribution approach (CPDA) to partition the universe of discourse and build membership functions; and (2) using the AprioriTid mining algorithm to extract fuzzy association rules. The proposed model determines the universe of discourse and the membership functions more objectively and reasonably than other fuzzy association rule methods.

Keywords: cumulative probability distribution approach (CPDA), fuzzy association rule, AprioriTid

1. Introduction

Previous studies [4, 5] often partitioned the universe of discourse into equal-length intervals and ignored the distribution characteristics of the dataset. Therefore, this paper focuses on determining the universe of discourse and the membership functions of fuzzy association rules more convincingly. In the empirical case study, we use an exemplary dataset containing the learning achievement data of 10 students as our simulation data.

The remainder of this paper is organized as follows. Related work is described in Section 2. Section 3 presents the proposed approach. Finally, conclusions are stated in Section 4.

2. Related Work

In this section, we briefly discuss the following topics: fuzzy numbers, fuzzy association rules, and our prior research, the cumulative probability distribution approach (CPDA) [1].

2.1. Cumulative probability distribution approach

Our prior work, CPDA [1], partitions the universe of discourse and builds membership functions based on the inverse of the normal cumulative distribution function (CDF). In probability theory, the normal CDF is parameterized by the mean $\mu$ and the standard deviation $\sigma$; for a given probability p, its inverse is defined as

$$F^{-1}(p; \mu, \sigma) = \{\, x : F(x; \mu, \sigma) = p \,\}, \qquad (1)$$

where

$$p = F(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}} \, dt, \qquad (2)$$

and $\mu$ and $\sigma$ denote the mean and the standard deviation, respectively.
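As an illustration of how Eqs. (1) and (2) can yield interval boundaries, the following is a minimal Python sketch assuming three linguistic regions with illustrative cumulative-probability cut points; the exact cut points used by CPDA [1] are not reproduced here, and the function name cpda_partition is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def cpda_partition(values, probs=(0.05, 1/3, 2/3, 0.95)):
    """Evaluate the inverse normal CDF (Eq. (1)) at chosen probability cut points.

    The cut points in `probs` are illustrative; the actual values used by
    CPDA [1] may differ.
    """
    mu = np.mean(values)
    sigma = np.std(values, ddof=1)  # sample standard deviation
    return norm.ppf(probs, loc=mu, scale=sigma)

# Statistics (ST) scores of the 10 students from Table 1.
st_scores = [86, 61, 84, 73, 70, 65, 67, 86, 75, 79]
print(cpda_partition(st_scores))  # candidate boundaries for Low / Middle / High
```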

2.2. Fuzzy numbers

Zadeh (1965) first introduced the concept of a fuzzy set for modeling the vagueness type of uncertainty [2]. A fuzzy set A defined on the universe X is characterized by a membership function $\mu_A : X \to [0, 1]$. When describing imprecise numerical quantities, one should capture intuitive concepts of approximate numbers or intervals, such as "approximately m." A fuzzy number must have a unique modal value m and be convex and piecewise continuous. A common approach is to restrict the shape of the membership function to LR-type fuzzy numbers. A special case of the LR-type fuzzy numbers, the triangular fuzzy number (TFN), is denoted by a triplet $A = (a, b, c)$. The graph of a typical TFN is shown in Fig. 1.

Fig. 1: Triangular fuzzy numbers.

2.3. Fuzzy association rules

Agrawal et al. [3] introduced the Apriori approach to explore behaviors in market basket data, where all transactions take binary values. However, real-world databases may also contain quantitative attributes. To overcome this shortcoming, fuzzy association rule mining approaches were proposed to handle both quantitative and categorical attributes. Fuzzy association rule methods transform the quantitative values of each attribute into fuzzy sets and use operations on the membership degrees to find rules. Such a rule can be expressed as: "If age is young, then salary is low." Detailed overviews of fuzzy association rule mining methods can be found in [4, 5].

3. The Proposed Approach

In this section, a fusion model is proposed to determine the universe of discourse and the membership functions of fuzzy association rules more convincingly. First, we build the membership function for each conditional attribute based on CPDA. Second, we use the AprioriTid mining algorithm to extract fuzzy association rules. The exemplary dataset [4] used throughout this paper is shown in Table 1. The dataset contains 10 instances characterized by the following attributes: (I) statistics (denoted ST), (II) database (denoted DB), (III) object-oriented programming (denoted OOP), (IV) data structure (denoted DS), and (V) management information system (denoted MIS). All attributes take numerical values. The proposed model is introduced in detail as follows.

Table 1. The exemplary dataset.

Student No.   ST   DB   OOP   DS   MIS
1             86   77   86    71   68
2             61   79   89    77   80
3             84   89   86    79   89
4             73   86   79    84   62
5             70   89   87    72   79
6             65   77   86    61   87
7             67   87   75    71   80
8             86   63   64    84   86
9             75   65   79    87   88
10            79   63   63    85   89

Step 1: Partition the continuous attributes by CPDA. CPDA is used to discretize the dataset and to define the fuzzy numbers. The model partitions the universe of discourse based on the cumulative probability distribution. The results are shown in Table 2.
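As a concrete check of how a TFN triplet is evaluated, the following is a minimal sketch of a triangular membership function in plain Python; the ST triplet is the one reported in Table 2, and the function name is illustrative.

```python
def triangular_membership(x, a, b, c):
    """Membership degree of x in the triangular fuzzy number (a, b, c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge

# ST "High" fuzzy number from Table 2.
st_high = (74.60, 86.69, 98.78)
print(round(triangular_membership(86, *st_high), 2))  # 0.94, as in Table 3 (student 1)
```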

Table 2. The fuzzy numbers for each attribute.

Attribute   Low                      Middle                   High
ST          (48.22, 58.65, 69.09)    (62.23, 71.17, 80.11)    (74.60, 86.69, 98.78)
DB          (47.95, 59.48, 71.02)    (62.94, 73.46, 83.98)    (77.50, 88.75, 100.00)
OOP         (49.53, 61.56, 73.60)    (66.37, 75.79, 85.20)    (79.40, 89.70, 100.00)
DS          (49.20, 60.61, 72.02)    (65.68, 73.93, 82.18)    (77.10, 87.95, 98.80)
MIS         (48.81, 61.96, 75.12)    (68.04, 77.26, 86.48)    (80.80, 90.40, 100.00)

Step 2: Build the membership functions. The membership functions are built by employing triangular fuzzy numbers; the triangular fuzzy numbers of Table 2 for the ST attribute are plotted in Fig. 2. The membership functions shown in Fig. 2 are thus established based on CPDA.

Fig. 2: Membership functions of the ST attribute.

Step 3: Fuzzify the continuous data. According to the membership functions built in Step 2, the degree of membership of each datum is calculated. Table 3 shows the result for the students' achievement dataset.

Table 3. The degree of membership for each datum.

       ST                DB                OOP               DS                MIS
No.    L    M    H       L    M    H       L    M    H       L    M    H       L    M    H
1      0.00 0.00 0.94    0.00 0.66 0.00    0.00 0.00 0.64    0.09 0.64 0.00    0.54 0.00 0.00
2      0.78 0.00 0.00    0.00 0.47 0.13    0.00 0.00 0.93    0.00 0.63 0.00    0.00 0.70 0.00
3      0.00 0.00 0.78    0.00 0.00 1.00    0.00 0.00 0.64    0.00 0.39 0.18    0.00 0.00 0.85
4      0.00 0.80 0.00    0.00 0.00 0.76    0.00 0.66 0.00    0.00 0.00 0.64    1.00 0.00 0.00
5      0.00 0.87 0.00    0.00 0.00 1.00    0.00 0.00 0.74    0.00 0.77 0.00    0.00 0.81 0.00
6      0.39 0.31 0.00    0.00 0.66 0.00    0.00 0.00 0.64    0.97 0.00 0.00    0.00 0.00 0.65
7      0.20 0.53 0.00    0.00 0.00 0.84    0.00 0.92 0.00    0.09 0.64 0.00    0.00 0.70 0.00
8      0.00 0.00 0.94    0.70 0.01 0.00    0.80 0.00 0.00    0.00 0.00 0.64    0.00 0.05 0.54
9      0.00 0.57 0.03    0.52 0.20 0.00    0.00 0.66 0.00    0.00 0.00 0.91    0.00 0.00 0.75
10     0.00 0.12 0.36    0.70 0.01 0.00    0.88 0.00 0.00    0.00 0.00 0.73    0.00 0.00 0.85

Step 4: Predefine the minimum support value and the confidence threshold. Users need to predefine the minimum support value and the confidence threshold. In this case, the minimum support value is set to 2.0 and the confidence threshold is set to 0.7.

Step 5: Generate the candidate itemset C1. Taking the linguistic value ST.low as an example, its scalar cardinality is (0.78 + 0.39 + 0.20) = 1.37. The candidate itemset C1 for this example is as follows:

{(ST.low, 1.37), (ST.middle, 3.20), (ST.high, 3.05), (DB.low, 1.92), (DB.middle, 2.01), (DB.high, 3.73), (OOP.low, 1.68), (OOP.middle, 2.24), (OOP.high, 3.59), (DS.low, 1.15), (DS.middle, 3.07), (DS.high, 3.10), (MIS.low, 1.54), (MIS.middle, 2.26), (MIS.high, 3.64)}

Step 6: Generate the large itemset.
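As a quick check of the scalar cardinality used in Step 5, a minimal sketch in plain Python using the ST.low column of Table 3 (variable names are illustrative):

```python
# Degrees of membership in ST.low for the 10 students (Table 3, ST / L column).
st_low = [0.00, 0.78, 0.00, 0.00, 0.00, 0.39, 0.20, 0.00, 0.00, 0.00]

# The scalar cardinality plays the role of an item's support count.
count = sum(st_low)
print(round(count, 2))       # 1.37, matching the C1 entry (ST.low, 1.37)

min_support = 2.0
print(count >= min_support)  # False: ST.low cannot enter the large itemset
```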

For each attribute, the large itemset keeps the linguistic term with the largest count value, provided that the count is equal to or greater than the minimum support value. For this exemplary dataset, the large itemset L1 is as follows: {(ST.middle, 3.20), (DB.high, 3.73), (OOP.high, 3.59), (DS.high, 3.10), (MIS.high, 3.64)}.

Step 7: Generate the candidate itemset Ci+1 from Li. The candidate itemset is generated from Li. Here, a comparison operation is used to return the lesser membership degree of each record for a newly formed candidate itemset. In the example below, we combine the linguistic values ST.middle and DB.high into the itemset (ST.middle, DB.high). The result is shown in Table 4.

Table 4. The linguistic value (ST.middle, DB.high).

No.   ST.middle   DB.high   (ST.middle, DB.high)
1     0.00        0.00      0.00
2     0.00        0.13      0.00
3     0.00        1.00      0.00
4     0.80        0.76      0.76
5     0.87        1.00      0.87
6     0.31        0.00      0.00
7     0.53        0.84      0.53
8     0.00        0.00      0.00
9     0.57        0.00      0.00
10    0.12        0.00      0.00

Taking the linguistic value (ST.middle, DB.high) as an example, its scalar cardinality is (0.76 + 0.87 + 0.53) = 2.16. The candidate itemset C2 for this exemplary dataset is shown in Table 5.

Table 5. The C2 candidate itemset.

Itemset                 Count
ST.middle, DB.high      2.16
ST.middle, OOP.high     1.05
ST.middle, DS.high      1.33
ST.middle, MIS.high     1.00
DB.high, OOP.high       1.51
DB.high, DS.high        0.82
DB.high, MIS.high       0.85
OOP.high, DS.high       0.18
OOP.high, MIS.high      1.28
DS.high, MIS.high       2.20

Step 8: Generate the large itemset Li+1 from the candidate itemset Ci+1. The large itemset L2 keeps the itemsets whose count value is equal to or greater than the minimum support value. The large itemset L2 for this exemplary dataset is as follows: {((ST.middle, DB.high), 2.16), ((DS.high, MIS.high), 2.20)}.
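To make the comparison operation of Step 7 concrete, the following is a minimal plain-Python sketch that reproduces the (ST.middle, DB.high) count of Tables 4 and 5 (variable names are illustrative):

```python
# Per-student membership degrees taken from Table 3.
st_middle = [0.00, 0.00, 0.00, 0.80, 0.87, 0.31, 0.53, 0.00, 0.57, 0.12]
db_high   = [0.00, 0.13, 1.00, 0.76, 1.00, 0.00, 0.84, 0.00, 0.00, 0.00]

# A combined itemset takes the lesser membership degree of each record (Table 4).
combined = [min(a, b) for a, b in zip(st_middle, db_high)]
print(round(sum(combined), 2))  # 2.16, the count of (ST.middle, DB.high) in Table 5
```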

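Steps 7 and 8 are then repeated (Step 9 below) until no further large itemset can be formed. The following is a simplified plain-Python sketch of that iteration; it keeps every 1-itemset reaching the minimum support rather than only the per-attribute maximum of Step 6, and the function name mine_large_itemsets is illustrative.

```python
from itertools import combinations

def mine_large_itemsets(memberships, min_support):
    """Iterate the join (element-wise minimum) and prune (scalar cardinality
    >= min_support) steps until no large itemset remains.

    `memberships` maps a linguistic term such as "ST.middle" to its list of
    per-record membership degrees.
    """
    # Large 1-itemsets (simplification: no per-attribute maximum as in Step 6).
    current = {(term,): degrees for term, degrees in memberships.items()
               if sum(degrees) >= min_support}
    all_large = dict(current)
    while current:
        candidates = {}
        for (items1, deg1), (items2, deg2) in combinations(current.items(), 2):
            merged = tuple(sorted(set(items1) | set(items2)))
            if len(merged) == len(items1) + 1:  # join itemsets differing by one term
                candidates[merged] = [min(a, b) for a, b in zip(deg1, deg2)]
        current = {k: v for k, v in candidates.items() if sum(v) >= min_support}
        all_large.update(current)
    return all_large
```

Because all qualifying 1-itemsets are kept here, this sketch may explore more candidates than the per-attribute variant described in Steps 6 to 9.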
Step 9: Repeat Steps 7 and 8 until the large itemset Li is null. In this example, we generate the candidate itemset C3. The count values for the linguistic value (ST.middle, DB.high, DS.high) are shown in Table 6, and the C3 candidate itemset is listed in Table 7. All of the count values in C3 are smaller than the minimum support value; therefore, the large itemset L3 is null.

Table 6. The linguistic value (ST.middle, DB.high, DS.high).

No.   ST.middle   DB.high   DS.high   (ST.middle, DB.high, DS.high)
1     0.00        0.00      0.00      0.00
2     0.00        0.13      0.00      0.00
3     0.00        1.00      0.18      0.00
4     0.80        0.76      0.64      0.64
5     0.87        1.00      0.00      0.00
6     0.31        0.00      0.00      0.00
7     0.53        0.84      0.00      0.00
8     0.00        0.00      0.64      0.00
9     0.57        0.00      0.91      0.00
10    0.12        0.00      0.73      0.00

Table 7. The C3 candidate itemset.

Itemset                          Count
ST.middle, DB.high, DS.high      0.64
ST.middle, DB.high, MIS.high     0.00
ST.middle, DS.high, MIS.high     0.69
DB.high, DS.high, MIS.high       0.18

Step 10: Construct the fuzzy association rules. The fuzzy association rules are constructed from L2. For example, the confidence value of the rule ST.middle ⇒ DB.high is calculated as

$$\mathrm{confidence}(ST.middle \Rightarrow DB.high) = \frac{\mathrm{count}(ST.middle,\, DB.high)}{\mathrm{count}(ST.middle)} = \frac{2.16}{3.20} = 0.68.$$

The fuzzy association rules are listed in Table 8.

Table 8. The fuzzy association rules.

Fuzzy association rule     Confidence value
ST.middle ⇒ DB.high        0.68
DB.high ⇒ ST.middle        0.58
DS.high ⇒ MIS.high         0.71
MIS.high ⇒ DS.high         0.60

We then check whether the confidence values of the rules in Table 8 are larger than or equal to the predefined confidence threshold. Only DS.high ⇒ MIS.high (0.71) reaches the 0.7 threshold, so the final resulting rule is: if the score of data structure is high, then the score of management information system is high.

4. Conclusion

This paper proposes a fusion model that combines the AprioriTid algorithm with our prior work CPDA to determine the universe of discourse and the membership functions more convincingly. The proposed model determines the universe of discourse and the membership functions more objectively and reasonably than other fuzzy association rule methods.

5. Acknowledgements

This work was funded by the National Science Council of the Republic of China under contract NSC 97-2511-S-241-007-MY3.

6. References

[1] Teoh, H.J., Cheng, C.H., Chu, H.H. and Chen, J.S., Fuzzy time series model based on probabilistic approach and rough set rule induction for empirical research in stock markets, Data & Knowledge Engineering, 2008, 67(1): 103-117.
[2] Ross, T.J., Fuzzy Logic with Engineering Applications, International edition, McGraw-Hill, USA, 2000.
[3] Agrawal, R., Imielinski, T. and Swami, A., Mining association rules between sets of items in large databases, in Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data (SIGMOD-93), Washington, DC, United States, 1993, 207-216.
[4] Hong, T.P., Kuo, C.S. and Wang, S.L., A fuzzy AprioriTid mining algorithm with reduced computational time, Applied Soft Computing, 2004, 5(1): 1-10.
[5] Khan, M.S., Muyeba, M.K. and Coenen, F., Weighted association rule mining from binary and fuzzy data, in Advances in Data Mining: Medical Applications, E-Commerce, Marketing, and Theoretical Aspects, 8th Industrial Conference (ICDM 2008), Leipzig, Germany, 2008, 200-212.