Multivariate interdependent discretization in discovering the best correlated attribute
Data Mining VI 35

Multivariate interdependent discretization in discovering the best correlated attribute

S. Chao & Y. P. Li
Faculty of Science and Technology, University of Macau, China

Abstract

The decision tree is one of the most widely used and practical methods in data mining. However, many discretization algorithms developed in this field are univariate only: they discretize continuous-valued attributes independently, without considering the interdependent relationships among the other attributes, at most taking the class attribute into account. Such univariate discretization is inadequate for handling critical problems, especially those found in the medical domain. In this paper, we propose a new multivariate discretization method called Multivariate Interdependent Discretization for Continuous Attributes (MIDCA). The method incorporates normalized relief and information measures to find the best correlated attribute with respect to each continuous-valued attribute being discretized, and uses the discovered attribute as the interdependent attribute for carrying out the multivariate discretization. We believe that a good multivariate discretization scheme for continuous-valued attributes should rely heavily on their respective best correlated attributes. Within an attribute space, each attribute should have at least one most relevant attribute, which may differ from attribute to attribute. Our novel multivariate discretization algorithm minimizes the uncertainty between the interdependent attribute and the continuous-valued attribute being discretized while at the same time maximizing their correlation. The method can be used as a pre-processing step for learning algorithms. The empirical results compare the performance of MIDCA against various discretization methods for two decision tree algorithms, ID3 and C4.5, on twelve real-life datasets from the UCI repository.
Keywords: multivariate discretization, interdependent feature, correlated attribute, data mining, machine learning.
1 Introduction

The decision tree is one of the most widely used and practical methods for inductive inference in the data mining and machine learning disciplines (Han and Kamber [1]). Most decision tree learning algorithms are limited to handling attributes with discrete values only. However, real datasets are usually a mix of discrete- and continuous-valued attributes. The common way to handle continuous-valued attributes is to discretize them by dividing their ranges into intervals. Moreover, even if a learning algorithm can deal with continuous-valued attributes directly, it is still better to carry out discretization prior to learning, so as to minimize the information loss and increase the classification accuracy. Many discretization algorithms developed in data mining are univariate: they discretize each continuous-valued attribute independently, without considering the interdependent relationships among the other attributes, at most taking the relationship with the class attribute into account. The simplest discretization method is equal width interval binning (Dougherty et al. [2]), which divides the range of a continuous-valued attribute into several equally sized bins. It makes no use of the class attribute and is thus an unsupervised discretization method. The best discretization algorithms are supervised, taking the class attribute information into consideration. One is entropy-based: it recursively partitions a continuous-valued attribute to obtain the minimal entropy measure (Fayyad and Irani [3]), using the minimum description length principle as the stopping criterion. Another is based on the chi-square statistic (Liu and Setiono [4]), which aims to preserve a distribution as similar as possible to the original data even after discretization. Evaluations and comparisons of some supervised and unsupervised univariate discretization methods can be found in [2, 5].
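As an illustration of the unsupervised baseline just described, equal width interval binning can be sketched in a few lines. This is our own minimal sketch, not the implementation evaluated in [2]; the function name and the default number of bins k are assumptions for illustration:

```python
def equal_width_bins(values, k=4):
    """Unsupervised equal width interval binning: split the observed
    range of a continuous attribute into k equally sized bins and
    return the bin index of each value plus the interior cut points."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    cuts = [lo + width * i for i in range(1, k)]  # k-1 interior cut points
    # Clamp the maximum value into the last bin.
    bins = [min(int((v - lo) / width), k - 1) if width else 0 for v in values]
    return bins, cuts

bins, cuts = equal_width_bins([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
# bins -> [0, 0, 1, 1, 2, 2, 3, 3]; cuts -> [2.75, 4.5, 6.25]
```

Note that the class labels never appear in the computation, which is precisely why the method is unsupervised.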
As Bay [6, 7] indicated, the discretized intervals should make sense to a human expert. For example, when learning from medical data on hypertensive patients, we know that a person's blood pressure increases with age. It is therefore improper to set 140mmHg and 90mmHg as the systolic and diastolic pressure thresholds respectively for all patients, since the standard for diagnosing hypertension differs slightly between young people (orthoarteriotony is 120mmHg/80mmHg) and old people (orthoarteriotony is 140mmHg/90mmHg) [8]. If the blood pressure of a person aged 20 is 139mmHg/89mmHg, he/she might be considered potentially hypertensive. In contrast, a person aged 65 with the same blood pressure measures is definitely considered normotensive. Obviously, to discretize the continuous-valued attribute blood pressure, at least the attribute age must be taken into consideration, while discretizing some other continuous-valued attribute may not require age at all. The solution to this problem is to use multivariate interdependent discretization in place of univariate discretization. Multivariate interdependent discretization concerns the correlation between the attribute being discretized and the other potential interdependent attributes, in addition to the class attribute. Few works in the literature have discussed
multivariate interdependent discretization methods. In this paper, we propose a new multivariate interdependent discretization method, called Multivariate Interdependent Discretization for Continuous Attributes (MIDCA), that can be used as a preprocessing step for learning algorithms. The method is based on normalized relief and information measures to find the best correlated attribute for each continuous-valued attribute being discretized, using it as the interdependent attribute to carry out the multivariate discretization. In the next section, we describe our discretization method in detail. The evaluation of the proposed algorithm on some real datasets is presented in section 3. Finally, we discuss the limitations of the method and present directions for further research in section 4.

2 MIDCA algorithm

In order to obtain a good-quality multivariate discretization, discovering the best interdependent attribute with respect to each continuous-valued attribute being discretized is the primary task. To measure the correlation between attributes, the entropy measure [3, 9, 10] and relief theory [11, 12] are adopted. Relief is a feature-weighting algorithm for estimating the quality of attributes, and as such it is able to discover the interdependencies among attributes. Entropy, from information theory, is a measure of the uncertainty of an arbitrary variable. In this section, we first recall entropy information and the theory of relief, and then describe our discretization method in detail.

2.1 Entropy information

Entropy specifies the minimum number of bits of information needed to encode the classification of an arbitrary member of a collection of instances [9, 10]. Given a collection of instances S containing C classes of a target attribute, the entropy of S relative to this C-class classification is defined as

Entropy(S) = -\sum_{i \in C} p(S_i) \log p(S_i).   (1)

where p(S_i) is the proportion of S belonging to class i.
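Eqn (1) can be computed directly from class counts. The following sketch (our own, for illustration) uses base-2 logarithms so the result is measured in bits:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a collection of class labels, eqn (1), in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A 50/50 class split carries exactly one bit of uncertainty.
print(entropy(["yes", "yes", "no", "no"]))  # 1.0
```

A pure collection (all instances in one class) has entropy zero, the minimum; a uniform split over the classes attains the maximum.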
Based on this measure, we can find the most informative attribute A relative to a collection of examples S by defining a measure called information gain:

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v).   (2)

where Values(A) is the set of all distinct values of attribute A, and S_v is the subset of S for which attribute A has value v, that is, S_v = \{ s \in S \mid A(s) = v \}.

2.2 Relief

The key idea of relief (Kira and Rendell [11, 12]) is to estimate the quality of an attribute by calculating how well its values distinguish among instances from the same class and from different classes. A good attribute should have the same value for instances from the same class and should differentiate between instances
from different classes. Kononenko [13] notes that relief attempts to approximate the following difference of probabilities for the weight of an attribute A:

Relief_A = P(\text{different value of } A \mid \text{different class}) - P(\text{different value of } A \mid \text{same class}).   (3)

which can be reformulated as

Relief_A = \frac{Gini'(A) \sum_{x \in X} p(x)^2}{\left(1 - \sum_{c \in C} p(c)^2\right) \sum_{c \in C} p(c)^2}.   (4)

where C is the class attribute and

Gini'(A) = \sum_{c \in C} p(c)(1 - p(c)) - \sum_{x \in X} \frac{p(x)^2}{\sum_{x \in X} p(x)^2} \sum_{c \in C} p(c \mid x)(1 - p(c \mid x)).   (5)

Gini' is a variant of another attribute quality measure, the Gini-index (Breiman [14]).

2.3 MIDCA

Our proposed multivariate discretization method MIDCA is mainly concerned with discovering the best interdependent attribute relative to the continuous-valued attribute being discretized. Within an attribute space, attributes have a certain relevancy to each other. No matter how loose or tight these relationships are, there must exist at least one interdependent attribute that correlates best with the continuous-valued attribute being discretized, and we believe that a good multivariate discretization scheme for continuous-valued attributes should rely heavily on these best correlated attributes. We assume a dataset S = {s_1, s_2, ..., s_N} containing N instances. Each instance s ∈ S is defined over a set of M attributes (features) A = {a_1, a_2, ..., a_M} and a class attribute c ∈ C. For each continuous-valued attribute a_i ∈ A there exists at least one a_j ∈ A such that a_j is the most correlated attribute with respect to a_i, or vice versa, since the interdependent weight is measured symmetrically. To find such a best interdependent attribute a_j for each continuous-valued attribute a_i, both entropy information and the relief measure are taken into account to capture the interactions within the attribute space A.
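Kononenko's probabilistic reformulation of relief in section 2.2 can be sketched for a discrete attribute as follows. This is our own illustrative reading of eqns (4) and (5), not the authors' code; the function name is an assumption:

```python
from collections import Counter

def relief_weight(attr_vals, class_vals):
    """Probabilistic relief weight of a discrete attribute, after
    Kononenko's reformulation in eqns (4) and (5)."""
    n = len(attr_vals)
    p_x = {x: c / n for x, c in Counter(attr_vals).items()}
    p_c = [k / n for k in Counter(class_vals).values()]
    sum_px2 = sum(p * p for p in p_x.values())
    sum_pc2 = sum(p * p for p in p_c)
    # Gini'(A): class impurity minus the attribute-conditional impurity,
    # the latter weighted by the normalized squared value probabilities.
    gini = sum(p * (1 - p) for p in p_c)
    for x, px in p_x.items():
        sub = [c for a, c in zip(attr_vals, class_vals) if a == x]
        p_cx = [k / len(sub) for k in Counter(sub).values()]
        gini -= (px * px / sum_px2) * sum(p * (1 - p) for p in p_cx)
    return gini * sum_px2 / ((1 - sum_pc2) * sum_pc2)

# An attribute that determines the class perfectly gets weight 1.
w = relief_weight(["x", "x", "y", "y"], ["a", "a", "b", "b"])  # 1.0
```

An attribute that is independent of the class drives the conditional impurity to the class impurity itself and the weight to zero.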
First, for each attribute pair (a_i, a_j) ∈ A where i ≠ j, we calculate the correlation weights by utilizing both symmetric relief and symmetric entropy information, then normalize the two measures and finally take the best result as our target. The weight can be defined as

InterdependentWeight(a_i, a_j) = \left[ \frac{SymGain(a_i, a_j)}{\sum_{a_M \in A} SymGain(a_i, a_M)} + \frac{SymRelief(a_i, a_j)}{\sum_{a_M \in A} SymRelief(a_i, a_M)} \right] / 2.   (6)
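The gain half of eqn (6) can be sketched as below; the symmetric gain simply averages eqn (2) with the roles of the two attributes exchanged, and the relief half is computed analogously before the two normalized proportions are averaged. This is a simplified sketch under our own naming, not the authors' implementation:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(a_vals, b_vals):
    """Information gain of attribute A about attribute B,
    i.e. eqn (2) with B playing the role of the class."""
    n = len(b_vals)
    g = entropy(b_vals)
    for v in set(a_vals):
        sub = [b for a, b in zip(a_vals, b_vals) if a == v]
        g -= len(sub) / n * entropy(sub)
    return g

def sym_gain(a_vals, b_vals):
    """SymGain(A, B) = [Gain(A, B) + Gain(B, A)] / 2."""
    return (gain(a_vals, b_vals) + gain(b_vals, a_vals)) / 2

def gain_proportions(candidates, target):
    """The gain half of eqn (6): each candidate's SymGain with the
    target, normalized by the sum over all candidates."""
    raw = {name: sym_gain(vals, target) for name, vals in candidates.items()}
    total = sum(raw.values()) or 1.0
    return {name: w / total for name, w in raw.items()}

weights = gain_proportions(
    {"a1": ["x", "x", "y", "y"],   # perfectly correlated with target
     "a2": ["x", "y", "x", "y"]},  # independent of target
    ["a", "a", "b", "b"])
# weights -> {"a1": 1.0, "a2": 0.0}
```

Normalizing to proportions is what makes the gain term comparable with the relief term, which lives on a different scale.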
SymGain(a_i, a_j) and SymRelief(a_i, a_j) are symmetric forms of the information gain and relief measures respectively, obtained by treating either a_i or a_j in turn as the class attribute C in the formula. That is,

SymGain(A, B) = [Gain(A, B) + Gain(B, A)] / 2.   (7)

and

SymRelief(A, B) = \left[ \frac{Gini'(A) \sum_{x \in X} p(x)^2}{\left(1 - \sum_{b \in B} p(b)^2\right) \sum_{b \in B} p(b)^2} + \frac{Gini'(B) \sum_{y \in Y} p(y)^2}{\left(1 - \sum_{a \in A} p(a)^2\right) \sum_{a \in A} p(a)^2} \right] / 2.   (8)

The advantage of incorporating both the information gain and relief measures into our multivariate interdependent discretization algorithm is that it minimizes the uncertainty between the continuous-valued attribute being discretized and its interdependent attribute while at the same time maximizing their correlation. The measures output from eqns (7) and (8) are on different scales; the way to balance them is to normalize each result by using proportions in place of raw values. Thus, the best interdependent attribute with respect to the continuous-valued attribute being discretized is determined by averaging the two normalized proportions and selecting the attribute whose interdependent weight is the best among all the potential interdependent attributes. However, if a potential interdependent attribute is itself continuous-valued, it is first discretized with the entropy-based method [2, 3]. This is important and may reduce the bias in favor of attributes with more values. Furthermore, our method selects an interdependent attribute for each continuous-valued attribute in a dataset rather than using a single one for all continuous-valued attributes; this is also the main factor in improving the final classification accuracy. Once the best interdependent attribute has been discovered, the multivariate interdependent discretization proceeds by adopting the efficient supervised discretization algorithm of minimal entropy partitioning with MDLP (Fayyad and Irani [3]). Nevertheless, our method differs from the original in several respects.
First, our method ensures at least a binary discretization for each continuous-valued attribute, unlike the original method, in which the interval of a continuous-valued attribute sometimes remains [-∞, +∞]. If a continuous-valued attribute generates no cut point, the attribute is effectively useless and will be ignored during the learning process. This conflicts with our belief that most continuous-valued attributes in the medical domain have specific meanings; most such figures express degrees of illness, such as blood pressure, heart rate, cholesterol, etc., so their discretization cannot be ignored. Second, our discretization is carried out with respect to the best interdependent attribute discovered from eqn (6), in addition to the class attribute. Moreover, assume that the interdependent attribute INT has T discrete values; each of its distinct values identifies a subset of the original dataset, and the probabilities should be generated relative to that subset instead of the whole dataset. Therefore, the combinational probability distribution over the attribute space {C} ∪ A is redefined, as is the information gain measure:
MIDCAInfoGain(A, P; INT, S) = Entropy(S \mid INT_T) - \sum_{v \in Values(A)|_{INT_T}} \frac{|S_v|}{|S_{INT_T}|} Entropy(S_v).   (9)

where the measure defines the class information entropy of the partition induced by P, a collection of candidate cut points for attribute A, under the projection of value T of the interdependent attribute INT. We replace Entropy(S) with the conditional entropy Entropy(S | INT_T) to emphasize the interaction with the interdependent attribute INT. As a consequence, v ∈ Values(A)|_{INT_T} ranges over the distinct values of attribute A within the cluster induced by value T of the interdependent attribute INT, and S_v is the subset of S for which attribute A has value v under the projection of T for INT, that is, S_v = \{ s \in S \mid A(s) = v \wedge INT(s) = T \}.

2.4 MIDCA high-level description

We now present a high-level description of our MIDCA algorithm, together with the algorithm INTDDiscovery for discovering the best correlated interdependent attribute:

Algorithm MIDCA
  For each continuous-valued attribute A:
    Sort A in ascending order;
    Discover the best interdependent attribute of A by INTDDiscovery;
    Repeat
      Discover the best cut points by the MIDCAInfoGain measure;
    Until MDLP = pass;
    Regenerate the dataset according to the obtained cut points;
End MIDCA.

Algorithm INTDDiscovery
  For each attribute atr other than A:
    If atr is a continuous-valued attribute:
      Discretize atr using the entropy-based method;
    Calculate the symmetric entropy SymGain for A and atr;
    Calculate the symmetric relief SymRelief for A and atr;
    Normalize SymGain and SymRelief;
    Average SymGain and SymRelief;
  Output the attribute with the highest average measure;
End INTDDiscovery.

3 Experiments

In this section, our empirical evaluation results are presented. We tested MIDCA on twelve real-life datasets from the UCI repository (Blake and Merz [15]), each containing a mixture of continuous and discrete attributes. The details of each dataset are listed in table 1.
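The cut-point search of the MIDCA pseudocode above, restricted to each cluster induced by a value of the interdependent attribute as in eqn (9), might be sketched as follows. This is our own single-cut simplification of the recursive entropy/MDLP procedure, with hypothetical names, shown only to illustrate the per-cluster search and the at-least-binary guarantee:

```python
from math import log2
from collections import Counter

def _ent(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels):
    """Single entropy-minimizing boundary; one cut is returned even if
    MDLP would reject every candidate (the at-least-binary guarantee)."""
    pairs = sorted(zip(values, labels))
    n, best = len(pairs), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # candidate cuts lie between distinct values only
        e = (i * _ent([l for _, l in pairs[:i]])
             + (n - i) * _ent([l for _, l in pairs[i:]])) / n
        if best is None or e < best[0]:
            best = (e, (pairs[i - 1][0] + pairs[i][0]) / 2)
    return best[1] if best else None

def midca_cuts(values, labels, int_vals):
    """Search for cut points separately inside each cluster induced by
    a value T of the interdependent attribute INT, as in eqn (9)."""
    return {t: best_cut([v for v, w in zip(values, int_vals) if w == t],
                        [l for l, w in zip(labels, int_vals) if w == t])
            for t in set(int_vals)}

# Echoing the blood-pressure example: a lower systolic threshold is
# found inside the "young" cluster than inside the "old" cluster.
cuts = midca_cuts([130, 139, 120, 150, 135, 145],
                  ["high", "high", "normal", "high", "normal", "high"],
                  ["young"] * 4 + ["old"] * 2)
# cuts -> {"young": 125.0, "old": 140.0}
```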
In order to compare the MIDCA algorithm with different discretization methods, we simulated the univariate and multivariate discretization methods, where the interdependent attributes for the multivariate discretizations are obtained using Relief and Gain Ratio respectively. In the experiment, MIDCA and the various discretization methods are used as pre-processing steps for the two learning algorithms: ID3 (Quinlan [16]) and C4.5 (Quinlan [17, 18]).

Table 1: Twelve real-life datasets from UCI (Cleve, Hepatitis, Hypothyroid, Heart, Sick-euthyroid, Iris, Australian, Auto, Breast, Crx, Diabetes, Horse-colic), listing for each the numbers of continuous and discrete features, the training and testing set sizes, and the number of classes.

Table 2: Comparison of classification error rates (%) of the decision tree algorithm ID3, per dataset and on average, with no discretization, univariate discretization, multivariate discretization (average of Relief and GainRatio), and MIDCA.
The experimental results summarized in tables 2 and 3 reveal that MIDCA improves the classification accuracy on average. In table 2, although MIDCA increases the error rate on three datasets each for ID3 with univariate and with multivariate discretization, it decreases the error rate on all but one dataset for ID3 without discretization. In table 3, it improves the performance on all but one dataset for C4.5 with and without univariate discretization, and on all but two datasets for C4.5 with multivariate discretization. For the remaining datasets, MIDCA provides a significant increase in classification accuracy, especially on two datasets, Hypothyroid and Sick-euthyroid, which approach zero error rates for both learning algorithms. As observed from table 2, MIDCA slightly decreases the performance on three datasets compared with ID3 with univariate discretization; similarly, MIDCA increases the error rate on one dataset for C4.5 with univariate discretization in table 3. We found that all of these downgraded datasets contain only continuous attributes. Classification performance worsens in this case because the MIDCA algorithm must first carry out a univariate discretization, prior to the multivariate discretization, whenever an interdependent attribute is itself continuous-valued. This extra step adds uncertainty to the attribute being discretized and hence increases the error rate accordingly.

Table 3: Comparison of classification error rates (%) of the decision tree algorithm C4.5, per dataset and on average, with no discretization, univariate discretization, multivariate discretization (average of Relief and GainRatio), and MIDCA.

Moreover, as the average error rates in tables 2 and 3 show, our method MIDCA decreases the classification error rate from 17.75%, 16.74% and 14.77% down to 13.27% for the ID3 algorithm, and from 15.66%, 15.02% and 14.21% down to 12.33% for the C4.5 algorithm, although
several datasets obtained higher error rates than the averages of the algorithms with the multivariate discretizations based on relief and gain ratio respectively. The improvements relative to both algorithms without discretization, with univariate discretization, and with multivariate discretization reach approximately 25.2% and 21.3%, 20.7% and 17.9%, and 10.2% and 13.2% respectively. The least improvement is over 10%, which verifies that our MIDCA algorithm, incorporating Relief and Gain Ratio, outperforms the originals, and of course the univariate discretization method and no discretization at all.

4 Conclusions and future research

In this paper, we have proposed a novel method for multivariate interdependent discretization, focused on discovering the best interdependent attribute for each continuous-valued attribute. The method can be used as a preprocessing tool for any learning algorithm, and it ensures at least a binary discretization, which minimizes the information loss and maximizes the classification accuracy. The empirical evaluation results presented in this paper give significant evidence that our method MIDCA can appropriately discretize a continuous-valued attribute with respect to a specific interdependent attribute and thus improve the final classification performance. However, the method has limitations in handling datasets containing only continuous-valued attributes: in this case, the complexity and cost of discovering an interdependent attribute increase and the performance of MIDCA decreases, since a perfect matching of an interdependent attribute to a continuous-valued attribute is the key success factor in multivariate interdependent discretization. Our experiments applied the ID3 and C4.5 learning algorithms; for further comparison, we plan to perform experiments with other learning algorithms, such as naive Bayes (Langley et al.
[19]) or clustering methods, etc. On the other hand, further research should investigate the complexity and efficiency of the algorithm, and may extend the discretization to more than two attributes. Finally, the limitations above should be resolved so as to handle continuous-valued interdependent attributes efficiently and effectively. These research directions may finally guide us to a valuable algorithm.

References

[1] Han J. & Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[2] Dougherty J., Kohavi R. & Sahami M., Supervised and unsupervised discretization of continuous features. Proceedings of the Twelfth International Conference on Machine Learning, Morgan Kaufmann Publishers, San Francisco, CA, 1995.
[3] Fayyad U. M. & Irani K. B., Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1993.
[4] Liu H. & Setiono R., Feature selection via discretization. Technical report, Dept. of Information Systems and Computer Science, Singapore, 1997.
[5] Liu H., Hussain F., Tan C. & Dash M., Discretization: an enabling technique. Data Mining and Knowledge Discovery.
[6] Bay S. D., Multivariate discretization of continuous variables for set mining. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000.
[7] Bay S. D., Multivariate discretization for set mining. Knowledge and Information Systems, 3(4), 2001.
[8] Hypertension Research Group, Department of Cardiology, People's Hospital, Beijing Medical University (eds.), A Hundred Questions and Answers on Modern Knowledge of Hypertension, 1998.
[9] Mitchell T. M., Machine Learning, McGraw-Hill, 1997.
[10] Zhu Xuelong, Fundamentals of Applied Information Theory, Tsinghua University Press, 2000.
[11] Kira K. & Rendell L., A practical approach to feature selection. Proceedings of the International Conference on Machine Learning, Aberdeen, Morgan Kaufmann, 1992.
[12] Kira K. & Rendell L., The feature selection problem: traditional methods and a new algorithm. Proceedings of AAAI-92, San Jose, CA, 1992.
[13] Kononenko I., On biases in estimating multi-valued attributes. Proceedings of IJCAI-95, 1995.
[14] Breiman L., Technical note: some properties of splitting criteria. Machine Learning, 24, 1996.
[15] Blake C. L. & Merz C. J., UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science.
[16] Quinlan J. R., Induction of decision trees. Machine Learning, 1(1), 1986.
[17] Quinlan J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[18] Quinlan J. R., Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 1996.
[19] Langley P., Iba W. & Thompson K., An analysis of Bayesian classifiers. Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, 1992.
An Empirical Study of Building Compact Ensembles Huan Liu, Amit Mandvikar, and Jigar Mody Computer Science & Engineering Arizona State University Tempe, AZ 85281 {huan.liu,amitm,jigar.mody}@asu.edu Abstract.
More informationOn Multi-Class Cost-Sensitive Learning
On Multi-Class Cost-Sensitive Learning Zhi-Hua Zhou and Xu-Ying Liu National Laboratory for Novel Software Technology Nanjing University, Nanjing 210093, China {zhouzh, liuxy}@lamda.nju.edu.cn Abstract
More informationConcept Lattice based Composite Classifiers for High Predictability
Concept Lattice based Composite Classifiers for High Predictability Zhipeng XIE 1 Wynne HSU 1 Zongtian LIU 2 Mong Li LEE 1 1 School of Computing National University of Singapore Lower Kent Ridge Road,
More informationSelected Algorithms of Machine Learning from Examples
Fundamenta Informaticae 18 (1993), 193 207 Selected Algorithms of Machine Learning from Examples Jerzy W. GRZYMALA-BUSSE Department of Computer Science, University of Kansas Lawrence, KS 66045, U. S. A.
More informationLattice Machine: Version Space in Hyperrelations
Lattice Machine: Version Space in Hyperrelations [Extended Abstract] Hui Wang, Ivo Düntsch School of Information and Software Engineering University of Ulster Newtownabbey, BT 37 0QB, N.Ireland {H.Wang
More informationDevelopment of a Data Mining Methodology using Robust Design
Development of a Data Mining Methodology using Robust Design Sangmun Shin, Myeonggil Choi, Youngsun Choi, Guo Yi Department of System Management Engineering, Inje University Gimhae, Kyung-Nam 61-749 South
More informationGrowing a Large Tree
STAT 5703 Fall, 2004 Data Mining Methodology I Decision Tree I Growing a Large Tree Contents 1 A Single Split 2 1.1 Node Impurity.................................. 2 1.2 Computation of i(t)................................
More informationShort Note: Naive Bayes Classifiers and Permanence of Ratios
Short Note: Naive Bayes Classifiers and Permanence of Ratios Julián M. Ortiz (jmo1@ualberta.ca) Department of Civil & Environmental Engineering University of Alberta Abstract The assumption of permanence
More informationClassification Based on Logical Concept Analysis
Classification Based on Logical Concept Analysis Yan Zhao and Yiyu Yao Department of Computer Science, University of Regina, Regina, Saskatchewan, Canada S4S 0A2 E-mail: {yanzhao, yyao}@cs.uregina.ca Abstract.
More informationQuestion of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning
Question of the Day Machine Learning 2D1431 How can you make the following equation true by drawing only one straight line? 5 + 5 + 5 = 550 Lecture 4: Decision Tree Learning Outline Decision Tree for PlayTennis
More informationClick Prediction and Preference Ranking of RSS Feeds
Click Prediction and Preference Ranking of RSS Feeds 1 Introduction December 11, 2009 Steven Wu RSS (Really Simple Syndication) is a family of data formats used to publish frequently updated works. RSS
More informationENTROPIES OF FUZZY INDISCERNIBILITY RELATION AND ITS OPERATIONS
International Journal of Uncertainty Fuzziness and Knowledge-Based Systems World Scientific ublishing Company ENTOIES OF FUZZY INDISCENIBILITY ELATION AND ITS OEATIONS QINGUA U and DAEN YU arbin Institute
More informationClassification and Prediction
Classification Classification and Prediction Classification: predict categorical class labels Build a model for a set of classes/concepts Classify loan applications (approve/decline) Prediction: model
More informationText Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University
Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data
More informationDecision Tree. Decision Tree Learning. c4.5. Example
Decision ree Decision ree Learning s of systems that learn decision trees: c4., CLS, IDR, ASSISA, ID, CAR, ID. Suitable problems: Instances are described by attribute-value couples he target function has
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More information.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. for each element of the dataset we are given its class label.
.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Definitions Data. Consider a set A = {A 1,...,A n } of attributes, and an additional
More informationCorrelation Preserving Unsupervised Discretization. Outline
Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationON COMBINING PRINCIPAL COMPONENTS WITH FISHER S LINEAR DISCRIMINANTS FOR SUPERVISED LEARNING
ON COMBINING PRINCIPAL COMPONENTS WITH FISHER S LINEAR DISCRIMINANTS FOR SUPERVISED LEARNING Mykola PECHENIZKIY*, Alexey TSYMBAL**, Seppo PUURONEN* Abstract. The curse of dimensionality is pertinent to
More informationUsing HDDT to avoid instances propagation in unbalanced and evolving data streams
Using HDDT to avoid instances propagation in unbalanced and evolving data streams IJCNN 2014 Andrea Dal Pozzolo, Reid Johnson, Olivier Caelen, Serge Waterschoot, Nitesh V Chawla and Gianluca Bontempi 07/07/2014
More informationBackground literature. Data Mining. Data mining: what is it?
Background literature Data Mining Lecturer: Peter Lucas Assessment: Written exam at the end of part II Practical assessment Compulsory study material: Transparencies Handouts (mostly on the Web) Course
More informationEECS 349:Machine Learning Bryan Pardo
EECS 349:Machine Learning Bryan Pardo Topic 2: Decision Trees (Includes content provided by: Russel & Norvig, D. Downie, P. Domingos) 1 General Learning Task There is a set of possible examples Each example
More informationDecision Tree Learning Lecture 2
Machine Learning Coms-4771 Decision Tree Learning Lecture 2 January 28, 2008 Two Types of Supervised Learning Problems (recap) Feature (input) space X, label (output) space Y. Unknown distribution D over
More informationFrom statistics to data science. BAE 815 (Fall 2017) Dr. Zifei Liu
From statistics to data science BAE 815 (Fall 2017) Dr. Zifei Liu Zifeiliu@ksu.edu Why? How? What? How much? How many? Individual facts (quantities, characters, or symbols) The Data-Information-Knowledge-Wisdom
More informationFundamentals to Biostatistics. Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur
Fundamentals to Biostatistics Prof. Chandan Chakraborty Associate Professor School of Medical Science & Technology IIT Kharagpur Statistics collection, analysis, interpretation of data development of new
More informationInformation Theory & Decision Trees
Information Theory & Decision Trees Jihoon ang Sogang University Email: yangjh@sogang.ac.kr Decision tree classifiers Decision tree representation for modeling dependencies among input variables using
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Part I Introduction to Data Mining by Tan, Steinbach, Kumar Adapted by Qiang Yang (2010) Tan,Steinbach,
More informationhsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference
CS 229 Project Report (TR# MSB2010) Submitted 12/10/2010 hsnim: Hyper Scalable Network Inference Machine for Scale-Free Protein-Protein Interaction Networks Inference Muhammad Shoaib Sehgal Computer Science
More informationIntroduction. Abstract
From: KDD-98 Proceedings. Copyright 1998, AAAI (www.aaai.org). All rights reserved. BAYDA: Software for Bayesian Classification and Feature Selection Petri Kontkanen, Petri Myllymäki, Tomi Silander, Henry
More informationDecision Trees. Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University. February 5 th, Carlos Guestrin 1
Decision Trees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 5 th, 2007 2005-2007 Carlos Guestrin 1 Linear separability A dataset is linearly separable iff 9 a separating
More informationSupport Vector Machine via Nonlinear Rescaling Method
Manuscript Click here to download Manuscript: svm-nrm_3.tex Support Vector Machine via Nonlinear Rescaling Method Roman Polyak Department of SEOR and Department of Mathematical Sciences George Mason University
More informationLearning Objectives. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 7.2, Page 1
Learning Objectives At the end of the class you should be able to: identify a supervised learning problem characterize how the prediction is a function of the error measure avoid mixing the training and
More informationChapter 6: Classification
Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant
More informationLearning Decision Trees
Learning Decision Trees Machine Learning Spring 2018 1 This lecture: Learning Decision Trees 1. Representation: What are decision trees? 2. Algorithm: Learning decision trees The ID3 algorithm: A greedy
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationBayesian Classification. Bayesian Classification: Why?
Bayesian Classification http://css.engineering.uiowa.edu/~comp/ Bayesian Classification: Why? Probabilistic learning: Computation of explicit probabilities for hypothesis, among the most practical approaches
More informationResampling Methods CAPT David Ruth, USN
Resampling Methods CAPT David Ruth, USN Mathematics Department, United States Naval Academy Science of Test Workshop 05 April 2017 Outline Overview of resampling methods Bootstrapping Cross-validation
More informationAn asymmetric entropy measure for decision trees
An asymmetric entropy measure for decision trees Simon Marcellin Laboratoire ERIC Université Lumière Lyon 2 5 av. Pierre Mendès-France 69676 BRON Cedex France simon.marcellin@univ-lyon2.fr Djamel A. Zighed
More informationA Unified Bias-Variance Decomposition
A Unified Bias-Variance Decomposition Pedro Domingos Department of Computer Science and Engineering University of Washington Box 352350 Seattle, WA 98185-2350, U.S.A. pedrod@cs.washington.edu Tel.: 206-543-4229
More informationClassification and Regression Trees
Classification and Regression Trees Ryan P Adams So far, we have primarily examined linear classifiers and regressors, and considered several different ways to train them When we ve found the linearity
More informationML techniques. symbolic techniques different types of representation value attribute representation representation of the first order
MACHINE LEARNING Definition 1: Learning is constructing or modifying representations of what is being experienced [Michalski 1986], p. 10 Definition 2: Learning denotes changes in the system That are adaptive
More informationReview of Lecture 1. Across records. Within records. Classification, Clustering, Outlier detection. Associations
Review of Lecture 1 This course is about finding novel actionable patterns in data. We can divide data mining algorithms (and the patterns they find) into five groups Across records Classification, Clustering,
More informationStudy on Classification Methods Based on Three Different Learning Criteria. Jae Kyu Suhr
Study on Classification Methods Based on Three Different Learning Criteria Jae Kyu Suhr Contents Introduction Three learning criteria LSE, TER, AUC Methods based on three learning criteria LSE:, ELM TER:
More informationCS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING. Santiago Ontañón
CS 380: ARTIFICIAL INTELLIGENCE MACHINE LEARNING Santiago Ontañón so367@drexel.edu Summary so far: Rational Agents Problem Solving Systematic Search: Uninformed Informed Local Search Adversarial Search
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationInducing Polynomial Equations for Regression
Inducing Polynomial Equations for Regression Ljupčo Todorovski, Peter Ljubič, and Sašo Džeroski Department of Knowledge Technologies, Jožef Stefan Institute Jamova 39, SI-1000 Ljubljana, Slovenia Ljupco.Todorovski@ijs.si
More informationBayesian Averaging of Classifiers and the Overfitting Problem
Bayesian Averaging of Classifiers and the Overfitting Problem Pedro Domingos pedrod@cs.washington.edu Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195, U.S.A.
More informationData Mining. 3.6 Regression Analysis. Fall Instructor: Dr. Masoud Yaghini. Numeric Prediction
Data Mining 3.6 Regression Analysis Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Straight-Line Linear Regression Multiple Linear Regression Other Regression Models References Introduction
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 06 - Regression & Decision Trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom
More informationthe tree till a class assignment is reached
Decision Trees Decision Tree for Playing Tennis Prediction is done by sending the example down Prediction is done by sending the example down the tree till a class assignment is reached Definitions Internal
More informationNot so naive Bayesian classification
Not so naive Bayesian classification Geoff Webb Monash University, Melbourne, Australia http://www.csse.monash.edu.au/ webb Not so naive Bayesian classification p. 1/2 Overview Probability estimation provides
More informationData Mining and Knowledge Discovery: Practice Notes
Data Mining and Knowledge Discovery: Practice Notes dr. Petra Kralj Novak Petra.Kralj.Novak@ijs.si 7.11.2017 1 Course Prof. Bojan Cestnik Data preparation Prof. Nada Lavrač: Data mining overview Advanced
More informationLearning Decision Trees
Learning Decision Trees Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning problem?
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationLearning From Inconsistent and Noisy Data: The AQ18 Approach *
Eleventh International Symposium on Methodologies for Intelligent Systems, Warsaw, pp. 411-419, 1999 Learning From Inconsistent and Noisy Data: The AQ18 Approach * Kenneth A. Kaufman and Ryszard S. Michalski*
More informationCSCI 5622 Machine Learning
CSCI 5622 Machine Learning DATE READ DUE Mon, Aug 31 1, 2 & 3 Wed, Sept 2 3 & 5 Wed, Sept 9 TBA Prelim Proposal www.rodneynielsen.com/teaching/csci5622f09/ Instructor: Rodney Nielsen Assistant Professor
More informationMachine Learning 3. week
Machine Learning 3. week Entropy Decision Trees ID3 C4.5 Classification and Regression Trees (CART) 1 What is Decision Tree As a short description, decision tree is a data classification procedure which
More informationGenerative v. Discriminative classifiers Intuition
Logistic Regression (Continued) Generative v. Discriminative Decision rees Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 31 st, 2007 2005-2007 Carlos Guestrin 1 Generative
More informationAn analysis of data characteristics that affect naive Bayes performance
An analysis of data characteristics that affect naive Bayes performance Irina Rish Joseph Hellerstein Jayram Thathachar IBM T.J. Watson Research Center 3 Saw Mill River Road, Hawthorne, NY 1532 RISH@US.IBM.CM
More informationMachine Learning and Data Mining. Decision Trees. Prof. Alexander Ihler
+ Machine Learning and Data Mining Decision Trees Prof. Alexander Ihler Decision trees Func-onal form f(x;µ): nested if-then-else statements Discrete features: fully expressive (any func-on) Structure:
More informationFeature gene selection method based on logistic and correlation information entropy
Bio-Medical Materials and Engineering 26 (2015) S1953 S1959 DOI 10.3233/BME-151498 IOS Press S1953 Feature gene selection method based on logistic and correlation information entropy Jiucheng Xu a,b,,
More information