Multivariate interdependent discretization in discovering the best correlated attribute


S. Chao & Y. P. Li
Faculty of Science and Technology, University of Macau, China

Abstract

The decision tree is one of the most widely used and practical methods in data mining. However, many discretization algorithms developed in this field are univariate only: they discretize continuous-valued attributes independently, without considering the interdependent relationships with other attributes, at most taking the class attribute into account. Such univariate discretization is inadequate for handling critical problems that arise especially in the medical domain. In this paper, we propose a new multivariate discretization method called Multivariate Interdependent Discretization for Continuous Attributes (MIDCA). The method incorporates normalized relief and information measures to look for the best correlated attribute with respect to each continuous-valued attribute being discretized, and uses the discovered best correlated attribute as the interdependent attribute in carrying out the multivariate discretization. We believe that a good multivariate discretization scheme for continuous-valued attributes should rely heavily on their respective best correlated attributes. Within an attribute space, each attribute should have at least one most relevant attribute, which may differ from attribute to attribute. Our novel multivariate discretization algorithm minimizes the uncertainty between the interdependent attribute and the continuous-valued attribute being discretized while maximizing their correlation. The method can be used as a pre-processing step for learning algorithms. The empirical results compare the performance of MIDCA with various discretization methods for two decision tree algorithms, ID3 and C4.5, on twelve real-life datasets from the UCI repository.

Keywords: multivariate discretization, interdependent feature, correlated attribute, data mining, machine learning.

1 Introduction

The decision tree is one of the most widely used and practical methods for inductive inference in the data mining and machine learning disciplines (Han and Kamber [1]). Most decision tree learning algorithms are limited to handling attributes with discrete values only. However, real datasets are usually a mix of discrete-valued and continuous-valued attributes. The common way to handle continuous-valued attributes is to discretize them by dividing their ranges into intervals. Moreover, even if a learning algorithm is able to deal with continuous-valued attributes directly, it is still better to carry out the discretization prior to learning, so as to minimize the information loss and increase the classification accuracy.

Many discretization algorithms developed in data mining are univariate: they discretize each continuous-valued attribute independently, without considering the interdependent relationships with other attributes, at most taking the relationship with the class attribute into account. The simplest discretization method is equal-width interval binning (Dougherty et al. [2]), which divides the range of a continuous-valued attribute into several equally sized bins; it makes no use of the class attribute and is thus an unsupervised discretization method. The best discretization algorithms are supervised, taking the class attribute information into consideration. One is entropy based: it recursively partitions a continuous-valued attribute to obtain the minimal entropy measure (Fayyad and Irani [3]) and uses the minimum description length principle as the stopping criterion. Another is based on the chi-square statistic (Liu and Setiono [4]) and aims at keeping the distribution after discretization as similar as possible to the original data. Evaluations and comparisons of some supervised and unsupervised univariate discretization methods can be found in [2, 5].

As Bay [6, 7] indicated, the discretized intervals should make sense to a human expert. For example, when learning medical data for hypertensive patients, we know that a person's blood pressure increases as his/her age increases. It is therefore improper to set 140mmHg and 90mmHg as the systolic and diastolic pressure thresholds for all patients, since the standard for diagnosing hypertension differs slightly between young people (orthoarteriotony is mmHg/80mmHg) and old people (orthoarteriotony is 140mmHg/90mmHg) [8]. If the blood pressure of a person aged 20 is 139mmHg/89mmHg, he/she might be considered a potential hypertensive; in contrast, a person aged 65 with the same blood pressure measures is definitely considered normotensive. Obviously, discretizing the continuous-valued attribute blood pressure must take at least the attribute age into consideration, whereas discretizing another continuous-valued attribute may not need to. The only solution to this problem is to use multivariate interdependent discretization in place of univariate discretization. Multivariate interdependent discretization concerns the correlation between the attribute being discretized and the other potential interdependent attributes, in addition to the class attribute. Few works in the literature have discussed multivariate interdependent discretization methods.

In this paper, we propose a new multivariate interdependent discretization method that can be used as a preprocessing step for learning algorithms, called Multivariate Interdependent Discretization for Continuous Attributes (MIDCA). The method is based on normalized relief and information measures, used to look for the best correlated attribute for each continuous-valued attribute being discretized; this attribute then serves as the interdependent attribute in carrying out the multivariate discretization. In the next section, we describe our discretization method in detail. The evaluation of the proposed algorithm on some real datasets is presented in section 3. Finally, we discuss the limitations of the method and present directions for further research in section 4.

2 MIDCA algorithm

In order to obtain a good-quality multivariate discretization, discovering the best interdependent attribute with respect to each continuous-valued attribute being discretized is the primary task. To measure the correlation between attributes, the entropy measure [3, 9, 10] and relief theory [11, 12] are adopted. Relief is a feature-weighting algorithm for estimating the quality of attributes that is able to discover the interdependencies among attributes. Entropy, from information theory, is a measure of the uncertainty of an arbitrary variable. In this section, we first recall entropy and the theory of relief, and then describe our discretization method in detail.

2.1 Entropy information

Entropy specifies the minimum number of bits of information needed to encode the classification of an arbitrary member of a collection of instances [9, 10]. Given a collection of instances S containing C classes of a target attribute, the entropy of S relative to this C-classification is defined as

$$Entropy(S) = -\sum_{i \in C} p(S_i)\,\log p(S_i). \qquad (1)$$

where p(S_i) is the proportion of S belonging to class i. Based on this measure, we may find the most informative attribute A relative to a collection of examples S by defining the measure called information gain

$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\,Entropy(S_v). \qquad (2)$$

where Values(A) is the set of all distinct values of attribute A, and S_v is the subset of S for which attribute A has value v, that is, S_v = {s ∈ S | A(s) = v}.
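As a concrete illustration of eqns (1) and (2), the following minimal Python sketch (not part of the original paper; the function names are illustrative) computes the entropy of a labelled sample and the information gain of a discrete attribute.

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a class-label sequence, eqn (1): -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Information gain of a discrete attribute, eqn (2): Entropy(S) minus
    the size-weighted entropy of each attribute value's subset S_v."""
    n = len(labels)
    remainder = 0.0
    for v in set(attr_values):
        subset = [c for a, c in zip(attr_values, labels) if a == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Toy example: an attribute that separates the two classes fairly well.
attr = ['x', 'x', 'y', 'y', 'y', 'x']
cls  = ['+', '+', '-', '-', '+', '+']
print(entropy(cls), info_gain(attr, cls))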

2.2 Relief

The key idea of relief (Kira and Rendell [11, 12]) is to estimate the quality of an attribute by calculating how well its values distinguish among instances from the same class and from different classes. A good attribute should have the same value for instances from the same class and should differentiate between instances from different classes. Kononenko [13] notes that Relief attempts to approximate the following difference of probabilities for the weight of an attribute A

$$Relief_A = P(\text{different value of } A \mid \text{different class}) - P(\text{different value of } A \mid \text{same class}). \qquad (3)$$

which can be reformulated as

$$Relief_A = \frac{Gini'(A)\,\sum_{x \in X} p(x)^2}{\left(1 - \sum_{c \in C} p(c)^2\right)\sum_{c \in C} p(c)^2}. \qquad (4)$$

where C is the class attribute, X is the set of values of A, and

$$Gini'(A) = \sum_{c \in C} p(c)\,(1 - p(c)) - \sum_{x \in X} \frac{p(x)^2}{\sum_{x \in X} p(x)^2}\,\sum_{c \in C} p(c \mid x)\,(1 - p(c \mid x)). \qquad (5)$$

Gini' is a variant of another attribute quality measure, the Gini-index (Breiman [14]).
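A minimal sketch of how eqns (4) and (5) can be evaluated from frequency counts is given below. This is illustrative code, not the authors' implementation; it assumes a discrete attribute and a discrete class, both supplied as Python sequences.

from collections import Counter

def gini_prime(attr_values, labels):
    """Gini'(A) of eqn (5): the Gini-index gain with the usual weights p(x)
    replaced by the normalized squared priors p(x)^2 / sum_x p(x)^2."""
    n = len(labels)
    p_c = {c: k / n for c, k in Counter(labels).items()}
    p_x = {x: k / n for x, k in Counter(attr_values).items()}
    norm = sum(p * p for p in p_x.values())
    prior_term = sum(p * (1 - p) for p in p_c.values())
    cond_term = 0.0
    for x, px in p_x.items():
        subset = [c for a, c in zip(attr_values, labels) if a == x]
        p_cx = {c: k / len(subset) for c, k in Counter(subset).items()}
        cond_term += (px * px / norm) * sum(p * (1 - p) for p in p_cx.values())
    return prior_term - cond_term

def relief_weight(attr_values, labels):
    """Myopic relief weight of eqn (4), expressed through Gini'(A)."""
    n = len(labels)
    p_c = [k / n for k in Counter(labels).values()]
    p_x = [k / n for k in Counter(attr_values).values()]
    sum_pc2 = sum(p * p for p in p_c)
    return (gini_prime(attr_values, labels) * sum(p * p for p in p_x)
            / ((1 - sum_pc2) * sum_pc2))

# A perfectly class-separating attribute receives weight 1.0.
print(relief_weight(['x', 'x', 'y', 'y'], ['+', '+', '-', '-']))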

2.3 MIDCA

Our proposed multivariate discretization method MIDCA is mainly concerned with discovering the best interdependent attribute relative to the continuous-valued attribute being discretized. Within an attribute space, attributes have a certain relevancy to one another. No matter how loose or tight these relationships are, there must exist at least one interdependent attribute that correlates best with the continuous-valued attribute being discretized, and we believe that a good multivariate discretization scheme for continuous-valued attributes should rely heavily on these respective best correlated attributes. We assume that a dataset S = {s_1, s_2, ..., s_N} contains N instances. Each instance s ∈ S is defined over a set of M attributes (features) A = {a_1, a_2, ..., a_M} and a class attribute c ∈ C. For each continuous-valued attribute a_i ∈ A, there exists at least one a_j ∈ A such that a_j is the most correlated attribute with respect to a_i, or vice versa, since the interdependent weight is measured symmetrically. To find such a best interdependent attribute a_j for each continuous-valued attribute a_i, both the entropy information and the relief measure are taken into account to capture the interactions within the attribute space A. First, for each attribute pair (a_i, a_j) ∈ A with i ≠ j, we calculate the correlation weights using both symmetric relief and symmetric information gain. We then normalize the two measures and finally take the best result as our target. The algorithm can be defined as

$$InterdependentWeight(a_i, a_j) = \left(\frac{SymGain(a_i, a_j)}{\sum_{m=1}^{|A|} SymGain(a_i, a_m)} + \frac{SymRelief(a_i, a_j)}{\sum_{m=1}^{|A|} SymRelief(a_i, a_m)}\right) \Big/\, 2. \qquad (6)$$

SymGain(a_i, a_j) and SymRelief(a_i, a_j) are symmetric forms of the information gain and relief measures respectively, obtained by treating either a_i or a_j in turn as the class attribute C in the corresponding formula. That is,

$$SymGain(A, B) = \left[\,Gain(A, B) + Gain(B, A)\,\right] / 2. \qquad (7)$$

and

$$SymRelief(A, B) = \left(\frac{Gini'(A)\sum_{x \in X} p(x)^2}{\left(1 - \sum_{b \in B} p(b)^2\right)\sum_{b \in B} p(b)^2} + \frac{Gini'(B)\sum_{y \in Y} p(y)^2}{\left(1 - \sum_{a \in A} p(a)^2\right)\sum_{a \in A} p(a)^2}\right) \Big/\, 2. \qquad (8)$$

where X and Y are the sets of values of A and B respectively. The advantage of incorporating both the information gain and relief measures in our multivariate interdependent discretization algorithm is to minimize the uncertainty between the continuous-valued attribute being discretized and its interdependent attribute, and at the same time to maximize their correlation. The measures output by eqns (7) and (8) are on different scales; the only way to balance them is to normalize each result by using proportions in place of raw values. Thus, the best interdependent attribute with respect to the continuous-valued attribute being discretized is the one whose average of the two normalized proportions, i.e. whose interdependent weight, is the largest among all potential interdependent attributes. However, if a potential interdependent attribute is itself a continuous-valued attribute, it is first discretized with the entropy-based method [2, 3]. This is important and may reduce the bias in favor of attributes with more values. Furthermore, our method determines an interdependent attribute for each continuous-valued attribute in a dataset rather than using one for all continuous-valued attributes; this is also the main factor in improving the final classification accuracy.
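To make the search of eqn (6) concrete, here is an illustrative sketch (not from the paper). It assumes the info_gain and relief_weight helpers from the earlier sketches are in scope, and that all attribute values passed in are already discrete; per the paper, a continuous candidate would first be pre-discretized with an entropy-based method. Names and the toy data are assumptions for illustration only.

def sym_gain(a_vals, b_vals):
    """Symmetric information gain, eqn (7): each attribute plays the class role in turn."""
    return (info_gain(a_vals, b_vals) + info_gain(b_vals, a_vals)) / 2

def sym_relief(a_vals, b_vals):
    """Symmetric relief, eqn (8), built from the myopic relief weight of eqn (4)."""
    return (relief_weight(a_vals, b_vals) + relief_weight(b_vals, a_vals)) / 2

def best_interdependent_attribute(data, target):
    """Eqn (6): normalize SymGain and SymRelief to proportions over all
    candidate attributes, average the two proportions, and return the
    best-scoring candidate. `data` maps attribute names to (already
    discrete) value sequences; `target` is the attribute being discretized."""
    candidates = [a for a in data if a != target]
    gains = {a: sym_gain(data[target], data[a]) for a in candidates}
    reliefs = {a: sym_relief(data[target], data[a]) for a in candidates}
    g_tot = sum(gains.values()) or 1.0      # avoid division by zero
    r_tot = sum(reliefs.values()) or 1.0
    weights = {a: (gains[a] / g_tot + reliefs[a] / r_tot) / 2 for a in candidates}
    return max(weights, key=weights.get)

data = {'age': ['young', 'young', 'young', 'old', 'old', 'old'],
        'bp':  ['low',   'low',   'high',  'high', 'high', 'high'],
        'sex': ['f',     'm',     'f',     'm',    'f',    'm']}
print(best_interdependent_attribute(data, target='bp'))   # -> 'age'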

Once the best interdependent attribute has been discovered, the multivariate interdependent discretization process is carried out by adopting the most efficient supervised discretization algorithm, Minimal Entropy Partitioning with MDLP (Fayyad and Irani [3]). Nevertheless, our method differs from the original in several respects. First, our method ensures at least a binary discretization for each continuous-valued attribute, unlike the original method, in which the boundary of a continuous-valued attribute is sometimes [-∞, +∞]. If a continuous-valued attribute generates no cutting point, the attribute is treated as useless and ignored during the learning process. This conflicts with our belief that most continuous-valued attributes in the medical domain have specific meanings; for example, most figures express degrees of illness, such as blood pressure, heart rate, cholesterol, etc., so their discretization cannot be ignored. Second, our discretization is carried out with respect to the best interdependent attribute discovered from eqn (6) in addition to the class attribute. Moreover, we assume that the interdependent attribute INT has T discrete values; since each of its distinct values identifies a subset of the original dataset, the probabilities should be generated relative to that subset in place of the whole dataset. Therefore, the combined probability distribution over the attribute space {C} ∪ A is redefined, along with the information gain measure, as

$$MIDCAInfoGain(A, P; INT_T, S) = Entropy(S \mid INT_T) - \sum_{v \in Values(A)\mid INT_T} \frac{|S_v|}{|S|}\,Entropy(S_v). \qquad (9)$$

This measure defines the class information entropy of the partition induced by P, a collection of candidate cutting points for attribute A, under the projection of value T of the interdependent attribute INT. We replace Entropy(S) with the conditional entropy Entropy(S | INT_T) to emphasize the interaction with the interdependent attribute INT. As a consequence, v ∈ Values(A) | INT_T becomes the set of all distinct values of attribute A within the cluster induced by value T of the interdependent attribute INT, and S_v is the subset of S for which attribute A has value v under the projection of T for INT, that is, S_v = {s ∈ S | A(s) = v ∧ INT(s) = T}.

2.4 MIDCA high level descriptions

We now present the high-level descriptions of our MIDCA algorithm and of the algorithm INTDDiscovery for discovering the best correlated interdependent attribute:

Algorithm MIDCA
  For each continuous-valued attribute A
    Sort A in ascending order;
    Discover the best interdependent attribute of A by INTDDiscovery;
    Repeat
      Discover the best cutpoints by the MIDCAInfoGain measure;
    Until MDLP = pass;
    Regenerate the dataset according to the obtained cutpoints;
End MIDCA.

Algorithm INTDDiscovery
  For each attribute atr other than A
    If atr is a continuous-valued attribute
      Discretize atr using the entropy-based method;
    Calculate the symmetric entropy SymGain for A and atr;
    Calculate the symmetric relief SymRelief for A and atr;
  Normalize SymGain and SymRelief;
  Average SymGain and SymRelief;
  Output the attribute with the highest average measure;
End INTDDiscovery.
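The following sketch (illustrative only, not the authors' code) shows how one step of the cut-point search in Algorithm MIDCA might score candidate cuts with eqn (9). It reuses the entropy() helper from the sketch in section 2.1, operates inside one subset already selected by a single value T of the interdependent attribute, and omits the recursive partitioning and the MDLP stopping test.

def midca_info_gain(values, labels, cut):
    """Eqn (9) for one candidate cut point, evaluated inside the subset S that
    corresponds to a single value T of the interdependent attribute INT:
    Entropy(S | INT_T) minus the size-weighted entropies of the two half-intervals."""
    left = [c for v, c in zip(values, labels) if v <= cut]
    right = [c for v, c in zip(values, labels) if v > cut]
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))

def best_cut(values, labels):
    """One step of the cut-point search: scan the midpoints between consecutive
    distinct values and keep the highest-scoring cut."""
    pts = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return max(midpoints, key=lambda c: midca_info_gain(values, labels, c))

# Example: blood-pressure-like readings within one age group (one value of INT).
print(best_cut([118, 122, 131, 142, 150], ['-', '-', '-', '+', '+']))   # -> 136.5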

3 Experiments

In this section, our empirical evaluation results are presented. We have tested our method MIDCA on twelve real-life datasets from the UCI repository (Blake and Merz [15]), which contain a mixture of continuous and discrete attributes. The details of each dataset are listed in table 1. In order to compare the MIDCA algorithm with different discretization methods, we implemented both univariate and multivariate discretization methods, where the interdependent attributes of the multivariate discretizations are obtained by using Relief and Gain Ratio respectively. In the experiment, MIDCA and the various discretization methods are used as pre-processing steps for two learning algorithms: ID3 (Quinlan [16]) and C4.5 (Quinlan [17, 18]).

Table 1: Twelve real-life datasets from UCI (columns: No.; Dataset; Continuous and Discrete features; Classes; Training and Testing set sizes; Size). Datasets: Cleve, Hepatitis, Hypothyroid, Heart, Sick-euthyroid, Iris, Australian, Auto, Breast, Crx, Diabetes, Horse-colic, plus an Average row.

Table 2: Comparison of classification error rates (%) of the decision tree algorithm ID3 with/without discretization algorithms (columns: No discretization; Univariate discretization; Multivariate discretization, Average(Relief, GainRatio); MIDCA), with one row per dataset and an average row of 17.75, 16.74, 14.77 and 13.27±2.78 respectively.

The experimental results summarized in tables 2 and 3 reveal that MIDCA improves the classification accuracy on average. In table 2, although MIDCA increases the error rate on three datasets each for ID3 with univariate and with multivariate discretization, it decreases the error rate on all but one dataset for ID3 without discretization. In table 3, it improves the performance on all but one dataset for C4.5 with/without univariate discretization and on all but two datasets for C4.5 with multivariate discretization. For the rest of the datasets, MIDCA provides a significant increase in classification accuracy, especially on two datasets, Hypothyroid and Sick-euthyroid, which approached zero error rates for both learning algorithms.

As observed from table 2, MIDCA slightly decreases the performance on three datasets compared with ID3 with univariate discretization; similarly, MIDCA increases the error rate on one dataset for C4.5 with univariate discretization in table 3. All of these degraded datasets contain only continuous attributes. Classification performance suffers in this case because, when an interdependent attribute is itself continuous-valued, the MIDCA algorithm must first carry out a univariate discretization of it prior to the multivariate discretization. This extra step adds uncertainty to the attribute being discretized and hence increases the error rate accordingly.

Table 3: Comparison of classification error rates (%) of the decision tree algorithm C4.5 with/without discretization algorithms (columns: No discretization; Univariate discretization; Multivariate discretization, Average(Relief, GainRatio); MIDCA), with one row per dataset and an average row of 15.66, 15.02, 14.21 and 12.33 respectively.

Moreover, considering the average error rates reported in tables 2 and 3, our method MIDCA indeed decreases the classification error rate from 17.75%, 16.74% and 14.77% down to 13.27% for the ID3 algorithm, and from 15.66%, 15.02% and 14.21% down to 12.33% for the C4.5 algorithm, although

several datasets obtained higher error rates than the average of the algorithms with the multivariate discretizations of relief and gain ratio respectively. The improvements relative to the two algorithms without discretization, with univariate discretization and with multivariate discretization reach approximately 25.2% and 21.3%, 20.7% and 17.9%, and 10.2% and 13.2% respectively. The smallest improvement is over 10%; this verifies that our algorithm MIDCA, which incorporates Relief and Gain Ratio, outperforms the original measures, and of course the univariate discretization method and no discretization at all.

4 Conclusions and future research

In this paper, we have proposed a novel method for multivariate interdependent discretization that focuses on the discovery of the best interdependent attribute for each continuous-valued attribute. The method can be used as a preprocessing tool for any learning algorithm, and it ensures at least a binary discretization so as to minimize the information loss and maximize the classification accuracy. The empirical evaluation results presented in this paper provide significant evidence that our method MIDCA can appropriately discretize a continuous-valued attribute with respect to a specific interdependent attribute and thus improve the final classification performance. However, the method has a limitation in handling datasets that contain only continuous-valued attributes. In this case, the complexity and cost of discovering an interdependent attribute increase and the performance of MIDCA decreases, since a perfect matching of an interdependent attribute to a continuous-valued attribute is the key success factor in multivariate interdependent discretization.

Our experiments were performed with the ID3 and C4.5 learning algorithms; for further comparison, we plan to repeat the experiments with other learning algorithms, such as naive Bayes (Langley et al. [19]) or clustering methods. Further research should also include investigating the complexity and efficiency of the algorithm, and may extend the discretization to more than two attributes. Finally, the limitations should be resolved so that continuous-valued interdependent attributes can be handled efficiently and effectively. These research directions may finally guide us to a valuable algorithm.

References

[1] Han J. & Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers.
[2] Dougherty J., Kohavi R. & Sahami M., Supervised and unsupervised discretization of continuous features. Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, San Francisco, CA.
[3] Fayyad U. M. & Irani K. B., Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, 1993.

[4] Liu H. & Setiono R., Feature selection via discretization. Technical report, Dept. of Information Systems and Computer Science, Singapore, 1997.
[5] Liu H., Hussain F., Tan C. & Dash M., Discretization: an enabling technique. Topics in Data Mining and Knowledge Discovery.
[6] Bay S. D., Multivariate discretization of continuous variables for set mining. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
[7] Bay S. D., Multivariate discretization for set mining. Knowledge and Information Systems, 3(4).
[8] Hypertension Research Group, Department of Cardiology, People's Hospital of Beijing Medical University (eds.), One Hundred Questions and Answers on Modern Knowledge of Hypertension, 1998 (in Chinese).
[9] Mitchell T. M., Machine Learning, McGraw-Hill Companies, Inc.
[10] Zhu X., Fundamentals of Applied Information Theory, Tsinghua University Press, 2000 (in Chinese).
[11] Kira K. & Rendell L., A practical approach to feature selection. Proc. International Conference on Machine Learning, Aberdeen, Morgan Kaufmann.
[12] Kira K. & Rendell L., The feature selection problem: traditional methods and a new algorithm. Proc. AAAI-92, San Jose, CA.
[13] Kononenko I., On biases in estimating multi-valued attributes. In IJCAI-95.
[14] Breiman L., Technical note: some properties of splitting criteria. Machine Learning, 24.
[15] Blake C. L. & Merz C. J., UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science.
[16] Quinlan J. R., Induction of decision trees. Machine Learning, 1(1).
[17] Quinlan J. R., C4.5: Programs for Machine Learning, San Mateo, CA: Morgan Kaufmann.
[18] Quinlan J. R., Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4.
[19] Langley P., Iba W. & Thompson K., An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press and MIT Press, 1992.
