# Information Theory and the ID3 Algorithm


## 1. Information Theory and the ID3 Algorithm

Sohn Jong-Soo

## 2. Claude Elwood Shannon

The father of information theory.

- Born: Petoskey, Michigan
- Landmark paper: "A Mathematical Theory of Communication," Bell System Technical Journal, 1948
- Career: researcher at Bell Labs (1941-1972), MIT professor (1956-2001)
- Education: B.S., University of Michigan (EE and mathematics, 1936); M.S., MIT (1937), thesis "A Symbolic Analysis of Relay and Switching Circuits"; Ph.D., MIT (1940), thesis "An Algebra for Theoretical Genetics"
- Occupation: electrical engineer and mathematician
- Military service: worked as a cryptanalyst during World War II
- Died: Medford, Massachusetts, of Alzheimer's disease
- Family: Betty (wife), Andrew (son), Margarita (daughter)

## 3. Fundamentals of Information Theory

Three fundamental problems:

- Data compression
- Sampling (digitizing)
- Classification

Q: How do we measure information? Through the concept of entropy (contrast a low-entropy source with a high-entropy one).

The amount of information is measured in bits. Examples:

- Entropy of a fair coin: 1 bit
- Entropy of a biased coin with P(heads) = 0.9: -0.9 log2(0.9) - 0.1 log2(0.1) ≈ 0.47 bits
- Entropy of a Lotto (6/45) draw: log2 C(45, 6) = log2 8,145,060 ≈ 22.96 bits
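These example entropies can be checked with a short Python sketch (the `entropy` helper is our own name, not from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Fair coin: two equally likely outcomes.
fair = entropy([0.5, 0.5])        # 1.0 bit

# Biased coin with P(heads) = 0.9.
biased = entropy([0.9, 0.1])      # ≈ 0.469 bits

# Lotto 6/45: all C(45, 6) tickets equally likely.
n_tickets = math.comb(45, 6)      # 8,145,060
lotto = math.log2(n_tickets)      # ≈ 22.96 bits

print(fair, biased, lotto)
```

Note how skewing the coin toward heads drops the entropy well below 1 bit: the more predictable the outcome, the less information it carries.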

## 4. Fundamentals of Information Theory

"Here are 16 playing cards, four across and four down. Pick just one of them in your head." "Yes, I've picked one." "Is it in the top half?" "Yes." "Then is it in the right half of the top?" "No." "Then is it in the upper part of the left half?" "No." "Then is it in the right part of the lower half?" "Yes." "The card you picked is the three of clubs."

Three things matter here. First, the number of cards: there are 16, so with no information at all the probability of guessing correctly is 1/16. Second, the kind of answers allowed: here only two, yes and no. Third, the number of questions: here, four. Identifying one particular card among 16 requires four questions.

## 5. Fundamentals of Information Theory

2^4 = 16. Generalizing: W = 2^n, where W is the number of cards and n is the number of questions.

Identifying one card out of 16 takes four questions, and log2 16 = 4, so the amount of information needed to pick one specific card out of 16 is 4 bits. From log W = n log 2 we get n = log W / log 2 = log2 W.

Since the probability of picking one particular card out of 16 is 1/16, the same quantity can be expressed in terms of probability: n = -log2(1/16), and in general n = -log2 P.

For a set of s samples drawn from m classes, with si samples in class i, the expected information needed to classify a sample is

I(s1, s2, ..., sm) = - Σ_{i=1}^{m} (si/s) log2(si/s)
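The card-guessing arithmetic can be sketched in one helper (the function name is an assumption of ours):

```python
import math

def questions_needed(n_items):
    """Minimum number of yes/no questions to isolate one of n_items
    equally likely items (ceiling of log2 when n is not a power of two)."""
    return math.ceil(math.log2(n_items))

print(questions_needed(16))   # 4 questions for 16 cards
print(questions_needed(45))   # 6: 45 is not a power of two, so round up
```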

## 6. Fundamentals of Information Theory

For the two-class case, with class probabilities P1 and P2, the entropy is

E = - P1 log2 P1 - P2 log2 P2

The expected information after partitioning on an attribute A with v values is

E(A) = Σ_{j=1}^{v} ((s1j + ... + smj) / s) · I(s1j, ..., smj)

and the information gain of attribute A is

Gain(A) = I(s1, s2, ..., sm) - E(A)
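These formulas translate directly into Python; the following is a sketch with function names of our own choosing:

```python
import math

def info(class_counts):
    """I(s1, ..., sm): expected information to classify a sample,
    given the count of samples in each class."""
    s = sum(class_counts)
    return -sum((si / s) * math.log2(si / s) for si in class_counts if si > 0)

def expected_info(partitions):
    """E(A): weighted average of I over the subsets produced by splitting
    on attribute A. Each partition is a list of per-class counts."""
    s = sum(sum(p) for p in partitions)
    return sum((sum(p) / s) * info(p) for p in partitions)

def gain(class_counts, partitions):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return info(class_counts) - expected_info(partitions)

# Example: 9 positive / 5 negative samples, split by a three-valued
# attribute into subsets with class counts (2,3), (4,0), (3,2).
print(gain([9, 5], [[2, 3], [4, 0], [3, 2]]))  # ≈ 0.247
```

The pure subset (4, 0) contributes zero entropy to E(A), which is exactly why splits that isolate a single class produce high gain.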

## 7. Entropy

Let X be a discrete random variable with distribution p(x). Its entropy (or self-information) is

H(p) = H(X) = - Σ_{x ∈ X} p(x) log2 p(x)

Entropy measures the amount of information in a random variable: it is the average length of the message needed to transmit an outcome of that variable using the optimal code.

## 8. Joint Entropy

The joint entropy of two random variables X and Y is the amount of information needed on average to specify both their values:

H(X, Y) = - Σ_{x ∈ X} Σ_{y ∈ Y} p(x, y) log2 p(x, y)

## 9. Conditional Entropy

The conditional entropy of a random variable Y given another X expresses how much extra information one still needs to supply on average to communicate Y, given that the other party knows X. It is also called the equivocation (ambiguity) of X about Y.

H(Y | X) = Σ_{x ∈ X} p(x) H(Y | X = x)
         = - Σ_{x ∈ X} Σ_{y ∈ Y} p(x) p(y | x) log2 p(y | x)
         = - Σ_{x ∈ X} Σ_{y ∈ Y} p(x, y) log2 p(y | x)
         = E(-log2 p(Y | X))

## 10. Mutual Information

By the chain rule,

H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)

so

H(X) - H(X | Y) = H(Y) - H(Y | X) = I(X, Y)

I(X, Y) is the mutual information between X and Y. It is the reduction in uncertainty of one random variable due to knowing the other, or the amount of information one random variable contains about the other.

## 11. Mutual Information (cont.)

I(X, Y) = H(X) - H(X | Y) = H(Y) - H(Y | X)

I is 0 only when X and Y are independent, in which case H(X | Y) = H(X).

Since H(X | X) = 0, we have H(X) = H(X) - H(X | X) = I(X, X): entropy is the self-information, the mutual information of a variable with itself.
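The identities on these slides are easy to check numerically. A sketch, using a small joint distribution invented purely for illustration:

```python
import math

# A made-up joint distribution p(x, y), indexed as p_xy[x][y].
p_xy = [[0.25, 0.25],
        [0.00, 0.50]]

def H(probs):
    """Entropy in bits of a list of probabilities (zero terms skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [sum(row) for row in p_xy]            # marginal of X
p_y = [sum(col) for col in zip(*p_xy)]      # marginal of Y
H_xy = H([p for row in p_xy for p in row])  # joint entropy H(X, Y)

# Chain rule: H(Y|X) = H(X, Y) - H(X)
H_y_given_x = H_xy - H(p_x)

# Mutual information: I(X, Y) = H(X) + H(Y) - H(X, Y)
I_xy = H(p_x) + H(p_y) - H_xy

# Check the slide's identity I(X, Y) = H(Y) - H(Y|X).
assert abs(I_xy - (H(p_y) - H_y_given_x)) < 1e-12

print(H(p_x), H(p_y), H_xy, I_xy)
```

For this table I(X, Y) ≈ 0.31 bits: knowing X reduces the uncertainty about Y, since the two variables are clearly dependent.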

## 12. ID3 algorithm

ID3 is a nonincremental algorithm, meaning it derives its classes from a fixed set of training instances. The classes created by ID3 are inductive: given a small set of training instances, the specific classes created by ID3 are expected to work for all future instances, provided the distribution of the unseen instances is the same as that of the training cases. Inductive classes cannot be proven to work in every case, since they may classify an infinite number of instances. Note that ID3 (or any inductive algorithm) may misclassify data.

## 13. ID3 algorithm

Data description. The sample data used by ID3 has certain requirements:

- Attribute-value description: the same attributes must describe each example, and each attribute must have a fixed number of values.
- Predefined classes: an example's class must already be assigned; classes are not learned by ID3.
- Discrete classes: classes must be sharply delineated. Continuous values broken up into vague categories, such as a metal being "hard, quite hard, flexible, soft, quite soft," are suspect.

## 14. ID3 algorithm

Attribute selection. Information gain is used: the attribute with the highest gain (the information most useful for classification) is selected. Entropy measures the amount of information in an attribute. Given a collection S of examples drawn from c classes,

Entropy(S) = - Σ_{i=1}^{c} p(i) log2 p(i)

where p(i) is the proportion of S belonging to class i and the sum runs over the c classes. Note that S is not an attribute but the entire sample set.

## 15. ID3 algorithm - Example

Should we play baseball? Over the course of two weeks, data is collected to help ID3 build a decision tree (see the following table). The weather attributes are outlook, temperature, humidity, and wind speed. They can take the following values:

- outlook = {sunny, overcast, rain}
- temperature = {hot, mild, cool}
- humidity = {high, normal}
- wind = {weak, strong}

## 16. ID3 algorithm - Example

(Slide shows the 14-day training data table.)

## 17. ID3 algorithm - Example

We need to find which attribute will be the root node of our decision tree. The gain is calculated for all four attributes: Gain(S, Outlook), Gain(S, Temperature), Gain(S, Humidity), and Gain(S, Wind). The Outlook attribute has the highest gain, so it is used as the decision attribute in the root node.

## 18. ID3 algorithm - Example

Since Outlook has three possible values, the root node has three branches (sunny, overcast, rain). The next question is: which attribute should be tested at the sunny branch node? The sunny subset is S_sunny = {D1, D2, D8, D9, D11}, with |S_sunny| = 5. Comparing Gain(S_sunny, Humidity), Gain(S_sunny, Temperature), and Gain(S_sunny, Wind), Humidity has the highest gain, so it is used as the decision node there.
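The gain comparisons above can be reproduced in code. The table itself did not survive the transcription, so the sketch below assumes the classic 14-day play-tennis data from Mitchell's *Machine Learning* (ch. 3), which this example's day labels (D1, D2, D8, D9, D11) appear to follow; the function names are ours:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, labels):
    """Information gain of splitting rows on attribute index attr."""
    subsets = defaultdict(list)
    for row, label in zip(rows, labels):
        subsets[row[attr]].append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Assumed training data: (outlook, temperature, humidity, wind) -> play?
data = [
    (("sunny", "hot", "high", "weak"), "no"),
    (("sunny", "hot", "high", "strong"), "no"),
    (("overcast", "hot", "high", "weak"), "yes"),
    (("rain", "mild", "high", "weak"), "yes"),
    (("rain", "cool", "normal", "weak"), "yes"),
    (("rain", "cool", "normal", "strong"), "no"),
    (("overcast", "cool", "normal", "strong"), "yes"),
    (("sunny", "mild", "high", "weak"), "no"),
    (("sunny", "cool", "normal", "weak"), "yes"),
    (("rain", "mild", "normal", "weak"), "yes"),
    (("sunny", "mild", "normal", "strong"), "yes"),
    (("overcast", "mild", "high", "strong"), "yes"),
    (("overcast", "hot", "normal", "weak"), "yes"),
    (("rain", "mild", "high", "strong"), "no"),
]
rows = [r for r, _ in data]
labels = [l for _, l in data]

for i, name in enumerate(["outlook", "temperature", "humidity", "wind"]):
    print(name, round(gain(rows, i, labels), 3))
# Outlook comes out highest (≈ 0.247), so it becomes the root.
```

Under this assumed table, the gains rank outlook > humidity > wind > temperature, matching the slide's choice of Outlook for the root.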

## 19. ID3 algorithm - Example

(Slide shows the resulting decision tree.)

## 20. ID3 algorithm - Example

The decision tree can also be expressed in rule format:

- IF outlook = sunny AND humidity = high THEN playball = no
- IF outlook = sunny AND humidity = normal THEN playball = yes
- IF outlook = overcast THEN playball = yes
- IF outlook = rain AND wind = strong THEN playball = no
- IF outlook = rain AND wind = weak THEN playball = yes
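In code, a tree like this (outlook at the root, humidity under sunny, wind under rain) is just nested conditionals; a sketch, with a function name of our own:

```python
def playball(outlook, humidity, wind):
    """Classify a day using the tree: outlook at the root,
    humidity under the sunny branch, wind under the rain branch."""
    if outlook == "overcast":
        return "yes"
    if outlook == "sunny":
        return "yes" if humidity == "normal" else "no"
    if outlook == "rain":
        return "yes" if wind == "weak" else "no"
    raise ValueError(f"unknown outlook: {outlook}")

print(playball("sunny", "high", "weak"))       # no
print(playball("overcast", "high", "strong"))  # yes
```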

## 21. Next presentation

(Architecture diagram: a four-layer OLAM architecture. Layer 4: user interface (User GUI API); Layer 3: OLAP and OLAM engines over a Data Cube API; Layer 2: MDDB with metadata, fed by filtering and integration; Layer 1: data repository, with databases and a data warehouse reached through a Database API via data cleaning, data integration, and filtering.)

## 22. Next presentation

(Table: sample student records with attributes Name, Gender, Major, Birth-Place, Birth_date, Residence, Phone #, and GPA, for students Jim Woodman, Scott Lachance, and Laura Lee, together with the generalization applied to each attribute: Name removed, Gender retained, Major generalized to {Sci, Eng, Bus}, Birth-Place to country, Birth_date to age range, Residence to city, Phone # removed, GPA to {Excl, VG, ...}.)
