Information Entropy: Illustrating Example. Etymology of Entropy. Definitions. Shannon Entropy. Entropy = randomness. Amount of uncertainty. 3/3/2008

Information Entropy: Illustrating Example

Andrew Kusiak
2139 Seamans Center
Iowa City, Iowa 52242-1527
andrew-kusiak@uiowa.edu
http://www.icaen.uiowa.edu/~ankusiak
Tel: 319-335-5934
Fax: 319-335-5669

Etymology of Entropy
Entropy = randomness
Amount of uncertainty

Shannon Entropy
Let S be the final probability space composed of two disjoint events E1 and E2 with probabilities p1 = p and p2 = 1 - p, respectively. The Shannon entropy is defined as
H(S) = H(p1, p2) = -p log p - (1 - p) log(1 - p)

Definitions
Information content:
I(s1, s2, ..., sm) = -sum over i of (si/s) log2(si/s)
Entropy (expected information after partitioning on attribute A):
E(A) = sum over values j of A of [(s1j + ... + smj)/s] * I(s1j, ..., smj)
Information gain:
Gain(A) = I(s1, s2, ..., sm) - E(A)
(si = number of examples in class i; s = total number of examples; sij = number of class-i examples taking the j-th value of A)
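The two definitions above translate directly into a few lines of Python. The sketch below is a minimal illustration, not part of the original slides; the function names are assumptions, logarithms are base 2, and empty classes contribute zero.

import math

def shannon_entropy(p):
    # H(p) = -p*log2(p) - (1-p)*log2(1-p); a certain outcome (p = 0 or 1) has zero entropy
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_content(counts):
    # I(s1, ..., sm) = -sum (si/s) * log2(si/s), skipping empty classes
    s = sum(counts)
    return -sum((si / s) * math.log2(si / s) for si in counts if si > 0)

print(shannon_entropy(0.5))          # 1.0 -- maximum uncertainty for two events
print(information_content([4, 4]))   # 1.0 -- the same distribution expressed as class counts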

Case 1

No.  F     D
1    Blue  1
2    Blue  1
3    Blue  1
4    Blue  1
5    Red   2
6    Red   2
7    Red   2
8    Red   2

D1 = number of examples in class 1, D2 = number of examples in class 2.
I(D1, D2) = -4/8 log2(4/8) - 4/8 log2(4/8) = 1
For Blue: D11 = 4, D21 = 0; I(D11, D21) = -4/4 log2(4/4) = 0
For Red: D12 = 0, D22 = 4; I(D12, D22) = -4/4 log2(4/4) = 0
E(F) = 4/8 I(D11, D21) + 4/8 I(D12, D22) = 0
Gain(F) = I(D1, D2) - E(F) = 1

Case 2

No.  F     D
1    Blue  1
2    Blue  1
3    Blue  2
4    Blue  2
5    Red   2
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -2/8 log2(2/8) - 3/8 log2(3/8) - 3/8 log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 2, D31 = 0; I(D11, D21) = -2/4 log2(2/4) - 2/4 log2(2/4) = 1
For Red: D12 = 0, D22 = 1, D32 = 3; I(D22, D32) = -1/4 log2(1/4) - 3/4 log2(3/4) = 0.81
E(F) = 4/8 I(D11, D21) + 4/8 I(D22, D32) = 0.905
Gain(F) = I(D1, D2, D3) - E(F) = 0.655

Case 3

No.  F     D
1    Blue  1
2    Blue  2
3    Blue  2
4    Blue  2
5    Red   3
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -1/8 log2(1/8) - 3/8 log2(3/8) - 4/8 log2(4/8) = 1.41
For Blue: D11 = 1, D21 = 3, D31 = 0; I(D11, D21) = -1/4 log2(1/4) - 3/4 log2(3/4) = 0.81
For Red: D12 = 0, D22 = 0, D32 = 4; I(D32) = -4/4 log2(4/4) = 0
E(F) = 4/8 I(D11, D21) + 4/8 I(D32) = 0.41
Gain(F) = I(D1, D2, D3) - E(F) = 1

Case 4

No.  F      D
1    Blue   1
2    Blue   1
3    Green  3
4    Green  3
5    Green  3
6    Red    2
7    Red    2
8    Red    2

I(D1, D2, D3) = -2/8 log2(2/8) - 3/8 log2(3/8) - 3/8 log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 0, D31 = 0; I(D11) = -2/2 log2(2/2) = 0
For Red: D12 = 0, D22 = 3, D32 = 0; I(D22) = -3/3 log2(3/3) = 0
For Green: D13 = 0, D23 = 0, D33 = 3; I(D33) = -3/3 log2(3/3) = 0
E(F) = 2/8 I(D11) + 3/8 I(D22) + 3/8 I(D33) = 0
Gain(F) = I(D1, D2, D3) - E(F) = 1.56
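The same arithmetic can be checked with a short script. The sketch below is illustrative (names not from the slides): it recomputes Gain(F) for Case 2 from (feature value, class) pairs; the other cases work the same way.

import math
from collections import Counter

def info(counts):
    # I(s1, ..., sm): entropy of a class-count distribution, log base 2
    s = sum(counts)
    return -sum((si / s) * math.log2(si / s) for si in counts if si > 0)

def gain(examples):
    # Gain(F) = I(D1, ..., Dm) - E(F) for a list of (feature value, class) pairs
    n = len(examples)
    i_total = info(list(Counter(d for _, d in examples).values()))
    e_f = 0.0
    for v in set(f for f, _ in examples):
        subset = [d for f, d in examples if f == v]
        e_f += len(subset) / n * info(list(Counter(subset).values()))
    return i_total - e_f

# Case 2 from the slides
case2 = [("Blue", 1), ("Blue", 1), ("Blue", 2), ("Blue", 2),
         ("Red", 2), ("Red", 3), ("Red", 3), ("Red", 3)]
print(round(gain(case2), 3))   # 0.656; the slides round the intermediate values: 1.56 - 0.905 = 0.655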

Case 5

No.  F      D
1    Blue   1
2    Green  1
3    Green  3
4    Green  3
5    Green  4
6    Green  4
7    Red    2
8    Red    2

I(D1, D2, D3, D4) = -2/8 log2(2/8) - 2/8 log2(2/8) - 2/8 log2(2/8) - 2/8 log2(2/8) = 2
For Blue: D11 = 1, D21 = 0, D31 = 0, D41 = 0; I(D11) = -1/1 log2(1/1) = 0
For Red: D12 = 0, D22 = 2, D32 = 0, D42 = 0; I(D22) = -2/2 log2(2/2) = 0
For Green: D13 = 1, D23 = 0, D33 = 2, D43 = 2; I(D13, D33, D43) = -1/5 log2(1/5) - 2/5 log2(2/5) - 2/5 log2(2/5) = 1.52
E(F) = 1/8 I(D11) + 2/8 I(D22) + 5/8 I(D13, D33, D43) = 0.95
Gain(F) = I(D1, D2, D3, D4) - E(F) = 1.05

Summary (Cases 1-5, tables as above)

Case  E(F)   Gain(F)
1     0      1
2     0.905  0.655
3     0.41   1
4     0      1.56
5     0.95   1.05

The higher the information gain, the more relevant the observed feature is to the decision. Equivalently, the lower the expected entropy E(F), the more relevant the feature is to the decision.

Play Tennis: Training Data Set

Outlook   Temperature  Humidity  Wind    Play tennis
sunny     hot          high      weak    no
sunny     hot          high      strong  no
overcast  hot          high      weak    yes
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
overcast  cool         normal    strong  yes
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
rain      mild         normal    weak    yes
sunny     mild         normal    strong  yes
overcast  mild         high      strong  yes
overcast  hot          normal    weak    yes
rain      mild         high      strong  no

Outlook, Temperature, Humidity, and Wind are the features (attributes); their entries (sunny, hot, ...) are the feature values; Play tennis is the decision.
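Before splitting on any feature, the class distribution of this training set can be checked programmatically. The short sketch below is illustrative, not from the slides; it confirms S = [9+, 5-] and computes the entropy used on the next slide.

from collections import Counter
import math

# Play tennis decisions for the 14 rows of the training set above
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

counts = Counter(labels)
print(counts)        # Counter({'yes': 9, 'no': 5})  ->  S = [9+, 5-]

h = -sum(c / len(labels) * math.log2(c / len(labels)) for c in counts.values())
print(round(h, 3))   # 0.94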

Entropy: A Measure of Homogeneity

Entropy of a set S of N objects:
H(S) = -p+ log2(p+) - p- log2(p-)
p+ = n+/N, p- = n-/N

Entropy: A Measure of Homogeneity

Given a set S of 14 examples with 9 positive and 5 negative examples, S = [9+, 5-], the entropy is
H(S) = -p+ log2(p+) - p- log2(p-) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.940

Which Feature to Select? Information Gain

Used in C4.5. The expected reduction in entropy caused by the use of feature A:
Gain(S, A) = H(S) - sum over v in Values(A) of [card(Sv)/card(S)] H(Sv)
Sv - the subset of S for which A assumes value v

Which Feature to Select?

Feature wind; values(wind) = {weak, strong}
S = [9+, 5-], S_weak = [6+, 2-], S_strong = [3+, 3-]
Gain(S, wind) = H(S) - 8/14 H(S_weak) - 6/14 H(S_strong) = 0.940 - 8/14 * 0.811 - 6/14 * 1.0 = 0.048
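The Gain(S, wind) arithmetic is easy to verify. A minimal sketch, assuming log base 2; the helper name h is an assumption, not from the slides.

import math

def h(pos, neg):
    # H(S) for a set with pos positive and neg negative examples
    total = pos + neg
    out = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            out -= p * math.log2(p)
    return out

# S = [9+, 5-], S_weak = [6+, 2-], S_strong = [3+, 3-]
gain_wind = h(9, 5) - 8/14 * h(6, 2) - 6/14 * h(3, 3)
print(round(gain_wind, 3))   # 0.048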

Feature Selection / Constructing the Decision Tree

For the Play Tennis training data set (as above):
feature wind: Gain(S, wind) = 0.048
feature outlook: Gain(S, outlook) = 0.246
feature humidity: Gain(S, humidity) = 0.151
feature temperature: Gain(S, temperature) = 0.029

Outlook has the highest gain, so it becomes the root of the tree. The Overcast branch contains only "yes" examples and becomes a Yes leaf; the Sunny and Rain branches are split further on the corresponding subsets of the training data.

Complete Decision Tree

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

From Decision Tree to Rules

If Outlook = Overcast
OR (Outlook = Sunny AND Humidity = Normal)
OR (Outlook = Rain AND Wind = Weak)
THEN Play tennis
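The whole construction collapses into a short recursive procedure in the spirit of ID3. The sketch below is a simplified illustration, not Quinlan's original implementation; it greedily picks the highest-gain feature, recurses on each value, and reproduces the tree above.

import math
from collections import Counter

# Play Tennis training set from the slides: (Outlook, Temperature, Humidity, Wind, Play tennis)
DATA = [
    ("sunny", "hot", "high", "weak", "no"),          ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),      ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),       ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"), ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),      ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),    ("rain", "mild", "high", "strong", "no"),
]
FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    # H(S) over the decision labels (last column)
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, i):
    # Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v) for feature index i
    remainder = 0.0
    for v in set(r[i] for r in rows):
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, features):
    labels = set(r[-1] for r in rows)
    if len(labels) == 1:                               # pure node -> leaf
        return labels.pop()
    if not features:                                   # no features left -> majority-class leaf
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(features, key=lambda f: gain(rows, FEATURES.index(f)))
    i = FEATURES.index(best)
    rest = [f for f in features if f != best]
    return {best: {v: id3([r for r in rows if r[i] == v], rest)
                   for v in set(r[i] for r in rows)}}

print(id3(DATA, FEATURES))
# {'Outlook': {'overcast': 'yes',
#              'sunny': {'Humidity': {'high': 'no', 'normal': 'yes'}},
#              'rain': {'Wind': {'weak': 'yes', 'strong': 'no'}}}}   (branch order may vary)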

Decision Tree: Key Characteristics

Complete space of finite discrete-valued functions
Maintains a single hypothesis
No backtracking in the search
All training examples used at each step

Avoiding Overfitting the Data

Accuracy on the training data set vs. the testing data set
Size of the tree

References

J. R. Quinlan, Induction of decision trees, Machine Learning, 1, 1986, 81-106.