Information Entropy: Illustrating Example. Etymology of Entropy. Definitions. Shannon Entropy. Entropy = randomness. Amount of uncertainty. 3/3/2008

Information Entropy: Illustrating Example

Andrew Kusiak
2139 Seamans Center
Iowa City, Iowa 52242-1527
andrew-kusiak@uiowa.edu
http://www.icaen.uiowa.edu/~ankusiak
Tel: 319-335-5934
Fax: 319-335-5669

Etymology of Entropy
Entropy = randomness
Amount of uncertainty

Shannon Entropy
Let S be the final probability space composed of two disjoint events E1 and E2 with probabilities p1 = p and p2 = 1 - p, respectively. The Shannon entropy is defined as
H(S) = H(p1, p2) = -p log p - (1 - p) log(1 - p)

Definitions
Information content:
I(s1, s2, ..., sm) = -sum over i of (si/s) log2(si/s)
Entropy (expected information after partitioning on attribute A):
E(A) = sum over values j of A of [(s1j + ... + smj)/s] * I(s1j, ..., smj)
Information gain:
Gain(A) = I(s1, s2, ..., sm) - E(A)
(si = number of examples in class i; s = total number of examples; sij = number of class-i examples taking the j-th value of A)
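The two definitions above translate directly into a few lines of Python. The sketch below is a minimal illustration, not part of the original slides; the function names are assumptions, logarithms are base 2, and empty classes contribute zero.

import math

def shannon_entropy(p):
    # H(p) = -p*log2(p) - (1-p)*log2(1-p); a certain outcome (p = 0 or 1) has zero entropy
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def information_content(counts):
    # I(s1, ..., sm) = -sum (si/s) * log2(si/s), skipping empty classes
    s = sum(counts)
    return -sum((si / s) * math.log2(si / s) for si in counts if si > 0)

print(shannon_entropy(0.5))          # 1.0 -- maximum uncertainty for two events
print(information_content([4, 4]))   # 1.0 -- the same distribution expressed as class counts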

Case 1

No.  F     D
1    Blue  1
2    Blue  1
3    Blue  1
4    Blue  1
5    Red   2
6    Red   2
7    Red   2
8    Red   2

D1 = number of examples in class 1, D2 = number of examples in class 2.
I(D1, D2) = -4/8 log2(4/8) - 4/8 log2(4/8) = 1
For Blue: D11 = 4, D21 = 0; I(D11, D21) = -4/4 log2(4/4) = 0
For Red: D12 = 0, D22 = 4; I(D12, D22) = -4/4 log2(4/4) = 0
E(F) = 4/8 I(D11, D21) + 4/8 I(D12, D22) = 0
Gain(F) = I(D1, D2) - E(F) = 1

Case 2

No.  F     D
1    Blue  1
2    Blue  1
3    Blue  2
4    Blue  2
5    Red   2
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -2/8 log2(2/8) - 3/8 log2(3/8) - 3/8 log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 2, D31 = 0; I(D11, D21) = -2/4 log2(2/4) - 2/4 log2(2/4) = 1
For Red: D12 = 0, D22 = 1, D32 = 3; I(D22, D32) = -1/4 log2(1/4) - 3/4 log2(3/4) = 0.81
E(F) = 4/8 I(D11, D21) + 4/8 I(D22, D32) = 0.905
Gain(F) = I(D1, D2, D3) - E(F) = 0.655

Case 3

No.  F     D
1    Blue  1
2    Blue  2
3    Blue  2
4    Blue  2
5    Red   3
6    Red   3
7    Red   3
8    Red   3

I(D1, D2, D3) = -1/8 log2(1/8) - 3/8 log2(3/8) - 4/8 log2(4/8) = 1.41
For Blue: D11 = 1, D21 = 3, D31 = 0; I(D11, D21) = -1/4 log2(1/4) - 3/4 log2(3/4) = 0.81
For Red: D12 = 0, D22 = 0, D32 = 4; I(D32) = -4/4 log2(4/4) = 0
E(F) = 4/8 I(D11, D21) + 4/8 I(D32) = 0.41
Gain(F) = I(D1, D2, D3) - E(F) = 1

Case 4

No.  F      D
1    Blue   1
2    Blue   1
3    Green  3
4    Green  3
5    Green  3
6    Red    2
7    Red    2
8    Red    2

I(D1, D2, D3) = -2/8 log2(2/8) - 3/8 log2(3/8) - 3/8 log2(3/8) = 1.56
For Blue: D11 = 2, D21 = 0, D31 = 0; I(D11) = -2/2 log2(2/2) = 0
For Red: D12 = 0, D22 = 3, D32 = 0; I(D22) = -3/3 log2(3/3) = 0
For Green: D13 = 0, D23 = 0, D33 = 3; I(D33) = -3/3 log2(3/3) = 0
E(F) = 2/8 I(D11) + 3/8 I(D22) + 3/8 I(D33) = 0
Gain(F) = I(D1, D2, D3) - E(F) = 1.56
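The same arithmetic can be checked with a short script. The sketch below is illustrative (names not from the slides): it recomputes Gain(F) for Case 2 from (feature value, class) pairs; the other cases work the same way.

import math
from collections import Counter

def info(counts):
    # I(s1, ..., sm): entropy of a class-count distribution, log base 2
    s = sum(counts)
    return -sum((si / s) * math.log2(si / s) for si in counts if si > 0)

def gain(examples):
    # Gain(F) = I(D1, ..., Dm) - E(F) for a list of (feature value, class) pairs
    n = len(examples)
    i_total = info(list(Counter(d for _, d in examples).values()))
    e_f = 0.0
    for v in set(f for f, _ in examples):
        subset = [d for f, d in examples if f == v]
        e_f += len(subset) / n * info(list(Counter(subset).values()))
    return i_total - e_f

# Case 2 from the slides
case2 = [("Blue", 1), ("Blue", 1), ("Blue", 2), ("Blue", 2),
         ("Red", 2), ("Red", 3), ("Red", 3), ("Red", 3)]
print(round(gain(case2), 3))   # 0.656; the slides round the intermediate values: 1.56 - 0.905 = 0.655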

Case 5

No.  F      D
1    Blue   1
2    Green  1
3    Green  3
4    Green  3
5    Green  4
6    Green  4
7    Red    2
8    Red    2

I(D1, D2, D3, D4) = -2/8 log2(2/8) - 2/8 log2(2/8) - 2/8 log2(2/8) - 2/8 log2(2/8) = 2
For Blue: D11 = 1, D21 = 0, D31 = 0, D41 = 0; I(D11) = -1/1 log2(1/1) = 0
For Red: D12 = 0, D22 = 2, D32 = 0, D42 = 0; I(D22) = -2/2 log2(2/2) = 0
For Green: D13 = 1, D23 = 0, D33 = 2, D43 = 2; I(D13, D33, D43) = -1/5 log2(1/5) - 2/5 log2(2/5) - 2/5 log2(2/5) = 1.52
E(F) = 1/8 I(D11) + 2/8 I(D22) + 5/8 I(D13, D33, D43) = 0.95
Gain(F) = I(D1, D2, D3, D4) - E(F) = 1.05

Summary (Cases 1-5, tables as above)

Case  E(F)   Gain(F)
1     0      1
2     0.905  0.655
3     0.41   1
4     0      1.56
5     0.95   1.05

The higher the information gain, the more relevant the observed feature is to the decision. Equivalently, the lower the expected entropy E(F), the more relevant the feature is to the decision.

Play Tennis: Training Data Set

Outlook   Temperature  Humidity  Wind    Play tennis
sunny     hot          high      weak    no
sunny     hot          high      strong  no
overcast  hot          high      weak    yes
rain      mild         high      weak    yes
rain      cool         normal    weak    yes
rain      cool         normal    strong  no
overcast  cool         normal    strong  yes
sunny     mild         high      weak    no
sunny     cool         normal    weak    yes
rain      mild         normal    weak    yes
sunny     mild         normal    strong  yes
overcast  mild         high      strong  yes
overcast  hot          normal    weak    yes
rain      mild         high      strong  no

Outlook, Temperature, Humidity, and Wind are the features (attributes); their entries (sunny, hot, ...) are the feature values; Play tennis is the decision.
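Before splitting on any feature, the class distribution of this training set can be checked programmatically. The short sketch below is illustrative, not from the slides; it confirms S = [9+, 5-] and computes the entropy used on the next slide.

from collections import Counter
import math

# Play tennis decisions for the 14 rows of the training set above
labels = ["no", "no", "yes", "yes", "yes", "no", "yes",
          "no", "yes", "yes", "yes", "yes", "yes", "no"]

counts = Counter(labels)
print(counts)        # Counter({'yes': 9, 'no': 5})  ->  S = [9+, 5-]

h = -sum(c / len(labels) * math.log2(c / len(labels)) for c in counts.values())
print(round(h, 3))   # 0.94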

Entropy: A Measure of Homogeneity

Entropy of a set S of N objects:
H(S) = -p+ log2(p+) - p- log2(p-)
p+ = n+/N, p- = n-/N

Entropy: A Measure of Homogeneity

Given a set S of 14 examples with 9 positive and 5 negative examples, S = [9+, 5-], the entropy is
H(S) = -p+ log2(p+) - p- log2(p-) = -9/14 log2(9/14) - 5/14 log2(5/14) = 0.940

Which Feature to Select? Information Gain

Used in C4.5. The expected reduction in entropy caused by the use of feature A:
Gain(S, A) = H(S) - sum over v in Values(A) of [card(Sv)/card(S)] H(Sv)
Sv - the subset of S for which A assumes value v

Which Feature to Select?

Feature wind; values(wind) = {weak, strong}
S = [9+, 5-], S_weak = [6+, 2-], S_strong = [3+, 3-]
Gain(S, wind) = H(S) - 8/14 H(S_weak) - 6/14 H(S_strong) = 0.940 - 8/14 * 0.811 - 6/14 * 1.0 = 0.048
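The Gain(S, wind) arithmetic is easy to verify. A minimal sketch, assuming log base 2; the helper name h is an assumption, not from the slides.

import math

def h(pos, neg):
    # H(S) for a set with pos positive and neg negative examples
    total = pos + neg
    out = 0.0
    for n in (pos, neg):
        if n:
            p = n / total
            out -= p * math.log2(p)
    return out

# S = [9+, 5-], S_weak = [6+, 2-], S_strong = [3+, 3-]
gain_wind = h(9, 5) - 8/14 * h(6, 2) - 6/14 * h(3, 3)
print(round(gain_wind, 3))   # 0.048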

Feature Selection / Constructing the Decision Tree

For the Play Tennis training data set (as above):
feature wind: Gain(S, wind) = 0.048
feature outlook: Gain(S, outlook) = 0.246
feature humidity: Gain(S, humidity) = 0.151
feature temperature: Gain(S, temperature) = 0.029

Outlook has the highest gain, so it becomes the root of the tree. The Overcast branch contains only "yes" examples and becomes a Yes leaf; the Sunny and Rain branches are split further on the corresponding subsets of the training data.

Complete Decision Tree

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

From Decision Tree to Rules

If Outlook = Overcast
OR (Outlook = Sunny AND Humidity = Normal)
OR (Outlook = Rain AND Wind = Weak)
THEN Play tennis
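The whole construction collapses into a short recursive procedure in the spirit of ID3. The sketch below is a simplified illustration, not Quinlan's original implementation; it greedily picks the highest-gain feature, recurses on each value, and reproduces the tree above.

import math
from collections import Counter

# Play Tennis training set from the slides: (Outlook, Temperature, Humidity, Wind, Play tennis)
DATA = [
    ("sunny", "hot", "high", "weak", "no"),          ("sunny", "hot", "high", "strong", "no"),
    ("overcast", "hot", "high", "weak", "yes"),      ("rain", "mild", "high", "weak", "yes"),
    ("rain", "cool", "normal", "weak", "yes"),       ("rain", "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"), ("sunny", "mild", "high", "weak", "no"),
    ("sunny", "cool", "normal", "weak", "yes"),      ("rain", "mild", "normal", "weak", "yes"),
    ("sunny", "mild", "normal", "strong", "yes"),    ("overcast", "mild", "high", "strong", "yes"),
    ("overcast", "hot", "normal", "weak", "yes"),    ("rain", "mild", "high", "strong", "no"),
]
FEATURES = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(rows):
    # H(S) over the decision labels (last column)
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, i):
    # Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v) for feature index i
    remainder = 0.0
    for v in set(r[i] for r in rows):
        subset = [r for r in rows if r[i] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

def id3(rows, features):
    labels = set(r[-1] for r in rows)
    if len(labels) == 1:                               # pure node -> leaf
        return labels.pop()
    if not features:                                   # no features left -> majority-class leaf
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(features, key=lambda f: gain(rows, FEATURES.index(f)))
    i = FEATURES.index(best)
    rest = [f for f in features if f != best]
    return {best: {v: id3([r for r in rows if r[i] == v], rest)
                   for v in set(r[i] for r in rows)}}

print(id3(DATA, FEATURES))
# {'Outlook': {'overcast': 'yes',
#              'sunny': {'Humidity': {'high': 'no', 'normal': 'yes'}},
#              'rain': {'Wind': {'weak': 'yes', 'strong': 'no'}}}}   (branch order may vary)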

Decision Tree: Key Characteristics

Complete space of finite discrete-valued functions
Maintains a single hypothesis
No backtracking in the search
All training examples used at each step

Avoiding Overfitting the Data

Accuracy on the training data set vs. the testing data set
Size of the tree

References

J. R. Quinlan, Induction of decision trees, Machine Learning, 1, 1986, 81-106.