Data Mining CS 341, Spring 2007
Lecture 8: Decision tree algorithms
Prentice Hall

Jackknife Estimator: Example 1
Estimate of mean for X = {x1, x2, x3}, n = 3, g = 3, m = 1, θ = µ = (x1 + x2 + x3)/3
θ1 = (x2 + x3)/2, θ2 = (x1 + x3)/2, θ3 = (x1 + x2)/2
θ̄ = (θ1 + θ2 + θ3)/3
θQ = gθ - (g-1)θ̄ = 3θ - (3-1)θ̄ = (x1 + x2 + x3)/3
In this case, the Jackknife estimator is the same as the usual estimator.

Jackknife Estimator: Example 2
Estimate of variance for X = {1, 4, 4}, n = 3, g = 3, m = 1, θ = σ²
σ² = ((1-3)² + (4-3)² + (4-3)²)/3 = 2
θ1 = ((4-4)² + (4-4)²)/2 = 0, θ2 = 2.25, θ3 = 2.25
θ̄ = (θ1 + θ2 + θ3)/3 = 1.5
θQ = gθ - (g-1)θ̄ = 3(2) - 2(1.5) = 3
In this case, the Jackknife estimator is different from the usual estimator.

Jackknife Estimator: Example 2 (cont'd)
In general, applying the Jackknife technique to the biased estimator σ²,
σ² = Σ (xi - x̄)² / n,
yields the jackknife estimator s²,
s² = Σ (xi - x̄)² / (n - 1),
which is known to be unbiased for σ².

Review: Distance-based Algorithms
Place items in the class to which they are closest.
- Similarity measures or distance measures
- Simple approach
- K Nearest Neighbors
- Decision Trees: issues, pros and cons

Classification Using Decision Trees
Partitioning based: divide the search space into rectangular regions.
A tuple is placed into a class based on the region within which it falls.
DT approaches differ in how the tree is built: DT Induction.
Internal nodes are associated with attributes, and arcs with values for that attribute.
Algorithms: ID3, C4.5, CART
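The two worked examples above follow the same recipe: compute the estimate on all n points, average the g leave-one-out estimates, then combine them. A minimal Python sketch (not from the lecture; the function names `jackknife` and `biased_var` are illustrative):

```python
def jackknife(estimator, xs):
    """Quenouille's jackknife: theta_Q = g*theta - (g-1)*theta_bar, where
    theta_bar averages the g leave-one-out estimates (here g = n, m = 1)."""
    g = len(xs)
    theta = estimator(xs)
    loo = [estimator(xs[:i] + xs[i + 1:]) for i in range(g)]  # drop one point at a time
    theta_bar = sum(loo) / g
    return g * theta - (g - 1) * theta_bar

def biased_var(xs):
    """The biased variance estimator: divide by n rather than n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

print(biased_var([1, 4, 4]))             # 2.0, the biased estimate from Example 2
print(jackknife(biased_var, [1, 4, 4]))  # 3.0, the unbiased s^2 = 6/(3-1)
```

Running the same routine with the mean as the estimator reproduces Example 1: the jackknife estimate equals the ordinary sample mean.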
Decision Tree
Given: D = {t1, ..., tn} where ti = <ti1, ..., tih>
Database schema contains {A1, A2, ..., Ah}
Classes C = {C1, ..., Cm}
A Decision or Classification Tree is a tree associated with D such that:
- Each internal node is labeled with an attribute, Ai
- Each arc is labeled with a predicate which can be applied to the attribute at its parent
- Each leaf node is labeled with a class, Cj

DT Induction

Information
Decision Tree Induction is often based on Information Theory.

DT Induction
When all the marbles in the bowl are mixed up, little information is given.
When the marbles in the bowl are all from one class and those in the other two classes are on either side, more information is given.
Use this approach with DT Induction!

Information/Entropy
Given probabilities p1, p2, ..., ps whose sum is 1, Entropy is defined as:
H(p1, p2, ..., ps) = Σ pi log(1/pi)
Entropy measures the amount of randomness, surprise, or uncertainty.
Its value is between 0 and 1, and it reaches the maximum when all the probabilities are the same.
Goal in classification: no surprise, entropy = 0.
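The entropy definition above translates directly into a small function. Note that the worked numbers later in this lecture (e.g. 0.4384 for the starting state) come out of base-10 logarithms, so that base is used here (a sketch; the function name is illustrative):

```python
import math

def entropy(probs):
    """H = sum of p * log10(1/p); a term with p = 0 contributes nothing."""
    return sum(p * math.log10(1 / p) for p in probs if p > 0)

print(entropy([1.0]))                         # 0.0: a single class, no surprise
print(round(entropy([4/15, 8/15, 3/15]), 4))  # ~0.4385 (quoted as 0.4384 in the slides)
```

With two equally likely classes the value is log10(2) ≈ 0.301, the maximum for s = 2, matching the claim that entropy peaks when all probabilities are equal.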
Entropy
[Figure: plots of log(1/p) and H(p, 1-p).]

ID3
Creates a tree using information theory concepts and tries to reduce the expected number of comparisons.
ID3 chooses the split attribute with the highest information gain:
Gain = H(D) - E(H(D))
Information gain: the difference between how much information is needed to make a correct classification before the split versus how much information is needed after the split.

Height Example Data
Name       Gender  Height  Output1  Output2
Kristina   F       1.6 m   Short    Medium
Jim        M       2 m     Tall     Medium
Maggie     F       1.9 m   Medium   Tall
Martha     F       1.88 m  Medium   Tall
Stephanie  F       1.7 m   Short    Medium
Bob        M       1.85 m  Medium   Medium
Kathy      F       1.6 m   Short    Medium
Dave       M       1.7 m   Short    Medium
Worth      M       2.2 m   Tall     Tall
Steve      M       2.1 m   Tall     Tall
Debbie     F       1.8 m   Medium   Medium
Todd       M       1.95 m  Medium   Medium
Kim        F       1.9 m   Medium   Tall
Amy        F       1.8 m   Medium   Medium
Wynette    F       1.75 m  Medium   Medium

Information Gain
Choose gender (or height) as the split attribute:
  H(D): entropy before the split
  E(H(D)): expected entropy after the split
  Information gain = H(D) - E(H(D))

ID3 Example (Output1)
Starting state entropy:
4/15 log(15/4) + 8/15 log(15/8) + 3/15 log(15/3) = 0.4384
Gain using gender:
  Female: 3/9 log(9/3) + 6/9 log(9/6) = 0.2764
  Male: 1/6 log(6/1) + 2/6 log(6/2) + 3/6 log(6/3) = 0.4392
  Weighted sum: (9/15)(0.2764) + (6/15)(0.4392) = 0.34152
  Gain: 0.4384 - 0.34152 = 0.09688
Gain using height: 0.4384 - (2/15)(0.301) = 0.3983
Choose height as the first splitting attribute.
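The gender computation above can be reproduced end to end from the table. The data below are the Gender and Output1 columns in table order, and base-10 logs are used to match the slide's numbers (a sketch; the function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Base-10 entropy of a list of class labels, as used in the lecture."""
    n = len(labels)
    return sum((c / n) * math.log10(n / c) for c in Counter(labels).values())

def info_gain(split_values, labels):
    """H(D) minus the expected entropy after splitting on split_values."""
    n = len(labels)
    groups = {}
    for s, y in zip(split_values, labels):
        groups.setdefault(s, []).append(y)
    expected = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - expected

# Gender and Output1 columns of the height example, in table order
gender = list("FMFFFMFMMMFMFFF")
output1 = ["Short", "Tall", "Medium", "Medium", "Short", "Medium", "Short",
           "Short", "Tall", "Tall", "Medium", "Medium", "Medium", "Medium", "Medium"]

print(round(entropy(output1), 4))            # starting entropy, ~0.4385
print(round(info_gain(gender, output1), 4))  # ~0.0969, the slide's 0.09688
```

The tiny discrepancies in the last digit come from the slides rounding intermediate values such as 0.2764 and 0.4392 before combining them.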
C4.5
ID3 favors attributes with a large number of divisions.
C4.5 is an improved version of ID3:
- Missing data
- Continuous data
- Pruning
- Rules
- GainRatio: GainRatio(D, S) = Gain(D, S) / H(|D1|/|D|, ..., |Ds|/|D|)

C4.5: Example
Calculate the GainRatio for the gender split.
Entropy associated with the split, ignoring classes: H(9/15, 6/15) = 0.292
The GainRatio value for the gender attribute: 0.09688 / 0.292 = 0.332

C5.0
A commercial version of C4.5 widely used in many data mining packages.
Targeted toward use with large datasets.
- Produces more accurate rules.
- Improves on memory usage by 90%.
- Runs much faster than C4.5.

CART
Creates a binary tree.
Uses entropy for the best splitting attribute (as with ID3).
Formula to choose the split point, s, for node t:
ϕ(s|t) = 2 PL PR Σj |P(Cj, tL) - P(Cj, tR)|
PL, PR: probability that a tuple in the training set will be on the left or right side of the tree.
P(Cj, tL), P(Cj, tR): probability that a tuple is in class Cj and in the left (or right) subtree.

CART Example
At the start, there are six choices for the split point (right branch on equality):
ϕ(Gender) = 2(6/15)(9/15)(2/15 + 4/15 + 3/15) = 0.288
ϕ(1.6) = 0
ϕ(1.7) = 2(2/15)(13/15)(0 + 8/15 + 3/15) = 0.169
ϕ(1.8) = 2(5/15)(10/15)(4/15 + 6/15 + 3/15) = 0.385
ϕ(1.9) = 2(9/15)(6/15)(4/15 + 2/15 + 3/15) = 0.288
ϕ(2.0) = 2(12/15)(3/15)(4/15 + 8/15 + 3/15) = 0.32
Best split at 1.8.
What is next?

Scalable DT Techniques
SPRINT
- Creation of DTs for large datasets.
- Based on CART techniques.
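The ϕ values in the CART example can be checked mechanically. A sketch of the splitting measure, using joint class-and-side probabilities as in the slide's arithmetic (the function and variable names are illustrative):

```python
from collections import Counter

def phi(left, right):
    """2 * P_L * P_R * sum over classes of |P(Cj, tL) - P(Cj, tR)|,
    with joint probabilities taken over all tuples at the node."""
    n = len(left) + len(right)
    p_l, p_r = len(left) / n, len(right) / n
    cl, cr = Counter(left), Counter(right)
    return 2 * p_l * p_r * sum(abs(cl[c] / n - cr[c] / n) for c in set(cl) | set(cr))

# Height and Output1 columns of the height example, in table order
heights = [1.6, 2.0, 1.9, 1.88, 1.7, 1.85, 1.6, 1.7, 2.2, 2.1, 1.8, 1.95, 1.9, 1.8, 1.75]
output1 = ["Short", "Tall", "Medium", "Medium", "Short", "Medium", "Short",
           "Short", "Tall", "Tall", "Medium", "Medium", "Medium", "Medium", "Medium"]

# Split at 1.8 with the right branch taken on equality: height < 1.8 goes left
left  = [y for h, y in zip(heights, output1) if h < 1.8]
right = [y for h, y in zip(heights, output1) if h >= 1.8]
print(round(phi(left, right), 3))  # 0.385, the best of the six candidate splits
```

Repeating this for the other five candidates reproduces the full list and confirms that 1.8 maximizes ϕ.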
Next Lecture:
Rule-based algorithms
Combining techniques