UVA$CS$6316$$ $Fall$2015$Graduate:$$ Machine$Learning$$ $ $Lecture$17:$Decision(Tree(/(Random( Forest(/(Ensemble$

Size: px

Start display at page:

Download "UVA$CS$6316$$ $Fall$2015$Graduate:$$ Machine$Learning$$ $ $Lecture$17:$Decision(Tree(/(Random( Forest(/(Ensemble$"

Loraine Wright
6 years ago
Views:

1 Dr.YanjunQi/UVACS6316/f15 UVA$CS$6316$$ $Fall$2015$Graduate:$$ Machine$Learning$$ $ $Lecture$17:$DecisionTree/Random Forest/Ensemble$ 10/28/15 Dr.YanjunQi UniversityofVirginia Departmentof ComputerScience 1 Wherearewe?! FivemajorsecOonsofthiscourse " Regressionsupervised) " ClassificaOonsupervised) " Unsupervisedmodels " Learningtheory " Graphicalmodels Dr.YanjunQi/UVACS6316/f15 10/28/15 2

2 Dr.YanjunQi/UVACS6316/f15 hup://scikitxlearn.org/stable/tutorial/machine_learning_map/ ChoosingtherightesOmator 10/28/15 3 YanjunQi/UVACS4501X01X6501X07 ScikitXlearn:Regression Linearmodel ﬁUedby minimizinga regularized empiricalloss withsgd 10/28/15 4

3 ScikitXlearn:ClassificaOon Dr.YanjunQi/UVACS6316/f15 Tocombinethe predicoonsof severalbase esomatorsbuilt withagiven learningalgorithm inordertoimprove generalizability/ robustnessovera singleesomator.1) averaging/bagging 2)boosOng approximatethe explicitfeature mappingsthat correspondto certainkernels Linearclassifiers SVM,logisOc regression )with SGDtraining. 10/28/15 5 Kmeans+ GMM nextagerclassificaoon? Dr.YanjunQi/UVACS6316/f15 Basic PCA BayesXNet HMM 10/28/15 6

4 Today$ Dr.YanjunQi/UVACS6316/f15 # DecisionTreeDT): # TreerepresentaOon # BriefinformaOontheory # Learningdecisiontrees # Bagging # Randomforests:EnsembleofDT # Moreaboutensemble 10/28/15 7 Dr.YanjunQi/UVACS6316/f15 A study comparing Classifiers Proceedingsofthe23rdInternaOonal ConferenceonMachineLearningICML`06). 10/28/15 8

A study comparing Classifiers! 11binaryclassificaOonproblems/8metrics Dr.

! ThreemajorsecOonsforclassificaOon We can divide the large variety of

Discriminative - directly estimate a decision rule/boundary - e.g.

Generative: - build a generative statistical model - e.g., naïve bayes classifier, Bayesian networks 3.

5 A study comparing Classifiers! 11binaryclassificaOonproblems/8metrics Dr.YanjunQi/UVACS6316/f15 Top8 Models 10/28/15 9 Wherearewe?! ThreemajorsecOonsforclassificaOon We can divide the large variety of classification approaches into roughly three major types 1. Discriminative - directly estimate a decision rule/boundary - e.g., logistic regression, support vector machine, decisiontree 2. Generative: - build a generative statistical model - e.g., naïve bayes classifier, Bayesian networks 3. Instance based classifiers - Use observation directly no models) - e.g. K nearest neighbors Dr.YanjunQi/UVACS6316/f15 10/28/15 10

classificaoon C3 Output as Discrete Class

6 C3 Dr.YanjunQi/UVACS6316/f15 ADatasetfor classificaoon C3 Output as Discrete Class Label C 1, C 2,, C L Data/points/instances/examples/samples/records:[rows] Features/a0ributes/dimensions/independent3variables/covariates/predictors/regressors:[columns,exceptthelast] Target/outcome/response/label/dependent3variable:specialcolumntobepredicted[lastcolumn] 10/28/15 11 Example Dr.YanjunQi/UVACS6316/f15 C Example: Play Tennis 10/28/15 12

7 Dr.YanjunQi/UVACS6316/f15 Anatomy$of$a$decision$tree$ Outlook Each$node$is$a$test$on$$ one$feature/akribute$ sunny overcast rain PossibleaUributevalues ofthenode Humidity Yes$ Windy high normal false true No$ Yes$ No$ Yes$ Leavesarethe decisions 10/28/15 13 Dr.YanjunQi/UVACS6316/f15 Anatomy$of$a$decision$tree$ Samplesize Outlook Each$node$is$a$test$on$$ one$akribute$ Yourdata getssmaller sunny overcast rain PossibleaUributevalues ofthenode Humidity Yes$ Windy high normal false true No$ Yes$ No$ Yes$ Leavesarethe decisions 10/28/15 14

YanjunQi/UVACS6316/f15 Humidity Yes Windy high normal false true No Yes No Yes 10/28/15 15 $$ Outlook overcast rain Dr.

8 sunny ApplyModeltoTestData: To$ play$tennis $or$not.$$ Outlook overcast rain A$new$test$example:$ Outlook==rain)$and$ Windy==false)$ $ Pass$it$on$the$tree$ X>$Decision$is$yes. Dr.YanjunQi/UVACS6316/f15 Humidity Yes Windy high normal false true No Yes No Yes 10/28/15 15 sunny ApplyModeltoTestData: To$ play$tennis $or$not.$$ Outlook overcast rain Dr.YanjunQi/UVACS6316/f15 Outlook$==overcast)$$X>$yes$ Outlook==rain)$and$Windy==false)$X>yes$ Outlook==sunny)$and$Humidity=normal)$X>yes$ Threecasesof YES Humidity Yes Windy high normal false true No Yes No Yes 10/28/15 16

9 Decision$trees$ DecisiontreesrepresentadisjuncOonof conjuncoonsofconstraintsontheauribute valuesofinstances. Dr.YanjunQi/UVACS6316/f15 Outlook ==overcast) OR Outlook==rain) and Windy==false)) OR Outlook==sunny) and Humidity=normal)) => yes play tennis 10/28/15 17 RepresentaOon Dr.YanjunQi/UVACS6316/f15 0 A 1 Y=A$and$B)$or$not$A)$and$C))$ C B false true false true 10/28/15 18

10 Dr.YanjunQi/UVACS6316/f15 Sameconcept/differentrepresentaOon 0 A 1 Y=A$and$B)$or$not$A)$and$C))$ C B false true false true 0 C 1 B A false 1 A 0 false true true false 10/28/15 19 Dr.YanjunQi/UVACS6316/f15 Which$aKribute$to$select$for$spli]ng?$ 2+ 2X 4+ 4X 8+ 8X 2+ 2X 4+ 4X X 4+ 4X 8+ 8X 4+ 4X Thisisbadspliqng thedistribuoonof eachclassnotauribute) 10/28/15 20

YanjunQi/UVACS6316/f15 10/28/15 21 Today$ Dr.

11 How$do$we$choose$which$ akribute$to$split$?$ WhichaUributeshouldbeusedasthetest? IntuiOvely,youwouldpreferthe onethatseparatesthetraining examplesasmuchaspossible. Dr.YanjunQi/UVACS6316/f15 10/28/15 21 Today$ Dr.YanjunQi/UVACS6316/f15 # DecisionTreeDT): # TreerepresentaOon # BriefinformaOontheory # Learningdecisiontrees # Bagging # Randomforests:EnsembleofDT # Moreaboutensemble 10/28/15 22

InformaOongainisonecriteriato decideonwhichauributeforspliqng$ Imagine: 1.Someoneisabouttotellyouyourownname 2.Youareabouttoobservetheoutcomeofadiceroll 2.Youareabouttoobservetheoutcomeofacoinflip 3.

12 InformaOongainisonecriteriato decideonwhichauributeforspliqng$ Imagine: 1.Someoneisabouttotellyouyourownname 2.Youareabouttoobservetheoutcomeofadiceroll 2.Youareabouttoobservetheoutcomeofacoinflip 3.Youareabouttoobservetheoutcomeofabiasedcoinflip Dr.YanjunQi/UVACS6316/f15 EachsituaOonhaveadifferentamount3of3 uncertainty3astowhatoutcomeyouwillobserve. 10/28/15 23 InformaOon Dr.YanjunQi/UVACS6316/f15 InformaOon:!ReducOoninuncertaintyamountofsurpriseintheoutcome) 1 IE ) = log2 log 2 px ) px ) = Iftheprobabilityofthiseventhappeningissmallandithappens, theinformaoonislarge. # Observingtheoutcomeofacoinflip ishead I = log2 1/ 2 = 1 # Observetheoutcomeofadiceis6 I = log2 1/ 6 = /28/15 24

13 Entropy Dr.YanjunQi/UVACS6316/f15 Theexpected3amount3of3informa9onwhenobservingthe outputofarandomvariablex H X) = E I X)) = p x) I x) = p x)log p x) i i i i 2 i i IftheXcanhave8outcomesandallareequallylikely H X ) == 1/8log 1/8= 3 i 2 10/28/15 25 Iftherearekpossible outcomes Entropy$ H X) log 2 k Dr.YanjunQi/UVACS6316/f15 Equalityholdswhenall outcomesareequallylikely Themoretheprobability distribuoonthedeviates fromuniformity,thelower theentropy e.g.forarandom 10/28/15 26 binaryvariable

14 Entropy$Lower$!$beKer$purity$ Entropymeasuresthepurity Dr.YanjunQi/UVACS6316/f X 8+ 0X ThedistribuOonislessuniform Entropyislower Thenodeispurer 10/28/15 27 Informa`on$gain$ IGX,Y)=HY)XHY X) ReducOoninuncertaintyofYbyknowingafeature variablex InformaOongain: =informaoonbeforesplit) informaoonagersplit) =entropyparent) [averageentropychildren)] Fixed thelower,the beuerchildren nodesarepurer) = Dr.YanjunQi/UVACS6316/f15 ForIG,the higher,the beuer 10/28/15 28

15 Condi`onal$entropy$ HY ) = p y i )log 2 p y i ) i HY X ) = px j ) HY X = x j ) j = px j ) p y i x j )log 2 p y i x j ) j i Dr.YanjunQi/UVACS6316/f15 HY X = x j ) = p y i x j )log 2 p y i x j ) i 10/28/15 29 AUributes Labels X1 X2 Y Count T T + 2 T F + 2 F T X 5 F F + 1 Example$ Whichonedowechoose X1orX2? Dr.YanjunQi/UVACS6316/f15 IGX1,Y) = HY) HY X1) HY) = - 5/10) log5/10) -5/10log5/10) = 1 HY X1) = PX1=T)HY X1=T) + PX1=F) HY X1=F) = 4/10 1log log 0) +6/10 5/6log 5/6 +1/6log1/6) = 0.39 InformaOongainX1,Y)=1X0.39= /28/15 30

16 Dr.YanjunQi/UVACS6316/f15 Whichonedowechoose? X1 X2 Y Count T T + 2 T F + 2 F T X 5 F F + 1 X1 X2 Y Count T T + 2 T F + 2 F T X 5 F F + 1 Onebranch Theotherbranch Information gain X1,Y)= 0.61 Information gain X2,Y)= 0.12 Pick the variable which provides the most information gain about Y PickX1!$Then$recursively$choose$next$Xi$on$branches 10/28/15 31 Dr.YanjunQi/UVACS6316/f15 10/28/15 32

17 Decision$Trees$ Dr.YanjunQi/UVACS6316/f15 Caveats:$$Thenumberofpossiblevaluesinfluencesthe informaoongain. Themorepossiblevalues,thehigherthegainthemorelikelyitisto formsmall,butpureparooons) Other$Purity$diversity)$measures InformaOonGain GinipopulaOondiversity) wherep mk isproporoonofclasskatnodem ChiXsquareTest 10/28/15 33 Overfi]ng$ Dr.YanjunQi/UVACS6316/f15 YoucanperfectlyfitDTtoanytrainingdata InstabilityofTrees Highvariancesmallchangesintrainingsetwill resultinchangesoftreemodel) Hierarchicalstructure!Errorintopsplit propagatesdown Twoapproaches: 1.Stopgrowingthetreewhenfurtherspliqngthedatadoesnot yieldanimprovement 2.Growafulltree,thenprunethetree,byeliminaOngnodes. 10/28/15 34

18 FromESLbookCh9: ClassificaOonand RegressionTreesCART) Par``on$feature$ space$into$set$of$ rectangles$ Fitsimplemodelin eachparooon Summary:$Decision$trees$ Dr.YanjunQi/UVACS6316/f15 NonXlinearclassifier Easytouse Easytointerpret SuscepObletooverfiqngbutcanbeavoided. 10/28/15 36

19 Decision Tree / Random Forest YanjunQi/UVACS4501X01X6501X07 Task Representation Classification Partition feature space into set of rectangles, local smoothness Score Function Greedy to find partitions Search/Optimization Split Purity measure / e.g. IG / cross-entropy / Gini / Models, Parameters Tree Model s), i.e. space partition 10/28/15 37 Today$ Dr.YanjunQi/UVACS6316/f15 # DecisionTreeDT): # TreerepresentaOon # BriefinformaOontheory # Learningdecisiontrees # Bagging # Randomforests:EnsembleofDT # Moreaboutensemble 10/28/15 38

Dr.YanjunQi/UVACS6316/f15 Bagging$ Baggingorbootstrap3aggrega9on33 atechniqueforreducingthevarianceofan esomatedpredicoonfuncoon.

20 Dr.YanjunQi/UVACS6316/f15 Bagging$ Baggingorbootstrap3aggrega9on33 atechniqueforreducingthevarianceofan esomatedpredicoonfuncoon. Forinstance,forclassificaOon,acommi0ee3 oftrees Eachtreecastsavoteforthepredictedclass. 10/28/15 39 Bootstrap$ Dr.YanjunQi/UVACS6316/f15 Thebasicidea: randomlydrawdatasetswith3replacementi.e.3allows3duplicates)3 fromthetrainingdata,3eachsamplethe3same3size3as3the3original3 training3set3 3 10/28/15 40

21 Dr.YanjunQi/UVACS6316/f15 WithvsWithoutReplacement Bootstrapwithreplacementcankeepthe samplingsizethesameastheoriginalsizefor everyrepeatedsampling.thesampleddata groupsareindependentoneachother. Bootstrapwithoutreplacementcannotkeepthe samplingsizethesameastheoriginalsizefor everyrepeatedsampling.thesampleddata groupsaredependentoneachother. 10/28/15 41 Bagging Createbootstrapsamples fromthetrainingdata Dr.YanjunQi/UVACS6316/f15 Nexamples Mfeatures... $ 10/28/15 42

22 BaggingofDTClassifiers$ e.g. Dr.YanjunQi/UVACS6316/f15 Mfeatures Nexamples... $... $ Take$the$ majority$ vote$ i.e.refitthemodelto eachbootstrap dataset,andthen examinethebehavior overtheb replicaoons. 10/28/15 43 Dr.YanjunQi/UVACS6316/f15 Bagging$for$Classifica`on$with$0,1$Loss$ ClassificaOonwith0,1loss BaggingagoodclassifiercanmakeitbeKer. Baggingabadclassifiercanmakeitworse. Canunderstandthebaggingeffectintermsofaconsensusof independentweak%leaners%andwisdom%of%crowds% 10/28/15 44

23 Dr.YanjunQi/UVACS6316/f15 Peculiari`es$ ModelInstabilityisgoodwhenbagging$ Themorevariableunstable)thebasicmodelis,themore improvementcanpotenoallybeobtained LowXVariabilitymethodse.g.LDA)improvelessthanHighX Variabilitymethodse.g.decisiontrees) LoadofRedundancy% Mostpredictorsdoroughly thesamething 10/28/15 45 Bagging$:$an$simulated$example N=30trainingsamples, twoclassesandp=5features, EachfeatureN0,1)distribuOonandpairwisecorrelaOon.95 ResponseYgeneratedaccordingto: Testsamplesizeof2000 FitclassificaOontreestotrainingsetandbootstrapsamples B=200 ESLbook/Example8.7.1

24 NoOcethe bootstrap treesare differentthan theoriginal tree ESLbook/Example8.7.1 Fivefeatures highlycorrelated witheachother!noclear differencewith pickingupwhich featuretosplit! Small changesin thetraining setwillresult indifferent tree! Butthese treesare actuallyquite similarfor classificaoon! ForB>30,moretrees donotimprovethe baggingresults! Sincethetrees correlatehighlyto eachotherandgive similarclassificaoons B Consensus:Majorityvote Probability:AveragedistribuOonatterminalnodes ESLbook/Example8.7.1

25 Bagging Slightlyincreasesmodelspace Cannothelpwheregreaterenlargementof spaceisneeded Baggedtreesarecorrelated UserandomforesttoreducecorrelaOon betweentrees Today$ Dr.YanjunQi/UVACS6316/f15 # DecisionTreeDT): # TreerepresentaOon # BriefinformaOontheory # Learningdecisiontrees # Bagging # Randomforests:specialensembleofDT # Moreaboutensemble 10/28/15 50

26 Random$forest$classifier$ Randomforestclassifier, anextensiontobagging whichusesdebcorrelated3trees. Dr.YanjunQi/UVACS6316/f15 10/28/15 51 RandomForestClassifier Createbootstrapsamples fromthetrainingdata Dr.YanjunQi/UVACS6316/f15 Nexamples Mfeatures... $ 10/28/15 52

27 RandomForestClassifier Dr.YanjunQi/UVACS6316/f15 Ateachnodewhenchoosingthesplitfeature chooseonlyamongm<mfeatures Nexamples Mfeatures... $ 10/28/15 53 Random$Forest$Classifier$ Create$decision$tree$ from$each$bootstrap$sample$ Dr.YanjunQi/UVACS6316/f15 Nexamples Mfeatures... $... $ 10/28/15 54

Random$Forest$Classifier$ Dr.YanjunQi/UVACS6316/f15 Nexamples Mfeatures... $... $ Take$he$ majority$ vote$ 10/28/15 55 Random Forests 1. ForeachofourBbootstrapsamples a.

28 Random$Forest$Classifier$ Dr.YanjunQi/UVACS6316/f15 Nexamples Mfeatures... $... $ Take$he$ majority$ vote$ 10/28/15 55 Random Forests 1. ForeachofourBbootstrapsamples a. Formatreeinthefollowingmanner i. Givenpdimensions,pickmofthem ii. Splitonlyaccordingtothesemdimensions 1. wewillnotconsidertheotherpbmdimensions) iii. Repeattheabovestepsi&iiforeachsplit 1. Note:wepickadifferentsetofmdimensionsforeachsplit onasingletree

Dr.YanjunQi/UVACS6316/f15 Page598X599InESLbook 10/28/15 57 Random Forests Randomforestcanbeviewedasarefinementofbaggingwitha tweakofdecorrela`ngthetrees: o

29 Dr.YanjunQi/UVACS6316/f15 Page598X599InESLbook 10/28/15 57 Random Forests Randomforestcanbeviewedasarefinementofbaggingwitha tweakofdecorrela`ngthetrees: o Ateachtreesplit,arandomsubsetofmfeaturesoutofallp featuresisdrawntobeconsideredforspliqng SomeguidelinesprovidedbyBreiman,butbecarefultochoose mbasedonspecificproblem: o m=pamountstobagging o m=p/3orlog2p)forregression o m=sqrtp)forclassificaoon

30 Why correlated trees are not ideal? RandomForeststrytoreducecorrelaOon betweenthetrees. Why? Why correlated trees are not ideal? Assumingeachtreehasvarianceσ 2 IftreesareindependentlyidenOcally distributed,thenaveragevarianceisσ 2 /B

31 Why correlated trees are not ideal? Assumingeachtreehasvarianceσ 2 IfsimplyidenOcallydistributed,thenaverage varianceis ρrefersto pairwise correlaoon,a posiovevalue AsB,secondterm 0 Thus,thepairwisecorrelaOonalwaysaffectsthevariance Why correlated trees are not ideal? Howtodeal? o Ifwereducemthenumberofdimensionswe actuallyconsider), o thenwereducethepairwisetreecorrelaoon o Thus,variancewillbereduced.

32 Dr.YanjunQi/UVACS6316/f15 Today$ # DecisionTreeDT): # TreerepresentaOon # BriefinformaOontheory # Learningdecisiontrees # Bagging # Randomforests:EnsembleofDT # Moreensemble 10/28/15 63 e.g. Ensembles in practice Oct2006X2009 EachraOng/sample: +<user,movie,dateofgrade,grade> Trainingset100,480,507raOngs) Qualifyingset2,817,131raOngs)!winner

Ensemble in practice Team Bellkor'sPragmaOc Chaos defeatedtheteam ensemble bysubmiqng just20minutesearlier!! 1milliondollar! Theensembleteam!blendersofmulOpledifferentmethods References$$ Dr.

33 Ensemble in practice Team Bellkor'sPragmaOc Chaos defeatedtheteam ensemble bysubmiqng just20minutesearlier!! 1milliondollar! Theensembleteam!blendersofmulOpledifferentmethods References$$ Dr.YanjunQi/UVACS6316/f15 " $Prof.Tan,Steinbach,Kumar s IntroducOon todatamining slide " HasOe,Trevor,etal.The3elements3of3 sta9s9cal3learning.vol.2.no.1.newyork: Springer,2009. " Dr.OznurTastan sslidesaboutrfanddt 10/28/15 66

UVA$CS$4501$+$001$/$6501$ $007$ Introduc8on$to$Machine$Learning$ and$data$mining$$ $ Lecture$19:!Decision!Tree!/! Random!Forest!/!

YanjunQi/UVACS4501L01L6501L07 UVA$CS$4501$+$001$/$6501$ $007$ Introduc8on$to$Machine$Learning$ and$data$mining$$ $ Lecture$19:DecisionTree/ RandomForest/Ensemble$ YanjunQi/Jane,,PhD UniversityofVirginia