UVA CS 4501-001 / 6501-007 Introduction to Machine Learning and Data Mining. Lecture 19: Decision Tree / Random Forest / Ensemble
1 UVA CS 4501-001 / 6501-007 Introduction to Machine Learning and Data Mining. Lecture 19: Decision Tree / Random Forest / Ensemble. Yanjun Qi / Jane, PhD. University of Virginia, Department of Computer Science. 11/6/14.
Where are we? Five major sections of this course:
- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models
2 Announcements: Rough Plan.
HW5: out this Thursday, due Nov. 23rd midnight.
- 4501-001: programming on an image labeling task (similar to HW3).
- 6501-007: grant proposal / business plan.
- If finishing both, 5 extra credits.
HW6: similar to HW4; 12 sample questions to prepare for the final, covering topics after the midterm.
Final exam: in class @ Thornton Hall E316, Dec 4th, 3:35pm-4:45pm (70 mins).
3 Announcements: Rough Plan.
HW3 grades + solution will be posted in Collab this weekend. HW4 grades will be posted in Collab this weekend. Midterm grades + solution will be posted in Collab this weekend.
6501-007 HW5: Project Proposal / Grant Proposal / Business Plan. The following sections are expected in the proposal; minimum 8 pages (LaTeX and Word templates will be provided):
- Abstract / summary (300-word limit)
- Introduction of the target task
- Previous solutions for the target task
- Why you are the right person for implementing this plan
- Why machine learning methods are good candidates for the target task
- The method you propose; please provide a diagram of your proposed system
- Experimental design: details of the data, where to get it, data statistics, preliminary results? preliminary prototype?
- References (not included in the page count)
- A budget plan / personnel plan is not required
4 Where are we? Five major sections of this course:
- Regression (supervised)
- Classification (supervised)
- Unsupervised models
- Learning theory
- Graphical models
http://scikit-learn.org/stable/tutorial/machine_learning_map/
5 Scikit-learn: regression. A linear model fitted by minimizing a regularized empirical loss with SGD.
Scikit-learn: classification.
- Ensemble methods combine the predictions of several base estimators built with a given learning algorithm, in order to improve generalizability/robustness over a single estimator: (1) averaging/bagging, (2) boosting.
- Kernel approximation methods approximate the explicit feature mappings that correspond to certain kernels.
- Linear classifiers (SVM, logistic regression, ...) with SGD training (see the sketch below).
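As a hedged illustration of that last item, a minimal scikit-learn sketch of a linear classifier trained with SGD; the toy dataset and all parameter choices here are made up for illustration and are not from the slides:

```python
# Minimal sketch: a linear classifier (logistic regression) trained with SGD.
# Toy data only; the parameters are illustrative, not from the slides.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# loss="log_loss" gives logistic regression (older scikit-learn versions call
# it loss="log"); loss="hinge" would give a linear SVM instead.
clf = SGDClassifier(loss="log_loss", penalty="l2", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```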
6 What comes next, after classification? Unsupervised models (K-means + GMM, basic PCA) and graphical models (Bayes net, HMM).
Today:
- Decision Tree (DT): tree representation; brief information theory; learning decision trees
- Bagging
- Random forests: an ensemble of DTs
- More about ensembles
7 A study comparing classifiers, from the Proceedings of the 23rd International Conference on Machine Learning (ICML '06). The study covers 11 binary classification problems and 8 metrics.
8 Where are we? Three major types of classification approaches. We can divide the large variety of classification approaches into roughly three major types:
1. Discriminative: directly estimate a decision rule/boundary, e.g., logistic regression, support vector machine, decision tree.
2. Generative: build a generative statistical model, e.g., naive Bayes classifier, Bayesian networks.
3. Instance-based classifiers: use observations directly (no models), e.g., k-nearest neighbors.
A dataset for classification: the output is a discrete class label C1, C2, ..., CL.
- Data points / instances / examples / samples / records: the rows.
- Features / attributes / dimensions / independent variables / covariates / predictors / regressors: the columns, except the last.
- Target / outcome / response / label / dependent variable: the special column to be predicted (the last column).
9 Example: Play Tennis.
Anatomy of a decision tree. Each node is a test on one feature/attribute; the branches are the possible attribute values of the node; the leaves are the decisions:
- Outlook = sunny -> test Humidity: high -> No; normal -> Yes
- Outlook = overcast -> Yes
- Outlook = rain -> test Windy: false -> Yes; true -> No
10 Anatomy of a decision tree (continued). Each node is a test on one attribute, and your data gets smaller as you move down the tree: each branch keeps only the examples matching its attribute value, so the sample size shrinks at every split. Leaves are the decisions.
Apply the model to test data: to "play tennis" or not. A new test example: (Outlook == rain) and (Windy == false). Pass it down the tree -> the decision is yes.
11 Apply the model to test data: to "play tennis" or not.
- (Outlook == overcast) -> yes
- (Outlook == rain) and (Windy == false) -> yes
- (Outlook == sunny) and (Humidity == normal) -> yes
Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances:
(Outlook == overcast) OR ((Outlook == rain) AND (Windy == false)) OR ((Outlook == sunny) AND (Humidity == normal)) => yes, play tennis.
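To make the rule reading concrete, here is the tree above written out as plain Python conditionals; this is a sketch of this specific toy tree, not of any library API:

```python
def play_tennis(outlook: str, humidity: str, windy: bool) -> bool:
    """The play-tennis tree as a disjunction of conjunctions."""
    if outlook == "overcast":
        return True                      # overcast -> yes
    if outlook == "rain":
        return not windy                 # rain and not windy -> yes
    if outlook == "sunny":
        return humidity == "normal"      # sunny and normal humidity -> yes
    return False

# The new test example from the slide: (Outlook == rain) and (Windy == false)
print(play_tennis("rain", "high", windy=False))  # True -> decision is yes
```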
12 Representation: Y = (A and B) or ((not A) and C). One tree tests A at the root: the A = 1 branch then tests B (leaves true/false), and the A = 0 branch tests C (leaves true/false).
Same concept, different representation: the same function Y can also be written as a tree that tests C at the root and then tests B or A below it, so a single concept can be represented by more than one tree.
13 Which attribute to select for splitting? The figure shows splits that send a balanced 8+/8- node into children that are themselves balanced (4+/4-, then 2+/2-). This is bad splitting: the distribution of each class (not of the attribute) is unchanged in every child, so the split tells us nothing.
How do we choose which attribute to split on? Which attribute should be used as the test? Intuitively, you would prefer the one that separates the training examples as much as possible.
14 Today:
- Decision Tree (DT): tree representation; brief information theory; learning decision trees
- Bagging
- Random forests: an ensemble of DTs
- More about ensembles
Information gain is one criterion to decide on the attribute for splitting. Imagine:
1. Someone is about to tell you your own name.
2. You are about to observe the outcome of a dice roll.
3. You are about to observe the outcome of a coin flip.
4. You are about to observe the outcome of a biased coin flip.
Each situation has a different amount of uncertainty as to what outcome you will observe.
15 Information: the reduction in uncertainty (the amount of surprise in the outcome):
I(E) = log2(1 / p(x)) = -log2 p(x)
If the probability of an event happening is small and it happens, the information is large.
- Observing that the outcome of a coin flip is heads: I = -log2(1/2) = 1
- Observing that the outcome of a dice roll is 6: I = -log2(1/6) = 2.58
Entropy: the expected amount of information when observing the output of a random variable X:
H(X) = E(I(X)) = sum_i p(x_i) I(x_i) = -sum_i p(x_i) log2 p(x_i)
If X can have 8 outcomes and all are equally likely:
H(X) = -sum_i (1/8) log2(1/8) = 3
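A quick numeric check of these quantities, as a minimal Python sketch using only the standard library:

```python
import math

def information(p: float) -> float:
    """Information (surprise) of an outcome with probability p, in bits."""
    return -math.log2(p)

def entropy(probs) -> float:
    """Expected information H(X) of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(information(1 / 2))    # 1.0  : a fair coin flip lands heads
print(information(1 / 6))    # 2.58 : a fair die shows a 6
print(entropy([1 / 8] * 8))  # 3.0  : 8 equally likely outcomes
```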
16 Entropy bound: if there are k possible outcomes, then H(X) <= log2 k. Equality holds when all outcomes are equally likely; the more the probability distribution deviates from uniformity, the lower the entropy (e.g., for a random binary variable, entropy peaks at p = 0.5 and drops to 0 at p = 0 or p = 1).
Entropy: lower means better purity. Entropy measures the purity of a node. A node with 8+/8- has a uniform class distribution and maximal entropy; a node with 8+/0- has a less uniform distribution, its entropy is lower, and the node is purer.
17 Information gain: IG(X_i, Y) = H(Y) - H(Y | X_i), the reduction in uncertainty about Y by knowing a feature X_i.
Information gain = (information before split) - (information after split) = entropy(parent) - [average entropy(children)]. The parent term is fixed, and the lower the children's average entropy the better (the children nodes are purer); so for IG, the higher, the better.
Conditional entropy:
H(Y) = -sum_i p(y_i) log2 p(y_i)
H(Y | X) = sum_j p(x_j) H(Y | X = x_j) = -sum_j p(x_j) sum_i p(y_i | x_j) log2 p(y_i | x_j)
18 Example. Attributes X1 and X2, label Y, with counts:
X1  X2  Y  Count
T   T   +  2
T   F   +  2
F   T   -  5
F   F   +  1
Which one do we choose, X1 or X2?
IG(X1, Y) = H(Y) - H(Y | X1)
H(Y) = -(5/10) log2(5/10) - (5/10) log2(5/10) = 1
H(Y | X1) = P(X1=T) H(Y | X1=T) + P(X1=F) H(Y | X1=F)
          = (4/10)(-1 log2 1 - 0 log2 0) + (6/10)(-(5/6) log2(5/6) - (1/6) log2(1/6)) = 0.39
Information gain (X1, Y) = 1 - 0.39 = 0.61. Information gain (X2, Y) = 0.40 (splitting the table on X2 gives a pure X2=F branch but a mixed 2+/5- branch on X2=T). Pick the variable which provides the most information gain about Y: pick X1. Then recursively choose the next X_i on the branches.
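The same computation as a short sketch; the counts are hard-coded from the table above, and `entropy` is the function defined earlier:

```python
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Rows of the table above: (x1, x2, y, count)
rows = [("T", "T", "+", 2), ("T", "F", "+", 2),
        ("F", "T", "-", 5), ("F", "F", "+", 1)]
N = sum(c for _, _, _, c in rows)

def label_entropy(subset):
    """Entropy of the label Y over a (count-weighted) subset of rows."""
    n = sum(c for _, _, _, c in subset)
    pos = sum(c for _, _, y, c in subset if y == "+")
    return entropy([pos / n, (n - pos) / n])

def info_gain(attr):  # attr = 0 for X1, attr = 1 for X2
    """IG(X_attr, Y) = H(Y) - H(Y | X_attr)."""
    h_cond = 0.0
    for v in ("T", "F"):
        subset = [r for r in rows if r[attr] == v]
        n_v = sum(c for _, _, _, c in subset)
        h_cond += (n_v / N) * label_entropy(subset)
    return label_entropy(rows) - h_cond

print(round(info_gain(0), 2))  # 0.61 for X1
print(round(info_gain(1), 2))  # ~0.4 for X2 -> X1 wins
```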
19 Decision trees, caveats: the number of possible values of an attribute influences the information gain. The more possible values, the higher the gain (the more likely it is to form small, but pure, partitions).
Other purity (diversity) measures:
- Information gain
- Gini (population diversity): 1 - sum_k p_mk^2, where p_mk is the proportion of class k at node m
- Chi-square test
Overfitting: you can perfectly fit a DT to any training data. Instability of trees: high variance (small changes in the training set will result in changes of the tree model), and because of the hierarchical structure, an error in the top split propagates down. Two approaches:
1. Stop growing the tree when further splitting the data does not yield an improvement.
2. Grow a full tree, then prune the tree by eliminating nodes.
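For a feel of how the two purity measures compare on a binary node, a tiny sketch (here p is the fraction of positives at the node; both measures are 0 for a pure node and maximal at p = 0.5):

```python
import math

def entropy(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def gini(p):
    """Binary Gini impurity: 1 - p^2 - (1 - p)^2 = 2 p (1 - p)."""
    return 2 * p * (1 - p)

for p in (0.5, 0.75, 1.0):  # from maximally impure to pure
    print(p, round(entropy(p), 3), round(gini(p), 3))
```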
20 From the ESL book, Ch. 9: Classification and Regression Trees (CART). Partition the feature space into a set of rectangles, and fit a simple model in each partition.
Summary: decision trees.
- Non-linear classifier
- Easy to use
- Easy to interpret
- Susceptible to overfitting, but this can be avoided
21 Today:
- Decision Tree (DT): tree representation; brief information theory; learning decision trees
- Bagging
- Random forests: an ensemble of DTs
- More about ensembles
Bagging. Bagging, or bootstrap aggregation, is a technique for reducing the variance of an estimated prediction function. For instance, for classification, we form a committee of trees and each tree casts a vote for the predicted class.
22 Bootstrap. The basic idea: randomly draw datasets with replacement (i.e., allowing duplicates) from the training data, each sample the same size as the original training set (see the sketch below).
With vs. without replacement: bootstrap with replacement can keep the sampling size the same as the original size for every repeated sampling, and the sampled data groups are independent of each other. Bootstrap without replacement cannot keep the sampling size the same as the original size for every repeated sampling, and the sampled data groups are dependent on each other.
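A minimal numpy sketch of drawing one bootstrap sample, the same size as the original training set:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
X = np.arange(N)  # stand-in for N training examples

# Sample N indices with replacement: duplicates are allowed, and on average
# only about 63% of the distinct original examples appear in each sample.
boot_idx = rng.integers(0, N, size=N)
print(X[boot_idx])
print(np.unique(boot_idx).size / N)
```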
23 Bagging: create bootstrap samples from the training data (N examples, M features).
Bagging of DT classifiers: fit a decision tree to each bootstrap sample and take the majority vote; i.e., refit the model to each bootstrap dataset, and then examine the behavior over the B replications.
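A sketch of bagged decision trees with a majority vote, using scikit-learn's BaggingClassifier on a made-up toy dataset (B = 200, as in the ESL example later):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# B = 200 trees, each fit on a bootstrap sample; prediction by majority vote.
# (Older scikit-learn versions name the first parameter base_estimator.)
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=200, bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("single tree :", tree.score(X_te, y_te))
print("bagged trees:", bag.score(X_te, y_te))
```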
24 Bagging for classification with 0/1 loss. With 0/1 loss, bagging a good classifier can make it better, but bagging a bad classifier can make it worse. The bagging effect can be understood in terms of a consensus of independent weak learners and the "wisdom of crowds."
Peculiarities: model instability is good when bagging. The more variable (unstable) the basic model is, the more improvement can potentially be obtained. Low-variability methods (e.g., LDA) improve less than high-variability methods (e.g., decision trees). Load of redundancy: most predictors do roughly the same thing.
25 Bagging: a simulated example. N = 30 training samples, two classes, and p = 5 features; each feature has an N(0,1) distribution with pairwise correlation 0.95; the response Y is generated from the features, and the test sample size is 2000. Fit classification trees to the training set and to B = 200 bootstrap samples. Notice that the bootstrap trees are different from the original tree: the five features are highly correlated with each other, so there is no clear difference between picking one feature or another to split on, and small changes in the training set result in different trees. But these trees are actually quite similar for classification.
26 For B > 30, more trees do not improve the bagging results, since the trees correlate highly with each other and give similar classifications. Two ways to aggregate over the B trees: consensus (majority vote) or probability (average the class distributions at the terminal nodes). (ESL book, Example 8.7.1.)
Bagging slightly increases the model space, but it cannot help where a greater enlargement of the space is needed. Bagged trees are correlated; use random forests to reduce the correlation between trees.
27 Today:
- Decision Tree (DT): tree representation; brief information theory; learning decision trees
- Bagging
- Random forests: a special ensemble of DTs
- More about ensembles
Random forest classifier: an extension to bagging which uses de-correlated trees.
28 Random forest classifier: create bootstrap samples from the training data (N examples, M features). At each node, when choosing the split feature, choose only among m < M features.
29 Random forest classifier: create a decision tree from each bootstrap sample (N examples, M features), then take the majority vote over the trees.
30 Random forests:
1. For each of our B bootstrap samples:
   a. Form a tree in the following manner:
      i. Given p dimensions, pick m of them.
      ii. Split only according to these m dimensions (we will not consider the other p - m dimensions).
      iii. Repeat steps i and ii for each split. Note: we pick a different set of m dimensions for each split on a single tree.
(Pages 598-599 in the ESL book.)
31 Random forests can be viewed as a refinement of bagging with a tweak that de-correlates the trees: at each tree split, a random subset of m features out of all p features is drawn to be considered for splitting. Some guidelines are provided by Breiman, but be careful to choose m based on the specific problem:
- m = p amounts to bagging
- m = p/3 or log2(p) for regression
- m = sqrt(p) for classification
A scikit-learn sketch of this choice of m follows below.
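A hedged scikit-learn sketch: max_features plays the role of m, and "sqrt" corresponds to the m = sqrt(p) guideline for classification (toy data made up for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=16, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# max_features="sqrt": at each split only m = sqrt(p) = 4 of the p = 16
# features are considered, which de-correlates the trees relative to bagging
# (max_features=None, i.e., m = p, would amount to plain bagging of trees).
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            random_state=0)
rf.fit(X_tr, y_tr)
print("random forest test accuracy:", rf.score(X_te, y_te))
```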
32 Random forests try to reduce the correlation between the trees. Why? Assume each tree has variance sigma^2. If the trees are independently and identically distributed, then the variance of their average is sigma^2 / B. If they are merely identically distributed, with pairwise correlation rho (a positive value), then the variance of the average is
rho * sigma^2 + ((1 - rho) / B) * sigma^2
As B -> infinity, the second term -> 0. Thus, the pairwise correlation always limits how much the variance can be reduced.
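A quick Monte Carlo check of this formula, as a sketch: the B "tree predictions" are simulated as equicorrelated Gaussians with unit variance and pairwise correlation rho (the shared-component construction below is one standard way to generate them):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2, rho, B, trials = 1.0, 0.5, 50, 200_000

# Equicorrelated variables: x_b = sqrt(rho) * z + sqrt(1 - rho) * e_b, where z
# is shared across the B variables, so each x_b has variance 1 and
# corr(x_b, x_b') = rho for b != b'.
z = rng.standard_normal(trials)
e = rng.standard_normal((trials, B))
x = np.sqrt(rho) * z[:, None] + np.sqrt(1 - rho) * e

print(x.mean(axis=1).var())                   # empirical variance of the average
print(rho * sigma2 + (1 - rho) / B * sigma2)  # rho*sigma^2 + ((1-rho)/B)*sigma^2
```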
33 How do random forests deal with this? If we reduce m (the number of dimensions we actually consider at each split), then we reduce the pairwise tree correlation, and thus the variance of the ensemble will be reduced.
Today:
- Decision Tree (DT): tree representation; brief information theory; learning decision trees
- Bagging
- Random forests: an ensemble of DTs
- More ensembles
34 Ensembles in practice: e.g., the Netflix Prize, Oct 2006 - 2009. Each rating/sample: <user, movie, date of grade, grade>. Training set: 100,480,507 ratings; qualifying set: 2,817,131 ratings, used to determine the winner.
Ensembles in practice: the team "BellKor's Pragmatic Chaos" defeated the team "The Ensemble" and won the 1 million dollar prize by submitting just 20 minutes earlier. The Ensemble team's submissions were blends of multiple different methods.
35 References:
- Prof. Tan, Steinbach, and Kumar's "Introduction to Data Mining" slides
- Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2, No. 1. New York: Springer, 2009.
- Dr. Oznur Tastan's slides about RF and DT