Decision Trees & Random Forests

Aubrey Griffith
5 years ago
1 Decision Trees & Random Forests BUGS Meeting Daniel Pimentel-Alarcón Computer Science, GSU

2 Decision Trees. Goal: Predict. Will I get El Cáncer? Will I develop Diabetes? Is my boyfriend/girlfriend cheating on me? Will my Bacteria develop Antibiotic Resistance?

3 Here is my Data. How do I know? Entropy.

4 Here is my Data. Rows: Samples. Columns: Other Variables (age, gender, etc.) and the Variable of Interest (cancer, diabetes, cheater). How do I know? Entropy.

5 THE FOLLOWING PREVIEW HAS BEEN APPROVED FOR ALL AUDIENCES BY THE MOTION PICTURE ASSOCIATION OF AMERICA INC. THE FILM ADVERTISED HAS BEEN RATED R RESTRICTED UNDER 17 REQUIRES ACCOMPANYING PARENT OR GUARDIAN PARTIAL NUDITY & INFO THEORY

[Table: Horses 1 to 8, each sent as a fixed-length 3-bit Message (000, 001, 010, 011, 100, 101, 110, 111), so Length = 3 for every horse. P(winning) for horse 1 is 1/2.]

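6 Info Theory & Entropy: A Quick Detour. Horses 1 to 8 with P(winning) = 1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64. Each horse has a Message (a string of bits); Length is the number of bits in that Message.

7 Info Theory & Entropy: A Quick Detour. Horses 1 to 8 with P(winning) = 1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64. Compare E[Length] for the fixed-length Message bits against the Optimal bits.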
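
The comparison on the horse-race slide can be checked numerically. This is a minimal sketch, assuming win probabilities of 1/2, 1/4, 1/8, 1/16 and 1/64 for each of the last four horses (chosen so that they sum to 1) and an assumed variable-length prefix code; the code words in `optimal_code` are illustrative and do not come from the slides.

```python
from fractions import Fraction

# Assumed win probabilities for horses 1..8 (they sum to 1).
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 16)] + [Fraction(1, 64)] * 4

# Fixed-length code: every horse gets a 3-bit message, 000 through 111.
fixed_code = [format(i, "03b") for i in range(8)]

# Assumed variable-length prefix code: likelier winners get shorter messages.
optimal_code = ["0", "10", "110", "1110", "111100", "111101", "111110", "111111"]


def expected_length(probs, code):
    """Average bits sent per race: sum over horses of p(horse) * len(message)."""
    return sum(p * len(word) for p, word in zip(probs, code))


fixed_avg = expected_length(probs, fixed_code)
optimal_avg = expected_length(probs, optimal_code)
print(fixed_avg, optimal_avg)  # 3 2
```

Under these assumptions the variable-length code needs 2 bits per race on average, against 3 bits for the fixed-length one.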

8 Info Theory & Entropy: A Quick Detour. This is what you need to remember:

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)}$$

$H(x)$: Amount of information encoded in variable x. $\sum_x$: Average over all outcomes. $p(x)$: How frequent is this outcome? $\log \frac{1}{p(x)}$: How many bits should I spend encoding this outcome? (more likely outcomes get fewer bits)
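
The entropy formula translates almost directly into code. A minimal sketch, assuming logarithms in base 2 (so the result is in bits) and the usual convention that outcomes with zero probability contribute nothing; the function name `entropy` is just an illustrative choice.

```python
import math


def entropy(probs):
    """Shannon entropy in bits: sum over outcomes of p(x) * log2(1 / p(x)).

    Outcomes with p(x) = 0 are skipped, following the convention
    that 0 * log(1/0) counts as 0.
    """
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


print(entropy([0.5, 0.5]))  # fair coin: 1.0
print(entropy([1.0, 0.0]))  # certain outcome: 0.0
```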

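9 Info Theory & Entropy: Example. Variables: Gender, PhD?

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)}$$

10 Info Theory & Entropy: Example. Gender: p(0) = 1/2, p(1) = 1/2. PhD?: p(0) = 15/16, p(1) = 1/16. For Gender:

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)} = p(0) \log \frac{1}{p(0)} + p(1) \log \frac{1}{p(1)} = \frac{1}{2}\log(2) + \frac{1}{2}\log(2) = 1$$

11 Info Theory & Entropy: Example. Gender: p(0) = 1/2, p(1) = 1/2, H(x) = 1. PhD?: p(0) = 15/16, p(1) = 1/16. For Gender:

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)} = p(0) \log \frac{1}{p(0)} + p(1) \log \frac{1}{p(1)} = \frac{1}{2}\log(2) + \frac{1}{2}\log(2) = 1$$

12 Info Theory & Entropy: Example. Gender: p(0) = 1/2, p(1) = 1/2, H(x) = 1. PhD?: p(0) = 15/16, p(1) = 1/16. For PhD?:

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)} = p(0) \log \frac{1}{p(0)} + p(1) \log \frac{1}{p(1)} = \frac{15}{16}\log\left(\frac{16}{15}\right) + \frac{1}{16}\log(16) = \frac{15}{16}(0.093) + \frac{1}{16}(4) = 0.337$$

13 Info Theory & Entropy: Example. Gender: p(0) = 1/2, p(1) = 1/2, H(x) = 1. Most informative! PhD?: p(0) = 15/16, p(1) = 1/16, H(x) = 0.337. For PhD?:

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)} = p(0) \log \frac{1}{p(0)} + p(1) \log \frac{1}{p(1)} = \frac{15}{16}\log\left(\frac{16}{15}\right) + \frac{1}{16}\log(16) = \frac{15}{16}(0.093) + \frac{1}{16}(4) = 0.337$$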
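
The two entropies in this example can be reproduced from raw 0/1 columns. A small sketch, assuming a hypothetical sample of 16 people built to match the probabilities on the slide (8 of each gender, 1 PhD out of 16); `column_entropy` is an illustrative helper name.

```python
import math
from collections import Counter


def column_entropy(values):
    """Entropy in bits of the empirical distribution of a column of values."""
    n = len(values)
    # Each distinct value has p = count / n, so log2(1 / p) = log2(n / count).
    return sum((count / n) * math.log2(n / count) for count in Counter(values).values())


# Hypothetical columns for 16 samples (not real data).
gender = [0] * 8 + [1] * 8  # p(0) = p(1) = 1/2
phd = [0] * 15 + [1] * 1    # p(0) = 15/16, p(1) = 1/16

h_gender = column_entropy(gender)
h_phd = column_entropy(phd)
print(round(h_gender, 3), round(h_phd, 3))  # 1.0 0.337
```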

14 Here is my Data. 1) Find Most Informative Variable: Gender.

15 Most Informative Variable: First Decision in my Tree. Gender: Male / Female. Then What?

16 2) Split According to Most Informative Variable: Gender (Females, Males). 3) Find Most Informative Variable in Each Subset.

17 Repeat. 2) Split According to Most Informative Variable: Gender (Females, Males). 3) Find Most Informative Variable in Each Subset: Age, Income.

18 Repeat. 2) Split According to Most Informative Variable: Age (Young, Old), Income (Rich, Filthy-rich). 3) Find Most Informative Variable in Each Subset.

19 Each Informative Variable: One Decision in my Tree. Gender: Male / Female. Age: Young / Old. Income: Rich / Filthy-rich. Then What?

20 Repeat. 2) Split According to Most Informative Variable. 3) Find Most Informative Variable in Each Subset. When do we stop?
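
One round of steps 2) and 3) might look like the sketch below. Note the substitution: the slides rank variables by their own entropy, while this sketch scores each variable by information gain with respect to the variable of interest, the criterion used by ID3-style trees. The toy rows, the column names, and the helper names are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Entropy in bits of a list of labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def split(rows, var):
    """Step 2: group the rows (dicts) by their value of `var`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[var]].append(row)
    return dict(groups)


def information_gain(rows, var, target):
    """Entropy of the target before the split minus the weighted entropy after it."""
    before = entropy([r[target] for r in rows])
    after = sum(len(g) / len(rows) * entropy([r[target] for r in g])
                for g in split(rows, var).values())
    return before - after


def most_informative(rows, variables, target):
    """Steps 1 and 3: the variable with the largest information gain."""
    return max(variables, key=lambda v: information_gain(rows, v, target))


# Hypothetical toy data: "sick" is the variable of interest.
rows = [
    {"gender": "F", "age": "young", "sick": 0},
    {"gender": "F", "age": "old", "sick": 0},
    {"gender": "M", "age": "young", "sick": 1},
    {"gender": "M", "age": "old", "sick": 1},
]
best = most_informative(rows, ["gender", "age"], "sick")
subsets = split(rows, best)
print(best, {value: len(group) for value, group in subsets.items()})
```

The same two functions would then be applied again inside each subset.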

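21 Info Theory & Entropy: Extreme Case. Gender: p(0) = 1/2, p(1) = 1/2. PhD?: p(0) = 15/16, p(1) = 1/16. Will Die?: p(0) = 0, p(1) = 1. For Will Die? (taking 0 log(1/0) = 0 by convention):

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)} = p(0) \log \frac{1}{p(0)} + p(1) \log \frac{1}{p(1)} = 0 \log\left(\frac{1}{0}\right) + 1 \log(1) = 0$$

22 Info Theory & Entropy: Extreme Case. Gender: p(0) = 1/2, p(1) = 1/2, H(x) = 1. PhD?: p(0) = 15/16, p(1) = 1/16, H(x) = 0.337. Will Die?: p(0) = 0, p(1) = 1, H(x) = 0. This Variable Provides No Information! For Will Die? (taking 0 log(1/0) = 0 by convention):

$$H(x) = \sum_x p(x) \log \frac{1}{p(x)} = p(0) \log \frac{1}{p(0)} + p(1) \log \frac{1}{p(1)} = 0 \log\left(\frac{1}{0}\right) + 1 \log(1) = 0$$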

23 When do we stop? When Variable of Interest is Uninformative in Subset (~zero entropy). Tree so far: Gender (Females, Males), then Age (Young, Old) and Income (Rich, Filthy-rich); each final subset is labeled 0 or 1.
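
Putting the split-and-repeat loop together with this stopping rule gives a short recursive procedure. The sketch below is one possible reading, not the deck's own implementation: it picks variables by information gain (an ID3-style substitution for the entropy ranking on the slides), and the eight rows, the column names, and the nested `(variable, branches)` tree format are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy in bits of a list of labels."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())


def gain(rows, var, target):
    """Information gain of splitting `rows` on `var`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[var], []).append(r[target])
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in rows]) - after


def build_tree(rows, variables, target):
    labels = [r[target] for r in rows]
    # Stop when the variable of interest has zero entropy in this subset
    # (or when no variables are left); the leaf is the most common label.
    if entropy(labels) == 0 or not variables:
        return Counter(labels).most_common(1)[0][0]
    best = max(variables, key=lambda v: gain(rows, v, target))
    rest = [v for v in variables if v != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, rest, target)
    return (best, branches)


# Hypothetical data: (gender, age, income, outcome).
data = [
    ("F", "young", "filthy-rich", 0), ("F", "old", "rich", 1),
    ("F", "old", "filthy-rich", 1), ("F", "old", "rich", 1),
    ("M", "young", "filthy-rich", 1), ("M", "old", "rich", 0),
    ("M", "young", "rich", 0), ("M", "old", "rich", 0),
]
rows = [dict(zip(("gender", "age", "income", "outcome"), d)) for d in data]
tree = build_tree(rows, ["gender", "age", "income"], "outcome")
print(tree)
```

On this toy data the root decision is gender, followed by age for one subset and income for the other.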

24 Finally: Put Result on my Decision Tree. Gender: Male / Female. Age: Young / Old. Income: Rich / Filthy-rich. And enjoy! (start predicting)
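
Predicting with a finished tree just means following one decision per level until a leaf is reached. A minimal sketch, assuming a hypothetical hand-written tree in a nested `(variable, branches)` format whose leaves are the 0/1 predictions; the branch labels and leaf values are made up for illustration.

```python
# Hypothetical tree: each internal node is (variable, {value: subtree}),
# and each leaf is a prediction (0 or 1).
tree = ("gender", {
    "F": ("age", {"young": 0, "old": 1}),
    "M": ("income", {"rich": 0, "filthy-rich": 1}),
})


def predict(tree, sample):
    """Walk down the tree, taking the branch that matches the sample."""
    node = tree
    while isinstance(node, tuple):
        variable, branches = node
        node = branches[sample[variable]]
    return node


print(predict(tree, {"gender": "F", "age": "old", "income": "rich"}))  # 1
print(predict(tree, {"gender": "M", "age": "old", "income": "rich"}))  # 0
```

A sample with a value the tree has never seen would raise a `KeyError` here; a fuller version would need some fallback for that case.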

25 What could possibly go wrong? Overfitting & Bias. (Didn't I promise partial nudity?)

26 What could possibly go wrong? Overfitting & Bias. Overfitting: My tree is accurate, but only for my given data (for which I already know the answer). Not a lot of predictive power. Bias: It may heavily depend on my particular sample. If I add/remove a few people, the result may be very different! (In statistics, this sensitivity to the sample is usually called variance.)

27 No worries: Random Forests

28 Random Forests. Main Idea: Do many decision trees, each with a random subsample, a.k.a. Bagging (Bootstrap Aggregating).

29 Random Forests. Main Idea: Do many decision trees, each with a random subsample. Remove 1st 10%: Obtain 1 Tree. Remove next 10%: Obtain 1 Tree. Remove last 10%: Obtain 1 Tree. Then: Consensus.
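
The remove-10%-and-vote scheme can be sketched in a few lines. This is an illustration under stated assumptions, not a full random forest: to stay short, each "tree" is a one-level decision stump (`fit_stump`, a hypothetical stand-in for a real tree learner), the data are invented, and each subsample drops a different 10% slice as on the slide rather than sampling with replacement as bootstrap bagging does.

```python
import random
from collections import Counter


def fit_stump(rows, variables, target):
    """Stand-in for a decision tree: a single split on the variable whose
    per-value majority vote makes the fewest mistakes on `rows`."""
    default = Counter(r[target] for r in rows).most_common(1)[0][0]
    best = None
    for var in variables:
        votes = {}
        for r in rows:
            votes.setdefault(r[var], Counter())[r[target]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in votes.items()}
        errors = sum(rule[r[var]] != r[target] for r in rows)
        if best is None or errors < best[0]:
            best = (errors, var, rule)
    _, var, rule = best
    return lambda sample: rule.get(sample[var], default)


def fit_forest(rows, variables, target, n_trees=10, seed=0):
    """Train each tree on the data with a different slice (about 10%) removed."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    chunk = max(1, len(rows) // n_trees)
    trees = []
    for i in range(n_trees):
        kept = rows[:i * chunk] + rows[(i + 1) * chunk:]
        trees.append(fit_stump(kept, variables, target))
    return trees


def predict_forest(trees, sample):
    """Consensus: the most common prediction among the trees."""
    return Counter(tree(sample) for tree in trees).most_common(1)[0][0]


# Hypothetical data: 20 people, where smoking fully determines the outcome.
rows = [{"smoker": s, "gender": g, "sick": int(s == "yes")}
        for s in ("yes", "no") for g in ("F", "M") for _ in range(5)]
trees = fit_forest(rows, ["gender", "smoker"], "sick")
print(predict_forest(trees, {"smoker": "yes", "gender": "F"}))  # 1
print(predict_forest(trees, {"smoker": "no", "gender": "M"}))   # 0
```

Because every tree sees a slightly different sample, no single odd subset of people decides the final answer; the vote does.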

31 Advantages/Disadvantages. Good for Prediction, but bad for Description. Fast to train, but slow to predict. Poor performance on unbalanced data.

32 Alternatives. Neural Networks. Regression (Linear, Logistic, Polynomial). Other clustering methods (e.g., subspaces).

33 Questions?
