Lecture Slides for Machine Learning. ETHEM ALPAYDIN, The MIT Press, http://

1 Lecture Slides for INTRODUCTION TO Machine Learning. ETHEM ALPAYDIN, The MIT Press, 2010. http://

2 CHAPTER 19: Design and Analysis of Machine Learning Experiments

3 Introduction
Questions:
Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%?
Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP?
Training/validation/test sets
Resampling methods: K-fold cross-validation

4 Algorithm Preference
Criteria (application-dependent):
Misclassification error, or risk (loss functions)
Training time/space complexity
Testing time/space complexity
Interpretability
Easy programmability
Cost-sensitive learning

5 Factors and Response

6 Strategies of Experimentation
Response surface design for approximating and maximizing the response function in terms of the controllable factors

7 Guidelines for ML experiments
A. Aim of the study
B. Selection of the response variable
C. Choice of factors and levels
D. Choice of experimental design
E. Performing the experiment
F. Statistical analysis of the data
G. Conclusions and recommendations

8 Resampling and K-fold cross-validation
The need for multiple training/validation sets
$\{X_i, V_i\}_i$: Training/validation sets of fold $i$
K-fold cross-validation: Divide $X$ into $K$ parts, $X_i$, $i = 1, \ldots, K$
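The K-fold split described above can be sketched as follows. This is a minimal illustration, not the slides' own code: the function name `k_fold_splits` is hypothetical, and it assumes the instances have already been shuffled, so that taking every K-th index gives a fair partition.

```python
def k_fold_splits(n_instances, k):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n_instances))
    # Part i takes every k-th index starting at i, so part sizes differ by at most one.
    parts = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = parts[i]
        # Training set of fold i is the union of all the other parts.
        training = [t for j, part in enumerate(parts) if j != i for t in part]
        yield training, validation


splits = list(k_fold_splits(n_instances=10, k=5))
for training, validation in splits:
    print(len(training), len(validation))
```

Each instance is used for validation exactly once and for training K-1 times.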

9 5×2 Cross-Validation
5 times 2-fold cross-validation (Dietterich, 1998)
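One way to generate the ten training/validation pairs of 5×2 cross-validation is sketched below. The function name `five_by_two_splits` and the fixed seed are illustrative assumptions: each of the 5 replications shuffles the data, cuts it in half, and uses the halves in both roles.

```python
import random


def five_by_two_splits(n_instances, seed=0):
    """Yield 10 (training, validation) index pairs: 5 replications of 2 folds."""
    rng = random.Random(seed)
    half = n_instances // 2
    for _ in range(5):
        indices = list(range(n_instances))
        rng.shuffle(indices)
        first, second = indices[:half], indices[half:]
        yield first, second  # fold 1: train on the first half, validate on the second
        yield second, first  # fold 2: the same halves with roles swapped


pairs = list(five_by_two_splits(n_instances=8))
print(len(pairs))
```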

10 Bootstrapping
Draw instances from a dataset with replacement
Probability that we do not pick an instance after $N$ draws:
$$\left(1 - \frac{1}{N}\right)^N \approx e^{-1} = 0.368$$
that is, only 36.8% is new!
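The 36.8% figure can be checked numerically. The sketch below compares the closed form $(1 - 1/N)^N$ with $e^{-1}$ and with the fraction of instances left out of one simulated bootstrap sample. The helper names, the dataset size of 2000, and the seed are illustrative assumptions.

```python
import math
import random


def prob_never_drawn(n):
    """Probability that a given instance is missed in n draws with replacement."""
    return (1.0 - 1.0 / n) ** n


def bootstrap_sample(data, rng):
    """Draw len(data) instances from data with replacement."""
    return [rng.choice(data) for _ in data]


rng = random.Random(0)
data = list(range(2000))
sample = bootstrap_sample(data, rng)
# Fraction of the original instances that never made it into the sample.
unseen_fraction = 1.0 - len(set(sample)) / len(data)
print(prob_never_drawn(len(data)), math.exp(-1), unseen_fraction)
```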

11 Measuring Error
Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 - Specificity
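These measures follow directly from the four confusion-matrix counts. A minimal sketch, with a hypothetical function name and made-up counts, assuming every denominator is nonzero:

```python
def classification_measures(tp, fp, tn, fn):
    """Compute the error measures above from confusion-matrix counts."""
    n = tp + fp + tn + fn
    return {
        "error_rate": (fn + fp) / n,
        "recall": tp / (tp + fn),  # sensitivity, hit rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # equals 1 - specificity
    }


measures = classification_measures(tp=40, fp=10, tn=45, fn=5)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```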

12 ROC Curve

14 Precision and Recall

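15 Interval Estimation
$X = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$
$m \sim \mathcal{N}(\mu, \sigma^2/N)$
100(1 - α) percent confidence interval:
$$P\left\{ m - z_{\alpha/2}\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\frac{\sigma}{\sqrt{N}} \right\} = 1 - \alpha$$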
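A small sketch of the two-sided interval for the mean when σ is known, using the standard form $m \pm z_{\alpha/2}\,\sigma/\sqrt{N}$. The function name and the sample values are illustrative assumptions; `statistics.NormalDist` supplies the normal quantile.

```python
import math
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, alpha=0.05):
    """Two-sided 100(1 - alpha)% interval for the mean, sigma known."""
    n = len(sample)
    m = mean(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for alpha = 0.05
    half_width = z * sigma / math.sqrt(n)
    return m - half_width, m + half_width


low, high = z_confidence_interval([4.0, 5.0, 6.0, 5.0], sigma=1.0)
print(low, high)
```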

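16 When $\sigma^2$ is not known:
$$S^2 = \frac{\sum_t (x^t - m)^2}{N - 1}, \qquad \frac{\sqrt{N}(m - \mu)}{S} \sim t_{N-1}$$
$$P\left\{ m - t_{\alpha/2,N-1}\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2,N-1}\frac{S}{\sqrt{N}} \right\} = 1 - \alpha$$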

17 Hypothesis Testing
Reject a null hypothesis if not supported by the sample with enough confidence
$X = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$
$H_0: \mu = \mu_0$ vs. $H_1: \mu \neq \mu_0$
Accept $H_0$ with level of significance α if $\mu_0$ is in the 100(1 - α) confidence interval:
$$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-z_{\alpha/2}, z_{\alpha/2})$$
Two-sided test

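18 One-sided test: $H_0: \mu \leq \mu_0$ vs. $H_1: \mu > \mu_0$
Accept if
$$\frac{\sqrt{N}(m - \mu_0)}{\sigma} \in (-\infty, z_{\alpha})$$
Variance unknown: Use t, instead of z
Accept $H_0: \mu = \mu_0$ if
$$\frac{\sqrt{N}(m - \mu_0)}{S} \in (-t_{\alpha/2,N-1}, t_{\alpha/2,N-1})$$
where $S$ is the sample standard deviation.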
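The two-sided and one-sided z tests for the mean can be sketched as below, assuming σ is known and using the standard statistic $\sqrt{N}(m - \mu_0)/\sigma$. The function names and sample values are illustrative assumptions.

```python
import math
from statistics import NormalDist, mean


def z_statistic(sample, mu0, sigma):
    """sqrt(N) * (m - mu0) / sigma for a sample with known sigma."""
    return math.sqrt(len(sample)) * (mean(sample) - mu0) / sigma


def accept_two_sided(sample, mu0, sigma, alpha=0.05):
    """Accept H0: mu = mu0 if the statistic lies in (-z_{alpha/2}, z_{alpha/2})."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_statistic(sample, mu0, sigma)) < z_crit


def accept_one_sided(sample, mu0, sigma, alpha=0.05):
    """Accept H0: mu <= mu0 if the statistic is below z_alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return z_statistic(sample, mu0, sigma) < z_crit


sample = [4.0, 5.0, 6.0, 5.0]
print(accept_two_sided(sample, mu0=5.0, sigma=1.0))
print(accept_one_sided(sample, mu0=3.0, sigma=1.0))
```

When the variance is unknown, the same structure applies with the sample standard deviation in place of σ and a t critical value in place of the normal quantile.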

19 Assessing Error: $H_0: p \leq p_0$ vs. $H_1: p > p_0$
Single training/validation set: Binomial Test
If error prob is $p_0$, prob that there are $e$ errors or less in $N$ validation trials is
$$P\{X \leq e\} = \sum_{j=0}^{e} \binom{N}{j} p_0^j (1 - p_0)^{N-j}$$
Accept if this prob is less than 1 - α
(Figure: example with N = 100, e = 20)
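The binomial test can be sketched with the standard binomial CDF. The slide's example gives N = 100 and e = 20 but no value for $p_0$, so the $p_0 = 0.2$ used here is an assumption made only for illustration, as are the function names.

```python
import math


def prob_at_most_e_errors(e, n, p0):
    """P{X <= e} for X ~ Binomial(n, p0)."""
    return sum(math.comb(n, j) * p0**j * (1 - p0) ** (n - j) for j in range(e + 1))


def accept_binomial(e, n, p0, alpha=0.05):
    """Accept H0: p <= p0 if P{X <= e} is less than 1 - alpha."""
    return prob_at_most_e_errors(e, n, p0) < 1 - alpha


# N = 100 and e = 20 as in the slide; p0 = 0.2 is a hypothetical choice.
print(prob_at_most_e_errors(20, 100, 0.2))
print(accept_binomial(20, 100, 0.2))
```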

20 Normal Approximation to the Binomial
Number of errors $X$ is approximately normal with mean $Np_0$ and variance $Np_0(1 - p_0)$:
$$\frac{X - Np_0}{\sqrt{Np_0(1 - p_0)}} \sim \mathcal{Z}$$
Accept if this value for $X = e$ is less than $z_{1-\alpha}$
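A sketch of the normal-approximation test follows. It standardizes the error count as $(e - Np_0)/\sqrt{Np_0(1 - p_0)}$ and compares it with a normal critical value. Reading $z_{1-\alpha}$ as the $(1 - \alpha)$ quantile of the standard normal (about 1.645 for α = 0.05) is an assumption, as are the function names and the example numbers.

```python
import math
from statistics import NormalDist


def standardized_errors(e, n, p0):
    """(e - N*p0) / sqrt(N*p0*(1 - p0)), approximately standard normal under p = p0."""
    return (e - n * p0) / math.sqrt(n * p0 * (1 - p0))


def accept_normal_approx(e, n, p0, alpha=0.05):
    """Accept H0: p <= p0 if the standardized count is below the critical value."""
    # Assumption: the critical value is the (1 - alpha) quantile of N(0, 1).
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return standardized_errors(e, n, p0) < z_crit


print(standardized_errors(30, 100, 0.2))
print(accept_normal_approx(30, 100, 0.2))
```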

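21 t Test
Multiple training/validation sets
$x_i^t = 1$ if instance $t$ misclassified on fold $i$
Error rate of fold $i$:
$$p_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$
With $m$ and $s^2$ average and var of $p_i$, we accept $p_0$ or less error if
$$\frac{\sqrt{K}(m - p_0)}{s} \sim t_{K-1}$$
is less than $t_{\alpha,K-1}$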
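The t test over K fold error rates can be sketched as below, using the standard statistic $\sqrt{K}(m - p_0)/s$. The Python standard library has no t distribution, so the critical value is passed in by the caller. The value 1.833 is $t_{0.05,9}$ from a standard t table, and the fold errors are made-up numbers.

```python
import math
from statistics import mean, stdev


def fold_error_t_statistic(fold_errors, p0):
    """sqrt(K) * (m - p0) / s, with s the sample standard deviation (K - 1 divisor)."""
    k = len(fold_errors)
    return math.sqrt(k) * (mean(fold_errors) - p0) / stdev(fold_errors)


def accept_error_at_most(fold_errors, p0, t_crit):
    """Accept 'error is p0 or less' if the statistic is below t_{alpha, K-1}."""
    return fold_error_t_statistic(fold_errors, p0) < t_crit


fold_errors = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.12, 0.08, 0.11, 0.09]
T_CRIT_05_DF9 = 1.833  # t_{0.05, 9}, looked up in a t table
print(accept_error_at_most(fold_errors, p0=0.15, t_crit=T_CRIT_05_DF9))
```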

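22 Comparing Classifiers:
Single training/validation set: McNemar's Test
Under $H_0$, we expect $e_{01} = e_{10} = (e_{01} + e_{10})/2$
$$\frac{(|e_{01} - e_{10}| - 1)^2}{e_{01} + e_{10}} \sim \chi^2_1$$
Accept if this value is less than $\chi^2_{\alpha,1}$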
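McNemar's test needs only the two disagreement counts: the instances misclassified by one classifier but not the other. A minimal sketch using the usual continuity-corrected statistic is given below. The function names and counts are illustrative, and 3.84 is $\chi^2_{0.05,1}$ from a standard table.

```python
def mcnemar_statistic(e01, e10):
    """(|e01 - e10| - 1)^2 / (e01 + e10), approximately chi-square with 1 df under H0."""
    return (abs(e01 - e10) - 1) ** 2 / (e01 + e10)


def accept_same_error(e01, e10, chi2_crit=3.84):
    """Accept H0 (equal error rates) if the statistic is below chi^2_{alpha, 1}."""
    return mcnemar_statistic(e01, e10) < chi2_crit


print(mcnemar_statistic(25, 5), accept_same_error(25, 5))
```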

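23 K-Fold CV Paired t Test
Use K-fold cv to get K training/validation folds
$p_i^1, p_i^2$: Errors of classifiers 1 and 2 on fold $i$
$p_i = p_i^1 - p_i^2$: Paired difference on fold $i$
The null hypothesis is whether $p_i$ has mean 0
With $m$ and $s^2$ the average and variance of $p_i$, accept if
$$\frac{\sqrt{K}\, m}{s} \in (-t_{\alpha/2,K-1}, t_{\alpha/2,K-1})$$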
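The paired test can be sketched as follows, using the standard statistic $\sqrt{K}\,m/s$ on the per-fold differences. The function names and error lists are made up for illustration, and 2.262 is $t_{0.025,9}$ from a standard t table. The sketch assumes the differences are not all identical, since otherwise $s = 0$.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(errors_1, errors_2):
    """sqrt(K) * m / s over the per-fold differences p_i = p_i^1 - p_i^2."""
    diffs = [a - b for a, b in zip(errors_1, errors_2)]
    return math.sqrt(len(diffs)) * mean(diffs) / stdev(diffs)


def accept_equal_error(errors_1, errors_2, t_crit):
    """Accept H0 (mean difference 0) if |t| is below t_{alpha/2, K-1}."""
    return abs(paired_t_statistic(errors_1, errors_2)) < t_crit


errors_a = [0.12, 0.10, 0.11, 0.13, 0.09, 0.12, 0.10, 0.11, 0.13, 0.09]
errors_b = [0.17, 0.16, 0.16, 0.19, 0.14, 0.18, 0.15, 0.17, 0.18, 0.15]
T_CRIT_025_DF9 = 2.262  # t_{0.025, 9}, looked up in a t table
print(accept_equal_error(errors_a, errors_b, T_CRIT_025_DF9))
```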

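24 5×2 cv Paired t Test
Use 5×2 cv to get 2 folds of 5 training/validation replications (Dietterich, 1998)
$p_i^{(j)}$: difference between errors of 1 and 2 on fold $j = 1, 2$ of replication $i = 1, \ldots, 5$
$\bar{p}_i = (p_i^{(1)} + p_i^{(2)})/2$, $s_i^2 = (p_i^{(1)} - \bar{p}_i)^2 + (p_i^{(2)} - \bar{p}_i)^2$
$$t = \frac{p_1^{(1)}}{\sqrt{\sum_{i=1}^{5} s_i^2 / 5}} \sim t_5$$
Two-sided test: Accept $H_0: \mu_0 = \mu_1$ if $t$ is in $(-t_{\alpha/2,5}, t_{\alpha/2,5})$
One-sided test: Accept $H_0: \mu_0 \leq \mu_1$ if $t < t_{\alpha,5}$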
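A sketch of the 5×2 cv paired t statistic in Dietterich's form: the numerator is the first difference $p_1^{(1)}$ and the denominator averages the five per-replication variances. The function names and the differences are made up, and 2.571 is $t_{0.025,5}$ from a standard t table.

```python
import math


def five_by_two_t_statistic(diffs):
    """diffs: five (p_i1, p_i2) pairs of error differences, one pair per replication."""
    variances = []
    for p1, p2 in diffs:
        p_bar = (p1 + p2) / 2
        variances.append((p1 - p_bar) ** 2 + (p2 - p_bar) ** 2)  # s_i^2
    return diffs[0][0] / math.sqrt(sum(variances) / 5)


def accept_two_sided(diffs, t_crit=2.571):
    """Accept H0 (equal error) if |t| is below t_{alpha/2, 5}."""
    return abs(five_by_two_t_statistic(diffs)) < t_crit


diffs = [(0.02, -0.02), (0.01, -0.01), (0.03, -0.03), (-0.02, 0.02), (0.01, -0.01)]
print(five_by_two_t_statistic(diffs), accept_two_sided(diffs))
```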

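25 5×2 cv Paired F Test
$$f = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2} \left(p_i^{(j)}\right)^2}{2\sum_{i=1}^{5} s_i^2} \sim F_{10,5}$$
where $p_i^{(j)}$ is the difference between the errors of the two classifiers on fold $j$ of replication $i$, $\bar{p}_i = (p_i^{(1)} + p_i^{(2)})/2$, and $s_i^2 = (p_i^{(1)} - \bar{p}_i)^2 + (p_i^{(2)} - \bar{p}_i)^2$
Two-sided test: Accept $H_0: \mu_0 = \mu_1$ if $f < F_{\alpha,10,5}$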
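The F variant uses all ten differences in the numerator rather than only the first. A minimal sketch with made-up differences is given below. The function names are illustrative, and 4.74 is $F_{0.05,10,5}$ from a standard F table.

```python
def five_by_two_f_statistic(diffs):
    """diffs: five (p_i1, p_i2) pairs of error differences, one pair per replication."""
    numerator = sum(p1**2 + p2**2 for p1, p2 in diffs)
    variance_sum = 0.0
    for p1, p2 in diffs:
        p_bar = (p1 + p2) / 2
        variance_sum += (p1 - p_bar) ** 2 + (p2 - p_bar) ** 2  # s_i^2
    return numerator / (2 * variance_sum)


def accept_equal_error(diffs, f_crit=4.74):
    """Accept H0 (equal error) if f is below F_{alpha, 10, 5}."""
    return five_by_two_f_statistic(diffs) < f_crit


diffs = [(0.02, -0.02), (0.01, -0.01), (0.03, -0.03), (-0.02, 0.02), (0.01, -0.01)]
print(five_by_two_f_statistic(diffs), accept_equal_error(diffs))
```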

26 Comparing L>2 Algorithms:
Errors of L algorithms on K folds
We construct two estimators of $\sigma^2$. One is valid if $H_0$ is true, the other is always valid.
We reject $H_0$ if the two estimators disagree.

29 ANOVA table
If ANOVA rejects, we do pairwise posthoc tests
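The comparison of two variance estimators can be sketched as a standard one-way ANOVA, which is an assumption about the form the missing slides take. The between-group mean square estimates $\sigma^2$ only if all algorithms have the same mean error, while the within-group mean square is always valid, and their ratio is the F statistic. The function name and the error values are made up, and 5.14 is $F_{0.05,2,6}$ from a standard F table.

```python
from statistics import mean


def anova_f_statistic(errors):
    """errors[j] holds the K fold errors of algorithm j; returns the one-way ANOVA F."""
    n_algos = len(errors)
    k = len(errors[0])
    group_means = [mean(group) for group in errors]
    grand_mean = mean(group_means)
    # Between-group estimate: valid only if H0 (all means equal) holds.
    ss_between = k * sum((m - grand_mean) ** 2 for m in group_means)
    ms_between = ss_between / (n_algos - 1)
    # Within-group estimate: valid whether or not H0 holds.
    ss_within = sum((x - m) ** 2 for group, m in zip(errors, group_means) for x in group)
    ms_within = ss_within / (n_algos * (k - 1))
    return ms_between / ms_within


errors = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5]]
F_CRIT_05_2_6 = 5.14  # F_{0.05, 2, 6}, looked up in an F table
f_value = anova_f_statistic(errors)
print(f_value, f_value < F_CRIT_05_2_6)
```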

30 Comparison over Multiple Datasets
Comparing two algorithms:
Sign test: Count how many times A beats B over N datasets, and check if this could have been by chance if A and B did have the same error rate
Comparing multiple algorithms:
Kruskal-Wallis test: Calculate the average rank of all algorithms on N datasets, and check if these could have been by chance if they all had equal error
If KW rejects, we do pairwise posthoc tests to find which ones have significant rank difference
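The sign test reduces to a binomial tail probability: if A and B have the same error rate, the number of datasets on which A wins is Binomial(N, 1/2). A minimal two-sided sketch is given below. The function name is hypothetical, and ties are assumed to have been dropped beforehand.

```python
import math


def sign_test_p_value(wins_a, n):
    """Two-sided p-value for wins_a wins out of n datasets under Binomial(n, 0.5)."""

    def tail(lo, hi):
        # P{lo <= X <= hi} for X ~ Binomial(n, 0.5)
        return sum(math.comb(n, j) for j in range(lo, hi + 1)) / 2**n

    lower = tail(0, wins_a)
    upper = tail(wins_a, n)
    return min(1.0, 2 * min(lower, upper))


print(sign_test_p_value(wins_a=9, n=10))
```

A small p-value suggests the win count is unlikely to be due to chance alone.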
