Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN, The MIT Press


Lecture Slides for INTRODUCTION TO Machine Learning, ETHEM ALPAYDIN, The MIT Press, 2004. alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml

CHAPTER 14: Assessing and Comparing Classification Algorithms

Introduction
- Questions:
  - Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%?
  - Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP?
- Training/validation/test sets
- Resampling methods: K-fold cross-validation

Algorithm Preference
- Criteria (application-dependent):
  - Misclassification error, or risk (loss functions)
  - Training time/space complexity
  - Testing time/space complexity
  - Interpretability
  - Easy programmability
- Cost-sensitive learning

Resampling and K-Fold Cross-Validation
- The need for multiple training/validation sets: {X_i, V_i}_i are the training/validation sets of fold i
- K-fold cross-validation: divide X into K parts X_i, i = 1, ..., K; fold i uses X_i as the validation set V_i and the remaining K − 1 parts as the training set T_i
- Any two training sets T_i share K − 2 of the K parts
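A minimal sketch of the fold construction. The slides prescribe no library; scikit-learn's KFold is our choice here, and the toy dataset is invented for illustration.

```python
# K-fold CV sketch: each part serves once as the validation set;
# any two training sets share K-2 of the K parts.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)   # toy dataset, 10 instances
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for i, (train_idx, val_idx) in enumerate(kf.split(X), 1):
    print(f"fold {i}: train={train_idx}, val={val_idx}")
```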

5×2 Cross-Validation
- 5 replications of 2-fold cross-validation, giving 10 training/validation pairs (Dietterich, 1998)
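A sketch of the 5×2 scheme under the same assumptions as above (scikit-learn, invented toy data): each replication is a fresh random split into two halves, and each half serves once as the validation set.

```python
# 5x2 cross-validation: 5 replications of 2-fold CV = 10 train/val pairs.
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20)
for rep in range(5):
    kf = KFold(n_splits=2, shuffle=True, random_state=rep)
    for j, (train_idx, val_idx) in enumerate(kf.split(X), 1):
        print(f"replication {rep + 1}, fold {j}: val={val_idx}")
```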

Bootstrapping
- Draw N instances from a dataset of size N with replacement
- Prob that we do not pick a given instance in N draws: (1 − 1/N)^N ≈ e⁻¹ = 0.368
- That is, only 36.8% of the data is new (never drawn); a bootstrap sample contains about 63.2% of the distinct instances, and the unseen 36.8% can serve as validation data
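An empirical check of the 36.8% figure, with N chosen arbitrarily for illustration:

```python
# Drawing N instances with replacement leaves ~36.8% of the data unseen.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)          # N draws with replacement
unseen = 1 - len(np.unique(sample)) / N
print(f"fraction never drawn: {unseen:.3f}")  # ~ exp(-1) = 0.368
```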

Measuring Error
- Error rate = # of errors / # of instances = (FN + FP) / N
- Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
- Precision = # of found positives / # of found = TP / (TP + FP)
- Specificity = TN / (TN + FP)
- False alarm rate = FP / (FP + TN) = 1 − Specificity
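The same measures computed from the four confusion-matrix counts; the label arrays are invented, with 1 taken as the positive class.

```python
# Confusion-matrix-based measures from the slide.
import numpy as np

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_pred = np.array([1, 0, 1, 0, 1, 0, 0, 1])

TP = np.sum((y_pred == 1) & (y_true == 1))
FP = np.sum((y_pred == 1) & (y_true == 0))
TN = np.sum((y_pred == 0) & (y_true == 0))
FN = np.sum((y_pred == 0) & (y_true == 1))

error_rate = (FN + FP) / len(y_true)
recall = TP / (TP + FN)          # sensitivity, hit rate
precision = TP / (TP + FP)
specificity = TN / (TN + FP)
false_alarm = FP / (FP + TN)     # = 1 - specificity
print(error_rate, recall, precision, specificity, false_alarm)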

ROC Curve
- Plot of hit rate, TP / (TP + FN), versus false alarm rate, FP / (FP + TN), as the decision threshold is varied
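A sketch of how the curve's points arise: sweep a threshold over classifier scores and record one (false alarm, hit) pair per threshold. The scores below are invented for illustration.

```python
# ROC points by threshold sweep.
import numpy as np

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.3, 0.9, 0.6, 0.55])

for thr in np.unique(scores)[::-1]:
    y_pred = (scores >= thr).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    hit = tp / np.sum(y_true == 1)
    fa = fp / np.sum(y_true == 0)
    print(f"threshold {thr:.2f}: false alarm {fa:.2f}, hit {hit:.2f}")
```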

Interval Estimation
- X = {x^t}_t where x^t ~ N(µ, σ²)
- Sample mean m = Σ_t x^t / N ~ N(µ, σ²/N), so √N(m − µ)/σ ~ Z
- 100(1 − α) percent confidence interval: P(m − z_{α/2} σ/√N < µ < m + z_{α/2} σ/√N) = 1 − α

When σ² is not known:
- Estimate it with the sample variance S² = Σ_t (x^t − m)² / (N − 1); then √N(m − µ)/S ~ t_{N−1}
- 100(1 − α) percent confidence interval: m ± t_{α/2, N−1} S/√N
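Both intervals side by side on a synthetic sample (the data and the "known" σ = 2.0 are our assumptions):

```python
# z-based CI when sigma is known; t-based CI when it is estimated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=30)
N, alpha = len(x), 0.05
m = x.mean()

# sigma known (here: the true value 2.0)
z = stats.norm.ppf(1 - alpha / 2)
print("z interval:", (m - z * 2.0 / np.sqrt(N), m + z * 2.0 / np.sqrt(N)))

# sigma unknown: use S and the t distribution with N-1 dof
S = x.std(ddof=1)
t = stats.t.ppf(1 - alpha / 2, df=N - 1)
print("t interval:", (m - t * S / np.sqrt(N), m + t * S / np.sqrt(N)))
```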

Hypothesis Testing
- Reject a null hypothesis if it is not supported by the sample with enough confidence
- X = {x^t}_t where x^t ~ N(µ, σ²)
- H0: µ = µ0 vs. H1: µ ≠ µ0
- Accept H0 with level of significance α if µ0 is in the 100(1 − α) confidence interval, i.e., if √N(m − µ0)/σ ∈ (−z_{α/2}, z_{α/2})
- Two-sided test

One-sided test: H0: µ ≤ µ0 vs. H1: µ > µ0
- Accept H0 if √N(m − µ0)/σ ∈ (−∞, z_α)
- Variance unknown: use t instead of z
- Accept H0: µ = µ0 if √N(m − µ0)/S ∈ (−t_{α/2, N−1}, t_{α/2, N−1})
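A sketch of the variance-unknown two-sided test on invented data, cross-checked against scipy's built-in, which computes the same statistic:

```python
# Two-sided t test of H0: mu = mu0 with unknown variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.3, scale=2.0, size=25)
mu0, alpha = 5.0, 0.05

t_stat = np.sqrt(len(x)) * (x.mean() - mu0) / x.std(ddof=1)
crit = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
print("accept H0" if abs(t_stat) < crit else "reject H0")
print(stats.ttest_1samp(x, mu0))   # cross-check
```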

Assessing Error: H0: p ≤ p0 vs. H1: p > p0
- Single training/validation set: Binomial Test
- If the error probability is p0, the probability that there are e errors or fewer in N validation trials is P{X ≤ e} = Σ_{j=0}^{e} C(N, j) p0^j (1 − p0)^{N−j}
- Accept H0 if this probability is less than 1 − α (e.g., N = 100, e = 20)
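A sketch with the slide's N = 100 and e = 20; the claimed error rate p0 = 0.20 is our assumption, chosen only to make the example concrete.

```python
# Binomial test: accept H0: p <= p0 if P{X <= e} < 1 - alpha.
from scipy.stats import binom

N, e, p0, alpha = 100, 20, 0.20, 0.05
prob = binom.cdf(e, N, p0)   # P{X <= e} under H0
print("accept H0: p <= p0" if prob < 1 - alpha else "reject H0")
```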

Normal Approximation to the Binomial
- The number of errors X is approximately Normal with mean Np0 and variance Np0(1 − p0)
- Accept H0 if z = (e − Np0)/√(Np0(1 − p0)) is less than z_{1−α}
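The same test under the normal approximation, reusing the illustrative figures from above:

```python
# Normal approximation: z = (e - N p0) / sqrt(N p0 (1 - p0)).
from math import sqrt
from scipy.stats import norm

N, e, p0, alpha = 100, 20, 0.20, 0.05
z = (e - N * p0) / sqrt(N * p0 * (1 - p0))
print("accept H0" if z < norm.ppf(1 - alpha) else "reject H0")
```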

Paired t Test
- Multiple training/validation sets
- x_i^t = 1 if instance t is misclassified on fold i, else 0
- Error rate of fold i: p_i = Σ_{t=1}^{N} x_i^t / N
- With m and s² the average and variance of the p_i, we accept the hypothesis of p0 or less error if √K(m − p0)/s is less than t_{α, K−1} (the statistic is t-distributed with K − 1 degrees of freedom)
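A sketch with invented per-fold error rates standing in for the p_i:

```python
# Test K per-fold error rates against a claimed error p0.
import numpy as np
from scipy import stats

p = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
p0, alpha, K = 0.15, 0.05, len(p)

t_stat = np.sqrt(K) * (p.mean() - p0) / p.std(ddof=1)
crit = stats.t.ppf(1 - alpha, df=K - 1)
print("accept H0: p <= p0" if t_stat < crit else "reject H0")
```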

Comparing Classifiers: H0: µ0 = µ1 vs. H1: µ0 ≠ µ1
- Single training/validation set: McNemar's Test
- e01: number of examples misclassified by classifier 1 but not by 2; e10: misclassified by 2 but not by 1
- Under H0, we expect e01 = e10 = (e01 + e10)/2
- The statistic (|e01 − e10| − 1)² / (e01 + e10) is approximately χ² with 1 degree of freedom; accept H0 if it is less than χ²_{α,1} = 3.84 (for α = 0.05)
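A sketch of the test from the two disagreement counts; e01 and e10 below are invented for illustration.

```python
# McNemar's test on the off-diagonal disagreement counts.
from scipy.stats import chi2

e01, e10, alpha = 12, 25, 0.05
stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
crit = chi2.ppf(1 - alpha, df=1)   # 3.84 for alpha = 0.05, as on the slide
print("accept H0" if stat < crit else "reject H0")
```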

K-Fold CV Paired t Test
- Use K-fold cv to get K training/validation folds
- p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i
- p_i = p_i^1 − p_i^2: paired difference on fold i
- The null hypothesis is that p_i has mean 0
- With m = Σ_i p_i / K and s² = Σ_i (p_i − m)² / (K − 1), accept H0 if √K · m / s is in (−t_{α/2, K−1}, t_{α/2, K−1})
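A sketch on invented per-fold errors for two classifiers, cross-checked against scipy's paired t test, which is the same computation:

```python
# Paired t test on per-fold error differences.
import numpy as np
from scipy import stats

p1 = np.array([0.12, 0.15, 0.10, 0.14, 0.13, 0.11, 0.16, 0.12, 0.13, 0.14])
p2 = np.array([0.14, 0.16, 0.12, 0.13, 0.15, 0.13, 0.17, 0.14, 0.15, 0.16])
d = p1 - p2
K, alpha = len(d), 0.05

t_stat = np.sqrt(K) * d.mean() / d.std(ddof=1)
crit = stats.t.ppf(1 - alpha / 2, df=K - 1)
print("accept H0: equal error" if abs(t_stat) < crit else "reject H0")
print(stats.ttest_rel(p1, p2))   # equivalent built-in
```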

5×2 cv Paired t Test
- Use 5×2 cv to get 10 training/validation pairs: 5 replications of 2-fold cross-validation (Dietterich, 1998)
- p_i^(j): difference between the errors of classifiers 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5
- With p̄_i = (p_i^(1) + p_i^(2))/2 and s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)², the statistic t = p_1^(1) / √(Σ_i s_i² / 5) is t-distributed with 5 degrees of freedom
- Two-sided test: accept H0: µ0 = µ1 if t ∈ (−t_{α/2, 5}, t_{α/2, 5})
- One-sided test: accept H0: µ0 ≤ µ1 if t < t_{α, 5}

5×2 cv Paired F Test
- f = Σ_{i=1}^{5} Σ_{j=1}^{2} (p_i^(j))² / (2 Σ_{i=1}^{5} s_i²) is approximately F-distributed with (10, 5) degrees of freedom
- Two-sided test: accept H0: µ0 = µ1 if f < F_{α, 10, 5}
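Both 5×2 cv statistics computed from one 5×2 array of error differences p[i, j] (replication i, fold j); the numbers are invented for illustration.

```python
# 5x2 cv paired t and F statistics.
import numpy as np
from scipy import stats

p = np.array([[0.02, 0.01], [0.03, -0.01], [0.01, 0.02],
              [0.00, 0.03], [0.02, 0.00]])          # p_i^(j)
alpha = 0.05

pbar = p.mean(axis=1)                               # per-replication mean
s2 = (p[:, 0] - pbar) ** 2 + (p[:, 1] - pbar) ** 2

t_stat = p[0, 0] / np.sqrt(s2.sum() / 5)            # ~ t with 5 dof
f_stat = (p ** 2).sum() / (2 * s2.sum())            # ~ F with (10, 5) dof

t_crit = stats.t.ppf(1 - alpha / 2, df=5)
f_crit = stats.f.ppf(1 - alpha, 10, 5)
print("t test:", "accept H0" if abs(t_stat) < t_crit else "reject H0")
print("F test:", "accept H0" if f_stat < f_crit else "reject H0")
```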

Comparing L > 2 Algorithms: Analysis of Variance (ANOVA)
- H0: µ1 = µ2 = ... = µL
- Errors of L algorithms on K folds: X_ij ~ N(µ_j, σ²), i = 1, ..., K, j = 1, ..., L
- We construct two estimators of σ². One is valid only if H0 is true; the other is always valid. We reject H0 if the two estimators disagree.

If H0 is true:
- The group means m_j = Σ_i X_ij / K are all ~ N(µ, σ²/K)
- Their sample variance Σ_j (m_j − m)² / (L − 1), with m = Σ_j m_j / L, estimates σ²/K
- So the first (between-group) estimator of σ² is K Σ_j (m_j − m)² / (L − 1)

- Regardless of H0, our second estimator of σ² is the average of the group variances s_j² = Σ_i (X_ij − m_j)² / (K − 1): σ̂² = Σ_j s_j² / L
- Under H0, the ratio of the between-group estimator to this within-group estimator is F-distributed with (L − 1, L(K − 1)) degrees of freedom
- Accept H0: µ1 = µ2 = ... = µL if the ratio is less than F_{α, L−1, L(K−1)}
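A sketch mirroring the two estimators on invented errors X[i, j] of L = 3 algorithms on K = 5 folds, cross-checked against scipy's one-way ANOVA:

```python
# One-way ANOVA over L algorithms x K folds.
import numpy as np
from scipy import stats

X = np.array([[0.12, 0.14, 0.18], [0.10, 0.13, 0.17], [0.15, 0.15, 0.20],
              [0.11, 0.12, 0.19], [0.13, 0.16, 0.18]])   # K=5 folds, L=3
K, L = X.shape
alpha = 0.05

m_j = X.mean(axis=0)                              # per-algorithm means
m = m_j.mean()
between = K * ((m_j - m) ** 2).sum() / (L - 1)    # valid only under H0
within = X.var(axis=0, ddof=1).mean()             # always valid
F = between / within                              # ~ F(L-1, L(K-1)) under H0
crit = stats.f.ppf(1 - alpha, L - 1, L * (K - 1))
print("accept H0: equal means" if F < crit else "reject H0")
print(stats.f_oneway(X[:, 0], X[:, 1], X[:, 2]))  # built-in cross-check
```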

Other Tests
- Range test (Newman-Keuls)
- Nonparametric tests (Sign test, Kruskal-Wallis)
- Contrasts: check if algorithms 1 and 2 differ from 3, 4, and 5
- Multiple comparisons require Bonferroni correction: if there are m tests, to have an overall significance of α, each test should be done at significance α/m (a sketch follows below)
- Regression: the CLT states that the sum of iid variables from any distribution is approximately normal, so the preceding methods can also be applied to accumulated regression errors
- Other loss functions?
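The Bonferroni rule as a minimal sketch; the p-values are invented for illustration.

```python
# Bonferroni correction: each of m tests is run at alpha / m.
alpha = 0.05
p_values = [0.01, 0.04, 0.003, 0.20]
m = len(p_values)
for i, p in enumerate(p_values, 1):
    verdict = "reject H0" if p < alpha / m else "accept H0"
    print(f"test {i}: p = {p:.3f} -> {verdict}")
```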