Evaluation. Albert Bifet. April 2012


COMP423A/COMP523A Data Stream Mining

Outline
1. Introduction
2. Stream Algorithmics
3. Concept drift
4. Evaluation
5. Classification
6. Ensemble Methods
7. Regression
8. Clustering
9. Frequent Pattern Mining
10. Distributed Streaming

Data Streams: Big Data & Real Time

Data stream classification cycle
1. Process an example at a time, and inspect it only once (at most)
2. Use a limited amount of memory
3. Work in a limited amount of time
4. Be ready to predict at any point

Evaluation Framework
1. Error estimation: Hold-out or Prequential
2. Evaluation performance measures: Accuracy or κ-statistic
3. Statistical significance validation: McNemar or Nemenyi test

1. Error Estimation: Holdout Evaluation
Data available for testing: hold out an independent test set and apply the current decision model to it at regular time intervals. The loss estimated on the holdout set is an unbiased estimator.
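A minimal sketch of this periodic holdout loop, assuming a hypothetical model object with train(x, y) and predict(x) methods and a stream given as an iterable of (x, y) pairs (the names and the checkpoint interval are illustrative, not MOA's API):

# Periodic holdout: train on the stream and, at regular time intervals,
# measure accuracy on an independent, held-out test set.
def holdout_evaluation(model, stream, test_set, interval=10_000):
    accuracies = []                       # one estimate per checkpoint
    for i, (x, y) in enumerate(stream, start=1):
        model.train(x, y)                 # learn from every stream example
        if i % interval == 0:             # evaluation checkpoint
            correct = sum(1 for xt, yt in test_set
                          if model.predict(xt) == yt)
            accuracies.append(correct / len(test_set))
    return accuracies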

1. Error Estimation: Prequential or Interleaved-Test-Then-Train
No data available for testing: the error of the model is computed from the sequence of examples itself. For each example in the stream, the current model first makes a prediction, and the example is then used to update (train) the model.

1. Error Estimation: Hold-out or Prequential?
Hold-out is more accurate, but needs separate data for testing. Prequential evaluation can be used to approximate hold-out by estimating accuracy over a sliding window or with fading factors, as sketched below.
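A minimal sketch of prequential (interleaved-test-then-train) evaluation with a sliding-window accuracy estimate; the model interface and the window size w are assumptions for illustration:

from collections import deque

# Prequential evaluation: test on each example first, then train on it.
# Accuracy is estimated over the last w outcomes, so old mistakes are
# forgotten and the estimate tracks the current model.
def prequential_accuracy(model, stream, w=1000):
    window = deque(maxlen=w)              # 1 = correct, 0 = wrong
    estimates = []
    for x, y in stream:
        window.append(1 if model.predict(x) == y else 0)   # test first
        model.train(x, y)                                   # then train
        estimates.append(sum(window) / len(window))
    return estimates

A fading-factor variant would instead keep two discounted running sums, S <- correct + alpha*S and N <- 1 + alpha*N, and report S/N at each step.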

2. Evaluation performance measures

                   Predicted Class+   Predicted Class-   Total
Correct Class+            75                  8            83
Correct Class-             7                 10            17
Total                     82                 18           100

Table: Simple confusion matrix example

Accuracy = (75 + 10) / 100 = 85%
Arithmetic mean of per-class accuracies = (75/83 + 10/17) / 2 = 74.59%
Geometric mean of per-class accuracies = sqrt(75/83 × 10/17) = 72.90%
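These three measures can be reproduced directly from the confusion-matrix counts; a small sketch in Python, with the numbers taken from the table above:

from math import sqrt

# Counts from the confusion matrix above.
tp, fn = 75, 8        # correct Class+ predicted as +, as -
fp, tn = 7, 10        # correct Class- predicted as +, as -
n = tp + fn + fp + tn

recall_pos = tp / (tp + fn)                      # 75/83
recall_neg = tn / (fp + tn)                      # 10/17

accuracy = (tp + tn) / n                         # 0.85
arithmetic_mean = (recall_pos + recall_neg) / 2  # ~0.7459
geometric_mean = sqrt(recall_pos * recall_neg)   # ~0.7290
print(accuracy, arithmetic_mean, geometric_mean)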

2. Performance Measures with Unbalanced Classes

                   Predicted Class+   Predicted Class-   Total
Correct Class+            75                  8            83
Correct Class-             7                 10            17
Total                     82                 18           100

Table: Simple confusion matrix example

                   Predicted Class+   Predicted Class-   Total
Correct Class+         68.06              14.94            83
Correct Class-         13.94               3.06            17
Total                     82                 18           100

Table: Confusion matrix for chance predictor

2. Performance Measures with Unbalanced Classes: Kappa Statistic
p_0: classifier's prequential accuracy
p_c: probability that a chance classifier makes a correct prediction

κ statistic: κ = (p_0 - p_c) / (1 - p_c)

κ = 1 if the classifier is always correct
κ = 0 if the classifier's predictions coincide with the correct ones only as often as those of the chance classifier

Forgetting mechanism for estimating prequential kappa: a sliding window of size w with the most recent observations.
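A minimal sketch of the prequential κ-statistic over a sliding window of the w most recent (prediction, true label) pairs; p_0 and p_c follow the definitions above, while the window mechanics and function name are illustrative assumptions:

from collections import Counter, deque

def kappa_window(pairs, w=1000):
    """pairs: iterable of (predicted, actual) labels; yields κ after each example."""
    window = deque(maxlen=w)
    for pred, actual in pairs:
        window.append((pred, actual))
        n = len(window)
        pred_counts = Counter(p for p, _ in window)
        actual_counts = Counter(a for _, a in window)
        p0 = sum(1 for p, a in window if p == a) / n
        # Chance agreement p_c: sum over classes of the product of the
        # classifier's and the true label's marginal frequencies.
        pc = sum(pred_counts[c] * actual_counts[c]
                 for c in actual_counts) / (n * n)
        yield 1.0 if pc == 1 else (p0 - pc) / (1 - pc)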

3. Statistical significance validation (2 Classifiers): McNemar Test

                        Classifier A Class+   Classifier A Class-   Total
Classifier B Class+              c                     a             c+a
Classifier B Class-              b                     d             b+d
Total                          c+b                   a+d          a+b+c+d

M = (|a - b| - 1)² / (a + b)

The test statistic follows the χ² distribution with one degree of freedom. At 0.99 confidence it rejects the null hypothesis (that the two performances are equal) if M > 6.635.
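A small sketch of this decision rule with the continuity correction above; the 6.635 threshold is the χ² critical value at 0.99 confidence with one degree of freedom, and the function name is illustrative:

def mcnemar_differ(a, b, critical=6.635):
    """Return (M, reject): reject the null hypothesis of equal
    performance at 0.99 confidence if M exceeds the critical value."""
    if a + b == 0:
        return 0.0, False                  # the classifiers never disagree
    m = (abs(a - b) - 1) ** 2 / (a + b)    # continuity-corrected statistic
    return m, m > critical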

3. Statistical significance validation (> 2 Classifiers): Nemenyi Test
Two classifiers are performing differently if their average ranks differ by at least the critical difference

CD = q_α √( k(k + 1) / (6N) )

where k is the number of learners, N is the number of datasets, and the critical values q_α are based on the Studentized range statistic divided by √2.

3. Statistical significance validation (> 2 Classifiers): Nemenyi Test Critical Values

# classifiers       2       3       4       5       6       7
q_0.05            1.960   2.343   2.569   2.728   2.850   2.949
q_0.10            1.645   2.052   2.291   2.459   2.589   2.693

Table: Critical values for the Nemenyi test
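A sketch of computing the critical difference from the q_α values in the table; comparing two average ranks against CD is the decision rule stated above (function and variable names are illustrative):

from math import sqrt

Q_ALPHA = {  # critical values indexed by alpha, then by number of classifiers k
    0.05: {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850, 7: 2.949},
    0.10: {2: 1.645, 3: 2.052, 4: 2.291, 5: 2.459, 6: 2.589, 7: 2.693},
}

def nemenyi_cd(k, n_datasets, alpha=0.05):
    """Critical difference CD = q_alpha * sqrt(k(k+1) / (6N))."""
    return Q_ALPHA[alpha][k] * sqrt(k * (k + 1) / (6 * n_datasets))

# Example: with 4 classifiers on 20 datasets, two of them perform
# differently at alpha = 0.05 if their average ranks differ by more than:
print(nemenyi_cd(4, 20))   # ~1.05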

Cost Evaluation Example

               Accuracy   Time (hours)   Memory (GB)
Classifier A      70%          100            20
Classifier B      80%           20            40

Which classifier is performing better?

RAM-Hours
RAM-Hour: every GB of RAM deployed for 1 hour.
Based on cloud computing rental cost options.

Cost Evaluation Example

               Accuracy   Time (hours)   Memory (GB)   RAM-Hours
Classifier A      70%          100            20          2,000
Classifier B      80%           20            40            800

Which classifier is performing better?
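The RAM-Hours column follows directly from the definition above (assuming time is measured in hours and memory in GB, which matches the numbers in the table); a one-line check:

def ram_hours(memory_gb, time_hours):
    # RAM-Hours: every GB of RAM deployed for 1 hour costs one RAM-Hour.
    return memory_gb * time_hours

print(ram_hours(20, 100))   # Classifier A: 2000 RAM-Hours
print(ram_hours(40, 20))    # Classifier B:  800 RAM-Hours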

Evaluation Framework
1. Error estimation: Hold-out or Prequential
2. Evaluation performance measures: Accuracy or κ-statistic
3. Statistical significance validation: McNemar or Nemenyi test
4. Resources needed: time and memory, or RAM-Hours