E. Alpaydın AERFAISS


Introduction
Questions:
- Is the error rate of my classifier less than 2%?
- Is k-NN more accurate than MLP?
- Does using PCA first improve accuracy?
- Which kernel leads to the highest accuracy with SVM?

Material
- Training/validation/test sets
- Resampling methods
- Comparing multiple algorithms on a single data set
- Comparison on multiple data sets

Algorithm Preference
Criteria (application-dependent):
- Misclassification error, or risk (loss functions)
- Training time/space complexity
- Testing time/space complexity
- Interpretability
- Easy programmability
- Cost-sensitive learning

Experiment Design: Factors and Response
Controllable factors:
- Learning algorithm
- Hyperparameters
- Input representation
Uncontrollable factors:
- Noise in data
- Randomness in splitting
- Randomness in optimization
Aim: arrive at conclusions not affected by chance, i.e., statistically significant.

Strategies of Experimentation
[Figure: strategies of experimentation over two factors; response surface design]

Basic Principles of Experimental Design
1. Randomization: Independence of results, unaffected by order.
2. Replication: Average over chance and uncontrollable factors (k-fold cv).
3. Blocking: Reduce or eliminate the variability due to nuisance factors: paired tests.

Guidelines for ML Experiments
A. Aim of the study: Compare hyperparameters, or two or more algorithms; single/multiple data sets.
B. Selection of the response variable: Accuracy, precision/recall, or loss function; cost-conscious framework.
C. Choice of factors and levels: What are the factors to be played with? What are the factor levels?

Guidelines (cont'd)
D. Choice of experimental design: Factorial design (grid search); how many replicates?
E. Performing the experiment: Unbiased experimentation, a separate tester; good code and documentation.
F. Statistical analysis of the data: Hypothesis testing; visualization of results: histograms, plots.
G. Conclusions and recommendations: Draw objective conclusions.

Splitting Data
The need for training, validation, and test sets:
- Training set: Optimize parameters
- Validation set: Optimize hyperparameters
- Test set: Measure generalization performance
Use the test data only once.

Resampling and K-Fold Cross-Validation
The need for multiple training/validation sets {T_i, V_i}_i: training/validation sets of fold i.
K-fold cross-validation: Divide X into K equal parts X_i, i = 1, ..., K; use V_i = X_i for validation and T_i = X \ X_i for training, so any two training sets share K − 2 of the K parts.
Stratification: preserve the class proportions in each fold.
For K = 3: T_1 = X_2 ∪ X_3, V_1 = X_1; T_2 = X_1 ∪ X_3, V_2 = X_2; T_3 = X_1 ∪ X_2, V_3 = X_3.
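As a minimal illustration (not part of the original slides), stratified K-fold resampling with scikit-learn; the data and names here are made up:

```python
# Stratified K-fold cross-validation: each fold keeps the class proportions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.randn(100, 5)           # toy data: 100 instances, 5 features
y = np.random.randint(0, 2, 100)      # binary labels

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for i, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # T_i and V_i of fold i; any two T_i share K-2 of the K parts
    T_i, V_i = X[train_idx], X[val_idx]
```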

5×2 Cross-Validation (Dietterich, 1998, Neural Computation)
Five replications of twofold cross-validation: in replication i, split the data into two halves X_1^(i) and X_2^(i), and use each half once for training and once for validation, giving ten pairs:
T_1 = X_1^(1), V_1 = X_2^(1); T_2 = X_2^(1), V_2 = X_1^(1); ...; T_9 = X_1^(5), V_9 = X_2^(5); T_10 = X_2^(5), V_10 = X_1^(5).
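A sketch of generating the ten 5×2 cv train/validation index pairs (illustrative; the function name is made up):

```python
# 5x2 cross-validation: 5 replications of a 2-fold split,
# giving 10 training/validation pairs.
import numpy as np

def five_by_two_folds(n_instances, seed=0):
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(5):
        perm = rng.permutation(n_instances)
        half1, half2 = perm[: n_instances // 2], perm[n_instances // 2 :]
        pairs.append((half1, half2))   # train on half 1, validate on half 2
        pairs.append((half2, half1))   # and vice versa
    return pairs
```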

Bootstrapping
Draw N instances from a dataset of size N with replacement.
The probability that we do not pick a given instance after N draws is
(1 − 1/N)^N ≈ e⁻¹ ≈ 0.368
that is, only 36.8% of the data is new (not in the bootstrap sample)!
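A quick numerical check of the 36.8% figure with numpy (illustrative, not from the slides):

```python
# Bootstrap resampling: draw N indices with replacement; on average about
# 63.2% of the original instances appear in the sample, 36.8% do not.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
sample = rng.integers(0, N, size=N)    # N draws with replacement
frac_seen = np.unique(sample).size / N
print(frac_seen)                        # close to 1 - 1/e = 0.632
```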

Making Decisions and Error
The classifier predicts + if P(+|x) > q and predicts − otherwise.

Measures of Performance
[Table: confusion matrix (tp, fp, fn, tn) and the measures derived from it; not transcribed]


Precision and Recall
[Figure: retrieved vs. relevant instances — fp: retrieved but not relevant; fn: relevant but not retrieved; tp: retrieved and relevant]
Precision = tp / (tp + fp); Recall = tp / (tp + fn).
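A tiny sketch of these two definitions in Python (illustrative counts):

```python
# Precision and recall from counts of true/false positives and negatives.
def precision_recall(tp, fp, fn):
    precision = tp / (tp + fp)   # fraction of retrieved that are relevant
    recall = tp / (tp + fn)      # fraction of relevant that are retrieved
    return precision, recall

print(precision_recall(tp=40, fp=10, fn=20))   # (0.8, 0.666...)
```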

ROC and Precision/Recall Curves
[Figure: ROC curve (tp-rate vs. fp-rate) and precision/recall curve, obtained by varying the decision threshold]
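A minimal sketch (assuming numpy; not from the slides) of how ROC points arise by sweeping the threshold q over the scores P(+|x):

```python
# ROC points: at each threshold, compute true- and false-positive rates.
import numpy as np

def roc_points(scores, labels):    # labels in {0, 1}
    tpr, fpr = [], []
    for q in np.sort(np.unique(scores))[::-1]:
        pred = scores >= q
        tpr.append((pred & (labels == 1)).sum() / (labels == 1).sum())
        fpr.append((pred & (labels == 0)).sum() / (labels == 0).sum())
    return np.array(fpr), np.array(tpr)
```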


Statistics Review: Sampling
X = {x^t}_t where x^t ~ N(μ, σ²).
The sample average m = Σ_t x^t / N is itself a random variable: m ~ N(μ, σ²/N).
Implication for model combination: Ulaş et al. (2009), Information Sciences.

Interval Estimation
√N (m − μ)/σ ~ Z
P{−1.96 < √N (m − μ)/σ < 1.96} = 0.95
P{m − 1.96 σ/√N < μ < m + 1.96 σ/√N} = 0.95
More generally, P{m − z_{α/2} σ/√N < μ < m + z_{α/2} σ/√N} = 1 − α
defines a two-sided 100(1 − α) percent confidence interval for μ.

One-sided interval:
P{√N (m − μ)/σ < 1.64} = 0.95
P{m − 1.64 σ/√N < μ} = 0.95
More generally, P{m − z_α σ/√N < μ} = 1 − α.

When σ² is not known:
S² = Σ_t (x^t − m)² / (N − 1)
√N (m − μ)/S ~ t_{N−1}
P{m − t_{α/2,N−1} S/√N < μ < m + t_{α/2,N−1} S/√N} = 1 − α
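A minimal sketch of this t-based interval with scipy (illustrative data):

```python
# 95% confidence interval for the mean when sigma is unknown:
# m +/- t_{alpha/2, N-1} * S / sqrt(N).
import numpy as np
from scipy import stats

x = np.array([0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15, 0.14, 0.13])
N, m, S = len(x), x.mean(), x.std(ddof=1)      # ddof=1: divide by N-1
t_crit = stats.t.ppf(1 - 0.05 / 2, df=N - 1)
lo, hi = m - t_crit * S / np.sqrt(N), m + t_crit * S / np.sqrt(N)
```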

Hypothesis Testing
Reject a null hypothesis if it is not supported by the sample with enough confidence.

Two-sided test: H_0: μ = μ_0 vs. H_1: μ ≠ μ_0
Accept H_0 with level of significance α if μ_0 is in the 100(1 − α) percent confidence interval, i.e., if
√N (m − μ_0)/σ ∈ (−z_{α/2}, z_{α/2})
The Type II error (accepting a false H_0) determines how large a sample is needed.

One-sided test: H_0: μ ≤ μ_0 vs. H_1: μ > μ_0
Accept H_0 if √N (m − μ_0)/σ ∈ (−∞, z_α)
Variance unknown: use t instead of z.
Accept H_0: μ = μ_0 if √N (m − μ_0)/S ∈ (−t_{α/2,N−1}, t_{α/2,N−1})
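A sketch of the two-sided z test as an accept/reject decision (the function name is made up):

```python
# Two-sided z test of H0: mu = mu0 at significance alpha: accept H0
# exactly when mu0 lies inside the 100(1 - alpha)% confidence interval.
import numpy as np
from scipy import stats

def two_sided_z_test(m, sigma, N, mu0, alpha=0.05):
    z = np.sqrt(N) * (m - mu0) / sigma
    z_crit = stats.norm.ppf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    return abs(z) < z_crit                   # True: accept H0
```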

Assessing Error: H_0: p ≤ p_0 vs. H_1: p > p_0
Single training/validation set: binomial test.
If the error probability is p_0, the probability of observing e errors or more out of N is
P{X ≥ e} = Σ_{x=e}^{N} C(N, x) p_0^x (1 − p_0)^(N−x)
Reject H_0 if this probability is less than α.
[Figure: binomial distribution of the number of errors, with the rejection region of size α in the right tail]
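A minimal sketch of the exact binomial test with scipy (the numbers are illustrative):

```python
# Exact binomial test of H0: p <= p0. Probability of e or more errors
# in N instances under p0; reject H0 if it falls below alpha.
from scipy import stats

N, e, p0, alpha = 100, 20, 0.15, 0.05
p_value = stats.binom.sf(e - 1, N, p0)   # P(X >= e) under Binomial(N, p0)
reject = p_value < alpha
```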

Normal Approximation to the Binomial
H_0: p ≤ p_0 vs. H_1: p > p_0
The number of errors e is approximately normal (CLT) with mean N p_0 and variance N p_0 (1 − p_0):
z = (e − N p_0) / √(N p_0 (1 − p_0)) ~ Z
Reject H_0 if z > z_α.
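The same test via the normal approximation (illustrative numbers as above):

```python
# Normal approximation: z = (e - N*p0) / sqrt(N*p0*(1 - p0)) ~ Z under H0.
import numpy as np
from scipy import stats

N, e, p0, alpha = 100, 20, 0.15, 0.05
z = (e - N * p0) / np.sqrt(N * p0 * (1 - p0))
reject = z > stats.norm.ppf(1 - alpha)   # one-sided: reject if z > z_alpha
```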

t Test
Multiple (K) training/validation sets.
x_i^t = 1 if instance t is misclassified on fold i, and 0 otherwise.
Error rate of fold i: p_i = Σ_t x_i^t / N
With m and S² the average and variance of the p_i, we reject the claim of p_0 or less error if
√K (m − p_0)/S ~ t_{K−1}
is greater than t_{α,K−1}.
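A sketch with scipy's one-sample t test (the `alternative` keyword assumes scipy >= 1.6; fold errors are made up):

```python
# One-sided t test of H0: p <= p0 on K per-fold error rates p_i.
import numpy as np
from scipy import stats

p = np.array([0.11, 0.14, 0.12, 0.15, 0.13, 0.12, 0.16, 0.13, 0.14, 0.12])
t_stat, p_value = stats.ttest_1samp(p, popmean=0.10, alternative='greater')
reject = p_value < 0.05   # reject the claim of 10% or less error
```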

Comparing Classifiers: H_0: μ_1 = μ_2 vs. H_1: μ_1 ≠ μ_2
Single training/validation set: McNemar's test.
With e_{01} the number of instances misclassified by classifier 1 but not by 2, and e_{10} vice versa, under H_0 we expect e_{01} = e_{10} = (e_{01} + e_{10})/2:
(|e_{01} − e_{10}| − 1)² / (e_{01} + e_{10}) ~ χ²_1
Accept H_0 if this statistic is less than χ²_{α,1}.
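A direct sketch of this statistic (illustrative counts; the function name is made up):

```python
# McNemar's test: e01, e10 = instances misclassified by one classifier
# but not the other; chi-square statistic with continuity correction.
from scipy import stats

def mcnemar(e01, e10, alpha=0.05):
    chi2 = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)
    return chi2 < stats.chi2.ppf(1 - alpha, df=1)   # True: accept H0

print(mcnemar(e01=25, e10=10))
```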

K-Fold CV Paired t Test
Use K-fold cv to get K training/validation folds.
p_i^1, p_i^2: errors of classifiers 1 and 2 on fold i; p_i = p_i^1 − p_i^2: paired difference on fold i.
The null hypothesis is that p_i has mean 0:
H_0: μ = 0 vs. H_1: μ ≠ 0
m = Σ_i p_i / K, S² = Σ_i (p_i − m)² / (K − 1)
Accept H_0 if √K m / S ~ t_{K−1} is in (−t_{α/2,K−1}, t_{α/2,K−1}).
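With scipy this is the paired t test on the per-fold errors (illustrative numbers):

```python
# K-fold cv paired t test on the per-fold errors of two classifiers.
import numpy as np
from scipy import stats

p1 = np.array([0.12, 0.15, 0.11, 0.14, 0.13, 0.12, 0.16, 0.13, 0.14, 0.12])
p2 = np.array([0.14, 0.16, 0.13, 0.15, 0.14, 0.13, 0.17, 0.15, 0.15, 0.14])
t_stat, p_value = stats.ttest_rel(p1, p2)   # tests mean(p1 - p2) = 0
accept_h0 = p_value > 0.05
```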

5×2 cv Paired t Test (Dietterich, 1998, Neural Computation)
Use 5×2 cv to get 2 folds of 5 training/validation replications.
p_i^(j): difference between the errors of classifiers 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5.
p̄_i = (p_i^(1) + p_i^(2))/2, s_i² = (p_i^(1) − p̄_i)² + (p_i^(2) − p̄_i)²
t = p_1^(1) / √(Σ_{i=1}^{5} s_i² / 5) ~ t_5
Two-sided test: accept H_0: μ_1 = μ_2 if t ∈ (−t_{α/2,5}, t_{α/2,5}).
One-sided test: accept H_0: μ_1 ≤ μ_2 if t < t_{α,5}.
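A sketch of this statistic, assuming the error differences are arranged in a 5×2 array (function name made up):

```python
# Dietterich's 5x2 cv paired t test from a 5x2 array of error
# differences p[i, j] (replication i, fold j).
import numpy as np
from scipy import stats

def paired_t_5x2(p, alpha=0.05):
    p_bar = p.mean(axis=1)                               # per-replication mean
    s2 = ((p - p_bar[:, None]) ** 2).sum(axis=1)         # per-replication var
    t_stat = p[0, 0] / np.sqrt(s2.mean())                # ~ t with 5 dof
    return abs(t_stat) < stats.t.ppf(1 - alpha / 2, 5)   # True: accept H0
```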

5×2 cv Paired F Test (Alpaydın, 1999, Neural Computation)
f = (Σ_{i=1}^{5} Σ_{j=1}^{2} (p_i^(j))²) / (2 Σ_{i=1}^{5} s_i²) ~ F_{10,5}
Two-sided test: reject H_0: μ_1 = μ_2 if f > F_{α,10,5}.
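The corresponding sketch on the same 5×2 array (function name made up):

```python
# Alpaydin's 5x2 cv paired F test on the 5x2 differences array.
import numpy as np
from scipy import stats

def paired_f_5x2(p, alpha=0.05):
    p_bar = p.mean(axis=1)
    s2 = ((p - p_bar[:, None]) ** 2).sum(axis=1)
    f_stat = (p ** 2).sum() / (2 * s2.sum())         # ~ F(10, 5) under H0
    return f_stat > stats.f.ppf(1 - alpha, 10, 5)    # True: reject H0
```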

Comparing L > 2 Algorithms: Analysis of Variance (ANOVA)
H_0: μ_1 = μ_2 = ... = μ_L
Errors of L algorithms on K folds: X_{ij} ~ N(μ_j, σ²), j = 1, ..., L, i = 1, ..., K.
We construct two estimators of σ². One is valid only if H_0 is true, the other is always valid. We reject H_0 if the two estimators disagree.

If H_0 is true, m_j = Σ_i X_{ij} / K ~ N(μ, σ²/K).
m = Σ_j m_j / L is an estimator of μ, and Σ_j (m_j − m)² / (L − 1) estimates σ²/K.
Thus one estimator of σ² is K Σ_j (m_j − m)² / (L − 1); namely, with the between-group sum of squares SSb ≡ K Σ_j (m_j − m)², we have
SSb/σ² ~ χ²_{L−1} when H_0 is true.

Regardless of H_0, our second estimator of σ² is the average of the group variances S_j²:
S_j² = Σ_i (X_{ij} − m_j)² / (K − 1), so σ̂² = Σ_j S_j² / L = SSw / (L(K − 1))
with the within-group sum of squares SSw ≡ Σ_j Σ_i (X_{ij} − m_j)², and SSw/σ² ~ χ²_{L(K−1)}.
(SSb/(L − 1)) / (SSw/(L(K − 1))) ~ F_{L−1, L(K−1)}
Reject H_0 if this statistic exceeds F_{α, L−1, L(K−1)}.

ANOVA Table
[Table: between-group, within-group, and total variation with their sums of squares, degrees of freedom, mean squares, and the F statistic]
If ANOVA rejects, we do pairwise posthoc tests:
H_0: μ_i = μ_j vs. H_1: μ_i ≠ μ_j
t = (m_i − m_j) / √(2 σ̂_w² / K) ~ t_{L(K−1)}
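A minimal one-way ANOVA sketch with scipy (per-fold errors are made up):

```python
# One-way ANOVA over the per-fold errors of L algorithms.
import numpy as np
from scipy import stats

errors = [np.array([0.12, 0.14, 0.13, 0.15, 0.11]),   # algorithm 1
          np.array([0.13, 0.15, 0.14, 0.16, 0.12]),   # algorithm 2
          np.array([0.18, 0.20, 0.19, 0.21, 0.17])]   # algorithm 3
f_stat, p_value = stats.f_oneway(*errors)
reject_h0 = p_value < 0.05   # if rejected, follow with pairwise posthoc tests
```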

More on Comparing Multiple Populations
Range tests: Newman-Keuls test.
Contrasts: Check if there is a significant difference between algorithms {1, 2} and {3, 4, 5}:
H_0: (μ_1 + μ_2)/2 = (μ_3 + μ_4 + μ_5)/3 vs. H_1: (μ_1 + μ_2)/2 ≠ (μ_3 + μ_4 + μ_5)/3

MultiTest: Comparison of L > 2 algorithms (Yıldız and Alpaydın, 2006, IEEE T PAMI)
Generate a full ordering using pairwise tests and a prior ordering:
- Order the algorithms in decreasing order of prior preference (e.g., based on complexity).
- Form a directed graph using pairwise one-sided tests, with i preferred over j.
- If a test rejects, add an edge from i to j, to show that j is to be preferred over i.

MultiTest: Pseudo-code
[Pseudo-code and worked examples of MultiTest were shown as figures; not transcribed]

Nonparametric Tests
If the normality assumption does not hold, it does not make sense to take or compare averages:
- Comparison of training times, memory needs, and so on
- Comparison over multiple data sets
We can use order and rank information instead.

Sign Test
Comparing two algorithms.
Sign test: Count how many times A beats B over N datasets, and check whether this could have happened by chance if A and B had the same error rate.
Wilcoxon signed rank test: also takes the magnitudes of the differences into account.
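A sketch of both tests with scipy (`binomtest` assumes scipy >= 1.7; the errors are made up):

```python
# Sign test: A beats B on `wins` of N datasets; under H0 wins ~ Binomial(N, 1/2).
# The Wilcoxon signed rank test also uses the sizes of the differences.
import numpy as np
from scipy import stats

err_A = np.array([0.10, 0.12, 0.09, 0.14, 0.11, 0.13, 0.10, 0.12])
err_B = np.array([0.12, 0.13, 0.10, 0.13, 0.13, 0.15, 0.12, 0.14])
wins = (err_A < err_B).sum()
p_sign = stats.binomtest(wins, n=len(err_A), p=0.5).pvalue
w_stat, p_wilcoxon = stats.wilcoxon(err_A, err_B)
```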

Kruskal-Wallis Test
Comparing multiple algorithms.
Kruskal-Wallis test: Calculate the average rank of all algorithms on M datasets, and check whether these could have occurred by chance if they all had equal error.
If KW rejects, we do pairwise posthoc tests, e.g., Tukey's test.
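A minimal sketch with scipy (errors per algorithm over M datasets are made up):

```python
# Kruskal-Wallis rank test over the errors of L algorithms on M datasets.
from scipy import stats

errs_1 = [0.10, 0.12, 0.09, 0.14, 0.11]
errs_2 = [0.12, 0.13, 0.10, 0.15, 0.13]
errs_3 = [0.16, 0.18, 0.15, 0.19, 0.17]
h_stat, p_value = stats.kruskal(errs_1, errs_2, errs_3)
reject_h0 = p_value < 0.05   # if rejected, do pairwise posthoc tests
```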

Critical Difference Diagrams (Demšar, 2006, JMLR)
Friedman's test followed by Nemenyi's posthoc test for pairwise comparisons.
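A sketch of the Friedman test plus the Nemenyi critical difference (the q value is the α = 0.05, L = 3 critical value as tabulated in Demšar 2006; data are made up):

```python
# Friedman test on the errors of L algorithms over M datasets, then the
# Nemenyi critical difference CD = q_alpha * sqrt(L*(L+1) / (6*M)).
import numpy as np
from scipy import stats

errors = np.array([[0.10, 0.12, 0.16],    # rows: datasets
                   [0.12, 0.13, 0.18],    # cols: algorithms
                   [0.09, 0.10, 0.15],
                   [0.14, 0.15, 0.19],
                   [0.11, 0.13, 0.17]])
chi2, p_value = stats.friedmanchisquare(*errors.T)
M, L = errors.shape
avg_ranks = stats.rankdata(errors, axis=1).mean(axis=0)
q_alpha = 2.343                    # Demsar (2006), Table 5: L = 3, alpha = 0.05
cd = q_alpha * np.sqrt(L * (L + 1) / (6 * M))
# average ranks differing by more than cd are significantly different
```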

Conclusions
"See first, think later, then test. But always see first. Otherwise you will only see what you were expecting." - Douglas Adams, So Long, and Thanks for All the Fish
Testing is not a separate step done after all runs are completed; the whole experimental process should be designed beforehand.

References
Alpaydın, E. 2010. Introduction to Machine Learning, 2nd edition, The MIT Press. (This presentation is based on Chapter 19 of this book.)
Demšar, J. 2006. "Statistical Comparison of Classifiers over Multiple Data Sets." Journal of Machine Learning Research 7: 1--30.
Dietterich, T. G. 1998. "Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms." Neural Computation 10: 1895--1923.
Fawcett, T. 2006. "An Introduction to ROC Analysis." Pattern Recognition Letters 27: 861--874.
Montgomery, D. C. 2005. Design and Analysis of Experiments. 6th ed., New York: Wiley.
Yıldız, O. T., and E. Alpaydın. 2006. "Ordering and Finding the Best of K > 2 Supervised Learning Algorithms." IEEE Transactions on Pattern Analysis and Machine Intelligence 28: 392--402.