Hypothesis Evaluation
|
|
- Anis Lindsay Young
- 5 years ago
- Views:
Transcription
1 Hypothesis Evaluation Machine Learning Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
2 Table of contents 1 Introduction 2 Some performance measures of classifiers 3 Evaluating the performance of a classifier 4 Estimating true error 5 Confidence intervals 6 Paired t Test Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
3 Outline 1 Introduction 2 Some performance measures of classifiers 3 Evaluating the performance of a classifier 4 Estimating true error 5 Confidence intervals 6 Paired t Test Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
4 Introduction 1 Machine Learning algorithms induce hypothesis that depend on the training set, and there is a need for statistical testing to Asses expected performance of a hypothesis and Compare expected Performances of two hypothesis to compare them. 2 Classifier evaluation criteria Accuracy (or Error) The ability of a hypothesis to correctly predict the label of new/previously unseen data. Speed The computational costs involved in generating and using a hypothesis. Robustness The ability of a hypothesis to make correct predictions given noisy data or data with missing values. Scalability The ability to construct a hypothesis efficiently given large amounts of data. Interpretability The level of understanding and insight that is provided by the hypothesis. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
5 Evaluating the accuracy/error of classifiers 1 Given the observed accuracy of a hypothesis over a limited sample data, how well does this estimate its accuracy over additional examples. 2 Given one hypothesis outperforms another over sample data, how probable is that this hypothesis is more accurate in general. 3 When data is limited what is the best way to use this data to learn a hypothesis and estimate its accuracy. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
6 Measuring performance of a classifier 1 Measuring performance of a hypothesis is partitioning data to Training set Validation set different from training set. Test set different from training and validation sets. 2 Problems with this approach Training and validation sets may be small and may contain exceptional instances such as noise, which may mislead us. The learning algorithm may depend on other random factors affecting the accuracy such as initial weights of a neural network trained with BP. We must train/test several times and average the results. 3 Important points Performance of a hypothesis estimated using training set conditioned on the used data set and cant used to compare algorithms in domain independent ways. Validation set is used for model selection, comparing two algorithms, and decide to stop learning. In order to report the expected performance, we should use a separate test set unused during learning. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
7 Error of a classifier Definition (Sample error) The sample error (denoted E E (h)) of hypothesis h with respect to target concept c and data sample S of size N is. Definition (True error) E E (h) = 1 I[c(x) h(x)] N x S The true error (denoted E(h)) of hypothesis h with respect to target concept c and distribution D is the probability that h will misclassify an instance drawn at random according to distribution D. E(h) = P x S [c(x) h(x)] Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
8 Notions of errors 1 True error is Instance space X c h 2 Our concern How we can estimate the true error (E(h)) of hypothesis using its sample error (E E (h))? Can we bound true error of h given sample error of h? Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
9 Outline 1 Introduction 2 Some performance measures of classifiers 3 Evaluating the performance of a classifier 4 Estimating true error 5 Confidence intervals 6 Paired t Test Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
10 Some performance measures of classifiers 1 Error rate The error rate is the fraction of incorrect predictions for the classifier over the testing set, defined as E E (h) = 1 I[c(x) h(x)] N x S Error rate is an estimate of the probability of misclassification. 2 Accuracy The accuracy of a classifier is the fraction of correct predictions over the testing set: Accuracy(h) = 1 I[c(x) = h(x)] = 1 E E (h) N x S Accuracy gives an estimate of the probability of a correct prediction. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
11 Some performance measures of classifiers 1 What you can say about the accuracy of 90% or the error of 10%? 2 For example, if 3 4% of examples are from negative class, clearly accuracy of 90% is not acceptable. 3 Confusion matrix Actual label (+) ( ) Predicted label (+) ( ) TP FN FP TN 4 Given C classes, a confusion matrix is a table of C C. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
12 Some performance measures of classifiers 1 Precision (Positive predictive value) Precision is proportion of predicted positives which are actual positive and defined as Precision(h) = TP TP + FP 2 Recall (Sensitivity) Recall is proportion of actual positives which are predicted positive and defined as Recall(h) = TP TP + FN 3 Specificity Specificity is proportion of actual negative which are predicted negative and defined as Specificity(h) = TN TN + FP Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
13 Some performance measures of classifiers 1 Balanced classification rate (BCR) Balanced classification rate provides an average of recall (sensitivity) and specificity, it gives a more precise picture of classifier effectiveness. Balanced classification rate defined as BCR(h) = 1 [Specificity(h) + Recall(h)] 2 2 F-measure F-measure is harmonic mean between precision and recall and defined as F Measure(h) = 2 Persicion(h) Recall(h) Persicion(h) + Recall(h) Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
14 Outline 1 Introduction 2 Some performance measures of classifiers 3 Evaluating the performance of a classifier 4 Estimating true error 5 Confidence intervals 6 Paired t Test Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
15 Evaluating the performance of a classifier 1 Hold-out method 2 Random Sub-sampling 3 Cross validation method 4 Leave-one-out method Cross validation method 6 Bootstrapping method Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
16 4 Hold-out method 1 Hold-out Hold-out partitions the given data into two independent sets : training and test sets. The holdout method Split dataset into two groups Training set: used to train the classifier Test set: used to estimate the error rate of the trained classifier Total number of examples Training Set Test Set A typical application the holdout method is determining a stopping point for the back propagation error Typically two-thirds of the data are allocated to the training set and the remaining one-third is allocated to the test set. The training set is used to drive the model. The test set is used to estimate the accuracy of the model. MSE Test set error Stopping point Training set error Epochs Intelligent Sensor Systems Ricardo Gutierrez-Osuna Wright State University Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
17 6 Random sub-sampling method 1 Random sub-sampling Random sub-sampling is a variation of the hold-out method in which hold-out is repeated k times. Random Subsampling Random Subsampling performs K data splits of the dataset Each split randomly selects a (fixed) no. examples without replacement For each data split we retrain the classifier from scratch with the training examples and estimate E i with the test examples Total number of examples Test example Experiment 1 Experiment 2 Experiment 3 The true error estimate is obtained as the average of the separate estimates E i This estimate is significantly better than the holdout estimate i K 1 The estimated error rate is the average of the error rates for classifiers derived for the independently and randomly generated test partitions. Random subsampling can produce better error estimates than a single train-and-test partition. = 1 E i = E K Intelligent Sensor Systems Ricardo Gutierrez-Osuna Wright State University Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
18 7 Cross validation method 1 K-fold cross validation The initial data are randomly partitioned into K mutually exclusive subsets or folds, S 1, S 2,..., S K, each of approximately equal size.. K-Fold Cross-validation Create a K-fold partition of the the dataset For each of K experiments, use K-1 folds for training and the remaining one for testing Total number of examples Experiment 1 Experiment 2 Experiment 3 Test examples Experiment 4 K-Fold Cross validation is similar to Random Subsampling The advantage of K-Fold Cross validation is that all the examples in the i K 1 Training and testing is performed K times. In iteration k, partition S k is used for test and the remaining partitions collectively used for training. The accuracy is the percentage of the total number of correctly classified test examples. The advantage of K-fold cross validation is that all the examples in the Hamid Beigy (Sharifdataset University of are Technology) eventually used Hypothesis for Evaluation both training and testing. Fall / 31 dataset are eventually used for both training and testing As before, the true error is estimated as the average error rate E i = 1 = E K Intelligent Sensor Systems Ricardo Gutierrez-Osuna Wright State University
19 8 Leave-one-out method i 1 1 Leave-one-out Leave-one-out is a special case of K-fold cross validation where K is set to number of examples in dataset. Leave-one-out Cross Validation Leave-one-out is the degenerate case of K-Fold Cross Validation, where K is chosen as the total number of examples For a dataset with N examples, perform N experiments For each experiment use N-1 examples for training and the remaining example for testing Total number of examples Experiment 1 Experiment 2 Experiment 3 Single test example For a dataset with N examples, perform N experiments. For each experiment use N 1 examples for training and the remaining example for testing Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31 Experiment N As usual, the true error is estimated as the average error rate on test examples N E i = = E 1 N Intelligent Sensor Systems Ricardo Gutierrez-Osuna Wright State University
20 5 2 Cross validation method p1 s 2 1 p2 s 2 2 p3 s Cross validation method repeates five times 2-fold cross validation method. 2 Training and testing is performed 10 times. 3 The estimated error rate is the average of the error rates for classifiers derived for the independently and randomly generated test partitions. 5x2cv F test p (1,1) p (1,1) p (1) A B 1 p4 s 2 4 p (1,2) p (1,2) p (2) A B 1 (2,1) (2,1) (1) p (2,1) A p (2,1) B p (1) 2 p5 s 2 5 (2,2) (2,2) (2) p (2,2) A p (2,2) B p (2) 2 (3,1) (3,1) (1) p (3,1) A p (3,1) B p (1) 3 (3,2) (3,2) (2) p (3,2) A p (3,2) B p (2) 3 p (1) 4 p (4,1) B p (4,1) A (4,2) (4,2) (2) p (4,2) A p (4,2) B p (2) 4 p (5,1) p (5,1) p (1) A B 5 p (5,2) p (5,2) p (2) A B 5 Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
21 Bootstrapping method 1 Bootstrapping The bootstrap uses sampling with replacement to form the training set. Sample a dataset of N instances N times with replacement to form a new dataset of N instances. Use this data as the training set. An instance may occur more than once in the training set. Use the instances from the original dataset that dont occur in the new training set for testing. This method traines classifieron just on 63% of the instances. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
22 Outline 1 Introduction 2 Some performance measures of classifiers 3 Evaluating the performance of a classifier 4 Estimating true error 5 Confidence intervals 6 Paired t Test Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
23 Estimating true error 1 How well does E E (h) estimate E(h)? Bias in the estimate If training / test set is small, then the accuracy of the resulting hypothesis is a poor estimator of its accuracy over future examples. bias = E[E E (h)] E(h). For unbiased estimate, h and S must be chosen independently. Variance in the estimate Even with unbiased S, E E (h) may still vary from E(h). The smaller test set results in a greater expected variance. 2 Hypothesis h misclassifies 12 of the 40 examples in S. What is E(h)? 3 We use the following experiment Choose sample S of size N according to distribution D. Measure E E (h) E E (h) is a random variable (i.e., result of an experiment). E E (h) is an unbiased estimator for E(h). 4 Given observed E E (h), what can we conclude about E(h)? Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
24 Distribution of error 1 E E (h) is a random variable with binomial distribution, for the experiment with different randomly drawn S of size N, the probability of observing r misclassified examples is N! P(r) = r!(n r)! E(h)r [1 E(h)] N r 2 For example for N = 40 and E(h) = p = 0.2, Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
25 Distribution of error 1 For binomial distribution, we have E(r) = Np Var(r) = Np(1 p) We have shown that the random variable E E (h) obeys a Binomial distribution. 2 Let p be the probability that the result of trial is a success. 3 The E E (h) and E(h) are E E (h) = r N E(h) = p where N is the number of instances in the sample S, r is the number of instances from S misclassified by h, and p is the probability of misclassifying a single instance drawn from D. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
26 Distribution of error 1 It can be shown that E E (h) is unbiased estimator for E(h). 2 Since r is Binomially distributed, its variance is Np(1 p). 3 Unfortunately p is unknown, but we can substitute our estimate r N for p. 4 In general, given r errors in a sample of N independently drawn test examples, the standard deviation for E E (h) is given by Var [E E (h)] = p(1 p) N EE (h)(1 E E (h)) N Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
27 Outline 1 Introduction 2 Some performance measures of classifiers 3 Evaluating the performance of a classifier 4 Estimating true error 5 Confidence intervals 6 Paired t Test Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
28 Confidence intervals 1 One common way to describe the uncertainty associated with an estimate is to give an interval within which the true value is expected to fall, along with the probability with which it is expected to fall into this interval. 2 How can we derive confidence intervals for E E (h)? 3 For a given value of M, how can we find the size of the interval that contains M% of the probability mass? 4 Unfortunately, for the Binomial distribution this calculation can be quite tedious. 5 Fortunately, however, an easily calculated and very good approximation can be found in most cases, based on the fact that for sufficiently large sample sizes the Binomial distribution can be closely approximated by the Normal distribution. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
29 Confidence intervals 1 In Normal Distribution N(µ, σ 2 ), M% of area (probability) lies in µ ± z M σ, where M% 50% 68% 80% 90% 95% 98% 99% z M y σ x Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
30 Comparing two hypotheses Test h 1 on sample S 1 and test h 2 on S 2. 1 Pick parameter to estimate d = E(h 1 ) E(h 2 ). 2 Choose an estimator ˆd = E E (h 1 ) E E (h 2 ). 3 Determine probability distribution that governs estimator: 4 E E (h 1 ) and E E (h 2 ) can be approximated by Normal Distribution. 5 Difference of two Normal distributions is also a Normal distribution, ˆd will also approximated by a Normal distribution with mean d and variance of this distribution is equal to sum of variances of E E (h 1 ) and E E (h 2 ). Hence, we have E E (h 1 )(1 E E (h 1 )) Var[ ˆd] = + E E (h 2 )(1 E E (h 2 )) N 1 N 2 6 Find interval (L, U) such that M% of probability mass falls in interval E E (h 1 )(1 E E (h 1 )) ˆd ± z M + E E (h 2 )(1 E E (h 2 )) N 1 N 2 Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
31 Comparing two algorithms For comparing two learning algorithms L A and L B, we would like to estimate E S D [E(L A (S)) E(L B (S))] where L(S) is the hypothesis output by learner L using training set S. This shows the expected difference in true error between hypotheses output by learners L A and L B, when trained using randomly selected training sets S drawn according to distribution D. But, given limited data S 0, what is a good estimator? 1 Could partition S 0 into training set S tr 0 ˆd = E S ts 0 E and test set S ts 0 (L A(h 1 )) E Sts 0 E (L B(h 2 )). where h1 and h 2 are trained using training set S0 tr error using test set S ts 0. and E S ts 0 E 2 Even better, repeat this many times and average the results., and measure is emprical Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
32 Outline 1 Introduction 2 Some performance measures of classifiers 3 Evaluating the performance of a classifier 4 Estimating true error 5 Confidence intervals 6 Paired t Test Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
33 Paired t Test consider the following estimation problem 1 We are given the observed values of a set of independent, identically distributed random variables Y 1, Y 2,..., Y K. 2 We wish to estimate the mean p of the probability distribution governing these Y i. 3 The estimator we will use is the sample mean Ȳ = 1 K K k=1 Y k ithe task is to estimate the sample mean of a collection of independent, identically and Normally distributed random variables. The approximate M% confidence interval for estimating Ȳ is given by where SȲ = Ȳ ± +t M,K 1 SȲ 1 K (Y i K(K 1 Ȳ )2 k=1 Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
34 Paired t Test Values alues of t M,K 1 for two-sided confidence intervals. M 90% 95% 98% 99% K 1 = K 1 = K 1 = As K, t M,K 1 approaches z M. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
35 ROC Curves 1 ROC puts false positive rate (FPR = FP/NEG) on x axis. 2 ROC puts true positive rate (TPR = TP/POS) on y axis. 3 Each classifier represented by a point in ROC space corresponding to its (FPR, TPR) pair TP rate FP rate Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
36 Practical aspects 1 A note on parameter tuning Some learning schemes operate in two stages: Stage 1: builds the basic structure Stage 2: optimizes parameter settings It is important that the test data is not used in any way to create the classifier The test data cant be used for parameter tuning! Proper procedure uses three sets: training data, validation data, and test data Validation data is used to optimize parameters 2 No Free Lunch Theorem For any ML algorithm there exist data sets on which it performs well and there exist data sets on which it performs badly! We hope that the latter sets do not occur too often in real life. Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
37 References 1 C. Schaffer, Selecting a Classification Method by cross-validation, Machine Learning, vol. 13, pp , E. Alpaydin, Combined 5x2 CVF Test for Supervised Classification Learning Algrrithm, Neural Computation, Vol. 11, , C. Schaffer, Multiple Comparisons in Induction Algorithms, Machine Learning, vol. 38, pp , Tom Fawcett, An introduction to ROC analysis, Pattern Recognition Letters 27 (2006) Hamid Beigy (Sharif University of Technology) Hypothesis Evaluation Fall / 31
Performance Evaluation and Hypothesis Testing
Performance Evaluation and Hypothesis Testing 1 Motivation Evaluating the performance of learning systems is important because: Learning systems are usually designed to predict the class of future unlabeled
More informationEvaluating Hypotheses
Evaluating Hypotheses IEEE Expert, October 1996 1 Evaluating Hypotheses Sample error, true error Confidence intervals for observed hypothesis error Estimators Binomial distribution, Normal distribution,
More informationStephen Scott.
1 / 35 (Adapted from Ethem Alpaydin and Tom Mitchell) sscott@cse.unl.edu In Homework 1, you are (supposedly) 1 Choosing a data set 2 Extracting a test set of size > 30 3 Building a tree on the training
More informationPerformance Evaluation and Comparison
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Cross Validation and Resampling 3 Interval Estimation
More informationCptS 570 Machine Learning School of EECS Washington State University. CptS Machine Learning 1
CptS 570 Machine Learning School of EECS Washington State University CptS 570 - Machine Learning 1 IEEE Expert, October 1996 CptS 570 - Machine Learning 2 Given sample S from all possible examples D Learner
More informationData Mining: Concepts and Techniques. (3 rd ed.) Chapter 8. Chapter 8. Classification: Basic Concepts
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationSmart Home Health Analytics Information Systems University of Maryland Baltimore County
Smart Home Health Analytics Information Systems University of Maryland Baltimore County 1 IEEE Expert, October 1996 2 Given sample S from all possible examples D Learner L learns hypothesis h based on
More informationIntroduction to Supervised Learning. Performance Evaluation
Introduction to Supervised Learning Performance Evaluation Marcelo S. Lauretto Escola de Artes, Ciências e Humanidades, Universidade de São Paulo marcelolauretto@usp.br Lima - Peru Performance Evaluation
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationEvaluating Classifiers. Lecture 2 Instructor: Max Welling
Evaluating Classifiers Lecture 2 Instructor: Max Welling Evaluation of Results How do you report classification error? How certain are you about the error you claim? How do you compare two algorithms?
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationRegularization. CSCE 970 Lecture 3: Regularization. Stephen Scott and Vinod Variyam. Introduction. Outline
Other Measures 1 / 52 sscott@cse.unl.edu learning can generally be distilled to an optimization problem Choose a classifier (function, hypothesis) from a set of functions that minimizes an objective function
More informationLecture Slides for INTRODUCTION TO. Machine Learning. ETHEM ALPAYDIN The MIT Press,
Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN The MIT Press, 2004 alpaydin@boun.edu.tr http://www.cmpe.boun.edu.tr/~ethem/i2ml CHAPTER 14: Assessing and Comparing Classification Algorithms
More informationCSC314 / CSC763 Introduction to Machine Learning
CSC314 / CSC763 Introduction to Machine Learning COMSATS Institute of Information Technology Dr. Adeel Nawab More on Evaluating Hypotheses/Learning Algorithms Lecture Outline: Review of Confidence Intervals
More informationEstimating the accuracy of a hypothesis Setting. Assume a binary classification setting
Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier
More informationEvaluation & Credibility Issues
Evaluation & Credibility Issues What measure should we use? accuracy might not be enough. How reliable are the predicted results? How much should we believe in what was learned? Error on the training data
More information[Read Ch. 5] [Recommended exercises: 5.2, 5.3, 5.4]
Evaluating Hypotheses [Read Ch. 5] [Recommended exercises: 5.2, 5.3, 5.4] Sample error, true error Condence intervals for observed hypothesis error Estimators Binomial distribution, Normal distribution,
More informationQuestion of the Day. Machine Learning 2D1431. Decision Tree for PlayTennis. Outline. Lecture 4: Decision Tree Learning
Question of the Day Machine Learning 2D1431 How can you make the following equation true by drawing only one straight line? 5 + 5 + 5 = 550 Lecture 4: Decision Tree Learning Outline Decision Tree for PlayTennis
More informationSUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION
SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology
More informationCS 543 Page 1 John E. Boon, Jr.
CS 543 Machine Learning Spring 2010 Lecture 05 Evaluating Hypotheses I. Overview A. Given observed accuracy of a hypothesis over a limited sample of data, how well does this estimate its accuracy over
More informationLearning with multiple models. Boosting.
CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models
More informationCHAPTER EVALUATING HYPOTHESES 5.1 MOTIVATION
CHAPTER EVALUATING HYPOTHESES Empirically evaluating the accuracy of hypotheses is fundamental to machine learning. This chapter presents an introduction to statistical methods for estimating hypothesis
More informationEnsemble Methods. NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan
Ensemble Methods NLP ML Web! Fall 2013! Andrew Rosenberg! TA/Grader: David Guy Brizan How do you make a decision? What do you want for lunch today?! What did you have last night?! What are your favorite
More informationDiagnostics. Gad Kimmel
Diagnostics Gad Kimmel Outline Introduction. Bootstrap method. Cross validation. ROC plot. Introduction Motivation Estimating properties of an estimator. Given data samples say the average. x 1, x 2,...,
More informationPerformance Evaluation
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Example:
More informationNotes on Machine Learning for and
Notes on Machine Learning for 16.410 and 16.413 (Notes adapted from Tom Mitchell and Andrew Moore.) Learning = improving with experience Improve over task T (e.g, Classification, control tasks) with respect
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More informationVC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms
03/Feb/2010 VC dimension, Model Selection and Performance Assessment for SVM and Other Machine Learning Algorithms Presented by Andriy Temko Department of Electrical and Electronic Engineering Page 2 of
More informationHow do we compare the relative performance among competing models?
How do we compare the relative performance among competing models? 1 Comparing Data Mining Methods Frequent problem: we want to know which of the two learning techniques is better How to reliably say Model
More informationPointwise Exact Bootstrap Distributions of Cost Curves
Pointwise Exact Bootstrap Distributions of Cost Curves Charles Dugas and David Gadoury University of Montréal 25th ICML Helsinki July 2008 Dugas, Gadoury (U Montréal) Cost curves July 8, 2008 1 / 24 Outline
More informationEmpirical Evaluation (Ch 5)
Empirical Evaluation (Ch 5) how accurate is a hypothesis/model/dec.tree? given 2 hypotheses, which is better? accuracy on training set is biased error: error train (h) = #misclassifications/ S train error
More informationCross Validation & Ensembling
Cross Validation & Ensembling Shan-Hung Wu shwu@cs.nthu.edu.tw Department of Computer Science, National Tsing Hua University, Taiwan Machine Learning Shan-Hung Wu (CS, NTHU) CV & Ensembling Machine Learning
More informationMachine Learning. Ensemble Methods. Manfred Huber
Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected
More informationDECISION TREE LEARNING. [read Chapter 3] [recommended exercises 3.1, 3.4]
1 DECISION TREE LEARNING [read Chapter 3] [recommended exercises 3.1, 3.4] Decision tree representation ID3 learning algorithm Entropy, Information gain Overfitting Decision Tree 2 Representation: Tree-structured
More informationEmpirical Risk Minimization, Model Selection, and Model Assessment
Empirical Risk Minimization, Model Selection, and Model Assessment CS6780 Advanced Machine Learning Spring 2015 Thorsten Joachims Cornell University Reading: Murphy 5.7-5.7.2.4, 6.5-6.5.3.1 Dietterich,
More informationCOMP9444: Neural Networks. Vapnik Chervonenkis Dimension, PAC Learning and Structural Risk Minimization
: Neural Networks Vapnik Chervonenkis Dimension, PAC Learning and Structural Risk Minimization 11s2 VC-dimension and PAC-learning 1 How good a classifier does a learner produce? Training error is the precentage
More informationAn Introduction to Statistical Machine Learning - Theoretical Aspects -
An Introduction to Statistical Machine Learning - Theoretical Aspects - Samy Bengio bengio@idiap.ch Dalle Molle Institute for Perceptual Artificial Intelligence (IDIAP) CP 592, rue du Simplon 4 1920 Martigny,
More informationMachine Learning. Lecture Slides for. ETHEM ALPAYDIN The MIT Press, h1p://www.cmpe.boun.edu.
Lecture Slides for INTRODUCTION TO Machine Learning ETHEM ALPAYDIN The MIT Press, 2010 alpaydin@boun.edu.tr h1p://www.cmpe.boun.edu.tr/~ethem/i2ml2e CHAPTER 19: Design and Analysis of Machine Learning
More informationMachine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring /
Machine Learning Ensemble Learning I Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi Spring 2015 http://ce.sharif.edu/courses/93-94/2/ce717-1 / Agenda Combining Classifiers Empirical view Theoretical
More informationChapter ML:II (continued)
Chapter ML:II (continued) II. Machine Learning Basics Regression Concept Learning: Search in Hypothesis Space Concept Learning: Search in Version Space Measuring Performance ML:II-96 Basics STEIN/LETTMANN
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationE. Alpaydın AERFAISS
E. Alpaydın AERFAISS 00 Introduction Questions: Is the error rate of y classifier less than %? Is k-nn ore accurate than MLP? Does having PCA before iprove accuracy? Which kernel leads to highest accuracy
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationResampling techniques for statistical modeling
Resampling techniques for statistical modeling Gianluca Bontempi Département d Informatique Boulevard de Triomphe - CP 212 http://www.ulb.ac.be/di Resampling techniques p.1/33 Beyond the empirical error
More informationHoldout and Cross-Validation Methods Overfitting Avoidance
Holdout and Cross-Validation Methods Overfitting Avoidance Decision Trees Reduce error pruning Cost-complexity pruning Neural Networks Early stopping Adjusting Regularizers via Cross-Validation Nearest
More informationData Mining und Maschinelles Lernen
Data Mining und Maschinelles Lernen Ensemble Methods Bias-Variance Trade-off Basic Idea of Ensembles Bagging Basic Algorithm Bagging with Costs Randomization Random Forests Boosting Stacking Error-Correcting
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationPerformance evaluation of binary classifiers
Performance evaluation of binary classifiers Kevin P. Murphy Last updated October 10, 2007 1 ROC curves We frequently design systems to detect events of interest, such as diseases in patients, faces in
More informationWhen Dictionary Learning Meets Classification
When Dictionary Learning Meets Classification Bufford, Teresa 1 Chen, Yuxin 2 Horning, Mitchell 3 Shee, Liberty 1 Mentor: Professor Yohann Tendero 1 UCLA 2 Dalhousie University 3 Harvey Mudd College August
More informationCS7267 MACHINE LEARNING
CS7267 MACHINE LEARNING ENSEMBLE LEARNING Ref: Dr. Ricardo Gutierrez-Osuna at TAMU, and Aarti Singh at CMU Mingon Kang, Ph.D. Computer Science, Kennesaw State University Definition of Ensemble Learning
More informationClassifier performance evaluation
Classifier performance evaluation Václav Hlaváč Czech Technical University in Prague Czech Institute of Informatics, Robotics and Cybernetics 166 36 Prague 6, Jugoslávských partyzánu 1580/3, Czech Republic
More informationBias-Variance in Machine Learning
Bias-Variance in Machine Learning Bias-Variance: Outline Underfitting/overfitting: Why are complex hypotheses bad? Simple example of bias/variance Error as bias+variance for regression brief comments on
More informationClass 4: Classification. Quaid Morris February 11 th, 2011 ML4Bio
Class 4: Classification Quaid Morris February 11 th, 211 ML4Bio Overview Basic concepts in classification: overfitting, cross-validation, evaluation. Linear Discriminant Analysis and Quadratic Discriminant
More informationPerformance Evaluation
Performance Evaluation Confusion Matrix: Detected Positive Negative Actual Positive A: True Positive B: False Negative Negative C: False Positive D: True Negative Recall or Sensitivity or True Positive
More informationStats notes Chapter 5 of Data Mining From Witten and Frank
Stats notes Chapter 5 of Data Mining From Witten and Frank 5 Credibility: Evaluating what s been learned Issues: training, testing, tuning Predicting performance: confidence limits Holdout, cross-validation,
More informationLecture 4 Discriminant Analysis, k-nearest Neighbors
Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se
More informationModel Accuracy Measures
Model Accuracy Measures Master in Bioinformatics UPF 2017-2018 Eduardo Eyras Computational Genomics Pompeu Fabra University - ICREA Barcelona, Spain Variables What we can measure (attributes) Hypotheses
More informationComputational Learning Theory
Computational Learning Theory [read Chapter 7] [Suggested exercises: 7.1, 7.2, 7.5, 7.8] Computational learning theory Setting 1: learner poses queries to teacher Setting 2: teacher chooses examples Setting
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: August 30, 2018, 14.00 19.00 RESPONSIBLE TEACHER: Niklas Wahlström NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationLecture 3: Decision Trees
Lecture 3: Decision Trees Cognitive Systems - Machine Learning Part I: Basic Approaches of Concept Learning ID3, Information Gain, Overfitting, Pruning last change November 26, 2014 Ute Schmid (CogSys,
More informationEvaluation Metrics for Intrusion Detection Systems - A Study
Evaluation Metrics for Intrusion Detection Systems - A Study Gulshan Kumar Assistant Professor, Shaheed Bhagat Singh State Technical Campus, Ferozepur (Punjab)-India 152004 Email: gulshanahuja@gmail.com
More information.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. for each element of the dataset we are given its class label.
.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Definitions Data. Consider a set A = {A 1,...,A n } of attributes, and an additional
More information15-388/688 - Practical Data Science: Nonlinear modeling, cross-validation, regularization, and evaluation
15-388/688 - Practical Data Science: Nonlinear modeling, cross-validation, regularization, and evaluation J. Zico Kolter Carnegie Mellon University Fall 2016 1 Outline Example: return to peak demand prediction
More informationAdaBoost. S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology
AdaBoost S. Sumitra Department of Mathematics Indian Institute of Space Science and Technology 1 Introduction In this chapter, we are considering AdaBoost algorithm for the two class classification problem.
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature
More informationData Mining. Supervised Learning. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Supervised Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 15 Table of contents 1 Introduction 2 Supervised
More informationBagging and Other Ensemble Methods
Bagging and Other Ensemble Methods Sargur N. Srihari srihari@buffalo.edu 1 Regularization Strategies 1. Parameter Norm Penalties 2. Norm Penalties as Constrained Optimization 3. Regularization and Underconstrained
More informationLinear Classifiers as Pattern Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2014/2015 Lesson 16 8 April 2015 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear
More informationLecture 3. STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher
Lecture 3 STAT161/261 Introduction to Pattern Recognition and Machine Learning Spring 2018 Prof. Allie Fletcher Previous lectures What is machine learning? Objectives of machine learning Supervised and
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationData Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur
Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture 21 K - Nearest Neighbor V In this lecture we discuss; how do we evaluate the
More informationCOMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d)
COMP 551 Applied Machine Learning Lecture 3: Linear regression (cont d) Instructor: Herke van Hoof (herke.vanhoof@mail.mcgill.ca) Slides mostly by: Class web page: www.cs.mcgill.ca/~hvanho2/comp551 Unless
More informationLeast Squares Classification
Least Squares Classification Stephen Boyd EE103 Stanford University November 4, 2017 Outline Classification Least squares classification Multi-class classifiers Classification 2 Classification data fitting
More informationModel Averaging With Holdout Estimation of the Posterior Distribution
Model Averaging With Holdout stimation of the Posterior Distribution Alexandre Lacoste alexandre.lacoste.1@ulaval.ca François Laviolette francois.laviolette@ift.ulaval.ca Mario Marchand mario.marchand@ift.ulaval.ca
More informationComputational Learning Theory
09s1: COMP9417 Machine Learning and Data Mining Computational Learning Theory May 20, 2009 Acknowledgement: Material derived from slides for the book Machine Learning, Tom M. Mitchell, McGraw-Hill, 1997
More informationMODULE -4 BAYEIAN LEARNING
MODULE -4 BAYEIAN LEARNING CONTENT Introduction Bayes theorem Bayes theorem and concept learning Maximum likelihood and Least Squared Error Hypothesis Maximum likelihood Hypotheses for predicting probabilities
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and
More informationClassification using stochastic ensembles
July 31, 2014 Topics Introduction Topics Classification Application and classfication Classification and Regression Trees Stochastic ensemble methods Our application: USAID Poverty Assessment Tools Topics
More informationSociology 6Z03 Review II
Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability
More informationLinear Classifiers as Pattern Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2013/2014 Lesson 18 23 April 2014 Contents Linear Classifiers as Pattern Detectors Notation...2 Linear
More informationComputational Learning Theory
Computational Learning Theory Sinh Hoa Nguyen, Hung Son Nguyen Polish-Japanese Institute of Information Technology Institute of Mathematics, Warsaw University February 14, 2006 inh Hoa Nguyen, Hung Son
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University October 11, 2012 Today: Computational Learning Theory Probably Approximately Coorrect (PAC) learning theorem
More informationPerformance Evaluation
Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,
More informationComputational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory
More informationDecision Tree Learning Lecture 2
Machine Learning Coms-4771 Decision Tree Learning Lecture 2 January 28, 2008 Two Types of Supervised Learning Problems (recap) Feature (input) space X, label (output) space Y. Unknown distribution D over
More informationCS 6375 Machine Learning
CS 6375 Machine Learning Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Course Info. Instructor: Nicholas Ruozzi Office: ECSS 3.409 Office hours: Tues.
More informationCSE 151 Machine Learning. Instructor: Kamalika Chaudhuri
CSE 151 Machine Learning Instructor: Kamalika Chaudhuri Ensemble Learning How to combine multiple classifiers into a single one Works well if the classifiers are complementary This class: two types of
More informationIntroduction to Machine Learning
Outline Contents Introduction to Machine Learning Concept Learning Varun Chandola February 2, 2018 1 Concept Learning 1 1.1 Example Finding Malignant Tumors............. 2 1.2 Notation..............................
More informationComputational Learning Theory
CS 446 Machine Learning Fall 2016 OCT 11, 2016 Computational Learning Theory Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes 1 PAC Learning We want to develop a theory to relate the probability of successful
More informationLearning Methods for Linear Detectors
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIMAG 2 / MoSIG M1 Second Semester 2011/2012 Lesson 20 27 April 2012 Contents Learning Methods for Linear Detectors Learning Linear Detectors...2
More informationData Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods.
TheThalesians Itiseasyforphilosopherstoberichiftheychoose Data Analysis and Machine Learning Lecture 12: Multicollinearity, Bias-Variance Trade-off, Cross-validation and Shrinkage Methods Ivan Zhdankin
More informationComputational Learning Theory
0. Computational Learning Theory Based on Machine Learning, T. Mitchell, McGRAW Hill, 1997, ch. 7 Acknowledgement: The present slides are an adaptation of slides drawn by T. Mitchell 1. Main Questions
More informationArea Under the Precision-Recall Curve: Point Estimates and Confidence Intervals
Area Under the Precision-Recall Curve: Point Estimates and Confidence Intervals Kendrick Boyd 1 Kevin H. Eng 2 C. David Page 1 1 University of Wisconsin-Madison, Madison, WI 2 Roswell Park Cancer Institute,
More informationMachine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Computational Learning Theory Le Song Lecture 11, September 20, 2012 Based on Slides from Eric Xing, CMU Reading: Chap. 7 T.M book 1 Complexity of Learning
More informationData Mining. Chapter 5. Credibility: Evaluating What s Been Learned
Data Mining Chapter 5. Credibility: Evaluating What s Been Learned 1 Evaluating how different methods work Evaluation Large training set: no problem Quality data is scarce. Oil slicks: a skilled & labor-intensive
More informationAlgorithm-Independent Learning Issues
Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning
More informationMachine Learning Concepts in Chemoinformatics
Machine Learning Concepts in Chemoinformatics Martin Vogt B-IT Life Science Informatics Rheinische Friedrich-Wilhelms-Universität Bonn BigChem Winter School 2017 25. October Data Mining in Chemoinformatics
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationClassifier Evaluation. Learning Curve cleval testc. The Apparent Classification Error. Error Estimation by Test Set. Classifier
Classifier Learning Curve How to estimate classifier performance. Learning curves Feature curves Rejects and ROC curves True classification error ε Bayes error ε* Sub-optimal classifier Bayes consistent
More information