Hypothesis Evaluation


1 Hypothesis Evaluation
Machine Learning
Hamid Beigy, Sharif University of Technology, Fall 1395

2 Table of contents
1 Introduction
2 Some performance measures of classifiers
3 Evaluating the performance of a classifier
4 Estimating true error
5 Confidence intervals
6 Paired t Test

3 Outline: 1 Introduction, 2 Some performance measures of classifiers, 3 Evaluating the performance of a classifier, 4 Estimating true error, 5 Confidence intervals, 6 Paired t Test

4 Introduction
1 Machine learning algorithms induce hypotheses that depend on the training set, so statistical testing is needed to assess the expected performance of a hypothesis and to compare the expected performance of two hypotheses.
2 Classifier evaluation criteria
Accuracy (or error): The ability of a hypothesis to correctly predict the label of new/previously unseen data.
Speed: The computational costs involved in generating and using a hypothesis.
Robustness: The ability of a hypothesis to make correct predictions given noisy data or data with missing values.
Scalability: The ability to construct a hypothesis efficiently given large amounts of data.
Interpretability: The level of understanding and insight that is provided by the hypothesis.

5 Evaluating the accuracy/error of classifiers
1 Given the observed accuracy of a hypothesis over a limited data sample, how well does this estimate its accuracy over additional examples?
2 Given that one hypothesis outperforms another over a data sample, how probable is it that this hypothesis is more accurate in general?
3 When data is limited, what is the best way to use this data both to learn a hypothesis and to estimate its accuracy?

6 Measuring performance of a classifier
1 Measuring the performance of a hypothesis requires partitioning the data into
Training set.
Validation set, different from the training set.
Test set, different from the training and validation sets.
2 Problems with this approach
Training and validation sets may be small and may contain exceptional instances such as noise, which may mislead us.
The learning algorithm may depend on other random factors affecting the accuracy, such as the initial weights of a neural network trained with backpropagation. We must train/test several times and average the results.
3 Important points
The performance of a hypothesis estimated using the training set is conditioned on the data set used and cannot be used to compare algorithms in domain-independent ways.
The validation set is used for model selection, for comparing two algorithms, and for deciding when to stop learning.
In order to report the expected performance, we should use a separate test set that is unused during learning.

7 Error of a classifier
Definition (Sample error): The sample error (denoted $E_E(h)$) of hypothesis $h$ with respect to target concept $c$ and data sample $S$ of size $N$ is
$$E_E(h) = \frac{1}{N} \sum_{x \in S} I[c(x) \neq h(x)].$$
Definition (True error): The true error (denoted $E(h)$) of hypothesis $h$ with respect to target concept $c$ and distribution $\mathcal{D}$ is the probability that $h$ will misclassify an instance drawn at random according to distribution $\mathcal{D}$:
$$E(h) = \Pr_{x \sim \mathcal{D}}[c(x) \neq h(x)].$$

8 Notions of errors
1 True error: [Figure: instance space $X$ with the regions where target concept $c$ and hypothesis $h$ disagree.]
2 Our concern
How can we estimate the true error $E(h)$ of a hypothesis using its sample error $E_E(h)$?
Can we bound the true error of $h$ given the sample error of $h$?

9 Outline: 1 Introduction, 2 Some performance measures of classifiers, 3 Evaluating the performance of a classifier, 4 Estimating true error, 5 Confidence intervals, 6 Paired t Test

10 Some performance measures of classifiers
1 Error rate: The error rate is the fraction of incorrect predictions for the classifier over the test set, defined as
$$E_E(h) = \frac{1}{N} \sum_{x \in S} I[c(x) \neq h(x)].$$
The error rate is an estimate of the probability of misclassification.
2 Accuracy: The accuracy of a classifier is the fraction of correct predictions over the test set:
$$\text{Accuracy}(h) = \frac{1}{N} \sum_{x \in S} I[c(x) = h(x)] = 1 - E_E(h).$$
Accuracy gives an estimate of the probability of a correct prediction.
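A minimal sketch of how these two estimates can be computed from true and predicted labels. The language (Python) and the toy labels are our own additions for illustration; they are not part of the original slides.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of examples on which the hypothesis disagrees with the target concept."""
    n = len(true_labels)
    mistakes = sum(1 for c, h in zip(true_labels, predicted_labels) if c != h)
    return mistakes / n

def accuracy(true_labels, predicted_labels):
    """Fraction of correct predictions; equals 1 - error_rate."""
    return 1.0 - error_rate(true_labels, predicted_labels)

# Toy example (labels are illustrative only).
c = [1, 0, 1, 1, 0, 1, 0, 0]   # target concept c(x) on the test sample S
h = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothesis h(x) on the same sample
print(error_rate(c, h))  # 0.25
print(accuracy(c, h))    # 0.75
```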

11 Some performance measures of classifiers
1 What can you say about an accuracy of 90% or an error of 10%?
2 For example, if only 3 to 4% of the examples are from the negative class, an accuracy of 90% is clearly not acceptable: always predicting the positive class would already achieve roughly 96% accuracy.
3 Confusion matrix
                Predicted (+)   Predicted (-)
Actual (+)      TP              FN
Actual (-)      FP              TN
4 Given C classes, a confusion matrix is a table of size C × C.

12 Some performance measures of classifiers
1 Precision (positive predictive value): Precision is the proportion of predicted positives that are actually positive, defined as
$$\text{Precision}(h) = \frac{TP}{TP + FP}.$$
2 Recall (sensitivity): Recall is the proportion of actual positives that are predicted positive, defined as
$$\text{Recall}(h) = \frac{TP}{TP + FN}.$$
3 Specificity: Specificity is the proportion of actual negatives that are predicted negative, defined as
$$\text{Specificity}(h) = \frac{TN}{TN + FP}.$$

13 Some performance measures of classifiers
1 Balanced classification rate (BCR): The balanced classification rate averages recall (sensitivity) and specificity; it gives a more precise picture of classifier effectiveness than accuracy alone. It is defined as
$$\text{BCR}(h) = \frac{1}{2}\left[\text{Specificity}(h) + \text{Recall}(h)\right].$$
2 F-measure: The F-measure is the harmonic mean of precision and recall, defined as
$$\text{F-measure}(h) = \frac{2 \cdot \text{Precision}(h) \cdot \text{Recall}(h)}{\text{Precision}(h) + \text{Recall}(h)}.$$
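A short sketch that derives all of the measures above from the four confusion-matrix counts. The function name and the example counts are invented for illustration.

```python
def classification_measures(tp, fn, fp, tn):
    """Compute the performance measures defined above from confusion-matrix counts."""
    precision   = tp / (tp + fp)
    recall      = tp / (tp + fn)          # sensitivity
    specificity = tn / (tn + fp)
    bcr         = 0.5 * (specificity + recall)
    f_measure   = 2 * precision * recall / (precision + recall)
    accuracy    = (tp + tn) / (tp + fn + fp + tn)
    return {"precision": precision, "recall": recall, "specificity": specificity,
            "BCR": bcr, "F-measure": f_measure, "accuracy": accuracy}

# Illustrative counts only.
print(classification_measures(tp=40, fn=10, fp=5, tn=45))
```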

14 Outline: 1 Introduction, 2 Some performance measures of classifiers, 3 Evaluating the performance of a classifier, 4 Estimating true error, 5 Confidence intervals, 6 Paired t Test

15 Evaluating the performance of a classifier
1 Hold-out method
2 Random sub-sampling method
3 Cross validation method
4 Leave-one-out method
5 5×2 cross validation method
6 Bootstrapping method

16 Hold-out method
1 Hold-out partitions the given data into two independent sets: a training set and a test set.
Typically two-thirds of the data are allocated to the training set and the remaining one-third to the test set.
The training set is used to derive the model; the test set is used to estimate the error rate of the trained model.
A typical application of the hold-out method is determining a stopping point for backpropagation training.
[Figure: training-set and test-set MSE versus epochs; the stopping point is where the test-set error starts to rise.]
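A minimal hold-out split in plain Python, following the two-thirds / one-third proportions from the slide. The function name and the placeholder dataset are our own additions.

```python
import random

def holdout_split(examples, train_fraction=2/3, seed=0):
    """Randomly split a list of examples into independent training and test sets."""
    rng = random.Random(seed)
    shuffled = examples[:]           # copy so the original order is preserved
    rng.shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Placeholder dataset of (x, label) pairs.
data = [(i, i % 2) for i in range(30)]
train_set, test_set = holdout_split(data)
print(len(train_set), len(test_set))   # 20 10
```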

17 Random sub-sampling method
1 Random sub-sampling is a variation of the hold-out method in which hold-out is repeated K times.
Each split randomly selects a fixed number of examples without replacement as the test set.
For each data split we retrain the classifier from scratch on the training examples and estimate the error $E_i$ on the test examples.
[Figure: K independent splits of the dataset; in each experiment a different random subset serves as the test examples.]
The estimated error rate is the average of the error rates obtained on the independently and randomly generated test partitions:
$$E = \frac{1}{K} \sum_{i=1}^{K} E_i.$$
Random sub-sampling can produce better error estimates than a single train-and-test partition.

18 Cross validation method
1 K-fold cross validation: The initial data are randomly partitioned into K mutually exclusive subsets or folds, $S_1, S_2, \ldots, S_K$, each of approximately equal size.
Training and testing are performed K times. In iteration $k$, partition $S_k$ is used for testing and the remaining partitions are collectively used for training.
[Figure: K-fold partition of the dataset; in each of the K experiments a different fold provides the test examples.]
The accuracy is the percentage of the total number of test examples that are correctly classified; as before, the true error is estimated as the average error rate
$$E = \frac{1}{K} \sum_{i=1}^{K} E_i.$$
K-fold cross validation is similar to random sub-sampling. Its advantage is that all examples in the dataset are eventually used for both training and testing.
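A sketch of K-fold cross validation in plain Python. The helpers `train` and `evaluate_error` are hypothetical stand-ins for whichever learner and error measure are being studied; the majority-class learner and the placeholder data in the usage example are fabricated.

```python
import random

def k_fold_cross_validation(examples, k, train, evaluate_error, seed=0):
    """Estimate the true error as the average test error over K folds."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]   # K roughly equal-sized folds
    errors = []
    for i in range(k):
        test_fold = folds[i]
        train_folds = [x for j, fold in enumerate(folds) if j != i for x in fold]
        h = train(train_folds)                   # retrain from scratch each iteration
        errors.append(evaluate_error(h, test_fold))
    return sum(errors) / k

# Tiny usage example with a majority-class "learner" on placeholder (x, label) data.
def train(train_set):
    labels = [y for _, y in train_set]
    return max(set(labels), key=labels.count)    # the "hypothesis" is the most common label

def evaluate_error(h, test_set):
    return sum(1 for _, y in test_set if y != h) / len(test_set)

data = [(i, int(i < 20)) for i in range(30)]
print(k_fold_cross_validation(data, k=5, train=train, evaluate_error=evaluate_error))
```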

19 Leave-one-out method
1 Leave-one-out is a special (degenerate) case of K-fold cross validation in which K is set to the number of examples in the dataset.
For a dataset with N examples, perform N experiments. In each experiment, use N − 1 examples for training and the single remaining example for testing.
[Figure: N experiments, each holding out a single test example.]
As usual, the true error is estimated as the average error rate on the test examples:
$$E = \frac{1}{N} \sum_{i=1}^{N} E_i.$$

20 5×2 cross validation method
1 The 5×2 cross validation method repeats 2-fold cross validation five times.
2 Training and testing are performed 10 times in total.
3 The estimated error rate is the average of the error rates obtained on the independently and randomly generated test partitions.
4 5×2cv F test: for each replication $i = 1, \ldots, 5$ and fold $j = 1, 2$, let $p_A^{(i,j)}$ and $p_B^{(i,j)}$ denote the errors of algorithms A and B, $p_i^{(j)} = p_A^{(i,j)} - p_B^{(i,j)}$ their difference, and $s_i^2$ the variance of the two differences in replication $i$; these quantities are tabulated for all ten train/test runs and used by the test.

21 Bootstrapping method
1 The bootstrap uses sampling with replacement to form the training set.
Sample a dataset of N instances N times with replacement to form a new dataset of N instances, and use it as the training set. An instance may occur more than once in the training set.
Use the instances from the original dataset that do not occur in the new training set for testing.
On average, this method trains the classifier on only about 63% of the distinct instances.
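A small sketch of forming one bootstrap training/test split in plain Python. The roughly 63% figure follows from $1 - (1 - 1/N)^N \approx 1 - e^{-1}$; the function name and data are our own placeholders.

```python
import random

def bootstrap_split(examples, seed=0):
    """Sample N instances with replacement for training; unused instances form the test set."""
    rng = random.Random(seed)
    n = len(examples)
    train_indices = [rng.randrange(n) for _ in range(n)]   # sampling with replacement
    used = set(train_indices)
    train_set = [examples[i] for i in train_indices]
    test_set = [ex for i, ex in enumerate(examples) if i not in used]
    return train_set, test_set

data = list(range(1000))
train_set, test_set = bootstrap_split(data)
# About 63% of the distinct instances end up in the training set.
print(len(set(train_set)) / len(data))
```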

22 Outline: 1 Introduction, 2 Some performance measures of classifiers, 3 Evaluating the performance of a classifier, 4 Estimating true error, 5 Confidence intervals, 6 Paired t Test

23 Estimating true error
1 How well does $E_E(h)$ estimate $E(h)$?
Bias in the estimate: If the training/test set is small, then the accuracy of the resulting hypothesis is a poor estimator of its accuracy over future examples:
$$\text{bias} = \mathbb{E}[E_E(h)] - E(h).$$
For an unbiased estimate, $h$ and $S$ must be chosen independently.
Variance in the estimate: Even with an unbiased $S$, $E_E(h)$ may still vary from $E(h)$. A smaller test set results in a greater expected variance.
2 Example: hypothesis $h$ misclassifies 12 of the 40 examples in $S$. What is $E(h)$?
3 We use the following experiment: choose a sample $S$ of size $N$ according to distribution $\mathcal{D}$ and measure $E_E(h)$. Then $E_E(h)$ is a random variable (i.e., the result of an experiment), and $E_E(h)$ is an unbiased estimator of $E(h)$.
4 Given the observed $E_E(h)$, what can we conclude about $E(h)$?

24 Distribution of error
1 $E_E(h)$ is a random variable with a Binomial distribution: for the experiment with different randomly drawn $S$ of size $N$, the probability of observing $r$ misclassified examples is
$$P(r) = \frac{N!}{r!(N-r)!} E(h)^r [1 - E(h)]^{N-r}.$$
2 [Figure: the Binomial distribution of $r$ for the example $N = 40$ and $E(h) = p = 0.2$.]
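A quick numeric check of this formula for the slide's example $N = 40$, $p = 0.2$, using only the Python standard library (the chosen values of $r$ are arbitrary).

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of observing exactly r misclassified examples out of n."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 40, 0.2
# The distribution peaks around the mean n*p = 8 misclassified examples.
for r in (4, 6, 8, 10, 12):
    print(r, round(binomial_pmf(r, n, p), 4))
```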

25 Distribution of error
1 For the Binomial distribution we have $\mathbb{E}[r] = Np$ and $\text{Var}(r) = Np(1-p)$. We have shown that the random variable $E_E(h)$ obeys a Binomial distribution.
2 Let $p$ be the probability that the result of a trial is a success (here, a misclassification).
3 $E_E(h)$ and $E(h)$ are
$$E_E(h) = \frac{r}{N}, \qquad E(h) = p,$$
where $N$ is the number of instances in the sample $S$, $r$ is the number of instances from $S$ misclassified by $h$, and $p$ is the probability of misclassifying a single instance drawn from $\mathcal{D}$.

26 Distribution of error
1 It can be shown that $E_E(h)$ is an unbiased estimator of $E(h)$.
2 Since $r$ is Binomially distributed, its variance is $Np(1-p)$.
3 Unfortunately $p$ is unknown, but we can substitute our estimate $r/N$ for $p$.
4 In general, given $r$ errors in a sample of $N$ independently drawn test examples, the standard deviation of $E_E(h)$ is
$$\sigma_{E_E(h)} = \sqrt{\frac{p(1-p)}{N}} \approx \sqrt{\frac{E_E(h)\,(1 - E_E(h))}{N}}.$$

27 Outline: 1 Introduction, 2 Some performance measures of classifiers, 3 Evaluating the performance of a classifier, 4 Estimating true error, 5 Confidence intervals, 6 Paired t Test

28 Confidence intervals
1 One common way to describe the uncertainty associated with an estimate is to give an interval within which the true value is expected to fall, along with the probability with which it is expected to fall into this interval.
2 How can we derive confidence intervals for $E_E(h)$?
3 For a given value of M, how can we find the size of the interval that contains M% of the probability mass?
4 Unfortunately, for the Binomial distribution this calculation can be quite tedious.
5 Fortunately, an easily calculated and very good approximation can be found in most cases, based on the fact that for sufficiently large sample sizes the Binomial distribution is closely approximated by the Normal distribution.

29 Confidence intervals
1 For a Normal distribution $N(\mu, \sigma^2)$, M% of the area (probability mass) lies in $\mu \pm z_M \sigma$, where
M%:   50%   68%   80%   90%   95%   98%   99%
z_M:  0.67  1.00  1.28  1.64  1.96  2.33  2.58
[Figure: Normal density curve with the central M% of the area shaded around the mean.]
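Putting the pieces together, a hedged sketch of the usual approximate interval $E_E(h) \pm z_M \sqrt{E_E(h)(1-E_E(h))/N}$, which combines the standard-deviation estimate from slide 26 with the $z_M$ table above. It is applied here to the earlier example of 12 errors on 40 test examples; the function name is our own.

```python
from math import sqrt

def approx_confidence_interval(r, n, z):
    """Normal-approximation confidence interval for the true error, given r errors in n examples."""
    e = r / n                               # sample error E_E(h)
    half_width = z * sqrt(e * (1 - e) / n)  # z_M times the estimated standard deviation
    return e - half_width, e + half_width

# 12 misclassified out of 40 examples, 95% confidence (z_M = 1.96).
low, high = approx_confidence_interval(r=12, n=40, z=1.96)
print(round(low, 3), round(high, 3))   # roughly 0.158 to 0.442
```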

30 Comparing two hypotheses
Test $h_1$ on sample $S_1$ and test $h_2$ on $S_2$.
1 Pick the parameter to estimate: $d = E(h_1) - E(h_2)$.
2 Choose an estimator: $\hat{d} = E_E(h_1) - E_E(h_2)$.
3 Determine the probability distribution that governs the estimator.
4 $E_E(h_1)$ and $E_E(h_2)$ can each be approximated by a Normal distribution.
5 The difference of two Normal distributions is also Normal, so $\hat{d}$ is approximately Normal with mean $d$, and its variance is the sum of the variances of $E_E(h_1)$ and $E_E(h_2)$:
$$\text{Var}[\hat{d}] = \frac{E_E(h_1)(1 - E_E(h_1))}{N_1} + \frac{E_E(h_2)(1 - E_E(h_2))}{N_2}.$$
6 Find the interval (L, U) such that M% of the probability mass falls in the interval:
$$\hat{d} \pm z_M \sqrt{\frac{E_E(h_1)(1 - E_E(h_1))}{N_1} + \frac{E_E(h_2)(1 - E_E(h_2))}{N_2}}.$$
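A direct translation of this interval into code. The error counts and sample sizes below are made-up numbers used only to show the mechanics.

```python
from math import sqrt

def difference_interval(e1, n1, e2, n2, z):
    """Approximate confidence interval for E(h1) - E(h2) from two independent test samples."""
    d_hat = e1 - e2
    var = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    return d_hat - z * sqrt(var), d_hat + z * sqrt(var)

# h1: 30 errors on 200 examples; h2: 45 errors on 250 examples; 95% interval.
print(difference_interval(e1=30/200, n1=200, e2=45/250, n2=250, z=1.96))
```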

31 Comparing two algorithms
For comparing two learning algorithms $L_A$ and $L_B$, we would like to estimate
$$\mathbb{E}_{S \sim \mathcal{D}}\left[E(L_A(S)) - E(L_B(S))\right],$$
where $L(S)$ is the hypothesis output by learner $L$ using training set $S$. This is the expected difference in true error between the hypotheses output by learners $L_A$ and $L_B$ when trained on randomly selected training sets $S$ drawn according to distribution $\mathcal{D}$.
But given limited data $S_0$, what is a good estimator?
1 We could partition $S_0$ into a training set $S_0^{tr}$ and a test set $S_0^{ts}$, and measure
$$\hat{d} = E_{S_0^{ts}}(h_1) - E_{S_0^{ts}}(h_2),$$
where $h_1 = L_A(S_0^{tr})$ and $h_2 = L_B(S_0^{tr})$ are trained using the training set $S_0^{tr}$, and $E_{S_0^{ts}}$ is the empirical error measured on the test set $S_0^{ts}$.
2 Even better, repeat this many times and average the results.

32 Outline: 1 Introduction, 2 Some performance measures of classifiers, 3 Evaluating the performance of a classifier, 4 Estimating true error, 5 Confidence intervals, 6 Paired t Test

33 Paired t Test
Consider the following estimation problem:
1 We are given the observed values of a set of independent, identically distributed random variables $Y_1, Y_2, \ldots, Y_K$.
2 We wish to estimate the mean of the probability distribution governing these $Y_i$.
3 The estimator we will use is the sample mean
$$\bar{Y} = \frac{1}{K} \sum_{k=1}^{K} Y_k.$$
The task is to estimate the sample mean of a collection of independent, identically and Normally distributed random variables. The approximate M% confidence interval based on $\bar{Y}$ is
$$\bar{Y} \pm t_{M,K-1} \, s_{\bar{Y}}, \quad \text{where} \quad s_{\bar{Y}} = \sqrt{\frac{1}{K(K-1)} \sum_{k=1}^{K} (Y_k - \bar{Y})^2}.$$

34 Paired t Test
[Table: values of $t_{M,K-1}$ for two-sided confidence intervals, for several values of the degrees of freedom $K-1$ and confidence levels M = 90%, 95%, 98%, 99%.]
As $K \to \infty$, $t_{M,K-1}$ approaches $z_M$.
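A sketch of how the paired t test is typically used to compare two algorithms: the $Y_k$ are taken to be the per-fold differences in test error, and the interval above is computed from them. The per-fold errors below are invented, and the $t$ value for 95% confidence with $K-1 = 9$ degrees of freedom (about 2.26) is taken as given rather than looked up in the table.

```python
from math import sqrt

def paired_t_interval(errors_a, errors_b, t_value):
    """Confidence interval for the mean difference in error over K paired trials (folds)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    mean = sum(diffs) / k
    s_mean = sqrt(sum((d - mean) ** 2 for d in diffs) / (k * (k - 1)))
    return mean - t_value * s_mean, mean + t_value * s_mean

# Hypothetical per-fold test errors of two learners over K = 10 folds.
errors_a = [0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15, 0.14, 0.13]
errors_b = [0.14, 0.16, 0.13, 0.15, 0.15, 0.17, 0.13, 0.16, 0.15, 0.14]
print(paired_t_interval(errors_a, errors_b, t_value=2.26))
# If the interval excludes zero, the observed difference is significant at the chosen level.
```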

35 ROC Curves
1 ROC analysis puts the false positive rate (FPR = FP/NEG) on the x axis.
2 It puts the true positive rate (TPR = TP/POS) on the y axis.
3 Each classifier is represented by a point in ROC space corresponding to its (FPR, TPR) pair.
[Figure: ROC space with TP rate on the y axis and FP rate on the x axis.]
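A short sketch of how a full ROC curve is traced for a scoring classifier by sweeping a decision threshold, each threshold giving one (FPR, TPR) point. The scores and labels below are fabricated for illustration.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by thresholding the scores at each distinct value."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points

# Fabricated classifier scores and true labels (1 = positive, 0 = negative).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0  ]
print(roc_points(scores, labels))
```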

36 Practical aspects
1 A note on parameter tuning
Some learning schemes operate in two stages: Stage 1 builds the basic structure; Stage 2 optimizes parameter settings.
It is important that the test data is not used in any way to create the classifier; the test data cannot be used for parameter tuning.
The proper procedure uses three sets: training data, validation data, and test data. The validation data is used to optimize parameters.
2 No Free Lunch Theorem
For any machine learning algorithm there exist data sets on which it performs well and data sets on which it performs badly. We hope that the latter do not occur too often in real life.

37 References
1 C. Schaffer, Selecting a Classification Method by Cross-Validation, Machine Learning, vol. 13.
2 E. Alpaydin, Combined 5x2 cv F Test for Supervised Classification Learning Algorithms, Neural Computation, vol. 11, 1999.
3 C. Schaffer, Multiple Comparisons in Induction Algorithms, Machine Learning, vol. 38.
4 Tom Fawcett, An Introduction to ROC Analysis, Pattern Recognition Letters, vol. 27, 2006.
