Model Accuracy Measures


1 Model Accuracy Measures. Master in Bioinformatics, UPF. Eduardo Eyras, Computational Genomics, Pompeu Fabra University - ICREA, Barcelona, Spain.

2 Variables: what we can measure (attributes). Hypotheses: what we want to predict (class values/labels). Examples: the training set (labeled data). Training on the examples produces a model, which is then used to predict on new cases.

3 Variables: what we can measure (attributes). Hypotheses: what we want to predict (class values/labels). Examples: the training set (labeled data). Training on the examples produces a model, which is then used to predict on new cases. Prediction: does this example belong to this model? Classification: what is the most probable label?

4 Testing the accuracy of a model. Is my method good enough (for the specific problem)? How does my method compare to other methods?

5 Testing the accuracy of a model. We need a systematic way to evaluate and compare multiple methods. Methods are heterogeneous in their purposes, e.g.: 1) ability to classify instances accurately; 2) predicting/scoring the class labels; 3) methods may predict numerical or nominal values (score, class label, yes/no, posterior probability, etc.). Thus we need a methodology that is applicable to all of them.

6 Training and Testing. Accuracy: the expected performance of the model on future (new) data. It is wrong to estimate the accuracy on the same dataset used to build (train) the model: this estimation would be overly optimistic (overfitting), and the model won't necessarily adapt well to new, different instances.

7 Training and Testing. Separate the known (labeled) cases into a training set and a test set: the training step builds the model on the cases for training, and the evaluation step applies it to the cases for testing. On the cases for testing we predict and compare the predictions with the known labels. How to do the splitting? A common choice is 2/3 for training and 1/3 for testing; this approach is suitable when the entire dataset is large.

8 Training and Testing. How to select the data for training and testing: 1) Stratification: the size of each of the prediction classes should be similar in the training and testing subsets (balanced subsets). 2) Homogeneity: the data sets should have similar properties to give a reliable test, e.g. GC content, peptide lengths, species represented (would you test a model of human transmembrane domains with yeast proteins?). These conditions ensure that the different properties and prediction classes are well represented. Provided that the sets are balanced and homogeneous, the accuracy on the test set will be a good estimation of future performance. A sketch of a stratified split is given below.
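
A minimal sketch of a stratified 2/3-1/3 holdout split in Python (the toy labels and the helper function are illustrative assumptions, not part of the slides):

    import random
    from collections import defaultdict

    def stratified_split(labels, test_fraction=1/3, seed=0):
        # Group instance indices by class label, then take ~test_fraction
        # of each class for the test set (stratification).
        rng = random.Random(seed)
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        train_idx, test_idx = [], []
        for idx in by_class.values():
            rng.shuffle(idx)
            n_test = int(round(test_fraction * len(idx)))
            test_idx.extend(idx[:n_test])
            train_idx.extend(idx[n_test:])
        return sorted(train_idx), sorted(test_idx)

    # Toy labels; indices are returned so the same split can be applied to the attributes
    labels = ["helix", "helix", "loop", "helix", "loop", "loop", "helix", "loop", "helix"]
    train, test = stratified_split(labels)
    print("train:", train, "test:", test)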

9 Training and Testing: N-fold cross-validation. The data set is split into N parts; 1/N of the data is held out as the test set and the remaining (N-1)/N is used as the training set to build a predictive model, giving Accuracy 1 (where accuracy is used generically: any measure of prediction performance).

10 Training and Testing: N-fold cross-validation. The procedure is repeated with a different 1/N of the data set held out as the test set and the rest as the training set, giving Accuracy 2 (where accuracy is used generically: any measure of prediction performance).

11 Training and Testing: N-fold cross-validation. Repeating the procedure over all N folds gives Accuracy 1, Accuracy 2, Accuracy 3, ..., Accuracy n, which are then averaged. The average accuracy reflects the performance of the model on the entire dataset. Important: the subsets must be representative of the original data (stratification and homogeneity). The standard is to do 10-fold cross-validation.
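
A minimal sketch of the N-fold cross-validation loop in Python (the model building and the per-fold accuracy are left as placeholders; only the fold construction follows the slides):

    import random

    def n_fold_cross_validation(instances, n_folds=10, seed=0):
        # Shuffle the instances and yield (training set, test set) pairs,
        # one per fold; each instance is used exactly once for testing.
        data = list(instances)
        random.Random(seed).shuffle(data)
        folds = [data[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            test_set = folds[k]
            training_set = [x for j, fold in enumerate(folds) if j != k for x in fold]
            yield training_set, test_set

    for k, (train, test) in enumerate(n_fold_cross_validation(range(100)), start=1):
        # here a model would be built on `train` and its accuracy measured on `test`
        print(f"fold {k}: {len(train)} training instances, {len(test)} test instances")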

12 Training and Testing: leave-one-out. It is like n-fold cross-validation, but where n is the size of the set (number of instances), that is: train on all but 1 instance and test on that one. Advantages: 1) the greatest possible amount of data is used for training (n-1 instances); 2) it is deterministic: no random sampling of subsets is involved. Disadvantages: 1) it is computationally more expensive; 2) it cannot be stratified. E.g. imagine you have the same number of examples for 2 classes. A classifier that always predicts the majority class is expected to have an error rate of 50%, but in leave-one-out the held-out instance always belongs to the minority class of the training set, so the majority class is always the opposite class, which produces a 100% error rate.
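
The leave-one-out pitfall described above can be checked directly; a minimal sketch in Python (the majority-class predictor stands in for any classifier that always predicts the most frequent training label):

    from collections import Counter

    def loo_majority_class_error(labels):
        # Leave-one-out: for each instance, "train" a majority-class predictor
        # on the remaining instances and test it on the held-out one.
        errors = 0
        for i, true_label in enumerate(labels):
            rest = labels[:i] + labels[i + 1:]
            majority = Counter(rest).most_common(1)[0][0]
            errors += (majority != true_label)
        return errors / len(labels)

    # Balanced two-class data: removing one instance makes its own class the
    # minority in the training set, so the prediction is always wrong.
    labels = ["+"] * 50 + ["-"] * 50
    print(loo_majority_class_error(labels))  # 1.0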

13 Accuracy measures

14 Accuracy measure. Example: the model of transmembrane helices. We have two models: (1) the loop model M_loop, given by the observed frequencies p of amino acids in loops; (2) the helix model M_helix, given by the observed frequencies q of amino acids in helices. Given a peptide s = x_1...x_N we can predict whether it is part of a helix or a loop using the log-likelihood test (assuming uniform priors and positional independence): S = log[ L(s|M_helix) / L(s|M_loop) ] = Σ_{i=1..N} log( q_{x_i} / p_{x_i} ). As a default, we can use as classification the rule: if S > 0 then s is part of a helix; if S ≤ 0 then s is a loop.
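
A minimal sketch of the score and the default classification rule in Python (the frequency values are made up for two residues only; in practice p and q would be estimated over the 20 amino acids from annotated loops and helices):

    import math

    p_loop  = {"L": 0.05, "S": 0.10}   # assumed frequencies in loops
    q_helix = {"L": 0.15, "S": 0.05}   # assumed frequencies in helices

    def score(peptide, q=q_helix, p=p_loop):
        # S = sum_i log( q_{x_i} / p_{x_i} ), assuming positional independence
        return sum(math.log(q[x] / p[x]) for x in peptide)

    peptide = "LLSL"
    S = score(peptide)
    print(S, "helix" if S > 0 else "loop")  # S > 0 -> helix, S <= 0 -> loop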

15 Accuracy measure. Example: the model of transmembrane helices. The score S = log[ L(s|M_helix) / L(s|M_loop) ] = Σ_{i=1..N} log( q_{x_i} / p_{x_i} ) is built from a training set. A test set: a set of labelled (annotated) proteins, with helix and loop regions, that we do not use for training.

16 Accuracy measure. The test set is divided into Real and False cases.

17 Accuracy measure. Our model divides the test set according to our predictions of Real and False: the red area of the diagram contains the predictions (helix) made by our model.

18 Accuracy measure. TP (True Positives): elements predicted as real that are real.

19 Accuracy measure. TP (True Positives): elements predicted as real that are real. TN (True Negatives): elements predicted as false that are false.

20 Accuracy measure. TP (True Positives): elements predicted as real that are real. TN (True Negatives): elements predicted as false that are false. FP (False Positives): elements predicted as real that are false.

21 Accuracy measure. TP (True Positives): elements predicted as real that are real. TN (True Negatives): elements predicted as false that are false. FP (False Positives): elements predicted as real that are false. FN (False Negatives): elements predicted as false that are real.

22 Accuracy measure. True Positive Rate (Sensitivity): the proportion of true elements that is correctly predicted (a.k.a. hit rate, recall): Sn = TPR = TP / (TP + FN). False Positive Rate (FPR): the proportion of negative cases that are mislabelled (a.k.a. fall-out): FPR = FP / (FP + TN). Specificity: the proportion of the negatives that are correctly predicted: Sp = 1 - FPR = TN / (FP + TN). Sn and Sp take values between 0 and 1; a perfect classification would have Sn = 1 and Sp = 1.

23 Accuracy measure. Positive Predictive Value (PPV), sometimes called Precision: the fraction of our predictions that are correct: PPV = TP / (TP + FP). False Discovery Rate (FDR): the fraction of our predictions that are wrong: FDR = FP / (FP + TP). PPV → 1 means most of our predictions are correct; FDR → 0 means that very few of our predictions are wrong.
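
A minimal sketch collecting these measures in Python (the counts are hypothetical, only to show the calculations):

    def rates(tp, fp, tn, fn):
        # Sensitivity/TPR (recall), FPR, specificity, PPV (precision) and FDR
        return {
            "TPR": tp / (tp + fn),
            "FPR": fp / (fp + tn),
            "Sp":  tn / (fp + tn),
            "PPV": tp / (tp + fp),
            "FDR": fp / (fp + tp),
        }

    print(rates(tp=40, fp=10, tn=90, fn=60))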

24 Accuracy measure: the issue of True Negatives. Sometimes we cannot find a True Negative set (e.g. think of genomic features, like genes, regulatory regions, etc.; it is very hard to find real negative cases for some biological features). We can still use the TPR, PPV and FDR: TPR = TP / (TP + FN), PPV = TP / (FP + TP), FDR = FP / (FP + TP).

25 Accuracy measure. Overall success rate: the number of correct classifications divided by the total number of classifications (sometimes called accuracy): Overall Success Rate = (TP + TN) / (TP + TN + FN + FP). A value of 1 for the success rate means that the model identifies all the positive and negative cases correctly. The error rate is 1 minus the overall success rate: Error Rate = 1 - (TP + TN) / (TP + TN + FN + FP).

26 Accuracy measure. Correlation coefficient (a.k.a. Matthews Correlation Coefficient, MCC): CC = (TP·TN - FP·FN) / sqrt( (TP + FN)(TN + FP)(TP + FP)(TN + FN) ). This measure scores correct predictions positively and incorrect ones negatively, and takes values between -1 and 1. The more correct the method, the closer to one: CC → 1. A very bad method will have a CC closer to -1.
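
A minimal sketch of the overall success rate, error rate and MCC in Python (the counts are hypothetical, only to show the calculations):

    import math

    def overall_success_rate(tp, fp, tn, fn):
        return (tp + tn) / (tp + tn + fp + fn)

    def matthews_cc(tp, fp, tn, fn):
        # MCC = (TP*TN - FP*FN) / sqrt((TP+FN)(TN+FP)(TP+FP)(TN+FN))
        denom = math.sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0

    tp, fp, tn, fn = 40, 10, 90, 60
    print("success rate:", overall_success_rate(tp, fp, tn, fn))
    print("error rate:  ", 1 - overall_success_rate(tp, fp, tn, fn))
    print("MCC:         ", matthews_cc(tp, fp, tn, fn))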

27 Accuracy measure. This can also be represented by a confusion matrix for a 2-class prediction:

               Predicted yes    Predicted no
   Actual yes  true positive    false negative
   Actual no   false positive   true negative

28 Accuracy measure. For multiclass predictions the confusion matrix has one row per actual class and one column per predicted class (a, b, c), plus the row and column totals. Good results correspond to large numbers on the diagonal and small numbers off the diagonal. In the example we have 200 instances and 140 of them are predicted correctly, thus the success rate is 70%. Question: is this a good measure? How many agreements do we expect by chance?

29 Accuracy measure. For multiclass predictions: (a) observed values, (b) expected values. We build the matrix of expected values by using the same totals as before and sharing out the total of each class. Totals in each actual (Real) class: a = 100, b = 60, c = 40.

30 Accuracy measure. For multiclass predictions: (a) observed values, (b) expected values. We build the matrix of expected values by using the same totals as before and sharing out the total of each class. Totals in each actual (Real) class: a = 100, b = 60, c = 40. We split each of them into the three groups using the proportions of the predicted classes: a = 120, b = 60, c = 20, i.e. a = 60%, b = 30%, c = 10%.

32 Accuracy measure. For multiclass predictions: (a) observed values, (b) expected values. To estimate the relative agreement between observed and expected values we can use the kappa statistic: κ = (P(A) - P(E)) / (1 - P(E)) = (n(A) - n(E)) / (N - n(E)) = 0.49, where P(A) is the probability of agreement and P(E) is the probability of agreement by chance (n(A) and n(E) are the corresponding numbers of agreements, and N the number of instances). The maximum possible value is κ = 1, and for a random predictor κ = 0.
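
A short check of the kappa value in Python, using only the marginal totals quoted in the slides (the full observed matrix is not needed to obtain the expected agreements):

    # From the slides: 200 instances, 140 observed agreements,
    # actual totals a=100, b=60, c=40, predicted totals a=120, b=60, c=20.
    N = 200
    observed_agreements = 140
    actual_totals    = {"a": 100, "b": 60, "c": 40}
    predicted_totals = {"a": 120, "b": 60, "c": 20}

    # Expected agreements: each actual total shared out in the proportions of
    # the predicted classes (60%, 30%, 10%), keeping only the diagonal cells.
    expected_agreements = sum(
        actual_totals[c] * predicted_totals[c] / N for c in actual_totals
    )  # 60 + 18 + 4 = 82

    kappa = (observed_agreements - expected_agreements) / (N - expected_agreements)
    print(expected_agreements, round(kappa, 2))  # 82.0 0.49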

33 Accuracy measure. What is a good accuracy? Every measure shows a different perspective on the performance of the model. In general we will use two or more complementary measures to evaluate a model. E.g. a method that finds almost all elements will have an Sn close to 1, but this can be achieved by a method with very low Sp. E.g. a method that has Sp close to 1 may have very low Sn. In general, one would like to have a method that balances Sn and Sp (or equivalent measures).

34 Accuracy measure. What is a good accuracy? Which accuracy measure we want to maximize often depends on the question. Do you want to find all the true cases? (You want higher sensitivity.) Or do you want to find only correct cases? (You want higher specificity.) Question: does predicting novel genes require high Sp, or perhaps high Sn?

35 Choosing a prediction threshold

36 Accuracy measure. Although we have one single model, in fact we have a family of predictions, which are defined by one or more parameters, e.g. the log-likelihood test: S = log[ L(s|M_helix) / L(s|M_loop) ] > λ. Varying the threshold λ moves the boundary between the cases predicted as Real and as False.

37 Accuracy measure. Although we have one single model, in fact we have a family of predictions, which are defined by one or more parameters, e.g. the log-likelihood test: S = log[ L(s|M_helix) / L(s|M_loop) ] > λ. Each choice of λ produces different TP, FP, TN and FN counts over the Real and False cases.

38 Receiver Operating Characteristic (ROC) curve. A ROC curve is a graphical plot of TPR (Sn) vs. FPR built for the same prediction model by varying one or more of the model parameters. It is quite common for binary classifiers. For instance, it can be plotted for several values of the discrimination threshold λ, but other parameters of the model can be used.

39 Receiver Operating Characteristic (ROC) curve. The distributions of the scores in the positive and negative cases overlap; for a given threshold criterion, the area above the threshold contains our positive predictions, and the rest splits into true negatives and false negatives. A stringent threshold (A) gives low TPR and low FPR, a permissive threshold (C) gives high TPR and high FPR, and threshold B is intermediate. TPR = TP / (TP + FN), FPR = FP / (FP + TN).

40 Receiver Operating Characteristic (ROC) curve. Sweeping the threshold criterion over the score distributions traces the ROC curve: TPR (from 0 to 1) against FPR (from 0 to 1). The model classification curve lies above the diagonal that corresponds to random classification. TPR = TP / (TP + FN), FPR = FP / (FP + TN).

41 Receiver Operating Characteristic (ROC) curve. Each dot on the curve corresponds to a choice of parameters (usually 1 single parameter). The information that is not visible in this graph is the threshold used at each point of the graph. The x = y line corresponds to the random classification, i.e. choosing positive or negative at every threshold with 50% chance.

42 Receiver Operating Characteristic (ROC) curve. Example: consider the ranking of the scores S = log[ L(s|M_helix) / L(s|M_loop) ] on a test set.

43 Receiver Operating Characteristic (ROC) curve. Example: consider the ranking of the scores S = log[ L(s|M_helix) / L(s|M_loop) ]. The test set is labeled:

   S      Known label
   10     R
   7      R
   4      R
   2      F
   1      R
   -0.4   R
   -2     F
   -5     F
   -9     F

44 Receiver Operating Characteristic (ROC) curve. Example: consider the ranking of scores above. Let's choose a cut-off (a λ): λ = 3, the cut-off for prediction, i.e. above this value we predict R.

45 Receiver Operating Characteristic (ROC) curve. Example: calculate TP, FP, TN, FN for this λ, with TPR = TP / (TP + FN) and FPR = FP / (FP + TN):

   λ    TP   FP   TN   FN   TPR   FPR
   3    3    0    4    2    3/5   0

46 Receiver Operating Characteristic (ROC) curve. Example: repeat for other λ's, adding one row of TP, FP, TN, FN, TPR, FPR to the table per cut-off (note: arbitrary intermediate values are used for the cut-offs).

47 Receiver Operating Characteristic (ROC) curve. Example: continue repeating for further λ's, extending the table with one row per cut-off.

48 Receiver Operating Characteristic (ROC) curve. Exercise: complete the table. You should see that for smaller cut-offs the TPR (sensitivity) increases but the FPR increases as well (i.e. the specificity drops), whereas for high cut-offs the TPR decreases but the FPR is low (the specificity is high). The variability of the accuracy as a function of the parameters and/or cut-offs is generally described with a ROC curve. A sketch of this sweep is given below.
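
A minimal sketch in Python that reproduces and completes the table, sweeping arbitrary intermediate cut-offs over the ranked scores of the example:

    # Ranked scores and known labels from the slides (R = real, F = false)
    scores = [10, 7, 4, 2, 1, -0.4, -2, -5, -9]
    labels = ["R", "R", "R", "F", "R", "R", "F", "F", "F"]
    P = labels.count("R")  # 5 positives
    N = labels.count("F")  # 4 negatives

    def confusion_at(cutoff):
        tp = sum(1 for s, lab in zip(scores, labels) if s > cutoff and lab == "R")
        fp = sum(1 for s, lab in zip(scores, labels) if s > cutoff and lab == "F")
        return tp, fp, N - fp, P - tp  # TP, FP, TN, FN

    print("lambda  TP FP TN FN   TPR   FPR")
    for cutoff in [8, 5, 3, 1.5, 0, -1, -3, -7, -10]:  # arbitrary intermediate cut-offs
        tp, fp, tn, fn = confusion_at(cutoff)
        print(f"{cutoff:6}  {tp:2} {fp:2} {tn:2} {fn:2}  {tp/P:.2f}  {fp/N:.2f}")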

49 Receiver Operating Characteristic (ROC) curve: comparing multiple methods. In a plot of ROC curves, each line corresponds to a different method (Method 1, Method 2, Method 3). Better models are further from the x = y line (random classification). (See e.g. Corvelo et al., PLoS Comp. Biology 2010.)

50 Receiver Operating Characteristic (ROC) curve. Example: if you wish to discover at least 60% of the true elements (TPR = 0.6), the graph says that Method 1 has lower FPR than Methods 2 and 3. We may want to choose Method 1. We would then decide to make predictions with Method 1 and choose parameters that produce FPR = 0.2 at TPR = 0.6. But is this the best choice?

51 Receiver Operating Characteristic (ROC) curve: optimal configuration. Note that the more distant the points are from the diagonal (the line TPR = FPR), the better the classification. An optimal choice for a point on the curve is the one that is at a maximum distance from the TPR = FPR line; there are standard methods to calculate this point. But again: this is optimal for the balance of TPR and FPR, but it might not be the most appropriate one for the problem at hand, e.g. predicting novel genes.

52 Receiver Operating Characteristic (ROC) curve. A summary measure for choosing the best model is the Area Under the Curve (AUC). The best model in general will have the highest AUC. The maximum value is AUC = 1; the closer the AUC is to one, the better the model. There are also standard methods to estimate the AUC from the sampled points of the curve.
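
One common way to estimate the AUC from sampled (FPR, TPR) points is the trapezoid rule; a minimal sketch in Python with hypothetical points:

    def auc_trapezoid(points):
        # points: list of (FPR, TPR) pairs sampled from the ROC curve.
        pts = sorted(points)
        area = 0.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            area += (x1 - x0) * (y0 + y1) / 2
        return area

    # Hypothetical sampled points, including the (0,0) and (1,1) endpoints
    roc_points = [(0.0, 0.0), (0.1, 0.5), (0.2, 0.6), (0.5, 0.9), (1.0, 1.0)]
    print(auc_trapezoid(roc_points))  # 0.78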

53 Receiver Operating Characteristic (ROC) curve. Question: why do you think there are error bars in the AUC barplot and in the ROC curves?

54 Precision-recall curves. ROC curves are useful to compare predictive models. However, they still do not provide a complete picture of the accuracy of the models. If we predict many TPs at the cost of producing many false predictions (FP is large), the FPR might not look so bad if our testing set contains many Negatives, such that TN >> FP: FPR = FP / (FP + TN) → 0 when TN is large. So we may have a situation where our TPR is high and the FPR is low, but where for the actual counts FP >> TP. That is, TPR is not affected by FP, and FPR can be low even if FP is high (as long as TN >> FP).

55 Precision-recall curves. For instance, consider a method to classify documents. Let's suppose that the first method selects 100 documents, of which 40 are correct. Imagine that our test set is composed of 100 True instances and 10,000 Negative instances. Then TPR_1 = TP / (TP + FN) = 40/100 = 0.4 and FPR_1 = FP / (FP + TN) = 60/10,000 = 0.006.

56 Precision-recall curves. Now consider that a second method selects 680 documents, with 80 correct, and imagine that our test set is now composed of 100 True instances and a large number of Negative instances. Then TPR_2 = TP / (TP + FN) = 80/100 = 0.8, while FPR_2 = FP / (FP + TN) is again small because TN is large. Which method is better?

57 Precision-recall curves. The second one may seem better, because it retrieves more relevant documents, but the proportion of its predictions that are correct (precision or PPV) is smaller: PPV = TP / (TP + FP), giving Precision_1 = 40/100 = 0.40 and Precision_2 = 80/680 ≈ 0.11. (Note: you can also use FDR = 1 - PPV.) Thus, one must also take into account the relative cost of the predictions, i.e. the FN and FP values that must be accepted to achieve a high TPR. One can make TN arbitrarily large to make FPR → 0, so other accuracy measures are needed to have a more correct picture.
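
Computing precision and recall for the two document-classification methods above in Python (only the selected/correct counts and the 100 True instances quoted in the slides are needed):

    def precision_recall(selected, correct, total_true):
        tp = correct
        fp = selected - correct
        fn = total_true - correct
        return tp / (tp + fp), tp / (tp + fn)  # (precision, recall)

    prec1, rec1 = precision_recall(selected=100, correct=40, total_true=100)
    prec2, rec2 = precision_recall(selected=680, correct=80, total_true=100)
    print(f"method 1: precision={prec1:.2f} recall={rec1:.2f}")  # 0.40, 0.40
    print(f"method 2: precision={prec2:.2f} recall={rec2:.2f}")  # ~0.12, 0.80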

58 Precision-recall curves. Precision: the proportion of the predictions that are correct: precision = PPV = TP / (TP + FP). Recall: the proportion of the true instances that are correctly recovered: recall = TPR = TP / (TP + FN). (See e.g. Plass et al., RNA 2012.)

59 Precision-recall curves. Model 1 has a greater AUC, but low precision (a high cost of false positives). Model 2 achieves a lower AUC than Model 1, but still pretty good, and the precision is highly improved.

60 Precision-recall curves. Model 1 has a greater AUC, but low precision (a high cost of false positives). Model 2 achieves a lower AUC than Model 1, but still quite good, and the precision is highly improved.

61 References. Data Mining: Practical Machine Learning Tools and Techniques. Ian H. Witten, Eibe Frank, Mark A. Hall. Morgan Kaufmann. Methods for Computational Gene Prediction. W.H. Majoros. Cambridge University Press, 2007.
