Bayesian Decision Theory


Introduction to Pattern Recognition [ Part 4 ] Mahdi Vasighi

Remarks It is quite common to assume that the data in each class are adequately described by a Gaussian distribution. Under this assumption the Bayesian classifier is either linear (when the classes share a common covariance matrix) or quadratic (when they do not). In statistics, this approach to the classification task is known as linear discriminant analysis (LDA) or quadratic discriminant analysis (QDA).
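As a brief reminder of where the linear/quadratic distinction comes from, these are the standard Gaussian discriminant functions, written up to an additive constant (the notation follows the usual textbook treatment rather than anything shown on this slide):
\[
g_i(x) = -\tfrac{1}{2}(x-\mu_i)^{t}\,\Sigma_i^{-1}(x-\mu_i) - \tfrac{1}{2}\ln\lvert\Sigma_i\rvert + \ln P(\omega_i),
\]
which is quadratic in x in general (QDA) and collapses to a linear function of x when all classes share a common covariance matrix, Σ_i = Σ (LDA).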


Error probabilities We can obtain additional insight into the operation of a general classifier if we consider the sources of its error. Suppose the dichotomizer has divided the feature space into two regions R1 and R2 in a possibly non-optimal way; the error then decomposes over these regions, and the multicategory case is handled analogously (see below).
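A sketch of the standard decomposition, following the usual textbook treatment (R_i denotes the region in which the classifier decides ω_i):
\[
P(\text{error}) = P(x \in R_2, \omega_1) + P(x \in R_1, \omega_2)
= \int_{R_2} p(x \mid \omega_1)\,P(\omega_1)\,dx + \int_{R_1} p(x \mid \omega_2)\,P(\omega_2)\,dx .
\]
For the multicategory case it is simpler to work with the probability of being correct,
\[
P(\text{correct}) = \sum_{i=1}^{c} \int_{R_i} p(x \mid \omega_i)\,P(\omega_i)\,dx ,
\]
which the optimal (Bayes) decision regions maximize.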

Receiver Operating Characteristic (ROC) Started in electronic signal detection theory (1940s - 1950s). A simple example: a person is faced with a signal and must decide whether the signal is there or not. What makes this situation confusing and difficult is the presence of noise that resembles the signal. ROC analysis has become very popular in biomedical applications and is also used in machine learning to assess classifiers.

Suppose we are interested in a cancer test based on a measured blood parameter such as a protein level x:
- x has mean μ2 for cancerous samples and mean μ1 for healthy samples;
- the actual value is a random variable;
- the classifier employs a threshold value x* for the decision;
- the outcomes are labeled either as positive or negative.

P(x > x* | ω2) is the probability that the protein level x is above x* given that the sample is cancerous: a true positive (TP, a hit). P(x > x* | ω1) is the probability that the protein level x is above x* even though the sample is healthy: a false positive (FP, a false alarm).

P(x < x* | ω2) is the probability that the protein level x is below x* given that the sample is cancerous: a false negative (FN, a miss). P(x < x* | ω1) is the probability that the protein level x is below x* given that the sample is healthy: a true negative (TN, a correct rejection).
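If we additionally assume Gaussian class-conditional densities with the means above and standard deviations σ1, σ2 (an assumption made here only for illustration), the four outcomes reduce to two quantities parameterized by the threshold x*, with Φ the standard normal CDF:
\[
\mathrm{TPR}(x^*) = P(x > x^* \mid \omega_2) = 1 - \Phi\!\left(\frac{x^*-\mu_2}{\sigma_2}\right), \qquad
\mathrm{FPR}(x^*) = P(x > x^* \mid \omega_1) = 1 - \Phi\!\left(\frac{x^*-\mu_1}{\sigma_1}\right),
\]
with the miss rate \(P(x < x^* \mid \omega_2) = 1 - \mathrm{TPR}(x^*)\) and the correct-rejection rate \(P(x < x^* \mid \omega_1) = 1 - \mathrm{FPR}(x^*)\).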

Accuracy is defined as the proportion of patterns predicted correctly among all patterns. Here positive/negative refers to the prediction, while true/false refers to whether the prediction is correct; we assume that we know only the state of nature (the true class) and the decision of the system. The counts are collected in a confusion matrix:

Confusion matrix (true class ω1 = negative, ω2 = positive):
                 True ω1 (N)    True ω2 (P)
Assigned ω1      TN             FN
Assigned ω2      FP             TP

Sensitivity (recall) is the proportion of patterns correctly predicted as positive among all positive patterns. Specificity is the proportion of patterns correctly predicted as negative among all negative patterns. The false positive rate is the proportion of patterns falsely predicted as positive among all negative patterns.
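As a sanity check, these measures can be computed directly. The following is a minimal MATLAB sketch with hypothetical label vectors y (true class) and yhat (assigned class), using 1 for the negative class and 2 for the positive class:

TP = sum(y == 2 & yhat == 2);     % positives predicted positive (hits)
TN = sum(y == 1 & yhat == 1);     % negatives predicted negative (correct rejections)
FP = sum(y == 1 & yhat == 2);     % negatives predicted positive (false alarms)
FN = sum(y == 2 & yhat == 1);     % positives predicted negative (misses)

accuracy    = (TP + TN) / (TP + TN + FP + FN);
sensitivity = TP / (TP + FN);     % recall, true positive rate
specificity = TN / (TN + FP);
FPR         = FP / (FP + TN);     % false positive rate = 1 - specificity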

[Figure: ROC space, with sensitivity on the vertical axis and 1 - specificity on the horizontal axis; example classifiers A, B, C and D are plotted as points, the top-left corner corresponds to perfect classification and the diagonal to random guessing.] ROC space depicts the relative trade-off between true positives (benefits) and false positives (costs).

[Figure: the two class-conditional densities p(x|ω1) and p(x|ω2) with the decision threshold x*, alongside the corresponding ROC curve (sensitivity vs. 1 - specificity).] The actual shape of the curve is determined by how much overlap the two distributions have.

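A minimal MATLAB sketch of this idea, assuming two Gaussian class-conditional densities with made-up parameters (normcdf requires the Statistics and Machine Learning Toolbox); increasing the separation of the means reduces the overlap and pushes the curve toward the top-left corner:

mu1 = 0;  sigma1 = 1;                    % healthy class, omega_1 (assumed values)
mu2 = 2;  sigma2 = 1;                    % cancerous class, omega_2 (assumed values)
xstar = linspace(-6, 10, 500);           % candidate thresholds x*
FPR = 1 - normcdf(xstar, mu1, sigma1);   % P(x > x* | omega_1)
TPR = 1 - normcdf(xstar, mu2, sigma2);   % P(x > x* | omega_2)
AUC = trapz(fliplr(FPR), fliplr(TPR));   % area under the ROC curve
plot(FPR, TPR); xlabel('1 - Specificity'); ylabel('Sensitivity');
title(sprintf('ROC curve, AUC = %.3f', AUC));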

ROC curves are often summarized into a single metric known as the area under the curve (AUC). The ROC curve can also be used to determine an optimal cutoff point, for example by minimizing the distance d to the top-left corner, or by identifying the point with the largest vertical distance J from the diagonal line (the Youden index). AUC can be used for performance comparison across different classifiers.
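Writing a point on the curve in terms of its coordinates (FPR = 1 - specificity on the x-axis, TPR = sensitivity on the y-axis), the two cutoff criteria can be expressed as
\[
d = \sqrt{(1-\mathrm{TPR})^2 + \mathrm{FPR}^2}, \qquad
J = \mathrm{TPR} - \mathrm{FPR} = \text{sensitivity} + \text{specificity} - 1,
\]
where the optimal cutoff minimizes d or, for the second criterion, maximizes J.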

We should note that, since the distributions can be arbitrary, the operating characteristic need not be symmetric.

Receiver operating characteristic in MATLAB (Deep Learning Toolbox, formerly the Neural Network Toolbox). Syntax:

[tpr,fpr,thresholds] = roc(targets,outputs)
plotroc(targets,outputs)

Example on the built-in iris data:

load iris_dataset                              % loads irisInputs and irisTargets
net = patternnet(20);                          % pattern-recognition network, 20 hidden units
net = train(net, irisInputs, irisTargets);
irisOutputs = sim(net, irisInputs);            % network outputs (class scores)
[tpr,fpr,thresholds] = roc(irisTargets, irisOutputs);
plotroc(irisTargets, irisOutputs)              % plots one ROC curve per class

Is it possible to perform ROC analysis for a multiclass classification problem?

Discrete Features In many practical applications the components of x are binary-, ternary-, or higher integer-valued, so that x can assume only one of a finite number of discrete values. The definition of the conditional risk R(α|x) is unchanged, and the fundamental Bayes decision rule remains the same. Consider the two-category problem in which the components of the feature vector are binary-valued and conditionally independent.

Discrete Features Let x = (x_1, ..., x_d)^t, where each feature x_i ∈ {0, 1} gives a yes/no answer about the pattern, and let p_i = P(x_i = 1 | ω1) and q_i = P(x_i = 1 | ω2). With conditionally independent features the likelihood ratio is
\[
\frac{P(\mathbf{x}\mid\omega_1)}{P(\mathbf{x}\mid\omega_2)} = \prod_{i=1}^{d}\left(\frac{p_i}{q_i}\right)^{x_i}\left(\frac{1-p_i}{1-q_i}\right)^{1-x_i},
\]
and taking the logarithm of the posterior ratio gives the linear discriminant
\[
g(\mathbf{x}) = \sum_{i=1}^{d} x_i \ln\frac{p_i(1-q_i)}{q_i(1-p_i)} + \sum_{i=1}^{d}\ln\frac{1-p_i}{1-q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)},
\]
with decision ω1 if g(x) > 0 and ω2 otherwise.
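A minimal MATLAB sketch of the resulting linear machine, with made-up values for p_i, q_i and the priors:

p  = [0.8 0.6 0.7];      % P(x_i = 1 | omega_1), assumed values
q  = [0.3 0.4 0.2];      % P(x_i = 1 | omega_2), assumed values
Pw = [0.5 0.5];          % priors P(omega_1), P(omega_2)

w  = log(p .* (1 - q) ./ (q .* (1 - p)));              % per-feature weights
w0 = sum(log((1 - p) ./ (1 - q))) + log(Pw(1)/Pw(2));  % bias term

x = [1 0 1];             % an observed binary feature vector
g = w * x' + w0;         % decide omega_1 if g > 0, otherwise omega_2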

Summary
- The basic idea of Bayes decision theory is to minimize the overall risk: choose the action that minimizes the conditional risk, where the posteriors are weighted by the different penalties for misclassifying patterns.
- If the underlying distributions are multivariate Gaussian, the decision boundaries are hyperquadrics.
- Receiver operating characteristic curves describe the inherent, unchangeable properties of a classifier and can be used, for example, to determine the Bayes error rate.