Lecture 8: Classification


Slide 1: Lecture 8: Classification

Måns Eriksson, Department of Mathematics, Uppsala University
Multivariate Methods, 19/5 2010

Slide 2: Classification: introductory examples

Goal: classify an observation x as belonging to one of several predefined categories $\pi_1, \pi_2, \ldots, \pi_g$.

Examples:
- Classify insects into one of several sub-species using measurements on external features.
- Use measurements on blood proteins and family history to classify women as carriers or non-carriers of a genetic disorder.
- Classify the quality of a new mobile phone battery as good or bad based on a few preliminary measurements.
- Use information on background, family support, psychological test scores etc. to screen applicants for parole from prison.
- Use information on sex, age, income, education level, marital status, debts etc. to classify a potential borrower as eligible or ineligible for a bank loan.
- Detect spam messages based upon the message header and content.

Slide 3: Classification: discrimination rule

We'd like to find a discrimination rule for classification that:
- in general classifies observations correctly,
- minimizes the probability of misclassification,
- minimizes the expected cost of misclassification,
- ideally is a simple rule.

If the distributions of the populations are known, we can use this knowledge to derive rules. Otherwise, we must use training data to find good rules.

Slide 4: ML approach: assumptions

We've seen many times before that the likelihood approach yields good tests and estimators. It seems reasonable to try to use it for classification.

In the maximum likelihood approach to discrimination, the distributions of the g populations are assumed to be known. "Simplest to analyse theoretically, although the least realistic in practice" - Mardia, Kent & Bibby (1979).

The ML discriminant rule for allocating an observation x to one of the populations $\pi_1, \ldots, \pi_g$ is to allocate x to the population which gives the largest likelihood to x. See blackboard!
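
As an aside not in the original slides, here is a minimal Python sketch of the ML rule: evaluate the density of x under each of the g assumed (known) normal populations and allocate to the largest. The means, covariances and the helper name ml_classify are invented for illustration.

```python
# Sketch of the ML discriminant rule with known multivariate normal densities.
# All parameter values are made-up illustration values, not from the lecture.
import numpy as np
from scipy.stats import multivariate_normal

populations = [
    multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.3], [0.3, 1.0]]),
    multivariate_normal(mean=[2.0, 1.0], cov=[[1.0, 0.3], [0.3, 1.0]]),
    multivariate_normal(mean=[0.5, 3.0], cov=[[1.0, 0.3], [0.3, 1.0]]),
]

def ml_classify(x):
    """Allocate x to the population giving it the largest likelihood."""
    likelihoods = [pop.pdf(x) for pop in populations]
    return int(np.argmax(likelihoods)) + 1  # population label 1..g

print(ml_classify([1.8, 0.9]))  # -> 2
```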

Slide 5: ML approach: univariate case

Consider the univariate case with two populations:
Population 1: $X \sim N(\mu_1, \sigma^2)$
Population 2: $X \sim N(\mu_2, \sigma^2)$

Slide 6: ML approach: likelihood ratio

For these populations, the likelihood ratio at the point x is:

$$\lambda = \frac{\text{likelihood of } x \text{ for Pop. 1}}{\text{likelihood of } x \text{ for Pop. 2}} = \frac{f_1(x)}{f_2(x)} = \frac{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu_1)^2/2\sigma^2}}{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu_2)^2/2\sigma^2}}$$

Thus:

$$\lambda = \exp\left\{ -\frac{1}{2}\left( \frac{(x-\mu_1)^2}{\sigma^2} - \frac{(x-\mu_2)^2}{\sigma^2} \right) \right\}$$

Rule: classify x into Pop. 1 if $\lambda > 1$, into Pop. 2 if $\lambda < 1$, and flip a coin if $\lambda = 1$.

Slide 7: ML approach: further elaboration

The rule tells us to classify into Pop. 1 if the standardized distance of x from $\mu_1$ is less than the standardized distance of x from $\mu_2$.

We can rewrite the rule in a simpler form by taking logarithms:

$$-2\ln\lambda = \frac{(x-\mu_1)^2}{\sigma^2} - \frac{(x-\mu_2)^2}{\sigma^2} = -2\,\frac{(\mu_1 - \mu_2)}{\sigma^2}\, x + \frac{\mu_1^2 - \mu_2^2}{\sigma^2} = \beta x + \alpha$$

Rule: classify into Pop. 1 if $\beta x + \alpha < 0$. This is a linear rule.
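
An illustrative sketch (not from the slides) of the linear form of the rule; the values of mu1, mu2 and sigma below are arbitrary.

```python
# Sketch of the univariate ML rule in its linear form -2 ln(lambda) = beta*x + alpha.
mu1, mu2, sigma = 0.0, 3.0, 1.5   # illustrative values

beta = -2.0 * (mu1 - mu2) / sigma**2
alpha = (mu1**2 - mu2**2) / sigma**2

def classify(x):
    """Classify into Pop. 1 if beta*x + alpha < 0, otherwise Pop. 2."""
    return 1 if beta * x + alpha < 0 else 2

print(classify(1.0))  # x closer to mu1 -> 1
print(classify(2.0))  # x closer to mu2 -> 2
```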

Slide 8: ML approach: multivariate case

Now, consider the more general setting where p traits are measured, with
Population 1: $X \sim N_p(\mu_1, \Sigma_1)$,
Population 2: $X \sim N_p(\mu_2, \Sigma_2)$.

Consider the natural logarithm of the likelihood ratio for an observed x for some individual:

$$-2\ln\left(\frac{f_1(x)}{f_2(x)}\right) = -2\ln\left(\frac{\frac{1}{(2\pi)^{p/2}(\det\Sigma_1)^{1/2}}\exp\{-(x-\mu_1)'\Sigma_1^{-1}(x-\mu_1)/2\}}{\frac{1}{(2\pi)^{p/2}(\det\Sigma_2)^{1/2}}\exp\{-(x-\mu_2)'\Sigma_2^{-1}(x-\mu_2)/2\}}\right)$$

$$= \Big(\ln(\det\Sigma_1) + (x-\mu_1)'\Sigma_1^{-1}(x-\mu_1)\Big) - \Big(\ln(\det\Sigma_2) + (x-\mu_2)'\Sigma_2^{-1}(x-\mu_2)\Big)$$

Classify into Pop. 1 if this quantity is less than zero, otherwise classify into Pop. 2. This is a quadratic rule, but if the covariance matrices are equal it reduces to a linear rule. See blackboard!
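
Below is an illustrative Python sketch of this quadratic rule; all parameter values are invented for the example.

```python
# Sketch of the multivariate ML rule via -2*ln(f1(x)/f2(x)).
import numpy as np

mu1 = np.array([0.0, 0.0]); Sigma1 = np.array([[1.0, 0.2], [0.2, 1.0]])
mu2 = np.array([2.0, 2.0]); Sigma2 = np.array([[2.0, 0.0], [0.0, 0.5]])

def neg2_log_lr(x):
    """-2 ln(f1(x)/f2(x)) for two multivariate normal densities."""
    d1, d2 = x - mu1, x - mu2
    q1 = np.log(np.linalg.det(Sigma1)) + d1 @ np.linalg.solve(Sigma1, d1)
    q2 = np.log(np.linalg.det(Sigma2)) + d2 @ np.linalg.solve(Sigma2, d2)
    return q1 - q2

def classify(x):
    """Classify into Pop. 1 if the quantity above is negative."""
    return 1 if neg2_log_lr(np.asarray(x, dtype=float)) < 0 else 2

print(classify([0.5, 0.3]), classify([2.1, 1.8]))  # -> 1 2
```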

Slide 9: ML approach: general g

Theorem. If $\pi_i$ is the $N_p(\mu_i, \Sigma)$ population, $i = 1, \ldots, g$, and $\Sigma > 0$, then the ML discriminant rule allocates x to $\pi_j$, where $j \in \{1, \ldots, g\}$ is that value of i which minimizes the square of the Mahalanobis distance

$$(x - \mu_i)'\Sigma^{-1}(x - \mu_i).$$

When g = 2, the rule allocates x to $\pi_1$ if $\alpha'(x - \mu) > 0$, where $\alpha = \Sigma^{-1}(\mu_1 - \mu_2)$ and $\mu = \frac{1}{2}(\mu_1 + \mu_2)$, and to $\pi_2$ otherwise.
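
An illustrative sketch of the equal-covariance rule for general g, allocating to the population with the smallest squared Mahalanobis distance; the means and Sigma are made up.

```python
# Sketch of the equal-covariance ML rule for g populations.
import numpy as np

means = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigma = np.array([[1.0, 0.4], [0.4, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def mahalanobis_sq(x, mu):
    d = x - mu
    return d @ Sigma_inv @ d

def ml_classify(x):
    x = np.asarray(x, dtype=float)
    dists = [mahalanobis_sq(x, mu) for mu in means]
    return int(np.argmin(dists)) + 1  # population label 1..g

print(ml_classify([2.5, 0.5]))  # closest to mean (3, 0) -> 2
```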

Slide 10: ML approach: an example

See blackboard!

Slide 11: Decision theory approach: idea

In many cases, we have some more information or would like to use other criteria for the classification.
- What if we want to minimize the probability of misclassification? Does that change anything?
- What if one kind of misclassification costs more than another? How can we take this into account?
- What if we know beforehand that, say, 80 % of the observations belong to $\pi_1$ and 20 % belong to $\pi_2$? How can we use this knowledge?

Decision theory is the theory concerned with finding optimal decisions given certain information. Estimation and testing can both be viewed in a decision theoretical context, as can classification.

Slide 12: Decision theory approach: misclassification

Suppose that we have some prior probabilities of the observations belonging to the different populations.
Prior probability that an individual comes from Pop. 1: $p_1 = P(\pi_1)$
Prior probability that an individual comes from Pop. 2: $p_2 = P(\pi_2) = 1 - p_1$

Now, let
$P(2|1)$ = conditional probability of misclassifying an observation into Pop. 2 when the observation actually belongs to Pop. 1,
$P(1|2)$ = conditional probability of misclassifying an observation into Pop. 1 when the observation actually belongs to Pop. 2.

Slide 13: Decision theory approach: TPM

Partition the sample space into $R_1$ and $R_2 = R_1^c$ such that:
If $x \in R_1$, classify to $\pi_1$.
If $x \in R_2$, classify to $\pi_2$.

The TPM, the Total Probability of Misclassification, is defined as

$$P(\text{misclassification}) = P(x \text{ is in } \pi_2 \text{ but is classified as } \pi_1) + P(x \text{ is in } \pi_1 \text{ but is classified as } \pi_2)$$

$$= P(\text{classify } x \text{ in } \pi_1 \mid \pi_2)P(\pi_2) + P(\text{classify } x \text{ in } \pi_2 \mid \pi_1)P(\pi_1) = P(1|2)\,p_2 + P(2|1)\,p_1$$

See blackboard!

Slide 14: Decision theory approach: misclassification costs

What if the costs of the different misclassifications differ? For instance, the cost of classifying a patient with a deadly disease as healthy is higher than the cost of classifying a healthy patient as having a deadly disease.

Define the costs $c(1|2)$ and $c(2|1)$:

Cost table:
                          Classify as π_1    Classify as π_2
True population π_1             0                c(2|1)
True population π_2           c(1|2)               0

and study the ECM, the Expected Cost of Misclassification:

$$\mathrm{ECM} = E(\text{cost of decision}) = c(1|2)P(1|2)\,p_2 + c(2|1)P(2|1)\,p_1$$

If $c(1|2) = c(2|1)$, minimizing the ECM is mathematically equivalent to minimizing the TPM.
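
A small worked example with made-up numbers (not from the slides): if $c(1|2) = 10$, $c(2|1) = 1$, $P(1|2) = 0.05$, $P(2|1) = 0.10$, $p_1 = 0.8$ and $p_2 = 0.2$, then

$$\mathrm{ECM} = 10 \cdot 0.05 \cdot 0.2 + 1 \cdot 0.10 \cdot 0.8 = 0.10 + 0.08 = 0.18.$$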

Slide 15: Decision theory approach: minimization of ECM

Result 11.1: The regions $R_1$ and $R_2$ that minimize the ECM are

$$R_1: \left\{ x;\ \frac{f_1(x)}{f_2(x)} \geq \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) \right\} \qquad R_2: \left\{ x;\ \frac{f_1(x)}{f_2(x)} < \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) \right\}$$

x is classified into $\pi_1$ if $x \in R_1$ and into $\pi_2$ if $x \in R_2$.

The maximum likelihood approach can be viewed as the special case where $p_1 = p_2$ and $c(1|2) = c(2|1)$, or more generally where $c(1|2)\,p_2 = c(2|1)\,p_1$: classify $x_0$ to $\pi_1$ if

$$\frac{f_1(x_0)}{f_2(x_0)} > 1.$$
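
A minimal sketch of Result 11.1 as a classifier, assuming two known univariate normal densities; the priors and costs are illustrative values.

```python
# Sketch of the minimum-ECM rule: classify x to Pop. 1 when
# f1(x)/f2(x) >= (c(1|2)/c(2|1)) * (p2/p1). All numbers are illustrative.
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0).pdf   # density of Pop. 1
f2 = norm(loc=2.0, scale=1.0).pdf   # density of Pop. 2
p1, p2 = 0.8, 0.2                   # prior probabilities
c12, c21 = 5.0, 1.0                 # c(1|2) and c(2|1)

threshold = (c12 / c21) * (p2 / p1)

def classify(x):
    return 1 if f1(x) / f2(x) >= threshold else 2

print(classify(0.5), classify(1.5))  # -> 1 2
```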

Slide 16: Decision theory approach: Bayesian approach

Recall Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}.$$

Bayesian approach: allocate x to the population with the largest posterior probability $P(\pi_i \mid x)$. We find that

$$P(\pi_1 \mid x) = \frac{P(\pi_1, x)}{P(x)} = \frac{P(x \mid \pi_1)P(\pi_1)}{P(x \mid \pi_1)P(\pi_1) + P(x \mid \pi_2)P(\pi_2)} = \frac{p_1 f_1(x)}{p_1 f_1(x) + p_2 f_2(x)}$$

Similarly, we get

$$P(\pi_2 \mid x) = \frac{p_2 f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}$$

Comparing $P(\pi_1 \mid x)$ and $P(\pi_2 \mid x)$ is equivalent to taking $c(1|2) = c(2|1)$ in the decision theory approach.
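
An illustrative sketch of the Bayesian allocation rule; the densities and priors are again made-up values.

```python
# Sketch of the Bayesian rule: compute posteriors P(pi_i | x) and pick the largest.
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0).pdf
f2 = norm(loc=2.0, scale=1.0).pdf
p1, p2 = 0.8, 0.2

def posteriors(x):
    num1, num2 = p1 * f1(x), p2 * f2(x)
    total = num1 + num2
    return num1 / total, num2 / total

x = 1.5
post1, post2 = posteriors(x)
# With these priors, x = 1.5 is (just) allocated to Pop. 1.
print(round(post1, 3), round(post2, 3), "-> Pop.", 1 if post1 > post2 else 2)
```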

Slide 17: Decision theory approach: normal data

If $\pi_1$ is $N_p(\mu_1, \Sigma)$ and $\pi_2$ is $N_p(\mu_2, \Sigma)$, then the decision theory rule becomes: allocate x to $\pi_1$ if

$$(\mu_1 - \mu_2)'\Sigma^{-1} x - \frac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \geq \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

and to $\pi_2$ otherwise.

Now assume that we have two normal populations with unknown means and equal but unknown covariance matrices. $n_1$ observations are available from $\pi_1$ and $n_2$ observations are available from $\pi_2$. The estimated minimum ECM rule is: allocate a new observation $x_0$ to $\pi_1$ if

$$(\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1} x_0 - \frac{1}{2}(\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1}(\bar{x}_1 + \bar{x}_2) \geq \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

Allocate $x_0$ to $\pi_2$ otherwise.
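
A sketch of the estimated minimum ECM rule (the sample linear discriminant), assuming simulated training data; none of the numbers come from the lecture.

```python
# Sketch of the estimated minimum-ECM (sample LDA) rule with a pooled covariance.
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.multivariate_normal([0, 0], [[1, 0.3], [0.3, 1]], size=30)  # from pi_1
X2 = rng.multivariate_normal([2, 1], [[1, 0.3], [0.3, 1]], size=20)  # from pi_2

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
n1, n2 = len(X1), len(X2)
S_pool = ((n1 - 1) * np.cov(X1, rowvar=False) +
          (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

p1, p2 = 0.5, 0.5          # priors (illustrative)
c12, c21 = 1.0, 1.0        # misclassification costs (illustrative)
cutoff = np.log((c12 / c21) * (p2 / p1))

a = np.linalg.solve(S_pool, xbar1 - xbar2)   # S_pool^{-1} (xbar1 - xbar2)

def classify(x0):
    score = a @ x0 - 0.5 * a @ (xbar1 + xbar2)
    return 1 if score >= cutoff else 2

# Points near each group mean.
print(classify(np.array([0.2, 0.1])), classify(np.array([1.9, 1.2])))
```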

Slide 18: Decision theory approach: quadratic classification rule

Now suppose that we have two normal populations with unequal covariance matrices $\Sigma_1$ and $\Sigma_2$. In this case, the classification rule becomes more complicated: allocate $x_0$ to $\pi_1$ if

$$-\frac{1}{2}\, x_0'(\Sigma_1^{-1} - \Sigma_2^{-1})x_0 + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x_0 - k \geq \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

where

$$k = \frac{1}{2}\ln\left(\frac{\det(\Sigma_1)}{\det(\Sigma_2)}\right) + \frac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right).$$

Allocate $x_0$ to $\pi_2$ otherwise. Replacing $\mu_i$ with $\bar{x}_i$ and $\Sigma_i$ with $S_i$ we obtain the estimated minimum ECM rule.
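
For completeness, an illustrative sketch of the quadratic minimum ECM rule with invented parameters, priors and costs.

```python
# Sketch of the quadratic minimum-ECM rule with unequal covariance matrices.
import numpy as np

mu1, Sigma1 = np.array([0.0, 0.0]), np.array([[1.0, 0.2], [0.2, 1.0]])
mu2, Sigma2 = np.array([2.0, 1.0]), np.array([[2.0, 0.0], [0.0, 0.5]])
p1, p2, c12, c21 = 0.6, 0.4, 2.0, 1.0   # priors and costs (illustrative)

S1_inv, S2_inv = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
k = (0.5 * np.log(np.linalg.det(Sigma1) / np.linalg.det(Sigma2))
     + 0.5 * (mu1 @ S1_inv @ mu1 - mu2 @ S2_inv @ mu2))
cutoff = np.log((c12 / c21) * (p2 / p1))

def classify(x0):
    score = (-0.5 * x0 @ (S1_inv - S2_inv) @ x0
             + (mu1 @ S1_inv - mu2 @ S2_inv) @ x0 - k)
    return 1 if score >= cutoff else 2

print(classify(np.array([0.1, 0.2])), classify(np.array([2.2, 0.9])))  # -> 1 2
```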

Slide 19: Fisher's approach: introduction

Suppose that observations from g populations are given and that we wish to use these to classify a new observation. It is easier to tell the populations apart if the variation between the groups is larger than the variation within the groups. Let

$$B = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})', \qquad W = \sum_{l=1}^{g} \sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)'$$

Fisher's idea: look for the linear function $a'x$ which maximizes the ratio of the between-group sum of squares to the within-group sum of squares.

Slide 20: Fisher's approach: linear discriminant function

For the vector a maximizing

$$\frac{a'Ba}{a'Wa}$$

the linear function $a'x$ is called Fisher's linear discriminant function.

Theorem. The vector a in Fisher's linear discriminant function is the eigenvector of $W^{-1}B$ corresponding to the largest eigenvalue.

Slide 21: Fisher's approach: allocation rule

Rule: an observation x is allocated to the population whose mean score is closest to $a'x$. That is, allocate x to $\pi_j$ if

$$|a'x - a'\bar{x}_j| < |a'x - a'\bar{x}_i| \quad \text{for all } i \neq j$$

For g = 2 groups, this becomes: allocate x to $\pi_1$ if

$$d'W^{-1}\left(x - \frac{1}{2}(\bar{x}_1 + \bar{x}_2)\right) > 0$$

where $d = \bar{x}_1 - \bar{x}_2$; allocate to $\pi_2$ otherwise.

This coincides with the classification rule obtained by the ML approach for two multivariate normal populations with equal covariance matrices. However, no assumption of normality was made here.
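
An illustrative sketch of Fisher's procedure on simulated data: build B and W, take the leading eigenvector of $W^{-1}B$, and allocate by closest mean score as in the rule above. The data and the helper name fisher_classify are invented for the example.

```python
# Sketch of Fisher's linear discriminant and the closest-mean-score allocation rule.
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.multivariate_normal(m, np.eye(2), size=25)
          for m in ([0, 0], [3, 1], [1, 4])]

grand_mean = np.vstack(groups).mean(axis=0)
p = grand_mean.size
B = np.zeros((p, p))
W = np.zeros((p, p))
for X in groups:
    xbar = X.mean(axis=0)
    d = (xbar - grand_mean).reshape(-1, 1)
    B += len(X) * d @ d.T          # between-group SSCP
    C = X - xbar
    W += C.T @ C                   # within-group SSCP

eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
a = np.real(eigvecs[:, np.argmax(np.real(eigvals))])  # Fisher's direction

def fisher_classify(x):
    """Allocate x to the group whose mean score a'xbar_l is closest to a'x."""
    scores = [abs(a @ x - a @ X.mean(axis=0)) for X in groups]
    return int(np.argmin(scores)) + 1

print(fisher_classify(np.array([2.8, 1.2])))
```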

Slide 22: Discrimination and MANOVA

Of course, before applying our classification methods, we should ask ourselves if it really is meaningful to use a certain dataset for classification.

Consider g multinormal populations, assumed to have the same covariance matrix, $\Sigma_1 = \cdots = \Sigma_g$. To check whether or not a discriminant analysis is worthwhile, test the hypothesis

$$\mu_1 = \cdots = \mu_g$$

This is the MANOVA problem!

Slide 23: Evaluating classification functions

OER (Optimum Error Rate): the smallest value of the TPM.
AER (Actual Error Rate): based on the performance of the sample classification functions.
APER (Apparent Error Rate): the fraction of observations in a training sample that are misclassified by the sample classification function.

Lachenbruch's cross-validation procedure: a method for estimating the AER. See blackboard!
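
An illustrative sketch (with simulated data) of the APER and a Lachenbruch-style leave-one-out estimate of the error rate, using the sample linear discriminant rule from earlier.

```python
# Sketch of APER and a Lachenbruch (leave-one-out) error-rate estimate.
import numpy as np

rng = np.random.default_rng(2)
X1 = rng.multivariate_normal([0, 0], np.eye(2), size=40)
X2 = rng.multivariate_normal([2, 1], np.eye(2), size=40)

def lda_rule(X1, X2):
    """Return a classifier built from the two training samples."""
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S_pool = ((n1 - 1) * np.cov(X1, rowvar=False) +
              (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S_pool, xbar1 - xbar2)
    m = 0.5 * a @ (xbar1 + xbar2)
    return lambda x: 1 if a @ x - m >= 0 else 2

# APER: misclassification rate on the training sample itself.
rule = lda_rule(X1, X2)
errors = sum(rule(x) != 1 for x in X1) + sum(rule(x) != 2 for x in X2)
aper = errors / (len(X1) + len(X2))

# Lachenbruch's holdout: leave one observation out, refit, classify it.
holdout_errors = 0
for i in range(len(X1)):
    r = lda_rule(np.delete(X1, i, axis=0), X2)
    holdout_errors += r(X1[i]) != 1
for i in range(len(X2)):
    r = lda_rule(X1, np.delete(X2, i, axis=0))
    holdout_errors += r(X2[i]) != 2
estimated_aer = holdout_errors / (len(X1) + len(X2))

print(round(aper, 3), round(estimated_aer, 3))
```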

Slide 24: The Museum Gustavianum sword data

Museum Gustavianum in downtown Uppsala has a large collection of antique swords. In a current research project, the researchers are measuring lengths and various other properties of the swords. The goal is to use this to classify swords as coming from different epochs.

They need help from someone with knowledge of classification methods, so they asked the mathematics department for help. This could be a fun project for a bachelor or master thesis, or perhaps just a nice project to work with along with your studies. Talk to Jesper Rydén, jesper@math.uu.se, if you are interested or if you would like to know more!

Slide 25: Classification: a second look at the introductory examples

Can the methods presented today be used in our introductory examples?

Examples:
- Classify insects into one of several sub-species using measurements on external features.
- Use measurements on blood proteins and family history to classify women as carriers or non-carriers of a genetic disorder.
- Classify the quality of a new mobile phone battery as good or bad based on a few preliminary measurements.
- Use information on background, family support, psychological test scores etc. to screen applicants for parole from prison.
- Use information on sex, age, income, education level, marital status, debts etc. to classify a potential borrower as eligible or ineligible for a bank loan.
- Detect spam messages based upon the message header and content.

Slide 26: Classification: next lecture

In the next lecture, we will talk about decision trees (for classification) and other algorithmic methods. These methods are popular within, for instance, the field of data mining and do not require assumptions about distributions. We will also compare ordinary probabilistic methods with algorithmic methods and discuss when the different methods should be used.
