DISCRIMINANT ANALYSIS


1. Introduction

Discrimination and classification are concerned with separating objects from different populations into different groups and with allocating new observations to one of these groups. The goals thus are:

1. To describe (graphically or algebraically) the difference between objects from several known populations. We construct discriminants that have numerical values which separate the different collections as much as possible.
2. To assign objects to one of several labeled classes. We derive a classification rule that can be used to assign (new) objects to one of the labeled classes.

Examples.
1. Based on historical bank data, separate the good from the poor credit risks (based on income, age, family size, etc.). Classify new credit applications into one of these two classes in order to decide whether to grant or reject a loan.
2. Make a distinction between readers and non-readers of a magazine or newspaper based on, e.g., education level, age, income, profession, etc., such that the publisher knows which categories of people are potential new readers.

A good procedure should result in as few misclassifications as possible. It should take into account the likelihood that objects belong to each of the classes (the prior probability of occurrence). One often also takes the costs of misclassification into account. For example, the cost of not operating on a person who needs surgery is much higher than the cost of unnecessarily operating on a person, so the first type of misclassification has to be avoided as much as possible.

2. Discrimination and Classification of Two Populations

We now focus on separating objects from two classes and assigning new objects to one of these two classes. The classes will be labeled $\pi_1$ and $\pi_2$. Each object consists of measurements on $p$ random variables $X_1, \dots, X_p$ such that the observed values differ to some extent from one class to the other. The distributions associated with the two populations are described by their density functions $f_1$ and $f_2$ respectively. Now consider an observed value $x = (x_1, \dots, x_p)^\tau$ of the random variable $X = (X_1, \dots, X_p)^\tau$. Then

$f_1(x)$ is the density in $x$ if $x$ belongs to population $\pi_1$,
$f_2(x)$ is the density in $x$ if $x$ belongs to population $\pi_2$.

The object $x$ must be assigned to either population $\pi_1$ or $\pi_2$. Denote by $\Omega$ the sample space (the collection of all possible outcomes of $X$) and partition the sample space as $\Omega = R_1 \cup R_2$, where $R_1$ is the subspace of outcomes which we classify as belonging to population $\pi_1$ and $R_2 = \Omega \setminus R_1$ the subspace of outcomes classified as belonging to $\pi_2$. It follows that the (conditional) probability of classifying an object as belonging to $\pi_2$ when it is really from $\pi_1$ equals

$$P(2|1) = P(X \in R_2 \mid X \in \pi_1) = \int_{R_2} f_1(x)\, dx$$

and the (conditional) probability of assigning an object to $\pi_1$ when it in fact is from $\pi_2$ equals

$$P(1|2) = P(X \in R_1 \mid X \in \pi_2) = \int_{R_1} f_2(x)\, dx$$

[Figure: the densities $f_1(x)$ and $f_2(x)$ plotted over the classification regions $R_1$ and $R_2$; the shaded areas correspond to the misclassification probabilities $P(1|2)$ and $P(2|1)$.]

Similarly, we define the conditional probabilities $P(1|1)$ and $P(2|2)$. To obtain the probabilities of correctly and incorrectly classifying objects we also have to take the prior class probabilities into account. Denote

$p_1 = P(X \in \pi_1)$ = prior probability of $\pi_1$,
$p_2 = P(X \in \pi_2)$ = prior probability of $\pi_2$,

where $p_1 + p_2 = 1$. It follows that the overall probabilities of correctly and incorrectly classifying objects are given by

$$P(\text{object is correctly classified as } \pi_1) = P(X \in \pi_1 \text{ and } X \in R_1) = P(X \in R_1 \mid X \in \pi_1)\, P(X \in \pi_1) = P(1|1)\, p_1$$
$$P(\text{object is misclassified as } \pi_1) = P(X \in \pi_2 \text{ and } X \in R_1) = P(X \in R_1 \mid X \in \pi_2)\, P(X \in \pi_2) = P(1|2)\, p_2$$
$$P(\text{object is correctly classified as } \pi_2) = P(X \in \pi_2 \text{ and } X \in R_2) = P(X \in R_2 \mid X \in \pi_2)\, P(X \in \pi_2) = P(2|2)\, p_2$$
$$P(\text{object is misclassified as } \pi_2) = P(X \in \pi_1 \text{ and } X \in R_2) = P(X \in R_2 \mid X \in \pi_1)\, P(X \in \pi_1) = P(2|1)\, p_1$$

To consider the cost of misclassification, denote

$c(2|1)$ = the cost of classifying an object from $\pi_1$ as $\pi_2$,
$c(1|2)$ = the cost of classifying an object from $\pi_2$ as $\pi_1$.

A classification rule is obtained by minimizing the expected cost of misclassification:

$$\mathrm{ECM} := c(2|1)\, P(2|1)\, p_1 + c(1|2)\, P(1|2)\, p_2$$

Result 1. The regions $R_1$ and $R_2$ that minimize the ECM are given by

$$R_1 = \left\{ x \in \Omega \;;\; \frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) \right\}$$
$$R_2 = \left\{ x \in \Omega \;;\; \frac{f_1(x)}{f_2(x)} < \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right) \right\}$$

Proof. Using that $P(1|1) + P(2|1) = 1$ (since $R_1 \cup R_2 = \Omega$) we obtain

$$\mathrm{ECM} = c(2|1) P(2|1) p_1 + c(1|2) P(1|2) p_2 = c(2|1)\big(1 - P(1|1)\big) p_1 + c(1|2) P(1|2) p_2 = c(2|1) p_1 + \int_{R_1} \big[ c(1|2) f_2(x) p_2 - c(2|1) f_1(x) p_1 \big]\, dx$$

Since probabilities and densities, as well as misclassification costs (there is no gain from misclassifying objects), are nonnegative, the ECM is minimal if the integrand satisfies

$$c(1|2) f_2(x) p_2 - c(2|1) f_1(x) p_1 \le 0 \quad \text{for all } x \in R_1,$$

which yields the regions above.

Note that these regions only depend on ratios:

$\dfrac{f_1(x)}{f_2(x)}$ = density ratio,
$\dfrac{c(1|2)}{c(2|1)}$ = cost ratio,
$\dfrac{p_2}{p_1}$ = prior probability ratio.

These ratios are often much easier to determine than the exact values of the individual components.
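As an illustration (not part of the original notes), the minimum-ECM rule of Result 1 can be written as a short Python function; the density functions, costs and priors below are assumed to be supplied by the user.

```python
from scipy.stats import norm

def classify_min_ecm(x, f1, f2, c12, c21, p1, p2):
    """Minimum-ECM rule of Result 1: assign x to pi_1 iff
    f1(x)/f2(x) >= (c(1|2)/c(2|1)) * (p2/p1), otherwise to pi_2."""
    threshold = (c12 / c21) * (p2 / p1)
    return 1 if f1(x) / f2(x) >= threshold else 2

# Hypothetical usage with two univariate normal densities.
f1 = lambda x: norm.pdf(x, loc=0.0, scale=1.0)
f2 = lambda x: norm.pdf(x, loc=2.0, scale=1.0)
print(classify_min_ecm(0.7, f1, f2, c12=1.0, c21=1.0, p1=0.5, p2=0.5))  # -> 1
```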

Special Cases:

1. Equal (or unknown) prior probabilities: compare the density ratio with the cost ratio:
   $R_1: \dfrac{f_1(x)}{f_2(x)} \ge \dfrac{c(1|2)}{c(2|1)}$, and $R_2: \dfrac{f_1(x)}{f_2(x)} < \dfrac{c(1|2)}{c(2|1)}$.

2. Equal (or undetermined) misclassification costs: compare the density ratio with the prior probability ratio:
   $R_1: \dfrac{f_1(x)}{f_2(x)} \ge \dfrac{p_2}{p_1}$, and $R_2: \dfrac{f_1(x)}{f_2(x)} < \dfrac{p_2}{p_1}$.

3. Equal prior probabilities and equal misclassification costs (or, more generally, $\dfrac{c(1|2)}{c(2|1)} \cdot \dfrac{p_2}{p_1} = 1$):
   $R_1: \dfrac{f_1(x)}{f_2(x)} \ge 1$, and $R_2: \dfrac{f_1(x)}{f_2(x)} < 1$.

Example 1. If we set the cost ratio equal to 2 and we know that 20% of all objects belong to $\pi_2$, then given that $f_1(x_0) = 0.3$ and $f_2(x_0) = 0.4$, do we classify $x_0$ as belonging to $\pi_1$ or $\pi_2$? We have that $p_2 = 0.2$, so $p_1 = 0.8$ and $p_2/p_1 = 0.25$. Therefore, we obtain

$$R_1: \frac{f_1(x)}{f_2(x)} \ge 2\,(0.25) = 0.5 \qquad \text{and} \qquad R_2: \frac{f_1(x)}{f_2(x)} < 2\,(0.25) = 0.5$$

For $x_0$ we have

$$\frac{f_1(x_0)}{f_2(x_0)} = \frac{0.3}{0.4} = 0.75 \ge 0.5,$$

so we find $x_0 \in R_1$ and classify $x_0$ as belonging to $\pi_1$.
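The arithmetic of Example 1 can be checked in a couple of lines; the cost ratio and prior used below are the reconstructed values quoted above, so treat them as illustrative.

```python
# Example 1: cost ratio 2, p2 = 0.2 and hence p1 = 0.8.
f1_x0, f2_x0 = 0.3, 0.4
threshold = 2 * (0.2 / 0.8)            # cost ratio times prior ratio p2/p1 = 0.5
print(f1_x0 / f2_x0 >= threshold)      # 0.75 >= 0.5 -> True, so x0 lies in R1 (pi_1)
```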

3. Classification with Two Multivariate Normal Populations

We now assume that $f_1$ and $f_2$ are multivariate normal densities with mean vectors $\mu_1$ and $\mu_2$ and covariance matrices $\Sigma_1$ and $\Sigma_2$, respectively.

3.1. $\Sigma_1 = \Sigma_2 = \Sigma$

The density of population $\pi_i$ ($i = 1, 2$) is now given by

$$f_i(x) = \frac{1}{(2\pi)^{p/2} \det(\Sigma)^{1/2}}\, e^{-\frac{1}{2}(x - \mu_i)^\tau \Sigma^{-1} (x - \mu_i)}.$$

Result 2. If the populations $\pi_1$ and $\pi_2$ both have multivariate normal densities with equal covariance matrices, then the classification rule corresponding to minimizing the ECM becomes: classify $x_0$ as $\pi_1$ if

$$(\mu_1 - \mu_2)^\tau \Sigma^{-1} x_0 - \frac{1}{2}(\mu_1 - \mu_2)^\tau \Sigma^{-1} (\mu_1 + \mu_2) \;\ge\; \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

and classify $x_0$ as $\pi_2$ otherwise.

Proof. We assign $x_0$ to $\pi_1$ if

$$\frac{f_1(x_0)}{f_2(x_0)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

which can be rewritten as

$$e^{-\frac{1}{2}(x_0 - \mu_1)^\tau \Sigma^{-1}(x_0 - \mu_1) + \frac{1}{2}(x_0 - \mu_2)^\tau \Sigma^{-1}(x_0 - \mu_2)} \;\ge\; \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)$$

By taking logarithms on both sides and using the equality

$$-\frac{1}{2}(x_0 - \mu_1)^\tau \Sigma^{-1}(x_0 - \mu_1) + \frac{1}{2}(x_0 - \mu_2)^\tau \Sigma^{-1}(x_0 - \mu_2) = (\mu_1 - \mu_2)^\tau \Sigma^{-1} x_0 - \frac{1}{2}(\mu_1 - \mu_2)^\tau \Sigma^{-1}(\mu_1 + \mu_2)$$

we obtain the classification rule.

In practice, the population parameters $\mu_1$, $\mu_2$ and $\Sigma$ are unknown and have to be estimated from the data. Suppose we have $n_1$ objects belonging to $\pi_1$ (denoted $x^{(1)}_1, \dots, x^{(1)}_{n_1}$) and $n_2$ objects from $\pi_2$ (denoted $x^{(2)}_1, \dots, x^{(2)}_{n_2}$), with $n_1 + n_2 = n$ the total sample size. The sample mean vectors and covariance matrices of both groups are given by

$$\bar{x}_1 = \frac{1}{n_1} \sum_{j=1}^{n_1} x^{(1)}_j, \qquad S_1 = \frac{1}{n_1 - 1} \sum_{j=1}^{n_1} (x^{(1)}_j - \bar{x}_1)(x^{(1)}_j - \bar{x}_1)^\tau,$$
$$\bar{x}_2 = \frac{1}{n_2} \sum_{j=1}^{n_2} x^{(2)}_j, \qquad S_2 = \frac{1}{n_2 - 1} \sum_{j=1}^{n_2} (x^{(2)}_j - \bar{x}_2)(x^{(2)}_j - \bar{x}_2)^\tau.$$

Since both populations have the same covariance matrix $\Sigma$, we combine the two sample covariance matrices $S_1$ and $S_2$ to obtain a more precise estimate of $\Sigma$ given by

$$S_{\mathrm{pooled}} = \left(\frac{n_1 - 1}{(n_1 - 1) + (n_2 - 1)}\right) S_1 + \left(\frac{n_2 - 1}{(n_1 - 1) + (n_2 - 1)}\right) S_2$$

By replacing $\mu_1$, $\mu_2$ and $\Sigma$ with $\bar{x}_1$, $\bar{x}_2$ and $S_{\mathrm{pooled}}$ in Result 2 we obtain the sample classification rule: classify $x_0$ as $\pi_1$ if

$$(\bar{x}_1 - \bar{x}_2)^\tau S_{\mathrm{pooled}}^{-1}\, x_0 - \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^\tau S_{\mathrm{pooled}}^{-1} (\bar{x}_1 + \bar{x}_2) \;\ge\; \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

and classify $x_0$ as $\pi_2$ otherwise.
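A minimal numpy sketch of this sample rule (not taken from the original notes): the training matrices X1 and X2, holding the observations of $\pi_1$ and $\pi_2$ row-wise, are assumed inputs, and cost_ratio and prior_ratio stand for $c(1|2)/c(2|1)$ and $p_2/p_1$.

```python
import numpy as np

def lda_two_class_rule(X1, X2, cost_ratio=1.0, prior_ratio=1.0):
    """Sample minimum-ECM rule for two normal populations with a common
    covariance matrix (Result 2 with the estimates plugged in)."""
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a = np.linalg.solve(S_pooled, xbar1 - xbar2)     # a = S_pooled^{-1}(xbar1 - xbar2)
    m_hat = 0.5 * (a @ xbar1 + a @ xbar2)            # midpoint of the projected means
    cutoff = m_hat + np.log(cost_ratio * prior_ratio)

    def classify(x0):
        return 1 if a @ x0 >= cutoff else 2
    return classify

# Hypothetical usage with simulated data (groups centered at (0,0) and (2,2)).
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(30, 2))
X2 = rng.normal(2.0, 1.0, size=(25, 2))
rule = lda_two_class_rule(X1, X2)
print(rule(np.array([0.2, -0.1])), rule(np.array([1.9, 2.3])))   # expected: 1 2
```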

Special case: equal prior probabilities and equal misclassification costs. Then

$$\ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right] = \ln(1) = 0,$$

such that we assign $x_0$ to $\pi_1$ if

$$(\bar{x}_1 - \bar{x}_2)^\tau S_{\mathrm{pooled}}^{-1}\, x_0 \;\ge\; \frac{1}{2}(\bar{x}_1 - \bar{x}_2)^\tau S_{\mathrm{pooled}}^{-1} (\bar{x}_1 + \bar{x}_2).$$

Denote $a = S_{\mathrm{pooled}}^{-1} (\bar{x}_1 - \bar{x}_2) \in \mathbb{R}^p$; then this can be rewritten as

$$a^\tau x_0 \;\ge\; \frac{1}{2}\,(a^\tau \bar{x}_1 + a^\tau \bar{x}_2).$$

That is, we have to compare the scalar $\hat{y}_0 = a^\tau x_0$ with the midpoint $\hat{m} = \frac{1}{2}(\bar{y}_1 + \bar{y}_2) = \frac{1}{2}(a^\tau \bar{x}_1 + a^\tau \bar{x}_2)$. We have thus created two univariate populations (determined by the $y$-values) by projecting the original data onto the direction determined by $a$. This direction is the (estimated) direction in which the two populations are best separated.

Remark. By replacing the unknown parameters with their estimates, there is no guarantee anymore that the resulting classification rule minimizes the expected cost of misclassification. However, we expect to obtain a good estimate of the optimal rule.

Example 2. To develop a test for potential hemophilia carriers, blood samples were taken from two groups of patients. The two variables measured are AHF activity and AHF-like antigen, where AHF stands for AntiHemophilic Factor; for both variables we take the logarithm (base 10). The first group of $n_1 = 30$ patients did not carry the hemophilia gene. The second group consisted of $n_2$ known hemophilia carriers. From these samples the sample mean vectors $\bar{x}_1$ and $\bar{x}_2$ and the pooled covariance matrix $S_{\mathrm{pooled}}$ have been derived.

Assuming equal costs and equal priors, we compute

$$a = S_{\mathrm{pooled}}^{-1} (\bar{x}_1 - \bar{x}_2) = \begin{pmatrix} 37.61 \\ -28.92 \end{pmatrix}$$

and $\bar{y}_1 = a^\tau \bar{x}_1 = 0.88$, $\bar{y}_2 = a^\tau \bar{x}_2 = -10.10$. The corresponding midpoint thus becomes $\hat{m} = \frac{1}{2}(\bar{y}_1 + \bar{y}_2) = -4.61$. A new object $x = (x_1, x_2)^\tau$ is classified as a non-carrier if

$$\hat{y} = 37.61\, x_1 - 28.92\, x_2 \;\ge\; \hat{m} = -4.61$$

and as a carrier otherwise.

A potential hemophilia carrier has values $x_1 = -0.210$ and $x_2 = -0.044$. Should this patient be classified as a carrier? We obtain $\hat{y} = -6.62 < -4.61$, so we indeed assign this patient to the population of carriers.

Suppose now that it is known that the prior probability of being a hemophilia carrier is 25%; then a new patient is classified as a non-carrier if $\hat{y} - \hat{m} \ge \ln(p_2/p_1)$. We find $\hat{y} - \hat{m} = -6.62 + 4.61 = -2.01$ and $\ln(p_2/p_1) = \ln(0.25/0.75) = \ln(1/3) = -1.10$, so we still classify this patient as a carrier.
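The prior-adjusted comparison at the end of this example can be verified directly; the numbers below are the reconstructed values quoted above and are repeated here only for illustration.

```python
import numpy as np

y_hat = -6.62           # projected score a' x for the new patient
m_hat = -4.61           # midpoint 0.5 * (ybar1 + ybar2)
p1, p2 = 0.75, 0.25     # priors: 25% of the population are carriers (pi_2)

# Classify as non-carrier (pi_1) when y_hat - m_hat >= ln(p2/p1).
print(round(y_hat - m_hat, 2), round(np.log(p2 / p1), 2))                 # -2.01 vs -1.1
print("non-carrier" if y_hat - m_hat >= np.log(p2 / p1) else "carrier")   # carrier
```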

3.2. $\Sigma_1 \neq \Sigma_2$

The density of population $\pi_i$ ($i = 1, 2$) is now given by

$$f_i(x) = \frac{1}{(2\pi)^{p/2} \det(\Sigma_i)^{1/2}}\, e^{-\frac{1}{2}(x - \mu_i)^\tau \Sigma_i^{-1} (x - \mu_i)}.$$

Result 3. If the populations $\pi_1$ and $\pi_2$ both have multivariate normal densities with mean vectors and covariance matrices $\mu_1, \Sigma_1$ and $\mu_2, \Sigma_2$ respectively, then the classification rule corresponding to minimizing the ECM becomes: classify $x_0$ as $\pi_1$ if

$$-\frac{1}{2}\, x_0^\tau (\Sigma_1^{-1} - \Sigma_2^{-1})\, x_0 + (\mu_1^\tau \Sigma_1^{-1} - \mu_2^\tau \Sigma_2^{-1})\, x_0 - k \;\ge\; \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

and classify $x_0$ as $\pi_2$ otherwise. The constant $k$ is given by

$$k = \frac{1}{2}\ln\left(\frac{\det(\Sigma_1)}{\det(\Sigma_2)}\right) + \frac{1}{2}\left(\mu_1^\tau \Sigma_1^{-1} \mu_1 - \mu_2^\tau \Sigma_2^{-1} \mu_2\right)$$

Proof. We assign $x_0$ to $\pi_1$ if

$$\ln\left(\frac{f_1(x_0)}{f_2(x_0)}\right) \ge \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

and

$$\ln\left(\frac{f_1(x_0)}{f_2(x_0)}\right) = -\frac{1}{2}\ln\left(\frac{\det(\Sigma_1)}{\det(\Sigma_2)}\right) - \frac{1}{2}(x_0 - \mu_1)^\tau \Sigma_1^{-1}(x_0 - \mu_1) + \frac{1}{2}(x_0 - \mu_2)^\tau \Sigma_2^{-1}(x_0 - \mu_2) = -\frac{1}{2}\, x_0^\tau (\Sigma_1^{-1} - \Sigma_2^{-1})\, x_0 + (\mu_1^\tau \Sigma_1^{-1} - \mu_2^\tau \Sigma_2^{-1})\, x_0 - k$$

In practice, the parameters $\mu_1$, $\mu_2$, $\Sigma_1$ and $\Sigma_2$ are unknown and are replaced by the estimates $\bar{x}_1$, $\bar{x}_2$, $S_1$ and $S_2$, which yields the following sample classification rule: classify $x_0$ as $\pi_1$ if

$$-\frac{1}{2}\, x_0^\tau (S_1^{-1} - S_2^{-1})\, x_0 + (\bar{x}_1^\tau S_1^{-1} - \bar{x}_2^\tau S_2^{-1})\, x_0 - k \;\ge\; \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$

and classify $x_0$ as $\pi_2$ otherwise. The constant $k$ is given by

$$k = \frac{1}{2}\ln\left(\frac{\det(S_1)}{\det(S_2)}\right) + \frac{1}{2}\left(\bar{x}_1^\tau S_1^{-1} \bar{x}_1 - \bar{x}_2^\tau S_2^{-1} \bar{x}_2\right)$$
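A numpy sketch of this quadratic sample rule (again an illustration, not part of the original notes); X1 and X2 are assumed training matrices, and cost_ratio and prior_ratio stand for $c(1|2)/c(2|1)$ and $p_2/p_1$.

```python
import numpy as np

def qda_two_class_rule(X1, X2, cost_ratio=1.0, prior_ratio=1.0):
    """Sample minimum-ECM rule when the two normal populations may have
    different covariance matrices (quadratic rule of Result 3)."""
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    S1inv, S2inv = np.linalg.inv(S1), np.linalg.inv(S2)
    k = (0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
         + 0.5 * (xbar1 @ S1inv @ xbar1 - xbar2 @ S2inv @ xbar2))

    def classify(x0):
        quad = -0.5 * x0 @ (S1inv - S2inv) @ x0
        lin = (xbar1 @ S1inv - xbar2 @ S2inv) @ x0
        return 1 if quad + lin - k >= np.log(cost_ratio * prior_ratio) else 2
    return classify
```

The function returned here is used exactly like the linear rule sketched in Section 3.1, only the decision boundary is now quadratic in $x_0$.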

4. Evaluating Classification Rules

To judge the performance of a sample classification procedure, we want to calculate its misclassification probability or error rate. A measure of performance that can be calculated for any classification procedure is the apparent error rate (APER), which is defined as the fraction of observations in the sample that are misclassified by the classification procedure. Denote by $n_{1M}$ the number of $\pi_1$ objects misclassified as $\pi_2$ and by $n_{2M}$ the number of $\pi_2$ objects misclassified as $\pi_1$; then

$$\mathrm{APER} = \frac{n_{1M} + n_{2M}}{n_1 + n_2}$$

The APER is intuitively appealing and easy to calculate. Unfortunately, it tends to underestimate the actual error rate (AER) when classifying new objects. This underestimation occurs because we use the same sample both to build the classification rule (therefore we call it the training sample) and to evaluate it.

To obtain a reliable estimate of the AER we would ideally consider an independent test sample of new objects for which we know the true class labels. This means that we split the original sample into a training sample and a test sample. The AER is then estimated by the proportion of misclassified objects in the test sample, while the training sample is used to construct the classification rule. However, there are two drawbacks to this approach:

1. It requires large samples.
2. The classification rule is less precise because we do not use the information from the test sample to build the classifier.

An alternative is the (leave-one-out) cross-validation or jackknife procedure, which works as follows.

1. Leave one object out of the sample and construct a classification rule based on the remaining $n - 1$ objects in the sample.
2. Classify the left-out observation using the classification rule constructed in step 1.
3. Repeat the two previous steps for each of the objects in the sample.

Denote by $n^{CV}_{1M}$ and $n^{CV}_{2M}$ the number of left-out $\pi_1$ and $\pi_2$ observations, respectively, that are misclassified. Then a good estimate of the actual error rate is given by

$$\widehat{\mathrm{AER}} = \frac{n^{CV}_{1M} + n^{CV}_{2M}}{n_1 + n_2}$$

Example 3. We consider a sample of size $n = 98$ containing the response to visual stimuli for both eyes, measured for patients suffering from multiple sclerosis and for controls (healthy patients). Based on these measured responses and age we want to develop a rule that allows us to classify potential patients, and to estimate the actual error rate as well. The assumption of equal covariances is acceptable. Prior probabilities and costs of misclassification are undetermined and thus considered to be equal.

Analyzing the data in S-Plus we find the group means $\bar{x}_1$ and $\bar{x}_2$ and the pooled covariance matrix $S_{\mathrm{pooled}}$. The classification rule becomes: classify a patient as suffering from multiple sclerosis if

$$\hat{y} = a^\tau x \;\ge\; \hat{m}, \qquad \text{with } a = S_{\mathrm{pooled}}^{-1}(\bar{x}_1 - \bar{x}_2),$$

and otherwise the patient is classified as healthy.

Based on the training sample we obtain $n_{1M} + n_{2M} = 17$ misclassifications, which yields the apparent error rate APER = 17/98 = 17.3%. On the other hand, using cross-validation we obtain 20 misclassified observations (15 misclassified as MS patients and 5 as healthy), which yields the estimated actual error rate $\widehat{\mathrm{AER}}$ = 20/98 = 20.4%, about 3% higher. Note that three times as many persons are misclassified as MS patients than as healthy, even though the misclassification costs were assumed to be equal.
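A generic sketch of the leave-one-out estimate used in this example (the original analysis was done in S-Plus; this is a Python stand-in): the labels y are assumed to take the values 1 and 2, and build_rule is any function that fits a two-group rule from the two data matrices, for instance the lda_two_class_rule sketch from Section 3.

```python
import numpy as np

def loo_error_rate(X, y, build_rule):
    """Leave-one-out (cross-validated) estimate of the actual error rate:
    refit the rule n times, each time classifying the left-out observation."""
    misclassified = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        rule = build_rule(X[keep & (y == 1)], X[keep & (y == 2)])
        if rule(X[i]) != y[i]:
            misclassified += 1
    return misclassified / len(X)
```

The apparent error rate corresponds to the same count computed with a single rule fitted on the full sample.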

5. Classification with Several Populations

We now consider the more general situation of separating objects from $g$ ($g \ge 2$) classes and assigning new objects to one of these $g$ classes. For $i = 1, \dots, g$ denote

$f_i$ = the density associated with population $\pi_i$,
$p_i$ = the prior probability of population $\pi_i$,
$R_i$ = the subspace of outcomes assigned to $\pi_i$,
$c(j|i)$ = the cost of misclassifying an object to $\pi_j$ when it is from $\pi_i$,
$P(j|i)$ = the conditional probability of assigning an object of $\pi_i$ to $\pi_j$.

The (conditional) expected cost of misclassifying an object of population $\pi_1$ is given by

$$\mathrm{ECM}(1) = P(2|1)\, c(2|1) + \dots + P(g|1)\, c(g|1) = \sum_{i=2}^{g} P(i|1)\, c(i|1)$$

and similarly we can determine the expected cost of misclassifying objects of populations $\pi_2, \dots, \pi_g$. It follows that the overall ECM equals

$$\mathrm{ECM} = \sum_{j=1}^{g} p_j\, \mathrm{ECM}(j) = \sum_{j=1}^{g} p_j \sum_{i \neq j} P(i|j)\, c(i|j)$$

Result 4. The classification rule that minimizes the ECM assigns each object $x$ to the population $\pi_i$ for which

$$\sum_{j \neq i} p_j\, f_j(x)\, c(i|j)$$

is smallest. If the minimum is not unique, then $x$ can be assigned to any of the populations for which the minimum is attained. (Without proof.)

Special case: if all misclassification costs are equal (or unknown) we assign $x$ to the population $\pi_i$ for which $\sum_{j \neq i} p_j f_j(x)$ is smallest, or equivalently for which $p_i f_i(x)$ is largest. We thus obtain: classify $x$ as $\pi_i$ if

$$p_i\, f_i(x) > p_j\, f_j(x) \qquad \text{for all } j \neq i.$$
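Result 4 translates directly into code; the densities, priors and cost matrix below are assumed inputs, with costs[i][j] playing the role of $c(i|j)$.

```python
import numpy as np

def classify_min_ecm_multi(x, densities, priors, costs):
    """Assign x to the population i that minimizes sum_{j != i} p_j f_j(x) c(i|j)."""
    g = len(densities)
    f = [densities[j](x) for j in range(g)]
    totals = [sum(priors[j] * f[j] * costs[i][j] for j in range(g) if j != i)
              for i in range(g)]
    return int(np.argmin(totals)) + 1     # populations are numbered 1, ..., g
```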

5.1. Classification with Normal Populations

The density of population $\pi_i$ ($i = 1, \dots, g$) is now given by

$$f_i(x) = \frac{1}{(2\pi)^{p/2} \det(\Sigma_i)^{1/2}}\, e^{-\frac{1}{2}(x - \mu_i)^\tau \Sigma_i^{-1} (x - \mu_i)}.$$

Result 5. If all misclassification costs are equal (or unknown) we assign $x$ to the population $\pi_i$ if the (quadratic) score $d_i(x) = \max_{j=1,\dots,g} d_j(x)$, where the scores are given by

$$d_j(x) = -\frac{1}{2}\ln(\det(\Sigma_j)) - \frac{1}{2}(x - \mu_j)^\tau \Sigma_j^{-1}(x - \mu_j) + \ln(p_j), \qquad j = 1, \dots, g.$$

Proof. We assign $x$ to $\pi_i$ if $\ln(p_i f_i(x)) = \max_{j} \ln(p_j f_j(x))$, and

$$\ln(p_j f_j(x)) = \ln(p_j) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln(\det(\Sigma_j)) - \frac{1}{2}(x - \mu_j)^\tau \Sigma_j^{-1}(x - \mu_j).$$

Dropping the second term, which is constant, yields the result.

In practice, the parameters $\mu_j$ and $\Sigma_j$ are unknown and will be replaced by the sample means $\bar{x}_j$ and covariances $S_j$, which yields the sample classification rule: classify $x$ as $\pi_i$ if the (quadratic) score $\hat{d}_i(x) = \max_{j=1,\dots,g} \hat{d}_j(x)$, where the scores are given by

$$\hat{d}_j(x) = -\frac{1}{2}\ln(\det(S_j)) - \frac{1}{2}(x - \bar{x}_j)^\tau S_j^{-1}(x - \bar{x}_j) + \ln(p_j), \qquad j = 1, \dots, g.$$
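The sample quadratic scores can be sketched as follows; `groups` is assumed to be a list of the $g$ training matrices and `priors` the (known or assumed) prior probabilities.

```python
import numpy as np

def quadratic_scores_rule(groups, priors=None):
    """Classify by the largest sample quadratic score d_j(x) of Result 5."""
    g = len(groups)
    priors = priors if priors is not None else [1.0 / g] * g
    xbars = [X.mean(axis=0) for X in groups]
    covs = [np.cov(X, rowvar=False) for X in groups]
    Sinvs = [np.linalg.inv(S) for S in covs]
    logdets = [np.linalg.slogdet(S)[1] for S in covs]

    def classify(x0):
        scores = [-0.5 * logdets[j]
                  - 0.5 * (x0 - xbars[j]) @ Sinvs[j] @ (x0 - xbars[j])
                  + np.log(priors[j]) for j in range(g)]
        return int(np.argmax(scores)) + 1
    return classify
```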

If all covariance matrices are equal, $\Sigma_j = \Sigma$ for $j = 1, \dots, g$, then the quadratic scores $d_j(x)$ become

$$d_j(x) = -\frac{1}{2}\ln(\det(\Sigma)) - \frac{1}{2}\, x^\tau \Sigma^{-1} x + \mu_j^\tau \Sigma^{-1} x - \frac{1}{2}\, \mu_j^\tau \Sigma^{-1} \mu_j + \ln(p_j).$$

The first two terms are the same for all $d_j(x)$ so they can be left out, which yields the (linear) scores

$$d_j(x) = \mu_j^\tau \Sigma^{-1} x - \frac{1}{2}\, \mu_j^\tau \Sigma^{-1} \mu_j + \ln(p_j).$$

To estimate these scores in practice we use the sample means $\bar{x}_j$ and the pooled estimate of $\Sigma$ given by

$$S_{\mathrm{pooled}} = \frac{1}{(n_1 - 1) + \dots + (n_g - 1)}\left[(n_1 - 1) S_1 + \dots + (n_g - 1) S_g\right],$$

which yields the sample classification rule: classify $x$ as $\pi_i$ if the (linear) score $\hat{d}_i(x) = \max_{j=1,\dots,g} \hat{d}_j(x)$, where the scores are given by

$$\hat{d}_j(x) = \bar{x}_j^\tau S_{\mathrm{pooled}}^{-1} x - \frac{1}{2}\, \bar{x}_j^\tau S_{\mathrm{pooled}}^{-1} \bar{x}_j + \ln(p_j), \qquad j = 1, \dots, g.$$

Remark. In the case of equal covariance matrices, the scores $d_j(x)$ can also be reduced to

$$d_j(x) = -\frac{1}{2}(x - \mu_j)^\tau \Sigma^{-1}(x - \mu_j) + \ln(p_j) = -\frac{1}{2}\, d^2_{\Sigma}(x, \mu_j) + \ln(p_j),$$

which can be estimated by $\hat{d}_j(x) = -\frac{1}{2}\, d^2_{S_{\mathrm{pooled}}}(x, \bar{x}_j) + \ln(p_j)$. If the prior probabilities are all equal (or unknown) we thus assign an object $x$ to the closest population.
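When a common covariance matrix is assumed, the sketch only differs in the pooled estimate and the linear scores; as before, `groups` and `priors` are assumed inputs.

```python
import numpy as np

def linear_scores_rule(groups, priors=None):
    """Classify by the largest sample linear score using S_pooled."""
    g = len(groups)
    priors = priors if priors is not None else [1.0 / g] * g
    xbars = [X.mean(axis=0) for X in groups]
    n = sum(len(X) for X in groups)
    S_pooled = sum((len(X) - 1) * np.cov(X, rowvar=False) for X in groups) / (n - g)
    S_inv = np.linalg.inv(S_pooled)

    def classify(x0):
        scores = [xb @ S_inv @ x0 - 0.5 * xb @ S_inv @ xb + np.log(p)
                  for xb, p in zip(xbars, priors)]
        return int(np.argmax(scores)) + 1
    return classify
```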

Example 4. The admission board of a business school uses two measures to decide on admittance of applicants:

GPA = undergraduate grade point average,
GMAT = graduate management aptitude test score.

Based on these measures applicants are categorized as: admit ($\pi_1$), do not admit ($\pi_2$), and borderline ($\pi_3$).

[Figure: scatter plot of the training sample, GMAT versus GPA, for the three groups.]

Based on the training sample with group sizes $n_1 = 31$, $n_2 = 28$ and $n_3 = 26$ we calculate the group mean vectors $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$ (whose GPA components are 3.40, 2.48 and 2.99 respectively) and their pooled covariance matrix $S_{\mathrm{pooled}}$.

With equal prior probabilities we assign a new applicant $x = (x_1, x_2)^\tau$ to the closest class, so we compute its squared (Mahalanobis) distance to each of the three classes:

$$d^2_{S_{\mathrm{pooled}}}(x, \bar{x}_1) = (x - \bar{x}_1)^\tau S_{\mathrm{pooled}}^{-1}(x - \bar{x}_1),$$
$$d^2_{S_{\mathrm{pooled}}}(x, \bar{x}_2) = (x - \bar{x}_2)^\tau S_{\mathrm{pooled}}^{-1}(x - \bar{x}_2),$$
$$d^2_{S_{\mathrm{pooled}}}(x, \bar{x}_3) = (x - \bar{x}_3)^\tau S_{\mathrm{pooled}}^{-1}(x - \bar{x}_3).$$

Suppose a new applicant has test scores $x^\tau = (3.21, 497)$; then we obtain

$$d^2_{S_{\mathrm{pooled}}}(x, \bar{x}_1) = 2.58, \qquad d^2_{S_{\mathrm{pooled}}}(x, \bar{x}_2) = 17.10, \qquad d^2_{S_{\mathrm{pooled}}}(x, \bar{x}_3) = 2.47.$$

The distance from $x$ to $\pi_3$ is thus smallest, so this applicant is classified as a borderline case.
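The distance computations in Example 4 follow the pattern below; the mean vectors and pooled covariance matrix here are placeholders chosen only to make the snippet runnable, not the values estimated from the admission data.

```python
import numpy as np

# Placeholder group means (GPA, GMAT) and pooled covariance -- illustrative only.
xbars = [np.array([3.40, 561.0]), np.array([2.48, 447.0]), np.array([2.99, 446.0])]
S_pooled = np.array([[0.04, -2.0], [-2.0, 3600.0]])
S_inv = np.linalg.inv(S_pooled)

x_new = np.array([3.21, 497.0])
d2 = [float((x_new - xb) @ S_inv @ (x_new - xb)) for xb in xbars]
print(d2)                                             # squared distances to pi_1, pi_2, pi_3
print("assign to pi_%d" % (int(np.argmin(d2)) + 1))   # pi_3 for these placeholder values
```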
