DISCRIMINANT ANALYSIS
Hector Hawkins
1. Introduction

Discrimination and classification are concerned with separating objects from different populations into different groups and with allocating new observations to one of these groups. The goals thus are:

- To describe (graphically or algebraically) the differences between objects from several known populations. We construct discriminants whose numerical values separate the different collections as much as possible.
- To assign objects to one of several labeled classes. We derive a classification rule that can be used to assign (new) objects to one of the labeled classes.

Examples.
- Based on historical bank data, separate the good from the poor credit risks (based on income, age, family size, etc.). Classify new credit applications into one of these two classes to decide whether to grant or reject a loan.
- Distinguish between readers and non-readers of a magazine or newspaper based on, e.g., education level, age, income and profession, so that the publisher knows which categories of people are potential new readers.
A good procedure should result in as few misclassifications as possible. It should take into account the likelihood that objects belong to each of the classes (= prior probability of occurrence). One often also takes the costs of misclassification into account. For example, the cost of not operating on a person who needs surgery is much higher than that of unnecessarily operating on a healthy person, so the first type of misclassification has to be avoided as much as possible.
2. Discrimination and Classification of Two Populations

We now focus on separating objects from two classes and assigning new objects to one of these two classes. The classes will be labeled π₁ and π₂. Each object consists of measurements on p random variables X₁, ..., X_p such that the observed values differ to some extent from one class to the other. The distributions associated with the two populations will be described by their density functions f₁ and f₂ respectively. Now consider an observed value x = (x₁, ..., x_p)^τ of the random variable X = (X₁, ..., X_p)^τ. Then

- f₁(x) is the density in x if x belongs to population π₁
- f₂(x) is the density in x if x belongs to population π₂

The object x must be assigned to either population π₁ or π₂. Denote by Ω the sample space (= the collection of all possible outcomes of X) and partition the sample space as Ω = R₁ ∪ R₂, where R₁ is the subspace of outcomes which we classify as belonging to population π₁ and R₂ = Ω \ R₁ is the subspace of outcomes classified as belonging to π₂. It follows that the (conditional) probability of classifying an object as belonging to π₂ when it is really from π₁ equals

    P(2|1) = P(X ∈ R₂ | X ∈ π₁) = ∫_{R₂} f₁(x) dx

and the (conditional) probability of assigning an object to π₁ when it in fact is from π₂ equals

    P(1|2) = P(X ∈ R₁ | X ∈ π₂) = ∫_{R₁} f₂(x) dx
[Figure: the densities f₁(x) and f₂(x) plotted over the classification regions R₁ and R₂; the shaded tail areas correspond to the misclassification probabilities P(2|1) and P(1|2).]

Similarly, we define the conditional probabilities P(1|1) and P(2|2). To obtain the probabilities of correctly and incorrectly classifying objects we also have to take the prior class probabilities into account. Denote

    p₁ = P(X ∈ π₁) = prior probability of π₁
    p₂ = P(X ∈ π₂) = prior probability of π₂

where p₁ + p₂ = 1. It follows that the overall probabilities of correctly and incorrectly classifying objects are given by
    P(object is correctly classified as π₁) = P(X ∈ π₁ and X ∈ R₁) = P(X ∈ R₁ | X ∈ π₁) P(X ∈ π₁) = P(1|1) p₁
    P(object is misclassified as π₁)        = P(X ∈ π₂ and X ∈ R₁) = P(X ∈ R₁ | X ∈ π₂) P(X ∈ π₂) = P(1|2) p₂
    P(object is correctly classified as π₂) = P(X ∈ π₂ and X ∈ R₂) = P(X ∈ R₂ | X ∈ π₂) P(X ∈ π₂) = P(2|2) p₂
    P(object is misclassified as π₂)        = P(X ∈ π₁ and X ∈ R₂) = P(X ∈ R₂ | X ∈ π₁) P(X ∈ π₁) = P(2|1) p₁

To take the cost of misclassification into account, denote

    c(2|1) = the cost of classifying an object from π₁ as π₂
    c(1|2) = the cost of classifying an object from π₂ as π₁

A classification rule is obtained by minimizing the expected cost of misclassification:

    ECM := c(2|1) P(2|1) p₁ + c(1|2) P(1|2) p₂
Result 1. The regions R₁ and R₂ that minimize the ECM are given by

    R₁ = { x ∈ Ω : f₁(x)/f₂(x) ≥ (c(1|2)/c(2|1)) (p₂/p₁) }
    R₂ = { x ∈ Ω : f₁(x)/f₂(x) < (c(1|2)/c(2|1)) (p₂/p₁) }

Proof. Using that P(1|1) + P(2|1) = 1 (since R₁ ∪ R₂ = Ω) we obtain

    ECM = c(2|1) P(2|1) p₁ + c(1|2) P(1|2) p₂
        = c(2|1) (1 − P(1|1)) p₁ + c(1|2) P(1|2) p₂
        = c(2|1) p₁ + ∫_{R₁} [c(1|2) f₂(x) p₂ − c(2|1) f₁(x) p₁] dx

Since probabilities and densities, as well as misclassification costs (there is no gain by misclassifying objects), are nonnegative, the ECM is minimal if the integrand satisfies

    c(1|2) f₂(x) p₂ − c(2|1) f₁(x) p₁ ≤ 0    for all x ∈ R₁

which yields the regions above.

Note that these regions only depend on ratios:

    f₁(x)/f₂(x)   = density ratio
    c(1|2)/c(2|1) = cost ratio
    p₂/p₁         = prior probability ratio

These ratios are often much easier to determine than the exact values of their components.
Special Cases:

1. Equal (or unknown) prior probabilities: compare the density ratio with the cost ratio:

    R₁: f₁(x)/f₂(x) ≥ c(1|2)/c(2|1)        R₂: f₁(x)/f₂(x) < c(1|2)/c(2|1)

2. Equal (or undetermined) misclassification costs: compare the density ratio with the prior probability ratio:

    R₁: f₁(x)/f₂(x) ≥ p₂/p₁        R₂: f₁(x)/f₂(x) < p₂/p₁

3. Equal prior probabilities and equal misclassification costs (or p₂/p₁ = (c(1|2)/c(2|1))⁻¹):

    R₁: f₁(x)/f₂(x) ≥ 1        R₂: f₁(x)/f₂(x) < 1

Example 1. Suppose we set the cost ratio equal to c(1|2)/c(2|1) = 2 and we know that 20% of all objects belong to π₂. Given that f₁(x₀) = 0.3 and f₂(x₀) = 0.4, do we classify x₀ as belonging to π₁ or π₂? We have that p₂ = 0.2, so p₁ = 0.8 and p₂/p₁ = 0.25. Therefore, we obtain

    R₁: f₁(x)/f₂(x) ≥ 2 (0.25) = 0.5    and    R₂: f₁(x)/f₂(x) < 2 (0.25) = 0.5

For x₀ we have

    f₁(x₀)/f₂(x₀) = 0.3/0.4 = 0.75 > 0.5

so we find x₀ ∈ R₁ and classify x₀ as belonging to π₁.
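The minimum-ECM rule for two populations reduces to a single threshold comparison. A minimal sketch in Python; the function name is ours, and the numbers are those of the worked example (cost ratio 2 and prior ratio 0.25, giving the threshold 0.5):

```python
# Minimum-ECM classification for two populations (Result 1): compare the
# density ratio f1(x)/f2(x) with the threshold (c(1|2)/c(2|1)) * (p2/p1).

def classify_two(f1_x, f2_x, cost_ratio=1.0, prior_ratio=1.0):
    """Return 1 (assign to pi_1) or 2 (assign to pi_2).

    cost_ratio  = c(1|2)/c(2|1), prior_ratio = p2/p1.
    """
    return 1 if f1_x / f2_x >= cost_ratio * prior_ratio else 2

# Example 1: cost ratio 2 and p2 = 0.2, so the threshold is 2 * 0.25 = 0.5.
label = classify_two(f1_x=0.3, f2_x=0.4, cost_ratio=2.0, prior_ratio=0.25)
print(label)  # 0.75 >= 0.5, so x0 is assigned to pi_1 and this prints 1
```

Note that only the three ratios enter the function, mirroring the remark after Result 1.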
3. Classification with Two Multivariate Normal Populations

We now assume that f₁ and f₂ are multivariate normal densities with mean vectors µ₁ and µ₂ and covariance matrices Σ₁ and Σ₂ respectively.

3.1. Σ₁ = Σ₂ = Σ

The density of population πᵢ (i = 1, 2) is now given by

    fᵢ(x) = (2π)^{−p/2} det(Σ)^{−1/2} exp(−½ (x − µᵢ)^τ Σ⁻¹ (x − µᵢ))

Result 2. If the populations π₁ and π₂ both have multivariate normal densities with equal covariance matrices, then the classification rule corresponding to minimizing the ECM becomes: classify x₀ as π₁ if

    (µ₁ − µ₂)^τ Σ⁻¹ x₀ − ½ (µ₁ − µ₂)^τ Σ⁻¹ (µ₁ + µ₂) ≥ ln[(c(1|2)/c(2|1)) (p₂/p₁)]

and classify x₀ as π₂ otherwise.

Proof. We assign x₀ to π₁ if

    f₁(x₀)/f₂(x₀) ≥ (c(1|2)/c(2|1)) (p₂/p₁)

which can be rewritten as

    exp(−½ (x₀ − µ₁)^τ Σ⁻¹ (x₀ − µ₁) + ½ (x₀ − µ₂)^τ Σ⁻¹ (x₀ − µ₂)) ≥ (c(1|2)/c(2|1)) (p₂/p₁)
By taking logarithms on both sides and using the equality

    −½ (x₀ − µ₁)^τ Σ⁻¹ (x₀ − µ₁) + ½ (x₀ − µ₂)^τ Σ⁻¹ (x₀ − µ₂) = (µ₁ − µ₂)^τ Σ⁻¹ x₀ − ½ (µ₁ − µ₂)^τ Σ⁻¹ (µ₁ + µ₂)

we obtain the classification rule.

In practice, the population parameters µ₁, µ₂ and Σ are unknown and have to be estimated from the data. Suppose we have n₁ objects belonging to π₁ (denoted x_1^(1), ..., x_{n₁}^(1)) and n₂ objects from π₂ (denoted x_1^(2), ..., x_{n₂}^(2)), with n₁ + n₂ = n the total sample size. The sample mean vectors and covariance matrices of the two groups are given by

    x̄₁ = (1/n₁) ∑_{j=1}^{n₁} x_j^(1)        S₁ = 1/(n₁ − 1) ∑_{j=1}^{n₁} (x_j^(1) − x̄₁)(x_j^(1) − x̄₁)^τ
    x̄₂ = (1/n₂) ∑_{j=1}^{n₂} x_j^(2)        S₂ = 1/(n₂ − 1) ∑_{j=1}^{n₂} (x_j^(2) − x̄₂)(x_j^(2) − x̄₂)^τ

Since both populations have the same covariance matrix Σ, we combine the two sample covariance matrices S₁ and S₂ to obtain a more precise estimate of Σ, given by

    S_pooled = [(n₁ − 1) / ((n₁ − 1) + (n₂ − 1))] S₁ + [(n₂ − 1) / ((n₁ − 1) + (n₂ − 1))] S₂

By replacing µ₁, µ₂ and Σ with x̄₁, x̄₂ and S_pooled in Result 2 we obtain the sample classification rule: classify x₀ as π₁ if

    (x̄₁ − x̄₂)^τ S_pooled⁻¹ x₀ − ½ (x̄₁ − x̄₂)^τ S_pooled⁻¹ (x̄₁ + x̄₂) ≥ ln[(c(1|2)/c(2|1)) (p₂/p₁)]

and classify x₀ as π₂ otherwise.
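The sample rule can be sketched in Python. The two groups below are synthetic (made up for illustration); note that numpy.cov with its default divisor n − 1 matches the S₁ and S₂ defined above:

```python
import numpy as np

# Sample version of Result 2: pool the two group covariance matrices and
# classify x0 with the linear rule.

def pooled_cov(X1, X2):
    """S_pooled as the weighted average of the two sample covariances."""
    n1, n2 = len(X1), len(X2)
    S1 = np.cov(X1, rowvar=False)  # divides by n1 - 1
    S2 = np.cov(X2, rowvar=False)
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

def lda_classify(x0, X1, X2, log_threshold=0.0):
    """Classify x0 as 1 if the discriminant score reaches log_threshold,
    where log_threshold = ln[(c(1|2)/c(2|1)) (p2/p1)]."""
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    a = np.linalg.solve(pooled_cov(X1, X2), xbar1 - xbar2)
    score = a @ x0 - 0.5 * a @ (xbar1 + xbar2)
    return 1 if score >= log_threshold else 2

rng = np.random.default_rng(0)
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))  # synthetic pi_1
X2 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(50, 2))  # synthetic pi_2
print(lda_classify(np.array([0.2, -0.1]), X1, X2))  # near the pi_1 mean -> 1
print(lda_classify(np.array([2.9, 3.2]), X1, X2))   # near the pi_2 mean -> 2
```

Using numpy.linalg.solve instead of explicitly inverting S_pooled is the numerically preferred way to form S_pooled⁻¹(x̄₁ − x̄₂).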
Special Case: equal prior probabilities and equal misclassification costs. Then

    ln[(c(1|2)/c(2|1)) (p₂/p₁)] = ln(1) = 0

such that we assign x₀ to π₁ if

    (x̄₁ − x̄₂)^τ S_pooled⁻¹ x₀ ≥ ½ (x̄₁ − x̄₂)^τ S_pooled⁻¹ (x̄₁ + x̄₂)

Denote â = S_pooled⁻¹ (x̄₁ − x̄₂) ∈ IR^p; then this can be rewritten as

    â^τ x₀ ≥ ½ (â^τ x̄₁ + â^τ x̄₂)

That is, we have to compare the scalar ŷ₀ = â^τ x₀ with the midpoint m̂ = ½ (ȳ₁ + ȳ₂) = ½ (â^τ x̄₁ + â^τ x̄₂). We thus have created two univariate populations (determined by the y-values) by projecting the original data on the direction determined by â. This direction is the (estimated) direction in which the two populations are best separated.

Remark 1. By replacing the unknown parameters with their estimates, there is no guarantee anymore that the resulting classification rule minimizes the expected cost of misclassification. However, we expect that we obtain a good estimate of the optimal rule.

Example 2. To develop a test for potential hemophilia carriers, blood samples were taken from two groups of patients. The two variables measured are AHF activity and AHF-like antigen, where AHF means AntiHemophilic Factor. For both variables we take the logarithm (base 10). The first group of n₁ = 30 patients did not carry the hemophilia gene. The second group consisted of n₂ = 22 known hemophilia carriers. From these samples the following statistics have been derived:

    x̄₁ = (−0.0065, −0.0390)^τ,    x̄₂ = (−0.2483, 0.0262)^τ,

    S_pooled⁻¹ = (  131.158   −90.423 )
                 (  −90.423   108.147 )
Assuming equal costs and equal priors, we compute

    â = S_pooled⁻¹ (x̄₁ − x̄₂) = (37.61, −28.92)^τ

and

    ȳ₁ = â^τ x̄₁ = 0.88,    ȳ₂ = â^τ x̄₂ = −10.10.

The corresponding midpoint thus becomes m̂ = ½ (ȳ₁ + ȳ₂) = −4.61. A new object x = (x₁, x₂)^τ is classified as a non-carrier if

    ŷ = 37.61 x₁ − 28.92 x₂ ≥ m̂ = −4.61

and as a carrier otherwise. A potential hemophilia carrier has values x₁ = −0.210 and x₂ = −0.044. Should this patient be classified as a carrier? We obtain ŷ = −6.62 < −4.61, so we indeed assign this patient to the population of carriers.

Suppose now that it is known that the prior probability of being a hemophilia carrier is 25%. Then a new patient is classified as a non-carrier if ŷ − m̂ ≥ ln(p₂/p₁). We find

    ŷ − m̂ = −6.62 − (−4.61) = −2.01    and    ln(p₂/p₁) = ln(0.25/0.75) = ln(1/3) = −1.10

Since −2.01 < −1.10, we still classify this patient as a carrier.
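The computations of Example 2 can be reproduced numerically. The sample means and pooled inverse covariance below are assumed values for this classic hemophilia data set; with them the derived quantities agree with those quoted in the example up to rounding:

```python
import numpy as np

# Hemophilia example: direction a, midpoint m_hat, and the score of the
# new patient. The statistics below are assumed inputs.

xbar1 = np.array([-0.0065, -0.0390])        # non-carriers (assumed)
xbar2 = np.array([-0.2483, 0.0262])         # carriers (assumed)
Spooled_inv = np.array([[131.158, -90.423],
                        [-90.423, 108.147]])  # assumed pooled inverse

a = Spooled_inv @ (xbar1 - xbar2)
ybar1, ybar2 = a @ xbar1, a @ xbar2
m_hat = 0.5 * (ybar1 + ybar2)
print(np.round(a, 2))   # approximately (37.61, -28.92)
print(round(m_hat, 2))  # approximately -4.61

x_new = np.array([-0.210, -0.044])  # potential carrier (assumed values)
y_new = a @ x_new
print(round(y_new, 2))  # about -6.6, below m_hat, so classified as carrier
```

The agreement of â and m̂ with the text is a useful consistency check on the assumed statistics.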
3.2. Σ₁ ≠ Σ₂

The density of population πᵢ (i = 1, 2) is now given by

    fᵢ(x) = (2π)^{−p/2} det(Σᵢ)^{−1/2} exp(−½ (x − µᵢ)^τ Σᵢ⁻¹ (x − µᵢ))

Result 3. If the populations π₁ and π₂ both have multivariate normal densities with mean vectors and covariance matrices µ₁, Σ₁ and µ₂, Σ₂ respectively, then the classification rule corresponding to minimizing the ECM becomes: classify x₀ as π₁ if

    −½ x₀^τ (Σ₁⁻¹ − Σ₂⁻¹) x₀ + (µ₁^τ Σ₁⁻¹ − µ₂^τ Σ₂⁻¹) x₀ − k ≥ ln[(c(1|2)/c(2|1)) (p₂/p₁)]

and classify x₀ as π₂ otherwise. The constant k is given by

    k = ½ ln(det(Σ₁)/det(Σ₂)) + ½ (µ₁^τ Σ₁⁻¹ µ₁ − µ₂^τ Σ₂⁻¹ µ₂)

Proof. We assign x₀ to π₁ if

    ln(f₁(x₀)/f₂(x₀)) ≥ ln[(c(1|2)/c(2|1)) (p₂/p₁)]

and

    ln(f₁(x₀)/f₂(x₀)) = −½ ln(det(Σ₁)/det(Σ₂)) − ½ (x₀ − µ₁)^τ Σ₁⁻¹ (x₀ − µ₁) + ½ (x₀ − µ₂)^τ Σ₂⁻¹ (x₀ − µ₂)
                      = −½ x₀^τ (Σ₁⁻¹ − Σ₂⁻¹) x₀ + (µ₁^τ Σ₁⁻¹ − µ₂^τ Σ₂⁻¹) x₀ − k
In practice, the parameters µ₁, µ₂, Σ₁ and Σ₂ are unknown and are replaced by the estimates x̄₁, x̄₂, S₁ and S₂, which yields the following sample classification rule: classify x₀ as π₁ if

    −½ x₀^τ (S₁⁻¹ − S₂⁻¹) x₀ + (x̄₁^τ S₁⁻¹ − x̄₂^τ S₂⁻¹) x₀ − k̂ ≥ ln[(c(1|2)/c(2|1)) (p₂/p₁)]

and classify x₀ as π₂ otherwise. The constant k̂ is given by

    k̂ = ½ ln(det(S₁)/det(S₂)) + ½ (x̄₁^τ S₁⁻¹ x̄₁ − x̄₂^τ S₂⁻¹ x̄₂)
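The sample quadratic rule can be sketched directly from the formula; the estimates x̄₁, x̄₂, S₁, S₂ below are hypothetical, chosen so that π₂ has a wider spread than π₁:

```python
import numpy as np

# Sample quadratic rule of Section 3.2 (S1 != S2): classify x0 as pi_1 when
# the quadratic score reaches ln[(c(1|2)/c(2|1)) (p2/p1)].

def qda_score(x0, xbar1, xbar2, S1, S2):
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    k = (0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
         + 0.5 * (xbar1 @ S1i @ xbar1 - xbar2 @ S2i @ xbar2))
    return (-0.5 * x0 @ (S1i - S2i) @ x0
            + (xbar1 @ S1i - xbar2 @ S2i) @ x0 - k)

xbar1, xbar2 = np.array([0.0, 0.0]), np.array([2.0, 2.0])  # hypothetical
S1 = np.eye(2)        # pi_1: unit spread (hypothetical)
S2 = 4.0 * np.eye(2)  # pi_2: wider spread (hypothetical)

# Equal costs and priors: threshold ln(1) = 0.
print(qda_score(np.array([0.1, 0.2]), xbar1, xbar2, S1, S2) >= 0.0)  # True -> pi_1
```

Unlike the rule of Section 3.1, the decision boundary here is quadratic in x₀ because S₁⁻¹ − S₂⁻¹ does not vanish.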
4. Evaluating Classification Rules

To judge the performance of a sample classification procedure, we want to calculate its misclassification probability or error rate. A measure of performance that can be calculated for any classification procedure is the apparent error rate (APER), which is defined as the fraction of observations in the sample that are misclassified by the classification procedure. Denote by n₁M and n₂M the number of misclassified π₁ and π₂ objects respectively; then

    APER = (n₁M + n₂M) / (n₁ + n₂)

The APER is intuitively appealing and easy to calculate. Unfortunately, it tends to underestimate the actual error rate (AER) when classifying new objects. This underestimation occurs because we used the sample to build the classification rule (therefore we call it the training sample) as well as to evaluate it.

To obtain a reliable estimate of the AER we ideally consider an independent test sample of new objects for which we know the true class label. This means that we split the original sample into a training sample and a test sample. The AER is then estimated by the proportion of misclassified objects in the test sample, while the training sample is used to construct the classification rule. However, there are two drawbacks to this approach:

- It requires large samples.
- The classification rule is less precise because we do not use the information from the test sample to build the classifier.
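The APER is a one-line computation; a quick arithmetic check with hypothetical counts and group sizes:

```python
# Apparent error rate for a two-group rule. All counts are hypothetical.

n1M, n2M = 14, 3   # misclassified objects from each group (hypothetical)
n1, n2 = 69, 29    # group sizes (hypothetical)

aper = (n1M + n2M) / (n1 + n2)
print(round(100 * aper, 1))  # 17.3 (percent)
```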
An alternative is the (leave-one-out) cross-validation or jackknife procedure, which works as follows:

1. Leave one object out of the sample and construct a classification rule based on the remaining n − 1 objects in the sample.
2. Classify the left-out observation using the classification rule constructed in step 1.
3. Repeat the two previous steps for each of the n objects in the sample.

Denote by n₁M^CV and n₂M^CV the number of left-out observations misclassified in class 1 and class 2 respectively. Then a good estimate of the actual error rate is given by

    ÂER = (n₁M^CV + n₂M^CV) / (n₁ + n₂)

Example 3. We consider a sample of size n = 98 containing the responses to visual stimuli of both eyes, measured for patients suffering from multiple sclerosis and for controls (healthy patients). Based on these measured responses and age we want to develop a rule that allows us to classify potential patients, and we want to estimate the actual error rate as well. The assumption of equal covariances is acceptable. Prior probabilities and costs of misclassification are undetermined and thus considered to be equal.
Analyzing the data in S-Plus we find the group means x̄₁ and x̄₂ and the pooled covariance matrix S_pooled, from which the classification rule becomes: classify a patient as suffering from multiple sclerosis if

    ŷ = â^τ x ≥ m̂

and otherwise the patient is healthy.

Based on the training sample we obtain the misclassification counts n₁M = 14 and n₂M = 3, which yields the apparent error rate

    APER = (14 + 3)/98 = 17.3%.

On the other hand, by using cross-validation we obtain the misclassifications n₁M^CV = 15 and n₂M^CV = 5, which yields the estimated actual error rate

    ÂER = (15 + 5)/98 = 20.4%,

which is 3.1% higher! Note that three times as many persons are misclassified as MS patients than as healthy, even though the misclassification costs were assumed to be equal.
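The three-step leave-one-out procedure can be sketched as a small loop. The two-group data below are synthetic (the MS data themselves are not reproduced), and the rule is the equal-costs, equal-priors linear rule of Section 3.1:

```python
import numpy as np

# Leave-one-out cross-validation estimate of the actual error rate (AER).

def lda_label(x0, X1, X2):
    """Equal-costs, equal-priors linear rule built from two training groups."""
    n1, n2 = len(X1), len(X2)
    S1, S2 = np.cov(X1, rowvar=False), np.cov(X2, rowvar=False)
    Sp = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    a = np.linalg.solve(Sp, xbar1 - xbar2)
    return 1 if a @ x0 >= 0.5 * a @ (xbar1 + xbar2) else 2

def loo_error_rate(X1, X2):
    errors = 0
    for i in range(len(X1)):            # steps 1-2 for each pi_1 object
        rest = np.delete(X1, i, axis=0)
        errors += lda_label(X1[i], rest, X2) != 1
    for i in range(len(X2)):            # and for each pi_2 object
        rest = np.delete(X2, i, axis=0)
        errors += lda_label(X2[i], X1, rest) != 2
    return errors / (len(X1) + len(X2))  # estimated AER

rng = np.random.default_rng(1)
X1 = rng.normal([0.0, 0.0], 1.0, size=(40, 2))  # synthetic group 1
X2 = rng.normal([2.0, 2.0], 1.0, size=(40, 2))  # synthetic group 2
print(loo_error_rate(X1, X2))  # small error rate for well-separated groups
```

Each left-out object is classified by a rule that never saw it, which is what removes the optimistic bias of the APER.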
5. Classification with Several Populations

We now consider the more general situation of separating objects from g (g ≥ 2) classes and assigning new objects to one of these g classes. For i = 1, ..., g denote

    fᵢ      the density associated with population πᵢ
    pᵢ      the prior probability of population πᵢ
    Rᵢ      the subspace of outcomes assigned to πᵢ
    c(j|i)  the cost of misclassifying an object to πⱼ when it is from πᵢ
    P(j|i)  the conditional probability of assigning an object of πᵢ to πⱼ

The (conditional) expected cost of misclassifying an object of population π₁ is given by

    ECM(1) = P(2|1) c(2|1) + ... + P(g|1) c(g|1) = ∑_{i=2}^{g} P(i|1) c(i|1)

and similarly we can determine the expected cost of misclassifying objects of populations π₂, ..., π_g. It follows that the overall ECM equals

    ECM = ∑_{j=1}^{g} pⱼ ECM(j) = ∑_{j=1}^{g} pⱼ ∑_{i≠j} P(i|j) c(i|j)
Result 4. The classification rule that minimizes the ECM assigns each object x to the population πᵢ for which

    ∑_{j≠i} pⱼ fⱼ(x) c(i|j)

is smallest. If the minimum is not unique, then x can be assigned to any of the populations for which the minimum is attained. (Without proof.)

Special Case: if all misclassification costs are equal (or unknown) we assign x to the population πᵢ for which ∑_{j≠i} pⱼ fⱼ(x) is smallest, or equivalently for which pᵢ fᵢ(x) is largest. We thus obtain: classify x as πᵢ if

    pᵢ fᵢ(x) > pⱼ fⱼ(x)    for all j ≠ i
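The equal-cost rule (assign x to the class with the largest pᵢ fᵢ(x)) can be sketched with three univariate normal populations; all means, spreads and priors below are hypothetical:

```python
import numpy as np

# Equal-cost rule for g populations: pick the class maximizing p_i * f_i(x).

def normal_pdf(x, mu, sigma):
    """Univariate normal density, written out explicitly."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def classify(x, mus, sigmas, priors):
    scores = [p * normal_pdf(x, m, s) for p, m, s in zip(priors, mus, sigmas)]
    return int(np.argmax(scores)) + 1  # population label 1..g

mus, sigmas = [0.0, 3.0, 6.0], [1.0, 1.0, 1.0]  # hypothetical
priors = [0.5, 0.3, 0.2]                        # hypothetical

print(classify(0.4, mus, sigmas, priors))  # closest to mu_1 -> 1
print(classify(5.5, mus, sigmas, priors))  # closest to mu_3 -> 3
```

With unequal priors the boundaries shift toward the rarer classes, since each density is weighted by its pᵢ before the maximum is taken.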
5.1. Classification with Normal Populations

The density of population πᵢ (i = 1, ..., g) is now given by

    fᵢ(x) = (2π)^{−p/2} det(Σᵢ)^{−1/2} exp(−½ (x − µᵢ)^τ Σᵢ⁻¹ (x − µᵢ))

Result 5. If all misclassification costs are equal (or unknown), we assign x to the population πᵢ if the (quadratic) score dᵢ(x) = max_{j=1,...,g} dⱼ(x), where the scores are given by

    dⱼ(x) = −½ ln(det(Σⱼ)) − ½ (x − µⱼ)^τ Σⱼ⁻¹ (x − µⱼ) + ln(pⱼ)    j = 1, ..., g

Proof. We assign x to πᵢ if ln(pᵢ fᵢ(x)) = max_{j=1,...,g} ln(pⱼ fⱼ(x)), and

    ln(pⱼ fⱼ(x)) = ln(pⱼ) − (p/2) ln(2π) − ½ ln(det(Σⱼ)) − ½ (x − µⱼ)^τ Σⱼ⁻¹ (x − µⱼ).

Dropping the second term, which is the same for all j, yields the result.

In practice, the parameters µⱼ and Σⱼ are unknown and will be replaced by the sample means x̄ⱼ and covariances Sⱼ, which yields the sample classification rule: classify x as πᵢ if the (quadratic) score d̂ᵢ(x) = max_{j=1,...,g} d̂ⱼ(x), where the scores are given by

    d̂ⱼ(x) = −½ ln(det(Sⱼ)) − ½ (x − x̄ⱼ)^τ Sⱼ⁻¹ (x − x̄ⱼ) + ln(pⱼ)    j = 1, ..., g
If all covariance matrices are equal, Σⱼ = Σ for j = 1, ..., g, then the quadratic scores dⱼ(x) become

    dⱼ(x) = −½ ln(det(Σ)) − ½ x^τ Σ⁻¹ x + µⱼ^τ Σ⁻¹ x − ½ µⱼ^τ Σ⁻¹ µⱼ + ln(pⱼ).

The first two terms are the same for all dⱼ(x), so they can be left out, which yields the (linear) scores

    dⱼ(x) = µⱼ^τ Σ⁻¹ x − ½ µⱼ^τ Σ⁻¹ µⱼ + ln(pⱼ).

To estimate these scores in practice we use the sample means x̄ⱼ and the pooled estimate of Σ given by

    S_pooled = [(n₁ − 1) S₁ + ... + (n_g − 1) S_g] / [(n₁ − 1) + ... + (n_g − 1)]

which yields the sample classification rule: classify x as πᵢ if the (linear) score d̂ᵢ(x) = max_{j=1,...,g} d̂ⱼ(x), where the scores are given by

    d̂ⱼ(x) = x̄ⱼ^τ S_pooled⁻¹ x − ½ x̄ⱼ^τ S_pooled⁻¹ x̄ⱼ + ln(pⱼ)    j = 1, ..., g

Remark 2. In the case of equal covariance matrices, the scores dⱼ(x) can also be reduced to

    dⱼ(x) = −½ (x − µⱼ)^τ Σ⁻¹ (x − µⱼ) + ln(pⱼ) = −½ d²_Σ(x, µⱼ) + ln(pⱼ)

which can be estimated by d̂ⱼ(x) = −½ d²_{S_pooled}(x, x̄ⱼ) + ln(pⱼ). If the prior probabilities are all equal (or unknown), we thus assign an object x to the closest population.
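With equal priors, maximizing the linear score is the same as minimizing the Mahalanobis distance of Remark 2; a sketch with hypothetical group means and pooled covariance:

```python
import numpy as np

# Linear scores for g groups with a common covariance matrix:
#   d_j(x) = xbar_j' Sp^{-1} x - 0.5 xbar_j' Sp^{-1} xbar_j + ln(p_j)
# and the equivalent closest-group view via the Mahalanobis distance.

def linear_scores(x, xbars, Sp_inv, priors):
    return [xb @ Sp_inv @ x - 0.5 * xb @ Sp_inv @ xb + np.log(p)
            for xb, p in zip(xbars, priors)]

def mahalanobis_sq(x, xb, Sp_inv):
    d = x - xb
    return d @ Sp_inv @ d

xbars = [np.array([0.0, 0.0]),
         np.array([3.0, 0.0]),
         np.array([0.0, 3.0])]                       # hypothetical means
Sp_inv = np.linalg.inv(np.array([[1.0, 0.3],
                                 [0.3, 1.0]]))       # hypothetical S_pooled
priors = [1 / 3] * 3                                 # equal priors

x = np.array([0.4, 2.6])
scores = linear_scores(x, xbars, Sp_inv, priors)
dists = [mahalanobis_sq(x, xb, Sp_inv) for xb in xbars]
print(int(np.argmax(scores)) + 1)  # 3: the group with the largest score ...
print(int(np.argmin(dists)) + 1)   # 3: ... is also the closest group
```

The two answers always coincide under equal priors, because the scores and −½ d² differ only by a term that does not depend on j.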
Example 4. The admission board of a business school uses two measures to decide on the admittance of applicants:

    GPA  = undergraduate grade point average
    GMAT = graduate management aptitude test score

Based on these measures, applicants are categorized as: admit (π₁), do not admit (π₂), and borderline (π₃).

[Figure: scatter plot of the training set in the (GPA, GMAT) plane, showing the three groups.]

Based on the training sample with group sizes n₁ = 31, n₂ = 28 and n₃ = 26 we calculate the group means

    x̄₁ = (3.40, 561.23)^τ,    x̄₂ = (2.48, 447.07)^τ,    and    x̄₃ = (2.99, 446.23)^τ,

and their pooled covariance matrix S_pooled.
With equal prior probabilities we assign a new applicant x = (x₁, x₂)^τ to the closest class, so we compute its (squared) distance to each of the three classes:

    d²_{S_pooled}(x, x̄ⱼ) = (x − x̄ⱼ)^τ S_pooled⁻¹ (x − x̄ⱼ)    j = 1, 2, 3.

Suppose a new applicant has test scores x^τ = (3.21, 497). Then we obtain

    d²_{S_pooled}(x, x̄₁) = 2.58,    d²_{S_pooled}(x, x̄₂) = 17.10,    d²_{S_pooled}(x, x̄₃) = 2.47.

The distance from x to π₃ is thus smallest, so this applicant is a borderline case.
More informationINF Introduction to classifiction Anne Solberg
INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300
More informationCOWLEY COLLEGE & Area Vocational Technical School
COWLEY COLLEGE & Area Vocational Technical School COURSE PROCEDURE FOR COLLEGE ALGEBRA WITH REVIEW MTH 4421 5 Credit Hours Student Level: This course is open to students on the college level in the freshman
More informationFast and Robust Discriminant Analysis
Fast and Robust Discriminant Analysis Mia Hubert a,1, Katrien Van Driessen b a Department of Mathematics, Katholieke Universiteit Leuven, W. De Croylaan 54, B-3001 Leuven. b UFSIA-RUCA Faculty of Applied
More informationSemiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance. Probal Chaudhuri and Subhajit Dutta
Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance Probal Chaudhuri and Subhajit Dutta Indian Statistical Institute, Kolkata. Workshop on Classification and Regression
More informationStatistical Machine Learning Hilary Term 2018
Statistical Machine Learning Hilary Term 2018 Pier Francesco Palamara Department of Statistics University of Oxford Slide credits and other course material can be found at: http://www.stats.ox.ac.uk/~palamara/sml18.html
More informationECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam
ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The
More informationISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification
ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology
More informationPrincipal component analysis
Principal component analysis Motivation i for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis Jonathan Taylor, 10/12 Slide credits: Sergio Bacallado 1 / 1 Review: Main strategy in Chapter 4 Find an estimate ˆP
More informationECE521 Lecture7. Logistic Regression
ECE521 Lecture7 Logistic Regression Outline Review of decision theory Logistic regression A single neuron Multi-class classification 2 Outline Decision theory is conceptually easy and computationally hard
More informationGenerative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham
Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham Outline We have already seen how Bayes rule can be turned into a classifier In all our examples
More informationBayes Decision Theory
Bayes Decision Theory Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 16
More informationFast and Robust Classifiers Adjusted for Skewness
Fast and Robust Classifiers Adjusted for Skewness Mia Hubert 1 and Stephan Van der Veeken 2 1 Department of Mathematics - LStat, Katholieke Universiteit Leuven Celestijnenlaan 200B, Leuven, Belgium, Mia.Hubert@wis.kuleuven.be
More informationAn Alternative Algorithm for Classification Based on Robust Mahalanobis Distance
Dhaka Univ. J. Sci. 61(1): 81-85, 2013 (January) An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance A. H. Sajib, A. Z. M. Shafiullah 1 and A. H. Sumon Department of Statistics,
More informationBayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory
Bayesian decision theory 8001652 Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory Jussi Tohka jussi.tohka@tut.fi Institute of Signal Processing Tampere University of Technology
More informationModelling Academic Risks of Students in a Polytechnic System With the Use of Discriminant Analysis
Progress in Applied Mathematics Vol. 6, No., 03, pp. [59-69] DOI: 0.3968/j.pam.955803060.738 ISSN 95-5X [Print] ISSN 95-58 [Online] www.cscanada.net www.cscanada.org Modelling Academic Risks of Students
More informationLecture Notes 1: Vector spaces
Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector
More information44 CHAPTER 2. BAYESIAN DECISION THEORY
44 CHAPTER 2. BAYESIAN DECISION THEORY Problems Section 2.1 1. In the two-category case, under the Bayes decision rule the conditional error is given by Eq. 7. Even if the posterior densities are continuous,
More informationStatistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests
Statistical Methods for Particle Physics Lecture 1: parameter estimation, statistical tests http://benasque.org/2018tae/cgi-bin/talks/allprint.pl TAE 2018 Benasque, Spain 3-15 Sept 2018 Glen Cowan Physics
More informationMinimum Error Rate Classification
Minimum Error Rate Classification Dr. K.Vijayarekha Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur-613 401 Table of Contents 1.Minimum Error Rate Classification...
More informationClassification Methods II: Linear and Quadratic Discrimminant Analysis
Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear
More informationIntroduction to Data Science
Introduction to Data Science Winter Semester 2018/19 Oliver Ernst TU Chemnitz, Fakultät für Mathematik, Professur Numerische Mathematik Lecture Slides Contents I 1 What is Data Science? 2 Learning Theory
More informationLecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides
Lecture 16: Small Sample Size Problems (Covariance Estimation) Many thanks to Carlos Thomaz who authored the original version of these slides Intelligent Data Analysis and Probabilistic Inference Lecture
More informationCS 590D Lecture Notes
CS 590D Lecture Notes David Wemhoener December 017 1 Introduction Figure 1: Various Separators We previously looked at the perceptron learning algorithm, which gives us a separator for linearly separable
More informationALGEBRA II Grades 9-12
Summer 2015 Units: 10 high school credits UC Requirement Category: c General Description: ALGEBRA II Grades 9-12 Algebra II is a course which further develops the concepts learned in Algebra I. It will
More informationCHAPTER 2. Types of Effect size indices: An Overview of the Literature
CHAPTER Types of Effect size indices: An Overview of the Literature There are different types of effect size indices as a result of their different interpretations. Huberty (00) names three different types:
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationLinear Regression and Discrimination
Linear Regression and Discrimination Kernel-based Learning Methods Christian Igel Institut für Neuroinformatik Ruhr-Universität Bochum, Germany http://www.neuroinformatik.rub.de July 16, 2009 Christian
More informationLearning From Crowds. Presented by: Bei Peng 03/24/15
Learning From Crowds Presented by: Bei Peng 03/24/15 1 Supervised Learning Given labeled training data, learn to generalize well on unseen data Binary classification ( ) Multi-class classification ( y
More informationStatistical aspects of prediction models with high-dimensional data
Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by
More informationProblem Set 2. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 30
Problem Set 2 MAS 622J/1.126J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 30 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain
More informationApplied Machine Learning Annalisa Marsico
Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature
More informationBayesian Decision and Bayesian Learning
Bayesian Decision and Bayesian Learning Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 30 Bayes Rule p(x ω i
More informationAn Application of Discriminant Analysis On University Matriculation Examination Scores For Candidates Admitted Into Anamabra State University
An Application of Discriminant Analysis On University Matriculation Examination Scores For Candidates Admitted Into Anamabra State University Okoli C.N Department of Statistics Anambra State University,
More informationCOWLEY COLLEGE & Area Vocational Technical School
COWLEY COLLEGE & Area Vocational Technical School COURSE PROCEDURE FOR Student Level: This course is open to students on the college level in their freshman or sophomore year. Catalog Description: INTERMEDIATE
More informationRobot Image Credit: Viktoriya Sukhanova 123RF.com. Dimensionality Reduction
Robot Image Credit: Viktoriya Sukhanova 13RF.com Dimensionality Reduction Feature Selection vs. Dimensionality Reduction Feature Selection (last time) Select a subset of features. When classifying novel
More informationApplication: Can we tell what people are looking at from their brain activity (in real time)? Gaussian Spatial Smooth
Application: Can we tell what people are looking at from their brain activity (in real time? Gaussian Spatial Smooth 0 The Data Block Paradigm (six runs per subject Three Categories of Objects (counterbalanced
More informationDiscrete Mathematics and Probability Theory Fall 2015 Lecture 21
CS 70 Discrete Mathematics and Probability Theory Fall 205 Lecture 2 Inference In this note we revisit the problem of inference: Given some data or observations from the world, what can we infer about
More informationLinear models: the perceptron and closest centroid algorithms. D = {(x i,y i )} n i=1. x i 2 R d 9/3/13. Preliminaries. Chapter 1, 7.
Preliminaries Linear models: the perceptron and closest centroid algorithms Chapter 1, 7 Definition: The Euclidean dot product beteen to vectors is the expression d T x = i x i The dot product is also
More informationProbability Models for Bayesian Recognition
Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian
More informationChapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)
HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationINF Anne Solberg One of the most challenging topics in image analysis is recognizing a specific object in an image.
INF 4300 700 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter -6 6inDuda and Hart: attern Classification 303 INF 4300 Introduction to classification One of the most challenging
More informationFinal Overview. Introduction to ML. Marek Petrik 4/25/2017
Final Overview Introduction to ML Marek Petrik 4/25/2017 This Course: Introduction to Machine Learning Build a foundation for practice and research in ML Basic machine learning concepts: max likelihood,
More informationLecture 4 Discriminant Analysis, k-nearest Neighbors
Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se
More informationPrincipal Component Analysis -- PCA (also called Karhunen-Loeve transformation)
Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) PCA transforms the original input space into a lower dimensional space, by constructing dimensions that are linear combinations
More informationDiscrete Multivariate Statistics
Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are
More informationFeature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size
Feature selection and classifier performance in computer-aided diagnosis: The effect of finite sample size Berkman Sahiner, a) Heang-Ping Chan, Nicholas Petrick, Robert F. Wagner, b) and Lubomir Hadjiiski
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Linear Classifiers. Blaine Nelson, Tobias Scheffer
Universität Potsdam Institut für Informatik Lehrstuhl Linear Classifiers Blaine Nelson, Tobias Scheffer Contents Classification Problem Bayesian Classifier Decision Linear Classifiers, MAP Models Logistic
More information