Lecture 8: Classification
Måns Eriksson, Department of Mathematics, Uppsala University
Multivariate Methods, 19/5 2010
Classification: introductory examples

Goal: classify an observation $x$ as belonging to one of several predefined categories $\pi_1, \pi_2, \dots, \pi_g$.

Examples:
- Classify insects into one of several sub-species using measurements on external features.
- Use measurements on blood proteins and family history to classify women as carriers or non-carriers of a genetic disorder.
- Classify the quality of a new mobile phone battery as good or bad based on a few preliminary measurements.
- Use information on background, family support, psychological test scores etc. to screen applicants for parole from prison.
- Use information on sex, age, income, education level, marital status, debts etc. to classify a potential borrower as eligible or ineligible for a bank loan.
- Detect spam messages based on the message header and content.
Classification: discrimination rule

We'd like to find a discrimination rule for classification that
- in general classifies observations correctly,
- minimizes the probability of misclassification,
- minimizes the expected cost of misclassification, and
- ideally is a simple rule.

If the distributions of the populations are known, we can use this knowledge to derive rules. Otherwise, we must use training data to find good rules.
ML approach: assumptions

We've seen many times before that the likelihood approach yields good tests and estimators. It seems reasonable to try to use it for classification.

In the maximum likelihood approach to discrimination, the distributions of the $g$ populations are assumed to be known. "Simplest to analyse theoretically, although the least realistic in practice" - Mardia, Kent & Bibby (1979).

The ML discriminant rule for allocating an observation $x$ to one of the populations $\pi_1, \dots, \pi_g$ is to allocate $x$ to the population which gives the largest likelihood to $x$. See blackboard!
ML approach: univariate case

Consider the univariate case with two populations:
- Population 1: $X \sim N(\mu_1, \sigma^2)$
- Population 2: $X \sim N(\mu_2, \sigma^2)$
ML approach: likelihood ratio

For these populations, the likelihood ratio at the point $x$ is
$$\lambda = \frac{\text{likelihood of } x \text{ for Pop. 1}}{\text{likelihood of } x \text{ for Pop. 2}} = \frac{f_1(x)}{f_2(x)} = \frac{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu_1)^2/2\sigma^2}}{\frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu_2)^2/2\sigma^2}}.$$
Thus
$$\lambda = \exp\left\{-\frac{1}{2}\left(\frac{(x-\mu_1)^2}{\sigma^2} - \frac{(x-\mu_2)^2}{\sigma^2}\right)\right\}.$$

Rule: classify $x$ into Pop. 1 if $\lambda > 1$, into Pop. 2 if $\lambda < 1$, and flip a coin if $\lambda = 1$.
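As a minimal sketch of this rule in Python (the parameter values $\mu_1 = 0$, $\mu_2 = 3$, $\sigma = 1.5$ are made up for illustration; the slides assume the parameters are known):

```python
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 3.0, 1.5   # hypothetical known parameters

def classify_ml(x):
    """Allocate x to the population giving it the larger likelihood."""
    lam = norm.pdf(x, mu1, sigma) / norm.pdf(x, mu2, sigma)
    if lam > 1:
        return 1
    if lam < 1:
        return 2
    return None  # lambda == 1: flip a coin

print(classify_ml(1.2))  # 1.2 is closer to mu1, so population 1
```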
ML approach: further elaboration

The rule tells us to classify into Pop. 1 if the standardized distance of $x$ from $\mu_1$ is less than the standardized distance of $x$ from $\mu_2$.

We can rewrite the rule in a simpler form by taking logarithms:
$$-2\ln\lambda = \frac{(x-\mu_1)^2}{\sigma^2} - \frac{(x-\mu_2)^2}{\sigma^2} = -2\,\frac{\mu_1-\mu_2}{\sigma^2}\,x + \frac{\mu_1^2 - \mu_2^2}{\sigma^2} = \beta x + \alpha.$$

Rule: classify into Pop. 1 if $\beta x + \alpha < 0$. This is a linear rule.
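The linear form can be checked numerically; a small sketch (the same hypothetical parameters as above, repeated so the snippet is self-contained):

```python
mu1, mu2, sigma = 0.0, 3.0, 1.5   # hypothetical known parameters
beta = -2 * (mu1 - mu2) / sigma**2
alpha = (mu1**2 - mu2**2) / sigma**2

x = 1.2
print(beta * x + alpha < 0)  # True: classify into Pop. 1 (agrees with lambda > 1)
```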
ML approach: multivariate case

Now, consider the more general setting where $p$ traits are measured, with
- Population 1: $X \sim N_p(\mu_1, \Sigma_1)$,
- Population 2: $X \sim N_p(\mu_2, \Sigma_2)$.

Consider the natural logarithm of the likelihood ratio for an observed $x$ for some individual. With $f_i(x) = (2\pi)^{-p/2}(\det\Sigma_i)^{-1/2}\exp\{-(x-\mu_i)'\Sigma_i^{-1}(x-\mu_i)/2\}$,
$$-2\ln\left(\frac{f_1(x)}{f_2(x)}\right) = \Big(\ln(\det\Sigma_1) + (x-\mu_1)'\Sigma_1^{-1}(x-\mu_1)\Big) - \Big(\ln(\det\Sigma_2) + (x-\mu_2)'\Sigma_2^{-1}(x-\mu_2)\Big).$$

Classify into Pop. 1 if this quantity is less than zero, otherwise classify into Pop. 2. This is a quadratic rule, but if the covariance matrices are equal it reduces to a linear rule. See blackboard!
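For a numerical sketch, the log-likelihood ratio is easiest to evaluate through multivariate normal log-densities; the parameter values below are invented for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

# hypothetical population parameters
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[2.0, 0.0], [0.0, 1.0]])

def neg2_log_ratio(x):
    """-2 ln(f1(x)/f2(x)) for the two normal populations."""
    return -2 * (mvn.logpdf(x, mu1, S1) - mvn.logpdf(x, mu2, S2))

x = np.array([1.5, 0.5])
print(1 if neg2_log_ratio(x) < 0 else 2)  # quadratic ML rule
```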
ML approach: general g

Theorem. If $\pi_i$ is the $N_p(\mu_i, \Sigma)$ population, $i = 1, \dots, g$, and $\Sigma > 0$, then the ML discriminant rule allocates $x$ to $\pi_j$, where $j \in \{1, \dots, g\}$ is the value of $i$ which minimizes the squared Mahalanobis distance $(x-\mu_i)'\Sigma^{-1}(x-\mu_i)$.

When $g = 2$, the rule allocates $x$ to $\pi_1$ if $\alpha'(x - \mu) > 0$, where $\alpha = \Sigma^{-1}(\mu_1 - \mu_2)$ and $\mu = \frac{1}{2}(\mu_1 + \mu_2)$, and to $\pi_2$ otherwise.
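A sketch of the general-$g$ rule, assuming known means and a common covariance matrix (all values below are hypothetical):

```python
import numpy as np

def ml_allocate(x, mus, Sigma):
    """Allocate x to the pi_i minimizing the squared Mahalanobis distance."""
    Sinv = np.linalg.inv(Sigma)
    d2 = [(x - mu) @ Sinv @ (x - mu) for mu in mus]
    return int(np.argmin(d2)) + 1  # populations numbered 1..g

mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0]), np.array([1.0, 4.0])]
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
print(ml_allocate(np.array([2.5, 1.2]), mus, Sigma))  # nearest mean: population 2
```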
ML approach: an example

See blackboard!
Decision theory approach: idea

In many cases, we have some more information or would like to use other criteria for the classification.
- What if we want to minimize the probability of misclassification? Does that change anything?
- What if one kind of misclassification costs more than another? How can we take this into account?
- What if we know beforehand that, say, 80 % of the observations belong to $\pi_1$ and 20 % belong to $\pi_2$? How can we use this knowledge?

Decision theory is the theory concerned with finding optimal decisions given certain information. Estimation and testing can both be viewed in a decision-theoretical context, as can classification.
Decision theory approach: misclassification

Suppose that we have some prior probabilities of the observations belonging to the different populations:
- Prior probability that an individual comes from Pop. 1: $p_1 = P(\pi_1)$
- Prior probability that an individual comes from Pop. 2: $p_2 = P(\pi_2) = 1 - p_1$

Now, let
- $P(2|1)$ = conditional probability of misclassifying an observation into Pop. 2 when the observation actually belongs to Pop. 1,
- $P(1|2)$ = conditional probability of misclassifying an observation into Pop. 1 when the observation actually belongs to Pop. 2.
Decision theory approach: TPM

Partition the sample space into $R_1$ and $R_2 = R_1^c$ such that
- if $x \in R_1$, classify to $\pi_1$,
- if $x \in R_2$, classify to $\pi_2$.

The TPM, the Total Probability of Misclassification, is defined as
$$\begin{aligned} P(\text{misclassification}) &= P(x \text{ is in } \pi_2 \text{ but is classified as } \pi_1) + P(x \text{ is in } \pi_1 \text{ but is classified as } \pi_2) \\ &= P(\text{classify } x \text{ in } \pi_1 \mid \pi_2)P(\pi_2) + P(\text{classify } x \text{ in } \pi_2 \mid \pi_1)P(\pi_1) \\ &= P(1|2)\,p_2 + P(2|1)\,p_1. \end{aligned}$$
See blackboard!
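The TPM can be evaluated explicitly in the univariate normal case from slide 5. A sketch using the ML cutpoint $m = (\mu_1 + \mu_2)/2$ (the parameters and priors below are hypothetical):

```python
from scipy.stats import norm

mu1, mu2, sigma = 0.0, 3.0, 1.5   # hypothetical parameters, mu1 < mu2
p1, p2 = 0.8, 0.2                 # hypothetical prior probabilities
m = 0.5 * (mu1 + mu2)             # ML rule: R1 = {x < m}, R2 = {x >= m}

P_2_given_1 = 1 - norm.cdf(m, mu1, sigma)  # x from pi1 falls in R2
P_1_given_2 = norm.cdf(m, mu2, sigma)      # x from pi2 falls in R1
tpm = P_1_given_2 * p2 + P_2_given_1 * p1
print(tpm)
```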
Decision theory approach: misclassification costs

What if the costs of the different misclassifications differ? For instance, the cost of classifying a patient with a deadly disease as healthy is higher than the cost of classifying a healthy patient as having a deadly disease.

Define the costs $c(1|2)$ and $c(2|1)$:

Costs table:

                     Classify as π1    Classify as π2
    True pop. π1           0               c(2|1)
    True pop. π2         c(1|2)              0

and study the ECM, the Expected Cost of Misclassification:
$$\text{ECM} = E(\text{cost of decision}) = c(1|2)\,P(1|2)\,p_2 + c(2|1)\,P(2|1)\,p_1.$$

If $c(1|2) = c(2|1)$, minimizing the ECM is mathematically equivalent to minimizing the TPM.
Decision theory approach: minimization of ECM

Result 11.1: The regions $R_1$ and $R_2$ that minimize the ECM are
$$R_1: \left\{x : \frac{f_1(x)}{f_2(x)} \geq \frac{c(1|2)}{c(2|1)} \cdot \frac{p_2}{p_1}\right\}, \qquad R_2: \left\{x : \frac{f_1(x)}{f_2(x)} < \frac{c(1|2)}{c(2|1)} \cdot \frac{p_2}{p_1}\right\}.$$
$x$ is classified into $\pi_1$ if $x \in R_1$ and into $\pi_2$ if $x \in R_2$.

The maximum likelihood approach can be viewed as the special case where $p_1 = p_2$ and $c(1|2) = c(2|1)$, or where $p_2/p_1 = c(1|2)/c(2|1)$: classify $x_0$ to $\pi_1$ if $f_1(x_0)/f_2(x_0) > 1$.
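Result 11.1 translates directly into code: compare the likelihood ratio to the cost/prior threshold. A minimal sketch, where f1 and f2 may be any density functions (the usage values are hypothetical):

```python
from scipy.stats import norm

def min_ecm_classify(x, f1, f2, c12, c21, p1, p2):
    """Allocate x by Result 11.1: x is in R1 iff f1/f2 meets the threshold."""
    threshold = (c12 / c21) * (p2 / p1)
    return 1 if f1(x) / f2(x) >= threshold else 2

f1 = lambda x: norm.pdf(x, 0.0, 1.5)
f2 = lambda x: norm.pdf(x, 3.0, 1.5)
print(min_ecm_classify(1.2, f1, f2, c12=1.0, c21=5.0, p1=0.8, p2=0.2))
```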
Decision theory approach: Bayesian approach

Recall Bayes' theorem:
$$P(A|B) = \frac{P(B|A)P(A)}{P(B)}.$$

Bayesian approach: allocate $x$ to the population with the largest posterior probability $P(\pi_i|x)$. We find that
$$P(\pi_1|x) = \frac{P(\pi_1, x)}{P(x)} = \frac{P(x|\pi_1)P(\pi_1)}{P(x|\pi_1)P(\pi_1) + P(x|\pi_2)P(\pi_2)} = \frac{p_1 f_1(x)}{p_1 f_1(x) + p_2 f_2(x)}.$$
Similarly, we get
$$P(\pi_2|x) = \frac{p_2 f_2(x)}{p_1 f_1(x) + p_2 f_2(x)}.$$

Comparing $P(\pi_1|x)$ and $P(\pi_2|x)$ is equivalent to taking $c(1|2) = c(2|1)$ in the decision theory approach.
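A sketch of the posterior computation (again with f1 and f2 as arbitrary density functions; the values below are hypothetical):

```python
from scipy.stats import norm

def posteriors(x, f1, f2, p1, p2):
    """P(pi_1 | x) and P(pi_2 | x) via Bayes' theorem."""
    a, b = p1 * f1(x), p2 * f2(x)
    return a / (a + b), b / (a + b)

f1 = lambda x: norm.pdf(x, 0.0, 1.5)
f2 = lambda x: norm.pdf(x, 3.0, 1.5)
print(posteriors(1.2, f1, f2, p1=0.8, p2=0.2))  # allocate to the larger one
```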
Decision theory approach: normal data

If $\pi_1$ is $N_p(\mu_1, \Sigma)$ and $\pi_2$ is $N_p(\mu_2, \Sigma)$, then the decision theory rule becomes: allocate $x$ to $\pi_1$ if
$$(\mu_1 - \mu_2)'\Sigma^{-1}x - \frac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2) \geq \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$
and to $\pi_2$ otherwise.

Now assume that we have two normal populations with unknown means and equal but unknown covariance matrices. $n_1$ observations are available from $\pi_1$ and $n_2$ observations are available from $\pi_2$. The estimated minimum ECM rule is: allocate a new observation $x_0$ to $\pi_1$ if
$$(\bar{x}_1 - \bar{x}_2)'S_{\text{pool}}^{-1}x_0 - \frac{1}{2}(\bar{x}_1 - \bar{x}_2)'S_{\text{pool}}^{-1}(\bar{x}_1 + \bar{x}_2) \geq \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right].$$
Allocate $x_0$ to $\pi_2$ otherwise.
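A sketch of the estimated minimum ECM rule, fitting $\bar{x}_1$, $\bar{x}_2$ and the pooled covariance from training matrices X1 and X2 (one row per observation; the helper names are ours, not from the slides):

```python
import numpy as np

def fit_linear_rule(X1, X2):
    """Sample means and pooled covariance from the two training samples."""
    n1, n2 = len(X1), len(X2)
    xb1, xb2 = X1.mean(axis=0), X2.mean(axis=0)
    S_pool = ((n1 - 1) * np.cov(X1, rowvar=False)
              + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    return xb1, xb2, S_pool

def allocate(x0, xb1, xb2, S_pool, c12=1.0, c21=1.0, p1=0.5, p2=0.5):
    """Estimated minimum ECM rule for equal (unknown) covariance matrices."""
    a = np.linalg.solve(S_pool, xb1 - xb2)   # S_pool^{-1}(xbar1 - xbar2)
    score = a @ x0 - 0.5 * a @ (xb1 + xb2)
    return 1 if score >= np.log((c12 / c21) * (p2 / p1)) else 2
```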
Decision theory approach: quadratic classification rule

Now suppose that we have two normal populations with unequal covariance matrices $\Sigma_1$ and $\Sigma_2$. In this case, the classification rule becomes more complicated. Allocate $x_0$ to $\pi_1$ if
$$-\frac{1}{2}\,x_0'(\Sigma_1^{-1} - \Sigma_2^{-1})x_0 + (\mu_1'\Sigma_1^{-1} - \mu_2'\Sigma_2^{-1})x_0 - k \geq \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right]$$
where
$$k = \frac{1}{2}\ln\left(\frac{\det(\Sigma_1)}{\det(\Sigma_2)}\right) + \frac{1}{2}\left(\mu_1'\Sigma_1^{-1}\mu_1 - \mu_2'\Sigma_2^{-1}\mu_2\right).$$
Allocate $x_0$ to $\pi_2$ otherwise.

Replacing $\mu_i$ with $\bar{x}_i$ and $\Sigma_i$ with $S_i$, we obtain the estimated minimum ECM rule.
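A sketch of the quadratic score; allocate to $\pi_1$ when the score is at least $\ln[(c(1|2)/c(2|1))(p_2/p_1)]$:

```python
import numpy as np

def quadratic_score(x0, mu1, mu2, S1, S2):
    """Left-hand side of the quadratic (unequal covariances) rule."""
    S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)
    k = (0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
         + 0.5 * (mu1 @ S1i @ mu1 - mu2 @ S2i @ mu2))
    return -0.5 * x0 @ (S1i - S2i) @ x0 + (mu1 @ S1i - mu2 @ S2i) @ x0 - k

# plug in xbar_i for mu_i and S_i for Sigma_i to get the estimated rule
```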
Fisher's approach: introduction

Suppose that observations from $g$ populations are given and that we wish to use these to classify a new observation. It is easier to tell the populations apart if the variation between the groups is larger than the variation within the groups. Let
$$B = \sum_{l=1}^{g} n_l (\bar{x}_l - \bar{x})(\bar{x}_l - \bar{x})', \qquad W = \sum_{l=1}^{g}\sum_{j=1}^{n_l} (x_{lj} - \bar{x}_l)(x_{lj} - \bar{x}_l)'.$$

Fisher's idea: look for the linear function $a'x$ which maximizes the ratio of the between-group sum of squares to the within-group sum of squares.
Fisher's approach: linear discriminant function

For the vector $a$ maximizing
$$\frac{a'Ba}{a'Wa},$$
the linear function $a'x$ is called Fisher's linear discriminant function.

Theorem. The vector $a$ in Fisher's linear discriminant function is the eigenvector of $W^{-1}B$ corresponding to the largest eigenvalue.
Fisher's approach: allocation rule

Rule: an observation $x$ is allocated to the population whose mean score is closest to $a'x$. That is, allocate $x$ to $\pi_j$ if
$$|a'x - a'\bar{x}_j| < |a'x - a'\bar{x}_i| \quad \text{for all } i \neq j.$$

For $g = 2$ groups, this becomes: allocate $x$ to $\pi_1$ if
$$d'W^{-1}\left(x - \frac{1}{2}(\bar{x}_1 + \bar{x}_2)\right) > 0$$
where $d = \bar{x}_1 - \bar{x}_2$; allocate to $\pi_2$ otherwise.

This coincides with the classification rule obtained by the ML approach for two multivariate normal populations with equal covariance matrices. However, no assumption of normality was made here.
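A sketch of Fisher's procedure from training data, assuming `groups` is a list of $(n_l \times p)$ arrays; the eigenvector extraction follows the theorem on the previous slide:

```python
import numpy as np

def fisher_lda(groups):
    """Return Fisher's a (leading eigenvector of W^{-1}B) and the group means."""
    xbar = np.vstack(groups).mean(axis=0)
    p = xbar.size
    B, W = np.zeros((p, p)), np.zeros((p, p))
    means = []
    for X in groups:
        xb = X.mean(axis=0)
        means.append(xb)
        B += len(X) * np.outer(xb - xbar, xb - xbar)
        W += (X - xb).T @ (X - xb)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
    # W^{-1}B is not symmetric, so eig may return tiny imaginary parts
    a = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return a, means

def fisher_allocate(x, a, means):
    """Allocate x to the group whose mean score a'xbar_l is closest to a'x."""
    return int(np.argmin([abs(a @ x - a @ m) for m in means])) + 1
```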
Discrimination and MANOVA

Of course, before applying our classification methods, we should ask ourselves if it really is meaningful to use a certain dataset for classification.

Consider $g$ multinormal populations, assumed to have the same covariance matrix, $\Sigma_1 = \dots = \Sigma_g$. To check whether or not a discriminant analysis is worthwhile, test the hypothesis
$$\mu_1 = \dots = \mu_g.$$
This is the MANOVA problem!
Evaluating classification functions

- OER (Optimum Error Rate): the smallest value of the TPM.
- AER (Actual Error Rate): based on the performance of sample classification functions.
- APER (Apparent Error Rate): the fraction of observations in a training sample that are misclassified by the sample classification function.
- Lachenbruch's cross-validation procedure: a method for estimating the AER (see the sketch below). See blackboard!
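Lachenbruch's procedure is a leave-one-out scheme: hold one observation out, refit the classification function, and check whether the held-out observation is classified correctly. A sketch for two populations, where `build_rule(X1, X2)` returns a classifier mapping $x$ to 1 or 2 (the function names are ours):

```python
import numpy as np

def lachenbruch_error(X1, X2, build_rule):
    """Leave-one-out estimate of the actual error rate for two populations."""
    errors = 0
    for i in range(len(X1)):  # hold out each pi1 observation in turn
        rule = build_rule(np.delete(X1, i, axis=0), X2)
        errors += rule(X1[i]) != 1
    for j in range(len(X2)):  # hold out each pi2 observation in turn
        rule = build_rule(X1, np.delete(X2, j, axis=0))
        errors += rule(X2[j]) != 2
    return errors / (len(X1) + len(X2))
```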
The Museum Gustavianum sword data

Museum Gustavianum in downtown Uppsala has a large collection of antique swords. In a current research project, the researchers are measuring lengths and various other properties of the swords. The goal is to use this to classify swords as coming from different epochs.

They need help from someone with knowledge of classification methods, so they asked the mathematics department for help. This could be a fun project for a bachelor or master thesis, or perhaps just a nice project to work on alongside your studies. Talk to Jesper Rydén, jesper@math.uu.se, if you are interested or would like to know more!
Classification: a second look at the introductory examples

Can the methods presented today be used in our introductory examples?

Examples:
- Classify insects into one of several sub-species using measurements on external features.
- Use measurements on blood proteins and family history to classify women as carriers or non-carriers of a genetic disorder.
- Classify the quality of a new mobile phone battery as good or bad based on a few preliminary measurements.
- Use information on background, family support, psychological test scores etc. to screen applicants for parole from prison.
- Use information on sex, age, income, education level, marital status, debts etc. to classify a potential borrower as eligible or ineligible for a bank loan.
- Detect spam messages based on the message header and content.
Classification: next lecture

In the next lecture, we will talk about decision trees (for classification) and other algorithmic methods. These methods are popular within, for instance, the field of data mining and do not require assumptions about distributions. We will also compare ordinary probabilistic methods with algorithmic methods and discuss when the different methods should be used.