THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay
Lecture 9: Discrimination and Classification

1 Basic concept

Discrimination is concerned with separating distinct sets of observations, whereas classification allocates new objects to previously defined groups. The goals of these two multivariate techniques are:

1. Discrimination: to describe the differential features of objects (observations) from several known collections (populations), and to find discriminants whose numerical values are such that the collections are separated as much as possible.

2. Classification: to sort objects (observations) into two or more labeled classes, and to derive a rule that can be used to optimally assign new objects to the labeled classes.

2 The case of two populations

Denote the two populations by $\pi_1$ and $\pi_2$, respectively, and denote by $X = (X_1, \ldots, X_p)'$ the $p$-dimensional random vector of measurements on an object from the populations. Let $f_i(x)$ be the probability density function of $X$ under the $i$th population. An object with associated measurements $x$ must be assigned to either $\pi_1$ or $\pi_2$. Let $\Omega$ be the sample space, i.e., the collection of all possible values of $x$. Let $R_1$ be the set of $x$ values for which we classify objects as $\pi_1$, and let $R_2 = \Omega - R_1$ be the remaining $x$ values, for which we classify objects as $\pi_2$. Since every object must be assigned to one and only one of the two populations, the sets $R_1$ and $R_2$ are mutually exclusive and exhaustive.

The conditional probability, $P(2|1)$, of classifying an object as $\pi_2$ when, in fact, it is from $\pi_1$ is
$$P(2|1) = P(X \in R_2 \mid \pi_1) = \int_{R_2} f_1(x)\,dx.$$
Similarly, the conditional probability, $P(1|2)$, of classifying an object as $\pi_1$ when it is really from $\pi_2$ is
$$P(1|2) = P(X \in R_1 \mid \pi_2) = \int_{R_1} f_2(x)\,dx.$$
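As a concrete illustration (not from the lecture), the following Python sketch evaluates $P(2|1)$ and $P(1|2)$ for two hypothetical univariate normal populations and a simple cutoff region $R_1 = \{x < c\}$; the densities, cutoff, and region are illustrative assumptions.

```python
# A minimal numerical sketch of the two-population setup: two univariate
# normal densities f1, f2, a candidate region R1 = {x < c}, and the
# resulting misclassification probabilities P(2|1) and P(1|2).
# The densities and the cutoff c are hypothetical choices.
from scipy.stats import norm

f1 = norm(loc=0.0, scale=1.0)   # density of population pi_1
f2 = norm(loc=2.0, scale=1.0)   # density of population pi_2
c = 1.0                         # classify as pi_1 if x < c, else pi_2

# P(2|1): mass of f1 falling in R2 = [c, infinity)
p_2_given_1 = 1.0 - f1.cdf(c)
# P(1|2): mass of f2 falling in R1 = (-infinity, c)
p_1_given_2 = f2.cdf(c)

print(p_2_given_1, p_1_given_2)  # both ~0.159 for this symmetric cutoff
```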
Suppose that the prior distribution for the populations is $P(\pi_i) = p_i$, $i = 1, 2$, where $p_1 + p_2 = 1$. Then, we have

P(obs correctly classified as $\pi_1$) = P(obs comes from $\pi_1$ and is correctly classified as $\pi_1$) $= P(X \in R_1 \mid \pi_1) P(\pi_1) = P(1|1)\,p_1$,
P(obs incorrectly classified as $\pi_1$) = P(obs comes from $\pi_2$ and is incorrectly classified as $\pi_1$) $= P(X \in R_1 \mid \pi_2) P(\pi_2) = P(1|2)\,p_2$,
P(obs correctly classified as $\pi_2$) = P(obs comes from $\pi_2$ and is correctly classified as $\pi_2$) $= P(X \in R_2 \mid \pi_2) P(\pi_2) = P(2|2)\,p_2$,
P(obs incorrectly classified as $\pi_2$) = P(obs comes from $\pi_1$ and is incorrectly classified as $\pi_2$) $= P(X \in R_2 \mid \pi_1) P(\pi_1) = P(2|1)\,p_1$.

The cost of misclassification is defined by a cost matrix:

                              Classify as:  $\pi_1$     $\pi_2$
  True population:  $\pi_1$                   0        $c(2|1)$
                    $\pi_2$                $c(1|2)$       0

Thus, the costs are (1) zero for correct classification, (2) $c(1|2)$ when an observation from $\pi_2$ is incorrectly classified as $\pi_1$, and (3) $c(2|1)$ when a $\pi_1$ observation is incorrectly classified as $\pi_2$. The expected cost of misclassification is
$$\mathrm{ECM} = c(2|1)\, P(2|1)\, p_1 + c(1|2)\, P(1|2)\, p_2.$$
A reasonable classification rule should have an ECM as small as possible.

Result 11.1. The regions $R_1$ and $R_2$ that minimize the ECM are defined by the values $x$ for which the following inequalities hold:
$$R_1: \frac{f_1(x)}{f_2(x)} \ge \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right), \qquad R_2: \frac{f_1(x)}{f_2(x)} < \left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right).$$
Thus, $R_1$ consists of the $x$ such that (density ratio) $\ge$ (cost ratio) $\times$ (prior probability ratio).

Proof: See the hint at Exercise 11.3.

Another criterion is the total probability of misclassification (TPM):
$$\mathrm{TPM} = p_1 \int_{R_2} f_1(x)\,dx + p_2 \int_{R_1} f_2(x)\,dx.$$
This criterion is the same as the ECM if the costs of misclassification are the same.
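The minimum-ECM rule of Result 11.1 reduces to a density-ratio comparison, which the sketch below implements for hypothetical univariate normal densities, costs, and priors (all assumed values, not from the lecture).

```python
# A sketch of the minimum-ECM rule of Result 11.1: classify x into pi_1
# when the density ratio exceeds (cost ratio) x (prior ratio).
from scipy.stats import norm

f1 = norm(0.0, 1.0).pdf
f2 = norm(2.0, 1.0).pdf
p1, p2 = 0.7, 0.3            # prior probabilities, p1 + p2 = 1
c12, c21 = 5.0, 1.0          # costs c(1|2) and c(2|1)

def classify(x):
    """Return 1 if x falls in R1, else 2 (ties go to pi_1)."""
    threshold = (c12 / c21) * (p2 / p1)
    return 1 if f1(x) / f2(x) >= threshold else 2

print(classify(0.5), classify(1.8))  # -> 1 2 for these parameter choices
```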
Finally, a new observation $x_0$ can be allocated to the population with the largest posterior probability $P(\pi_i \mid x_0)$. Specifically, by Bayes's rule, the posterior probabilities are
$$P(\pi_1 \mid x_0) = \frac{P(\pi_1 \text{ occurs and we observe } x_0)}{P(\text{we observe } x_0)} = \frac{P(\text{we observe } x_0 \mid \pi_1) P(\pi_1)}{P(\text{we observe } x_0 \mid \pi_1) P(\pi_1) + P(\text{we observe } x_0 \mid \pi_2) P(\pi_2)} = \frac{p_1 f_1(x_0)}{p_1 f_1(x_0) + p_2 f_2(x_0)}$$
and
$$P(\pi_2 \mid x_0) = \frac{p_2 f_2(x_0)}{p_1 f_1(x_0) + p_2 f_2(x_0)}.$$
We classify an observation $x_0$ as $\pi_1$ if $P(\pi_1 \mid x_0) > P(\pi_2 \mid x_0)$.

3 Classification with two multivariate normal populations

Case 1: Equal covariance matrices.

Result 11.2. Let the populations $\pi_i$ ($i = 1, 2$) be described by the density functions $N_p(\mu_i, \Sigma)$. Then, the allocation rule that minimizes the ECM is as follows: Allocate $x_0$ to $\pi_1$ if
$$(\mu_1 - \mu_2)' \Sigma^{-1} x_0 - \frac{1}{2} (\mu_1 - \mu_2)' \Sigma^{-1} (\mu_1 + \mu_2) \ge \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right];$$
allocate $x_0$ to $\pi_2$ otherwise.

Proof: Use the identity
$$-\frac{1}{2}(x - \mu_1)'\Sigma^{-1}(x - \mu_1) + \frac{1}{2}(x - \mu_2)'\Sigma^{-1}(x - \mu_2) = (\mu_1 - \mu_2)'\Sigma^{-1} x - \frac{1}{2}(\mu_1 - \mu_2)'\Sigma^{-1}(\mu_1 + \mu_2).$$
Also, the minimum ECM regions are
$$R_1: \exp\left[-\frac{1}{2}(x - \mu_1)'\Sigma^{-1}(x - \mu_1) + \frac{1}{2}(x - \mu_2)'\Sigma^{-1}(x - \mu_2)\right] \ge \frac{c(1|2)\, p_2}{c(2|1)\, p_1},$$
$$R_2: \exp\left[-\frac{1}{2}(x - \mu_1)'\Sigma^{-1}(x - \mu_1) + \frac{1}{2}(x - \mu_2)'\Sigma^{-1}(x - \mu_2)\right] < \frac{c(1|2)\, p_2}{c(2|1)\, p_1}.$$
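A minimal sketch of the allocation rule in Result 11.2, assuming hypothetical values for $\mu_1$, $\mu_2$, $\Sigma$, the priors, and the costs (in practice these would be the sample estimates introduced next).

```python
# A sketch of the minimum-ECM rule of Result 11.2 for two N_p(mu_i, Sigma)
# populations with a common covariance matrix. All parameter values are
# hypothetical stand-ins.
import numpy as np

mu1 = np.array([1.0, 2.0])
mu2 = np.array([3.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
p1, p2 = 0.5, 0.5
c12, c21 = 1.0, 1.0

def allocate(x0):
    """Allocate x0 to pi_1 (return 1) or pi_2 (return 2)."""
    d = mu1 - mu2
    a = np.linalg.solve(Sigma, d)          # Sigma^{-1} (mu1 - mu2)
    lhs = a @ x0 - 0.5 * a @ (mu1 + mu2)
    rhs = np.log((c12 / c21) * (p2 / p1))  # = 0 for these choices
    return 1 if lhs >= rhs else 2

print(allocate(np.array([1.2, 1.8])), allocate(np.array([2.9, 1.1])))  # 1 2
```

Solving the linear system with `np.linalg.solve` rather than forming $\Sigma^{-1}$ explicitly is the numerically preferable choice.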
In practice, $\Sigma$ is estimated by the pooled covariance matrix $S_{\mathrm{pool}}$ and $\mu_i$ by the sample mean $\bar{x}_i$ of population $\pi_i$. Consequently, the sample classification rule is as follows.

The estimated minimum ECM rule for two normal populations: Allocate $x_0$ to $\pi_1$ if
$$(\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1} x_0 - \frac{1}{2} (\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1} (\bar{x}_1 + \bar{x}_2) \ge \ln\left[\left(\frac{c(1|2)}{c(2|1)}\right)\left(\frac{p_2}{p_1}\right)\right];$$
allocate $x_0$ to $\pi_2$ otherwise.

Fisher's approach to classification with two populations: Fisher arrived at the prior minimum ECM rule using an alternative argument. He considered linear combinations of the random vector so as to transform multivariate distributions into univariate ones. The transformation is chosen to achieve maximum separation of the transformed sample means. Let $y_{1i} = a' x_{1i}$ and $y_{2i} = a' x_{2i}$ be the transformed samples. Also, let $\bar{y}_j$ be the sample mean of $\{y_{ji}\}$, $j = 1, 2$, and
$$s_y^2 = \frac{\sum_{i=1}^{n_1} (y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2} (y_{2i} - \bar{y}_2)^2}{n_1 + n_2 - 2}.$$
The separation is measured by $(\bar{y}_1 - \bar{y}_2)^2 / s_y^2$.

Result 11.3. The linear transformation $\hat{y} = \hat{a}' x = (\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1} x$ maximizes the ratio
$$\frac{(\bar{y}_1 - \bar{y}_2)^2}{s_y^2} = \frac{[\hat{a}'(\bar{x}_1 - \bar{x}_2)]^2}{\hat{a}' S_{\mathrm{pool}}\, \hat{a}} = \frac{(\hat{a}' d)^2}{\hat{a}' S_{\mathrm{pool}}\, \hat{a}}$$
over all possible coefficient vectors $\hat{a}$, where $d = \bar{x}_1 - \bar{x}_2$. The maximum of the above ratio is $D^2 = (\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1} (\bar{x}_1 - \bar{x}_2)$.

Proof: Use the inequality in Eq. (2-50) of the textbook.

An allocation rule based on Fisher's discriminant function: Allocate $x_0$ to $\pi_1$ if
$$\hat{y}_0 = (\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1} x_0 \ge \hat{m}, \quad \text{where } \hat{m} = \frac{1}{2} (\bar{x}_1 - \bar{x}_2)' S_{\mathrm{pool}}^{-1} (\bar{x}_1 + \bar{x}_2);$$
allocate $x_0$ to $\pi_2$ otherwise. This rule is known as the Fisher linear discriminant function.
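A sketch of the sample version of Fisher's rule, using simulated training data; the group means, covariance, and sample sizes are illustrative assumptions.

```python
# A sketch of Fisher's sample linear discriminant for two groups:
# pooled covariance, discriminant vector a_hat = S_pool^{-1}(xbar1 - xbar2),
# midpoint m_hat, and allocation of a new observation.
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.multivariate_normal([1.0, 2.0], np.eye(2), size=30)  # sample from pi_1
X2 = rng.multivariate_normal([3.0, 1.0], np.eye(2), size=40)  # sample from pi_2

xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
n1, n2 = len(X1), len(X2)
S1 = np.cov(X1, rowvar=False)   # each uses divisor n_i - 1
S2 = np.cov(X2, rowvar=False)
S_pool = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)

a_hat = np.linalg.solve(S_pool, xbar1 - xbar2)
m_hat = 0.5 * a_hat @ (xbar1 + xbar2)

def fisher_allocate(x0):
    """Allocate to pi_1 if y0 = a_hat' x0 >= m_hat, else to pi_2."""
    return 1 if a_hat @ x0 >= m_hat else 2

print(fisher_allocate(np.array([1.0, 2.0])))  # near the pi_1 mean -> 1
```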
Case 2: Unequal covariance matrices: Quadratic classification rule.

Result 11.1 leads to the following classification regions when the covariance matrices are not equal:
$$R_1: -\frac{1}{2} x' (\Sigma_1^{-1} - \Sigma_2^{-1}) x + (\mu_1' \Sigma_1^{-1} - \mu_2' \Sigma_2^{-1}) x - k \ge \ln\left[\frac{c(1|2)\, p_2}{c(2|1)\, p_1}\right],$$
$$R_2: -\frac{1}{2} x' (\Sigma_1^{-1} - \Sigma_2^{-1}) x + (\mu_1' \Sigma_1^{-1} - \mu_2' \Sigma_2^{-1}) x - k < \ln\left[\frac{c(1|2)\, p_2}{c(2|1)\, p_1}\right],$$
where
$$k = \frac{1}{2} \ln\left(\frac{|\Sigma_1|}{|\Sigma_2|}\right) + \frac{1}{2} (\mu_1' \Sigma_1^{-1} \mu_1 - \mu_2' \Sigma_2^{-1} \mu_2).$$
For a new observation $x_0$, we may use sample statistics to obtain the following quadratic classification rule: Allocate $x_0$ to $\pi_1$ if
$$-\frac{1}{2} x_0' (S_1^{-1} - S_2^{-1}) x_0 + (\bar{x}_1' S_1^{-1} - \bar{x}_2' S_2^{-1}) x_0 - \hat{k} \ge \ln\left[\frac{c(1|2)\, p_2}{c(2|1)\, p_1}\right],$$
where $\hat{k}$ is the sample analogue of $k$; allocate $x_0$ to $\pi_2$ otherwise.
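A sketch of this sample quadratic rule with hypothetical sample statistics; equal costs and priors make the right-hand side $\ln(1) = 0$.

```python
# A sketch of the sample quadratic classification rule for unequal
# covariance matrices. S1, S2, xbar1, xbar2 would come from training data;
# the values here are hypothetical.
import numpy as np

xbar1, xbar2 = np.array([1.0, 2.0]), np.array([3.0, 1.0])
S1 = np.array([[1.0, 0.2], [0.2, 1.0]])
S2 = np.array([[2.0, -0.3], [-0.3, 1.5]])
p1 = p2 = 0.5
c12 = c21 = 1.0

S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
k_hat = (0.5 * np.log(np.linalg.det(S1) / np.linalg.det(S2))
         + 0.5 * (xbar1 @ S1_inv @ xbar1 - xbar2 @ S2_inv @ xbar2))

def quad_allocate(x0):
    """Quadratic score for pi_1 vs pi_2; allocate to pi_1 if score >= rhs."""
    score = (-0.5 * x0 @ (S1_inv - S2_inv) @ x0
             + (xbar1 @ S1_inv - xbar2 @ S2_inv) @ x0 - k_hat)
    rhs = np.log((c12 * p2) / (c21 * p1))  # = 0 here
    return 1 if score >= rhs else 2

print(quad_allocate(np.array([1.1, 1.9])))  # near xbar1 -> 1
```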
4 Evaluating classification functions

If the population distributions are known, the total probability of misclassification is
$$\mathrm{TPM} = p_1 \int_{R_2} f_1(x)\,dx + p_2 \int_{R_1} f_2(x)\,dx.$$
The choices of $R_1$ and $R_2$ that minimize the TPM give rise to the optimum error rate (OER). When $R_1$ and $R_2$ are chosen based on Result 11.1 with $c(1|2) = c(2|1)$, we obtain the OER. Thus, the OER is the error rate of the minimum TPM classification rule.

In practice, the population densities are unknown, and the actual error rate (AER) is
$$\mathrm{AER} = p_1 \int_{\hat{R}_2} f_1(x)\,dx + p_2 \int_{\hat{R}_1} f_2(x)\,dx,$$
where the $\hat{R}_i$ are the classification regions determined by the sample statistics.

A measure that does not depend on the form of the populations is the apparent error rate (APER), defined as the fraction of observations in the training sample that are misclassified by the sample classification function:

                                 Predicted membership
                                 $\pi_1$                       $\pi_2$
  Actual        $\pi_1$          $n_{1c}$                      $n_{1m} = n_1 - n_{1c}$
  membership    $\pi_2$          $n_{2m} = n_2 - n_{2c}$       $n_{2c}$

where
$n_{1c}$ = number of $\pi_1$ items correctly classified as $\pi_1$ items,
$n_{1m}$ = number of $\pi_1$ items incorrectly classified as $\pi_2$ items,
$n_{2c}$ = number of $\pi_2$ items correctly classified as $\pi_2$ items,
$n_{2m}$ = number of $\pi_2$ items incorrectly classified as $\pi_1$ items.

The apparent error rate is then
$$\mathrm{APER} = \frac{n_{1m} + n_{2m}}{n_1 + n_2}.$$
Example: See Example 11.8 on the classification of Alaskan and Canadian salmon, page 603 of the textbook.
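Computing the APER from the confusion-matrix counts is straightforward; the counts below are hypothetical.

```python
# A sketch of computing the apparent error rate (APER) from the
# training-sample confusion matrix. The counts are hypothetical.
def aper(n1c, n1m, n2c, n2m):
    """APER = (n1m + n2m) / (n1 + n2), the training misclassification rate."""
    n1, n2 = n1c + n1m, n2c + n2m
    return (n1m + n2m) / (n1 + n2)

# e.g. 45 of 50 pi_1 items and 48 of 50 pi_2 items classified correctly:
print(aper(n1c=45, n1m=5, n2c=48, n2m=2))  # 7/100 = 0.07
```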
5 More than two populations

5.1 The minimum expected cost of misclassification method

Let $f_i(x)$ be the density function of the population $\pi_i$, $i = 1, \ldots, g$. Define

$p_i$ = prior probability of population $\pi_i$;
$c(k|i)$ = the cost of allocating an item to $\pi_k$ when, in fact, it belongs to $\pi_i$, for $k, i = 1, \ldots, g$ (with $c(i|i) = 0$);
$R_i$ = the set of $x$'s classified as $\pi_i$;
$P(k|i)$ = P(classifying an item as $\pi_k \mid \pi_i$) $= \int_{R_k} f_i(x)\,dx$.

Note that the $\{R_i\}$ are mutually exclusive and exhaustive. The expected cost of misclassifying an item $x$ from $\pi_i$ is
$$\mathrm{ECM}(i) = P(1|i)\, c(1|i) + P(2|i)\, c(2|i) + \cdots + P(g|i)\, c(g|i) = \sum_{j=1}^{g} P(j|i)\, c(j|i),$$
where it is understood that $c(i|i) = 0$ and $P(i|i) = 1 - \sum_{j \ne i} P(j|i)$. The overall ECM is
$$\mathrm{ECM} = p_1\, \mathrm{ECM}(1) + p_2\, \mathrm{ECM}(2) + \cdots + p_g\, \mathrm{ECM}(g) = \sum_{i=1}^{g} p_i \sum_{j=1, j \ne i}^{g} P(j|i)\, c(j|i).$$
It turns out that we have the following result.

Result 11.5. The classification regions that minimize the ECM are defined by allocating $x$ to the population $\pi_k$ for which
$$\sum_{i=1, i \ne k}^{g} p_i\, f_i(x)\, c(k|i)$$
is smallest. If a tie occurs, $x$ can be assigned to any of the tied populations.

If the misclassification costs are all equal, then the classification rule becomes: Allocate $x$ to $\pi_k$ if
$$p_k f_k(x) > p_i f_i(x) \quad \text{for all } i \ne k,$$
or, equivalently, allocate $x$ to $\pi_k$ if
$$\ln(p_k f_k(x)) > \ln(p_i f_i(x)) \quad \text{for all } i \ne k.$$
This rule is equivalent to the one that maximizes the posterior probability $P(\pi_k \mid x)$, which is
$$P(\pi_k \mid x) = \frac{p_k f_k(x)}{\sum_{i=1}^{g} p_i f_i(x)}.$$

Normal populations: Assume that $f_i(x)$ is the density of $N_p(\mu_i, \Sigma_i)$ and that the costs of misclassification are the same. Then, we have
$$\ln(p_i f_i(x)) = \ln(p_i) - \frac{p}{2} \ln(2\pi) - \frac{1}{2} \ln(|\Sigma_i|) - \frac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i).$$
This leads to the definition of the quadratic discriminant score
$$d_i^Q(x) = -\frac{1}{2} \ln(|\Sigma_i|) - \frac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) + \ln(p_i)$$
for $i = 1, \ldots, g$. In practice, replacing the theoretical values by their sample estimates, we have the estimated minimum total probability of misclassification rule for several normal populations: Allocate $x$ to $\pi_i$ if
$$\hat{d}_i^Q(x) = \max\{\hat{d}_1^Q(x), \ldots, \hat{d}_g^Q(x)\},$$
where $\hat{d}_j^Q(x)$ is the sample estimate of $d_j^Q(x)$.

When the $\Sigma_i$ are all equal, the function $d_i^Q(x)$ reduces to the linear discriminant score
$$d_i(x) = \mu_i' \Sigma^{-1} x - \frac{1}{2} \mu_i' \Sigma^{-1} \mu_i + \ln(p_i)$$
for $i = 1, \ldots, g$.

Example: See Examples 11.11 and 11.12.
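A sketch of allocation by the largest estimated quadratic discriminant score $\hat{d}_i^Q(x)$, with hypothetical means, covariance matrices, and priors standing in for sample estimates.

```python
# A sketch of the quadratic discriminant scores d_i^Q(x) for g normal
# populations and allocation to the population with the largest score.
# Means, covariances, and priors are hypothetical.
import numpy as np

mus = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([0.0, 3.0])]
Sigmas = [np.eye(2), 2.0 * np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]
priors = [0.5, 0.3, 0.2]

def quad_score(x, mu, Sigma, p):
    """d^Q(x) = -0.5 ln|Sigma| - 0.5 (x - mu)' Sigma^{-1} (x - mu) + ln p."""
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * diff @ np.linalg.solve(Sigma, diff)
            + np.log(p))

def allocate(x):
    scores = [quad_score(x, m, S, p) for m, S, p in zip(mus, Sigmas, priors)]
    return int(np.argmax(scores)) + 1   # population label in 1..g

print(allocate(np.array([0.2, 2.8])))   # close to the third mean -> 3
```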
Logistic regression and classification: Use the salmon data to demonstrate the analysis.

Bayesian Classification and Regression Trees (Bayesian CART)

Some references:

Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984). Classification and Regression Trees. Wadsworth Statistics/Probability Series, Wadsworth International Group.

Chipman, H. A., George, E. I., and McCulloch, R. E. (1998). "Bayesian CART model search." Journal of the American Statistical Association, 93(443).

A recent lecture note by Teresa Jacobson, Department of Applied Mathematics and Statistics, UC Santa Cruz, March 11, 2010 (available on the Web).