Discrimination: finding the features that separate known groups in a multivariate sample.

Size: px
Start display at page:

Download "Discrimination: finding the features that separate known groups in a multivariate sample."

Transcription

1 Discrimination and Classification Goals: Discrimination: finding the features that separate known groups in a multivariate sample. Classification: developing a rule to allocate a new object into one of a number of known groups. A classification rule is based on the features that separate the groups, so the goals overlap. 1

2 Simplest Case: Two Groups Both discrimination and classification depend on multivariate observation X. Basic setup: Consider each group a population: π 1 and π 2. The density of X in π g is f g (x), g = 1, 2. 2

3 Alternatively: A single population with a group indicator G. The density of X given G = g is f g (x). P(G = 1) = p 1, P(G = 2) = p 2 = 1 p 1. p 1 and p 2 are the prior probabilities of the groups. 3

4 Example: ownership of a riding mower. Lotsize Owner Non owner Income 4

5 Classification problem is to predict whether a homeowner owns a riding mower, given the Income and Lot Size. Rule must give an answer for any x, so it defines two regions R 1 and R 2. Classification errors: P (2 1) = P(X R 2 G = 1) = P (1 2) = P(X R 1 G = 2) = R 2 f 1 (x)dx, R 1 f 2 (x)dx. 5

6 Unconditionally: P (correctly classified as π 1 ) = P (1 1)p 1 P (missclassified as π 1 ) = P (1 2)p 2 P (correctly classified as π 2 ) = P (2 2)p 2 P (missclassified as π 2 ) = P (2 1)p 1 The two types of classification error may have different costs: Classify as: π 1 π 2 True population: π 1 0 c(2 1) π 2 c(1 2) 0 6

7 Expected Cost of Misclassification: ECM = c(2 1)P (2 1)p 1 + c(1 2)P (1 2)p 2. The classification region that minimizes ECM is ( ) ( ) f R 1 : 1 (x) c(1 2) p2 f 2 (x). c(2 1) p 1 Note: these regions depend on the ratios of: the densities; the costs of misclassification; the prior probabilities. 7

8 Special cases: Total Probability of Misclassification: TPM = P(misclassify either way) = P (2 1)p 1 + P (1 2)p 2 is ECM with c(1 2) = c(2 1) = 1. Posterior mode: allocate to the population with the higher posterior probability; by Bayes s rule: P (π 1 x 0 ) = P (π 2 x 0 ) = same rule as minimizing TPM. p 1 f 1 (x 0 ) p 1 f 1 (x 0 ) + p 2 f 2 (x 0 ) p 2 f 2 (x 0 ) p 1 f 1 (x 0 ) + p 2 f 2 (x 0 ) 8

9 Multivariate Normal Populations If f 1 (x) and f 2 (x) are multivariate normal with the same Σ and means µ 1 and µ 2, then the minimum-ecm allocation rule is based on a linear function of x. Basic result is [ ] [ f1 (x) ln = (µ f 2 (x) 1 µ 2 ) Σ 1 x 1 ] 2 (µ 1 + µ 2) Optimal region is R 1 : (µ 1 µ 2 ) Σ [x 1 1 ] 2 (µ 1 + µ 2) ln [( c(1 2) c(2 1) ) ( )] p2 p 1. 9

10 Typically, Σ, µ 1 and µ 2 are unknown, but replaced with sample estimates, based on training data. Example: Hemophilia A carriers. Women subjects in two groups: Non-carriers and Obligatory carriers. Variables: X 1 = log 10 (AHF acitivity) X 2 = log 10 (AHF-like antigen) SAS proc discrim program and output. 10

11 Evaluating Classification Functions The Optimum Error Rate is the minimum TPM. For multivariate normal populations, OER = 1 Φ ( ) where 2 is the Generalized Squared Distance between groups: 2 = (µ 1 µ 2 ) Σ 1 (µ 1 µ 2 ). 2, For the hemophilia example, SAS reports 2 = , whence OER =

12 The OER calculation is the estimated error rate for allocation using the true parameters good in large samples, but dubious in small samples. Sample-based evaluation: APparent Error Rate = fraction of training data that are misclassified; reported by SAS as Resubstitution Summary ; APER = for the example; under-estimates error rate, because the data used to develop the rule are then used to evaluate it; Lachenbruch s holdout procedure: omit one observation at a time, recalculate the classification rule, and classify the omitted observation; reported by SAS as Crossvalidation Summary, for the example. 12

13 Quadratic Classification If f 1 (x) and f 2 (x) are multivariate normal with different covariances Σ 1 and Σ 2 and means µ 1 and µ 2, then the minimum- ECM allocation rule is based on a quadratic function of x. Classification regions are now bounded by conic sections (generally elliptical or hyperbolic). No simple explicit representation. 13

14 In proc discrim, use pool = no on the proc statement. For the AHF example, the resubstition Total error rate is and the cross-validated Total error rate is The cross-validated Total error rate is higher than for linear classification, suggesting that the estimation cost of the additional parameters in the two Σ s outweighs their value in improving the fit. 14

15 Nonparametric Classification One alternative to the multivariate normal assumption is classification based on nonparametric density estimates of f g (x). In SAS, use method = npar on the proc discrim statement. You have the choice of nearest neighbor and kernel estimation: for nearest neighbor estimation, use k =... to specify the number of neighbors; for kernel estimation, use kernel =... to name a kernel, and r =... to specify the radius (bandwidth). 15

which is the chance that, given the observation/subject actually comes from π 2, it is misclassified as from π 1. Analogously,

which is the chance that, given the observation/subject actually comes from π 2, it is misclassified as from π 1. Analogously, 47 Chapter 11 Discrimination and Classification Suppose we have a number of multivariate observations coming from two populations, just as in the standard two-sample problem Very often we wish to, by taking

More information

Lecture 8: Classification

Lecture 8: Classification 1/26 Lecture 8: Classification Måns Eriksson Department of Mathematics, Uppsala University eriksson@math.uu.se Multivariate Methods 19/5 2010 Classification: introductory examples Goal: Classify an observation

More information

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2012, Mr. Ruey S. Tsay THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2012, Mr Ruey S Tsay Lecture 9: Discrimination and Classification 1 Basic concept Discrimination is concerned with separating

More information

6-1. Canonical Correlation Analysis

6-1. Canonical Correlation Analysis 6-1. Canonical Correlation Analysis Canonical Correlatin analysis focuses on the correlation between a linear combination of the variable in one set and a linear combination of the variables in another

More information

DISCRIMINANT ANALYSIS. 1. Introduction

DISCRIMINANT ANALYSIS. 1. Introduction DISCRIMINANT ANALYSIS. Introduction Discrimination and classification are concerned with separating objects from different populations into different groups and with allocating new observations to one

More information

SF2935: MODERN METHODS OF STATISTICAL LECTURE 3 SUPERVISED CLASSIFICATION, LINEAR DISCRIMINANT ANALYSIS LEARNING. Tatjana Pavlenko.

SF2935: MODERN METHODS OF STATISTICAL LECTURE 3 SUPERVISED CLASSIFICATION, LINEAR DISCRIMINANT ANALYSIS LEARNING. Tatjana Pavlenko. SF2935: MODERN METHODS OF STATISTICAL LEARNING LECTURE 3 SUPERVISED CLASSIFICATION, LINEAR DISCRIMINANT ANALYSIS Tatjana Pavlenko 5 November 2015 SUPERVISED LEARNING (REP.) Starting point: we have an outcome

More information

Discriminant Analysis and Statistical Pattern Recognition

Discriminant Analysis and Statistical Pattern Recognition Discriminant Analysis and Statistical Pattern Recognition GEOFFREY J. McLACHLAN Department of Mathematics The University of Queensland St. Lucia, Queensland, Australia A Wiley-Interscience Publication

More information

12 Discriminant Analysis

12 Discriminant Analysis 12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into

More information

Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance. Probal Chaudhuri and Subhajit Dutta

Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance. Probal Chaudhuri and Subhajit Dutta Semiparametric Discriminant Analysis of Mixture Populations Using Mahalanobis Distance Probal Chaudhuri and Subhajit Dutta Indian Statistical Institute, Kolkata. Workshop on Classification and Regression

More information

Efficient Approach to Pattern Recognition Based on Minimization of Misclassification Probability

Efficient Approach to Pattern Recognition Based on Minimization of Misclassification Probability American Journal of Theoretical and Applied Statistics 05; 5(-): 7- Published online November 9, 05 (http://www.sciencepublishinggroup.com/j/ajtas) doi: 0.648/j.ajtas.s.060500. ISSN: 36-8999 (Print); ISSN:

More information

Applied Multivariate and Longitudinal Data Analysis

Applied Multivariate and Longitudinal Data Analysis Applied Multivariate and Longitudinal Data Analysis Discriminant analysis and classification Ana-Maria Staicu SAS Hall 5220; 919-515-0644; astaicu@ncsu.edu 1 Consider the examples: An online banking service

More information

Chapter 25 The DISCRIM Procedure. Chapter Table of Contents

Chapter 25 The DISCRIM Procedure. Chapter Table of Contents Chapter 25 Chapter Table of Contents OVERVIEW...1013 GETTING STARTED...1014 SYNTAX...1019 PROCDISCRIMStatement...1019 BYStatement...1027 CLASSStatement...1028 FREQStatement...1028 IDStatement...1028 PRIORSStatement...1028

More information

Discriminant Analysis (DA)

Discriminant Analysis (DA) Discriminant Analysis (DA) Involves two main goals: 1) Separation/discrimination: Descriptive Discriminant Analysis (DDA) 2) Classification/allocation: Predictive Discriminant Analysis (PDA) In DDA Classification

More information

The DISCRIM Procedure

The DISCRIM Procedure SAS/STAT 9.2 User s Guide (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.2 User s Guide. The correct bibliographic citation for the complete manual is as follows:

More information

MULTIVARIATE HOMEWORK #5

MULTIVARIATE HOMEWORK #5 MULTIVARIATE HOMEWORK #5 Fisher s dataset on differentiating species of Iris based on measurements on four morphological characters (i.e. sepal length, sepal width, petal length, and petal width) was subjected

More information

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation

Intro. ANN & Fuzzy Systems. Lecture 15. Pattern Classification (I): Statistical Formulation Lecture 15. Pattern Classification (I): Statistical Formulation Outline Statistical Pattern Recognition Maximum Posterior Probability (MAP) Classifier Maximum Likelihood (ML) Classifier K-Nearest Neighbor

More information

Is cross-validation valid for small-sample microarray classification?

Is cross-validation valid for small-sample microarray classification? Is cross-validation valid for small-sample microarray classification? Braga-Neto et al. Bioinformatics 2004 Topics in Bioinformatics Presented by Lei Xu October 26, 2004 1 Review 1) Statistical framework

More information

Classification: Linear Discriminant Analysis

Classification: Linear Discriminant Analysis Classification: Linear Discriminant Analysis Discriminant analysis uses sample information about individuals that are known to belong to one of several populations for the purposes of classification. Based

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

STA 414/2104, Spring 2014, Practice Problem Set #1

STA 414/2104, Spring 2014, Practice Problem Set #1 STA 44/4, Spring 4, Practice Problem Set # Note: these problems are not for credit, and not to be handed in Question : Consider a classification problem in which there are two real-valued inputs, and,

More information

Generative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham

Generative classifiers: The Gaussian classifier. Ata Kaban School of Computer Science University of Birmingham Generative classifiers: The Gaussian classifier Ata Kaban School of Computer Science University of Birmingham Outline We have already seen how Bayes rule can be turned into a classifier In all our examples

More information

5. Discriminant analysis

5. Discriminant analysis 5. Discriminant analysis We continue from Bayes s rule presented in Section 3 on p. 85 (5.1) where c i is a class, x isap-dimensional vector (data case) and we use class conditional probability (density

More information

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1

EEL 851: Biometrics. An Overview of Statistical Pattern Recognition EEL 851 1 EEL 851: Biometrics An Overview of Statistical Pattern Recognition EEL 851 1 Outline Introduction Pattern Feature Noise Example Problem Analysis Segmentation Feature Extraction Classification Design Cycle

More information

Discriminant analysis and supervised classification

Discriminant analysis and supervised classification Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical

More information

What does Bayes theorem give us? Lets revisit the ball in the box example.

What does Bayes theorem give us? Lets revisit the ball in the box example. ECE 6430 Pattern Recognition and Analysis Fall 2011 Lecture Notes - 2 What does Bayes theorem give us? Lets revisit the ball in the box example. Figure 1: Boxes with colored balls Last class we answered

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Multivariate statistical methods and data mining in particle physics

Multivariate statistical methods and data mining in particle physics Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general

More information

Classification. 1. Strategies for classification 2. Minimizing the probability for misclassification 3. Risk minimization

Classification. 1. Strategies for classification 2. Minimizing the probability for misclassification 3. Risk minimization Classification Volker Blobel University of Hamburg March 2005 Given objects (e.g. particle tracks), which have certain features (e.g. momentum p, specific energy loss de/ dx) and which belong to one of

More information

Algorithm-Independent Learning Issues

Algorithm-Independent Learning Issues Algorithm-Independent Learning Issues Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2007 c 2007, Selim Aksoy Introduction We have seen many learning

More information

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet.

The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. CS 189 Spring 013 Introduction to Machine Learning Final You have 3 hours for the exam. The exam is closed book, closed notes except your one-page (two sides) or two-page (one side) crib sheet. Please

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

Robustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples

Robustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples DOI 10.1186/s40064-016-1718-3 RESEARCH Open Access Robustness of the Quadratic Discriminant Function to correlated and uncorrelated normal training samples Atinuke Adebanji 1,2, Michael Asamoah Boaheng

More information

Part I. Linear Discriminant Analysis. Discriminant analysis. Discriminant analysis

Part I. Linear Discriminant Analysis. Discriminant analysis. Discriminant analysis Week 5 Based in part on slides from textbook, slides of Susan Holmes Part I Linear Discriminant Analysis October 29, 2012 1 / 1 2 / 1 Nearest centroid rule Suppose we break down our data matrix as by the

More information

GENOMIC SIGNAL PROCESSING. Lecture 2. Classification of disease subtype based on microarray data

GENOMIC SIGNAL PROCESSING. Lecture 2. Classification of disease subtype based on microarray data GENOMIC SIGNAL PROCESSING Lecture 2 Classification of disease subtype based on microarray data 1. Analysis of microarray data (see last 15 slides of Lecture 1) 2. Classification methods for microarray

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology

More information

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II)

Contents Lecture 4. Lecture 4 Linear Discriminant Analysis. Summary of Lecture 3 (II/II) Summary of Lecture 3 (I/II) Contents Lecture Lecture Linear Discriminant Analysis Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University Email: fredriklindsten@ituuse Summary of lecture

More information

Linear Discriminant Analysis Based in part on slides from textbook, slides of Susan Holmes. November 9, Statistics 202: Data Mining

Linear Discriminant Analysis Based in part on slides from textbook, slides of Susan Holmes. November 9, Statistics 202: Data Mining Linear Discriminant Analysis Based in part on slides from textbook, slides of Susan Holmes November 9, 2012 1 / 1 Nearest centroid rule Suppose we break down our data matrix as by the labels yielding (X

More information

Bayes Decision Theory

Bayes Decision Theory Bayes Decision Theory Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 / 16

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Fast and Robust Discriminant Analysis

Fast and Robust Discriminant Analysis Fast and Robust Discriminant Analysis Mia Hubert a,1, Katrien Van Driessen b a Department of Mathematics, Katholieke Universiteit Leuven, W. De Croylaan 54, B-3001 Leuven. b UFSIA-RUCA Faculty of Applied

More information

Small sample size generalization

Small sample size generalization 9th Scandinavian Conference on Image Analysis, June 6-9, 1995, Uppsala, Sweden, Preprint Small sample size generalization Robert P.W. Duin Pattern Recognition Group, Faculty of Applied Physics Delft University

More information

18.9 SUPPORT VECTOR MACHINES

18.9 SUPPORT VECTOR MACHINES 744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the

More information

Bayes Classifiers. CAP5610 Machine Learning Instructor: Guo-Jun QI

Bayes Classifiers. CAP5610 Machine Learning Instructor: Guo-Jun QI Bayes Classifiers CAP5610 Machine Learning Instructor: Guo-Jun QI Recap: Joint distributions Joint distribution over Input vector X = (X 1, X 2 ) X 1 =B or B (drinking beer or not) X 2 = H or H (headache

More information

Generative modeling of data. Instructor: Taylor Berg-Kirkpatrick Slides: Sanjoy Dasgupta

Generative modeling of data. Instructor: Taylor Berg-Kirkpatrick Slides: Sanjoy Dasgupta Generative modeling of data Instructor: Taylor BergKirkpatrick Slides: Sanjoy Dasgupta Parametric versus nonparametric classifiers Nearest neighbor classification: size of classifier size of data set Nonparametric:

More information

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries

University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout 2:. The Multivariate Gaussian & Decision Boundaries University of Cambridge Engineering Part IIB Module 3F3: Signal and Pattern Processing Handout :. The Multivariate Gaussian & Decision Boundaries..15.1.5 1 8 6 6 8 1 Mark Gales mjfg@eng.cam.ac.uk Lent

More information

10-701/ Machine Learning - Midterm Exam, Fall 2010

10-701/ Machine Learning - Midterm Exam, Fall 2010 10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam

More information

ECE662: Pattern Recognition and Decision Making Processes: HW TWO

ECE662: Pattern Recognition and Decision Making Processes: HW TWO ECE662: Pattern Recognition and Decision Making Processes: HW TWO Purdue University Department of Electrical and Computer Engineering West Lafayette, INDIANA, USA Abstract. In this report experiments are

More information

Lecture 4 Discriminant Analysis, k-nearest Neighbors

Lecture 4 Discriminant Analysis, k-nearest Neighbors Lecture 4 Discriminant Analysis, k-nearest Neighbors Fredrik Lindsten Division of Systems and Control Department of Information Technology Uppsala University. Email: fredrik.lindsten@it.uu.se fredrik.lindsten@it.uu.se

More information

Machine Learning for Signal Processing Bayes Classification

Machine Learning for Signal Processing Bayes Classification Machine Learning for Signal Processing Bayes Classification Class 16. 24 Oct 2017 Instructor: Bhiksha Raj - Abelino Jimenez 11755/18797 1 Recap: KNN A very effective and simple way of performing classification

More information

A Robust Approach to Regularized Discriminant Analysis

A Robust Approach to Regularized Discriminant Analysis A Robust Approach to Regularized Discriminant Analysis Moritz Gschwandtner Department of Statistics and Probability Theory Vienna University of Technology, Austria Österreichische Statistiktage, Graz,

More information

Bayesian Decision Theory

Bayesian Decision Theory Introduction to Pattern Recognition [ Part 4 ] Mahdi Vasighi Remarks It is quite common to assume that the data in each class are adequately described by a Gaussian distribution. Bayesian classifier is

More information

The Bayes classifier

The Bayes classifier The Bayes classifier Consider where is a random vector in is a random variable (depending on ) Let be a classifier with probability of error/risk given by The Bayes classifier (denoted ) is the optimal

More information

Introduction to Machine Learning Spring 2018 Note 18

Introduction to Machine Learning Spring 2018 Note 18 CS 189 Introduction to Machine Learning Spring 2018 Note 18 1 Gaussian Discriminant Analysis Recall the idea of generative models: we classify an arbitrary datapoint x with the class label that maximizes

More information

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE

DEPARTMENT OF COMPUTER SCIENCE Autumn Semester MACHINE LEARNING AND ADAPTIVE INTELLIGENCE Data Provided: None DEPARTMENT OF COMPUTER SCIENCE Autumn Semester 203 204 MACHINE LEARNING AND ADAPTIVE INTELLIGENCE 2 hours Answer THREE of the four questions. All questions carry equal weight. Figures

More information

Machine Learning for OR & FE

Machine Learning for OR & FE Machine Learning for OR & FE Introduction to Classification Algorithms Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com Some

More information

Discriminant Analysis and Statistical Pattern Recognition

Discriminant Analysis and Statistical Pattern Recognition Discriminant Analysis and Statistical Pattern Recognition GEOFFRY J. McLACHLAN The University of Queensland @EEC*ENCE A JOHN WILEY & SONS, INC., PUBLICATION This Page Intentionally Left Blank Discriminant

More information

Classification via kernel regression based on univariate product density estimators

Classification via kernel regression based on univariate product density estimators Classification via kernel regression based on univariate product density estimators Bezza Hafidi 1, Abdelkarim Merbouha 2, and Abdallah Mkhadri 1 1 Department of Mathematics, Cadi Ayyad University, BP

More information

Stat 216 Final Solutions

Stat 216 Final Solutions Stat 16 Final Solutions Name: 5/3/05 Problem 1. (5 pts) In a study of size and shape relationships for painted turtles, Jolicoeur and Mosimann measured carapace length, width, and height. Their data suggest

More information

CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS

CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS EECS 833, March 006 Geoff Bohling Assistant Scientist Kansas Geological Survey geoff@gs.u.edu 864-093 Overheads and resources available at http://people.u.edu/~gbohling/eecs833

More information

Informatics 2B: Learning and Data Lecture 10 Discriminant functions 2. Minimal misclassifications. Decision Boundaries

Informatics 2B: Learning and Data Lecture 10 Discriminant functions 2. Minimal misclassifications. Decision Boundaries Overview Gaussians estimated from training data Guido Sanguinetti Informatics B Learning and Data Lecture 1 9 March 1 Today s lecture Posterior probabilities, decision regions and minimising the probability

More information

Machine Learning Practice Page 2 of 2 10/28/13

Machine Learning Practice Page 2 of 2 10/28/13 Machine Learning 10-701 Practice Page 2 of 2 10/28/13 1. True or False Please give an explanation for your answer, this is worth 1 pt/question. (a) (2 points) No classifier can do better than a naive Bayes

More information

Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012

Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012 Supervised Learning: Linear Methods (1/2) Applied Multivariate Statistics Spring 2012 Overview Review: Conditional Probability LDA / QDA: Theory Fisher s Discriminant Analysis LDA: Example Quality control:

More information

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Topics Discriminant functions Logistic regression Perceptron Generative models Generative vs. discriminative

More information

Notes on Discriminant Functions and Optimal Classification

Notes on Discriminant Functions and Optimal Classification Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Bayesian Classification Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016

Probabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016 Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier

More information

How to evaluate credit scorecards - and why using the Gini coefficient has cost you money

How to evaluate credit scorecards - and why using the Gini coefficient has cost you money How to evaluate credit scorecards - and why using the Gini coefficient has cost you money David J. Hand Imperial College London Quantitative Financial Risk Management Centre August 2009 QFRMC - Imperial

More information

Mining Classification Knowledge

Mining Classification Knowledge Mining Classification Knowledge Remarks on NonSymbolic Methods JERZY STEFANOWSKI Institute of Computing Sciences, Poznań University of Technology SE lecture revision 2013 Outline 1. Bayesian classification

More information

Supervised Learning: Non-parametric Estimation

Supervised Learning: Non-parametric Estimation Supervised Learning: Non-parametric Estimation Edmondo Trentin March 18, 2018 Non-parametric Estimates No assumptions are made on the form of the pdfs 1. There are 3 major instances of non-parametric estimates:

More information

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas

Midterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian

More information

Pattern recognition. "To understand is to perceive patterns" Sir Isaiah Berlin, Russian philosopher

Pattern recognition. To understand is to perceive patterns Sir Isaiah Berlin, Russian philosopher Pattern recognition "To understand is to perceive patterns" Sir Isaiah Berlin, Russian philosopher The more relevant patterns at your disposal, the better your decisions will be. This is hopeful news to

More information

Advanced statistical methods for data analysis Lecture 1

Advanced statistical methods for data analysis Lecture 1 Advanced statistical methods for data analysis Lecture 1 RHUL Physics www.pp.rhul.ac.uk/~cowan Universität Mainz Klausurtagung des GK Eichtheorien exp. Tests... Bullay/Mosel 15 17 September, 2008 1 Outline

More information

Multivariate signal approaches for object detection using microwave sensing

Multivariate signal approaches for object detection using microwave sensing Multivariate signal approaches for object detection using microwave sensing Master s Thesis in Engineering Mathematics and Computational Science Claes Andersson Department of Signals & Systems Signal Processing

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 305 Part VII

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

StatPatternRecognition: A C++ Package for Multivariate Classification of HEP Data. Ilya Narsky, Caltech

StatPatternRecognition: A C++ Package for Multivariate Classification of HEP Data. Ilya Narsky, Caltech StatPatternRecognition: A C++ Package for Multivariate Classification of HEP Data Ilya Narsky, Caltech Motivation Introduction advanced classification tools in a convenient C++ package for HEP researchers

More information

Gaussian Models

Gaussian Models Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density

More information

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber

Machine Learning. Regression-Based Classification & Gaussian Discriminant Analysis. Manfred Huber Machine Learning Regression-Based Classification & Gaussian Discriminant Analysis Manfred Huber 2015 1 Logistic Regression Linear regression provides a nice representation and an efficient solution to

More information

An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance

An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance Dhaka Univ. J. Sci. 61(1): 81-85, 2013 (January) An Alternative Algorithm for Classification Based on Robust Mahalanobis Distance A. H. Sajib, A. Z. M. Shafiullah 1 and A. H. Sumon Department of Statistics,

More information

Probability Models for Bayesian Recognition

Probability Models for Bayesian Recognition Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian

More information

Machine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML)

Machine Learning. Kernels. Fall (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang. (Chap. 12 of CIML) Machine Learning Fall 2017 Kernels (Kernels, Kernelized Perceptron and SVM) Professor Liang Huang (Chap. 12 of CIML) Nonlinear Features x4: -1 x1: +1 x3: +1 x2: -1 Concatenated (combined) features XOR:

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics 8. Multivariate Analysis Prof. Dr. Klaus Reygers (lectures) Dr. Sebastian Neubert (tutorials) Heidelberg University WS 2017/18 Multi-Variate Classification Consider

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted

More information

Nearest Neighbor Pattern Classification

Nearest Neighbor Pattern Classification Nearest Neighbor Pattern Classification T. M. Cover and P. E. Hart May 15, 2018 1 The Intro The nearest neighbor algorithm/rule (NN) is the simplest nonparametric decisions procedure, that assigns to unclassified

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning

LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES. Supervised Learning LINEAR CLASSIFICATION, PERCEPTRON, LOGISTIC REGRESSION, SVC, NAÏVE BAYES Supervised Learning Linear vs non linear classifiers In K-NN we saw an example of a non-linear classifier: the decision boundary

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY

KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY KWAME NKRUMAH UNIVERSITY OF SCIENCE AND TECHNOLOGY Performance Evaluation of Classification Methods: the case of Equal Mean Discrimination. By Michael Asamoah-Boaheng (Bsc. Statistics) A THESIS SUBMITTED

More information

Cross-validation of N earest-neighbour Discriminant Analysis

Cross-validation of N earest-neighbour Discriminant Analysis r--;"'-'p'-;lo':y;q":;lpaof!i"joq PS56:t;;-'5""V""""::;':::'''-''i;''''':h'''''--:''':-c'-::.'".- - -.. - -.- -.-.-!.1 ---;-- '.' -" ',". ----_. -- Cross-validation of N earest-neighbour Discriminant

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel

> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation

More information

On optimum choice of k in nearest neighbor classification

On optimum choice of k in nearest neighbor classification Computational Statistics & Data Analysis 50 (2006) 3113 3123 www.elsevier.com/locate/csda On optimum choice of k in nearest neighbor classification Anil K. Ghosh Theoretical Statistics and Mathematics

More information

Course in Data Science

Course in Data Science Course in Data Science About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst. The course gives an

More information

SGN (4 cr) Chapter 5

SGN (4 cr) Chapter 5 SGN-41006 (4 cr) Chapter 5 Linear Discriminant Analysis Jussi Tohka & Jari Niemi Department of Signal Processing Tampere University of Technology January 21, 2014 J. Tohka & J. Niemi (TUT-SGN) SGN-41006

More information

ISSN X On misclassification probabilities of linear and quadratic classifiers

ISSN X On misclassification probabilities of linear and quadratic classifiers Afrika Statistika Vol. 111, 016, pages 943 953. DOI: http://dx.doi.org/10.1699/as/016.943.85 Afrika Statistika ISSN 316-090X On misclassification probabilities of linear and quadratic classifiers Olusola

More information

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception

LINEAR MODELS FOR CLASSIFICATION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception LINEAR MODELS FOR CLASSIFICATION Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification,

More information