Machine Learning. Classification. Theory of Classification and Nonparametric Classifier. Representing data: Hypothesis (classifier). Eric Xing

Machine Learning 10-701/15-781, Fall 2008. Theory of Classification and Nonparametric Classifier. Eric Xing. Lecture 2, September 10, 2008. Reading: Chap. 1, 5 of CB and handouts.

Classification. Representing data: N samples x_n, each a vector of M features; K classes. Hypothesis (classifier): h(x).

Outline

- What is theoretically the best classifier: the Bayesian decision rule for minimum error
- Nonparametric classifiers (instance-based learning): nonparametric density estimation; the K-nearest-neighbor classifier; optimality of kNN; problems of kNN

Decision-making as dividing a high-dimensional space. [figure: distributions of samples from a normal and an abnormal machine]

Continuous Distributions

Uniform probability density function:
$f(x) = \frac{1}{b-a}$ for $a \le x \le b$; $f(x) = 0$ elsewhere.

Normal (Gaussian) probability density function:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
The distribution is symmetric, and is often illustrated as a bell-shaped curve. Two parameters, $\mu$ (mean) and $\sigma$ (standard deviation), determine the location and shape of the distribution. The highest point on the normal curve is at the mean, which is also the median and mode. The mean can be any numerical value: negative, zero, or positive.

Multivariate Gaussian:
$p(\vec{x};\, \vec{\mu}, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left(-\tfrac{1}{2}(\vec{x}-\vec{\mu})^T \Sigma^{-1} (\vec{x}-\vec{\mu})\right)$

Class-Conditional Probability
Classification-specific dist.: $p(X \mid Y=1) = \mathcal{N}(X;\, \vec{\mu}_1, \Sigma_1)$, $p(X \mid Y=2) = \mathcal{N}(X;\, \vec{\mu}_2, \Sigma_2)$
Class prior (i.e., "weight"): $P(Y)$
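As a numerical companion to the multivariate density above, a minimal sketch (NumPy assumed; `gaussian_pdf` is my name, not the lecture's):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Evaluate the multivariate Gaussian density N(x; mu, Sigma)."""
    n = len(mu)
    diff = x - mu
    # 1 / sqrt((2*pi)^n * |Sigma|): the normalizing constant
    norm = 1.0 / np.sqrt((2 * np.pi) ** n * np.linalg.det(sigma))
    # exp(-1/2 (x-mu)^T Sigma^{-1} (x-mu)), computed via a linear solve
    return norm * np.exp(-0.5 * diff @ np.linalg.solve(sigma, diff))

# Example: standard bivariate Gaussian evaluated at its mean
print(gaussian_pdf(np.zeros(2), np.zeros(2), np.eye(2)))  # ~0.1592 = 1/(2*pi)
```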

The Bayes Rule

What we have just done leads to the following general expression:

$P(Y \mid X) = \frac{p(X \mid Y)\, P(Y)}{p(X)}$

This is Bayes Rule.

The Bayes Decision Rule for Minimum Error
The a posteriori probability of a sample:
$q_i(X) = P(Y=i \mid X) = \frac{p(X \mid Y=i)\, \pi_i}{p(X)}$
Bayes Test: decide $Y=1$ if $q_1(X) > q_2(X)$, and $Y=2$ otherwise.
Likelihood Ratio: $\ell(X) = \frac{p(X \mid Y=1)}{p(X \mid Y=2)} \gtrless \frac{\pi_2}{\pi_1}$
Discriminant function: $h(X) = -\log \ell(X) \lessgtr \log \frac{\pi_1}{\pi_2}$
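A minimal sketch of this likelihood-ratio test for two hypothetical 1-D Gaussian classes (the densities N(0,1) and N(2,1) and the priors 0.7/0.3 are made-up examples, not from the lecture; SciPy assumed):

```python
from scipy.stats import norm

def bayes_decide(x, pi1=0.7, pi2=0.3):
    """Decide the class by comparing the likelihood ratio l(x) to pi2/pi1."""
    l = norm.pdf(x, loc=0, scale=1) / norm.pdf(x, loc=2, scale=1)
    return 1 if l > pi2 / pi1 else 2

print(bayes_decide(0.3), bayes_decide(1.8))  # -> 1 2
```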

Example of Decision Rules
When each class is a normal, we can write the decision boundary analytically in some cases (homework!!).

Bayes Error
We must calculate the probability of error: the probability that a sample is assigned to the wrong class. Given a datum X, what is the risk? The Bayes error (the expected risk):
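In the notation of the posteriors $q_i(X)$ above, a sketch of the standard expressions:

```latex
% Risk of deciding at a given datum X: the smaller of the two posteriors
r(X) = \min\left[\, q_1(X),\ q_2(X) \,\right]
% Bayes error = the expected risk over the data distribution
E = \int r(x)\, p(x)\, dx
```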

More on Bayes Error
Bayes error is the lower bound of the probability of classification error. The Bayes classifier is the theoretically best classifier, the one that minimizes the probability of classification error. Computing the Bayes error is in general a very complex problem. Why? It requires density estimation, and integrating the density function.

Learning Classifier
The decision rule: assign each sample to the class with the largest posterior.
Learning strategies: Generative Learning (parametric, nonparametric); Discriminative Learning (parametric, nonparametric); Instance-based Learning (store all past experience in memory): a special case of nonparametric classifier.

Supervised Learning
K-Nearest-Neighbor Classifier: where the h(·) is represented by all the data, and by an algorithm.

Recall: Vector Space Representation
Each document is a vector, one component for each term (= word):

          Doc 1   Doc 2   Doc 3   ...
Word 1      3       0       0     ...
Word 2      0       8       1     ...
Word 3     12       1      10     ...
...         0       1       3     ...
...         0       0       0     ...

Normalize to unit length. High-dimensional vector space: terms are axes (10,000+ dimensions, or even 100,000+); docs are vectors in this space.
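A minimal sketch of the unit-length normalization on the toy term-document matrix above (NumPy assumed):

```python
import numpy as np

# Toy term-document matrix: rows = words, columns = documents.
counts = np.array([[ 3., 0.,  0.],
                   [ 0., 8.,  1.],
                   [12., 1., 10.]])

# Normalize each document (column) to unit length, so similarity between
# docs depends on direction in term space, not on document length.
docs = counts / np.linalg.norm(counts, axis=0, keepdims=True)
print(np.linalg.norm(docs, axis=0))  # [1. 1. 1.]
```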

Classes in a Vector Space
[figure: documents from the Sports, Science, and Arts classes as regions in the vector space]

Test Document = ?
[figure: an unlabeled test document among the Sports, Science, and Arts regions]

K-Nearest Neighbor (kNN) classifier
[figure: the test document and its nearest neighbors among the Sports, Science, and Arts regions]

kNN Is Close to Optimal
Cover and Hart (1967): asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes rate [the error rate of a classifier knowing the model that generated the data]. In particular, the asymptotic error rate is 0 if the Bayes rate is 0.
Decision boundary: [figure]

Where does kNN come from?
How do we estimate $p(X)$? Nonparametric density estimation: the Parzen density estimate, e.g., kernel density estimation; more generally, $\hat{p}(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{V} K(x, x_n)$.

Where does kNN come from? (cont.)
Nonparametric density estimation: in the Parzen (kernel) density estimate we fix the volume $V$ and count the samples falling inside it; in the kNN density estimate we fix the count $k$ and grow the volume $V(x)$ until it captures $k$ samples, giving $\hat{p}(x) = \frac{k}{N\, V(x)}$.
Bayes classifier based on the kNN density estimator: the voting kNN classifier. Pick $K_1$ and $K_2$ implicitly by picking $K_1 + K_2 = K$, $V_1 = V_2$, $N_1$, $N_2$; the estimated posterior of class $i$ is then proportional to $K_i / K$, so the test point goes to the class with the most votes among its $K$ nearest neighbors.
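A 1-D numerical sketch of the two estimators (the Gaussian kernel choice, the function names, and the interval-volume bookkeeping are my assumptions):

```python
import numpy as np

def parzen_density(x, data, h):
    """Parzen window estimate at x with a Gaussian kernel of bandwidth h."""
    kernels = np.exp(-0.5 * ((x - data) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return kernels.mean()  # average of one kernel per sample

def knn_density(x, data, k):
    """kNN estimate: p(x) ~ k / (N * V(x)), V(x) reaching the k-th neighbor."""
    radius = np.sort(np.abs(data - x))[k - 1]  # distance to k-th nearest sample
    return k / (len(data) * 2 * radius)        # 1-D "volume" is 2 * radius
```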

Voting kNN
The procedure: compute the distance from the test point to every stored sample, take the k closest, and assign the majority class among them. [figure: Sports, Science, Arts]

Asymptotic Analysis
Conditional risk: $r_k(X, X_{NN})$, for a test sample $X$ and its nearest-neighbor sample $X_{NN}$. Denote the event "$X$'s class is $i$". Assuming $k = 1$: when an infinite number of samples is available, $X_{NN}$ will be so close to $X$ that the class posteriors at $X_{NN}$ approach those at $X$.
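A minimal sketch of the voting procedure (Euclidean distance and NumPy arrays assumed; `knn_classify` is my name):

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=5):
    """Voting kNN: majority class among the k nearest training samples.
    X_train and y_train are NumPy arrays."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every sample
    nearest = np.argsort(dists)[:k]              # indices of the k closest
    return Counter(y_train[nearest]).most_common(1)[0][0]
```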

Asymptotic Analysis, cont.
Recall the conditional Bayes risk: $r^*(X) = \min[q_1(X), q_2(X)]$. Thus the asymptotic conditional risk can be expanded; this is the Maclaurin series expansion. It can be shown that the asymptotic 1-NN risk is bounded by twice the Bayes risk (see the sketch below). This is remarkable, considering that the procedure does not use any information about the underlying distributions, and only the class of the single nearest neighbor determines the outcome of the decision.
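A sketch of the standard two-class argument (the Cover-Hart bound), in the notation above:

```latex
% Conditional 1-NN risk, given the test point X and its neighbor X_NN:
r_1(X, X_{NN}) = q_1(X)\, q_2(X_{NN}) + q_2(X)\, q_1(X_{NN})
% As N -> infinity, X_NN -> X, and with r^*(X) = \min[q_1(X), q_2(X)]:
r_1(X) = 2\, q_1(X)\, q_2(X) = 2\, r^*(X)\bigl(1 - r^*(X)\bigr)
% Hence the asymptotic 1-NN risk is at most twice the Bayes risk:
r^*(X) \;\le\; r_1(X) \;\le\; 2\, r^*(X)
```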

kNN is an instance of Instance-Based Learning
What makes an Instance-Based Learner? A distance metric; how many nearby neighbors to look at; a weighting function (optional); how to relate to the local points.

Euclidean Distance Metric:
$D(x, x') = \sqrt{\sum_i \sigma_i^2 (x_i - x'_i)^2}$, or equivalently $D(x, x') = \sqrt{(x - x')^T \Sigma (x - x')}$ with $\Sigma$ diagonal.

Other metrics: $L_1$ norm: $|x - x'|$; $L_\infty$ norm: $\max |x - x'|$ (element-wise); Mahalanobis: as above, but where $\Sigma$ is full and symmetric; correlation; angle; Hamming distance; Manhattan distance.
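A sketch of the metrics listed above (NumPy assumed; for the Mahalanobis case I follow the slide's form with a full, symmetric $\Sigma$ as the weight matrix):

```python
import numpy as np

def euclidean(x, y, scales=None):
    """Scaled Euclidean distance; `scales` holds the per-axis sigma_i^2."""
    scales = np.ones_like(x) if scales is None else scales
    return np.sqrt(np.sum(scales * (x - y) ** 2))

def mahalanobis(x, y, sigma):
    """sqrt((x-y)^T Sigma (x-y)) with Sigma full and symmetric."""
    d = x - y
    return np.sqrt(d @ sigma @ d)

def l1(x, y):    # Manhattan distance
    return np.sum(np.abs(x - y))

def linf(x, y):  # element-wise max norm
    return np.max(np.abs(x - y))
```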

1-Nearest Neighbor (kNN) classifier
[figure: Sports, Science, Arts]

2-Nearest Neighbor (kNN) classifier
[figure: Sports, Science, Arts]

3-Nearest Neighbor (kNN) classifier
[figure: Sports, Science, Arts]

5-Nearest Neighbor (kNN) classifier
[figure: Sports, Science, Arts]

Nearest-Neighbor Learning Algorithm
Learning is just storing the representations of the training examples in D.
Testing instance x: compute the similarity between x and all examples in D; assign x the category of the most similar example in D.
Does not explicitly compute a generalization or category prototypes.
Also called: case-based learning, memory-based learning, lazy learning.

Case Study: kNN for Web Classification
Dataset: 20 Newsgroups (20 classes). Download: http://people.csail.mit.edu/jrennie/20Newsgroups/. 61,188 words, 18,774 documents. Class labels and descriptions.
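A minimal sketch of the testing step, using cosine similarity on unit-length document vectors (the names are mine):

```python
import numpy as np

def nn_classify(x, docs, labels):
    """1-NN: return the label of the most similar stored document.
    Rows of `docs` are unit-length vectors, so a dot product with the
    (unit-length) test vector x is the cosine similarity."""
    sims = docs @ x
    return labels[int(np.argmax(sims))]
```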

Experimental Setup
Training/Test Sets: 50%-50% randomly split; 10 runs, report average results.
Evaluation Criteria: accuracy.

Results: Binary Classes
[figure: accuracy vs. k for alt.atheism vs. comp.graphics, rec.autos vs. rec.sport.baseball, and comp.windows.x vs. rec.motorcycles]
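A sketch of this protocol (10 random 50/50 splits, mean accuracy), reusing the `knn_classify` sketch from earlier; the helper name and the use of NumPy arrays are my assumptions:

```python
import numpy as np

def evaluate(X, y, k=5, runs=10, seed=0):
    """Average accuracy of voting kNN over `runs` random 50%-50% splits."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(runs):
        idx = rng.permutation(len(y))
        half = len(y) // 2
        train, test = idx[:half], idx[half:]
        preds = np.array([knn_classify(x, X[train], y[train], k)
                          for x in X[test]])
        accs.append(np.mean(preds == y[test]))
    return float(np.mean(accs))
```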

Results: Multiple Classes
[figure: accuracy vs. k, for 5 classes randomly selected out of 20 (repeated over 10 runs and averaged) and for all 20 classes]

Is kNN ideal? More later.

Effect of Parameters
Sample size: the more the better; needs an efficient search algorithm for the NN.
Dimensionality: curse of dimensionality.
Density: how smooth?
Metric: the relative scalings in the distance metric affect region shapes.
Weight: spurious or less relevant points need to be downweighted.
K.

Sample size and dimensionality: [figure, from page 316, Fukunaga]

Neighborhood size: [figure, from page 350, Fukunaga]

Summary
The Bayes classifier is the best classifier: it minimizes the probability of classification error.
Nonparametric vs. parametric classifiers: a nonparametric classifier does not rely on any assumption concerning the structure of the underlying density function.
A classifier becomes the Bayes classifier if the density estimates converge to the true densities when an infinite number of samples is used. The resulting error is the Bayes error, the smallest achievable error given the underlying distributions.