Linear Models for Classification

Linear Models for Classification
Henrik I Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu

Outline
1 Introduction
2 Linear Discriminant Functions
3 LSQ for Classification
4 Fisher's Discriminant Method
5 Perceptrons
6 Summary

Introduction
- Last time: prediction of new function values
- Today: linear classification of data
- Basic pattern recognition
- Separation of data: buy/sell
- Segmentation of line data, ...

Simple Example - Bolts or Needles

Classification
- Given:
  - An input vector x
  - A set of classes c_i ∈ C, i = 1, ..., k
- Mapping m : X → C
- Separation of the space into decision regions
- Boundaries are termed decision boundaries/surfaces

Basis Formulation
- It is a 1-of-K coding problem
- Target vector: t = (0, ..., 1, ..., 0)
- Consideration of 3 different approaches:
  1 Optimization of a discriminant function
  2 Bayesian formulation: p(c_i | x)
  3 Learning & decision fusion
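
As a small aside (not from the slides), a minimal sketch of 1-of-K target coding in Python/NumPy; the class index and K below are chosen purely for illustration:

```python
import numpy as np

def one_of_k(label, K):
    """Encode a class index as a 1-of-K target vector t = (0, ..., 1, ..., 0)."""
    t = np.zeros(K)
    t[label] = 1.0
    return t

print(one_of_k(2, 5))   # [0. 0. 1. 0. 0.]
```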

Code for experimentation
There are data sets and sample code available:
- NETLAB: http://www.ncrg.aston.ac.uk/netlab/index.php
- NAVTOOLBOX: http://www.cas.kth.se/toolbox/
- SLAM Dataset: http://kaspar.informatik.uni-freiburg.de/~slamevaluation/datasets.php

Outline
1 Introduction
2 Linear Discriminant Functions
3 LSQ for Classification
4 Fisher's Discriminant Method
5 Perceptrons
6 Summary

Discriminant Functions
- Objective: assign the input vector x to a class c_i
- Simple formulation: y(x) = w^T x + w_0
- w is termed a weight vector
- w_0 is termed a bias
- Two-class example: assign to c_1 if y(x) >= 0, otherwise to c_2
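
A minimal sketch of this two-class decision rule in Python/NumPy; the weight vector and bias values are arbitrary illustrations, not taken from the lecture:

```python
import numpy as np

# Illustrative weight vector and bias (arbitrary values).
w = np.array([1.0, -2.0])   # weight vector
w0 = 0.5                    # bias

def classify(x):
    """Assign x to class c1 if y(x) = w^T x + w0 >= 0, otherwise to c2."""
    y = w @ x + w0
    return "c1" if y >= 0 else "c2"

print(classify(np.array([3.0, 1.0])))   # y = 1.5  -> c1
print(classify(np.array([0.0, 1.0])))   # y = -1.5 -> c2
```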

Basic Design
- Take two points on the decision surface, x_a and x_b:
  y(x_a) = y(x_b) = 0  =>  w^T (x_a - x_b) = 0
- So w is perpendicular to the decision surface
- For any x on the surface, the distance from the origin is w^T x / ||w|| = -w_0 / ||w||
- Define \tilde{w} = (w_0, w) and \tilde{x} = (1, x), so that y(x) = \tilde{w}^T \tilde{x}

Linear discriminant function
(figure: geometry of a linear discriminant in two dimensions; the boundary y = 0 separates region R_1, where y > 0, from region R_2, where y < 0, the vector w is normal to the boundary, and the boundary's offset from the origin is -w_0/||w||)

Multi-Class Discrimination
- Generate multiple decision functions: y_k(x) = w_k^T x + w_k0
- Decision strategy: j = argmax_{i = 1..k} y_i(x)
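
A brief sketch of the argmax decision strategy, assuming made-up weight vectors w_k and biases w_k0 for K = 3 classes:

```python
import numpy as np

# K = 3 illustrative discriminants y_k(x) = w_k^T x + w_k0 (values made up).
W = np.array([[ 1.0,  0.0],
              [ 0.0,  1.0],
              [-1.0, -1.0]])      # row k holds w_k
w0 = np.array([0.0, -0.5, 1.0])   # biases w_k0

def decide(x):
    """Return the index j = argmax_k y_k(x)."""
    scores = W @ x + w0
    return int(np.argmax(scores))

print(decide(np.array([2.0, 0.0])))   # class 0 wins (score 2.0)
```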

Multi-Class Decision Regions
(figure: decision regions R_i, R_j, R_k, with example points x_A and x_B and a point x on the line segment between them)

Example - Bolts or Needles

Minimum distance classification
- Suppose we have computed the mean value for each of the classes:
  m_needle = [0.86, 2.34]^T and m_bolt = [5.74, 5.85]^T
- We can then compute the distance to each mean: d_j(x) = ||x - m_j||
- argmin_i d_i(x) gives the best fit
- Decision functions can be derived

Bolts / Needle Decision Functions
- Needle: d_needle(x) = 0.86 x_1 + 2.34 x_2 - 3.10
- Bolt: d_bolt(x) = 5.74 x_1 + 5.85 x_2 - 33.59
- Decision boundary: d_i(x) - d_j(x) = 0, i.e.
  d_needle/bolt(x) = -4.88 x_1 - 3.51 x_2 + 30.49 = 0
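
These decision functions follow directly from the class means, since the minimum-distance rule is equivalent to maximizing the linear function d_j(x) = m_j^T x - (1/2) m_j^T m_j. A small NumPy sketch (the test point is made up for illustration):

```python
import numpy as np

# Class means from the slides.
m_needle = np.array([0.86, 2.34])
m_bolt   = np.array([5.74, 5.85])

def d(m, x):
    """Linear decision function d_j(x) = m_j^T x - 0.5 * m_j^T m_j
    (equivalent to choosing the nearest class mean)."""
    return m @ x - 0.5 * m @ m

x = np.array([2.0, 3.0])
print(d(m_needle, x), d(m_bolt, x))                       # larger value wins
print("needle" if d(m_needle, x) > d(m_bolt, x) else "bolt")
```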

Example decision surface

Outline
1 Introduction
2 Linear Discriminant Functions
3 LSQ for Classification
4 Fisher's Discriminant Method
5 Perceptrons
6 Summary

Least Squares for Classification
- Just as we could use LSQ for regression, we can compute an approximation to the classification vector C
- Consider again y_k(x) = w_k^T x + w_k0
- Rewrite as y(x) = \tilde{W}^T \tilde{x}
- Assume we have a target matrix T (stacked 1-of-K target vectors)

Least Squares for Classification
- The error is then:
  E_D(\tilde{W}) = (1/2) Tr{ (\tilde{X}\tilde{W} - T)^T (\tilde{X}\tilde{W} - T) }
- The solution is then:
  \tilde{W} = (\tilde{X}^T \tilde{X})^{-1} \tilde{X}^T T
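
A minimal sketch of this pseudo-inverse solution on synthetic two-class data; the data, seed, and class layout are illustrative assumptions, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-class data in 2-D (illustrative only).
X1 = rng.normal([0, 0], 0.5, size=(50, 2))
X2 = rng.normal([3, 3], 0.5, size=(50, 2))
X = np.vstack([X1, X2])

# Augmented inputs x~ = (1, x) and 1-of-K target matrix T.
X_tilde = np.hstack([np.ones((X.shape[0], 1)), X])
T = np.zeros((X.shape[0], 2))
T[:50, 0] = 1.0   # class 1 targets
T[50:, 1] = 1.0   # class 2 targets

# W~ = (X~^T X~)^{-1} X~^T T, computed via the pseudo-inverse.
W_tilde = np.linalg.pinv(X_tilde) @ T

# Classify by taking the argmax over y(x) = W~^T x~.
pred = np.argmax(X_tilde @ W_tilde, axis=1)
print("training accuracy:", np.mean(pred == np.repeat([0, 1], 50)))
```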

LSQ and Outliers
(figure: two scatter plots of the same two-class data, without and with additional outlying points; the least-squares decision boundary shifts noticeably once the outliers are added)

Outline
1 Introduction
2 Linear Discriminant Functions
3 LSQ for Classification
4 Fisher's Discriminant Method
5 Perceptrons
6 Summary

Fisher's linear discriminant
- Selection of a projection that maximizes the distance between classes
- Assume for a start the projection y = w^T x
- Compute the class means:
  m_1 = (1/N_1) sum_{i in C_1} x_i,   m_2 = (1/N_2) sum_{j in C_2} x_j
- Separation of the projected means: m'_2 - m'_1 = w^T (m_2 - m_1), where m'_k = w^T m_k

The suboptimal solution
(figure: two-class data projected onto the line joining the class means; the projected classes overlap considerably)

The Fisher criterion
- Consider the expression
  J(w) = (w^T S_B w) / (w^T S_W w)
  where S_B is the between-class covariance and S_W is the within-class covariance, i.e.
  S_B = (m_2 - m_1)(m_2 - m_1)^T
  S_W = sum_{i in C_1} (x_i - m_1)(x_i - m_1)^T + sum_{i in C_2} (x_i - m_2)(x_i - m_2)^T
- J(w) is optimized when (w^T S_B w) S_W w = (w^T S_W w) S_B w, i.e. when
  w ∝ S_W^{-1} (m_2 - m_1)
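
A short NumPy sketch of the two-class Fisher direction w ∝ S_W^{-1}(m_2 - m_1), using synthetic data with a shared covariance (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic classes with a shared covariance (illustrative only).
cov = [[2.0, 1.5], [1.5, 2.0]]
X1 = rng.multivariate_normal([0, 0], cov, size=100)
X2 = rng.multivariate_normal([3, 1], cov, size=100)

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter S_W, summed over both classes.
S_W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Fisher direction w proportional to S_W^{-1} (m2 - m1).
w = np.linalg.solve(S_W, m2 - m1)
w /= np.linalg.norm(w)
print("Fisher direction:", w)
```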

The Fisher result
(figure: the same data projected onto the Fisher direction; the two classes are now well separated)

Generalization to K > 2 classes
- Define a stacked weight matrix, y = W^T x
- The within-class covariance generalizes to
  S_W = sum_{k=1}^{K} S_k
- The between-class covariance is
  S_B = sum_{k=1}^{K} N_k (m_k - m)(m_k - m)^T
- It can be shown that J(W) is optimized by the eigenvectors of S = S_W^{-1} S_B
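
A sketch of the multi-class case, taking the leading eigenvectors of S_W^{-1} S_B; the helper name fisher_directions and the synthetic three-class data are assumptions for illustration:

```python
import numpy as np

def fisher_directions(X, labels, K):
    """Eigenvectors of S_W^{-1} S_B, sorted by decreasing eigenvalue."""
    m = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in range(K):
        Xk = X[labels == k]
        mk = Xk.mean(axis=0)
        S_W += (Xk - mk).T @ (Xk - mk)                 # within-class scatter
        S_B += len(Xk) * np.outer(mk - m, mk - m)      # between-class scatter
    vals, vecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(vals.real)[::-1]
    return vecs.real[:, order]

# Illustrative three-class data in 3-D.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([0, 0, 0], 1.0, (40, 3)),
               rng.normal([4, 0, 0], 1.0, (40, 3)),
               rng.normal([0, 4, 0], 1.0, (40, 3))])
labels = np.repeat([0, 1, 2], 40)
print(fisher_directions(X, labels, K=3)[:, :2])   # top K-1 = 2 directions
```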

Outline
1 Introduction
2 Linear Discriminant Functions
3 LSQ for Classification
4 Fisher's Discriminant Method
5 Perceptrons
6 Summary

Perceptron Algorithm
- Developed by Rosenblatt (1962)
- Formed an important basis for neural networks
- Uses a non-linear transformation φ(x)
- Constructs a decision function y(x) = f(w^T φ(x)), where
  f(a) = +1 if a >= 0, and -1 if a < 0

The perceptron criterion
- Normally we want w^T φ(x_n) t_n > 0 for all samples
- Given the target vector definition, the perceptron criterion is
  E_P(w) = - sum_{n in M} w^T φ_n t_n
  where M represents all the mis-classified samples
- We can minimize this by gradient descent, as seen in the last lecture:
  w^(τ+1) = w^(τ) - η ∇E_P(w) = w^(τ) + η φ_n t_n
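
A compact sketch of the resulting perceptron learning loop, using the identity basis φ(x) = (1, x) and a synthetic linearly separable data set; these choices are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)

# Linearly separable toy data with targets t in {-1, +1} (illustrative).
X1 = rng.normal([0, 0], 0.4, size=(30, 2))
X2 = rng.normal([2, 2], 0.4, size=(30, 2))
X = np.vstack([X1, X2])
t = np.hstack([-np.ones(30), np.ones(30)])

# Use the augmented input as the feature vector: phi(x) = (1, x).
Phi = np.hstack([np.ones((60, 1)), X])

w = np.zeros(3)
eta = 1.0
for epoch in range(100):
    errors = 0
    for phi_n, t_n in zip(Phi, t):
        if t_n * (w @ phi_n) <= 0:        # misclassified sample
            w = w + eta * phi_n * t_n     # w^(tau+1) = w^(tau) + eta * phi_n * t_n
            errors += 1
    if errors == 0:                        # converged: all samples correct
        break

print("weights:", w, "epochs:", epoch + 1)
```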

Perceptron learning example
(figure: four snapshots of a two-dimensional data set showing how the decision boundary moves as the perceptron updates on successive misclassified points)

Outline
1 Introduction
2 Linear Discriminant Functions
3 LSQ for Classification
4 Fisher's Discriminant Method
5 Perceptrons
6 Summary

Summary
- Basics of discrimination / classification
- Obviously not all problems are linear
- Optimization of the distance/overlap between classes
- Minimizing the probability of misclassification
- Basic formulation as an optimization problem
- How to optimize the between-cluster distance? Covariance weighted
- Basic recursive formulation; could we make it more robust?