Pattern recognition systems Lab 10 Linear Classifiers and the Perceptron Algorithm

1. Objectives

This lab session presents the perceptron learning algorithm for the linear classifier. We will apply the gradient descent and the stochastic gradient descent procedures to obtain the weight vector for a two-class classification problem.

2. Theoretical Background

The goal of classification is to group items that have similar feature values into a single class or group. A linear classifier achieves this goal via a discriminant function that is a linear combination of the features.

Definitions

Define a training set as the tuple (X, Y), where X ∈ M_{n×m}(ℝ) and Y is a vector, Y ∈ M_{n×1}(D), where D is the set of class labels. X represents the concatenation of the feature vectors for each sample from the training set, where each row is an m-dimensional vector representing a sample. Y is the vector of the desired outputs for the classifier.

A classifier is a map from the feature space to the class labels, f: ℝ^m → D. Thus a classifier partitions the feature space into |D| decision regions. The surface separating the classes is called the decision boundary. If we have only two-dimensional feature vectors, the decision boundaries are lines or curves.

In the following we will discuss binary classifiers. In this case the set of class labels contains exactly two elements; we will denote the labels for the classes as D = {-1, 1}.

Figure 1. Example of a linear classifier on a two-class classification problem. Each sample is characterized by two features.

2.1. General form of a linear classifier

The simplest classifier is a linear classifier. A linear classifier outputs the class label based on a linear combination of the input features. Considering x ∈ M_{m×1}(ℝ) as a feature vector, we can write the linear decision function as:

    g(x) = w^T x + w_0 = Σ_{i=1}^{m} w_i x_i + w_0

where:
- w is the weight vector;
- w_0 is the bias or the threshold weight.

[Figure: schematic view of the linear classifier. The inputs x_1, x_2, ..., x_m are multiplied by the weights w_1, w_2, ..., w_m and summed together with the bias w_0 to give the weighted sum of the inputs f = w_1 x_1 + w_2 x_2 + ... + w_m x_m + w_0; a threshold function then maps f to the output class decision in {c_1, c_2}.]

For convenience, we will absorb the intercept w_0 by augmenting the feature vector x with an additional constant dimension (let the bar over a variable denote the augmented version of the vector):

    w^T x + w_0 = [w_0  w^T] [1; x] = w̄^T x̄

A two-category linear classifier (or binary classifier) implements the following decision rule:

- if g(x) > 0, decide that sample x belongs to class 1;
- if g(x) < 0, decide that sample x belongs to class -1;

or, equivalently:

- if w^T x > -w_0, decide that sample x belongs to class 1;
- if w^T x < -w_0, decide that sample x belongs to class -1.

If g(x) = 0, x can ordinarily be assigned to either class.
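
As an illustration (not part of the original lab text), the decision rule translates into a few lines of Python with NumPy; the weight values and the sample point below are made-up:

    import numpy as np

    def g(w_bar, x):
        """Linear decision function on augmented vectors: g(x) = w_bar . [1, x]."""
        x_bar = np.concatenate(([1.0], x))  # prepend the constant 1 for the bias
        return np.dot(w_bar, x_bar)

    # Hypothetical augmented weights [w0, w1, w2] and a 2D sample.
    w_bar = np.array([-0.5, 1.0, 2.0])
    x = np.array([0.3, 0.4])
    label = 1 if g(w_bar, x) > 0 else -1  # the sign of g(x) picks the class in {-1, +1}
    print(label)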

Figure 2. Image for the 2D case depicting: the linear decision regions (red and blue), the decision boundary (dashed line), the weight vector (w) and the bias (w_0 = d).

2.2. Learning algorithms for linear classifiers

We will present two main learning algorithms for linear classifiers. In order to perform learning we transform the task into an optimization problem. For this we define a loss function L. The loss function applies a penalty for every instance that is classified into the wrong class. The perceptron algorithm adopts the following form for the loss function:

    L(w̄) = (1/n) Σ_{i=1}^{n} max(0, -y_i w̄^T x̄_i) = (1/n) Σ_{i=1}^{n} L_i(w̄)

If an instance is classified correctly, no penalty is applied because the second term is negative. In the case of a misclassification, the second (positive) term is added to the function value. The objective now is to find the weights that minimize the loss function.

Gradient descent can be employed to find the global minimum of the loss function. It relies on the idea that a differentiable multivariate function decreases fastest in the direction opposite to its gradient. The update rule according to this observation is:

    w̄_{k+1} ← w̄_k - η ∇L(w̄_k)

where w̄_k is the weight vector at time k, η is a parameter that controls the step size and is called the learning rate, and ∇L(w̄_k) is the gradient vector of the loss function at the point w̄_k. The gradient of the loss function is:

    ∇L(w̄) = (1/n) Σ_{i=1}^{n} ∇L_i(w̄),  with  ∇L_i(w̄) = 0 if y_i w̄^T x̄_i > 0, and ∇L_i(w̄) = -y_i x̄_i otherwise.

In the standard gradient descent approach we update the weights after visiting all the training examples. This is also called the batch-update learning algorithm. We can use stochastic gradient descent instead. This entails updating the weights after visiting each training example, resulting in the classical online perceptron learning algorithm from [1]. In this case the update rule becomes:

    w̄_{k+1} ← w̄_k - η ∇L_i(w̄_k)
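
As a sketch (not part of the original lab text), both the loss and its gradient can be computed in vectorized form with NumPy. Here X_bar is assumed to be the n×(m+1) matrix of augmented feature vectors and y the vector of ±1 labels:

    import numpy as np

    def perceptron_loss_and_grad(w_bar, X_bar, y):
        """Return L(w_bar) = (1/n) sum max(0, -y_i w.x_i) and its gradient."""
        n = X_bar.shape[0]
        z = X_bar @ w_bar                  # z_i = w_bar . x_bar_i for every sample
        margins = -y * z                   # nonnegative exactly for misclassified samples
        loss = np.maximum(0.0, margins).mean()
        mis = margins >= 0                 # misclassified or on the boundary
        grad = -(y[mis, None] * X_bar[mis]).sum(axis=0) / n
        return loss, grad

A single batch gradient descent step is then simply w_bar = w_bar - eta * grad.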

Algorithm: Batch Perceptron

    init w̄, η, E_limit, max_iter
    for iter = 1 : max_iter
        E = 0, L = 0, ∇L = 0
        for i = 1 : n
            z_i = Σ_{j=0}^{m} w_j X_ij
            if z_i · y_i ≤ 0
                ∇L ← ∇L - y_i x̄_i
                E ← E + 1
                L ← L - y_i z_i
            endif
        endfor
        E ← E / n
        L ← L / n
        ∇L ← ∇L / n
        if E < E_limit
            break
        w̄ ← w̄ - η ∇L
    endfor

Algorithm: Online Perceptron

    init w̄, η, E_limit, max_iter
    for iter = 1 : max_iter
        E = 0
        for i = 1 : n
            z_i = Σ_{j=0}^{m} w_j X_ij
            if z_i · y_i ≤ 0
                w̄ ← w̄ + η y_i x̄_i
                E ← E + 1
            endif
        endfor
        E ← E / n
        if E < E_limit
            break
    endfor

2.3. Two-class, two-feature linear classifier

In this laboratory session we will find a linear classifier that discriminates between two sets of points. The points in class 1 are colored in red and the points in class 2 are colored in blue. Each point is described by its color (which denotes the class label) and its two coordinates, x1 and x2. The augmented weight vector will have the form w̄ = [w0 w1 w2]. The augmented feature vector will be x̄ = [1 x1 x2].

Figure 3. The decision boundary obtained from the perceptron algorithm
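
The pseudocode above maps directly onto NumPy. Below is a minimal sketch (not part of the original lab text) of the online perceptron, assuming X_bar is the n×3 matrix of augmented feature vectors and y the vector of ±1 labels; the default parameter values follow the suggestions from the practical work section:

    import numpy as np

    def online_perceptron(X_bar, y, eta=1e-4, E_limit=1e-5, max_iter=10**5,
                          w_init=(0.1, 0.1, 0.1)):
        """Online (stochastic gradient descent) perceptron training."""
        w_bar = np.array(w_init, dtype=float)
        n = X_bar.shape[0]
        for _ in range(max_iter):
            E = 0
            for i in range(n):
                z_i = np.dot(w_bar, X_bar[i])
                if z_i * y[i] <= 0:                 # sample i is misclassified
                    w_bar += eta * y[i] * X_bar[i]  # nudge the boundary toward it
                    E += 1
            if E / n < E_limit:                     # error rate low enough: stop
                break
        return w_bar

The decision boundary in Figure 3 is the line w0 + w1*x1 + w2*x2 = 0; for w2 ≠ 0 it can be drawn as x2 = -(w0 + w1*x1)/w2.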

3. Practical work

1. Read the points from the file test0*.bmp and construct the training set (X, Y). Assign the class label +1 to blue points and -1 to red points. (A hypothetical loading sketch is given after the references.)
2. Implement and apply the online perceptron algorithm to find the linear classifier that divides the points into two groups. Suggested parameters: η = 10^-4, initial w̄ = [0.1, 0.1, 0.1], E_limit = 10^-5, max_iter = 10^5.
3. Draw the final decision boundary based on the weight vector w̄.
4. Implement the batch perceptron algorithm and find suitable parameter values. Show the loss function at each step. It must decrease slowly.
5. Visualize the decision boundary at intermediate steps, while the learning algorithm is running.
6. Change the starting values for the weight vector w̄, the learning rate and the terminating conditions to observe what happens in each case. What does an oscillating cost function signal?

4. References

[1] Rosenblatt, Frank (1957), The Perceptron - A Perceiving and Recognizing Automaton. Report 85-460-1, Cornell Aeronautical Laboratory.
[2] Richard O. Duda, Peter E. Hart, David G. Stork: Pattern Classification, 2nd ed.
[3] Xiaoli Z. Fern, Machine Learning Course, Oregon State University - http://web.engr.oregonstate.edu/~xfern/classes/cs534/notes/perceptron-4-11.pdf
[4] Gradient Descent - http://en.wikipedia.org/wiki/gradient_descent
[5] Avrim Blum, Machine Learning Theory, Carnegie Mellon University - https://www.cs.cmu.edu/~avrim/ml10/lect0125.pdf
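
For step 1 of the practical work, one possible way to build the training set is sketched below. This is not part of the original lab: it assumes OpenCV (cv2) is available, that the file is named test01.bmp, and that the points are drawn as pure red and pure blue pixels; adjust the color masks to the actual images.

    import cv2
    import numpy as np

    img = cv2.imread("test01.bmp")  # hypothetical file name; OpenCV loads pixels as BGR

    # Pure blue pixels -> label +1, pure red pixels -> label -1 (assumed encoding).
    blue = (img[:, :, 0] == 255) & (img[:, :, 1] == 0) & (img[:, :, 2] == 0)
    red  = (img[:, :, 2] == 255) & (img[:, :, 1] == 0) & (img[:, :, 0] == 0)

    rb, cb = np.nonzero(blue)       # row/column indices of the blue points
    rr, cr = np.nonzero(red)        # row/column indices of the red points

    # Each sample is [x1, x2] = (column, row); prepend a constant 1 for the bias.
    pts = np.vstack([np.column_stack([cb, rb]),
                     np.column_stack([cr, rr])]).astype(float)
    X_bar = np.hstack([np.ones((pts.shape[0], 1)), pts])
    y = np.concatenate([np.ones(len(rb)), -np.ones(len(rr))])

From here, w_bar = online_perceptron(X_bar, y) trains the classifier using the sketch given after the pseudocode in section 2.2.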