Pattern recognition systems - Lab 10
Linear Classifiers and the Perceptron Algorithm

1. Objectives

This lab session presents the perceptron learning algorithm for the linear classifier. We will apply the gradient descent and stochastic gradient descent procedures to obtain the weight vector for a two-class classification problem.

2. Theoretical Background

The goal of classification is to group items that have similar feature values into a single class or group. A linear classifier achieves this goal via a discriminant function that is a linear combination of the features.

Definitions

Define a training set as the tuple (X, Y), where X ∈ M_{n×m}(R) and Y is a vector, Y ∈ M_{n×1}(D), where D is the set of class labels. X represents the concatenation of the feature vectors for each sample from the training set: each row is an m-dimensional vector representing one sample. Y is the vector of desired outputs for the classifier.

A classifier is a map from the feature space to the class labels, f: R^m → D. Thus a classifier partitions the feature space into |D| decision regions. The surface separating the classes is called the decision boundary. If we have only two-dimensional feature vectors, the decision boundaries are lines or curves.

In the following we will discuss binary classifiers. In this case the set of class labels contains exactly two elements. We will denote the labels for the classes as D = {-1, 1}.

Figure 1. Example of a linear classifier on a two-class classification problem. Each sample is characterized by two features.
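The training set (X, Y) defined above can be represented directly as arrays. The following is a minimal sketch in NumPy with invented toy values; the shapes are what matter: X is an n × m matrix with one sample per row, and Y holds one label from D = {-1, 1} per sample.

```python
import numpy as np

# Toy training set: n = 4 samples, m = 2 features per sample.
# Each row of X is one feature vector; Y holds the labels from D = {-1, +1}.
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [4.0, 5.0],
              [5.0, 4.0]])
Y = np.array([-1, -1, 1, 1])

print(X.shape)  # (4, 2): the n x m feature matrix
print(Y.shape)  # (4,):  one desired output per sample
```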
2.1. General form of a linear classifier

The simplest classifier is a linear classifier. A linear classifier outputs the class label based on a linear combination of the input features. Considering x ∈ M_{m×1}(R) as a feature vector, we can write the linear decision function as:

    g(x) = w^T x + w0 = Σ_{i=1}^{m} w_i x_i + w0

where
- w is the weight vector,
- w0 is the bias or threshold weight.

A schematic view of the linear classifier is given in the next figure: the inputs x1, ..., xm are multiplied by the weights w1, ..., wm and summed together with the bias w0 into the weighted sum f = w1 x1 + w2 x2 + ... + wm xm + w0; a threshold function applied to f then outputs the class decision in {c1, c2}.

For convenience, we will absorb the bias w0 by augmenting the feature vector x with an additional constant dimension (a bar over a variable denotes the augmented version of the vector):

    x̄ = [1  x]^T,    w̄ = [w0  w]^T,    so that    w̄^T x̄ = w^T x + w0

A two-category linear classifier (or binary classifier) implements the following decision rule:
- if g(x) > 0, decide that sample x belongs to class 1;
- if g(x) < 0, decide that sample x belongs to class -1.

Equivalently:
- if w^T x > -w0, decide that sample x belongs to class 1;
- if w^T x < -w0, decide that sample x belongs to class -1.

If g(x) = 0, x can ordinarily be assigned to either class.
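The decision rule above can be sketched in a few lines. This is a minimal illustration, not part of the lab's required implementation; the weight values are invented to place the boundary at x1 + x2 = 3, and ties (g(x) = 0) are assigned to class +1 here as one of the two permitted choices.

```python
import numpy as np

def augment(x):
    """Prepend the constant 1 so the bias w0 is absorbed into the weight vector."""
    return np.concatenate(([1.0], x))

def classify(w_bar, x):
    """Decision rule: the sign of g(x) = w_bar . x_bar; g(x) = 0 goes to class +1 here."""
    g = np.dot(w_bar, augment(x))
    return 1 if g >= 0 else -1

# Hypothetical augmented weights [w0, w1, w2] = [-3, 1, 1]: boundary x1 + x2 = 3.
w_bar = np.array([-3.0, 1.0, 1.0])
print(classify(w_bar, np.array([4.0, 2.0])))   # g = -3 + 4 + 2 = 3 > 0  -> class +1
print(classify(w_bar, np.array([0.5, 0.5])))   # g = -3 + 0.5 + 0.5 < 0 -> class -1
```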
Figure 2. Image for the 2D case depicting: linear decision regions (red and blue), decision boundary (dashed line), weight vector (w) and bias (w0 = d).

2.2. Learning algorithms for linear classifiers

We will present two main learning algorithms for linear classifiers. In order to perform learning we transform the task into an optimization problem. For this we define a loss function L. The loss function applies a penalty for every instance that is classified into the wrong class. The perceptron algorithm adopts the following form for the loss function:

    L(w̄) = (1/n) Σ_{i=1}^{n} max(0, -y_i w̄^T x̄_i) = (1/n) Σ_{i=1}^{n} L_i(w̄)

If an instance is classified correctly, no penalty is applied because the second term is negative. In the case of a misclassification the second (positive) term will be added to the function value.

The objective now is to find the weights that minimize the loss function. Gradient descent can be employed to minimize the loss function. It relies on the idea that a differentiable multivariate function decreases fastest in the direction opposite to its gradient. The update rule according to this observation is:

    w̄_{k+1} = w̄_k - η ∇L(w̄_k)

where w̄_k is the weight vector at step k, η is a parameter that controls the step size and is called the learning rate, and ∇L(w̄_k) is the gradient vector of the loss function at the point w̄_k. The gradient of the loss function is:

    ∇L(w̄) = (1/n) Σ_{i=1}^{n} ∇L_i(w̄),   where   ∇L_i(w̄) = 0 if y_i w̄^T x̄_i > 0, and ∇L_i(w̄) = -y_i x̄_i otherwise.

In the standard gradient descent approach we update the weights after visiting all the training examples. This is also called the batch-update learning algorithm. We can use stochastic gradient descent instead. This entails updating the weights after visiting each training example, resulting in the classical online perceptron learning algorithm from [1]. In this case the update rule becomes:

    w̄_{k+1} = w̄_k - η ∇L_i(w̄_k)
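The loss and its gradient translate directly into NumPy. The sketch below works on augmented feature vectors (rows of X_bar are [1, x1, x2]); the sample values and weights are invented so that exactly one of the two samples is misclassified, which makes the contribution of each formula easy to check by hand.

```python
import numpy as np

def perceptron_loss(w_bar, X_bar, Y):
    """L(w) = (1/n) * sum_i max(0, -y_i * w^T x_i): zero for correctly classified samples."""
    margins = Y * (X_bar @ w_bar)          # y_i * w^T x_i for every sample
    return np.mean(np.maximum(0.0, -margins))

def perceptron_gradient(w_bar, X_bar, Y):
    """grad L = (1/n) * sum over misclassified samples of (-y_i * x_i)."""
    margins = Y * (X_bar @ w_bar)
    mask = margins <= 0                    # misclassified or exactly on the boundary
    return -(Y[mask, None] * X_bar[mask]).sum(axis=0) / len(Y)

# Two augmented samples [1, x1, x2]; the second one is misclassified by w_bar.
X_bar = np.array([[1.0,  2.0, 0.0],
                  [1.0, -1.0, 0.0]])
Y = np.array([1, 1])
w_bar = np.array([0.0, 1.0, 0.0])

print(perceptron_loss(w_bar, X_bar, Y))      # only the misclassified sample contributes
print(perceptron_gradient(w_bar, X_bar, Y))  # -y_2 * x_2 / n
```

A single batch gradient descent step is then `w_bar - eta * perceptron_gradient(w_bar, X_bar, Y)`.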
Algorithm: Batch Perceptron

    init w̄, η, E_limit, max_iter
    for iter = 1 : max_iter
        E = 0, L = 0, ∇L = 0
        for i = 1 : n
            z_i = Σ_{j=0}^{m} w_j * X̄_ij
            if z_i * y_i <= 0
                ∇L = ∇L - y_i * x̄_i
                E = E + 1
                L = L - y_i * z_i
            endif
        endfor
        E = E / n
        L = L / n
        ∇L = ∇L / n
        if E < E_limit break
        w̄ = w̄ - η * ∇L
    endfor

Algorithm: Online Perceptron

    init w̄, η, E_limit, max_iter
    for iter = 1 : max_iter
        E = 0
        for i = 1 : n
            z_i = Σ_{j=0}^{m} w_j * X̄_ij
            if z_i * y_i <= 0
                w̄ = w̄ + η * y_i * x̄_i
                E = E + 1
            endif
        endfor
        E = E / n
        if E < E_limit break
    endfor

2.3. Two-class, two-feature linear classifier

In this laboratory session we will find a linear classifier that discriminates between two sets of points. The points in class 1 are colored red and the points in class 2 are colored blue. Each point is described by its color (which denotes the class label) and its two coordinates, x1 and x2. The augmented weight vector will have the form w̄ = [w0 w1 w2]. The augmented feature vector will be x̄ = [1 x1 x2].

Figure 3. The decision boundary obtained from the perceptron algorithm
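The online perceptron pseudocode above can be sketched as a self-contained Python function. The two synthetic point clouds stand in for the red and blue points of the lab (their locations and spread are invented for illustration); the batch variant differs only in accumulating ∇L over all samples and applying a single update per pass.

```python
import numpy as np

def online_perceptron(X, Y, eta=0.1, e_limit=0.0, max_iter=1000):
    """Minimal sketch of the online perceptron pseudocode.

    X is the n x m feature matrix, Y the labels in {-1, +1}.
    Returns the augmented weight vector w_bar = [w0, w1, ..., wm].
    """
    n, m = X.shape
    X_bar = np.hstack([np.ones((n, 1)), X])   # absorb the bias into the weights
    w = np.full(m + 1, 0.1)                   # small constant start, as suggested in the lab
    for _ in range(max_iter):
        errors = 0
        for i in range(n):
            z = w @ X_bar[i]
            if z * Y[i] <= 0:                 # misclassified: move w toward y_i * x_bar_i
                w = w + eta * Y[i] * X_bar[i]
                errors += 1
        if errors / n <= e_limit:             # stop once the error rate is low enough
            break
    return w

# Two well-separated synthetic point clouds standing in for the red/blue points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=[2.0, 2.0], scale=0.3, size=(50, 2)),
               rng.normal(loc=[-2.0, -2.0], scale=0.3, size=(50, 2))])
Y = np.concatenate([np.ones(50), -np.ones(50)])

w = online_perceptron(X, Y)
accuracy = np.mean(np.sign(X @ w[1:] + w[0]) == Y)
print(accuracy)
```

Because the clouds are linearly separable with a wide margin, the algorithm reaches zero training errors and stops well before max_iter.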
3. Practical work

1. Read the points from the file test0*.bmp and construct the training set (X, Y). Assign the class label +1 to blue points and -1 to red points.
2. Implement and apply the online perceptron algorithm to find the linear classifier that divides the points into two groups. Suggested parameters: η = 10^-4, w̄_0 = [0.1, 0.1, 0.1], E_limit = 10^-5, max_iter = 10^5.
3. Draw the final decision boundary based on the weight vector w̄.
4. Implement the batch perceptron algorithm and find suitable parameter values. Show the loss function at each step; it must decrease slowly.
5. Visualize the decision boundary at intermediate steps, while the learning algorithm is running.
6. Change the starting values for the weight vector w̄, the learning rate and the terminating conditions, and observe what happens in each case. What does an oscillating cost function signal?

4. References

[1] Rosenblatt, Frank (1957), The Perceptron - a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory.
[2] Richard O. Duda, Peter E. Hart, David G. Stork: Pattern Classification, 2nd ed.
[3] Xiaoli Z. Fern, Machine Learning Course, Oregon State University - http://web.engr.oregonstate.edu/~xfern/classes/cs534/notes/perceptron-4-11.pdf
[4] Gradient Descent - http://en.wikipedia.org/wiki/Gradient_descent
[5] Avrim Blum, Machine Learning Theory, Carnegie Mellon University - https://www.cs.cmu.edu/~avrim/ml10/lect0125.pdf