INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

1 INF 5860 Machine learning for image classification. Lecture 3: Image classification and regression part II. Anne Solberg, January 31, 2018

2 Today's topics: Multiclass logistic regression and softmax. Regularization. Image classification using a linear classifier. Link to probabilistic classifiers and SVM.

3 Relevant additional video links: W7Lu35JvHM8lY-zLfQRF3EO8sYv Lecture 2 and 3. Remark: they do not cover regression.

4 From last week: Introduction to logistic regression. Let us show how a regression problem can be transformed into a binary (2-class) classification problem using a nonlinear loss function. Then generalize to multiple classes (next week).

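5 From last week: What if we fitted it to a function $f(x)$ that is close to either 0 or 1? The hypothesis $h_\theta(x)$ is now a non-linear function of $x$. Classification: $y = 0$ or $1$. Threshold $h_\theta(x)$: if $h_\theta(x) > 0.5$, set $y = 1$; otherwise set $y = 0$. Desirable to have $h_\theta(x) \le 1$. [Figure: class label (Cod = 1, Herring = 0) plotted against length.]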

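6 Logistic regression model. Want $0 \le h_\theta(x) \le 1$ (binary problem). Let $h_\theta(x) = g(\theta^T x)$ with $g(z) = \frac{1}{1 + e^{-z}}$. $g(z)$ is called the sigmoid function.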

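7 Decisions for logistic regression. Decide $y = 1$ if $h_\theta(x) > 0.5$, and $y = 0$ otherwise. With $h_\theta(x) = g(\theta^T x)$ and $g(z) = \frac{1}{1 + e^{-z}}$ we have $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$. $g(z) > 0.5$ if $z > 0$, i.e. $w^T x + b > 0$; $g(z) < 0.5$ if $z < 0$, i.e. $w^T x + b < 0$. Here the compact notation $\theta$ means the vector of parameters $[w, b]$.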
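The decision rule can be sketched in a few lines of Python. This is only an illustration: the function names and the parameter values `theta = [2.0, -6.0]` are made up for the example, and `theta` is assumed to be ordered as `[w_1, ..., w_n, b]`.

```python
import math

def sigmoid(z):
    """Sigmoid function g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Return 1 if h_theta(x) > 0.5, otherwise 0.

    theta = [w_1, ..., w_n, b] (assumed ordering), x holds the n features.
    """
    *w, b = theta
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) > 0.5 else 0

theta = [2.0, -6.0]            # hypothetical values: w = 2, b = -6
print(sigmoid(0.0))            # 0.5, the decision threshold
print(predict(theta, [4.0]))   # z = 2 > 0, so class 1
print(predict(theta, [2.0]))   # z = -2 < 0, so class 0
```

Thresholding $h_\theta(x)$ at 0.5 gives the same decision as checking the sign of $w^T x + b$, so the sigmoid is not needed when only the class label is wanted.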

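8 Loss function for logistic regression. We have two classes, 1 and 0. Let us use a probabilistic model. Let the parameters be $\theta = [w_1, \dots, w_{nk}, b]$ if we have $nk$ features. $P(y = 1 \mid x, \theta) = h_\theta(x)$ and $P(y = 0 \mid x, \theta) = 1 - h_\theta(x)$. This can be written more compactly as $p(y \mid x, \theta) = h_\theta(x)^{y} (1 - h_\theta(x))^{1 - y}$.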

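9 Loss function for logistic regression. The likelihood of the parameter values is
$$\prod_{i=1}^{m} p(y_i \mid X(i,:), \theta) = \prod_{i=1}^{m} h_\theta(X(i,:))^{y_i} \left(1 - h_\theta(X(i,:))\right)^{1 - y_i}$$
It is easier to maximize the log-likelihood
$$L(\theta) = \sum_{i=1}^{m} \left[ y_i \log h_\theta(X(i,:)) + (1 - y_i) \log\left(1 - h_\theta(X(i,:))\right) \right]$$
We will use gradient ascent to maximize this, taking a step in the positive gradient direction since we are maximizing, not minimizing.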

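10 Computing the gradient of the likelihood function. Differentiating the log-likelihood $L(\theta)$ with respect to $\theta_j$ gives
$$\frac{\partial L(\theta)}{\partial \theta_j} = \sum_{i=1}^{m} \left( y_i - h_\theta(X(i,:)) \right) X(i,j)$$
Here, we used the fact that $g'(z) = g(z)(1 - g(z))$.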
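The identity $g'(z) = g(z)(1 - g(z))$ is easy to check numerically. The sketch below compares it with a central finite difference; the helper names and the test points are chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Analytic derivative g'(z) = g(z) * (1 - g(z))."""
    g = sigmoid(z)
    return g * (1.0 - g)

def numeric_grad(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

for z in (-2.0, 0.0, 1.5):
    # The two columns should agree to many decimal places.
    print(z, sigmoid_grad(z), numeric_grad(sigmoid, z))
```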

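11 Gradient descent of $J = -L$ (the negative log-likelihood, averaged over the $m$ samples).
$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log h_\theta(X(i,:)) + (1 - y_i) \log\left(1 - h_\theta(X(i,:))\right) \right]$$
To find $\theta$: find the $\theta$ that minimizes $J(\theta)$ using gradient descent. Repeat:
$$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right) X(i,j)$$
This algorithm looks similar to linear regression, but now $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.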
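A minimal sketch of this update rule in plain Python follows. The data set is made up for illustration (one feature plus a constant 1 that carries the offset $b$), and the learning rate and iteration count are arbitrary choices, not values from the lecture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Hypothesis h_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

def cost(theta, X, y):
    """J(theta): negative log-likelihood averaged over the m samples."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        hi = h(theta, xi)
        total += yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
    return -total / m

def gradient_step(theta, X, y, alpha):
    """One update theta_j := theta_j - alpha/m * sum_i (h - y_i) * X[i][j]."""
    m = len(y)
    errors = [h(theta, xi) - yi for xi, yi in zip(X, y)]
    return [t - alpha / m * sum(e * xi[j] for e, xi in zip(errors, X))
            for j, t in enumerate(theta)]

# Hypothetical data: each row is [feature, 1]; the classes overlap slightly.
X = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0], [5.0, 1.0], [6.0, 1.0]]
y = [0, 0, 1, 0, 1, 1]

theta = [0.0, 0.0]
costs = [cost(theta, X, y)]
for _ in range(5000):
    theta = gradient_step(theta, X, y, alpha=0.1)
    costs.append(cost(theta, X, y))

print(costs[0], costs[-1])   # the cost starts at log(2) and decreases
print(theta)
```

With all parameters at zero every $h_\theta$ equals 0.5, so the initial cost is $\log 2$ regardless of the data.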

12 Overfitting and regularization. For any classifier, there is a risk of overfitting to the training data. Overfitting: high accuracy on training data, lower accuracy on validation data. This risk is higher the more parameters the classifier can use.

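13 Example: polynomial regression. If a linear model is not sufficient, we can extend it to allow higher-order terms or cross-terms between the variables by changing our hypothesis $h_\theta(x)$, for example $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$.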
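As an illustration, a polynomial hypothesis in one variable can be written as a linear model on expanded features. The function names below are hypothetical and cross-terms between several variables are left out.

```python
def polynomial_features(x, degree):
    """Return [1, x, x^2, ..., x^degree] for a scalar input x."""
    return [x ** p for p in range(degree + 1)]

def hypothesis(theta, x):
    """h_theta(x) = theta_0 + theta_1 * x + ... + theta_d * x^d."""
    features = polynomial_features(x, len(theta) - 1)
    return sum(t * f for t, f in zip(theta, features))

print(polynomial_features(2.0, 3))         # [1.0, 2.0, 4.0, 8.0]
print(hypothesis([1.0, 0.0, 2.0], 3.0))    # 1 + 0*3 + 2*9 = 19.0
```

The model stays linear in $\theta$, so the same fitting procedure applies; only the number of parameters grows with the degree.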

14 The danger of overfitting. A higher-order model can easily overfit the training data. For the higher-order terms: the higher the value of the coefficients, the more the curve can fluctuate. This is not valid for the first two coefficients. Restricting only the value of higher-order terms is difficult in general (e.g. for neural nets). But we can restrict the magnitude of the coefficients (except $\theta_0$).

15 Overfitting for classification. Overfitting must be avoided for classification also; this is partly why we start with simple linear models.

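16 Regularization - intuition. Suppose we add a penalty, with a large weight $\lambda$, to restrict $\theta_3$ and $\theta_4$:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right)^2 + \lambda \theta_3^2 + \lambda \theta_4^2$$
To minimize $J(\theta)$, $\theta_3$ and $\theta_4$ must be small.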

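17 Regularized cost function. Simplify the hypothesis by having small values for $\theta_1, \dots, \theta_n$:
$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$
$\lambda$ is the regularization parameter. This is L2-regularization; later we will see Dropout and max norm. Remark: we do not regularize the offset $b$ (also called $\theta_0$).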

18 What if $\lambda$ is very large? Will we get overfit or underfit?

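19 Gradient descent with regularization: linear regression. To find $\theta$: find the $\theta$ that minimizes $J(\theta)$ using gradient descent. Repeat:
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right) X(i,0)$$
$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right) X(i,j) + \frac{\lambda}{m} \theta_j \right], \quad j = 1, \dots, n$$
The second update can also be written as
$$\theta_j := \theta_j \left( 1 - \alpha \frac{\lambda}{m} \right) - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right) X(i,j)$$
Note: NO penalty on $\theta_0$.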
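The regularized update can be sketched as follows, assuming the cost $J(\theta) = \frac{1}{2m}\left[\sum_i (h_\theta(X(i,:)) - y_i)^2 + \lambda \sum_{j \ge 1} \theta_j^2\right]$ and a first column of ones in `X` that multiplies $\theta_0$. The data, learning rate and values of `lam` are made up for the example.

```python
def predict(theta, x):
    # x[0] is the constant 1 that multiplies the offset theta_0.
    return sum(t * v for t, v in zip(theta, x))

def regularized_cost(theta, X, y, lam):
    m = len(y)
    squared = sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))
    penalty = lam * sum(t ** 2 for t in theta[1:])   # theta_0 is not penalized
    return (squared + penalty) / (2 * m)

def gradient_step(theta, X, y, alpha, lam):
    m = len(y)
    errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
    new_theta = []
    for j, t in enumerate(theta):
        grad = sum(e * xi[j] for e, xi in zip(errors, X)) / m
        if j > 0:                                    # no penalty on theta_0
            grad += lam / m * t
        new_theta.append(t - alpha * grad)
    return new_theta

def fit(X, y, lam, alpha=0.05, steps=5000):
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        theta = gradient_step(theta, X, y, alpha, lam)
    return theta

# Hypothetical data generated from y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

print(fit(X, y, lam=0.0))    # approximately [1.0, 2.0]
print(fit(X, y, lam=10.0))   # approximately [3.0, 0.667]: the slope is shrunk
```

With the larger `lam` the slope shrinks toward zero while the unpenalized offset moves toward the mean of `y`, which is the behaviour the penalty is meant to produce.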

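20 Regularized logistic regression: gradient descent. Repeat:
$$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right) X(i,0)$$
$$\theta_j := \theta_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(X(i,:)) - y_i \right) X(i,j) + \frac{\lambda}{m} \theta_j \right], \quad j = 1, \dots, n$$
where now $h_\theta(X(i,:)) = \frac{1}{1 + e^{-\theta^T X(i,:)}}$.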

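21 Introducing classifying CIFAR images. CIFAR-10 images: 32x32x3 pixels. Stack one image into a vector $x$ of length 32x32x3 = 3072. Classification will be to find a mapping $f(W, x, b)$ from image space to a set of $C$ classes. For CIFAR-10:
$$x = \begin{bmatrix} \text{pixel } 1 \\ \vdots \\ \text{pixel } 3072 \end{bmatrix}, \quad W = \begin{bmatrix} \text{weight for pixel 1 for class 1} & \cdots & \text{weight for pixel 3072 for class 1} \\ \vdots & & \vdots \\ \text{weight for pixel 1 for class 10} & \cdots & \text{weight for pixel 3072 for class 10} \end{bmatrix}, \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_{10} \end{bmatrix}$$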

22 Small example: 2 classes. [Figure: a graylevel image with 4 pixels is stacked into a vector $x$ and multiplied by $W$ (size 2x4, one weight $w_{c,j}$ for pixel $j$ for class $c$); adding $b$ gives one score per class.]

23 If color image, append the r, g, b bands into one long vector. Note: no spatial information concerning pixel neighbors is used here. Convolutional nets use spatial information. All images are standardized to the same size! For CIFAR-10 it is 32x32. If a classifier is trained on CIFAR and we have a new image to classify, resize it to 32x32.

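24 W for multiclass image classification. $W$ is a $C \times (n+1)$ matrix ($C$ classes, $n$ pixels in the image plus 1 for $b$). We train one linear model per class, so each class has a different $W(c,:)$ vector; $W(c,:)$ is a vector of length $n+1$.
$$x = \begin{bmatrix} 1 \\ \text{pixel } 1 \\ \vdots \\ \text{pixel } 3072 \end{bmatrix}, \quad W = \begin{bmatrix} b_1 & \text{weight for pixel 1 for class 1} & \cdots & \text{weight for pixel 3072 for class 1} \\ \vdots & \vdots & & \vdots \\ b_C & \text{weight for pixel 1 for class } C & \cdots & \text{weight for pixel 3072 for class } C \end{bmatrix}$$
Let the score for class $c$ be $s_c = f(W, x) = W(c,:)\,x$; $b$ is included in $W$ and $x$.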
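The score computation with $b$ folded into $W$ can be sketched as below. The 4-pixel image and the weights are made-up numbers, and the offset is assumed to sit in the first column of $W$, matched by a constant 1 prepended to the pixel vector.

```python
def scores(W, pixels):
    """Return one score per class: s_c = W[c] . [1, pixel_1, ..., pixel_n].

    W has C rows of length n + 1; W[c][0] is the offset b_c (assumed layout).
    """
    x = [1.0] + list(pixels)
    return [sum(w * v for w, v in zip(row, x)) for row in W]

# Hypothetical example: 2 classes, 4 pixels.
W = [[1.1, 0.2, -0.5, 0.1, 2.0],
     [3.2, 1.5, 1.3, 2.1, 0.0]]
pixels = [56.0, 231.0, 24.0, 2.0]

s = scores(W, pixels)
predicted_class = max(range(len(s)), key=s.__getitem__)
print(s, predicted_class)   # class index 1 has the higher score
```

For CIFAR-10 the same function would take 10 rows of length 3073.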

25 From 2 to C classes: alternative 1. One vs. all classification: train a logistic classifier $h_{\theta,c}(x)$ for each class $c$ to predict the probability for $y = c$. Classify a new sample $x$ by picking the class $c$ that maximizes the probability: $\max_c h_{\theta,c}(x)$.
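One vs. all prediction can be sketched as follows, assuming the $C$ logistic classifiers have already been trained. The parameter vectors in `thetas` are invented for illustration and the last entry of `x` is the constant 1 for the offset.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def one_vs_all_predict(thetas, x):
    """Evaluate every per-class classifier and pick the most confident one."""
    probs = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in thetas]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs

# Hypothetical parameters for 3 classes, 2 features plus offset.
thetas = [[1.0, -1.0, 0.0],
          [-1.0, 1.0, 0.0],
          [0.0, 0.0, -1.0]]
x = [2.0, 0.5, 1.0]

label, probs = one_vs_all_predict(thetas, x)
print(label, probs)   # class 0 wins; the probabilities need not sum to 1
```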

26 From 2 to multiple classes: Softmax. The common generalization to multiple classes is the softmax classifier. We want to predict the class label $y \in \{1, \dots, C\}$ for sample $X(i,:)$. $y$ can take one of $C$ discrete values, so it follows a multinomial probability distribution. This is derived from an assumption that the probability/score of class $y = k$ is
$$h_\theta(x) = p(y = k \mid x, \theta) = \frac{e^{\theta_k^T x}}{\sum_{j=1}^{C} e^{\theta_j^T x}}$$

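27 Softmax prediction/classification. Assign each sample to the class that maximizes the score:
$$\hat{y} = \arg\max_k \; p(y = k \mid x, \theta) = \arg\max_k \frac{e^{\theta_k^T x}}{\sum_{j=1}^{C} e^{\theta_j^T x}}$$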
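A small sketch of softmax prediction from a list of class scores $\theta_k^T x$. Subtracting the largest score before exponentiating is a standard numerical precaution added here; it does not change the probabilities. The scores are made up.

```python
import math

def softmax(scores):
    """Turn class scores into probabilities that sum to 1."""
    shift = max(scores)                        # avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(scores):
    """Index of the class with the highest softmax probability."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)

scores = [1.0, 2.0, 3.0]       # hypothetical scores for 3 classes
print(softmax(scores))
print(predict(scores))         # 2, the class with the largest score
```

Since the exponential is monotone, the predicted class is simply the one with the largest raw score; the softmax is only needed when probabilities are wanted.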

28 Cross-entropy. From information theory, the cross-entropy between a true distribution $p$ and an estimated distribution $q$ is
$$H(p, q) = -\sum_x p(x) \log q(x)$$
Softmax minimizes the cross-entropy between the estimated class probabilities and the true distribution (the distribution where all the mass is in the correct class).
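When the true distribution puts all its mass on the correct class, the cross-entropy reduces to minus the log of the probability assigned to that class. A short sketch with made-up probabilities:

```python
import math

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) * log q(x); terms with p(x) = 0 contribute 0."""
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

p = [0.0, 1.0, 0.0]      # true distribution: class index 1 is correct
q = [0.1, 0.7, 0.2]      # hypothetical estimated class probabilities

print(cross_entropy(p, q))    # equals -log(0.7)
```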

29 Softmax. From a training data set with $m$ samples, we formulate the log-likelihood function that the model fits the data:
$$l(\theta) = \sum_{i=1}^{m} \log p(y_i \mid X(i,:), \theta)$$
We can now find the $\theta$ that maximizes the likelihood using e.g. gradient ascent of the log-likelihood function. Or we can minimize $-l(\theta)$ using gradient descent. More details on deriving softmax next week (Ole-Johan).

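30 Cross-entropy loss function for softmax. The loss function for softmax, including regularization. Let $x_i = X(i,:)^T$ be the $n$ pixel values for image $i$, and let $W_k = W(k,:)^T$ be the row for class $k$:
$$J(W) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{C} I\{y_i = k\} \log \frac{e^{W_k^T x_i}}{\sum_{l=1}^{C} e^{W_l^T x_i}} + \lambda \sum_{k=1}^{C} \sum_{j=1}^{n} W_{kj}^2$$
$I\{y = k\}$ is the indicator function that is 1 if $y = k$ and zero otherwise. See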

31 Softmax prediction example

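32 Gradients of the cross-entropy loss, including regularization. With $x_i = X(i,:)^T$ the $n$ pixel values for image $i$, $W_k = W(k,:)^T$ the row for class $k$, and the penalty $\lambda \sum_k \sum_j W_{kj}^2$:
$$\nabla_{W_k} J(W) = -\frac{1}{m} \sum_{i=1}^{m} x_i \left( I\{y_i = k\} - p(y_i = k \mid x_i, W) \right) + 2 \lambda W_k, \qquad p(y_i = k \mid x_i, W) = \frac{e^{W_k^T x_i}}{\sum_{l=1}^{C} e^{W_l^T x_i}}$$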
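The sketch below computes a softmax cross-entropy loss with an L2 penalty together with its gradient, and checks one gradient entry against a finite difference. It assumes the loss is averaged over the $m$ samples and that the penalty is $\lambda \sum_k \sum_j W_{kj}^2$ over every entry of $W$; the data and weights are made up, and class labels are numbered from 0.

```python
import math

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def loss_and_grad(W, X, y, lam):
    """W: C rows (one per class), X: m samples, y: labels in 0..C-1."""
    m, C, n = len(X), len(W), len(W[0])
    loss = lam * sum(w * w for row in W for w in row)
    grad = [[2.0 * lam * W[k][j] for j in range(n)] for k in range(C)]
    for xi, yi in zip(X, y):
        p = softmax([sum(w * v for w, v in zip(row, xi)) for row in W])
        loss -= math.log(p[yi]) / m
        for k in range(C):
            # -(1/m) * (I{y_i = k} - p_k) * x_i, accumulated over the samples
            coeff = (p[k] - (1.0 if k == yi else 0.0)) / m
            for j in range(n):
                grad[k][j] += coeff * xi[j]
    return loss, grad

def numeric_grad(W, X, y, lam, k, j, eps=1e-6):
    """Central finite difference of the loss with respect to W[k][j]."""
    W[k][j] += eps
    up, _ = loss_and_grad(W, X, y, lam)
    W[k][j] -= 2 * eps
    down, _ = loss_and_grad(W, X, y, lam)
    W[k][j] += eps                      # restore the original value
    return (up - down) / (2 * eps)

# Hypothetical problem: 3 classes, 2 features, 3 samples.
W = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.2]]
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3]]
y = [0, 2, 1]

loss, grad = loss_and_grad(W, X, y, lam=0.1)
print(loss)
print(grad[0][1], numeric_grad(W, X, y, 0.1, 0, 1))   # should agree closely
```

A finite-difference check like this is a cheap way to catch sign or indexing mistakes in a hand-derived gradient.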

33 For those who want calculus: computing the derivative of the softmax function, see all details at

34 Link to Gaussian classifiers. In INF 4300, we used a traditional Gaussian classifier. This type of model is called a generative model, where a specific distribution is assumed.

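35 FROM INF 4300: Discriminant functions for the Gaussian density. When finding the class with the highest probability, these functions are equivalent:
$$g_i(x) = P(\omega_i \mid x) = \frac{p(x \mid \omega_i) P(\omega_i)}{p(x)}, \qquad g_i(x) = p(x \mid \omega_i) P(\omega_i), \qquad g_i(x) = \ln p(x \mid \omega_i) + \ln P(\omega_i)$$
With a multivariate Gaussian we get:
$$g_i(x) = -\frac{1}{2} (x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\Sigma_i| + \ln P(\omega_i)$$
If we assume all classes have equal diagonal covariance matrix ($\Sigma_i = \sigma^2 I$), the discriminant function is a linear function of $x$:
$$g_i(x) = \frac{1}{\sigma^2} \mu_i^T x - \frac{1}{2\sigma^2} \mu_i^T \mu_i + \ln P(\omega_i)$$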
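A sketch of the linear discriminant for the equal-covariance case, assuming $\Sigma_i = \sigma^2 I$ for every class. The class means, variance and priors are hypothetical values that would normally be estimated from training data.

```python
import math

def linear_discriminant(x, mu, sigma2, prior):
    """g_i(x) = mu^T x / sigma^2 - mu^T mu / (2 sigma^2) + ln P(omega_i)."""
    mu_dot_x = sum(m * v for m, v in zip(mu, x))
    mu_dot_mu = sum(m * m for m in mu)
    return mu_dot_x / sigma2 - mu_dot_mu / (2.0 * sigma2) + math.log(prior)

def classify(x, means, sigma2, priors):
    """Pick the class with the largest discriminant value."""
    g = [linear_discriminant(x, mu, sigma2, p) for mu, p in zip(means, priors)]
    return max(range(len(g)), key=g.__getitem__)

means = [[0.0, 0.0], [4.0, 4.0]]    # hypothetical class means
priors = [0.5, 0.5]

print(classify([1.0, 1.0], means, 1.0, priors))   # 0: closer to the first mean
print(classify([3.0, 3.5], means, 1.0, priors))   # 1: closer to the second mean
```

With equal priors this rule assigns each sample to the nearest class mean.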

36 Gaussian classifier vs. logistic regression. The Gaussian classifier (with diagonal covariance) and the logistic regression/softmax classifier will result in different linear decision boundaries. If the Gaussian assumption is correct, we will expect that this classifier has the lowest error rate. The logistic regression might be better if the data is not entirely Gaussian. NOTE: SOFTMAX reduces to logistic regression if we have 2 classes.
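The two-class reduction follows from $\frac{e^{a}}{e^{a} + e^{b}} = \frac{1}{1 + e^{-(a - b)}}$: the softmax probability of the first class is the sigmoid of the score difference. A quick numerical check with made-up scores:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

a, b = 1.7, -0.4                  # hypothetical scores for the two classes
print(softmax([a, b])[0])         # probability of the first class
print(sigmoid(a - b))             # same value from the logistic model
```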

37 Support Vector Machine classifiers. Another popular classifier is the Support Vector Machine (SVM) formulation, which can also be formulated in terms of loss functions. The following foils are for completeness; only a basic understanding of the SVM as a maximum-margin classifier is expected in this course.

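38 Hyperplanes and margins (background SVM). Have a margin of $\frac{1}{\|w\|} + \frac{1}{\|w\|} = \frac{2}{\|w\|}$. Require that all pixels are correctly classified: $w^T x + w_0 \ge 1, \ \forall x \in \omega_1$ and $w^T x + w_0 \le -1, \ \forall x \in \omega_2$. Goal: find $w$ and $w_0$.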

39 Support Vector Machine loss. An SVM loss function can be formulated by having as large a margin as possible. This is generalized to multiple classes, so the SVM wants the correct class to have a score higher than the scores for the incorrect classes by some margin $\Delta$. If $s_j$ is the score for class $j$, the loss function for SVM for sample $i$ is
$$L_i = \sum_{j \ne y_i} \max(0, \; s_j - s_{y_i} + \Delta)$$
This is called the hinge loss.
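The multiclass hinge loss for one sample can be sketched as below. The scores are made up, and the margin is passed in as the parameter `delta`.

```python
def hinge_loss(scores, correct, delta=1.0):
    """Sum over incorrect classes j of max(0, s_j - s_correct + delta)."""
    return sum(max(0.0, s - scores[correct] + delta)
               for j, s in enumerate(scores) if j != correct)

# Hypothetical scores for 3 classes; class 0 is the correct one.
print(hinge_loss([13.0, -7.0, 11.0], correct=0, delta=10.0))   # 8.0
print(hinge_loss([3.0, 1.0, 0.5], correct=0))                  # 0.0
```

In the first call only the third class violates the margin (11 - 13 + 10 = 8); in the second the correct class beats the others by more than the margin, so the loss is zero.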

40 SVM and gradient descent. We can also solve the SVM using gradient descent; we will not cover this, but see

42 Next week: Feed forward nets and learning by backpropagation. Reading material: Deep Learning, Chapter 6.
