Multi-Layer Perceptrons
- Ethan Rodgers
- 5 years ago
1 Multi-Layer Perceptrons
2 Multi-Layer Perceptrons: With hidden layers. One hidden layer - any Boolean function or convex decision regions. Two hidden layers - arbitrary decision regions. (PR, ANN, & ML)
3 Decision boundaries
4 Decision Boundaries
5 Backpropagation learning rule. (Figure: a two-layer network with inputs, hidden activations y and net inputs net driven by weights w, output-layer weights W with net inputs NET, outputs O, and targets z.)
6 Cost function: E(W) = ½ Σ_j (z_j - O_j)², the squared error between the targets z_j and the outputs O_j, summed over the output units.
7 Change w.r.t. W (output-layer weights): by the chain rule, ∂E/∂W_jk = (∂E/∂O_j)(∂O_j/∂NET_j)(∂NET_j/∂W_jk) = -(z_j - O_j) f'(NET_j) y_k, so the update is ΔW_jk = η (z_j - O_j) f'(NET_j) y_k.
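The output-layer update can be sketched in code. A minimal, illustrative example - the logistic `sigmoid` transfer function and all names are assumptions for illustration, not the slides' own code:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def output_layer_gradient(W, y, z):
    """Delta rule for the output layer: dE/dW[j,k] = -(z_j - O_j) f'(NET_j) y_k.
    Returns the negated gradient, i.e. the direction of the weight update."""
    NET = W @ y                      # net input to each output unit
    O = sigmoid(NET)                 # output activations
    delta = (z - O) * O * (1.0 - O)  # error times f'(NET); f' = O(1-O) for the logistic
    return np.outer(delta, y)        # DeltaW proportional to delta_j * y_k

W = np.zeros((2, 3))                 # toy weights: 2 output units, 3 hidden units
y = np.array([1.0, 0.5, -0.5])      # hidden activations
z = np.array([1.0, 0.0])            # targets
grad = output_layer_gradient(W, y, z)
```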
8 Interpretation: ΔW_jk combines the output error (z_j - O_j) f'(NET_j) with the input activation y_k. A form of Hebb's rule: if there is an error, reward mutual excitation - large feedback (output) and large input.
9 Change w.r.t. w (hidden-layer weights): expand ∂E/∂w by the chain rule through every output unit: ∂E/∂w_kl = Σ_j (∂E/∂O_j)(∂O_j/∂NET_j)(∂NET_j/∂y_k)(∂y_k/∂net_k)(∂net_k/∂w_kl).
10 Change w.r.t. w (cont.): this evaluates to ∂E/∂w_kl = -[Σ_j (z_j - O_j) f'(NET_j) W_jk] f'(net_k) x_l, so the update is Δw_kl = η [Σ_j (z_j - O_j) f'(NET_j) W_jk] f'(net_k) x_l.
11 Interpretation cont.: A two-level Hebb's rule. Error × f'(NET) is the backpropagated error; summing over the output units weighted by W gives the weighted backpropagated error; multiplying by f'(net) gives the backpropagated error times the backpropagated and weighted gain. If there is an error, reward mutual excitation - large feedback (output) and large input.
12 Interpretation cont.: ΔV_pq ∝ (output pattern)(input pattern) - Hebb's learning: the error at the output end, the activation at the input end, scaled by the learning rate.
13 (Figure.)
14 Graphical Illustration of Backpropagation. (Figure.)
15 (Figure.)
16 Caveats on Backpropagation: Slow. Network paralysis: if the weights become large, the network operates at the limits of the squashing transfer functions, where the derivatives of the squashing function (the feedback) are small. Step size: too large may lead to saturation; too small causes slow convergence.
17 Caveats on Backpropagation: Local minima - try many different initial guesses, momentum, a varying step size (large initially, getting smaller as training goes on), simulated annealing. Temporal instability - learn B and forget about A.
18 Other than Backpropagation: In reality, gradient descent is slow and highly dependent on the initial guess. More sophisticated numerical methods exist: trust region methods, combinations of gradient descent and Newton's methods.
19 Caveats: Error backpropagation is the workhorse of all such learning algorithms. In reality, a hodge-podge of hacks, tweaks, and trial and error is needed. Experience and intuition (or dumb luck) are key.
20 Other Practical Issues: Which transfer function? It must be nonlinear (otherwise the net input Σ w·x stays linear); it should be continuous and smooth so f and f' are defined; it should saturate; it should be biologically/electronically plausible.
21 Sigmoid Function: f(net) = a·tanh(b·net) = a·(e^(b·net) - e^(-b·net)) / (e^(b·net) + e^(-b·net)), commonly with a = 1.716 and b = 2/3.
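As a sketch, the tanh-style sigmoid and its derivative; the slide's constants are garbled, so a = 1.716 and b = 2/3 (the common textbook choice) are filled in as assumptions:

```python
import numpy as np

def f(net, a=1.716, b=2.0 / 3.0):
    """Sigmoid transfer function f(net) = a * tanh(b * net)."""
    return a * np.tanh(b * net)

def f_prime(net, a=1.716, b=2.0 / 3.0):
    """Derivative a*b*(1 - tanh^2(b*net)); it vanishes for large |net|,
    which is exactly the saturation problem discussed on the caveat slides."""
    return a * b * (1.0 - np.tanh(b * net) ** 2)
```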
22 Trend: The sigmoid is replaced by ReLU (rectified linear unit) or softplus in many applications.
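The two replacement transfer functions mentioned here can be written directly (the names are mine):

```python
import numpy as np

def relu(net):
    """Rectified linear unit: max(0, net)."""
    return np.maximum(0.0, net)

def softplus(net):
    """Smooth approximation of ReLU: log(1 + e^net)."""
    return np.log1p(np.exp(net))
```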
23 Input Scaling: Inputs (weight, size, etc.) have different units and dynamic ranges and may be learned at different rates. Small input ranges make small contributions to the error and are often ignored. Normalize to the same range and the same variance (similar to a whitening transform).
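A minimal sketch of this normalization, standardizing each input feature to zero mean and unit variance; the example data are invented:

```python
import numpy as np

def standardize(X):
    """Rescale each input feature (column) to zero mean and unit variance,
    so no feature dominates or is ignored because of its units."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# invented example: rows are samples, columns are (height, weight)
X = np.array([[180.0, 80.0],
              [160.0, 60.0],
              [170.0, 70.0]])
Xn = standardize(X)
```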
24 Weight Initialization: Don't set the initial weights to zero - the network is not going to learn at all. Don't set the initial weights too high - that leads to paralysis and slow learning with sigmoid functions. Don't set the initial weights too small - output signal shrinkage is a problem.
25 Weight Initialization: Random initialization - both positive and negative random weights, to ensure uniform learning. Xavier initialization - a certain variance of the weight distribution should be maintained to avoid shrinkage and blowup problems.
26 Xavier Initialization: For a single neuron, consider the variance of a single term and assume zero mean. Output variance = n × var(w) × input variance, so to maintain the same variance choose var(w) = 1/n (or var(w) = 2/(n_in + n_out) when n_in and n_out are different).
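A sketch of Xavier initialization under these assumptions, using the 2/(n_in + n_out) variance for differing fan-in and fan-out; the function name and Gaussian sampling are illustrative choices:

```python
import numpy as np

def xavier_init(n_in, n_out, seed=None):
    """Zero-mean weights with var = 2 / (n_in + n_out), the compromise used
    when fan-in and fan-out differ, to keep activation variance roughly
    constant from layer to layer (avoiding shrinkage and blowup)."""
    rng = np.random.default_rng(seed)
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_out, n_in))

W = xavier_init(256, 128, seed=0)
```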
27 Output Scaling: Rule of thumb - avoid operating neurons in the saturation (tail) regions, where the tendency for weight change is small and learning is very slow. For the sigmoid function shown before, use the target range (-1, 1) instead of the full range (-a, a).
28 Output Scaling: Batch Normalization: Maintain the mean and variance of not just the input, but also each layer's output. Xavier initialization? Too many assumptions (independence, zero mean, etc.) fail to hold. Instead, force renormalization after each layer - zero mean and unit variance - done batch by batch, before the ReLU.
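The forced renormalization can be sketched as follows (training-time statistics only; the learnable scale `gamma` and shift `beta` are included as assumptions):

```python
import numpy as np

def batch_norm(X, gamma=1.0, beta=0.0, eps=1e-5):
    """Training-time batch normalization: renormalize each unit's activations
    over the mini-batch to zero mean and unit variance, then apply the
    learnable scale (gamma) and shift (beta)."""
    mu = X.mean(axis=0)        # per-unit mean over the batch
    var = X.var(axis=0)        # per-unit variance over the batch
    X_hat = (X - mu) / np.sqrt(var + eps)
    return gamma * X_hat + beta

batch = np.array([[1.0, 10.0],
                  [3.0, 30.0],
                  [5.0, 50.0]])
out = batch_norm(batch)
```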
29 Autoencoder Error Functions: Reproduce the input at the output automatically. No single feature is more or less important than the others, so use the RMS error.
30 Classifier Outputs (untrimmed indicator scores). Two cases: One-hot encoding - a dog, a cat, a vehicle, a person, etc. General encoding - e.g., President Obama predicting the final-4 outcome: Political? Sports? Comedy? A probability function.
31 Classifier Error Function: Two components: Forced normalization, e.g. softmax. Error: cross entropy (e.g., as in TensorFlow).
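The two components can be sketched directly, not tied to any particular library:

```python
import numpy as np

def softmax(scores):
    """Forced normalization: turn raw indicator scores into a probability
    distribution (shifted by the max for numerical stability)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

def cross_entropy(p, target):
    """Cross-entropy error between the predicted distribution p and the
    target distribution (one-hot or general)."""
    return -float(np.sum(target * np.log(p + 1e-12)))

scores = np.array([2.0, 1.0, 0.1])               # untrimmed indicator scores
p = softmax(scores)
loss = cross_entropy(p, np.array([1.0, 0.0, 0.0]))  # one-hot target
```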
32 Number of Hidden Layers: Too few - poor fitting. Too many - overfitting, poor generalization.
33 Numerical Stability - adaptive step size. (Figure: convergence of the criterion J toward the optimum for step sizes below, at, and above the optimal value.)
34 Numerical Stability - momentum: Δw_new = (1 - α)·Δw_bp + α·Δw_prev, with α = 0.9. Red: Δw as computed from the current backpropagation (without momentum). Blue: Δw as computed from the previous backpropagation (momentum).
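The momentum rule is one line of code. The (1 - α)/α blend of the current and previous backpropagation steps, with α = 0.9, is reconstructed from the garbled slide and should be treated as an assumption:

```python
def momentum_step(delta_bp, delta_prev, alpha=0.9):
    """Blend the step computed from the current backpropagation (delta_bp)
    with the step from the previous iteration (delta_prev), smoothing the
    trajectory across local roughness in the error surface."""
    return (1.0 - alpha) * delta_bp + alpha * delta_prev
```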
35 Numerical Stability - weight decay: To ensure no single large weight dominates the training process: w_new = w_old·(1 - ε).
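The decay rule as code; ε = 0.01 is an illustrative value, not the slide's:

```python
def decay(w_old, eps=0.01):
    """Shrink every weight slightly each step, w_new = w_old * (1 - eps),
    so no single large weight can dominate the training process."""
    return w_old * (1.0 - eps)
```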
36 Optimizers: Wrappers around error backpropagation. Stochastic GD, momentum, adaptive step size (advanced line search), and decay are often there. E.g., the Adam optimizer: an adaptive and time-varying learning rate for all parameters. Not for the faint of heart - ask around!
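A minimal sketch of one Adam update, not the library implementation; the constants are the commonly used defaults, assumed here for illustration:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: running estimates of the gradient's first and second
    moments give each parameter its own time-varying effective step size."""
    m = b1 * m + (1 - b1) * grad           # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)
```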
37 Essentially: Yes, multi-layer perceptrons can distinguish classes even when they are not linearly separable. Questions: How many layers? How many neurons per layer? Can the number of layers/neurons per layer be learned too, in addition to the weights?
38 Easier Said than Done: Blind learning with a large number of parameters is numerically impossible. Major recent advances: a reduced number of parameters, layered learning.
39 Emulation of Human Vision: Sparsity of connection.
40 Emulation of Human Vision: Shared weights.
41 Layered Learning: A hierarchical feature descriptor, learned automatically from the input data. Layer-by-layer learning with autoencoders. Partition: CNN for feature detection; fully-connected network for recognition.
42 (Figure.)
43 Adaptive Networks: The network size/layers are not fixed initially; layers/units are added when necessary, or when a large number of epochs pass without finding stable weights. Assumptions: two classes (1, 0) that may not be linearly separable (e.g., multiple concave regions).
44 Initially one neuron: compare the ideal output O with the real output y. Wrongly on: O = 0 but y = 1. Wrongly off: O = 1 but y = 0.
45 Refinement with more neurons: Train through a number of epochs. If there are no wrongly-on/off cases, the two classes are linearly separable - stop. If there are wrongly-on/off cases, the two classes are not linearly separable; remember the best weights (the weights that cause the least number of misclassifications) and introduce more units, instead of throwing everything away and restarting from scratch with a larger network.
46 Increase Network Complexity. (Figure: the original neuron augmented with correction units N_1n and N_1p.)
47 N_1n: corrects wrongly-on errors; fires negative feedback only. When O = 0 and y = 1, it produces a large negative output; when O = y it stays off (don't care).
48 N_1p: corrects wrongly-off errors; fires positive feedback only. When O = 1 and y = 0, it produces a large positive output; otherwise it stays off (don't care).
49 Further Refinement. (Figure: additional correction units added on top of N_1n and N_1p as needed.)
50 General Learning Rule: N_n fires a negative impulse to correct wrongly-on cases - turn it off if O = 1, no matter what y is; don't care if O = 0 and y = 0. N_p fires a positive impulse to correct wrongly-off cases - turn it off if O = 0, no matter what y is; don't care if O = 1 and y = 1.
51 Backpropagation learning rule. (Figure: a two-layer network with input-to-hidden weights V, hidden activations h, hidden-to-output weights W, and outputs O.)
52 Change w.r.t. W (hidden-to-output weights): ΔW_ij = η δ_i h_j, where δ_i = (z_i - O_i) f'(NET_i) is the output error times the gain.
53 Change w.r.t. V (input-to-hidden weights): ΔV_jk = η [Σ_i δ_i W_ij] f'(net_j) x_k - the weighted, backpropagated output error times the hidden gain and the input.
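Slides 51-53 together amount to one training step for a two-layer network. A self-contained sketch, assuming logistic sigmoid units and the V/h/W/O naming from the slides; the learning rate and toy data are illustrative:

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, z, V, W, eta=0.5):
    """One gradient step for a two-layer network, using the slides' symbols:
    V = input-to-hidden weights, h = hidden activations, W = hidden-to-output
    weights, O = outputs, z = targets."""
    h = sigmoid(V @ x)                         # forward pass, hidden layer
    O = sigmoid(W @ h)                         # forward pass, output layer
    delta_O = (z - O) * O * (1.0 - O)          # change w.r.t. W: error * f'(NET)
    delta_h = (W.T @ delta_O) * h * (1.0 - h)  # change w.r.t. V: backpropagated error
    W = W + eta * np.outer(delta_O, h)
    V = V + eta * np.outer(delta_h, x)
    return V, W

rng = np.random.default_rng(0)
x, z = np.array([1.0, -1.0]), np.array([1.0])  # toy sample and target
V = rng.normal(0.0, 0.5, (3, 2))
W = rng.normal(0.0, 0.5, (1, 3))
err0 = float(np.sum((z - sigmoid(W @ sigmoid(V @ x))) ** 2))
for _ in range(50):
    V, W = backprop_step(x, z, V, W)
err1 = float(np.sum((z - sigmoid(W @ sigmoid(V @ x))) ** 2))
```

After 50 steps on this single sample, the squared error drops below its initial value, illustrating the descent direction derived on the slides.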
54 Interpretation. (Figure: the output deltas feeding back through W to the hidden units.)
55 Interpretation cont. (Figure.)
More informationNeural Networks. Class 22: MLSP, Fall 2016 Instructor: Bhiksha Raj
Neural Networs Class 22: MLSP, Fall 2016 Instructor: Bhsha Raj IMPORTANT ADMINSTRIVIA Fnal wee. Project presentatons on 6th 18797/11755 2 Neural Networs are tang over! Neural networs have become one of
More informationModified Recursive Prediction Error Algorithm For Training Layered Neural Network
Modfed Recursve Predcton Error Alorth For Trann Layered Neural Netor Mohd Yusoff Mashor Centre for ELectronc Intellent Syste (CELIS) School of Electrcal and Electronc Enneern, Unversty Sans Malaysa, Pulau
More information( ) = : a torque vector composed of shoulder torque and elbow torque, corresponding to
Supplementary Materal for Hwan EJ, Donchn O, Smth MA, Shamehr R (3 A Gan-Fel Encon of Lmb Poston an Velocty n the Internal Moel of Arm Dynamcs. PLOS Boloy, :9-. Learnn of ynamcs usn bass elements he nternal
More informationThe Analysis and the Performance Simulation of the Capacity of Bit-interleaved Coded Modulation System
Sensors & Transdcers, Vol. 79, Isse 9, September 4, pp. 5-57 Sensors & Transdcers 4 by IFSA Pblshng, S. L. http://www.sensorsportal.com The Analyss and the Performance Smlaton of the Capacty of Bt-nterleaved
More information2E1252 Control Theory and Practice
2E1252 Control Theory and Practice Lectre 11: Actator satration and anti wind-p Learning aims After this lectre, yo shold nderstand how satration can case controller states to wind p know how to modify
More informationLecture 4: Universal Hash Functions/Streaming Cont d
CSE 5: Desgn and Analyss of Algorthms I Sprng 06 Lecture 4: Unversal Hash Functons/Streamng Cont d Lecturer: Shayan Oves Gharan Aprl 6th Scrbe: Jacob Schreber Dsclamer: These notes have not been subjected
More informationGeneralized Linear Methods
Generalzed Lnear Methods 1 Introducton In the Ensemble Methods the general dea s that usng a combnaton of several weak learner one could make a better learner. More formally, assume that we have a set
More informationNote 10. Modeling and Simulation of Dynamic Systems
Lecture Notes of ME 475: Introducton to Mechatroncs Note 0 Modelng and Smulaton of Dynamc Systems Department of Mechancal Engneerng, Unversty Of Saskatchewan, 57 Campus Drve, Saskatoon, SK S7N 5A9, Canada
More information