Other NN Models
- Reinforcement learning (RL)
- Probabilistic neural networks
- Support vector machine (SVM)
Reinforcement learning (RL)
Basic ideas:
- Supervised learning (delta rule, BP): samples (x, f(x)) are used to learn the function f(.); a precise error can be determined and is used to drive the learning.
- Unsupervised learning (competitive, SOM, BM): no target/desired output is provided to help learning; learning is self-organized (clustering).
- Reinforcement learning: in between the two; no target output is given for the input vectors in the training samples, but a judge/critic will evaluate the output:
  good: reward signal (+1)
  bad: penalty signal (-1)
RL exists in many places
- Originated from psychology (conditioned reflex).
- In many applications, it is much easier to determine good/bad, right/wrong, acceptable/unacceptable than to provide a precise correct answer/error.
- It is up to the learning process to improve the system's performance based on the critic's signal.
- In the machine learning community there are different theories and algorithms.
- Major difficulty: credit/blame distribution
  - chess playing: W/L (multi-step)
  - soccer playing: W/L (multi-player)
Principle of RL
- Let r = +1 denote reward (good output) and r = -1 denote penalty (bad output).
- If r = +1, the system is encouraged to continue what it is doing.
- If r = -1, the system is encouraged not to do what it is doing. It then needs to search for a better output, because r = -1 does not indicate what the good output should be. A common method is random search (see the sketch below).
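A minimal sketch of reward/penalty-driven random search in Python. Everything here (the evaluate() critic, the target value, the search range) is an illustrative assumption, not part of the slides; the point is only that the learner never sees the correct answer, just the +1/-1 signal.

    import random

    # Hypothetical critic: returns +1 (reward) or -1 (penalty) only.
    def evaluate(output, target):
        return 1 if output == target else -1

    target = 7                        # known only to the critic
    output = random.randint(0, 15)    # current behavior
    for step in range(100):
        r = evaluate(output, target)
        if r == 1:
            break                         # reward: keep doing what we are doing
        output = random.randint(0, 15)    # penalty: try something else at random
    print(output)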
ARP: the associative reward-and-penalty algorithm for NN RL (Barto and Anandan, 1985)
Architecture: the input x(k) feeds a layer of stochastic units z(k) (used for random search), which produce the output y(k); a critic evaluates y(k) and sends the reward/penalty signal back to the network.
Random search by stochastic units z_i:
  p(z_i = 1) = 1 / (1 + e^{-2 net_i / T}),   p(z_i = -1) = 1 / (1 + e^{2 net_i / T})
or let z obey a continuous probability distribution function, or let z = net + ε, where ε is a random noise obeying a certain distribution.
Key: z is not a deterministic function of x; this gives z a chance to be a good output.
Prepare the desired output (temporary):
  d(k) = y(k)   if r(k) = +1
  d(k) = -y(k)  if r(k) = -1
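A minimal sketch of one such stochastic unit in Python (function and variable names are assumptions for illustration; T is the temperature). With this distribution, E(z) = 2·p(z=1) - 1 = tanh(net/T), which the sample average below should confirm:

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_unit(net, T=1.0):
        # p(z = +1) = 1 / (1 + exp(-2*net/T)); z is NOT deterministic in net
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * net / T))
        return 1 if rng.random() < p_plus else -1

    samples = [stochastic_unit(net=0.5) for _ in range(10000)]
    print(np.mean(samples), np.tanh(0.5))   # the two values should be close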
Compute the errors at the z layer:
  e(k) = d(k) - E(z(k))
where E(z(k)) is the expected value of z(k) (needed because z is a random variable).
How to compute E(z(k)):
- take the average of z over a period of time, or
- compute it from the distribution, if possible: if the logistic sigmoid function g is used,
  E(z) = (+1)·g(net) + (-1)·(1 - g(net)) = tanh(net / T)
Training:
- Delta rule to learn the weights of the output nodes:
  Δw = ρ·e·y  if r = +1,   Δw = λρ·e·y  if r = -1  (with 0 < λ « 1)
- BP or other methods to modify the weights at lower layers.
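A sketch of one ARP update for a single stochastic output unit, under stated assumptions: the values of T, rho, and lam and the toy critic are made up for illustration, and the unit's inputs x stand in for the y feeding the output node.

    import numpy as np

    rng = np.random.default_rng(0)
    T, rho, lam = 1.0, 0.1, 0.01      # temperature, reward rate, penalty rate

    def arp_step(w, x, critic):
        net = w @ x
        p_plus = 1.0 / (1.0 + np.exp(-2.0 * net / T))
        z = 1 if rng.random() < p_plus else -1    # stochastic output
        r = critic(z)                             # +1 reward, -1 penalty
        d = z if r == 1 else -z                   # temporary desired output
        e = d - np.tanh(net / T)                  # error uses E(z), not z itself
        rate = rho if r == 1 else lam * rho       # much smaller step on penalty
        return w + rate * e * x

    # Toy usage: the critic rewards the unit iff it outputs +1.
    w = rng.normal(size=3)
    x = np.array([1.0, -0.5, 0.2])
    for _ in range(200):
        w = arp_step(w, x, critic=lambda z: 1 if z == 1 else -1)
    print(w @ x)   # net should have drifted positive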
Probabilistic Neural Networks
1. Purpose: classify a given input pattern x into one of the predefined classes by the Bayesian decision rule.
Suppose there are k predefined classes s_1, ..., s_k:
- P(s_i): prior probability of class s_i
- P(x|s_i): conditional probability of x, given s_i
- P(x): probability of x
- P(s_i|x): posterior probability of s_i, given x
Example: S = {s_1, ..., s_k} is the set of all patients, s_i is the set of all patients having disease i, and x is a description (the manifestations) of a patient.
- P(x|s_i): prob. that a patient with disease i will have description x
- P(s_i|x): prob. that a patient with description x has disease i
By Bayes' theorem:
  P(s_i|x) = P(x|s_i)·P(s_i) / P(x)
Because P(x) is constant across classes,
  P(s_i|x) = max_j P(s_j|x)   iff   P(x|s_i)·P(s_i) = max_j P(x|s_j)·P(s_j)
In PNN, the P(x|s_i) are learned from exemplars.
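The decision rule in a few lines of Python; the priors and likelihoods below are made-up numbers for one fixed x, purely to show that P(x) can be dropped from the comparison:

    import numpy as np

    priors = np.array([0.5, 0.3, 0.2])           # P(s_i), assumed values
    likelihoods = np.array([0.01, 0.05, 0.02])   # P(x|s_i) for one fixed x
    scores = likelihoods * priors                # proportional to P(s_i|x)
    print("decide class", np.argmax(scores))     # index 1 wins here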
2. Estimate the probabilities
- Training exemplars: x_j^(i), the j-th exemplar belonging to s_i.
- Priors can be obtained either from experts' estimates or calculated from the exemplars:
  P(s_i) = |s_i| / Σ_{j=1}^{k} |s_j|
- Conditionals are estimated by the Parzen estimator:
  P(x|s_i) = 1 / ((2π)^{m/2} σ^m n_i) · Σ_{j=1}^{n_i} exp(-‖x - x_j^(i)‖² / (2σ²))
  where m: dimension of the pattern, n_i: # of exemplars in s_i, x: input pattern.
- Closely related to the Gaussian radial basis function:
  f(x) = (1 / (√(2π)·σ)) · exp(-(x - μ)² / (2σ²))
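A direct Python transcription of the Parzen estimate above (the value of sigma and the toy exemplars are assumptions):

    import numpy as np

    def parzen_estimate(x, exemplars, sigma=0.5):
        m = x.shape[0]                   # pattern dimension
        n_i = exemplars.shape[0]         # number of exemplars in class s_i
        sq_dists = np.sum((exemplars - x) ** 2, axis=1)
        norm = (2 * np.pi) ** (m / 2) * sigma ** m * n_i
        return np.sum(np.exp(-sq_dists / (2 * sigma ** 2))) / norm

    exemplars = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0]])
    print(parzen_estimate(np.array([0.1, 0.0]), exemplars))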
3. PNN architecture: feedforward with 4 layers (input layer, exemplar layer, class layer, decision layer)
- Exemplar layer: RBF nodes, one per exemplar, centered on x_j^(i):
  y_j^(i) = exp(-‖x - x_j^(i)‖² / (2σ²))
  is determined by the distance between x_j^(i) and x; y_j^(i) is large if x_j^(i) is close to x.
- Class layer: node z_i connects to all exemplars belonging to class s_i; z_i approximates the Parzen estimate of P(x|s_i) and is large if x is close to many x_j^(i).
- Decision layer: picks the winner based on z_i·P(s_i).
- If necessary, training adjusts the weights of the upper layers.
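A minimal sketch of the whole forward pass, one comment per layer. Names, sigma, and the two toy classes are assumptions; the structure follows the four layers described above.

    import numpy as np

    def pnn_classify(x, exemplars_by_class, priors, sigma=0.5):
        m = x.shape[0]
        scores = []
        for exemplars, prior in zip(exemplars_by_class, priors):
            # exemplar layer: one RBF node per exemplar
            y = np.exp(-np.sum((exemplars - x) ** 2, axis=1) / (2 * sigma ** 2))
            # class layer: z_i ~ Parzen estimate of P(x|s_i)
            z = y.sum() / ((2 * np.pi) ** (m / 2) * sigma ** m * len(exemplars))
            # decision layer input: z_i * P(s_i)
            scores.append(z * prior)
        return int(np.argmax(scores))      # decision layer: pick the winner

    classA = np.array([[0.0, 0.0], [0.1, 0.2]])
    classB = np.array([[1.0, 1.0], [0.9, 1.1]])
    print(pnn_classify(np.array([0.2, 0.1]), [classA, classB], priors=[0.5, 0.5]))

Note that "learning" here is just storing the exemplars as RBF centers, which is why PNN training is fast.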
4. Comments:
- Classification by Bayes rule.
- Fast classification and fast learning.
- Guaranteed to approach the Bayes-optimal decision surface, provided that the class probability density functions to be represented are smooth and continuous.
- Trades nodes for time: one node is needed per training exemplar, so it is not good with large training samples.