Lecture 23: Artificial neural networks


Noel Austin
6 years ago

1 Lecture 23: Artificial neural networks
Broad field that has developed over the past 20 to 30 years.
Confluence of statistical mechanics, applied math, biology and computers.
Original motivation: mathematical modeling of neurological networks.
Practical applications: pattern recognition (e.g. applied to speech, handwriting, underwriting); used by USPS to read handwritten zip codes.
Can be very fast, particularly when implemented in special purpose hardware.

2 Biological neurons
Schematic structure of a neuron (Cichocki and Unbehauen, reproduced from Principles of Neurocomputing for Science & Engineering by Ham and Kostanic).

3 Image of primate neurons
From brain-maps.org. Human brain contains ~ neurons

4 Biological neurons
Input signals come from the axons of other neurons, which connect to dendrites (input terminals) at the synapses.
If a sufficient excitatory signal is received, the neuron fires and sends an output signal along the axons.
A synapse can be excitatory or inhibitory.
[Figure labels: neuron, dendrites, axons]

5 Threshold effect
The firing of the neuron occurs when a threshold excitation is reached.

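6 Mathematical model
Nonlinear model of an artificial neuron: the input signals $\xi_1, \xi_2, \xi_3, \ldots, \xi_N$ are multiplied by the synaptic weights $w_1, w_2, w_3, \ldots, w_N$ and summed ($\Sigma$) to give $h$; the activation or threshold function $g$ then produces the output signal $O$.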

7 Mathematical model
Input and output signals are normalized, typically over the range $[-1, +1]$ or $[0, +1]$.
The activation function can be
linear: $g(h) = h$
step-like: $g(h) = \mathrm{sgn}(h)$
sigmoid: $g(h) = \tanh(\beta h)$ or $g(h) = 1/(1 + e^{-2\beta h})$
Neuron output: $y = g\left(\sum_k w_k x_k\right)$
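The activation functions listed above can be written down directly. The following is a minimal sketch in plain Python; the function names, the default $\beta = 1$, and the convention $\mathrm{sgn}(0) = +1$ are assumptions made for illustration and do not come from the lecture.

```python
import math

def g_linear(h):
    """Linear activation: output equals the summed input."""
    return h

def g_step(h):
    """Step-like activation sgn(h); sgn(0) = +1 is an assumed convention."""
    return 1.0 if h >= 0 else -1.0

def g_tanh(h, beta=1.0):
    """Sigmoid with range (-1, +1)."""
    return math.tanh(beta * h)

def g_logistic(h, beta=1.0):
    """Sigmoid with range (0, +1)."""
    return 1.0 / (1.0 + math.exp(-2.0 * beta * h))

def neuron_output(w, x, g):
    """Single neuron: weighted sum of the inputs passed through g."""
    h = sum(wk * xk for wk, xk in zip(w, x))
    return g(h)
```

The two sigmoids differ only by a rescaling of the output range: $\tanh(\beta h) = 2/(1 + e^{-2\beta h}) - 1$, which is why the same $\beta$ appears in both forms.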

8 Network architecture
Here, we confine our attention to feed-forward networks (a.k.a. "perceptrons"): there are no feedback loops, so the transfer of information is unidirectional.
Simplest example: a 1-layer perceptron with N inputs ($\xi_k$ with $k = 1, \ldots, N$) connected to M outputs ($O_i$ with $i = 1, \ldots, M$) via MN synaptic weights $w_{ik}$:
$$O_i = g\left(\sum_{k=1}^{N} w_{ik}\,\xi_k\right)$$
[Figure: inputs $\xi_1, \xi_2, \xi_3$ connected to outputs $O_1, O_2$]

9 Example: the logical AND function
Represent by a 1 x 3 perceptron:
$\xi_1 = -1$ or $+1$, with $w_1 = 1$
$\xi_2 = -1$ or $+1$, with $w_2 = 1$
$\xi_3 = +1$ always, with $w_3 = -1$
$$O_1 = \mathrm{sgn}(\xi_1 + \xi_2 - 1)$$
(an example of setting thresholds with hardwired inputs)
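As a quick check of this example, the sketch below evaluates the perceptron on all four input combinations, assuming the weights $(w_1, w_2, w_3) = (1, 1, -1)$ and a third input hardwired to $+1$. The function names are illustrative only.

```python
def sgn(h):
    # The weighted sum is never exactly zero for these inputs,
    # so the choice of sgn(0) does not matter here.
    return 1 if h >= 0 else -1

def and_perceptron(xi1, xi2):
    weights = (1, 1, -1)
    inputs = (xi1, xi2, +1)  # third input is hardwired to +1 (sets the threshold)
    h = sum(w * x for w, x in zip(weights, inputs))
    return sgn(h)

# Truth table over the four possible (xi1, xi2) pairs, with -1 = false, +1 = true
truth_table = {(a, b): and_perceptron(a, b) for a in (-1, 1) for b in (-1, 1)}
```

Only the pair $(+1, +1)$ gives a positive weighted sum ($h = +1$); the other three give $h = -1$ or $h = -3$, so the output is $+1$ only when both inputs are true.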

10 Training the network
How can we teach the network to yield desired responses?
Need a set of desired inputs and outputs: $\xi^\mu$ and $\zeta^\mu$, for $\mu = 1, \ldots, p$.
Need a measure of learning: the cost function, $E(w)$, that we seek to minimize, where $w$ is the matrix of synaptic weights, $w_{ik}$.
Finding the best set of weights is then an MN-dimensional minimization problem.

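11 The cost function, E(w)
Two common choices:
1) Mean square error:
$$E_{\rm MSE}(w) = \frac{1}{2}\sum_{\mu,i}\left(\zeta_i^\mu - O_i^\mu\right)^2 = \frac{1}{2}\sum_{\mu,i}\left[\zeta_i^\mu - g\left(\sum_k w_{ik}\,\xi_k^\mu\right)\right]^2$$
2) Relative entropy (for the specific case of the tanh activation function):
$$E_{\rm RE}(w) = \frac{1}{2}\sum_{\mu,i}\left[(1+\zeta_i^\mu)\ln\frac{1+\zeta_i^\mu}{1+O_i^\mu} + (1-\zeta_i^\mu)\ln\frac{1-\zeta_i^\mu}{1-O_i^\mu}\right]$$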
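Both cost functions are straightforward to evaluate for a given set of targets and network outputs. The sketch below assumes targets and outputs are stored as one list per training pattern, that outputs lie strictly inside $(-1, +1)$ (as they do for a tanh activation), and that $0 \ln 0$ is taken as 0; the function names are hypothetical.

```python
import math

def mse_cost(targets, outputs):
    """E_MSE = 1/2 * sum over patterns and output nodes of (zeta - O)^2."""
    return 0.5 * sum((z - o) ** 2
                     for zs, outs in zip(targets, outputs)
                     for z, o in zip(zs, outs))

def relative_entropy_cost(targets, outputs):
    """E_RE for targets in [-1, +1] and outputs strictly inside (-1, +1)."""
    total = 0.0
    for zs, outs in zip(targets, outputs):
        for z, o in zip(zs, outs):
            for sign in (+1, -1):
                p = 0.5 * (1 + sign * z)   # "target probability"
                q = 0.5 * (1 + sign * o)   # "output probability"
                if p > 0:                  # convention: 0 * ln(0) = 0
                    total += p * math.log(p / q)
    return total
```

For a single pattern with targets $(+1, -1)$ and outputs $(0.5, -0.5)$, the mean square error is $0.25$ and the relative entropy is $2\ln(4/3)$. Both costs vanish when the outputs equal the targets.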

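12 Minimizing E(w)
Simple training method: start with an initial guess of the weights and change them in accord with
$$\Delta w_{ik} = -\eta\,\frac{\partial E}{\partial w_{ik}}$$
where $\eta$ is the learning rate (~ 0 to 1). (Move along the direction of steepest descent.)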

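13 Minimizing E(w)
The derivatives are easy to compute:
$$\frac{\partial E_{\rm MSE}}{\partial w_{ik}} = -\sum_\mu \left(\zeta_i^\mu - O_i^\mu\right) g'\!\left(\sum_k w_{ik}\,\xi_k^\mu\right)\xi_k^\mu$$
which is conveniently written as
$$\frac{\partial E_{\rm MSE}}{\partial w_{ik}} = -\sum_\mu \delta_i^\mu\,\xi_k^\mu \quad\text{with}\quad \delta_i^\mu = \left(\zeta_i^\mu - O_i^\mu\right) g'\!\left(\sum_k w_{ik}\,\xi_k^\mu\right)$$
and
$g'(h) = \beta\,[1 - g^2(h)]$ for $g = \tanh(\beta h)$
$g'(h) = 2\beta\,g(h)\,[1 - g(h)]$ for $g = 1/(1 + e^{-2\beta h})$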
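One way to convince oneself of the derivative formula is to compare it with a finite-difference estimate. The sketch below does this for a single output node with a tanh activation; the four training patterns (the AND function with a hardwired third input), the trial weights, and $\beta = 1$ are assumptions chosen purely for illustration.

```python
import math

BETA = 1.0  # assumed steepness of the tanh activation

def output(w, xi):
    return math.tanh(BETA * sum(wk * xk for wk, xk in zip(w, xi)))

def mse(w, patterns):
    return 0.5 * sum((zeta - output(w, xi)) ** 2 for xi, zeta in patterns)

def mse_gradient(w, patterns):
    """Analytic gradient: dE/dw_k = -sum_mu delta^mu * xi_k^mu."""
    grad = [0.0] * len(w)
    for xi, zeta in patterns:
        o = output(w, xi)
        delta = (zeta - o) * BETA * (1.0 - o * o)  # g'(h) = beta * (1 - g^2)
        for k, xk in enumerate(xi):
            grad[k] -= delta * xk
    return grad

def numerical_gradient(w, patterns, eps=1e-6):
    """Central finite differences, used only as a cross-check."""
    grad = []
    for k in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[k] += eps
        w_minus[k] -= eps
        grad.append((mse(w_plus, patterns) - mse(w_minus, patterns)) / (2 * eps))
    return grad

patterns = [((-1, -1, 1), -1), ((-1, 1, 1), -1), ((1, -1, 1), -1), ((1, 1, 1), 1)]
w = [0.2, -0.1, 0.3]
max_diff = max(abs(a - n) for a, n in
               zip(mse_gradient(w, patterns), numerical_gradient(w, patterns)))
```

If the analytic formula is coded correctly, `max_diff` should be far smaller than the gradient components themselves.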

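14 Minimizing E(w)
For the RE cost function, we find that
$$\frac{\partial E_{\rm RE}}{\partial w_{ik}} = -\sum_\mu \delta_i^\mu\,\xi_k^\mu$$
where, for the tanh activation function, $\delta_i^\mu = \beta\left(\zeta_i^\mu - O_i^\mu\right)$.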

15 Minimizing E(w)
So the procedure is to start with an initial set of starting weights and to change them iteratively according to
$$\Delta w_{ik} = \eta \sum_\mu \delta_i^\mu\,\xi_k^\mu$$
until the cost function changes by less than some preset amount.
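The iterative procedure can be sketched as a short loop. The example below trains a single tanh output node on the AND function using the mean square error cost; the learning rate, $\beta$, the zero starting weights, the stopping tolerance, and the iteration cap are all assumptions made for this illustration.

```python
import math

BETA = 1.0   # assumed activation steepness
ETA = 0.1    # assumed learning rate

# AND function with inputs/targets in {-1, +1}; third input hardwired to +1
PATTERNS = [((-1, -1, 1), -1), ((-1, 1, 1), -1), ((1, -1, 1), -1), ((1, 1, 1), 1)]

def outputs_and_cost(w):
    outs = [math.tanh(BETA * sum(wk * xk for wk, xk in zip(w, xi)))
            for xi, _ in PATTERNS]
    cost = 0.5 * sum((zeta - o) ** 2 for (_, zeta), o in zip(PATTERNS, outs))
    return outs, cost

def train(w, tol=1e-6, max_iter=5000):
    outs, cost = outputs_and_cost(w)
    for _ in range(max_iter):
        # delta^mu = (zeta - O) * g'(h), with g' = beta * (1 - O^2) for tanh
        deltas = [(zeta - o) * BETA * (1.0 - o * o)
                  for (_, zeta), o in zip(PATTERNS, outs)]
        # Steepest-descent update: w_k += eta * sum_mu delta^mu * xi_k^mu
        w = [wk + ETA * sum(d * xi[k] for d, (xi, _) in zip(deltas, PATTERNS))
             for k, wk in enumerate(w)]
        outs, new_cost = outputs_and_cost(w)
        converged = abs(cost - new_cost) < tol
        cost = new_cost
        if converged:
            break
    return w, outs, cost

weights, outs, final_cost = train([0.0, 0.0, 0.0])
```

Because a tanh output only approaches $\pm 1$ asymptotically, the cost never reaches exactly zero; the loop stops once successive cost values differ by less than `tol`.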

16 Geometric interpretation
For each output node, there is an (N − 1)-dimensional hyperplane which separates input values yielding O > 0 from those with O < 0.
[Figure: separating plane in the space of inputs $\xi_1$, $\xi_2$, $\xi_3$]

17 Multi-layer networks
Interest in feed-forward networks was limited until it was realized that 2-layer networks could describe any continuous function of the inputs, and a three-layer network can describe any function of the inputs.
[Figure: inputs $\xi_1, \xi_2, \xi_3$ feed a hidden layer $V_1, V_2$, which feeds the outputs $O_1, O_2, O_3$]

18 Multi-layer networks
Obviously, if the activation function is linear, the two-layer network is equivalent to a one-layer network with synaptic weights
$$q_{ik} = \sum_j W_{ij}\,w_{jk}$$
But for the sigmoid or step-function (sgn) activation function, interesting new behavior can result.
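The collapse of a linear two-layer network into a single layer can be checked numerically. In the sketch below, the weight values and the input vector are arbitrary numbers chosen for illustration, with `w` holding the input-to-hidden weights $w_{jk}$ and `W` the hidden-to-output weights $W_{ij}$.

```python
def matvec(A, x):
    """Matrix-vector product using plain lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    """Matrix-matrix product using plain lists."""
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]

w = [[0.5, -1.0, 2.0],      # w_jk: 2 hidden nodes x 3 inputs
     [1.5, 0.25, -0.75]]
W = [[1.0, 2.0],            # W_ij: 3 outputs x 2 hidden nodes
     [-0.5, 0.5],
     [3.0, -1.0]]

q = matmul(W, w)            # q_ik = sum_j W_ij * w_jk

xi = [1.0, -1.0, 0.5]
two_layer = matvec(W, matvec(w, xi))   # linear activation g(h) = h in both layers
one_layer = matvec(q, xi)
```

Here `two_layer` and `one_layer` agree to rounding error, so the hidden layer adds nothing when $g$ is linear.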

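19 Training multilayer perceptrons
As before we minimize the cost function, e.g.
$$E_{\rm MSE}(W, w) = \frac{1}{2}\sum_{\mu,i}\left(\zeta_i^\mu - O_i^\mu\right)^2$$
but now we have to vary two sets of weights.
1) The derivatives w.r.t. $W_{ij}$ are
$$\frac{\partial E_{\rm MSE}}{\partial W_{ij}} = -\sum_\mu \left(\zeta_i^\mu - O_i^\mu\right) g'\!\left(\sum_j W_{ij}V_j^\mu\right) V_j^\mu = -\sum_\mu \delta_i^\mu\,V_j^\mu$$
where $\delta_i^\mu \equiv \left(\zeta_i^\mu - O_i^\mu\right) g'(H_i^\mu)$ and $H_i^\mu \equiv \sum_j W_{ij}V_j^\mu$.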

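20 Training multilayer perceptrons
2) The derivatives w.r.t. $w_{jk}$ are computed using the chain rule:
$$\frac{\partial E_{\rm MSE}}{\partial w_{jk}} = \sum_\mu \frac{\partial E}{\partial V_j^\mu}\frac{\partial V_j^\mu}{\partial w_{jk}} = -\sum_{\mu,i}\left(\zeta_i^\mu - O_i^\mu\right) g'(H_i^\mu)\,W_{ij}\,g'(h_j^\mu)\,\xi_k^\mu$$
where $h_j^\mu = \sum_k w_{jk}\,\xi_k^\mu$ and $V_j^\mu = g(h_j^\mu)$.
For each layer, we update the weights by moving along the direction of steepest descent:
$$\Delta W_{ij} = -\eta\,\frac{\partial E}{\partial W_{ij}}, \qquad \Delta w_{jk} = -\eta\,\frac{\partial E}{\partial w_{jk}}$$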

21 Training multilayer perceptrons
So the training procedure is:
1. Initialize all weights to random values.
2. Propagate the input signal forwards through the network to compute the intermediate and output signals.
3. Compute the cost function and its derivatives w.r.t. each weight, starting with the final layer and working backwards.
4. Update all weights.
5. Return to 2 (or stop if the convergence criterion is met).
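The five steps above can be sketched for a small two-layer network. The example below is an illustrative implementation only: the XOR training set, the three hidden nodes, the extra always-on hidden unit used as a threshold, the tanh activation with $\beta = 1$, the learning rate, the fixed number of passes, and the random seed are all assumptions, not details from the lecture.

```python
import math
import random

BETA = 1.0  # assumed tanh steepness

def g(h):
    return math.tanh(BETA * h)

def g_prime(y):
    """g'(h) written in terms of the output y = g(h), valid for tanh."""
    return BETA * (1.0 - y * y)

def forward(w, W, xi):
    """Step 2: propagate an input forwards; returns hidden and output signals."""
    V = [g(sum(wjk * xk for wjk, xk in zip(row, xi))) for row in w]
    V.append(1.0)  # always-on hidden unit acting as a threshold (assumption)
    O = [g(sum(Wij * Vj for Wij, Vj in zip(row, V))) for row in W]
    return V, O

def cost(w, W, patterns):
    return 0.5 * sum((z - o) ** 2 for xi, zeta in patterns
                     for z, o in zip(zeta, forward(w, W, xi)[1]))

def gradients(w, W, patterns):
    """Step 3: derivatives of E_MSE, final layer first, then working backwards."""
    dW = [[0.0] * len(row) for row in W]
    dw = [[0.0] * len(row) for row in w]
    for xi, zeta in patterns:
        V, O = forward(w, W, xi)
        delta = [(zeta[i] - O[i]) * g_prime(O[i]) for i in range(len(W))]
        for i in range(len(W)):
            for j in range(len(V)):
                dW[i][j] -= delta[i] * V[j]
        for j in range(len(w)):  # the always-on unit has no incoming weights
            back = sum(delta[i] * W[i][j] for i in range(len(W))) * g_prime(V[j])
            for k in range(len(xi)):
                dw[j][k] -= back * xi[k]
    return dw, dW

def train(w, W, patterns, eta=0.05, passes=5000):
    for _ in range(passes):  # step 5: repeat (fixed number of passes here)
        dw, dW = gradients(w, W, patterns)
        for j in range(len(w)):  # step 4: update all weights
            for k in range(len(w[j])):
                w[j][k] -= eta * dw[j][k]
        for i in range(len(W)):
            for j in range(len(W[i])):
                W[i][j] -= eta * dW[i][j]
    return w, W

# XOR with inputs/targets in {-1, +1}; third input hardwired to +1
patterns = [((a, b, 1), (1 if a != b else -1,)) for a in (-1, 1) for b in (-1, 1)]
rng = random.Random(0)
w = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # step 1
W = [[rng.uniform(-1, 1) for _ in range(4)]]
initial_cost = cost(w, W, patterns)
w, W = train(w, W, patterns)
final_cost = cost(w, W, patterns)
```

Whether the network actually learns XOR depends on the starting weights and learning rate, since steepest descent can stall in a local minimum; comparing `final_cost` with `initial_cost` shows how far a given run has progressed.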

22 Improvements in minimization
As we know from previous lectures, the method of steepest descent can be very slow. We can use conjugate gradient or variable metric methods.
Alternatively, we can add a momentum term so that we include some of the previous step:
$$\Delta w_{ik}^{\rm new} = -\eta\,\frac{\partial E}{\partial w_{ik}} + \alpha\,\Delta w_{ik}^{\rm previous}$$
where $\alpha$ is the momentum parameter (typically ~ 0.9, and must be between 0 and 1).
This can smooth the approach to the minimum.
The best algorithms vary $\alpha$ and $\eta$ as the minimum of E is approached.
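The effect of the momentum term can be illustrated on a toy problem. The sketch below minimizes an assumed ill-conditioned quadratic, $E(w) = \frac{1}{2}(w_1^2 + 25\,w_2^2)$, with and without momentum; the cost function, the starting point, the tolerance, and the values $\eta = 0.03$ and $\alpha = 0.9$ are choices made for this example only.

```python
import math

def grad_E(w):
    """Gradient of the toy cost E(w) = 0.5 * (w1^2 + 25 * w2^2)."""
    return [w[0], 25.0 * w[1]]

def descend(eta, alpha, w0=(1.0, 1.0), tol=1e-6, max_steps=1000):
    """Return the number of steps needed to bring |w| below tol.

    Each step is  delta_w = -eta * dE/dw + alpha * (previous delta_w);
    alpha = 0 recovers plain steepest descent.
    """
    w = list(w0)
    previous_step = [0.0, 0.0]
    for n in range(1, max_steps + 1):
        gradient = grad_E(w)
        step = [-eta * gk + alpha * pk for gk, pk in zip(gradient, previous_step)]
        w = [wk + sk for wk, sk in zip(w, step)]
        previous_step = step
        if math.hypot(w[0], w[1]) < tol:
            return n
    return max_steps

steps_plain = descend(eta=0.03, alpha=0.0)
steps_momentum = descend(eta=0.03, alpha=0.9)
```

On this example the run with momentum needs fewer steps than plain steepest descent, because the accumulated previous steps speed up progress along the shallow $w_1$ direction. The benefit is problem dependent, which is one reason $\alpha$ and $\eta$ are often varied during the minimization.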

23 Applications of ANN
Underwriting. Input: information about borrower/insured. Output: loan/insurance outcome. Training set: previous experience.
Speech and handwriting recognition.
Financial predictions. Attempts to beat the stock market have not been successful, consistent with the efficient market hypothesis.
Forecasting: weather, solar flares.
Diagnosis/classification: medical, astronomical.

24 Flexible networks
While the number of input and output nodes is generally fixed by the problem, the number of hidden layers and of nodes within them can be varied to reduce the cost function.
3-layer perceptrons are always sufficient, although the use of more layers may reduce the required number of nodes.

25 Example: Handwriting recognition
Input: 16 x 16 pixel image, so N = 256. $\xi = +1$ for pixels where ink is present, $-1$ otherwise, giving $2^{256} \approx 10^{77}$ possible input states.
Output: 10 nodes, with $O_k = +1$ if the number is k and $-1$ otherwise.

26 Example: Handwriting recognition
Feed-forward neural net developed by Le Cun et al. with four hidden layers.
Training set: $10^4$ images digitized from addresses on actual US mail, 28 x 28 grayscale pixels.
Test set (not used for training): ~ 3000 additional images.
4635 nodes, connections

27 Example: Handwriting recognition
Performance after 30 adaptation cycles: 1.1% error on the training set, 3.4% error on the test set.
Can achieve 1% error on the test set if it rejects 5.7% of the characters.

28 Example: photometric redshifts
Use pattern recognition to derive a hard-to-measure parameter from observations of an easy-to-measure parameter.
The Sloan Digital Sky Survey provides broad-band photometric data (in 5 bands) for ~ $10^8$ objects (mainly galaxies) and spectra for ~ $10^6$.
Spectra allow the redshift to be determined unequivocally, providing the distance and allowing the 3-D distribution to be determined.

29 Example: photometric redshifts
While only 1% of the objects are observed spectroscopically, photometric redshifts can be estimated for the other 99%: relative and absolute fluxes at the 5 observed bands are correlated with z.
Collister and Lahav (2004, PASP, 116, 345) used artificial neural networks (e.g. a 3-layer perceptron) to determine photometric redshifts: the ANNz program.

30 [Figure from CL04]

31 Example: photometric redshifts
CL04 used a 3-layer perceptron with a 5:10:10:1 architecture.
Training set: $10^4$ galaxies with spectroscopic redshifts.
Used a committee of five independently trained networks.
Cost function modified to prevent blowup of weights.


33 Example: photometric redshifts
Results are quite impressive: the r.m.s. error in z is ~ 0.02 (versus 0.07 for a competing method).
