Development of a General Purpose On-Line Update Multiple Layer Feedforward Backpropagation Neural Network


Master Thesis MEE 97-4

Development of a General Purpose On-Line Update Multiple Layer Feedforward Backpropagation Neural Network

Made by

Master Program in Electrical Science, 1997
University of Karlskrona/Ronneby
Supervisor/Examiner: Mattias Dahl

Abstract

This Master thesis deals with the complete understanding and creation of a 3-layer Backpropagation Neural Network with synaptic weight update performed on a per-sample basis (called On-Line update). The aim is to create such a network for general purpose applications and with a great degree of freedom in choosing the inner structure of the network. The algorithms used are all members of supervised learning classes, i.e. they are all supervised by a desired signal. The theory will be treated thoroughly for the steepest descent algorithm and for additional features which can be employed in order to increase the degree of generalization and the learning rate of the network. Empirical results will be presented and some comparisons with pure linear algorithms will be made for a signal processing application, speech enhancement.

Contents

Section 1  Preface
Section 2  Structure of a single neuron
  2.1  General methods for unconstrained optimization
  2.2  Basic neuron model
  2.3  Widrow-Hoff Delta Rule or LMS Algorithm
  2.4  Single Layer Perceptron
  2.5  General Transfer functions
Section 3  Neural Network Models
  3.1  Different Models
  3.2  Feedforward Multilayer Neural Network
Section 4  Steepest Descent Backpropagation Learning Algorithm
  4.1  Learning of a Single Perceptron
  4.2  Learning of a Multilayer Perceptron
Section 5  Additional Features for improving Convergence speed and Generalization
  5.1  Algorithm with Momentum Updating
  5.2  Algorithms with Non-Euclidean Error Signals
  5.3  Algorithm with an Adaptation of the Slope of the Activation Functions
  5.4  Adaptation of the Learning Rate and Mixing of input patterns
  5.5  A combined stochastic-deterministic weight update
Section 6  Comparison of a Batch update system with an On-Line Update system
Section 7  Empirical Results
  7.1  Approximation of sin(x) with two neurons in the hidden layer
  7.2  Separation of signals with noise
  7.3  Processing of EEG-signals
  7.4  Echo-canceling in a Hybrid
  7.5  Speech enhancement for a Hands-Free mobile telephone set
Section 8  Further Development
Section 9  Conclusions

1. Preface

Neural Networks are used in a broad variety of applications. The way they work and the tasks they solve differ widely. Often a Neural Network is used as an associator, where one input vector is associated with an output vector. These networks are trained rather than programmed to perform a given task, i.e. a set of associated vector pairs is presented in order to train the network. The network will then hopefully give satisfactory outputs when facing input vectors not present in the training phase. This is the generalization property, which in fact is one of the key features of Neural Networks.

The training phase is sometimes done in batch, which means that the training vector pairs are all presented at the same time to the network. If the task to be solved is the same over time, i.e. the task is stationary, training will be needed only once. In this case the batch procedure can be used. If the task is changing over time the network will have to be adaptive, i.e. to change with the task. In this case on-line training is preferable. The on-line approach takes the training vector pairs one at a time and performs a small adjustment in performance for every such presentation. The on-line approach has the great advantage that the learning time is reduced considerably when compared to a batch system. In signal processing applications the adaptation during training is of great importance. This can only be achieved with the on-line system.

In this thesis an on-line update Neural Network will be created and thoroughly explained. The Neural Network will be general in the sense that structure, performance, complexity and adaptation rules can be chosen arbitrarily. Empirical experiments on both artificial and real-life applications will be presented. The results show that many standard applications which earlier have been solved with linear systems can easily be solved with neural networks, often with better results. With a properly chosen structure, adaptation during learning can be satisfactory. In real-time implementations the use of neural networks can be limited due to their need for computational power, but the parallel structure of these networks can make way for implementations with parallel processors.

I would like to thank Professor Andrzej Cichocki for letting me use material from his and Dr Rolf Unbehauen's book [7]. Table 2.1 and figures 3.1, 4.1 and 4.2 are taken from this book.

2. Structure of a single neuron

2.1 General methods for unconstrained optimization

Consider the following optimization problem: find a vector w that minimizes the real-valued scalar function J(w):

min J(w),   where w = [w_1 w_2 ... w_n]^T,

and where the number of elements of w, denoted n, is arbitrary. This problem can be transformed into an associated system of first-order ordinary differential equations,

dw_i/dt = -Σ_{j=1}^{n} η_ij ∂J/∂w_j .

The vector w will then follow a trajectory in the phase portrait of the differential equations to a local or global minimum. In vector/matrix notation this becomes

dw/dt = -η(w,t) ∇_w J(w).

An initial vector w(0) must be chosen, from which the trajectory starts. Different selections of the matrix η(w,t) give different methods. The simplest selection is to choose the matrix as the identity matrix multiplied by a small constant η, the learning parameter; in this case the method becomes the well-known method of Steepest Descent.

If a Taylor approximation of J(w), truncated after the second-order terms, is formed and its gradient is set to zero (a necessary condition for a local minimum is that the gradient equals zero), the resulting equation can be solved. In this case η(w,t) is chosen as the inverse Hessian matrix. This is the well-known Newton's method:

dw/dt = -η [∇_w^2 J(w)]^{-1} ∇_w J(w).

There are some drawbacks with Newton's method. First, the inverse of the Hessian must be calculated, and this inverse does not always exist (or the Hessian can be very ill-conditioned). One way to overcome this is to add an identity matrix multiplied by a small scalar to the Hessian matrix in order to improve the condition. Of course, this will then only be an approximation of the true Hessian. This method is called the Marquardt-Levenberg algorithm. Second, there are strong restrictions on the chosen initial values of w, because the truncated Taylor series gives poor approximations far away from the local minima. In the context of Neural Networks there are even more difficulties because of the computational overhead and the large memory requirements. In an on-line update system we would need to calculate as many matrix inversions as the number of neurons in the network for every sample. There are several approaches which iteratively approximate this inverse without actually calculating one, i.e. quasi-Newton methods.

This thesis will emphasize the Steepest Descent method, since the weight update will be on-line and this sets strong requirements on computational simplicity.
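As a side illustration (not part of the thesis), the steepest descent recursion obtained by discretizing dw/dt = -η ∇_w J(w) can be sketched as follows for a toy quadratic cost. The matrix A, the vector b, the step size and the iteration count are arbitrary choices made only for this example.

```python
import numpy as np

# Minimal sketch: discretized steepest descent w(k+1) = w(k) - eta * grad J(w(k))
# on a toy quadratic J(w) = 0.5 * w^T A w - b^T w.
A = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -2.0])

def grad_J(w):
    return A @ w - b            # gradient of the quadratic cost

w = np.zeros(2)                 # initial vector w(0)
eta = 0.1                       # small constant learning parameter
for _ in range(200):
    w = w - eta * grad_J(w)     # Euler discretization of dw/dt = -eta * grad J

print(w, np.linalg.solve(A, b)) # both should be close to the true minimizer
```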

2.2 Basic neuron model

Let us consider a single neuron which takes a vector x = [x_1 x_2 ... x_n]^T as input and produces a scalar y as output. As before, the number of elements in the vector x, denoted n, is arbitrary. The neuron has inner strengths w = [w_1 w_2 ... w_n]^T, also called the synaptic weights. In most models of a neuron it also has a bias, a scalar denoted Θ. The input x is multiplied with the synaptic weights and then the bias is added. The result is passed through a scalar activation function Ψ(·), according to

y = Ψ( Σ_{i=1}^{n} w_i x_i + Θ ).

This can be written in the compact form

y = Ψ( Σ_{i=0}^{n} w_i x_i ),

where the index i stands for the i:th scalar in the vectors w and x respectively, and where w_0 = Θ and x_0 = 1 by definition. The activation function Ψ(·) can be any linear or non-linear function which is piecewise differentiable. The state of the neuron is measured by the output signal y.

2.3 Widrow-Hoff Delta rule or LMS algorithm

In the Widrow-Hoff Delta rule, also called the LMS algorithm, the activation function is linear, often with a slope of unity. In supervised learning the output y should equal a desired value d. The LMS algorithm is a method for adjusting the synaptic weights to assure minimization of the error function

J = (1/2) e^2(t) = (1/2) (d - y)^2.

For several inputs the sum of J is to be minimized according to the L2-norm (other norms can be used, see section 5.2). Applying the steepest descent approach, the following system of differential equations is obtained:

dw_i/dt = -η ∂J/∂w_i = -η (∂J/∂y)(∂y/∂w_i),

and since y = Σ_{i=0}^{n} w_i x_i we get

dw_i/dt = η e(t) x_i(t),

where t is the continuous time index and η is the learning parameter. Converting this into the standard discrete-time LMS algorithm and writing it in vector notation gives

w(k+1) = w(k) + μ e(k) x(k).

Here the index k stands for the discrete sample index and μ is the learning parameter (step size). The parameter μ differs from the learning parameter η in the continuous case; generally it must be smaller to ensure convergence.
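A minimal sketch of the discrete LMS rule above, written in Python with NumPy. The teacher weights, the dimensions and the step size are assumptions made only for this illustration; this is not code from the thesis.

```python
import numpy as np

# One linear neuron trained with the discrete LMS rule
# w(k+1) = w(k) + mu * e(k) * x(k).
rng = np.random.default_rng(0)
n = 4
w_true = rng.normal(size=n + 1)           # assumed teacher, includes a bias as w_0

w = np.zeros(n + 1)                       # weights, w[0] acts as the bias Theta
mu = 0.05                                 # step size

for k in range(5000):
    x = np.concatenate(([1.0], rng.normal(size=n)))   # x_0 = 1 by definition
    d = w_true @ x                        # desired output from the assumed teacher
    y = w @ x                             # linear activation with unit slope
    e = d - y
    w = w + mu * e * x                    # LMS / Widrow-Hoff update

print(np.round(w - w_true, 3))            # should be close to zero
```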

In signal processing applications, where LMS is widely used, the vector x is a time-shifted version of a stream of discrete input samples. Consider a stream of input scalars (a sampled signal) with sample index k starting at 0 and increasing to infinity, that is

X = [x(0) x(1) ... x(k-1) x(k) x(k+1) ...].

The aim is to map this signal one-to-one onto a desired signal D,

D = [d(0) d(1) ... d(k-1) d(k) d(k+1) ...],

by filtering the signal X with a finite impulse response filter F. This type of filter uses a finite length of past experience from the signal X as input, without any recursion. The number of filter coefficients in F is denoted n and the filter coefficients are w = [w(0) w(1) ... w(n-1)]^T. The output at sample index k is

y(k) = Σ_{i=0}^{n-1} w(i) x(k-i),

which is the convolution between w and x, that is y = w * x. Now using the LMS algorithm to update the filter coefficients during the presentation of the input stream gives, as before,

w(k+1) = w(k) + μ e(k) x(k),

where the input vector x at sample index k is

x(k) = [x(k) x(k-1) ... x(k-n+1)]^T.

The error e(k) is, as before, defined as the difference between the desired value and the actual outcome of the filter, that is e(k) = d(k) - y(k).

There are several improvements to this standard LMS algorithm, which all fit their special purposes. Table 2.1 shows some modified versions of the LMS algorithm. The aim of these modifications is to improve the standard LMS algorithm with regard to different aspects. These aspects could focus on rapid convergence speed or on minimizing the excess mean squared error. Some of the modifications aim to reduce computational complexity, such as the sign algorithms. Further insight into their behavior can be obtained from any textbook on signal processing.
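A sketch (my own, for illustration only) of LMS used as an adaptive FIR filter: the filter coefficients are adapted on-line so that y(k) tracks a desired signal produced by an assumed unknown system h_true. All constants are arbitrary example values.

```python
import numpy as np

# LMS adaptation of an n-tap FIR filter: y(k) = sum_i w(i) x(k-i).
rng = np.random.default_rng(1)
n = 8                                        # number of filter coefficients
h_true = rng.normal(size=n)                  # hypothetical system to identify

N = 20000
x_stream = rng.normal(size=N)                # white-noise input stream
d_stream = np.convolve(x_stream, h_true)[:N] # desired signal

w = np.zeros(n)
mu = 0.01
x_buf = np.zeros(n)                          # x(k) = [x(k) x(k-1) ... x(k-n+1)]

for k in range(N):
    x_buf = np.concatenate(([x_stream[k]], x_buf[:-1]))  # shift in the newest sample
    y = w @ x_buf
    e = d_stream[k] - y
    w = w + mu * e * x_buf                   # on-line LMS coefficient update

print(np.round(w - h_true, 3))               # close to zero after convergence
```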

8 Master Thess MEE 97-4 /5/98 Table. Modfed and mproved versons of the LMS algorthm, [7] Page 8

2.4 Single layer perceptron

The single layer perceptron uses the hard limiter as the activation function. This hard limiter gives only quantized binary outputs, that is ỹ ∈ {-1, 1}. In this case the output is compared with the quantized binary d̃ ∈ {-1, 1} and a quantized error signal ẽ = (d̃ - ỹ) ∈ {-2, 0, 2} is produced. The update rule for this is very similar to that of the LMS algorithm,

w(k+1) = w(k) + μ ẽ(k) x(k).

The only difference is that the error is quantized. This perceptron can achieve any separation of input vectors that are linearly separable. Input vectors that are not linearly separable can be separated as well by appropriately interconnecting several perceptrons. In more general perceptron implementations the activation function is non-linear, see the next section. If an arbitrary transfer function is used which is piecewise differentiable, the update rule can be derived in a similar way as for the LMS algorithm. Consider a general transfer function Ψ(·) with derivative Ψ'(·),

u = Σ_{i=0}^{n} w_i x_i   and   y = Ψ(u).

As before, we will use the L2-norm as the error measure (a more general error norm will be described in section 5.2),

J = (1/2) ẽ^2(t) = (1/2) (d̃ - ỹ)^2,   where the index t stands for continuous time.

The differential equation states (applying the chain rule)

dw_i/dt = -η ∂J/∂w_i = -η (∂J/∂ẽ)(∂ẽ/∂w_i) = -η (∂J/∂ẽ)(∂ẽ/∂u)(∂u/∂w_i).

This gives, for a general activation function, the discrete vector form

w(k+1) = w(k) + μ ẽ(k) Ψ'(u(k)) x(k).

Here the error ẽ is quantized as before, the index k stands for the discrete sample index and μ is the discrete step size. Often the activation function is an s-shaped sigmoid function.

2.5 General transfer functions

There is a variety of transfer functions, each with its own key application. The most general and widely used functions are the linear, the bipolar sigmoid and the unipolar sigmoid functions. These are the transfer functions which will be used in this approach. In the network, which will be derived later, they can be intermixed in any way. The linear function just passes the value with an amplification due to the slope. Often the slope is set to unity because it only affects the magnitude of the step size. The bipolar sigmoid function is the hyperbolic tangent function with a slope γ, see fig 2.5.1:

Ψ(u) = tanh(γu) = (1 - e^{-2γu}) / (1 + e^{-2γu}).

The domain is all real numbers and the range is the real numbers between -1 and 1. The unipolar sigmoid function is also s-shaped, but the range is the real numbers between 0 and 1, see fig 2.5.2:

Ψ(u) = 1 / (1 + e^{-γu}).

The slope γ is a parameter which controls the steepness of the non-linearity; it can either be pre-specified or it can be updated according to a scheme (section 5.3 deals with this aspect).

Fig 2.5.1  The bipolar sigmoid function for different slopes (0.5, 1.0, 2.0)

Fig 2.5.2  The unipolar sigmoid function for different slopes (0.5, 1.0, 2.0)

The partial derivative of the bipolar sigmoid function (bsig) follows by using the quotient rule:

dΨ(u)/du = d/du [ (1 - e^{-2γu}) / (1 + e^{-2γu}) ]
         = [ 2γ e^{-2γu} (1 + e^{-2γu}) + 2γ e^{-2γu} (1 - e^{-2γu}) ] / (1 + e^{-2γu})^2
         = 4γ e^{-2γu} / (1 + e^{-2γu})^2
         = γ [ 1 - ( (1 - e^{-2γu}) / (1 + e^{-2γu}) )^2 ]
         = γ (1 - tanh^2(γu)) = γ (1 - Ψ^2(u)).

And for the unipolar sigmoid function (usig):

dΨ(u)/du = d/du [ 1 / (1 + e^{-γu}) ]
         = γ e^{-γu} / (1 + e^{-γu})^2
         = γ Ψ(u) (1 - Ψ(u)).
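For reference, here is a small sketch (my own, not from the thesis) of the two sigmoids and the derivative expressions just derived, together with a finite-difference check of the formulas.

```python
import numpy as np

# Bipolar (bsig) and unipolar (usig) sigmoid activations with slope gamma,
# and the closed-form derivatives derived above.
def bsig(u, gamma=1.0):
    return np.tanh(gamma * u)

def bsig_prime(u, gamma=1.0):
    y = bsig(u, gamma)
    return gamma * (1.0 - y ** 2)          # gamma * (1 - tanh^2(gamma*u))

def usig(u, gamma=1.0):
    return 1.0 / (1.0 + np.exp(-gamma * u))

def usig_prime(u, gamma=1.0):
    y = usig(u, gamma)
    return gamma * y * (1.0 - y)           # gamma * Psi(u) * (1 - Psi(u))

# Quick finite-difference check of the derivative formulas.
u = np.linspace(-3, 3, 7)
h = 1e-6
print(np.allclose((bsig(u + h, 2.0) - bsig(u - h, 2.0)) / (2 * h), bsig_prime(u, 2.0), atol=1e-5))
print(np.allclose((usig(u + h, 2.0) - usig(u - h, 2.0)) / (2 * h), usig_prime(u, 2.0), atol=1e-5))
```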

3. Neural Network Models

3.1 Different models

There are different types of Neural Network models for different tasks. The way they work and the tasks they solve differ widely, but they also share some common features. Generally, Neural Networks consist of simple processing units interconnected in parallel. These connections divide the set of networks into three large categories:

1. Feedforward networks
These networks can be compared with conventional FIR filters without adaptation of the weights. The difference is that they can perform a non-linear mapping of the input signals onto the target set. As for FIR filters, there are no stability problems.

2. Feedback networks
These networks are non-linear dynamic systems and can be compared with IIR filters. The Elman network and the Hopfield network are both feedback networks used in practice.

3. Cellular networks
These networks have more complex interconnections. Every neuron is interconnected with its neighbors and the neurons can be organized in two or three dimensions. A change in one neuron's state will affect all the others. Often used training methods are the simulated annealing schedule (which is a pure stochastic algorithm) or mean field theory (which is a deterministic algorithm).

In supervised learning, backpropagation learning can be used in order to find the weights that best (according to an error norm) perform a specific task. This method will be described in detail in section 4 for a feedforward Neural Network. In unsupervised learning, where no desired target is present, the task to be performed is often to reduce all components in the input signal that are correlated (here not only linear correlation is meant, as is often the case). This can for instance be done by minimizing

E(w) = (α/2) ||w||^2 - σ(w^T x),

where σ(u) is the loss function, typically σ(u) = u^2. This is called the potential learning rule. The first term on the right side is usually called the leaky effect. If this unsupervised approach is adopted, the algorithm described above will only change as follows:

w(k+1) = (1 - α) w(k) + μ ũ(k) Ψ'(ũ(k)) x(k).

Here the error e is replaced with the internal state u, and a small constant α is added to prevent the weights from growing beyond limits. Sometimes the constraint on w is instead chosen as a fixed length of unity, that is ||w|| = 1. In this case one wishes to minimize the absolute value of the loss function. Observe that this is a local update rule, that is, no information has to be passed to other interconnected neurons.

3.2 Feedforward Multilayer Neural Network

A feedforward neural network can consist of an arbitrary number of layers, but in practice there will be only one, two or three layers. It has been shown theoretically that it is sufficient to use a maximum of three layers to solve an arbitrarily complex pattern classification problem [ ]. The layers are ordered in series and the last layer is called the output layer. The preceding layers are called hidden layers, with indices starting at one in the layer closest to the input vector. Sometimes the input vector is called the input layer. Here a three-layer perceptron will be regarded. In every layer there can be neurons with any transfer function, with or without a bias. The number of neurons in the hidden layers is determined by the complexity of the problem. The number of neurons in the output layer is determined by the type of problem to be solved, because this is the number of output signals from the network. In this approach the transfer functions used are the linear, bsig and usig functions, intermixed arbitrarily in each layer. All neurons will have a bias in order to improve the generalization.

Using the compact notation (as in section 2.2)

y = Ψ( Σ_{i=0}^{n} w_i x_i ),   where w_0 = Θ and x_0 = 1,

and arranging the neurons as a column, the processing of a whole layer can be described in matrix notation. Define a matrix W^[1] for the first layer, where the i:th row contains the weights of the i:th neuron, and the transfer function Ψ^[1] as a multivariable function Ψ^[1]: R^{n_1} -> R^{n_1} with

Ψ^[1] = [Ψ_1^[1] Ψ_2^[1] ... Ψ_{n_1}^[1]]^T.

The components of Ψ^[1] are arranged according to the chosen transfer functions. The first layer's processing can then be described as

o^[1] = Ψ^[1]( W^[1] x ),

where o^[1] is the first layer's output (see fig 3.1). In a similar way the other layers can be described, with the input signals for each layer being the output values of the preceding layer. The total processing of the feedforward network will be

y = Ψ^[3]( W^[3] Ψ^[2]( W^[2] Ψ^[1]( W^[1] x ) ) ).

Figure 3.1 shows the configuration of a three-layer feedforward neural network.

Fig 3.1  A Feedforward Multilayer Neural Network (3-layer), [7]

The numbers of neurons in layers 1, 2 and 3 will be denoted n_1, n_2 and n_3 respectively.

Here all neurons in every layer are connected with all feeding sources (inputs). This approach is called a fully connected neural network. If some of the connections are removed the network becomes a reduced-connection network. The way in which one chooses the connections in the latter type of network is not in any way trivial. The aim is to maintain the maximum of information flow through the network, that is, only to remove connections that are redundant. This of course varies with the application. In speech enhancement systems, for example, one could try to make use of the fact that speech is quite correlated. This could perhaps make it possible to remove some of the connections without any major loss in performance. In this thesis a general approach is taken with a fully connected network.
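As an aside, the layer-by-layer matrix form above can be written compactly as in the following sketch (my own illustration; the layer sizes and the mix of activation functions are arbitrary example choices, and the bias is handled by augmenting each layer input with a constant 1).

```python
import numpy as np

# Forward pass of a fully connected 3-layer network
# y = Psi3(W3 Psi2(W2 Psi1(W1 x))), biases absorbed via x_0 = 1, w_0 = Theta.
def augment(v):
    return np.concatenate(([1.0], v))

def forward(x, weights, activations):
    """weights: list of matrices W[s]; activations: list of per-layer functions."""
    o = np.asarray(x, dtype=float)
    for W, psi in zip(weights, activations):
        o = psi(W @ augment(o))          # o[s] = Psi[s]( W[s] [1; o[s-1]] )
    return o

# Example: 3 inputs, hidden layers of 4 and 4 neurons, 2 outputs.
rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]
weights = [rng.normal(scale=0.5, size=(m, n + 1)) for n, m in zip(sizes[:-1], sizes[1:])]
activations = [np.tanh, np.tanh, lambda u: u]   # bsig, bsig, linear output

print(forward([0.2, -1.0, 0.7], weights, activations))
```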

4. Steepest Descent Backpropagation Learning Algorithm

4.1 Learning of a Single Perceptron

The derivation of the learning rule for the single perceptron will be done with a bias and with a general transfer function. The perceptron will be denoted with index i for straightforward incorporation into the network later, see figure 4.1. Consider a general transfer function Ψ(·) with derivative Ψ'(·),

u_i = Σ_{j=0}^{n} w_ij x_j,   with w_i0 = Θ_i and x_0 = 1,   and   y_i = Ψ(u_i).

We wish to minimize the instantaneous squared error of the output signal,

J = (1/2) e_i^2(t) = (1/2) (d_i - y_i)^2.

The steepest descent approach gives the following differential equations:

dw_ij/dt = -η ∂J/∂w_ij = -η (∂J/∂e_i)(∂e_i/∂w_ij) = -η (∂J/∂e_i)(∂e_i/∂u_i)(∂u_i/∂w_ij),

and since

∂J/∂e_i = e_i,   ∂e_i/∂u_i = -Ψ'(u_i),   ∂u_i/∂w_ij = x_j,

the update rule becomes

dw_ij/dt = η e_i Ψ'(u_i) x_j.

If we define a learning signal δ_i as δ_i = e_i Ψ'(u_i), we can write the discrete vector update form as

w_i(k+1) = w_i(k) + μ δ_i x(k),

where k is the iteration index and μ is the learning parameter. This last formula is represented by the box called "Adaptive algorithm" in figure 4.1.
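A small sketch of the single-perceptron rule w(k+1) = w(k) + μ δ x(k) with the bsig activation; the teacher signal, the constants and the dimensions are invented for this illustration only.

```python
import numpy as np

# Single perceptron with bipolar sigmoid Psi(u) = tanh(gamma * u),
# trained with delta = e * Psi'(u).
rng = np.random.default_rng(0)
n, gamma, mu = 3, 1.0, 0.1
w = rng.normal(scale=0.1, size=n + 1)          # w[0] is the bias Theta

def neuron(w, x):
    u = w @ np.concatenate(([1.0], x))          # u = sum_j w_j x_j with x_0 = 1
    return u, np.tanh(gamma * u)

for k in range(20000):
    x = rng.uniform(-1, 1, size=n)
    d = np.tanh(x[0] - 0.5 * x[1])              # an assumed teacher signal
    u, y = neuron(w, x)
    delta = (d - y) * gamma * (1 - y ** 2)      # e * Psi'(u) for bsig
    w = w + mu * delta * np.concatenate(([1.0], x))

u, y = neuron(w, np.array([0.3, 0.2, -0.8]))
print(round(y, 3), round(np.tanh(0.3 - 0.5 * 0.2), 3))   # should be close
```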

Fig 4.1  Learning of a single perceptron, [7]

If the activation function bsig is used, as derived in section 2.5, the update rule becomes

w_i(k+1) = w_i(k) + μ γ e_i(k) (1 - y_i^2(k)) x(k),

and for the usig activation function the update rule becomes (see section 2.5)

w_i(k+1) = w_i(k) + μ γ e_i(k) y_i(k) (1 - y_i(k)) x(k).

For the pure linear function the rule becomes the well-known LMS algorithm (section 2.3),

w_i(k+1) = w_i(k) + μ γ e_i(k) x(k).

As stated before, the slope γ in LMS is often chosen as unity, since it only affects the step size of the rule.

4.2 Learning of a Multilayer Perceptron, MLP

Here a 3-layer perceptron will be derived; fewer layers can also be used. The network structure is according to fig 4.2. The structure is built up of several single-layer perceptrons where the learning signals are modified according to the rules described below. In the figure the biases are left out. The extension to more layers is straightforward. The network behavior should be determined on the basis of input/output vector pairs, where the sizes of these vectors can be chosen freely. The network will have the same number of neurons in the third layer as the number of desired signals. A set of input/output pairs will be used to train the network; the presented pair will be denoted with index k. Each learning pair is composed of n input signals x_i (i = 1, 2, ..., n) and n_3 corresponding desired output signals d_i (i = 1, 2, ..., n_3). In the first hidden layer there will be n_1 neurons and in the second hidden layer there will be n_2 neurons. The number of neurons in the hidden layers can be chosen arbitrarily and represents the complexity of the problem. It is a delicate task to choose this complexity for specific applications and often a trial-and-error approach is used. There are some ways to automatically increase or decrease the number of neurons in the hidden layers during learning. This approach is called pruning. Sometimes this is done by looking at each neuron's output state during the learning cycles; if it is similar to another neuron's state (or similar with opposite sign), then one of them could be removed. Some levels for how much they may vary are proposed in [ ]. Too many neurons in the hidden layers decrease the generalization of the network and too few neurons will not solve the task. The learning of the MLP consists in adjusting all the weights in order to minimize the error measure between the desired and the actual outcome. An initial set of weights must be chosen.

Fig 4.2  Learning of a Multiple Layer Perceptron, MLP (3-layer), [7]

If multiple outputs are chosen, then the sum of all the errors is to be minimized for every input/output pair, that is

min{ J_k },   J_k = (1/2) Σ_{i=1}^{n_3} (d_ik - y_ik)^2 = (1/2) Σ_{i=1}^{n_3} e_ik^2.

Here the L2-norm is used; in section 5.2 other error measures are discussed. This function is called the local error function, since this error is to be regarded as the instantaneous error for the k:th input/output pair. The global error function (also called the performance function) is the sum of the above function over all training data. If the task to be solved is stationary, the latter function is to be minimized; on the other hand, if an adaptation during the learning patterns is desired, then the local error function must be minimized. This can only be done in an on-line update system, which is of concern here. A discussion of this topic will be presented in section 6.

When minimizing the local error function the global error function is not always minimized. It has been proved that, if the learning parameter (step size) is sufficiently small, the local minimization will lead to a global minimization [3]. When deriving the backpropagation algorithm, the differential equations will be initialized by the gradient method for minimization of the local error function, that is

dw_ij^[s]/dt = -η ∂J_k/∂w_ij^[s],   η > 0.

(Here the index [s] stands for layer s, i stands for the neuron and j stands for the j:th weight; k is the k:th input/output pattern.)

First the update rules for the output layer will be derived (s = 3). Using the steepest descent approach gives

dw_ij^[3]/dt = -η ∂J_k/∂w_ij^[3] = -η (∂J_k/∂u_i^[3]) (∂u_i^[3]/∂w_ij^[3]),

since

u_i^[3] = Σ_j w_ij^[3] o_j^[2],

where o^[2] is the input vector to layer 3, the same as the output vector from layer 2 (see fig 4.2 and compare with the single perceptron). This becomes

dw_ij^[3]/dt = -η (∂J_k/∂e_ik) (∂e_ik/∂u_i^[3]) o_j^[2] = η e_ik Ψ_i^[3]'(u_i^[3]) o_j^[2].

If we define the local error δ_i^[3] as

δ_i^[3] = -∂J_k/∂u_i^[3] = e_ik Ψ_i^[3]'(u_i^[3]) = (d_ik - y_ik) ∂Ψ_i^[3]/∂u_i^[3],

this can be written as

dw_ij^[3]/dt = η δ_i^[3] o_j^[2],

and the discrete vector update rule becomes

w_i^[3](k+1) = w_i^[3](k) + μ e_i(k) Ψ_i^[3]'(u_i^[3](k)) o^[2](k)    [Output layer, neuron i]

where the index k stands for the k:th iteration of input/output patterns and the index i is the i:th neuron in the output layer. Here the step size μ differs from the learning parameter η in the continuous case; generally it must be smaller to ensure convergence. The gradient above depends on the transfer function chosen. In this approach the transfer functions can be intermixed in any order, as mentioned earlier, see section 2.5.

For the second hidden layer the error is not directly reachable, so the derivatives must be taken with regard to quantities already calculated and others that can be evaluated. This still holds:

dw_ij^[2]/dt = -η ∂J_k/∂w_ij^[2] = -η (∂J_k/∂u_i^[2]) (∂u_i^[2]/∂w_ij^[2]) = η δ_i^[2] o_j^[1].

The difference from the output layer lies in the local error δ_i^[2]. As before, the local error is defined as

δ_i^[2] = -∂J_k/∂u_i^[2],

and this gives (using the chain rule)

δ_i^[2] = -(∂J_k/∂o_i^[2]) (∂o_i^[2]/∂u_i^[2]).

Since o_i^[2] = Ψ_i^[2](u_i^[2]), we have

δ_i^[2] = -(∂J_k/∂o_i^[2]) ∂Ψ_i^[2]/∂u_i^[2].

The second term on the right is the derivative of the transfer function used for the i:th neuron. The reason why this algorithm is called backpropagation is seen when calculating ∂J_k/∂o_i^[2], since information from the output layer update is used here to update the second hidden layer:

∂J_k/∂o_i^[2] = Σ_{p=1}^{n_3} (∂J_k/∂u_p^[3]) (∂u_p^[3]/∂o_i^[2]) = Σ_{p=1}^{n_3} (∂J_k/∂u_p^[3]) ∂/∂o_i^[2] ( Σ_j w_pj^[3] o_j^[2] ) = -Σ_{p=1}^{n_3} δ_p^[3] w_pi^[3].

This gives the local error for the second hidden layer as

δ_i^[2] = ( Σ_{p=1}^{n_3} δ_p^[3] w_pi^[3] ) ∂Ψ_i^[2]/∂u_i^[2].

In vector notation this becomes

w_i^[2](k+1) = w_i^[2](k) + μ ( Σ_{p=1}^{n_3} δ_p^[3] w_pi^[3] ) Ψ_i^[2]'(u_i^[2](k)) o^[1](k)    [Second hidden layer, neuron i]

Here w_pi^[3] means neuron p, weight i, in the output layer. The information that is backpropagated consists of the local errors and the updated weights from layer three. In a similar way the update rule for the first hidden layer is derived, now with regard to the second hidden layer's local errors and weights. This gives the vector update rule for the first hidden layer as

w_i^[1](k+1) = w_i^[1](k) + μ ( Σ_{p=1}^{n_2} δ_p^[2] w_pi^[2] ) Ψ_i^[1]'(u_i^[1](k)) x(k)    [First hidden layer, neuron i]

Here w_pi^[2] means neuron p, weight i, in the second hidden layer. The extension to more hidden layers is straightforward but not dealt with here.
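The update rules above can be collected into one on-line step. The following is a compact sketch of my own (not the thesis implementation) for a network with two bsig hidden layers and a linear output layer; the small sin(x) usage at the end only loosely mirrors the experiment in section 7.1 and all constants are arbitrary.

```python
import numpy as np

# One on-line backpropagation step for the 3-layer network derived above.
# Biases are absorbed by augmenting each layer input with a constant 1.
def augment(v):
    return np.concatenate(([1.0], v))

def tanh_prime(u):
    return 1.0 - np.tanh(u) ** 2

def online_step(x, d, W, mu=0.05):
    """W = [W1, W2, W3]; bsig hidden layers, linear output layer."""
    # Forward pass, keeping the inner states u[s] and outputs o[s].
    u1 = W[0] @ augment(x);  o1 = np.tanh(u1)
    u2 = W[1] @ augment(o1); o2 = np.tanh(u2)
    u3 = W[2] @ augment(o2); y  = u3              # linear output, unit slope
    e = d - y

    # Local errors delta[s] = e * Psi'(u), backpropagated through the weights
    # (the bias column is skipped when propagating, since it feeds no neuron).
    delta3 = e * 1.0                              # Psi' = 1 for the linear output
    delta2 = (W[2][:, 1:].T @ delta3) * tanh_prime(u2)
    delta1 = (W[1][:, 1:].T @ delta2) * tanh_prime(u1)

    # On-line weight updates w(k+1) = w(k) + mu * delta * (layer input).
    W[0] += mu * np.outer(delta1, augment(x))
    W[1] += mu * np.outer(delta2, augment(o1))
    W[2] += mu * np.outer(delta3, augment(o2))
    return y, e

# Tiny usage example: learn y = sin(x) on-line over a number of epochs.
rng = np.random.default_rng(0)
sizes = [1, 4, 4, 1]
W = [rng.normal(scale=0.5, size=(m, n + 1)) for n, m in zip(sizes[:-1], sizes[1:])]
xs = np.linspace(-np.pi, np.pi, 100)
for epoch in range(500):
    for x in rng.permutation(xs):                 # mix the input patterns
        online_step(np.array([x]), np.array([np.sin(x)]), W)
print(sum(online_step(np.array([x]), np.array([np.sin(x)]), W, mu=0.0)[1] ** 2 for x in xs))
```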

5. Additional features for improving Convergence speed and Generalization

5.1 Algorithm with Momentum Updating

The standard backpropagation has some drawbacks: the learning parameter (step size in the discrete case) should be chosen small in order to provide minimization of the global error function, but a small learning parameter slows down the learning process. In the discrete case the step size must also be kept small to ensure convergence. A large learning parameter is desirable for rapid convergence and for minimizing the risk of getting trapped in local minima or very flat plateaus of the error surface. One way to improve the backpropagation algorithm is to smooth the weights by overrelaxation. This is done by adding a fraction of the previous weight update to the actual weight update. The fraction is called the momentum term and the update rule is modified to

Δw_ij^[s](k) = η δ_i^[s] o_j^[s-1] + α Δw_ij^[s](k-1),   (s = 1, 2, 3, with o^[0] = x),

where α is a parameter which controls the amount of momentum (0 ≤ α < 1). This is done for each layer [s] separately. The momentum concept will increase the speed of convergence and at the same time improve the steady-state performance of the algorithm. If we are on a plateau of the error surface, where the gradient is approximately the same for two consecutive steps, the effective step size will be

η_eff = η / (1 - α),

since

Δw_ij^[s](k) = -η ∂J_k/∂w_ij^[s](k) + α Δw_ij^[s](k-1) ≈ -(η / (1 - α)) ∂J_k/∂w_ij^[s](k).

If we are in a local or global minimum, the momentum term will have the opposite sign to the local error and thus decrease the effective step size. The result is that the learning rate is increased without magnifying the parasitic oscillations at minima.

5.2 Algorithm with Non-Euclidean Error Signals

So far the optimization criterion has been to minimize the local/global error surface on the basis of the least-squares error, the Euclidean or L2-norm. When the output layer is large, and/or the input signals are contaminated with non-Gaussian noise (especially wild spiky noise), other error measures are needed in order to improve the learning of the network. In this general approach the error measure will be derived for the L1-norm and for a continuous selection of norms between the Euclidean and the L1-norm. The norms will be approximations with minor differences to the true norms. Consider the performance (error) function defined as

J_k = Σ_{i=1}^{n_3} σ(e_ik),   where e_ik = d_ik - y_ik as before.

n_3 is the number of neurons in the output layer and σ is a function (typically convex) called the loss function. In the case σ(e) = e^2/2 the L2-norm is obtained, and when σ(e) = |e| the L1-norm is obtained. There are many different loss functions proposed for a variety of applications. A general loss function which easily gives freedom in choosing the shape of the function is the logistic function

σ(e) = β^2 ln cosh(e/β).

Figure 5.1 shows the logistic function for different selections of β.

Fig 5.1  The logistic function for different selections of β

When β is close to one the function approximates the absolute (L1) norm, and when β is large it approximates the Euclidean (L2) norm. The derivative of this loss function must be calculated in order to employ it in the network:

∂J_k/∂y_ik = (∂J_k/∂e_ik)(∂e_ik/∂y_ik) = -∂J_k/∂e_ik,   e_ik = d_ik - y_ik,   (i = 1, 2, ..., n_3).

Introducing h(g) = ln(g) with ∂h/∂g = 1/g, g(y) = cosh(y) with ∂g/∂y = sinh(y), and y(e) = e/β with ∂y/∂e = 1/β, the chain rule gives

∂J_k/∂e_ik = β^2 (∂h/∂g)(∂g/∂y)(∂y/∂e) = β^2 (sinh(e_ik/β) / cosh(e_ik/β)) (1/β)
           = β tanh(e_ik/β) = β (e^{e_ik/β} - e^{-e_ik/β}) / (e^{e_ik/β} + e^{-e_ik/β}).

Now using the backpropagation algorithm with this logistic loss function gives

dw_ij/dt = -μ ∂J_k/∂w_ij = -μ (∂J_k/∂y_ik)(∂y_ik/∂u_i)(∂u_i/∂w_ij),

and since ∂J_k/∂y_ik = -β tanh(e_ik/β) and ∂y_ik/∂u_i = ∂Ψ_i/∂u_i, the discrete update rule for the output layer becomes

w_i^[3](k+1) = w_i^[3](k) + μ β tanh(e_i(k)/β) Ψ_i^[3]'(u_i^[3](k)) o^[2](k)    [Output layer, neuron i, logistic loss function]

For the hidden layers the update rules will be identical to those with the Euclidean error norm. The effect of this loss function will be backpropagated into the inner parts of the network.

5.3 Algorithm with an Adaptation of the Slope of the Activation Functions

So far the slope of the activation functions has been fixed (typically γ = 1). Kruschke and Movellan [4] have shown that an adaptation of the slope (gain) of all activation functions greatly increases the learning speed and improves the generalization. In this approach the slopes of bsig and usig (see section 2.5) are modified when an adaptation of the slope is applied. No adaptation is applied to the pure linear transfer functions, because this could lead to instability. When the slope is modified for each presented input/output pattern, this gives the following for each neuron i in each layer [s]:

o_i^[s] = Ψ_i^[s]( γ_i^[s] u_i^[s] ),   where Ψ is bsig or usig as in section 2.5.

Applying the steepest descent approach when minimizing the instantaneous error with regard to the slope γ gives

dγ_i^[s]/dt = -η_γ ∂J_k/∂γ_i^[s] = -η_γ (∂J_k/∂y_i^[s])(∂y_i^[s]/∂γ_i^[s]) = η_γ δ_i^[s] u_i^[s] / γ_i^[s].

Here the local error from the synaptic weight update has been reused, since the only difference between ∂y_i^[s]/∂γ_i^[s] and ∂y_i^[s]/∂u_i^[s] is that the slope γ here is treated as the variable and the inner state u as a constant. The discrete forms of this update with a general loss function will be

γ_i^[3](k+1) = γ_i^[3](k) + μ_γ σ'(e_i(k)) Ψ_i^[3]'(u_i^[3](k)) u_i^[3](k) / γ_i^[3](k)    [Output layer, neuron i, general loss function]

γ_i^[2](k+1) = γ_i^[2](k) + μ_γ ( Σ_{p=1}^{n_3} δ_p^[3] w_pi^[3] ) Ψ_i^[2]'(u_i^[2](k)) u_i^[2](k) / γ_i^[2](k)    [Second hidden layer, neuron i]

γ_i^[1](k+1) = γ_i^[1](k) + μ_γ ( Σ_{p=1}^{n_2} δ_p^[2] w_pi^[2] ) Ψ_i^[1]'(u_i^[1](k)) u_i^[1](k) / γ_i^[1](k)    [First hidden layer, neuron i]

The local errors are those already calculated in the update of the synaptic weights, which makes this extension cheap with regard to the extra computation needed.

5.4 Adaptation of the learning rate and Mixing of input patterns

Besides the addition of a momentum term there is another way to accomplish rapid convergence while keeping the oscillations small at error surface minima: an adaptation of the learning rate (step size) will meet those criteria. In a batch update system it is easier to keep an effective magnitude of the learning rate, since there is more information about the learning in the global error function than there is in the local error function. More differences between the two will be discussed in section 6. In an on-line update system there is a simple rule which can control the magnitude of the learning parameter. The update of the weights will be as before but with a variable learning parameter,

Δw_ij(k) = -η(k) ∂J_k/∂w_ij,   with   η(k) = min( (J_k - J_0) / ||∇_w J_k||^2 , η_max ),

where η_max is the maximum learning rate and J_0 is a small offset error. These values will differ with the application. It can be understood intuitively that when the error surface is at a high level the numerator in the expression will be large and therefore the step size will be large. The norm of the gradient in the denominator will be small near a minimum, which also increases the magnitude. If, on the other hand, we are at a low level of the error surface, the expression will set the step size to a lower value. It is critical that the offset error is set to what is believed to be the smallest error possible to attain. Thus, we need some a priori knowledge about the task to be solved. This simple scheme has not been very successful in my empirical material, so some modification would be necessary. In this thesis no additional effort has been made to solve this issue, see further development in section 8. The learning parameter can also be updated uniquely for every neuron, and some work has been done in this field [5]. If one tries to accomplish this in an on-line update system, there is a great risk of instability.

Input patterns that are presented in some kind of order can disturb the learning process if no adaptation during epochs is desired. This follows from the fact that the instantaneous error is minimized and not the global error. The algorithm can sometimes perform better when minimizing the local error rather than the global one. But if the application demands that the adaptation is shut off during use, this will lead to a problem: the weights given by the last update of the learning phase will then perform badly when facing the problem of minimizing the total error. This problem arises only in on-line update, since the minimization in a batch system is always done on the global error. The solution is simply to present the input/output patterns in a random order, where the order is changed for every epoch. In section 7.1 an example which points this out is presented.

5.5 A combined stochastic-deterministic weight update

When optimizing a function by following the trajectories of differential equations, the global minimum is not likely to be found. By investigating neural cells of living creatures it has been found that they are essentially stochastic in nature. This can be employed in an artificial environment by introducing a stochastic term in the function which is to be minimized. Considering a general function J(w) and minimizing it with a stochastic term added gives the following problem:

min( J̃(w) ) = min( J(w) + c(t) Σ_{i=1}^{n} w_i N_i(t) ),

where n is the number of variables in the vector w, N is a vector of independent random variables (ideally white noise) and c(t) is a parameter controlling the magnitude of the noise. c(t) should reach zero as time goes to infinity. In this approach c(t) has been chosen as an exponentially decreasing function,

c(t) = β e^{-αt},

where β and α are chosen to fit the application. In this thesis the starting value β is chosen as an absolute value, and the rate of noise damping α is adjusted as a relative time in which the magnitude of the function c(t) has decreased by 90 percent at the end of all training presentations. The resulting differential equation becomes

dw_i/dt = -η ( ∂J_k/∂w_i + c(t) N_i(t) ).

In an on-line update system there is naturally a stochastic part in the update quantities, since no averaging is done during pattern presentations, so the magnitude of c(t) should be chosen with care in order to avoid divergence.
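To show how the refinements of this section could be combined around one on-line update, here is a rough sketch of my own. The function backprop_gradients is only a placeholder standing for the gradient computation of section 4, and every constant (α, β, η_max, the noise level, the tiny linear usage example at the end) is an arbitrary choice for illustration, not a value taken from the thesis.

```python
import numpy as np

# Sketch: momentum (5.1), logistic loss derivative (5.2), adaptive step size (5.4),
# per-epoch mixing of the patterns (5.4) and a decaying stochastic term c(t) (5.5).
def logistic_loss_grad(e, beta=1.0):
    # derivative of sigma(e) = beta^2 * ln cosh(e / beta)
    return beta * np.tanh(e / beta)

def adaptive_step(J_k, grad_norm_sq, J_offset=1e-3, eta_max=0.1):
    # eta(k) = min((J_k - J_offset) / ||grad J_k||^2, eta_max)
    return min(max(J_k - J_offset, 0.0) / (grad_norm_sq + 1e-12), eta_max)

def train(W, patterns, backprop_gradients, epochs=50, alpha=0.3,
          beta_noise=0.05, rng=np.random.default_rng(0)):
    dW_prev = [np.zeros_like(Ws) for Ws in W]
    total = epochs * len(patterns)
    a_noise = -np.log(0.1) / total                  # 90 % noise decay over training
    step = 0
    for epoch in range(epochs):
        for idx in rng.permutation(len(patterns)):  # mix input patterns every epoch
            x, d = patterns[idx]
            grads, J_k = backprop_gradients(W, x, d)    # local error and its gradients
            gnorm2 = sum(float(np.sum(g * g)) for g in grads)
            eta = adaptive_step(J_k, gnorm2)
            c = beta_noise * np.exp(-a_noise * step)    # c(t) = beta * exp(-alpha t)
            for s in range(len(W)):
                dW = (-eta * grads[s]
                      + alpha * dW_prev[s]              # momentum term
                      + c * rng.standard_normal(W[s].shape))   # stochastic term
                W[s] += dW
                dW_prev[s] = dW
            step += 1
    return W

# Tiny usage with a single linear neuron (LMS-style gradients) as the "network":
rng = np.random.default_rng(1)
pats = [(x, np.array([x @ np.array([1.0, -2.0])])) for x in rng.normal(size=(50, 2))]
def lms_grads(W, x, d):
    e = d - W[0] @ x
    g = logistic_loss_grad(e, beta=2.0)       # robust error signal (section 5.2)
    return [np.outer(-g, x)], float(0.5 * e @ e)
print(train([np.zeros((1, 2))], pats, lms_grads)[0])   # near [1, -2] up to residual noise
```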

6. Comparison of a Batch update system with an On-Line Update system

There are some differences between a batch system and an on-line system which must be outlined. Both have advantages and of course some disadvantages. Here some key differences are presented:

- The on-line approach has to be used if all training examples are not available before the learning starts and an adaptation to the actual (on-line) stream of input/output patterns is desired.
- The on-line learning algorithm is usually more efficient with regard to computational and memory requirements when the number of training patterns is large.
- The on-line procedure introduces some randomness (noise) that often may help in escaping from local minima.
- Usually the on-line algorithm is faster and more efficient than the batch procedure for large-scale classification problems. This follows from the fact that many training examples usually possess redundant information and give approximately the same gradient contributions, so waiting to update the weights until after a whole epoch wastes time.
- For high-precision mapping the batch procedure can give better results, since more sophisticated optimization methods can be used (for instance Newton's method).
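The structural difference listed above can be made concrete with a toy comparison (my own illustration). The LMS-style gradient here is only a stand-in for the backpropagated gradient of a full network, and all constants are arbitrary.

```python
import numpy as np

# Batch update: accumulate the gradient over the whole epoch, one update per epoch.
# On-line update: adjust the weights after every single pattern.
def grad(w, x, d):
    return -(d - w @ x) * x              # gradient of the local error 0.5*(d - w.x)^2

def batch_epoch(w, X, D, eta=0.05):
    g = sum(grad(w, x, d) for x, d in zip(X, D)) / len(X)
    return w - eta * g                   # one update per epoch (global error)

def online_epoch(w, X, D, mu=0.05, rng=np.random.default_rng(0)):
    for i in rng.permutation(len(X)):
        w = w - mu * grad(w, X[i], D[i])  # one update per pattern (local error)
    return w

X = np.random.default_rng(1).normal(size=(100, 4))
D = X @ np.array([1.0, -2.0, 0.5, 0.0])
w_b = w_o = np.zeros(4)
for _ in range(200):
    w_b = batch_epoch(w_b, X, D)
    w_o = online_epoch(w_o, X, D)
print(np.round(w_b, 3), np.round(w_o, 3))
```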

7. Empirical Results

7.1 Approximation of sin(x) with two neurons in the hidden layer

In this first test the input signals are real-valued scalars from an interval X of the real line. The desired (or target) signal is sin(x). The inner structure of the network is:

Number of layers: 2
Hidden neurons: 2 bipolar sigmoid functions
Output neuron: 1 linear function (slope of unity)
Biases: present for all neurons
Momentum term: not used
Error measure: Euclidean (L2-norm)
Adaptive slope: used, step size .5
Adaptation of learning rate: used
Stochastic term: not used
Mixing of inputs: used, random order for every epoch
Number of inputs: equally distributed in the set X
Number of epochs:

Results:
Resulting slopes: bsig .3956, bsig .945
SSE (Sum Squared Error) for the last epoch: .8
Instantaneous error for the last epoch: see figure 7.1.1.

Figure 7.1.1 shows the absolute error according to the L1-norm for all input/output pairs.

Fig 7.1.1  The instantaneous error for the last epoch in example 7.1

The match between the actual outcome and the true value sin(x) is shown in figure 7.1.2.

Fig 7.1.2  The actual outcome and the true value sin(x) in example 7.1

It is critical in this example to mix the inputs in order to find the solution that minimizes the global error function. The inputs are otherwise so well ordered that it is easier for the network to approximate a straight line and then update the slope of this line during the epochs. The match in this case can be very good (depending on the magnitude of the step size). In this problem the match can probably not be better than the one which minimizes the global error. This follows because the algorithm probably has enough degrees of freedom to exactly imitate the trigonometric function sin(x). I say probably because the eventual best result of a Neural Network cannot easily be proven. This is in fact one of the most important drawbacks of Neural Networks and the reason why they are of no interest in some applications.

7.2 Separation of signals with noise

In this example the input signal consists of three simple signals added together, after which a noise component is also added. The task here is to separate these three signals from the sum. In figure 7.2.1 the first plot shows the three components and the second plot shows the input signal. The third plot shows the separation after training the network.

Number of layers: 2
Hidden neurons: 5 bipolar sigmoid functions
Output neurons: 3 linear functions (slope of unity)
Filter length: samples
Biases: present for all neurons
Momentum term: not used
Error measure: Euclidean (L2-norm)
Adaptive slope: used
Adaptation of learning rate: used
Stochastic term: not used
Mixing of inputs: used, random order for every epoch
Number of input patterns: 6
Number of epochs: 4

Results:
Resulting slopes: bsig 6.46, bsig .4587, bsig 3.48, bsig , bsig
SSE (Sum Squared Error) for the last epoch:

Fig 7.2.1  The true components, the input signal and the resulting separation

This example shows that even signals with abrupt drops and rises can be separated. Here the hidden layer must provide satisfactory information for all three output neurons simultaneously. Studying the resulting slopes, one can realize that this solution could not be accomplished with slopes fixed at unity, which is usually the case. Of course this may not be the best solution, but slopes of high magnitude give the neuron a shape like the hard limiter, and this is well suited for signals with abrupt drops and rises (such as the square signal).

7.3 Processing of EEG-signals

This example is taken from real data sampled at the University of Linköping. The inputs are two EEG signals sampled from a person while the person is closing and opening his/her eyes at pre-specified time frames. The aim here is, by looking only at the EEG signals, to determine whether or not the person has his eyes open. The true logical value (looking/not looking) from one patient is used to train the network, and input signals from another patient are used to validate the result. The complexity of this problem is unknown. The EEG signals were sampled at a 128 Hz sample rate. The structure chosen is:

Number of layers: 2
Hidden neurons: 3 bipolar sigmoid functions
Output neuron: 1 linear function (slope of unity)
Filter length: samples
Biases: present for all neurons
Momentum term: not used
Error measure: Euclidean (L2-norm)
Adaptive slope: used
Adaptation of learning rate: used
Stochastic term: not used
Mixing of inputs: used, random order for every epoch
Number of input patterns: 8
Number of epochs: 4

Results:
Resulting slopes: bsig .569, bsig .949, bsig

The input EEG signals are shown in figure 7.3.1. The information in the input signals is quite obvious and the classification problem is not too complex.

Fig 7.3.1  The EEG signals which are fed to the network

Fig 7.3.2  The Neural Network output for the EEG signals and the desired state

It can be seen from figure 7.3.2 that some spiky effects occurred. This probably comes from the fact that the input signal is spread widely. This can be reduced by processing the output of the neural network with a median filter, as shown in figure 7.3.3.
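A sketch (my own) of the median-filter post-processing just mentioned. The window length N = 11 is an assumed value chosen only for the illustration; the length used in the thesis was not preserved here.

```python
import numpy as np

# Median filtering of the network output to suppress isolated spiky decisions.
def median_filter(y, N=11):
    """Median filter using a symmetric window of length N around each sample."""
    y = np.asarray(y, dtype=float)
    half = N // 2
    padded = np.pad(y, half, mode="edge")
    return np.array([np.median(padded[i:i + N]) for i in range(len(y))])

# Example: a noisy binary decision signal with isolated spikes.
raw = np.repeat([0.0, 1.0, 0.0], 50)
raw[30] = 1.0; raw[80] = 0.0; raw[120] = 1.0      # spiky errors
clean = np.repeat([0.0, 1.0, 0.0], 50)
print(np.abs(median_filter(raw) - clean).max())   # spikes removed -> 0.0
```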

Fig 7.3.3  The Neural Network output of example 7.3, filtered with a median filter of length N, and the desired state

The output does not follow the desired signal exactly, and this is probably due to the fact that the desired signal differs from the real events. This difference could be caused by the difficulty of exactly measuring whether or not the person has his eyes open.

7.4 Echo-canceling in a Hybrid

In a telephone system, hybrids are used to separate speech in one direction from speech in the other direction. This is done so that amplifiers can be used to compensate for losses in the wiring. These hybrids can be regarded as converters from a 2-wire system to a 4-wire system, in which the directions are separated. Unfortunately, the separation is not perfect and some of the speech in one direction is induced into the other. This phenomenon will be apprehended as an echo from the user's point of view. Here the aim is to use a neural network to remove this induction (echo) by subtracting the echo from the channel. The training has been done with white noise as input and the output from the hybrid as the desired signal. Figure 7.4.1 shows the unprocessed signal and the difference between the unprocessed and the processed signal. Approximately 9 dB damping of the echo is achieved. True speech as input will probably give a better result, since the character of speech is mostly at low frequencies, and low-frequency parts are damped more here than higher frequencies.

Number of layers: 2
Hidden neurons: 5 bipolar sigmoid functions
Output neuron: 1 linear function (slope of unity)
Filter length: samples
Biases: present for all neurons
Momentum term: not used
Error measure: Euclidean (L2-norm)
Adaptive slope: used, step size .3
Adaptation of learning rate: used
Stochastic term: not used
Mixing of inputs: used, random order for every epoch
Number of input patterns:
Number of epochs: 5

Fig 7.4.1  Damping of the hybrid echo with white noise as input [dB]

7.5 Speech enhancement for a Hands-Free mobile telephone set

In a hands-free mobile telephone set used in a car, many sources of disturbance are present. The speech coming from the hands-free speaker will be heard in the hands-free microphones. This will be apprehended as an echo by the user at the far end of the communication. Noise from the engine and wind friction will cause degraded recognition of the speech coming from the person using the set. An approach to solve this problem and to enhance the speech is presented in [6]. The solution relies on the fact that hearing can be made directional by using a microphone array. This gives the algorithm used a foundation to solve the problem. In [6] linear methods have been used. In this thesis the sampled data and the structure proposed by [6] will be used. The difference here is that the linear model is replaced by a neural network. The structure of the chosen network is:

Number of layers: 2
Hidden neurons: bipolar sigmoid functions
Output neuron: 1 linear function (slope of unity)
Filter length: 256 samples
Biases: present for all neurons
Momentum term: not used
Error measure: Euclidean (L2-norm)
Adaptive slope: used/not used (both cases)
Adaptation of learning rate: used
Stochastic term: not used
Mixing of inputs: used, random order for every epoch
Number of input patterns: 48*6 (six microphones in the array)
Number of epochs: 3

Results:
SSE (Sum Squared Error) for the last epoch:
  with no adaptation of the slopes: 3.8
  with adaptation of the slopes:

Figure 7.5.1 shows the sum squared error (SSE) versus training epochs without any adaptation of the slopes. Figure 7.5.2 shows the SSE versus training epochs with adaptation of the slopes.

Fig 7.5.1  SSE versus epochs without adaptive slopes

Fig 7.5.2  SSE versus epochs with adaptive slopes

There are no major differences between the network with adaptation and without adaptation of the slopes. Examining the resulting slopes shows that they vary only between values close to unity. So, slopes fixed at unity are probably good values. Figure 7.5.3 shows the unprocessed signal in the first half of the plot and the enhanced signal in the second half. The unprocessed signal consists of the true speech signal in the beginning and the echo at the end. The signal has also been degraded with noise. For a detailed description of the preliminaries see [6]. Figure 7.5.4 shows the same as figure 7.5.3 but with slope adaptation.

Fig 7.5.3  Results using no adaptation of the slopes [dB]

Fig 7.5.4  Results using adaptation of the slopes [dB]

Comparison with the linear system: The damping is approximately 5 dB for the noise and about 6 dB for the echo. It should be mentioned that further training would probably give better results. In the results of [6] the damping of both noise and echo is less than this, and the distortion of the speech signal is quite noticeable. When listening to the resulting signal here, no actual distortion is noticed. This is a major advantage of using neural networks compared to the linear algorithm used in [6]. The network's task here has been to solve both the noise and the echo problem simultaneously. If only one of these tasks is facing the network, better results can be expected, see [6].

8. Further Development

The network adopted in this thesis can be improved in several ways. The adaptation of the step size described in section 5.4 has proven to be inadequate for keeping the oscillations of the weights small at error surface minima. This may be due to the naturally arising stochastic nature of an on-line update system. The norm of the gradient follows this stochastic nature and will therefore maintain the oscillations rather than damp them. Some kind of filtering of the instantaneous error and/or the gradient norm could perhaps give the step size update a damped nature. Of course, only past experience can be used in this filtering. Choosing the offset bias for this step size update is a delicate task. This bias would probably give better results if decreased gradually during learning. Some initial value must still be chosen, but this task is not so critical.

The effect of the combined stochastic-deterministic weight update described in section 5.5 is almost negligible in an on-line update system, due to its already stochastic nature. The scheme described in section 5.5 can be modified by replacing the function c(t) with an annealing schedule. The idea is to add some statistical information from the task and appropriately adjust c(t). The concept is taken from the field of mechanics. This approach, of course, will give unique solutions depending on the application.

The adding and deletion of neurons during learning (pruning), as described in section 4.2, can be adopted in order to minimize the needed a priori knowledge of the task to be solved.


Neural Networks & Learning Neural Netorks & Learnng. Introducton The basc prelmnares nvolved n the Artfcal Neural Netorks (ANN) are descrbed n secton. An Artfcal Neural Netorks (ANN) s an nformaton-processng paradgm that nspred

More information

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal

Inner Product. Euclidean Space. Orthonormal Basis. Orthogonal Inner Product Defnton 1 () A Eucldean space s a fnte-dmensonal vector space over the reals R, wth an nner product,. Defnton 2 (Inner Product) An nner product, on a real vector space X s a symmetrc, blnear,

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1 P. Guterrez Physcs 5153 Classcal Mechancs D Alembert s Prncple and The Lagrangan 1 Introducton The prncple of vrtual work provdes a method of solvng problems of statc equlbrum wthout havng to consder the

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

Introduction to the Introduction to Artificial Neural Network

Introduction to the Introduction to Artificial Neural Network Introducton to the Introducton to Artfcal Neural Netork Vuong Le th Hao Tang s sldes Part of the content of the sldes are from the Internet (possbly th modfcatons). The lecturer does not clam any onershp

More information

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M

CIS526: Machine Learning Lecture 3 (Sept 16, 2003) Linear Regression. Preparation help: Xiaoying Huang. x 1 θ 1 output... θ M x M CIS56: achne Learnng Lecture 3 (Sept 6, 003) Preparaton help: Xaoyng Huang Lnear Regresson Lnear regresson can be represented by a functonal form: f(; θ) = θ 0 0 +θ + + θ = θ = 0 ote: 0 s a dummy attrbute

More information

Lecture 23: Artificial neural networks

Lecture 23: Artificial neural networks Lecture 23: Artfcal neural networks Broad feld that has developed over the past 20 to 30 years Confluence of statstcal mechancs, appled math, bology and computers Orgnal motvaton: mathematcal modelng of

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

Difference Equations

Difference Equations Dfference Equatons c Jan Vrbk 1 Bascs Suppose a sequence of numbers, say a 0,a 1,a,a 3,... s defned by a certan general relatonshp between, say, three consecutve values of the sequence, e.g. a + +3a +1

More information

Neural networks. Nuno Vasconcelos ECE Department, UCSD

Neural networks. Nuno Vasconcelos ECE Department, UCSD Neural networs Nuno Vasconcelos ECE Department, UCSD Classfcaton a classfcaton problem has two types of varables e.g. X - vector of observatons (features) n the world Y - state (class) of the world x X

More information

Markov Chain Monte Carlo Lecture 6

Markov Chain Monte Carlo Lecture 6 where (x 1,..., x N ) X N, N s called the populaton sze, f(x) f (x) for at least one {1, 2,..., N}, and those dfferent from f(x) are called the tral dstrbutons n terms of mportance samplng. Dfferent ways

More information

Multigradient for Neural Networks for Equalizers 1

Multigradient for Neural Networks for Equalizers 1 Multgradent for Neural Netorks for Equalzers 1 Chulhee ee, Jnook Go and Heeyoung Km Department of Electrcal and Electronc Engneerng Yonse Unversty 134 Shnchon-Dong, Seodaemun-Ku, Seoul 1-749, Korea ABSTRACT

More information

ECE559VV Project Report

ECE559VV Project Report ECE559VV Project Report (Supplementary Notes Loc Xuan Bu I. MAX SUM-RATE SCHEDULING: THE UPLINK CASE We have seen (n the presentaton that, for downlnk (broadcast channels, the strategy maxmzng the sum-rate

More information

Solving Nonlinear Differential Equations by a Neural Network Method

Solving Nonlinear Differential Equations by a Neural Network Method Solvng Nonlnear Dfferental Equatons by a Neural Network Method Luce P. Aarts and Peter Van der Veer Delft Unversty of Technology, Faculty of Cvlengneerng and Geoscences, Secton of Cvlengneerng Informatcs,

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

9 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

9 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 - Chapter 9R -Davd Klenfeld - Fall 2005 9 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys a set

More information

Outline. Communication. Bellman Ford Algorithm. Bellman Ford Example. Bellman Ford Shortest Path [1]

Outline. Communication. Bellman Ford Algorithm. Bellman Ford Example. Bellman Ford Shortest Path [1] DYNAMIC SHORTEST PATH SEARCH AND SYNCHRONIZED TASK SWITCHING Jay Wagenpfel, Adran Trachte 2 Outlne Shortest Communcaton Path Searchng Bellmann Ford algorthm Algorthm for dynamc case Modfcatons to our algorthm

More information

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM

ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM ELASTIC WAVE PROPAGATION IN A CONTINUOUS MEDIUM An elastc wave s a deformaton of the body that travels throughout the body n all drectons. We can examne the deformaton over a perod of tme by fxng our look

More information

8 Derivation of Network Rate Equations from Single- Cell Conductance Equations

8 Derivation of Network Rate Equations from Single- Cell Conductance Equations Physcs 178/278 - Davd Klenfeld - Wnter 2015 8 Dervaton of Network Rate Equatons from Sngle- Cell Conductance Equatons We consder a network of many neurons, each of whch obeys a set of conductancebased,

More information

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS

NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS IJRRAS 8 (3 September 011 www.arpapress.com/volumes/vol8issue3/ijrras_8_3_08.pdf NON-CENTRAL 7-POINT FORMULA IN THE METHOD OF LINES FOR PARABOLIC AND BURGERS' EQUATIONS H.O. Bakodah Dept. of Mathematc

More information

CHALMERS, GÖTEBORGS UNIVERSITET. SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD

CHALMERS, GÖTEBORGS UNIVERSITET. SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD CHALMERS, GÖTEBORGS UNIVERSITET SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS COURSE CODES: FFR 35, FIM 72 GU, PhD Tme: Place: Teachers: Allowed materal: Not allowed: January 2, 28, at 8 3 2 3 SB

More information

Support Vector Machines. Vibhav Gogate The University of Texas at dallas

Support Vector Machines. Vibhav Gogate The University of Texas at dallas Support Vector Machnes Vbhav Gogate he Unversty of exas at dallas What We have Learned So Far? 1. Decson rees. Naïve Bayes 3. Lnear Regresson 4. Logstc Regresson 5. Perceptron 6. Neural networks 7. K-Nearest

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

Multi-layer neural networks

Multi-layer neural networks Lecture 0 Mult-layer neural networks Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Lnear regresson w Lnear unts f () Logstc regresson T T = w = p( y =, w) = g( w ) w z f () = p ( y = ) w d w d Gradent

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

8 Derivation of Network Rate Equations from Single- Cell Conductance Equations

8 Derivation of Network Rate Equations from Single- Cell Conductance Equations Physcs 178/278 - Davd Klenfeld - Wnter 2019 8 Dervaton of Network Rate Equatons from Sngle- Cell Conductance Equatons Our goal to derve the form of the abstract quanttes n rate equatons, such as synaptc

More information

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018 INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton

More information

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle

More information

Some modelling aspects for the Matlab implementation of MMA

Some modelling aspects for the Matlab implementation of MMA Some modellng aspects for the Matlab mplementaton of MMA Krster Svanberg krlle@math.kth.se Optmzaton and Systems Theory Department of Mathematcs KTH, SE 10044 Stockholm September 2004 1. Consdered optmzaton

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

COS 521: Advanced Algorithms Game Theory and Linear Programming

COS 521: Advanced Algorithms Game Theory and Linear Programming COS 521: Advanced Algorthms Game Theory and Lnear Programmng Moses Charkar February 27, 2013 In these notes, we ntroduce some basc concepts n game theory and lnear programmng (LP). We show a connecton

More information

Digital Signal Processing

Digital Signal Processing Dgtal Sgnal Processng Dscrete-tme System Analyss Manar Mohasen Offce: F8 Emal: manar.subh@ut.ac.r School of IT Engneerng Revew of Precedent Class Contnuous Sgnal The value of the sgnal s avalable over

More information

Online Classification: Perceptron and Winnow

Online Classification: Perceptron and Winnow E0 370 Statstcal Learnng Theory Lecture 18 Nov 8, 011 Onlne Classfcaton: Perceptron and Wnnow Lecturer: Shvan Agarwal Scrbe: Shvan Agarwal 1 Introducton In ths lecture we wll start to study the onlne learnng

More information

TOPICS MULTIPLIERLESS FILTER DESIGN ELEMENTARY SCHOOL ALGORITHM MULTIPLICATION

TOPICS MULTIPLIERLESS FILTER DESIGN ELEMENTARY SCHOOL ALGORITHM MULTIPLICATION 1 2 MULTIPLIERLESS FILTER DESIGN Realzaton of flters wthout full-fledged multplers Some sldes based on support materal by W. Wolf for hs book Modern VLSI Desgn, 3 rd edton. Partly based on followng papers:

More information

Hidden Markov Models & The Multivariate Gaussian (10/26/04)

Hidden Markov Models & The Multivariate Gaussian (10/26/04) CS281A/Stat241A: Statstcal Learnng Theory Hdden Markov Models & The Multvarate Gaussan (10/26/04) Lecturer: Mchael I. Jordan Scrbes: Jonathan W. Hu 1 Hdden Markov Models As a bref revew, hdden Markov models

More information

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0

n α j x j = 0 j=1 has a nontrivial solution. Here A is the n k matrix whose jth column is the vector for all t j=0 MODULE 2 Topcs: Lnear ndependence, bass and dmenson We have seen that f n a set of vectors one vector s a lnear combnaton of the remanng vectors n the set then the span of the set s unchanged f that vector

More information

One-sided finite-difference approximations suitable for use with Richardson extrapolation

One-sided finite-difference approximations suitable for use with Richardson extrapolation Journal of Computatonal Physcs 219 (2006) 13 20 Short note One-sded fnte-dfference approxmatons sutable for use wth Rchardson extrapolaton Kumar Rahul, S.N. Bhattacharyya * Department of Mechancal Engneerng,

More information

Lecture 14: Forces and Stresses

Lecture 14: Forces and Stresses The Nuts and Bolts of Frst-Prncples Smulaton Lecture 14: Forces and Stresses Durham, 6th-13th December 2001 CASTEP Developers Group wth support from the ESF ψ k Network Overvew of Lecture Why bother? Theoretcal

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso

Supplement: Proofs and Technical Details for The Solution Path of the Generalized Lasso Supplement: Proofs and Techncal Detals for The Soluton Path of the Generalzed Lasso Ryan J. Tbshran Jonathan Taylor In ths document we gve supplementary detals to the paper The Soluton Path of the Generalzed

More information

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14

APPROXIMATE PRICES OF BASKET AND ASIAN OPTIONS DUPONT OLIVIER. Premia 14 APPROXIMAE PRICES OF BASKE AND ASIAN OPIONS DUPON OLIVIER Prema 14 Contents Introducton 1 1. Framewor 1 1.1. Baset optons 1.. Asan optons. Computng the prce 3. Lower bound 3.1. Closed formula for the prce

More information

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law:

Introduction to Vapor/Liquid Equilibrium, part 2. Raoult s Law: CE304, Sprng 2004 Lecture 4 Introducton to Vapor/Lqud Equlbrum, part 2 Raoult s Law: The smplest model that allows us do VLE calculatons s obtaned when we assume that the vapor phase s an deal gas, and

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

Tracking with Kalman Filter

Tracking with Kalman Filter Trackng wth Kalman Flter Scott T. Acton Vrgna Image and Vdeo Analyss (VIVA), Charles L. Brown Department of Electrcal and Computer Engneerng Department of Bomedcal Engneerng Unversty of Vrgna, Charlottesvlle,

More information

Lecture 20: November 7

Lecture 20: November 7 0-725/36-725: Convex Optmzaton Fall 205 Lecturer: Ryan Tbshran Lecture 20: November 7 Scrbes: Varsha Chnnaobreddy, Joon Sk Km, Lngyao Zhang Note: LaTeX template courtesy of UC Berkeley EECS dept. Dsclamer:

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

The Feynman path integral

The Feynman path integral The Feynman path ntegral Aprl 3, 205 Hesenberg and Schrödnger pctures The Schrödnger wave functon places the tme dependence of a physcal system n the state, ψ, t, where the state s a vector n Hlbert space

More information

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach A Bayes Algorthm for the Multtask Pattern Recognton Problem Drect Approach Edward Puchala Wroclaw Unversty of Technology, Char of Systems and Computer etworks, Wybrzeze Wyspanskego 7, 50-370 Wroclaw, Poland

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence)

Dynamic Programming. Preview. Dynamic Programming. Dynamic Programming. Dynamic Programming (Example: Fibonacci Sequence) /24/27 Prevew Fbonacc Sequence Longest Common Subsequence Dynamc programmng s a method for solvng complex problems by breakng them down nto smpler sub-problems. It s applcable to problems exhbtng the propertes

More information

Classification as a Regression Problem

Classification as a Regression Problem Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class

More information

4DVAR, according to the name, is a four-dimensional variational method.

4DVAR, according to the name, is a four-dimensional variational method. 4D-Varatonal Data Assmlaton (4D-Var) 4DVAR, accordng to the name, s a four-dmensonal varatonal method. 4D-Var s actually a drect generalzaton of 3D-Var to handle observatons that are dstrbuted n tme. The

More information

Suppose that there s a measured wndow of data fff k () ; :::; ff k g of a sze w, measured dscretely wth varable dscretzaton step. It s convenent to pl

Suppose that there s a measured wndow of data fff k () ; :::; ff k g of a sze w, measured dscretely wth varable dscretzaton step. It s convenent to pl RECURSIVE SPLINE INTERPOLATION METHOD FOR REAL TIME ENGINE CONTROL APPLICATIONS A. Stotsky Volvo Car Corporaton Engne Desgn and Development Dept. 97542, HA1N, SE- 405 31 Gothenburg Sweden. Emal: astotsky@volvocars.com

More information

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests Smulated of the Cramér-von Mses Goodness-of-Ft Tests Steele, M., Chaselng, J. and 3 Hurst, C. School of Mathematcal and Physcal Scences, James Cook Unversty, Australan School of Envronmental Studes, Grffth

More information

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS Avalable onlne at http://sck.org J. Math. Comput. Sc. 3 (3), No., 6-3 ISSN: 97-537 COMPARISON OF SOME RELIABILITY CHARACTERISTICS BETWEEN REDUNDANT SYSTEMS REQUIRING SUPPORTING UNITS FOR THEIR OPERATIONS

More information

Solution Thermodynamics

Solution Thermodynamics Soluton hermodynamcs usng Wagner Notaton by Stanley. Howard Department of aterals and etallurgcal Engneerng South Dakota School of nes and echnology Rapd Cty, SD 57701 January 7, 001 Soluton hermodynamcs

More information

The equation of motion of a dynamical system is given by a set of differential equations. That is (1)

The equation of motion of a dynamical system is given by a set of differential equations. That is (1) Dynamcal Systems Many engneerng and natural systems are dynamcal systems. For example a pendulum s a dynamcal system. State l The state of the dynamcal system specfes t condtons. For a pendulum n the absence

More information

1 GSW Iterative Techniques for y = Ax

1 GSW Iterative Techniques for y = Ax 1 for y = A I m gong to cheat here. here are a lot of teratve technques that can be used to solve the general case of a set of smultaneous equatons (wrtten n the matr form as y = A), but ths chapter sn

More information

Nice plotting of proteins II

Nice plotting of proteins II Nce plottng of protens II Fnal remark regardng effcency: It s possble to wrte the Newton representaton n a way that can be computed effcently, usng smlar bracketng that we made for the frst representaton

More information

Assortment Optimization under MNL

Assortment Optimization under MNL Assortment Optmzaton under MNL Haotan Song Aprl 30, 2017 1 Introducton The assortment optmzaton problem ams to fnd the revenue-maxmzng assortment of products to offer when the prces of products are fxed.

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Prof. Dr. I. Nasser Phys 630, T Aug-15 One_dimensional_Ising_Model

Prof. Dr. I. Nasser Phys 630, T Aug-15 One_dimensional_Ising_Model EXACT OE-DIMESIOAL ISIG MODEL The one-dmensonal Isng model conssts of a chan of spns, each spn nteractng only wth ts two nearest neghbors. The smple Isng problem n one dmenson can be solved drectly n several

More information

Internet Engineering. Jacek Mazurkiewicz, PhD Softcomputing. Part 3: Recurrent Artificial Neural Networks Self-Organising Artificial Neural Networks

Internet Engineering. Jacek Mazurkiewicz, PhD Softcomputing. Part 3: Recurrent Artificial Neural Networks Self-Organising Artificial Neural Networks Internet Engneerng Jacek Mazurkewcz, PhD Softcomputng Part 3: Recurrent Artfcal Neural Networks Self-Organsng Artfcal Neural Networks Recurrent Artfcal Neural Networks Feedback sgnals between neurons Dynamc

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING INTRODUCTION

CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING INTRODUCTION CONTRAST ENHANCEMENT FOR MIMIMUM MEAN BRIGHTNESS ERROR FROM HISTOGRAM PARTITIONING N. Phanthuna 1,2, F. Cheevasuvt 2 and S. Chtwong 2 1 Department of Electrcal Engneerng, Faculty of Engneerng Rajamangala

More information