Principe, J.C. "Artificial Neural Networks." The Electrical Engineering Handbook. Ed. Richard C. Dorf. Boca Raton: CRC Press LLC, 2000.


20 Artificial Neural Networks

Jose C. Principe, University of Florida

20.1 Definitions and Scope
    Introduction · Definitions and Style of Computation · ANN Types and Applications
20.2 Multilayer Perceptrons
    Function of Each PE · How to Train MLPs · Applying Back-Propagation in Practice · A Posteriori Probabilities
20.3 Radial Basis Function Networks
20.4 Time-Lagged Networks
    Memory Structures · Training-Focused TLN Architectures
20.5 Hebbian Learning and Principal Component Analysis Networks
    Hebbian Learning · Principal Component Analysis · Associative Memories
20.6 Competitive Learning and Kohonen Networks

20.1 Definitions and Scope

Introduction

Artificial neural networks (ANNs) are among the newest signal-processing technologies in the engineer's toolbox. The field is highly interdisciplinary, but our approach will restrict the view to the engineering perspective. In engineering, neural networks serve two important functions: as pattern classifiers and as nonlinear adaptive filters. We will provide a brief overview of the theory, learning rules, and applications of the most important neural network models.

Definitions and Style of Computation

An ANN is an adaptive, most often nonlinear, system that learns to perform a function (an input/output map) from data. Adaptive means that the system parameters are changed during operation, normally called the training phase. After the training phase the ANN parameters are fixed and the system is deployed to solve the problem at hand (the testing phase). The ANN is built with a systematic step-by-step procedure that optimizes a performance criterion or follows some implicit internal constraint, which is commonly referred to as the learning rule. The input/output training data are fundamental in neural network technology, because they convey the necessary information to discover the optimal operating point. The nonlinear nature of the neural network processing elements (PEs) provides the system with great flexibility to achieve practically any desired input/output map, i.e., some ANNs are universal mappers.

There is a style in neural computation that is worth describing (Fig. 20.1). An input is presented to the network and a corresponding desired or target response is set at the output (when this is the case the training is called supervised). An error is composed from the difference between the desired response and the system output.

FIGURE 20.1  The style of neural computation.

This error information is fed back to the system and adjusts the system parameters in a systematic fashion (the learning rule). The process is repeated until the performance is acceptable. It is clear from this description that the performance hinges heavily on the data. If one does not have data that cover a significant portion of the operating conditions, or if the data are noisy, then neural network technology is probably not the right solution. On the other hand, if there is plenty of data but the problem is too poorly understood to derive an approximate model, then neural network technology is a good choice.

This operating procedure should be contrasted with traditional engineering design, made of exhaustive subsystem specifications and intercommunication protocols. In ANNs, the designer chooses the network topology, the performance function, the learning rule, and the criterion to stop the training phase, but the system adjusts the parameters automatically. So, it is difficult to bring a priori information into the design, and when the system does not work properly it is also hard to refine the solution incrementally. But ANN-based solutions are extremely efficient in terms of development time and resources, and in many difficult problems ANNs provide performance that is difficult to match with other technologies. Denker said 10 years ago that ANNs are the second best way to implement a solution, motivated by the simplicity of their design and by their universality, shadowed only by the traditional design obtained by studying the physics of the problem. At present, ANNs are emerging as the technology of choice for many applications, such as pattern recognition, prediction, system identification, and control.

ANN Types and Applications

It is always risky to establish a taxonomy of a technology, but our motivation is to provide a quick overview of the application areas and of the most popular topologies and learning paradigms. The following summary lists, for each application area, the most common topologies together with their supervised or unsupervised learning rules.

Association
    Hopfield [Zurada, 1992; Haykin, 1994]; unsupervised: Hebbian [Zurada, 1992; Haykin, 1994; Kung, 1993]
    Multilayer perceptron [Zurada, 1992; Haykin, 1994; Bishop, 1995]; supervised: back-propagation [Zurada, 1992; Haykin, 1994; Bishop, 1995]
    Linear associative memory [Zurada, 1992; Haykin, 1994]; unsupervised: Hebbian

Pattern recognition
    Multilayer perceptron [Zurada, 1992; Haykin, 1994; Bishop, 1995]; supervised: back-propagation
    Radial basis functions [Zurada, 1992; Bishop, 1995]; supervised: least mean square; unsupervised: k-means [Bishop, 1995]

Feature extraction
    Competitive [Zurada, 1992; Haykin, 1994]; unsupervised: competitive
    Kohonen [Zurada, 1992; Haykin, 1994]; unsupervised: Kohonen
    Multilayer perceptron [Kung, 1993]; supervised: back-propagation
    Principal component analysis [Zurada, 1992; Kung, 1993]; unsupervised: Oja's [Zurada, 1992; Kung, 1993]

Prediction, system identification
    Time-lagged networks [Zurada, 1992; Kung, 1993; de Vries and Principe, 1992]; supervised: back-propagation through time [Zurada, 1992]
    Fully recurrent networks [Zurada, 1992]; supervised: back-propagation through time [Zurada, 1992]

FIGURE 20.2  MLP with one hidden layer (d-k-m).

FIGURE 20.3  A PE and the most common nonlinearities.

It is clear that multilayer perceptrons (MLPs), the back-propagation algorithm, and its extensions (time-lagged networks, TLN, and back-propagation through time, BPTT) hold a prominent position in ANN technology. It is therefore only natural to spend most of our overview presenting the theory and tools of back-propagation learning. It is also important to notice that Hebbian learning (and its extension, the Oja rule) is a very useful (and biologically plausible) learning mechanism. It is an unsupervised learning method, since there is no need to specify a desired or target response to the ANN.

20.2 Multilayer Perceptrons

Multilayer perceptrons are a layered arrangement of nonlinear PEs, as shown in Fig. 20.2. The layer that receives the input is called the input layer, and the layer that produces the output is the output layer. The layers that do not have direct access to the external world are called hidden layers. A layered network with just the input and output layers is called the perceptron. Each connection between PEs is weighted by a scalar, w_i, called a weight, which is adapted during learning.

The PEs in the MLP are composed of an adder followed by a smooth saturating nonlinearity of the sigmoid type (Fig. 20.3). The most common saturating nonlinearities are the logistic function and the hyperbolic tangent. The threshold nonlinearity is used in other nets. The importance of the MLP is that it is a universal mapper (it implements arbitrary input/output maps) when the topology has at least two hidden layers and a sufficient number of PEs [Haykin, 1994]. Even MLPs with a single hidden layer are able to approximate continuous input/output maps. This means that we will rarely need to choose topologies with more than two hidden layers. But these are existence proofs, so the issue that we must solve as engineers is to choose how many layers and how many PEs in each layer are required to produce good results.

Many problems in engineering can be thought of in terms of a transformation of an input space, containing the input, to an output space where the desired response exists. For instance, dividing data into classes can be thought of as transforming the input into 0 and 1 responses that code the classes [Bishop, 1995]. Likewise, identification of an unknown system can also be framed as a mapping (function approximation) from the input to the system output [Kung, 1993]. The MLP is highly recommended for these applications.
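To make the layered arrangement concrete, the following is a minimal sketch of the forward pass of a d-k-m MLP with logistic PEs. It is illustrative only; the variable names (W1, b1, W2, b2) are ours and not part of the original text.

```python
import numpy as np

def logistic(v):
    # Smooth saturating nonlinearity commonly used in MLP PEs.
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a d-k-m MLP with one hidden layer.

    x  : input vector, shape (d,)
    W1 : hidden-layer weights, shape (k, d);  b1 : hidden biases, shape (k,)
    W2 : output-layer weights, shape (m, k);  b2 : output biases, shape (m,)
    """
    hidden = logistic(W1 @ x + b1)        # hidden-layer PE activations
    output = logistic(W2 @ hidden + b2)   # output-layer PE activations
    return output

# Example: a 3-4-2 MLP with random weights.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)
print(mlp_forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2))
```

How the weights of such a network are adapted is the subject of the back-propagation discussion that follows.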

FIGURE 20.4  A two-input PE and its separation surface.

Function of Each PE

Let us briefly study the function of a single PE with two inputs [Zurada, 1992]. If the nonlinearity is the threshold nonlinearity, we can immediately see that the output is simply 1 or -1. The surface that divides these subspaces is called a separation surface, and in this case it is the line of equation

    y(w_1, w_2) = w_1 x_1 + w_2 x_2 + b = 0                                  (20.1)

i.e., the PE weights and the bias control the orientation and position of the separation line, respectively (Fig. 20.4). In higher dimensions the separation surface becomes a hyperplane of dimension one less than the dimensionality of the input space. So, each PE creates a dichotomy in the input space. For smooth nonlinearities the separation surface is not crisp; it becomes fuzzy, but the same principles apply. In this case, the size of the weights controls the width of the fuzzy boundary (larger weights shrink the fuzzy boundary).

The perceptron input/output map is built from a juxtaposition of linear separation surfaces, so the perceptron gives zero classification error only for linearly separable classes (i.e., classes that can be exactly classified by hyperplanes). When one adds a layer to the perceptron, creating a one-hidden-layer MLP, the type of separation surface changes drastically. It can be shown that this learning machine is able to create "bumps" in the input space, i.e., an area of high response surrounded by low responses [Zurada, 1992]. The function of each PE is always the same, no matter whether the PE is part of a perceptron or an MLP. However, notice that the output layer in the MLP works with the result of the hidden-layer activations, creating an embedding of functions and producing more complex separation surfaces. The one-hidden-layer MLP is able to produce nonlinear separation surfaces. If one adds an extra layer (i.e., two hidden layers), the learning machine can now combine bumps at will, which can be interpreted as a universal mapper, since there is evidence that any function can be approximated by localized bumps. One important aspect to remember is that changing a single weight in the MLP can drastically change the location of the separation surfaces; i.e., the MLP achieves the input/output map through the interplay of all its weights.

How to Train MLPs

One fundamental issue is how to adapt the weights w_i of the MLP to achieve a given input/output map. The core ideas have been around for many years in optimization, and they are extensions of well-known engineering principles, such as the least mean square (LMS) algorithm of adaptive filtering [Haykin, 1994]. Let us review the theory here. Assume that we have a linear PE (f(net) = net) and that we want to adapt the weights so as to minimize the square difference between the desired signal and the PE response (Fig. 20.5). This problem has an analytical solution known as least squares [Haykin, 1994]. The optimal weights are obtained as the product of the inverse of the input autocorrelation function (R⁻¹) and the cross-correlation vector (P) between the input and the desired response.
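As an illustration of the analytical solution just mentioned, the sketch below (ours, with illustrative names) estimates R and P from data and computes the optimal weights of a linear PE as w* = R⁻¹P.

```python
import numpy as np

def least_squares_weights(X, d):
    """Optimal weights of a linear PE, w* = R^-1 P, where R is the input
    autocorrelation matrix and P is the cross-correlation vector between
    the input and the desired response.

    X : input patterns, shape (N, D);  d : desired responses, shape (N,)
    """
    R = X.T @ X / len(X)           # input autocorrelation estimate
    P = X.T @ d / len(X)           # input/desired cross-correlation estimate
    return np.linalg.solve(R, P)   # solves R w = P, i.e., w = R^-1 P

# Example: recover the weights of the linear map d = 3*x1 + 0.5*x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
d = 3.0 * X[:, 0] + 0.5 * X[:, 1]
print(least_squares_weights(X, d))   # approximately [3.0, 0.5]
```

Gradient descent, discussed next, reaches the same solution iteratively without forming or inverting R.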

FIGURE 20.5  Computing analytically the optimal weights for the linear PE.

The analytical solution is equivalent to a search for the minimum of the quadratic performance surface J(w_i) using gradient descent, where the weights at each iteration k are adjusted by

    w_i(k + 1) = w_i(k) - η ∇J(k),    ∇J = ∂J/∂w_i                           (20.2)

where η is a small constant called the step size and ∇J(k) is the gradient of the performance surface at iteration k. Bernard Widrow in the late 1960s proposed a very efficient estimate of the gradient at each iteration,

    ∇J(k) ≈ ∂/∂w_i [ (1/2) e²(k) ] = -e(k) x_i(k)                            (20.3)

which, when substituted into Eq. (20.2), produces the so-called LMS algorithm. He showed that the LMS converges to the analytic solution provided the step size η is small enough. Since it is a steepest-descent procedure, the largest step size is limited by the inverse of the largest eigenvalue of the input autocorrelation matrix. The larger the step size (below this limit), the faster the convergence, but the final values will rattle around the optimal value in a basin that has a radius proportional to the step size. Hence, there is a fundamental trade-off between speed of convergence and accuracy in the final weight values. One great appeal of the LMS algorithm is that it is very efficient (just one multiplication per weight) and requires only local quantities to be computed.

The LMS algorithm can be framed as a computation of partial derivatives of the cost with respect to the unknowns, i.e., the weight values. In fact, with the chain rule one writes

    ∂J/∂w_i = (∂J/∂y)(∂y/∂w_i) = ∂/∂y [ (1/2) (d - y)² ] · ∂/∂w_i ( Σ_k w_k x_k ) = -e x_i        (20.4)

and we obtain the LMS algorithm for the linear PE.
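The following is a minimal sketch of the LMS iteration of Eqs. (20.2) and (20.3) for a linear PE (names are illustrative). As noted above, the step size must stay below the limit set by the largest eigenvalue of the input autocorrelation matrix.

```python
import numpy as np

def lms(X, d, eta=0.01, epochs=20):
    """Train a linear PE y = w.x with the LMS rule of Eqs. (20.2)-(20.3).

    X : input patterns, shape (N, D);  d : desired responses, shape (N,)
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, di in zip(X, d):
            e = di - w @ xi          # instantaneous error e(k)
            w += eta * e * xi        # w(k+1) = w(k) + eta * e(k) * x(k)
    return w

# Identify the simple linear map d = 2*x1 - x2 from noisy samples.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
d = 2 * X[:, 0] - X[:, 1] + 0.01 * rng.normal(size=200)
print(lms(X, d))   # should approach [2, -1]
```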

What happens if the PE is nonlinear? If the nonlinearity is differentiable (smooth), we can still apply the same method because of the chain rule, which prescribes that (Fig. 20.6)

    ∂J/∂w_i = (∂J/∂y)(∂y/∂net)(∂net/∂w_i) = -(d - y) f'(net) x_i = -e f'(net) x_i                 (20.5)

where f'(net) is the derivative of the nonlinearity computed at the operating point. Equation (20.5) is known as the delta rule, and it will train the perceptron [Haykin, 1994]. Note that throughout the derivation we skipped the pattern index p for simplicity, but this rule is applied for each input pattern.

FIGURE 20.6  How to extend LMS to nonlinear PEs with the chain rule.

FIGURE 20.7  How to adapt the weights connected to the ith PE.

However, the delta rule cannot train MLPs, since it requires knowledge of the error signal at each PE. The principle of ordered derivatives can be extended to multilayer networks, provided we organize the computations in flows of activation and error propagation. The principle is very easy to understand, but a little complex to formulate in equation form [Haykin, 1994]. Suppose that we want to adapt the weights connected to a hidden-layer PE, the ith PE (Fig. 20.7). One can decompose the computation of the partial derivative of the cost with respect to the weight w_ij as

    ∂J/∂w_ij = (∂J/∂y_i)(∂y_i/∂w_ij) = (∂J/∂y_i) · (∂y_i/∂net_i)(∂net_i/∂w_ij)                   (20.6)

i.e., the partial derivative with respect to the weight is the product of the partial derivative with respect to the PE state, ∂J/∂y_i (part 1 in Eq. (20.6)), and the partial derivative of the local activation with respect to the weights, ∂y_i/∂w_ij (part 2 in Eq. (20.6)). This last quantity is exactly the same as for the nonlinear PE (f'(net_i) x_j), so the big issue is the computation of ∂J/∂y_i. For an output PE, ∂J/∂y_i becomes the injected error e_i, as in Eq. (20.4). For the hidden ith PE, ∂J/∂y_i is evaluated by summing all the errors that reach the PE from the top layer through the topology when the injected errors e_k are clamped at the top layer, or, in an equation,

    ∂J/∂y_i = Σ_k (∂J/∂y_k)(∂y_k/∂net_k)(∂net_k/∂y_i) = -Σ_k e_k f'(net_k) w_ki                   (20.7)

Substituting back into Eq. (20.6) we finally get

    ∂J/∂w_ij = -x_j f'(net_i) ( Σ_k e_k f'(net_k) w_ki )                     (20.8)
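A compact sketch of Eqs. (20.5) through (20.8) for an MLP with one hidden layer of logistic PEs is given below (ours, with illustrative names). For the logistic nonlinearity, f'(net) = y(1 - y); biases are omitted to keep the sketch short.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, d, W1, W2, eta=0.1):
    """One back-propagation update for a one-hidden-layer MLP.

    x : input, shape (D,);  d : desired response, shape (M,)
    W1 : hidden weights, shape (K, D);  W2 : output weights, shape (M, K)
    """
    # Forward pass: activations computed everywhere.
    h = logistic(W1 @ x)                       # hidden activations y_i
    y = logistic(W2 @ h)                       # output activations y_k
    # External errors at the output PEs.
    e = d - y
    # Local errors: delta_k at the outputs, delta_i back-propagated (Eq. 20.7).
    delta_out = e * y * (1 - y)
    delta_hid = (W2.T @ delta_out) * h * (1 - h)
    # Weight updates = eta * local error * local activation (Eq. 20.8).
    W2 += eta * np.outer(delta_out, h)
    W1 += eta * np.outer(delta_hid, x)
    return W1, W2
```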

Equation (20.8) embodies the back-propagation training algorithm [Haykin, 1994; Bishop, 1995]. It can be rewritten as the product of a local activation (part 1) and a local error (part 2), exactly as in the LMS and delta rules. But now the local error is a composition of errors that flow through the topology, which becomes equivalent to the existence of a desired response at the PE.

There is an intrinsic flow in the implementation of the back-propagation algorithm: first, inputs are applied to the net and activations are computed everywhere to yield the output activation. Second, the external errors are computed by subtracting the net output from the desired response. Third, these external errors are used in Eq. (20.8) to compute the local errors for the layer immediately preceding the output layer, and the computations are chained up to the input layer. Once all the local errors are available, Eq. (20.2) can be used to update every weight. These three steps are then repeated for the other training patterns until the error is acceptable.

Step three is equivalent to injecting the external errors into the dual topology and back-propagating them up to the input layer [Haykin, 1994]. The dual topology is obtained from the original one by reversing the data flow and substituting summing junctions by splitting nodes and vice versa. The error at each PE of the dual topology is then multiplied by the activation of the original network to compute the weight updates. So, effectively, the dual topology is being used to compute the local errors, which makes the procedure highly efficient. This is the reason back-propagation trains a network of N weights with a number of multiplications proportional to N (O(N)), instead of O(N²) for previous methods of computing partial derivatives known in control theory. Using the dual topology to implement back-propagation is the best and most general way to program the algorithm on a digital computer.

Applying Back-Propagation in Practice

Now that we know an algorithm to train MLPs, let us see what the practical issues are in applying it. We will address the following aspects: size of the training set vs. number of weights, search procedures, how to stop training, and how to set the topology for maximum generalization.

Size of Training Set

The size of the training set is very important for good performance. Remember that the ANN gets its information from the training set. If the training data do not cover the full range of operating conditions, the system may perform badly when deployed. Under no circumstances should the training set be smaller than the number of weights in the ANN. A good size for the training data is ten times the number of weights in the network, with the lower limit being set around three times the number of weights (these values should be taken as an indication, subject to experimentation in each case) [Haykin, 1994].

Search Procedures

Searching along the direction of the gradient is fine if the performance surface is quadratic. However, in ANNs this is rarely the case, because of the use of nonlinear PEs and topologies with several layers. So gradient descent can be caught in local minima, and the search becomes very slow in regions of small curvature. One efficient way to speed up the search in regions of small curvature and, at the same time, to stabilize it in narrow valleys is to include a momentum term in the weight adaptation:

    w_ij(n + 1) = w_ij(n) + η δ_i(n) x_j(n) + α [ w_ij(n) - w_ij(n - 1) ]                         (20.9)

The value of the momentum α should be set experimentally between 0.5 and 0.9.
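A minimal sketch of the momentum update of Eq. (20.9) follows (ours, with illustrative names): a fraction α of the previous weight change is added to the usual gradient term.

```python
import numpy as np

def momentum_update(w, w_prev, delta, x, eta=0.1, alpha=0.7):
    """Eq. (20.9): w(n+1) = w(n) + eta*delta(n)*x(n) + alpha*[w(n) - w(n-1)].

    w, w_prev : current and previous weight vectors of one PE
    delta     : local error of the PE;  x : its input activations
    Returns the updated weights and the weights to remember as previous.
    """
    w_new = w + eta * delta * x + alpha * (w - w_prev)
    return w_new, w.copy()

# alpha between 0.5 and 0.9 is the usual experimental range.
w, w_prev = np.zeros(3), np.zeros(3)
w, w_prev = momentum_update(w, w_prev, delta=0.2, x=np.array([1.0, -1.0, 0.5]))
print(w)
```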
There are many more modifications to the conventional gradient search, such as adaptive step sizes, annealed noise, conjugate gradients, and second-order methods (using the information contained in the Hessian matrix), but the simplicity and power of momentum learning are hard to beat [Haykin, 1994; Bishop, 1995].

How to Stop Training

The stopping criterion is a fundamental aspect of training. The simple ideas of capping the number of iterations or of letting the system train until a predetermined error value is reached are not recommended.

The reason is that we want the ANN to perform well on the test set data; i.e., we would like the system to perform well on data it never saw before (good generalization) [Bishop, 1995]. The error in the training set tends to decrease with iteration when the ANN has enough degrees of freedom to represent the input/output map. However, the system may be memorizing the training patterns (overfitting) instead of finding the underlying mapping rule. This is called overtraining. To avoid overtraining, the performance on a validation set, i.e., a set of input data that the system has never seen before, must be checked regularly during training (e.g., once every 50 passes over the training set). The training should be stopped when the error in the validation set starts to increase, despite the fact that the error in the training set continues to decrease. This method is called cross-validation. The validation set should be about 10% of the training set, and distinct from it.

Size of the Topology

The size of the topology should also be carefully selected. If the number of layers or the size of each layer is too small, the network does not have enough degrees of freedom to classify the data or to approximate the function, and the performance suffers. On the other hand, if the size of the network is too large, performance may also suffer. This is the phenomenon of overfitting that we mentioned above. An alternative way to control it is to reduce the size of the network. There are basically two procedures to set the size of the network: either one starts small and adds new PEs, or one starts with a large network and prunes PEs [Haykin, 1994]. One quick way to prune the network is to impose a penalty term on the performance function, a regularizing term such as one limiting the slope of the input/output map [Bishop, 1995]. A regularization term that can be implemented locally is

    w_ij(n + 1) = w_ij(n) [ 1 - λ / (1 + w_ij(n))² ] + η δ_i(n) x_j(n)                            (20.10)

where λ is the weight-decay parameter and δ_i the local error. Weight decay tends to drive unimportant weights to zero.

A Posteriori Probabilities

We will finish the discussion of the MLP by noting that this topology, when trained with the mean square error, is able to estimate directly at its outputs a posteriori probabilities, i.e., the probability that a given input pattern belongs to a given class [Bishop, 1995]. This property is very useful because the MLP outputs can be interpreted as probabilities and operated on as numbers. In order to guarantee this property, one has to make sure that each class is assigned to one output PE, that the topology is sufficiently large to represent the mapping, that the training has converged to the absolute minimum, and that the outputs are normalized between 0 and 1. The first requirements are met by good design, while the last can easily be enforced if the softmax activation is used for the output PEs [Bishop, 1995]:

    y_i = exp(net_i) / Σ_j exp(net_j)                                        (20.11)
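As a small illustration, Eq. (20.11) can be computed as below; subtracting the maximum net value before exponentiating is a standard numerical safeguard and is an addition of ours, not part of the original text.

```python
import numpy as np

def softmax(net):
    # Eq. (20.11): y_i = exp(net_i) / sum_j exp(net_j).
    # Subtracting the max keeps the exponentials from overflowing.
    z = np.exp(net - np.max(net))
    return z / z.sum()

print(softmax(np.array([1.0, 2.0, 0.5])))   # outputs are positive and sum to 1
```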

20.3 Radial Basis Function Networks

The radial basis function (RBF) network constitutes another way of implementing arbitrary input/output mappings. The most significant difference between the MLP and the RBF lies in the PE nonlinearity. While the PE in the MLP responds to the full input space, the PE in the RBF is local, normally a Gaussian kernel in the input space. Hence, it responds only to inputs that are close to its center; i.e., it has basically a local response.

FIGURE 20.8  Radial basis function (RBF) network.

The RBF network is also a layered net, with the hidden layer built from Gaussian kernels and a linear (or nonlinear) output layer (Fig. 20.8). Training of the RBF network is normally done in two stages [Haykin, 1994]: first, the centers x_i are adaptively placed in the input space using competitive learning or k-means clustering [Bishop, 1995], which are unsupervised procedures. Competitive learning is explained later in the chapter. The variance of each Gaussian is chosen as a percentage (30 to 50%) of the distance to the nearest center. The goal is to cover the input data distribution adequately. Once the RBF centers are located, the second-layer weights w_i are trained using the LMS procedure. RBF networks are easy to work with, they train very fast, and they have shown good properties both for function approximation and for classification. The problem is that they require many Gaussian kernels in high-dimensional spaces.
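The two-stage training just described can be sketched as follows (ours; the simple k-means routine and all names are illustrative): centers are placed first, the Gaussian widths are set from the nearest-center distances, and the output weights are then obtained by a batch least-squares fit, which stands in here for the iterative LMS training of the output layer.

```python
import numpy as np

def kmeans(X, M, iters=20, seed=0):
    # Simple k-means to place the M Gaussian centers (unsupervised stage).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), M, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for m in range(M):
            if np.any(labels == m):
                centers[m] = X[labels == m].mean(axis=0)
    return centers

def train_rbf(X, d, M=10, width_factor=0.5):
    """Two-stage RBF training sketch: centers by k-means, widths as a fraction
    of the distance to the nearest other center, output weights by least squares."""
    centers = kmeans(X, M)
    dists = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1) ** 0.5
    np.fill_diagonal(dists, np.inf)
    sigma = width_factor * dists.min(axis=1)       # 30 to 50% of nearest-center distance
    # Hidden-layer (Gaussian) activations for all patterns.
    G = np.exp(-((X[:, None, :] - centers) ** 2).sum(-1) / (2 * sigma ** 2))
    w, *_ = np.linalg.lstsq(G, d, rcond=None)      # linear output layer
    return centers, sigma, w

# Example: approximate d = sin(x) with 8 Gaussian kernels.
rng = np.random.default_rng(4)
X = rng.uniform(-3, 3, size=(300, 1))
d = np.sin(X[:, 0])
centers, sigma, w = train_rbf(X, d, M=8)
```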

20.4 Time-Lagged Networks

The MLP is the most common neural network topology, but it can handle only instantaneous information, since the system has no memory and is feedforward. In engineering, the processing of signals that exist in time requires systems with memory, i.e., linear filters. An alternative way to implement memory is to use feedback, which gives rise to recurrent networks. Fully recurrent networks are difficult to train and to stabilize, so it is preferable to develop topologies based on MLPs in which explicit subsystems that store past information are included. These subsystems are called short-term memory structures [de Vries and Principe, 1992]. The combination of an MLP with short-term memory structures is called a time-lagged network (TLN). The memory structures may themselves be recurrent, but the feedback is local, so stability is still easy to guarantee. Here, we will cover just one TLN topology, called focused, where the memory is at the input layer. The most general TLNs have memory added anywhere in the network, but they require other, more involved training strategies (BPTT [Haykin, 1994]). The interested reader is referred to de Vries and Principe [1992] for further details. The function of the short-term memory in the focused TLN is to represent the past of the input signal, while the nonlinear PEs provide the mapping, as in the MLP (Fig. 20.9).

FIGURE 20.9  A focused TLN.

Memory Structures

The simplest memory structure is built from a tap delay line (Fig. 20.10). The memory by delays is a single-input, multiple-output system that has no free parameters except its size K. The tap delay memory is the memory utilized in the time-delay neural network (TDNN), which has been used successfully in speech recognition and system identification [Kung, 1993].

FIGURE 20.10  Tap delay line memory.

FIGURE 20.11  Memory by feedback (context PE).

A different mechanism for linear memory is feedback (Fig. 20.11). Feedback allows the system to remember past events because of the exponential decay of the response. This memory has limited resolution because of the low-pass filtering required for long memories. But notice that, unlike the memory by delay, memory by feedback provides the learning system with a free parameter μ that controls the length of the memory. Memory by feedback has been used in Elman and Jordan networks [Haykin, 1994].

It is possible to combine the advantages of memory by feedback with those of memory by delay in linear systems called dispersive delay lines. The most studied of these memories is a cascade of low-pass functions called the gamma memory [de Vries and Principe, 1992]. The gamma memory has a free parameter μ that controls and decouples memory depth from memory resolution. Memory depth D is defined as the first moment of the impulse response from the input to the last tap K, while memory resolution R is the number of taps per unit time. For the gamma memory, D = K/μ and R = μ; i.e., changing μ modifies the memory depth and resolution inversely. This recursive parameter μ can be adapted with the output MSE like the other network parameters; i.e., the ANN is able to choose the best memory depth to minimize the output error, which is not possible with the tap delay memory.

FIGURE 20.12  Gamma memory (dispersive delay line).

Training-Focused TLN Architectures

The appeal of the focused architecture is that the MLP weights can still be adapted with back-propagation. However, the input/output mapping produced by these networks is static. The input memory layer brings in past input information to establish the value of the mapping. As we know from engineering, the size of the memory is fundamental for identifying an unknown plant, for instance, or for performing prediction with a small error. But note that with the focused TLN the models for system identification become nonlinear (i.e., nonlinear moving average, NMA). When the tap delay implements the short-term memory, straight back-propagation can be used, since the only adaptive parameters are the MLP weights. When the gamma memory (or the context PE) is used, the recursive parameter is adapted in a fully adaptive framework (or the parameter is preset by some external consideration). The equations to adapt the context PE and the gamma memory are shown in Figs. 20.11 and 20.12, respectively. For the context PE, δ(n) refers to the total error that is back-propagated from the MLP and that reaches the dual context PE.
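The extracted text gives the gamma recursion only through Fig. 20.12, so the sketch below uses one commonly stated form of the gamma memory (an assumption of ours, chosen because it is consistent with the depth D = K/μ quoted above); all names are illustrative.

```python
import numpy as np

def gamma_memory(u, K, mu):
    """Generate the K taps of a gamma memory for an input sequence u.

    Assumed recursion (a common form, consistent with depth D = K/mu):
        x_0(n) = u(n)
        x_k(n) = (1 - mu) * x_k(n - 1) + mu * x_{k-1}(n - 1),  k = 1..K
    Returns an array of shape (len(u), K+1) with tap k in column k.
    """
    x = np.zeros((len(u), K + 1))
    for n in range(len(u)):
        x[n, 0] = u[n]
        if n > 0:
            for k in range(1, K + 1):
                x[n, k] = (1 - mu) * x[n - 1, k] + mu * x[n - 1, k - 1]
    return x

# An impulse through a 5-tap gamma memory; mu = 1 reduces it to a tap delay line.
taps = gamma_memory(np.r_[1.0, np.zeros(19)], K=5, mu=0.5)
print(taps[:8].round(3))
```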

20.5 Hebbian Learning and Principal Component Analysis Networks

Hebbian Learning

Hebbian learning is an unsupervised learning rule that captures the similarity between an input and an output through correlation. To adapt a weight w_i using Hebbian learning we adjust the weights according to Δw_i = η x_i y, or, in an equation [Haykin, 1994],

    w_i(n + 1) = w_i(n) + η x_i(n) y(n)                                      (20.12)

where η is the step size, x_i is the ith input, and y is the PE output. The output of the single PE is an inner product between the input and the weight vector (formula in Fig. 20.13). It measures the similarity between the two vectors; i.e., if the input is close to the weight vector, the output y is large; otherwise it is small. The weights are computed by an outer product of the input X and output Y, i.e., W = XY^T, where T means transpose.

FIGURE 20.13  Hebbian PE.

The problem with Hebbian learning is that it is unstable; i.e., the weights keep growing with the number of iterations [Haykin, 1994]. Oja proposed to stabilize the Hebbian rule by normalizing the new weight by its size, which gives the rule [Haykin, 1994]

    w_i(n + 1) = w_i(n) + η y(n) [ x_i(n) - y(n) w_i(n) ]                    (20.13)

The weights now converge to finite values. They still define in the input space the direction along which the data cluster has its largest projection, which corresponds to the eigenvector with the largest eigenvalue of the input correlation matrix [Kung, 1993]. The output of the PE provides the largest eigenvalue of the input correlation matrix.
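A minimal sketch of Oja's rule, Eq. (20.13), follows (ours, with illustrative names); trained on zero-mean data, the weight vector should converge toward the direction of the first principal component.

```python
import numpy as np

def oja(X, eta=0.01, epochs=50, seed=0):
    """Single-PE Oja rule, Eq. (20.13): w <- w + eta * y * (x - y * w)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    for _ in range(epochs):
        for x in X:
            y = w @ x                       # PE output: inner product
            w += eta * y * (x - y * w)      # normalized Hebbian update
    return w

# Zero-mean data stretched along the direction [2, 1].
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 1)) * np.array([2.0, 1.0]) + 0.1 * rng.normal(size=(500, 2))
X -= X.mean(axis=0)
print(oja(X))   # roughly the unit vector along [2, 1], up to sign
```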

FIGURE 20.14  PCA network.

Principal Component Analysis

Principal component analysis (PCA) is a well-known technique in signal processing that is used to project a signal onto a signal-specific basis. The importance of PCA is that it provides the best linear projection to a subspace in terms of preserving the signal energy [Haykin, 1994]. Normally, PCA is computed analytically through a singular value decomposition. PCA networks offer an alternative to this computation by providing an iterative implementation that may be preferred for real-time operation in embedded systems.

The PCA network is a one-layer network with linear processing elements (Fig. 20.14). One can extend Oja's rule to multiple output PEs (fewer than or equal to the number of input PEs), according to the formula shown in Fig. 20.14, which is called Sanger's rule [Haykin, 1994]. The rows of the weight matrix (which contain the weights connected to the output PEs, in descending order) are the eigenvectors of the input correlation matrix. If we set the number of output PEs equal to M < D, we will be projecting the input data onto the M largest principal components, and the outputs will be proportional to the M largest eigenvalues. Note that we are performing an eigendecomposition through an iterative procedure.
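The extracted text refers to the update only through Fig. 20.14, so the sketch below uses the standard statement of Sanger's rule (an assumption of ours); all names are illustrative.

```python
import numpy as np

def sanger(X, M, eta=0.005, epochs=200, seed=0):
    """Sanger's rule (generalized Hebbian algorithm), as commonly stated:
        W[i, :] += eta * y[i] * (x - sum_{k <= i} y[k] * W[k, :])
    W has shape (M, D); for zero-mean X its rows move toward the M leading
    eigenvectors of the input correlation matrix."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.normal(size=(M, X.shape[1]))
    for _ in range(epochs):
        for x in X:
            y = W @ x
            for i in range(M):
                # y[:i+1] @ W[:i+1] is the reconstruction from the first i+1 outputs.
                W[i] += eta * y[i] * (x - y[: i + 1] @ W[: i + 1])
    return W

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 0.5])
X -= X.mean(axis=0)
print(sanger(X, M=2))   # rows roughly along [1,0,0] and [0,1,0], up to sign
```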

Associative Memories

Hebbian learning is also the rule used to create associative memories [Zurada, 1992]. The most-utilized associative memory implements heteroassociation, where the system is able to associate an input X to a designated output Y, which can be of a different dimension (Fig. 20.15). So, in heteroassociation the signal Y works as the desired response. We can train such a memory using Hebbian learning or LMS, but LMS provides a more efficient encoding of the information. Associative memories differ from conventional computer memories in several respects. First, they are content addressable, and the information is distributed throughout the network, so they are robust to noise in the input. With nonlinear PEs or recurrent connections (as in the famous Hopfield network) [Haykin, 1994], they display the important property of pattern completion; i.e., when the input is distorted or only partially available, the recall can still be perfect.

FIGURE 20.15  Associative memory (heteroassociation).

FIGURE 20.16  Autoassociator.

A special case of the associative memory is called the autoassociator (Fig. 20.16), where the training output of size D is equal to the input signal (also of size D) [Kung, 1993]. Note that the hidden layer has fewer PEs (M < D) than the input (a bottleneck layer), and W_1 = W_2^T is enforced. The function of this network is one of encoding or data reduction. The training of this network (the W_2 matrix) is done with LMS. It can be shown that this network also implements PCA with M components, even when the hidden layer is built from nonlinear PEs.

20.6 Competitive Learning and Kohonen Networks

Competition is a very efficient way to divide the computing resources of a network. Instead of having each output PE more or less sensitive to the full input space, as in the associative memories, in a competitive network each PE specializes in a piece of the input space and represents it [Haykin, 1994].

FIGURE 20.17  Competitive neural network.

Competitive networks are linear, single-layer nets (Fig. 20.17). Their functionality is directly related to the competitive learning rule, which belongs to the unsupervised category. Only the PE that has the largest output gets its weights updated. The weights of the winning PE are updated according to the formula in Fig. 20.17, in such a way that they approach the present input. The step size controls exactly how large this adjustment is (see Fig. 20.17). Notice that there is an intrinsic nonlinearity in the learning rule: only the PE that has the largest output (the winner) has its weights updated; all the other weights remain unchanged. This is the mechanism that allows the competitive net PEs to specialize.

Competitive networks are used for clustering; i.e., an M-output-PE net will seek M clusters in the input space. The weights of each PE will correspond to the center of mass of one of the M clusters of input samples. When a given pattern is shown to the trained net, only one of the outputs will be active, and it can be used to label the sample as belonging to one of the clusters. No more information about the input data is preserved.
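The extracted text gives the winner update only through Fig. 20.17, so the sketch below uses the common form Δw_winner = η(x - w_winner), an assumption of ours; the winner is taken as the PE whose weight vector is closest to the input, which matches the largest output for normalized weights. Names are illustrative.

```python
import numpy as np

def competitive_learning(X, M, eta=0.1, epochs=20, seed=0):
    """Winner-take-all competitive learning: only the winning PE's weight
    vector is moved toward the present input, w_winner += eta * (x - w_winner)."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), M, replace=False)].astype(float)   # start on data points
    for _ in range(epochs):
        for x in X:
            winner = np.argmin(((W - x) ** 2).sum(axis=1))      # closest weight vector wins
            W[winner] += eta * (x - W[winner])                  # move the winner toward the input
    return W

# Two well-separated clouds: the two weight vectors settle near the cluster centers.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, size=(100, 2)), rng.normal(4, 0.3, size=(100, 2))])
print(competitive_learning(X, M=2))
```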

FIGURE 20.18  Kohonen SOFM.

Competitive learning is one of the fundamental components of the Kohonen self-organizing feature map (SOFM) network, which is also a single-layer network with linear PEs [Haykin, 1994]. Kohonen learning creates annealed competition in the output space by adapting not only the winner PE weights but also those of its spatial neighbors, using a Gaussian neighborhood function Λ. The output PEs are arranged in linear or two-dimensional neighborhoods (Fig. 20.18). Kohonen SOFM networks produce a mapping from the continuous input space to the discrete output space that preserves topological properties of the input space (i.e., local neighbors in the input space are mapped to neighbors in the output space). During training, both the spatial neighborhood and the learning constant are decreased slowly, by starting with a large neighborhood σ₀ and decreasing it (N₀ controls the scheduling). The initial step size η₀ also needs to be scheduled (by K). The Kohonen SOFM network is useful for projecting the input to a subspace, as an alternative to PCA networks. The topological properties of the output space provide more information about the input than straight clustering.

References

C. M. Bishop, Neural Networks for Pattern Recognition, New York: Oxford University Press, 1995.
de Vries and J. C. Principe, "The gamma model: a new neural model for temporal processing," Neural Networks, vol. 5, 1992.
S. Haykin, Neural Networks: A Comprehensive Foundation, New York: Macmillan, 1994.
S. Y. Kung, Digital Neural Networks, Englewood Cliffs, N.J.: Prentice-Hall, 1993.
J. M. Zurada, Artificial Neural Systems, West Publishing, 1992.

Further Information

The literature in this field is voluminous. We decided to limit the references to textbooks for an engineering audience, with different levels of sophistication. Zurada is the most accessible text, Haykin the most comprehensive. Kung provides interesting applications of PCA networks and of nonlinear signal processing and system identification. Bishop concentrates on the design of pattern classifiers. Interested readers are directed to the following journals for more information: IEEE Transactions on Signal Processing, IEEE Transactions on Neural Networks, Neural Networks, Neural Computation, and the Proceedings of the Neural Information Processing Systems (NIPS) conference.


A neural network with localized receptive fields for visual pattern classification Unversty of Wollongong Research Onlne Faculty of Informatcs - Papers (Archve) Faculty of Engneerng and Informaton Scences 2005 A neural network wth localzed receptve felds for vsual pattern classfcaton

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Ensemble Methods: Boosting

Ensemble Methods: Boosting Ensemble Methods: Boostng Ncholas Ruozz Unversty of Texas at Dallas Based on the sldes of Vbhav Gogate and Rob Schapre Last Tme Varance reducton va baggng Generate new tranng data sets by samplng wth replacement

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm

Design and Optimization of Fuzzy Controller for Inverse Pendulum System Using Genetic Algorithm Desgn and Optmzaton of Fuzzy Controller for Inverse Pendulum System Usng Genetc Algorthm H. Mehraban A. Ashoor Unversty of Tehran Unversty of Tehran h.mehraban@ece.ut.ac.r a.ashoor@ece.ut.ac.r Abstract:

More information

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming

EEL 6266 Power System Operation and Control. Chapter 3 Economic Dispatch Using Dynamic Programming EEL 6266 Power System Operaton and Control Chapter 3 Economc Dspatch Usng Dynamc Programmng Pecewse Lnear Cost Functons Common practce many utltes prefer to represent ther generator cost functons as sngle-

More information

Natural Language Processing and Information Retrieval

Natural Language Processing and Information Retrieval Natural Language Processng and Informaton Retreval Support Vector Machnes Alessandro Moschtt Department of nformaton and communcaton technology Unversty of Trento Emal: moschtt@ds.untn.t Summary Support

More information

Appendix B. The Finite Difference Scheme

Appendix B. The Finite Difference Scheme 140 APPENDIXES Appendx B. The Fnte Dfference Scheme In ths appendx we present numercal technques whch are used to approxmate solutons of system 3.1 3.3. A comprehensve treatment of theoretcal and mplementaton

More information

Inexact Newton Methods for Inverse Eigenvalue Problems

Inexact Newton Methods for Inverse Eigenvalue Problems Inexact Newton Methods for Inverse Egenvalue Problems Zheng-jan Ba Abstract In ths paper, we survey some of the latest development n usng nexact Newton-lke methods for solvng nverse egenvalue problems.

More information

Uncertainty in measurements of power and energy on power networks

Uncertainty in measurements of power and energy on power networks Uncertanty n measurements of power and energy on power networks E. Manov, N. Kolev Department of Measurement and Instrumentaton, Techncal Unversty Sofa, bul. Klment Ohrdsk No8, bl., 000 Sofa, Bulgara Tel./fax:

More information

Section 8.3 Polar Form of Complex Numbers

Section 8.3 Polar Form of Complex Numbers 80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS)

Some Comments on Accelerating Convergence of Iterative Sequences Using Direct Inversion of the Iterative Subspace (DIIS) Some Comments on Acceleratng Convergence of Iteratve Sequences Usng Drect Inverson of the Iteratve Subspace (DIIS) C. Davd Sherrll School of Chemstry and Bochemstry Georga Insttute of Technology May 1998

More information

Fundamentals of Neural Networks

Fundamentals of Neural Networks Fundamentals of Neural Networks Xaodong Cu IBM T. J. Watson Research Center Yorktown Heghts, NY 10598 Fall, 2018 Outlne Feedforward neural networks Forward propagaton Neural networks as unversal approxmators

More information

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}.

We present the algorithm first, then derive it later. Assume access to a dataset {(x i, y i )} n i=1, where x i R d and y i { 1, 1}. CS 189 Introducton to Machne Learnng Sprng 2018 Note 26 1 Boostng We have seen that n the case of random forests, combnng many mperfect models can produce a snglodel that works very well. Ths s the dea

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Estimating the Fundamental Matrix by Transforming Image Points in Projective Space 1

Estimating the Fundamental Matrix by Transforming Image Points in Projective Space 1 Estmatng the Fundamental Matrx by Transformng Image Ponts n Projectve Space 1 Zhengyou Zhang and Charles Loop Mcrosoft Research, One Mcrosoft Way, Redmond, WA 98052, USA E-mal: fzhang,cloopg@mcrosoft.com

More information

Support Vector Machines CS434

Support Vector Machines CS434 Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information