Principe, J.C. "Artificial Neural Networks." The Electrical Engineering Handbook, Ed. Richard C. Dorf. Boca Raton: CRC Press LLC, 2000.
20 Artificial Neural Networks

Jose C. Principe, University of Florida

20.1 Definitions and Scope: Introduction, Definitions and Style of Computation, ANN Types and Applications
20.2 Multilayer Perceptrons: Function of Each PE, How to Train MLPs, Applying Back-Propagation in Practice, A Posteriori Probabilities
20.3 Radial Basis Function Networks
20.4 Time-Lagged Networks: Memory Structures, Training-Focused TLN Architectures
20.5 Hebbian Learning and Principal Component Analysis Networks: Hebbian Learning, Principal Component Analysis, Associative Memories
20.6 Competitive Learning and Kohonen Networks

20.1 Definitions and Scope

Introduction

Artificial neural networks (ANN) are among the newest signal-processing technologies in the engineer's toolbox. The field is highly interdisciplinary, but our approach will restrict the view to the engineering perspective. In engineering, neural networks serve two important functions: as pattern classifiers and as nonlinear adaptive filters. We will provide a brief overview of the theory, learning rules, and applications of the most important neural network models.

Definitions and Style of Computation

An ANN is an adaptive, most often nonlinear system that learns to perform a function (an input/output map) from data. Adaptive means that the system parameters are changed during operation, normally called the training phase. After the training phase the ANN parameters are fixed and the system is deployed to solve the problem at hand (the testing phase). The ANN is built with a systematic step-by-step procedure to optimize a performance criterion or to follow some implicit internal constraint, which is commonly referred to as the learning rule. The input/output training data are fundamental in neural network technology, because they convey the necessary information to discover the optimal operating point. The nonlinear nature of the neural network processing elements (PEs) provides the system with great flexibility to achieve practically any desired input/output map, i.e., some ANNs are universal mappers. There is a style in neural computation that is worth describing (Fig. 20.1).
An input is presented to the network and a corresponding desired or target response is set at the output (when this is the case the training is called supervised). An error is composed from the difference between the desired response and the system output.

FIGURE 20.1 The style of neural computation.

This error information is fed back to the system and adjusts the system parameters in a systematic fashion (the learning rule). The process is repeated until the performance is acceptable. It is clear from this description that the performance hinges heavily on the data. If one does not have data that cover a significant portion of the operating conditions, or if the data are noisy, then neural network technology is probably not the right solution. On the other hand, if there is plenty of data but the problem is too poorly understood to derive an approximate model, then neural network technology is a good choice. This operating procedure should be contrasted with traditional engineering design, made of exhaustive subsystem specifications and intercommunication protocols. In ANNs, the designer chooses the network topology, the performance function, the learning rule, and the criterion to stop the training phase, but the system automatically adjusts the parameters. Thus, it is difficult to bring a priori information into the design, and when the system does not work properly it is also hard to incrementally refine the solution. But ANN-based solutions are extremely efficient in terms of development time and resources, and in many difficult problems ANNs provide performance that is hard to match with other technologies. Denker said 10 years ago that ANNs are the second best way to implement a solution, motivated by the simplicity of their design and their universality, shadowed only by the traditional design obtained by studying the physics of the problem. At present, ANNs are emerging as the technology of choice for many applications, such as pattern recognition, prediction, system identification, and control.

ANN Types and Applications

It is always risky to establish a taxonomy of a technology, but our motivation is to provide a quick overview of the application areas and the most popular topologies and learning paradigms.
Application          Topology                                                              Supervised Learning                              Unsupervised Learning

Association          Hopfield [Zurada, 1992; Haykin, 1994]                                                                                  Hebbian [Zurada, 1992; Haykin, 1994; Kung, 1993]
                     Multilayer perceptron [Zurada, 1992; Haykin, 1994; Bishop, 1995]      Back-propagation [Zurada, 1992; Haykin, 1994; Bishop, 1995]
                     Linear associative mem. [Zurada, 1992; Haykin, 1994]                                                                   Hebbian

Pattern recognition  Multilayer perceptron [Zurada, 1992; Haykin, 1994; Bishop, 1995]      Back-propagation
                     Radial basis functions [Zurada, 1992; Bishop, 1995]                   Least mean square                                k-means [Bishop, 1995]

Feature extraction   Competitive [Zurada, 1992; Haykin, 1994]                                                                               Competitive
                     Kohonen [Zurada, 1992; Haykin, 1994]                                                                                   Kohonen
                     Multilayer perceptron [Kung, 1993]                                    Back-propagation
                     Principal comp. anal. [Zurada, 1992; Kung, 1993]                                                                       Oja's [Zurada, 1992; Kung, 1993]

Prediction,          Time-lagged networks [Zurada, 1992; Kung, 1993;                       Back-propagation through time [Zurada, 1992]
system ID              de Vries and Principe, 1992]
                     Fully recurrent nets [Zurada, 1992]                                   Back-propagation through time
FIGURE 20.2 MLP with one hidden layer (d-k-m).

FIGURE 20.3 A PE and the most common nonlinearities.

It is clear that multilayer perceptrons (MLPs), the back-propagation algorithm, and its extensions (time-lagged networks, TLN, and back-propagation through time, BPTT, respectively) hold a prominent position in ANN technology. It is therefore only natural to spend most of our overview presenting the theory and tools of back-propagation learning. It is also important to notice that Hebbian learning (and its extension, the Oja rule) is a very useful (and biologically plausible) learning mechanism. It is an unsupervised learning method, since there is no need to specify the desired or target response to the ANN.

20.2 Multilayer Perceptrons

Multilayer perceptrons are a layered arrangement of nonlinear PEs as shown in Fig. 20.2. The layer that receives the input is called the input layer, and the layer that produces the output is the output layer. The layers that do not have direct access to the external world are called hidden layers. A layered network with just the input and output layers is called the perceptron. Each connection between PEs is weighted by a scalar, wi, called a weight, which is adapted during learning. The PEs in the MLP are composed of an adder followed by a smooth saturating nonlinearity of the sigmoid type (Fig. 20.3). The most common saturating nonlinearities are the logistic function and the hyperbolic tangent. The threshold nonlinearity is used in other nets. The importance of the MLP is that it is a universal mapper (implements arbitrary input/output maps) when the topology has at least two hidden layers and a sufficient number of PEs [Haykin, 1994]. Even MLPs with a single hidden layer are able to approximate continuous input/output maps. This means that we will rarely need to choose topologies with more than two hidden layers. But these are existence proofs, so the issue that we must solve as engineers is to choose how many layers and how many PEs in each layer are required to produce good results.

Many problems in engineering can be thought of in terms of a transformation of an input space, containing the input, to an output space where the desired response exists. For instance, dividing data into classes can be thought of as transforming the input into 0 and 1 responses that will code the classes [Bishop, 1995]. Likewise, identification of an unknown system can also be framed as a mapping (function approximation) from the input to the system output [Kung, 1993]. The MLP is highly recommended for these applications.
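As a concrete illustration (not from the chapter), the forward pass of a d-k-m MLP with logistic PEs can be sketched in a few lines of NumPy; the sizes, seed, and variable names are our own choices:

```python
import numpy as np

def logistic(net):
    # logistic sigmoid, one of the two common saturating nonlinearities
    return 1.0 / (1.0 + np.exp(-net))

def mlp_forward(x, W1, b1, W2, b2):
    # each PE is an adder followed by a smooth saturating nonlinearity
    hidden = logistic(W1 @ x + b1)       # k hidden PEs
    return logistic(W2 @ hidden + b2)    # m output PEs

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # d=2 inputs to k=3 hidden PEs
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # k=3 hidden PEs to m=1 output PE
y = mlp_forward(np.array([0.5, -1.0]), W1, b1, W2, b2)
```

The output always lies in (0, 1) because of the saturating output nonlinearity.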
FIGURE 20.4 A two-input PE and its separation surface.

Function of Each PE

Let us study briefly the function of a single PE with two inputs [Zurada, 1992]. If the nonlinearity is the threshold nonlinearity, we can immediately see that the output is simply 1 or -1. The surface that divides these subspaces is called a separation surface, and in this case it is a line of equation

    w1 x1 + w2 x2 + b = 0    (20.1)

i.e., the PE weights and the bias control the orientation and position of the separation line, respectively (Fig. 20.4). In many dimensions the separation surface becomes a hyperplane of dimension one less than the dimensionality of the input space. So, each PE creates a dichotomy in the input space. For smooth nonlinearities the separation surface is not crisp; it becomes fuzzy, but the same principles apply. In this case, the size of the weights controls the width of the fuzzy boundary (larger weights shrink the fuzzy boundary). The perceptron input/output map is built from a juxtaposition of linear separation surfaces, so the perceptron gives zero classification error only for linearly separable classes (i.e., classes that can be exactly classified by hyperplanes). When one adds one layer to the perceptron, creating a one-hidden-layer MLP, the type of separation surfaces changes drastically. It can be shown that this learning machine is able to create bumps in the input space, i.e., an area of high response surrounded by low responses [Zurada, 1992]. The function of each PE is always the same, no matter if the PE is part of a perceptron or an MLP. However, notice that the output layer in the MLP works with the result of the hidden layer activations, creating an embedding of functions and producing more complex separation surfaces. The one-hidden-layer MLP is able to produce nonlinear separation surfaces. If one adds an extra layer (i.e., two hidden layers), the learning machine can now combine bumps at will, which can be interpreted as a universal mapper, since there is evidence that any function can be approximated by localized bumps.
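A quick sketch of the two-input threshold PE and its separation line of Eq. (20.1); the specific weights and bias are illustrative only:

```python
import numpy as np

def threshold_pe(x, w, b):
    # threshold PE: outputs +1 on one side of the line w1*x1 + w2*x2 + b = 0, -1 on the other
    return 1.0 if w @ x + b >= 0 else -1.0

# the weights set the orientation of the line, the bias sets its position
w, b = np.array([1.0, 1.0]), -1.0                   # separation line x1 + x2 = 1
above = threshold_pe(np.array([1.0, 1.0]), w, b)    # point above the line
below = threshold_pe(np.array([0.0, 0.0]), w, b)    # point below the line
```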
One important aspect to remember is that changing a single weight in the MLP can drastically change the location of the separation surfaces; i.e., the MLP achieves the input/output map through the interplay of all its weights.

How to Train MLPs

One fundamental issue is how to adapt the weights wi of the MLP to achieve a given input/output map. The core ideas have been around for many years in optimization, and they are extensions of well-known engineering principles, such as the least mean square (LMS) algorithm of adaptive filtering [Haykin, 1994]. Let us review the theory here. Assume that we have a linear PE (f(net) = net) and that one wants to adapt the weights so as to minimize the square difference between the desired signal and the PE response (Fig. 20.5). This problem has an analytical solution known as least squares [Haykin, 1994]. The optimal weights are obtained as the product of the inverse of the input autocorrelation matrix (R^-1) and the cross-correlation vector (P) between the input and the desired response. The analytical solution is equivalent to a search for the minimum of the quadratic performance surface J(wi) using gradient descent, where the weights at each iteration k are adjusted by

    w(k + 1) = w(k) - η ∇J(k)    (20.2)

FIGURE 20.5 Computing analytically the optimal weights for the linear PE.

where η is a small constant called the step size, and ∇J(k) is the gradient of the performance surface at iteration k. Bernard Widrow in the late 1960s proposed a very efficient estimate to compute the gradient at each iteration,

    ∇J(k) ≈ ∂/∂w [ (1/2) e²(k) ] = -e(k) x(k)    (20.3)

which when substituted into Eq. (20.2) produces the so-called LMS algorithm. He showed that the LMS converges to the analytic solution provided the step size η is small enough. Since it is a steepest descent procedure, the largest step size is limited by the inverse of the largest eigenvalue of the input autocorrelation matrix. The larger the step size (below this limit), the faster the convergence, but the final values will rattle around the optimal value in a basin that has a radius proportional to the step size. Hence, there is a fundamental trade-off between speed of convergence and accuracy in the final weight values. One great appeal of the LMS algorithm is that it is very efficient (just one multiplication per weight) and requires only local quantities to be computed.

The LMS algorithm can be framed as a computation of partial derivatives of the cost with respect to the unknowns, i.e., the weight values. In fact, with the chain rule one writes, for J = (1/2)(d - y)² and y = Σj wj xj,

    ∂J/∂wi = (∂J/∂y)(∂y/∂wi) = -(d - y) xi = -e xi    (20.4)

and we obtain the LMS algorithm for the linear PE. What happens if the PE is nonlinear? If the nonlinearity is differentiable (smooth), we can still apply the same method, because the chain rule prescribes that (Fig. 20.6)

FIGURE 20.6 How to extend LMS to nonlinear PEs with the chain rule.

FIGURE 20.7 How to adapt the weights connected to the ith PE.

    ∂J/∂wi = (∂J/∂y)(∂y/∂net)(∂net/∂wi) = -(d - y) f'(net) xi    (20.5)

where f'(net) is the derivative of the nonlinearity computed at the operating point. Equation (20.5) is known as the delta rule, and it will train the perceptron [Haykin, 1994]. Note that throughout the derivation we skipped the pattern index p for simplicity, but this rule is applied for each input pattern. However, the delta rule cannot train MLPs, since it requires the knowledge of the error signal at each PE. The principle of ordered derivatives can be extended to multilayer networks, provided we organize the computations in flows of activation and error propagation. The principle is very easy to understand, but a little complex to formulate in equation form [Haykin, 1994].

Suppose that we want to adapt the weights connected to a hidden-layer PE, the ith PE (Fig. 20.7). One can decompose the computation of the partial derivative of the cost with respect to the weight wij as

    ∂J/∂wij = (∂J/∂yi) (∂yi/∂wij)    (20.6)

i.e., the partial derivative with respect to the weight is the product of the partial derivative with respect to the PE state (the first factor in Eq. (20.6)) times the partial derivative of the local activation with respect to the weights (the second factor in Eq. (20.6)). This last quantity is exactly the same as for the nonlinear PE (f'(neti) xj), so the big issue is the computation of ∂J/∂yi. For an output PE, ∂J/∂yi becomes the injected error -ei, as in Eq. (20.4). For the hidden ith PE, ∂J/∂yi is evaluated by summing all the errors that reach the PE from the top layer through the topology, when the injected errors ek are clamped at the top layer, or in an equation

    ∂J/∂yi = Σk (∂J/∂netk)(∂netk/∂yi) = -Σk ek f'(netk) wki    (20.7)

Substituting back in Eq. (20.6) we finally get

    ∂J/∂wij = -xj f'(neti) [ Σk ek f'(netk) wki ]    (20.8)

This equation embodies the back-propagation training algorithm [Haykin, 1994; Bishop, 1995]. It can be rewritten as the product of a local activation (the factor xj f'(neti)) and a local error (the bracketed sum), exactly as the LMS and the delta rules. But now the local error is a composition of errors that flow through the topology, which becomes equivalent to the existence of a desired response at the PE.

There is an intrinsic flow in the implementation of the back-propagation algorithm: first, inputs are applied to the net and activations are computed everywhere to yield the output activation. Second, the external errors are computed by subtracting the net output from the desired response. Third, these external errors are utilized in Eq. (20.8) to compute the local errors for the layer immediately preceding the output layer, and the computations are chained up to the input layer. Once all the local errors are available, Eq. (20.2) can be used to update every weight. These three steps are then repeated for the other training patterns until the error is acceptable.

Step three is equivalent to injecting the external errors in the dual topology and back-propagating them up to the input layer [Haykin, 1994]. The dual topology is obtained from the original one by reversing the data flow and substituting summing junctions by splitting nodes and vice versa. The error at each PE of the dual topology is then multiplied by the activation of the original network to compute the weight updates. So, effectively the dual topology is being used to compute the local errors, which makes the procedure highly efficient. This is the reason back-propagation trains a network of N weights with a number of multiplications proportional to N (O(N)), instead of O(N²) for previous methods of computing partial derivatives known in control theory. Using the dual topology to implement back-propagation is the best and most general method to program the algorithm in a digital computer.

Applying Back-Propagation in Practice

Now that we know an algorithm to train MLPs, let us see what the practical issues are in applying it.
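Before turning to those practical issues, the machinery of Eqs. (20.2) through (20.8) can be sketched for a one-hidden-layer MLP. This is a hedged illustration with our own variable names, logistic PEs, and biases omitted for brevity, not code from the chapter:

```python
import numpy as np

def logistic(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(x, d, W1, W2, eta=0.5):
    # forward pass: activations computed layer by layer
    y1 = logistic(W1 @ x)
    y2 = logistic(W2 @ y1)
    e = d - y2                                   # external error at the output
    # local errors (for the logistic, f'(net) = y(1 - y))
    delta2 = e * y2 * (1 - y2)                   # output layer, delta rule, Eq. (20.5)
    delta1 = (W2.T @ delta2) * y1 * (1 - y1)     # hidden layer, Eqs. (20.7)-(20.8)
    # gradient-descent weight updates, Eq. (20.2)
    W2 += eta * np.outer(delta2, y1)
    W1 += eta * np.outer(delta1, x)
    return 0.5 * float(e @ e)                    # squared error for this pattern

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 2))          # 2 inputs -> 3 hidden PEs
W2 = rng.normal(scale=0.5, size=(1, 3))          # 3 hidden PEs -> 1 output PE
x, d = np.array([1.0, 0.5]), np.array([0.9])
errors = [backprop_step(x, d, W1, W2) for _ in range(200)]
```

Repeating the step drives the squared error toward zero for this single pattern, illustrating the activation-forward, error-backward flow.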
We will address the following aspects: size of training set vs. number of weights, search procedures, how to stop training, and how to set the topology for maximum generalization.

Size of Training Set

The size of the training set is very important for good performance. Remember that the ANN gets its information from the training set. If the training data do not cover the full range of operating conditions, the system may perform badly when deployed. Under no circumstances should the training set be smaller than the number of weights in the ANN. A good size for the training data is ten times the number of weights in the network, with the lower limit being set around three times the number of weights (these values should be taken as an indication, subject to experimentation for each case) [Haykin, 1994].

Search Procedures

Searching along the direction of the gradient is fine if the performance surface is quadratic. However, in ANNs this is rarely the case, because of the use of nonlinear PEs and topologies with several layers. So, gradient descent can be caught in local minima, which makes the search very slow in regions of small curvature. One efficient way to speed up the search in regions of small curvature and, at the same time, to stabilize it in narrow valleys is to include a momentum term in the weight adaptation:

    wij(n + 1) = wij(n) + η δi(n) xj(n) + α [wij(n) - wij(n - 1)]    (20.9)

The value of the momentum α should be set experimentally between 0.5 and 0.9. There are many more modifications to the conventional gradient search, such as adaptive step sizes, annealed noise, conjugate gradients, and second-order methods (using information contained in the Hessian matrix), but the simplicity and power of momentum learning is hard to beat [Haykin, 1994; Bishop, 1995].

How to Stop Training

The stop criterion is a fundamental aspect of training. The simple ideas of capping the number of iterations or of letting the system train until a predetermined error value are not recommended. The reason is that we want the ANN to perform well on the test set data, i.e., on data it never saw before (good generalization) [Bishop, 1995]. The error in the training set tends to decrease with iteration when the ANN has enough degrees of freedom to represent the input/output map. However, the system may be memorizing the training patterns (overfitting) instead of finding the underlying mapping rule. This is called overtraining. To avoid overtraining, the performance on a validation set, i.e., a set of input data that the system never saw before, must be checked regularly during training (e.g., once every 50 passes over the training set). Training should be stopped when the performance on the validation set starts to increase, despite the fact that the performance on the training set continues to decrease. This method is called cross validation. The validation set should be 10% of the training set, and distinct from it.

Size of the Topology

The size of the topology should also be carefully selected. If the number of layers or the size of each layer is too small, the network does not have enough degrees of freedom to classify the data or to approximate the function, and performance suffers. On the other hand, if the size of the network is too large, performance may also suffer. This is the phenomenon of overfitting that we mentioned above. One alternative way to control it is to reduce the size of the network. There are basically two procedures to set the size of the network: either one starts small and adds new PEs, or one starts with a large network and prunes PEs [Haykin, 1994]. One quick way to prune the network is to impose a penalty term in the performance function, a regularizing term, such as limiting the slope of the input/output map [Bishop, 1995]. A regularization term that can be implemented locally is

    wij(n + 1) = wij(n) [ 1 - λ / (1 + wij(n)²)² ] + η δi(n) xj(n)    (20.10)

where λ is the weight decay parameter and δ the local error. Weight decay tends to drive unimportant weights to zero.
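The updates of Eqs. (20.9) and (20.10) can be sketched together as follows. This is a hedged illustration: grad_local stands for the product δi(n)·xj(n), and the parameter values are arbitrary choices of ours:

```python
import numpy as np

def momentum_update(w, w_prev, grad_local, eta=0.1, alpha=0.7):
    # Eq. (20.9): gradient step plus a fraction of the previous weight change
    return w + eta * grad_local + alpha * (w - w_prev)

def weight_decay_update(w, grad_local, eta=0.1, lam=0.5):
    # Eq. (20.10): the shrinking factor is strongest for small weights,
    # so unimportant weights are driven toward zero while large ones survive
    return w * (1.0 - lam / (1.0 + w ** 2) ** 2) + eta * grad_local

small = weight_decay_update(np.array([0.1]), 0.0)   # shrinks noticeably
large = weight_decay_update(np.array([3.0]), 0.0)   # barely touched
```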
A Posteriori Probabilities

We will finish the discussion of the MLP by noting that this topology, when trained with the mean square error, is able to estimate directly at its outputs a posteriori probabilities, i.e., the probability that a given input pattern belongs to a given class [Bishop, 1995]. This property is very useful because the MLP outputs can be interpreted as probabilities and operated on as numbers. In order to guarantee this property, one has to make sure that each class is attributed to one output PE, that the topology is sufficiently large to represent the mapping, that the training has converged to the absolute minimum, and that the outputs are normalized between 0 and 1. The first requirements are met by good design, while the last can be easily enforced if the softmax activation is used for the output PEs [Bishop, 1995],

    yi = exp(neti) / Σj exp(netj)    (20.11)

20.3 Radial Basis Function Networks

The radial basis function (RBF) network constitutes another way of implementing arbitrary input/output mappings. The most significant difference between the MLP and the RBF lies in the PE nonlinearity. While the PE in the MLP responds to the full input space, the PE in the RBF is local, normally a Gaussian kernel in the input space. Hence, it only responds to inputs that are close to its center; i.e., it has basically a local response.
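The local response of a Gaussian PE can be sketched in a few lines; the center and width here are illustrative values only (in practice they are set by the training procedure):

```python
import numpy as np

def gaussian_pe(x, center, sigma):
    # Gaussian kernel: responds strongly only when x is close to the center
    return np.exp(-np.sum((x - center) ** 2) / (2.0 * sigma ** 2))

center = np.array([0.0, 0.0])
near = gaussian_pe(np.array([0.1, 0.0]), center, 1.0)   # input close to the center
far = gaussian_pe(np.array([5.0, 0.0]), center, 1.0)    # input far away: near zero
```

This locality is the key contrast with the MLP PE, whose sigmoid responds over the full input space.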
FIGURE 20.8 Radial basis function (RBF) network.

The RBF network is also a layered net, with the hidden layer built from Gaussian kernels and a linear (or nonlinear) output layer (Fig. 20.8). Training of the RBF network is normally done in two stages [Haykin, 1994]: first, the centers xi are adaptively placed in the input space using competitive learning or k-means clustering [Bishop, 1995], which are unsupervised procedures. Competitive learning is explained later in the chapter. The variance of each Gaussian is chosen as a percentage (30 to 50%) of the distance to the nearest center. The goal is to cover adequately the input data distribution. Once the RBFs are located, the second-layer weights wi are trained using the LMS procedure. RBF networks are easy to work with, they train very fast, and they have shown good properties both for function approximation and classification. The problem is that they require lots of Gaussian kernels in high-dimensional spaces.

20.4 Time-Lagged Networks

The MLP is the most common neural network topology, but it can only handle instantaneous information, since the system has no memory and is feedforward. In engineering, the processing of signals that exist in time requires systems with memory, i.e., linear filters. Another alternative to implement memory is to use feedback, which gives rise to recurrent networks. Fully recurrent networks are difficult to train and to stabilize, so it is preferable to develop topologies based on MLPs but where explicit subsystems to store the past information are included. These subsystems are called short-term memory structures [de Vries and Principe, 1992]. The combination of an MLP with short-term memory structures is called a time-lagged network (TLN). The memory structures can eventually be recurrent, but the feedback is local, so stability is still easy to guarantee. Here, we will cover just one TLN topology, called focused, where the memory is at the input layer. The most general TLNs have memory added anywhere in the network, but they require other, more involved training strategies (BPTT [Haykin, 1994]). The interested reader is referred to de Vries and Principe [1992] for further details. The function of the short-term memory in the focused TLN is to represent the past of the input signal, while the nonlinear PEs provide the mapping as in the MLP (Fig. 20.9).

Memory Structures

The simplest memory structure is built from a tap delay line (Fig. 20.10). The memory by delays is a single-input, multiple-output system that has no free parameters except its size K. The tap delay memory is the memory utilized in the time-delay neural network (TDNN), which has been utilized successfully in speech recognition and system identification [Kung, 1993]. A different mechanism for linear memory is feedback (Fig. 20.11). Feedback allows the system to remember past events because of the exponential decay of the response. This memory has limited resolution because of the low-pass filtering required for long memories. But notice that unlike the memory by delay, memory by feedback provides the learning system with a free parameter μ that controls the length of the memory. Memory by feedback has been used in Elman and Jordan networks [Haykin, 1994].
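The two memory mechanisms can be sketched as follows. This is a hedged illustration: the context-PE recursion is written here in one common leaky-integrator form, y(n) = (1 - μ) y(n-1) + μ x(n), which is an assumption of ours rather than the chapter's exact formula:

```python
import numpy as np

def tap_delay(signal, K):
    # memory by delays: after processing, the taps hold x(n), x(n-1), ..., x(n-K)
    taps = np.zeros(K + 1)
    for x in signal:
        taps[1:] = taps[:-1]    # shift the delay line
        taps[0] = x
    return taps

def context_pe(signal, mu):
    # memory by feedback: exponential decay of the past, depth controlled by mu
    y = 0.0
    for x in signal:
        y = (1.0 - mu) * y + mu * x
    return y

taps = tap_delay([1.0, 2.0, 3.0], K=2)
ctx = context_pe([1.0] * 100, mu=0.5)   # converges toward the constant input
```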
FIGURE 20.9 A focused TLN.

FIGURE 20.10 Tap delay line memory.

FIGURE 20.11 Memory by feedback (context PE).

It is possible to combine the advantages of memory by feedback with those of memory by delays in linear systems called dispersive delay lines. The most studied of these memories is a cascade of low-pass functions called the gamma memory [de Vries and Principe, 1992]. The gamma memory has a free parameter μ that controls and decouples memory depth from resolution of the memory. Memory depth D is defined as the first moment of the impulse response from the input to the last tap K, while memory resolution R is the number of taps per unit time. For the gamma memory D = K/μ and R = μ; i.e., changing μ modifies the memory depth and resolution inversely. This recursive parameter μ can be adapted with the output MSE like the other network parameters; i.e., the ANN is able to choose the best memory depth to minimize the output error, which is unlike the tap delay memory.

Training-Focused TLN Architectures

The appeal of the focused architecture is that the MLP weights can still be adapted with back-propagation. However, the input/output mapping produced by these networks is static. The input memory layer is bringing in past input information to establish the value of the mapping. As we know from engineering, the size of the memory is fundamental to identify, for instance, an unknown plant or to perform prediction with a small error. But note that with the focused TLN the models for system identification become nonlinear (i.e., nonlinear moving average, NMA). When the tap delay implements the short-term memory, straight back-propagation can be utilized, since the only adaptive parameters are the MLP weights. When the gamma memory is utilized (or the context PE), the recursive parameter is adapted in a totally adaptive framework (or the parameter is preset by some external consideration). The equations to adapt the context PE and the gamma memory are shown in Figs. 20.11 and 20.12, respectively. For the context PE, δ(n) refers to the total error that is back-propagated from the MLP and that reaches the dual context PE.
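A sketch of the gamma memory recursion, written in the standard cascade-of-leaky-integrators form (our own formulation; note that with μ = 1 it reduces to the plain tap delay line):

```python
import numpy as np

def gamma_memory(signal, K, mu):
    # dispersive delay line: cascade of K leaky integrators
    #   g0(n) = x(n)
    #   gk(n) = (1 - mu) * gk(n-1) + mu * g_{k-1}(n-1),  k = 1..K
    # memory depth D = K/mu, resolution R = mu
    g = np.zeros(K + 1)
    for x in signal:
        prev = g.copy()
        g[0] = x
        for k in range(1, K + 1):
            g[k] = (1.0 - mu) * prev[k] + mu * prev[k - 1]
    return g

taps = gamma_memory([1.0, 2.0, 3.0], K=2, mu=1.0)   # mu = 1: behaves as a tap delay line
```

Choosing μ < 1 trades resolution for depth, which is the decoupling discussed above.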
FIGURE 20.12 Gamma memory (dispersive delay line).

20.5 Hebbian Learning and Principal Component Analysis Networks

Hebbian Learning

Hebbian learning is an unsupervised learning rule that captures similarity between an input and an output through correlation. To adapt a weight wi using Hebbian learning, we adjust the weights according to Δwi = η xi y, or in an equation [Haykin, 1994]

    wi(n + 1) = wi(n) + η xi(n) y(n)    (20.12)

where η is the step size, xi is the ith input, and y is the PE output. The output of the single PE is an inner product between the input and the weight vector (formula in Fig. 20.13). It measures the similarity between the two vectors; i.e., if the input is close to the weight vector the output y is large; otherwise it is small. The weights are computed by an outer product of the input X and output Y, i.e., W = XY^T, where T means transpose. The problem with Hebbian learning is that it is unstable; i.e., the weights will keep on growing with the number of iterations [Haykin, 1994]. Oja proposed to stabilize the Hebbian rule by normalizing the new weight by its size, which gives the rule [Haykin, 1994]

    wi(n + 1) = wi(n) + η y(n) [xi(n) - y(n) wi(n)]    (20.13)

The weights now converge to finite values. They still define in the input space the direction where the data cluster has its largest projection, which corresponds to the eigenvector with the largest eigenvalue of the input correlation matrix [Kung, 1993]. The variance of the PE output equals the largest eigenvalue of the input correlation matrix.

FIGURE 20.13 Hebbian PE.
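Oja's rule, Eq. (20.13), can be sketched on synthetic two-dimensional data clustered along the direction [1, 1]; the data, seed, and step size are illustrative choices of ours:

```python
import numpy as np

rng = np.random.default_rng(1)
# 500 samples spread mostly along the direction [1, 1], plus a little isotropic noise
X = rng.normal(size=(500, 1)) * np.array([1.0, 1.0]) + 0.1 * rng.normal(size=(500, 2))

w = 0.1 * rng.normal(size=2)
eta = 0.01
for _ in range(20):                    # a few passes over the data
    for x in X:
        y = w @ x                      # PE output: inner product (Fig. 20.13)
        w += eta * y * (x - y * w)     # Oja's rule, Eq. (20.13)
# w converges to the principal eigenvector of the input correlation matrix, with unit norm
```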
FIGURE 20.14 PCA network.

Principal Component Analysis

Principal component analysis (PCA) is a well-known technique in signal processing that is used to project a signal onto a signal-specific basis. The importance of PCA is that it provides the best linear projection to a subspace in terms of preserving the signal energy [Haykin, 1994]. Normally, PCA is computed analytically through a singular value decomposition. PCA networks offer an alternative to this computation by providing an iterative implementation that may be preferred for real-time operation in embedded systems. The PCA network is a one-layer network with linear processing elements (Fig. 20.14). One can extend Oja's rule to many output PEs (fewer than or equal to the number of input PEs), according to the formula shown in Fig. 20.14, which is called Sanger's rule [Haykin, 1994]. The rows of the weight matrix (which contain the weights connected to the output PEs, in descending order) are the eigenvectors of the input correlation matrix. If we set the number of output PEs equal to M < D, we will be projecting the input data onto the M largest principal components. The output variances will be proportional to the M largest eigenvalues. Note that we are performing an eigendecomposition through an iterative procedure.

Associative Memories

Hebbian learning is also the rule used to create associative memories [Zurada, 1992]. The most utilized associative memory implements heteroassociation, where the system is able to associate an input X to a designated output Y, which can be of a different dimension (Fig. 20.15). So, in heteroassociation the signal Y works as the desired response. We can train such a memory using Hebbian learning or LMS, but the LMS provides a more efficient encoding of information. Associative memories differ from conventional computer memories in several respects. First, they are content addressable, and the information is distributed throughout the network, so they are robust to noise in the input. Second, with nonlinear PEs or recurrent connections (as in the famous Hopfield network) [Haykin, 1994], they display the important property of pattern completion; i.e., when the input is distorted or only partially available, the recall can still be perfect.

FIGURE 20.15 Associative memory (heteroassociation).
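A linear heteroassociative memory trained with the Hebbian outer-product rule, W = Σ Y X^T, can be sketched with two stored pairs; the patterns are chosen by us for illustration:

```python
import numpy as np

# two input/output pairs to be associated (X of dimension 3, Y of dimension 2)
x1, y1 = np.array([1.0, -1.0, 1.0]), np.array([1.0, 1.0])
x2, y2 = np.array([-1.0, -1.0, 1.0]), np.array([-1.0, 1.0])

# Hebbian (outer-product) storage: W = y1 x1^T + y2 x2^T
W = np.outer(y1, x1) + np.outer(y2, x2)

recall1 = np.sign(W @ x1)   # presenting x1 recalls y1
recall2 = np.sign(W @ x2)   # presenting x2 recalls y2
```

Because the stored inputs are nearly orthogonal, the crosstalk term is small and the sign nonlinearity recovers the stored outputs exactly.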
FIGURE 20.16 Autoassociator.

A special case of associative memory is the autoassociator (Fig. 20.16), where the training output of size D is equal to the input signal (also of size D) [Kung, 1993]. Note that the hidden layer has fewer PEs (M < D) than the input (bottleneck layer), and W1 = W2^T is enforced. The function of this network is one of encoding or data reduction. The training of this network (the W2 matrix) is done with LMS. It can be shown that this network also implements PCA with M components, even when the hidden layer is built from nonlinear PEs.

20.6 Competitive Learning and Kohonen Networks

Competition is a very efficient way to divide the computing resources of a network. Instead of having each output PE more or less sensitive to the full input space, as in the associative memories, in a competitive network each PE specializes in a piece of the input space and represents it [Haykin, 1994]. Competitive networks are linear, single-layer nets (Fig. 20.17). Their functionality is directly related to the competitive learning rule, which belongs to the unsupervised category. Only the PE that has the largest output gets its weights updated. The weights of the winning PE are updated according to the formula in Fig. 20.17, in such a way that they approach the present input. The step size exactly controls how large this adjustment is (see Fig. 20.17). Notice that there is an intrinsic nonlinearity in the learning rule: only the PE that has the largest output (the winner) has its weights updated. All the other weights remain unchanged. This is the mechanism that allows the competitive-net PEs to specialize. Competitive networks are used for clustering; i.e., an M-output-PE net will seek M clusters in the input space. The weights of each PE will correspond to the center of mass of one of the M clusters of input samples. When a given pattern is shown to the trained net, only one of the outputs will be active and can be used to label the sample as belonging to one of the clusters. No more information about the input data is preserved.
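The winner-take-all rule can be sketched on two synthetic clusters. One hedge: the chapter selects the winner as the PE with the largest output; this sketch uses the equivalent closest-weight-vector criterion (they coincide for normalized weights), and the data and parameters are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
# two clusters of input samples, around (0, 0) and (5, 5)
A = rng.normal(loc=0.0, scale=0.1, size=(50, 2))
B = rng.normal(loc=5.0, scale=0.1, size=(50, 2))
X = np.vstack([A[:1], B[:1], A[1:], B[1:]])   # first two samples, one per cluster

M, eta = 2, 0.1
W = X[:M].copy()                  # initialize each PE weight at an input sample
for _ in range(20):
    for x in X:
        winner = np.argmin(np.sum((W - x) ** 2, axis=1))   # the closest PE wins
        W[winner] += eta * (x - W[winner])                 # only the winner moves toward x
# each row of W ends up near the center of mass of one cluster
```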
Compettve learnng s one of the fundamental components of the Kohonen self-organzng feature map (SOFM) network, whch s also a sngle-layer network wth lnear PEs [Haykn, 1994]. Kohonen learnng creates annealed competton n the output space, by adaptng not only the wnner PE weghts but also ther spatal FIGURE Compettve neural network.
FIGURE Kohonen SOFM.

The winner's spatial neighbors are adapted using a Gaussian neighborhood function Λ. The output PEs are arranged in linear or two-dimensional neighborhoods (see the figure above). Kohonen SOFM networks produce a mapping from the continuous input space to the discrete output space that preserves topological properties of the input space (i.e., local neighbors in the input space are mapped to neighbors in the output space). During training, both the spatial neighborhood and the learning constant are decreased slowly: the neighborhood starts with a large width σ0 and shrinks (N0 controls the scheduling), and the initial step size η0 must also be scheduled (by K). The Kohonen SOFM network is useful for projecting the input onto a subspace, as an alternative to PCA networks. The topological properties of the output space provide more information about the input than straight clustering.

References

C. M. Bishop, Neural Networks for Pattern Recognition, New York: Oxford University Press, 1995.
B. de Vries and J. C. Principe, "The gamma model: a new neural model for temporal processing," Neural Networks, vol. 5, 1992.
S. Haykin, Neural Networks: A Comprehensive Foundation, New York: Macmillan, 1994.
S. Y. Kung, Digital Neural Networks, Englewood Cliffs, N.J.: Prentice-Hall, 1993.
J. M. Zurada, Introduction to Artificial Neural Systems, West Publishing, 1992.

Further Information

The literature in this field is voluminous. We decided to limit the references to textbooks for an engineering audience, with different levels of sophistication. Zurada is the most accessible text, Haykin the most comprehensive. Kung provides interesting applications of PCA networks to nonlinear signal processing and system identification. Bishop concentrates on the design of pattern classifiers. Interested readers are directed to the following journals for more information: IEEE Transactions on Signal Processing, IEEE Transactions on Neural Networks, Neural Networks, Neural Computation, and the Proceedings of the Neural Information Processing Systems Conference (NIPS).
More information