Diagrammatic Derivation of Gradient Algorithms for Neural Networks


Communicated by Fernando Pineda

Eric A. Wan and Françoise Beaufays

Deriving gradient algorithms for time-dependent neural network structures typically requires numerous chain rule expansions, diligent bookkeeping, and careful manipulation of terms. In this paper, we show how to derive such algorithms via a set of simple block diagram manipulation rules. The approach provides a common framework to derive popular algorithms, including backpropagation and backpropagation-through-time, without a single chain rule expansion. Additional examples are provided for a variety of complicated architectures to illustrate both the generality and the simplicity of the approach.

1 Introduction

Deriving the appropriate gradient descent algorithm for a new network architecture or system configuration normally involves brute force derivative calculations. For example, the celebrated backpropagation algorithm for training feedforward neural networks was derived by repeatedly applying chain rule expansions backward through the network (Rumelhart et al. 1986; Werbos 1974; Parker 1982). However, the actual implementation of backpropagation may be viewed as a simple reversal of signal flow through the network. Another popular algorithm, backpropagation-through-time for recurrent networks, can be derived by Euler-Lagrange or ordered derivative methods, and involves both a signal flow reversal and a time reversal (Werbos 1992; Nguyen and Widrow 1989). For both of these algorithms, there is a reciprocal nature to the forward propagation of states and the backward propagation of gradient terms. Furthermore, both algorithms are efficient in the sense that calculations are order N, where N is the number of variable weights in the network. These properties are often attributed to the clever manner in which the algorithms

were derived for a specific network architecture. We will show, however, that these properties are universal to all network architectures and that the associated gradient algorithm may be formulated directly with virtually no effort. The approach consists of a simple diagrammatic method for construction of a reciprocal network that directly specifies the gradient derivation. This is in contrast to graphical methods, which simply illustrate the relationship between signal and gradient flow after derivation of the algorithm by an alternative method (Narendra and Parthasarathy 1990; Nerrand et al. 1993). The reciprocal networks are, in principle, identical to the adjoint systems seen in N-stage optimal control problems (Bryson and Ho 1975). While adjoint methods have been applied to neural networks, such approaches have been restricted to specific architectures where the adjoint systems resulted from a disciplined Euler-Lagrange optimization technique (Parisini and Zoppoli 1994; Toomarian and Barhen 1992; Matsuoka 1991). Here we use a graphic approach for the complete derivation. We thus prefer the term "reciprocal" network, which further imposes certain topological constraints and is taken from the electrical network literature. The concepts detailed in this paper were developed in Wan (1993) and later presented in Wan and Beaufays (1994).¹

1.1 Network Adaptation and Error Gradient Propagation. In supervised learning, the goal is to find a set of network weights W that minimizes a cost function J = Σ_k L_k[d(k), y(k)], where k is used to specify a discrete time index (the actual order of presentation may be random or sequential), y(k) is the output of the network, d(k) is a desired response, and L_k is a generic error metric that may contain additional weight regularization terms. For illustrative purposes, we will work with the squared error metric, L_k = e(k)^T e(k), where e(k) is the error vector. Optimization techniques invariably require calculation of the gradient vector ∂J/∂w(k). At the architectural level, a variable weight w_{ji} may be isolated between two points in a network with corresponding signals a_i(k) and a_j(k) [i.e., a_j(k) = w_{ji} a_i(k)]. Using the chain rule, we get²

∂J/∂w_{ji}(k) = [∂J/∂a_j(k)] [∂a_j(k)/∂w_{ji}(k)] = δ_j(k) a_i(k),   (1.1)

where we define the error gradient δ_j(k) ≜ ∂J/∂a_j(k). The error gradient δ_j(k) depends on the entire topology of the network. Specifying the gradients necessitates finding an explicit formula for calculating the delta terms.

¹The method presented here is similar in spirit to Automatic Differentiation (Rall 1981; Griewank and Corliss 1991). Automatic Differentiation is a simple method for finding derivatives of functions and algorithms that can be represented by acyclic graphs. Our approach, however, applies to discrete-time systems with the possibility of feedback. In addition, we are concerned with diagrammatic derivations rather than computational rule-based implementations.

²In the general case of a variable parameter, we have a_j(k) = f[w_{ji}, a_i(k)], and equation 1.1 remains as δ_j(k) ∂a_j(k)/∂w_{ji}(k), where the partial term depends on the form of f.
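As a concrete illustration of equation 1.1 (this sketch is ours, not part of the original paper), the snippet below isolates a single weight w_ji between signals a_i(k) and a_j(k) and checks that the total gradient Σ_k δ_j(k) a_i(k) matches a central finite difference. The signal values, the NumPy implementation, and the choice of a purely linear transmittance are assumptions made only for illustration.

```python
import numpy as np

# Minimal sketch (ours, not from the paper): a single weight w_ji isolated
# between signals a_i(k) and a_j(k), with a_j(k) = w_ji * a_i(k) feeding the
# squared-error cost J = sum_k e(k)^2, e(k) = d(k) - a_j(k).
# Equation 1.1 then gives dJ/dw_ji = sum_k delta_j(k) * a_i(k),
# with delta_j(k) = dJ/da_j(k) = -2 e(k).

rng = np.random.default_rng(0)
a_i = rng.standard_normal(5)          # upstream signals a_i(k), arbitrary values
d = rng.standard_normal(5)            # desired responses d(k)
w_ji = 0.7

a_j = w_ji * a_i                      # a_j(k) = w_ji a_i(k)
e = d - a_j                           # error e(k)
delta_j = -2.0 * e                    # delta_j(k) = dJ/da_j(k)
grad = np.sum(delta_j * a_i)          # equation 1.1 summed over k

# Central finite-difference check of the same gradient.
J = lambda w: np.sum((d - w * a_i) ** 2)
eps = 1e-6
print(grad, (J(w_ji + eps) - J(w_ji - eps)) / (2 * eps))  # the two values should agree closely
```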

Backpropagation, for example, is nothing more than an algorithm for generating these terms in a feedforward network. In the next section, we develop a simple nonalgebraic method for deriving the delta terms associated with any network architecture.

2 Network Representation and Reciprocal Construction Rules

An arbitrary neural network can be represented as a block diagram whose building blocks are: summing junctions, branching points, univariate functions, multivariate functions, and time-delay operators. Only discrete-time systems are considered. A signal located within the network is labeled a_i(k). A synaptic weight, for example, may be thought of as a linear transmittance, which is a special case of a univariate function. The basic neuron is simply a sum of linear transmittances followed by a univariate sigmoid function. Networks can then be constructed from individual neurons, and may include additional functional blocks and time-delay operators that allow for buffering of signals and internal memory. This block diagram representation is really nothing more than the typical pictorial description of a neural network with a bit of added formalism. Directly from the block diagram we may construct the reciprocal network by reversing the flow direction in the original network, labeling all resulting signals δ_i(k), and performing the following operations:

1. Summing junctions are replaced with branching points.

2. Branching points are replaced with summing junctions.

3. Univariate functions are replaced with their derivative transmittances.

Explicitly, the scalar continuous function a_j(k) = f[a_i(k)] is replaced by δ_i(k) = f'[a_i(k)] δ_j(k), where f'[a_i(k)] ≜ ∂a_j(k)/∂a_i(k). Note this rule replaces a nonlinear function by a linear time-dependent transmittance. Special cases are:

   Weights: a_j = w_{ji} a_i, in which case δ_i = w_{ji} δ_j.

   Activation functions: a_j(k) = tanh[a_i(k)]. In this case, f'[a_i(k)] = 1 - a_j^2(k).

4. Multivariate functions are replaced with their Jacobians.

A multivariate function maps a vector of input signals into a vector of output signals, a_out = F(a_in). In the transformed network, we have δ_in(k) = F'[a_in(k)] δ_out(k), where F'[a_in(k)] ≜ ∂a_out(k)/∂a_in(k) corresponds to a matrix of partial derivatives. For shorthand, F'[a_in(k)] will be written simply as F'(k). Clearly both summing junctions and univariate functions are special cases of multivariate functions. A multivariate function may also represent a product junction (for sigma-pi units) or even another multilayer network. For a multilayer network, the product F'[a_in(k)] δ_out(k) is found directly by backpropagating δ_out through the network.

5. Delay operators are replaced with advance operators.

A delay operator q^{-1} performs a unit time delay on its argument: a_j(k) = q^{-1} a_i(k) = a_i(k - 1). In the reciprocal system, we form a unit time advance: δ_i(k) = q^{+1} δ_j(k) = δ_j(k + 1). The resulting system is thus noncausal. Actual implementation of the reciprocal network in a causal manner is addressed in specific examples.

6. Outputs become inputs.

By reversing the signal flow, output nodes a_o(k) = y_o(k) in the original network become input nodes in the reciprocal network. These inputs are then set at each time step to -2e_o(k). [For cost functions other than squared error, the input should be set to ∂L_k/∂y_o(k).]

These six rules allow direct construction of the reciprocal network from the original network.³ Note that there is a topological equivalence between the two networks. The order of computations in the reciprocal network is thus identical to the order of computations in the forward network. Whereas the original network corresponds to a nonlinear time-independent system (assuming the weights are fixed), the reciprocal network is a linear time-dependent system. The signals δ_i(k) that propagate through the reciprocal network correspond to the terms ∂J/∂a_i(k) necessary for gradient adaptation. Exact equations may then be "read out" directly from the reciprocal network, completing the derivation. A formal proof of the validity and generality of this method is presented in Appendix A.

3 Examples

3.1 Backpropagation. We start by rederiving standard backpropagation. Figure 1 shows a hidden neuron feeding other neurons and an output neuron in a multilayer network. For consistency with traditional notation, we have labeled the summing junction signal s_i^l rather than a_i, and added superscripts to denote the layer. In addition, since multilayer networks are static structures, we omit the time index k.

Figure 1: Block diagram construction of a multilayer network.

Figure 2: Reciprocal multilayer network.

The reciprocal network shown in Figure 2 is found by applying the construction rules of the previous section. From this figure, we may immediately write down the equations for calculating the delta terms:

δ_i^l = f'[s_i^l] Σ_j w_{ji}^{l+1} δ_j^{l+1},    δ_j^L = -2 e_j f'[s_j^L].   (3.1)

These are precisely the equations describing standard backpropagation. In this case, there are no delay operators and δ_i = δ_i(k) = ∂J_k/∂s_i(k) = ∂[e^T(k)e(k)]/∂s_i(k) corresponds to an instantaneous gradient.

³Clearly, this set of rules is not a minimal set, i.e., a summing junction can be considered a special case of a multivariate function. However, we choose this set for ease and clarity of construction.
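The following sketch (ours, not taken from the paper) applies rules 1-6 to a small two-layer tanh network: the forward pass runs the original block diagram, and the reciprocal pass runs the same diagram with the flow reversed, reproducing the standard backpropagation deltas. The layer sizes, the tanh nonlinearity, and the NumPy style are assumptions; one weight gradient is checked against a finite difference.

```python
import numpy as np

# Sketch of rules 1-6 applied to a two-layer tanh network (ours; the layer
# sizes and the tanh nonlinearity are assumptions, not taken from Figures 1
# and 2).  The forward pass runs the original block diagram; the reciprocal
# pass runs the same diagram with the flow reversed: the synaptic fan-in/
# fan-out is traversed backward (W.T), tanh blocks become the derivative
# transmittances 1 - a^2 (rule 3), and the output node becomes an input
# driven by -2e (rule 6).

rng = np.random.default_rng(1)
W1 = rng.standard_normal((4, 3)) * 0.5    # layer-1 weights
W2 = rng.standard_normal((2, 4)) * 0.5    # layer-2 weights
x = rng.standard_normal(3)                # network input
dt = rng.standard_normal(2)               # desired response d

# Forward network.
s1 = W1 @ x;  a1 = np.tanh(s1)
s2 = W2 @ a1; y = np.tanh(s2)
e = dt - y

# Reciprocal network: same topology, signals flow backward.
delta_y = -2.0 * e                        # rule 6
delta_s2 = (1.0 - y ** 2) * delta_y       # rule 3: f'(s) = 1 - tanh(s)^2
delta_a1 = W2.T @ delta_s2                # rules 1-2 applied to the weight layer
delta_s1 = (1.0 - a1 ** 2) * delta_a1     # rule 3 again

# Weight gradients read off the reciprocal network via equation 1.1.
gW2 = np.outer(delta_s2, a1)
gW1 = np.outer(delta_s1, x)

# Finite-difference check on one arbitrary weight of W1.
def J(W1v):
    yv = np.tanh(W2 @ np.tanh(W1v @ x))
    return np.sum((dt - yv) ** 2)
eps = 1e-6
Wp = W1.copy(); Wp[2, 1] += eps
Wm = W1.copy(); Wm[2, 1] -= eps
print(gW1[2, 1], (J(Wp) - J(Wm)) / (2 * eps))   # should agree closely
```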

Readers familiar with neural networks have undoubtedly seen these diagrams before. What is new is the concept that the diagrams themselves may be used directly, completely circumventing all intermediate steps involving tedious algebra.

3.2 Backpropagation-Through-Time. For the next example, consider a network with output feedback (see Fig. 3) described by

y(k) = N[x(k), y(k - 1)],   (3.2)

where x(k) are external inputs, and y(k) represents the vector of outputs that form feedback connections. N is a multilayer neural network. If N has only one layer of neurons, every neuron output has a feedback

connection to the input of every other neuron and the structure is referred to as a fully recurrent network (Williams and Zipser 1989). Typically, only a select set of the outputs have an actual desired response. The remaining outputs have no desired response (error equals zero) and are used for internal computation.

Figure 3: Recurrent network and backpropagation-through-time.

Direct calculation of gradient terms using chain rule expansions is extremely complicated. A weight perturbation at a specified time step affects not only the output at future time steps, but future inputs as well. However, applying the reciprocal construction rules (see Fig. 3), we find immediately:

δ(k) = -2e(k) + N'(k) δ(k + 1).   (3.3)

These are precisely the equations describing backpropagation-through-time, which have been derived in the past using either ordered derivatives (Werbos 1974) or Euler-Lagrange techniques (Plumer 1993). The diagrammatic approach is by far the simplest and most direct method. Note that the causality constraints require these equations to be run backward in time. This implies a forward sweep of the system to generate the output states and internal activation values, followed by a backward sweep through the reciprocal network. Also, from rule 4 in the previous section, the product N'(k)δ(k + 1) may be calculated directly by a standard backpropagation of δ(k + 1) through the network at time k.⁴

⁴Backpropagation-through-time is viewed as an off-line gradient descent algorithm in which weight updates are made after each presentation of an entire training sequence. An on-line version in which adaptation occurs at each time step is possible using an algorithm called real-time backpropagation (Williams and Zipser 1989). The algorithm, however, is far more computationally expensive. The authors have presented a method based on flow graph interreciprocity to directly relate the two algorithms (Beaufays and Wan 1994a).
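The following sketch (not from the paper) carries out this two-sweep procedure for an assumed single-layer tanh instance of N: a forward sweep stores the states y(k), the backward sweep runs the delta recursion of equation 3.3 in reversed time, and the resulting weight gradient is compared with a finite difference.

```python
import numpy as np

# Sketch of backpropagation-through-time for y(k) = N[x(k), y(k-1)], with a
# single-layer tanh recurrent network assumed as a concrete instance of N
# (ours, not from the paper).  A forward sweep stores the states; the backward
# sweep through the reciprocal network runs the delta recursion of equation
# 3.3 in reversed time, and a weight gradient is checked by finite differences.

rng = np.random.default_rng(2)
n_in, n_out, K = 2, 3, 6
Wx = rng.standard_normal((n_out, n_in)) * 0.5
Wy = rng.standard_normal((n_out, n_out)) * 0.5
x = rng.standard_normal((K + 1, n_in))        # x(1..K); index 0 unused
d = rng.standard_normal((K + 1, n_out))       # desired responses d(1..K)

def forward(Wx_v, Wy_v):
    y = np.zeros((K + 1, n_out))              # y(0) = 0
    for k in range(1, K + 1):
        y[k] = np.tanh(Wx_v @ x[k] + Wy_v @ y[k - 1])
    return y

y = forward(Wx, Wy)
e = d - y

# Backward sweep (noncausal, so it runs k = K, ..., 1 after the forward sweep).
delta = np.zeros((K + 2, n_out))
gWx = np.zeros_like(Wx); gWy = np.zeros_like(Wy)
for k in range(K, 0, -1):
    # -2e(k) enters at the former output node; the advance operator brings in
    # the term backpropagated from time k+1 through the feedback input of N.
    back = Wy.T @ ((1.0 - y[k + 1] ** 2) * delta[k + 1]) if k < K else 0.0
    delta[k] = -2.0 * e[k] + back
    pre = (1.0 - y[k] ** 2) * delta[k]         # delta at the summing junction
    gWx += np.outer(pre, x[k])
    gWy += np.outer(pre, y[k - 1])

# Finite-difference check on one recurrent weight.
def J(Wy_v):
    yv = forward(Wx, Wy_v)
    return np.sum((d[1:] - yv[1:]) ** 2)
eps = 1e-6
Wp = Wy.copy(); Wp[0, 1] += eps
Wm = Wy.copy(); Wm[0, 1] -= eps
print(gWy[0, 1], (J(Wp) - J(Wm)) / (2 * eps))  # should agree closely
```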

Figure 4: Neural network control using nonlinear ARMA models.

Backpropagation-Through-Time and Neural Control Architectures. Backpropagation-through-time can be extended to a number of neural control architectures (Nguyen and Widrow 1989; Werbos 1992). A system may be configured using full-state feedback or more complicated ARMA (AutoRegressive Moving Average) models, as illustrated in Figure 4. To adapt the weights of the controller, it is necessary to find the gradient terms that constitute the effective error for the neural network. Figure 5 illustrates how such terms may be directly acquired using the reciprocal network. A variety of other recurrent architectures may be considered, ranging from hierarchical networks to radial basis networks with feedback. In all cases, the diagrammatic approach provides a direct derivation of the gradient algorithm.

3.3 Cascaded Neural Networks. Let us now turn to an example of two cascaded neural networks (Fig. 6), which will further illustrate advantages of the diagrammatic approach. The inputs to the first network are samples from a time sequence x(k). Delayed outputs of the first network are fed to the second network. The cascaded networks are defined as

u(k) = N_1[W_1, x(k), x(k - 1), x(k - 2)],   (3.4)
y(k) = N_2[W_2, u(k), u(k - 1), u(k - 2)],   (3.5)

where W_1 and W_2 represent the weights parameterizing the networks, y(k) is the output, and u(k) the intermediate signal. Given a desired response for the output y of the second network, it is straightforward to use backpropagation for adapting the second network. It is not obvious, however, what the effective error should be for the first network.

Figure 5: Reciprocal network for control using nonlinear ARMA models.

Figure 6: Cascaded neural network filters and reciprocal counterpart.

From the reciprocal network also shown in Figure 6, we simply label the desired signals and write down the gradient relations:

δ_u(k) = δ_1(k) + δ_2(k + 1) + δ_3(k + 2),   (3.6)

with

[δ_1(k)  δ_2(k)  δ_3(k)] = -2e(k)^T N_2'[u(k)],   (3.7)

i.e., each δ_i(k) is found by backpropagation through the output network, and the δ_i's (after appropriate advance operations) are summed together. The gradient for the weights in the first network is thus given by

∂J_k/∂W_1 = δ_u(k) ∂u(k)/∂W_1,   (3.8)

in which the product term is found by a single backpropagation with δ_u(k) acting as the error to the first network. Equations can be made causal by simply delaying the weight update for a few time steps. Clearly, extrapolating to an arbitrary number of taps is also straightforward.

For comparison, let us consider the brute force derivative approach to finding the gradient. Using the chain rule, the instantaneous error gradient is evaluated as:

∂J_k/∂W_1 = -2e(k)^T [ ∂y(k)/∂u(k) · ∂u(k)/∂W_1 + ∂y(k)/∂u(k - 1) · ∂u(k - 1)/∂W_1 + ∂y(k)/∂u(k - 2) · ∂u(k - 2)/∂W_1 ]   (3.9)
           = δ_1(k) ∂u(k)/∂W_1 + δ_2(k) ∂u(k - 1)/∂W_1 + δ_3(k) ∂u(k - 2)/∂W_1,   (3.10)

where we define δ_i(k) ≜ -2e(k)^T ∂y(k)/∂u(k - i + 1). Again, the δ_i terms are found simultaneously by a single backpropagation of the error through the second network. Each product δ_i(k)[∂u(k - i + 1)/∂W_1] is then found by backpropagation applied to the first network with δ_i(k) acting as the error. However, since the derivatives used in backpropagation are time-dependent, separate backpropagations are necessary for each δ_i(k). These equations, in fact, imply backpropagation through an unfolded structure and are equivalent to weight sharing (LeCun et al. 1989), as illustrated in Figure 7. In situations where there may be hundreds of taps in the second network, this algorithm is far less efficient than the one derived directly using reciprocal networks. Similar arguments can be used to derive an efficient on-line algorithm for adapting time-delay neural networks (Waibel et al. 1989).

Figure 7: Cascaded neural network filters unfolded-in-time.
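The sketch below (ours) implements the cascaded example of equations 3.4-3.8 with assumed single-layer tanh networks for N_1 and N_2: one backpropagation per step through N_2 yields [δ_1(k), δ_2(k), δ_3(k)], the advanced terms are summed into the effective error δ_u(k), and the resulting gradient for W_1 is checked against a finite difference.

```python
import numpy as np

# Sketch of the cascaded-filter example, equations 3.4-3.8 (ours; N1 and N2
# are assumed to be single tanh layers of arbitrary size so that the example
# runs).  One backpropagation per step through N2 yields delta_1(k),
# delta_2(k), delta_3(k); after the advance operations these are summed into
# the effective error delta_u(k), which drives a single backpropagation
# through N1.

rng = np.random.default_rng(3)
K, nx, nu, ny = 8, 2, 2, 2
W1 = rng.standard_normal((nu, 3 * nx)) * 0.5   # first network, 3 input taps
W2 = rng.standard_normal((ny, 3 * nu)) * 0.5   # second network, 3 input taps
x = np.zeros((K + 1, nx)); x[1:] = rng.standard_normal((K, nx))
d = rng.standard_normal((K + 1, ny))

def tap(sig, k):                               # [sig(k), sig(k-1), sig(k-2)] with zero history
    z = np.zeros_like(sig[0])
    return np.concatenate([sig[k], sig[k - 1] if k - 1 >= 0 else z,
                           sig[k - 2] if k - 2 >= 0 else z])

def forward(W1v):
    u = np.zeros((K + 1, nu)); y = np.zeros((K + 1, ny))
    for k in range(1, K + 1):
        u[k] = np.tanh(W1v @ tap(x, k))        # equation 3.4
        y[k] = np.tanh(W2 @ tap(u, k))         # equation 3.5
    return u, y

u, y = forward(W1)
e = d - y

# Per-step backpropagation through N2 gives delta_1, delta_2, delta_3 (eq. 3.7).
dz = np.zeros((K + 3, 3, nu))                  # dz[k, i-1] holds delta_i(k)
for k in range(1, K + 1):
    dz[k] = (W2.T @ ((1.0 - y[k] ** 2) * (-2.0 * e[k]))).reshape(3, nu)

gW1 = np.zeros_like(W1)
for k in range(1, K + 1):
    delta_u = dz[k, 0] + dz[k + 1, 1] + dz[k + 2, 2]          # equation 3.6
    gW1 += np.outer((1.0 - u[k] ** 2) * delta_u, tap(x, k))   # equation 3.8

# Finite-difference check on one weight of the first network.
def J(W1v):
    return np.sum((d[1:] - forward(W1v)[1][1:]) ** 2)
eps = 1e-6
Wp = W1.copy(); Wp[1, 2] += eps
Wm = W1.copy(); Wm[1, 2] -= eps
print(gW1[1, 2], (J(Wp) - J(Wm)) / (2 * eps))  # should agree closely
```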

3.4 Temporal Backpropagation. An extension of the feedforward network can be constructed by replacing all scalar weights with discrete-time linear filters to provide dynamic interconnectivity between neurons. Mathematically, a neuron i in layer l may be specified as

s_i^l(k) = Σ_j W_{ij}^l(q^{-1}) a_j^{l-1}(k),   (3.11)
a_i^l(k) = f[s_i^l(k)],   (3.12)

where a(k) are neuron output values, s(k) are summing junctions, f(·) are sigmoid functions, and W(q^{-1}) are synaptic filters.⁵ Three possible forms for W(q^{-1}) are

Case I:   W(q^{-1}) = w, a scalar weight,
Case II:  W(q^{-1}) = Σ_{m=0}^{M} w(m) q^{-m},   (3.13)
Case III: W(q^{-1}) = B(q^{-1}) / [1 - A(q^{-1})], a ratio of polynomials in q^{-1} (a filter with feedback).

⁵The time domain operator q^{-1} is used instead of the more common z-domain variable z^{-1}. The z notation would imply an actual transfer function, which does not apply in nonlinear systems.

In Case I, the filter reduces to a scalar weight and we have the standard definition of a neuron for feedforward networks. Case II corresponds to a Finite Impulse Response (FIR) filter in which the synapse forms a weighted sum of past values of its input. Such networks have been utilized for a number of time-series and system identification problems (Wan 1993a,b,c). Case III represents the more general Infinite Impulse Response (IIR) filter, in which feedback is permitted. In all cases, coefficients are assumed to be adaptive.
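As a small aside (not in the paper), the snippet below evaluates a single Case II synapse, showing that the filter W(q^{-1}) = Σ_m w(m) q^{-m} is just a causal convolution of the presynaptic signal with the coefficient vector; the coefficients and the input signal are arbitrary.

```python
import numpy as np

# Small sketch of a Case II synapse (ours): the filter W(q^{-1}) =
# sum_{m=0}^{M} w(m) q^{-m} forms a weighted sum of the current and past
# values of its input, i.e. a causal convolution over a tap-delay line.
# The coefficients and the input signal below are arbitrary.

rng = np.random.default_rng(4)
w = np.array([0.5, -0.3, 0.2])                 # w(0), w(1), w(2)
a = rng.standard_normal(10)                    # presynaptic signal a_j(k)

# s(k) = sum_m w(m) a(k - m), with a(k) = 0 for k < 0.
s_loop = np.array([sum(w[m] * a[k - m] for m in range(len(w)) if k - m >= 0)
                   for k in range(len(a))])
s_conv = np.convolve(a, w)[:len(a)]            # the same operation written as a convolution
print(np.allclose(s_loop, s_conv))             # True
```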

Figure 8 illustrates a network composed of FIR filter synapses realized as tap-delay lines. Deriving the gradient descent rule for adapting filter coefficients is quite formidable if we use a direct chain rule approach. However, using the construction rules described earlier, we may trivially form the reciprocal network also shown in Figure 8.

Figure 8: Block diagram construction of an FIR network and corresponding reciprocal structure.

By inspection we have

δ_i^l(k) = f'[s_i^l(k)] Σ_j W_{ji}^{l+1}(q^{+1}) δ_j^{l+1}(k).   (3.14)

Consideration of an output neuron at layer L yields δ_i^L(k) = -2e_i(k) f'[s_i^L(k)]. These equations define the algorithm known as temporal backpropagation (Wan 1993a,b). The algorithm may be viewed as a temporal generalization of backpropagation in which error gradients are propagated not by simply taking weighted sums, but by backward filtering. Note that in the reciprocal network, backpropagation is achieved through the reciprocal filters W(q^{+1}). Since this is a noncausal filter, it is necessary to introduce a delay of a few time steps to implement the on-line adaptation. In the IIR case, it is easy to verify that equation 3.14 for temporal backpropagation still applies, with W(q^{+1}) representing a noncausal IIR filter. As with backpropagation-through-time, the network must be trained using a forward and backward sweep, necessitating storage of all activation values at each step in time. Different realizations for the filters dictate how signals flow through the reciprocal structure, as illustrated in Figure 9.

Figure 9: IIR filter realizations: (a) controller canonical, (b) reciprocal observer canonical, (c) lattice, (d) reciprocal lattice.
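The sketch below (ours, with assumed sizes, tap counts, and tanh nonlinearities) implements temporal backpropagation for a two-layer FIR network: the hidden-layer deltas of equation 3.14 are obtained by anticausal filtering of the output deltas through W(q^{+1}), realized here by filtering the time-reversed delta sequence, and one coefficient gradient is verified by finite differences.

```python
import numpy as np

# Sketch of temporal backpropagation for a two-layer FIR network (ours; the
# single input, the number of hidden units, the tap counts, and the tanh
# nonlinearities are assumptions made for illustration).  The hidden-layer
# deltas of equation 3.14 are obtained by anticausal filtering of the output
# deltas through the reciprocal filters W(q^{+1}), implemented here by
# filtering the time-reversed delta sequence.

rng = np.random.default_rng(5)
K, H, M = 30, 3, 4                              # time steps, hidden units, taps per synapse
W1 = rng.standard_normal((H, M)) * 0.5          # hidden-layer FIR synapses from x
W2 = rng.standard_normal((H, M)) * 0.5          # output-layer FIR synapses from each hidden unit
x = rng.standard_normal(K)
d = rng.standard_normal(K)

def fir(w, sig):                                # causal FIR: sum_m w(m) sig(k - m)
    return np.convolve(sig, w)[:len(sig)]

def forward(W1v, W2v):
    a1 = np.tanh(np.array([fir(W1v[i], x) for i in range(H)]))   # hidden outputs, shape (H, K)
    s2 = sum(fir(W2v[i], a1[i]) for i in range(H))               # output summing junction
    return a1, np.tanh(s2)

a1, y = forward(W1, W2)
e = d - y

# Output layer: delta2(k) = -2 e(k) f'[s2(k)].
delta2 = -2.0 * e * (1.0 - y ** 2)
# Hidden layer, equation 3.14: sum_m W2[i, m] delta2(k + m)  (anticausal filtering).
adv = np.array([fir(W2[i], delta2[::-1])[::-1] for i in range(H)])
delta1 = (1.0 - a1 ** 2) * adv

# Coefficient gradients: dJ/dW[i, m] = sum_k delta_i(k) * (presynaptic signal delayed by m).
def grads(delta, inp):
    return np.array([[np.sum(delta[i, m:] * inp[i, :K - m]) for m in range(M)]
                     for i in range(H)])
gW2 = grads(np.tile(delta2, (H, 1)), a1)
gW1 = grads(delta1, np.tile(x, (H, 1)))

# Finite-difference check on one hidden-layer coefficient.
def J(W1v):
    return np.sum((d - forward(W1v, W2)[1]) ** 2)
eps = 1e-6
Wp = W1.copy(); Wp[1, 2] += eps
Wm = W1.copy(); Wm[1, 2] -= eps
print(gW1[1, 2], (J(Wp) - J(Wm)) / (2 * eps))   # should agree closely
```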

In all cases, computations remain order N (this is in contrast with the order N² algorithms derived by Back and Tsoi (1991) using direct chain rule methods). Note that the poles of the forward IIR filters are reciprocal to the poles of the reciprocal filters. Stability monitoring can be made easier if we consider lattice realizations, in which case stability is guaranteed if the magnitude of each coefficient is less than 1. Regardless of the choice of the filter realization, reciprocal networks provide a simple unified approach for deriving a learning algorithm.⁶

The above examples allow us to extrapolate the following additional construction rule: Any linear subsystem H(q^{-1}) in the original network is transformed to H(q^{+1}) in the reciprocal system.

4 Summary

The previous examples served to illustrate the ease with which algorithms may be derived using the diagrammatic approach. One starts with a diagrammatic representation of the network of interest. A reciprocal network is then constructed by simply swapping summing junctions with branching points, continuous functions with derivative transmittances, and time delays with time advances. The final algorithm is read directly off the reciprocal network. No messy chain rules are needed. The approach provides a unified framework for formally deriving gradient algorithms for arbitrary network architectures, network configurations, and systems.

Appendix A: Proof of Reciprocal Construction Rules

We show that the diagrammatic method constitutes a formal derivation for arbitrary network architectures. Intuitively, the chain rule applied to individual building blocks yields the reciprocal architecture. However, delay operators, which cannot be differentiated, as well as feedback, prevent a straightforward chain rule approach to the proof. Instead, we use a more rigorous approach that may be outlined as follows: (1) It is argued that a perturbation applied to a specific node in the network propagates through a derivative network that is topologically equivalent to the original network. (2) The derivative network is systematically unraveled in time to produce a linear time-independent flow graph. (3) Next, the principle of flow graph interreciprocity is evoked to reverse the signal flow through the unraveled network. (4) The reverse flow graph is raveled back in time to produce the reciprocal network. The input, originally corresponding to a perturbation, becomes an output providing the desired gradient. (5) By symmetry, it is argued that all signals in the reciprocal network correspond to proper gradient terms.

⁶In related work (Beaufays and Wan 1994b), the diagrammatic method was used to derive an algorithm to minimize the output power at each stage of an FIR lattice filter. This provides an adaptive lattice predictor used as a decorrelating preprocessor to a second adaptive filter. The new algorithm is more effective than the Griffiths algorithm (Griffiths 1977).

1. We will initially assume that only univariate functions exist within the network. This is by no means restrictive. It has been shown (Hornik et al. 1989; Cybenko 1989) that a feedforward network with two or more layers and a sufficient number of internal neurons can approximate any uniformly continuous multivariate function to an arbitrary accuracy. A feedforward network is, of course, composed of simple univariate functions and summing junctions. Thus any multivariate function in the overall network architecture is assumed to be well approximated using a univariate composition. We may completely specify the topology of a network by the set of equations

a_j(k) = Σ_i T_{ji} ∘ a_i(k),   (A.1)
T_{ji} ∈ {f(·), q^{-1}},   (A.2)

where a_j(k) is the signal corresponding to the node a_j at time k. The sum is taken over all signals a_i(k) that connect to a_j(k), and T_{ji} is a transmittance operator corresponding to either a univariate function (e.g., sigmoid function, constant multiplicative weight) or a delay operator. (The symbol ∘ is used to remind us that T is an operator whose argument is a.) The signals a_i(k) may correspond to inputs (a ∈ x), outputs (a ∈ y), or internal signals of the network. Feedback of signals is permitted.

Let us add to a specific node a* a perturbation Δa*(k) at time k. The perturbation propagates through the network, resulting in effective perturbations Δa_i(k) for all nodes in the network. Through a continuous univariate function in which a_j(k) = f[a_i(k)] we have, to first order,

Δa_j(k) = f'[a_i(k)] Δa_i(k),   (A.3)

where it must be clearly understood that Δa_i(k) and Δa_j(k) are the perturbations directly resulting from the external perturbation Δa*(k). Through a delay operator, a_j(k) = q^{-1} a_i(k) = a_i(k - 1), we have

Δa_j(k) = q^{-1} Δa_i(k) = Δa_i(k - 1).   (A.4)

Combining these two results with equation A.1 gives

Δa_j(k) = Σ_i T'_{ji} ∘ Δa_i(k)   ∀j,   (A.5)

where we define T'_{ji} ∈ {f'[a_i(k)], q^{-1}}. Note that f'[a_i(k)] is a linear time-dependent transmittance.

Equation A.5 defines a derivative network that is topologically identical to the original network (i.e., there is a one-to-one correspondence between signals and connections). Functions are simply replaced by their derivatives. This is a rather obvious result, and simply states that a perturbation propagates through the same connections and in the same direction as would normal signals.

2. The derivative network may be considered a time-dependent system with input Δa*(k) and outputs Δy(k), as illustrated in Figure 10a.

Figure 10: (a) Time-dependent input/output system for the derivative network. (b) Same system with all delays drawn externally.

Imagine now redrawing the network such that all delay operators q^{-1} are dragged outside the functional block (Fig. 10b). Equation A.5 still applies. Neither the definition nor the topology of the network has been changed. However, we may now remove the delay operators by cascading copies of the derivative network, as illustrated in Figure 11. Each stage has a different set of transmittance values corresponding to the time step. The unraveling process stops at the final time K (K is allowed to approach ∞). Additionally, the outputs Δy(n) at each stage are multiplied by -2e(n)^T and then summed over all stages to produce a single output

ΔJ = Σ_{n=1}^{K} -2e(n)^T Δy(n).

Figure 11: Flow graph corresponding to unraveled derivative network.

By removing the delay operators, the time index k may now be treated as simply a labeling index. The unraveled structure is thus a time-independent linear flow graph. By linearity, all signals in the flow graph can be divided by Δa*(k) so that the input is now 1, and the output is ΔJ/Δa*(k). In the limit of small Δa*(k),

lim_{Δa*(k)→0} ΔJ/Δa*(k) = ∂J/∂a*(k).   (A.6)

Since the system is causal, the partial derivative of the error at time n with respect to a*(k) is zero for n < k. Thus

∂J/∂a*(k) = Σ_{n=1}^{K} ∂[e(n)^T e(n)]/∂a*(k) = Σ_{n=k}^{K} ∂[e(n)^T e(n)]/∂a*(k) ≜ δ*(k).   (A.7)

The term δ*(k) is precisely what we were interested in finding. However, calculating all the δ_i(k) terms would require separately propagating signals through the unraveled network with an input of 1 at each location associated with a_i(k). The entire process would then have to be repeated at every time step.

3. Next, take the unraveled network (i.e., flow graph) and form its transpose. This is accomplished by reversing the signal flow direction, transposing the branch gains, replacing summing junctions by branching points and vice versa, and interchanging input and output nodes. The new flow graph is represented in Figure 12.

Figure 12: Transposed flow graph corresponding to unraveled derivative network.
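A toy numerical illustration of this transposition property (ours, not from the paper): summarizing the unraveled linear graph by an arbitrary matrix A from injection nodes to output nodes, with an output weighting c playing the role of the -2e terms, the weighted response of the original graph to a unit injection at node j equals the value read at node j after a single pass through the transposed graph.

```python
import numpy as np

# Toy illustration of the transposition property (ours, not from the paper).
# The unraveled derivative network is a linear graph; here it is summarized
# abstractly by a matrix A mapping a unit injection at any node to the output
# nodes, with an output weighting c standing in for the -2e terms.

rng = np.random.default_rng(6)
n_nodes, n_outputs = 7, 3
A = rng.standard_normal((n_outputs, n_nodes))  # original linear graph: nodes -> outputs
c = rng.standard_normal(n_outputs)             # output weighting (-2e terms)

j = 4
original = c @ (A @ np.eye(n_nodes)[j])        # excite node j, read the weighted output sum
transposed = A.T @ c                           # one pass through the transposed graph
print(np.isclose(original, transposed[j]))     # True: identical transfer

# Note that the single pass through the transposed graph delivers the value
# for every node j at once, which is why the reciprocal network is efficient.
```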

From the work of Tellegen (1952) and Bordewijk (1956), we know that transposed flow graphs are a particular case of interreciprocal graphs. This means that the output obtained in one graph, when exciting the input with a given signal, is the same as the output value of the transposed graph, when exciting its input by the same signal. In other words, the two graphs have identical transfer functions.⁷ Thus, if an input of 1.0 is distributed along the lower horizontal branch of the transposed graph, the final output will equal δ*(k). This δ*(k) is identical to the output of our original flow graph.

4. The transposed graph can now be raveled back in time to produce the reciprocal network. Since the direction of signal flow has been reversed, delay operators q^{-1} become advance operators q^{+1}. The node a*(k) that was the original source of an input perturbation is now the output δ*(k), as desired. The outputs of the original network become inputs with value -2e(k). Summarizing the steps involved in finding the reciprocal network, we start with the original network, form the derivative network, unravel in time, transpose, and then ravel back up in time. These steps are accomplished directly by starting with the original network and simply swapping branching points and summing junctions (rules 1 and 2), functions f for derivatives f' (rule 3), and q^{-1}s for q^{+1}s (rule 5).

5. Note that the selection of the specific node a*(k) is totally arbitrary. Had we started with any node a_i(k), we would still have arrived at the same result. In all cases, the input to the reciprocal network would still be -2e(k). Thus by symmetry, every signal in the reciprocal network provides δ_i(k) = ∂J/∂a_i(k). This is exactly what we set out to prove for the signal flow in the reciprocal network.

Finally, for multivariate functions, a_out(k) = F[a_in(k)], where by our earlier statements it was assumed the function was explicitly represented by a composition of summing junctions and univariate functions. F is restricted to be memoryless and thus cannot be composed of any delay operators. In the reciprocal network, a_in(k) becomes δ_in(k) and a_out(k) becomes δ_out(k). But the composition of the corresponding reciprocal univariate functions and summing junctions yields precisely δ_in(k) = F'[a_in(k)] δ_out(k). Thus any section within a network that contains no delays may be replaced by the multivariate function F(·), and the corresponding section in the reciprocal network is replaced with the Jacobian F'[a_in(k)]. This verifies rule 4.

⁷This basic property, which was first presented in the context of electrical circuit analysis (Penfield et al. 1970), finds applications in a wide variety of engineering disciplines, such as the reciprocity of emitting and receiving antennas in electromagnetism (Ramo et al. 1984), and the duality between decimation-in-time and decimation-in-frequency formulations of the FFT algorithm in signal processing (Oppenheim and Schafer 1989). Flow graph interreciprocity was first applied to neural networks to relate real-time backpropagation and backpropagation-through-time by Beaufays and Wan (1994a).
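To close the appendix with a numerical sanity check (ours, using an assumed scalar tanh recurrent system), the sketch below confirms the statement just proved: the delta signal produced by the reciprocal network at an internal node equals the derivative of the cost with respect to a perturbation injected at that node, as measured by central finite differences.

```python
import numpy as np

# Numerical check of the appendix result (ours; the scalar tanh recurrent
# system is an assumption made only to have something concrete to run): the
# signal delta(k) read off the reciprocal network equals dJ/da(k), the
# derivative of the cost with respect to a perturbation injected at the
# internal node a at time k.

rng = np.random.default_rng(7)
K = 8
w_x, w_a = 0.8, -0.6
x = rng.standard_normal(K + 1)
d = rng.standard_normal(K + 1)

def run(perturb_at=None, eps=0.0):
    """Forward pass; optionally add eps to the node a at one time step."""
    a = np.zeros(K + 1)
    for k in range(1, K + 1):
        a[k] = np.tanh(w_x * x[k] + w_a * a[k - 1])
        if k == perturb_at:
            a[k] += eps
    return a

a = run()
e = d - a
J = lambda pk, pe: np.sum((d[1:] - run(pk, pe)[1:]) ** 2)

# Reciprocal network: delta(k) = -2 e(k) + w_a * f'[s(k+1)] * delta(k+1).
delta = np.zeros(K + 2)
for k in range(K, 0, -1):
    adv = w_a * (1.0 - a[k + 1] ** 2) * delta[k + 1] if k < K else 0.0
    delta[k] = -2.0 * e[k] + adv

# Compare with central finite differences of J with respect to the injected perturbation.
eps = 1e-6
fd = np.array([(J(k, eps) - J(k, -eps)) / (2 * eps) for k in range(1, K + 1)])
print(np.allclose(delta[1:K + 1], fd, atol=1e-6))   # True
```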

Acknowledgments

This work was funded in part by EPRI under Contract RP and by NSF under Grants IRI and ECS.

References

Back, A., and Tsoi, A. 1991. FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Comp. 3(3).

Beaufays, F., and Wan, E. 1994a. Relating real-time backpropagation and backpropagation-through-time: An application of flow graph interreciprocity. Neural Comp. 6(2).

Beaufays, F., and Wan, E. 1994b. An efficient first-order stochastic algorithm for lattice filters. ICANN'94 2.

Bordewijk, J. 1956. Inter-reciprocity applied to electrical networks. Appl. Sci. Res. 6B.

Bryson, A., and Ho, Y. 1975. Applied Optimal Control. Hemisphere, New York.

Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4).

Griewank, A., and Corliss, G., eds. 1991. Automatic differentiation of algorithms: Theory, implementation, and application. Proc. First SIAM Workshop on Automatic Differentiation, Breckenridge, Colorado.

Griffiths, L. 1977. A continuously adaptive filter implemented as a lattice structure. Proc. ICASSP, Hartford, CT.

Hornik, K., Stinchcombe, M., and White, H. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2.

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. 1989. Backpropagation applied to handwritten zip code recognition. Neural Comp. 1.

Matsuoka, K. 1991. Learning of neural networks using their adjoint systems. Syst. Computers Jpn. 22(11).

Narendra, K., and Parthasarathy, K. 1990. Identification and control of dynamic systems using neural networks. IEEE Trans. Neural Networks 1(1).

Nerrand, O., Roussel-Ragot, P., Personnaz, L., Dreyfus, G., and Marcos, S. 1993. Neural networks and nonlinear adaptive filtering: Unifying concepts and new algorithms. Neural Comp. 5(2).

Nguyen, D., and Widrow, B. 1989. The truck backer-upper: An example of self-learning in neural networks. Proc. Int. Joint Conf. on Neural Networks, II, Washington, DC.

Oppenheim, A., and Schafer, R. 1989. Digital Signal Processing. Prentice-Hall, Englewood Cliffs, NJ.

Parisini, T., and Zoppoli, R. 1994. Neural networks for feedback feedforward nonlinear control systems. IEEE Trans. Neural Networks 5(3).

Parker, D. 1982. Learning-Logic. Invention Report S81-64, File 1, Office of Technology Licensing, Stanford University, October.

Penfield, P., Spence, R., and Duiker, S. 1970. Tellegen's Theorem and Electrical Networks. MIT Press, Cambridge, MA.

Plumer, E. 1993. Optimal terminal control using feedforward neural networks. Ph.D. dissertation, Stanford University, Stanford, CA.

Rall, L. B. 1981. Automatic Differentiation: Techniques and Applications. Lecture Notes in Computer Science, Springer-Verlag, Berlin.

Ramo, S., Whinnery, J. R., and Van Duzer, T. 1984. Fields and Waves in Communication Electronics, 2nd Ed. John Wiley, New York.

Rumelhart, D. E., McClelland, J. L., and the PDP Research Group. 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, Cambridge, MA.

Tellegen, B. D. H. 1952. A general network theorem, with applications. Philips Res. Rep. 7.

Toomarian, N. B., and Barhen, J. 1992. Learning a trajectory using adjoint functions and teacher forcing. Neural Networks 5(3).

Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., and Lang, K. 1989. Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics, Speech, Signal Process. 37(3).

Wan, E. 1993a. Finite impulse response neural networks with applications in time series prediction. Ph.D. dissertation, Stanford University, Stanford, CA.

Wan, E. 1993b. Time series prediction using a connectionist network with internal delay lines. In Time Series Prediction: Forecasting the Future and Understanding the Past, A. Weigend and N. Gershenfeld, eds. Addison-Wesley, Reading, MA.

Wan, E. 1993c. Modeling nonlinear dynamics with neural networks: Examples in time series prediction. Proc. Fifth Workshop on Neural Networks: Academic/Industrial/NASA/Defense, WNN93/FNN93, San Francisco.

Wan, E., and Beaufays, F. 1994. Network reciprocity: A simple approach to derive gradient algorithms for arbitrary neural network structures. WCNN'94, San Diego, CA, III.

Werbos, P. 1974. Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA.

Werbos, P. 1992. Neurocontrol and supervised learning: An overview and evaluation. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, chap. 3, D. White and D. Sofge, eds. Van Nostrand Reinhold, New York.

Williams, R., and Zipser, D. 1989. A learning algorithm for continually running fully recurrent neural networks. Neural Comp. 1.

Received April 29, 1994; accepted May 16, 1995.


ARTIFICIAL INTELLIGENCE. Artificial Neural Networks INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.

More information

λ-universe: Introduction and Preliminary Study

λ-universe: Introduction and Preliminary Study λ-universe: Introduction and Preliminary Study ABDOLREZA JOGHATAIE CE College Sharif University of Technology Azadi Avenue, Tehran IRAN Abstract: - Interactions between the members of an imaginary universe,

More information

y(n) Time Series Data

y(n) Time Series Data Recurrent SOM with Local Linear Models in Time Series Prediction Timo Koskela, Markus Varsta, Jukka Heikkonen, and Kimmo Kaski Helsinki University of Technology Laboratory of Computational Engineering

More information

Convergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network

Convergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network Convergence of Hybrid Algorithm with Adaptive Learning Parameter for Multilayer Neural Network Fadwa DAMAK, Mounir BEN NASR, Mohamed CHTOUROU Department of Electrical Engineering ENIS Sfax, Tunisia {fadwa_damak,

More information

8-1: Backpropagation Prof. J.C. Kao, UCLA. Backpropagation. Chain rule for the derivatives Backpropagation graphs Examples

8-1: Backpropagation Prof. J.C. Kao, UCLA. Backpropagation. Chain rule for the derivatives Backpropagation graphs Examples 8-1: Backpropagation Prof. J.C. Kao, UCLA Backpropagation Chain rule for the derivatives Backpropagation graphs Examples 8-2: Backpropagation Prof. J.C. Kao, UCLA Motivation for backpropagation To do gradient

More information

100 inference steps doesn't seem like enough. Many neuron-like threshold switching units. Many weighted interconnections among units

100 inference steps doesn't seem like enough. Many neuron-like threshold switching units. Many weighted interconnections among units Connectionist Models Consider humans: Neuron switching time ~ :001 second Number of neurons ~ 10 10 Connections per neuron ~ 10 4 5 Scene recognition time ~ :1 second 100 inference steps doesn't seem like

More information

A New Weight Initialization using Statistically Resilient Method and Moore-Penrose Inverse Method for SFANN

A New Weight Initialization using Statistically Resilient Method and Moore-Penrose Inverse Method for SFANN A New Weight Initialization using Statistically Resilient Method and Moore-Penrose Inverse Method for SFANN Apeksha Mittal, Amit Prakash Singh and Pravin Chandra University School of Information and Communication

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume 3, Issue 4, April 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Application of

More information

(Refer Slide Time: )

(Refer Slide Time: ) Digital Signal Processing Prof. S. C. Dutta Roy Department of Electrical Engineering Indian Institute of Technology, Delhi FIR Lattice Synthesis Lecture - 32 This is the 32nd lecture and our topic for

More information

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,

More information

Recurrent neural networks with trainable amplitude of activation functions

Recurrent neural networks with trainable amplitude of activation functions Neural Networks 16 (2003) 1095 1100 www.elsevier.com/locate/neunet Neural Networks letter Recurrent neural networks with trainable amplitude of activation functions Su Lee Goh*, Danilo P. Mandic Imperial

More information

Feedforward Neural Nets and Backpropagation

Feedforward Neural Nets and Backpropagation Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features

More information

Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21

Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA   1/ 21 Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural

More information

ECE521 Lectures 9 Fully Connected Neural Networks

ECE521 Lectures 9 Fully Connected Neural Networks ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance

More information

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4

Neural Networks Learning the network: Backprop , Fall 2018 Lecture 4 Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:

More information

Automatic Differentiation and Neural Networks

Automatic Differentiation and Neural Networks Statistical Machine Learning Notes 7 Automatic Differentiation and Neural Networks Instructor: Justin Domke 1 Introduction The name neural network is sometimes used to refer to many things (e.g. Hopfield

More information

Artificial Neural Networks. Historical description

Artificial Neural Networks. Historical description Artificial Neural Networks Historical description Victor G. Lopez 1 / 23 Artificial Neural Networks (ANN) An artificial neural network is a computational model that attempts to emulate the functions of

More information

NONLINEAR PLANT IDENTIFICATION BY WAVELETS

NONLINEAR PLANT IDENTIFICATION BY WAVELETS NONLINEAR PLANT IDENTIFICATION BY WAVELETS Edison Righeto UNESP Ilha Solteira, Department of Mathematics, Av. Brasil 56, 5385000, Ilha Solteira, SP, Brazil righeto@fqm.feis.unesp.br Luiz Henrique M. Grassi

More information

Address for Correspondence

Address for Correspondence Research Article APPLICATION OF ARTIFICIAL NEURAL NETWORK FOR INTERFERENCE STUDIES OF LOW-RISE BUILDINGS 1 Narayan K*, 2 Gairola A Address for Correspondence 1 Associate Professor, Department of Civil

More information

How to do backpropagation in a brain

How to do backpropagation in a brain How to do backpropagation in a brain Geoffrey Hinton Canadian Institute for Advanced Research & University of Toronto & Google Inc. Prelude I will start with three slides explaining a popular type of deep

More information

Multilayer Perceptrons and Backpropagation

Multilayer Perceptrons and Backpropagation Multilayer Perceptrons and Backpropagation Informatics 1 CG: Lecture 7 Chris Lucas School of Informatics University of Edinburgh January 31, 2017 (Slides adapted from Mirella Lapata s.) 1 / 33 Reading:

More information

Multilayer Perceptrons (MLPs)

Multilayer Perceptrons (MLPs) CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1

More information

Introduction and Perceptron Learning

Introduction and Perceptron Learning Artificial Neural Networks Introduction and Perceptron Learning CPSC 565 Winter 2003 Christian Jacob Department of Computer Science University of Calgary Canada CPSC 565 - Winter 2003 - Emergent Computing

More information

Frequency Selective Surface Design Based on Iterative Inversion of Neural Networks

Frequency Selective Surface Design Based on Iterative Inversion of Neural Networks J.N. Hwang, J.J. Choi, S. Oh, R.J. Marks II, "Query learning based on boundary search and gradient computation of trained multilayer perceptrons", Proceedings of the International Joint Conference on Neural

More information

Neural Turing Machine. Author: Alex Graves, Greg Wayne, Ivo Danihelka Presented By: Tinghui Wang (Steve)

Neural Turing Machine. Author: Alex Graves, Greg Wayne, Ivo Danihelka Presented By: Tinghui Wang (Steve) Neural Turing Machine Author: Alex Graves, Greg Wayne, Ivo Danihelka Presented By: Tinghui Wang (Steve) Introduction Neural Turning Machine: Couple a Neural Network with external memory resources The combined

More information

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders

More information

Neural Networks and Deep Learning

Neural Networks and Deep Learning Neural Networks and Deep Learning Professor Ameet Talwalkar November 12, 2015 Professor Ameet Talwalkar Neural Networks and Deep Learning November 12, 2015 1 / 16 Outline 1 Review of last lecture AdaBoost

More information

Sections 18.6 and 18.7 Artificial Neural Networks

Sections 18.6 and 18.7 Artificial Neural Networks Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs artifical neural networks

More information