Diagrammatic Derivation of Gradient Algorithms for Neural Networks
Communicated by Fernando Pineda

Eric A. Wan and Françoise Beaufays

Deriving gradient algorithms for time-dependent neural network structures typically requires numerous chain rule expansions, diligent bookkeeping, and careful manipulation of terms. In this paper, we show how to derive such algorithms via a set of simple block diagram manipulation rules. The approach provides a common framework to derive popular algorithms, including backpropagation and backpropagation-through-time, without a single chain rule expansion. Additional examples are provided for a variety of complicated architectures to illustrate both the generality and the simplicity of the approach.

1 Introduction

Deriving the appropriate gradient descent algorithm for a new network architecture or system configuration normally involves brute force derivative calculations. For example, the celebrated backpropagation algorithm for training feedforward neural networks was derived by repeatedly applying chain rule expansions backward through the network (Rumelhart et al. 1986; Werbos 1974; Parker 1982). However, the actual implementation of backpropagation may be viewed as a simple reversal of signal flow through the network. Another popular algorithm, backpropagation-through-time for recurrent networks, can be derived by Euler-Lagrange or ordered derivative methods, and involves both a signal flow reversal and time reversal (Werbos 1992; Nguyen and Widrow 1989). For both of these algorithms, there is a reciprocal nature to the forward propagation of states and the backward propagation of gradient terms. Furthermore, both algorithms are efficient in the sense that calculations are order N, where N is the number of variable weights in the network. These properties are often attributed to the clever manner in which the algorithms
were derived for a specific network architecture. We will show, however, that these properties are universal to all network architectures and that the associated gradient algorithm may be formulated directly with virtually no effort. The approach consists of a simple diagrammatic method for constructing a reciprocal network that directly specifies the gradient derivation. This is in contrast to graphical methods, which simply illustrate the relationship between signal and gradient flow after derivation of the algorithm by an alternative method (Narendra and Parthasarathy 1990; Nerrand et al. 1993). The reciprocal networks are, in principle, identical to adjoint systems seen in N-stage optimal control problems (Bryson and Ho 1975). While adjoint methods have been applied to neural networks, such approaches have been restricted to specific architectures where the adjoint systems resulted from a disciplined Euler-Lagrange optimization technique (Parisini and Zoppoli 1994; Toomarian and Barhen 1992; Matsuoka 1991). Here we use a graphic approach for the complete derivation. We thus prefer the term "reciprocal" network, which further imposes certain topological constraints and is taken from the electrical network literature. The concepts detailed in this paper were developed in Wan (1993) and later presented in Wan and Beaufays (1994).¹

1.1 Network Adaptation and Error Gradient Propagation. In supervised learning, the goal is to find a set of network weights W that minimizes a cost function J = Σ_k L_k[d(k), y(k)], where k is used to specify a discrete time index (the actual order of presentation may be random or sequential), y(k) is the output of the network, d(k) is a desired response, and L_k is a generic error metric that may contain additional weight regularization terms. For illustrative purposes, we will work with the squared error metric, L_k = e(k)ᵀe(k), where e(k) is the error vector.
Optimization techniques invariably require calculation of the gradient vector ∂J/∂W. At the architectural level, a variable weight w_ij may be isolated between two points in a network with corresponding signals a_i(k) and a_j(k) [i.e., a_j(k) = w_ij a_i(k)]. Using the chain rule, we get²

∂J/∂w_ij = Σ_k δ_j(k) a_i(k),   (1.1)

where we define the error gradient δ_j(k) ≜ ∂J/∂a_j(k). The error gradient δ_j(k) depends on the entire topology of the network. Specifying the gradients necessitates finding an explicit formula for calculating the delta terms.

¹The method presented here is similar in spirit to Automatic Differentiation (Rall 1981; Griewank and Corliss 1991). Automatic Differentiation is a simple method for finding derivatives of functions and algorithms that can be represented by acyclic graphs. Our approach, however, applies to discrete-time systems with the possibility of feedback. In addition, we are concerned with diagrammatic derivations rather than computational rule-based implementations.

²In the general case of a variable parameter, we have a_j(k) = f[w_ij, a_i(k)], and equation 1.1 remains valid with a_i(k) replaced by ∂a_j(k)/∂w_ij(k), where the partial term depends on the form of f.
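As a quick sanity check, equation 1.1 can be verified numerically on a toy configuration. The sketch below (all names, sizes, and values are illustrative assumptions, not from the paper) isolates a single weight w feeding a tanh unit and compares Σ_k δ_j(k) a_i(k) against a central-difference estimate of ∂J/∂w:

```python
import numpy as np

# Toy check of equation 1.1: for a_j(k) = w * a_i(k), the gradient
# dJ/dw equals sum_k delta_j(k) * a_i(k), with delta_j(k) = dJ/da_j(k).
def loss(w, a_i, d):
    # squared-error metric L_k = e(k)^T e(k), summed over k
    y = np.tanh(w * a_i)       # weight as linear transmittance, then tanh
    e = d - y
    return np.sum(e * e)

a_i = np.array([0.3, -1.2, 0.7])   # signals a_i(k) over k = 0, 1, 2
d = np.array([0.1, 0.5, -0.4])     # desired responses d(k)
w = 0.8

# delta_j(k) = dJ/da_j(k), computed analytically for this toy network
y = np.tanh(w * a_i)
delta_j = -2.0 * (d - y) * (1.0 - y**2)

analytic = np.sum(delta_j * a_i)   # equation 1.1
eps = 1e-6
numeric = (loss(w + eps, a_i, d) - loss(w - eps, a_i, d)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)
```

The same check generalizes to any weight in any architecture once the delta terms are available, which is exactly what the reciprocal construction of the next section provides.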
Backpropagation, for example, is nothing more than an algorithm for generating these terms in a feedforward network. In the next section, we develop a simple nonalgebraic method for deriving the delta terms associated with any network architecture.

2 Network Representation and Reciprocal Construction Rules

An arbitrary neural network can be represented as a block diagram whose building blocks are: summing junctions, branching points, univariate functions, multivariate functions, and time-delay operators. Only discrete-time systems are considered. A signal located within the network is labeled a_i(k). A synaptic weight, for example, may be thought of as a linear transmittance, which is a special case of a univariate function. The basic neuron is simply a sum of linear transmittances followed by a univariate sigmoid function. Networks can then be constructed from individual neurons, and may include additional functional blocks and time-delay operators that allow for buffering of signals and internal memory. This block diagram representation is really nothing more than the typical pictorial description of a neural network with a bit of added formalism.

Directly from the block diagram we may construct the reciprocal network by reversing the flow direction in the original network, labeling all resulting signals δ_i(k), and performing the following operations:

1. Summing junctions are replaced with branching points.

2. Branching points are replaced with summing junctions.

3. Univariate functions are replaced with their derivative transmittances.
Explicitly, the scalar continuous function a_j(k) = f[a_i(k)] is replaced by δ_i(k) = f′[a_i(k)] δ_j(k), where f′[a_i(k)] ≜ ∂a_j(k)/∂a_i(k). Note this rule replaces a nonlinear function by a linear time-dependent transmittance. Special cases are:

- Weights: a_j = w_ij a_i, in which case δ_i = w_ij δ_j.

- Activation functions: a_j(k) = tanh[a_i(k)], in which case f′[a_i(k)] = 1 − a_j²(k).

4. Multivariate functions are replaced with their Jacobians.

A multivariate function maps a vector of input signals into a vector of output signals, a_out = F(a_in). In the transformed network, we have δ_in(k) = F′[a_in(k)] δ_out(k), where F′[a_in(k)] ≜ ∂a_out(k)/∂a_in(k) corresponds to a matrix of partial derivatives. For shorthand, F′[a_in(k)] will be written simply as F′(k). Clearly both summing junctions and univariate functions are special cases of multivariate functions. A multivariate function may also represent a product junction (for sigma-pi units) or even another multilayer network. For a multilayer network, the product F′[a_in(k)] δ_out(k) is found directly by backpropagating δ_out through the network.

5. Delay operators are replaced with advance operators.

A delay operator q⁻¹ performs a unit time delay on its argument: a_j(k) = q⁻¹ a_i(k) = a_i(k − 1). In the reciprocal system, we form a unit time advance: δ_i(k) = q⁺¹ δ_j(k) = δ_j(k + 1). The resulting system is thus noncausal. Actual implementation of the reciprocal network in a causal manner is addressed in specific examples.
6. Outputs become inputs.

By reversing the signal flow, output nodes a_o(k) = y_o(k) in the original network become input nodes in the reciprocal network. These inputs are then set at each time step to −2e_o(k). [For cost functions other than squared error, the input should be set to ∂L_k/∂y_o(k).]

These six rules allow direct construction of the reciprocal network from the original network.³ Note that there is a topological equivalence between the two networks. The order of computations in the reciprocal network is thus identical to the order of computations in the forward network. Whereas the original network corresponds to a nonlinear time-independent system (assuming the weights are fixed), the reciprocal network is a linear time-dependent system. The signals δ_i(k) that propagate through the reciprocal network correspond to the terms ∂J/∂a_i(k) necessary for gradient adaptation. Exact equations may then be "read out" directly from the reciprocal network, completing the derivation. A formal proof of the validity and generality of this method is presented in Appendix A.

3 Examples

3.1 Backpropagation. We start by rederiving standard backpropagation. Figure 1 shows a hidden neuron feeding other neurons and an output neuron in a multilayer network. For consistency with traditional notation, we have labeled the summing junction signal s_j^l rather than a_j, and added superscripts to denote the layer. In addition, since multilayer networks are static structures, we omit the time index k. The reciprocal network shown in Figure 2 is found by applying the construction rules of the previous section. From this figure, we may immediately write down the equations for calculating the delta terms:

δ_j^l = f′(s_j^l) Σ_i w_ij^{l+1} δ_i^{l+1}.

These are precisely the equations describing standard backpropagation.
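The delta terms read off the reciprocal network can be checked against finite differences. The sketch below (a hypothetical two-layer tanh network; sizes and random values are illustrative assumptions) applies the construction rules directly: junctions swapped, each tanh replaced by the transmittance 1 − a², and the output fed −2e:

```python
import numpy as np

# Hypothetical two-layer tanh network; deltas computed via the
# reciprocal-network rules, then checked against a numerical gradient.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))   # layer-1 weights (illustrative sizes)
W2 = rng.standard_normal((2, 4))
x = rng.standard_normal(3)
d = rng.standard_normal(2)

# forward sweep
s1 = W1 @ x;  a1 = np.tanh(s1)
s2 = W2 @ a1; y = np.tanh(s2)
e = d - y

# reciprocal network: deltas flow backward through the same topology
delta2 = (1 - y**2) * (-2 * e)          # output layer: f'(s) times -2e
delta1 = (1 - a1**2) * (W2.T @ delta2)  # weights reversed, junctions swapped

# gradient of J = e^T e w.r.t. one layer-1 weight, checked numerically
grad_W1 = np.outer(delta1, x)
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
W1m = W1.copy(); W1m[0, 0] -= eps
yp = np.tanh(W2 @ np.tanh(W1p @ x))
ym = np.tanh(W2 @ np.tanh(W1m @ x))
numeric = (np.sum((d - yp)**2) - np.sum((d - ym)**2)) / (2 * eps)
print(abs(grad_W1[0, 0] - numeric) < 1e-5)
```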
In this case, there are no delay operators, and δ_j = δ_j(k) ≜ ∂J/∂s_j(k) = ∂[e(k)ᵀe(k)]/∂s_j(k) corresponds to an instantaneous gradient. Readers familiar with neural networks have undoubtedly seen these diagrams before. What is new is the concept that the diagrams themselves may be used directly, completely circumventing all intermediate steps involving tedious algebra.

³Clearly, this set of rules is not a minimal set, i.e., a summing junction can be considered a special case of a multivariate function. However, we choose this set for ease and clarity of construction.

Figure 1: Block diagram construction of a multilayer network.

Figure 2: Reciprocal multilayer network.

3.2 Backpropagation-Through-Time. For the next example, consider a network with output feedback (see Fig. 3) described by

y(k) = N[x(k), y(k − 1)],   (3.2)

where x(k) are external inputs, and y(k) represents the vector of outputs that form feedback connections. N is a multilayer neural network. If N has only one layer of neurons, every neuron output has a feedback
connection to the input of every other neuron and the structure is referred to as a fully recurrent network (Williams and Zipser 1989).

Figure 3: Recurrent network and backpropagation-through-time.

Typically, only a select set of the outputs have an actual desired response. The remaining outputs have no desired response (error equals zero) and are used for internal computation. Direct calculation of gradient terms using chain rule expansions is extremely complicated. A weight perturbation at a specified time step affects not only the output at future time steps, but future inputs as well. However, applying the reciprocal construction rules (see Fig. 3) we find immediately:

δ(k) = N′(k) δ(k + 1) − 2e(k).   (3.3)

These are precisely the equations describing backpropagation-through-time, which have been derived in the past using either ordered derivatives (Werbos 1974) or Euler-Lagrange techniques (Plumer 1993). The diagrammatic approach is by far the simplest and most direct method. Note that the causality constraints require these equations to be run backward in time. This implies a forward sweep of the system to generate the output states and internal activation values, followed by a backward sweep through the reciprocal network. Also from rule 4 in the previous section, the product N′(k)δ(k + 1) may be calculated directly by a standard backpropagation of δ(k + 1) through the network at time k.⁴

⁴Backpropagation-through-time is viewed as an off-line gradient descent algorithm in which weight updates are made after each presentation of an entire training sequence. An on-line version in which adaptation occurs at each time step is possible using an algorithm called real-time backpropagation (Williams and Zipser 1989). The algorithm, however, is far more computationally expensive. The authors have presented a method based on flow graph interreciprocity to directly relate the two algorithms (Beaufays and Wan 1994a).
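The forward/backward sweep structure can be sketched for a small output-feedback network of the form of equation 3.2. All dimensions, random values, and the zero initial condition below are illustrative assumptions; the backward sweep implements the advanced-delta recursion read off the reciprocal network and is checked against a numerical gradient:

```python
import numpy as np

# Hypothetical single-layer recurrent net: y(k) = tanh(Wx x(k) + Wy y(k-1)).
rng = np.random.default_rng(1)
T, nx, ny = 5, 2, 3
Wx = rng.standard_normal((ny, nx)) * 0.5
Wy = rng.standard_normal((ny, ny)) * 0.5
x = rng.standard_normal((T, nx))
d = rng.standard_normal((T, ny))

def forward(Wy):
    # forward sweep: store all activations for the later backward sweep
    y = np.zeros((T + 1, ny))          # y[m] is the output at time m; y[0] = 0
    for k in range(T):
        y[k + 1] = np.tanh(Wx @ x[k] + Wy @ y[k])
    e = d - y[1:]
    return y, e, np.sum(e * e)

y, e, J = forward(Wy)

# backward sweep, run backward in time through the advance operators:
# dy[m] = dJ/dy(m) = -2 e(m) + Wy^T [ f'(s(m+1)) * dy[m+1] ]
dy = np.zeros((T + 1, ny))
for m in range(T, 0, -1):
    dy[m] = -2 * e[m - 1]
    if m < T:
        dy[m] += Wy.T @ ((1 - y[m + 1] ** 2) * dy[m + 1])

grad_Wy = np.zeros_like(Wy)
for k in range(T):
    grad_Wy += np.outer((1 - y[k + 1] ** 2) * dy[k + 1], y[k])

eps = 1e-6
Wp = Wy.copy(); Wp[0, 1] += eps
Wm = Wy.copy(); Wm[0, 1] -= eps
numeric = (forward(Wp)[2] - forward(Wm)[2]) / (2 * eps)
print(abs(grad_Wy[0, 1] - numeric) < 1e-5)
```

Note the sweep order: all activations from the forward pass must be stored before the deltas can be run backward, exactly as described in the text.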
Figure 4: Neural network control using nonlinear ARMA models.

Backpropagation-Through-Time and Neural Control Architectures. Backpropagation-through-time can be extended to a number of neural control architectures (Nguyen and Widrow 1989; Werbos 1992). A system may be configured using full-state feedback or more complicated ARMA (AutoRegressive Moving Average) models as illustrated in Figure 4. To adapt the weights of the controller, it is necessary to find the gradient terms that constitute the effective error for the neural network. Figure 5 illustrates how such terms may be directly acquired using the reciprocal network. A variety of other recurrent architectures may be considered, from hierarchical networks to radial basis networks with feedback. In all cases, the diagrammatic approach provides a direct derivation of the gradient algorithm.

3.3 Cascaded Neural Networks. Let us now turn to an example of two cascaded neural networks (Fig. 6), which will further illustrate advantages of the diagrammatic approach. The inputs to the first network are samples from a time sequence x(k). Delayed outputs of the first network are fed to the second network. The cascaded networks are defined as

u(k) = N1[W1, x(k), x(k − 1), x(k − 2)]   (3.4)
y(k) = N2[W2, u(k), u(k − 1), u(k − 2)]   (3.5)

where W1 and W2 represent the weights parameterizing the networks, y(k) is the output, and u(k) the intermediate signal. Given a desired response for the output y of the second network, it is straightforward to use backpropagation for adapting the second network. It is not obvious, however, what the effective error should be for the first network.
Figure 5: Reciprocal network for control using nonlinear ARMA models.

Figure 6: Cascaded neural network filters and reciprocal counterpart.
From the reciprocal network also shown in Figure 6, we simply label the desired signals and write down the gradient relations:

δ_u(k) = δ₁(k) + δ₂(k + 1) + δ₃(k + 2)   (3.6)

with

[δ₁(k) δ₂(k) δ₃(k)] = −2e(k) N₂′[u(k)],   (3.7)

i.e., each δ_i(k) is found by backpropagation through the output network, and the δ_i's (after appropriate advance operations) are summed together. The gradient for the weights in the first network is thus given by

∂J/∂W1 = Σ_k δ_u(k) ∂u(k)/∂W1,   (3.8)

in which the product term is found by a single backpropagation with δ_u(k) acting as the error to the first network. Equations can be made causal by simply delaying the weight update for a few time steps. Clearly, extrapolating to an arbitrary number of taps is also straightforward.

For comparison, let us consider the brute force derivative approach to finding the gradient. Using the chain rule, the instantaneous error gradient is evaluated as:

−2e(k)ᵀ ∂y(k)/∂W1 = δ₁(k) ∂u(k)/∂W1 + δ₂(k) ∂u(k − 1)/∂W1 + δ₃(k) ∂u(k − 2)/∂W1   (3.9)

where we define

δ_i(k) ≜ −2e(k)ᵀ ∂y(k)/∂u(k − i + 1).   (3.10)

Again, the δ_i terms are found simultaneously by a single backpropagation of the error through the second network. Each product δ_i(k)[∂u(k − i + 1)/∂W1] is then found by backpropagation applied to the first network with δ_i(k) acting as the error. However, since the derivatives used in backpropagation are time-dependent, separate backpropagations are necessary for each δ_i(k). These equations, in fact, imply backpropagation through an unfolded structure and are equivalent to weight sharing (LeCun et al. 1989) as illustrated in Figure 7. In situations where there may be hundreds of taps in the second network, this algorithm is far less efficient than the one derived directly using reciprocal networks. Similar arguments can be used to derive an efficient on-line algorithm for adapting time-delay neural networks (Waibel et al. 1989).
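The effective-error result of equation 3.6 can be illustrated numerically. In the sketch below the first network is collapsed to a single weight w and the second to one tanh unit over three taps of u; both simplifications are ours, for illustration only. Summing the time-advanced deltas gives δ_u(k), and a single pass then yields the correct gradient:

```python
import numpy as np

# Hypothetical cascaded pair: u(k) = w x(k) (first "network"),
# y(k) = tanh(c0 u(k) + c1 u(k-1) + c2 u(k-2)) (second network).
rng = np.random.default_rng(2)
T = 8
x = rng.standard_normal(T)
d = rng.standard_normal(T)
c = rng.standard_normal(3)   # taps of the second network
w = 0.7                      # first-network weight (illustrative)

def run(w):
    u = w * x                                  # first network
    y = np.zeros(T)
    for k in range(T):
        taps = [u[k - i] if k - i >= 0 else 0.0 for i in range(3)]
        y[k] = np.tanh(c @ np.array(taps))
    e = d - y
    return u, y, e, np.sum(e * e)

u, y, e, J = run(w)

# second-network deltas: delta_i(k) = -2 e(k) f'(s(k)) c_i
delta = (-2 * e * (1 - y**2))[:, None] * c[None, :]   # shape (T, 3)

# effective error, equation 3.6: delta_u(k) = sum_i delta_{i+1}(k + i)
delta_u = np.zeros(T)
for k in range(T):
    for i in range(3):
        if k + i < T:
            delta_u[k] += delta[k + i, i]

grad_w = np.sum(delta_u * x)   # single backprop through the first network
eps = 1e-6
numeric = (run(w + eps)[3] - run(w - eps)[3]) / (2 * eps)
print(abs(grad_w - numeric) < 1e-6)
```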
Figure 7: Cascaded neural network filters unfolded-in-time.

3.4 Temporal Backpropagation. An extension of the feedforward network can be constructed by replacing all scalar weights with discrete-time linear filters to provide dynamic interconnectivity between neurons. Mathematically, a neuron i in layer l may be specified as

s_i^l(k) = Σ_j W_ij^l(q⁻¹) a_j^{l−1}(k)   (3.11)
a_i^l(k) = f[s_i^l(k)]   (3.12)

where a(k) are neuron output values, s(k) are summing junctions, f(·) are sigmoid functions, and W(q⁻¹) are synaptic filters.⁵ Three possible forms for W(q⁻¹) are

Case I:   W(q⁻¹) = w
Case II:  W(q⁻¹) = Σ_{m=0}^{M} w_m q⁻ᵐ   (3.13)
Case III: W(q⁻¹) = (Σ_{m=0}^{M} b_m q⁻ᵐ) / (1 − Σ_{n=1}^{N} a_n q⁻ⁿ)

⁵The time domain operator q⁻¹ is used instead of the more common z-domain variable z⁻¹. The z notation would imply an actual transfer function that does not apply in nonlinear systems.
Figure 8: Block diagram construction of an FIR network and corresponding reciprocal structure.

In Case I, the filter reduces to a scalar weight and we have the standard definition of a neuron for feedforward networks. Case II corresponds to a Finite Impulse Response (FIR) filter in which the synapse forms a weighted sum of past values of its input. Such networks have been utilized for a number of time-series and system identification problems (Wan 1993a,b,c). Case III represents the more general Infinite Impulse
Figure 9: IIR filter realizations: (a) controller canonical, (b) reciprocal observer canonical, (c) lattice, (d) reciprocal lattice.

Response (IIR) filter, in which feedback is permitted. In all cases, coefficients are assumed to be adaptive. Figure 8 illustrates a network composed of FIR filter synapses realized as tap-delay lines. Deriving the gradient descent rule for adapting filter coefficients is quite formidable if we use a direct chain rule approach. However, using the construction rules described earlier, we may trivially form the reciprocal network also shown in Figure 8. By inspection we have

δ_i^l(k) = f′[s_i^l(k)] Σ_j W_ij^{l+1}(q⁺¹) δ_j^{l+1}(k).   (3.14)

Consideration of an output neuron at layer L yields δ_i^L(k) = −2e_i(k) f′[s_i^L(k)]. These equations define the algorithm known as temporal backpropagation (Wan 1993a,b). The algorithm may be viewed as a temporal generalization of backpropagation in which error gradients are propagated not by simply taking weighted sums, but by backward filtering. Note that in the reciprocal network, backpropagation is achieved through the reciprocal filters W(q⁺¹). Since this is a noncausal filter, it is necessary to introduce a delay of a few time steps to implement the on-line adaptation. In the IIR case, it is easy to verify that equation 3.14 for temporal backpropagation still applies, with W(q⁺¹) representing a noncausal IIR filter. As with backpropagation-through-time, the network must be trained using a forward and backward sweep, necessitating storage of all activation values at each step in time. Different realizations for the filters dictate
how signals flow through the reciprocal structure, as illustrated in Figure 9. In all cases, computations remain order N (this is in contrast with the order N² algorithms derived by Back and Tsoi (1991) using direct chain rule methods). Note that the poles of the forward IIR filters are reciprocal to the poles of the reciprocal filters. Stability monitoring can be made easier if we consider lattice realizations, in which case stability is guaranteed if the magnitude of each coefficient is less than 1. Regardless of the choice of the filter realization, reciprocal networks provide a simple unified approach for deriving a learning algorithm.⁶

The above examples allow us to extrapolate the following additional construction rule: Any linear subsystem H(q⁻¹) in the original network is transformed to H(q⁺¹) in the reciprocal system.

4 Summary

The previous examples served to illustrate the ease with which algorithms may be derived using the diagrammatic approach. One starts with a diagrammatic representation of the network of interest. A reciprocal network is then constructed by simply swapping summing junctions with branching points, continuous functions with derivative transmittances, and time delays with time advances. The final algorithm is read directly off the reciprocal network. No messy chain rules are needed. The approach provides a unified framework for formally deriving gradient algorithms for arbitrary network architectures, network configurations, and systems.

Appendix A: Proof of Reciprocal Construction Rules

We show that the diagrammatic method constitutes a formal derivation for arbitrary network architectures. Intuitively, the chain rule applied to individual building blocks yields the reciprocal architecture. However, delay operators, which cannot be differentiated, as well as feedback, prevent a straightforward chain rule approach to the proof.
Instead, we use a more rigorous approach that may be outlined as follows: (1) It is argued that a perturbation applied to a specific node in the network propagates through a derivative network that is topologically equivalent to the original network. (2) The derivative network is systematically unraveled in time to produce a linear time-independent flow graph. (3) Next, the principle of flow graph interreciprocity is invoked to reverse the signal flow through the unraveled network. (4) The reverse flow graph is

⁶In related work (Beaufays and Wan 1994b), the diagrammatic method was used to derive an algorithm to minimize the output power at each stage of an FIR lattice filter. This provides an adaptive lattice predictor used as a decorrelating preprocessor to a second adaptive filter. The new algorithm is more effective than the Griffiths algorithm (Griffiths 1977).
raveled back in time to produce the reciprocal network. The input, originally corresponding to a perturbation, becomes an output providing the desired gradient. (5) By symmetry, it is argued that all signals in the reciprocal network correspond to proper gradient terms.

1. We will initially assume that only univariate functions exist within the network. This is by no means restrictive. It has been shown (Hornik et al. 1989; Cybenko 1989) that a feedforward network with two or more layers and a sufficient number of internal neurons can approximate any uniformly continuous multivariate function to an arbitrary accuracy. A feedforward network is, of course, composed of simple univariate functions and summing junctions. Thus any multivariate function in the overall network architecture is assumed to be well approximated using a univariate composition.

We may completely specify the topology of a network by the set of equations

a_j(k) = Σ_i T_ji ∘ a_i(k)   (A.1)
T_ji ∈ {f(·), q⁻¹}   (A.2)

where a_j(k) is the signal corresponding to the node a_j at time k. The sum is taken over all signals a_i(k) that connect to a_j(k), and T_ji is a transmittance operator corresponding to either a univariate function (e.g., sigmoid function, constant multiplicative weight) or a delay operator. (The symbol ∘ is used to remind us that T is an operator whose argument is a.) The signals a_i(k) may correspond to inputs (a ∈ x), outputs (a ∈ y), or internal signals to the network. Feedback of signals is permitted.

Let us add to a specific node a* a perturbation Δa*(k) at time k. The perturbation propagates through the network resulting in effective perturbations Δa_i(k) for all nodes in the network. Through a continuous univariate function in which a_j(k) = f[a_i(k)] we have, to first order:

Δa_j(k) = f′[a_i(k)] Δa_i(k),   (A.3)

where it must be clearly understood that Δa_i(k) and Δa_j(k) are the perturbations directly resulting from the external perturbation Δa*(k).
Through a delay operator, a_j(k) = q⁻¹ a_i(k) = a_i(k − 1), we have

Δa_j(k) = Δa_i(k − 1).   (A.4)

Combining these two results with equation A.1 gives

Δa_j(k) = Σ_i T′_ji ∘ Δa_i(k)   (A.5)

where we define T′_ji ∈ {f′[a_i(k)], q⁻¹}. Note that f′[a_i(k)] is a linear time-dependent transmittance. Equation A.5 defines a derivative network that is
Figure 10: (a) Time-dependent input/output system for the derivative network. (b) Same system with all delays drawn externally.

topologically identical to the original network (i.e., one-to-one correspondence between signals and connections). Functions are simply replaced by their derivatives. This is a rather obvious result, and simply states that a perturbation propagates through the same connections and in the same direction as would normal signals.

2. The derivative network may be considered a time-dependent system with input Δa*(k) and outputs Δy(k) as illustrated in Figure 10a. Imagine now redrawing the network such that all delay operators q⁻¹ are dragged outside the functional block (Fig. 10b). Equation A.5 still applies. Neither the definition nor the topology of the network has been changed. However, we may now remove the delay operators by cascading copies of the derivative network as illustrated in Figure 11. Each stage has a different set of transmittance values corresponding to the time step. The unraveling process stops at the final time K (K is allowed to approach ∞). Additionally, the outputs Δy(n) at each stage are multiplied by −2e(n)ᵀ and then summed over all stages to produce a single output ΔJ = Σ_n −2e(n)ᵀ Δy(n). By removing the delay operators, the time index k may now be treated as simply a labeling index. The unraveled structure is thus a time-independent linear flow graph. By linearity, all signals in the flow graph can be divided by Δa*(k) so that the input is now 1, and the output is ΔJ/Δa*(k). In the limit of small Δa*(k),

∂J/∂a*(k) = lim_{Δa*(k)→0} ΔJ/Δa*(k).   (A.6)

Since the system is causal, the partial of the error at time n with respect
Figure 11: Flow graph corresponding to unraveled derivative network.

Figure 12: Transposed flow graph corresponding to unraveled derivative network.

to a*(k) is zero for n < k. Thus

∂J/∂a*(k) = Σ_{n=k}^{K} ∂[e(n)ᵀe(n)]/∂a*(k) ≜ δ*(k).   (A.7)

The term δ*(k) is precisely what we were interested in finding. However, calculating all the δ_i(k) terms would require separately propagating signals through the unraveled network with an input of 1 at each location associated with a_i(k). The entire process would then have to be repeated at every time step.

3. Next, take the unraveled network (i.e., flow graph) and form its transpose. This is accomplished by reversing the signal flow direction, transposing the branch gains, replacing summing junctions by branching points and vice versa, and interchanging input and output nodes. The new flow graph is represented in Figure 12.
From the work by Tellegen (1952) and Bordewijk (1956), we know that transposed flow graphs are a particular case of interreciprocal graphs. This means that the output obtained in one graph, when exciting the input with a given signal, is the same as the output value of the transposed graph, when exciting its input by the same signal. In other words, the two graphs have identical transfer functions.⁷ Thus, if an input of 1.0 is distributed along the lower horizontal branch of the transposed graph, the final output will equal δ*(k). This δ*(k) is identical to the output of our original flow graph.

4. The transposed graph can now be raveled back in time to produce the reciprocal network. Since the direction of signal flow has been reversed, delay operators q⁻¹ become advance operators q⁺¹. The node a*(k) that was the original source of an input perturbation is now the output δ*(k), as desired. The outputs of the original network become inputs with value −2e(k).

Summarizing the steps involved in finding the reciprocal network, we start with the original network, form the derivative network, unravel in time, transpose, and then ravel back up in time. These steps are accomplished directly by starting with the original network and simply swapping branching points and summing junctions (rules 1 and 2), functions f for derivatives f′ (rule 3), and q⁻¹'s for q⁺¹'s (rule 5).

5. Note that the selection of the specific node a*(k) is totally arbitrary. Had we started with any node a_i(k), we would still have arrived at the same result. In all cases, the input to the reciprocal network would still be −2e(k). Thus by symmetry, every signal in the reciprocal network provides δ_i(k) = ∂J/∂a_i(k). This is exactly what we set out to prove for the signal flow in the reciprocal network.
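Flow graph interreciprocity is easy to verify numerically for a linear flow graph. The sketch below uses an arbitrary random graph (our own construction, not one from the paper): node values solve v = Av + b·input with output cᵀv, and transposing the branch gains while exchanging input and output leaves the scalar transfer unchanged:

```python
import numpy as np

# Numeric illustration of flow graph interreciprocity (Tellegen/Bordewijk):
# a transposed linear flow graph has the same transfer function.
rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) * 0.2   # branch gains between nodes
b = rng.standard_normal(n)              # input distribution
c = rng.standard_normal(n)              # output read-out

# transfer of the original graph: H = c^T (I - A)^{-1} b
I = np.eye(n)
H = c @ np.linalg.solve(I - A, b)

# transposed graph: gains A^T, input where the output was, and vice versa
H_T = b @ np.linalg.solve(I - A.T, c)
print(np.isclose(H, H_T))
```

The identity follows from cᵀ(I − A)⁻¹b = bᵀ(I − Aᵀ)⁻¹c, which is the algebraic core of the transposition step used in the proof.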
Finally, for multivariate functions, a_out(k) = F[a_in(k)], where by our earlier statements it was assumed the function was explicitly represented by a composition of summing junctions and univariate functions. F is restricted to be memoryless and thus cannot be composed of any delay operators. In the reciprocal network, a_in(k) becomes δ_in(k) and a_out(k) becomes δ_out(k). But

δ_in(k) = F′[a_in(k)] δ_out(k).   (A.8)

Thus any section within a network that contains no delays may be replaced by the multivariate function F(·), and the corresponding section in the reciprocal network is replaced with the Jacobian F′[a_in(k)]. This verifies rule 4.

⁷This basic property, which was first presented in the context of electrical circuit analysis (Penfield et al. 1970), finds applications in a wide variety of engineering disciplines, such as the reciprocity of emitting and receiving antennas in electromagnetism (Ramo et al. 1984), and the duality between decimation-in-time and decimation-in-frequency formulations of the FFT algorithm in signal processing (Oppenheim and Schafer 1989). Flow graph interreciprocity was first applied to neural networks to relate real-time backpropagation and backpropagation-through-time by Beaufays and Wan (1994a).
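The multivariate rule can be checked directly for a small memoryless map. In this sketch a single tanh layer stands in for F (an illustrative choice of ours); backpropagating δ_out through the reversed blocks agrees with multiplying by the transposed Jacobian of F, the transpose implementing the branch-gain transposition of the flow graph:

```python
import numpy as np

# Hypothetical memoryless F: a_out = tanh(W a_in), standing in for any
# delay-free subnetwork. Check delta_in against the Jacobian of F.
rng = np.random.default_rng(4)
W = rng.standard_normal((3, 4))
a_in = rng.standard_normal(4)
delta_out = rng.standard_normal(3)

a_out = np.tanh(W @ a_in)
J_F = (1 - a_out**2)[:, None] * W      # Jacobian d a_out / d a_in, shape (3, 4)

# reciprocal network: delta_out propagated through the reversed blocks
delta_in = W.T @ ((1 - a_out**2) * delta_out)

print(np.allclose(delta_in, J_F.T @ delta_out))
```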
Acknowledgments

This work was funded in part by EPRI under Contract RP and by NSF under Grants IRI and ECS.

References

Back, A., and Tsoi, A. 1991. FIR and IIR synapses, a new neural network architecture for time series modeling. Neural Comp. 3(3).

Beaufays, F., and Wan, E. 1994a. Relating real-time backpropagation and backpropagation-through-time: An application of flow graph interreciprocity. Neural Comp. 6(2).

Beaufays, F., and Wan, E. 1994b. An efficient first-order stochastic algorithm for lattice filters. ICANN'94, 2.

Bordewijk, J. 1956. Inter-reciprocity applied to electrical networks. Appl. Sci. Res. 6B.

Bryson, A., and Ho, Y. 1975. Applied Optimal Control. Hemisphere, New York.

Cybenko, G. 1989. Approximation by superpositions of a sigmoidal function. Math. Control Signals Syst. 2(4).

Griewank, A., and Corliss, G. (eds.) 1991. Automatic differentiation of algorithms: Theory, implementation, and application. Proc. First SIAM Workshop on Automatic Differentiation, Breckenridge, CO.

Griffiths, L. 1977. A continuously adaptive filter implemented as a lattice structure. Proc. ICASSP, Hartford, CT.

Hornik, K., Stinchcombe, M., and White, H. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2.

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., and Jackel, L. D. 1989. Backpropagation applied to handwritten zip code recognition. Neural Comp. 1.

Matsuoka, K. 1991. Learning of neural networks using their adjoint systems. Syst. Computers Jpn. 22(11).

Narendra, K., and Parthasarathy, K. 1990. Identification and control of dynamic systems using neural networks. IEEE Trans. Neural Networks 1(1).

Nerrand, O., Roussel-Ragot, P., Personnaz, L., Dreyfus, G., and Marcos, S. 1993. Neural networks and nonlinear adaptive filtering: Unifying concepts and new algorithms. Neural Comp. 5(2).

Nguyen, D., and Widrow, B. 1989. The truck backer-upper: An example of self-learning in neural networks. Proc. Int. Joint Conf.
on Neural Networks, 11, Washington, DC Oppenheim, A., and Schafer, R Digital SigizaI Processing. Prentice-Hall, Englewood Cliffs, NJ. Parisini, T., and Zoppoli, R Neural networks for feedback feedforward nonlinear control systems. IEEE Trans. Neural Netzilorks, 5(3), Parker, D., Learning-Logic. Invention Report S81-64, File 1, Office of Technology Licensing, Stanford University, October.
20 Derivation of Gradient Algorithms 201 Penfield, P., Spence, R., and Duiker. S Tellegen's Theorem and Electrical Networks. MIT Press, Cambridge, MA. Plumer, E Optimal terminal control using feedforward neural networks. Ph.D. dissertation. Stanford University, San Fransisco, CA. Rall, Automatic Differentiation: Techniques and Applications. Lecture Notes in Computer Science, Springer-Verlag, Berlin. Ramo, S., Whinnery, J. R., and Van Duzer, T Fields and Waves in Communication Electronics, 2nd Ed. John Wiley, New York. Rumelhart, D. E., McClelland, J. L., and the PDP Research Group Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1. MIT Press, Cambridge, MA. Tellegen, D A general network theorem, with applications. Philzps Res. Rep. 7, Toomarian, N. B., and Barhen, J Learning a trajectory using adjoint function and teacher forcing. Neural Networks 5(3), Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., and Lang, K Phoneme recognition using time-delay neural networks. IEEE Trans. Acoustics, Speech, Signal Process. 37(3), Wan, E. 1993a. Finite impulse response neural networks with applications in time series prediction. Ph.D, dissertation. Stanford University, San Fransisco, CA. Wan, E Time series prediction using a connectionist network with internal delay lines. In Time Series Prediction: Forecasting the Future and Understanding the Past, A. Weigend and N. Gershenfeld, eds. Addison-Wesley, Reading, MA. Wan, E. 1993c. Modeling nonlinear dynamics with neural networks: Examples in time series prediction. Proc. Fifth Workshop Neural Networks: Acadernic/lndustrial/NASA/Defense, WNN93/FNN93, pp , San Francisco. Wan, E., and Beaufays, F Network reciprocity: A simple approach to derive gradient algorithms for arbitrary neural network structures. WCNN'94, San Diego, CA, 111, Werbos, P Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA. 
Werbos, P Neurocontrol and supervised learning: An overview and evaluation. In Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, chap. 3, D. White and D. Sofge, eds. Van Nostrand Reinhold, New York. Williams, R., and Zipser D A learning algorithm for continually running fully recurrent neural networks. Neural Comp. 1, Received April 29, 1994; accepted May 16, 1995.