
15. NEURAL NETWORKS

15.1 Introduction to the Chapter

In this chapter, we provide a new perspective on neural networks, both feedforward and feedback networks. In section 15.2, we provide a small primer on neural networks. In addition to providing the necessary details (which will be of use to those who are not familiar with neural networks) in a concise manner, we try to remove the mystery and hype surrounding these networks and look at them in a more critical and objective way. Thus, we concentrate on what they can really do, how they do it, what their advantages and disadvantages are, and so on. One key aspect of neural networks we try to highlight in this and the next section is that they represent special classes of nonlinear networks, and hence we can learn quite a lot by understanding nonlinear networks and their dynamics. In section 15.3, we consider recurrent neural networks (RNNs), or neural networks with internal feedback, and discuss some of the well-known models. In the case of recurrent neural networks, stability, or the lack of it, is a major concern, and we discuss some of the existing approaches to prove the stability of RNNs. This section highlights the slow progress in the design of RNNs due to the lack of structured methods for obtaining RNN architectures with a guaranteed stability property and the problems in training such networks. In section 15.4, we discuss a new approach based on the building block concept (seen in earlier chapters for designing complex, stable nonlinear networks) to obtain new and complex RNN architectures and to develop training or learning algorithms. We also show that existing RNN architectures can be derived by placing specific constraints on the architectures obtained using the building block approach. We also indicate that in RNN architectures no energy function is being minimized; rather, a situation of power balance (between sources and sinks) is reached.

15.2 Neural Networks: A Primer

15.2.1 Basic Terminology and Functions

In chapter # and later chapters, we argued that a system could be viewed as performing a mapping from an input domain to an output domain. Here, we will look at neural networks from the same perspective. We start the discussion by looking at the different levels (in order of increasing complexity) of signal/information processing described in chapter #.

At the bottom level of the hierarchy, we have static, linear processing. For the special case of a one-input, one-output transformation, this can be mathematically represented as:

    y = a x + b        (15.1)

where a and b are two constants. Viewed as a functional mapping, this transformation can be represented graphically as a straight line connecting the input and the output, as shown in Fig. 15.1.

Figure 15.1. A simple one-input, one-output system with a linear functional relationship (a straight line of slope a and intercept b).

A number of points are worth repeating here. The input variable (and hence the output variable) can take values in the real space R. The mapping is restricted (to a straight line) and is represented by the two parameters a and b, representing the two degrees of freedom available. If observations of the input x at different times and the corresponding outputs y are available, the values of the parameters can be found by a curve (straight-line) fitting algorithm. The model with the resulting parameter values can be used at later times to predict the output given the input.
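As a concrete illustration of the straight-line fit described above, the sketch below estimates a and b in (15.1) from noisy input/output observations using ordinary least squares. It is only a minimal example of the "curve (straight-line) fitting algorithm" mentioned in the text; the data, noise level, and variable names are assumptions, not part of the original.

```python
import numpy as np

# Hypothetical training set: observations (x, y) generated by y = a*x + b plus noise.
rng = np.random.default_rng(0)
a_true, b_true = 2.0, -1.0
x = rng.uniform(-5, 5, size=50)                                  # stimulus samples
y = a_true * x + b_true + rng.normal(scale=0.3, size=x.shape)    # responses

# "Training" = least-squares estimation of the two degrees of freedom a and b.
A = np.column_stack([x, np.ones_like(x)])                        # design matrix [x, 1]
(a_hat, b_hat), *_ = np.linalg.lstsq(A, y, rcond=None)
print(f"estimated a = {a_hat:.3f}, b = {b_hat:.3f}")

# Prediction at a later time, given a new input.
x_new = 1.7
print("predicted y:", a_hat * x_new + b_hat)
```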
In the neural network terminology, the input is known as the stimulus, the output as the response (emitted by the system), the numerical I/O data as the training set, and the

parameter estimation process as learning or training. In this particular example, both the samples of the stimulus and the response are assumed to be available for training. Such a learning or training process is known as supervised or teacher-based learning. We will see later examples of the other learning category known as unsupervised learning, also known as self-organization. An important observation that can be made from this simple single-input, single-output system is that the number of degrees of freedom is restricted to two {or some constant for the general case} for linear processing and cannot be increased any further. In other words, any linear representation involving more than two variables {or degrees of freedom} can be reduced to a representation involving only two variables. This is not so for nonlinear processing, which makes it more complicated and interesting.

By relaxing the linearity constraint, we obtain the next level of processing: static, nonlinear processing. Depending upon the nonlinear primitive(s) employed, we can obtain different mathematical representations. For example, in the simple one-input, one-output case we could have:

    y = Σ_{i=0}^{M} a_i x^i        (15.2)

or

    y = Σ_{i=0}^{N} b_i tanh[c_i x]        (15.3)

The first expression simply involves a polynomial (of order M) representation with the various powers of the input as the nonlinear primitives, or as the basis of representation, whereas the second involves weighted transcendental functions of the scaled input. The coefficients a_i (i = 0 to M) and b_i, c_i (i = 0 to N) represent the degrees of freedom available in modeling the given system using the two representations. As we noted in chapter #4, and as the two examples above illustrate, the leap from linear to nonlinear is not that simple. The choice of nonlinear primitives is unlimited. The nonlinear primitives and nonlinear representations have to be chosen to represent the mapping process properly. That is, one representation may be good for one set of systems stimulated by a particular set of inputs, but may fail miserably with other sets of systems and/or inputs, even if the degrees of freedom are increased considerably. Further, the training data has to be a good representation of the stimuli seen by the system in its past as well as the inputs that will be seen in the future. Thus, input and output probability functions play a major role in the approximation (training or learning) process. Though researchers in the neural network area assume the probability functions are unknown and claim that they need not be known a priori, the conditions that good representative data must be available and must be used are very important.[1]

15.2.2 Primitives of Neural Architectures & Feedforward Neural Networks

The earliest neural networks fall under the class of static, nonlinear mappings. Neural network researchers handled the explosion in the choice of nonlinear primitives by assuming that there will be one and only one {or one class of} nonlinear primitive in their architectures. Thus, in addition to the two basic building blocks {a multiplier for multiplication of a signal by a constant value, and an adder} used in static linear processing, a third building block, called a neuron, is used in neural networks. A neuron, shown as a block diagram in Fig. 15.2a, is simply a one-input, one-output memoryless nonlinear system or function-primitive that transforms (or transduces) an unbounded input activation signal x into a bounded output signal y = s[x(t)] by the transformation s[·].
The bounded output signal is thus a main characteristic of the nonlinear primitives used in neural networks and is a result of practical considerations {and of interpretation from a neurophysiological perspective}. The mapping function s[·] is usually a sigmoidal or "s"-shaped curve. Some of the common functions are the hyperbolic tangent function and the logistic function,

    s[x] = tanh[x] = (e^x − e^(−x)) / (e^x + e^(−x))        (15.4)

    s[x] = 1 / (1 + e^(−x))        (15.5)

shown in Fig. 15.2b. It should be noted that the hyperbolic tangent function goes from (−1) to (+1), and the logistic function goes from 0 to 1, as the input variable x goes from −∞ to +∞. Both these functions are bounded and saturating, continuously differentiable, and monotonically increasing, characteristics commonly associated with the nonlinear primitives of neural networks. Further, by scaling the input x to cx (c > 0) prior to inputting it to the neuron[2], we can make the slope of the overall mapping (from x to y) small or large. For large values of c (c → ∞), both functions approach the threshold function, with outputs of {0, 1} or {−1, 1}, as shown in Fig. 15.2b.

[1] We will see shortly how these issues have been resolved in the neural network area.
[2] In practice, the scaling factor c is incorporated within the nonlinear primitive itself.
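A short sketch of the two activation functions in (15.4) and (15.5), including the effect of the scaling factor c discussed above: as c grows, s[cx] approaches a hard threshold. This is an illustrative example only; the sample points and the values of c are arbitrary choices.

```python
import numpy as np

def tanh_neuron(x):
    """Hyperbolic tangent neuron, eq. (15.4): output in (-1, 1)."""
    return np.tanh(x)

def logistic_neuron(x):
    """Logistic neuron, eq. (15.5): output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
for c in (1.0, 5.0, 50.0):   # larger c -> steeper slope -> nearly a threshold function
    print(f"c = {c:5.1f}  logistic(c*x) =", np.round(logistic_neuron(c * x), 3))
```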

Thus, the output of the neurons becomes binary or bipolar. A neuron whose output signal approaches the value one might be considered a winning neuron, whereas one whose signal level remains near zero (or minus one) a losing neuron.

Figure 15.2. a) The block diagram of a neuron, a nonlinear primitive used in neural networks; b) Commonly used characteristics for the neuron (logistic, sigmoidal, and threshold functions).

We indicated earlier that the choice of the nonlinear primitives is crucial for proper modeling of a given nonlinear mapping. Thus, we may wonder what effect fixing the nonlinear primitives as neurons will have on the modeling and what the neural researchers have to say about this. Neural network researchers argue that the use of a very large number {approaching millions and more} of the same nonlinearity is sufficient to model any kind of nonlinear mapping seen in practice. The key phrase here is "in practice". We can construct arbitrary mappings to show that architectures composed of only neurons cannot do the job of approximation properly. Thus, care has to be exercised in the use of neural architectures, and a one-size-fits-all approach will only lead to failure and disappointment concerning the real potential of neural networks.

In general, a static neural network architecture[3] {also known as a feedforward neural net} is formed from the three primitives mentioned above and fixed bias inputs. Similar to the human brain with roughly 100 billion neurons, it is assumed that a useful artificial neural net or ANN {artificial, to differentiate it from the real neural nets} will have millions of nonlinear neurons. The neurons are grouped into different fields {or layers for feedforward nets}, and neurons in a particular field receive, as inputs, the sum of weighted signals from neurons in preceding fields and bias signals, as shown in Fig. 15.3. The summing junction is known as the synaptic junction, the weights as synaptic weights or synaptic efficacies {classified as excitatory if the weights are positive and as inhibitory if the weights are negative}, and the matrix consisting of the synaptic weights as the synaptic matrix or connection matrix. The synaptic junction of each neuron may receive signals from a large number {10^4 in the case of the human brain} of neurons from the preceding field. Thus, the factors that make a neural net unique among all static, nonlinear mapping systems are: 1) the use of few primitives or building blocks; 2) the use of a large number of the only nonlinear building block, the neuron; 3) the use of a very large number of trainable synaptic weights; and 4) the use of a very large number of interconnections.

Referring to Fig. 15.3a of a feedforward neural net, we find a number of inputs going into the input layer {a distribution terminal}, an output layer, and a number of intermediate layers known as hidden layers. From a nonlinear functional mapping perspective, the presence of hidden layers provides the capability to implement reasonable mappings using the simple, and only, nonlinear primitive. In fact, earlier research in neural nets stalled temporarily when it was shown that even a simple mapping such as the two-input Exclusive-Or problem {two binary inputs and one binary output, as shown in Fig. 15.4} cannot be implemented using a single-layer network. It is now widely accepted that two hidden layers are sufficient for an acceptable mapping.
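To make the XOR discussion concrete, the sketch below checks one possible realization of the two-input XOR map using a single hidden layer of threshold neurons. The particular weights and thresholds are a common textbook choice and an assumption here, not necessarily the realization drawn in Fig. 15.4c.

```python
import numpy as np

def step(v):
    """Threshold neuron: 1 if the activation is positive, else 0."""
    return (v > 0).astype(int)

def xor_net(x1, x2):
    x = np.array([x1, x2])
    # Hidden layer: h1 fires for "x1 OR x2", h2 fires for "x1 AND x2".
    h = step(np.array([[1, 1], [1, 1]]) @ x - np.array([0.5, 1.5]))
    # Output: OR minus AND, i.e. fires when exactly one input is 1.
    y = step(np.array([1, -1]) @ h - 0.5)
    return int(y)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
```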
As indicated before, the weights are found using I/O data samples and a training procedure. Neural network researchers term such estimation model-free estimation since (according to them) they do not use a mathematical model describing how a system's outputs depend on its inputs. However, considering Fig. 15.3b, where we have shown a three-layer network, the outputs can be written in terms of the inputs as:

[3] This definition doesn't take into consideration that the inputs to this network might come from a time series and be updated continuously.

    x_{1i} = s[ Σ_{j=1}^{N} w_{1ji} x_j + I_{1i} ] ;   i = 1 to N_1
    x_{2i} = s[ Σ_{j=1}^{N_1} w_{2ji} x_{1j} + I_{2i} ] ;   i = 1 to N_2
    y_i = s[ Σ_{j=1}^{N_2} w_{3ji} x_{2j} + I_{3i} ] ;   i = 1 to M        (15.6)

Figure 15.3. a) The various elements (input nodes, synaptic weights, synaptic junction, bias input, fields or layers, output nodes) leading to a feedforward neural network; b) An N-input, M-output three-layer neural network architecture (input layer, first and second hidden layers, output layer).

Figure 15.4. a) Logic symbol of an XOR function; b) The truth table or input/output map, assuming that both the inputs and the output take only binary values; c) A neural network realization of the XOR function; d) The I/O map as the inputs are varied from 0 to 1.
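The layered map in (15.6) can be written compactly as repeated matrix multiplications followed by the neuron nonlinearity. The sketch below is a generic forward pass under assumed layer sizes and random weights; it is not tied to any particular trained network.

```python
import numpy as np

def forward(x, weights, biases, s=np.tanh):
    """Evaluate eq. (15.6): each layer computes s(W^T x + I)."""
    a = x
    for W, I in zip(weights, biases):
        a = s(W.T @ a + I)
    return a

rng = np.random.default_rng(1)
N, N1, N2, M = 4, 6, 5, 3                              # assumed layer widths
sizes = [(N, N1), (N1, N2), (N2, M)]
weights = [rng.normal(size=sz) for sz in sizes]        # w_1ji, w_2ji, w_3ji
biases  = [rng.normal(size=sz[1]) for sz in sizes]     # I_1i, I_2i, I_3i

y = forward(rng.normal(size=N), weights, biases)
print("outputs:", np.round(y, 3))                      # M bounded outputs in (-1, 1)
```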

Thus, there is an underlying model involving the various primitives, though the model may be at a micro level. The contention that there is an underlying model becomes more valid in the case of other, more complex neural networks such as the recurrent neural networks that we will see shortly. Before we do that, let's look at possible applications of neural nets and the training of FFNN weights.

15.2.3 Implementational Issues

As we learned in the previous subsection, an artificial neural network is formed from dense interconnection among a large number of elements taken from a very limited primitive set. Within that domain, we do have a number of choices concerning the signals to be processed, implementational issues, and applications. The inputs and/or the outputs can be continuous signals, digital signals, or binary/bipolar signals. The continuous signals can be processed using analog electronic or analog optical implementations, or using digital technology with A/D converters at the front end and D/A converters at the output end. The digital signals can be processed using digital hardware based on binary logic. The binary/bipolar signals can be processed using analog or digital architectures. Also, we can either time-multiplex one or a few processing units or use a massively parallel architecture. It may be true that the human brain {the true neural net} performs its tasks using a nonlinear, massively parallel, asynchronous {no underlying clock} and feedback architecture. However, some or any of these may or may not apply to artificial neural nets, depending on how they have been designed and implemented. Hence, caution must be exercised when coming across such hyperbole.

15.2.4 Techniques for Signal Encoding

Another unique property of neural architectures is the technique used for signal encoding. We all know that B bits (and B channels in an implementational sense) can be used to uniquely denote 2^B symbols, or to denote 2^B quantization levels in a weighted scheme. Here, the representation is obtained by minimizing (to the lowest level) the number of bits or channels needed in the representation. Of course, such a representation is neither fault tolerant[4], nor does it provide the ability for graceful degradation[5]. In digital computing and communication, an accepted solution to these problems is the use of error detection and correction schemes, where we basically add a few more redundant (from an information point of view) bits to the code corresponding to each symbol in a systematic manner. We know that the higher the number of bits we add per symbol, the better the error correction / error detection properties will be. For example, the addition of just one bit per symbol code {through the use of a bit-parity scheme, for example} enables us to detect whether an odd number of bits have changed. Adding one bit alone doesn't help us in detecting a change in an even number of bits, or in determining which bits were corrupted. We can, of course, add two or more bits to have better error detection and correction capability. Neural architectures take this concept to the extreme, where B bits are used to denote just B symbols or B quantization levels. Such an encoding is known as distributed coding and leads to a redundant encoding (we will have more than one code representing the same symbol).

[4] Outputs remain unchanged even if some of the parameters change.
[5] Minimal change in the performance for some changes in the parameters.
The redundancy leads to better fault tolerance (deviations in the weights, for example, will have minimal effect on the input/output mapping) and graceful degradation (removing certain connections, for example, will not drastically change the mapping) properties. Of course, as the number of channels and the associated elements has increased enormously, it can be argued that the chances for failure are higher. More on this as we look at the applications.

15.2.5 Applications of Neural Nets

By constraining the inputs and outputs to special domains (of values), we can gain some knowledge about possible applications for artificial neural nets. For example, if both the inputs and outputs are multivalued (in amplitude or magnitude), an ANN architecture can be configured to handle such a situation by: 1) making the input layer and the output layer linear and using an analog implementation (see Fig. 15.5a), or 2) encoding the inputs to binary/bipolar signals at the input layer, using a digital neural net implementation, and decoding to continuous values at the output layer (Fig. 15.5b). Such architectures can be used a) for modeling of physical nonlinear systems (Fig. 15.5c), and b) as nonlinear controller architectures, as shown in Fig. 15.5d. In the latter application, though the controller is a feedforward or static architecture with a guaranteed stability property, the controller and plant/process combination forms a closed-loop system whose stability may be open to question. This problem will be addressed as we look at feedback neural network architectures. Another example of multivalued-inputs to multivalued-outputs mapping is that of cleaning up a noisy signal (of N samples) or a noisy image (of N × N samples). Here, the training will be performed using (deliberately) corrupted images as the inputs and the original images as the outputs. It should be added here that, unlike linear filtering, where scaling and delaying (and/or rotation in the case of images) will not have any effect on the filtering capability of the system, the performance of ANNs (nonlinear systems) can vary widely depending on the signal strength. Hence, techniques such as prescaling before inputting the data to the ANN need to be considered.
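Since the performance of a nonlinear ANN depends on signal strength, inputs are usually prescaled to a fixed range before being presented to the network. The sketch below shows one common choice, min-max scaling to [0, 1]; the exact scheme is an assumption on my part, as the text only says such prescaling needs to be considered.

```python
import numpy as np

def minmax_scale(data, lo=0.0, hi=1.0):
    """Rescale each input channel (column) linearly to the range [lo, hi]."""
    dmin = data.min(axis=0)
    dmax = data.max(axis=0)
    span = np.where(dmax > dmin, dmax - dmin, 1.0)   # avoid division by zero
    return lo + (hi - lo) * (data - dmin) / span

raw = np.array([[10.0, -3.0], [250.0, 0.5], [120.0, 4.0]])   # hypothetical samples
print(minmax_scale(raw))
```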

A second possibility is where the inputs are multivalued and the outputs are constrained to be binary or bipolar (see Fig. 15.6). Such architectures are useful in classification and recognition applications. An example belonging to this category is the recognition of a particular image (an F-16 fighter plane, for example) from a (known) set of M possible images, given a gray-scale image as input. We can use this gray-scale image, or the features extracted from it, as the inputs and M binary outputs to indicate the presence of one of the M possible images. As discussed in the case of phoneme recognition, the recognition system has to take into account the possibility that the input image can be a scaled, translated, and rotated version of the original, known image of the same object. This problem can be solved by choosing features that are scale/rotation/translation invariant or by letting the neural net learn such possibilities through training with different versions of the same image. Another example of multivalued-inputs to binary/bipolar-outputs mapping is in speech processing (Fig. 15.7). In vowel recognition, the first two formant frequencies are used to identify the vowel phonemes. Here, we have two multivalued inputs and binary outputs to indicate one of the forty or so phonemes.

Figure 15.5. a) An analog neural net implementation with continuous inputs and outputs; b) A digital architecture with continuous inputs and outputs (sampler and encoder at the input, decoder and D/A converter at the output); c) A neural network used for modeling the behavior of a given plant; d) A neural net used as a nonlinear controller of a given plant.

Figure 15.6. a) A neural net with multivalued inputs and binary or bipolar outputs; b) Gray-scale (multivalued) images belonging to the class of planes. The neural net may have binary/bipolar outputs indicating whether a given image belongs to the class of planes or not, or outputs indicating the subclasses within the plane class.

Figure 15.7. a) The first two formant frequencies (F1, F2) and their relationship to some vowel phonemes in the English language; b) A block diagram of a neural network that identifies the occurrence of a particular vowel in a segment of speech based on the first two formant frequencies.

Finally, both the inputs and outputs can be binary. A typical example is again in image recognition. In computers, for example, the English letters are represented by a binary matrix of M × N bits (M = 12, N = 9). Assuming that only the upper-case letters are possible, we have only 26 possibilities. A neural net used to recognize the occurrence of one of these 26 letters will have the M × N (108) bits as inputs and 26 bits as outputs (under distributed encoding), as shown in Fig. 15.8. Another use is in the "associative memory" application, where an N-bit binary string (vector a) is associated with an M-bit binary string (vector b) (M, N very large), with the association known in advance. The ANN can be used to a) provide the associated vector b given the vector a, or b) provide both a and b given partial information on a and b. We examine the implications of such applications below.

15.2.6 Redundancy in Coding & Redundancy in Problem Domain

Earlier, we discussed redundancy introduced through distributed encoding and other similar methods. By studying the two problems discussed above, we can see redundancy arising from the problem domain.
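A small sketch of the distributed encoding used in the letter recognition task: each of the 26 classes gets its own output line, so 26 bits denote just 26 symbols, whereas a weighted binary code would need only 5 bits. The mapping below is illustrative only and is not taken from the chapter.

```python
import numpy as np

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def distributed_code(letter):
    """26-bit one-per-class (distributed) code for an upper-case letter."""
    code = np.zeros(len(LETTERS), dtype=int)
    code[LETTERS.index(letter)] = 1
    return code

def weighted_code(letter, bits=5):
    """Minimal weighted binary code: 2^5 = 32 >= 26 symbols."""
    index = LETTERS.index(letter)
    return [(index >> b) & 1 for b in reversed(range(bits))]

print("G distributed:", distributed_code("G"))
print("G weighted   :", weighted_code("G"))
```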

Figure 15.8. a) A 12 × 9 binary representation of the English letters; b) Block diagram of a neural net architecture (108 binary inputs, 26 binary outputs) to identify one of the 26 possible upper-case letters. Both x_i and y_i are binary.

Considering the alphabet recognition task, we have 108 input bits (assuming M = 12, N = 9) and 26 output bits. Thus, the input can take 2^108 (about 3.2 × 10^32) unique combinations, the output can take 2^26 (about 6.7 × 10^7) unique combinations, and the system I/O state can be one of the 2^(108+26) possible states. Out of these over 10^32 input states, only 26 states are valid states under the ideal condition (no input bits corrupted). If one is daring enough, he (she) can go ahead and design a digital logic circuit (with 108 bits as inputs and 26 bits as outputs). In such a design, the other input combinations will be treated either as a) don't-cares, implying that the outputs corresponding to these combinations can be anything[6], a common practice in digital design, or b) by forcing the outputs corresponding to input combinations that are close to the 26 ideal combinations to the representations of one of those 26 valid states. The latter approach is sort of a trademark of neural networks. As we train the system, the weights converge to values such that the ANN points to the same class (here, one of the 26 classes of letters) even if some input bits are in error. We can also specifically train the net as we do in the case of digital design. Because of this, and the presence of a very large number of weights, etc., the correct class is identified even when certain weights are changed and/or the interconnections are disturbed. Hence the fault tolerance and graceful degradation properties of neural architectures.

[6] Of course, the digital system designed will produce a specific output corresponding to any (and all) input combinations.

15.2.7 Storage Capacity & Crosstalk

It should be clear that properties such as fault tolerance and graceful degradation are achieved by introducing redundancy, a tradeoff omnipresent in all engineering applications. In the above example, we store only 26 classes. Suppose we excite the network with all possible combinations of the input, observe the corresponding outputs, and classify them into unique classes or vectors. We can generally expect a) still only 26 classes, or b) c (26 << c << 2^26) classes. That is, the network splits the entire input combination space into a) 26 classes or b) c classes. This value (26 or c) can be considered the storage capacity of the network and will be much smaller than the total number of possible input combinations. Thus, fault tolerance is achieved at the cost of reduced storage capacity. We will look at storage capacity further when we study feedback NNs. Suppose the same network (which has already been trained to recognize the 26 upper-case letters) is trained further to recognize the 26 lower-case letters as well (52 classes in all). After the new training, it is quite possible that the new classes make it difficult to recognize the older classes, or that similar classes get represented by the same class, etc. In NN terminology, such a possibility is known as crosstalk.

15.2.8 Approximation or Training or Learning

15.3 Recurrent Neural Nets

15.3.1 Basic Concepts

A key feature of the human brain is the presence of feedback, which gives rise to its dynamical behavior and memory capability. ANNs that are patterned after the human brain thus need to incorporate feedback as well.
From earlier chapters, we know that feedback provides the memory capability, and that for a feedback system to be realizable there should not be any delay-free loops {closed paths}. Thus, in a continuous ANN architecture, each and every loop will contain at least one integrating (or differentiating, though the former is preferred due to its low sensitivity to noise) element, whereas the loops in a discrete ANN architecture will have at least one delay

unit. Thus, a mathematical model for a feedback neural net {also known as a recurrent neural net or RNN} can be written in state-space form as:

    continuous case:   dx_c/dt = f_c[x_c]        (15.7a)
    discrete case:     x_D((n+1)T) = f_D[x_D(nT)]        (15.7b)

where x_c, f_c[·], etc., are vectors. The models given above are the same as the representations for nonlinear dynamical systems. Thus, RNNs are indeed special[7] cases of nonlinear dynamical systems. The models can be rewritten to indicate the presence of the synaptic weight matrix M and the input (bias or external) signals i as:

    continuous case:   dx_c/dt = f_c[x_c, M, i]        (15.8a)
    discrete case:     x_D((n+1)T) = f_D[x_D(nT), M(nT), i]        (15.8b)

and the training or learning property can be represented as:

    continuous case:   dm_c/dt = f̂_c[x_c, m_c]        (15.9a)
    discrete case:     m_D((n+1)T_1) = f̂_D[x_D(nT), m_D(nT_1)]        (15.9b)

where we have used m to indicate the column vector formed from the elements of the matrix M. The above equations simply indicate that the differentials of the weights go to zero as the weights reach their constant, steady-state values. From a careful look at the above representations, especially the discrete ones, two important observations can be made. First, we have represented the discrete models as synchronous (due to our familiarity with such models and the relative ease with which such models can be handled), whereas, on reflection, it should be clear that the human brain works asynchronously. Second, the update times for the state variables and the weights are different (T and T_1 respectively), with T_1 >> T[8]. That is, the state variables change much faster than the synaptic weights, as they should. In neural network terminology, the state values (x) are thus called short-term memory (STM), whereas the synapses are called long-term memory (LTM).

[7] Special since the nonlinear terms used are limited. Later, we will also study some other properties that make RNNs unique nonlinear systems.
[8] If we are forming discrete-domain RNNs from continuous-domain RNNs, both values must be small enough to prevent aliasing.

Of course, not all nonlinear dynamical systems are neural systems. We need to incorporate the same constraints indicated while discussing feedforward neural networks, and a few more. Thus, 1) the use of few primitives or building blocks; 2) the use of the only nonlinear building block, the neuron, with a saturating output; 3) the use of a very large number of these building blocks; 4) dense interconnection; 5) feedback; and 6) learnability or trainability characterize recurrent or feedback neural networks. We will add later some more constraints that arise due to the presence of feedback and the specific tasks expected of RNNs.

15.3.2 RNN Architectures

Different approaches can be used to arrive at numerous RNN architectures. A straightforward approach is to add feedback to a feedforward neural network. Another approach is to derive architectures from a careful study of biological models. A third approach, proposed by this author, is to use nonlinear electrical network theory and the building block approach discussed in this book. A simple RNN architecture derived using the first approach is shown in Fig. 15.9. In the figure, we have an FFNN with no hidden layer and feedback between its inputs and outputs through integrator units (shown with self-feedback). The outputs of the integrators, x, form the state of the neuronal dynamical system and can take any value in the real vector space R^n.
They are fed to the neurons to produce the bounded signal state s(x) (this should explain why the nonlinear functions appear first in the FFNN subsystem). The signal state space is thus an n-dimensional hypercube. The synaptic matrix or connection matrix M = [m_ij] multiplies the signal state. A bias or input signal is added to this output, and the result, along with self-feedback from the outputs of the integrators, becomes the input to the integrators. The dynamics of the RNN architecture in Fig. 15.9 can be written as:

    ẋ_i = −f_i[x_i] + Σ_{j=1}^{n} m_ij s[x_j] + I_i ;   i = 1 to n        (15.10)

where f_i[x_i] is some nonlinear mapping of the independent variable x_i. In practice, this mapping is constrained to be linear, f_i[x_i] = a_i x_i, a_i > 0, and the dynamics corresponding to this case can be written in matrix form as:

    ẋ = −Ax + M s[x] + i        (15.11)

where A is a diagonal matrix.

Figure 15.9. A continuous RNN architecture based on a feedforward neural network and feedback through integrator units (weigh-and-sum M·s[x], bias input addition, and integration units with self-feedback).

In the above model, all the signal states are assumed to be in a signal field F_x and are used collectively to affect the future state. Such an architecture is known as autoassociative or unidirectional. By adding another field F_y (with an associated state y of size m) and allowing for cross connections, as shown in Fig. 15.10, we arrive at a dynamics given by:

    d/dt [x; y] = −[A 0; 0 B] [x; y] + [P N; M Q] [s[x]; s[y]] + [i; j]        (15.12)

Depending upon the characteristics of the synaptic submatrices P, M, N, and Q, we arrive at different network categories. When P = Q = [0], the resulting architecture is called heteroassociative. M = N^t leads to bidirectional networks (F_x feeding into the inputs of the integrators of field F_y, and F_y feeding into the inputs of the integrators of field F_x). Here M is called the forward projection or feedforward from neuron field F_x to neuron field F_y, and N the backward or feedback projection from neuron field F_y to neuron field F_x[9]. A special case of bidirectional networks is the bidirectional associative memory or BAM (continuous additive bidirectional associative memory or CABAM, since the dynamics is described in the continuous domain). In this architecture, the signal states {s[x], s[y]} reach certain stable values given x[0], y[0], and hence {s[x], s[y]} can be considered as associative pairs. When P and Q are not equal to zero, they are assumed to be symmetric (this applies to M of the autoassociative architecture as well), with positive diagonal entries {p_ii, q_ii > 0} (excitatory connections) and nonpositive off-diagonal elements {p_ij, q_ij ≤ 0, i ≠ j} (inhibitory connections). The symmetry is considered to be a reflection of lateral inhibition or a competitive connection topology[10]. The strength of the inhibitory connections is assumed to be a decreasing function of the distance |i − j|. The inputs i_i and j_j, when present, are added directly to the dynamics in the above models. Hence, such models are known as additive activation models. We will see other models as we study some well-known RNNs shortly. But let us first look at some central issues related to RNNs.

[9] The reader may note that these classifications differ vastly from the definitions found in the classical network and system theory literature.
[10] We will discuss later the real requirement for symmetry using network interpretations.
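The additive dynamics (15.11) can be explored numerically with a simple forward-Euler integration. The sketch below uses a small symmetric connection matrix, a tanh signal function, and zero input, and shows the state settling toward an equilibrium; all parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def simulate_rnn(A, M, i, x0, dt=0.01, steps=2000, s=np.tanh):
    """Forward-Euler integration of eq. (15.11): dx/dt = -A x + M s(x) + i."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * (-A @ x + M @ s(x) + i)
    return x

n = 3
A = np.eye(n)                                   # diagonal decay matrix
M = np.array([[0.0, 1.2, -0.5],
              [1.2, 0.0, 0.8],
              [-0.5, 0.8, 0.0]])                # symmetric synaptic matrix
i = np.zeros(n)                                 # no external (DC) input

x_eq = simulate_rnn(A, M, i, x0=[0.4, -0.2, 0.1])
print("state after integration:", np.round(x_eq, 4))
print("signal state s(x):      ", np.round(np.tanh(x_eq), 4))
```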

Figure 15.10. A heteroassociative RNN architecture with two fields F_x and F_y (cross projections M and N, within-field connections P and Q, decay terms −A and −B, integrators, and inputs I and J). Vector/matrix notations are used to represent the various operations.

15.3.3 Response & Stability Definitions for RNNs

An arbitrarily chosen RNN, being a dynamical feedback system, can be: 1) absolutely, globally stable with x_e = 0 as the equilibrium point; 2) locally stable with multiple, nonzero equilibrium points; 3) a system that exhibits a limit cycle response or even a chaotic response; or 4) completely unstable, using the various classifications of general nonlinear dynamical systems seen in earlier chapters. Thus, we need to know which of the above behavior(s) is (are) acceptable for practical RNNs. Further, as we know from previous chapters, the above behavior classifications are based on the transient response, or the response to initial conditions. Thus, we also need to ask whether we should restrict our network operation to such situations only, or whether we should be concerned with the network operation when external stimuli are present, and if so, what kind of stimuli: constant-valued (DC) inputs or inputs that vary with respect to time? Let us develop the answers to these questions using intuition and the possible applications of RNNs as our guide.

Consider the problem of English alphabet (upper case only) recognition, where we might have a corrupted n_r by n_c bit matrix to start with. We can expect the information regarding the 26 correct combinations to be somehow stored in the synaptic matrix M. We can think of applying the corrupted matrix: 1) as the input i to the system and keeping it there permanently (DC input), or 2) as the initial signal state s[x(0)] with no other stimuli.[11] Case (1) is possible with an absolutely, globally stable system {which also possesses the BIBO, or bounded-input, bounded-output, stability property}, where we force the DC response to one of the 26 desired states[12]. For case (2), the network cannot be an absolutely, globally stable system, since the transient response of such systems tends to zero (the equilibrium point) as time progresses. Thus, we need to use locally stable systems with multiple nonzero equilibrium points, some of which are attracting points[13]. In earlier chapters, we learned that multiple equilibrium points are possible using 1) networks consisting of reactive elements with multiple relaxation points, or 2) networks consisting of both positive and negative resistors. The RNNs proposed so far fall under the latter category, as we will see in section 15.4.

The above example points to the use of the response of RNNs to initial conditions. Another problem where the response of RNNs to both initial conditions and external stimuli (DC input) is used is in the area of functional minimization. The networks used for this application category also correspond to networks with nonzero equilibrium points. We will look at this problem after we look at some well-known RNN models. In summary, the RNN architectures proposed so far do not correspond to globally, absolutely stable systems in the classical, nonlinear dynamics sense, and their applications are limited to cases involving the response to stored energy or the response to DC stimuli. The number of stable equilibria in a given RNN network represents the storage capacity of that network, and the parameters are adjusted {trained or learned, in NN terminology} so as to place the stable equilibria at values dictated by the application under consideration.
[11] We will have to preprocess the signal so that it lies in the 0 to 1 range of the s[.] function. Thus, we may change all zeros to 0.01 and all ones to 0.99, etc.
[12] How we design such a system is another question, and we have not yet addressed that problem.
[13] The NN literature uses the term globally stable to describe the behavior of such networks, to imply that those equilibrium points are stable.
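A one-neuron version of the locally stable, multiple-equilibria behavior described above can be seen in the scalar dynamics dx/dt = -x + m tanh(x): for m > 1 the origin becomes unstable and two nonzero attracting points appear, which is the kind of structure the alphabet-recall application relies on. The sketch below simply integrates this toy system from two initial states; it is an illustration, not a model taken from the text.

```python
import numpy as np

def settle(x0, m=2.0, dt=0.01, steps=5000):
    """Integrate dx/dt = -x + m*tanh(x) from x0 and return the final state."""
    x = x0
    for _ in range(steps):
        x += dt * (-x + m * np.tanh(x))
    return x

# For m = 2 the equilibria are x = 0 (unstable) and x = +/- x*, two attracting points.
for x0 in (0.1, -0.1):
    print(f"x(0) = {x0:+.1f}  ->  x(inf) ~ {settle(x0):+.4f}")
```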

15.3.4 Some Well-Known RNNs

Continuous Domain Models

Hopfield Model

A simple RNN model known as the Hopfield circuit is given by:

    c_i ẋ_i = −x_i / R_i + Σ_{j=1}^{n} m_ij s_j[x_j] + I_i ;   i = 1 to n        (15.13)

where M = M^t (a symmetric synaptic matrix) and s_j[x_j] > 0, monotonically increasing, and bounded. That is, the Hopfield circuit corresponds to a special case of the autoassociative model seen before. This circuit, which reignited the interest in NNs in the early 1980's, uses a synaptic matrix M that is learned (or computed) offline and has been shown to be useful in two applications: 1) cleaning up corrupted binary (or bipolar) data, and 2) functional optimization. In both applications, the function s_j[x_j] is chosen to be the threshold function (with output values of zero and one). In application #1, the corrupted data is used as the initial value of the signal state, s[x], and the external stimulus i is set equal to zero. After some iterations, the signal state s[x] will settle to a fixed value, providing the filtered data. Thus, as indicated earlier, the Hopfield circuit represents a locally stable system with multiple equilibrium points, one of which is reached depending upon the initial condition. In fact, it has been shown that a specific M corresponds to making a certain number of binary (bipolar) vectors the attracting equilibrium points of the circuit, and the resulting equilibrium point corresponding to an initial state is simply the stored vector with the least Hamming distance to the initial state[15].

The coefficients c_i, R_i, I_i, m_ij (i, j = 1 to n) of (15.13) represent the degrees of freedom available in the Hopfield dynamics. By varying their values properly, we can control the number of stable equilibria and their values. We have approximately n^2 (for n large) coefficients or degrees of freedom, whereas we have 2^n possible signal state values. The growth in the number of degrees of freedom becomes very small compared to the growth in possible signal state values as n increases. Therefore, we can expect the maximum number of stable equilibria that is possible by the variation of this limited number of coefficients to be very small compared to the total number of signal states. It has been shown through simulations and rigorous analytical approaches that the maximum number of stable equilibria, or the storage capacity of the Hopfield network, is approximately equal to 0.15n, a very small number compared to the total number of signal states of the dynamics. This can be considered a major limitation of Hopfield (and other RNN) dynamics, but we have to keep in mind that the limited number of stable equilibria gives rise to better fault tolerance and graceful degradation properties. Later we will consider whether the storage capacity can be increased through the use of more complex nonlinear mapping functions using the network approach (section 15.3.4).

[15] Thus, there is also the possibility for oscillation, as more than one stored vector can have the same, least Hamming distance to the initial state.
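A minimal sketch of the first application (cleaning up corrupted data), using a discrete-time, bipolar ({-1, +1}) threshold variant of the Hopfield idea: the connection matrix is formed from the stored patterns by a Hebbian outer-product rule, and the corrupted pattern is used as the initial signal state. The outer-product learning rule and the asynchronous update schedule are standard textbook choices assumed here; the chapter itself only states that M is computed offline.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
patterns = np.sign(rng.standard_normal((2, n)))       # two stored bipolar vectors

# Hebbian (outer-product) synaptic matrix with zero diagonal.
M = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(M, 0.0)

def recall(state, sweeps=5):
    """Asynchronous threshold updates; the state settles to an equilibrium point."""
    s = state.copy()
    for _ in range(sweeps):
        for k in rng.permutation(n):
            s[k] = 1.0 if M[k] @ s >= 0 else -1.0
    return s

corrupted = patterns[0].copy()
corrupted[:3] *= -1                                    # flip three bits
recovered = recall(corrupted)
print("bits differing from stored pattern:", int(np.sum(recovered != patterns[0])))
```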
Tank and Hopfield recognized that their circuit could be used for a special case of functional optimization known as linear programming (LP). Given a variable vector x (x ∈ R^n), constant vectors a and c, and a constant matrix B, the task in LP is to find the value of the vector x such that the scalar cost function

    φ[x] = a^t x        (15.14)

is minimized, where the solution x satisfies the p constraints

    f[x] = Bx − c ≥ 0        (15.15)

A simple example of this LP problem is the one-variable case (n = 1) with

    φ[x] = x        (15.16)

as the linear function to be minimized, subject to a single constraint (p = 1)

    f[x] = x ≥ 0        (15.17)

It can be noted that this simple problem has a unique solution, given by x = 0. A first-order Hopfield circuit can be used to find the solution to this problem. The first-order dynamics is given by:

    c ẋ = −x/R + m s[x] + i        (15.18)

Tank and Hopfield argued that the network minimizes a pseudo-energy function[16] E[x] given by:

    E[x] = −i x + x²/(2R) − m ∫₀ˣ s[v] dv        (15.19)

That is, the equilibrium point x_e obtained from the dynamics of the Hopfield circuit corresponds to the minimum of E[x]. The circuit corresponding to the dynamics in (15.18) will provide the solution to the LP problem if we make the identifications i = −1, m = −1, and

    s[x] = 0 if x ≥ 0,   s[x] = x if x < 0        (15.20)

leading to E[x] as:

    E[x] = x + x²/(2R) + ∫₀ˣ s[v] dv        (15.21)

The first term in the above expression corresponds to the LP function to be minimized, and the other two terms are added to make the energy function larger as and when the constraint is not satisfied (a penalty function or constraint). If the circuit is simulated with x(0) > 0, x will tend to x_e = 0, which is the solution to the LP problem[17]. In summary, in this application we excite the network with an external DC stimulus and some initial value and hope that the obtained solution is independent of the initial condition. Of course, this is possible only if the error function is bowl shaped with only one minimum. However, the dynamics, and hence the cost function being minimized, is in general nonlinear, and hence the minimum reached may not be the global minimum.

[16] Tank and Hopfield called this an energy function, similar to a Lyapunov function, as its time derivative evaluated along the circuit dynamics is negative semidefinite. We have added the term pseudo to stress the possibility of the function E[x] becoming negative. Also, we will find later that the function that really gets minimized has the dimension of power and not energy. We discussed this in general terms in chapter 3 and came across the term 'mixed potential', which is what really gets minimized.
[17] The choice of s[.] as given here will make x → −∞ as t → ∞ for any x(0) < 0. We will analyze the cause of this problem and look at an alternate energy function as we study RNNs from an 'electrical nets' perspective in section 15.4.

Continuous Additive Bidirectional Associative Memory (CABAM)

We have seen this model as we discussed the evolution of general RNN architectures from feedforward NNs. The dynamics is given by:

    ẋ_i = −a_i x_i + Σ_{j=1}^{m} m_ij s_j[y_j] + I_i ;   i = 1 to n
    ẏ_j = −â_j y_j + Σ_{i=1}^{n} m_ij s_i[x_i] + J_j ;   j = 1 to m        (15.22)

It can be observed that this model is a heteroassociative analog of the Hopfield circuit. We will look at this model once again from an electrical network perspective in section 15.4.

Grossberg Models

In this section, we present a number of biologically inspired RNN models to illustrate the slow and painful evolution leading to the various models. Restricting to an autoassociative model with a field F_x of n neurons, Grossberg models restrict the neuronal output to a finite range [0, B_i] and hence are mostly useful for pattern classification and recognition problems. A pattern is denoted as p = [p_1 p_2 ... p_n]^t, where the p_i (i = 1 to n) are normalized such that Σ_{i=1}^{n} p_i = 1, and thus p can be considered a probability distribution corresponding to a pattern. Letting I be the background illumination and I_i the reflectance from the i-th element, we have I_i = p_i I with Σ_{i=1}^{n} I_i = I, and a simple additive activation model can be written as:

    ẋ_i = −(a_i + I_i) x_i + b_i I_i ;   i = 1 to n        (15.23)

The constant a_i, which represents the term corresponding to self-feedback, leads to a stable system with a decaying response when I_i = 0, and is called the passive decay rate. From the expression, we can note that x_i is bounded in the range [0, b_i]. Since in general I_i is a constant, this dynamics represents a simple linear, uncoupled model whose solution can be written down easily. Since I_i affects only x_i, we have the term additive activation. We can note that x_i → b_i as I → ∞, regardless of the value of p_i. Grossberg calls this saturation (an undesirable effect) of large inputs. He went on to suggest that the inputs I_j (j = 1 to n, j ≠ i) should also be used in the expression for ẋ_i, in an inhibitory sense. The corresponding simplest model is:

    ẋ_i = −(a_i + I_i) x_i + b_i I_i − x_i Σ_{j≠i} I_j = −(a_i + I) x_i + b_i I_i        (15.24)

Thus, the dynamics represents an uncoupled, linear model for fixed I_i, and hence a closed-form solution for x_i can easily be written. It can be shown that x_i → p_i b_i as I → ∞. That is, we no longer have saturating outputs but outputs that represent the pattern. Thus, the pattern information gets stored in the neuron outputs. Grossberg calls this the multiplicative activation model. The multiplicative model can be modified to allow the activation variable x_i to assume negative values in a finite range [−c_i, b_i], where c_i > 0 and c_i << b_i. This leads to what is known as a multiplicative, shunting activation model:

    ẋ_i = −(a_i + I) x_i + b_i I_i − c_i Σ_{j≠i} I_j        (15.25)

A more general shunting activation is possible by allowing cross coupling between the various states x_i. Separating the inputs into excitatory inputs I_i and inhibitory inputs J_i, the model can be written as:

    ẋ_i = −(a_i + I_i + J_i) x_i + b_i I_i − c_i J_i + b_i s_i[x_i] − c_i Σ_{j≠i} s_j[x_j] − x_i ( s_i[x_i] + Σ_{j≠i} s_j[x_j] )        (15.26)

That is, for the first time in the evolution of the Grossberg models, we have a true nonlinear model involving cross coupling between the various states, and it took almost 15 years to get to this point. In the next section we will show how these and other exotic RNN models can be derived easily using the "building block concept" advanced in this book.

Cohen-Grossberg Model

An RNN model proposed by Cohen and Grossberg has the activation dynamics given by:

    ẋ_i = −a_i[x_i] ( b_i[x_i] − Σ_{j=1}^{n} m_ij s_j[x_j] )        (15.27)

where the function a_i[x_i] is assumed to be nonnegative and bounded, s_j[x_j] is a bounded, monotone nondecreasing function, and b_i[·] belongs to the set of functions that ensure the stability of the activation dynamics.

Continuous Bidirectional Associative Memories

This model results from the extension of the above Cohen-Grossberg model to the heteroassociative case and has the dynamics given by:

    ẋ_i = −a_i[x_i] ( b_i[x_i] − Σ_{j=1}^{m} m_ij s_j[y_j] ) ;   i = 1 to n
    ẏ_j = −â_j[y_j] ( b̂_j[y_j] − Σ_{i=1}^{n} m_ij ŝ_i[x_i] ) ;   j = 1 to m        (15.28)

where we have used '^' to differentiate the functions used for the F_x field dynamics from those of F_y. In practice, all the nonlinear functions are made identical to simplify the design problem. In summary, it can be noted that the RNN dynamics known so far correspond to nonlinear dynamics that are not globally stable in the classical system theory[18] sense. They correspond to dynamics with more than one locally stable equilibrium point that produce stable outputs under the application of external, DC excitation. More importantly, they are carefully hand crafted to lead to the desired behavior.

[18] Another way to put it is to say that the electrical network architectures corresponding to such dynamics are non-passive. We will use this line of thinking in the next section.

Discrete RNN Models

Discrete RNN models often use neurons with threshold signal functions (binary or bipolar output) and hence can be called bivalent RNN models. They can be derived: 1) by a direct approach, that is, by representing the dynamics with nonlinear difference equations, or 2) from an analog model and a suitable discretization procedure. We will use the latter approach and present one specific model here. Considering the CABAM seen before (equation 15.22) and using the substitutions:

    a_i, â_j = 1/T,   d/dt → (z − 1)/T,   m̂_ij = T·m_ij,   Î_i = T·I_i,   Ĵ_j = T·J_j        (15.29)

we obtain a discrete heteroassociative dynamics as:

    x_i(k+1) = Σ_{j=1}^{m} m̂_ij s_j[y_j(k)] + Î_i(k) ;   i = 1 to n
    y_j(k+1) = Σ_{i=1}^{n} m̂_ij s_i[x_i(k)] + Ĵ_j(k) ;   j = 1 to m        (15.30)

where k represents the time or iteration index. When Î_i(k) and Ĵ_j(k) are constant (DC excitation), we expect x_i(k) and y_j(k) to stabilize to some steady-state values as k → ∞. When the neuron signal functions s_i[·], s_j[·] are constrained to be threshold functions, the model reduces to the bidirectional associative memory or BAM model. The signal functions s_i[·], s_j[·] can also be made more complex while retaining the threshold function property. For example:

    s_i[x_i(k+1)] = 1 if x_i(k+1) > U_i ;   s_i[x_i(k)] if x_i(k+1) = U_i ;   0 if x_i(k+1) < U_i        (15.31)

and similarly for s_j[·], where U_i (V_j for s_j[·]) is a threshold value (chosen to be equal to zero for all i and j). That is, memory is introduced into the signal functions themselves (in addition to the memory present due to feedback) {see Fig. 15.11}, and thus a continuous-domain network equivalent of such dynamics needs to have delay elements in addition to other lumped passive elements.

Figure 15.11. A discrete-time threshold function with one memory (unit delay) used as a neuron.

We can combine the two sets of equations and write them together as:

    x_i(k+1) = Σ_j m̂_ij s_j[y_j(k)] + Î_i(k)
    s_i[x_i(k+1)] = 1 if x_i(k+1) > U_i ;   s_i[x_i(k)] if x_i(k+1) = U_i ;   0 if x_i(k+1) < U_i
    y_j(k+1) = Σ_i m̂_ij s_i[x_i(k)] + Ĵ_j(k)
    s_j[y_j(k+1)] = 1 if y_j(k+1) > V_j ;   s_j[y_j(k)] if y_j(k+1) = V_j ;   0 if y_j(k+1) < V_j        (15.32)

The memory in the signal functions s_i[·], s_j[·] and the resulting possible actions lead to what is known as a "stay-at-the-same-value" capability, and act as a tie breaker that makes a steady-state value possible. Using the model above, we can contemplate different implementations. Starting with x_i(k_0), y_j(k_0) (where k_0 is some initial time index), if we do the update simultaneously or as a block, we obtain a synchronous model. Alternately, we can update the state information for field F_x, followed by

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD WHAT IS A NEURAL NETWORK? The simplest definition of a neural network, more properly referred to as an 'artificial' neural network (ANN), is provided

More information

NONLINEAR AND ADAPTIVE (INTELLIGENT) SYSTEMS MODELING, DESIGN, & CONTROL A Building Block Approach

NONLINEAR AND ADAPTIVE (INTELLIGENT) SYSTEMS MODELING, DESIGN, & CONTROL A Building Block Approach NONLINEAR AND ADAPTIVE (INTELLIGENT) SYSTEMS MODELING, DESIGN, & CONTROL A Building Block Approach P.A. (Rama) Ramamoorthy Electrical & Computer Engineering and Comp. Science Dept., M.L. 30, University

More information

CHAPTER 3. Pattern Association. Neural Networks

CHAPTER 3. Pattern Association. Neural Networks CHAPTER 3 Pattern Association Neural Networks Pattern Association learning is the process of forming associations between related patterns. The patterns we associate together may be of the same type or

More information

Artificial Neural Network and Fuzzy Logic

Artificial Neural Network and Fuzzy Logic Artificial Neural Network and Fuzzy Logic 1 Syllabus 2 Syllabus 3 Books 1. Artificial Neural Networks by B. Yagnanarayan, PHI - (Cover Topologies part of unit 1 and All part of Unit 2) 2. Neural Networks

More information

Lecture 7 Artificial neural networks: Supervised learning

Lecture 7 Artificial neural networks: Supervised learning Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in

More information

Neural Networks and Fuzzy Logic Rajendra Dept.of CSE ASCET

Neural Networks and Fuzzy Logic Rajendra Dept.of CSE ASCET Unit-. Definition Neural network is a massively parallel distributed processing system, made of highly inter-connected neural computing elements that have the ability to learn and thereby acquire knowledge

More information

Using a Hopfield Network: A Nuts and Bolts Approach

Using a Hopfield Network: A Nuts and Bolts Approach Using a Hopfield Network: A Nuts and Bolts Approach November 4, 2013 Gershon Wolfe, Ph.D. Hopfield Model as Applied to Classification Hopfield network Training the network Updating nodes Sequencing of

More information

Learning and Memory in Neural Networks

Learning and Memory in Neural Networks Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units

More information

EEE 241: Linear Systems

EEE 241: Linear Systems EEE 4: Linear Systems Summary # 3: Introduction to artificial neural networks DISTRIBUTED REPRESENTATION An ANN consists of simple processing units communicating with each other. The basic elements of

More information

Neural Networks Introduction

Neural Networks Introduction Neural Networks Introduction H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural Networks 1/22 Biological

More information

Christian Mohr

Christian Mohr Christian Mohr 20.12.2011 Recurrent Networks Networks in which units may have connections to units in the same or preceding layers Also connections to the unit itself possible Already covered: Hopfield

More information

Artificial Neural Networks Examination, June 2004

Artificial Neural Networks Examination, June 2004 Artificial Neural Networks Examination, June 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870

Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870 Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870 Jugal Kalita University of Colorado Colorado Springs Fall 2014 Logic Gates and Boolean Algebra Logic gates are used

More information

Artificial Neural Networks Examination, June 2005

Artificial Neural Networks Examination, June 2005 Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks What are (Artificial) Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning

More information

Neural Networks. Associative memory 12/30/2015. Associative memories. Associative memories

Neural Networks. Associative memory 12/30/2015. Associative memories. Associative memories //5 Neural Netors Associative memory Lecture Associative memories Associative memories The massively parallel models of associative or content associative memory have been developed. Some of these models

More information

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal

More information

3.3 Discrete Hopfield Net An iterative autoassociative net similar to the nets described in the previous sections has been developed by Hopfield

3.3 Discrete Hopfield Net An iterative autoassociative net similar to the nets described in the previous sections has been developed by Hopfield 3.3 Discrete Hopfield Net An iterative autoassociative net similar to the nets described in the previous sections has been developed by Hopfield (1982, 1984). - The net is a fully interconnected neural

More information

Data Mining Part 5. Prediction

Data Mining Part 5. Prediction Data Mining Part 5. Prediction 5.5. Spring 2010 Instructor: Dr. Masoud Yaghini Outline How the Brain Works Artificial Neural Networks Simple Computing Elements Feed-Forward Networks Perceptrons (Single-layer,

More information

Computational Intelligence Lecture 6: Associative Memory

Computational Intelligence Lecture 6: Associative Memory Computational Intelligence Lecture 6: Associative Memory Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Fall 2011 Farzaneh Abdollahi Computational Intelligence

More information

Machine Learning. Neural Networks

Machine Learning. Neural Networks Machine Learning Neural Networks Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 Biological Analogy Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 THE

More information

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks Oliver Schulte - CMPT 310 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of biological plausibility We will focus on

More information

Lecture 4: Feed Forward Neural Networks

Lecture 4: Feed Forward Neural Networks Lecture 4: Feed Forward Neural Networks Dr. Roman V Belavkin Middlesex University BIS4435 Biological neurons and the brain A Model of A Single Neuron Neurons as data-driven models Neural Networks Training

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

Artificial Intelligence

Artificial Intelligence Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory Announcements Be making progress on your projects! Three Types of Learning Unsupervised Supervised Reinforcement

More information

Neural Networks. Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation

Neural Networks. Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation Neural Networks Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation Neural Networks Historical Perspective A first wave of interest

More information

In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required.

In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required. In biological terms, memory refers to the ability of neural systems to store activity patterns and later recall them when required. In humans, association is known to be a prominent feature of memory.

More information

Iterative Autoassociative Net: Bidirectional Associative Memory

Iterative Autoassociative Net: Bidirectional Associative Memory POLYTECHNIC UNIVERSITY Department of Computer and Information Science Iterative Autoassociative Net: Bidirectional Associative Memory K. Ming Leung Abstract: Iterative associative neural networks are introduced.

More information

Artificial Intelligence Hopfield Networks

Artificial Intelligence Hopfield Networks Artificial Intelligence Hopfield Networks Andrea Torsello Network Topologies Single Layer Recurrent Network Bidirectional Symmetric Connection Binary / Continuous Units Associative Memory Optimization

More information

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.

More information

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable

More information

Lecture 15: Exploding and Vanishing Gradients

Lecture 15: Exploding and Vanishing Gradients Lecture 15: Exploding and Vanishing Gradients Roger Grosse 1 Introduction Last lecture, we introduced RNNs and saw how to derive the gradients using backprop through time. In principle, this lets us train

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)

More information

CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart

More information

Artificial Neural Networks. Q550: Models in Cognitive Science Lecture 5

Artificial Neural Networks. Q550: Models in Cognitive Science Lecture 5 Artificial Neural Networks Q550: Models in Cognitive Science Lecture 5 "Intelligence is 10 million rules." --Doug Lenat The human brain has about 100 billion neurons. With an estimated average of one thousand

More information

Part 8: Neural Networks

Part 8: Neural Networks METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as

More information

Neural Networks and the Back-propagation Algorithm

Neural Networks and the Back-propagation Algorithm Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely

More information

CSC321 Lecture 5: Multilayer Perceptrons

CSC321 Lecture 5: Multilayer Perceptrons CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3

More information

Sample Exam COMP 9444 NEURAL NETWORKS Solutions

Sample Exam COMP 9444 NEURAL NETWORKS Solutions FAMILY NAME OTHER NAMES STUDENT ID SIGNATURE Sample Exam COMP 9444 NEURAL NETWORKS Solutions (1) TIME ALLOWED 3 HOURS (2) TOTAL NUMBER OF QUESTIONS 12 (3) STUDENTS SHOULD ANSWER ALL QUESTIONS (4) QUESTIONS

More information

Feedforward Neural Nets and Backpropagation

Feedforward Neural Nets and Backpropagation Feedforward Neural Nets and Backpropagation Julie Nutini University of British Columbia MLRG September 28 th, 2016 1 / 23 Supervised Learning Roadmap Supervised Learning: Assume that we are given the features

More information

EE04 804(B) Soft Computing Ver. 1.2 Class 2. Neural Networks - I Feb 23, Sasidharan Sreedharan

EE04 804(B) Soft Computing Ver. 1.2 Class 2. Neural Networks - I Feb 23, Sasidharan Sreedharan EE04 804(B) Soft Computing Ver. 1.2 Class 2. Neural Networks - I Feb 23, 2012 Sasidharan Sreedharan www.sasidharan.webs.com 3/1/2012 1 Syllabus Artificial Intelligence Systems- Neural Networks, fuzzy logic,

More information

2- AUTOASSOCIATIVE NET - The feedforward autoassociative net considered in this section is a special case of the heteroassociative net.

2- AUTOASSOCIATIVE NET - The feedforward autoassociative net considered in this section is a special case of the heteroassociative net. 2- AUTOASSOCIATIVE NET - The feedforward autoassociative net considered in this section is a special case of the heteroassociative net. - For an autoassociative net, the training input and target output

More information

DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY

DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo

More information

Lecture 4: Perceptrons and Multilayer Perceptrons

Lecture 4: Perceptrons and Multilayer Perceptrons Lecture 4: Perceptrons and Multilayer Perceptrons Cognitive Systems II - Machine Learning SS 2005 Part I: Basic Approaches of Concept Learning Perceptrons, Artificial Neuronal Networks Lecture 4: Perceptrons

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Integer weight training by differential evolution algorithms

Integer weight training by differential evolution algorithms Integer weight training by differential evolution algorithms V.P. Plagianakos, D.G. Sotiropoulos, and M.N. Vrahatis University of Patras, Department of Mathematics, GR-265 00, Patras, Greece. e-mail: vpp

More information

Artificial Neural Networks. Edward Gatt

Artificial Neural Networks. Edward Gatt Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very

More information

C4 Phenomenological Modeling - Regression & Neural Networks : Computational Modeling and Simulation Instructor: Linwei Wang

C4 Phenomenological Modeling - Regression & Neural Networks : Computational Modeling and Simulation Instructor: Linwei Wang C4 Phenomenological Modeling - Regression & Neural Networks 4040-849-03: Computational Modeling and Simulation Instructor: Linwei Wang Recall.. The simple, multiple linear regression function ŷ(x) = a

More information

Course 395: Machine Learning - Lectures

Course 395: Machine Learning - Lectures Course 395: Machine Learning - Lectures Lecture 1-2: Concept Learning (M. Pantic) Lecture 3-4: Decision Trees & CBC Intro (M. Pantic & S. Petridis) Lecture 5-6: Evaluating Hypotheses (S. Petridis) Lecture

More information

Unit III. A Survey of Neural Network Model

Unit III. A Survey of Neural Network Model Unit III A Survey of Neural Network Model 1 Single Layer Perceptron Perceptron the first adaptive network architecture was invented by Frank Rosenblatt in 1957. It can be used for the classification of

More information

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

Recurrent Neural Networks

Recurrent Neural Networks Recurrent Neural Networks Datamining Seminar Kaspar Märtens Karl-Oskar Masing Today's Topics Modeling sequences: a brief overview Training RNNs with back propagation A toy example of training an RNN Why

More information

AI Programming CS F-20 Neural Networks

AI Programming CS F-20 Neural Networks AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

More information

4. Multilayer Perceptrons

4. Multilayer Perceptrons 4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output

More information

Introduction to Machine Learning Spring 2018 Note Neural Networks

Introduction to Machine Learning Spring 2018 Note Neural Networks CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,

More information

Introduction to Artificial Neural Networks

Introduction to Artificial Neural Networks Facultés Universitaires Notre-Dame de la Paix 27 March 2007 Outline 1 Introduction 2 Fundamentals Biological neuron Artificial neuron Artificial Neural Network Outline 3 Single-layer ANN Perceptron Adaline

More information

CS 4700: Foundations of Artificial Intelligence

CS 4700: Foundations of Artificial Intelligence CS 4700: Foundations of Artificial Intelligence Prof. Bart Selman selman@cs.cornell.edu Machine Learning: Neural Networks R&N 18.7 Intro & perceptron learning 1 2 Neuron: How the brain works # neurons

More information

Chapter 9: The Perceptron

Chapter 9: The Perceptron Chapter 9: The Perceptron 9.1 INTRODUCTION At this point in the book, we have completed all of the exercises that we are going to do with the James program. These exercises have shown that distributed

More information

2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks. Todd W. Neller

2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks. Todd W. Neller 2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks Todd W. Neller Machine Learning Learning is such an important part of what we consider "intelligence" that

More information

Chapter 3 Supervised learning:

Chapter 3 Supervised learning: Chapter 3 Supervised learning: Multilayer Networks I Backpropagation Learning Architecture: Feedforward network of at least one layer of non-linear hidden nodes, e.g., # of layers L 2 (not counting the

More information

On the Hopfield algorithm. Foundations and examples

On the Hopfield algorithm. Foundations and examples General Mathematics Vol. 13, No. 2 (2005), 35 50 On the Hopfield algorithm. Foundations and examples Nicolae Popoviciu and Mioara Boncuţ Dedicated to Professor Dumitru Acu on his 60th birthday Abstract

More information

Instituto Tecnológico y de Estudios Superiores de Occidente Departamento de Electrónica, Sistemas e Informática. Introductory Notes on Neural Networks

Instituto Tecnológico y de Estudios Superiores de Occidente Departamento de Electrónica, Sistemas e Informática. Introductory Notes on Neural Networks Introductory Notes on Neural Networs Dr. José Ernesto Rayas Sánche April Introductory Notes on Neural Networs Dr. José Ernesto Rayas Sánche BIOLOGICAL NEURAL NETWORKS The brain can be seen as a highly

More information

Neural Networks. Nicholas Ruozzi University of Texas at Dallas

Neural Networks. Nicholas Ruozzi University of Texas at Dallas Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify

More information

Handout 2: Invariant Sets and Stability

Handout 2: Invariant Sets and Stability Engineering Tripos Part IIB Nonlinear Systems and Control Module 4F2 1 Invariant Sets Handout 2: Invariant Sets and Stability Consider again the autonomous dynamical system ẋ = f(x), x() = x (1) with state

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks Threshold units Gradient descent Multilayer networks Backpropagation Hidden layer representations Example: Face Recognition Advanced topics 1 Connectionist Models Consider humans:

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

Artificial Neural Networks

Artificial Neural Networks Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks

More information

Artificial Neural Network

Artificial Neural Network Artificial Neural Network Contents 2 What is ANN? Biological Neuron Structure of Neuron Types of Neuron Models of Neuron Analogy with human NN Perceptron OCR Multilayer Neural Network Back propagation

More information

PV021: Neural networks. Tomáš Brázdil

PV021: Neural networks. Tomáš Brázdil 1 PV021: Neural networks Tomáš Brázdil 2 Course organization Course materials: Main: The lecture Neural Networks and Deep Learning by Michael Nielsen http://neuralnetworksanddeeplearning.com/ (Extremely

More information

Artificial Neural Network

Artificial Neural Network Artificial Neural Network Eung Je Woo Department of Biomedical Engineering Impedance Imaging Research Center (IIRC) Kyung Hee University Korea ejwoo@khu.ac.kr Neuron and Neuron Model McCulloch and Pitts

More information

Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!

Artificial Neural Networks and Nonparametric Methods CMPSCI 383 Nov 17, 2011! Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error

More information

Lecture 5 Neural models for NLP

Lecture 5 Neural models for NLP CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm

More information

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable

More information

Neural Networks biological neuron artificial neuron 1

Neural Networks biological neuron artificial neuron 1 Neural Networks biological neuron artificial neuron 1 A two-layer neural network Output layer (activation represents classification) Weighted connections Hidden layer ( internal representation ) Input

More information

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension

More information

Reification of Boolean Logic

Reification of Boolean Logic 526 U1180 neural networks 1 Chapter 1 Reification of Boolean Logic The modern era of neural networks began with the pioneer work of McCulloch and Pitts (1943). McCulloch was a psychiatrist and neuroanatomist;

More information

18.6 Regression and Classification with Linear Models

18.6 Regression and Classification with Linear Models 18.6 Regression and Classification with Linear Models 352 The hypothesis space of linear functions of continuous-valued inputs has been used for hundreds of years A univariate linear function (a straight

More information

Neural Nets and Symbolic Reasoning Hopfield Networks

Neural Nets and Symbolic Reasoning Hopfield Networks Neural Nets and Symbolic Reasoning Hopfield Networks Outline The idea of pattern completion The fast dynamics of Hopfield networks Learning with Hopfield networks Emerging properties of Hopfield networks

More information

Neural Networks Lecture 6: Associative Memory II

Neural Networks Lecture 6: Associative Memory II Neural Networks Lecture 6: Associative Memory II H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011. A. Talebi, Farzaneh Abdollahi Neural

More information

Simple neuron model Components of simple neuron

Simple neuron model Components of simple neuron Outline 1. Simple neuron model 2. Components of artificial neural networks 3. Common activation functions 4. MATLAB representation of neural network. Single neuron model Simple neuron model Components

More information

CSC 411 Lecture 10: Neural Networks

CSC 411 Lecture 10: Neural Networks CSC 411 Lecture 10: Neural Networks Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 10-Neural Networks 1 / 35 Inspiration: The Brain Our brain has 10 11

More information

From perceptrons to word embeddings. Simon Šuster University of Groningen

From perceptrons to word embeddings. Simon Šuster University of Groningen From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written

More information

Neural Networks for Machine Learning. Lecture 11a Hopfield Nets

Neural Networks for Machine Learning. Lecture 11a Hopfield Nets Neural Networks for Machine Learning Lecture 11a Hopfield Nets Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed Hopfield Nets A Hopfield net is composed of binary threshold

More information

7 Rate-Based Recurrent Networks of Threshold Neurons: Basis for Associative Memory

7 Rate-Based Recurrent Networks of Threshold Neurons: Basis for Associative Memory Physics 178/278 - David Kleinfeld - Fall 2005; Revised for Winter 2017 7 Rate-Based Recurrent etworks of Threshold eurons: Basis for Associative Memory 7.1 A recurrent network with threshold elements The

More information

Multilayer Perceptrons (MLPs)

Multilayer Perceptrons (MLPs) CSE 5526: Introduction to Neural Networks Multilayer Perceptrons (MLPs) 1 Motivation Multilayer networks are more powerful than singlelayer nets Example: XOR problem x 2 1 AND x o x 1 x 2 +1-1 o x x 1-1

More information

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are

More information

Deep Feedforward Networks

Deep Feedforward Networks Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function

More information

An artificial neural networks (ANNs) model is a functional abstraction of the

An artificial neural networks (ANNs) model is a functional abstraction of the CHAPER 3 3. Introduction An artificial neural networs (ANNs) model is a functional abstraction of the biological neural structures of the central nervous system. hey are composed of many simple and highly

More information

Neural networks. Chapter 20. Chapter 20 1

Neural networks. Chapter 20. Chapter 20 1 Neural networks Chapter 20 Chapter 20 1 Outline Brains Neural networks Perceptrons Multilayer networks Applications of neural networks Chapter 20 2 Brains 10 11 neurons of > 20 types, 10 14 synapses, 1ms

More information

Convolutional Associative Memory: FIR Filter Model of Synapse

Convolutional Associative Memory: FIR Filter Model of Synapse Convolutional Associative Memory: FIR Filter Model of Synapse Rama Murthy Garimella 1, Sai Dileep Munugoti 2, Anil Rayala 1 1 International Institute of Information technology, Hyderabad, India. rammurthy@iiit.ac.in,

More information

7 Recurrent Networks of Threshold (Binary) Neurons: Basis for Associative Memory

7 Recurrent Networks of Threshold (Binary) Neurons: Basis for Associative Memory Physics 178/278 - David Kleinfeld - Winter 2019 7 Recurrent etworks of Threshold (Binary) eurons: Basis for Associative Memory 7.1 The network The basic challenge in associative networks, also referred

More information

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders

More information

Simple Neural Nets For Pattern Classification

Simple Neural Nets For Pattern Classification CHAPTER 2 Simple Neural Nets For Pattern Classification Neural Networks General Discussion One of the simplest tasks that neural nets can be trained to perform is pattern classification. In pattern classification

More information

Neural networks. Chapter 19, Sections 1 5 1

Neural networks. Chapter 19, Sections 1 5 1 Neural networks Chapter 19, Sections 1 5 Chapter 19, Sections 1 5 1 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 19, Sections 1 5 2 Brains 10

More information

Artificial Neural Networks. Part 2

Artificial Neural Networks. Part 2 Artificial Neural Netorks Part Artificial Neuron Model Folloing simplified model of real neurons is also knon as a Threshold Logic Unit x McCullouch-Pitts neuron (943) x x n n Body of neuron f out Biological

More information

Artificial Neural Networks. MGS Lecture 2

Artificial Neural Networks. MGS Lecture 2 Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation

More information

Object Recognition Using a Neural Network and Invariant Zernike Features

Object Recognition Using a Neural Network and Invariant Zernike Features Object Recognition Using a Neural Network and Invariant Zernike Features Abstract : In this paper, a neural network (NN) based approach for translation, scale, and rotation invariant recognition of objects

More information