Neural Networks and Fuzzy Logic Rajendra Dept.of CSE ASCET


UNIT-1

1.1 Definition

A neural network is a massively parallel distributed processing system made of highly interconnected neural computing elements that have the ability to learn and thereby acquire knowledge and make it available for use.

1.2 Benefits of neural networks

The use of neural networks offers the following useful properties and capabilities:

Nonlinearity: A network built from non-linear neurons is itself non-linear, and this nonlinearity is distributed throughout the network. This is particularly useful because many application areas of neural networks involve nonlinear systems.

Input-output mapping: A neural network is trained by presenting an input and adjusting the weights until the required output is obtained. As more training inputs are presented, a mapping from inputs to corresponding outputs is formed in the network. This feature is very useful in pattern recognition and classification applications.

Adaptivity: Neural networks have a built-in capability to adapt their synaptic weights to changes in the surrounding environment. A network trained to operate in a specific environment can easily be retrained to deal with minor changes in the operating conditions.

Evidential response: In pattern classification, a neural network can be designed to provide information not only about which particular pattern to select, but also about the confidence in the decision made. This information may be used to reject ambiguous patterns and thereby improve the classification performance of the network.

Fault tolerance: A neural network implemented in hardware has the potential to be inherently fault tolerant, or capable of robust computation, in the sense that its performance degrades gracefully under adverse conditions. Failure of a neuron or a connecting link only degrades the quality of performance instead of causing failure of the entire system.

VLSI implementability: The massively parallel nature of a neural network makes it well suited for implementation in VLSI technology.

Uniformity of analysis and design: Neural networks enjoy universality as information processors, in the sense that the same notation is used in all domains involving their application. Neurons, in one form or another, are an ingredient common to all neural networks, making it possible to share theories and learning algorithms across different applications.

1.3 Biological neural networks

1.3.1 Features:

Robustness and fault tolerance: The decay of nerve cells (neurons) does not affect the performance significantly.

Flexibility: The network automatically adjusts to a new environment without using any pre-programmed instructions.

Ability to deal with a variety of data situations: The network can deal with information that is fuzzy, probabilistic, noisy and inconsistent.

Collective computation: The network routinely performs many operations in parallel, and also performs a given task in a distributed manner.

1.3.2 Structure and working of a biological neural network:

The structure of a biological neuron is shown in the figure below.

Fig.1 Schematic of a typical neuron / nerve cell

The fundamental unit of the biological neural network is the neuron, or nerve cell. The neuron consists of a cell body, or soma, where the cell nucleus is located. Tree-like nerve fibres called dendrites are associated with the cell body; these dendrites receive signals from other neurons. Extending from the cell body is a single long fibre called the axon, which eventually branches into strands and sub-strands connecting to many other neurons at synaptic junctions, or synapses. The receiving ends of these junctions on other cells can be found both on the dendrites and on the cell body itself. The axon of a neuron leads to a few thousand synapses associated with other neurons.

The transmission of a signal from one cell to another at a synapse is a complex chemical process in which specific transmitter substances are released from the sending side of the junction. The effect is to raise or lower the electrical potential inside the body of the receiving cell. If this potential reaches a threshold, electrical activity in the form of short pulses is generated; the cell is then said to have fired. These electrical signals of fixed strength and duration are sent down the axon. Generally this electrical activity is confined to the interior of a neuron, whereas the chemical mechanism operates at the synapses. The dendrites serve as receptors for signals from other neurons, whereas the purpose of the axon is transmission of the generated neural activity to other nerve cells, to muscle fibres or to receptor neurons.

The size of the cell body of a typical neuron is approximately in the range 10-80 μm, and the dendrites and axons have diameters of the order of a few μm. The gap at the synaptic junction is about 200 nm wide. The total length of a neuron varies from about 0.01 mm for neurons in the human brain up to 1 m for neurons in the limbs. The speed of propagation of the discharge signal in the cells of the human brain is about 0.5-2 m/sec.

The cell body of a neuron acts as a kind of summing device. The net effect decays with a time constant of 5-10 msec, but if several signals arrive within such a period, their excitatory effects accumulate. When the total magnitude of the depolarization potential in the cell body exceeds the critical threshold (about 10 mV), the neuron fires.

1.4 Artificial Neural Networks (ANN)

An artificial neural network is an information processing system that has certain performance characteristics in common with biological neural networks. ANNs have been developed based on the following assumptions:

o Information processing occurs at many simple elements called neurons.
o Signals are passed between neurons over connection links.
o Each connection link has an associated weight, which, in a typical neural net, multiplies the signal transmitted.
o Each neuron applies an activation function to its net input (the sum of weighted input signals) to determine its output signal.

Neural networks are classified based on the following parameters:

Architecture
Learning/training mechanism
Activation function

Consider a simple neural network as shown in fig.2.

Fig.2 Simple neural network

Here X1, X2 and X3 are input neurons and Y is an output neuron. W1, W2 and W3 are the weights on the connection links between the inputs and the output. The net input y_in to the output neuron Y is given by:

    y_in = x1·w1 + x2·w2 + x3·w3

An activation function has to be applied to this net input y_in to get the output of the neuron:

    y = f(y_in)

1.5 Comparison of artificial and biological neural networks

Artificial neural network:
1. The cycle time corresponding to execution of one step of a program is of the order of a few nanoseconds; hence ANNs are faster in processing information.
2. Sequential mode of operation.
3. Size and complexity are less.
4. Old information is lost as new information comes in.
5. Fault tolerance is not satisfactory, as there is loss of quality due to a fault in some part of the network.
6. A control unit monitors all activities of computing.

Biological neural network:
1. The cycle time corresponding to a neural event prompted by an external stimulus is in the milliseconds range; hence BNNs are comparatively slow in processing.
2. Parallel operation.
3. Highly complex and large.
4. No information is lost due to the entry of new information.
5. Fault tolerance is very good, as damage to a neuron does not affect the performance significantly.
6. No central control unit is present for processing information in the brain.

1.6 Neuron modeling

A neuron is an information processing unit that is fundamental to the operation of a neural network. Fig.3 below shows the model of a neuron.

Fig.3 Non-linear model of a neuron
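To make the neuron computation of fig.2 and fig.3 concrete, the following short Python sketch (an illustration added here, not part of the original notes; the input, weight and bias values are arbitrary assumptions) computes the output of a single neuron with three inputs, a bias and a binary step activation.

import numpy as np

def step(y_in, theta=0.0):
    # Binary step (threshold) activation: fire (1) if the net input reaches theta.
    return 1 if y_in >= theta else 0

def neuron_output(x, w, b, activation=step):
    # Adder: weighted sum of the inputs plus the bias, then squashing by the activation.
    y_in = np.dot(x, w) + b
    return activation(y_in)

# Example: inputs x1..x3, weights w1..w3 and a bias chosen arbitrarily.
print(neuron_output(np.array([1, 0, 1]), np.array([0.5, -0.2, 0.3]), b=-0.4))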

The three basic elements of the neuron model are:

A set of synapses or connection links, each of which is characterized by a weight or strength of its own.
An adder for summing the input signals, weighted by the respective synapses of the neuron.
An activation function for limiting the amplitude of the output of the neuron. The activation function is also referred to as a squashing function, as it squashes (limits) the possible amplitude range of the output signal to some finite value.

The model also includes an external bias, denoted by b_k. The bias has the effect of increasing or decreasing the net input of the activation function, depending on whether it is positive or negative.

1.6.1 Activation functions

The activation function is used to limit the amplitude of the output of a neuron. The commonly used activation functions are:

Binary step function (with threshold θ)
Binary sigmoid
Bipolar sigmoid
Piecewise linear function

1.6.1.1 Binary step function (threshold function)

The binary step activation function, commonly referred to as the threshold function or Heaviside function, is defined as

    f(y_in) = 1 if y_in >= 0
            = 0 if y_in < 0

The threshold can also be a non-zero value θ, in which case the function is defined as

    f(y_in) = 1 if y_in >= θ
            = 0 if y_in < θ

This activation function is usually preferred in single-layer nets to convert the net input, which is a continuously valued variable, to an output that is binary (1 or 0) or bipolar (1 or -1).

1.6.1.2 Binary sigmoid function (logistic sigmoid)

Sigmoid functions (S-shaped curves) are the most common form of activation function. They are particularly advantageous in nets trained by back-propagation, because their simple derivative reduces the computational burden during training. The binary sigmoid / logistic function is defined as

    f(x) = 1 / (1 + exp(-σx))

where σ is the steepness parameter.

1.6.1.3 Bipolar sigmoid function (hyperbolic tangent)

This is the same as the binary sigmoid except that the range is -1 to 1 instead of 0 to 1. It is defined as

    g(x) = 2·f(x) - 1 = (1 - exp(-σx)) / (1 + exp(-σx))

1.6.1.4 Piecewise linear function

The piecewise linear function limits the output to a fixed range (for example 0 to 1), varying linearly with the net input between the two saturation limits, as shown below.

Fig. Piecewise linear function
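The following Python sketch (illustrative only, not from the original notes; the steepness parameter sigma is assumed to default to 1, and the piecewise linear variant shown is one common choice) implements the four activation functions listed above.

import numpy as np

def binary_step(x, theta=0.0):
    # 1 if the net input reaches the threshold theta, else 0.
    return np.where(x >= theta, 1, 0)

def binary_sigmoid(x, sigma=1.0):
    # Logistic sigmoid with steepness sigma, range (0, 1).
    return 1.0 / (1.0 + np.exp(-sigma * x))

def bipolar_sigmoid(x, sigma=1.0):
    # Scaled sigmoid, range (-1, 1); equivalent to tanh(sigma*x/2).
    return 2.0 * binary_sigmoid(x, sigma) - 1.0

def piecewise_linear(x):
    # Linear between -0.5 and 0.5, saturating at 0 and 1 (one common variant).
    return np.clip(x + 0.5, 0.0, 1.0)

print(binary_step(0.3), binary_sigmoid(0.3), bipolar_sigmoid(0.3), piecewise_linear(0.3))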

1.7 Neural network learning

Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. Learning methods can be classified as:

Supervised learning
Unsupervised learning
Reinforcement learning

1.7.1 Supervised learning:

Supervised learning is also referred to as learning with a teacher. Fig.4 below shows the block diagram that illustrates this form of learning.

Fig.4 Supervised learning

The teacher has knowledge about the environment and hence can provide the correct output for a particular condition of the environment, whereas the neural network does not have any knowledge of the environment. Since the teacher has good knowledge of the environment, its output can be taken as the desired response, and the neural network can be trained based on the difference between the actual response and the desired response (the error signal). This adjustment is carried out iteratively in a step-by-step fashion with the aim of eventually making the neural network emulate the teacher, i.e. making the error zero or minimal.

1.7.2 Unsupervised learning:

Unsupervised learning is also referred to as learning without a teacher, or self-organized learning. Fig.5 below illustrates this form of learning.

Fig.5 Unsupervised learning

Unlike in supervised learning, there is no teacher to present the desired response, so the neural network has to learn on its own. The network does so by discovering and adapting to structural features in the input patterns.

1.7.3 Reinforcement learning:

This learning method is less commonly used. It is also referred to as neurodynamic programming. Fig.6 below illustrates reinforcement learning.

Fig.6 Reinforcement learning

In this learning method, the learning of an input-output mapping is performed through continued interaction with the environment in order to minimize a scalar index

of performance. The critic converts a primary reinforcement signal received from the environment into a higher quality reinforcement signal called the heuristic reinforcement signal, both of which are scalar inputs. The critic can also be viewed as a teacher who does not present the desired output, but just tells whether the output is correct or not, based on which the necessary adjustments are made in the network.

1.8 Neural network learning rules

A learning rule decides how the weights are to be adjusted so that the network learns how to adapt to changing environments (situations); i.e. a learning rule decides the amount by which a weight has to be modified.

1.8.1 Requirements of learning rules/laws:

The learning should lead to convergence of the weights.
The learning time should be as small as possible.
On-line training is preferred to off-line training, i.e. the weights should be adjusted on presentation of each sample and not separately.
Learning should use only local information as far as possible, i.e. the change in weight on a connection link between two units should depend on the states of those two units only. In such a case it is possible to implement the learning law in parallel for all weights, thus speeding up the learning process.

1.8.2 Learning rules:

The commonly used learning rules are:

Hebbian learning rule
Perceptron learning rule
Delta learning rule
Widrow-Hoff learning rule
Correlation learning rule
Winner-take-all learning rule
Outstar learning rule

1.8.2.1 Hebbian learning rule:

According to the Hebb rule, learning occurs by modification of the synaptic strengths (weights) in such a manner that if two interconnected neurons are both on or both off at the same time, then the weight between those neurons should be increased. The rule can be expressed as:

    w_i(new) = w_i(old) + x_i·y     or     Δw_i = x_i·y

The features of the Hebbian rule are:

It can be applied only to purely feed-forward networks (fig.7).
It is an unsupervised learning technique.
The weights are initialized to zero.
Any kind of activation function can be used for this learning.
Only a particular neuron is updated at a time, instead of a layer of neurons.

Fig.7 A purely feed-forward network used for Hebbian and Perceptron learning

1.8.2.2 Perceptron learning rule:

Unlike in the Hebb net, where the weight change depends on the input and the actual output, here the weight change depends on the desired output. The rule is expressed as:

    Δw_i = α·t·x_i   (applied only when the response is in error)

where α is the learning rate (0 < α ≤ 1) and t is the target or desired output.
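As an illustration (added here, not from the original notes; the vectors and learning rate are arbitrary assumptions), the two update rules above can be written in Python as follows.

import numpy as np

def hebb_update(w, x, y):
    # Hebb rule: increase each weight by the product of input and output activations.
    return w + x * y

def perceptron_update(w, x, t, y, alpha=1.0):
    # Perceptron rule: adjust towards the target only when the response is wrong.
    if y != t:
        w = w + alpha * t * x
    return w

w = np.zeros(3)
x = np.array([1, -1, 1])
print(hebb_update(w, x, y=1))                  # Hebbian step
print(perceptron_update(w, x, t=1, y=-1))      # Perceptron step on an erroneous response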

Some features of Perceptron learning are:

It is a supervised learning technique.
The weights can be initialized to any values.
The rule is applicable only for a binary neuron response (i.e. only a binary activation function can be used).
Only a particular neuron is updated at a time, instead of a layer of neurons.

1.8.2.3 Delta learning rule:

Fig.8 Network used for the delta, Widrow-Hoff and correlation learning rules

The delta rule changes the weights of the neural connections (synaptic weights) so as to minimize the difference between the net input to the output unit (y_in) and the target value t. The aim is to minimize the squared error. The rule is expressed as:

    Δw_i = α (t - y_in) x_i   and   Δb = α (t - y_in)

or equivalently

    w_i(new) = w_i(old) + α (t - y_in) x_i   and   b(new) = b(old) + α (t - y_in)

where α is the learning rate and b is the bias.

Some features of the delta rule are:

Learning is supervised.
It is applicable only for nets making use of continuous activation functions.
Weight adjustment is done to minimize the square of the error.
Weights can be initialized to any values.
Only a particular neuron is updated at a time, instead of a layer of neurons.

1.8.2.4 Widrow-Hoff learning rule:

This is the same as the delta rule and is also referred to as the Least Mean Square (LMS) learning rule. The rule can be expressed as:

    Δw_i = α (t - y_in) x_i

Some salient features of this rule are:

Learning is supervised.
It can be applied for nets with any activation function.
Weight adjustment is done to minimize the squared error.
Weights can be initialized to any values.
Only a particular neuron is updated at a time, instead of a layer of neurons.

1.8.2.5 Correlation learning rule:

The rule states that if t is the desired response due to an input x_i, then the corresponding weight increase is proportional to their product. The rule can be expressed as:

    Δw_i = α·t·x_i

Some features of the correlation rule are:

Learning is supervised.
It can be applied for nets with any activation function.
Weights should be initialized to zero.
Only a particular neuron is updated at a time, instead of a layer of neurons.
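Before moving on to the remaining rules, here is a minimal Python sketch of one delta-rule (LMS) update as described above (the input, target and learning rate are arbitrary assumptions used only for illustration):

import numpy as np

def delta_update(w, b, x, t, alpha=0.1):
    # Net input of the single output unit.
    y_in = np.dot(x, w) + b
    # Move the weights and bias along the negative gradient of the squared error (t - y_in)^2.
    w = w + alpha * (t - y_in) * x
    b = b + alpha * (t - y_in)
    return w, b

w, b = np.zeros(2), 0.0
w, b = delta_update(w, b, x=np.array([1.0, -1.0]), t=1.0)
print(w, b)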

1.8.2.6 Winner-take-all learning rule:

Fig.9 Network for the winner-take-all learning rule

In this learning rule, the response of all output neurons to the input is calculated first. The output neuron with the maximum net output is declared the winner, and only the weights connected to that output neuron are updated. In fig.9, the m-th output neuron is assumed to be the winner, and hence only the weights w_1m, ..., w_im, ..., w_nm are updated. This rule is used for learning the statistical properties of the inputs. The rule can be expressed as:

    Δw_im = α (x_i - w_im)

where α is the learning constant. The learning constant decreases as the learning progresses.

Some features of this rule are:

Learning is unsupervised.
It can be applied for nets using a continuous activation function.
Weights are initialized to random values and their lengths are normalized during learning.
A layer of neurons is updated at a time, unlike in the previous cases where only a particular neuron was updated.
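A short Python sketch of one winner-take-all step (the random weights, the single input vector and the fixed learning constant are illustrative assumptions, not values from the notes):

import numpy as np

def winner_take_all_step(W, x, alpha=0.2):
    # W has one column of weights per output neuron; the neuron with the
    # largest net input wins and only its weights move towards the input.
    net = x @ W
    m = np.argmax(net)                       # index of the winning output neuron
    W[:, m] += alpha * (x - W[:, m])         # update the winner's weights only
    return W, m

rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, size=(3, 4))      # 3 inputs, 4 output neurons
x = np.array([1.0, 0.0, -1.0])
W, winner = winner_take_all_step(W, x)
print(winner, W[:, winner])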

1.8.2.7 Outstar learning rule:

This rule is used to provide learning of repetitive and characteristic properties of input-output relationships. The weight adjustment can be expressed as:

    Δw_k = α (t_k - w_k)

where α is the learning rate, which decreases as learning progresses. The important difference between this rule and the others is that the weights fan out of the i-th node; hence the weight vector, which is usually taken to be the set of weights converging on a node, is here the set of weights fanning out of the node, i.e. w_i = [w_i1, w_i2, ..., w_im].

Some features of the outstar learning rule are:

Learning is supervised.
It can be applied only for nets with continuous activation functions.
Weights should be initialized to zero.
A layer of neurons is updated instead of a single neuron.

1.9 Single-layer feed-forward networks

Based on architecture, neural networks can be classified as:

Single-layer feed-forward networks,
Multi-layer feed-forward networks,
Recurrent (feedback) networks.

Feedback networks are those in which there is feedback of signal from output neurons or intermediate neurons to the neurons before them. Feed-forward networks, on the other hand, do not have any feedback path.

In a single-layer net there is only one layer of computation nodes. A computation node is one where some sort of computation or calculation takes place. In a two-layer net there are two layers of computation nodes: one is the output layer and the other is called the hidden layer. Similarly, an n-layer net has n computation layers, of which one is the output layer and the remaining n-1 are hidden layers.

Some common configurations of single-layer feed-forward networks are:

Fig.10a A simple single-layer feed-forward net with one output neuron

Fig.10b A single-layer feed-forward net with many output neurons

Some examples of single-layer feed-forward neural networks are:

McCulloch-Pitts neuron
Hebb net
Perceptron
ADALINE

1.9.1 McCulloch-Pitts neuron:

The McCulloch-Pitts neuron is perhaps the earliest artificial neuron. The requirements for a McCulloch-Pitts neuron are:

The activation of the neuron is binary, i.e. at any time step the neuron either fires (1) or does not fire (0).
The neurons are connected by directed, weighted paths.
A connection path is excitatory if the weight on the path is positive; otherwise it is inhibitory.
All the excitatory connections into a particular neuron have the same weight.
Each neuron has a fixed threshold such that if the net input to the neuron is greater than the threshold, the neuron fires.
The threshold is set so that inhibition is absolute, i.e. any non-zero inhibitory input will prevent the neuron from firing.
It takes one time step for a signal to pass over one connection link.

Architecture:

The general architecture of a McCulloch-Pitts neuron is shown in fig.11 below.

Fig.11 McCulloch-Pitts neuron

Each connection path is either excitatory, with weight w (w > 0), or inhibitory, with weight -p (p > 0). We assume that there are n units x_1, x_2, ..., x_n which send excitatory signals to unit y, and m units x_(n+1), x_(n+2), ..., x_(n+m) which send inhibitory signals. The activation function for unit y is

    f(y_in) = 1 if y_in >= θ
            = 0 if y_in < θ

where y_in is the total input signal received and θ is the threshold. To satisfy the condition that inhibition should be absolute, θ should satisfy the condition

    θ > n·w - p

Implementation of logic functions:

A threshold of 2 is assumed for the implementation of all the following logic functions.

AND

The truth table and implementation of AND using a McCulloch-Pitts neuron are shown below.

    x1  x2  y
    1   1   1
    1   0   0
    0   1   0
    0   0   0

Fig.12 McCulloch-Pitts neuron for implementing AND logic and its truth table

OR

The truth table and implementation of OR using a McCulloch-Pitts neuron are shown below.

    x1  x2  y
    1   1   1
    1   0   1
    0   1   1
    0   0   0

Fig.13 McCulloch-Pitts neuron for implementing OR logic and its truth table

AND-NOT

The response is true if the first input is true and the second is false; otherwise the output is false. The truth table and implementation of AND-NOT using a McCulloch-Pitts neuron are shown below.

    x1  x2  y
    1   1   0
    1   0   1
    0   1   0
    0   0   0

Fig.14 McCulloch-Pitts neuron for implementing AND-NOT logic and its truth table

XOR

XOR is built from two AND-NOT units and an OR unit, since

    x1 XOR x2 = (x1 AND-NOT x2) OR (x2 AND-NOT x1) = z1 OR z2

The truth table and implementation of XOR using McCulloch-Pitts neurons are shown below.

    x1  x2  y
    1   1   0
    1   0   1
    0   1   1
    0   0   0

Fig.15 McCulloch-Pitts neurons for implementing XOR logic and its truth table

In the case of all the simple logic functions described earlier, a uniform threshold value of 2 was assumed for all the neurons. For the implementation of logic functions like NAND and NOR, however, different threshold values are assumed for different neurons.

NAND

The truth table and implementation of NAND using McCulloch-Pitts neurons are shown below.

    x1  x2  y
    1   1   0
    1   0   1
    0   1   1
    0   0   1

Fig.16 McCulloch-Pitts neurons for implementing NAND logic and its truth table

Here the neurons in the hidden layer (z1, z2) and the output neuron (y) are assigned different threshold values, as indicated in the figure.

NOR

The truth table and implementation of NOR using McCulloch-Pitts neurons are shown below.

    x1  x2  y
    1   1   0
    1   0   0
    0   1   0
    0   0   1

Here the neuron in the hidden layer (z1) and the output neuron (y) are assigned different threshold values, as indicated in the figure.
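A small Python sketch of a McCulloch-Pitts unit and the AND, OR, AND-NOT and XOR constructions above (the specific weight choices with threshold 2 follow the standard textbook construction and are stated here as assumptions; the function names are illustrative):

def mp_neuron(inputs, weights, theta):
    # McCulloch-Pitts unit: fire (1) if the weighted sum reaches the threshold.
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0

def AND(x1, x2):      return mp_neuron([x1, x2], [1, 1], theta=2)
def OR(x1, x2):       return mp_neuron([x1, x2], [2, 2], theta=2)
def AND_NOT(x1, x2):  return mp_neuron([x1, x2], [2, -1], theta=2)

def XOR(x1, x2):
    # x1 XOR x2 = (x1 AND-NOT x2) OR (x2 AND-NOT x1)
    return OR(AND_NOT(x1, x2), AND_NOT(x2, x1))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), AND_NOT(a, b), XOR(a, b))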

Advantages

Simple in construction.

Disadvantages

The weights are fixed, hence the network does not learn from examples.

Linear separability:

The decision boundary is the boundary between the regions where y_in > 0 and y_in < 0, which is determined by the relation

    b + Σ_i x_i·w_i = 0

If weights exist such that all of the training input vectors for which the correct response is +1 lie on one side of the decision boundary and all the training input vectors for which the correct response is -1 lie on the other side, the problem is said to be linearly separable.

Linear separability is an important concept for single-layer nets, as these can be applied only to linearly separable problems. Linear separability is hence important for the Hebb net, the perceptron and the ADALINE. The decision boundaries of the logical AND and OR functions are as shown below:

Fig. Decision boundaries for the AND and OR functions

1.9.2 Hebb net:

A single-layer feed-forward neural network trained using the Hebb rule is termed a Hebb net. The Hebb net is shown in fig.16 below.

Fig.16 Hebb net

A bias acts as a weight on a connection from a unit whose activation is always 1. Increasing the bias increases the net input to the unit. It is denoted by b.

Hebb rule:

Hebb proposed that learning occurs by modification of the synaptic strengths (weights) in such a manner that if two interconnected neurons are both on at the same time, then the weight between those neurons should be increased. The extended Hebb rule states that learning occurs not only when both neurons are on, but also when both neurons are off at the same time.

Algorithm:

Step 0: Initialize all weights to zero, i.e. w_i = 0 (i = 1 to n).
Step 1: For each training vector and target output pair s : t, do steps 2-4.
    Step 2: Set activations of the input units: x_i = s_i (i = 1 to n).
    Step 3: Set activation of the output unit: y = t.
    Step 4: Adjust the weights: w_i(new) = w_i(old) + x_i·y, for i = 1 to n.
            Adjust the bias: b(new) = b(old) + y.

Disadvantages:

The rule cannot be applied to binary data. (With binary data the method cannot distinguish between a training pair in which the input is on and the target is off, and a training pair in which both the input and the target are off.)
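The following Python sketch trains a Hebb net with bias on the AND function using bipolar data, as a concrete illustration of steps 0-4 above (the training set and its bipolar encoding are assumptions made for this example, not data from the notes):

import numpy as np

def train_hebb(samples, targets):
    n = samples.shape[1]
    w, b = np.zeros(n), 0.0                  # Step 0: weights and bias start at zero
    for s, t in zip(samples, targets):       # Step 1: loop over training pairs s : t
        x, y = s, t                          # Steps 2-3: set input and output activations
        w = w + x * y                        # Step 4: Hebbian weight adjustment
        b = b + y                            #          and bias adjustment
    return w, b

# Bipolar AND: output +1 only when both inputs are +1.
samples = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
targets = np.array([1, -1, -1, -1])
print(train_hebb(samples, targets))          # yields weights [2, 2] and bias -2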

1.9.3 Perceptron:

The Perceptron learning rule is more powerful than the Hebb rule. As the learning is iterative, the weights converge to correct values, i.e. weights such that the output is correct for each of the training input patterns.

Perceptron rule:

Unlike in the Hebb net, where the weight change depends on the input and the actual output, here the weight change depends on the desired output. The rule is expressed as:

    Δw_i = α·t·x_i   (applied only when the response is in error)

where α is the learning rate (0 < α ≤ 1) and t is the target or desired output.

Architecture:

Fig.17 Perceptron

Algorithm:

Unlike the Hebb rule, which is applicable only for bipolar data, the Perceptron is applicable for both binary and bipolar input vectors, with a bipolar target, a fixed threshold θ and an adjustable bias. The algorithm is not sensitive to the initial values of the weights or to the value of the learning rate.

Step 0: Initialize the weights and bias (for simplicity, set the weights and bias to zero).
        Set the learning rate α (0 < α ≤ 1) (for simplicity, α is set to 1).
Step 1: While the stopping condition is false, do steps 2-6.
    Step 2: For each training pair s : t, do steps 3-5.
        Step 3: Set activations of the input units: x_i = s_i.
        Step 4: Compute the response of the output unit:

            y_in = b + Σ_i x_i·w_i

            y = 1  if y_in > θ
              = 0  if -θ <= y_in <= θ
              = -1 if y_in < -θ

        Step 5: Update the weights and bias if an error occurred for this pattern:
                if y ≠ t:  w_i(new) = w_i(old) + α·t·x_i  and  b(new) = b(old) + α·t
                else:      w_i(new) = w_i(old)  and  b(new) = b(old)
    Step 6: Test the stopping condition: if no weights changed in step 2, stop; else continue.

Only the weights connecting active input units (x_i ≠ 0) are updated. Also, weights are updated only for the patterns that do not produce the correct value of y; this means that as more training patterns produce the correct response, less learning occurs. The threshold on the activation function for the output unit is a fixed, non-negative value.

Applications: Perceptrons find application in:

o Implementation of logic functions
o Character recognition
o Pattern classification
o Classification of noisy patterns, etc.

Advantages:

The weights converge to correct values.
Learning decreases as more training patterns are presented to the network.

1.9.4 ADALINE (ADAptive LInear NEuron)

The ADALINE uses bipolar (1 or -1) activations for its input signals and its target outputs. The weights on the connections from the input units to the ADALINE are adjustable. The ADALINE also has a bias, which acts like an adjustable weight on a connection whose activation is always 1. An ADALINE has only one output unit. The delta rule, also known as the Least Mean Square (LMS) rule or Widrow-Hoff rule, is used for training the ADALINE.

Delta rule:

The delta rule changes the weights of the neural connections (synaptic weights) so as to minimize the difference between the net input to the output unit (y_in) and the target value t. The aim is to minimize the error.

The rule is expressed as:

    Δw_i = α (t - y_in) x_i   and   Δb = α (t - y_in)

or equivalently

    w_i(new) = w_i(old) + α (t - y_in) x_i   and   b(new) = b(old) + α (t - y_in)

where α is the learning rate and b is the bias.

Architecture:

An ADALINE is a single unit (neuron) that receives input from several units. It also receives input from a unit whose signal is always 1. An ADALINE is shown in fig.18 below.

Fig.18 Architecture of an ADALINE

Algorithm:

Step 0: Initialize the weights to small random values.
        Set the learning rate α (a common choice satisfies 0.1 ≤ n·α ≤ 1.0, where n is the number of input units).
Step 1: While the stopping condition is false, do steps 2-6.
    Step 2: For each bipolar training pair s : t, do steps 3-5.
        Step 3: Set activations of the input units, i = 1 to n: x_i = s_i.
        Step 4: Compute the net input to the output unit:

            y_in = b + Σ_i x_i·w_i

        Step 5: Update the weights and bias:

            w_i(new) = w_i(old) + α (t - y_in) x_i   and   b(new) = b(old) + α (t - y_in)

    Step 6: Test for the stopping condition: if the largest weight change that occurred in step 2 is smaller than a specified tolerance, then stop; otherwise continue.

Choice of learning rate (α):

If too large a value is chosen for α, the learning process will not converge; if too small a value is selected, learning will be very slow. So the value has to be chosen carefully. The upper bound can be determined from the largest eigenvalue λ_max of the correlation matrix R of the input vectors,

    R = (1/P) Σ_p x(p)·x(p)^T

from which α is chosen smaller than a bound of the order of 1/λ_max. Usually, instead of calculating R, a small value (say α = 0.1) is chosen, or α can be selected using the relation 0.1 ≤ n·α ≤ 1.0.
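A compact Python sketch of the ADALINE training loop above, trained on the bipolar AND problem (the data set, tolerance, learning rate and epoch limit are illustrative assumptions, not values from the notes):

import numpy as np

def train_adaline(samples, targets, alpha=0.1, tol=0.01, max_epochs=100):
    rng = np.random.default_rng(0)
    w = rng.uniform(-0.5, 0.5, samples.shape[1])   # Step 0: small random weights
    b = rng.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                    # Step 1
        largest_change = 0.0
        for x, t in zip(samples, targets):         # Steps 2-3
            y_in = b + x @ w                       # Step 4: net input
            dw = alpha * (t - y_in) * x            # Step 5: delta-rule updates
            db = alpha * (t - y_in)
            w, b = w + dw, b + db
            largest_change = max(largest_change, np.max(np.abs(dw)), abs(db))
        if largest_change < tol:                   # Step 6: stopping condition
            break
    return w, b

samples = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
targets = np.array([1, -1, -1, -1], dtype=float)
print(train_adaline(samples, targets))   # weights settle near [0.5, 0.5], bias near -0.5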

1.10 Multi-layer feed-forward networks:

A multi-layer network consists of one or more hidden layers. Hidden layers are computational layers, in the sense that some sort of computation is performed in these layers. Hidden layers are used to improve the computational capability of the network. A two-layer net consists of two layers of computation nodes, of which one is the output layer and the other is a hidden layer. Similarly, a three-layer net consists of two hidden layers and one output layer. The MADALINE is a good example of a multi-layer network.

1.10.1 MADALINE (Multi ADALINE)

A MADALINE consists of many ADAptive LInear NEurons. It makes use of a bipolar activation function, just as an ADALINE does.

Architecture:

Fig.19 MADALINE architecture

x1 and x2 are input units, z1 and z2 are the hidden units, and y is the output unit. The hidden layer gives the network computational capabilities not possible with single-layer nets.

Algorithm:

There are two versions of the training algorithm for the MADALINE. The first version (MRI) adjusts only the weights leading into the hidden-layer neurons; the weights into the output neuron are fixed. The second version (MRII) allows adjustment of all the weights in the network.

In the MRI algorithm, the weights v1, v2 and the bias b3 are fixed so that the response of the output unit y is 1 if the signal it receives from either z1 or z2 (or both) is 1, and is -1 if both z1 and z2 send a signal of -1. Hence y can be thought of as performing the OR operation on z1 and z2.

MRI Algorithm:

Step 0: Initialize weights. (v1, v2 and b3 are set as described above; the other weights are initialized to small random values.) Set the learning rate α to a small value.
Step 1: While the stopping condition is false, do steps 2-8.
    Step 2: For each bipolar training pair s : t, do steps 3-7.
        Step 3: Set activations of the input units, i = 1 to n: x_i = s_i.
        Step 4: Compute the net input to each hidden ADALINE unit:

            z_in1 = b1 + x1·w11 + x2·w21
            z_in2 = b2 + x1·w12 + x2·w22

        Step 5: Determine the output of each hidden ADALINE unit (bipolar step function):

            z1 = f(z_in1),  z2 = f(z_in2)

        Step 6: Determine the output of the net:

            y_in = b3 + z1·v1 + z2·v2,  y = f(y_in)

        Step 7: Determine the error and update the weights:
                If t = y, no weight updates are performed. Otherwise:

                If t = 1, then update the weights on Z_J, the unit whose net input is closest to 0:

                    b_J(new) = b_J(old) + α (1 - z_in_J)
                    w_iJ(new) = w_iJ(old) + α (1 - z_in_J) x_i

                If t = -1, then update the weights on all units Z_k that have positive net input:

                    b_k(new) = b_k(old) + α (-1 - z_in_k)
                    w_ik(new) = w_ik(old) + α (-1 - z_in_k) x_i

    Step 8: Test the stopping condition. If the weight changes have stopped (or reached an acceptable level), or if a specified maximum number of weight-update iterations has been performed, then stop; otherwise continue.

MRII Algorithm:

Step 0: Initialize all weights to small random values. Set the learning rate α to a small value.
Step 1: While the stopping condition is false, do steps 2-8.
    Step 2: For each bipolar training pair s : t, do steps 3-7.
        Step 3: Set activations of the input units, i = 1 to n: x_i = s_i.
        Step 4: Compute the net input to each hidden ADALINE unit (as in MRI).
        Step 5: Determine the output of each hidden ADALINE unit (as in MRI).
        Step 6: Determine the output of the net (as in MRI).

        Step 7: Determine the error and update the weights if necessary:
                If t ≠ y, do steps 7a-7b for each hidden unit whose net input is sufficiently close to 0, starting with the unit whose net input is closest to 0, then the next closest, and so on.

                Step 7a: Change the unit's output (from +1 to -1 or vice versa).
                Step 7b: Re-compute the response of the net. If the error is reduced, adjust the weights on this unit (use its newly assigned output value as the target and apply the delta rule).

    Step 8: Test the stopping condition. If the weight changes have stopped (or reached an acceptable level), or if a specified maximum number of weight-update iterations has been performed, then stop; otherwise continue.

1.11 Back-propagation algorithm:

Multi-layer networks are also referred to as Multi-Layer Perceptrons (MLPs). The back-propagation algorithm is a powerful algorithm for training MLPs. It is sometimes referred to as the error back-propagation algorithm, the generalized delta rule or the error-correction rule.

Error back-propagation learning consists of two passes through the different layers of the network: a forward pass and a backward pass. In the forward pass, an input is applied to the sensory nodes of the network (the input neurons), and its effect propagates through the network layer by layer; finally, a set of outputs is produced as the actual response of the network. During the forward pass, the weights of the network are all fixed. During the backward pass, on the other hand, the weights are adjusted in accordance with an error-correction rule. The weight adjustment is based on the error between the desired and actual outputs, which is propagated backward from the output layer to the layers before it; hence the method is referred to as the error back-propagation algorithm, or simply the back-propagation algorithm.

The training of a network by back-propagation involves three stages:

Feed-forward of the input training pattern,
Calculation and back-propagation of the associated error, and
Adjustment of the weights.

One distinct advantage of multi-layer nets is that they are capable of learning complex input-output mappings, whereas single-layer nets are limited to simple (linearly separable) input-output mappings.

1.11.1 Architecture:

Fig.20 Back-propagation neural net architecture

A multi-layer neural net with one layer of hidden units (the Z units) is shown in fig.20 above. The output units and the hidden units have biases. The bias on a typical output unit Y_k is denoted by w_0k, and the bias on a typical hidden unit Z_j is denoted by v_0j. Only the direction of information flow for the forward pass is shown in the figure; the flow is in the opposite direction during the backward pass and is not shown in the architecture.

1.11.2 Activation function:

An activation function for a back-propagation net should have the following characteristics:

It should be continuous.
It should be differentiable.
It should be monotonically non-decreasing.
Its derivative should be easy to compute.

The binary sigmoid is usually used as the activation function for back-propagation nets. It is defined, with range (0, 1), as:

    f(x) = 1 / (1 + exp(-x))                    ... (1)

Its derivative is

    f'(x) = exp(-x) / (1 + exp(-x))^2           ... (2)

Now (2) can be expressed as

    f'(x) = f(x) [1 - f(x)]                     ... (3)

From (3) it is clear that f(x) is differentiable and that its derivative is easy to compute once f(x) is known.

Another common activation function is the bipolar sigmoid, defined with range (-1, 1) as

    g(x) = 2 / (1 + exp(-x)) - 1                ... (4)

    g'(x) = (1/2) [1 + g(x)] [1 - g(x)]         ... (5)

From (5) it is clear that the bipolar sigmoid function is also differentiable and easily computable.

1.11.3 Algorithm:

Training a neural net by back-propagation involves three stages: feed-forward of the input training pattern, back-propagation of the associated error, and adjustment of the weights.

During feed-forward, each input unit (X_i) receives an input signal and broadcasts this signal to each of the hidden units (Z_1, ..., Z_p). Each hidden unit then computes its activation and sends its signal (z_j) to each output unit. Each output unit (Y_k) computes its activation (y_k) to form the response of the net for the given input pattern.

During training, each output unit compares its computed output y_k with its target value t_k to determine the associated error for that pattern with that unit. Based on this error, the factor δ_k (k = 1 to m) is computed. δ_k is used to distribute the error at the output unit Y_k back to all units in the previous layer, and is also used to update the weights between the output and the hidden layer. In a similar fashion, the factor δ_j (j = 1 to p) is computed for each hidden unit Z_j. It is not necessary to propagate the error back to the input layer, but δ_j is used to update the weights between the hidden and the input layer.

After all the δ factors have been determined, the weights for all layers are adjusted simultaneously. The adjustment to the weight w_jk is based on the factor δ_k and the activation z_j. The adjustment to the weight v_ij is based on the factor δ_j and the activation x_i.

The algorithm is as follows:

Step 0: Initialize the weights (set to small random values).
Step 1: While the stopping condition is false, do steps 2-9.
    Step 2: For each training pair, do steps 3-8.

    Feed-forward:

        Step 3: Each input unit (X_i, i = 1, 2, ..., n) receives the input signal x_i and broadcasts this signal to all units in the layer above it (the hidden layer).
        Step 4: Each hidden unit (Z_j, j = 1, 2, ..., p) sums its weighted input signals,

            z_in_j = v_0j + Σ_i x_i·v_ij

            applies its activation function to compute its output signal,

            z_j = f(z_in_j)

            and sends this signal to all units in the layer above it (the output layer).
        Step 5: Each output unit (Y_k, k = 1, 2, ..., m) sums its weighted input signals,

            y_in_k = w_0k + Σ_j z_j·w_jk

            and applies its activation function to compute its output signal,

            y_k = f(y_in_k)

    Back-propagation of error:

        Step 6: Each output unit (Y_k, k = 1, 2, ..., m) receives a target pattern corresponding to the input training pattern and computes its error information term,

            δ_k = (t_k - y_k) f'(y_in_k)

            calculates its weight correction term (used to update w_jk later),

            Δw_jk = α δ_k z_j

            calculates its bias correction term (used to update w_0k later),

            Δw_0k = α δ_k

            and sends δ_k to the units in the layer below.
        Step 7: Each hidden unit (Z_j, j = 1, 2, ..., p) sums its delta inputs (from the units in the layer above),

            δ_in_j = Σ_k δ_k·w_jk

            multiplies by the derivative of its activation function to calculate its error information term,

            δ_j = δ_in_j f'(z_in_j)

            calculates its weight correction term (used to update v_ij later),

            Δv_ij = α δ_j x_i

            and calculates its bias correction term (used to update v_0j later),

            Δv_0j = α δ_j

    Update weights and biases:

        Step 8: Each output unit (Y_k, k = 1 to m) updates its weights and bias (j = 0 to p):

            w_jk(new) = w_jk(old) + Δw_jk

            Each hidden unit (Z_j, j = 1 to p) updates its weights and bias (i = 0 to n):

            v_ij(new) = v_ij(old) + Δv_ij

    Step 9: Test the stopping condition.
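The following self-contained Python sketch implements the training loop above for a net with one hidden layer and binary sigmoid activations, trained on the XOR problem (the data set, layer sizes, learning rate, epoch count and random seed are illustrative assumptions, not values from the notes):

import numpy as np

def f(x):              # binary sigmoid, eq. (1)
    return 1.0 / (1.0 + np.exp(-x))

def f_prime(fx):       # derivative expressed via f(x), eq. (3)
    return fx * (1.0 - fx)

def train_backprop(X, T, p=4, alpha=0.5, epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], T.shape[1]
    V = rng.uniform(-0.5, 0.5, (n + 1, p))     # input-to-hidden weights, row 0 holds the biases v_0j
    W = rng.uniform(-0.5, 0.5, (p + 1, m))     # hidden-to-output weights, row 0 holds the biases w_0k
    for _ in range(epochs):
        for x, t in zip(X, T):
            # Steps 3-5: feed-forward
            z = f(V[0] + x @ V[1:])
            y = f(W[0] + z @ W[1:])
            # Step 6: error term for the output layer
            delta_k = (t - y) * f_prime(y)
            # Step 7: error term for the hidden layer
            delta_j = (delta_k @ W[1:].T) * f_prime(z)
            # Step 8: simultaneous weight and bias updates
            W[1:] += alpha * np.outer(z, delta_k); W[0] += alpha * delta_k
            V[1:] += alpha * np.outer(x, delta_j); V[0] += alpha * delta_j
    return V, W

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
V, W = train_backprop(X, T)
print(np.round(f(W[0] + f(V[0] + X @ V[1:]) @ W[1:]), 2))   # outputs should approach [0, 1, 1, 0]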

1.12 Learning factors:

The mathematical basis for the back-propagation algorithm is the optimization technique known as gradient descent. The gradient of a function gives the direction in which the function increases most rapidly; the negative of the gradient gives the direction in which the function decreases most rapidly.

Implementation of the back-propagation algorithm encounters some difficulties. One of the problems is that the error-minimization procedure may produce only a local minimum of the error function. Fig.21 shows a typical cross-section of the error space along one weight dimension. It can be seen that the error is a non-negative function of the weight variable. The error function shown possesses one global minimum (w_g), below the minimum rms value. It also has two local minima, at w_l1 and w_l2, and one stationary point at w_s. The learning procedure will stop prematurely if it starts at point 2 or 3; the trained network will then be unable to produce the desired performance in terms of an acceptable terminal error. To ensure convergence to the global minimum, the starting point would have to be changed to point 1.

Fig.21 Error distribution with weight

The statistical nature of the inputs and outputs can improve the convergence of the weights. The important factors affecting the convergence of training are:

A. Initial weights
B. Weight adjustment mechanism
C. Activation function
D. Selection of the learning constant
E. Momentum method

1.12.1 Choice of initial weights:

The choice of initial weights decides whether the net reaches a global or a local minimum of the error, and also decides the time needed for convergence. The update of the weight between two units depends both on the derivative of the upper unit's activation function and on the activation of the lower unit. Hence it is important to avoid choices of initial weights that would make either the activation function or its derivative zero. If the initial values are too large, there is a danger of the derivative of the activation function becoming zero. On the other hand, if the chosen values are too small, learning will be extremely slow.

Usually two methods are followed for the initialization of weights:

Random initialization, and
Nguyen-Widrow initialization.

Random initialization is the common method of initializing the weights and biases. In this method, the weights and biases are initialized to random values between -0.5 and 0.5 (or between -1 and 1, or some other suitable interval). The values may be positive or negative because the final weights after training may be of either sign.

Nguyen-Widrow initialization is a modification of random initialization that leads to faster learning. In this method, the weights from the hidden units to the output units are initialized to random values between -0.5 and 0.5, as in random initialization, but the initialization of the weights from the input units to the hidden units is designed to improve the ability of the hidden units to learn.

Let n be the number of input units, p the number of hidden units, and β the scale factor, where

    β = 0.7 (p)^(1/n)

The procedure of initialization is as follows:

For each hidden unit Z_j:
    Initialize its weights from the input units to random values v_ij(old) between -0.5 and 0.5.
    Compute the norm ||v_j(old)|| of this weight vector.
    Reinitialize the weights: v_ij = β v_ij(old) / ||v_j(old)||.
    Set the bias v_0j to a random value between -β and β.
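A short Python sketch of the Nguyen-Widrow procedure above (the layer sizes and the random seed are arbitrary assumptions made for this example):

import numpy as np

def nguyen_widrow_init(n, p, seed=0):
    rng = np.random.default_rng(seed)
    beta = 0.7 * p ** (1.0 / n)                   # scale factor beta = 0.7 * p^(1/n)
    V = rng.uniform(-0.5, 0.5, size=(n, p))       # random input-to-hidden weights
    V = beta * V / np.linalg.norm(V, axis=0)      # rescale each hidden unit's weight vector to norm beta
    v0 = rng.uniform(-beta, beta, size=p)         # hidden biases between -beta and beta
    return V, v0

V, v0 = nguyen_widrow_init(n=2, p=4)
print(np.linalg.norm(V, axis=0))                  # each column norm equals beta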

1.12.2 Choice of weight adjustment mechanism:

The weight adjustment mechanisms available are:

Sequential mode (on-line / stochastic / incremental mode), and
Batch mode (cumulative mode).

In the sequential mode, weight updating is performed after the presentation of each training pair (s : t). For example, consider the training of a back-propagation network with N input-output pairs (x(1):t(1), x(2):t(2), x(3):t(3), ..., x(N):t(N)). Training is carried out for each input-output pair: first x(1) is presented, its output is calculated and the error determined, based on which the weight adjustment is carried out; next x(2) is presented, its output calculated and the error determined, based on which the weight adjustment is carried out; and in a similar fashion the adjustment is carried out for all N inputs.

The advantages of this method are:

Computationally faster
Requires less storage
Less chance of the algorithm getting trapped in a local minimum
Effective when the training data is redundant in nature
Simple to implement
Effective for large and difficult problems

In the batch mode, however, weight updating is done only after all the training inputs are presented. The advantage of this method is that convergence to a local minimum is guaranteed. Usually the sequential mode is preferred to the batch mode.

1.12.3 Choice of activation function:

The choice and shape of the activation function strongly affects the speed of network learning. An activation function for a back-propagation net should have the following characteristics:

It should be continuous.
It should be differentiable.
It should be monotonically non-decreasing.
Its derivative should be easy to compute.

Accordingly, binary or bipolar sigmoid functions can be used, based on the data type. The binary sigmoid function with steepness factor σ is defined as

    f(x) = 1 / (1 + exp(-σx))

and its derivative is given by

    f'(x) = σ f(x) [1 - f(x)]

From this it is clear that, for a fixed learning constant, all weight adjustments are in proportion to the steepness coefficient σ. Thus, using activation functions with large σ may yield results similar to using a large learning constant. Usually it is advisable to keep σ at the standard value of 1, and to control the learning speed solely through the learning constant rather than through both.

1.12.4 Choice of learning constant:

The effectiveness and convergence of the back-propagation learning algorithm depend to a large extent on the learning constant α. The choice of α depends on the application. For problems with broad minima, a large value of α will result in rapid convergence; for problems with steep and narrow minima, a small value of α should be chosen. Larger values of α no doubt result in faster convergence, but the learning is not exact. So, depending on whether fast convergence or exact learning is required, a large or small value of α can be chosen. To ensure

that all the neurons learn at the same rate, the later (output-side) layers are usually assigned a smaller value of α than the front layers. Also, neurons with many inputs should have a smaller α than neurons with fewer inputs, so that the learning time remains roughly constant. It has been suggested that, for a given neuron, the learning constant should be inversely proportional to the square root of the number of synaptic connections made to that neuron.

1.12.5 Use of momentum method:

To accelerate the convergence of the back-propagation algorithm, the momentum method is used. In this method, the weight change depends not only on the current gradient, but also on the previous weight change. This method is particularly useful when some training data are very different from (or even incorrect compared with) the majority of the data. To use this method, the weight updates from one or more previous training patterns must be saved. The weights for training step t+1 depend on the weights at training steps t and t-1. The weight update formulas are:

    w_jk(t+1) = w_jk(t) + α δ_k z_j + μ [w_jk(t) - w_jk(t-1)]

or equivalently

    Δw_jk(t+1) = α δ_k z_j + μ Δw_jk(t)

and

    v_ij(t+1) = v_ij(t) + α δ_j x_i + μ [v_ij(t) - v_ij(t-1)]

or equivalently

    Δv_ij(t+1) = α δ_j x_i + μ Δv_ij(t)

where μ is the momentum parameter, whose value is in the range 0 to 1.

The advantages of the momentum method are:

Fast convergence even with a small α,
It tends to lead to the global minimum rather than a local minimum.

The drawbacks of the momentum method are:

The amount by which a weight can change is limited,
It may lead to a weight change in a direction that increases the error.
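A minimal Python sketch of a momentum-augmented weight update (the gradient term α·δ·z is abstracted into a single argument, and all values shown are illustrative assumptions):

import numpy as np

def momentum_update(w, grad_step, prev_delta, mu=0.9):
    # New change = current learning step + mu times the previous change.
    delta = grad_step + mu * prev_delta
    return w + delta, delta            # return the updated weights and the stored change

w = np.zeros(3)
prev = np.zeros(3)
for grad_step in [np.array([0.1, -0.2, 0.05])] * 3:   # the same gradient step repeated
    w, prev = momentum_update(w, grad_step, prev)
print(w)   # successive steps grow because momentum accumulates the repeated direction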

1.12.6 Choice of number of training patterns:

A relationship among the number of training patterns (P), the number of weights to be trained (w) and the accuracy expected (e) is given by the expression:

    P = w / e

For example, with e = 0.1 the number of training patterns should be about ten times the number of weights.

1.12.7 Choice of data representation:

To improve learning, the input should be represented in bipolar form and the bipolar sigmoid should be used as the activation function. The data may be represented either as a continuous-valued variable or as a set of ranges. In general, it is easier to learn a set of distinct responses than a continuous-valued response.

1.12.8 Choice of number of hidden layers:

Usually one hidden layer is sufficient for a back-propagation net to approximate any continuous mapping from input patterns to output patterns to an arbitrary degree of accuracy. However, increasing the number of hidden layers may make training easier in certain situations.

UNIT-2

2.1 Feedback networks:

A feedback neural network is also referred to as a recurrent neural network. Feedback networks differ from feed-forward neural networks in that they contain at least one feedback loop, whereas a feed-forward neural net does not have any feedback loop. Based on the number of layers present in the network, feedback nets can be classified as:

Single-layer feedback nets
Multi-layer feedback nets

2.1.1 Single-layer feedback network:

A single-layer feedback neural net has only one layer of computational nodes/neurons, the computational layer being the output layer; a single-layer feedback net does not have any hidden layers. Based on whether or not there is feedback from the output of a particular neuron to itself, single-layer nets are further classified as:

o Single-layer nets with self-feedback
o Single-layer nets with no self-feedback

Fig.1 Single-layer net with self-feedback
Fig.2 Single-layer net with no self-feedback

Unit delay elements (z^-1) are used to introduce non-linear dynamical behaviour into the network. In a single-layer net with self-feedback, the output of a particular neuron is fed back to itself, whereas in nets with no self-feedback, the output of a particular neuron is fed back to all the neurons except itself. The nets are shown in fig.1 and fig.2 respectively.

2.1.2 Multi-layer feedback network:

A multi-layer feedback net consists of more than one layer of computational nodes and hence contains hidden layers. The hidden layers are introduced to improve the computational capability of the net. Multi-layer feedback nets are likewise classified, based on whether or not the output of a neuron is fed back to itself, as:

o Multi-layer nets with self-feedback
o Multi-layer nets with no self-feedback

2.2 Neural networks for pattern classification:

Usually single-layer feedback or feed-forward nets are used for pattern association applications. Associative memory neural nets are single-layer nets in which the weights are determined in such a fashion that the net can store a set of pattern associations, each association being an input-output pair s : t. If each output vector t is the same as the input vector s with which it is associated, then the net is called an auto-associative memory. If the t's are different from the s's, then the net is called a hetero-associative memory. In each case, the net not only learns the specific pattern pairs that were used for training, but is also able to recall the desired response pattern when given an input stimulus that is similar, but not necessarily identical, to a training input.

2.3 Discrete Hopfield network:

Discrete Hopfield nets are iterative, auto-associative nets used for pattern association applications. The net is fully interconnected, in the sense that each unit is connected to every other unit. The net has symmetric weights with no self-connections, i.e.

    w_ij = w_ji   and   w_ii = 0

In the Hopfield net, only one unit updates its activation at a time. This feature allows the definition of a function known as the energy function or Lyapunov function, which is used to prove the convergence of the net.
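As a concrete illustration of these properties (the stored pattern, the Hebbian-style weight construction and the fixed update order below are standard textbook choices assumed here, not taken from the notes), the following Python sketch builds a small discrete Hopfield net and recalls a stored bipolar pattern from a noisy version by asynchronous updates:

import numpy as np

def hopfield_weights(patterns):
    # Symmetric weight matrix from the outer products of the stored bipolar patterns,
    # with the diagonal zeroed (no self-connections).
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, x, sweeps=3):
    y = x.copy()
    for _ in range(sweeps):
        for i in range(len(y)):                  # asynchronous update: one unit at a time
            y[i] = 1 if W[i] @ y >= 0 else -1
    return y

stored = [np.array([1, -1, 1, -1, 1, -1])]
W = hopfield_weights(stored)
noisy = np.array([1, 1, 1, -1, 1, -1])           # one flipped component
print(recall(W, noisy))                           # recovers the stored pattern [1, -1, 1, -1, 1, -1]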


More information

4. Multilayer Perceptrons

4. Multilayer Perceptrons 4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output

More information

Part 8: Neural Networks

Part 8: Neural Networks METU Informatics Institute Min720 Pattern Classification ith Bio-Medical Applications Part 8: Neural Netors - INTRODUCTION: BIOLOGICAL VS. ARTIFICIAL Biological Neural Netors A Neuron: - A nerve cell as

More information

Neural Nets in PR. Pattern Recognition XII. Michal Haindl. Outline. Neural Nets in PR 2

Neural Nets in PR. Pattern Recognition XII. Michal Haindl. Outline. Neural Nets in PR 2 Neural Nets in PR NM P F Outline Motivation: Pattern Recognition XII human brain study complex cognitive tasks Michal Haindl Faculty of Information Technology, KTI Czech Technical University in Prague

More information

Neural Networks Introduction

Neural Networks Introduction Neural Networks Introduction H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural Networks 1/22 Biological

More information

In the Name of God. Lecture 9: ANN Architectures

In the Name of God. Lecture 9: ANN Architectures In the Name of God Lecture 9: ANN Architectures Biological Neuron Organization of Levels in Brains Central Nervous sys Interregional circuits Local circuits Neurons Dendrite tree map into cerebral cortex,

More information

Machine Learning. Neural Networks

Machine Learning. Neural Networks Machine Learning Neural Networks Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 Biological Analogy Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 THE

More information

Introduction Biologically Motivated Crude Model Backpropagation

Introduction Biologically Motivated Crude Model Backpropagation Introduction Biologically Motivated Crude Model Backpropagation 1 McCulloch-Pitts Neurons In 1943 Warren S. McCulloch, a neuroscientist, and Walter Pitts, a logician, published A logical calculus of the

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Artificial Neural Network : Training

Artificial Neural Network : Training Artificial Neural Networ : Training Debasis Samanta IIT Kharagpur debasis.samanta.iitgp@gmail.com 06.04.2018 Debasis Samanta (IIT Kharagpur) Soft Computing Applications 06.04.2018 1 / 49 Learning of neural

More information

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Using a Hopfield Network: A Nuts and Bolts Approach

Using a Hopfield Network: A Nuts and Bolts Approach Using a Hopfield Network: A Nuts and Bolts Approach November 4, 2013 Gershon Wolfe, Ph.D. Hopfield Model as Applied to Classification Hopfield network Training the network Updating nodes Sequencing of

More information

Neural Networks. Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation

Neural Networks. Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation Neural Networks Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation Neural Networks Historical Perspective A first wave of interest

More information

Neural Networks and Ensemble Methods for Classification

Neural Networks and Ensemble Methods for Classification Neural Networks and Ensemble Methods for Classification NEURAL NETWORKS 2 Neural Networks A neural network is a set of connected input/output units (neurons) where each connection has a weight associated

More information

Hopfield Neural Network and Associative Memory. Typical Myelinated Vertebrate Motoneuron (Wikipedia) Topic 3 Polymers and Neurons Lecture 5

Hopfield Neural Network and Associative Memory. Typical Myelinated Vertebrate Motoneuron (Wikipedia) Topic 3 Polymers and Neurons Lecture 5 Hopfield Neural Network and Associative Memory Typical Myelinated Vertebrate Motoneuron (Wikipedia) PHY 411-506 Computational Physics 2 1 Wednesday, March 5 1906 Nobel Prize in Physiology or Medicine.

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) Human Brain Neurons Input-Output Transformation Input Spikes Output Spike Spike (= a brief pulse) (Excitatory Post-Synaptic Potential)

More information

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Neural Networks Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1 Brains as Computational Devices Brains advantages with respect to digital computers: Massively parallel Fault-tolerant Reliable

More information

Fundamentals of Neural Networks

Fundamentals of Neural Networks Fundamentals of Neural Networks : Soft Computing Course Lecture 7 14, notes, slides www.myreaders.info/, RC Chakraborty, e-mail rcchak@gmail.com, Aug. 10, 2010 http://www.myreaders.info/html/soft_computing.html

More information

Lecture 5: Logistic Regression. Neural Networks

Lecture 5: Logistic Regression. Neural Networks Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture

More information

Ch.8 Neural Networks

Ch.8 Neural Networks Ch.8 Neural Networks Hantao Zhang http://www.cs.uiowa.edu/ hzhang/c145 The University of Iowa Department of Computer Science Artificial Intelligence p.1/?? Brains as Computational Devices Motivation: Algorithms

More information

Artifical Neural Networks

Artifical Neural Networks Neural Networks Artifical Neural Networks Neural Networks Biological Neural Networks.................................. Artificial Neural Networks................................... 3 ANN Structure...........................................

More information

Artificial neural networks

Artificial neural networks Artificial neural networks Chapter 8, Section 7 Artificial Intelligence, spring 203, Peter Ljunglöf; based on AIMA Slides c Stuart Russel and Peter Norvig, 2004 Chapter 8, Section 7 Outline Brains Neural

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.

More information

AI Programming CS F-20 Neural Networks

AI Programming CS F-20 Neural Networks AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks CPSC 533 Winter 2 Christian Jacob Neural Networks in the Context of AI Systems Neural Networks as Mediators between Symbolic AI and Statistical Methods 2 5.-NeuralNets-2.nb Neural

More information

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92

ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000

More information

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions BACK-PROPAGATION NETWORKS Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks Cannot approximate (learn) non-linear functions Difficult (if not impossible) to design

More information

Learning and Memory in Neural Networks

Learning and Memory in Neural Networks Learning and Memory in Neural Networks Guy Billings, Neuroinformatics Doctoral Training Centre, The School of Informatics, The University of Edinburgh, UK. Neural networks consist of computational units

More information

Artificial Neural Networks

Artificial Neural Networks Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples

More information

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension

More information

Artificial Neural Networks. Q550: Models in Cognitive Science Lecture 5

Artificial Neural Networks. Q550: Models in Cognitive Science Lecture 5 Artificial Neural Networks Q550: Models in Cognitive Science Lecture 5 "Intelligence is 10 million rules." --Doug Lenat The human brain has about 100 billion neurons. With an estimated average of one thousand

More information

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann (Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for

More information

Supervised (BPL) verses Hybrid (RBF) Learning. By: Shahed Shahir

Supervised (BPL) verses Hybrid (RBF) Learning. By: Shahed Shahir Supervised (BPL) verses Hybrid (RBF) Learning By: Shahed Shahir 1 Outline I. Introduction II. Supervised Learning III. Hybrid Learning IV. BPL Verses RBF V. Supervised verses Hybrid learning VI. Conclusion

More information

Introduction and Perceptron Learning

Introduction and Perceptron Learning Artificial Neural Networks Introduction and Perceptron Learning CPSC 565 Winter 2003 Christian Jacob Department of Computer Science University of Calgary Canada CPSC 565 - Winter 2003 - Emergent Computing

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)

More information

2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks. Todd W. Neller

2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks. Todd W. Neller 2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks Todd W. Neller Machine Learning Learning is such an important part of what we consider "intelligence" that

More information

Neural Networks. Nethra Sambamoorthi, Ph.D. Jan CRMportals Inc., Nethra Sambamoorthi, Ph.D. Phone:

Neural Networks. Nethra Sambamoorthi, Ph.D. Jan CRMportals Inc., Nethra Sambamoorthi, Ph.D. Phone: Neural Networks Nethra Sambamoorthi, Ph.D Jan 2003 CRMportals Inc., Nethra Sambamoorthi, Ph.D Phone: 732-972-8969 Nethra@crmportals.com What? Saying it Again in Different ways Artificial neural network

More information

C1.2 Multilayer perceptrons

C1.2 Multilayer perceptrons Supervised Models C1.2 Multilayer perceptrons Luis B Almeida Abstract This section introduces multilayer perceptrons, which are the most commonly used type of neural network. The popular backpropagation

More information

Machine Learning. Neural Networks. Le Song. CSE6740/CS7641/ISYE6740, Fall Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU

Machine Learning. Neural Networks. Le Song. CSE6740/CS7641/ISYE6740, Fall Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Neural Networks Le Song Lecture 7, September 11, 2012 Based on slides from Eric Xing, CMU Reading: Chap. 5 CB Learning highly non-linear functions f:

More information

Sections 18.6 and 18.7 Artificial Neural Networks

Sections 18.6 and 18.7 Artificial Neural Networks Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs. artifical neural

More information

Artificial Neural Networks. Edward Gatt

Artificial Neural Networks. Edward Gatt Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very

More information

Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870

Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870 Simple Neural Nets for Pattern Classification: McCulloch-Pitts Threshold Logic CS 5870 Jugal Kalita University of Colorado Colorado Springs Fall 2014 Logic Gates and Boolean Algebra Logic gates are used

More information

Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!

Artificial Neural Networks and Nonparametric Methods CMPSCI 383 Nov 17, 2011! Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011! 1 Todayʼs lecture" How the brain works (!)! Artificial neural networks! Perceptrons! Multilayer feed-forward networks! Error

More information

Neural Networks biological neuron artificial neuron 1

Neural Networks biological neuron artificial neuron 1 Neural Networks biological neuron artificial neuron 1 A two-layer neural network Output layer (activation represents classification) Weighted connections Hidden layer ( internal representation ) Input

More information

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html

More information

Linear discriminant functions

Linear discriminant functions Andrea Passerini passerini@disi.unitn.it Machine Learning Discriminative learning Discriminative vs generative Generative learning assumes knowledge of the distribution governing the data Discriminative

More information

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore Lecture - 27 Multilayer Feedforward Neural networks with Sigmoidal

More information

Hopfield Neural Network

Hopfield Neural Network Lecture 4 Hopfield Neural Network Hopfield Neural Network A Hopfield net is a form of recurrent artificial neural network invented by John Hopfield. Hopfield nets serve as content-addressable memory systems

More information

CSC242: Intro to AI. Lecture 21

CSC242: Intro to AI. Lecture 21 CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages

More information

The error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural

The error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural 1 2 The error-backpropagation algorithm is one of the most important and widely used (and some would say wildly used) learning techniques for neural networks. First we will look at the algorithm itself

More information

Introduction To Artificial Neural Networks

Introduction To Artificial Neural Networks Introduction To Artificial Neural Networks Machine Learning Supervised circle square circle square Unsupervised group these into two categories Supervised Machine Learning Supervised Machine Learning Supervised

More information

Plan. Perceptron Linear discriminant. Associative memories Hopfield networks Chaotic networks. Multilayer perceptron Backpropagation

Plan. Perceptron Linear discriminant. Associative memories Hopfield networks Chaotic networks. Multilayer perceptron Backpropagation Neural Networks Plan Perceptron Linear discriminant Associative memories Hopfield networks Chaotic networks Multilayer perceptron Backpropagation Perceptron Historically, the first neural net Inspired

More information

Chapter 2 Single Layer Feedforward Networks

Chapter 2 Single Layer Feedforward Networks Chapter 2 Single Layer Feedforward Networks By Rosenblatt (1962) Perceptrons For modeling visual perception (retina) A feedforward network of three layers of units: Sensory, Association, and Response Learning

More information

CHAPTER 3. Pattern Association. Neural Networks

CHAPTER 3. Pattern Association. Neural Networks CHAPTER 3 Pattern Association Neural Networks Pattern Association learning is the process of forming associations between related patterns. The patterns we associate together may be of the same type or

More information

The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt.

The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt. 1 The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt. The algorithm applies only to single layer models

More information

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)

More information

Using Variable Threshold to Increase Capacity in a Feedback Neural Network

Using Variable Threshold to Increase Capacity in a Feedback Neural Network Using Variable Threshold to Increase Capacity in a Feedback Neural Network Praveen Kuruvada Abstract: The article presents new results on the use of variable thresholds to increase the capacity of a feedback

More information

y(x n, w) t n 2. (1)

y(x n, w) t n 2. (1) Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising a set of input vector {x n }, n = 1,...N,

More information

CMSC 421: Neural Computation. Applications of Neural Networks

CMSC 421: Neural Computation. Applications of Neural Networks CMSC 42: Neural Computation definition synonyms neural networks artificial neural networks neural modeling connectionist models parallel distributed processing AI perspective Applications of Neural Networks

More information

COMP9444 Neural Networks and Deep Learning 2. Perceptrons. COMP9444 c Alan Blair, 2017

COMP9444 Neural Networks and Deep Learning 2. Perceptrons. COMP9444 c Alan Blair, 2017 COMP9444 Neural Networks and Deep Learning 2. Perceptrons COMP9444 17s2 Perceptrons 1 Outline Neurons Biological and Artificial Perceptron Learning Linear Separability Multi-Layer Networks COMP9444 17s2

More information

ADALINE for Pattern Classification

ADALINE for Pattern Classification POLYTECHNIC UNIVERSITY Department of Computer and Information Science ADALINE for Pattern Classification K. Ming Leung Abstract: A supervised learning algorithm known as the Widrow-Hoff rule, or the Delta

More information

Multilayer Perceptron

Multilayer Perceptron Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4

More information

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6 Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects

More information

Neural Networks and the Back-propagation Algorithm

Neural Networks and the Back-propagation Algorithm Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely

More information

Introduction to Neural Networks

Introduction to Neural Networks Introduction to Neural Networks Vincent Barra LIMOS, UMR CNRS 6158, Blaise Pascal University, Clermont-Ferrand, FRANCE January 4, 2011 1 / 46 1 INTRODUCTION Introduction History Brain vs. ANN Biological

More information

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Neural Networks CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Perceptrons x 0 = 1 x 1 x 2 z = h w T x Output: z x D A perceptron

More information

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler + Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions

More information

Sections 18.6 and 18.7 Artificial Neural Networks

Sections 18.6 and 18.7 Artificial Neural Networks Sections 18.6 and 18.7 Artificial Neural Networks CS4811 - Artificial Intelligence Nilufer Onder Department of Computer Science Michigan Technological University Outline The brain vs artifical neural networks

More information

Chapter ML:VI. VI. Neural Networks. Perceptron Learning Gradient Descent Multilayer Perceptron Radial Basis Functions

Chapter ML:VI. VI. Neural Networks. Perceptron Learning Gradient Descent Multilayer Perceptron Radial Basis Functions Chapter ML:VI VI. Neural Networks Perceptron Learning Gradient Descent Multilayer Perceptron Radial asis Functions ML:VI-1 Neural Networks STEIN 2005-2018 The iological Model Simplified model of a neuron:

More information