EEE 4: Linear Systems
Summary #3: Introduction to Artificial Neural Networks

DISTRIBUTED REPRESENTATION

An ANN consists of simple processing units communicating with each other. The basic elements of a neural network are:

- a set of processing units (neuronal cells)
- a state of activation for every unit (its output)
- connections between units
- an activation function
- an external input bias, also called offset or threshold.

Fig. 1. Basic elements in a neuron

ARTIFICIAL NEURAL NETWORKS

Artificial neural networks are computational methods that can be seen as a very simplified artificial model of the brain. The idea behind artificial neural networks is to create intelligent systems that mimic the human brain. The applications of neural networks can be divided into two big parts:

- Open-loop applications, such as digital signal processing; these include classification and pattern recognition. Classification is a learning problem where the goal is to distinguish between the different inputs presented to the network.
- Closed-loop applications, such as feedback control problems.

The main elements of a neuron are shown in figure 1, where:

- Dendrites bring signals from other neurons.
- Signals are transmitted through the axon, which can be seen as a long tube.
- The processing is done in the cell body.

The interest in artificial neural networks emerged after the introduction of the simplified neuron model by McCulloch and Pitts. In 1969, Minsky and Papert published a book in which they discussed the limitations of neural networks, and research in neural networks diminished; interest returned in the 1980s after the introduction of learning rules for multi-layer networks. Artificial neural networks can be characterized as computational models that have the ability to adapt, learn, cluster and classify data.

Processing unit

The role of a processing unit is simple: receive inputs (from neighbors) and compute an output signal that is sent to (other) neighbors. In general, three types of processing units exist:

- input units
- output units
- hidden units

Units can be updated synchronously or asynchronously.

Network topology

- Feedforward networks: the data flow from input to output is strictly feedforward; there is no feedback.
- Recurrent networks: contain feedback; the network has dynamic properties.

SINGLE OUTPUT NETWORK

The simplest neural network is a single-layer network with one output, as shown in figures 2 and 3. We have N input signals x_1, x_2, \ldots, x_N and a scalar output signal y. The output is given by

y = f\left( \sum_{j=1}^{N} w_j x_j + b \right) \qquad (1)

where f is the activation function. The activation function models the behavior of the cell. The simplest activation functions act like the sign function: below a certain value the output takes one value, and above it another. For many applications the derivative of f is needed, so f has to be differentiable. The neuron's output is a function of the inputs x and the weights w, defined as

x = [x_1, x_2, \ldots, x_N]^T \in \mathbb{R}^N
W = [w_1, w_2, \ldots, w_N]^T \in \mathbb{R}^N \qquad (2)

In matrix form we can write

y = f(W^T x + b) \qquad (3)
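As a concrete illustration of equations (1)-(3), here is a minimal NumPy sketch of a single neuron. The input values, weights, bias and the hard-limit activation used here are illustrative assumptions, not values from the notes.

```python
import numpy as np

def neuron(x, w, b, f):
    """Equation (3): weighted sum of the inputs plus the bias, passed through f."""
    return f(np.dot(w, x) + b)

# N = 3 inputs, one weight per input, and a hard-limit activation.
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.6])
b = 0.2
y = neuron(x, w, b, lambda n: np.where(n < 0, 0.0, 1.0))
```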
Fig. 2. Single output network

Fig. 3. Representation of a single output network; the weight is a 1 × N vector

We can augment the input and the weight vectors as follows:

\bar{x} = [1, x^T]^T = [1, x_1, x_2, \ldots, x_N]^T
\bar{W} = [b, W^T]^T = [b, w_1, w_2, \ldots, w_N]^T \qquad (4)

It is also common to write

\bar{x} = [x^T, 1]^T = [x_1, x_2, \ldots, x_N, 1]^T
\bar{W} = [W^T, b]^T = [w_1, w_2, \ldots, w_N, b]^T \qquad (5)

In matrix form we can then write

y = f(\bar{W}^T \bar{x}) \qquad (6)

Several notations are used in the literature for the output. For example, if we assume that W is a row vector, the output can be written as

y = f(W x) \qquad (7)

This is the case for the system shown in figure 3.

A LAYER OF NEURONS WITH MULTIPLE OUTPUTS

Figures 4 and 5 show a layer of neurons with multiple outputs. In this case the weight is a matrix. Assuming that W is an L × N matrix, the output is given by

y = f(W x + b) \qquad (8)

In this case the network has N inputs and L neurons. Each input is connected to each neuron and, in general, L \ne N. Each neuron has a bias, a summer, a transfer function and an output. In figures 4 and 5 all units have the same transfer function, but it is possible to have different transfer functions within the same network. For the output y_l it is possible to write

y_l = f\left( \sum_{j=1}^{N} w_{lj} x_j + b_l \right) \qquad (9)

where

W = \begin{bmatrix} w_{11} & \cdots & w_{1N} \\ \vdots & \ddots & \vdots \\ w_{L1} & \cdots & w_{LN} \end{bmatrix} \qquad (10)

y = [y_1, \ldots, y_L]^T \qquad (11)

b = [b_1, \ldots, b_L]^T \qquad (12)

Fig. 4. One layer with multiple outputs

Fig. 5. Representation of a one-layer network with multiple outputs
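To make the layer equations concrete, the following NumPy sketch (layer sizes and numeric values are my own illustrative choices) computes the output of one layer of L neurons as in equation (8), and checks that absorbing the bias into an augmented weight matrix, in the spirit of equations (4)-(6), gives the same result.

```python
import numpy as np

def layer(x, W, b, f):
    """Equation (8): one layer of L neurons; W is L x N, b is an L-vector."""
    return f(W @ x + b)

# L = 2 neurons, N = 3 inputs; every input is connected to every neuron.
W = np.array([[0.4, 0.1, -0.6],
              [0.2, -0.3, 0.5]])
b = np.array([0.2, -0.1])
x = np.array([0.5, -1.2, 3.0])
y = layer(x, W, b, np.tanh)          # one output per neuron

# Augmented form: absorb b into W as in equations (4)-(6).
W_aug = np.hstack([b[:, None], W])   # [b, W], shape L x (N + 1)
x_aug = np.concatenate([[1.0], x])   # [1, x]
assert np.allclose(np.tanh(W_aug @ x_aug), y)
```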
MULTIPLE LAYER NETWORK

Minsky and Papert showed in 1969 that single-layer networks have serious drawbacks; they cannot perform simple logical operations such as XOR. Multiple-layer networks were introduced to address the limitations of single-layer networks and to solve nonlinear classification problems. In multiple-layer networks, several networks are cascaded: the output of one layer is the input of the next layer. A representation of a 3-layer network is shown in figure 6. Superscripts are used to indicate the layer, and the layers in the middle are called hidden layers. We have:

- the number of inputs is N
- the first layer has L_1 neurons
- the second layer has L_2 neurons

The outputs of the layers are given by

y^1 = f^1(W^1 x + b^1)
y^2 = f^2(W^2 y^1 + b^2)
y^3 = f^3(W^3 y^2 + b^3) \qquad (13)

which gives for the final layer

y^3 = f^3\left( W^3 f^2\left( W^2 f^1\left( W^1 x + b^1 \right) + b^2 \right) + b^3 \right) \qquad (14)

where y^0 = x. In general, for layer m + 1, it is possible to write

y^{m+1} = f^{m+1}\left( W^{m+1} y^m + b^{m+1} \right) \qquad (15)

THE NEURAL NETWORK DESIGN PROCESS

Neural networks are used in several applications such as classification and pattern recognition, function approximation and feedback control. The basic steps in the neural network design process are as follows:

- collect data
- create and configure the network
- initialize the weights and biases
- train the network
- validate the network
- use the network

THE ACTIVATION FUNCTION

The activation function (also called the transfer function) plays an important role in neural networks. There are several possibilities for the activation function, as discussed below; a short code sketch of these functions appears after the learning-methods section. In the notation used, n is the input to the activation function and y is the output.

Hard limit: the simplest form, which acts like a shifted sign function; the output is 1 when the sum of the weighted inputs plus the bias is nonnegative, and 0 otherwise:

y = \begin{cases} 0 & \text{if } n < 0 \\ 1 & \text{if } n \ge 0 \end{cases}

Bipolar or symmetrical hard limit: the transfer function is given by

y = \begin{cases} -1 & \text{if } n < 0 \\ 1 & \text{if } n \ge 0 \end{cases}

Linear:

y = n \qquad (16)

Saturating linear:

y = \begin{cases} 0 & \text{if } n < 0 \\ n & \text{if } 0 \le n \le 1 \\ 1 & \text{if } n > 1 \end{cases}

Symmetric saturating linear:

y = \begin{cases} -1 & \text{if } n < -1 \\ n & \text{if } -1 \le n \le 1 \\ 1 & \text{if } n > 1 \end{cases}

Log-sigmoid:

y = \frac{1}{1 + e^{-n}} \qquad (17)

Hyperbolic tangent sigmoid:

y = \frac{e^n - e^{-n}}{e^n + e^{-n}} \qquad (18)

Positive linear:

y = \begin{cases} 0 & \text{if } n < 0 \\ n & \text{if } n \ge 0 \end{cases}

The plots of the activation functions are shown in figures 7 and 8.

LEARNING METHODS

There are three main paradigms for learning:

- Supervised learning: a set of training data is provided as follows:

\{x_1, t_1\}, \{x_2, t_2\}, \ldots, \{x_K, t_K\} \qquad (19)

where x_i is the input to the network, t_i is the corresponding target output, and K is the number of prototypes.

- Reinforcement learning: a reward function is defined, and the agent is rewarded when it makes the right decision.
- Unsupervised learning: the target outputs are not available, and the weights and biases are modified in response to the inputs only.
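As referenced in the activation-function section above, here is a minimal NumPy sketch of those transfer functions. The Python function names are my own, chosen to mirror the list above.

```python
import numpy as np

def hard_limit(n):                   # 0 for n < 0, 1 for n >= 0
    return np.where(n < 0, 0.0, 1.0)

def symmetric_hard_limit(n):         # -1 for n < 0, +1 for n >= 0
    return np.where(n < 0, -1.0, 1.0)

def linear(n):                       # equation (16): y = n
    return n

def saturating_linear(n):            # clips n to the interval [0, 1]
    return np.clip(n, 0.0, 1.0)

def symmetric_saturating_linear(n):  # clips n to the interval [-1, 1]
    return np.clip(n, -1.0, 1.0)

def log_sigmoid(n):                  # equation (17): 1 / (1 + e^{-n})
    return 1.0 / (1.0 + np.exp(-n))

def positive_linear(n):              # 0 for n < 0, n for n >= 0
    return np.maximum(0.0, n)

# The hyperbolic tangent sigmoid of equation (18) is np.tanh itself.
```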
Fig. 6. Multi-layer network
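To connect figure 6 with equations (13)-(15), the sketch below runs the forward pass of a 3-layer network as a simple loop. The layer sizes, the random weights and the choice of activations are illustrative assumptions, not taken from the notes.

```python
import numpy as np

def forward(x, layers, activations):
    """Apply equation (15) repeatedly: y^{m+1} = f^{m+1}(W^{m+1} y^m + b^{m+1})."""
    y = x                                   # y^0 = x
    for (W, b), f in zip(layers, activations):
        y = f(W @ y + b)
    return y

# A 3-layer network: N = 4 inputs, L1 = 5 and L2 = 3 neurons, 1 output.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((5, 4)), np.zeros(5)),
          (rng.standard_normal((3, 5)), np.zeros(3)),
          (rng.standard_normal((1, 3)), np.zeros(1))]
activations = [np.tanh, np.tanh, lambda n: n]   # linear output layer
y3 = forward(np.ones(4), layers, activations)
```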
Fig. 7. Some activation functions, (a) Hard limit, (b) Symmetric hard limit, (c) Linear and (d) Saturating linear

Fig. 8. Some activation functions, (a) Symmetric saturating linear, (b) Log-sigmoid, (c) Hyperbolic tangent sigmoid and (d) Positive linear