Artificial Neural Networks Edward Gatt
What are Neural Networks?
Models of the brain and nervous system: highly parallel, processing information much more like the brain than a serial computer.
Learning: very simple principles give rise to very complex behaviours.
Applications: as powerful problem solvers and as biological models.
ANNs: The basics
ANNs incorporate the two fundamental components of biological neural nets: 1. Neurones (nodes) 2. Synapses (weights)
Neurone vs. Node
Structure of a node: Squashing function limits node output:
Synapse vs. weight
Feed-forward nets
Information flow is unidirectional: data is presented to the input layer, passed on to the hidden layer, and passed on to the output layer.
Information is distributed, and information processing is parallel.
The hidden layer forms an internal representation (interpretation) of the data.
Feeding data through the net: (1 × 0.25) + (0.5 × (−1.5)) = 0.25 + (−0.75) = −0.5. Squashing: 1 / (1 + e^0.5) = 0.3775
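This computation can be sketched in a few lines of Python (the function name node_output is illustrative, not from the slides):

```python
import math

def node_output(inputs, weights):
    """Weighted sum of the inputs followed by the logistic squashing function."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))

# The slide's example: inputs (1, 0.5), weights (0.25, -1.5) give net = -0.5
y = node_output([1.0, 0.5], [0.25, -1.5])
print(round(y, 4))  # 0.3775
```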
Supervised vs. Unsupervised
Networks can be supervised: they need to be trained ahead of time with lots of data.
Unsupervised networks adapt to the input; applications in clustering and reducing dimensionality.
Learning may be very slow.
What can a Neural Net do? Compute a known function Approximate an unknown function Pattern Recognition Signal Processing Learn to do any of the above
Basic Concepts
A neural network generally maps a set of inputs (Input 0, Input 1, ..., Input n) to a set of outputs (Output 0, Output 1, ..., Output m). The number of inputs/outputs is variable. The network itself is composed of an arbitrary number of nodes with an arbitrary topology.
Basic Concepts
Definition of a node: a node is an element which performs the function y = f_H(Σ_i (w_i · x_i) + W_b), where W_b is a bias weight.
Simple Perceptron
Binary logic application: f_H(x) = u(x) [linear threshold], W_i = random(−1, 1).
Y = u(W_0·X_0 + W_1·X_1 + W_b)
Now how do we train it?
Basic Training
Perceptron learning rule: ΔW_i = η · (D − Y) · X_i, where η = learning rate and D = desired output.
Adjust weights based on how well the current weights match the objective.
Logic Training
Expose the network to the logical OR operation and update the weights after each epoch. As the output approaches the desired output for all cases, ΔW_i will approach 0.

X0  X1  D
0   0   0
0   1   1
1   0   1
1   1   1
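The perceptron rule applied to the OR truth table can be sketched as follows (assumed details: η = 0.1, a fixed random seed, and the bias W_b treated as a weight on a constant +1 input):

```python
import random

def train_perceptron_or(eta=0.1, epochs=50, seed=0):
    """Train a single threshold unit on OR with the rule dW_i = eta*(D - Y)*X_i."""
    random.seed(seed)
    w = [random.uniform(-1, 1) for _ in range(3)]  # W0, W1, bias
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
    for _ in range(epochs):
        for (x0, x1), d in data:
            y = 1 if w[0] * x0 + w[1] * x1 + w[2] > 0 else 0  # u(net)
            w[0] += eta * (d - y) * x0
            w[1] += eta * (d - y) * x1
            w[2] += eta * (d - y) * 1  # bias input is always 1
    return w

w = train_perceptron_or()
print(w)
```

Because OR is linearly separable, the perceptron convergence theorem guarantees the rule finds a separating weight vector in a finite number of updates.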
Training the Network - Learning Backpropagation Requires training set (input/ output pairs) Starts with small random weights Error is used to adjust weights (supervised learning) Gradient descent on error landscape
The Backpropagation Network
The backpropagation network (BPN) is the most popular type of ANN for applications such as classification or function approximation. Like other networks using supervised learning, the BPN is not biologically plausible. The structure of the network is identical to the one we discussed before:
Three (sometimes more) layers of neurons,
Only feedforward processing: input layer → hidden layer → output layer,
Sigmoid activation functions.
Typical Activation Functions
f(x) = 1 / (1 + e^(−k · Σ_i w_i x_i)), shown for k = 0.5, 1 and 10.
Using a nonlinear function which approximates a linear threshold allows a network to approximate nonlinear functions.
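A minimal sketch of the logistic activation and the effect of the steepness parameter k (the function name logistic is illustrative):

```python
import math

def logistic(x, k=1.0):
    """Logistic activation; as k grows it approaches the hard threshold u(x)."""
    return 1.0 / (1.0 + math.exp(-k * x))

# The slide's three steepness values, evaluated at the same net input
for k in (0.5, 1, 10):
    print(k, round(logistic(1.0, k), 3))
```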
Alternative Activation Functions
Radial basis functions: square, triangle, Gaussian. The Gaussian's parameters (μ, σ) can be varied at each hidden node to guide training.
[Figure: a network whose hidden nodes use f_RBF(x), feeding output nodes that use f_H(x)]
The Backpropagation Network
BPN units and activation functions:
[Figure: input layer I_1 ... I_I (input vector x); hidden layer H_1 ... H_J with activation f(net^h); output layer O_1 ... O_K with activation f(net^o), giving output vector y]
Supervised Learning in the BPN
Before the learning process starts, all weights (synapses) in the network are initialized with pseudorandom numbers. We also have to provide a set of training patterns (exemplars). They can be described as a set of ordered vector pairs {(x_1, y_1), (x_2, y_2), ..., (x_P, y_P)}. Then we can start the backpropagation learning algorithm. This algorithm iteratively minimizes the network's error by finding the gradient of the error surface in weight space and adjusting the weights in the opposite direction (gradient-descent technique).
Supervised Learning in the BPN
Gradient-descent example: finding the absolute minimum of a one-dimensional error function f(x). Starting at x_0, follow the slope f'(x_0) downhill:
x_1 = x_0 − η · f'(x_0)
Repeat this iteratively until, for some x_i, f'(x_i) is sufficiently close to 0.
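This one-dimensional loop can be sketched directly; the example error function f(x) = (x − 3)^2, with known minimum at x = 3, is an assumed illustration:

```python
def gradient_descent_1d(f_prime, x0, eta=0.1, tol=1e-6, max_iter=10000):
    """Iterate x_{i+1} = x_i - eta * f'(x_i) until the slope is near zero."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if abs(slope) < tol:
            break
        x = x - eta * slope
    return x

# Assumed example: f(x) = (x - 3)^2, so f'(x) = 2*(x - 3); minimum at x = 3
x_min = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 3))  # 3.0
```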
Supervised Learning in the BPN
In the BPN, learning is performed as follows:
1. Randomly select a vector pair (x_p, y_p) from the training set and call it (x, y).
2. Use x as input to the BPN and successively compute the outputs of all neurons in the network (bottom-up) until you get the network output o.
3. Compute the error δ^o_pk for the pattern p across all K output-layer units by using the formula:
δ^o_pk = (y_k − o_k) · f'(net^o_k)
Supervised Learning in the BPN
4. Compute the error δ^h_pj for all J hidden-layer units by using the formula:
δ^h_pj = f'(net^h_j) · Σ_{k=1}^{K} δ^o_pk · w_kj
5. Update the connection-weight values to the hidden layer by using the following equation:
w_ji(t + 1) = w_ji(t) + η · δ^h_pj · x_i
Supervised Learning in the BPN
6. Update the connection-weight values to the output layer by using the following equation:
w_kj(t + 1) = w_kj(t) + η · δ^o_pk · f(net^h_j)
Repeat steps 1 to 6 for all vector pairs in the training set; this is called a training epoch. Run as many epochs as required to reduce the network error E below a threshold ε:
E = Σ_{p=1}^{P} Σ_{k=1}^{K} (δ^o_pk)²
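Steps 1 to 6 can be sketched end to end in plain Python. This is an illustrative implementation, not the slides' reference code: it trains a small 2-input BPN on the OR function, folds the biases in as weights on a constant +1 input, and assumes values for η, the epoch count, and the random seed:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bpn(data, n_hidden=2, eta=1.0, epochs=4000, seed=1):
    """Backpropagation for a 2-layer net with one sigmoid output unit."""
    random.seed(seed)
    n_in = len(data[0][0])
    w_h = [[random.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [random.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, y in data:
            xi = list(x) + [1.0]                       # input plus bias
            h = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in w_h]
            hi = h + [1.0]                             # hidden plus bias
            o = sigmoid(sum(w * v for w, v in zip(w_o, hi)))
            d_o = (y - o) * o * (1 - o)                # step 3: output delta
            d_h = [h[j] * (1 - h[j]) * d_o * w_o[j]    # step 4: hidden deltas
                   for j in range(n_hidden)]
            for j in range(n_hidden + 1):              # step 6: output weights
                w_o[j] += eta * d_o * hi[j]
            for j in range(n_hidden):                  # step 5: hidden weights
                for i in range(n_in + 1):
                    w_h[j][i] += eta * d_h[j] * xi[i]

    def predict(x):
        xi = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xi))) for row in w_h] + [1.0]
        return sigmoid(sum(w * v for w, v in zip(w_o, h)))
    return predict

or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
predict = train_bpn(or_data)
for x, d in or_data:
    print(x, round(predict(x), 2))
```

Note the ordering: the hidden deltas (step 4) are computed with the old output weights before step 6 overwrites them.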
Supervised Learning in the BPN
The only thing that we need to know before we can start our network is the derivative of our sigmoid function, for example, f'(net_k) for the output neurons:
f(net_k) = 1 / (1 + e^(−net_k))
f'(net_k) = ∂f(net_k)/∂net_k = o_k(1 − o_k)
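The identity f'(net) = o(1 − o) can be checked against a numerical central-difference derivative (the test point net = 0.7 and the tolerance are arbitrary choices):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

net = 0.7
o = sigmoid(net)
analytic = o * (1 - o)                 # o_k * (1 - o_k) from the slide
eps = 1e-6
numeric = (sigmoid(net + eps) - sigmoid(net - eps)) / (2 * eps)
print(abs(analytic - numeric) < 1e-6)  # True
```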
Supervised Learning in the BPN
Now our BPN is ready to go! If we choose the type and number of neurons in our network appropriately, after training the network should show the following behavior: If we input any of the training vectors, the network should yield the expected output vector (with some margin of error). If we input a vector that the network has never seen before, it should be able to generalize and yield a plausible output vector based on its knowledge about similar input vectors.
Self-Organizing Maps (Kohonen Maps)
In the BPN, we used supervised learning. This is not biologically plausible: in a biological system, there is no external teacher who manipulates the network's weights from outside the network. Biologically more adequate: unsupervised learning. We will study Self-Organizing Maps (SOMs) as examples of unsupervised learning (Kohonen, 1980).
Self-Organizing Maps (Kohonen Maps)
Such a topology-conserving mapping can be achieved by SOMs:
Two layers: input layer and output (map) layer.
Input and output layers are completely connected.
Output neurons are interconnected within a defined neighborhood.
A topology (neighborhood relation) is defined on the output layer.
Self-Organizing Maps (Kohonen Maps)
A neighborhood function φ(i, k) indicates how closely neurons i and k in the output layer are connected to each other. Usually, a Gaussian function of the distance between the positions of the two neurons in the layer is used.
Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:
(1) Choose a random weight vector w_i for each neuron i, i = 1, ..., m.
(2) Choose a random input x.
(3) Determine the winner neuron k: ‖w_k − x‖ = min_i ‖w_i − x‖ (Euclidean distance).
(4) Update the weight vectors of all neurons i in the neighborhood of neuron k: w_i := w_i + η · φ(i, k) · (x − w_i), so each w_i is shifted towards x.
(5) If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function φ, reduce the learning parameter η, and go to (2).
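Steps (1) to (5) can be sketched for a one-dimensional chain of output neurons. The linear decay schedules for η and the neighborhood width σ are assumptions; the slides only say both are narrowed over time:

```python
import math
import random

def train_som(data, m=10, eta0=0.5, sigma0=3.0, epochs=2000, seed=0):
    """1-D Kohonen map: m output neurons on a chain, Gaussian neighborhood
    phi(i, k) = exp(-(i - k)^2 / (2 sigma^2)) on the chain distance."""
    random.seed(seed)
    n = len(data[0])
    w = [[random.random() for _ in range(n)] for _ in range(m)]  # step (1)
    for t in range(epochs):
        frac = t / epochs
        eta = eta0 * (1 - frac)            # assumed linear decay
        sigma = sigma0 * (1 - frac) + 0.5  # keeps the neighborhood > 0
        x = random.choice(data)            # step (2)
        # step (3): winner = neuron whose weight vector is closest to x
        k = min(range(m),
                key=lambda i: sum((w[i][d] - x[d]) ** 2 for d in range(n)))
        # step (4): shift every neuron toward x, weighted by the neighborhood
        for i in range(m):
            phi = math.exp(-((i - k) ** 2) / (2 * sigma ** 2))
            for d in range(n):
                w[i][d] += eta * phi * (x[d] - w[i][d])
    return w

# Map a 1-D input distribution over [0, 1] onto the 10-neuron chain
data = [(x / 100.0,) for x in range(101)]
w = train_som(data)
print([round(row[0], 2) for row in w])
```

Because every update is a convex step toward an input in [0, 1], the learned weights stay inside the input range; with the shrinking neighborhood they spread out to cover it.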