Artificial Neural Networks

Outline

Neural Networks
    Biological Neural Networks
    Artificial Neural Networks
    ANN Structure
    ANN Illustration
Perceptrons
    Perceptron Learning Rule
    Perceptrons Continued
    Example of Perceptron Learning
Gradient Descent
    Gradient Descent (Linear)
    The Delta Rule
Multilayer Networks
    Illustration
    Sigmoid Activation
    Plot of Sigmoid Function
    Applying The Chain Rule
    Backpropagation
    Hypertangent Activation Function
    Gaussian Activation Function
    Issues in Neural Networks

Biological Neural Networks

Neural networks are inspired by our brains.
The human brain has about $10^{11}$ neurons and $10^{14}$ synapses.
A neuron consists of a soma (cell body), axons (send signals), and dendrites (receive signals).
A synapse connects an axon to a dendrite.
Given a signal, a synapse might increase (excite) or decrease (inhibit) electrical potential.
A neuron fires when its electrical potential reaches a threshold.
Learning might occur by changes to synapses.

CS 533 Artificial Intelligence Artificial Neural Networks

Artificial Neural Networks

An (artificial) neural network consists of units, connections, and weights.
Inputs and outputs are numeric.

Biological NN       Artificial NN
soma                unit
axon, dendrite      connection
synapse             weight
potential           weighted sum
threshold           bias weight
signal              activation
ANN Structure

A typical unit $i$ receives inputs $a_{j_1}, a_{j_2}, \ldots$ from other units, performs a weighted sum

$$\mathit{in}_i = W_{0,i} + \sum_j W_{j,i}\, a_j$$

and outputs the activation $a_i = g(\mathit{in}_i)$.

Typically, input units store the inputs, hidden units transform the inputs into an internal numeric vector, and an output unit transforms the hidden values into the prediction.

An ANN is a function $f(x, W) = a$, where $x$ is an example, $W$ is the weights, and $a$ is the prediction (the activation value of the output unit).

Learning is finding a $W$ that minimizes error.

ANN Illustration

[Figure: a feed-forward network. Inputs $x_1, x_2, x_3, x_4$ feed hidden units H5 and H6 through weights $W_{1,5}, W_{2,5}, W_{3,5}, W_{4,5}$ and $W_{1,6}, W_{2,6}, W_{3,6}, W_{4,6}$, with bias weights $W_{0,5}$ and $W_{0,6}$. The hidden units feed output unit O7 through weights $W_{5,7}$ and $W_{6,7}$, with bias weight $W_{0,7}$. The output is $a_7$.]

Perceptrons

Perceptron Learning Rule

A perceptron is a single unit with activation

$$a = \mathrm{sign}\Big(W_0 + \sum_j W_j x_j\Big)$$

where sign returns $1$ or $-1$, and $W_0$ is the bias weight.

One version of the perceptron learning rule is:

$$\mathit{in} \leftarrow W_0 + \sum_j W_j x_j$$
$$E \leftarrow \max(0,\ 1 - y \cdot \mathit{in})$$
if $E > 0$ then
$$W_0 \leftarrow W_0 + \alpha\, y$$
$$W_j \leftarrow W_j + \alpha\, x_j\, y \quad \text{for each input } x_j$$

Here $x$ is the inputs, $y \in \{-1, 1\}$ is the target, and $\alpha > 0$ is the learning rate.

Perceptrons Continued

This learning rule tends to minimize $E$.

The perceptron convergence theorem states that if some $W$ classifies all the training examples correctly, then the perceptron learning rule will converge to zero error on the training examples.

Usually, many epochs (passes over the training examples) are needed until convergence.

If zero error is not possible, use $\alpha \approx 0.1/n$, where $n$ is the number of normalized or standardized inputs.
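The perceptron learning rule above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the error takes the form $E = \max(0, 1 - y \cdot \mathit{in})$, starts from zero weights, and stops once an epoch produces zero total error. The function names and the logical-AND training set are hypothetical and chosen only for the example.

```python
def perceptron_train(examples, alpha=1.0, max_epochs=100):
    """Train one perceptron unit; examples is a list of (x, y) with y in {-1, +1}.

    Returns (w0, w): the bias weight and the list of input weights.
    """
    n = len(examples[0][0])
    w0, w = 0.0, [0.0] * n              # assumption: start from zero weights
    for _ in range(max_epochs):
        total_error = 0.0
        for x, y in examples:
            in_ = w0 + sum(wj * xj for wj, xj in zip(w, x))   # weighted sum
            err = max(0.0, 1.0 - y * in_)                     # E for this example
            if err > 0:
                w0 += alpha * y
                w = [wj + alpha * xj * y for wj, xj in zip(w, x)]
            total_error += err
        if total_error == 0.0:          # every example satisfied during this epoch
            break
    return w0, w


def perceptron_predict(w0, w, x):
    """Return the sign activation (+1 or -1) for input x."""
    in_ = w0 + sum(wj * xj for wj, xj in zip(w, x))
    return 1 if in_ >= 0 else -1


# Hypothetical training set: logical AND with targets in {-1, +1}.
AND_EXAMPLES = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w0, w = perceptron_train(AND_EXAMPLES)
predictions = [perceptron_predict(w0, w, x) for x, _ in AND_EXAMPLES]
```

Because logical AND is linearly separable, the loop reaches zero error after a handful of epochs, in line with the convergence theorem stated above.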
Example of Perceptron Learning

[Table: trace of perceptron learning with a fixed learning rate α. Columns: inputs $x_1, x_2, x_3, x_4$; target $y$; $\mathit{in}$; $E$; weights $W_0, W_1, W_2, W_3, W_4$. The value of α and the table rows are not recoverable.]

Gradient Descent

Gradient Descent (Linear)

Suppose the activation is a linear function:

$$a = W_0 + \sum_j W_j x_j$$

The Widrow-Hoff (LMS) update rule is:

$$\mathit{diff} \leftarrow y - a$$
$$W_0 \leftarrow W_0 + \alpha\, \mathit{diff}$$
$$W_j \leftarrow W_j + \alpha\, x_j\, \mathit{diff} \quad \text{for each input } j$$

where $\alpha$ is the learning rate.

This update rule tends to minimize the squared error:

$$E(W) = \sum_{(x,y)} (y - a)^2$$

The Delta Rule

Given error $E(W)$, obtain the gradient:

$$\nabla E(W) = \left[ \frac{\partial E}{\partial W_0}, \frac{\partial E}{\partial W_1}, \ldots, \frac{\partial E}{\partial W_n} \right]$$

To decrease error, use the update rule:

$$W_j \leftarrow W_j - \alpha \frac{\partial E}{\partial W_j}$$

The LMS update rule can be derived using:

$$E(W) = \Big(y - \big(W_0 + \sum_j W_j x_j\big)\Big)^2 / 2$$

For the perceptron learning rule, use:

$$E(W) = \max\Big(0,\ 1 - y\,\big(W_0 + \sum_j W_j x_j\big)\Big)$$

Multilayer Networks

Illustration

[Figure: a multilayer network with inputs $x_1, x_2, x_3, x_4$, weights into hidden units H5 and H6 (each with a bias), and output unit O7 producing $a_7$.]
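As a rough illustration of the Widrow-Hoff (LMS) rule for a linear unit, the sketch below applies the update once per example, for a fixed number of epochs. The function name, the default learning rate, the epoch count, and the noise-free data set are all assumptions made for the example.

```python
def lms_train(examples, alpha=0.05, epochs=500):
    """Fit a linear unit a = w0 + sum_j w[j] * x[j] with the LMS update rule.

    examples is a list of (x, y) pairs with numeric targets y.
    """
    n = len(examples[0][0])
    w0, w = 0.0, [0.0] * n              # assumption: start from zero weights
    for _ in range(epochs):
        for x, y in examples:
            a = w0 + sum(wj * xj for wj, xj in zip(w, x))   # linear activation
            diff = y - a
            w0 += alpha * diff
            w = [wj + alpha * xj * diff for wj, xj in zip(w, x)]
    return w0, w


# Hypothetical noise-free data generated from y = 1 + 2*x.
LINE_EXAMPLES = [([float(x)], 1.0 + 2.0 * x) for x in range(4)]
w0, w = lms_train(LINE_EXAMPLES)
```

With these settings the weights approach the generating values (bias near 1, slope near 2). A learning rate that is too large for the input scale makes the updates diverge, which is one reason inputs are often normalized or standardized first.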
Sigmoid Activation

The sigmoid function is defined as:

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$

It is commonly used for ANN activation functions:

$$a_i = \mathrm{sigmoid}(\mathit{in}_i) = \mathrm{sigmoid}\Big(W_{0,i} + \sum_j W_{j,i}\, a_j\Big)$$

Note that

$$\frac{\partial\, \mathrm{sigmoid}(x)}{\partial x} = \mathrm{sigmoid}(x)\,\big(1 - \mathrm{sigmoid}(x)\big)$$

Plot of Sigmoid Function

[Figure: plot of sigmoid(x) against x.]

Applying The Chain Rule

Using $E = (y_i - a_i)^2$ for output unit $i$:

$$\frac{\partial E}{\partial W_{j,i}} = \frac{\partial E}{\partial a_i}\, \frac{\partial a_i}{\partial \mathit{in}_i}\, \frac{\partial \mathit{in}_i}{\partial W_{j,i}} = -2\,(y_i - a_i)\; a_i (1 - a_i)\; a_j$$

For weights from input to hidden units:

$$\frac{\partial E}{\partial W_{k,j}} = \frac{\partial E}{\partial a_i}\, \frac{\partial a_i}{\partial \mathit{in}_i}\, \frac{\partial \mathit{in}_i}{\partial a_j}\, \frac{\partial a_j}{\partial \mathit{in}_j}\, \frac{\partial \mathit{in}_j}{\partial W_{k,j}} = -2\,(y_i - a_i)\; a_i (1 - a_i)\; W_{j,i}\; a_j (1 - a_j)\; x_k$$

Backpropagation

Backpropagation is an application of the delta rule.

Update each weight $W$ using the rule:

$$W \leftarrow W - \alpha \frac{\partial E}{\partial W}$$

where $\alpha$ is the learning rate.
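The chain-rule expressions and the backpropagation update can be sketched for a network with one hidden layer and a single sigmoid output unit. The sketch assumes $E = (y - a_i)^2$ for one example. The weight layout (each weight list starts with its bias weight), the function names, and the sample weights and input are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, W_hid, W_out):
    """W_hid[j] = [bias, w_1, ..., w_n] for hidden unit j; W_out = [bias, w_1, ..., w_m]."""
    hidden = [sigmoid(wj[0] + sum(w * xk for w, xk in zip(wj[1:], x))) for wj in W_hid]
    out = sigmoid(W_out[0] + sum(w * aj for w, aj in zip(W_out[1:], hidden)))
    return hidden, out


def gradients(x, y, W_hid, W_out):
    """Return (dE/dW_hid, dE/dW_out) for E = (y - a_i)^2 on one example."""
    hidden, a_i = forward(x, W_hid, W_out)
    # dE/da_i * da_i/din_i at the output unit
    delta_i = -2.0 * (y - a_i) * a_i * (1.0 - a_i)
    # din_i/dW is 1 for the bias weight and a_j for the weight from hidden unit j
    g_out = [delta_i] + [delta_i * a_j for a_j in hidden]
    g_hid = []
    for j, a_j in enumerate(hidden):
        # extend the chain through W_{j,i} and the hidden unit's sigmoid
        delta_j = delta_i * W_out[j + 1] * a_j * (1.0 - a_j)
        g_hid.append([delta_j] + [delta_j * x_k for x_k in x])
    return g_hid, g_out


def backprop_step(x, y, W_hid, W_out, alpha=0.5):
    """Apply W <- W - alpha * dE/dW to every weight and return the new weights."""
    g_hid, g_out = gradients(x, y, W_hid, W_out)
    new_out = [w - alpha * g for w, g in zip(W_out, g_out)]
    new_hid = [[w - alpha * g for w, g in zip(wj, gj)] for wj, gj in zip(W_hid, g_hid)]
    return new_hid, new_out


# Hypothetical 4-input, 2-hidden-unit, 1-output network and a single example.
W_HID = [[0.1, 0.2, -0.3, 0.4, 0.5], [-0.2, 0.1, 0.3, -0.4, 0.2]]
W_OUT = [0.05, 0.6, -0.7]
X, Y = [1.0, 0.0, 1.0, 0.5], 1.0
new_hid, new_out = backprop_step(X, Y, W_HID, W_OUT)
```

A common sanity check for code like this is to compare each analytic gradient with a finite-difference estimate of the error before training on real data.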
Hypertangent Activation Function

[Figure: plot of tanh(x) against x.]

Gaussian Activation Function

[Figure: plot of the Gaussian activation function against x.]

Issues in Neural Networks

Sufficiently complex ANNs can approximate any reasonable function.
ANNs have, approximately, a preference bias for interpolation.
How many hidden units and layers?
What are good initial values?

One approach to avoid overfitting is:
    Remove validation examples from the training examples.
    Train the neural network using the training examples.
    Choose the weights that are best on the validation set.

Faster algorithms might rely on:
    Momentum, a running average of deltas.
    Conjugate gradient, a second-derivative method.
    RPROP, which updates based only on the sign of the derivative.
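The validation-set approach to avoiding overfitting can be sketched as follows. To keep the example short, a single linear unit trained with the LMS rule stands in for a full network; that substitution, the function names, the 25% validation split, and the data set are all assumptions of this sketch.

```python
import random


def predict(w0, w, x):
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


def squared_error(w0, w, examples):
    return sum((y - predict(w0, w, x)) ** 2 for x, y in examples)


def train_with_validation(examples, alpha=0.1, epochs=200, val_fraction=0.25, seed=0):
    """Train on a subset and return (val_error, w0, w) for the weights that
    scored best on the held-out validation examples."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    # Step 1: remove validation examples from the training examples.
    val, train = shuffled[:n_val], shuffled[n_val:]
    n = len(train[0][0])
    w0, w = 0.0, [0.0] * n
    best = (squared_error(w0, w, val), w0, list(w))
    for _ in range(epochs):
        # Step 2: train using only the training examples.
        for x, y in train:
            diff = y - predict(w0, w, x)
            w0 += alpha * diff
            w = [wj + alpha * xj * diff for wj, xj in zip(w, x)]
        # Step 3: remember the weights that are best on the validation set.
        val_err = squared_error(w0, w, val)
        if val_err < best[0]:
            best = (val_err, w0, list(w))
    return best


# Hypothetical noise-free data from y = 3*x - 1 on 20 evenly spaced points.
DATA = [([i / 19.0], 3.0 * (i / 19.0) - 1.0) for i in range(20)]
best_err, best_w0, best_w = train_with_validation(DATA)
```

On noisy data the best validation error usually occurs before the final epoch, which is why the weights are snapshotted after every epoch instead of only at the end.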