Neural networks - Chapter 20, Section 5


1 Neural networks
2 Outline
- Brains
- Neural networks
- Perceptrons
- Multilayer perceptrons
- Applications of neural networks
3 Brains
- 10^11 neurons of > 20 types, 10^14 synapses, 1ms-10ms cycle time
- Signals are noisy "spike trains" of electrical potential
[Figure: anatomy of a neuron - cell body (soma), nucleus, dendrites, axon, axonal arborization, and synapses to/from other cells]
4 McCulloch-Pitts unit
Output is a "squashed" linear function of the inputs:

    a_i <- g(in_i) = g( Σ_j W_{j,i} a_j )

[Figure: unit diagram - input links a_j with weights W_{j,i}, a fixed bias input a_0 = -1 with bias weight W_{0,i}, input function Σ giving in_i, activation function g, output a_i, output links]
A gross oversimplification of real neurons, but its purpose is to develop understanding of what networks of simple units can do
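The unit above can be sketched in a few lines of Python (an illustrative sketch, not part of the slides; the function names are invented, and the a_0 = -1 bias convention from the diagram is assumed):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs, g=sigmoid):
    """One McCulloch-Pitts unit: a squashed linear combination of the inputs.
    weights[0] is the bias weight W_{0,i}, paired with the fixed input a_0 = -1."""
    in_i = sum(W_ji * a_j for W_ji, a_j in zip(weights, [-1.0] + list(inputs)))
    return g(in_i)

# bias weight 0.5, two input weights of 1.0, inputs (0, 1) => in_i = 0.5
a_i = unit_output([0.5, 1.0, 1.0], [0.0, 1.0])
```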
5 Activation functions
[Figure: g(in_i) plotted against in_i for (a) and (b)]
(a) is a step function or threshold function
(b) is a sigmoid function 1/(1 + e^{-x})
Changing the bias weight W_{0,i} moves the threshold location
6 Implementing logical functions

    AND: W_0 = 1.5,  W_1 = W_2 = 1
    OR:  W_0 = 0.5,  W_1 = W_2 = 1
    NOT: W_0 = -0.5, W_1 = -1

McCulloch and Pitts: every Boolean function can be implemented
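These weight settings can be checked directly with step-function units (a small sketch assuming the a_0 = -1 bias-input convention; the helper names are invented):

```python
def threshold_unit(weights, inputs):
    """Step-function unit: fires (returns 1) iff Σ_j W_j a_j > 0,
    where weights[0] is the bias weight paired with fixed input a_0 = -1."""
    in_ = -weights[0] + sum(w * a for w, a in zip(weights[1:], inputs))
    return 1 if in_ > 0 else 0

AND = lambda x1, x2: threshold_unit([1.5, 1.0, 1.0], [x1, x2])
OR  = lambda x1, x2: threshold_unit([0.5, 1.0, 1.0], [x1, x2])
NOT = lambda x1:     threshold_unit([-0.5, -1.0], [x1])
```

Running all input combinations reproduces the usual truth tables for each gate.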
7 Network structures
Feedforward networks:
- single-layer perceptrons
- multi-layer perceptrons
Feedforward networks implement functions, have no internal state
Recurrent networks:
- Hopfield networks have symmetric weights (W_{i,j} = W_{j,i});
  g(x) = sign(x), a_i = ±1; holographic associative memory
- Boltzmann machines use stochastic activation functions, ≈ MCMC in Bayes nets
- recurrent neural nets have directed cycles with delays
  ⇒ have internal state (like flip-flops), can oscillate etc.
8 Feedforward example
[Figure: network with input units 1 and 2, hidden units 3 and 4, output unit 5, and weights W_{1,3}, W_{1,4}, W_{2,3}, W_{2,4}, W_{3,5}, W_{4,5}]
Feedforward network = a parameterized family of nonlinear functions:

    a_5 = g(W_{3,5} a_3 + W_{4,5} a_4)
        = g(W_{3,5} g(W_{1,3} a_1 + W_{2,3} a_2) + W_{4,5} g(W_{1,4} a_1 + W_{2,4} a_2))

Adjusting weights changes the function: do learning this way!
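The nested expression for a_5 can be evaluated directly (an illustrative sketch; the weight values are arbitrary, chosen only for demonstration, and bias weights are omitted as in the diagram):

```python
import math

def g(x):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))

def a5(a1, a2, W):
    """Output of the 2-input, 2-hidden-unit, 1-output network above."""
    a3 = g(W[(1, 3)] * a1 + W[(2, 3)] * a2)
    a4 = g(W[(1, 4)] * a1 + W[(2, 4)] * a2)
    return g(W[(3, 5)] * a3 + W[(4, 5)] * a4)

W = {(1, 3): 0.5, (2, 3): -0.5,
     (1, 4): -0.5, (2, 4): 0.5,
     (3, 5): 1.0, (4, 5): 1.0}
out = a5(1.0, 0.0, W)
```

Changing any entry of `W` changes the function computed, which is exactly what learning exploits.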
9 Single-layer perceptrons
[Figure: input units x_j connected directly to output units via weights W_{j,i}, and the resulting cliff-shaped output surface over x_1, x_2]
Output units all operate separately; no shared weights
Adjusting weights moves the location, orientation, and steepness of cliff
10 Expressiveness of perceptrons
Consider a perceptron with g = step function (Rosenblatt, 1957, 1960)
Can represent AND, OR, NOT, majority, etc., but not XOR
Represents a linear separator in input space:

    Σ_j W_j x_j > 0   or   W · x > 0

[Figure: (a) x_1 and x_2 and (b) x_1 or x_2 are linearly separable; (c) x_1 xor x_2 is not]
Minsky & Papert (1969) pricked the neural network balloon
11 Perceptron learning
Learn by adjusting weights to reduce error on training set
The squared error for an example with input x and true output y is

    E = (1/2) Err^2 ≡ (1/2) (y - h_W(x))^2

Perform optimization search by gradient descent:

    ∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j ( y - g(Σ_{j=0}^{n} W_j x_j) )
            = -Err × g'(in) × x_j

Simple weight update rule:

    W_j <- W_j + α × Err × g'(in) × x_j

E.g., +ve error ⇒ increase network output ⇒ increase weights on +ve inputs, decrease on -ve inputs
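The update rule can be sketched for a single sigmoid unit (illustrative code, not from the slides; here the bias is handled by prepending a fixed input of 1 rather than the a_0 = -1 convention, with the sign folded into the learned weight):

```python
import math

def g(x): return 1.0 / (1.0 + math.exp(-x))
def g_prime(x): return g(x) * (1.0 - g(x))

def train(examples, n_inputs, alpha=0.5, epochs=2000):
    """Gradient descent on squared error for one sigmoid unit."""
    W = [0.0] * (n_inputs + 1)          # W[0] is the bias weight
    for _ in range(epochs):
        for x, y in examples:
            xb = [1.0] + list(x)        # fixed bias input
            in_ = sum(w * xj for w, xj in zip(W, xb))
            err = y - g(in_)            # Err = y - h_W(x)
            for j, xj in enumerate(xb):
                W[j] += alpha * err * g_prime(in_) * xj
    return W

# OR is linearly separable, so the rule converges to a consistent function
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
W = train(data, 2)
```

A positive error increases the weights on positive inputs, as the slide notes.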
12 Perceptron learning contd.
Perceptron learning rule converges to a consistent function for any linearly separable data set
[Figure: learning curves (proportion correct on test set vs. training set size) for perceptron and decision tree, on majority-of-11-inputs data and on RESTAURANT data]
Perceptron learns majority function easily, DTL is hopeless
DTL learns restaurant function easily, perceptron cannot represent it
13 Multilayer perceptrons
Layers are usually fully connected; numbers of hidden units typically chosen by hand
[Figure: layered network with input units a_k, weights W_{k,j}, hidden units a_j, weights W_{j,i}, output units a_i]
14 Expressiveness of MLPs
All continuous functions w/ 2 layers, all functions w/ 3 layers
[Figure: h_W(x_1, x_2) surfaces over x_1, x_2 showing a ridge and a bump]
- Combine two opposite-facing threshold functions to make a ridge
- Combine two perpendicular ridges to make a bump
- Add bumps of various sizes and locations to fit any surface
Proof requires exponentially many hidden units (cf DTL proof)
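The ridge-and-bump construction can be checked numerically, with sigmoids standing in for the soft thresholds (an illustrative sketch; the steepness and offsets are arbitrary):

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def ridge(x, steep=10.0):
    """Two opposite-facing soft thresholds combined into a ridge over [-1, 1]."""
    return g(steep * (x + 1.0)) - g(steep * (x - 1.0))

def bump(x1, x2, steep=10.0):
    """Two perpendicular ridges combined and re-thresholded into a bump."""
    return g(steep * (ridge(x1) + ridge(x2) - 1.5))

high = bump(0.0, 0.0)   # near 1 at the centre of the bump
low = bump(3.0, 3.0)    # near 0 well outside it
```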
15 Backpropagation learning
Output layer: same as for single-layer perceptron,

    W_{j,i} <- W_{j,i} + α × a_j × Δ_i   where Δ_i = Err_i × g'(in_i)

Hidden layer: back-propagate the error from the output layer:

    Δ_j = g'(in_j) Σ_i W_{j,i} Δ_i

Update rule for weights in hidden layer:

    W_{k,j} <- W_{k,j} + α × a_k × Δ_j

(Most neuroscientists deny that backpropagation occurs in the brain)
16 Backpropagation derivation
The squared error on a single example is defined as

    E = (1/2) Σ_i (y_i - a_i)^2

where the sum is over the nodes in the output layer.

    ∂E/∂W_{j,i} = -(y_i - a_i) ∂a_i/∂W_{j,i} = -(y_i - a_i) ∂g(in_i)/∂W_{j,i}
                = -(y_i - a_i) g'(in_i) ∂in_i/∂W_{j,i}
                = -(y_i - a_i) g'(in_i) ∂/∂W_{j,i} ( Σ_j W_{j,i} a_j )
                = -(y_i - a_i) g'(in_i) a_j = -a_j Δ_i
17 Backpropagation derivation contd.

    ∂E/∂W_{k,j} = -Σ_i (y_i - a_i) ∂a_i/∂W_{k,j} = -Σ_i (y_i - a_i) ∂g(in_i)/∂W_{k,j}
                = -Σ_i (y_i - a_i) g'(in_i) ∂in_i/∂W_{k,j}
                = -Σ_i Δ_i ∂/∂W_{k,j} ( Σ_j W_{j,i} a_j )
                = -Σ_i Δ_i W_{j,i} ∂a_j/∂W_{k,j}
                = -Σ_i Δ_i W_{j,i} ∂g(in_j)/∂W_{k,j}
                = -Σ_i Δ_i W_{j,i} g'(in_j) ∂in_j/∂W_{k,j}
                = -Σ_i Δ_i W_{j,i} g'(in_j) ∂/∂W_{k,j} ( Σ_k W_{k,j} a_k )
                = -Σ_i Δ_i W_{j,i} g'(in_j) a_k = -a_k Δ_j
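The derived updates can be sketched for a network with one hidden layer and a single sigmoid output unit (illustrative code, not from the slides; a fixed input x[0] = 1 plays the role of the bias):

```python
import math, random

def g(x): return 1.0 / (1.0 + math.exp(-x))
def g_prime(x): return g(x) * (1.0 - g(x))

def forward(x, Wkj, Wji):
    """Forward pass; returns hidden in_j, a_j, output in_i, and output a_i."""
    in_j = [sum(Wkj[k][j] * x[k] for k in range(len(x))) for j in range(len(Wji))]
    a_j = [g(v) for v in in_j]
    in_i = sum(Wji[j] * a_j[j] for j in range(len(Wji)))
    return in_j, a_j, in_i, g(in_i)

def backprop_epoch(examples, Wkj, Wji, alpha=0.5):
    """One epoch of backpropagation. Wkj[k][j]: input k -> hidden j;
    Wji[j]: hidden j -> the single output unit."""
    for x, y in examples:
        in_j, a_j, in_i, a_i = forward(x, Wkj, Wji)
        delta_i = (y - a_i) * g_prime(in_i)                # output-layer Delta_i
        for j in range(len(Wji)):
            delta_j = g_prime(in_j[j]) * Wji[j] * delta_i  # back-propagated Delta_j
            Wji[j] += alpha * a_j[j] * delta_i
            for k in range(len(x)):
                Wkj[k][j] += alpha * x[k] * delta_j

random.seed(0)
# x[0] = 1 is the bias input; the target is OR of x[1], x[2]
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
Wkj = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(3)]
Wji = [random.uniform(-0.5, 0.5) for _ in range(2)]
for _ in range(2000):
    backprop_epoch(data, Wkj, Wji)
```

This follows the per-example (stochastic) form of the updates; the next slide's batch variant would instead sum the gradients over all examples before applying them.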
18 Backpropagation learning contd.
At each epoch, sum gradient updates for all examples and apply
Training curve for 100 restaurant examples: finds exact fit
[Figure: total error on training set vs. number of epochs, falling to zero]
Typical problems: slow convergence, local minima
19 Backpropagation learning contd.
Learning curve for MLP with 4 hidden units:
[Figure: proportion correct on test set vs. training set size for decision tree and multilayer network on RESTAURANT data]
MLPs are quite good for complex pattern recognition tasks, but resulting hypotheses cannot be understood easily
20 Handwritten digit recognition
- 3-nearest-neighbor = 2.4% error
- 400-300-10 unit MLP = 1.6% error
- LeNet: 768-192-30-10 unit MLP = 0.9% error
- Current best (kernel machines, vision algorithms) ≈ 0.6% error
21 Summary
- Most brains have lots of neurons; each neuron ≈ linear-threshold unit (?)
- Perceptrons (one-layer networks) insufficiently expressive
- Multi-layer networks are sufficiently expressive; can be trained by gradient descent, i.e., error backpropagation
- Many applications: speech, driving, handwriting, fraud detection, etc.
- Engineering, cognitive modelling, and neural system modelling subfields have largely diverged