Neural Networks - I
Henrik I Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
hic@cc.gatech.edu
Henrik I Christensen (RIM@GT) Neural Networks 1 / 27
Outline
1 Introduction
2 Neural Networks - Architecture
3 Network Training
4 Small Example - ZIP Codes
5 Summary
Introduction
- Initial motivation for the design came from modelling of neural systems
- Perceptrons emerged at about the same time as real neural data became available
- Studies of functional specialization in the brain
Neurons - the motivation
Neural Code Example
Outline
- Outline of the ANN architecture
- Formulation of the criterion function
- Optimization of weights
- Example from image analysis
- Next time: Bayesian Neural Networks
Neural Networks - Architecture
Data Processing with a Two-Layer Neural Network
(figure: data flow through the network - linear combination w^T x, hidden nonlinearity h(·), linear combination w^T z, output nonlinearity σ(·))
Neural Net Architecture as a Graph
(figure: two-layer network graph - inputs x_1 … x_D plus bias x_0, hidden units z_1 … z_M plus bias z_0, outputs y_1 … y_K; first-layer weights w^(1)_MD, second-layer weights w^(2)_KM, and bias weight w^(2)_10)
Neural Network Equations
Consider an input layer:
$$a_j = \sum_{i=0}^{D} w^{(1)}_{ji} x_i$$
where $w_{j0}$ and $x_0$ represent the bias weight and bias term, respectively.
Each activation $a_j$ is mapped by an activation function,
$$z_j = h(a_j)$$
where $h(\cdot)$ is typically a sigmoid or tanh. The outputs $z_j$ are referred to as the hidden activations.
Output-unit activations are computed similarly:
$$a_k = \sum_{j=0}^{M} w^{(2)}_{kj} z_j$$
Neural Networks - A few more details
The full system is then
$$y_k(\mathbf{x}, \mathbf{w}) = \sigma\left( \sum_{j=0}^{M} w^{(2)}_{kj} \, h\left( \sum_{i=0}^{D} w^{(1)}_{ji} x_i \right) \right)$$
Information flows forward through the system.
Naming is sometimes complicated! The same network may be called:
- a 3-layer network (counting input, hidden, and output layers)
- a single-hidden-layer network
- a two-layer network (counting the layers of weights between input and output)
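The forward pass in the equation above can be sketched in a few lines of NumPy. This is a sketch only; the tanh hidden units, logistic-sigmoid output, and the tiny sizes (D=2, M=3, K=1) are illustrative assumptions:

```python
import numpy as np

def forward(x, W1, W2):
    """Two-layer network: tanh hidden units, logistic-sigmoid outputs.
    W1 is (M, D+1) and W2 is (K, M+1); column 0 holds the bias weights."""
    x = np.concatenate(([1.0], x))            # prepend bias input x_0 = 1
    a = W1 @ x                                # hidden activations a_j
    z = np.concatenate(([1.0], np.tanh(a)))   # hidden outputs z_j, with bias z_0 = 1
    return 1.0 / (1.0 + np.exp(-(W2 @ z)))    # output y_k = sigma(a_k)

# Tiny example: D=2 inputs, M=3 hidden units, K=1 output
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(3, 3))
W2 = rng.normal(scale=0.1, size=(1, 4))
y = forward(np.array([0.5, -0.2]), W1, W2)
```

The bias weights occupy column 0 of each weight matrix, matching the convention of summing from index 0 with $x_0 = 1$ and $z_0 = 1$.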
Network Training
Training Neural Networks
For optimization we consider the error function
$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} \left\| y(\mathbf{x}_n, \mathbf{w}) - \mathbf{t}_n \right\|^2$$
The optimization is similar to earlier searches, with objective $\nabla E(\mathbf{w}) = 0$.
Due to the non-linearity, a closed-form solution is a challenge.
Newton-Raphson type solutions are possible:
$$\Delta\mathbf{w} = -\mathbf{H}^{-1} \nabla E$$
Often an iterative solution is more realistic:
$$\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E(\mathbf{w}^{(\tau)})$$
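The iterative rule $\mathbf{w}^{(\tau+1)} = \mathbf{w}^{(\tau)} - \eta \nabla E(\mathbf{w}^{(\tau)})$ can be illustrated on a problem whose answer is known in closed form. This is a sketch; the matrix A, the targets t, the step size, and the iteration count are all arbitrary choices:

```python
import numpy as np

# Minimise E(w) = ||A w - t||^2 with the rule w <- w - eta * grad E(w).
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
t = np.array([1.0, 2.0, 3.0])

w = np.zeros(2)
eta = 0.01
for _ in range(2000):
    grad = 2.0 * A.T @ (A @ w - t)   # gradient of the squared error
    w = w - eta * grad               # w^(tau+1) = w^(tau) - eta * grad E
```

With a small enough step size the iterates approach the least-squares solution, here w = (0, 0.5). For a neural network the same update rule applies; only the gradient computation changes, which is exactly what backpropagation provides.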
Error Backpropagation
Consider the error composed of per-sample parts:
$$E(\mathbf{w}) = \sum_{n=1}^{N} E_n(\mathbf{w})$$
Considering the errors by parts, for a single linear layer
$$y_k = \sum_i w_{ki} x_i$$
with the error
$$E_n = \frac{1}{2} \sum_k (y_{nk} - t_{nk})^2$$
the associated gradient is
$$\frac{\partial E_n}{\partial w_{ji}} = (y_{nj} - t_{nj})\, x_{ni}$$
Computing Gradients
Given
$$a_j = \sum_i w_{ji} z_i \qquad \text{and} \qquad z_j = h(a_j)$$
the gradient is (using the chain rule)
$$\frac{\partial E_n}{\partial w_{ji}} = \frac{\partial E_n}{\partial a_j} \frac{\partial a_j}{\partial w_{ji}}$$
We already know, for the output units,
$$\delta_j \equiv \frac{\partial E_n}{\partial a_j} = y_j - t_j$$
and
$$\frac{\partial a_j}{\partial w_{ji}} = z_i$$
Updating of Weights
Updating proceeds backwards through the system.
(figure: error propagation - forward weights $w_{ji}$, $w_{kj}$ carry activations $z_i$, $z_j$ forward, while errors $\delta_k$ flow back to $\delta_j$)
Error propagation:
$$\delta_j = h'(a_j) \sum_k w_{kj}\, \delta_k$$
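The recursion $\delta_j = h'(a_j) \sum_k w_{kj}\delta_k$ can be checked numerically. The sketch below (bias terms omitted for brevity; tanh hidden units, linear output units, and all sizes chosen arbitrarily) compares the backpropagated gradients with finite differences:

```python
import numpy as np

# For one sample: delta_k = y_k - t_k at the (linear) outputs, then
# delta_j = h'(a_j) * sum_k w_kj delta_k at the tanh hidden units.
rng = np.random.default_rng(1)
D, M, K = 2, 3, 2
W1 = rng.normal(scale=0.5, size=(M, D))
W2 = rng.normal(scale=0.5, size=(K, M))
x = np.array([0.3, -0.7])
t = np.array([1.0, 0.0])

def gradients(W1, W2, x, t):
    a = W1 @ x
    z = np.tanh(a)
    y = W2 @ z                                 # linear output units
    delta_k = y - t                            # output deltas
    delta_j = (1 - z**2) * (W2.T @ delta_k)    # h'(a) = 1 - tanh(a)^2
    return np.outer(delta_j, x), np.outer(delta_k, z)  # dE/dW1, dE/dW2

def error(W1, W2, x, t):
    return 0.5 * np.sum((W2 @ np.tanh(W1 @ x) - t) ** 2)

g1, g2 = gradients(W1, W2, x, t)

# Finite-difference check of dE/dW1
eps = 1e-6
num = np.zeros_like(W1)
for j in range(M):
    for i in range(D):
        Wp = W1.copy(); Wp[j, i] += eps
        Wm = W1.copy(); Wm[j, i] -= eps
        num[j, i] = (error(Wp, W2, x, t) - error(Wm, W2, x, t)) / (2 * eps)
```

Agreement with the finite-difference gradient is a standard sanity check for any backpropagation implementation.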
Update Algorithm
1 Enter a training sample x_n; propagate it forward and compare the output y(x_n) to the expected value t_n
2 Evaluate δ_k at all outputs
3 Backpropagate the δ's to obtain the corrections for the hidden-unit weights
4 Evaluate the derivatives to correct the input-level weights
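The four steps above can be sketched as an online training loop. This is a sketch on a hypothetical toy regression problem; the sine target, network size, learning rate, and epoch count are all illustrative assumptions:

```python
import numpy as np

# Toy 1D regression: forward propagate, evaluate output delta,
# backpropagate, update weights -- one sample at a time.
rng = np.random.default_rng(2)
X = np.linspace(-1, 1, 20)
T = np.sin(np.pi * X)                       # hypothetical targets

M, eta = 5, 0.05
W1 = rng.normal(scale=0.5, size=(M, 2))     # column 0: bias weights
W2 = rng.normal(scale=0.5, size=(1, M + 1))

def sse(W1, W2):
    e = 0.0
    for x, t in zip(X, T):
        z = np.concatenate(([1.0], np.tanh(W1 @ np.array([1.0, x]))))
        e += 0.5 * (W2 @ z - t)[0] ** 2
    return e

e0 = sse(W1, W2)
for _ in range(200):
    for x, t in zip(X, T):
        xv = np.array([1.0, x])
        a = W1 @ xv
        z = np.concatenate(([1.0], np.tanh(a)))
        y = (W2 @ z)[0]                              # step 1: forward propagate
        dk = y - t                                   # step 2: output delta
        dj = (1 - np.tanh(a) ** 2) * (W2[0, 1:] * dk)  # step 3: backpropagate
        W2 -= eta * dk * z                           # step 4: update weights
        W1 -= eta * np.outer(dj, xv)
e1 = sse(W1, W2)
```

Each pass applies steps 1-4 to one sample at a time; the summed squared error drops from its value at the random initialization.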
Issues related to training of networks
- The sigmoid is approximately linear around 0, so small random initial values around 0 are a good start
- Be aware that training a network too long can result in overfitting
- There can be multiple hidden layers
Small Example - ZIP Codes
Small Example
- From (Le Cun, 1989), on the state of the art of ANNs for recognition
- Recognition of handwritten characters has been widely studied
- Still considered an important benchmark for new recognition methods
ZIP Code Data
- Data normalized to 16x16 pixels
- 320 digits in the training set and 160 digits in the test set
Different types of networks
- No hidden layer - pure one-level regression
- One hidden layer with 12 hidden units, fully connected
- Two hidden layers with local connectivity
- Two hidden layers, locally connected, with weight sharing
- Two hidden layers, locally connected, with two-level weight sharing
Example - Net Architectures
Example Results
Example - Summary
- Careful design of network architectures is important
- Neural networks offer a rich variety of solutions
- Later results have shown improved performance with SVMs
Summary
Summary
- Neural networks are general approximators
- Useful both for regression and discrimination
- Some would term them "self-parameterized lookup tables"
- There is a rich community engaged in the design of such systems
- A rich variety of optimization techniques is available