Artificial Neural Network
Eung Je Woo
Department of Biomedical Engineering, Impedance Imaging Research Center (IIRC)
Kyung Hee University, Korea
ejwoo@khu.ac.kr
Neuron and Neuron Model

McCulloch and Pitts (1943):

$y_k = \varphi(v_k) = \varphi\left( \sum_{j=0}^{m} w_{kj} x_j \right)$, with $x_0 = 1$ and $w_{k0} = b_k$ (bias).
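Below is a minimal NumPy sketch of this neuron model; the threshold activation and the weight, bias, and input values are illustrative assumptions, not values from the slides.

```python
import numpy as np

def neuron(x, w, b, phi):
    """Single neuron: y = phi(v), with induced local field v = w^T x + b."""
    v = np.dot(w, x) + b           # v_k = sum_j w_kj x_j, bias b_k = w_k0 (x_0 = 1)
    return phi(v)                  # output y_k = phi(v_k)

# Illustrative example with a threshold (Heaviside) activation.
phi = lambda v: 1.0 if v >= 0 else 0.0
x = np.array([0.5, -1.2, 0.3])     # hypothetical input
w = np.array([0.4, 0.1, -0.7])     # hypothetical weights
b = 0.2                            # hypothetical bias
print(neuron(x, w, b, phi))
```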
Activation Function

- Threshold function
  - Heaviside function: $\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$
  - Signum function: $\varphi(v) = \begin{cases} 1 & \text{if } v > 0 \\ 0 & \text{if } v = 0 \\ -1 & \text{if } v < 0 \end{cases}$
- S-shaped function
  - Sigmoid function: $\varphi(v) = \dfrac{1}{1 + e^{-av}}$
  - Hyperbolic tangent function: $\varphi(v) = \tanh(v)$
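For reference, a minimal sketch of the four activation functions above as NumPy functions; the sigmoid slope parameter a defaults to 1, an illustrative choice.

```python
import numpy as np

def threshold(v):                  # Heaviside: 1 if v >= 0, else 0
    return np.where(v >= 0, 1.0, 0.0)

def signum(v):                     # 1 if v > 0, 0 if v == 0, -1 if v < 0
    return np.sign(v)

def sigmoid(v, a=1.0):             # logistic: 1 / (1 + exp(-a v))
    return 1.0 / (1.0 + np.exp(-a * v))

def tanh_act(v):                   # hyperbolic tangent, range (-1, 1)
    return np.tanh(v)

v = np.linspace(-3, 3, 7)
print(threshold(v), signum(v), sigmoid(v), tanh_act(v))
```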
Activation Function

(Figure: activation function shapes, from M. Hagan et al., 2017, Neural Network Design)
Representation of Neuron

(Figures from M. Hagan et al., 2017, Neural Network Design: signal-flow graph, architectural graph, architectural diagram)
Feedforward Network

- Single-layer feedforward network
- Multilayer feedforward network, e.g., (m-h-q) = (10-4-2): 10 inputs, 4 hidden neurons, 2 outputs (see the sketch below)
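A minimal sketch of a forward pass through a fully connected (10-4-2) network, assuming tanh activations and randomly initialized weights; both choices are illustrative, not prescribed by the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# (m-h-q) = (10-4-2): 10 inputs, 4 hidden neurons, 2 outputs.
# Random weights are placeholders; a trained network would learn them.
W1, b1 = rng.standard_normal((4, 10)), np.zeros(4)   # input -> hidden
W2, b2 = rng.standard_normal((2, 4)),  np.zeros(2)   # hidden -> output

def forward(x, phi=np.tanh):
    """Forward pass of a multilayer feedforward network."""
    h = phi(W1 @ x + b1)           # hidden-layer activations
    y = phi(W2 @ h + b2)           # output-layer activations
    return y

x = rng.standard_normal(10)        # hypothetical input
print(forward(x))                  # 2-dimensional output
```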
Feedforward Network

(Figures from M. Hagan et al., 2017, Neural Network Design: single-layer and multilayer feedforward networks)
Network using Delay

(Figures from M. Hagan et al., 2017, Neural Network Design: delay element, recurrent network, Hamming network, Hopfield network)
Recurrent Network

- Recurrent network with no self-feedback loops and no hidden neurons
- Recurrent network with hidden neurons
Basic Setting

Knowledge refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world. It comes from two sources:

- The known world state, represented by facts about what is and what has been known; this form of knowledge is referred to as prior information.
- Observations (noisy measurements) of the world, obtained by means of sensors designed to probe the environment in which the neural network is supposed to operate. The observations provide the pool of information from which the examples used to train the neural network are drawn.
  - Labeled examples: pairs of (input signal, target output), i.e., training samples
  - Unlabeled examples

Three steps of machine learning:
- Training (learning)
- Testing
- Generalization
Neural Network Rules

- Rule 1. Similar inputs (i.e., patterns drawn from similar classes) should usually produce similar representations inside the network, and should therefore be classified as belonging to the same class.
- Rule 2. Items to be categorized as separate classes should be given widely different representations in the network.
- Rule 3. If a particular feature is important, then there should be a large number of neurons involved in the representation of that item in the network.
- Rule 4. Prior information and invariances should be built into the design of a neural network whenever they are available, so as to simplify the network design by not having to learn them.
Build Prior Information

- Receptive field
- Weight sharing (see the sketch below):

$v_j = \sum_{i=1}^{6} w_i x_{i+j-1}$ for $j = 1, 2, 3, 4$
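A minimal sketch of the weight-sharing sum above: the same six weights slide across the input, which is exactly a 1-D correlation over local receptive fields. The input and weight values are illustrative.

```python
import numpy as np

# Weight sharing: every local receptive field uses the same six weights w_1..w_6,
# so v_j = sum_{i=1}^{6} w_i x_{i+j-1} for j = 1..4 is a sliding dot product.
# An input of length 9 gives four receptive-field positions.
x = np.arange(1.0, 10.0)           # x_1 .. x_9 (illustrative values)
w = np.ones(6) / 6.0               # shared weights (illustrative)

v = np.array([w @ x[j:j + 6] for j in range(4)])
print(v)                           # v_1 .. v_4

# Equivalent via NumPy's correlation in 'valid' mode:
print(np.correlate(x, w, mode='valid'))
```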
Build Invariance

The system must be capable of coping with a range of transformations of the observed signal, such as image rotations, signal amplitude changes, etc.

- Invariance by structure: using the same weights for chosen connections
- Invariance by training: using a large set of examples including all possible transformations
- Invariance by feature space: using preprocessing to extract invariant features
Perceptron

Rosenblatt (1958):

$y = \varphi(v) = \varphi(\mathbf{w}^T \mathbf{x}) = \mathrm{sgn}(\mathbf{w}^T \mathbf{x}) = \begin{cases} 1 & \text{if } \mathbf{w}^T \mathbf{x} > 0 \\ -1 & \text{if } \mathbf{w}^T \mathbf{x} \le 0 \end{cases}$

$\mathbf{w}^T \mathbf{x} = 0$: hyperplane in m-dimensional signal space

Linear classifier (figure: linearly separable vs. non-linearly separable cases)
Perceptron Convergence Algorithm

(Figure: block diagram with input x(n), weights w(n), output y(n), desired response d(n), and error e(n))
Error-correction Learning

- Quantized response: $y(n) = \mathrm{sgn}\left( \mathbf{w}^T(n)\, \mathbf{x}(n) \right)$
- Quantized desired response: $d(n) = \begin{cases} +1 & \text{if } \mathbf{x}(n) \text{ belongs to class } C_1 \\ -1 & \text{if } \mathbf{x}(n) \text{ belongs to class } C_2 \end{cases}$
- Error signal: $e(n) = d(n) - y(n)$
- Error-correction learning rule (see the sketch below): $\mathbf{w}(n+1) = \mathbf{w}(n) + \eta \, e(n) \, \mathbf{x}(n)$
- Learning-rate parameter $0 < \eta \le 1$: small $\eta$ for more averaging and stable weight estimates; large $\eta$ for fast adaptation
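A minimal sketch of the error-correction rule above, assuming the bias is folded in as $w_0$ with $x_0 = 1$; the data set, learning rate, and epoch count are illustrative.

```python
import numpy as np

def train_perceptron(X, d, eta=0.1, epochs=50):
    """Online error-correction rule: w(n+1) = w(n) + eta * e(n) * x(n),
    with e(n) = d(n) - y(n) and y(n) = sgn(w^T x). Bias folded in as x_0 = 1."""
    Xb = np.hstack([np.ones((len(X), 1)), X])        # prepend x_0 = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for x, target in zip(Xb, d):
            y = 1.0 if w @ x > 0 else -1.0           # quantized response
            w += eta * (target - y) * x              # error-correction update
    return w

# Illustrative linearly separable data: class C1 (d = +1) vs. C2 (d = -1).
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
d = np.array([1.0, 1.0, -1.0, -1.0])
print(train_perceptron(X, d))
```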
Batch Perceptron Algorithm

- Perceptron cost function, over the set $H$ of misclassified samples:

  $J(\mathbf{w}) = \sum_{\mathbf{x}(n) \in H} \left( -\mathbf{w}^T \mathbf{x}(n) \, d(n) \right)$

- Gradient vector, with $\nabla = \left[ \dfrac{\partial}{\partial w_1}, \dfrac{\partial}{\partial w_2}, \ldots, \dfrac{\partial}{\partial w_m} \right]^T$:

  $\nabla J(\mathbf{w}) = \sum_{\mathbf{x}(n) \in H} \left( -\mathbf{x}(n) \, d(n) \right)$

- Steepest descent to minimize $J(\mathbf{w})$:

  $\mathbf{w}(n+1) = \mathbf{w}(n) - \eta(n) \nabla J(\mathbf{w}) = \mathbf{w}(n) + \eta(n) \sum_{\mathbf{x}(n) \in H} \mathbf{x}(n) \, d(n)$
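For comparison with the online rule, a minimal sketch of the batch update above; the stopping criterion (no misclassified samples remain) is an assumption valid for the linearly separable case.

```python
import numpy as np

def batch_perceptron(X, d, eta=0.1, epochs=100):
    """Batch rule: accumulate the gradient over the misclassified set H,
    then update w(n+1) = w(n) + eta * sum_{x(n) in H} x(n) d(n)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])        # bias term x_0 = 1
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        misclassified = (np.sign(Xb @ w) != d)       # the set H
        if not misclassified.any():
            break                                    # converged: H is empty
        w += eta * (Xb[misclassified] * d[misclassified, None]).sum(axis=0)
    return w
```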
Bayes Classifier

Average risk, with decision regions $H_1$ and $H_2$ ($H = H_1 + H_2$), priors $p_1, p_2$, and costs $c_{ij}$ (cost of deciding class $C_i$ when class $C_j$ is true), where $c_{11} < c_{21}$ and $c_{22} < c_{12}$:

$R = c_{11} p_1 \int_{H_1} p_X(\mathbf{x}|C_1)\,d\mathbf{x} + c_{22} p_2 \int_{H_2} p_X(\mathbf{x}|C_2)\,d\mathbf{x} + c_{21} p_1 \int_{H_2} p_X(\mathbf{x}|C_1)\,d\mathbf{x} + c_{12} p_2 \int_{H_1} p_X(\mathbf{x}|C_2)\,d\mathbf{x}$

Using $H_2 = H - H_1$ and $\int_H p_X(\mathbf{x}|C_i)\,d\mathbf{x} = 1$:

$R = c_{21} p_1 + c_{22} p_2 + \int_{H_1} \left[ p_2 (c_{12} - c_{22})\, p_X(\mathbf{x}|C_2) - p_1 (c_{21} - c_{11})\, p_X(\mathbf{x}|C_1) \right] d\mathbf{x}$

Bayes classifier: if $p_1 (c_{21} - c_{11})\, p_X(\mathbf{x}|C_1) > p_2 (c_{12} - c_{22})\, p_X(\mathbf{x}|C_2)$, assign $\mathbf{x}$ to $H_1$ (class $C_1$); otherwise, assign $\mathbf{x}$ to $H_2$ (class $C_2$).

Equivalently, with likelihood ratio $\Lambda(\mathbf{x}) = \dfrac{p_X(\mathbf{x}|C_1)}{p_X(\mathbf{x}|C_2)}$ and threshold $\xi = \dfrac{p_2 (c_{12} - c_{22})}{p_1 (c_{21} - c_{11})}$: if $\Lambda(\mathbf{x}) > \xi$, assign $\mathbf{x}$ to class $C_1$; otherwise, class $C_2$.

Log-likelihood ratio: $\log \Lambda(\mathbf{x}) = \log \dfrac{p_X(\mathbf{x}|C_1)}{p_X(\mathbf{x}|C_2)}$; log-threshold: $\log \xi = \log \dfrac{p_2 (c_{12} - c_{22})}{p_1 (c_{21} - c_{11})}$
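A minimal sketch of the log-likelihood-ratio test above for two hypothetical 1-D Gaussian class-conditional densities; the means, common variance, priors, and 0/1 costs are all assumptions for the example (uses scipy.stats.norm).

```python
import numpy as np
from scipy.stats import norm

# Assumed priors and costs (0/1 loss): c11 = c22 = 0, c21 = c12 = 1.
p1, p2 = 0.5, 0.5
c11, c22, c21, c12 = 0.0, 0.0, 1.0, 1.0

xi = (p2 * (c12 - c22)) / (p1 * (c21 - c11))          # threshold xi
log_xi = np.log(xi)                                   # log-threshold

def classify(x, mu1=0.0, mu2=2.0, sigma=1.0):
    """Assign x to C1 iff log Lambda(x) = log p(x|C1) - log p(x|C2) > log xi.
    The Gaussian parameters are illustrative assumptions."""
    log_lambda = norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu2, sigma)
    return 1 if log_lambda > log_xi else 2

print([classify(x) for x in (-1.0, 0.5, 1.0, 3.0)])   # class labels 1 or 2
```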
Bayes Classifier for Gaussian

(Figure: Gaussian classifier)
Multilayer Perceptron
Backpropagation Learning
Backpropagation Learning

Output layer, at the $n$th iteration, $j$th neuron ($j \in C$, the set of output neurons):

- Neural network: $v_j(n) = \sum_{i=0}^{M} w_{ji}(n)\, y_i(n)$, $y_j(n) = \varphi_j(v_j(n))$
- Error and objective function: $e_j(n) = d_j(n) - y_j(n)$, $E(n) = \frac{1}{2} \sum_{j \in C} e_j^2(n)$
- Chain rule:

  $\dfrac{\partial E(n)}{\partial w_{ji}(n)} = \dfrac{\partial E(n)}{\partial e_j(n)} \dfrac{\partial e_j(n)}{\partial y_j(n)} \dfrac{\partial y_j(n)}{\partial v_j(n)} \dfrac{\partial v_j(n)}{\partial w_{ji}(n)} = e_j(n) \cdot (-1) \cdot \varphi_j'(v_j(n)) \cdot y_i(n) = -e_j(n)\, \varphi_j'(v_j(n))\, y_i(n)$

- Local gradient: $\delta_j(n) = -\dfrac{\partial E(n)}{\partial v_j(n)} = e_j(n)\, \varphi_j'(v_j(n))$
- LMS (delta-rule) update: $\Delta w_{ji}(n) = -\eta \dfrac{\partial E(n)}{\partial w_{ji}(n)} = \eta\, \delta_j(n)\, y_i(n)$
Backpropagation Learning

Hidden layer, at the $n$th iteration, $j$th neuron (with $k$ indexing neurons in the next layer):

- Local gradient: $\delta_j(n) = -\dfrac{\partial E(n)}{\partial v_j(n)} = -\dfrac{\partial E(n)}{\partial y_j(n)} \dfrac{\partial y_j(n)}{\partial v_j(n)} = -\dfrac{\partial E(n)}{\partial y_j(n)}\, \varphi_j'(v_j(n))$
- With $E(n) = \frac{1}{2} \sum_{k \in C} e_k^2(n)$:

  $\dfrac{\partial E(n)}{\partial y_j(n)} = \sum_k e_k(n) \dfrac{\partial e_k(n)}{\partial y_j(n)} = \sum_k e_k(n) \dfrac{\partial e_k(n)}{\partial v_k(n)} \dfrac{\partial v_k(n)}{\partial y_j(n)}$

- Neural network: $e_k(n) = d_k(n) - y_k(n) = d_k(n) - \varphi_k(v_k(n))$, so $\dfrac{\partial e_k(n)}{\partial v_k(n)} = -\varphi_k'(v_k(n))$
- Neural network: $v_k(n) = \sum_{j=0}^{M} w_{kj}(n)\, y_j(n)$, so $\dfrac{\partial v_k(n)}{\partial y_j(n)} = w_{kj}(n)$
- Error backpropagation:

  $\dfrac{\partial E(n)}{\partial y_j(n)} = -\sum_k e_k(n)\, \varphi_k'(v_k(n))\, w_{kj}(n) = -\sum_k \delta_k(n)\, w_{kj}(n)$

  $\delta_j(n) = \varphi_j'(v_j(n)) \sum_k \delta_k(n)\, w_{kj}(n)$

- LMS (delta-rule) update: $\Delta w_{ji}(n) = \eta\, \delta_j(n)\, y_i(n)$
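A minimal sketch combining the output-layer and hidden-layer delta rules above for a single hidden layer, assuming tanh activations (so $\varphi'(v) = 1 - \tanh^2 v$) and omitting biases for brevity; the network sizes and training data are illustrative.

```python
import numpy as np

def backprop_step(x, d, W1, W2, eta=0.1):
    """One iteration of the delta rule for a 1-hidden-layer network with
    tanh activations. Updates W1 and W2 in place; returns E(n)."""
    # Forward pass
    y1 = np.tanh(W1 @ x)                             # hidden layer
    y2 = np.tanh(W2 @ y1)                            # output layer
    e = d - y2                                       # e_k(n) = d_k(n) - y_k(n)
    # Local gradients (phi'(v) = 1 - tanh(v)^2)
    delta2 = e * (1 - y2**2)                         # delta_k = e_k phi'(v_k)
    delta1 = (W2.T @ delta2) * (1 - y1**2)           # delta_j = phi'(v_j) sum_k delta_k w_kj
    # Weight updates: Delta w_ji = eta * delta_j * y_i
    W2 += eta * np.outer(delta2, y1)
    W1 += eta * np.outer(delta1, x)
    return 0.5 * np.sum(e**2)                        # E(n)

# Illustrative usage: a tiny 2-3-1 network fit to one sample.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2)) * 0.5               # 2 inputs -> 3 hidden
W2 = rng.standard_normal((1, 3)) * 0.5               # 3 hidden -> 1 output
for _ in range(200):
    E = backprop_step(np.array([0.5, -0.3]), np.array([0.8]), W1, W2)
print(E)                                             # E(n) decreases over iterations
```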
Output Representation (M-class Classification)

- The ANN is trained using the training data set $\{(\mathbf{x}[i], \mathbf{d}[i])\}_{i=1}^{N}$.
- For a new input $\mathbf{x}_j$, the ANN produces the output

  $\mathbf{y}_j = \left[ y_{1,j}, \ldots, y_{M,j} \right]^T = \left[ F_1(\mathbf{x}_j), \ldots, F_M(\mathbf{x}_j) \right]^T = F(\mathbf{x}_j)$

- Assign the input to a single class: $\mathbf{x}_j \in C_k$ if $F_k(\mathbf{x}_j) > F_l(\mathbf{x}_j)$ for all $l \ne k$
- Assign the input to multiple classes: $\mathbf{x}_j \in C_k$ if $F_k(\mathbf{x}_j) \ge$ threshold (e.g., 0.5)
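A minimal sketch of both decision rules, with illustrative output values.

```python
import numpy as np

# Single-class assignment: pick the k maximizing F_k(x_j).
# Multiple-class assignment: keep every k with F_k(x_j) above a threshold.
y = np.array([0.1, 0.7, 0.6])      # illustrative network outputs F_1..F_3

single = int(np.argmax(y)) + 1                         # class C_2 here
multi = [k + 1 for k, f in enumerate(y) if f >= 0.5]   # classes C_2 and C_3
print(single, multi)
```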
Generalization

An ANN generalizes well when its input-output mapping is correct for test data never used in creating or training the network.

- Apply cross-validation (see the sketch below)
  - Proper size of the training data set
  - Proper size of the validation data set
- Avoid overfitting
  - Trade-off between bias and variance
  - Adjust the ANN architecture
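A minimal sketch of a k-fold cross-validation split, in which each fold serves once as validation data while the rest is used for training; the fold count, sample count, and seed are illustrative.

```python
import numpy as np

def kfold_indices(n, k=5, seed=0):
    """Yield (train, validation) index arrays for k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n)   # shuffle sample indices
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]                                 # fold i validates
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

for train, val in kfold_indices(10, k=5):
    print(len(train), len(val))    # 8 training, 2 validation samples per fold
```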
Multilayer Perceptron

(Figure: multilayer perceptron, from T. Hastie et al., 2008, The Elements of Statistical Learning)
Multilayer Perceptron

(Figure from T. Hastie et al., 2008, The Elements of Statistical Learning; variable labels shown include obesity, systolic blood pressure, corpus callosum, and age)
Algorithm Selection

(Figures from T. Hastie et al., 2008, The Elements of Statistical Learning: linear regression of 0/1 response, 15-nearest-neighbor classifier, 1-nearest-neighbor classifier, Bayes optimal classifier)