Introduction to Neural Networks



1 Cuong Tuan Nguyen, Seiji Hotta, Masaki Nakagawa
Tokyo University of Agriculture and Technology
Copyright by Nguyen, Hotta and Nakagawa

2 Pattern classification
Question: which category does an input belong to?
Example: character recognition for input images.
Classifier: outputs the category of an input.
[Figure: input -> feature extraction (x_1, x_2, ..., x_n) -> classifier -> output category (a, b, c, ..., x, y, z)]

3 Supervised learning
Learning from a training dataset of <input, target> pairs.
Testing on an unseen dataset measures generalization ability.
[Figure: training dataset; each input image is paired with a target such as a, b, c]

4 Supervised learning
Learning from a training dataset of <input, target> pairs.
Testing on an unseen dataset measures generalization ability.
[Figure: learning -> classifier -> prediction; the output "a" is chosen from the categories a, b, c, ..., x, y, z]

5 Human neuron
[Figure: human neuron; source: "Neural Networks, A Simple Explanation"]

6 Artificial neuron
Inputs x_1, x_2, ..., x_n arrive over weighted connections with weights w_1, w_2, ..., w_n.
$net = \sum_{i=1}^{n} x_i w_i$
$y = f(net)$, where f is the activation function.
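The weighted sum and activation on this slide can be sketched in a few lines of plain Python. This is only an illustration: the function names `neuron` and `sigmoid` and the example numbers are assumptions, not taken from the slides.

```python
import math

def sigmoid(net):
    """Logistic activation, used here only as an example choice of f."""
    return 1.0 / (1.0 + math.exp(-net))

def neuron(inputs, weights, activation):
    """Compute y = f(net) with net = sum_i x_i * w_i."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net = sum(x * w for x, w in zip(inputs, weights))
    return activation(net)

# Hypothetical inputs and weights: net = 1*0.5 + 2*(-0.25) + 3*0 = 0
y = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.0], sigmoid)
print(y)
```

Because the weighted sum is zero for these values, the sigmoid output sits exactly at its midpoint.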

7 Activation function
Controls when a neuron should be activated.
Common choices: tanh, sigmoid, linear, ReLU, Leaky ReLU.
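The five activation functions named on this slide have standard definitions, sketched below in plain Python. The negative-side slope of 0.01 for Leaky ReLU is an assumed common default; the slides do not give a value.

```python
import math

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # slope is an assumed default for the negative side
    return x if x > 0 else slope * x

# Evaluate each function at a few sample points
for name, f in [("linear", linear), ("sigmoid", sigmoid), ("tanh", tanh),
                ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(name, [round(f(x), 4) for x in (-2.0, 0.0, 2.0)])
```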

8 Weighted connection + Activation function
A neuron is a feature detector: it is activated for a specific feature.
[Figure: activation of a ReLU neuron over inputs x_1 and x_2]

9 Multi-layer perceptron (MLP)
Neurons are arranged into layers.
Each neuron in a layer shares the same input from the preceding layer.
[Figure: layers of neurons over inputs x_1, x_2, progressing from simple features to complex features]

10 MLP as a learnable classifier
The output corresponding to an input is determined by the weighted connections.
These weights are learnable (adjustable).
$Z = h(X, W)$, where X is the input, W the weights, and Z the output.
[Figure: input (x_1, x_2, ..., x_n) -> input layer -> hidden layer -> output layer -> output (z_1, z_2, ...)]
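A forward pass Z = h(X, W) through a small MLP might look like the sketch below. The layer layout, the hand-picked weights, and the helper names `layer` and `h` are illustrative assumptions; bias terms are omitted to match the neuron formula in the slides.

```python
def relu(net):
    return max(0.0, net)

def identity(net):
    return net

def layer(inputs, weights, activation):
    """One layer: every neuron (one weight row each) sees the same input vector."""
    return [activation(sum(x * w for x, w in zip(inputs, row))) for row in weights]

def h(x, W):
    """Forward pass: feed the input through each (weights, activation) pair in turn."""
    out = x
    for weights, activation in W:
        out = layer(out, weights, activation)
    return out

# Hypothetical weights: two ReLU hidden neurons, one linear output neuron.
# With these values the network computes |x_1 - x_2|.
W = [
    ([[1.0, -1.0], [-1.0, 1.0]], relu),
    ([[1.0, 1.0]], identity),
]
Z = h([3.0, 1.0], W)
print(Z)
```

Changing the numbers in `W` changes the function the network computes, which is what makes the classifier learnable.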


11 Learning ability of neural networks
Linear vs. non-linear:
With a linear activation function, the network can only learn linear functions.
With a non-linear activation function, the network can learn non-linear functions.
[Figure: panels for linear, sigmoid, tanh, ReLU]
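One way to see why linear activations limit the network: two stacked linear layers are equivalent to a single layer whose weight matrix is the product of the two. The sketch below checks this on hand-picked, hypothetical weights and shows that inserting a ReLU breaks the equivalence.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def relu_vec(v):
    return [max(0.0, a) for a in v]

# Hypothetical weights and input
W1 = [[1.0, 2.0], [3.0, 4.0]]
W2 = [[0.5, -1.0]]
x = [-2.0, 1.0]

two_linear = matvec(W2, matvec(W1, x))          # linear hidden layer
one_linear = matvec(matmul(W2, W1), x)          # single merged linear layer
with_relu = matvec(W2, relu_vec(matvec(W1, x)))  # ReLU hidden layer

print(two_linear, one_linear, with_relu)
```

The first two results agree for every input, so extra linear layers add no expressive power; the ReLU version gives a different value for this input.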

12 Learning ability of neural networks
Universal approximation theorem [Hornik, 1991]: an MLP with a single hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy, given enough hidden units.
For complex functions, however, this may require a very large hidden layer.
Deep neural network: contains many hidden layers and can extract complex features.
[Figure: input layer -> hidden layers -> output layer]

13 Learning in Neural Networks
The weighted connections are tuned using the training data <input, target>.
Objective: the network should output the correct target for each input.
[Figure: training dataset; an input pattern with target b]

14 Learning in Neural Networks
Loss function (objective function): the difference between output and target.
Learning is an optimization process: minimize the loss (make the output match the target).
$L = \lVert T - Z \rVert = \lVert T - h(X, W) \rVert = l(W)$
[Figure: input (x_1, x_2, ..., x_n) -> input layer -> hidden layer -> output layer -> outputs z_1, ..., z_k, compared with targets t_1, ..., t_k to give the loss L]
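For fixed training data, the loss depends only on the weights, which is why it can be written l(W). The sketch below assumes a one-weight model h(x, w) = w * x and a Euclidean distance between target and output; both are illustrative choices, not taken from the slides.

```python
import math

def h(X, w):
    """Hypothetical one-weight model: each output is w times the input."""
    return [w * x for x in X]

def loss(T, Z):
    """Euclidean distance between targets T and outputs Z."""
    return math.sqrt(sum((t - z) ** 2 for t, z in zip(T, Z)))

# Made-up training data generated by the rule t = 2 * x
X = [1.0, 2.0, 3.0]
T = [2.0, 4.0, 6.0]

def l(w):
    """Loss as a function of the weight only; X and T stay fixed."""
    return loss(T, h(X, w))

for w in (0.0, 1.0, 2.0, 3.0):
    print(w, round(l(w), 4))
```

The loss is smallest at w = 2, the weight that makes every output match its target.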

15 Learning in Neural Networks
Gradient vector of l with respect to W: $\nabla_W l = \frac{\partial l}{\partial W}$
Weight update (move against the gradient direction):
$W_{update} = W_{current} - \eta \frac{\partial l}{\partial W}$
η: learning rate
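The update rule can be tried on a single linear neuron. The sketch below assumes a squared-error loss l(W) = 0.5 * (t - z)^2 with z = sum_i w_i x_i, for which the gradient has the closed form used in `gradient`; the sample, the learning rate, and the function names are made up for illustration.

```python
def gradient(W, x, t):
    """Gradient of l(W) = 0.5 * (t - z)^2 with z = sum_i w_i * x_i."""
    z = sum(w * xi for w, xi in zip(W, x))
    return [-(t - z) * xi for xi in x]

def update(W, grad, eta):
    """W_update = W_current - eta * dl/dW."""
    return [w - eta * g for w, g in zip(W, grad)]

# Hypothetical single training sample and learning rate
x, t, eta = [1.0, 2.0], 5.0, 0.1
W = [0.0, 0.0]
for step in range(30):
    W = update(W, gradient(W, x, t), eta)

z = sum(w * xi for w, xi in zip(W, x))
print(W, z)
```

Each step moves the weights against the gradient, so the output z approaches the target t.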

16 Loss function
Probabilistic loss functions: binary cross-entropy (logistic regression), cross-entropy (multimodal).
Mean square error.
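The loss functions listed here have standard textbook forms, sketched below. Reading the slide's cross-entropy entry as the usual multi-class loss over a one-hot target is an assumption, as is the small `eps` clamp that guards against log(0).

```python
import math

def mean_square_error(targets, outputs):
    """Average of squared differences between targets and outputs."""
    return sum((t - z) ** 2 for t, z in zip(targets, outputs)) / len(targets)

def binary_cross_entropy(t, p, eps=1e-12):
    """Loss for one binary target t in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(t * math.log(p) + (1 - t) * math.log(1.0 - p))

def cross_entropy(targets, probs, eps=1e-12):
    """Loss for a one-hot target vector and a predicted probability vector."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(targets, probs))

print(mean_square_error([1.0, 0.0], [0.5, 0.5]))
print(binary_cross_entropy(1, 0.5))
print(cross_entropy([0, 1, 0], [0.2, 0.5, 0.3]))
```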

17 Learning & convergence
By updating the weights using the gradient, the loss is reduced and converges to a minimum.
[Figure: loss l(w) against w, with successive weights w_0, w_1, w_2, w_3 approaching a minimum]

18 Learning through all training samples
After the weights are updated, new training samples are fed to the network to continue learning.
When all training samples have been learnt, the network has completed one epoch. The network must run through many epochs to converge.
Weight update strategies: stochastic gradient descent (SGD), batch update, mini-batch.
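The three update strategies differ only in how many samples contribute to each weight update. The sketch below assumes a one-weight model z = w * x with squared-error loss and made-up data; samples are visited in a fixed order for reproducibility, whereas shuffling each epoch is the usual practice.

```python
def train(samples, batch_size, epochs, eta=0.05):
    """Return the learned weight and the number of weight updates performed."""
    w = 0.0
    updates = 0
    for _ in range(epochs):                      # one epoch = one pass over all samples
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # Mean gradient of 0.5 * (t - w*x)^2 over the batch
            grad = sum(-(t - w * x) * x for x, t in batch) / len(batch)
            w -= eta * grad
            updates += 1
    return w, updates

# Hypothetical data generated by t = 2 * x
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w_sgd, n_sgd = train(samples, batch_size=1, epochs=50)                 # SGD
w_batch, n_batch = train(samples, batch_size=len(samples), epochs=50)  # batch update
w_mini, n_mini = train(samples, batch_size=2, epochs=50)               # mini-batch
print(w_sgd, n_sgd, w_batch, n_batch, w_mini, n_mini)
```

All three runs approach w = 2, but they perform a different number of updates per epoch: one per sample, one per epoch, and one per mini-batch respectively.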

19 Momentum optimizer
Learning may get stuck in a local minimum.
Momentum: Δw retains the latest optimizing direction. It may help the optimizer escape the local minimum.
$W_{update} = W_{current} - \eta \frac{\partial l}{\partial W} + \alpha \Delta w$
η: learning rate; α: momentum parameter
[Figure: loss l(w) against w, with weights w_0 and w_1 near a local minimum]
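The momentum update can be sketched as follows, with Δw carried over from the previous step. The quadratic loss l(w) = 0.5 * (w - 3)^2 and the values of η and α are assumptions chosen so the behaviour is easy to follow; this toy loss has no local minimum, so the sketch shows the update rule itself rather than an escape from one.

```python
def momentum_step(w, delta_prev, grad, eta, alpha):
    """delta = -eta * grad + alpha * delta_prev; then w moves by delta."""
    delta = -eta * grad + alpha * delta_prev
    return w + delta, delta

def grad_l(w):
    """Gradient of the assumed loss l(w) = 0.5 * (w - 3)^2."""
    return w - 3.0

w, delta = 0.0, 0.0
trajectory = []
for _ in range(500):
    w, delta = momentum_step(w, delta, grad_l(w), eta=0.1, alpha=0.9)
    trajectory.append(w)

print(trajectory[:3], w)
```

The first step equals plain gradient descent because Δw starts at zero; from the second step on, the previous direction is added to the new one.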

20 Overfitting & Generalization
While training, model complexity increases with each epoch.
Overfitting: the model is over-complex.
Poor generalization: good performance on the training set but poor performance on the test set.
[Figure: loss and accuracy (0 to 1.0) against epochs, for train and test sets]

21 Prevent overfitting: Regularization
Weight decay
Weight noise
Early stopping: evaluate performance on a validation set, and stop when there is no further improvement on the validation set.
[Figure: train and validation loss]
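Early stopping can be sketched as a scan over per-epoch validation losses. The `patience` parameter (how many non-improving epochs to tolerate) and the loss values are assumptions for illustration; the slides only state the stopping idea.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch   # stop: no improvement for `patience` epochs
    return best_epoch, len(val_losses) - 1  # never triggered: ran to the end

# Hypothetical validation losses: improving, then rising as overfitting sets in
val_losses = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7]
result = early_stopping_epoch(val_losses, patience=2)
print(result)
```

In practice the weights saved at the best epoch are the ones kept, since later epochs only fit the training set more closely.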

22 Prevent overfitting: Regularization
Dropout: randomly drop neurons with a predefined probability.
Good regularization: behaves like a large ensemble of networks.
Bayesian perspective.
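A minimal dropout sketch on a list of activations is shown below. It uses the "inverted" variant, which rescales the surviving activations during training so that nothing needs to change at test time; that variant, the function name, and the seeded generator are assumptions, since the slides only describe the random dropping.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Zero each activation with probability drop_prob; rescale the survivors."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)           # no dropout at test time
    keep = 1.0 - drop_prob
    return [a / keep if rng.random() >= drop_prob else 0.0 for a in activations]

rng = random.Random(0)                      # seeded for reproducibility
dropped = dropout([1.0] * 1000, drop_prob=0.5, rng=rng)
zeros = sum(1 for a in dropped if a == 0.0)
print(zeros, "of", len(dropped), "activations dropped")
```

Each training pass samples a different set of active neurons, which is the sense in which dropout resembles training a large ensemble of networks.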

23 Adam optimizer
Adaptive learning rate
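The slide gives no formulas, so the sketch below follows the standard published form of Adam: running averages of the gradient and the squared gradient, bias correction, and a step scaled by the root of the second moment. The hyperparameter values are commonly used defaults, assumed here for illustration.

```python
import math

def adam_step(w, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single weight; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

# First step from w = 0 with a large and a small gradient
w_large, _, _ = adam_step(0.0, grad=100.0, m=0.0, v=0.0, t=1)
w_small, _, _ = adam_step(0.0, grad=0.01, m=0.0, v=0.0, t=1)
print(w_large, w_small)
```

Both first steps have a size close to η even though the gradients differ by four orders of magnitude, which illustrates what "adaptive learning rate" means here.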

24 GPU implementation
Keras + TensorFlow
Practice
