Artificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen



1 Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen

2 Artificial neural network NB! Inspired by biology, not based on biology!

3 Applications Automatic speech recognition, automatic image tagging, machine translation.

4 Learning objectives How do artificial neural networks work? What types of artificial neural networks are used for what tasks? What are the state-of-the-art results achieved with artificial neural networks?

5 Part 1 HOW DO NEURAL NETWORKS WORK?

6 Frank Rosenblatt (1957) Added learning rule to McCulloch-Pitts neuron.

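7 Perceptron
Prediction: $z = 1$ if $\sum_i x_i w_i + b > 0$, otherwise $z = 0$.
Learning: $w_i \leftarrow w_i + (y - z)\, x_i$, $b \leftarrow b + (y - z)$.
Figure: inputs $x_1$, $x_2$ (weights $w_1$, $w_2$) and a constant input 1 (weight $b$) feed a summation unit Σ with output $z$.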

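8 Let's try it out!
Data: inputs $x_1$, $x_2$ with target $y = x_1 \text{ or } x_2$.
Algorithm:
repeat
  $z = 1$ if $x_1 w_1 + x_2 w_2 + b > 0$, otherwise $z = 0$
  $w_1 \leftarrow w_1 + (y - z)\, x_1$
  $w_2 \leftarrow w_2 + (y - z)\, x_2$
  $b \leftarrow b + (y - z)$
until $y = z$ holds for the entire dataset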
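The algorithm on this slide can be tried out in a few lines of Python. This is a minimal sketch, not the author's code: the all-zero starting weights, the strict `> 0` threshold, the `max_epochs` cap and the function names are assumptions made for illustration.

```python
def predict(w, b, x):
    """Perceptron prediction: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data, max_epochs=100):
    """Apply the learning rule until y == z holds for the entire dataset."""
    w, b = [0.0, 0.0], 0.0  # assumed starting point: all zeros
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for x, y in data:
            z = predict(w, b, x)
            errors += int(y != z)
            # w_i <- w_i + (y - z) x_i,  b <- b + (y - z)
            w = [wi + (y - z) * xi for wi, xi in zip(w, x)]
            b = b + (y - z)
        if errors == 0:  # a full pass without mistakes: done
            return w, b, epoch
    return w, b, None  # gave up before converging


# Truth table of y = x1 or x2
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

w, b, epochs = train_perceptron(OR_DATA)
print("weights:", w, "bias:", b, "epochs:", epochs)
```

Because the OR problem is linearly separable, the loop stops after a handful of passes over the four examples.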

9 Perceptron limitations Perceptron learning algorithm converges only for linearly separable problems. Minsky, Papert, Perceptrons (1969)
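The limitation can be seen by running the same learning rule on XOR, which is not linearly separable. The sketch below is illustrative only: the function name `perceptron_converges`, the zero initial weights and the cap of 1000 epochs are assumptions, not part of the slides.

```python
def perceptron_converges(data, max_epochs=1000):
    """Return True if the perceptron rule reaches a full pass with no mistakes."""
    w, b = [0.0, 0.0], 0.0  # assumed starting point: all zeros
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            z = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            errors += int(y != z)
            w = [w[0] + (y - z) * x[0], w[1] + (y - z) * x[1]]
            b += y - z
        if errors == 0:
            return True
    return False


OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("OR converges: ", perceptron_converges(OR_DATA))
print("XOR converges:", perceptron_converges(XOR_DATA))
```

No single line separates the XOR classes, so no weight setting classifies all four examples correctly and the loop always runs into the epoch cap.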


10 Multi-layer perceptrons Add non-linear activation functions. Add hidden layer(s). Universal approximation theorem: any continuous function can be approximated to a given precision by a feed-forward neural network with a single hidden layer containing a finite number of neurons.


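11 Forward propagation
$a_1 = w_{01} + x_1 w_{11} + x_2 w_{21}$, $h_1 = \sigma(a_1)$
$a_2 = w_{02} + x_1 w_{12} + x_2 w_{22}$, $h_2 = \sigma(a_2)$
$z = v_0 + h_1 v_1 + h_2 v_2$
$\sigma(x) = \frac{1}{1 + e^{-x}}$
Figure: inputs $x_1$, $x_2$ and a constant +1 feed two hidden summation units; their outputs $h_1$, $h_2$ and a constant +1 feed the output summation unit.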
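The forward pass of this 2-2-1 network translates almost directly into code. In this sketch the weights are stored in a dictionary keyed by the slide's index pairs, so `w[0, 1]` stands for $w_{01}$; that storage choice and the example weight values are assumptions for illustration.

```python
import math


def sigmoid(x):
    """Logistic activation: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(x1, x2, w, v):
    """Forward propagation through two hidden units and one linear output."""
    a1 = w[0, 1] + x1 * w[1, 1] + x2 * w[2, 1]
    h1 = sigmoid(a1)
    a2 = w[0, 2] + x1 * w[1, 2] + x2 * w[2, 2]
    h2 = sigmoid(a2)
    z = v[0] + h1 * v[1] + h2 * v[2]
    return h1, h2, z


# Hypothetical weights: zero hidden weights give h1 = h2 = sigmoid(0) = 0.5
w = {(i, j): 0.0 for i in (0, 1, 2) for j in (1, 2)}
v = [1.0, 2.0, 4.0]  # v0 (bias), v1, v2

h1, h2, z = forward(3.0, -1.0, w, v)
print(h1, h2, z)
```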

12 Loss function
Function approximation: $L = \frac{1}{2}(y - z)^2$
Figure: plot of the loss for $y = 10$, labeled $(10 - z)^2$.
Now we just need to find weight values that minimize the loss function for all inputs. How do you do that?

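13 Backpropagation
Output error: $e_z = y - z$
Output weights: $\Delta v_0 = e_z$, $\Delta v_1 = e_z h_1$, $\Delta v_2 = e_z h_2$
Hidden unit 1: $e_{h_1} = e_z v_1$, $e_{a_1} = e_{h_1} \sigma'(a_1) = e_{h_1} h_1 (1 - h_1)$
Hidden unit 2: $e_{h_2} = e_z v_2$, $e_{a_2} = e_{h_2} \sigma'(a_2) = e_{h_2} h_2 (1 - h_2)$
Hidden weights: $\Delta w_{01} = e_{a_1}$, $\Delta w_{11} = e_{a_1} x_1$, $\Delta w_{21} = e_{a_1} x_2$, $\Delta w_{02} = e_{a_2}$, $\Delta w_{12} = e_{a_2} x_1$, $\Delta w_{22} = e_{a_2} x_2$
$\sigma'(x) = \sigma(x)(1 - \sigma(x))$, $\sigma(x) = \frac{1}{1 + e^{-x}}$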
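One way to convince yourself that these error terms are right is to compare them with a numerical derivative of the loss $L = \frac{1}{2}(y - z)^2$. The sketch below is an illustration under assumptions: weights are stored as `w[i][j]` with input index `i` (0 is the bias) and a 0-based hidden index `j` (the slide's units 1 and 2), and the example input, target and weight values are made up.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w, v):
    a = [w[0][j] + x[0] * w[1][j] + x[1] * w[2][j] for j in range(2)]
    h = [sigmoid(aj) for aj in a]
    z = v[0] + h[0] * v[1] + h[1] * v[2]
    return h, z


def backward(x, y, w, v):
    """Backpropagated updates: e_z = y - z, e_h = e_z v, e_a = e_h h (1 - h)."""
    h, z = forward(x, w, v)
    e_z = y - z
    e_h = [e_z * v[j + 1] for j in range(2)]
    e_a = [e_h[j] * h[j] * (1 - h[j]) for j in range(2)]
    dv = [e_z, e_z * h[0], e_z * h[1]]
    dw = [[e_a[0], e_a[1]],
          [e_a[0] * x[0], e_a[1] * x[0]],
          [e_a[0] * x[1], e_a[1] * x[1]]]
    return dw, dv


def loss(x, y, w, v):
    return 0.5 * (y - forward(x, w, v)[1]) ** 2


def numeric_dloss_dw(x, y, w, v, i, j, eps=1e-6):
    """Central finite difference of the loss with respect to w[i][j]."""
    plus = [row[:] for row in w]
    minus = [row[:] for row in w]
    plus[i][j] += eps
    minus[i][j] -= eps
    return (loss(x, y, plus, v) - loss(x, y, minus, v)) / (2 * eps)


# Hypothetical example values
x, y = (1.0, 2.0), 10.0
w = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
v = [0.2, -0.3, 0.7]

dw, dv = backward(x, y, w, v)
# Each update should equal minus the derivative of the loss
max_gap = max(abs(dw[i][j] + numeric_dloss_dw(x, y, w, v, i, j))
              for i in range(3) for j in range(2))
print("largest mismatch:", max_gap)
```

The updates match the negative gradient, which is why they are added to the weights rather than subtracted.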

14 Gradient Descent
$w_{ij} \leftarrow w_{ij} + \alpha\, \Delta w_{ij}$
$v_i \leftarrow v_i + \alpha\, \Delta v_i$
$\alpha$: learning rate
Gradient descent finds weight values that result in small loss. Gradient descent is only guaranteed to find a local minimum. But there are plenty of them, and they are often good enough!
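Putting forward propagation, backpropagation and the update rule together gives a complete training loop. The following is a rough sketch, not the author's implementation: it averages the updates over the whole dataset before each step, and the XOR data, random seed, learning rate of 0.1 and 2000 epochs are all assumed values. It only demonstrates that the loss goes down, not that a particular minimum is reached.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w, v):
    a = [w[0][j] + x[0] * w[1][j] + x[1] * w[2][j] for j in range(2)]
    h = [sigmoid(aj) for aj in a]
    return h, v[0] + h[0] * v[1] + h[1] * v[2]


def backward(x, y, w, v):
    h, z = forward(x, w, v)
    e_z = y - z
    e_a = [e_z * v[j + 1] * h[j] * (1 - h[j]) for j in range(2)]
    dv = [e_z, e_z * h[0], e_z * h[1]]
    dw = [[e_a[0], e_a[1]],
          [e_a[0] * x[0], e_a[1] * x[0]],
          [e_a[0] * x[1], e_a[1] * x[1]]]
    return dw, dv


def total_loss(data, w, v):
    return sum(0.5 * (y - forward(x, w, v)[1]) ** 2 for x, y in data)


def gradient_descent(data, w, v, alpha=0.1, epochs=2000):
    """Update weights in place: w <- w + alpha * (average update over the data)."""
    n = len(data)
    for _ in range(epochs):
        sum_dw = [[0.0, 0.0] for _ in range(3)]
        sum_dv = [0.0, 0.0, 0.0]
        for x, y in data:
            dw, dv = backward(x, y, w, v)
            for i in range(3):
                sum_dv[i] += dv[i]
                for j in range(2):
                    sum_dw[i][j] += dw[i][j]
        for i in range(3):
            v[i] += alpha * sum_dv[i] / n
            for j in range(2):
                w[i][j] += alpha * sum_dw[i][j] / n
    return w, v


rng = random.Random(0)  # assumed seed, for repeatability only
w = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
v = [rng.uniform(-1, 1) for _ in range(3)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

loss_before = total_loss(XOR_DATA, w, v)
gradient_descent(XOR_DATA, w, v)
loss_after = total_loss(XOR_DATA, w, v)
print("loss before:", loss_before, "loss after:", loss_after)
```

Different starting weights can end up in different local minima, so the final loss depends on the seed.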

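15 Other loss functions
Binary classification: $p = \sigma(z)$, $L = -y \log(p) - (1 - y) \log(1 - p)$
Multi-class classification: $p = \mathrm{softmax}(z)$, $p_i = \frac{e^{z_i}}{\sum_j e^{z_j}}$, $L = -\sum_i y_i \log p_i$
Figure: plots of $-\log(p)$ and $-\log(1 - p)$.
$\sigma(x) = \frac{1}{1 + e^{-x}}$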
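Both classification losses are short enough to write out. This is a small sketch with assumed function names; subtracting the maximum inside `softmax` is a common numerical-stability trick that is not on the slide and does not change the result.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(y, z):
    """L = -y log(p) - (1 - y) log(1 - p), with p = sigmoid(z)."""
    p = sigmoid(z)
    return -y * math.log(p) - (1 - y) * math.log(1 - p)


def softmax(z):
    """p_i = e^{z_i} / sum_j e^{z_j}."""
    m = max(z)  # shift for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y, z):
    """L = -sum_i y_i log(p_i), with p = softmax(z) and y a one-hot target."""
    p = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, p))


print(binary_cross_entropy(1, 0.0))             # p = 0.5, so the loss is log(2)
print(softmax([1.0, 2.0, 3.0]))
print(cross_entropy([0, 1, 0], [0.0, 0.0, 0.0]))  # uniform p, so the loss is log(3)
```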

16 Things to remember... The perceptron, introduced in the late 1950s, was the first artificial neuron model with a learning rule. A perceptron can learn only linearly separable classification problems. Feed-forward networks with non-linear activation functions and hidden layers can overcome the limitations of perceptrons. Multi-layer artificial neural networks are trained using backpropagation and gradient descent.

17 Part 2 NEURAL NETWORKS TAXONOMY

18 Simple feed-forward networks Architecture: Each node is connected to all nodes of the previous layer. Information moves in one direction only. Used for: function approximation, simple classification problems, problems with not too many inputs (~100). Figure: input layer, hidden layer, output layer.

19 Convolutional neural networks Architecture: Convolutional layer: local connections + weight sharing. Pooling layer: translation invariance. Used for: images and spatial data, any other data with a locality property, e.g. adjacent characters make up a word. Figure: input layer, convolutional layer (weights: 1, 0, -1), pooling layer (max).

20 Hubel & Wiesel (1959) Performed experiments with an anesthetized cat. Discovered topographical mapping, sensitivity to orientation and hierarchical processing.

21 Convolution Convolution matches the same pattern over the entire image and calculates a score for each match.

22 Example: edge detector
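A tiny edge detector shows what "matching the same pattern everywhere" means in practice. The sketch below slides a 1x3 kernel with weights 1, 0, -1 over a small image containing one vertical edge. The image values and the function name `convolve2d` are made up for illustration, and, as is usual in neural networks, the kernel is applied without flipping and only at positions where it fits entirely inside the image.

```python
def convolve2d(image, kernel):
    """Score the kernel against every position where it fits inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# Hypothetical 3x6 image: dark (0) on the left, bright (1) on the right
image = [[0, 0, 0, 1, 1, 1] for _ in range(3)]
kernel = [[1, 0, -1]]  # the same weights are shared across all positions

scores = convolve2d(image, kernel)
for row in scores:
    print(row)  # [0, -1, -1, 0]
```

The scores are zero in the flat regions and non-zero only where the kernel straddles the edge.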

23 Pooling Pooling achieves invariance to small translations by taking the maximum of adjacent convolution scores.
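Max pooling can be sketched as follows. The non-overlapping 2x2 windows and the example score maps are assumptions chosen for illustration; the point is that a strong score produces the same pooled output wherever it lands inside a window.

```python
def max_pool(scores, size=2):
    """Take the maximum over non-overlapping size x size windows."""
    rows = len(scores) // size
    cols = len(scores[0]) // size
    return [[max(scores[r * size + i][c * size + j]
                 for i in range(size) for j in range(size))
             for c in range(cols)]
            for r in range(rows)]


# The same feature (score 9) at two nearby positions
scores_a = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
scores_b = [[0, 0, 0, 0],
            [0, 9, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]

print(max_pool(scores_a))  # [[9, 0], [0, 0]]
print(max_pool(scores_b))  # [[9, 0], [0, 0]]
```

The invariance only covers shifts that stay within one pooling window; a larger shift moves the score to a different output cell.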

24 Example: handwritten digit recognition Y. LeCun et al., Handwritten digit recognition: Applications of neural net chips and automatic learning (1989)

25 Recurrent neural networks Architecture: Hidden layer nodes are connected to themselves. This allows the network to retain internal state and memory. Used for: speech recognition, machine translation, language modeling, any time series. Figure: input layer, hidden layer (with recurrent connections), output layer.
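The effect of the recurrent connection can be seen in a one-unit example. This is a deliberately tiny sketch under assumptions: a single scalar hidden state, a `tanh` activation (the slides do not name one for recurrent networks), and made-up parameter names `w_x`, `w_h`, `b`, `v`, `c`.

```python
import math


def rnn_forward(xs, w_x, w_h, b, v, c, h0=0.0):
    """Run a one-unit recurrent network over a sequence of scalar inputs."""
    h = h0
    hs, zs = [], []
    for x in xs:
        # The new hidden state depends on the input AND the previous hidden state
        h = math.tanh(w_x * x + w_h * h + b)
        hs.append(h)
        zs.append(v * h + c)  # linear output at each time step
    return hs, zs


inputs = [1.0, 0.0, 0.0]  # a single pulse followed by silence

hs_memory, _ = rnn_forward(inputs, w_x=1.0, w_h=1.0, b=0.0, v=1.0, c=0.0)
hs_no_memory, _ = rnn_forward(inputs, w_x=1.0, w_h=0.0, b=0.0, v=1.0, c=0.0)

print("with recurrence:   ", hs_memory)
print("without recurrence:", hs_no_memory)
```

With the recurrent weight set to zero, the hidden state forgets the pulse immediately; with it, a trace of the pulse persists over the following time steps.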

26 Different configurations

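27 Backpropagation through time Figure: the network unrolled over time steps 1 to 4, with inputs $x_1, x_2, x_3, x_4$ (input layer), hidden states $h_0, h_1, h_2, h_3, h_4$ (hidden layer), outputs $z_1, z_2, z_3, z_4$ (output layer) and targets $y_1, y_2, y_3, y_4$?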

28 Autoencoders Architecture: Input and output layers are the same. Hidden layer functions as a bottleneck. Network is trained to reconstruct input from hidden layer activations. Used for: image semantic hashing, dimensionality reduction. Figure: input layer, hidden layer, output layer (= input layer).
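The bottleneck idea can be illustrated with the smallest possible case: two inputs squeezed through one hidden unit. The sketch below is an assumption-laden toy, not a practical autoencoder: it is purely linear (no activation function), the data points lie on the line $x_2 = 2 x_1$, and the starting weights, learning rate and epoch count are made-up values.

```python
def encode(x, e):
    """Compress a 2-D input into a single hidden activation."""
    return e[0] * x[0] + e[1] * x[1]


def decode(h, d):
    """Reconstruct the 2-D input from the hidden activation."""
    return [d[0] * h, d[1] * h]


def reconstruction_loss(data, e, d):
    return sum(0.5 * sum((xk - xh) ** 2 for xk, xh in zip(x, decode(encode(x, e), d)))
               for x in data)


def train(data, e, d, alpha=0.05, epochs=500):
    for _ in range(epochs):
        for x in data:
            h = encode(x, e)
            x_hat = decode(h, d)
            r = [x[k] - x_hat[k] for k in range(2)]  # the target is the input itself
            e_h = r[0] * d[0] + r[1] * d[1]          # error backpropagated to the hidden unit
            d = [d[k] + alpha * r[k] * h for k in range(2)]
            e = [e[k] + alpha * e_h * x[k] for k in range(2)]
    return e, d


# Hypothetical data: 2-D points that really have only one degree of freedom
data = [(t, 2.0 * t) for t in (-1.0, -0.5, 0.5, 1.0)]
e0, d0 = [0.1, 0.1], [0.1, 0.1]

loss_before = reconstruction_loss(data, e0, d0)
e, d = train(data, e0, d0)
loss_after = reconstruction_loss(data, e, d)
print("loss before:", loss_before, "loss after:", loss_after)
print("1-D codes:", [encode(x, e) for x in data])
```

After training, the single hidden activation serves as a one-dimensional code for each two-dimensional point, which is the dimensionality reduction use mentioned on the slide.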

29 We didn't talk about... Long Short Term Memory (LSTMs), Restricted Boltzmann Machines (RBMs), Echo State Networks / Liquid State Machines, Hopfield Network, Self-organizing maps (SOMs), Radial basis function networks (RBFs). But we covered the most important ones!

30 Things to remember... Simple feed-forward networks are usually used for function approximation and classification with few input features. Convolutional neural networks are mostly used for images and spatial data. Recurrent neural networks are used for language modeling and time series. Autoencoders are used for image semantic hashing and dimensionality reduction.

31 Part 3 SOME STATE-OF-THE-ART RESULTS

32 Deep Learning Artificial neural networks and backpropagation have been around since the 1980s. What's all this fuss about deep learning? What has changed: we have much bigger datasets, we have much faster computers (think GPUs), we have learned a few tricks for training neural networks with very many layers.

33 Revolution of Depth (human error ~5.1%)

34 Neural Image Processing

Instance Segmentation https://www.youtube.com/watch?

35 Instance Segmentation

36 Image Captioning

37 Image Captioning Errors

38 Reinforcement learning Figure labels: screen, score, actions. Games: Pong, Breakout, Space Invaders, Seaquest, Beam Rider, Enduro. Mnih et al., Human-level control through deep reinforcement learning (2015)

39 Skype Translator

40 Adversarial Examples

41 Things to remember... Artificial neural networks are state-of-the-art in image recognition, speech recognition, machine translation and many other fields. Anything that you can do in one second, we can probably train a neural network to do as well, i.e. neural nets can do perception. But in the end they are just reactive function approximators and can be easily fooled. In particular, they do not think like humans (yet).

42 Thank you!
