Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018
Artificial neural network NB! Inspired by biology, not based on biology!
Applications Automatic speech recognition Automatic image tagging Machine translation
Learning objectives How do artificial neural networks work? What types of artificial neural networks are used for which tasks? What are the state-of-the-art results achieved with artificial neural networks?
Part 1 HOW DO NEURAL NETWORKS WORK?
Frank Rosenblatt (1957) Added a learning rule to the McCulloch-Pitts neuron.
Perceptron
Prediction: z = 1 if Σ_i x_i w_i + b > 0, otherwise 0
Learning: w_i ← w_i + (y − z) x_i, b ← b + (y − z)
[Diagram: inputs x_1, x_2 with weights w_1, w_2 and bias b feed a summing unit Σ whose output is z]
Let's try it out!
Truth table for y = x_1 OR x_2:
x_1 x_2 y
0 0 0
0 1 1
1 0 1
1 1 1
Algorithm: repeat
  z = 1 if x_1 w_1 + x_2 w_2 + b > 0, otherwise 0
  w_1 ← w_1 + (y − z) x_1
  w_2 ← w_2 + (y − z) x_2
  b ← b + (y − z)
until y = z holds for the entire dataset
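A minimal Python sketch of the algorithm above, run on the OR truth table (variable names are my own; the learning rate is fixed to 1, as on the slide):

```python
# Perceptron learning of y = x1 OR x2 (minimal sketch, learning rate fixed to 1)
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w1 = w2 = b = 0.0

converged = False
while not converged:
    converged = True
    for (x1, x2), y in data:
        z = 1 if x1 * w1 + x2 * w2 + b > 0 else 0   # prediction
        if z != y:
            converged = False
        w1 += (y - z) * x1                          # learning rule
        w2 += (y - z) * x2
        b  += (y - z)

print(w1, w2, b)   # weights that separate OR, e.g. w1 = w2 = 1, b = 0
```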
Perceptron limitations Perceptron learning algorithm converges only for linearly separable problems. Minsky, Papert, Perceptrons (1969)
Multi-layer perceptrons Add non-linear activation functions Add hidden layer(s) Universal approximation theorem: any continuous function can be approximated to a given precision by a feed-forward neural network with a single hidden layer containing a finite number of neurons.
Forward propagation
a_1 = w_01 + x_1 w_11 + x_2 w_21,  h_1 = σ(a_1)
a_2 = w_02 + x_1 w_12 + x_2 w_22,  h_2 = σ(a_2)
z = v_0 + h_1 v_1 + h_2 v_2
σ(x) = 1 / (1 + e^(−x))
[Diagram: inputs x_1, x_2 and a bias unit feed hidden units h_1, h_2, which together with a bias feed the output z]
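A minimal numpy sketch of these forward-propagation equations; the weight values below are arbitrary, chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Example weights (arbitrary values, for illustration only)
w0 = np.array([0.1, -0.2])          # hidden biases w_01, w_02
W  = np.array([[0.5, -0.3],         # w_11, w_12
               [0.8,  0.4]])        # w_21, w_22
v0 = 0.2                            # output bias v_0
v  = np.array([1.0, -1.5])          # v_1, v_2

x = np.array([1.0, 2.0])            # inputs x_1, x_2

a = w0 + x @ W                      # pre-activations a_1, a_2
h = sigmoid(a)                      # hidden activations h_1, h_2
z = v0 + h @ v                      # network output
print(z)
```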
Loss function Function approximation: L = ½ (y − z)², e.g. ½ (10 − z)² for target y = 10. Now we just need to find weight values that minimize the loss function for all inputs. How do you do that?
Backpropagation
e_z = y − z
e_h1 = e_z v_1,  e_a1 = e_h1 σ'(a_1) = e_h1 h_1 (1 − h_1)
e_h2 = e_z v_2,  e_a2 = e_h2 σ'(a_2) = e_h2 h_2 (1 − h_2)
Δv_0 = e_z,  Δv_1 = e_z h_1,  Δv_2 = e_z h_2
Δw_01 = e_a1,  Δw_11 = e_a1 x_1,  Δw_21 = e_a1 x_2
Δw_02 = e_a2,  Δw_12 = e_a2 x_1,  Δw_22 = e_a2 x_2
σ'(x) = σ(x)(1 − σ(x)),  σ(x) = 1 / (1 + e^(−x))
[Diagram: the same network as in forward propagation, with error terms propagated backwards from the output to the weights]
Gradient descent
w_ij ← w_ij + α Δw_ij,  v_i ← v_i + α Δv_i,  where α is the learning rate.
Gradient descent finds weight values that result in a small loss. Gradient descent is guaranteed to find only a local minimum. But there are plenty of them and they are often good enough!
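A toy numpy sketch that combines the forward propagation, the backpropagation errors from the previous slide and the gradient-descent updates above. The dataset (XOR), initialization and learning rate are my own choices, not from the slides, and whether training converges depends on the random initialization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy training run for the 2-2-1 network of the previous slides.
rng = np.random.default_rng(0)
W  = rng.normal(size=(2, 2))     # input-to-hidden weights w_ij
w0 = np.zeros(2)                 # hidden biases w_01, w_02
v  = rng.normal(size=2)          # hidden-to-output weights v_1, v_2
v0 = 0.0                         # output bias v_0
alpha = 0.5                      # learning rate

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0.0, 1.0, 1.0, 0.0])

for epoch in range(10000):
    for x, y in zip(X, Y):
        # forward propagation
        a = w0 + x @ W
        h = sigmoid(a)
        z = v0 + h @ v
        # backpropagation (error convention e_z = y - z, as on the slide)
        e_z = y - z
        e_h = e_z * v
        e_a = e_h * h * (1 - h)
        # gradient descent updates: weight <- weight + alpha * delta
        v0 += alpha * e_z
        v  += alpha * e_z * h
        w0 += alpha * e_a
        W  += alpha * np.outer(x, e_a)

for x in X:
    print(x, v0 + sigmoid(w0 + x @ W) @ v)   # predictions after training
```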
Other loss functions
Binary classification: p = σ(z), L = −[y log(p) + (1 − y) log(1 − p)]
Multi-class classification: p = softmax(z), p_i = e^(z_i) / Σ_j e^(z_j), L = −Σ_i y_i log(p_i)
σ(x) = 1 / (1 + e^(−x))
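A short numpy sketch of both loss functions; the inputs z and targets y below are made-up examples:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))        # subtract max for numerical stability
    return e / e.sum()

# Binary cross-entropy: p = sigmoid(z), L = -(y log p + (1 - y) log(1 - p))
z, y = 0.7, 1.0
p = sigmoid(z)
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Multi-class cross-entropy: p = softmax(z), L = -sum_i y_i log p_i
z_vec = np.array([2.0, 1.0, 0.1])
y_onehot = np.array([1.0, 0.0, 0.0])
p_vec = softmax(z_vec)
ce = -np.sum(y_onehot * np.log(p_vec))

print(bce, ce)
```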
Things to remember... The perceptron was the first artificial neuron model, invented in the late 1950s. A perceptron can learn only linearly separable classification problems. Feed-forward networks with non-linear activation functions and hidden layers can overcome the limitations of perceptrons. Multi-layer artificial neural networks are trained using backpropagation and gradient descent.
Part 2 NEURAL NETWORK TAXONOMY
Simple feed-forward networks Architecture: Each node connected to all nodes of the previous layer. Information moves in one direction only. Used for: Function approximation Simple classification problems Not too many inputs (~100) [Diagram: input layer → hidden layer → output layer]
Convolutional neural networks Architecture: Convolutional layer: local connections + weight sharing. Pooling layer: translation invariance. Used for: images and spatial data, or any other data with a locality property, e.g. adjacent characters make up a word. [Diagram: 1-D input convolved with shared weights (1, 0, −1), followed by a max-pooling layer]
Hubel & Wiesel (1959) Performed experiments with an anesthetized cat. Discovered topographical mapping, sensitivity to orientation and hierarchical processing.
Convolution Convolution matches the same pattern over the entire image and calculates a score for each match.
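A minimal 1-D sketch of this idea; the input values and the edge-detector weights (1, 0, −1) are illustrative, and real networks learn the weights and work on 2-D images:

```python
import numpy as np

# 1-D convolution (really cross-correlation, as in most neural net libraries):
# slide the same weights over the input and compute a score at every position.
def conv1d(x, w):
    n = len(x) - len(w) + 1
    return np.array([np.dot(x[i:i + len(w)], w) for i in range(n)])

x = np.array([2, 1, -3, 2, 1, 0, 1, 2, -1])   # example input (made up)
w = np.array([1, 0, -1])                      # example weights, an edge detector
print(conv1d(x, w))
```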
Example: edge detector https://developer.apple.com/library/ios/documentation/performance/conceptual/vimage/convolutionoperations/convolutionoperations.html
Pooling Pooling achieves translation invariance by taking the maximum of adjacent convolution scores.
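A sketch of max pooling over non-overlapping windows of size 2; the scores below are made up:

```python
import numpy as np

# Max pooling: take the maximum of adjacent convolution scores,
# so small shifts in the input change the output little or not at all.
def max_pool1d(x, size=2):
    n = len(x) // size
    return np.array([x[i * size:(i + 1) * size].max() for i in range(n)])

scores = np.array([-2.0, 2.0, 2.0, -1.0, 0.5, 3.0])
print(max_pool1d(scores))   # [2. 2. 3.]
```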
Example: handwritten digit recognition Y. LeCun et al., Handwritten digit recognition: Applications of neural net chips and automatic learning, 1989.
Recurrent neural networks Architecture: Hidden layer nodes are connected to themselves. This allows retaining internal state and memory. Used for: speech recognition, machine translation, language modeling, any time series. [Diagram: input layer → hidden layer with a recurrent connection → output layer]
Different configurations
Backpropagation through time [Diagram: the recurrent network unrolled over time — inputs x_1..x_4 feed hidden states h_1..h_4 (starting from the initial state h_0), producing outputs z_1..z_4 that are compared against targets y_1..y_4]
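A minimal numpy sketch of a recurrent network unrolled over four time steps; the layer sizes and random weights are illustrative only:

```python
import numpy as np

# Simple recurrent network unrolled over time
# (sizes and weights are made up, just to show that h_t depends on h_{t-1}).
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(3, 4))   # input -> hidden
W_hh = rng.normal(size=(4, 4))   # hidden -> hidden (the recurrent connection)
W_hz = rng.normal(size=(4, 2))   # hidden -> output

xs = [rng.normal(size=3) for _ in range(4)]   # inputs x_1 .. x_4
h = np.zeros(4)                               # initial hidden state h_0

for t, x in enumerate(xs, start=1):
    h = np.tanh(x @ W_xh + h @ W_hh)          # hidden state h_t
    z = h @ W_hz                              # output z_t
    print(t, z)

# Training unrolls this loop and backpropagates the output errors through
# every time step (backpropagation through time), accumulating gradients
# for the shared weights W_xh, W_hh, W_hz.
```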
Autoencoders Architecture: Input and output layers are the same. Hidden layer functions as a bottleneck. Network is trained to reconstruct input from hidden layer activations. Used for: image semantic hashing dimensionality reduction [Diagram: input layer → narrow hidden layer → output layer = input layer]
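A minimal numpy sketch of the autoencoder idea: an 8-dimensional input squeezed through a 3-unit bottleneck and reconstructed. The sizes and the random (untrained) weights are my own illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Autoencoder sketch: 8 inputs squeezed through a 3-unit bottleneck
# and reconstructed back to 8 outputs (weights are random, untrained).
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(8, 3)) * 0.5    # encoder: input -> bottleneck
W_dec = rng.normal(size=(3, 8)) * 0.5    # decoder: bottleneck -> output

x = rng.random(8)                        # an example input
code = sigmoid(x @ W_enc)                # low-dimensional hidden representation
x_hat = sigmoid(code @ W_dec)            # reconstruction of the input

loss = np.mean((x - x_hat) ** 2)         # training would minimize this
print(code, loss)
```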
We didn't talk about... Long Short-Term Memory networks (LSTMs) Restricted Boltzmann Machines (RBMs) Echo State Networks / Liquid State Machines Hopfield Networks Self-Organizing Maps (SOMs) Radial Basis Function networks (RBFs) But we covered the most important ones!
Things to remember... Simple feed-forward networks are usually used for function approximation and classification with few input features. Convolutional neural networks are mostly used for images and spatial data. Recurrent neural networks are used for language modeling and time series. Autoencoders are used for image semantic hashing and dimensionality reduction.
Part 3 SOME STATE-OF-THE-ART RESULTS
Deep Learning Artificial neural networks and backpropagation have been around since the 1980s. What's all this fuss about deep learning? What has changed: we have much bigger datasets, we have much faster computers (think GPUs), and we have learned a few tricks for training neural networks with very many layers.
Revolution of Depth (human error ~5.1%)
Neural Image Processing
Instance Segmentation https://www.youtube.com/watch?v=oot3uixzzte https://github.com/matterport/mask_rcnn
Image Captioning
Image Captioning Errors
Reinforcement learning [Diagram: the agent receives the game screen and score as input and outputs actions; games shown: Pong, Breakout, Space Invaders, Seaquest, Beam Rider, Enduro] http://sodeepdude.blogspot.com/2015/03/deepminds-atari-paper-replicated.html Mnih et al., Human-level control through deep reinforcement learning (2015)
Skype Translator https://www.youtube.com/watch?v=nhxcg2pa3zi
Adversarial Examples https://www.youtube.com/watch?v=xaqu7kkqbpc
Things to remember... Artificial neural networks are state-of-the-art in image recognition, speech recognition, machine translation and many other fields. Anything a human can do in about one second, we can probably train a neural network to do as well, i.e. neural nets can do perception. But in the end they are just reactive function approximators and can be easily fooled. In particular, they do not think like humans (yet).
Thank you! tambet@ut.ee