Let your machine do the learning


Let your machine do the learning
Neural Networks
Daniel Hugo Cámpora Pérez
Universidad de Sevilla, CERN
Inverted CERN School of Computing, 6th - 8th March, 2017
Daniel Hugo Cámpora Pérez Let your machine do the learning icsc 2017 1 / 47

Outline
1 Introduction
2 Neural networks
3 Convolutional neural networks
4 Where to go from here
5 Bibliography

Section 1: Introduction

Introduction In the last episode

A perceptron consists of:
Inputs 1, x_1, x_2, ..., x_n
Weights w_0, w_1, w_2, ..., w_n
Activation function f(x)

With these ingredients, the output of the perceptron is calculated as follows:

y = f\left( \sum_{j=0}^{n} w_j x_j \right)   (1)
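As a minimal sketch of eq. (1) in Python (a sigmoid activation and all names are illustrative assumptions, not the lecture's code):

```python
import math

def perceptron(weights, inputs, f=lambda s: 1.0 / (1.0 + math.exp(-s))):
    """Compute y = f(sum_j w_j * x_j); inputs[0] is the constant 1 paired with w_0."""
    s = sum(w_j * x_j for w_j, x_j in zip(weights, inputs))
    return f(s)

# With w = (-1, 2) and x = (1, 0.5): s = -1 + 1 = 0, and sigmoid(0) = 0.5
y = perceptron([-1.0, 2.0], [1.0, 0.5])
```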

Introduction In the last episode (2)

We can apply the gradient descent technique we have already seen to learn the weights of the perceptron. The cost function given by the quadratic error for a single training example (x_\mu, y_\mu) is, letting

input_\mu = \sum_{j=0}^{n} w_j x_{\mu j},

E(\vec{w}) = \frac{1}{2} \left( f(input_\mu) - y_\mu \right)^2   (2)

The generic gradient descent step can then be formulated as:

\forall j \in \{0, \ldots, n\}: \quad w_j := w_j + \alpha \, (y_\mu - f(input_\mu)) \, f'(input_\mu) \, x_{\mu j}   (3)
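The update rule of eq. (3) can be sketched in a few lines of Python (sigmoid activation assumed, so f'(s) = f(s)(1 - f(s)); names are illustrative):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(w, x, y, alpha=0.5):
    """One gradient descent step on E(w) = 1/2 (f(input) - y)^2.
    x[0] is the constant bias input 1, paired with w[0]."""
    s = sum(w_j * x_j for w_j, x_j in zip(w, x))               # input_mu
    delta = (y - sigmoid(s)) * sigmoid(s) * (1.0 - sigmoid(s)) # (y - f) * f'
    return [w_j + alpha * delta * x_j for w_j, x_j in zip(w, x)]

# Repeated steps drive the perceptron output towards the target y = 1
w = [0.0, 0.0]
for _ in range(100):
    w = train_step(w, [1.0, 1.0], 1.0)
```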

Section 2: Neural networks

Neural networks Connecting the dots

In our previous lecture, we extensively covered the case of a single perceptron, and found that it can only express linearly separable functions, which does not make it particularly interesting on its own. However, we can overcome this limitation by connecting several perceptrons into a neural network.

Neural networks Network topology

In a network, the topology defines which neurons are connected to which others. Each connection is calibrated with an appropriate weight, and in general connections need not be symmetric. Among the most common topological structures we find the layer: neurons on the same layer do not interact with each other; instead, they interact with other layers.

Neural networks Feedforward networks

Feedforward networks have their neurons organised in layers. The input layer receives external data, and the output layer produces the result. In between, there may be any number of hidden layers. The information moves in only one direction (forward): from the input layer, through the hidden layers, to the output layer. There are no cycles in these networks.

Neural networks Other network topologies

Be aware that many other topologies exist, although we are not going to discuss them here. Suffice it to say that, depending on the problem under study, the choice of topology is an important one.

Figure 1: Extensive neural network topology chart. (http://www.asimovinstitute.org/neural-network-zoo/)

Neural networks Multilayer perceptron

Now that we consider a network consisting of a number of perceptrons, we can generalize our definition for a particular topology. Let's consider the multilayer perceptron as the building block of our feedforward network.

Neural networks Multilayer perceptrons (2)

The multilayer perceptron represents a function

f : \mathbb{R}^n \to \mathbb{R}^m, \quad \vec{x} \mapsto \vec{y}

Cybenko (1989 [5]): Multilayer perceptrons with one hidden layer are universal approximators of continuous functions (or of functions with a finite number of discontinuities).

Neural networks Gradient descent

In our last course, we covered the gradient descent method. We used it to learn the θ terms of linear regression:

\forall j \in \{0, \ldots, n\}: \quad \theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} E(\theta)   (4)

We also used it to learn the weights of a single perceptron:

\forall j \in \{0, \ldots, n\}: \quad w_j := w_j + \alpha \, (y_\mu - f(input_\mu)) \, f'(input_\mu) \, x_{\mu j}   (5)

Neural networks Backpropagation

The backpropagation algorithm consists of applying the gradient descent technique to the quadratic error of the network. This algorithm revolutionized the machine learning field in the mid-80s, as it allowed neural networks of arbitrary complexity to be effectively trained. Here, we will focus on multilayer perceptrons, and derive the backpropagation algorithm for them. Adjust your seat belts.

Neural networks Backpropagation (2)

If x^{[c]}_i denotes the state of neuron i on layer c, then:

x^{[c]}_i = f\left( \sum_{j=0}^{n} w^{[c-1]}_{ij} x^{[c-1]}_j \right)   (6)

It is worth taking some time to get the notation right, as we start populating indices up and down. The superscript c refers to the layer. In each layer, we have an arbitrary number of perceptrons, denoted by the subscript i. Lastly, the subscript j refers to the inputs and weights of perceptron i. x^{[0]}_i = x_i corresponds to the input layer, and x^{[N]}_i = y_i to the output layer.
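Applying eq. (6) layer by layer gives the forward pass of the network. A Python sketch (the tanh activation and all names are illustrative assumptions):

```python
import math

def forward(weights, x, f=math.tanh):
    """weights[c][i][j] holds w^[c]_ij; index j = 0 is the bias weight,
    paired with a constant input 1 prepended to each layer's state."""
    states = [list(x)]                        # x^[0] = input layer
    for W in weights:
        prev = [1.0] + states[-1]
        states.append([f(sum(w_ij * x_j for w_ij, x_j in zip(row, prev)))
                       for row in W])
    return states                             # states[-1] is the output x^[N]
```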

Neural networks Backpropagation (3)

When the network is fed with training pattern (x_\mu, y_\mu), the neuron states take the corresponding values:

x^{[c]}_{\mu i}, \quad x^{[0]}_{\mu i} = x_{\mu i}, \quad x^{[N]}_{\mu i} = y_{\mu i}   (7)

Just as we did for the simple perceptron, we define the quadratic error and propose the gradient descent optimisation technique:

E(\vec{w}) = \frac{1}{2} \sum_{\mu=1}^{N} \sum_{i=0}^{n} \left( x^{[N]}_{\mu i} - y_{\mu i} \right)^2   (8)

\frac{\partial E}{\partial w^{[c]}_{ij}} = - \sum_{\mu=1}^{N} \sum_{r=0}^{n} \left( y_{\mu r} - x^{[N]}_{\mu r} \right) \frac{\partial x^{[N]}_{\mu r}}{\partial w^{[c]}_{ij}}   (9)

Neural networks Backpropagation (4)

We will not cover the development of this equation in depth. Suffice it to say that for the last layer we obtain the same formula as for the simple perceptron. The backpropagation learning rule is therefore defined as follows:

\forall i, j: \quad w^{[c]}_{ij} := w^{[c]}_{ij} + \alpha \sum_{\mu=1}^{N} \delta^{[c+1]}_{\mu i} x^{[c]}_{\mu j}   (10)

where,

\delta^{[N]}_{\mu i} = \left( y_{\mu i} - x^{[N]}_{\mu i} \right) f'\left( \sum_{s=0}^{n} w^{[N-1]}_{is} x^{[N-1]}_{\mu s} \right)

\delta^{[k]}_{\mu s} = \left( \sum_{r=0}^{n} \delta^{[k+1]}_{\mu r} w^{[k]}_{rs} \right) f'\left( \sum_{t=0}^{n} w^{[k-1]}_{st} x^{[k-1]}_{\mu t} \right)
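To make the rules concrete, here is a sketch of one backpropagation step for a network with a single hidden layer and a single training pattern. The sigmoid activation (for which f'(s) = f(s)(1 - f(s))) and all names are illustrative assumptions, not the lecture's reference code:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def backprop_step(W1, W2, x, y, alpha=0.5):
    """One update of eq. (10) for a two-layer perceptron.
    W1, W2 are weight matrices whose column 0 holds the bias weight."""
    # Forward pass, keeping the activations
    h = [sigmoid(sum(w * v for w, v in zip(row, [1.0] + x))) for row in W1]
    o = [sigmoid(sum(w * v for w, v in zip(row, [1.0] + h))) for row in W2]
    # Output-layer deltas: (y_i - x^[N]_i) * f'(input)
    d_out = [(yi - oi) * oi * (1.0 - oi) for yi, oi in zip(y, o)]
    # Hidden-layer deltas: (sum_r delta^[k+1]_r w^[k]_rs) * f'(input)
    d_hid = [sum(d_out[r] * W2[r][s + 1] for r in range(len(d_out)))
             * h[s] * (1.0 - h[s]) for s in range(len(h))]
    # Weight updates: w := w + alpha * delta * x
    W2 = [[w + alpha * d_out[i] * v for w, v in zip(row, [1.0] + h)]
          for i, row in enumerate(W2)]
    W1 = [[w + alpha * d_hid[i] * v for w, v in zip(row, [1.0] + x)]
          for i, row in enumerate(W1)]
    return W1, W2, o

# Repeated steps drive the output towards the target y = 1
W1, W2 = [[0.1, 0.2]], [[0.1, 0.2]]
W1, W2, first = backprop_step(W1, W2, [1.0], [1.0])
for _ in range(300):
    W1, W2, out = backprop_step(W1, W2, [1.0], [1.0])
```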

Neural networks Backpropagation (5)

Here is some visual aid to help you through the formulae:
(a) Detail of the layer notation. (b) Detail of the neuron notation.

Neural networks Remarks about backpropagation

The modification of each weight depends only on quantities local to the two neurons that the weight connects.
Weight variations are large for the last layers and very small for the inner layers. Large networks are very hard to train.
Learning stops when a fixed number of epochs is reached, when overtraining sets in, or when the target error is achieved.
Weights are usually randomly initialised with small values.

Neural networks A visual example

Figure 2: TensorFlow has some pretty examples to visualize the learning process. (http://playground.tensorflow.org)

Section 3: Convolutional neural networks

Convolutional neural networks What is a convolution?

An image convolution is a pixel-by-pixel transformation of an image, defined by a set of weights, also known as a filter. Let s be a set of source pixels and w a set of weights; a pixel y is transformed as follows:

y = \sum_{i=0}^{n} s_i w_i   (11)

Convolutional neural networks Examples of filters

Identity:        [  0  0  0 ;  0  1  0 ;  0  0  0 ]
Edge detection:  [ -1 -1 -1 ; -1  8 -1 ; -1 -1 -1 ]
Sharpen:         [  0 -1  0 ; -1  5 -1 ;  0 -1  0 ]
Blur:            1/9 × [ 1  1  1 ;  1  1  1 ;  1  1  1 ]
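To see a filter in action, this sketch slides a 3x3 kernel over a grayscale image (a list of rows), applying eq. (11) at every valid position; function names and kernels are illustrative:

```python
def apply_filter(image, kernel):
    """Slide a 3x3 kernel over the image; each output pixel is sum_i s_i * w_i
    over the corresponding 3x3 neighbourhood (no padding)."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(3) for j in range(3))
             for c in range(cols - 2)]
            for r in range(rows - 2)]

identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
sharpen = [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]
```

On a 3x3 input the valid output is a single pixel: the identity kernel returns the centre pixel unchanged, while the sharpen kernel amplifies it against its neighbours.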

Convolutional neural networks Dealing with images

Convolutional neural networks are very similar to ordinary neural networks: they are made up of neurons with learnable weights. However, convolutional neural networks make the explicit assumption that the inputs are images. This allows certain properties to be encoded into the architecture, which makes the forward function more efficient to implement and vastly reduces the number of parameters in the network.

Convolutional neural networks Architecture overview

Regular neural networks don't scale well to full images. To see this, let's have a look at a subset of the CIFAR-10 dataset:

Figure 3: A subset of the CIFAR-10 dataset. (https://www.cs.toronto.edu/~kriz/cifar.html)

This dataset is composed of images of size 32x32x3 (width x height x color channels), categorized by what they contain. Note that even though these are 2D images, due to the color channels we are dealing with volumes.

Convolutional neural networks Architecture overview (2)

If we had a perceptron connected to all the inputs, this would imply 32 × 32 × 3 = 3072 weights. This number quickly gets out of hand when scaling to larger images, and even more so when we consider that we want a network of neurons, as opposed to one single perceptron. Hence, convolutional neural networks (CNNs) are feedforward networks that use three different kinds of layers to build up their architecture: convolutional layers, pooling layers and fully-connected layers. To describe these layers we will refer to their connectivity and to three dimensions: width, height and depth.

Convolutional neural networks Convolutional layers

A convolutional layer consists of a set of learnable filters. Every filter is spatially small in width and height, but extends through the full depth of the input volume. During the forward pass, we slide each filter across the width and height of the input volume, and compute dot products between the entries of the filter and the input at every position, as in eq. 11. As we slide the filter over the width and height of the input volume, we produce a 2-dimensional activation map that gives the responses of that filter at every spatial position. For instance, given an input volume of size 32x32x3 (like CIFAR-10), we could have a filter of size 5x5 and slide it across the input. This would result in a total of 5 × 5 × 3 = 75 input weights. Here is an example (http://cs231n.github.io/assets/conv-demo/index.html).

Convolutional neural networks Convolutional layers (2)

Intuitively, the network will learn filters that activate when they see some type of visual feature: an edge or a blotch of some color on the first layer, and eventually honeycomb- or wheel-like patterns on higher layers of the network.

Convolutional neural networks Convolutional layers: spatial arrangement

Three parameters control the size of the output volume:
Depth: the number of filters we would like to use, each searching for a different characteristic.
Stride: determines how the filter is slid across the input. It is uncommon to slide more than three pixels at a time.
Zero-padding: the number of pixels filled with zeroes around the border. Sometimes this is useful to allow certain strides.

We can compute the spatial size of the output volume as a function of the input volume size W, the filter size F, the stride S and the padding P as:

\frac{W - F + 2P}{S} + 1   (12)
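Equation (12) in code, as a hypothetical helper that also checks that the stride actually fits:

```python
def conv_output_size(W, F, S=1, P=0):
    """Spatial output size (W - F + 2P) / S + 1 along one dimension;
    the stride must divide (W - F + 2P) evenly for the filter to fit."""
    span = W - F + 2 * P
    if span % S != 0:
        raise ValueError("stride does not fit the input with this padding")
    return span // S + 1

# A 32-wide CIFAR-10 input with a 5x5 filter, stride 1, no padding gives 28
width = conv_output_size(32, 5)
```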

Convolutional neural networks Convolutional layers: parameter sharing

Following the assumption that if one feature is useful to compute at some spatial position (x_0, y_0), then it should also be useful to compute at a different position (x_1, y_1), it is possible to dramatically reduce the number of parameters (weights) in a CNN.

Figure 4: Example filters learnt by Krizhevsky et al. [6], who won the ImageNet challenge (http://www.image-net.org/challenges/lsvrc) in 2012. The first convolutional layer had 55 × 55 × 96 = 290,400 neurons with 11 × 11 × 3 + 1 = 364 inputs each. This makes for 105,705,600 weights, only feasible due to parameter sharing.

Convolutional neural networks Pooling layers

The function of a pooling layer, also known as a subsampling layer, is to progressively reduce the spatial size of the representation, reducing the amount of parameters and computation in the network. The pooling is typically done with the average or maximum function, applied to the window under consideration.

Figure 5: Pooling using the maximum function, with 2x2 filters and stride 2.
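Max pooling as in Figure 5 can be sketched as follows (function and parameter names are illustrative):

```python
def max_pool(image, size=2, stride=2):
    """Replace each size x size window (moved by `stride`) with its maximum."""
    return [[max(image[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(image[0]) - size + 1, stride)]
            for r in range(0, len(image) - size + 1, stride)]

# A 4x4 input with 2x2 windows and stride 2 shrinks to 2x2
pooled = max_pool([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [3, 2, 1, 0],
                   [1, 2, 3, 4]])
```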

Convolutional neural networks Other types of layers

Normalization layers Many types of normalization layers have been proposed, for instance with the intention of implementing inhibition schemes observed in the biological brain. However, these layers have since fallen out of favor because in practice their contribution has been shown to be minimal.

Fully-Connected layers Neurons in a fully-connected layer have full connections to all activations in the previous layer, as seen in regular neural networks. They are used in the output layers of CNNs.

Convolutional neural networks Example: Classifying birds

Convolutional neural networks Well, let's actually do it

For this example, we will use TensorFlow (https://www.tensorflow.org/), an open source library for numerical computation using data flow graphs, through the TFLearn high-level wrapper. It's quite powerful, and has us covered for what we need. Our input will be the CIFAR-10 data set, combined with the Caltech-UCSD Birds-200-2011 data set. This way we have 18 000 pictures of birds, and 52 000 pictures of non-birds.

Figure 6: Birds come in many colors and shapes. Notice some pictures are actually cropped, or not oriented the same way, and color conditions are very different.

Convolutional neural networks Our convolutional neural network

Our convolutional neural network will look like this:
Input of size 32x32x3
Convolutional layer with 32 filters of size 3x3
Max pooling with stride 2
Convolutional layer with 64 filters of size 3x3
Convolutional layer with 64 filters of size 3x3
Max pooling with stride 2
Fully-connected layer of 512 neurons
Dropout, discarding some activations randomly to prevent overfitting
Fully-connected layer with two outputs (0: no bird, 1: bird)

Convolutional neural networks Prototyping in tensorflow

# Input is a 32x32 image with 3 color channels
network = input_data(shape=[None, 32, 32, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

# Network layers
network = conv_2d(network, 32, 3, activation='relu')
network = max_pool_2d(network, 2)
network = conv_2d(network, 64, 3, activation='relu')
network = conv_2d(network, 64, 3, activation='relu')
network = max_pool_2d(network, 2)
network = fully_connected(network, 512, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 2, activation='softmax')

Convolutional neural networks Prototyping in tensorflow (2)

network = regression(network, optimizer='adam',
                     loss='categorical_crossentropy',
                     learning_rate=0.001)

model = tflearn.DNN(network, checkpoint_path='bird-classifier.tfl.ckpt')

# Train the network for 100 epochs
model.fit(X, Y, n_epoch=100, shuffle=True,
          validation_set=(X_test, Y_test),
          show_metric=True, batch_size=96,
          snapshot_epoch=True, run_id='bird-classifier')

Source code (https://gist.github.com/ageitgey/a2a03154bda7f6f86da4e434bf52bebf)

Convolutional neural networks Checking the efficiency of our network

Finally, let's check how we did. Depending on how we separated our data set into a training set and a validation set, results may vary. Running the above code reports a 95% efficiency in recognizing whether we found a bird or not. We can also have a look at the precision and recall:

precision = true positives / all positive guesses
recall = true positives / total birds in the dataset

We obtain 97.11% precision and 90.83% recall. Not so bad!
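The two figures of merit above, computed here from illustrative counts (not the actual confusion matrix of this run):

```python
def precision_recall(true_pos, false_pos, false_neg):
    """precision = TP / all positive guesses; recall = TP / all actual birds."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall

# e.g. 90 birds correctly found, 3 false alarms, 10 birds missed
p, r = precision_recall(90, 3, 10)
```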

Section 4: Where to go from here

Where to go from here Paradigms

Now that you have a strong foundation in neural networks, there is a lot more you can explore if you want to keep digging. Some of the more interesting paradigms out there are:
Recurrent neural networks [7]
Deep belief networks [8]

You can also find a plethora of well-documented applications. The following are a few of the most recent results neural networks have produced in the last years.

Where to go from here Speech recognition

(a) CNN for speech recognition. (b) Recurrent neural network.

Speech recognition problems have been successfully tackled with recurrent neural networks [9], [10].

Where to go from here Self-driving cars

Here are some materials on self-driving vehicles [11], as well as on traffic sign recognition [12].

Where to go from here HEP related

The ATLAS Higgs ML challenge (http://opendata.cern.ch/collection/ATLAS-Higgs-Challenge-2014) was held in 2014. The winner, G. Melis, used a deep neural network to achieve his result. Find his presentation here (https://indico.cern.ch/e/382895).

Section 5: Bibliography

Bibliography Resources I

[1] Ng, A. (2016). CS229 Machine Learning, Autumn 2016. http://cs229.stanford.edu/
[2] CS231n: Convolutional Neural Networks for Visual Recognition. http://cs231n.stanford.edu/
[3] Ruiz Reina J. L., Gutiérrez-Naranjo M. A. and Valencia Cabrera L. (2016). Introducción a las redes neuronales. http://www.cs.us.es/cursos/siis/contents.html
[4] Vilasís-Cardona, X. (2016). Curs de Doctorat de Xarxes Neuronals. Enginyeria i Arquitectura La Salle - Universitat Ramon Llull.
[5] Cybenko G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS) 2 (4), 303-314.

Bibliography Resources II

[6] Krizhevsky A., Sutskever I. and Hinton G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In: Advances in Neural Information Processing Systems 25 (NIPS).
[7] Olshausen B. A. and Field D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381 (6583), 607-609.
[8] Hinton G. E. (2009). Deep belief networks. Scholarpedia, 4(5):5947. http://www.scholarpedia.org/article/deep_belief_networks
[9] Hinton G. E., Deng L., Yu D., Dahl G., Mohamed A., Jaitly N., Senior A., Vanhoucke V., Nguyen P., Kingsbury B. and Sainath T. (2012). Deep Neural Networks for Acoustic Modeling in Speech Recognition. IEEE Signal Processing Magazine, vol. 29, pages 82-97.

Bibliography Resources III

[10] Graves A. and Jaitly N. (2014). Towards End-to-End Speech Recognition with Recurrent Neural Networks. Proceedings of the 31st International Conference on Machine Learning, Beijing, China. JMLR: W&CP volume 32.
[11] Bojarski M. et al. (2016). End to End Learning for Self-Driving Cars. arXiv:1604.07316v1.
[12] Sermanet P. and LeCun Y. (2011). Traffic Sign Recognition with Multi-Scale Convolutional Networks. DOI: 10.1109/IJCNN.2011.6033589.