Deep Learning: a gentle introduction


Jamal Atif, jamal.atif@dauphine.fr
PSL, Université Paris-Dauphine, LAMSADE
February 8, 2016

Why a talk about deep learning?

Convolutional Networks (Yann LeCun): the ILSVRC challenge, with 1000 categories and 1,461,406 images.

Neuron's basic anatomy

Figure: A neuron's basic anatomy consists of four parts: a soma (cell body), dendrites, an axon, and nerve terminals. Information is received by dendrites, gets collected in the cell body, and flows down the axon.

Artificial neuron

Figure: the inputs $x_1, \dots, x_n$ (dendrites) carry the weights $w_1, \dots, w_n$, and the bias $b$ is attached to a constant input $x_{n+1} = 1$. The cell body computes the pre-activation $\sum_{i=1}^{n} w_i x_i + b$, and the activation is sent down the axon.
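A minimal Python sketch of the artificial neuron on this slide: a weighted sum of the inputs plus a bias (the pre-activation), passed through an activation function. The slide does not fix the activation, so the threshold function `step` and the example numbers below are assumptions chosen for illustration.

```python
def neuron(x, w, b, activation):
    """Return activation(sum_i w_i * x_i + b) for a single artificial neuron."""
    pre_activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(pre_activation)


def step(a):
    """Threshold activation (an assumed choice): 1 if a > 0, else 0."""
    return 1 if a > 0 else 0


# Hypothetical inputs, weights and bias, for illustration only.
print(neuron([0, 1], [1, 2], 0, step))
```

Any other activation, such as a sigmoid, can be passed in place of `step` without changing `neuron`.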

Perceptron

Rosenblatt, 1957

Perceptron learning

Input: a sample $S = \{(x^{(1)}, y_1), \dots, (x^{(n)}, y_n)\}$
Initialize a parameter $t$ to 0
Initialize the weights $w_i$ with random values.
Repeat
    Pick an example $x^{(k)} = [x^{(k)}_1, \dots, x^{(k)}_d]^T$ from $S$
    Let $y^{(k)}$ be the target value and $\tilde{y}^{(k)}$ the computed value using the current perceptron
    If $\tilde{y}^{(k)} \neq y^{(k)}$ then
        Update the weights: $w_i(t + 1) = w_i(t) + \Delta w_i(t)$, where $\Delta w_i(t) = (y^{(k)} - \tilde{y}^{(k)}) x^{(k)}_i$
    End If
    $t = t + 1$
Until all the examples in $S$ were visited and no change occurs in the weights
Output: a perceptron for linear discrimination of $S$
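The procedure above can be sketched in a few lines of Python. This is an illustration, not the author's code: the bias is handled as a third weight on a constant input 1, the examples are visited in a fixed order, and the starting weights (1, 1, -1) replace the random initialization so that the run is reproducible. All three choices are assumptions.

```python
def predict(w, x):
    """Threshold unit: output 1 if the weighted sum is > 0, else 0."""
    a = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if a > 0 else 0


def train_perceptron(samples, w, max_epochs=100):
    """Repeat passes over the sample until one full pass changes no weight."""
    w = list(w)
    for _ in range(max_epochs):
        changed = False
        for x, y in samples:
            y_hat = predict(w, x)
            if y_hat != y:
                # w_i(t + 1) = w_i(t) + (y - y_hat) * x_i
                w = [wi + (y - y_hat) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:
            break
    return w


# OR function; the trailing 1 in each input is the constant bias input.
OR_SAMPLES = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
w = train_perceptron(OR_SAMPLES, [1, 1, -1])
print(w)  # [1, 2, 0]
```

The `max_epochs` cap is a safety net for data that is not linearly separable, where the stopping condition of the algorithm would never be met.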

Perceptron learning

Example: learning the OR function
Initialization: $w_1(0) = 1$, $w_2(0) = 1$, $w_3(0) = -1$

t   w1(t)  w2(t)  w3(t)  x(k)  sum_i w_i x_i(k)  ỹ(k)  y(k)  Δw1(t)  Δw2(t)  Δw3(t)
0   1      1      -1     001   -1                0     0     0       0       0
1   1      1      -1     011   0                 0     1     0       1       1
2   1      2      0      101   1                 1     1     0       0       0
3   1      2      0      111   3                 1     1     0       0       0
4   1      2      0      001   0                 0     0     0       0       0
5   1      2      0      011   2                 1     1     0       0       0

Perceptron illustration

At every step the perceptron computes $a = \sum_i w_i x_i$, outputs $\tilde{y} = 1$ if $a > 0$ and $\tilde{y} = 0$ elsewhere, and applies $w_i = w_i + (y - \tilde{y}) x_i$. The third input is the constant 1.

t = 0, $x^{(1)} = [0, 0]^T$: $w = (1, 1, -1)$, $a = -1$, $\tilde{y} = 0$, $y = 0$.
    $w_1 = 1 + 0 \cdot 0 = 1$, $w_2 = 1 + 0 \cdot 0 = 1$, $w_3 = -1 + 0 \cdot 1 = -1$

t = 1, $x^{(2)} = [0, 1]^T$: $w = (1, 1, -1)$, $a = 0$, $\tilde{y} = 0$, $y = 1$.
    $w_1 = 1 + 1 \cdot 0 = 1$, $w_2 = 1 + 1 \cdot 1 = 2$, $w_3 = -1 + 1 \cdot 1 = 0$

t = 2, $x^{(3)} = [1, 0]^T$: $w = (1, 2, 0)$, $a = 1$, $\tilde{y} = 1$, $y = 1$.
    $w_1 = 1 + 0 \cdot 1 = 1$, $w_2 = 2 + 0 \cdot 0 = 2$, $w_3 = 0 + 0 \cdot 1 = 0$

t = 3, $x^{(4)} = [1, 1]^T$: $w = (1, 2, 0)$, $a = 3$, $\tilde{y} = 1$, $y = 1$.
    $w_1 = 1 + 0 \cdot 1 = 1$, $w_2 = 2 + 0 \cdot 1 = 2$, $w_3 = 0 + 0 \cdot 1 = 0$

t = 4, $x^{(1)} = [0, 0]^T$: $w = (1, 2, 0)$, $a = 0$, $\tilde{y} = 0$, $y = 0$.
    $w_1 = 1 + 0 \cdot 0 = 1$, $w_2 = 2 + 0 \cdot 0 = 2$, $w_3 = 0 + 0 \cdot 1 = 0$

t = 5, $x^{(2)} = [0, 1]^T$: $w = (1, 2, 0)$, $a = 2$, $\tilde{y} = 1$, $y = 1$.
    $w_1 = 1 + 0 \cdot 0 = 1$, $w_2 = 2 + 0 \cdot 1 = 2$, $w_3 = 0 + 0 \cdot 1 = 0$

Perceptron capacity

Figure: OR($x_1, x_2$) and AND($x_1, x_2$) plotted in the plane; both are linearly separable.

Perceptron autumn

Figure: XOR($x_1, x_2$) plotted in the plane; no single line separates the two classes.

Link with Logistic regression

Figure: a single unit with inputs $x_1, \dots, x_d$, weights $w_1, \dots, w_d$, and a bias $b$ attached to the constant input $x_{d+1} = 1$.

$a(x) = \sum_{i=1}^{d} w_i x_i + b$
$h(x) = \frac{1}{1 + e^{-a(x)}}$

Stochastic gradient update rule: $w_j = w_j + \lambda (y^{(i)} - h(x^{(i)})) x^{(i)}_j$

The same as the perceptron update rule!
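As an illustration of the update rule on this slide, here is a small Python sketch of one stochastic gradient step for logistic regression. The bias is folded into the weight vector through a constant last input of 1. The starting weights, the example and the value of `lam` are assumptions.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def sgd_step(w, x, y, lam):
    """Apply w_j <- w_j + lam * (y - h(x)) * x_j for one example (x, y)."""
    h = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
    return [wj + lam * (y - h) * xj for wj, xj in zip(w, x)]


# Hypothetical example: zero weights, input (0, 1) plus the constant 1, target 1.
w = sgd_step([0.0, 0.0, 0.0], [0, 1, 1], 1, lam=1.0)
print(w)  # [0.0, 0.5, 0.5]
```

The only difference from the perceptron rule is that `h` is a value between 0 and 1 rather than a thresholded 0 or 1, so every example moves the weights a little.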

But!

Figure: XOR($x_1, x_2$) is not linearly separable in the original inputs, but it becomes linearly separable when plotted against the two features AND($\bar{x}_1, x_2$) and AND($x_1, \bar{x}_2$).
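The point of this slide can be checked with a short Python sketch: once the inputs are re-expressed through two AND features, a single threshold unit computes XOR. Using one negated input in each AND, and the weights (1, 1) with bias -0.5 for the final unit, are assumptions made for this illustration.

```python
def AND(a, b):
    return int(a and b)


def NOT(a):
    return 1 - a


def threshold_unit(z1, z2):
    """A single perceptron on the two AND features (assumed weights 1, 1, bias -0.5)."""
    return 1 if z1 + z2 - 0.5 > 0 else 0


def xor_via_features(x1, x2):
    z1 = AND(NOT(x1), x2)   # fires only for (0, 1)
    z2 = AND(x1, NOT(x2))   # fires only for (1, 0)
    return threshold_unit(z1, z2)


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_via_features(x1, x2))
```

Each AND feature is itself computable by a perceptron, which is what motivates stacking units into layers.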

Multilayer Perceptron

Paul Werbos, 84. David Rumelhart, 86

Figure: an input layer with inputs $x_1, \dots, x_d$ and a bias $b$, followed by hidden layers and an output layer with outputs $o_1, \dots, o_m$.

1 Layer Perceptron

Figure: an input layer $x_1, \dots, x_n$ with bias $b^1$, a hidden layer $h_1, \dots, h_d$ with bias $b^2$, and an output layer. The weight $w^1_{ij}$ connects input $x_i$ to hidden unit $h_j$, and $w^2_i$ connects hidden unit $h_i$ to the output. Superscripts index the layer.

Hidden layer: $h_j(x) = g(b^1_j + \sum_{i=1}^{n} w^1_{ij} x_i)$
Output layer: $f(x) = o(b^2 + \sum_{i=1}^{d} w^2_i h_i(x))$

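MLP training: backpropagation

XOR example

Figure: a network with two inputs $x_1, x_2$, two sigmoid hidden units and one sigmoid output unit.

Hidden units ($j = 1, 2$): $a_j = \sum_i w^1_{ij} x_i + b^1_j$, $h_j(x) = \frac{1}{1 + e^{-a_j(x)}}$
Output unit: $a_o = \sum_k w^2_k h_k + b^2$, $h_o(x) = \frac{1}{1 + e^{-a_o(x)}} = \tilde{y}$
Parameters: $w^1_{11}, w^1_{12}, w^1_{21}, w^1_{22}, b^1_1, b^1_2, w^2_1, w^2_2, b^2$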

MLP training: backpropagation

XOR example: initialisation

Hidden layer: $w^1_{11} = w^1_{12} = w^1_{21} = w^1_{22} = 1$, $b^1_1 = b^1_2 = -1$
Output layer: $w^2_1 = w^2_2 = 1$, $b^2 = -1$

MLP training: backpropagation

XOR example: Feed-forward

Input $x = (0, 0)^T$, target $y = 0$, with $w^1_{ij} = 1$, $b^1_1 = b^1_2 = -1$, $w^2_1 = w^2_2 = 1$, $b^2 = -1$.

$a_1 = \sum_i w^1_{i1} x_i + b^1_1 = -1$, $h_1(x) = \frac{1}{1 + e^{-a_1(x)}} \approx 0.27$
$a_2 = \sum_i w^1_{i2} x_i + b^1_2 = -1$, $h_2(x) = \frac{1}{1 + e^{-a_2(x)}} \approx 0.27$
$a_o = \sum_k w^2_k h_k + b^2 = -0.46$, $\tilde{y} = h_o(x) = \frac{1}{1 + e^{-a_o(x)}} \approx 0.39$
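The feed-forward pass of this example can be reproduced with a short Python sketch. The layout `W1[i][j]` (input `i` to hidden unit `j`) and the function names are assumptions for illustration. The parameter values, all weights 1 and all biases -1, are inferred from the activations 0.27 and 0.39 printed on the slide.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W1, b1, w2, b2):
    """One hidden layer of sigmoid units followed by one sigmoid output unit."""
    h = [
        sigmoid(sum(W1[i][j] * x[i] for i in range(len(x))) + b1[j])
        for j in range(len(b1))
    ]
    y_hat = sigmoid(sum(w2[j] * h[j] for j in range(len(h))) + b2)
    return h, y_hat


W1 = [[1.0, 1.0], [1.0, 1.0]]   # W1[i][j]: input i -> hidden unit j
b1 = [-1.0, -1.0]
w2 = [1.0, 1.0]
b2 = -1.0

h, y_hat = forward([0, 0], W1, b1, w2, b2)
print([round(v, 2) for v in h], round(y_hat, 2))  # [0.27, 0.27] 0.39
```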

MLP training: backpropagation

XOR example: Backpropagation (output layer)

With the error $E_w = \frac{1}{2}(y - \tilde{y})^2$:

$w^2_k = w^2_k + \eta \delta_k$
$\delta_k = -\frac{\partial E_w}{\partial w^2_k} = -\frac{\partial E_w}{\partial \tilde{y}} \frac{\partial \tilde{y}}{\partial a_o} \frac{\partial a_o}{\partial w^2_k} = (y - \tilde{y})(1 - \tilde{y}) \tilde{y} h_k$

Network state from the feed-forward pass: $x = (0, 0)^T$, $h_1 = h_2 \approx 0.27$, $a_o = -0.46$, $\tilde{y} \approx 0.39$, $y = 0$.

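MLP training: backpropagation

XOR example: Backpropagation

Error: $E_w = \frac{1}{2}(y - \tilde{y})^2$

Output layer:
$w^2_k = w^2_k + \eta \delta_k$
$\delta_k = -\frac{\partial E_w}{\partial w^2_k} = -\frac{\partial E_w}{\partial \tilde{y}} \frac{\partial \tilde{y}}{\partial a_o} \frac{\partial a_o}{\partial w^2_k} = (y - \tilde{y})(1 - \tilde{y}) \tilde{y} h_k$

Hidden layer:
$w^1_{ij} = w^1_{ij} + \eta \delta_{ij}$
$\delta_{ij} = -\frac{\partial E_w}{\partial w^1_{ij}} = -\frac{\partial E_w}{\partial h_o} \frac{\partial h_o}{\partial a_o} \frac{\partial a_o}{\partial h_j} \frac{\partial h_j}{\partial a_j} \frac{\partial a_j}{\partial w^1_{ij}} = (y - \tilde{y})(1 - \tilde{y}) \tilde{y} w^2_j h_j (1 - h_j) x_i$

The biases follow the same rules with the constant input 1 in place of $h_k$ or $x_i$.

With $\eta = 1$, $x = (0, 0)^T$, $y = 0$, $\tilde{y} = 0.39$, $h_1 = h_2 = 0.27$ and $w^2_1 = w^2_2 = 1$:

$w^1_{11} = 1 + (0 - 0.39)(1 - 0.39)(0.39)(1)(0.27)(1 - 0.27)(0) = 1$
$w^1_{22} = 1 + (0 - 0.39)(1 - 0.39)(0.39)(1)(0.27)(1 - 0.27)(0) = 1$
$b^1_1 = b^1_2 = -1 + (0 - 0.39)(1 - 0.39)(0.39)(1)(0.27)(1 - 0.27)(1) \approx -1.02$
$w^2_1 = w^2_2 = 1 + (0 - 0.39)(1 - 0.39)(0.39)(0.27) \approx 0.975$
$b^2 = -1 + (0 - 0.39)(1 - 0.39)(0.39) \approx -1.09$

Because both inputs are 0, the hidden weights $w^1_{ij}$ do not change on this example; only the biases and the output weights move.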
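The backpropagation step on this slide can be sketched in Python as follows. It is an illustrative sketch under stated assumptions, not the author's implementation: the error is $E_w = \frac{1}{2}(y - \tilde{y})^2$, every unit is a sigmoid, the learning rate `eta` is taken to be 1, and parameters move along the negative gradient. The names `forward`, `backprop_step` and the `W1[i][j]` layout are hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def forward(x, W1, b1, w2, b2):
    """Sigmoid hidden layer followed by a single sigmoid output unit."""
    h = [
        sigmoid(sum(W1[i][j] * x[i] for i in range(len(x))) + b1[j])
        for j in range(len(b1))
    ]
    y_hat = sigmoid(sum(w2[j] * h[j] for j in range(len(h))) + b2)
    return h, y_hat


def backprop_step(x, y, W1, b1, w2, b2, eta=1.0):
    """One gradient-descent step on E = 0.5 * (y - y_hat)**2 for one example."""
    h, y_hat = forward(x, W1, b1, w2, b2)
    # Negative gradient with respect to the output pre-activation a_o.
    d_out = (y - y_hat) * (1.0 - y_hat) * y_hat
    # Negative gradient with respect to each hidden pre-activation a_j,
    # computed with the output weights as they were before the update.
    d_hid = [d_out * w2[j] * h[j] * (1.0 - h[j]) for j in range(len(h))]

    new_w2 = [w2[j] + eta * d_out * h[j] for j in range(len(h))]
    new_b2 = b2 + eta * d_out
    new_W1 = [
        [W1[i][j] + eta * d_hid[j] * x[i] for j in range(len(h))]
        for i in range(len(x))
    ]
    new_b1 = [b1[j] + eta * d_hid[j] for j in range(len(h))]
    return new_W1, new_b1, new_w2, new_b2


# Initial state assumed from the example: all weights 1, all biases -1.
W1, b1, w2, b2 = backprop_step(
    [0, 0], 0, [[1.0, 1.0], [1.0, 1.0]], [-1.0, -1.0], [1.0, 1.0], -1.0
)
print(W1, b1, w2, b2)
```

The code works with unrounded activations, so its results can differ in the last digit from hand calculations that use rounded values.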

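MLP training: backpropagation

XOR example: Feed-forward (next example, with the updated parameters)

Input $x = (0, 1)^T$, target $y = 1$, with $w^1_{ij} = 1$, $b^1_1 = b^1_2 = -1.02$, $w^2_1 = w^2_2 = 0.98$, $b^2 = -1.09$.

$a_1 = \sum_i w^1_{i1} x_i + b^1_1 = -0.02$, $h_1(x) = \frac{1}{1 + e^{-a_1(x)}} \approx 0.5$
$a_2 = \sum_i w^1_{i2} x_i + b^1_2 = -0.02$, $h_2(x) = \frac{1}{1 + e^{-a_2(x)}} \approx 0.5$
$a_o = \sum_k w^2_k h_k + b^2 = 0.98 \cdot 0.5 + 0.98 \cdot 0.5 - 1.09 = -0.11$, $\tilde{y} = h_o(x) = \frac{1}{1 + e^{-a_o(x)}} \approx 0.5$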

Multilayer Perceptron as a deep NN

An MLP with more than 2 or 3 layers is a deep network.

Figure: an input layer with inputs $x_1, \dots, x_d$ and a bias $b$, several hidden layers, and an output layer with outputs $o_1, \dots, o_m$.

Training issues

Setting the hyperparameters
    Initialization
    Number of iterations
    Learning rate $\eta$
    Activation function
    Early stopping criterion
Overfitting/Underfitting
    Number of hidden layers
    Number of neurons
Optimization
    Vanishing gradient problem

Overfitting/Underfitting

Figures. Source: Bishop, PRML

What's new!

Before 2006, training deep architectures was unsuccessful!
    Underfitting: new optimization techniques
    Overfitting (Bengio, Hinton, LeCun):
        Unsupervised pre-training
        Stochastic "dropout" training

Unsupervised pre-training

Main idea
    Initialize the network in an unsupervised way.
Consequences
    Network layers encode successive invariant features (latent structure of the data).
    Better optimization and hence better generalization.

Unsupervised pre-training

How to?
    Proceed greedily, layer by layer.

Figure: the same input layer $x_1, \dots, x_d$ shown three times, with one more hidden layer trained on top at each stage.

Unsupervised NN learning techniques
    Restricted Boltzmann Machines (RBM)
    Auto-encoders
    and many, many variants since 2006

Unsupervised pre-training

Autoencoder

Figure. Credit: Hugo Larochelle

Unsupervised pre-training

Sparse Autoencoders

Figure. Credit: Y. Bengio

Fine tuning

How to?
    Add the output layer.
    Initialize its weights with random values.
    Initialize the hidden layers with the values from the pre-training.
    Update the layers by backpropagation.

Figure: the pre-trained network, from the input layer $x_1, \dots, x_d$ to the added output layer.

Drop out

Intuition
    Regularize the network by stochastically dropping out some hidden units.
Procedure
    Assign to each hidden unit a value 0 with probability $p$ (common choice: 0.5).

Figure: a network from the input layer $x_1, \dots, x_d$ to the output layer, with some hidden units dropped.
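The procedure on this slide can be sketched as a function that zeroes each hidden activation independently with probability p. This is a minimal illustration under assumptions: the function name, the example activations and the seeded generator are hypothetical, and the sketch covers only the masking step described here, not any rescaling that a particular library may apply.

```python
import random


def dropout(hidden, p=0.5, rng=random):
    """Set each hidden activation to 0 with probability p, keep it otherwise."""
    return [0.0 if rng.random() < p else h for h in hidden]


# Hypothetical hidden-layer activations; a seeded generator makes runs repeatable.
rng = random.Random(0)
hidden = [0.3, 0.8, 0.5, 0.9]
print(dropout(hidden, p=0.5, rng=rng))
```

A fresh mask is drawn on every call, so each training example sees a different thinned network.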

Some applications

Computer vision

Convolutional Neural Networks
LeCun, 89
State of the art in digit recognition


Modern CNN

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
    7 hidden layers, 650,000 units, 60,000,000 parameters
    Drop out
    $10^6$ images
    GPU implementation
    Activation function: $f(x) = \max(0, x)$ (ReLU)

Computer vision

Convolutional Neural Networks

Figure. LeCun, Hinton, Bengio, Nature 2015

What is deep in deep learning!

Old vs New ML paradigms
    Old: Raw data -> Hand-crafted features -> Classifier -> Decision
    New: Raw data -> Learned features (representation learning) -> Classifier -> Decision

Software

TensorFlow (Google): https://www.tensorflow.org/
Theano (Python, CPU/GPU): mathematical and deep learning library, http://deeplearning.net/software/theano. Can do automatic, symbolic differentiation.
Senna: NLP by Collobert et al., http://ronan.collobert.com/senna/
    State-of-the-art performance on many tasks
    3500 lines of C, extremely fast and using very little memory
Torch ML Library (C + Lua): http://www.torch.ch/
Recurrent Neural Network Language Model: http://www.fit.vutbr.cz/~imikolov/rnnlm
Recursive Neural Net and RAE models for paraphrase detection, sentiment analysis, relation classification: www.socher.org