Master Recherche IAC TC2: Apprentissage Statistique & Optimisation


Master Recherche IAC TC2: Apprentissage Statistique & Optimisation. Alexandre Allauzen, Anne Auger, Michèle Sebag. LIMSI / LRI. Oct. 4th, 2012

This course. Bio-inspired algorithms. Classical Neural Nets: History, Structure.

Bio-inspired algorithms. Facts: 10^11 neurons; 10^4 connections per neuron; firing time: 10^-3 second (vs. about 10^-10 second for computers).

Bio-inspired algorithms, 2. Human beings are the best! How do we do it? What matters is not the number of neurons, as one could have thought in the 80s and 90s... Massive parallelism? Innate skills? (= anything we can't yet explain) Is it the training process?

Synaptic plasticity. Hebb 1949. Conjecture: When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. Learning rule: cells that fire together, wire together. If two neurons are simultaneously excited, their connection weight increases. Remark: this is unsupervised learning.
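As an illustration, a minimal sketch of a Hebbian weight update in Python, assuming the classical rate-based form Delta w_ij = eta * x_i * x_j (the learning rate eta and the toy activity vector are illustrative choices, not from the slides):

import numpy as np

def hebbian_update(w, x, eta=0.01):
    # One Hebbian step: connections between co-active units are reinforced,
    # Delta w_ij = eta * x_i * x_j ("cells that fire together, wire together").
    # w: (d, d) connection-weight matrix, x: (d,) activity vector, eta: learning rate.
    return w + eta * np.outer(x, x)

# Example: two co-active neurons see their mutual connection weight grow.
w = np.zeros((3, 3))
x = np.array([1.0, 1.0, 0.0])
print(hebbian_update(w, x))

Note that the rule is unsupervised: no target output appears in the update.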

This course. Bio-inspired algorithms. Classical Neural Nets: History, Structure.

History of artificial neural nets (ANN) 1. Unsupervised NNs and logical neurons 2. Supervised NNs: Perceptron and Adaline algorithms 3. The NN winter: theoretical limitations 4. Multi-layer perceptrons.

History

Thresholded neurons. McCulloch and Pitts 1943. Ingredients: inputs (dendrites) x_i; weights w_i; threshold θ; output: 1 iff Σ_i w_i x_i > θ. Remarks: neurons → logic → reasoning → intelligence. Logical NNs can represent any boolean function, but there is no differentiability.
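As an illustration, a minimal sketch of a McCulloch-Pitts thresholded neuron in Python; the weights and threshold below, chosen to implement a boolean AND, are illustrative assumptions:

def mcculloch_pitts(x, w, theta):
    # Output 1 iff the weighted sum of the inputs exceeds the threshold.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > theta else 0

# Boolean AND of two binary inputs: w = (1, 1), theta = 1.5.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, mcculloch_pitts(x, w=(1, 1), theta=1.5))

The output is a hard 0/1 step, which is exactly why these units are not differentiable.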

Perceptron. Rosenblatt 1958. y = sign(Σ_i w_i x_i − θ). Augment the vectors: x = (x_1, ..., x_d) → (x_1, ..., x_d, 1), w = (w_1, ..., w_d) → (w_1, ..., w_d, −θ). Then y = sign(⟨w, x⟩).

Learning a Perceptron. Given E = {(x_i, y_i), x_i ∈ IR^d, y_i ∈ {−1, 1}, i = 1 ... n}. For i = 1 ... n, do: if no mistake (⟨w, x_i⟩ has the same sign as y_i, i.e. y_i ⟨w, x_i⟩ > 0), do nothing; if mistake, update w ← w + y_i x_i. Enforcing algorithmic stability: w_{t+1} ← w_t + α_t y_i x_i, where α_t decreases to 0 faster than 1/t.
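A minimal sketch of this learning rule in Python (the toy data, the fixed number of epochs and the constant step size are illustrative assumptions; the α_t schedule is omitted):

import numpy as np

def train_perceptron(X, y, epochs=10):
    # Perceptron rule: on a mistake (y_i <w, x_i> <= 0), update w <- w + y_i x_i.
    X = np.hstack([X, np.ones((len(X), 1))])   # augment inputs with a constant 1 (absorbs the threshold)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:        # misclassified (or on the boundary)
                w = w + yi * xi
    return w

# Toy separable data: the label is the sign of the first coordinate.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.3], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
print(w, np.sign(np.hstack([X, np.ones((4, 1))]) @ w))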

Convergence: upper bounding the number of mistakes. Assumptions: each x_i belongs to B(IR^d, C), i.e. ||x_i|| < C; E is separable, i.e. there exists a solution w* with ||w*|| = 1 such that y_i ⟨w*, x_i⟩ > δ > 0 for all i = 1 ... n. Then: the perceptron makes at most (C/δ)^2 mistakes.

Bounding the number of misclassifications. Proof: upon the k-th misclassification, on some example x_i: w_{k+1} = w_k + y_i x_i, hence ⟨w_{k+1}, w*⟩ = ⟨w_k, w*⟩ + y_i ⟨x_i, w*⟩ ≥ ⟨w_k, w*⟩ + δ ≥ ⟨w_{k−1}, w*⟩ + 2δ ≥ ... ≥ kδ. Meanwhile: ||w_{k+1}||^2 = ||w_k + y_i x_i||^2 ≤ ||w_k||^2 + C^2 ≤ ... ≤ kC^2 (the cross term 2 y_i ⟨w_k, x_i⟩ is non-positive since x_i was misclassified). Therefore √k C ≥ ||w_{k+1}|| ≥ ⟨w_{k+1}, w*⟩ ≥ kδ, which gives k ≤ (C/δ)^2.

Going farther... Remark: linear programming (find w and δ maximizing δ subject to y_i ⟨w, x_i⟩ > δ for i = 1 ... n) paves the way for Support Vector Machines...

Adaline. Widrow 1960, Adaptive Linear Element. Given E = {(x_i, y_i), x_i ∈ IR^d, y_i ∈ IR, i = 1 ... n}. Learning: minimization of a quadratic function by a gradient algorithm: w* = argmin_w { Err(w) = Σ_i (y_i − ⟨w, x_i⟩)^2 }, with updates w_t = w_{t−1} − α_t ∇Err(w_{t−1}).
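A minimal sketch of Adaline-style gradient descent on the squared error above (the learning rate, iteration count and toy data are illustrative assumptions; the gradient is averaged over the n examples, which rescales the steps but keeps the same minimizer):

import numpy as np

def adaline(X, y, alpha=0.1, iters=1000):
    # Minimize Err(w) = sum_i (y_i - <w, x_i>)^2 by gradient descent.
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(iters):
        residual = y - X @ w                 # y_i - <w, x_i>
        grad = -2.0 / n * X.T @ residual     # (averaged) gradient of the squared error
        w = w - alpha * grad                 # w_t = w_{t-1} - alpha * grad Err(w_{t-1})
    return w

# Toy data: y = 2*x1 - x2 plus a little noise; the estimate should approach [2, -1].
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - X[:, 1] + 0.01 * rng.normal(size=100)
print(adaline(X, y))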

The NN winter. Limitation of linear hypotheses (Minsky & Papert 1969): the XOR problem.

Multi-Layer Perceptrons. Rumelhart & McClelland 1986. Issues: several layers give a non-linear separation and address the XOR problem; a differentiable activation function, output(x) = 1 / (1 + exp(−⟨w, x⟩)).

The sigmoid function σ(t) = 1 / (1 + exp(−a·t)), a > 0: approximates the step function (binary decision); quasi-linear close to 0, with its strongest increase close to 0; σ'(x) = a σ(x)(1 − σ(x)).
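A minimal sketch of the sigmoid and its derivative in Python (the slope parameter a = 1 and the sample points are illustrative choices); the identity σ' = a σ (1 − σ) is what makes the gradient cheap to compute:

import numpy as np

def sigmoid(t, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * t))

def sigmoid_prime(t, a=1.0):
    s = sigmoid(t, a)
    return a * s * (1.0 - s)              # sigma'(t) = a * sigma(t) * (1 - sigma(t))

t = np.array([-4.0, 0.0, 4.0])
print(sigmoid(t), sigmoid_prime(t))       # the slope is largest at t = 0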

Back-propagation algorithm. Rumelhart & McClelland 1986; Le Cun 1986. Intuition: given (x, y), a training sample drawn uniformly at random: set the d input units of the network to x_1 ... x_d; compute iteratively the output of each neuron until the final layer, giving the output ŷ; compare ŷ and y, Err(w) = (ŷ − y)^2; modify the NN weights on the last layer based on the gradient value. Looking at the previous layer: we know what we would have liked to have as output; infer what we would have liked to have as input, i.e. as output of the previous layer. And back-propagate... The errors on the i-th layer are used to modify the weights that compute the output of the i-th layer from its input.

Back-propagation of the gradient. Notations: input x = (x_1, ..., x_d). From the input to the first hidden layer: z^(1)_j = Σ_k w_jk x_k, x^(1)_j = f(z^(1)_j). From layer i to layer i+1: z^(i+1)_j = Σ_k w^(i)_jk x^(i)_k, x^(i+1)_j = f(z^(i+1)_j) (f: e.g. the sigmoid).

Back-propagation of the gradient. Input (x, y), x ∈ IR^d, y ∈ {−1, 1}. Phase 1, propagate information forward: for layer i = 1 ... l, for every neuron j on layer i, z^(i)_j = Σ_k w^(i)_j,k x^(i−1)_k and x^(i)_j = f(z^(i)_j). Phase 2, compare: the error is the difference between ŷ_j = x^(l)_j and y_j; the output error term is e^(output)_j = f'(z^(l)_j) [ŷ_j − y_j].

Back-propagation of the gradient. Phase 3, back-propagate the errors: e^(i−1)_j = f'(z^(i−1)_j) Σ_k w^(i)_kj e^(i)_k. Phase 4, update the weights on all layers: Δw^(k)_ij = −α e^(k)_i x^(k−1)_j, where α is the learning rate (< 1).
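A minimal sketch of the four phases for a one-hidden-layer network with sigmoid activations and squared error, trained on the XOR problem; the layer sizes, biases, initialisation, learning rate and number of epochs are illustrative assumptions, and the error terms follow the e_j = f'(z_j) · (back-propagated sum) convention above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # XOR inputs
Y = np.array([[0.], [1.], [1.], [0.]])                   # XOR targets

W1 = rng.normal(size=(2, 8)); b1 = np.zeros(8)           # input -> hidden (8 units)
W2 = rng.normal(size=(8, 1)); b2 = np.zeros(1)           # hidden -> output
alpha = 0.5

for _ in range(10000):
    for x, y in zip(X, Y):
        # Phase 1: propagate information forward.
        z1 = x @ W1 + b1;  x1 = sigmoid(z1)
        z2 = x1 @ W2 + b2; y_hat = sigmoid(z2)
        # Phase 2: output error, e = f'(z) * (y_hat - y).
        e2 = y_hat * (1 - y_hat) * (y_hat - y)
        # Phase 3: back-propagate the error to the hidden layer.
        e1 = x1 * (1 - x1) * (W2 @ e2)
        # Phase 4: update the weights on all layers (w <- w - alpha * e * x).
        W2 -= alpha * np.outer(x1, e2); b2 -= alpha * e2
        W1 -= alpha * np.outer(x, e1);  b1 -= alpha * e1

# The predictions should approach [0, 1, 1, 0].
print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2).ravel())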

This course. Bio-inspired algorithms. Classical Neural Nets: History, Structure.

Neural nets. Ingredients: an activation function; a connection topology = directed graph, either feedforward (DAG, directed acyclic graph) or recurrent; a (scalar, real-valued) weight on each connection. Activation(z): thresholded (0 if z < threshold, 1 otherwise); linear (z); sigmoid (1/(1 + e^−z)); radius-based (e^{−z^2/σ^2}).
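A minimal sketch of the four activation functions listed above (the threshold and σ values are illustrative assumptions):

import numpy as np

def thresholded(z, threshold=0.0):
    return np.where(z < threshold, 0.0, 1.0)   # 0 if z < threshold, 1 otherwise

def linear(z):
    return z

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def radius_based(z, sigma=1.0):
    return np.exp(-z**2 / sigma**2)            # RBF-style activation

z = np.linspace(-2.0, 2.0, 5)
for f in (thresholded, linear, sigmoid, radius_based):
    print(f.__name__, f(z).round(3))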

Feedforward NN.

Recurrent NN: propagate until stabilisation; back-propagation does not apply. The memory of the recurrent NN is the value of the hidden neurons; beware that this memory fades exponentially fast. Suited to dynamic data (audio, video).

Structure / Connection graph / Topology. Prior knowledge: invariance under some operator op (translation, rotation, ...). Either complete E, i.e. also consider the examples (op(x_i), y_i), or use weight sharing: convolutional networks (100,000 weights, but only 2,600 free parameters). Details: http://yann.lecun.com/exdb/lenet/ http://deeplearning.net/tutorial/lenet.html Demos.

Hubel & Wiesel 1968. Visual cortex of the cat: cells are arranged in such a way that each cell observes a fraction of the visual field (its receptive field), and the union of the receptive fields covers the whole field. Characteristics: simple cells check the presence of a pattern; more complex cells consider a larger receptive field and detect the presence of a pattern up to translation/rotation.

Sparse connectivity. Reducing the number of weights: layer m detects local patterns; layer m + 1 performs a non-linear aggregation over a more global receptive field.

Convolutional NN: shared weights. Reducing the number of weights by adapting the gradient-based update: the update is averaged over all occurrences of the shared weight.
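A minimal sketch of this idea on a 1-D convolution in Python: each kernel weight occurs at every position of the input, so its gradient is accumulated (here averaged) over all occurrences; the kernel, input signal and upstream gradient are illustrative assumptions:

import numpy as np

def conv1d_forward(x, k):
    # Valid 1-D convolution (cross-correlation): y[p] = sum_j k[j] * x[p + j].
    n, m = len(x), len(k)
    return np.array([np.dot(k, x[p:p + m]) for p in range(n - m + 1)])

def conv1d_kernel_grad(x, k, dy):
    # Gradient of the loss w.r.t. the shared kernel, averaged over all positions p.
    m = len(k)
    grads = np.array([dy[p] * x[p:p + m] for p in range(len(dy))])   # one row per occurrence
    return grads.mean(axis=0)

x = np.arange(6, dtype=float)          # toy input signal
k = np.array([0.2, -0.1, 0.3])         # shared (convolutional) kernel: 3 parameters only
y = conv1d_forward(x, k)
dy = np.ones_like(y)                   # pretend upstream gradient dErr/dy = 1 everywhere
k = k - 0.1 * conv1d_kernel_grad(x, k, dy)   # one shared-weight update
print(y, k)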

Max pooling: reduction and invariance. Partition the input and return the max value in each subset, yielding invariance to small displacements. Global scheme.
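A minimal sketch of 2x2 max pooling on a small feature map (the pool size and input values are illustrative assumptions):

import numpy as np

def max_pool2d(x, size=2):
    # Partition x into size-by-size blocks and return the max of each block.
    h, w = x.shape
    x = x[:h - h % size, :w - w % size]          # drop incomplete border blocks
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 1, 5, 6],
              [2, 2, 7, 8]], dtype=float)
print(max_pool2d(x))   # [[4. 2.] [2. 8.]]

Shifting a pattern by one pixel inside a pooling block leaves the pooled value unchanged, which is where the invariance comes from.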

Properties. Good news: MLPs and RBFs are universal approximators. For every (decent) function f (i.e. f^2 has a finite integral on every compact of IR^d) and every ε > 0, there exists some MLP/RBF g such that ||f − g|| < ε. Bad news: the proof is not constructive (the solution exists, and then?); everything is representable, but there is no guarantee against overfitting.

Key issues. Model selection: selecting the number of neurons and the connection graph; choosing the learning criterion (more complex is not necessarily better: overfitting). Algorithmic choices (a difficult optimization problem): initialisation (small w!); decrease the learning rate with time; enforce stability through relaxation, w_new ← (1 − α) w_old + α w_new; stopping criterion; start by normalizing the data, x ← (x − average) / variance.
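A minimal sketch of the two numerical recipes above, data normalization and relaxed weight updates (the relaxation coefficient and toy data are illustrative assumptions; rescaling is done here by the standard deviation, a common variant of the variance-based rescaling mentioned above):

import numpy as np

def normalize(X):
    # Centre each feature and rescale it (here by its standard deviation).
    return (X - X.mean(axis=0)) / X.std(axis=0)

def relaxed_update(w_old, w_new, alpha=0.3):
    # Relaxation for stability: w <- (1 - alpha) * w_old + alpha * w_new.
    return (1 - alpha) * w_old + alpha * w_new

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=3.0, size=(100, 2))
Xn = normalize(X)
print(Xn.mean(axis=0).round(6), Xn.std(axis=0).round(6))    # ~0 mean, unit scale

w_old = np.array([0.0, 0.0])
w_proposed = np.array([1.0, -1.0])                          # e.g. the result of a gradient step
print(relaxed_update(w_old, w_proposed))                    # moves only part of the way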

The curse of NNs: http://videolectures.net/eml07_lecun_wia/

Pointers. Course URL: http://neuron.tuke.sk/math.chtf.stuba.sk/pub/vlado/nn_books_texts/krose_smagt_neuro-intro.pdf FAQ: http://www.faqs.org/faqs/ai-faq/neural-nets/part1/preamble.html Applets: http://www.lri.fr/~marc/eeaax/neurones/tutorial/ Code: PDP++/Emergent (www.cnbc.cmu.edu/pdp++/); SNNS (http://www-ra.informatik.uni-tuebingen.de/sgnns/)... Also see NEAT & HyperNEAT, for when no examples are available (e.g. robotics): Stanley, U. Texas.