Artificial neural networks

Artificial neural networks. B. Mehlig, Department of Physics, University of Gothenburg, Sweden. FFR135/FIM720 Artificial Neural Networks, Chalmers/Gothenburg University, 7.5 credits.

Course home page
Teachers: Bernhard Mehlig, Johan Fries, Marina Rafajlovic
Forum: online forum at Piazza. Ask questions on the forum (instead of emailing), so that other students can benefit.
Course representatives: TBA
Schedule: TimeEdit
Literature: lecture notes, will be continuously updated.
Examination: homework problems (OpenTA) and a written exam. You must be registered for the course to take the exam.
Changes since the course was last given: rules for passing grade; format for submission of homework problems.

Piazza

OpenTA

OpenTA Syntax

OpenTA Quiz

OpenTA Quiz

Introductory lecture. Slides are on the course home page.

Neurons in the cerebral cortex (the outer layer of the cerebrum, the largest and best-developed part of the human brain). Image: brainmaps.org

Neurons in the cerebral cortex, with the parts of a neuron labelled: dendrites (input), cell body (process), axon (output). Image: brainmaps.org

Neuron anatomy and activity. Schematic drawing of a neuron: dendrites (input), synapses, cell body (process), axon (output). The output of a neuron is a spike train, switching between active and inactive as a function of time. Total length of dendrites: up to several centimetres (synapseweb.clm.utexas.edu/dimensions-dendrites, wikipedia.org/wiki/neuron); scale bar in the figure: 100 µm. Spike train recorded in an electrosensory pyramidal neuron of the fish Eigenmannia: Gabbiani & Metzner, J. Exp. Biol. 202 (1999) 1267.

McCulloch-Pitts neuron (McCulloch & Pitts, Bull. Math. Biophys. 5 (1943) 115). A simple model for a neuron. Signal processing: weighted sum of the inputs, passed through an activation function $g(b_i)$:
$$O_i = g\Big(\sum_{j=1}^{N} w_{ij}\,x_j - \theta_i\Big) = g(b_i), \qquad b_i = \sum_{j=1}^{N} w_{ij}\,x_j - \theta_i = w_{i1}x_1 + \cdots + w_{iN}x_N - \theta_i \quad\text{(local field)}.$$
Incoming signals $x_j$, $j = 1,\ldots,N$; weights (synaptic couplings) $w_{ij}$; threshold $\theta_i$; neuron number $i$; output $O_i$.

Activation function. Signal processing of the McCulloch-Pitts neuron: weighted sum of the inputs, passed through the activation function $g(b_i)$:
$$O_i = g\Big(\sum_{j=1}^{N} w_{ij}\,x_j - \theta_i\Big) = g(b_i) = g(w_{i1}x_1 + \cdots + w_{iN}x_N - \theta_i).$$
Inputs $x_j$, synaptic couplings $w_{ij}$, threshold $\theta_i$, output $O_i$. Common choices for $g(b)$: the step function, the signum function $\mathrm{sgn}(b)$, the sigmoid function $g(b) = 1/(1 + e^{-\beta b})$ with parameter $\beta > 0$, and the ReLU function $\max(0, b)$.
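
As a concrete illustration of the McCulloch-Pitts neuron and the activation functions above, here is a minimal Python/NumPy sketch (not part of the original slides). The numerical values of the inputs, weights and threshold are made up for illustration; the symbols follow the slide's notation.

```python
# Sketch of a McCulloch-Pitts neuron with the activation functions from this slide.
# The inputs x, weights w and threshold theta below are made-up example numbers.
import numpy as np

def step(b):                  # step function
    return np.where(b > 0, 1.0, 0.0)

def sgn(b):                   # signum function
    return np.where(b >= 0, 1.0, -1.0)

def sigmoid(b, beta=1.0):     # sigmoid with steepness parameter beta > 0
    return 1.0 / (1.0 + np.exp(-beta * b))

def relu(b):                  # rectified linear unit
    return np.maximum(0.0, b)

def neuron_output(x, w, theta, g):
    """O = g(sum_j w_j x_j - theta): weighted sum, threshold, activation."""
    b = np.dot(w, x) - theta  # local field b
    return g(b)

x = np.array([0.5, -1.0, 2.0])     # incoming signals x_j
w = np.array([1.0, 0.5, -0.25])    # synaptic couplings w_j
theta = 0.1                        # threshold

for g in (step, sgn, sigmoid, relu):
    print(g.__name__, neuron_output(x, w, theta, g))
```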

Highly simplified idealisation of a neuron. Take the activation function to be the signum function $\mathrm{sgn}(b_i)$. The output of neuron $i$ can then assume only two values, $\pm 1$, representing the level of activity of the neuron. This is an idealisation:
- real neurons fire spike trains (active/inactive),
- switching between active and inactive can be delayed (time delay),
- real neurons may perform non-linear summation of inputs,
- real neurons are subject to noise (stochastic dynamics).

Neural networks. Connect neurons into networks that can perform computing tasks: for example image analysis, object identification, pattern recognition, classification, clustering, data compression. Examples of architectures: simple perceptrons, two-layer perceptrons, recurrent networks, Boltzmann machines.

A classification task. Input patterns $x^{(\mu)}$; the index $\mu = 1,\ldots,p$ labels the patterns. In this example each pattern has two components, $x_1^{(\mu)}$ and $x_2^{(\mu)}$, arranged into the vector $x^{(\mu)} = (x_1^{(\mu)}, x_2^{(\mu)})^{\mathsf T}$ shown in the Figure. We see that the patterns fall into two classes: one on the left and one on the right. [Figure: patterns $x^{(1)},\ldots,x^{(10)}$ in the $x_1$-$x_2$ plane.]

A classification task. Draw a red line (a decision boundary) that distinguishes the two types of patterns.

A classification task. Aim: train a neural network to compute the decision boundary. To do this, define target values: $t^{(\mu)} = +1$ for the patterns in one class and $t^{(\mu)} = -1$ for those in the other. Training set: $(x^{(\mu)}, t^{(\mu)})$, $\mu = 1,\ldots,p$.

Simple perceptron. A simple perceptron consists of one neuron with two input units $x_1$ and $x_2$, activation function $\mathrm{sgn}(b)$, and no threshold ($\theta = 0$). Output:
$$O = \mathrm{sgn}(w_1 x_1 + w_2 x_2) = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x}).$$
Write the vectors in terms of their lengths and angles, $\mathbf{w} = |\mathbf{w}|\,(\cos\beta, \sin\beta)^{\mathsf T}$ and $\mathbf{x} = |\mathbf{x}|\,(\cos\alpha, \sin\alpha)^{\mathsf T}$, where $|\mathbf{w}|$ denotes the length of the vector $\mathbf{w}$. The scalar product between the two vectors is then
$$\mathbf{w}\cdot\mathbf{x} = w_1 x_1 + w_2 x_2 = |\mathbf{w}||\mathbf{x}|\,(\cos\alpha\cos\beta + \sin\alpha\sin\beta) = |\mathbf{w}||\mathbf{x}|\cos\varphi,$$
where $\varphi$ is the angle between $\mathbf{w}$ and $\mathbf{x}$. Aim: adjust the weights $\mathbf{w}$ so that the network outputs the correct target value for all patterns:
$$O^{(\mu)} = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x}^{(\mu)}) = t^{(\mu)} \quad\text{for } \mu = 1,\ldots,p.$$

Geometrical solution. Simple perceptron: one neuron, two input units $x_1$ and $x_2$, activation function $\mathrm{sgn}(b)$, no threshold ($\theta = 0$). Output $O = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x})$ with $\mathbf{w}\cdot\mathbf{x} = |\mathbf{w}||\mathbf{x}|\cos\varphi$, where $\varphi$ is the angle between $\mathbf{w}$ and $\mathbf{x}$. Aim: adjust the weights $\mathbf{w}$ so that the network outputs the correct target values for all patterns, $O^{(\mu)} = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x}^{(\mu)}) = t^{(\mu)}$ for $\mu = 1,\ldots,p$, where $t^{(\mu)} = +1$ for one class and $t^{(\mu)} = -1$ for the other. Solution: choose $\mathbf{w}$ perpendicular to the decision boundary. Check: $\mathbf{w}\cdot\mathbf{x}^{(8)} > 0$, so $O^{(8)} = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x}^{(8)}) = 1$. Correct, since $t^{(8)} = 1$.
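
The scalar-product identity and the sign rule above can be checked numerically. The short sketch below is my addition, with made-up vectors: it verifies that $\mathbf{w}\cdot\mathbf{x} = |\mathbf{w}||\mathbf{x}|\cos\varphi$ and evaluates $O = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x})$.

```python
# Numerical check of w.x = |w||x| cos(phi) and of the rule O = sgn(w.x).
# The two vectors are made-up examples.
import numpy as np

w = np.array([1.0, 0.5])      # weight vector, perpendicular to the decision boundary
x = np.array([2.0, -1.0])     # an input pattern

# Angle of each vector with the x1-axis, and the angle phi between them
alpha = np.arctan2(x[1], x[0])
beta = np.arctan2(w[1], w[0])
phi = alpha - beta

# Both sides of  w.x = |w| |x| cos(phi)  give the same number:
print(np.dot(w, x))
print(np.linalg.norm(w) * np.linalg.norm(x) * np.cos(phi))

# Classification: O = sgn(w.x) is +1 on the side of the boundary that w points to, -1 otherwise
print(np.sign(np.dot(w, x)))
```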

Hebb's rule. Now the pattern $\mathbf{x}^{(8)}$ is on the wrong side of the red line, so $O^{(8)} = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x}^{(8)}) \neq t^{(8)}$. Move the red line so that $O^{(8)} = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x}^{(8)}) = t^{(8)}$ by rotating the weight vector $\mathbf{w}$.

Hebb's rule. The pattern $\mathbf{x}^{(8)}$ is on the wrong side of the red line, so $O^{(8)} \neq t^{(8)}$. Rotate the weight vector by updating $\mathbf{w}' = \mathbf{w} + \eta\,\mathbf{x}^{(8)}$ (small parameter $\eta > 0$), so that $O^{(8)} = \mathrm{sgn}(\mathbf{w}'\cdot\mathbf{x}^{(8)}) = t^{(8)}$.

Hebb's rule. Now the pattern $\mathbf{x}^{(4)}$ is on the wrong side of the red line, so $O^{(4)} = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x}^{(4)}) \neq t^{(4)}$. Move the red line so that $O^{(4)} = t^{(4)}$ by rotating the weight vector: $\mathbf{w}' = \mathbf{w} - \eta\,\mathbf{x}^{(4)}$ (small parameter $\eta > 0$).

Hebb's rule. Note the sign of the update: $\mathbf{w}' = \mathbf{w} + \eta\,\mathbf{x}^{(8)}$ where $t^{(8)} = +1$, but $\mathbf{w}' = \mathbf{w} - \eta\,\mathbf{x}^{(4)}$ where $t^{(4)} = -1$. Both cases are summarised by the learning rule (Hebb's rule)
$$\mathbf{w}' = \mathbf{w} + \delta\mathbf{w} \quad\text{with}\quad \delta\mathbf{w} = \eta\,t^{(\mu)}\,\mathbf{x}^{(\mu)}.$$
Apply the learning rule many times until the problem is solved.
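
A minimal sketch of this training procedure, assuming a thresholdless perceptron $O = \mathrm{sgn}(\mathbf{w}\cdot\mathbf{x})$ and a toy training set whose two classes can be separated by a line through the origin. The learning rate $\eta$, the random data and the stopping criterion are my own choices, not taken from the slides.

```python
# Minimal sketch of Hebb's rule for the simple perceptron O = sgn(w.x), no threshold.
# The toy classes are placed left/right of the x2-axis so that a decision boundary
# through the origin exists; eta and the data are made-up choices.
import numpy as np

rng = np.random.default_rng(1)
eta = 0.1                                     # small learning parameter eta > 0

# Training set (x^(mu), t^(mu)), mu = 1, ..., p
x_right = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(10, 2))    # targets t = +1
x_left = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(10, 2))    # targets t = -1
x = np.vstack([x_right, x_left])
t = np.array([+1.0] * 10 + [-1.0] * 10)

w = rng.normal(size=2)                        # random initial weight vector

for epoch in range(100):                      # apply the rule many times
    errors = 0
    for mu in range(len(x)):
        if np.sign(w @ x[mu]) != t[mu]:       # pattern on the wrong side of the line
            w = w + eta * t[mu] * x[mu]       # Hebb's rule: dw = eta * t^(mu) * x^(mu)
            errors += 1
    if errors == 0:                           # all patterns classified correctly
        break

print("weights:", w, "epochs needed:", epoch + 1)
```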

Example - AND function. Truth table of logical AND:
x1 x2 | t
0  0  | -1
0  1  | -1
1  0  | -1
1  1  | +1
This problem is linearly separable: a single red line (decision boundary) separates the pattern with $t = +1$ from the three patterns with $t = -1$.
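
The same rule can be used to learn the AND function. One caveat, not spelled out on the slide: the AND decision boundary does not pass through the origin, so the perceptron needs a threshold $\theta$. In the sketch below it is treated as an extra weight acting on a constant input $-1$ (a standard trick and my own addition); everything else follows the learning rule from the previous slide.

```python
# Hebb's-rule training on the AND truth table. The threshold theta is handled as an
# extra weight on a constant input -1 -- my addition, not spelled out on the slide.
import numpy as np

x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, +1], dtype=float)          # targets for logical AND

x_ext = np.hstack([x, -np.ones((4, 1))])             # append the constant input -1
w_ext = np.zeros(3)                                  # [w1, w2, theta]
eta = 0.2

for _ in range(100):
    errors = 0
    for mu in range(4):
        if np.sign(w_ext @ x_ext[mu]) != t[mu]:
            w_ext += eta * t[mu] * x_ext[mu]         # Hebb's rule on the extended vector
            errors += 1
    if errors == 0:
        break

w1, w2, theta = w_ext
print("w =", (w1, w2), "theta =", theta)
print("outputs:", [int(np.sign(w_ext @ xe)) for xe in x_ext])   # -1 -1 -1 +1
```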

Example - XOR function. Truth table of logical XOR:
x1 x2 | t
0  0  | -1
0  1  | +1
1  0  | +1
1  1  | -1

Example - XOR function. This problem is not linearly separable, because the patterns with $t = +1$ cannot be separated from those with $t = -1$ by a single red line.

Example - XOR function. The problem is not linearly separable; the solution is to use two red lines.

Example - XOR function. Two hidden neurons, $V_1$ and $V_2$, each defining one red line. Together the two lines divide the input plane into three regions, I, II, and III. The hidden layer (neither input nor output) has all hidden weights equal to 1. We need a third neuron, with weights $W_1$ and $W_2$ and a threshold $\Theta$, to associate region II with $t = +1$ and regions I and III with $t = -1$:
$$O = \mathrm{sgn}(W_1 V_1 + W_2 V_2 - \Theta).$$
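
One concrete set of weights and thresholds that implements this construction is sketched below. The specific values (hidden thresholds 0.5 and 1.5, output weights $\pm 1$ and threshold 1) are my own choice of numbers that work; the slide's exact values did not survive the transcription.

```python
# A two-layer network solving XOR, as on this slide: two hidden neurons V1, V2
# (one per red line) and one output neuron. The concrete thresholds are one
# working choice, assumed for illustration.
import numpy as np

def sgn(b):
    return np.where(b >= 0, 1, -1)

def xor_net(x1, x2):
    # Hidden neurons: all hidden weights equal to 1, thresholds 0.5 and 1.5.
    V1 = sgn(x1 + x2 - 0.5)      # +1 in regions II and III (at least one input on)
    V2 = sgn(x1 + x2 - 1.5)      # +1 only in region III (both inputs on)
    # Output neuron: fires only in region II (exactly one input on).
    return sgn(V1 - V2 - 1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), int(xor_net(x1, x2)))   # prints -1, +1, +1, -1: the XOR targets
```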

Non-(linearly) separable problems. Problems that are not linearly separable can be solved with a hidden layer of neurons. In the example shown, four hidden neurons are used, one for each segment of the red decision boundary. Move the red lines into the correct configuration by repeatedly applying Hebb's rule until the problem is solved (a fifth neuron assigns the regions and thereby solves the classification problem).

Generalisation. Train the network on a training set $(x^{(\mu)}, t^{(\mu)})$, $\mu = 1,\ldots,p$: move the red lines into the correct configuration by repeatedly applying Hebb's rule to adjust all weights. Usually many iterations are necessary. Once all red lines are in the right place (all weights determined), apply the network to a new data set. If the training set was reliable, the network has learnt to classify the new data: it has learnt to generalise.

Training. Train the network on a training set $(x^{(\mu)}, t^{(\mu)})$, $\mu = 1,\ldots,p$, by repeatedly applying Hebb's rule to adjust all weights; usually many iterations are necessary. A small number of errors is usually acceptable: it is often not meaningful to fine-tune the decision boundary very precisely, because the input patterns are subject to noise.

Overfitting. Train the network on a training set $(x^{(\mu)}, t^{(\mu)})$, $\mu = 1,\ldots,p$, by repeatedly applying Hebb's rule to adjust all weights. Here, 5 hidden neurons were used to fit the decision boundary very precisely. But the inputs are often affected by noise, which can differ substantially between data sets.

Overfitting. Applied to a new data set, a network that fits the training data too precisely performs poorly: it has in effect just learnt the noise in the training set. Avoid this by reducing the number of weights (for example by dropout).

Multilayer perceptrons. All examples so far had a two-dimensional input space and a one-dimensional output, so that we could draw the problem and understand its geometrical form. In reality, problems are often high-dimensional. For example, in image recognition the input dimension is $N = \#\text{pixels} \times 3$ (three colour channels), and the output dimension is the number of classes to be recognised (car, lorry, road sign, person, bicycle, ...). A multilayer perceptron is a network with many hidden layers. Two problems: (i) expensive to train if there are many weights; (ii) slow convergence (weights get stuck). Questions: (i) how many hidden layers are necessary? (ii) how many hidden layers are efficient?

How many hidden layers? All Boolean functions with $N$ inputs can be trained/learned with a single hidden layer. Proof by construction; the construction requires up to $2^N$ neurons in the hidden layer. Example: the XOR function ($N = 2$). For large $N$ this architecture is not practical, because the number of neurons increases exponentially with $N$. Better: use more layers. Intuition: the more layers, the more powerful the network. But such deep networks are in general hard to train.
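
A sketch of the construction behind this statement: one hidden neuron per input pattern with target $+1$ (at most $2^N$ of them), each firing only for its own $\pm 1$ input pattern, and an output neuron that acts as an OR over the hidden layer. The code below is my own rendering of this standard argument, illustrated for XOR ($N = 2$); the function and variable names are not the slide's notation.

```python
# Constructive proof sketch: any Boolean function of N (+/-1) inputs can be computed
# with one hidden layer of at most 2^N sgn-neurons, one per input pattern with target +1.
import numpy as np
from itertools import product

def sgn(b):
    return np.where(b >= 0, 1, -1)

def build_network(truth_table, N):
    """truth_table maps each +/-1 input tuple to a target +/-1."""
    positive = [np.array(p) for p, target in truth_table.items() if target == +1]

    def network(x):
        x = np.asarray(x)
        # Hidden neuron k fires (+1) only if x equals the k-th positive pattern:
        # then p_k . x = N, otherwise p_k . x <= N - 2.
        V = np.array([sgn(p @ x - (N - 1)) for p in positive])
        # Output neuron: +1 as soon as any hidden neuron fires (a sgn-based OR).
        return int(sgn(V.sum() + len(V) - 1))

    return network

# XOR on +/-1 inputs (N = 2): target +1 iff exactly one input is +1
xor_table = {p: (+1 if p[0] * p[1] < 0 else -1) for p in product([-1, 1], repeat=2)}
net = build_network(xor_table, N=2)
for p in product([-1, 1], repeat=2):
    print(p, net(p))   # reproduces the XOR truth table
```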

Deep learning. Deep learning refers to perceptrons with large numbers of layers. In the last few years, interest in neural-network algorithms has exploded, and their performance has significantly improved. Why?
- Better training sets. [Figure: ILSVRC classification error rate (top-5) by year, comparing non-neural-network and neural-network entries; from Goodfellow et al., Deep Learning (2016).]
- Improved architecture: better activation functions, e.g. $g(b) = \max(0, b)$ (Glorot, Bordes & Bengio, 2011), and a number of other tricks (dropout, batch normalisation).
- Better hardware: GPUs, and dedicated chips by Google (en.wikipedia.org/wiki/tensor_processing_unit).

ImageNet. To obtain a training set, one must collect and manually annotate data. Questions: (i) how to collect and annotate data efficiently; (ii) how to achieve a large variety in the collected data. Example: ImageNet, a publicly available data set with more than $10^7$ images, classified into about $10^3$ classes. image-net.org

reCAPTCHA (github.com/google/recaptcha) serves two goals: (i) it protects websites from bots; (ii) users who click correctly provide image annotations.

Convolutional networks. In a convolutional network, each neuron in the first hidden layer is connected to a small region of the input, here $3\times 3$ pixels of a $10\times 10$ input image. Slide the region over the input image, using the same weights for all hidden neurons. Update:
$$V_{mn} = g\Big(\sum_{k=1}^{3}\sum_{l=1}^{3} w_{kl}\,x_{m+k,\,n+l} - \theta\Big).$$
- This detects the same local feature (edge, corner, ...) everywhere in the input image: a feature map. For the $10\times 10$ input and $3\times 3$ region, the hidden layer has $8\times 8$ neurons.
- The form of the sum is called a convolution.
- Efficient, because there are fewer weights.
- Use several feature maps to detect different features in the input image.
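
A direct NumPy sketch of this feature-map update (my own code, using a random input image, a random $3\times 3$ kernel, ReLU as the activation function and an assumed threshold $\theta$):

```python
# Sketch of the feature-map update V_mn = g(sum_{k,l} w_kl x_{m+k,n+l} - theta):
# a 3x3 kernel slid over a 10x10 input gives an 8x8 feature map, with the same
# weights used at every position. Kernel, input and theta are made-up numbers.
import numpy as np

def relu(b):
    return np.maximum(0.0, b)

rng = np.random.default_rng(0)
x = rng.random((10, 10))          # 10x10 input image
w = rng.normal(size=(3, 3))       # 3x3 kernel, shared by all hidden neurons
theta = 0.1                       # threshold

V = np.zeros((8, 8))              # 8x8 feature map (10 - 3 + 1 = 8)
for m in range(8):
    for n in range(8):
        V[m, n] = relu(np.sum(w * x[m:m + 3, n:n + 3]) - theta)

print(V.shape)                    # (8, 8)
```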

Convolutional networks. Max-pooling: take the maximum of $V_{mn}$ over a small region (here $2\times 2$) to reduce the number of parameters; the $8\times 8$ feature map becomes a $4\times 4$ output. Then add fully connected layers to learn more abstract features. Example: Krizhevsky, Sutskever & Hinton (2012).
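
A corresponding sketch of $2\times 2$ max-pooling of the $8\times 8$ feature map (my addition, using NumPy's reshape trick for non-overlapping blocks):

```python
# 2x2 max-pooling: take the maximum of V_mn over non-overlapping 2x2 blocks,
# reducing the 8x8 feature map to a 4x4 output.
import numpy as np

rng = np.random.default_rng(0)
V = rng.random((8, 8))                            # 8x8 feature map from the previous step

pooled = V.reshape(4, 2, 4, 2).max(axis=(1, 3))   # block-wise maximum over 2x2 regions
print(pooled.shape)                               # (4, 4)
```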

Object detection (www.google.com/recaptcha/intro/android.html). Deep learning with a convolutional network: here Yolo-9000 with a tiny number of weights (so that the algorithm runs on my laptop), pretrained on the VOC training set. Redmon et al., www.arxiv.org/abs/1612.08242; github.com/pjreddie/darknet/wiki/yolo:-real-time-object-detection. Installation instructions: github.com/philipperemy/yolo-9000

Yolo-9000. Object detection from video frames using Yolo-9000 (github.com/philipperemy/yolo-9000). Film recorded with an iPhone 6 on my way home from work. Tiny number of weights (so that the algorithm runs on my laptop); pretrained on the Pascal VOC data set (host.robots.ox.ac.uk/pascal/voc/).

Deep learning for self-driving cars. Zenuity is a joint venture between Autoliv and Volvo Cars; its goal is to develop software for assisted and autonomous driving. Founded 2017; 500 employees, mostly in Gothenburg.

Conclusions. Artificial neural networks were already studied in the 1980s. In the last few years there has been a deep-learning revolution, due to
- improved algorithms (with a focus on object detection),
- better hardware (GPUs, dedicated chips),
- better training sets.
Applications: Google, Facebook, Tesla, Zenuity, ...
Perspectives:
- close connections to statistical physics (spin glasses, random magnets),
- related methods in mathematical statistics (big-data analysis),
- unsupervised learning (clustering and familiarity of input data).

Conclusions. xkcd.com/1897

Literature
Neural networks:
- Mehlig, Artificial Neural Networks, physics.gu.se/~frtbm
- Hertz, Krogh & Palmer, Introduction to the Theory of Neural Computation (Santa Fe Institute Series)
Deep learning:
- Nielsen, Neural Networks and Deep Learning, neuralnetworksanddeeplearning.com/
- Goodfellow, Bengio & Courville, Deep Learning, MIT Press
- LeCun, Bengio & Hinton, Deep learning, Nature 521 (2015) 436
Convolutional networks:
- Goodfellow, Bengio & Courville, Deep Learning, MIT Press

Lecture 2: Hopfield model

Associative memory problem