ARTIFICIAL INTELLIGENCE. Artificial Neural Networks


INFOB2KI 2017-2018, Utrecht University, The Netherlands. ARTIFICIAL INTELLIGENCE: Artificial Neural Networks. Lecturer: Silja Renooij. These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html


Outline
- Biological neural networks
- Artificial NN basics: perceptrons, multi-layer networks
- Training ANNs
- Combination with other ML techniques: NN and Evolutionary Computing, NN and Reinforcement Learning, e.g. deep learning

(Artificial) Neural Networks
- Supervised learning technique: error-driven classification
- Output is determined from a weighted set of inputs
- Training updates the weights
- Used in games, e.g. to: select a weapon, select an item to pick up, steer a car on a circuit, recognize characters, recognize faces

Biological Neural Nets
Pigeons as art experts (Watanabe et al. 1995). Experiment:
- Pigeon in a Skinner box
- Present paintings of two different artists (e.g. Chagall / Van Gogh)
- Reward for pecking when presented with a particular artist (e.g. Van Gogh)


Results from the experiment
- Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on)
- Discrimination was still 85% successful for previously unseen paintings by the same artists

Praise to neural nets
- Pigeons have acquired knowledge about art
- Pigeons do not simply memorise the pictures
- They can extract and recognise patterns (the 'style')
- They generalise from what they have already seen to make predictions
In short, the pigeons have learned. Can one implement this using an artificial neural network?

Inspiration from biology
If a pigeon can do it, how hard can it be?
- ANNs are biologically inspired.
- ANNs are not duplicates of brains (and don't try to be).

(Natural) Neurons
Natural neurons receive signals through synapses (~ inputs). If the signals are strong enough (~ above some threshold), the neuron is activated and emits a signal through the axon (~ output).
Figure: a natural neuron next to an artificial neuron (node).

McCulloch & Pitts model (1943)
"A logical calculus of the ideas immanent in nervous activity"
Figure: inputs x_1 … x_n with weights w_1 … w_n feed a linear combiner, followed by a hard delimiter that produces output y.
Also known as: linear threshold gate, threshold logic unit.
- n binary inputs x_i and 1 binary output y
- n weights w_i ∈ {-1, 1}
- Linear combiner: z = Σ_i w_i x_i
- Hard delimiter: unit step function at threshold θ, i.e. y = 1 if z ≥ θ, y = 0 if z < θ
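For illustration, a minimal Python sketch of a McCulloch-Pitts unit; the function name and the example threshold below are illustrative choices, not taken from the slides:

    def mcculloch_pitts(inputs, weights, theta):
        """McCulloch-Pitts unit: binary inputs, weights in {-1, 1}, unit step at threshold theta."""
        z = sum(w * x for w, x in zip(weights, inputs))  # linear combiner
        return 1 if z >= theta else 0                    # hard delimiter (unit step)

    # Example: a 2-input unit with both weights +1 and threshold 2 behaves as logical AND.
    print(mcculloch_pitts([1, 1], [1, 1], theta=2))  # -> 1
    print(mcculloch_pitts([1, 0], [1, 1], theta=2))  # -> 0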

Rosenblatt's Perceptron (1958)
An enhanced version of the McCulloch-Pitts artificial neuron:
- n+1 real-valued inputs: x_1 … x_n and 1 bias b; binary output y
- weights w_i with real values
- Linear combiner: z = Σ_i w_i x_i + b
- g(z): (hard delimiter) unit step function at threshold 0, i.e. y = 1 if z ≥ 0, y = 0 if z < 0

Classification: feedforward
The algorithm for computing outputs from inputs in perceptron neurons is the feedforward algorithm.
Example: inputs 4 and -3 with weights w = 2 and w = 4, threshold 0:
- weighted input: z = 4·2 + (-3)·4 = 8 - 12 = -4
- activation: g(z) = g(-4) = 0 (since -4 < 0)
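A short sketch of this feedforward computation in Python, reproducing the numbers above (the function name is illustrative):

    def perceptron_output(inputs, weights, bias=0.0):
        """Feedforward step: weighted sum plus bias, then unit step at 0."""
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 if z >= 0 else 0

    # The example above: inputs 4 and -3, weights 2 and 4, threshold 0 (no bias).
    print(perceptron_output([4, -3], [2, 4]))  # z = 8 - 12 = -4 -> output 0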

Bias & threshold implementation
A bias can be incorporated in three different ways, with the same effect on the output: for example as a term b added to the weighted sum, or as an extra input fixed to 1 with weight w_0 = b.
Alternatively, the threshold θ can be incorporated in three different ways, with the same effect on the output; in particular, a threshold θ corresponds to a bias b = -θ.

Single-layer perceptron
Figure: input nodes 1 and 2 (values x_1, x_2) connected by weights w_13, w_14, w_23, w_24 to a single layer of neurons 3 and 4 with outputs y_1, y_2.
- Rosenblatt's perceptron is the building block of the single-layer perceptron, which is the simplest feedforward neural network
- alternative hard-limiting activation functions g(z) are possible, e.g. the sign function: g(z) = 1 if z ≥ 0, -1 if z < 0
- it can have multiple independent outputs
- the adjustable weights can be trained using training data
- the perceptron learning rule adjusts the weights w_1 … w_n such that the inputs x_1 … x_n give rise to the desired output(s) y

Perceptron learning: idea
Idea: minimize the error in the output through gradient descent.
- squared error, per output: E = ½ (d - y)²  (d = desired output)
- change each weight by a term proportional to the gradient; if the (non-differentiable) activation is replaced with y = g(z) = z, this gives the proportional change Δw_i = α (d - y) x_i, with learning rate α > 0
NB: in the book the learning rate is called Gain, with notation η.

Perceptron learning
- Initialize weights and threshold (or bias) to random numbers
- Choose a learning rate 0 < α ≤ 1
- For each training input t = <x_1, …, x_n>: calculate the output y(t) and the error e(t) = d(t) - y(t), where d(t) is the desired output
- Adjust all n weights using the perceptron learning rule: w_i ← w_i + α · e(t) · x_i(t)
- One pass over all training inputs is one epoch; repeat epochs until all weights remain unchanged for an entire epoch (or some other stopping rule applies), then we are ready.
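A compact Python sketch of this training loop, assuming a unit step with an explicit bias instead of a threshold; the data used here is the AND example from the next slides, and all names are illustrative:

    import random

    def train_perceptron(data, alpha=0.1, max_epochs=100):
        """Perceptron learning rule: w_i <- w_i + alpha * e * x_i (bias treated as an extra weight)."""
        n = len(data[0][0])
        w = [random.uniform(-0.5, 0.5) for _ in range(n)]
        b = random.uniform(-0.5, 0.5)
        for _ in range(max_epochs):
            changed = False
            for x, d in data:
                y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
                e = d - y
                if e != 0:
                    w = [wi + alpha * e * xi for wi, xi in zip(w, x)]
                    b += alpha * e
                    changed = True
            if not changed:   # no weight changed during a whole epoch: converged
                break
        return w, b

    # AND function, as in the example on the following slides
    and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    print(train_perceptron(and_data))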

Example: AND-learning (1)
Desired output d of logical AND, given 2 binary inputs:
x_1  x_2  d
0    0    0
0    1    0
1    0    0
1    1    1
Figure: the four input points plotted in the x_1-x_2 plane.

Example AND (2)
Init: choose weights w_i and threshold θ randomly in [-0.5, 0.5], here w_1 = 0.3, w_2 = -0.1, θ = 0.2; set α = 0.1; use the step function: return 0 if z < θ, 1 if z ≥ θ.
(Alternative: use a bias b = -θ with the unit step function at 0.)
Training data:
      x_1  x_2  d(t)
t_1:  0    0    0
t_2:  0    1    0
t_3:  1    0    0
t_4:  1    1    1
First input t_1 = (0, 0): z = 0·0.3 + 0·(-0.1) = 0 < 0.2, so y = 0 and e(t_1) = d(t_1) - y = 0 - 0 = 0. No weight changes: done with t_1, for now.

Example AND (3)
Second input t_2 = (0, 1): z = 0·0.3 + 1·(-0.1) = -0.1 < 0.2, so y = 0 and e(t_2) = 0 - 0 = 0. No weight changes: done with t_2, for now.

Example AND (4)
Third input t_3 = (1, 0): z = 1·0.3 + 0·(-0.1) = 0.3 ≥ 0.2, so y = 1 and e(t_3) = 0 - 1 = -1.
Weight updates: Δw_1 = α·e(t_3)·x_1 = 0.1·(-1)·1 = -0.1, so w_1 ← 0.2; Δw_2 = 0 since x_2 = 0. Done with t_3, for now.

Example AND (5)
Fourth input t_4 = (1, 1): z = 1·0.2 + 1·(-0.1) = 0.1 < 0.2, so y = 0 and e(t_4) = 1 - 0 = 1.
Weight updates: Δw_1 = 0.1·1·1 = 0.1, so w_1 ← 0.3; Δw_2 = 0.1·1·1 = 0.1, so w_2 ← 0. Done with t_4 and with the first epoch.

Example (6): 4 epochs later
The weights are now w_1 = 0.1, w_2 = 0.1, with threshold θ = 0.2.
- the algorithm has converged, i.e. the weights do not change any more
- the algorithm has correctly learned the AND function

AND example (7): results
x_1  x_2  d  y
0    0    0  0
0    1    0  0
1    0    0  0
1    1    1  1
Learned function / decision boundary: 0.1·x_1 + 0.1·x_2 ≥ 0.2, a linear classifier. Or equivalently: x_1 + x_2 ≥ 2.
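A quick Python check of the learned boundary, using the weights and threshold from this slide (variable names illustrative):

    # Learned weights and threshold from the AND example
    w1, w2, theta = 0.1, 0.1, 0.2
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        y = 1 if w1 * x1 + w2 * x2 >= theta else 0
        print(x1, x2, y)   # reproduces the AND truth table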

Perceptron learning: properties
We do gradient descent in a weight space without local optima.
- Complete: yes, if α is sufficiently small (or the initial weights are sufficiently large) and the examples come from a linearly separable function; then perceptron learning converges to a solution.
- Optimal: no (the weights serve to correctly separate the seen inputs; there are no guarantees for unseen inputs close to the decision boundaries).

Limitation of perceptron: example XOR
x_1  x_2  d
0    0    0
0    1    1
1    0    1
1    1    0
We cannot separate the two output types with a single linear function: XOR is not linearly separable.

Solving XOR using 2 layers of McCulloch & Pitts models
Figure: inputs x_1 (node 1) and x_2 (node 2) feed two hidden units (3 and 4) with weights +1 and -1 and threshold θ = 1; their outputs feed an output unit (5) with weights +1, +1 and threshold θ = 1.
- unit 3 fires only for x_1 = 1, x_2 = 0 (weight +1 from x_1, -1 from x_2, θ = 1)
- unit 4 fires only for x_1 = 0, x_2 = 1 (weight -1 from x_1, +1 from x_2, θ = 1)
- unit 5 computes the OR of units 3 and 4 (weights +1, +1, θ = 1), which yields XOR
Figure: the corresponding decision regions in the x_1-x_2 plane.
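A small Python check of this construction, assuming the weights and thresholds as reconstructed above (+1/-1 into the hidden units, +1/+1 into the output, all thresholds 1):

    def step(z, theta):
        return 1 if z >= theta else 0

    def xor_net(x1, x2):
        """Two McCulloch-Pitts layers: hidden units detect (1,0) and (0,1), the output ORs them."""
        h3 = step(1 * x1 + (-1) * x2, theta=1)   # fires only for (1, 0)
        h4 = step((-1) * x1 + 1 * x2, theta=1)   # fires only for (0, 1)
        return step(1 * h3 + 1 * h4, theta=1)    # OR of the hidden units

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))  # prints the XOR truth table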

Types of decision regions

Multi-layer networks
Figure: input nodes x_1, x_2, x_3, a hidden layer of neurons, and an output neuron layer producing y_1, y_2, y_3.
- this type of network is also called a feed-forward network
- the hidden layer captures nonlinearities
- more than 1 hidden layer is possible, but often reducible to 1 hidden layer
- introduced in the 50s, but not studied until the 80s

Training Multi-Layer Networks
MLNs are trained using back-propagation: input signals are propagated forward through the network (x_1, x_2, x_3 to y_1, y_2, y_3), and error signals are propagated backwards from the outputs.

Training Multi-Layer Networks I
Similar to the perceptron learning rule, but now the error has to be distributed over the hidden nodes.
Squared error, per output: E = ½ (d - y)²
We need a continuous activation function.

Continuous activation functions
As continuous activation function we can use a smoothed version of the step function: a sigmoid.
E.g. the logistic sigmoid: g(z) = 1 / (1 + e^(-z))
Figure: plot of g(z) against z.

Continuous artificial neurons
Figure: inputs x_1 … x_n with weights w_1 … w_n feed a linear combiner followed by a sigmoid function producing output y.
- weighted input: z = Σ_i w_i x_i
- activation (logistic sigmoid): y = 1 / (1 + e^(-z))

Example
Inputs 3 and -2 with weights w = 2 and w = 4:
- weighted input: z = 3·2 + (-2)·4 = 6 - 8 = -2
- activation: y = 1 / (1 + e^2) ≈ 0.119
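The same computation as a short Python sketch (function name illustrative):

    import math

    def sigmoid_neuron(inputs, weights):
        """Weighted sum followed by the logistic sigmoid."""
        z = sum(w * x for w, x in zip(weights, inputs))
        return 1.0 / (1.0 + math.exp(-z))

    print(sigmoid_neuron([3, -2], [2, 4]))  # z = -2, output ~0.119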

Training Multi-Layer Networks
Squared error, per output: E = ½ (d - y)²
The output of node i is an input for node j. For each node we compute an error term δ:
- for node j in the output layer: δ_j = y_j (1 - y_j) (d_j - y_j)
- for node j in a previous (hidden) layer: δ_j = y_j (1 - y_j) Σ_k w_jk δ_k
NB: previous = closer to the input layer.

Backpropagation
- Initialize weights and threshold (or bias) to random numbers
- Choose a learning rate 0 < α ≤ 1
- For each training input t = <x_1, …, x_n>: calculate the output y(t) and the error e(t) = d(t) - y(t)
- Recursively adjust each weight w_ij on the link from node i to node j: w_ij ← w_ij + α · y_i · δ_j, where
  - if j is an output node: δ_j = y_j (1 - y_j) e_j(t)
  - if j is a hidden node: δ_j = y_j (1 - y_j) Σ_k w_jk δ_k
- Repeat until all weights remain unchanged for an entire epoch (or some other stopping rule applies), then we are ready.
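A minimal Python sketch of this procedure for a 2-2-1 network with logistic activations; the use of explicit biases (rather than thresholds), the fixed number of epochs, and all variable names are choices made here, not taken from the slides:

    import math, random

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train_2_2_1(data, alpha=0.9, epochs=10000):
        """Online backpropagation for a 2-2-1 network with randomly initialized weights."""
        w_ih = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]  # input -> hidden
        b_h = [random.uniform(-1, 1) for _ in range(2)]
        w_ho = [random.uniform(-1, 1) for _ in range(2)]                      # hidden -> output
        b_o = random.uniform(-1, 1)
        for _ in range(epochs):
            for x, d in data:
                # forward pass
                h = [sigmoid(sum(w_ih[j][i] * x[i] for i in range(2)) + b_h[j]) for j in range(2)]
                y = sigmoid(sum(w_ho[j] * h[j] for j in range(2)) + b_o)
                # backward pass: delta terms (output node first, then hidden nodes)
                delta_o = y * (1 - y) * (d - y)
                delta_h = [h[j] * (1 - h[j]) * w_ho[j] * delta_o for j in range(2)]
                # weight updates: w_ij <- w_ij + alpha * y_i * delta_j
                for j in range(2):
                    w_ho[j] += alpha * h[j] * delta_o
                    for i in range(2):
                        w_ih[j][i] += alpha * x[i] * delta_h[j]
                b_o += alpha * delta_o
                for j in range(2):
                    b_h[j] += alpha * delta_h[j]
        return w_ih, b_h, w_ho, b_o

    xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    train_2_2_1(xor_data)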

Training for XOR
Network: inputs x_1 (node 1) and x_2 (node 2), hidden nodes 3 and 4, output node 5.
Weights: W_13 = 10, W_14 = -5, W_23 = -5, W_24 = 10, W_35 = 5, W_45 = 5.
Activation function for nodes 3-5: g(z) = 1 / (1 + e^(-(z-θ))) with θ = 6. Set α = 0.9.
To simplify computation, if the absolute value of e(t) < 0.1, we consider the outcome correct.
XOR truth table:
x_1  x_2  d
0    0    0
0    1    1
1    0    1
1    1    0
First case x_1 = 0, x_2 = 0 (d = 0): nodes 3 and 4 output 0.002, node 5 outputs 0.003, so e(t) = 0 - 0.003 = -0.003.
With the sigmoid as approximation of the step function, we consider this outcome correct: no weight updates are required for the first case, for now.

Training for XOR
Second case x_1 = 0, x_2 = 1 (d = 1): node 3 outputs 0.000, node 4 outputs 0.982, node 5 outputs 0.252, so e(t) = 1 - 0.252 = 0.748.
Backpropagation of the error:
δ_5 = y_5 · (1 - y_5) · e ≈ 0.141
Δw_35 = α · y_3 · δ_5 ≈ 0.000
Δw_45 = α · y_4 · δ_5 ≈ 0.125
δ_3 = y_3 · (1 - y_3) · w_35 · δ_5 ≈ 0.000
δ_4 = y_4 · (1 - y_4) · w_45 · δ_5 ≈ 0.012
Δw_13 = α · y_1 · δ_3 = α · x_1 · δ_3 = 0 = Δw_14
Δw_23 = α · x_2 · δ_3 ≈ 0.000
Δw_24 = α · x_2 · δ_4 ≈ 0.011

Training for XOR
Adjust the weights that require changing:
Δw_45 ≈ 0.125: update w_45 to 5.125
Δw_24 ≈ 0.011: update w_24 to 10.011
Feeding the same input x_1 = 0, x_2 = 1 forward again now gives node 3 = 0.000, node 4 = 0.982, node 5 = 0.276, so e(t) = 1 - 0.276 = 0.724.

After many training examples
The weights have converged to W_13 = 12, W_14 = -13, W_23 = -11, W_24 = 13, W_35 = 13, W_45 = 13.
Results:
x_1  x_2  d  y
0    0    0  0.003
0    1    1  0.999
1    0    1  0.999
1    1    0  0.003
e(t) < 0.1 for all cases: we can consider these outcomes correct.
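A quick Python check that these converged weights reproduce the table, using the activation g(z) = 1 / (1 + e^(-(z-6))) reconstructed above (the threshold value 6 is inferred from the worked numbers, so treat it as an assumption):

    import math

    def g(z, theta=6):
        # logistic sigmoid shifted by the assumed threshold
        return 1.0 / (1.0 + math.exp(-(z - theta)))

    W13, W14, W23, W24, W35, W45 = 12, -13, -11, 13, 13, 13
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        y3 = g(W13 * x1 + W23 * x2)
        y4 = g(W14 * x1 + W24 * x2)
        y5 = g(W35 * y3 + W45 * y4)
        print(x1, x2, round(y5, 3))   # ~0.003, 0.999, 0.999, 0.003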

Properties of MLNs
Boolean functions:
- every boolean function f: {0,1}^k → {0,1} can be represented using a single hidden layer
Continuous functions:
- every bounded piecewise continuous function can be approximated with arbitrarily small error with one hidden layer
- any continuous function can be approximated to arbitrary accuracy with two hidden layers
Learning:
- not efficient (in fact intractable, regardless of the method)
- no guarantee of convergence

Example: Voice Recognition
Task: learn to discriminate between two different voices saying "Hello".
Data:
- Sources: Steve Simpson, David Raubenheimer
- Format: frequency distribution (60 bins); analogy: cochlea

Example: Voice Recognition
Network architecture: feed-forward network
- 60 input nodes (one for each frequency bin)
- 6 hidden nodes
- 2 output nodes (0 1 for "Steve", 1 0 for "David")
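A minimal NumPy sketch of this 60-6-2 architecture; the layer sizes come from the slide, while the random weights and all names are purely illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(60, 6))   # input -> hidden weights
    W2 = rng.normal(size=(6, 2))    # hidden -> output weights

    def forward(freq_bins):
        """Forward pass: 60 frequency bins -> 2 outputs (one per speaker)."""
        hidden = sigmoid(freq_bins @ W1)
        return sigmoid(hidden @ W2)

    print(forward(rng.random(60)))  # two outputs of the untrained network (arbitrary before training)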

Example: Voice Recognition
Presenting the data: feed-forward. Figure: the "Steve" and "David" spectra fed through the network.

Example: Voice Recognition
Presenting the data: feed-forward (untrained network). For "Steve" the outputs are 0.43 and 0.26; for "David" they are 0.73 and 0.55.

Example: Voice Recognition
Calculate the error per output:
"Steve" (target 0 1): 0 vs 0.43 → error 0.43; 1 vs 0.26 → error 0.74
"David" (target 1 0): 1 vs 0.73 → error 0.27; 0 vs 0.55 → error 0.55

Example: Voice Recognition
Backpropagate the total error and adjust the weights:
"Steve": 0.43 + 0.74 = 1.17
"David": 0.27 + 0.55 = 0.82

Example: Voice Recognition
Repeat the process (a sweep) for all training pairs:
- present the data
- calculate the error
- backpropagate the error
- adjust the weights
Repeat the process multiple times. Figure: total error plotted against the number of sweeps.

Example: Voice Recognition
Presenting the data (trained network). For "Steve" the outputs are 0.01 and 0.99; for "David" they are 0.99 and 0.01.

Example: Voice Recognition
Results: performance of the trained network
- discrimination accuracy between known "Hello"s: 100%
- discrimination accuracy between new "Hello"s: 100%

Example: Voice Recognition
Results (continued):
- the network has learnt to generalise from the original data
- networks with different weight settings can have the same functionality
- trained networks concentrate on lower frequencies
- the network is robust against non-functioning nodes

Applications of feed-forward nets
Classification, pattern recognition, diagnosis:
- character recognition, both printed and handwritten
- face recognition, speech recognition
- object classification by means of salient features
- analysis of signals to determine their nature and source
Regression and forecasting: in particular non-linear functions and time series.
Examples:
- sonar mine/rock recognition (Gorman & Sejnowski, 1988)
- navigation of a car (Pomerleau, 1989)
- stock market prediction
- pronunciation (NETtalk: Sejnowski & Rosenberg, 1987)

More Neural Networks
- Acyclic: feedforward
- Cyclic: recurrent

(Natural) Neurons revisited
- Humans have 10^10 neurons and 10^15 dendrites. Don't even think about creating an ANN of this size.
- Most ANNs do not have feedback loops in the network structure (exception: recurrent NNs).
- The ANN activation function is (probably) much simpler than what happens in a biological neuron.

Learning NNs using Evolution https://www.youtube.com/watch?v=ts8qll 3NXk https://www.youtube.com/watch?v=s9y_i9vy8qw 55

Deep learning
Source: NIPS 2015 tutorial by Y. LeCun

NN as function approximator
A NN can be used as a black box that represents (an approximation of) a function. This can be used in combination with other learning methods, e.g. use a NN to represent the Q-function in Q-learning.
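A rough Python sketch of the idea, using a single linear layer as the simplest stand-in for the network that approximates Q(s, a); the feature sizes, learning rate, and names are all illustrative assumptions, not from the slides:

    import numpy as np

    # Linear approximator Q(s, a) = w[a] . s, updated towards the standard Q-learning target.
    n_features, n_actions = 4, 2
    w = np.zeros((n_actions, n_features))
    alpha, gamma = 0.1, 0.9

    def q_values(state):
        return w @ state                      # one Q-value per action

    def q_update(state, action, reward, next_state):
        target = reward + gamma * np.max(q_values(next_state))   # Q-learning target
        error = target - q_values(state)[action]
        w[action] += alpha * error * state    # gradient step on the squared error

    # Example call with made-up feature vectors:
    q_update(np.array([1.0, 0.0, 0.5, 0.0]), 0, 1.0, np.array([0.0, 1.0, 0.0, 0.5]))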

NN + Q-learning

Alpha Go (Deepmind/Google) https://www.youtube.com/watch?v=mzpw10dpheq