Lecture 4: Feed Forward Neural Networks

Dr. Roman V. Belavkin, Middlesex University, BIS4435

OUTLINE
- Biological neurons and the brain
- A model of a single neuron
- Neurons as data-driven models
- Neural networks
- Training algorithms
- Applications
- Benefits, limitations and applications

HISTORICAL BACKGROUND
1943: McCulloch and Pitts proposed the first computational model of a neuron.
1949: Hebb proposed the first learning rule.
1958: Rosenblatt's work on perceptrons.
1969: Minsky and Papert's paper exposed the limitations of the theory.
1970s: a decade of dormancy for neural networks.
1980s-90s: neural networks return (self-organisation, backpropagation algorithms, etc.).

SOME FACTS
- The human brain contains about $10^{11}$ neurons.
- Each neuron is connected to about $10^4$ others.
- Some scientists have compared the brain to a complex, nonlinear, parallel computer.
- The largest modern neural networks achieve a complexity comparable to the nervous system of a fly.

NEURONS

[Diagram: a biological neuron, showing the synapse, axon, dendrite, nucleus and cell body]

- Evidence suggests that neurons receive, analyse and transmit information.
- The information is transmitted in the form of electro-chemical signals (pulses).
- When a neuron sends information, we say that the neuron "fires".

EXCITATION AND INHIBITION
The receptors of a neuron are called synapses, and they are located on many branches called dendrites. There are many types of synapses, but roughly they can be divided into two classes:
- Excitatory: a signal received at this synapse encourages the neuron to fire.
- Inhibitory: a signal received at this synapse inhibits the neuron (as if asking it to "shut up").
The neuron analyses all the signals received at its synapses. If most of them are encouraging, then the neuron gets excited and fires its own message along a single wire, called the axon. The axon may have branches to reach many other neurons.

A MODEL OF A SINGLE NEURON (UNIT)
McCulloch and Pitts (1943) proposed the integrate-and-fire model:

[Diagram: inputs $x_1, \ldots, x_m$ with weights $w_1, \ldots, w_m$ feeding a summation unit with output $y = f\left(\sum x_i w_i\right)$]

- Denote the m input values by $x_1, x_2, \ldots, x_m$.
- Each of the m inputs (synapses) has a weight $w_1, w_2, \ldots, w_m$.
- The input values are multiplied by their weights and summed:
$$v = w_1 x_1 + w_2 x_2 + \cdots + w_m x_m = \sum_{i=1}^{m} w_i x_i$$
- The output is some function $y = f(v)$ of the weighted sum.

Example: let $x = (0, 1, 1)$ and $w = (1, -2, 4)$. Then $v = 1 \cdot 0 - 2 \cdot 1 + 4 \cdot 1 = 2$.
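To make the computation concrete, here is a minimal Python sketch of the integrate-and-fire unit above. The step activation anticipates the next slide, and the function names (weighted_sum, step) are our own, not part of the original model:

```python
# A minimal sketch of the McCulloch-Pitts unit, following the slide's
# notation: inputs x, weights w, weighted sum v, output y = f(v).

def weighted_sum(x, w):
    """v = sum of x_i * w_i over all m inputs."""
    return sum(xi * wi for xi, wi in zip(x, w))

def step(v, a=0.0):
    """Heaviside step activation: fire (1) if v >= a, else 0."""
    return 1 if v >= a else 0

# The slide's example: x = (0, 1, 1), w = (1, -2, 4)
x = (0, 1, 1)
w = (1, -2, 4)
v = weighted_sum(x, w)   # 1*0 + (-2)*1 + 4*1 = 2
y = step(v)              # f(2) = 1, the neuron fires
print(v, y)              # 2 1
```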

ACTIVATION FUNCTION
The output of a neuron, $y = f(v)$, is a function of the weighted sum. This function is called the activation function. What function is it, and how is it computed? Common choices:

Linear function:
$$f(v) = a + v = a + \sum w_i x_i$$
where the parameter a is called the bias. Notice that in this case a neuron becomes a linear model, with the bias a being the intercept and the weights $w_1, \ldots, w_m$ being the slopes.

Heaviside step function:
$$f(v) = \begin{cases} 1 & \text{if } v \geq a \\ 0 & \text{otherwise} \end{cases}$$
where a is called the threshold. Example: if $a = 0$ and $v = 2 > 0$, then $f(2) = 1$ and the neuron fires.

Sigmoid function:
$$f(v) = \frac{1}{1 + e^{-v}}$$

[Plots: the linear, step and sigmoid activation functions over $v \in [-2, 2]$]
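The three activation functions translate directly into code. A minimal sketch, with function names and sample points of our own choosing:

```python
import math

# Sketches of the three activation functions above; the parameter a is
# the bias (linear case) or threshold (step case), as in the slides.

def linear(v, a=0.0):
    """Linear activation: f(v) = a + v."""
    return a + v

def heaviside(v, a=0.0):
    """Step activation: f(v) = 1 if v >= a, else 0."""
    return 1.0 if v >= a else 0.0

def sigmoid(v):
    """Sigmoid activation: f(v) = 1 / (1 + e^{-v})."""
    return 1.0 / (1.0 + math.exp(-v))

# Evaluate each function at a few points in [-2, 2]:
for f in (linear, heaviside, sigmoid):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```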

NEURONS AS DATA-DRIVEN MODELS
We use data to create models representing the relation between the input variables x and the output variable y (e.g. between income and credit score):
$$y = f(x) + \text{Error}$$
If we use data to adjust the parameters (the weights) to reduce the error, then a neuron becomes a data-driven model.

If we use only linear activation functions, then a neuron is just a linear model, with weights corresponding to slopes (i.e. related to correlations):
$$f(x_1, \ldots, x_m) = a + w_1 x_1 + \cdots + w_m x_m$$

SO, WHAT IS DIFFERENT FROM LINEAR MODELS?
Linear mean-square regression is a good technique, but it relies heavily on the quadratic cost function:
$$c(y, f(x)) = |y - f(x)|^2$$
Neurons can be trained using other cost functions, such as the absolute deviation:
$$c(y, f(x)) = |y - f(x)|$$
Networks of many neurons can be seen as sets of multiple, competing models. Neural networks can be used to model non-linear relations in data.
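A small sketch contrasting the two cost functions on a single example (the sample values are illustrative):

```python
# The two per-example cost functions mentioned above.

def quadratic_cost(y, fx):
    """c(y, f(x)) = |y - f(x)|^2, as used by mean-square regression."""
    return (y - fx) ** 2

def absolute_cost(y, fx):
    """c(y, f(x)) = |y - f(x)|, the absolute deviation."""
    return abs(y - fx)

# The quadratic cost penalises large errors much more heavily:
print(quadratic_cost(3.0, 0.0), absolute_cost(3.0, 0.0))  # 9.0 3.0
```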

FEED-FORWARD NEURAL NETWORKS
A collection of neurons connected together in a network can be represented by a directed graph:

[Diagram: input nodes 1, 2 and 3, each connected by arrows to nodes 4 and 5]

- Nodes represent the neurons, and arrows represent the links between them.
- Each node has its own number, and a link connecting two nodes is labelled by a pair of numbers (e.g. (1, 4) connects nodes 1 and 4).
- Networks without cycles (feedback loops) are called feed-forward networks (or perceptrons).

INPUT AND OUTPUT NODES

[Diagram: input nodes 1, 2 and 3 connected to output nodes 4 and 5]

- Input nodes of the network (nodes 1, 2 and 3) are associated with the input variables $(x_1, \ldots, x_m)$. They do not compute anything; they simply pass the values on to the processing nodes.
- Output nodes (4 and 5) are associated with the output variables $(y_1, \ldots, y_n)$.

HIDDEN NODES AND LAYERS
A neural network may have hidden nodes: nodes that are not connected directly to the environment ("hidden" inside the network):

[Diagram: input nodes 1, 2 and 3, hidden nodes 4 and 5, output nodes 6 and 7]

- We may organise the nodes in layers: an input layer (nodes 1, 2 and 3), a hidden layer (4 and 5) and an output layer (6 and 7).
- Neural networks can have several hidden layers.

NUMBERING THE WEIGHTS
Each node j in a network has a set of incoming weights $w_{ij}$. For example, node 4 has weights $w_{14}$, $w_{24}$ and $w_{34}$.

A network is completely defined if we know its topology (its graph), the set of all weights $w_{ij}$, and the activation functions f of all the nodes. A representation along these lines is sketched below.
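For illustration, a minimal sketch of this idea. The dict keyed by pairs (i, j) is an assumed representation, and the weight values are hypothetical:

```python
# The weights dict encodes both the topology (which links exist) and
# the values w_ij. The numbers here are hypothetical.

weights = {
    (1, 4): 0.5, (2, 4): -1.0, (3, 4): 2.0,
    (1, 5): 1.5, (2, 5): 0.3, (3, 5): -0.7,
}

def incoming(j, weights):
    """Return the weights w_ij of all links feeding node j."""
    return {i: w for (i, jj), w in weights.items() if jj == j}

# Node 4 has weights w_14, w_24 and w_34, as in the slide:
print(incoming(4, weights))  # {1: 0.5, 2: -1.0, 3: 2.0}
```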

EXAMPLE

[Diagram: input nodes 1 and 2 connected to nodes 3 and 4, which are connected to output node 5]

$$w_{13} = 2, \quad w_{23} = -3, \quad w_{14} = 1, \quad w_{24} = 4, \quad w_{35} = 2, \quad w_{45} = -1$$
$$f(v) = \begin{cases} 1 & \text{if } v \geq 0 \\ 0 & \text{otherwise} \end{cases}$$

What is the network output if the inputs are $x_1 = 1$ and $x_2 = 0$?

SOLUTION
1. Calculate the weighted sums in the first hidden layer:
$$v_3 = w_{13} x_1 + w_{23} x_2 = 2 \cdot 1 - 3 \cdot 0 = 2$$
$$v_4 = w_{14} x_1 + w_{24} x_2 = 1 \cdot 1 + 4 \cdot 0 = 1$$
2. Apply the activation function:
$$y_3 = f(2) = 1, \quad y_4 = f(1) = 1$$
3. Calculate the weighted sum of node 5:
$$v_5 = w_{35} y_3 + w_{45} y_4 = 2 \cdot 1 - 1 \cdot 1 = 1$$
4. The output is $y_5 = f(1) = 1$.
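The same forward pass, checked in Python, with the weights and step activation taken from the example above:

```python
# Forward pass through the two-layer example network.

def f(v):
    """Step activation with threshold a = 0."""
    return 1 if v >= 0 else 0

x1, x2 = 1, 0
w13, w23, w14, w24 = 2, -3, 1, 4
w35, w45 = 2, -1

v3 = w13 * x1 + w23 * x2   # 2*1 - 3*0 = 2
v4 = w14 * x1 + w24 * x2   # 1*1 + 4*0 = 1
y3, y4 = f(v3), f(v4)      # both nodes fire: 1, 1

v5 = w35 * y3 + w45 * y4   # 2*1 - 1*1 = 1
y5 = f(v5)                 # network output
print(y5)                  # 1
```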

TRAINING NEURAL NETWORKS
Let us invert the previous problem: suppose that the inputs to the network are $x_1 = 1$ and $x_2 = 0$, and f is a step function. Find values of the weights $w_{ij}$ such that the output of the network is $y_5 = 0$.

- This problem is more difficult, because there are more unknowns (the weights) than knowns (the inputs and output). In general, there is an infinite number of solutions.
- The process of finding a set of weights such that, for a given input, the network produces the desired output is called training.

SUPERVISED LEARNING
Algorithms for training neural networks can be supervised (i.e. with a "teacher") or unsupervised (self-organising).

Supervised algorithms use a training set: a set of pairs (x, y) of inputs together with their corresponding desired outputs. We may think of a training set as a set of examples.

An outline of a supervised learning algorithm (see the sketch below):
1. Initially, set all the weights $w_{ij}$ to some random values.
2. Repeat:
   a. Feed the network an input x from one of the examples in the training set.
   b. Compute the network's output $f(x)$.
   c. Change the weights $w_{ij}$ of the nodes.
3. Until the error $c(y, f(x))$ is small.
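A sketch of this outline for a single neuron. The slides leave step 2c (how the weights change) unspecified; the classic perceptron rule $w_i \leftarrow w_i + \eta\,(y - f(x))\,x_i$ is used here as one assumed, concrete choice, and the OR training set is purely illustrative:

```python
import random

def step(v):
    return 1 if v >= 0 else 0

def train(examples, m, rate=0.1, epochs=100):
    w = [random.uniform(-1, 1) for _ in range(m)]  # 1. random initial weights
    for _ in range(epochs):                        # 2. repeat ...
        error = 0
        for x, y in examples:                      # 2a. feed an input x
            fx = step(sum(wi * xi for wi, xi in zip(w, x)))  # 2b. compute f(x)
            error += abs(y - fx)
            for i in range(m):                     # 2c. change the weights
                w[i] += rate * (y - fx) * x[i]     #     (assumed: perceptron rule)
        if error == 0:                             # 3. ... until the error is small
            break
    return w

# Toy training set: logical OR (the constant third input acts as a bias).
examples = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
print(train(examples, m=3))
```

For this linearly separable toy set the rule converges quickly; in general, step 2c is where training algorithms differ.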

DISTRIBUTED MEMORY
- After training, the weights represent properties of the training data (similar to the covariance matrix, the slopes of a linear model, etc.).
- Thus, the weights form the memory of a neural network.
- The knowledge in this case is said to be distributed across the network. A large number of nodes not only increases the storage capacity of a network, but also ensures that the knowledge is robust.
- By changing the weights in the network, we may store new information.

GENERALISATION
By memorising patterns in the data during training, neural networks may produce reasonable answers for input patterns not seen during training. This ability is called generalisation. It is particularly useful for the classification of noisy data, what-if analysis, and prediction (e.g. time-series forecasting).

[Plots: data fitted by a function with noise and by a function without noise, over time]

APPLICATIONS OF ANNs
These include:
- Function approximation (modelling)
- Pattern classification (analysis of time series, customer databases, etc.)
- Object recognition (e.g. character recognition)
- Data compression
- Security (e.g. credit card fraud detection)

PATTERN CLASSIFICATION
In some literature, the set of all input values is called the input pattern and the set of output values the output pattern:
$$x = (x_1, \ldots, x_m), \quad y = (y_1, \ldots, y_n)$$
A neural network learns the relation between different input and output patterns. Thus, a neural network performs pattern classification or pattern recognition (i.e. it classifies inputs into output categories).

TIME SERIES ANALYSIS
A time series is a recording of some variable (e.g. a share price, temperature) at different moments in time:
$$x(t_1), x(t_2), \ldots, x(t_m)$$

[Plot: an example time series over time]

The aim of the analysis is to learn to predict future values.

TIME SERIES (CONT.)
We may use a neural network to analyse a time series:
- Input: consider m values in the past, $x(t_1), x(t_2), \ldots, x(t_m)$, as m input variables.
- Output: consider n future values, $y(t_{m+1}), y(t_{m+2}), \ldots, y(t_{m+n})$, as n output variables.

Our goal is to find a model
$$(y(t_{m+1}), \ldots, y(t_{m+n})) \approx f(x(t_1), \ldots, x(t_m))$$
By training a neural network with m inputs and n outputs on the time-series data, we can create such a model.
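A sketch of this sliding-window setup. The make_windows helper, the series values and the window sizes are illustrative, not from the lecture:

```python
# Turn a time series into (m past values, n future values) training pairs.

def make_windows(series, m, n):
    """Each pair is (inputs x(t_1)..x(t_m), outputs y(t_{m+1})..y(t_{m+n}))."""
    pairs = []
    for t in range(len(series) - m - n + 1):
        x = series[t : t + m]          # m past values as inputs
        y = series[t + m : t + m + n]  # n future values as outputs
        pairs.append((x, y))
    return pairs

series = [1.0, 1.2, 0.9, 1.4, 1.6, 1.3, 1.8]
for x, y in make_windows(series, m=3, n=1):
    print(x, "->", y)
```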

BENEFITS OF NEURAL NETWORKS
- They can be applied to many problems, as long as there is some data.
- They can be applied to problems for which analytical methods do not yet exist.
- They can be used to model non-linear dependencies.
- If there is a pattern, then a neural network should quickly work it out, even if the data is noisy.
- They always give some answer, even when the input information is incomplete.
- Networks are easy to maintain.

LIMITATIONS OF NEURAL NETWORKS
- As with any data-driven model, they cannot be used if there is no data, or very little data, available.
- There are many free parameters (such as the number of hidden nodes, the learning rate, and the minimal error) which may greatly influence the final result.
- They are not good for arithmetic and precise calculations.
- Neural networks do not provide explanations. If there are many nodes, then there are too many weights to interpret (unlike the slopes in linear models, which can be seen as correlations). In some tasks explanations are crucial (e.g. air traffic control, medical diagnosis).