Artificial Neural Networks


1 Artificial Neural Networks: Short introduction. Bojana Dalbelo Bašić, Marko Čupić, Jan Šnajder. Faculty of Electrical Engineering and Computing, University of Zagreb. Zagreb, June 6, 2018

2 Motivation Automated data processing is nowadays mostly carried out by digital computers. However, there is still far more data that is not processed automatically: that data is processed by the nervous systems of living organisms. The development of this branch of computer science has been motivated by this dominant way of data processing in the world we live in. We are looking for a different concept of data processing, one more similar to the way the biological brain functions. An AI system that successfully mimicked the functioning of the brain would be intelligent.

3 Contents 1 Introduction to neuro-computing 2 Artificial neuron 3 Artificial neural network

4 Contents 1 Introduction to neuro-computing 2 Artificial neuron 3 Artificial neural network

5 Motivation for the development of neuro-computing It is known that the brain consists of a large number of neurons operating in parallel. We also know the following about the human brain:
There are more than 100 different types of neurons.
Each type of neuron performs a very simple function.
The processing speed of a neuron is about 2 milliseconds per input.
The number of neurons in the human brain is on the order of 10^11.
On average, each neuron receives inputs from between 10^3 and 10^4 other neurons.
Information is processed both serially and in parallel.
Information is analog.
The processing is fault-tolerant.

6 Motivation for the development of neuro-computing We wish to build a computer system that processes data in the same way! New paradigm: artificial neural networks. The research area dealing with this kind of data processing is called neuro-computing, a branch of computing belonging to the soft-computing family.

7 Development directions of artificial intelligence Since the first days of AI (the early 1950s) there have been two approaches to developing intelligent systems:
The symbolic approach: domain knowledge is represented by a set of atomic semantic objects (symbols), which are then manipulated using algorithmic rules.
The connectionist approach: based on building an architecture similar to that of the brain which, instead of being programmed, learns from experience.
The symbolic approach performs well in many areas (it became especially popular with the development of expert systems), but it did not fulfill the very high early expectations. The cause of the failure lies in the assumption that all knowledge can be formalized and that the brain is a machine that processes data using formal rules.

8 The connectionist approach Many everyday tasks are too complex for symbolic representations, e.g. pattern recognition. We can recognize our mother within 0.1 s. Neurons in the brain fire roughly once every millisecond, so in the given time a chain of at most about 100 neurons can fire in sequence. Obviously, the processing is parallel!

9 Artificial neural network - definition In a broader sense: an artificial replica of the human brain which seeks to simulate the process of learning and data processing. Definition: an artificial neural network is a set of interlinked simple processing elements (neurons) whose functionality is based on the biological neuron and which serves to enable distributed parallel data processing. It enables robust data processing, can be used for both classification and regression tasks, and is capable of learning from data.

10 Learning an artificial neural network Two phases of using an ANN:
1 the learning (training) phase and
2 the data processing phase (exploitation).
Learning is an iterative procedure of presenting the network with input examples (experience), and possibly the expected outputs, which adapts the weights between the neurons. One iteration of presenting all training examples to the network is called an epoch. Different variants of learning:
1 On-line learning: weights are adapted after every training example.
2 Minibatch learning: weights are adapted after several training examples.
3 Batch learning: weights are adapted only after all training examples have been presented to the network.
The knowledge of how to convert inputs to desired outputs is represented implicitly in the neuron weights.

11 Learning an artificial neural network Learning of artificial neural networks can be:
1 supervised (learning with a teacher),
2 unsupervised (learning without a teacher),
3 reinforcement learning.
The set of available examples is often divided into:
1 Training set: e.g. 70% of the examples; used for the iterative adaptation of weights.
2 Validation set: e.g. 15% of the examples; used to check the generalization power of the network.
3 Test set: e.g. 15% of the examples; used for the final check of network performance and comparison with different models.

12 Learning an artificial neural network Learning is carried out on the training set by minimizing a measure of the output error. Accuracy is also monitored on the validation set (but this set is not used for adapting the weights). Overfitting: the network loses the desirable generalization property and becomes an expert for the training set. To prevent overfitting, we stop learning when the validation-set error starts to rise. [Figure: error vs. epoch for the training set and the validation set; learning is stopped at the epoch where the validation-set error starts rising.]
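The early-stopping idea described above can be sketched in a few lines of Python. This is only an illustrative sketch: the callables train_one_epoch, validation_error, get_weights and set_weights are hypothetical placeholders for the real training loop and model access, not part of the lecture materials.

```python
# Early-stopping sketch. The callables passed in are hypothetical placeholders
# for one epoch of weight adaptation and for model weight access/evaluation.
def train_with_early_stopping(model, train_one_epoch, validation_error,
                              get_weights, set_weights,
                              max_epochs=1000, patience=10):
    best_error = float("inf")
    best_weights = get_weights(model)
    epochs_since_best = 0
    for epoch in range(max_epochs):
        train_one_epoch(model)              # adapt weights on the training set only
        err = validation_error(model)       # monitored, never used to adapt weights
        if err < best_error:
            best_error, best_weights = err, get_weights(model)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:   # validation error keeps rising: stop
                break
    set_weights(model, best_weights)        # keep the best generalizing weights
    return model
```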

13 Contents 1 Introduction to neuro-computing 2 Artificial neuron 3 Artificial neural network

14 The biological neuron The biological neuron consists of the body (soma), dendrites and an axon. In the human brain each neuron is connected to between 1,000 and 10,000 other neurons on average.

15 Artificial neuron McCulloch and Pitts defined a simple model of the biological neuron (1943): the TLU perceptron (Threshold Logic Unit). The numerical value of each input x_i is multiplied by the sensitivity to that input w_i and accumulated in the body. A bias w_0 (sometimes denoted b) is also added to the total sum. This defines the accumulated value as net = (∑_{i=1}^{n} x_i w_i) + w_0. This accumulated value is the input of the activation function, which produces the neuron output: o = step(net). [Figure: a neuron with inputs x_1, ..., x_n, weights w_1, ..., w_n, bias w_0, accumulated value net, transfer function step(net), and output o.]
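As a small illustration (not from the slides), the TLU computation can be written directly in Python; the input and weight values below are arbitrary placeholders.

```python
def step(net):
    """Threshold activation: 0 for negative net, 1 otherwise."""
    return 0 if net < 0 else 1

def tlu(x, w, w0):
    """Accumulate net = sum_i x_i * w_i + w0 and pass it through the step function."""
    net = sum(xi * wi for xi, wi in zip(x, w)) + w0
    return step(net)

# Arbitrary placeholder values:
print(tlu([1.0, 2.0], [0.5, -0.25], w0=-0.1))   # net = 0.5 - 0.5 - 0.1 = -0.1 -> output 0
```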

16 Artificial neuron In general, the artificial neuron model allows the accumulated value to pass through an arbitrary activation function f. [Figure: a neuron with inputs x_1, ..., x_n, weights w_1, ..., w_n, bias w_0, accumulated value net, transfer function f(net), and output o.] Activation functions often used in practice:
Identity (ADALINE neuron)
Step function (TLU perceptron)
Sigmoid (sigmoid neuron)
Hyperbolic tangent
Hinge (Rectified Linear Unit, ReLU)
Leaky hinge (Leaky Rectified Linear Unit, LReLU)

17 Activation functions [Figure: (a) the identity function and (b) the step function, plotted against net.]
Identity function: f(net) = net
Step function: f(net) = step(net) = 0 if net < 0, and 1 otherwise

18 Activation functions [Figure: (c) the sigmoid function and (d) the hyperbolic tangent, plotted against net.]
Sigmoid function: f(net) = sigm(net) = 1 / (1 + e^(-net))
Hyperbolic tangent: f(net) = tanh(net) = 2 / (1 + e^(-2·net)) - 1 = 2·sigm(2·net) - 1

19 Activation functions [Figure: (e) the hinge function and (f) the leaky hinge with α = 0.05, plotted against net.]
Hinge: f(net) = max(0, net)
Leaky hinge: f(net) = net if net >= 0, and α·net otherwise
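The six activation functions listed on the previous slides are straightforward to implement; the following numpy sketch is one possible rendering (α = 0.05 is used for the leaky hinge, as in panel (f)).

```python
import numpy as np

def identity(net):
    return net

def step(net):
    return np.where(net < 0, 0.0, 1.0)

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def tanh(net):
    return 2.0 * sigmoid(2.0 * net) - 1.0    # equivalent to np.tanh(net)

def relu(net):                                # "hinge"
    return np.maximum(0.0, net)

def leaky_relu(net, alpha=0.05):              # "leaky hinge"
    return np.where(net >= 0, net, alpha * net)

net = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (identity, step, sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(net))
```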

20 Classification Classification is the procedure of assigning labels to examples based on a set of features (e.g. color, shape, weight, ...). We wish to build a classification system that, based on the examples from the training set, can correctly classify new examples. If there are only two classes, we are dealing with binary classification. We can use a system with a single output that takes two clearly separated values (e.g. 0 and 1, or alternatively -1 and 1) and thereby assigns the input example to the first or the second class. If there are multiple classes (and each example belongs to exactly one of them), it is customary to use one-hot encoding: there are as many outputs as there are classes, with the i-th output equal to 1 if the input example belongs to the i-th class and 0 otherwise.

21 Binary classification Assume we are building a binary classifier which classifies images into images of dogs and images of cats. Let the "dog" class be encoded as 0 and the "cat" class be encoded as 1. The expected network outputs are shown below the corresponding images. [Figure: four example images with expected outputs (a) 0, (b) 1, (c) 1, (d) 0.]

22 Multiclass classification Let's look at a classifier that would classify images into three classes: dogs, cats and parrots. We could require the output to be a one-hot encoded vector: 100 for dog, 010 for cat, 001 for parrot. [Figure: six example images with expected outputs (a) 100, (b) 010, (c) 010, (d) 001, (e) 100, (f) 001.]

23 Binary classification: example During the hiring procedure at the ACME company:
each candidate takes part in two independent interviews and is given a rating from 1 to 5 for each;
based on the two ratings, the HR manager gives a positive or negative opinion on the candidate;
candidates that get a positive opinion proceed to the second phase of the hiring procedure.
In an effort to reduce human effort, part of the procedure is being automated. Specifically, we want an AI system to give the positive/negative opinion instead of the HR manager. To develop such a system, a data set of the HR manager's past decisions has been collected.

24 Binary classification: example Let x_1 be the candidate's rating in the first interview and x_2 the candidate's rating in the second interview. Company archives contain the following data about ratings and HR manager opinions for past candidates:
(x_2, x_1) = (2, 5): the opinion was positive
(x_2, x_1) = (5, 2): the opinion was positive
(x_2, x_1) = (1, 5): the opinion was negative
(x_2, x_1) = (5, 1): the opinion was negative
Based on this data we wish to learn a classifier that can independently determine the appropriate opinion for future candidates.

25 Binary classification: example Since there are only two classes, as a classifier we will use a single neuron with a step activation function whose output is, depending on the input, either -1 or 1:
the class "negative opinion" will be encoded by -1,
the class "positive opinion" will be encoded by 1.
This setup yields the following set of training examples:
x_2  x_1  Class              t
2    5    positive opinion    1
5    2    positive opinion    1
1    5    negative opinion   -1
5    1    negative opinion   -1

26 Binary classification: example The neuron we use: [Figure: a neuron with inputs x_1 and x_2, weights w_1 and w_2, bias w_0, accumulated value net, transfer function step(net), and output o equal to -1 or 1.] The input x_1 is the rating from the first interview and the input x_2 is the rating from the second interview. We compute the accumulated sum net = x_2·w_2 + x_1·w_1 + w_0. The final output is given as o = step(net).

27 Binary classification: example Assume we set the weights to random values: w_2 = 1, w_1 = 1.3 and w_0 = -5.85. How will this neuron classify the examples from the training set?
(x_2, x_1, x_0)   t    (w_2, w_1, w_0)      net    o    Correct
(2, 5, 1)          1   (1, 1.3, -5.85)      2.65   1    yes
(5, 2, 1)          1   (1, 1.3, -5.85)      1.75   1    yes
(1, 5, 1)         -1   (1, 1.3, -5.85)      1.65   1    no
(5, 1, 1)         -1   (0.96, 1.1, -5.89)   0.01   1    no
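The table can be reproduced with a short Python sketch (the weights are taken row by row from the table, and the step function is assumed to map non-negative net to +1 and negative net to -1, as on slide 25).

```python
# Reproduce the classification table above (weights as listed in each row).
rows = [
    ((2, 5), 1, (1.0, 1.3, -5.85)),
    ((5, 2), 1, (1.0, 1.3, -5.85)),
    ((1, 5), -1, (1.0, 1.3, -5.85)),
    ((5, 1), -1, (0.96, 1.1, -5.89)),
]
for (x2, x1), t, (w2, w1, w0) in rows:
    net = w2 * x2 + w1 * x1 + w0
    o = 1 if net >= 0 else -1
    print(f"(x2={x2}, x1={x1})  t={t:+d}  net={net:+.2f}  o={o:+d}  correct={'yes' if o == t else 'no'}")
```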

28 Binary classification: example The decision boundary of the TLU neuron is linear: o = step(net) changes value when the sign of net changes. We are therefore interested in where net = 0, that is: x_2·w_2 + x_1·w_1 + w_0 = 0. This is a line in the two-dimensional space whose axes correspond to x_1 and x_2: x_2 = -(w_1/w_2)·x_1 - w_0/w_2, where -w_1/w_2 is the slope of the line. This line divides the two-dimensional space into two subspaces: one of them contains all examples for which the classifier outputs -1, and the other contains all examples for which the output is 1. In three dimensions this decision boundary would be a plane; in higher-dimensional spaces it is a hyperplane.

29 Binary classification: example Let's look at our specific example, where we compute net = 1·x_2 + 1.3·x_1 - 5.85. The decision boundary is shown in the image. Examples for which the classifier assigns the output -1 lie in the yellow subspace (e.g. the example (1, 1) is there). Examples for which the classifier assigns the output 1 lie in the white subspace (e.g. the example (5, 5) is there). The image shows that the classifier is not very good: all training examples are assigned the output 1. [Figure: the decision boundary line in the (x_1, x_2) plane.]

30 Learning from examples By studying biological neurons, Hebb found that to learn is to change the weights of the connections between neurons. Rosenblatt combined Hebb's idea and the McCulloch-Pitts neuron model and defined the perceptron learning rule.
Perceptron learning rule
1 Cyclically go through all N training examples, one by one.
2 Classify the current training example.
  1 If the classification decision is correct, do not change the weights.
    1 If this is the N-th consecutive correctly classified example, stop the procedure;
    2 otherwise go to the next training example.
  2 If the classification decision is incorrect, adapt the weights using the following expression: w_i(k+1) ← w_i(k) + η·(t - o)·x_i

31 Learning from examples The parameter η (eta) is called the learning rate. It is a small positive number (e.g. up to 0.5) which defines how much the current weight values are going to be modified. If η is too small, the learning procedure will progress very slowly. If η is too big, the learning procedure could diverge. Let's run the learning procedure with η = 0.02 (refer to chapter 2 of the material "Umjetne neuronske mreže" for the entire worked example). The procedure ends with the following weights: w_2 = 0.92, w_1 = 0.94, w_0 = -5.93.
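A compact sketch of the perceptron learning rule applied to this example follows. This is not the lecture's reference code; it assumes the ±1 step convention, the four training examples from the previous slides, η = 0.02, and the random initial weights from slide 27.

```python
# Perceptron learning rule sketch for the ACME example (eta = 0.02).
examples = [((2, 5), 1), ((5, 2), 1), ((1, 5), -1), ((5, 1), -1)]  # ((x2, x1), t)
w2, w1, w0 = 1.0, 1.3, -5.85    # initial (random) weights from the earlier slide
eta = 0.02

consecutive_correct = 0
i = 0
while consecutive_correct < len(examples):
    (x2, x1), t = examples[i % len(examples)]   # cycle through the training set
    o = 1 if w2 * x2 + w1 * x1 + w0 >= 0 else -1
    if o == t:
        consecutive_correct += 1
    else:
        consecutive_correct = 0
        w2 += eta * (t - o) * x2    # adapt weights only on a misclassification
        w1 += eta * (t - o) * x1
        w0 += eta * (t - o) * 1.0   # the bias "input" is a constant 1
    i += 1

print(w2, w1, w0)   # converges to roughly 0.92, 0.94, -5.93 for this data
```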

32 Binary classification: example Now the TLU perceptron computes net = 0.92·x_2 + 0.94·x_1 - 5.93. All training examples are correctly classified. The system has learned to generalize: examples like (1, 1) will get the label -1 and examples such as (5, 5) will get the label 1. This corresponds to the decisions a human would make. [Figure: the learned decision boundary in the (x_1, x_2) plane.]

33 Limitations of the TLU perceptron The TLU perceptron has a linear decision boundary. Such a neuron cannot solve classification problems in which the classes are not linearly separable. An example of linearly inseparable classes (the XOR function) is given in the image on the right. To solve such more complex problems we will consider systems comprised of more neurons: artificial neural networks. [Figure: the four XOR points in the (x_1, x_2) plane, which no single line can separate.]

34 Matrix notation In implementations, especially when dealing with neural networks, it is convenient to use matrix notation. We will stick to the following conventions: a vector denotes a single-column matrix; y = f(x), where f is a scalar function of a single variable, denotes the new vector obtained by applying f to every element of x, i.e. y(i) = f(x(i)). If a neuron has n inputs, we denote them as x = (x_1, x_2, ..., x_n). In that case it also has n weights w = (w_1, w_2, ..., w_n) and one threshold b (i.e. w_0). In this notation: o = f(net) = f(w^T·x + b). If we have access to an appropriate library for matrix/vector operations, our code can be very concise and efficient.
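With numpy, the expression o = f(w^T·x + b) is a one-liner. The sketch below reuses the random weights from the earlier example as placeholder values and assumes a sigmoid activation.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def neuron(x, w, b, f=sigmoid):
    """o = f(w^T x + b) for a single neuron; f can be any activation function."""
    return f(w @ x + b)

x = np.array([2.0, 5.0])       # input vector (placeholder values)
w = np.array([1.0, 1.3])       # weight vector
b = -5.85                      # threshold (w_0)
print(neuron(x, w, b))         # sigmoid(2.65), roughly 0.93
```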

35 Contents 1 Introduction to neuro-computing 2 Artificial neuron 3 Artificial neural network

36 Architectures To enable the modelling of more complex patterns in data for classification and regression, we will use more neurons. The architecture of a neural network tells us how many neurons there are and in what way they are connected. [Figure: a layered neural network and a non-layered neural network, each with inputs x_1, x_2, outputs o_1, o_2, an input layer, two hidden layers and an output layer.] For the network on the right, the problem is the recurrent connection, which means the network is no longer feedforward.

37 The need for non-linear activation functions Assume we have built a multi-layered neural network in which all neurons use the identity as the activation function. This entire neural network is exactly as expressive as a single neuron of this type: a linear combination of linear combinations is itself just a linear combination. To increase the expressive power of the network and allow the modelling of non-linear patterns, the neurons must have non-linear activation functions. For feedforward neural networks, until recently it was common to use sigmoid activation functions; today ReLU functions are preferred because they allow training deeper networks (networks with more layers).
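A quick numerical check of this claim (a sketch, not part of the lecture): composing two layers with identity activations yields exactly the same mapping as a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # layer 1: R^3 -> R^4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # layer 2: R^4 -> R^2

x = rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2            # identity activations throughout
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)      # the equivalent single linear layer
print(np.allclose(two_layers, one_layer))       # True
```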

38 Feedforward layered neural network: example [Figure: a feedforward layered network with an input layer of two inputs (x_1, x_2), a first hidden layer of five neurons, a second hidden layer of three neurons, and an output layer of two neurons (o_1, o_2); the three layers of weights are denoted w^(1), w^(2), w^(3), with w^(k)_{0j} denoting the thresholds.]

39 Feedforward layered neural network: example This network has two hidden layers and one output layer. It performs the mapping R^2 → R^5 → R^3 → R^2. Conventions: we denote the input of the network with a two-component vector x; the output of each subsequent layer is also a vector, of dimensionality equal to the number of neurons in that layer; the outputs of the hidden layers are denoted by h, which gives h_1 (dim = 5), h_2 (dim = 3) and y = h_3 (dim = 2).

40 Feedforward layered neural network: example The first neuron of the first hidden layer computes:
y^(2)_1 = f( w^(1)_{11}·x_1 + w^(1)_{21}·x_2 + w^(1)_{01} )
The second neuron of the first hidden layer computes:
y^(2)_2 = f( w^(1)_{12}·x_1 + w^(1)_{22}·x_2 + w^(1)_{02} )
If we use matrix notation, the complete output of the layer can be written as:
[ y^(2)_1, y^(2)_2, y^(2)_3, y^(2)_4, y^(2)_5 ]^T = f( W^(1) · [ x_1, x_2 ]^T + [ w^(1)_{01}, w^(1)_{02}, w^(1)_{03}, w^(1)_{04}, w^(1)_{05} ]^T ),
where the i-th row of the weight matrix W^(1) contains the weights w^(1)_{1i} and w^(1)_{2i} of the i-th neuron.

41 Feedforward layered neural network: example A more concise way to write the previous expression: h_1 = f(W_1·x + b_1), where h_1 is the vector of outputs of the layer, W_1 is the weight matrix (one row holds the weights of one neuron), x is the vector of inputs, and b_1 is the vector of thresholds (biases). With an additional generalization we arrive at the expression h_i = f(W_i·h_{i-1} + b_i), with h_0 = x and y = h_3, which is easy to implement. Note: for each layer we must know the weight matrix, the threshold vector and the activation function.
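The layer-wise expression h_i = f(W_i·h_{i-1} + b_i) translates directly into code. The sketch below builds the 2-5-3-2 architecture from this example with randomly initialized weights and a sigmoid activation (the concrete weight values of the lecture example are not reproduced).

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

rng = np.random.default_rng(0)
layer_sizes = [2, 5, 3, 2]                        # the 2-5-3-2 example architecture
weights = [rng.normal(size=(n_out, n_in))         # W_i: one row per neuron
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in layer_sizes[1:]]

def forward(x):
    h = x                                         # h_0 = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)                    # h_i = f(W_i h_{i-1} + b_i)
    return h                                      # y = h_3

print(forward(np.array([0.5, -1.0])))             # two outputs in (0, 1)
```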

42 Learning: adapting weights Which outputs the network gives for given inputs is determined by the weights and thresholds of the network. During supervised learning, the network has access to a set of training examples consisting of pairs (input, desired_output): {(x_{1,1}, ..., x_{1,N_i}) ↦ (t_{1,1}, ..., t_{1,N_o}), ..., (x_{N,1}, ..., x_{N,N_i}) ↦ (t_{N,1}, ..., t_{N,N_o})}, where N_i is the dimensionality of the input (the number of features) and N_o is the dimensionality of the output. To arrive at an algorithm for adapting the weights we must first define an error function E. An often used error function is the halved sum of mean squared differences: E = (1/2)·∑_{s=1}^{N} E(s) = (1/2)·∑_{s=1}^{N} (1/N)·∑_{i=1}^{N_o} (t_{s,i} - o_{s,i})^2.
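One possible numpy rendering of the error function reconstructed above, assuming the targets and the network outputs are stored as N × N_o arrays:

```python
import numpy as np

def error(T, O):
    """E = (1/2) * sum_s (1/N) * sum_i (t_si - o_si)^2, i.e. 1/(2N) of the total squared difference."""
    N = T.shape[0]
    return 0.5 * np.sum((T - O) ** 2) / N

T = np.array([[1.0, 0.0], [0.0, 1.0]])   # example targets (N = 2 examples, N_o = 2 outputs)
O = np.array([[0.9, 0.2], [0.1, 0.7]])   # example network outputs
print(error(T, O))                        # 0.0375
```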

43 Learning: adapting weights Once the error function has been defined, if it is differentiable it is possible to devise an optimization procedure based on calculating its gradient (the partial derivatives of the error function with respect to each weight and threshold). The partial derivative tells us how the error function will change if we increase or decrease the weight; we can use this information to modify the weight in a way that decreases the error function E.
Error backpropagation An algorithm for learning neural networks based on efficiently computing all partial derivatives of the error function and using them to determine the appropriate modification of each weight is called error backpropagation. We will not cover the derivation, but will only present the final result.

44 Error backpropagation: the algorithm
1 Initialize the weights to random values.
2 Repeat until the stopping criterion is satisfied:
  1 For each training example s: (x_{s,1}, ..., x_{s,N_i}) ↦ (t_{s,1}, ..., t_{s,N_o}) do:
    1 Set the example (x_{s,1}, ..., x_{s,N_i}) as the input of the network.
    2 Compute the outputs of all neurons in all layers, from the first to the last layer; denote the outputs of the final layer by (o_{s,1}, ..., o_{s,N_o}).
    3 Determine the errors of the output-layer neurons: δ^(K)_i = o_{s,i}·(1 - o_{s,i})·(t_{s,i} - o_{s,i}).
    4 Go back layer by layer toward the first layer. For the i-th neuron of the k-th layer the error is: δ^(k)_i = y^(k)_i·(1 - y^(k)_i)·∑_{d ∈ Downstream} w_{i,d}·δ^(k+1)_d
    5 Modify all weights. The weight w^(k)_{i,j} is modified as w^(k)_{i,j} ← w^(k)_{i,j} + η·y^(k)_i·δ^(k+1)_j, and the thresholds as w^(k)_{0,j} ← w^(k)_{0,j} + η·δ^(k+1)_j.

45 Error backpropagation: the algorithm In this course we do not cover the derivation of the backpropagation algorithm; a more detailed description of the algorithm is given in the additional materials. The expressions in the pseudocode assume that the sigmoid function is used as the activation function of all neurons. In the case of other transfer functions, the parts of the expressions that appear as the derivative of the sigmoid (f·(1 - f)) must be replaced by the derivative of the alternative activation function. The expressions for the weight modifications can be remembered "visually", as shown on the next two slides.

46 Error backpropagation: the algorithm Error computation for a neuron in the output layer. [Figure: output-layer neuron j receiving y^(k)_i over weight w^(k)_{ij}, producing output o_j with desired value t_j.] The error is the product of the derivative of the neuron's activation function (y^(k+1)_j·(1 - y^(k+1)_j)) and the actual error (t_j - o_j): δ^(k+1)_j = o_{s,j}·(1 - o_{s,j})·(t_{s,j} - o_{s,j}).

47 Error backpropagation: the algorithm Error computation for a neuron of a hidden layer: δ^(k)_j. [Figure: hidden-layer neuron j whose output y^(k)_j feeds neurons 1, ..., m of layer k+1 over weights w^(k)_{j1}, ..., w^(k)_{jm}.] The error is the product of the derivative of the neuron's activation function and a weighted sum of the errors of all neurons to which this neuron sends its output: δ^(k)_j = y^(k)_j·(1 - y^(k)_j)·(w^(k)_{j,1}·δ^(k+1)_1 + ... + w^(k)_{j,m}·δ^(k+1)_m).

48 Error backpropagation: the algorithm Weight modification: proportional to the product of the learning rate η, the output of the neuron to the left of the weight, and the error of the neuron to the right of the weight. [Figure: weight w^(k)_{ij} between neuron i with output y^(k)_i and neuron j of the next layer.] Δw^(k)_{ij} = η·y^(k)_i·δ^(k+1)_j, and w^(k)_{ij} ← w^(k)_{ij} + Δw^(k)_{ij}. For the thresholds, the output of the neuron "to the left" is a constant 1, so we have: Δw^(k)_{0j} = η·δ^(k+1)_j.
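Putting the previous rules together, here is a compact sketch of one on-line backpropagation update for a fully connected sigmoid network, written against the layer-wise matrices W_i and b_i from the earlier slides. It is an illustration under those assumptions, not the lecture's reference implementation.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def backprop_step(weights, biases, x, t, eta=0.1):
    """One on-line update: forward pass, backward error propagation, weight change."""
    # Forward pass, remembering every layer's output (h_0 = x).
    hs = [x]
    for W, b in zip(weights, biases):
        hs.append(sigmoid(W @ hs[-1] + b))
    # Output-layer error: delta_j = o_j * (1 - o_j) * (t_j - o_j).
    delta = hs[-1] * (1.0 - hs[-1]) * (t - hs[-1])
    # Go back layer by layer; weights[k] has one row per neuron of layer k+1.
    for k in reversed(range(len(weights))):
        y_prev = hs[k]
        # The previous layer's error must be computed with the *old* weights.
        delta_prev = y_prev * (1.0 - y_prev) * (weights[k].T @ delta) if k > 0 else None
        weights[k] += eta * np.outer(delta, y_prev)   # delta_w_ij = eta * y_i * delta_j
        biases[k] += eta * delta                      # threshold: the "left" output is 1
        delta = delta_prev
    return weights, biases

# Example usage with the 2-5-3-2 architecture from the earlier slides:
rng = np.random.default_rng(0)
sizes = [2, 5, 3, 2]
weights = [rng.normal(scale=0.5, size=(n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]
weights, biases = backprop_step(weights, biases, np.array([0.5, -1.0]), np.array([1.0, 0.0]))
```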

49 Examples On the course web pages, in the section Umjetne neuronske mreže, implementations and descriptions of several examples are available:
Classification of examples in 2D
Functional regression
Gesture classification: how to convert a gesture performed with a mouse into a class (digits 0-9)
Recommendation: try it out for yourself.

50 Example: classification and regression We consider using a neural network for classifying 2D examples and for regression.

51 Example: digit recognition using gestures Task: create a program that uses a neural network to recognize digits (0-9) based on a gesture the user performs with the mouse. The program allows gathering training examples, carrying out the learning procedure, and using the trained network.

52 Example: classifying bank notes Task: classify four types of paper bank notes regardless of orientation. The set of training examples consists of 16 different patterns sampled with 45x22 image elements.

53 Example: classifying bank notes The examples are preprocessed before they are presented at the input of the ANN.

54 Example: classifying bank notes Since the examples are digitalized images, they will differ in the intensity of corresponding image elements (a consequence of differences in the general degree of wear and of whether the paper is crumpled or otherwise damaged). In the example, a generator of artificial examples is used, which generates examples with varying degrees of wear, in order to validate that the network works and to tune its parameters.

55 Example: classifying bank notes Network parameters: an acyclic, fully connected, multi-layered neural network. Learning: error backpropagation with learning_rate = 0.02 and momentum = 0.02, validating the model on the validation set every 2500 epochs.

56 What's next? Artificial neural networks nowadays have numerous applications (processing of images, sound, and text). Some types of networks we omitted today are also often used: convolutional neural networks and recurrent neural networks. More information is available in the graduate-level courses Machine Learning; Fuzzy, Evolutionary, and Neuro-computing; and Deep Learning.
