Artificial Neural Networks
Math 596 Project Summary, Fall 2016
Jarod Hart

1 Overview

Artificial Neural Networks (ANNs) are machine learning algorithms that imitate neural function. There is a vast theory of ANNs available these days. To limit the scope and length of this project summary, we consider only three ANN algorithms, which are typically the first ones introduced when starting to work with ANNs: the Perceptron, Adaline, and multi-layer feedforward networks. All three are supervised learning algorithms, meaning that they train their parameters on a set of pre-classified (or labeled) training data; that is, each element of the training data has a known label associated with it. The goal is to learn how to classify novel input according to a rule learned from the training data.

We should acknowledge that calling the Perceptron and Adaline models neural networks may overstate their complexity. Indeed, these models are formulated based on only a single neuron. Hence they could be viewed as a trivial network, but it may be more helpful to think of them as a primer for building mathematical models of neurons. Their construction introduces some foundational ideas of the theory, and they make it much more manageable to work with more complicated networks, such as multi-layer networks. There are ways to accomplish more complicated tasks by preprocessing training data and/or using several Perceptron/Adaline neurons, but even in these models the neurons still function largely independently (not in the coordinated way inherent to more complicated ANN models). We mention a couple of ways to extend these models in the Possible Extensions section.

There is an important development in the transition from the Perceptron to the Adaline model. Roughly speaking, the Perceptron learns by using a somewhat ad hoc training rule that can be motivated by a geometric argument. The training is a little cumbersome and relies on linear structure in several ways.
This makes it difficult to extend directly to more complicated and nonlinear models. Adaline introduces a shift in point of view, which is to train the neuron by minimizing an error function. This notion of minimization, in place of a more geometric argument, extends much more easily to more complicated settings, as can be observed in the construction of multi-layer networks.

Much of the information presented here is taken from Mitchell's book on machine learning, but several aspects are presented differently and at times more concretely. In particular, the learning rules are laid out here in more detailed but less general terms. This description is also much less comprehensive, which allows a much shorter presentation of the material. It may be of use to those just starting to work with ANNs, but it is probably best used along with other resources.

2 Mathematical and Programming Content

To complete this project, a background in the following topics is recommended.

- Linear algebra: Some familiarity with matrix operations and dot products is necessary for this project. In addition, some understanding of separating hyperplanes is helpful. However, a rigorous understanding of linear independence, linear combinations, bases, subspaces, diagonalization, etc. is not required.
- Convex optimization theory: A significant component of training the ANNs in this summary is based on gradient descent for reducing a squared error function. A rough understanding of how gradient descent works, and how to use it to generate an iterative optimization scheme, is necessary to complete this project.
- Graph theory: A very rudimentary understanding of graph theory is helpful for understanding ANN topologies. This background can be limited to basic familiarity with weighted directed graphs.
- Programming: This project involves a fair amount of programming. A thorough understanding of working with vectors/matrices/arrays, decision statements, and loops is essential to implement these ANNs. For some of the applications an understanding of computer graphics is also helpful.

3 Primary Resources

For much of the mathematical content listed above, typical textbooks in the pertinent area are sufficient. Some additional resources on ANNs are the following (this is by no means a complete list).

- T. Mitchell, Machine Learning, McGraw-Hill.
- B. Kröse and P. van der Smagt, An Introduction to Neural Networks, Eighth Edition, University of Amsterdam. Available at archive.org.

4 Mathematical Description of the Project

Suppose we are given a training data set $X_1,\dots,X_N \in \mathbb{R}^n$, along with corresponding labels $l_1,\dots,l_N \in \{-1,1\}$. We denote this training set by $T = \{(X_1,l_1),\dots,(X_N,l_N)\} \subset \mathbb{R}^n \times \{-1,1\}$. Our goal is to use this data to learn a function $F : \mathbb{R}^n \to \mathbb{R}$ such that $F(X_i) \approx l_i$ for all $(X_i,l_i) \in T$.[1] We describe three ways of constructing this classification function $F$ through ANNs: the Perceptron, Adaline, and multi-layer feed-forward networks. In each case the function $F$ depends on weight parameters. The number and structure of the weight parameters used to define $F$ depend on the network topology and learning structure; these are described in more detail below. However, each of these networks works the same way from the perspective of a supervised learning algorithm. Each model has two stages: a learning stage, where the weight parameters are selected based on the training data, and a classification stage, where the function (using the weights learned in the learning stage) classifies novel inputs.

First we formulate the Perceptron ANN.[2] For fixed $\theta \in \mathbb{R}$ and $w \in \mathbb{R}^n$, define the function $F : \mathbb{R}^n \to \{-1,1\}$ by
$$F(x) = \operatorname{sgn}(\theta + w \cdot x), \qquad \operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ -1 & \text{if } x < 0. \end{cases}$$
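The classification stage can be sketched in a few lines of Python. This is a minimal illustration, not code from the original project: the function name `perceptron_F` and the hand-picked weights (the line $-1 + x_1 + x_2 = 0$) are assumptions chosen only to show how $\operatorname{sgn}$ and the hyperplane interact, including the convention $\operatorname{sgn}(0) = 1$.

```python
def sgn(z: float) -> int:
    """sgn as defined in the text: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def perceptron_F(theta: float, w: list[float], x: list[float]) -> int:
    """F(x) = sgn(theta + w . x) for fixed weights theta and w."""
    return sgn(theta + sum(wj * xj for wj, xj in zip(w, x)))


# Hand-picked (not learned) weights for n = 2: the line -1 + x1 + x2 = 0.
theta, w = -1.0, [1.0, 1.0]
print(perceptron_F(theta, w, [1.0, 1.0]))  # -1 + 1 + 1 = 1 >= 0, prints 1
print(perceptron_F(theta, w, [0.0, 0.0]))  # -1 < 0, prints -1
print(perceptron_F(theta, w, [0.5, 0.5]))  # exactly on the line, sgn(0) = 1, prints 1
```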
Figure 1 shows how to interpret this function $F(x)$ as a neuron in the biological sense in the case $n = 2$. The parameters $\theta$ and $w$ are the weight parameters for this model. We postpone our description of training the Perceptron for the moment, and describe how it classifies novel input given $\theta$ and $w$. So suppose that we have already completed the training process, which means that the weight parameters $\theta$ and $w$ are already determined. Then the decision function $F$ classifies new inputs by using the hyperplane defined by $\theta + w \cdot x = 0$ to split $\mathbb{R}^n$ in two (ignoring the issues that arise when a new input lies on this hyperplane). It may be easiest to think of this in $n = 2$ dimensions, where $\theta + w_1 x_1 + w_2 x_2 = 0$ defines a line. If $\theta + w_1 x_1 + w_2 x_2 \ge 0$, then $(x_1,x_2)$ lies on one side of the line (or on the line), and if $\theta + w_1 x_1 + w_2 x_2 < 0$, then $(x_1,x_2)$ lies on the other side. Hence $\mathbb{R}^2$ is split into two sets, and the Perceptron fires or not according to the function $F(x)$. Figure 2 below shows the decision rule learned by the Perceptron for a simulated training data set. This principle of the hyperplane $\theta + w \cdot x = 0$ splitting $\mathbb{R}^n$ in two extends naturally to higher dimensions. This describes the classification stage for the Perceptron.

[1] To avoid ambiguous notation, we reserve subscripted capital $X_i$'s to represent $x$-values from the training set, and use $x$ to represent an arbitrary element of $\mathbb{R}^n$.
[2] Technically, it may be misleading to call this an ANN since we only describe the action of a single neuron. So this is not exactly a neural network, but rather a single node.

Now let's return to how the Perceptron is trained. The purpose of the training stage is to choose $\theta \in \mathbb{R}$ and $w \in \mathbb{R}^n$ so that all the training data $T = \{(X_1,l_1),\dots,(X_N,l_N)\}$ are classified correctly. To do this, fix a learning rate $\gamma > 0$, and repeatedly execute steps 1–3 below:
1. Choose $(X_i,l_i)$ from the training set $T = \{(X_1,l_1),\dots,(X_N,l_N)\}$.
2. Compute $y = \theta + w \cdot X_i$.
3. If $y\, l_i < 0$, update $\theta$ and $w$ according to
$$\theta \leftarrow \theta + \gamma (l_i - y), \qquad w \leftarrow w + \gamma (l_i - y) X_i.$$

[Figure 1: A depiction of how the Perceptron can be interpreted as a neuron; $x_1$ and $x_2$ are the input stimuli for the neuron, and the weight parameters $\theta$ and $w$ determine when the neuron fires through the function $\operatorname{sgn}(\theta + w \cdot x)$.]

[Figure 2: The left plot shows a simulated training data set, with blue and red o's marking the two label classes ($\pm 1$). The right plot shows the decision rule learned by the Perceptron, where points above the line return $1$ and points below the line return $-1$.]

Here the notation $\theta \leftarrow \theta + \gamma(l_i - y)$ means overwrite the current value of $\theta$ with $\theta + \gamma(l_i - y)$. Note also that the update for $w$ describes the update for the entire vector $w$; since $X_i \in \mathbb{R}^n$, the right-hand side is well defined as the sum of $w \in \mathbb{R}^n$ and a scalar multiple of $X_i$. There are several ways to iterate this algorithm. One can simply choose $(X_i,l_i)$ randomly from $T$ for some set number of iterations. Alternatively, one can cycle through all elements of the training set in order several times. Many other schemes will also work. One can interpret the Perceptron learning rule in the following way. We take an element $(X_i,l_i)$ from $T$ and check whether the Perceptron classifies it correctly (with the current weight parameters). If the point is classified correctly by the
Perceptron, we leave the weights unchanged. If the point is misclassified, we update the weights so that the updated weights are more likely to classify it correctly. For larger values of $\gamma$, each update changes $\theta$ and $w$ more drastically. Making $\gamma$ small causes smaller changes, but if it is too small the Perceptron may take a long time to train. It is not hard to make a geometric argument for this update rule, but it can also be justified using an optimization argument. We will mention this argument at the end of the description of the Adaline ANN.

There are a few things that should be observed with the Perceptron ANN. First, the decision rule shown in the right plot of Figure 2 does not appear to be optimally placed. Indeed, it is very close to the red cluster of the training set, and hence one would expect this particular training of the Perceptron to be susceptible to misclassifying red points as blue ones. This kind of non-optimal placement of the decision boundary arises from the abrupt update rule, and from the fact that the Perceptron stops updating once it has classified all points correctly. So it does not try to place the decision hyperplane optimally, only so that it classifies all points correctly. Second, for the Perceptron to work well, the training data must be linearly separable. If they are not, the decision line will keep changing erratically (depending on the size of $\gamma$), and it has no hope of fully classifying the training data. In Figure 3, we show a simulated training data set that is not linearly separable, and hence for which the Perceptron (at least applied directly as described here) fails. The Adaline ANN provides a solution that better addresses the first issue,[3] but it is still limited by the second one in the same way as the Perceptron. The second issue can be solved using the last model we discuss, multi-layer ANNs.
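Below is a minimal pure-Python sketch of steps 1–3, assuming zero initial weights, in-order cycling through $T$, and a cap of `max_epochs` passes; the function name, the stopping test, and the toy AND/XOR data sets are illustrative assumptions. One deliberate deviation: it treats a point as misclassified when $\operatorname{sgn}(y) \neq l_i$ rather than when $y\,l_i < 0$, so that the boundary case $y = 0$ with $l_i = -1$ (which $F$ misclassifies, and which occurs immediately with all-zero weights) also triggers an update.

```python
def sgn(z: float) -> int:
    """sgn as defined in the text: 1 if z >= 0, otherwise -1."""
    return 1 if z >= 0 else -1


def train_perceptron(data, gamma=0.1, max_epochs=100):
    """Cycle through the training set in order, applying steps 1-3.

    data: list of (x, label) pairs, x a list of floats, label in {-1, 1}.
    Returns (theta, w, converged); converged is True once a full pass
    makes no update, i.e. every training point is classified correctly.
    """
    n = len(data[0][0])
    theta, w = 0.0, [0.0] * n
    for _ in range(max_epochs):
        updated = False
        for x, label in data:                                  # step 1
            y = theta + sum(wj * xj for wj, xj in zip(w, x))   # step 2
            if sgn(y) != label:                                # step 3: misclassified
                step = gamma * (label - y)
                theta += step
                w = [wj + step * xj for wj, xj in zip(w, x)]
                updated = True
        if not updated:
            return theta, w, True
    return theta, w, False


# Hypothetical toy data: logical AND with labels +/-1 (linearly separable) ...
AND_DATA = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
# ... and exclusive or (not linearly separable, like the data of Figure 3).
XOR_DATA = [([0.0, 0.0], -1), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], -1)]

print(train_perceptron(AND_DATA)[2])  # True
print(train_perceptron(XOR_DATA)[2])  # False
```

On the separable AND data a full pass eventually makes no updates, while on the XOR data no pass can ever be error-free, since that would require a separating line; this mirrors the two observations above.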
[Figure 3: The left plot shows a simulated training data set of an exclusive-or type decision, with blue and red o's marking the two label classes ($\pm 1$). The right plot shows the decision rule learned by the Perceptron, which has failed to generate an accurate decision rule.]

[3] See also support vector machines for a solution to this deficit of the Perceptron.

The name Adaline is an acronym for Adaptive Linear Element. It is still limited to a linear classifier, but it provides an important shift in point of view for training ANNs. For this model, we modify $F(x)$ slightly. Define
$$\sigma(x) = \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}, \qquad F(x) = \sigma(\theta + w \cdot x),$$
where $\theta \in \mathbb{R}$ and $w \in \mathbb{R}^n$ are again our weight parameters. Suppose again that we have already trained our Adaline ANN, and hence have fixed weight parameters $\theta \in \mathbb{R}$ and $w \in \mathbb{R}^n$. Then Adaline can classify new data $x \in \mathbb{R}^n$ by evaluating $F(x)$. To view this in the way neurons are traditionally viewed, with a binary output, we can simply say the neuron fires if $F(x) \ge 0$ and does not fire if $F(x) < 0$. From a machine learning perspective, it is typically more informative to retain the extra information contained in $F(x)$, rather than just the binary output $\pm 1$. This describes the classification function of Adaline. Figure 4 shows a schematic of the Adaline ANN for $n = 2$ dimensions, which is a
slight modification of the Perceptron pictured in Figure 1. The only difference is that the smooth function $\sigma(x)$ is used in place of $\operatorname{sgn}(x)$.

[Figure 4: A depiction of how Adaline can be interpreted as a neuron; $x_1$ and $x_2$ are the input stimuli for the neuron, and the parameters $\theta$ and $w$ determine when the neuron fires through the sign of $\sigma(\theta + w \cdot x)$.]

It remains to describe the weight parameter training for Adaline. Our goal in this training is to choose the weight parameters $\theta$ and $w$ so that $F(X_i) \approx l_i$ for all elements $(X_i,l_i)$ of our training set $T$. The important shift in point of view for this model is to define an error associated with our classification of the training data (as a function of the weights), and to choose our weights by minimizing that error. In particular, define the squared error functions
$$E_i(\theta,w) = \tfrac{1}{2}\big(\sigma(\theta + w \cdot X_i) - l_i\big)^2 \quad (i = 1,2,\dots,N), \qquad E(\theta,w) = \sum_{i=1}^N E_i(\theta,w).$$
Note that the training data $(X_i,l_i)$ are treated as fixed quantities here. Then for a given $\theta \in \mathbb{R}$ and $w \in \mathbb{R}^n$, $E_i(\theta,w)$ is half of the squared error of our classification $\sigma(\theta + w \cdot X_i)$ of the training element $(X_i,l_i) \in T$, and $E(\theta,w)$ is half of the cumulative squared error over the entire training set $T$.[4] Note that $E$ takes into account all of the information provided by the training set, and we have expressed it as a function of the weight parameters $\theta$ and $w$. Hence we have posed weight training as an optimization problem: choose $\theta \in \mathbb{R}$ and $w \in \mathbb{R}^n$ that minimize $E(\theta,w)$. To solve this optimization problem, we apply gradient descent successively to the individual (incremental) error functions $E_i(\theta,w)$. Roughly speaking, the algorithm is formulated as follows. Fix a training element $(X_i,l_i) \in T$. Treating $\theta$ and $w$ as variables, we compute $\partial E_i/\partial\theta$ and $\partial E_i/\partial w_j$ for each $j = 1,2,\dots,n$.
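[4] The $\tfrac{1}{2}$ here is just for convenience to simplify computation. The algorithm could just as easily be formulated without the $\tfrac{1}{2}$.

Then, given the current values of $\theta$ and $w$, we update them by moving in the direction of steepest descent of $E_i$, which is given by the negatives of these partial derivatives. That is, we replace $\theta$ and $w$ according to the rule
$$\theta \leftarrow \theta - \gamma \frac{\partial E_i}{\partial \theta}, \qquad w_j \leftarrow w_j - \gamma \frac{\partial E_i}{\partial w_j} \quad (j = 1,2,\dots,n).$$
Once again we interpret $\leftarrow$ as overwriting $\theta$ and $w$, where the right-hand side is computed with the current values of $\theta$ and $w$. Now we compute the partial derivatives above to formulate the update rule. For the $\theta$ update rule, we calculate
$$\frac{\partial E_i}{\partial \theta} = \big(\sigma(\theta + w \cdot X_i) - l_i\big) \frac{\partial}{\partial \theta}\sigma(\theta + w \cdot X_i) = \big(\sigma(\theta + w \cdot X_i) - l_i\big)\, \sigma'(\theta + w \cdot X_i),$$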
and similarly
$$\frac{\partial E_i}{\partial w_j} = \big(\sigma(\theta + w \cdot X_i) - l_i\big)\, \sigma'(\theta + w \cdot X_i)\, (X_i)_j.$$
Here $(X_i)_j$ denotes the $j$th component of $X_i \in \mathbb{R}^n$. Also note that if we take $\sigma(x) = \tanh(x)$ as above, then
$$\sigma'(x) = \operatorname{sech}^2(x) = \left(\frac{2}{e^x + e^{-x}}\right)^2 = 1 - \tanh^2(x).$$
If we define
$$\delta = \big(l_i - \sigma(\theta + w \cdot X_i)\big)\, \sigma'(\theta + w \cdot X_i),$$
then the update becomes simply $\theta \leftarrow \theta + \gamma\delta$ and $w \leftarrow w + \gamma\delta X_i$. This update rule, simplified by computing $\delta$ in this way, is often referred to as the $\delta$-rule. In fact, the $\delta$-rule can be generalized to more complicated settings and will be convenient for our description of multi-layer ANNs. For a fixed learning rate $\gamma > 0$, we train the Adaline ANN by repeatedly updating $\theta$ and $w$ according to the following rules:

1. Choose $(X_i,l_i)$ from the training set $T = \{(X_1,l_1),\dots,(X_N,l_N)\}$.
2. Compute $\delta = \big(l_i - \sigma(\theta + w \cdot X_i)\big)\, \sigma'(\theta + w \cdot X_i)$.
3. Update $\theta$ and $w$ according to $\theta \leftarrow \theta + \gamma\delta$ and $w \leftarrow w + \gamma\delta X_i$.

Once again, there are different strategies for iterating these steps. For this implementation, it is typical to repeatedly cycle through the entire training set for $i = 1,2,\dots,N$ and update $\theta, w$ for each training element. Each full cycle through the training set is sometimes called an epoch. This provides a natural way to report the squared error function: at the end of each epoch, you can compute $E(\theta,w)$ as defined above, and then measure the success of your training in terms of $E(\theta,w)$ versus the number of epochs. Figure 5 below shows the outcome of an Adaline classification on simulated training data. Note that Adaline does a better job of placing the decision line (the right plot of Figure 5) than the Perceptron does for similar training data (the right plot of Figure 2). Adaline continues to update the weights even when all training points are classified correctly. This is a consequence of the error minimization approach of the Adaline model.
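A minimal sketch of Adaline training with the $\delta$-rule, assuming $\sigma = \tanh$ (so $\sigma'(s) = 1 - \tanh^2(s)$), cyclic epochs, and small random initial weights drawn with a fixed seed; the function names, hyperparameters, and the AND toy data are illustrative assumptions, not taken from the original project. It records $E(\theta, w)$ at the end of each epoch, as suggested above.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def squared_error(theta, w, data):
    """E(theta, w) = sum over i of 1/2 * (tanh(theta + w . X_i) - l_i)^2."""
    return sum(0.5 * (math.tanh(theta + dot(w, x)) - label) ** 2 for x, label in data)


def train_adaline(data, gamma=0.1, epochs=500, seed=0):
    """Delta-rule training; returns (theta, w, errors), where errors[k]
    is E(theta, w) measured at the end of epoch k + 1."""
    rng = random.Random(seed)
    theta = rng.uniform(-0.5, 0.5)
    w = [rng.uniform(-0.5, 0.5) for _ in data[0][0]]
    errors = []
    for _ in range(epochs):
        for x, label in data:                           # one epoch: i = 1, ..., N
            out = math.tanh(theta + dot(w, x))
            delta = (label - out) * (1.0 - out ** 2)    # sigma'(s) = 1 - tanh(s)^2
            theta += gamma * delta
            w = [wj + gamma * delta * xj for wj, xj in zip(w, x)]
        errors.append(squared_error(theta, w, data))
    return theta, w, errors


# Hypothetical toy data: logical AND with labels +/-1.
AND_DATA = [([0.0, 0.0], -1), ([0.0, 1.0], -1), ([1.0, 0.0], -1), ([1.0, 1.0], 1)]
theta, w, errors = train_adaline(AND_DATA)
print(errors[0], errors[-1])  # E after the first and after the last epoch
print([1 if math.tanh(theta + dot(w, x)) >= 0 else -1 for x, _ in AND_DATA])
```

Note that $\delta$ is nonzero whenever $\tanh(\theta + w \cdot X_i) \neq l_i$, which is always the case for targets $\pm 1$, so the weights keep being adjusted even after every training point lies on the correct side of the line.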
This error-minimization idea can be extended a little to conclude that Adaline is better equipped to classify clusters with limited overlap (that fall just short of being linearly separable). It should also be noted that Adaline cannot effectively address exclusive-or type data; that is, the situation in Figure 3. Since Adaline still relies on a linear classifier, it is not a good choice for classifying data that are not linearly separable in this way (at least in the formulation presented here; another way around this is described in the Possible Extensions section). This formulation also allows some flexibility in the choice of the function $\sigma$. Modifying $\sigma$ allows one to model 0-1 neurons rather than $\pm 1$ neurons, or even, with an appropriately chosen $\sigma$, to model more general functions.

Finally, we consider a multi-layer feed-forward network. This involves introducing a hidden layer of neurons to the model (we only address models with a single hidden layer, but the ideas here extend to multiple hidden layers). Suppose we wish to construct a network with one hidden layer made up of $n_h \in \mathbb{N}$ neurons. Let $\tau \in \mathbb{R}$, $\theta, v \in \mathbb{R}^{n_h}$, and $W \in \mathbb{R}^{n_h \times n}$ (an $n_h \times n$ matrix), and define
$$F(x) = \sigma(\tau + v \cdot a), \qquad a = \sigma(\theta + Wx) \in \mathbb{R}^{n_h}.$$
Here $a = a(x)$ is a function of $x \in \mathbb{R}^n$, and we interpret $\sigma(y) = (\sigma(y_1),\dots,\sigma(y_{n_h}))$ for $y \in \mathbb{R}^{n_h}$. Now we consider $\tau$, $\theta$, $v$, and $W$ all to be weight parameters, to be chosen in the learning stage of the algorithm. As before, the classification function of this multi-layer network (when $\tau$, $\theta$, $v$, and $W$ are fixed after training) works simply by plugging an element $x \in \mathbb{R}^n$ into $F$. When $F(x) \ge 0$, we classify $x$ as group $1$ (i.e. the neuron fires), and when $F(x) < 0$, we classify $x$ as group $-1$ (i.e. the neuron doesn't fire). In Figure 6, we depict an ANN with $n = 2$ inputs, a single hidden layer made up of $n_h = 5$ neurons, and a single output neuron.

[Figure 5: The left plot shows a simulated training data set, with blue and red o's marking the two label classes ($\pm 1$). The right plot shows the decision rule learned by Adaline, where points above the line return $1$ and points below the line return $-1$.]

It remains to describe the training rule for this multi-layer network. We do this in a similar way to Adaline training: we define an error function and use gradient descent to define weight updates that work toward minimizing it. To account for the multiple layers of weights in this model, we use a notion of back-propagation of error. Roughly speaking, this means that we adjust the output layer weights $\tau$ and $v$ and use those computations to inform our adjustment of the weights $\theta$ and $W$ in the preceding layer. More precisely, the updates are formulated as follows. Define
$$E_i(\tau,v,\theta,W) = \tfrac{1}{2}\big(\sigma(\tau + v \cdot \sigma(\theta + W X_i)) - l_i\big)^2 \quad (i = 1,2,\dots,N), \qquad E(\tau,v,\theta,W) = \sum_{i=1}^N E_i(\tau,v,\theta,W).$$
To implement gradient descent, we need to compute the partial derivatives of $E_i$ with respect to $\tau$, each component of $v$, each component of $\theta$, and each component of $W$. We first handle the output layer weights, whose update rules come out very similar to those of the Adaline model:
$$\frac{\partial E_i}{\partial \tau} = -\delta, \qquad \frac{\partial E_i}{\partial v_j} = -\delta\, \big(\sigma(\theta + W X_i)\big)_j,$$
where
$$\delta = \big(l_i - \sigma(\tau + v \cdot \sigma(\theta + W X_i))\big)\, \sigma'\big(\tau + v \cdot \sigma(\theta + W X_i)\big).$$
Note again that we use the notation $\sigma(\theta + W X_i) = \big(\sigma(\theta_1 + (W X_i)_1),\dots,\sigma(\theta_{n_h} + (W X_i)_{n_h})\big)$, so $v \cdot \sigma(\theta + W X_i)$ is the dot product of two elements of $\mathbb{R}^{n_h}$. So the update rules for the output layer are $\tau \leftarrow \tau + \gamma\delta$ and $v \leftarrow v + \gamma\delta\, \sigma(\theta + W X_i)$, where $\delta$ is defined as above.
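One way to sanity-check the output-layer derivatives is to compare them with central finite differences of $E_i$. The sketch below does this for a single random training pair, assuming $\sigma = \tanh$, $n = 2$, $n_h = 5$, and a label of $1$; the helper names, the seed, and the step size `h` are illustrative choices. Weights are stored as plain Python lists, with `W` a list of $n_h$ rows of length $n$.

```python
import math
import random


def forward(tau, v, theta, W, x):
    """Return (a, b) with a = sigma(theta + W x) and b = F(x) = sigma(tau + v . a)."""
    a = [math.tanh(t + sum(wjk * xk for wjk, xk in zip(row, x)))
         for t, row in zip(theta, W)]
    b = math.tanh(tau + sum(vj * aj for vj, aj in zip(v, a)))
    return a, b


def error_i(tau, v, theta, W, x, label):
    """E_i = 1/2 (F(X_i) - l_i)^2 for a single training pair."""
    return 0.5 * (forward(tau, v, theta, W, x)[1] - label) ** 2


def output_layer_gradient_gap(n=2, n_h=5, h=1e-6, seed=1):
    """Largest gap between dE_i/dtau = -delta, dE_i/dv_j = -delta * a_j
    and their central finite-difference approximations."""
    rng = random.Random(seed)

    def u():
        return rng.uniform(-1.0, 1.0)

    tau = u()
    v = [u() for _ in range(n_h)]
    theta = [u() for _ in range(n_h)]
    W = [[u() for _ in range(n)] for _ in range(n_h)]
    x, label = [u() for _ in range(n)], 1

    a, b = forward(tau, v, theta, W, x)
    delta = (label - b) * (1.0 - b ** 2)   # sigma'(s) = 1 - tanh(s)^2, with b = tanh(s)
    analytic = [-delta] + [-delta * aj for aj in a]

    numeric = [(error_i(tau + h, v, theta, W, x, label)
                - error_i(tau - h, v, theta, W, x, label)) / (2 * h)]
    for j in range(n_h):
        vp, vm = list(v), list(v)
        vp[j] += h
        vm[j] -= h
        numeric.append((error_i(tau, vp, theta, W, x, label)
                        - error_i(tau, vm, theta, W, x, label)) / (2 * h))
    return max(abs(p - q) for p, q in zip(analytic, numeric))


print(output_layer_gradient_gap())  # expected to be close to zero
```

A result of rounding-error size indicates that the formulas for $\partial E_i/\partial\tau$ and $\partial E_i/\partial v_j$ agree with the definition of $E_i$; the same check extends directly to $\theta$ and $W$.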
For the hidden layer weight parameter updates, we apply the chain rule in the same way:
$$\frac{\partial E_i}{\partial \theta_j} = -\delta\, \frac{\partial}{\partial \theta_j}\big(\tau + v \cdot \sigma(\theta + W X_i)\big) = -\delta\, v_j\, \sigma'\big(\theta_j + (W X_i)_j\big),$$
$$\frac{\partial E_i}{\partial w_{j,k}} = -\delta\, \frac{\partial}{\partial w_{j,k}}\big(\tau + v \cdot \sigma(\theta + W X_i)\big) = -\delta\, v_j\, \sigma'\big(\theta_j + (W X_i)_j\big)\, (X_i)_k,$$
where $\delta$ is as above and $w_{j,k}$ denotes the $(j,k)$ entry of $W$. Now we can easily formulate the training rules for choosing $\tau$, $v$, $\theta$, and $W$. We iteratively update the multi-layer ANN weight parameters according to the following:

1. Choose $(X_i,l_i)$ from the training set $T = \{(X_1,l_1),\dots,(X_N,l_N)\}$.
2. Compute
$$\begin{aligned}
y &= W X_i, \\
a &= \big(\sigma(\theta_1 + y_1), \sigma(\theta_2 + y_2), \dots, \sigma(\theta_{n_h} + y_{n_h})\big), \\
\tilde a &= \big(v_1 \sigma'(\theta_1 + y_1), v_2 \sigma'(\theta_2 + y_2), \dots, v_{n_h} \sigma'(\theta_{n_h} + y_{n_h})\big), \\
b &= \sigma(\tau + v \cdot a), \\
\tilde b &= \sigma'(\tau + v \cdot a), \\
\delta &= (l_i - b)\, \tilde b.
\end{aligned}$$
3. Update $\tau$, $v$, $\theta$, and $W$ according to
$$\begin{aligned}
\tau &\leftarrow \tau + \gamma\delta, \\
v_j &\leftarrow v_j + \gamma\delta\, a_j && \text{for } j = 1,2,\dots,n_h, \\
\theta_j &\leftarrow \theta_j + \gamma\delta\, \tilde a_j && \text{for } j = 1,2,\dots,n_h, \\
w_{j,k} &\leftarrow w_{j,k} + \gamma\delta\, \tilde a_j\, (X_i)_k && \text{for } j = 1,2,\dots,n_h \text{ and } k = 1,2,\dots,n.
\end{aligned}$$

[Figure 6: A depiction of the multi-layer ANN with inputs $x_1$ and $x_2$, a hidden layer of five neurons computing $\sigma(\theta_j + (Wx)_j)$, and a single output neuron that fires according to the sign of $\sigma(\tau + v \cdot \sigma(\theta + Wx))$. Here $W$ is a $5 \times 2$ matrix.]

Step 2 above is sometimes referred to as the feed-forward part of the algorithm, and step 3 as the back-propagation part. That is, in step 2 we take the input $X_i$ and feed it forward through the ANN to arrive at its classification $b = \sigma(\tau + v \cdot a)$ (recording several other quantities along the way). Then we measure the error in the term $\delta = (l_i - b)\,\tilde b$ and propagate it back through the layers to update each weight parameter. This multi-layer network, as described above, is capable of solving the exclusive-or type problem that neither the Perceptron nor the Adaline ANN could solve (at least applied directly). We implemented an ANN to these specifications in $n = 2$ dimensions with a single hidden layer of $n_h = 5$ neurons and applied it to the simulated exclusive-or type training data shown in the left plot of Figure 7. The resulting decision function and decision boundary are shown in various formats in the right plot of Figure 7 and in Figure 8.
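The following is a minimal pure-Python sketch of steps 1–3 for one hidden layer, assuming $\sigma = \tanh$, cyclic passes over $T$, and weights initialized uniformly in $[-1, 1]$ (the summary does not specify an initialization). The four-point exclusive-or data set with inputs in $\{-1, 1\}$ is a hypothetical stand-in for the simulated data of Figure 7, and the function name and hyperparameters are illustrative, not the author's implementation.

```python
import math
import random


def train_network(data, n_h=5, gamma=0.1, epochs=2000, seed=0):
    """Steps 1-3 for one hidden layer with sigma = tanh; returns the learned F."""
    rng = random.Random(seed)
    n = len(data[0][0])

    def u():
        return rng.uniform(-1.0, 1.0)

    tau, v, theta = u(), [u() for _ in range(n_h)], [u() for _ in range(n_h)]
    W = [[u() for _ in range(n)] for _ in range(n_h)]          # n_h rows of length n

    for _ in range(epochs):
        for x, label in data:                                  # step 1 (cyclic order)
            # Step 2: feed forward, recording the intermediate quantities.
            y = [sum(wjk * xk for wjk, xk in zip(row, x)) for row in W]   # y = W X_i
            a = [math.tanh(t + yj) for t, yj in zip(theta, y)]
            a_tilde = [vj * (1.0 - aj ** 2) for vj, aj in zip(v, a)]      # v_j sigma'(theta_j + y_j)
            b = math.tanh(tau + sum(vj * aj for vj, aj in zip(v, a)))
            b_tilde = 1.0 - b ** 2                                        # sigma'(tau + v . a)
            delta = (label - b) * b_tilde
            # Step 3: back-propagate delta (a_tilde was computed with the old v).
            tau += gamma * delta
            v = [vj + gamma * delta * aj for vj, aj in zip(v, a)]
            theta = [t + gamma * delta * at for t, at in zip(theta, a_tilde)]
            W = [[wjk + gamma * delta * at * xk for wjk, xk in zip(row, x)]
                 for row, at in zip(W, a_tilde)]

    def F(x):
        hidden = [math.tanh(t + sum(wjk * xk for wjk, xk in zip(row, x)))
                  for t, row in zip(theta, W)]
        return math.tanh(tau + sum(vj * hj for vj, hj in zip(v, hidden)))

    return F


# Hypothetical exclusive-or data with inputs in {-1, 1}.
XOR_DATA = [([-1.0, -1.0], -1), ([-1.0, 1.0], 1), ([1.0, -1.0], 1), ([1.0, 1.0], -1)]
F = train_network(XOR_DATA)
print([1 if F(x) >= 0 else -1 for x, _ in XOR_DATA], [label for _, label in XOR_DATA])
```

Because the error surface of a multi-layer network is not convex, gradient descent can stall in a poor local minimum for some initializations, so it is worth trying more than one `seed` if the learned labels do not match.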
[Figure 7: The left plot shows a simulated training data set, with blue and red o's marking the two label classes ($\pm 1$). The right plot shows the decision rule learned by a multi-layer ANN with a single hidden layer made up of $n_h = 5$ neurons.]

[Figure 8: The left plot shows a simulated training data set, with regions colored according to their classification by a multi-layer ANN with one hidden layer of $n_h = 5$ neurons. The right plot shows the learned function $F(x)$ with the simulated data plotted on top of it.]

5 Possible Extensions

There are many directions in which the work here can be extended. We first mention some extensions that are possible using only the Perceptron and Adaline (linear classifier) models. It is possible to preprocess training data so that these models can solve exclusive-or type problems (as well as other non-linearly separable classification problems). This can be done by transforming the original training data, embedding it into a higher dimensional space. For example, suppose we have training data $T = \{(X_1,l_1),\dots,(X_N,l_N)\}$, where each $X_i \in \mathbb{R}^2$. We can define a transformed training set using the transformation
$$P_2(x_1,x_2) = (x_1, x_2, x_1^2, x_1 x_2, x_2^2)$$
to create an alternate training set $\tilde T = \{(P_2(X_1),l_1),\dots,(P_2(X_N),l_N)\}$, which now contains labeled data $P_2(X_i) \in \mathbb{R}^5$. If we now apply either the Perceptron or Adaline to this higher dimensional training data $\tilde T$, we end up with decision boundaries that can be conic sections in $\mathbb{R}^2$ rather than just lines. Of course, we could define higher order transformations $P_n$ that allow for polynomials of arbitrary degree (or other functions as well). This highlights a principle in mathematics that, in many situations, one can relax some structural limitations of a model (like requiring a linear classifier) by
embedding the problem into higher dimensions.[5]

Another extension is to apply the multi-layer network to approximate a function $f : \mathbb{R}^n \to \mathbb{R}$ given several samples from its graph $\{(x, f(x))\} \subset \mathbb{R}^n \times \mathbb{R}$. This application is really just a slight shift in point of view, allowing the label to be real-valued rather than $\pm 1$. Another minor extension is to modify the multi-layer networks described here to allow for many classes, rather than just a binary classification $\pm 1$. This can be done by allowing more than one output neuron in the network. One can also allow more than one hidden layer (though this may not be very worthwhile, since it is known that networks with a single hidden layer are capable of classifying any pattern given enough neurons in the hidden layer). Beyond that, one could even develop algorithms like the one above for feed-forward ANNs with a topology described by any acyclic directed graph. Other directions include Bayesian neural networks (ANNs trained through a Bayesian formulation of the weight parameter updates), convolutional neural networks (ANNs that augment multi-layer networks with preprocessing and feature extraction techniques), lattice algebra neural networks (ANNs that, roughly speaking, replace the summation in the dot product $w \cdot x$ with a maximum $\max(w_1 x_1,\dots,w_n x_n)$), and recurrent neural networks (ANNs that allow for feedback loops in their network topologies). Each of these takes ANNs in a different direction than what was discussed in this project summary, but aspects of the foundational theory are the same.

6 Note From the Author

This is a student project from the Math and Biomedical Research course, taught by the current author Jarod Hart, offered at the University of Kansas in the Spring of …; some modifications and additions were made to the original project for this summary.
The course is supported by the Initiative for Maximizing Student Development (IMSD) through NIH grant NIH-NIGMS 5R25GM…; the PIs of this IMSD grant are Professors Estela Gavosto (Mathematics Department) and James Orr (Biology Department). We are happy to share these project ideas, and welcome those who are interested to use them. We'd love to hear about your results and extensions related to these projects, and in some cases will provide some support for the projects. Please contact Jarod Hart at jvhart@ku.edu with any typos, errors, questions, or comments about this project summary.

[5] Compare this, for example, with reduction of order techniques in ordinary differential equation theory.
More informationArtificial Neural Networks
Artificial Neural Networks 鮑興國 Ph.D. National Taiwan University of Science and Technology Outline Perceptrons Gradient descent Multi-layer networks Backpropagation Hidden layer representations Examples
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationIntroduction to Neural Networks
Introduction to Neural Networks What are (Artificial) Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationThe Perceptron Algorithm
The Perceptron Algorithm Greg Grudic Greg Grudic Machine Learning Questions? Greg Grudic Machine Learning 2 Binary Classification A binary classifier is a mapping from a set of d inputs to a single output
More informationCOMP 652: Machine Learning. Lecture 12. COMP Lecture 12 1 / 37
COMP 652: Machine Learning Lecture 12 COMP 652 Lecture 12 1 / 37 Today Perceptrons Definition Perceptron learning rule Convergence (Linear) support vector machines Margin & max margin classifier Formulation
More informationARTIFICIAL INTELLIGENCE. Artificial Neural Networks
INFOB2KI 2017-2018 Utrecht University The Netherlands ARTIFICIAL INTELLIGENCE Artificial Neural Networks Lecturer: Silja Renooij These slides are part of the INFOB2KI Course Notes available from www.cs.uu.nl/docs/vakken/b2ki/schema.html
More informationMulticlass Classification-1
CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass
More informationCS534 Machine Learning - Spring Final Exam
CS534 Machine Learning - Spring 2013 Final Exam Name: You have 110 minutes. There are 6 questions (8 pages including cover page). If you get stuck on one question, move on to others and come back to the
More informationLecture 7 Artificial neural networks: Supervised learning
Lecture 7 Artificial neural networks: Supervised learning Introduction, or how the brain works The neuron as a simple computing element The perceptron Multilayer neural networks Accelerated learning in
More informationMachine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler
+ Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions
More informationAI Programming CS F-20 Neural Networks
AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco 20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols
More informationIntroduction to Artificial Neural Networks
Facultés Universitaires Notre-Dame de la Paix 27 March 2007 Outline 1 Introduction 2 Fundamentals Biological neuron Artificial neuron Artificial Neural Network Outline 3 Single-layer ANN Perceptron Adaline
More informationNonlinear Classification
Nonlinear Classification INFO-4604, Applied Machine Learning University of Colorado Boulder October 5-10, 2017 Prof. Michael Paul Linear Classification Most classifiers we ve seen use linear functions
More informationNeural Networks biological neuron artificial neuron 1
Neural Networks biological neuron artificial neuron 1 A two-layer neural network Output layer (activation represents classification) Weighted connections Hidden layer ( internal representation ) Input
More informationInput layer. Weight matrix [ ] Output layer
MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2003 Recitation 10, November 4 th & 5 th 2003 Learning by perceptrons
More informationFinal Exam, Fall 2002
15-781 Final Exam, Fall 22 1. Write your name and your andrew email address below. Name: Andrew ID: 2. There should be 17 pages in this exam (excluding this cover sheet). 3. If you need more room to work
More informationThe Perceptron Algorithm 1
CS 64: Machine Learning Spring 5 College of Computer and Information Science Northeastern University Lecture 5 March, 6 Instructor: Bilal Ahmed Scribe: Bilal Ahmed & Virgil Pavlu Introduction The Perceptron
More informationArtificial Neural Networks
Artificial Neural Networks Stephan Dreiseitl University of Applied Sciences Upper Austria at Hagenberg Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Knowledge
More informationIntroduction to Machine Learning Spring 2018 Note Neural Networks
CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,
More informationArtificial Neural Networks. MGS Lecture 2
Artificial Neural Networks MGS 2018 - Lecture 2 OVERVIEW Biological Neural Networks Cell Topology: Input, Output, and Hidden Layers Functional description Cost functions Training ANNs Back-Propagation
More informationReification of Boolean Logic
526 U1180 neural networks 1 Chapter 1 Reification of Boolean Logic The modern era of neural networks began with the pioneer work of McCulloch and Pitts (1943). McCulloch was a psychiatrist and neuroanatomist;
More informationThe perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt.
1 The perceptron learning algorithm is one of the first procedures proposed for learning in neural network models and is mostly credited to Rosenblatt. The algorithm applies only to single layer models
More informationSupervised learning in single-stage feedforward networks
Supervised learning in single-stage feedforward networks Bruno A Olshausen September, 204 Abstract This handout describes supervised learning in single-stage feedforward networks composed of McCulloch-Pitts
More informationNeural Networks, Computation Graphs. CMSC 470 Marine Carpuat
Neural Networks, Computation Graphs CMSC 470 Marine Carpuat Binary Classification with a Multi-layer Perceptron φ A = 1 φ site = 1 φ located = 1 φ Maizuru = 1 φ, = 2 φ in = 1 φ Kyoto = 1 φ priest = 0 φ
More information) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2.
1 Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science 6.034 Artificial Intelligence, Fall 2011 Recitation 8, November 3 Corrected Version & (most) solutions
More informationNeural Network Training
Neural Network Training Sargur Srihari Topics in Network Training 0. Neural network parameters Probabilistic problem formulation Specifying the activation and error functions for Regression Binary classification
More informationNeural Networks DWML, /25
DWML, 2007 /25 Neural networks: Biological and artificial Consider humans: Neuron switching time 0.00 second Number of neurons 0 0 Connections per neuron 0 4-0 5 Scene recognition time 0. sec 00 inference
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationArtificial Neural Networks Examination, June 2005
Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either
More informationIntroduction to Deep Learning
Introduction to Deep Learning Some slides and images are taken from: David Wolfe Corne Wikipedia Geoffrey A. Hinton https://www.macs.hw.ac.uk/~dwcorne/teaching/introdl.ppt Feedforward networks for function
More informationCSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning
CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.
More informationARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92
ARTIFICIAL NEURAL NETWORKS گروه مطالعاتي 17 بهار 92 BIOLOGICAL INSPIRATIONS Some numbers The human brain contains about 10 billion nerve cells (neurons) Each neuron is connected to the others through 10000
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate
More informationMath for Machine Learning Open Doors to Data Science and Artificial Intelligence. Richard Han
Math for Machine Learning Open Doors to Data Science and Artificial Intelligence Richard Han Copyright 05 Richard Han All rights reserved. CONTENTS PREFACE... - INTRODUCTION... LINEAR REGRESSION... 4 LINEAR
More information6.036: Midterm, Spring Solutions
6.036: Midterm, Spring 2018 Solutions This is a closed book exam. Calculators not permitted. The problems are not necessarily in any order of difficulty. Record all your answers in the places provided.
More informationOptimization and Gradient Descent
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder September 12, 2017 Prof. Michael Paul Prediction Functions Remember: a prediction function is the function
More informationArtificial Neural Networks
Introduction ANN in Action Final Observations Application: Poverty Detection Artificial Neural Networks Alvaro J. Riascos Villegas University of los Andes and Quantil July 6 2018 Artificial Neural Networks
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationSPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks
Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension
More informationLecture 4: Training a Classifier
Lecture 4: Training a Classifier Roger Grosse 1 Introduction Now that we ve defined what binary classification is, let s actually train a classifier. We ll approach this problem in much the same way as
More informationMachine Learning (CS 567) Lecture 3
Machine Learning (CS 567) Lecture 3 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationArtificial Neural Networks Examination, March 2004
Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationArtificial Neural Networks. Edward Gatt
Artificial Neural Networks Edward Gatt What are Neural Networks? Models of the brain and nervous system Highly parallel Process information much more like the brain than a serial computer Learning Very
More informationPracticals 5 : Perceptron
Université Paul Sabatier M2 SE Data Mining Practicals 5 : Perceptron Framework The aim of this last session is to introduce the basics of neural networks theory through the special case of the perceptron.
More informationThe Perceptron. Volker Tresp Summer 2014
The Perceptron Volker Tresp Summer 2014 1 Introduction One of the first serious learning machines Most important elements in learning tasks Collection and preprocessing of training data Definition of a
More informationNeural Networks. Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation
Neural Networks Fundamentals Framework for distributed processing Network topologies Training of ANN s Notation Perceptron Back Propagation Neural Networks Historical Perspective A first wave of interest
More informationCOMP-4360 Machine Learning Neural Networks
COMP-4360 Machine Learning Neural Networks Jacky Baltes Autonomous Agents Lab University of Manitoba Winnipeg, Canada R3T 2N2 Email: jacky@cs.umanitoba.ca WWW: http://www.cs.umanitoba.ca/~jacky http://aalab.cs.umanitoba.ca
More informationArtificial Neural Networks The Introduction
Artificial Neural Networks The Introduction 01001110 01100101 01110101 01110010 01101111 01101110 01101111 01110110 01100001 00100000 01110011 01101011 01110101 01110000 01101001 01101110 01100001 00100000
More informationReading Group on Deep Learning Session 1
Reading Group on Deep Learning Session 1 Stephane Lathuiliere & Pablo Mesejo 2 June 2016 1/31 Contents Introduction to Artificial Neural Networks to understand, and to be able to efficiently use, the popular
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationBinary Classification / Perceptron
Binary Classification / Perceptron Nicholas Ruozzi University of Texas at Dallas Slides adapted from David Sontag and Vibhav Gogate Supervised Learning Input: x 1, y 1,, (x n, y n ) x i is the i th data
More informationLogistic Regression & Neural Networks
Logistic Regression & Neural Networks CMSC 723 / LING 723 / INST 725 Marine Carpuat Slides credit: Graham Neubig, Jacob Eisenstein Logistic Regression Perceptron & Probabilities What if we want a probability
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationMidterm: CS 6375 Spring 2015 Solutions
Midterm: CS 6375 Spring 2015 Solutions The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for an
More informationUnit 8: Introduction to neural networks. Perceptrons
Unit 8: Introduction to neural networks. Perceptrons D. Balbontín Noval F. J. Martín Mateos J. L. Ruiz Reina A. Riscos Núñez Departamento de Ciencias de la Computación e Inteligencia Artificial Universidad
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationLecture 4. 1 Learning Non-Linear Classifiers. 2 The Kernel Trick. CS-621 Theory Gems September 27, 2012
CS-62 Theory Gems September 27, 22 Lecture 4 Lecturer: Aleksander Mądry Scribes: Alhussein Fawzi Learning Non-Linear Classifiers In the previous lectures, we have focused on finding linear classifiers,
More informationLecture 2: Linear regression
Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w 1 x 1 + w 2 x 2 + w 0 = 0 Feature 1 x 2 = w 1 w 2 x 1 w 0 w 2 Feature 2 A perceptron
More informationStatistical NLP for the Web
Statistical NLP for the Web Neural Networks, Deep Belief Networks Sameer Maskey Week 8, October 24, 2012 *some slides from Andrew Rosenberg Announcements Please ask HW2 related questions in courseworks
More informationThe Perceptron. Volker Tresp Summer 2016
The Perceptron Volker Tresp Summer 2016 1 Elements in Learning Tasks Collection, cleaning and preprocessing of training data Definition of a class of learning models. Often defined by the free model parameters
More informationPattern Recognition and Machine Learning. Perceptrons and Support Vector machines
Pattern Recognition and Machine Learning James L. Crowley ENSIMAG 3 - MMIS Fall Semester 2016 Lessons 6 10 Jan 2017 Outline Perceptrons and Support Vector machines Notation... 2 Perceptrons... 3 History...3
More informationMachine Learning. Neural Networks
Machine Learning Neural Networks Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 Biological Analogy Bryan Pardo, Northwestern University, Machine Learning EECS 349 Fall 2007 THE
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationCSC242: Intro to AI. Lecture 21
CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More informationBack-Propagation Algorithm. Perceptron Gradient Descent Multilayered neural network Back-Propagation More on Back-Propagation Examples
Back-Propagation Algorithm Perceptron Gradient Descent Multilayered neural network Back-Propagation More on Back-Propagation Examples 1 Inner-product net =< w, x >= w x cos(θ) net = n i=1 w i x i A measure
More informationIntroduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis
Introduction to Natural Computation Lecture 9 Multilayer Perceptrons and Backpropagation Peter Lewis 1 / 25 Overview of the Lecture Why multilayer perceptrons? Some applications of multilayer perceptrons.
More informationNeural networks and support vector machines
Neural netorks and support vector machines Perceptron Input x 1 Weights 1 x 2 x 3... x D 2 3 D Output: sgn( x + b) Can incorporate bias as component of the eight vector by alays including a feature ith
More informationMachine Learning and Data Mining. Linear classification. Kalev Kask
Machine Learning and Data Mining Linear classification Kalev Kask Supervised learning Notation Features x Targets y Predictions ŷ = f(x ; q) Parameters q Program ( Learner ) Learning algorithm Change q
More information