Artificial Neural Networks


Math 596 Project Summary, Fall 2016
Jarod Hart

1 Overview

Artificial Neural Networks (ANNs) are machine learning algorithms that imitate neural function. There is a vast theory of ANNs available these days. To limit the scope and length of this project summary, we will only consider three ANN algorithms, which are typically the first ones introduced when starting to work with ANNs: the Perceptron, Adaline, and multi-layer feed-forward networks. They are all supervised learning algorithms, meaning that they train their parameters based on a set of pre-classified (or labeled) training data. That is, each element of the training data has a known label associated to it. The goal is to learn how to classify novel input according to a rule learned from the training data.

We should acknowledge that calling the Perceptron and Adaline models neural networks may be overstating their complexity. Indeed, these models are formulated based on only a single neuron. Hence they could be viewed as trivial networks, but it may be more conducive to think of them as a primer for building mathematical models of neurons. Their construction introduces some foundational ideas of the theory, and it makes it much more manageable to work with more complicated networks, like multi-layer networks. There are ways to accomplish more complicated tasks by preprocessing training data and/or using several Perceptron/Adaline neurons, but even in these models the neurons still function largely independently (not in the coordinated way inherent to more complicated ANN models). We mention a couple of ways to extend these models in the Possible Extensions section.

There is an important development in the transition from the Perceptron to the Adaline model. Roughly speaking, the Perceptron learns by using a somewhat ad hoc training rule that can be motivated by a geometric argument. The training is a little cumbersome and relies on a linear structure in some ways, which makes it difficult to extend directly to more complicated and nonlinear models. Adaline introduces a shift in point of view, which is to train the neuron by minimizing an error function. This notion of minimization in place of a more geometric argument is much more easily extended to more complicated settings, which can be observed in the construction of the multi-layer networks.

Much of the information presented here is taken from Mitchell's book on machine learning, but several aspects are presented differently and at times more concretely. In particular, the details of the learning rules here are laid out in more detailed but less general terms. This description is also much less comprehensive, which allows a much shorter presentation of the material. It may be of use for those just starting to work with ANNs, but would probably be best used along with other resources.

2 Mathematical and Programming Content

To complete this project, a background in the following topics is recommended.

- Linear algebra: Some familiarity with matrix operations and dot products is necessary for this project. In addition, some understanding of separating hyperplanes is helpful, although a rigorous understanding of linear independence, linear combinations, bases, subspaces, diagonalization, etc. is not required.

- Convex optimization theory: A significant component of training the ANNs in this summary is based on gradient descent applied to a squared error function. A rough understanding of how gradient descent works, and how to use it to generate an iterative optimization scheme, is necessary to complete this project.

- Graph theory: A very rudimentary understanding of graph theory is helpful to understand ANN topologies. This background can be limited to a basic familiarity with weighted directed graphs.

- Programming: This project involves a fair amount of programming. A thorough understanding of working with vectors/matrices/arrays, decision statements, and loops is essential to implement these ANNs. For some of the applications, an understanding of computer graphics is also helpful.

3 Primary Resources

For much of the mathematical content listed above, typical textbooks in the pertinent areas are sufficient. Some additional resources on neural networks are the following (this is by no means a complete list).

- T. Mitchell, Machine Learning, McGraw-Hill, 1997.
- B. Kröse and P. van der Smagt, An Introduction to Neural Networks, Eighth Edition, University of Amsterdam, 1996. Available at archive.org.

4 Mathematical Description of the Project

Suppose we are given a training data set X_1,...,X_N ∈ ℝ^n, along with corresponding labels l_1,...,l_N ∈ {−1,1}. We will denote this training set T = {(X_1,l_1),...,(X_N,l_N)} ⊂ ℝ^n × {−1,1}. Our goal is to use this data to learn a function F : ℝ^n → ℝ such that F(X_i) ≈ l_i for all (X_i,l_i) ∈ T. (To avoid ambiguous notation, we reserve subscripted capital X_i's to represent x-values from the training set, and use x to represent an arbitrary element of ℝ^n.) We describe three ways of accomplishing this classification function F through ANNs: the Perceptron, Adaline, and multi-layer feed-forward networks. In each situation the function F depends on a set of weight parameters. The number and structure of the weight parameters used to define F depend on the network topology and learning structure; these will be described in more detail below. However, each of these networks works the same way from the perspective of a supervised learning algorithm. Each model has two stages: a learning stage, where the weight parameters are selected based on the training data, and a classification stage, where the function (using the weights learned in the learning stage) classifies novel inputs.

First we formulate the Perceptron ANN. (Technically, it may be misleading to call this an ANN, since we only describe the action of a single neuron. So this is not exactly a neural network, but rather a single node.) Define, for fixed θ ∈ ℝ and w ∈ ℝ^n, the function F : ℝ^n → {−1,1} by

F(x) = sgn(θ + w · x),

where sgn(x) = 1 if x ≥ 0 and sgn(x) = −1 if x < 0. Figure 1 shows a depiction of how to interpret this function F(x) as a neuron in the biological sense in the n = 2 situation. The parameters θ and w are the weight parameters for this model.

Figure 1: A depiction of how the Perceptron can be interpreted as a neuron; x_1 and x_2 are the input stimuli for the neuron, and the weight parameters θ and w determine when the neuron fires through the function sgn(θ + w · x).

We forego our description of the training of the Perceptron for the moment, and describe how it classifies novel input given θ and w. So suppose for a moment that we have already completed our training process, which means that the weight parameters θ and w are already determined. Then the decision function F classifies new inputs by using the hyperplane defined by θ + w · x = 0 to split ℝ^n in two (ignoring the issues that arise when a new input lies on this hyperplane). It may be easiest to think of this in n = 2 dimensions, where θ + w_1 x_1 + w_2 x_2 = 0 defines a line. If θ + w_1 x_1 + w_2 x_2 ≥ 0, then (x_1,x_2) lies on one side of the line (or on the line), and if θ + w_1 x_1 + w_2 x_2 < 0, then (x_1,x_2) lies on the other side. Hence ℝ^2 is split into two sets, and the Perceptron will fire or not according to the function F(x). Figure 2 below shows the decision rule learned by the Perceptron for a simulated training data set. This principle of the hyperplane θ + w · x = 0 for w ∈ ℝ^n splitting ℝ^n in two extends naturally to higher dimensions. This describes the classification stage for the Perceptron.

Figure 2: The left plot shows a simulated training data set where blue o's are labeled 1 and red o's are labeled −1. The right plot shows the decision rule learned by the Perceptron, where points above the line return 1 and points below the line return −1.

Now let's return to discuss how the Perceptron is trained. The purpose of the training stage is to figure out how to choose θ ∈ ℝ and w ∈ ℝ^n so as to correctly classify all the training data T = {(X_1,l_1),...,(X_N,l_N)}. To do this, fix a learning rate γ > 0, and repeatedly execute steps 1-3 below:

1. Choose (X_i,l_i) from the training set T = {(X_1,l_1),...,(X_N,l_N)}.
2. Compute y = θ + w · X_i.
3. If y · l_i < 0, update θ and w according to
   θ ← θ + γ (l_i − y)
   w ← w + γ (l_i − y) X_i.

Here the notation θ ← θ + γ(l − y) means overwrite the current value of θ with θ + γ(l − y). Note also that the update for w describes the update of the entire w vector; recall that X_i ∈ ℝ^n, so the right hand side is well defined as the sum of w ∈ ℝ^n and a scalar multiple of X_i ∈ ℝ^n.

There are several ways that this algorithm can be iterated. One can simply choose (X_i,l_i) randomly from the training data set T for some set number of iterations. Alternatively, one can cycle through all elements of the training data set in order several times. There are many other ways that will work.

One can interpret the Perceptron learning rule in the following way. We take an element (X_i,l_i) from T, and check if the Perceptron classifies it correctly (with the current weight parameters). If the point is classified correctly by the Perceptron, we leave the weights unchanged. If the point is misclassified, then we update the weights in such a way that the updated weights are more likely to classify it correctly.
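To make the training loop above concrete, here is one possible implementation sketch in Python. It is not part of the original project; the use of numpy, the random sampling strategy, the default learning rate, and the function names are all illustrative assumptions.

```python
import numpy as np

def train_perceptron(X, labels, gamma=0.1, iterations=1000, seed=0):
    """Train a Perceptron on data X (an N x n array) with labels in {-1, +1}.

    Follows steps 1-3 above: pick a training pair, compute y = theta + w . X_i,
    and update (theta, w) only when y * l_i < 0 (the point is misclassified).
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    theta, w = 0.0, np.zeros(n)
    for _ in range(iterations):
        i = rng.integers(N)               # step 1: choose (X_i, l_i) at random
        y = theta + w @ X[i]              # step 2: y = theta + w . X_i
        if y * labels[i] < 0:             # step 3: update only if misclassified
            theta += gamma * (labels[i] - y)
            w += gamma * (labels[i] - y) * X[i]
    return theta, w

def perceptron_classify(theta, w, x):
    """Classification stage: F(x) = sgn(theta + w . x), with sgn(0) = 1."""
    return 1 if theta + w @ x >= 0 else -1
```

With the simulated data of Figure 2, one would call train_perceptron on an N × 2 array of points and a length-N array of ±1 labels, and then use perceptron_classify on novel inputs.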

For larger values of γ, the weight update changes θ and w more sporadically. Making γ small will cause less sporadic changes, but if γ is too small it may take a long time for the Perceptron to train. It is not hard to make a geometric argument for defining this update rule, but it can also be justified using an optimization argument. We will mention this argument at the end of the description of the Adaline ANN.

There are a few things that should be observed with the Perceptron ANN. First, the decision rule shown in the right plot of Figure 2 does not appear to be optimally placed. Indeed, it is very close to the red cluster of the training set, and hence one would expect this particular training of the Perceptron to be susceptible to misclassifying red points as blue ones. This type of non-optimal placement of the decision boundary arises from the sporadic update rule, and from the fact that the Perceptron stops updating once it has classified all points correctly. So it will not try to determine the decision hyperplane optimally, only so that it classifies all points correctly. Second, in order for the Perceptron to work well, the training data must be linearly separable. If it is not, the decision line will continue to change sporadically (depending on the size of γ), and it has no hope of fully classifying the training data. In Figure 3, we show a simulated training data set that is not linearly separable, and hence for which the Perceptron (at least applied directly as described here) will fail. The Adaline ANN will provide a solution that better addresses the first issue (see also support vector machines for a solution to this deficit of the Perceptron), but it will still be limited by the second issue in the same way the Perceptron is. The second issue can be solved using the last model we discuss, multi-layer ANNs.

Figure 3: The left plot shows a simulated training data set of an exclusive or type decision where blue o's are labeled 1 and red o's are labeled −1. The right plot shows the decision rule learned by the Perceptron, which has failed to generate an accurate decision rule.

Adaline is an acronym for Adaptive Linear Element. It is still limited to a linear classifier, but it provides an important shift in point of view for updating ANNs. For this model, we modify F(x) slightly. Define

σ(x) = tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}),

and F(x) = σ(θ + w · x), where θ ∈ ℝ and w ∈ ℝ^n are again our weight parameters. Suppose again that we have already trained our Adaline ANN, and hence have fixed weight parameters θ ∈ ℝ and w ∈ ℝ^n. Then Adaline can be used to classify new data x ∈ ℝ^n by evaluating F(x). To view this in the way that neurons are traditionally viewed, with a binary output, we can simply say the neuron fires if F(x) ≥ 0 and doesn't fire if F(x) < 0. From a machine learning perspective, it is typically more informative to retain the extra information contained in F(x), rather than just the binary output ±1. This describes the classification function of Adaline. Figure 4 shows a schematic for the Adaline ANN in n = 2 dimensions, which is a slight modification of the Perceptron pictured in Figure 1; the only difference is that the smooth function σ(x) is used in place of sgn(x).
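As a small illustration of this classification stage, the following sketch (an assumption of this summary, not part of the original project) evaluates the Adaline decision function with σ = tanh, returning either the real-valued output or its binary reading.

```python
import numpy as np

def sigma(x):
    """Adaline activation: sigma(x) = tanh(x)."""
    return np.tanh(x)

def adaline_output(theta, w, x):
    """Real-valued output F(x) = sigma(theta + w . x)."""
    return sigma(theta + np.dot(w, x))

def adaline_classify(theta, w, x):
    """Binary reading of the output: fire (+1) if F(x) >= 0, otherwise -1."""
    return 1 if adaline_output(theta, w, x) >= 0 else -1
```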

Figure 4: A depiction of how the Adaline can be interpreted as a neuron; x_1 and x_2 are the input stimuli for the neuron, and the parameters θ and w determine when the neuron fires through the sign of the function σ(θ + w · x).

It remains to describe the weight parameter training for Adaline. Our goal in this training is to choose the weight parameters θ and w so that F(X_i) ≈ l_i for all elements of our training set (X_i,l_i) ∈ {(X_1,l_1),...,(X_N,l_N)}. The important shift in point of view for this model is to define an error associated to our training data classification (as a function of the weights), and to choose our weights by minimizing that error. In particular, define the squared error functions

E_i(θ,w) = (1/2) (σ(θ + w · X_i) − l_i)^2 for i = 1,2,...,N, and E(θ,w) = Σ_{i=1}^N E_i(θ,w).

Note that the training data (X_i,l_i) are treated as fixed quantities here. Then for a given θ ∈ ℝ and w ∈ ℝ^n, E_i(θ,w) is half of the squared error of our classification σ(θ + w · X_i) of the training data point (X_i,l_i) ∈ T, and E(θ,w) is half of the cumulative squared error over the entire training set T. (The factor 1/2 is just for convenience, to simplify computations; the algorithm could just as easily be formulated without it.) Note that E takes into account all of the information provided to us by the training set, and we've expressed it as a function of the weight parameters θ and w. Hence we have posed our weight training process as an optimization problem: choose θ ∈ ℝ and w ∈ ℝ^n that minimize E(θ,w).

In order to solve this optimization problem, we use a gradient descent algorithm applied successively to the incremental marginal error functions E_i(θ,w). Roughly speaking, this algorithm is formulated in the following way. Fix a training element (X_i,l_i) ∈ T. Treating θ and w as variables, we compute ∂E_i/∂θ and ∂E_i/∂w_j for each j = 1,2,...,n. Then, given the current values of θ and w, we update them by moving in the direction of steepest descent of the error function, given by −∂E_i/∂θ and −∂E_i/∂w_j. That is, we replace θ and w_j according to the following rule:

θ ← θ − γ ∂E_i/∂θ
w_j ← w_j − γ ∂E_i/∂w_j for j = 1,2,...,n.

Once again we interpret the ← here as overwriting θ and w_j, where the right hand side is computed with the current values of θ and w. Now we compute the partial derivatives above to formulate the update rule. For the θ update rule, we calculate

∂E_i/∂θ = (σ(θ + w · X_i) − l_i) · ∂/∂θ [σ(θ + w · X_i)] = (σ(θ + w · X_i) − l_i) · σ′(θ + w · X_i),

and similarly

∂E_i/∂w_j = (σ(θ + w · X_i) − l_i) · σ′(θ + w · X_i) · (X_i)_j.

Here (X_i)_j denotes the j-th component of X_i ∈ ℝ^n. Also note that if we take σ(x) = tanh(x) as above, then σ′(x) = sech^2(x) = 4/(e^x + e^{−x})^2. If we define

δ = (l_i − σ(θ + w · X_i)) · σ′(θ + w · X_i),

then the update becomes simply θ ← θ + γ δ and w ← w + γ δ X_i. This update rule, simplified by computing δ in this way, is often referred to as the δ-rule. In fact, this δ-rule can be generalized to more complicated settings and will be convenient for our description of multi-layer ANNs. For a fixed learning rate γ > 0, we train the Adaline ANN by repeatedly updating θ and w according to the following steps:

1. Choose (X_i,l_i) from the training set T = {(X_1,l_1),...,(X_N,l_N)}.
2. Compute δ = (l_i − σ(θ + w · X_i)) · σ′(θ + w · X_i).
3. Update θ and w according to
   θ ← θ + γ δ
   w ← w + γ δ X_i.

Once again, there are different strategies for iterating these steps. For this implementation, it is typical to repeatedly cycle through the entire training set, i = 1,2,...,N, updating θ and w for each training element. Every full cycle through the training set is sometimes called an epoch. This provides a natural way to report the squared error: at the end of each epoch, you can compute E(θ,w) as defined above, and then measure the success of your training in terms of E(θ,w) versus the number of epochs.

Figure 5 below demonstrates the outcome of an Adaline classification on simulated training data. It should be noted that Adaline does a better job of placing the decision line (the right plot of Figure 5) than the Perceptron does for similar training data (the right plot of Figure 2). Adaline continues to update its weights even if all training data points are classified correctly; this is a consequence of the error minimization approach of the Adaline model. This idea can be extended a little to conclude that Adaline is better equipped to classify clusters with limited amounts of overlap (that just fall short of being linearly separable). It should also be noted that Adaline cannot effectively address exclusive or type data, that is, the situation in Figure 3. Since Adaline still relies on a linear classifier, it is not a good choice for classifying data that is not linearly separable in this way (at least in the initial formulation presented here; another option to solve this is described in the Possible Extensions section). It is worth noting that this formulation allows for some flexibility in the choice of the function σ. Modifying σ allows one to model 0-1 neurons rather than ±1 neurons, or even to model more general outputs for appropriately chosen functions σ.
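The δ-rule loop above translates directly into code. The following is a minimal sketch under the same assumptions as before (numpy, σ = tanh); the epoch-based cycling matches the description above, while the default learning rate and the error-reporting format are illustrative choices.

```python
import numpy as np

def sigma(x):
    return np.tanh(x)

def sigma_prime(x):
    # derivative of tanh: sech^2(x) = 1 - tanh^2(x)
    return 1.0 - np.tanh(x) ** 2

def train_adaline(X, labels, gamma=0.05, epochs=100):
    """Train Adaline with the delta-rule on data X (N x n) and labels in {-1, +1}.

    Cycles through the training set once per epoch and records the cumulative
    squared error E(theta, w) at the end of each epoch.
    """
    N, n = X.shape
    theta, w = 0.0, np.zeros(n)
    errors = []
    for _ in range(epochs):
        for i in range(N):                       # one epoch: i = 1, ..., N
            s = theta + w @ X[i]
            delta = (labels[i] - sigma(s)) * sigma_prime(s)
            theta += gamma * delta               # theta <- theta + gamma*delta
            w += gamma * delta * X[i]            # w <- w + gamma*delta*X_i
        E = 0.5 * sum((sigma(theta + w @ X[i]) - labels[i]) ** 2 for i in range(N))
        errors.append(E)
    return theta, w, errors
```

Plotting the returned errors against the epoch number gives the squared error versus epochs curve described above.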

Figure 5: The left plot shows a simulated training data set where blue o's are labeled 1 and red o's are labeled −1. The right plot shows the decision rule learned by Adaline, where points above the line return 1 and points below the line return −1.

Finally, we consider a multi-layer feed-forward network. This involves introducing a hidden layer of neurons to the model (we will only address models with a single hidden layer, but one can extend the ideas here to multiple hidden layers). Suppose we wish to construct a network with one hidden layer made up of n_h ∈ ℕ neurons. Let τ ∈ ℝ, θ, v ∈ ℝ^{n_h}, and W ∈ ℝ^{n_h × n}, and define

F(x) = σ(τ + v · a), where a = σ(θ + W x) ∈ ℝ^{n_h}.

Here a = a(x) is a function of x ∈ ℝ^n, and we interpret σ(y) = (σ(y_1),...,σ(y_{n_h})) componentwise for y ∈ ℝ^{n_h}. Now we consider τ ∈ ℝ, θ, v ∈ ℝ^{n_h}, and W ∈ ℝ^{n_h × n} all to be our weight parameters to be chosen in the learning stage of the algorithm. As before, the classification function of this multi-layer network (when τ, θ, v, and W are fixed after training) works simply by plugging an element x ∈ ℝ^n into F. When F(x) ≥ 0, we classify x as group 1 (i.e. the neuron fires), and when F(x) < 0, we classify x as group −1 (i.e. the neuron doesn't fire). In Figure 6, we depict an ANN with n = 2 inputs, a single hidden layer made up of n_h = 5 neurons, and a single output neuron.

Figure 6: A depiction of the multi-layer feed-forward ANN as a network of neurons; x_1 and x_2 are the input stimuli, and the parameters τ, v, θ, and W determine the output through the function σ(τ + v · σ(θ + W x)). Here W is a 5 × 2 matrix.

It remains to describe the training rule for this multi-layer network. We do this in a similar way to the Adaline ANN training, by defining an error function and using a gradient descent approach to define weight updates that work towards minimizing the error function. To account for the multiple layers of weights in this model, we use a notion of back-propagation of error. Roughly speaking, this means that we adjust the output layer weights v and use those computations to inform our adjustment of the weights W in the preceding layer. More precisely, the updates are formulated as follows. Define

E_i(τ,v,θ,W) = (1/2) (σ(τ + v · σ(θ + W X_i)) − l_i)^2 for i = 1,2,...,N, and E(τ,v,θ,W) = Σ_{i=1}^N E_i(τ,v,θ,W).

To implement the gradient descent algorithm, we will need to compute the partial derivatives of E_i with respect to τ, each component of v, each component of θ, and each component of W. We first handle the output layer weights, whose update rules come out very similar to those of the Adaline model:

∂E_i/∂τ = −δ
∂E_i/∂v_j = −δ · (σ(θ + W X_i))_j,

where δ = (l_i − σ(τ + v · σ(θ + W X_i))) · σ′(τ + v · σ(θ + W X_i)). Note again that we use the notation σ(θ + W X_i) = (σ(θ_1 + (W X_i)_1),...,σ(θ_{n_h} + (W X_i)_{n_h})), and so v · σ(θ + W X_i) is the dot product of two elements of ℝ^{n_h}. So the update rules for the output layer are τ ← τ + γ δ and v ← v + γ δ σ(θ + W X_i), where δ is defined as above. For the hidden layer weight parameter updates, we consider the following computation:

∂E_i/∂θ_j = −δ · ∂/∂θ_j [τ + v · σ(θ + W X_i)] = −δ · v_j · σ′(θ_j + (W X_i)_j),
∂E_i/∂w_{j,k} = −δ · ∂/∂w_{j,k} [τ + v · σ(θ + W X_i)] = −δ · v_j · σ′(θ_j + (W X_i)_j) · (X_i)_k,

where δ is as above. Now we can easily formulate the training rules for choosing τ, v, θ, and W. We iteratively update the multi-layer ANN weight parameters according to the following:

1. Choose (X_i,l_i) from the training set T = {(X_1,l_1),...,(X_N,l_N)}.
2. Compute
   y = W X_i
   a = (σ(θ_1 + y_1), σ(θ_2 + y_2),...,σ(θ_{n_h} + y_{n_h}))
   ã = (v_1 σ′(θ_1 + y_1), v_2 σ′(θ_2 + y_2),...,v_{n_h} σ′(θ_{n_h} + y_{n_h}))
   b = σ(τ + v · a)
   b̃ = σ′(τ + v · a)
   δ = (l_i − b) · b̃.
3. Update τ, v, θ, and W according to
   τ ← τ + γ δ
   v_j ← v_j + γ δ a_j for j = 1,2,...,n_h
   θ_j ← θ_j + γ δ ã_j for j = 1,2,...,n_h
   w_{j,k} ← w_{j,k} + γ δ ã_j (X_i)_k for j = 1,2,...,n_h and k = 1,2,...,n.

Step 2 above is sometimes referred to as the feed-forward part of the algorithm, and step 3 as the back-propagation portion. That is, in step 2 we take input data X_i and feed it forward through the ANN to arrive at its classification b = σ(τ + v · a) (and record several other quantities along the way). Then we measure the error in the δ = (l_i − b) · b̃ term, and propagate it back through the layers to update each weight parameter.

This multi-layer network, as described above, is capable of solving the exclusive or type problem that neither the Perceptron nor the Adaline ANN could solve (at least applied directly). We implemented an ANN to the specifications above in n = 2 dimensions, with a single hidden layer made up of n_h = 5 neurons, applied to the simulated exclusive or type training data shown in the left plot of Figure 7. The decision function and decision boundary obtained are shown in various formats in the right plot of Figure 7 and in Figure 8.
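As with the earlier models, the feed-forward/back-propagation loop in steps 1-3 can be written out compactly. The sketch below is one possible implementation under the same assumptions as before (numpy, σ = tanh); the small random weight initialization, the epoch-based cycling, and the default parameters are illustrative choices rather than part of the project specification.

```python
import numpy as np

def sigma(x):
    return np.tanh(x)

def sigma_prime(x):
    return 1.0 - np.tanh(x) ** 2          # tanh'(x) = sech^2(x)

def train_mlp(X, labels, n_hidden=5, gamma=0.05, epochs=2000, seed=0):
    """Single-hidden-layer feed-forward ANN trained by back-propagation.

    X is an N x n array and labels take values in {-1, +1}.
    Returns the trained weights (tau, v, theta, W).
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    tau = rng.normal(scale=0.1)
    v = rng.normal(scale=0.1, size=n_hidden)
    theta = rng.normal(scale=0.1, size=n_hidden)
    W = rng.normal(scale=0.1, size=(n_hidden, n))
    for _ in range(epochs):
        for i in range(N):
            # Step 2: feed forward, recording the intermediate quantities.
            y = W @ X[i]
            a = sigma(theta + y)
            a_tilde = v * sigma_prime(theta + y)
            b = sigma(tau + v @ a)
            b_tilde = sigma_prime(tau + v @ a)
            delta = (labels[i] - b) * b_tilde
            # Step 3: back-propagate the error to update every weight.
            tau += gamma * delta
            v += gamma * delta * a
            theta += gamma * delta * a_tilde
            W += gamma * delta * np.outer(a_tilde, X[i])
    return tau, v, theta, W

def mlp_output(tau, v, theta, W, x):
    """F(x) = sigma(tau + v . sigma(theta + W x)); classify by its sign."""
    return sigma(tau + v @ sigma(theta + W @ x))
```

With n = 2 inputs and n_hidden = 5, this mirrors the network of Figure 6; applied to exclusive or type data, it should produce decision regions of the kind shown in Figures 7 and 8.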

Figure 7: The left plot shows a simulated training data set where blue o's are labeled 1 and red o's are labeled −1. The right plot shows the decision rule learned by a multi-layer ANN with a single hidden layer made up of n_h = 5 neurons.

Figure 8: The left plot shows a simulated training data set, and regions colored according to their classification by a multi-layer ANN with one hidden layer of n_h = 5 neurons. The right plot shows a plot of the learned function F(x) with the simulated data plotted on top of it.

5 Possible Extensions

There are many, many directions in which the work here can be extended. We first mention some extensions that are possible using only the Perceptron and Adaline (linear classifier) models. It is possible to preprocess training data so that they can solve exclusive or type problems (as well as other non-linearly separable classification problems). This can be done by simply transforming the original training data, embedding it into a higher dimensional space. For example, suppose we have training data T = {(X_1,l_1),...,(X_N,l_N)}, where each X_i ∈ ℝ^2. We can use the transformation P_2(x_1,x_2) = (x_1, x_2, x_1^2, x_1 x_2, x_2^2) to create an alternate training set T′ = {(P_2(X_1),l_1),...,(P_2(X_N),l_N)}, which now contains labeled data P_2(X_i) ∈ ℝ^5. Now if we apply either the Perceptron or Adaline to this higher dimensional training data T′, we end up with decision boundaries in ℝ^2 that are allowed to be conic sections rather than simply lines. Of course, we could define higher order transformations P_n that allow for polynomials of arbitrary degree (or other functions as well). This highlights a principle in mathematics that, in many situations, one can relax some structural limitations of a model (like requiring a linear classifier) by embedding the problem into higher dimensions (compare this, for example, with reduction of order techniques in ordinary differential equation theory).
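As an illustration of this preprocessing idea, the following sketch (an assumed helper, not part of the original project) applies the quadratic embedding P_2 so that the linear models described earlier can be reused on the transformed data.

```python
import numpy as np

def P2(x):
    """Quadratic embedding P_2(x1, x2) = (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = x
    return np.array([x1, x2, x1 ** 2, x1 * x2, x2 ** 2])

def embed_training_set(X):
    """Map an N x 2 training array into R^5, row by row."""
    return np.array([P2(x) for x in X])

# The embedded data can then be fed to the linear models sketched earlier,
# e.g. train_adaline(embed_training_set(X), labels).
```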

Another extension is to apply the multi-layer network to approximate a function f : ℝ^n → ℝ given several samples from the graph {(x, f(x))} ⊂ ℝ^n × ℝ. This application is really just a slight shift in point of view, allowing the label to be generally real-valued rather than ±1. Another minor extension is to extend the multi-layer networks described here to ones that allow for many classes, rather than just a binary classification ±1. This can be done by allowing for more than one output neuron in the network. One can also allow for more than one hidden layer (though this may not be very worthwhile, since it is known that single-hidden-layer networks are capable of classifying any pattern given enough neurons in the hidden layer). Beyond that, one could even develop algorithms like the one above for feed-forward ANNs with a topology described by any acyclic directed graph. Other directions include working with Bayesian neural networks (ANNs trained through a Bayesian learning weight parameter update formulation), convolutional neural networks (ANNs that augment multi-layer networks with preprocessing and feature extraction techniques), lattice algebra neural networks (ANNs that, roughly speaking, replace the summation in the dot product w · x with a maximum max(w_1 x_1,...,w_n x_n)), or recurrent neural networks (ANNs that allow for feedback loops in their network topologies). Each of these takes ANNs in a different direction than what was discussed in this project summary, but aspects of the foundational theory are the same.

6 Note From the Author

This is a student project from the Math and Biomedical Research course, taught by the current author Jarod Hart, offered at the University of Kansas in the Spring semester. Some modifications and additions were made to the original project for this summary. The course is supported by the Initiative for Maximizing Student Development (IMSD) through an NIH grant, NIH-NIGMS 5R25GM. The PIs of this IMSD grant are Professors Estela Gavosto (Mathematics Department) and James Orr (Biology Department). We are happy to share these project ideas, and welcome those who are interested to use them. We'd love to hear about your results and extensions related to these projects, and in some cases will provide some support for the projects. Please contact Jarod Hart at jvhart@ku.edu with any typos, errors, questions, or comments about this project summary.
