Pattern Recognition and Machine Learning: Artificial Neural Networks


Pattern Recognition and Machine Learning
James L. Crowley
ENSIMAG 3 - MMIS, Fall Semester 2016
Lesson 7, 14 Dec 2016

Outline: Artificial Neural Networks
Notation
Introduction
The Artificial Neuron
The Neural Network Model
Network Structures for Simple Feed-Forward Networks
Regression Analysis
Linear Models
Estimation of a Hyperplane with Supervised Learning
Gradient Descent
Practical Considerations for Gradient Descent
Regression for a Sigmoid Activation Function
Gradient Descent
Regularisation

Using notation and figures from the Stanford Deep Learning tutorial.

Notation

x_d           A feature: an observed or measured value.
X             A vector of D features.
D             The number of dimensions of the vector X.
{X_m}, {y_m}  Training samples for learning.
M             The number of training samples.
a_i^(l)       The activation output of the i-th neuron of layer l.
w_ij^(l)      The weight connecting unit j of layer l to unit i of layer l+1.
b_i^(l)       The bias term for unit i of layer l+1.

Introduction

Artificial Neural Networks, also referred to as Multi-layer Perceptrons, are computational structures composed of weighted sums of neural units. Each neural unit is composed of a weighted sum of input units, followed by a non-linear decision function.

Note that the term "neural" is misleading. The computational mechanism of a neural network is only loosely inspired by neural biology. Neural networks do NOT implement the same learning and recognition algorithms as biological systems.

The approach was first proposed by Warren McCulloch and Walter Pitts in 1943 as a possible universal computational model. During the 1950s, Frank Rosenblatt developed the idea to provide a trainable machine for pattern recognition, called a Perceptron. The perceptron is an incremental learning algorithm for linear classifiers invented by Frank Rosenblatt. The first Perceptron was a room-sized analog computer that implemented Rosenblatt's learning and recognition functions. Both the learning algorithm and the resulting recognition algorithm are easily implemented as computer programs.

In 1969, Marvin Minsky and Seymour Papert of MIT published a book entitled "Perceptrons" that claimed to document the fundamental limitations of the perceptron approach. Notably, they claimed that a perceptron could not be constructed to perform an exclusive OR.

In the 1970s, frustration with the limits of Artificial Intelligence research based on Symbolic Logic led a small community of researchers to explore the perceptron-based approach. In 1973, Stephen Grossberg showed that a two-layered perceptron could overcome the problems raised by Minsky and Papert, and solve many problems that plagued symbolic AI. In 1975, Paul Werbos developed an algorithm referred to as Back-Propagation, which uses gradient descent to learn the parameters of perceptrons from classification errors on training data.

During the 1980s, Neural Networks went through a period of popularity, with researchers showing that networks could be trained to provide simple solutions to problems such as recognizing handwritten characters, recognizing spoken words, and steering a car on a highway. However, these results were overtaken by more mathematically sound approaches for statistical pattern recognition, such as support vector machines and boosted learning.

In 1998, Yann LeCun showed that convolutional networks composed from many layers could outperform other approaches on recognition problems. Unfortunately, such networks required extremely large amounts of data and computation. Around 2010, with the emergence of cloud computing combined with planetary-scale data, convolutional networks became practical. Since 2012, Deep Networks have outperformed other approaches for recognition tasks common to Computer Vision, Speech and Robotics. A rapidly growing research community currently seeks to extend the application beyond recognition to the generation of speech and robot actions.

The Artificial Neuron

The simplest possible neural network is composed of a single neuron. A neuron is a computational unit that integrates information from a vector of D features, X, to produce the likelihood of a hypothesis, h_{w,b}:

   a = h_{w,b}(X)

The neuron is composed of a weighted sum of input values

   z = w_1·x_1 + w_2·x_2 + ... + w_D·x_D + b

followed by a non-linear activation function f(z) (sometimes written σ(z)):

   a = h_{w,b}(X) = f(w^T·X + b)

Many different activation functions are used. A popular choice for hidden layers is the sigmoid (logistic) function:

   f(z) = 1 / (1 + e^{-z})
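To make the single neuron concrete, the following is a minimal NumPy sketch (not from the original notes) of one unit with a sigmoid activation; the weight vector w, bias b and input x are illustrative values.

import numpy as np

def sigmoid(z):
    """Logistic activation f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Single artificial neuron: weighted sum followed by a non-linear activation."""
    z = np.dot(w, x) + b          # z = w^T x + b
    return sigmoid(z)             # a = h_{w,b}(x) = f(w^T x + b)

# Example with D = 3 features (illustrative values).
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.05
a = neuron(x, w, b)
print("activation:", a, "decision:", "POSITIVE" if a > 0.5 else "NEGATIVE")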

This function is useful because its derivative is:

   df(z)/dz = f(z)·(1 - f(z))

This gives a decision function: if h_{w,b}(X) > 0.5 then POSITIVE else NEGATIVE.

Other popular activation functions include the hyperbolic tangent and the softmax.

The hyperbolic tangent:

   f(z) = tanh(z) = (e^z - e^{-z}) / (e^z + e^{-z})

The hyperbolic tangent is a rescaled form of the sigmoid, ranging over [-1, 1].

The rectified linear function (ReLU) is also popular:

   f(z) = max(0, z)

While the derivative of ReLU is discontinuous at z = 0, for z > 0:

   df(z)/dz = 1

(Plot from A. Ng showing the sigmoid, tanh and ReLU functions.)

Note that the choice of decision function will determine the target variable y for supervised learning.
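The three activation functions above, together with the sigmoid derivative used later for gradient descent, can be written directly in a few lines. This is a small illustration, not code from the notes.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    f = sigmoid(z)
    return f * (1.0 - f)          # df/dz = f(z)(1 - f(z))

def tanh(z):
    return np.tanh(z)             # (e^z - e^-z) / (e^z + e^-z)

def relu(z):
    return np.maximum(0.0, z)     # max(0, z)

z = np.linspace(-4, 4, 9)
print(sigmoid(z))
print(d_sigmoid(z))
print(tanh(z))
print(relu(z))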

The Neural Network Model

A neural network is a multi-layer assembly of neurons. For example, this is a 2-layer network:

(Figure: a two-layer feed-forward network with three inputs, a hidden layer of three units, and a single output.)

The circles labeled +1 are the bias terms. The circles on the left are the input terms, also called L_1. This is called the input layer. Note that many authors do not count this as a layer. The rightmost circle is the output layer (in this case, only one node), also called L_3. The circles in the middle are referred to as a hidden layer, L_2.

Here we follow the notation used by Andrew Ng. The parameters carry a superscript referring to their layer. For example, a_1^(2) is the activation output of the first neuron of the second layer, and w_13^(2) is the weight connecting unit 3 of layer 2 to unit 1 of layer 3.

The above network would be described by:

   a_1^(2) = f( w_11^(1)·X_1 + w_12^(1)·X_2 + w_13^(1)·X_3 + b_1^(1) )
   a_2^(2) = f( w_21^(1)·X_1 + w_22^(1)·X_2 + w_23^(1)·X_3 + b_2^(1) )
   a_3^(2) = f( w_31^(1)·X_1 + w_32^(1)·X_2 + w_33^(1)·X_3 + b_3^(1) )
   h_{w,b}(X) = a_1^(3) = f( w_11^(2)·a_1^(2) + w_12^(2)·a_2^(2) + w_13^(2)·a_3^(2) + b_1^(2) )
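A literal transcription of the four equations above into NumPy might look like the following sketch; the weight values are arbitrary placeholders chosen only to make the example run.

import numpy as np

def f(z):                          # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1 -> 2: three hidden units, each fed by the 3 inputs (placeholder values).
W1 = np.array([[0.1, -0.2, 0.3],   # w^(1)_{1j}
               [0.0,  0.5, -0.1],  # w^(1)_{2j}
               [0.2,  0.1,  0.4]]) # w^(1)_{3j}
b1 = np.array([0.1, -0.1, 0.0])

# Layer 2 -> 3: one output unit fed by the 3 hidden activations.
W2 = np.array([[0.3, -0.4, 0.2]])  # w^(2)_{1j}
b2 = np.array([0.05])

X = np.array([1.0, 2.0, -1.0])

a2 = f(W1 @ X + b1)                # a^(2)_i = f( sum_j w^(1)_{ij} X_j + b^(1)_i )
h  = f(W2 @ a2 + b2)               # h_{w,b}(X) = a^(3)_1
print("hidden activations:", a2, "output:", h)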

This can be generalized to multiple layers. (Figure taken from the Stanford UFLDL tutorial, Ng et al.)

Note that we can recognize multiple classes by learning multiple h_{w,b}(X) functions.

In general (following the notation of Ng), this can be described by:

   L_1 is the input layer.
   L_l is the l-th layer.
   L_N is the output layer.
   w_ij^(l) denotes the parameter (weight) connecting unit j of layer l to unit i of layer l+1.
   b_i^(l) is the bias term for unit i of layer l+1.
   a_i^(l) is the activation of unit i in layer l.

Note that many authors prefer to define the input layer as l = 0, so that l counts the number of hidden layers.

In deriving the regression algorithms for learning, we will use

   z_i^(l+1) = Σ_{j=1}^{N_l} w_ij^(l)·a_j^(l) + b_i^(l)

It will be more convenient to represent this using vectors:

   z^(l) = ( z_1^(l), z_2^(l), ..., z_{N_l}^(l) )^T

As defined, the neural network computes a forward propagation of the form:

   z^(l+1) = W^(l)·a^(l) + b^(l)
   a^(l+1) = f( z^(l+1) )
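In vector form, forward propagation is just a loop over layers. The sketch below (an assumed structure, not code from the notes) applies z^(l+1) = W^(l)·a^(l) + b^(l) and a^(l+1) = f(z^(l+1)) to an arbitrary list of weight matrices; the layer sizes and random parameters are illustrative.

import numpy as np

def f(z):
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation

def forward(X, weights, biases):
    """Feed-forward pass: weights[l] is W^(l), biases[l] is b^(l)."""
    a = X
    for W, b in zip(weights, biases):
        z = W @ a + b                 # z^(l+1) = W^(l) a^(l) + b^(l)
        a = f(z)                      # a^(l+1) = f(z^(l+1))
    return a

# Example: 3 inputs -> 4 hidden units -> 2 outputs, with random illustrative parameters.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases  = [np.zeros(4), np.zeros(2)]
print(forward(np.array([1.0, 0.5, -0.5]), weights, biases))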

This is called a Feed-Forward Network. Note that feed-forward networks do not contain loops.

The weights of a neural network are commonly trained using a technique called back-propagation (back-propagation of errors). Back-propagation is a form of regression analysis. The most common approach is to employ a form of Gradient Descent. Gradient descent defines a loss function over the weights of the network and then iteratively seeks to minimize this loss function. This is commonly performed as a form of supervised learning using labeled training data. Note that gradient descent requires that the activation function be differentiable.

Network Structures for Simple Feed-Forward Networks

The architecture of a neural network refers to the number of layers, the number and kind of neurons in each layer, and the kinds of neural computation performed. For a simple feed-forward network:

The number of input neurons is determined by the number of input values for each observation (the size of the feature vector, D).
The number of output neurons is the number of classes to be recognized.
The number of hidden layers is determined by the complexity of the recognition problem and the amount of training data available.

If your training data is linearly separable (if there exists a hyperplane between the two classes), then you do not even need a hidden layer: use a simple linear classifier, for example one defined by a Support Vector Machine or Linear Discriminant Analysis. A single hidden layer is often sufficient for many problems (although more reliable solutions can sometimes be found using Support Vector Machines or other techniques).

The number of neurons in a hidden layer depends on the complexity of the problem and the amount of training data available. In theory, any problem can be solved with a single hidden layer, given enough training data. In practice, the required quantity of training data may not be practical. Experience shows that some very difficult problems can be solved with less training data by adding additional layers.

However, the spectacular progress of the last few years has been obtained by the introduction of new computational models such as Convolutional Neural Networks, pooling, auto-encoders and Long Short-Term Memory. Many of these are actually known pattern recognition techniques recycled into a neural network framework. But before we get to more advanced techniques, we need to look at the basics.

Regression Analysis

The parameters of a feed-forward network are commonly learned using regression analysis of labeled training data. Regression is the estimation of the parameters of a function that maps a set of independent variables onto a dependent variable:

   ŷ = f(X, w)

where X is a vector of D independent (unknown) variables, ŷ is an estimate for a variable y that depends on X, f() is a function that maps X onto ŷ, and w is a vector of parameters for the model.

Note: for ŷ, the hat indicates an estimated value for the target value y. X is upper case because it is a random (unknown) vector.

Regression analysis refers to a family of techniques for modeling and analyzing the mapping of one or more independent variables onto a dependent variable. For example, consider a table of age, height (m) and weight (kg) for 10 females. We can use any two variables to estimate the third: regression gives the parameters of a function that predicts a feature ŷ from the two other features X. For example, we can predict weight from height and age as a function

   ŷ = f(X, w)

where ŷ = Weight, X = (Age, Height)^T, and w are the model parameters.

Linear Models

A linear model has the form

   ŷ = f(X, w) = w^T·X + b = w_1·x_1 + w_2·x_2 + ... + w_D·x_D + b

The vector w = (w_1, w_2, ..., w_D)^T contains the parameters of the model that relates X to ŷ.

The equation w^T·X + b = 0 is a hyperplane in a D-dimensional space: w is the normal to the hyperplane and b is a constant term.

It is generally convenient to include the constant as part of the parameter vector and to add an extra constant term to the observed feature vector. This gives a linear model with D+1 parameters, where the vectors are:

   X = (1, x_1, ..., x_D)^T   and   w = (w_0, w_1, ..., w_D)^T

where w_0 represents b. This gives the homogeneous equation for the model:

   ŷ = f(X, w) = w^T·X

Homogeneous coordinates provide a unified notation for geometric operations. With this notation, we can predict weight from height and age using a function learned by regression analysis:

   ŷ = f(X, w)

where ŷ = Weight, X = (1, Age, Height)^T, and w = (w_0, w_1, w_2)^T are the model parameters; the surface is a plane in the space (weight, age, height).

In a D-dimensional space, a linear homogeneous equation is a hyperplane. The perpendicular distance of an arbitrary point from the plane is computed as

   d = w_0 + w_1·x_1 + w_2·x_2

(assuming the normal (w_1, w_2) has unit length; otherwise divide by its norm). This can be used as an error.

Estimation of a Hyperplane with Supervised Learning

In supervised learning, we learn the parameters of a model from a labeled set of training data. The training data is composed of M sets of independent variables, {X_m}, for which we know the value of the dependent variable {y_m}. The training data is the set {X_m}, {y_m}.

For a linear model, learning the parameters of the model from a training set is equivalent to estimating the parameters of a hyperplane using least squares. In the case of a linear model, there are many ways to estimate the parameters.
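A brief sketch of the homogeneous form: prepend a constant 1 to the feature vector, fold b into w_0, and evaluate w^T·X. With a unit-length normal the same expression gives the signed distance from the hyperplane; the numbers below are illustrative and not from the notes.

import numpy as np

def homogeneous(x):
    """Prepend the constant 1 so that the bias b becomes w_0."""
    return np.concatenate(([1.0], x))

w = np.array([0.5, 2.0, -1.0])        # w_0 (bias), w_1, w_2
x = np.array([1.5, 0.25])             # observed features

y_hat = w @ homogeneous(x)            # y^ = w^T X
print("prediction:", y_hat)

# Signed perpendicular distance from the plane w^T X = 0
# (divide by the norm of the normal (w_1, ..., w_D) when it is not unit length).
dist = y_hat / np.linalg.norm(w[1:])
print("signed distance:", dist)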

For example, matrix algebra provides a direct, closed-form solution. Assume a training set of M observations {X_m}, {y_m}, where the constant 1 is included as the 0th term of each X_m and b as w_0 of w:

   X_m = (1, x_1m, ..., x_Dm)^T   and   w = (w_0, w_1, ..., w_D)^T

We seek the parameters of a linear model:

   ŷ = f(X, w) = w^T·X

These can be determined by minimizing a loss function defined as the square of the error:

   L(w) = Σ_{m=1}^{M} ( w^T·X_m - y_m )^2

To build our loss function, we use the M training samples to compose a matrix X and a vector Y:

   X = [ X_1  X_2  ...  X_M ]        (D+1 rows by M columns, one column per sample)
   Y = ( y_1, y_2, ..., y_M )^T      (M rows)

We can factor the loss function to obtain:

   L(w) = (X^T·w - Y)^T (X^T·w - Y)

To minimize the loss function, we calculate the derivative and solve for w when the derivative is 0:

   ∂L(w)/∂w = 2·X·X^T·w - 2·X·Y = 0

which gives X·X^T·w = X·Y and thus

   w = (X·X^T)^{-1}·X·Y

While this is an elegant solution for linear regression, it does not generalize to other models. A more general approach is to use Gradient Descent.
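The closed-form solution can be checked with a few lines of NumPy. Here X is built column-per-sample as in the text ((D+1) × M); the age/height/weight numbers are made-up illustrative values, not data from the notes.

import numpy as np

# Illustrative training data: predict weight from (1, age, height).
ages    = np.array([20.0, 25.0, 30.0, 35.0, 40.0])
heights = np.array([1.60, 1.65, 1.70, 1.68, 1.72])
Y       = np.array([52.0, 58.0, 63.0, 61.0, 66.0])   # weights in kg (made up)

X = np.vstack([np.ones_like(ages), ages, heights])   # (D+1) x M, one column per sample

# Normal equations: w = (X X^T)^{-1} X Y  (solving is preferred to an explicit inverse).
w = np.linalg.solve(X @ X.T, X @ Y)
print("parameters w:", w)
print("predictions :", w @ X)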

Gradient Descent

Gradient descent is a popular algorithm for estimating the parameters of a large variety of models. Here we illustrate the approach with estimation of the parameters of a linear model. As before, we seek to estimate the parameters w of a model

   ŷ = f(X, w) = w^T·X

from a training set of M samples {X_m}, {y_m}.

We define our loss function as one half of the average squared error:

   L(w) = 1/(2M) · Σ_{m=1}^{M} ( f(X_m, w) - y_m )^2

where the factor 1/2 is included to simplify the algebra later.

The gradient is the vector of derivatives of the loss function with respect to each term w_d of w:

   ∇L(w) = ∂L(w)/∂w = ( ∂L(w)/∂w_0, ∂L(w)/∂w_1, ..., ∂L(w)/∂w_D )^T

where

   ∂L(w)/∂w_d = 1/M · Σ_{m=1}^{M} ( f(X_m, w) - y_m )·x_dm

and x_dm is the d-th coefficient of the m-th training vector. Of course x_0m = 1 is the constant term.

We use the gradient to correct an estimate of the parameter vector computed over the training samples. The correction is weighted by a learning rate α. We can see 1/M · Σ_{m=1}^{M} ( f(X_m, w^(i-1)) - y_m )·x_dm as the average error for parameter w_d. Gradient descent corrects w_d by subtracting this average error weighted by the learning rate.
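The loss and its gradient translate directly into code. The sketch below reuses the column-per-sample convention and is an assumed implementation, not code from the notes.

import numpy as np

def loss(w, X, Y):
    """L(w) = 1/(2M) * sum_m (w^T X_m - y_m)^2, with X of shape (D+1, M)."""
    M = X.shape[1]
    err = w @ X - Y
    return (err @ err) / (2 * M)

def gradient(w, X, Y):
    """dL/dw_d = 1/M * sum_m (w^T X_m - y_m) * x_dm."""
    M = X.shape[1]
    err = w @ X - Y
    return (X @ err) / M

# Tiny illustrative example with D = 1 (homogeneous coordinates).
X = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0]])        # (D+1) x M
Y = np.array([1.0, 3.0, 5.0])
w = np.zeros(2)
print(loss(w, X, Y), gradient(w, X, Y))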

Gradient Descent Algorithm

Initialization (i = 0): let w_d^(0) = 0 for all D+1 coefficients of w.

Repeat until | L(w^(i)) - L(w^(i-1)) | < ε :

   w^(i) = w^(i-1) - α·∇L(w^(i-1))

where

   L(w) = 1/(2M) · Σ_{m=1}^{M} ( f(X_m, w) - y_m )^2

That is:

   w_d^(i) = w_d^(i-1) - α · 1/M · Σ_{m=1}^{M} ( f(X_m, w^(i-1)) - y_m )·x_dm

Note that all coefficients are updated in parallel. The algorithm halts when the change in L(w^(i)) becomes small:

   | L(w^(i)) - L(w^(i-1)) | < ε

for some small constant ε.

Gradient descent can also be used to learn the parameters of a non-linear model. For example, when D = 2, a second-order model would be:

   X = ( 1, x_1, x_1^2, x_2, x_2^2, x_1·x_2 )^T

   f(X, w) = w_0 + w_1·x_1 + w_2·x_1^2 + w_3·x_2 + w_4·x_2^2 + w_5·x_1·x_2

Practical Considerations for Gradient Descent

The following are some practical issues concerning gradient descent.

Feature Scaling

Make sure that features have similar scales (ranges of values). One way to assure this is to normalize the training data so that each feature has a range of 1. A simple technique: divide by the range of sample values.
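Putting the update rule and the halting test together gives a loop like the following sketch; the learning rate, tolerance and training data are illustrative choices, not values from the notes.

import numpy as np

def gradient_descent(X, Y, alpha=0.1, eps=1e-9, max_iter=100000):
    """Batch gradient descent for y^ = w^T X, with X of shape (D+1, M)."""
    M = X.shape[1]
    w = np.zeros(X.shape[0])                     # w_d^(0) = 0
    L_prev = np.inf
    for _ in range(max_iter):
        err = w @ X - Y                          # f(X_m, w) - y_m for all m
        w = w - alpha * (X @ err) / M            # all coefficients updated in parallel
        L = np.sum((w @ X - Y) ** 2) / (2 * M)
        if abs(L_prev - L) < eps:                # halt when the change in L(w) is small
            break
        L_prev = L
    return w

X = np.array([[1.0, 1.0, 1.0, 1.0],
              [0.0, 1.0, 2.0, 3.0]])             # (D+1) x M with D = 1
Y = np.array([1.0, 3.0, 5.0, 7.0])               # generated by y = 1 + 2 x
print(gradient_descent(X, Y))                    # approximately [1, 2]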

For a training set {X_m} of M training samples with D values:

   Range: r_d = Max_m(x_dm) - Min_m(x_dm)

   Then for m = 1 to M:  x_dm := x_dm / r_d

Even better is to scale with the mean and standard deviation of each feature in the training data:

   μ_d = E{ x_dm }
   σ_d^2 = E{ (x_dm - μ_d)^2 }

   For m = 1 to M:  x_dm := (x_dm - μ_d) / σ_d

Note that the value of the loss function should always decrease. Verify that L(w^(i)) - L(w^(i-1)) < 0; if L(w^(i)) - L(w^(i-1)) > 0, then decrease the learning rate α.

You can use this to dynamically adjust the learning rate α. For example, one can start with a high learning rate. Any time that L(w^(i)) - L(w^(i-1)) > 0:

   1) reset α := α / 2
   2) recalculate the i-th iteration.

Halt when α < threshold.
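The two scaling schemes (range scaling and standardization) are one-liners in NumPy; this is an illustration with made-up feature values, not the notes' own code. The learning-rate rule above amounts to wrapping the update step in a check of L(w^(i)) - L(w^(i-1)) and halving α whenever the loss increases.

import numpy as np

X = np.array([[21.0, 35.0, 48.0, 30.0],     # D x M raw features (illustrative):
              [1.55, 1.72, 1.63, 1.80]])    # one row per feature, one column per sample

# Simple range scaling: divide each feature by its range r_d = max - min.
r = X.max(axis=1, keepdims=True) - X.min(axis=1, keepdims=True)
X_range = X / r

# Better: subtract the mean and divide by the standard deviation of each feature.
mu    = X.mean(axis=1, keepdims=True)
sigma = X.std(axis=1, keepdims=True)
X_std = (X - mu) / sigma

print(X_range)
print(X_std)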

Regression for a Sigmoid Activation Function

Gradient descent is easily used to estimate a sigmoid activation function. Recall that the sigmoid function has the form

   h(X, w) = 1 / (1 + e^{-g(X,w)})

This function is differentiable. When g(X, w) = w^T·X:

   h(X, w) = 1 / (1 + e^{-w^T·X})

and the decision rule is: if h(X, w) > 0.5 then POSITIVE else NEGATIVE.

The cost function for this activation function is:

   Cost( h(X, w), y ) = -log( h(X, w) )       if y = 1
                        -log( 1 - h(X, w) )   if y = 0

Thus the loss function is:

   L(w) = 1/M · Σ_{m=1}^{M} Cost( h(X_m, w), y_m )
        = -1/M · Σ_{m=1}^{M} [ y_m·log( h(X_m, w) ) + (1 - y_m)·log( 1 - h(X_m, w) ) ]

with gradient

   ∂L(w)/∂w_d = 1/M · Σ_{m=1}^{M} ( h(X_m, w) - y_m )·x_dm
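For the sigmoid activation, the cross-entropy loss and its gradient can be sketched as follows, assuming labels y_m in {0, 1}; the data values are illustrative, not from the notes.

import numpy as np

def h(w, X):
    """Sigmoid hypothesis h(X, w) = 1 / (1 + exp(-w^T X)), with X of shape (D+1, M)."""
    return 1.0 / (1.0 + np.exp(-(w @ X)))

def logistic_loss(w, X, Y):
    """L(w) = -1/M * sum_m [ y_m log h + (1 - y_m) log(1 - h) ]."""
    M = X.shape[1]
    p = h(w, X)
    return -np.sum(Y * np.log(p) + (1 - Y) * np.log(1 - p)) / M

def logistic_gradient(w, X, Y):
    """dL/dw_d = 1/M * sum_m (h(X_m, w) - y_m) * x_dm."""
    M = X.shape[1]
    return X @ (h(w, X) - Y) / M

X = np.array([[1.0, 1.0, 1.0, 1.0],
              [-2.0, -1.0, 1.0, 2.0]])        # (D+1) x M
Y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.zeros(2)
print(logistic_loss(w, X, Y), logistic_gradient(w, X, Y))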

Gradient Descent

Repeat until | L(w^(i)) - L(w^(i-1)) | < ε :

   w^(i) = w^(i-1) - α·∇L(w^(i-1))

where

   L(w) = -1/M · Σ_{m=1}^{M} [ y_m·log( h(X_m, w) ) + (1 - y_m)·log( 1 - h(X_m, w) ) ]

That is:

   w_d^(i) = w_d^(i-1) - α · 1/M · Σ_{m=1}^{M} ( h(X_m, w^(i-1)) - y_m )·x_dm

Regularisation

Overfitting is tuning the function to the training data. Overfitting is a common problem in machine learning, particularly when there are many features and not enough training data. One solution is to regularise the function by adding an additional term to the update. For d ≥ 1, replace

   w_d^(i) = w_d^(i-1) - α · 1/M · Σ_{m=1}^{M} ( h(X_m, w^(i-1)) - y_m )·x_dm

with

   w_d^(i) = w_d^(i-1) - α · [ 1/M · Σ_{m=1}^{M} ( h(X_m, w^(i-1)) - y_m )·x_dm + λ·w_d^(i-1) ]

while the bias term w_0 keeps its unregularised update.
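Adding the regularisation term to the logistic-regression update is a one-line change. The sketch below leaves w_0 unregularised, which is the usual convention (an assumption here, since the notes do not spell it out); the learning rate, λ and data are illustrative.

import numpy as np

def regularized_update(w, X, Y, alpha=0.5, lam=0.1):
    """One gradient-descent step with an L2 penalty lambda on w_1..w_D (not on w_0)."""
    M = X.shape[1]
    p = 1.0 / (1.0 + np.exp(-(w @ X)))          # h(X_m, w) for all samples
    grad = X @ (p - Y) / M                      # 1/M * sum_m (h - y_m) x_m
    penalty = lam * w
    penalty[0] = 0.0                            # bias term is not regularised
    return w - alpha * (grad + penalty)

X = np.array([[1.0, 1.0, 1.0, 1.0],
              [-2.0, -1.0, 1.0, 2.0]])          # (D+1) x M
Y = np.array([0.0, 0.0, 1.0, 1.0])
w = np.zeros(2)
for _ in range(200):
    w = regularized_update(w, X, Y)
print("weights with regularisation:", w)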
