Neural Nets: Supervised Learning
6.034 Artificial Intelligence

Big idea: Learning as acquiring a function on feature vectors.

Background: Nearest Neighbors, Identification Trees, Neural Nets.

Neural Nets: Supervised Learning

[Figure: a single artificial neuron. Inputs x_1, x_2, ..., x_n (plus a bias input) are weighted by w_0, w_1, ..., w_n, summed to give z, and passed through s(z) to produce the output y.]

Training: Choose connection weights that minimize the error (Observed - Computed); this is error-driven learning.

Prediction: Propagate input feature values through the network of artificial neurons.

Advantages:
- Fast prediction
- Does feature weighting
- Very generally applicable

Disadvantages:
- Very slow training
- Overfitting is easy
Multi-Layer Neural Net

[Figure: inputs x_1, x_2, x_3 feed two hidden layers, which feed the outputs y_1, y_2; the layers are labeled input, hidden, hidden, output.]

More powerful than a single layer: the lower layers transform the input problem into more tractable (linearly separable) problems for subsequent layers.

Error-driven learning: minimize the error. The error is the difference between the observed (data) value and the currently computed value, i.e., the error function E(w) = (Observed - Computed).

The goal is to drive the error to 0 by adjusting the weights on the neural net units. So first we have to figure out how far off we are: we make a forward propagation through the net with our current weights to see how close we are to the truth. Once we know how close we are, we must figure out how to change the w's to minimize E(w).

So, how do we minimize a function? Set its derivative to 0! We do this by gradient descent: look at the slope of the error function E with respect to the weights, and figure out how to minimize E by changing the w's: if the slope is negative, move down; otherwise, up. (We want to get to the place where the slope is zero.) The backpropagation algorithm tells us how to adjust the w's.
Minimizing by Gradient Descent

Basic problem: Given initial parameters w_0 with error E = G(w_0), find w that is a local minimum of G, that is, where all first derivatives of G are zero.

[Figure: G(w) plotted against w, with the slope dG/dw marked; one step downhill from w_0 reaches w_1, near a local minimum.]

Approach:
1. Find the slope of G at w_0.
2. Move to w_1, which is down the slope.

The gradient is the n-dimensional slope. In gradient descent, r is the rate, a small step size. Stop when the gradient is very small (or the change in G is very small).

The simplest possible net

[Figure: inputs x_1 and x_2 (plus a bias input of -1 with weight w_0) feed a single linear unit with weights w_1 and w_2, producing output y.]

Output: y = w_1 x_1 + w_2 x_2. The error (or performance) function is 1/2 (y* - y)^2, so what is the derivative of the error function with respect to w? It is just (y* - y) dy/dw. (Note that the two negative signs cancel.) For this linear case it is easy to figure out how to change w in order to change y, i.e., dy/dw: it is just the vector of inputs [x_1 x_2] (e.g., dy/dw for y = wx + 2 is just x). So credit/blame assignment is simple.
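The two-step loop (find the slope, step downhill, repeat until the gradient is tiny) can be sketched in a few lines of code. This is an illustrative sketch, not part of the original notes; the example function G and the step size r are arbitrary choices.

```python
def gradient_descent(dG, w0, r=0.1, tol=1e-8, max_steps=10000):
    """Follow the negative slope of G from w0 until the gradient is very small."""
    w = w0
    for _ in range(max_steps):
        slope = dG(w)
        if abs(slope) < tol:      # stop when the gradient is very small
            break
        w = w - r * slope         # move down the slope; r is a small step size
    return w

# Example: G(w) = (w - 3)^2 has slope dG/dw = 2(w - 3) and a minimum at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

Starting from w0 = 0, the iterates contract toward w = 3, the point where the derivative is zero.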
Simplest neural net learning rule:

    w' = w + r (d - z) x

where w is the original weight, w' the new weight, d the desired (correct) output, z the actual output, and r the learning rate (small). The factor (d - z) comes from the derivative of the error function with respect to w: how the error changes if you change w a little bit (this is the slope of the error function at the point w).

Change the weights on non-zero inputs in the direction of the error; this is gradient descent. Note that for a single layer the net is linear, so the derivative is simple: it is just [x_1 x_2]. We update the weight by taking a step based on the size of the error times the derivative, which gives both a size and a direction in which to step.

[Figure: for y = w_1 x_1 + w_2 x_2, dy/dw = x = [x_1 x_2].]
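As a sketch, here is the rule w' = w + r(d - z)x driving a two-input linear unit toward a target function chosen for illustration (y = 2*x1 - 1*x2 is made up, not from the notes):

```python
def train_linear_unit(samples, r=0.05, epochs=200):
    """Repeatedly apply w' = w + r * (d - z) * x to a linear unit y = w1*x1 + w2*x2."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, d in samples:
            z = w[0] * x[0] + w[1] * x[1]      # actual output of the unit
            for i in range(2):
                w[i] += r * (d - z) * x[i]     # dy/dw is just the input vector x
    return w

# Training pairs consistent with the hypothetical target y = 2*x1 - 1*x2.
samples = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w = train_linear_unit(samples)
```

Because the data are consistent with a single weight vector, the rule converges to w1 = 2, w2 = -1.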
[Figure: the current hyperplane vs. the correct hyperplane; the update Δw = r (y* - y) x changes the weight vector in the direction of x.]

Learning AND

[Table in original: for each training input, the columns show the input, the target, the current weights, the activation, the output, whether the weights are OK or must change, and the new weights after learning.]
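A sketch of learning AND with this rule, using a thresholded unit and a bias input of -1 (the weights start at zero; the learning rate is an arbitrary choice):

```python
def step(activation):
    """Threshold at 0: output 1 if the activation is positive, else 0."""
    return 1 if activation > 0 else 0

def train_and(r=0.25, epochs=25):
    w = [0.0, 0.0, 0.0]                          # [bias weight, w1, w2]
    data = [((-1, 0, 0), 0), ((-1, 0, 1), 0),    # each input carries a -1 bias
            ((-1, 1, 0), 0), ((-1, 1, 1), 1)]    # component; the targets are AND
    for _ in range(epochs):
        for x, d in data:
            z = step(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(3):
                w[i] += r * (d - z) * x[i]       # delta w = r * (y* - y) * x
    return w

w = train_and()
truth = [step(-w[0] + w[1] * a + w[2] * b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

After a handful of passes the weight vector stops changing and `truth` comes out [0, 0, 0, 1], the AND column of the table.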
Learning XOR: this is not separable by a single line, which is all a 1-layer neural net can do.

[Figure: the four points (0,0), (1,0), (0,1), (1,1), with (0,1) and (1,0) in one class and (0,0) and (1,1) in the other.]

Note that we need two lines through the plane to separate red from green; there isn't a single slice that will work to separate the two classes, but two slices will.

So we need a hidden layer with two units (one to make each slice), and then one output neuron to decide between these two slices:
XOR is solvable by a 2-layer net: one layer separates off/off and on/on together; the next layer can then separate on/off from off/on.

A more complex pattern will require more hidden units, depending on how many slices we need.

[Figure: a more complex arrangement of labeled points in the unit square.]

Think about what happens now. How many slices will we need? Therefore, how many hidden units? Will we need more than 2 layers?
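The two-slice construction can be wired by hand. The specific weights below are my own illustrative choice, not from the notes: the first hidden unit's line separates (0,0) from the rest, the second's separates (1,1) from the rest, and the output unit fires only between the two slices.

```python
def step(a):
    return 1 if a > 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # slice 1: on unless both inputs are off
    h2 = step(x1 + x2 - 1.5)     # slice 2: on only when both inputs are on
    return step(h1 - h2 - 0.5)   # output: on between the two slices

table = [xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

The table comes out [0, 1, 1, 0]: XOR, which no single-layer net can produce.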
Now we add 2 things, one at a time:
- a sigmoid on the unit's output (instead of just the weighted sum), and
- intermediate (hidden) units between input and output.

The first just makes the computation of the derivative relating the output back to the input of a unit a bit more complicated. The second does too, and it also raises a new question: how to allocate credit/blame to the hidden units.

Sigmoid Unit

[Figure: a unit with inputs x_1, ..., x_n (plus a bias input of -1), weights w_0, ..., w_n, a summation, and a sigmoid s(z) applied to the output.]

    x_out = sum_i w_i x_i        s(x_out) = 1 / (1 + e^(-x_out))

[Figure: the output surface of a sigmoid unit over x_1, x_2, with the y = 1/2 level marked.]
Derivative of the sigmoid with respect to its input:

    s(x) = 1 / (1 + e^(-x))

    ds(x)/dx = d/dx [ (1 + e^(-x))^(-1) ]
             = [ -(1 + e^(-x))^(-2) ] [ -e^(-x) ]
             = [ 1 / (1 + e^(-x)) ] [ e^(-x) / (1 + e^(-x)) ]
             = s(x) (1 - s(x))

or just z(1 - z), where z is the output from the sigmoid function, i.e., the final output from this particular net unit. (Note: if this is the final output of the net, we sometimes use o instead of z.)

So now the derivative (slope) relating changes in w to changes in the output is a bit more complicated. How this changes the way we twiddle the weights:

    Original:             Δw = r (d - z) x
    After adding sigmoid: Δw = r (d - z) x [z(1 - z)]

[Figure: the single unit as before, now with a sigmoid s(z) on its output.]

So far, this is all we've added: the extra factor takes care of computing the derivative correctly as we go through a sigmoid unit. Note: we will use d for the true or desired value of y, and y_net or o (as in Winston) for the value output or computed by the net.
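The identity ds/dx = s(x)(1 - s(x)) is easy to check numerically against a centered finite-difference slope:

```python
import math

def s(x):
    """Sigmoid: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def ds_analytic(x):
    z = s(x)
    return z * (1.0 - z)                        # z(1 - z)

def ds_numeric(x, h=1e-6):
    return (s(x + h) - s(x - h)) / (2.0 * h)    # finite-difference slope

gaps = [abs(ds_analytic(x) - ds_numeric(x)) for x in (-2.0, 0.0, 0.5, 3.0)]
```

The two agree to within numerical noise at every point; at x = 0 the slope is exactly 1/4, the sigmoid's steepest point.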
Now we add hidden units

[Figure: a 2-2-1 network. Inputs x_1, x_2 (plus bias inputs of -1) feed two hidden units with outputs y_1, y_2 via weights w_11, w_12, w_21, w_22 (and bias weights w_01, w_02); the hidden outputs feed the output unit z_3 = y_3 via weights w_13, w_23 (and bias weight w_03).]

Given (y_obs - y_net/calc), how do we twiddle the w_ij?

The KEY POINT: hidden units have no correct target value, so how do we know what their error is? And if we don't know their error, how can we know how to change their weights?

Answer: we use the error from the output and propagate it back, allocating the blame proportionately over the units in the layer below.

Q: How much blame should each unit get?
A: It should be proportional to the weight on what it passes to the next unit.

Minimizing by Gradient Descent

Basic problem: Given initial parameters w_0 with error P = G(w_0), where P is the performance function, find w that is a local minimum of G, that is, where all first derivatives of G are zero.

Approach:
1. Find the slope of G at w_0.
2. Move to w_1, which is down the slope.

The gradient is the n-dimensional slope:

    grad_w G = [ dG/dw_0, dG/dw_1, ..., dG/dw_n ]

and the gradient descent update is

    w_{i+1} = w_i - r grad_w G

where r is the rate, a small step size. Stop when the gradient is very small (or the change in G is very small).

[Figure: G(w) plotted against w, as before, with the slope dG/dw and a local minimum between w_1 and w_0.]

(Recall that the minus sign in front will be canceled by the negative sign from the gradient of our performance function in our actual formula.)
The simplest net: just the output unit

[Figure: the output unit alone. The input y enters with weight w_z; the product p_z = y · w_z is the input to the sigmoid; z = s(p_z) is the output value. The output unit could also include a bias weight on a -1 input, included as in the 1-layer case to make the threshold come out to 0.]

The picture: propagating back one step (the delta rule). Intuitively, dP/dw_z tells us how much the P(erformance) function (the difference between the actual desired output and the output the net computes given its current weights) will change if we twiddle w_z a little bit. This calculation has three parts, by the chain rule:

    P = -1/2 (observed - computed)^2 = -1/2 (d - z)^2

    dP/dz = (d[esired] - z), the derivative of the error function

    dP/dw_z = (dP/dz) · (dz/dp_z) · (dp_z/dw_z)
            = (d - z) · z(1 - z) · y

The factor (d - z) · z(1 - z) is called δ_z, so dP/dw_z = δ_z · y.
The three components

So,

    dP/dw_z = (d - z) · z(1 - z) · y

Reading off the three factors:
- (d - z) is the derivative of the performance function with respect to z.
- z(1 - z) is the derivative of the output z with respect to p_z, the unit's summed input before it passes through the sigmoid.
- y is the input to the unit.

Intuition

[Figure: the output unit again: input y, weight w_z, product p_z, sigmoid, output value z.]

Intuitively, this makes sense: the size of the effect on the error P is proportional to the amount of flow that comes into the node, y (which is the derivative of p_z with respect to w_z), times the derivative twiddle produced by the sigmoid, which is z(1 - z), times the derivative of P with respect to z.
Or, it divides into 3 parts, one for each link of the chain in the net; read them backward off the formula:

    dP/dw_z = (d - z) · z(1 - z) · y

Part 1: dp_z/dw_z = y.

Part 2: the sigmoid contributes dz/dp_z = z(1 - z). (The signs have been arranged out front, so everything here comes out positive.)
Part 3: dP/dz = (d - z).

Delta rule for the output unit

[Figure: the output unit: input y, weight w_z, product p_z, sigmoid, output z.]

    dP/dw_z = (d - z) · z(1 - z) · y

For the output unit, where z = o, and rearranging:

    δ_output = δ_o = o(1 - o)(d - o)

The actual weight change is

    Δw = r · (input to unit) · δ_o    or    Δw = r · y · o(1 - o)(d - o)
So now we know how to change the output unit weight, given some (input, true output) training example pair:

    Δw = r · y · o(1 - o)(d - o)
    w_new = w_old + Δw = w_old + r · y · o(1 - o)(d - o)

where r is the learning rate; y is the input to the unit from the previous layer; d is the desired, true output; and o is the value actually computed by the net with its current weights.

But how do we know y, the input to this unit? Answer: because we had to compute the forward propagation first, from the very first input to the output at the end of the net, and this tells us, given the current weights, what the net outputs at each of its units, including the 'hidden' ones (like y).

Backpropagation

An efficient method of implementing gradient descent for neural networks:
1. Initialize the weights to small random values.
2. Choose a random sample input feature vector.
3. Compute the total input and output for each unit (forward propagation).
4. Compute δ for the output layer: δ_output = o(1 - o)(d - o).
5. Compute δ for each preceding layer by the backprop rule (repeat for all layers).
6. Compute the weight changes by the descent rule (repeat for all weights).
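Steps 1 through 6 can be sketched end to end for a tiny 2-2-1 sigmoid net trained on XOR. This is an illustrative sketch only: the network shape, learning rate, and step count are my own choices, and stochastic training on XOR can occasionally stall in a local minimum, so the check here only confirms that the total error decreases from its random starting point.

```python
import math
import random

def s(x):
    return 1.0 / (1.0 + math.exp(-x))

DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # XOR

def forward(w_h, w_o, x):
    # Bias input is -1 (as in the notes), so each unit's threshold comes out to 0.
    h = [s(-wj[0] + wj[1] * x[0] + wj[2] * x[1]) for wj in w_h]
    o = s(-w_o[0] + w_o[1] * h[0] + w_o[2] * h[1])
    return h, o

def total_error(w_h, w_o):
    return sum((d - forward(w_h, w_o, x)[1]) ** 2 for x, d in DATA)

def train(r=0.5, steps=20000, seed=1):
    rng = random.Random(seed)
    # Step 1: initialize weights to small random values.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(3)]
    before = total_error(w_h, w_o)
    for _ in range(steps):
        x, d = rng.choice(DATA)                  # step 2: random sample
        h, o = forward(w_h, w_o, x)              # step 3: forward propagation
        delta_o = o * (1 - o) * (d - o)          # step 4: output-layer delta
        delta_h = [h[j] * (1 - h[j]) * w_o[j + 1] * delta_o
                   for j in range(2)]            # step 5: backprop rule
        # Step 6: descent rule, delta_w = r * (input to unit) * (delta for unit).
        w_o = [w_o[0] + r * -1 * delta_o,
               w_o[1] + r * h[0] * delta_o,
               w_o[2] + r * h[1] * delta_o]
        for j in range(2):
            w_h[j][0] += r * -1 * delta_h[j]
            w_h[j][1] += r * x[0] * delta_h[j]
            w_h[j][2] += r * x[1] * delta_h[j]
    return before, total_error(w_h, w_o)

err_before, err_after = train()
```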
Now that we know this, we can find the weight changes required in the previous layer. Here we use Winston's notation:

- i_l = input to the left neuron; w_l = weight on the left neuron; p_l = product of i_l and w_l on the left neuron (the input to its sigmoid); o_l = output from the left neuron (and input to the right neuron).
- i_r = input to the right neuron (from the left neuron); w_r = weight on the right neuron; p_r = product of i_r and w_r on the right neuron (the input to its sigmoid); o_r = output from the right neuron.

We already know how to find the weight delta for the right-hand-side neuron (this is just the output layer, as before).
Now we compute the weight delta for the layer to the left of this neuron. (Recall we called the quantity that propagates back δ_z before.)

If we use δ notation for each neuron, we can see the key recursive pattern: the weight derivatives can be written simply as a function of the input to each neuron and the delta for that neuron, where each delta is given in terms of the deltas of the neurons to its right.

Clearly, if we add another layer to the left, we can just repeat this computation: we take the delta for the new layer (which is given in terms of things we already know plus the delta to the right) and the input to that layer (which we can compute via forward propagation).
From deltas to weight adjustments

First, the output layer, the neuron on the right:

    δ_output = δ_right = o(1 - o)(d - o)
    Δw_right = r · o_l · δ_right

(Here the subscript "right" is used instead of r to avoid confusion with the learning rate r; Winston uses alpha instead of r in his netnotes.pdf for just this reason. Note that o_l, the output from the left neuron, is the input to the right neuron, so this equation has the form r × input to final-layer neuron × delta for the final neuron.)

Finding the next adjustment back, Δw_l, involves the delta from the right, the weight on the right unit, the delta for the left unit, and the input to the left unit:

    δ_left = δ_l = o_l(1 - o_l) · w_right · δ_right
    Δw_l = r · i_l · δ_l
Let's do an example! Suppose we have the training example (1, 2), with the weights shown: w_l = 2.0 on the left neuron and w_r = 1.0 on the right neuron. So the input to the net is 1 and d (the true, desired output value) is 2. Let the learning rate r be 0.2.

Q: What are the new weights after one iteration of backprop?

Step 1: Do forward propagation to find all of p_l, o_l (hence i_r), p_r, o_r (the last being the computed output o).
(a) Left neuron: p_l = 1 × 2.0 = 2.0; o_l = 1/(1 + e^-2) ≈ 0.88.
(b) Right neuron: o_l = i_r, so p_r = 0.88 × 1.0 = 0.88; o_r = 1/(1 + e^-0.88) ≈ 0.71.

Step 2: Now start backpropagation. First, compute the weight change for the output weight w_r:

    Δw_r = r × i_r × o(1 - o)(d - o) = 0.2 × 0.88 × 0.71 × (1 - 0.71) × (2 - 0.71) ≈ 0.047

Note: o(1 - o)(d - o) = δ_r ≈ 0.266. New weight: w_r ≈ 1.0 + 0.047 = 1.047.

Step 3: Compute the weight change for the next level to the left, w_l:

    Δw_l = r × i_l × o_l(1 - o_l) × w_r × δ_r = 0.2 × 1.0 × 0.88 × (1 - 0.88) × 1.0 × 0.266 ≈ 0.0056

New weight: w_l ≈ 2.0 + 0.0056 = 2.0056.

So we've adjusted both weights upwards, as desired (compare to the 1-layer case), though very slowly. Now we could iterate with this sample point until we get no further change in the weights, and then go on to the next training example.

For a much more detailed example in the same vein, please see my other handout on my web page, neuralnetex.pdf!

Let's sum up.
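The arithmetic in the example is easy to check by machine. The sketch below reproduces each step at full precision rather than with hand-rounded intermediate values, so the last digits differ slightly from a calculation done with the rounded 0.88 and 0.71:

```python
import math

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

r, d = 0.2, 2.0            # learning rate and desired output
w_l, w_r = 2.0, 1.0        # weights on the left and right neurons
i_l = 1.0                  # the net's input

# Step 1: forward propagation.
p_l = i_l * w_l            # 2.0
o_l = sigmoid(p_l)         # about 0.88; this is also i_r, input to the right neuron
p_r = o_l * w_r
o_r = sigmoid(p_r)         # about 0.71, the net's computed output

# Step 2: delta and weight change for the output weight w_r.
delta_r = o_r * (1 - o_r) * (d - o_r)
dw_r = r * o_l * delta_r

# Step 3: weight change for the layer to the left.
delta_l = o_l * (1 - o_l) * w_r * delta_r
dw_l = r * i_l * delta_l

new_w_r, new_w_l = w_r + dw_r, w_l + dw_l
```

Both weights move upward, w_r by about 0.047 and w_l by about 0.006, matching the hand calculation.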
Backpropagation (summing up)

An efficient method of implementing gradient descent for neural networks:
1. Initialize the weights to small random values.
2. Choose a random sample input feature vector.
3. Compute the total input and output for each unit (forward propagation).
4. Compute δ for the output layer: δ_output = o(1 - o)(d - o).
5. Compute δ for each preceding layer by the backprop rule (repeat for all layers).
6. Compute the weight changes by the descent rule (repeat for all weights).

Training Neural Nets (without overfitting, hopefully)

Given: a data set, desired outputs, and a neural net with m weights. Find a setting for the weights that will give good predictive performance on new data, and estimate the expected performance on new data.

1. Split the data set (randomly) into three subsets:
   - Training set: used for picking weights
   - Validation set: used to stop training
   - Test set: used to evaluate performance
2. Pick random, small weights as initial values.
3. Perform iterative minimization of the error over the training set.
4. Stop when the error on the validation set reaches a minimum (to avoid overfitting).
5. Use the best weights to compute the error on the test set; this is the estimate of performance on new data. Do not repeat training to improve this.

Cross-validation can be used if the data set is too small to divide into three subsets.
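Step 1 of this recipe, the random three-way split, might look like the following (the 60/20/20 proportions are an arbitrary illustrative choice):

```python
import random

def three_way_split(data, f_train=0.6, f_val=0.2, seed=0):
    """Randomly partition data into training, validation, and test subsets."""
    shuffled = list(data)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)
    n_train = int(f_train * len(shuffled))
    n_val = int(f_val * len(shuffled))
    train = shuffled[:n_train]               # used for picking weights
    val = shuffled[n_train:n_train + n_val]  # used to stop training
    test = shuffled[n_train + n_val:]        # used only once, to evaluate
    return train, val, test

train, val, test = three_way_split(range(100))
```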
On-line vs. off-line

There are two approaches to performing the error minimization:
- On-line training: present x_i and y_i* (chosen randomly from the training set); change the weights to reduce the error on this instance; repeat.
- Off-line training: change the weights to reduce the total error on the training set (summed over all instances).

On-line training is a stochastic approximation to gradient descent, since the gradient based on one instance is noisy relative to the full gradient (based on all instances). This can be beneficial in pushing the system out of shallow local minima.

Classification vs. regression

A neural net with a single sigmoid output unit is aimed at binary classification: the class is 0 if y < 0.5 and 1 otherwise. For multiway classification, use one output unit per class.
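The relationship between the two is concrete: for a summed error, the off-line gradient is exactly the sum of the per-instance on-line gradients, so each on-line step follows a noisy one-sample estimate of the same direction. A sketch for a one-weight linear unit (the data values are made up for illustration):

```python
# Error on one instance: 1/2 (d - w*x)^2, with gradient -(d - w*x) * x.
def instance_grad(w, x, d):
    return -(d - w * x) * x

data = [(1.0, 2.0), (2.0, 3.9), (3.0, 6.1)]   # (x, d) pairs, roughly d = 2x

def offline_train(r=0.01, epochs=500):
    w = 0.0
    for _ in range(epochs):                   # step on the total gradient
        w -= r * sum(instance_grad(w, x, d) for x, d in data)
    return w

def online_train(r=0.01, epochs=500):
    w = 0.0
    for _ in range(epochs):                   # step on one noisy instance at a time
        for x, d in data:
            w -= r * instance_grad(w, x, d)
    return w

# Closed-form least-squares answer for comparison.
w_star = sum(x * d for x, d in data) / sum(x * x for x, _ in data)
w_off, w_on = offline_train(), online_train()
```

Both land near the least-squares weight; the on-line version wanders within a small band around it because each one-instance gradient is noisy.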
A sigmoid output unit is not suitable for regression, since sigmoids are designed to change quickly from 0 to 1. For regression, we want a linear output unit; that is, we remove the output non-linearity. The rest of the net still retains its sigmoid units.
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 16 Slides adapted from Jordan Boyd-Graber, Justin Johnson, Andrej Karpathy, Chris Ketelsen, Fei-Fei Li, Mike Mozer, Michael Nielson
More informationCSC321 Lecture 5 Learning in a Single Neuron
CSC321 Lecture 5 Learning in a Single Neuron Roger Grosse and Nitish Srivastava January 21, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 5 Learning in a Single Neuron January 21, 2015 1 / 14
More informationLearning and Neural Networks
Artificial Intelligence Learning and Neural Networks Readings: Chapter 19 & 20.5 of Russell & Norvig Example: A Feed-forward Network w 13 I 1 H 3 w 35 w 14 O 5 I 2 w 23 w 24 H 4 w 45 a 5 = g 5 (W 3,5 a
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationIntroduction to Artificial Neural Networks
Facultés Universitaires Notre-Dame de la Paix 27 March 2007 Outline 1 Introduction 2 Fundamentals Biological neuron Artificial neuron Artificial Neural Network Outline 3 Single-layer ANN Perceptron Adaline
More informationMultilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)
Multilayer Neural Networks (sometimes called Multilayer Perceptrons or MLPs) Linear separability Hyperplane In 2D: w x + w 2 x 2 + w 0 = 0 Feature x 2 = w w 2 x w 0 w 2 Feature 2 A perceptron can separate
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationA summary of Deep Learning without Poor Local Minima
A summary of Deep Learning without Poor Local Minima by Kenji Kawaguchi MIT oral presentation at NIPS 2016 Learning Supervised (or Predictive) learning Learn a mapping from inputs x to outputs y, given
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationMachine Learning (CSE 446): Neural Networks
Machine Learning (CSE 446): Neural Networks Noah Smith c 2017 University of Washington nasmith@cs.washington.edu November 6, 2017 1 / 22 Admin No Wednesday office hours for Noah; no lecture Friday. 2 /
More informationMachine Learning
Machine Learning 10-315 Maria Florina Balcan Machine Learning Department Carnegie Mellon University 03/29/2019 Today: Artificial neural networks Backpropagation Reading: Mitchell: Chapter 4 Bishop: Chapter
More informationError Backpropagation
Error Backpropagation Sargur Srihari 1 Topics in Error Backpropagation Terminology of backpropagation 1. Evaluation of Error function derivatives 2. Error Backpropagation algorithm 3. A simple example
More informationCOGS Q250 Fall Homework 7: Learning in Neural Networks Due: 9:00am, Friday 2nd November.
COGS Q250 Fall 2012 Homework 7: Learning in Neural Networks Due: 9:00am, Friday 2nd November. For the first two questions of the homework you will need to understand the learning algorithm using the delta
More informationNeural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21
Neural Networks Chapter 8, Section 7 TB Artificial Intelligence Slides from AIMA http://aima.cs.berkeley.edu / 2 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural
More informationCSC321 Lecture 6: Backpropagation
CSC321 Lecture 6: Backpropagation Roger Grosse Roger Grosse CSC321 Lecture 6: Backpropagation 1 / 21 Overview We ve seen that multilayer neural networks are powerful. But how can we actually learn them?
More informationArtificial Neural Networks 2
CSC2515 Machine Learning Sam Roweis Artificial Neural s 2 We saw neural nets for classification. Same idea for regression. ANNs are just adaptive basis regression machines of the form: y k = j w kj σ(b
More informationAE = q < H(p < ) + (1 q < )H(p > ) H(p) = p lg(p) (1 p) lg(1 p)
1 Decision Trees (13 pts) Data points are: Negative: (-1, 0) (2, 1) (2, -2) Positive: (0, 0) (1, 0) Construct a decision tree using the algorithm described in the notes for the data above. 1. Show the
More informationStatistical Machine Learning from Data
January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole
More informationCSC321 Lecture 8: Optimization
CSC321 Lecture 8: Optimization Roger Grosse Roger Grosse CSC321 Lecture 8: Optimization 1 / 26 Overview We ve talked a lot about how to compute gradients. What do we actually do with them? Today s lecture:
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More informationComputational Intelligence Winter Term 2017/18
Computational Intelligence Winter Term 207/8 Prof. Dr. Günter Rudolph Lehrstuhl für Algorithm Engineering (LS ) Fakultät für Informatik TU Dortmund Plan for Today Single-Layer Perceptron Accelerated Learning
More informationCS:4420 Artificial Intelligence
CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart
More informationLecture 6: Backpropagation
Lecture 6: Backpropagation Roger Grosse 1 Introduction So far, we ve seen how to train shallow models, where the predictions are computed as a linear function of the inputs. We ve also observed that deeper
More informationLearning Deep Architectures for AI. Part I - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part I - Vijay Chakilam Chapter 0: Preliminaries Neural Network Models The basic idea behind the neural network approach is to model the response as a
More informationy(x n, w) t n 2. (1)
Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising a set of input vector {x n }, n = 1,...N,
More informationLinear Classification. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Linear Classification CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 Example of Linear Classification Red points: patterns belonging
More informationSerious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions
BACK-PROPAGATION NETWORKS Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks Cannot approximate (learn) non-linear functions Difficult (if not impossible) to design
More informationCSC 578 Neural Networks and Deep Learning
CSC 578 Neural Networks and Deep Learning Fall 2018/19 3. Improving Neural Networks (Some figures adapted from NNDL book) 1 Various Approaches to Improve Neural Networks 1. Cost functions Quadratic Cross
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationNeural networks. Chapter 20, Section 5 1
Neural networks Chapter 20, Section 5 Chapter 20, Section 5 Outline Brains Neural networks Perceptrons Multilayer perceptrons Applications of neural networks Chapter 20, Section 5 2 Brains 0 neurons of
More informationCSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning
CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska NEURAL NETWORKS Learning Neural Networks Classifier Short Presentation INPUT: classification data, i.e. it contains an classification (class) attribute.
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationLogistic Regression Review Fall 2012 Recitation. September 25, 2012 TA: Selen Uguroglu
Logistic Regression Review 10-601 Fall 2012 Recitation September 25, 2012 TA: Selen Uguroglu!1 Outline Decision Theory Logistic regression Goal Loss function Inference Gradient Descent!2 Training Data
More information