Backpropagation: The Good, the Bad and the Ugly
1 Backpropagation: The Good, the Bad and the Ugly. The Norwegian University of Science and Technology (NTNU), Trondheim, Norway. October 3, 2017
2 Supervised Learning. Constant feedback from an instructor, indicating not only right/wrong, but also the correct answer for each training case. Many cases (i.e., input-output pairs) to be learned. Weights are modified by a complex procedure (back-propagation) based on output error. Feed-forward networks with back-propagation learning are the standard implementation; 99% of neural network applications use this. Typical usage: problems with (a) lots of input-output training data, and (b) the goal of a mapping (function) from inputs to outputs. Not biologically plausible, although the cerebellum appears to exhibit some aspects. But the result of backprop, an ANN trained to perform some function, can be very useful to neuroscientists as a sufficiency proof.
3 Backpropagation Overview. Training/Test Cases: {(d1, r1), (d2, r2), (d3, r3), ...}. [Diagram: a case (d3, r3) is encoded, passed through the ANN, and decoded to produce r*; the error E = r3 - r* and the gradients dE/dw flow back through the network.] Feed-Forward Phase - Inputs sent through the ANN to compute outputs. Feedback Phase - Error passed back from output to input layers and used to update weights along the way.
4 Training -vs- Testing Cases. Training: N times, with learning. Testing: 1 time, without learning. Generalization - correctly handling test cases (that the ANN has not been trained on). Over-Training - weights become so fine-tuned to the training cases that generalization suffers: failure on many test cases.
5 Widrow-Hoff (a.k.a. Delta) Rule. For a single node N with input X, weight w, actual output Y, and target output T: Δw = ηδX, where δ = T - Y. Delta (δ) = error; Eta (η) = learning rate. Goal: change w so as to reduce δ. Intuition: if δ > 0, then we want to decrease it, so we must increase Y. Thus, we must increase the sum of weighted inputs to N, and we do that by increasing (decreasing) w if X is positive (negative). Similarly for δ < 0. Assumes the derivative of N's transfer function is everywhere non-negative.
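To make the delta rule concrete, here is a minimal Python/NumPy sketch of one Widrow-Hoff update for a single linear node; the function name and the example numbers are our own illustration, not part of the slides:

import numpy as np

def delta_rule_step(w, x, t, eta=0.1):
    # One Widrow-Hoff update: delta = T - Y, and each weight moves by eta * delta * x_i.
    y = np.dot(w, x)              # actual output Y of node N
    delta = t - y                 # error term (delta = T - Y)
    return w + eta * delta * x    # Delta-w = eta * delta * X

# Toy usage: nudge the weights toward a target output of 1.0 for this input.
w = np.array([0.2, -0.4])
x = np.array([1.0, 0.5])
print(delta_rule_step(w, x, t=1.0))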
6 Gradient Descent. Goal = minimize total error across all output nodes. Method = modify weights throughout the network (i.e., at all levels) to follow the route of steepest descent in error space. [Plot: Error (E) as a function of the weight vector (W), with steepest-descent steps toward the minimum.] Δw_ij = -η ∂E_i/∂w_ij
7 Computing ∂E_i/∂w_ij. [Diagram: inputs x_1d ... x_nd feed node i through weights w_i1 ... w_in; node i computes sum_id, applies f_T to get output o_id, which is compared with target t_id to give error E_id.] Sum of Squared Errors (SSE): E_i = (1/2) Σ_{d∈D} (t_id - o_id)^2. So ∂E_i/∂w_ij = (1/2) Σ_{d∈D} 2(t_id - o_id) ∂(t_id - o_id)/∂w_ij = -Σ_{d∈D} (t_id - o_id) ∂o_id/∂w_ij
8 Computing ∂o_id/∂w_ij. Since the output is o_id = f_T(sum of weighted inputs), where sum_id = Σ_{k=1..n} w_ik x_kd, we have ∂E_i/∂w_ij = -Σ_{d∈D} (t_id - o_id) ∂f_T(sum_id)/∂w_ij. Using the Chain Rule, ∂f(g(x))/∂x = [∂f/∂g][∂g/∂x], so: ∂f_T(sum_id)/∂w_ij = [∂f_T(sum_id)/∂sum_id] [∂sum_id/∂w_ij]
9 Computing ∂sum_id/∂w_ij - Easy!! ∂sum_id/∂w_ij = ∂(Σ_{k=1..n} w_ik x_kd)/∂w_ij = ∂(w_i1 x_1d + w_i2 x_2d + ... + w_ij x_jd + ... + w_in x_nd)/∂w_ij = ∂(w_i1 x_1d)/∂w_ij + ∂(w_i2 x_2d)/∂w_ij + ... + ∂(w_ij x_jd)/∂w_ij + ... + ∂(w_in x_nd)/∂w_ij = 0 + 0 + ... + x_jd + ... + 0 = x_jd
10 Computing ∂f_T(sum_id)/∂sum_id - Harder for some f_T. f_T = Identity function: f_T(sum_id) = sum_id. Thus ∂f_T(sum_id)/∂sum_id = 1, and ∂f_T(sum_id)/∂w_ij = [∂f_T(sum_id)/∂sum_id] [∂sum_id/∂w_ij] = 1 · x_jd = x_jd. f_T = Sigmoid: f_T(sum_id) = 1 / (1 + e^(-sum_id)). Thus ∂f_T(sum_id)/∂sum_id = o_id(1 - o_id), and ∂f_T(sum_id)/∂w_ij = [∂f_T(sum_id)/∂sum_id] [∂sum_id/∂w_ij] = o_id(1 - o_id) x_jd
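As a quick sanity check on the sigmoid case, the sketch below (our own illustration in Python/NumPy) compares the o(1 - o) formula against a finite-difference estimate of the derivative:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

s = 0.7
o = sigmoid(s)
analytic = o * (1.0 - o)                                  # o_id (1 - o_id)
h = 1e-6
numeric = (sigmoid(s + h) - sigmoid(s - h)) / (2.0 * h)   # central finite difference
print(analytic, numeric)                                  # the two values agree closely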
11 The only non-trivial calculation: ∂f_T(sum_id)/∂sum_id = ∂[(1 + e^(-sum_id))^(-1)]/∂sum_id = (-1)(1 + e^(-sum_id))^(-2) · (-e^(-sum_id)) = e^(-sum_id) / (1 + e^(-sum_id))^2. But notice that: e^(-sum_id) / (1 + e^(-sum_id))^2 = f_T(sum_id) (1 - f_T(sum_id)) = o_id(1 - o_id)
12 Putting it all together: ∂E_i/∂w_ij = -Σ_{d∈D} (t_id - o_id) ∂f_T(sum_id)/∂w_ij = -Σ_{d∈D} (t_id - o_id) [∂f_T(sum_id)/∂sum_id] [∂sum_id/∂w_ij]. So for f_T = Identity: ∂E_i/∂w_ij = -Σ_{d∈D} (t_id - o_id) x_jd, and for f_T = Sigmoid: ∂E_i/∂w_ij = -Σ_{d∈D} (t_id - o_id) o_id(1 - o_id) x_jd
13 Weight Updates (f_T = Sigmoid). Batch: update weights after each training epoch: Δw_ij = -η ∂E_i/∂w_ij = η Σ_{d∈D} (t_id - o_id) o_id(1 - o_id) x_jd. The weight changes are actually computed after each training case, but w_ij is not updated until the epoch's end. Incremental: update weights after each training case d: Δw_ij = η (t_id - o_id) o_id(1 - o_id) x_jd. A lower learning rate (η) is recommended here than for the batch method. Results can depend upon case-presentation order, so randomly sort the cases after each epoch.
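The following sketch contrasts the two update schedules for a single sigmoid output node. It is a hedged illustration in Python/NumPy: the toy OR data, the bias input, and the 0.1/0.9 targets are our own choices, not from the slides:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def incremental_epoch(w, X, T, eta=0.1):
    # Update w after every case; shuffle the case order each epoch.
    for d in np.random.permutation(len(X)):
        o = sigmoid(np.dot(w, X[d]))
        w = w + eta * (T[d] - o) * o * (1 - o) * X[d]
    return w

def batch_epoch(w, X, T, eta=0.5):
    # Accumulate the changes over the whole epoch, then apply them once.
    dw = np.zeros_like(w)
    for x, t in zip(X, T):
        o = sigmoid(np.dot(w, x))
        dw += eta * (t - o) * o * (1 - o) * x
    return w + dw

# Toy data: the OR function, with a constant bias input of 1 prepended
# and soft targets 0.1 / 0.9 to avoid sigmoid saturation.
X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
T = np.array([0.1, 0.9, 0.9, 0.9])
w = np.random.uniform(-1, 1, size=3)
for _ in range(500):
    w = incremental_epoch(w, X, T)
print(np.round(sigmoid(X @ w), 2))   # outputs approach the targets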
14 Backpropagation in Multi-Layered Neural Networks. [Diagram: node j with output o_jd feeds downstream nodes 1..n through weights w_1j ... w_nj; each downstream node k is reached via ∂o_jd/∂sum_jd, ∂sum_kd/∂o_jd, and ∂E_d/∂sum_kd.] For each node j and each training case d, backpropagation computes an error term: δ_jd = -∂E_d/∂sum_jd, by calculating the influence of sum_jd along each connection from node j to the next downstream layer.
15 Computing δ_jd. Along the upper path (through downstream node 1), the contribution to ∂E_d/∂sum_jd is: [∂o_jd/∂sum_jd] [∂sum_1d/∂o_jd] [∂E_d/∂sum_1d]. So, summing along all paths: ∂E_d/∂sum_jd = [∂o_jd/∂sum_jd] Σ_{k=1..n} [∂sum_kd/∂o_jd] [∂E_d/∂sum_kd]
16 Computing δ_jd. Just as before, most terms are 0 in the derivative of the sum, so: ∂sum_kd/∂o_jd = w_kj. Assuming f_T = a sigmoid: ∂o_jd/∂sum_jd = ∂f_T(sum_jd)/∂sum_jd = o_jd(1 - o_jd). Thus: δ_jd = -∂E_d/∂sum_jd = -[∂o_jd/∂sum_jd] Σ_{k=1..n} [∂sum_kd/∂o_jd] [∂E_d/∂sum_kd] = o_jd(1 - o_jd) Σ_{k=1..n} w_kj (-∂E_d/∂sum_kd) = o_jd(1 - o_jd) Σ_{k=1..n} w_kj δ_kd
17 Computing δ_jd. Note that δ_jd is defined recursively in terms of the δ values in the next downstream layer: δ_jd = o_jd(1 - o_jd) Σ_{k=1..n} w_kj δ_kd. So all δ values in the network can be computed by moving backwards, one layer at a time.
18 Computing ∂E_d/∂w_ij from δ_jd - Easy!! The only effect of w_ij upon the error is via its effect upon sum_id, and since node j's output o_jd is the input carried by w_ij: ∂sum_id/∂w_ij = o_jd. So: ∂E_d/∂w_ij = [∂sum_id/∂w_ij] [∂E_d/∂sum_id] = o_jd (-δ_id) = -o_jd δ_id
19 Computing Δw_ij. Given an error term δ_id (for node i on training case d), the update of w_ij for all nodes j that feed into i is: Δw_ij = -η ∂E_d/∂w_ij = -η(-o_jd δ_id) = η δ_id o_jd. So given δ_id, you can easily calculate Δw_ij for all incoming arcs.
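Slides 14-19 together give a complete per-case backward pass: compute the δ terms at the output, propagate them backwards one layer at a time, and change each weight by η δ_id o_jd. Here is a compact Python/NumPy sketch of that procedure for a fully connected sigmoid network; the layer sizes, function names, and learning rate in the usage example are our own assumptions:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, weights):
    # Return the activations of every layer, input included.
    # weights[l][i, j] = weight FROM node i (layer l) TO node j (layer l+1).
    acts = [x]
    for W in weights:
        acts.append(sigmoid(acts[-1] @ W))
    return acts

def backward(acts, weights, target, eta=0.5):
    # Per-case backprop: deltas at the output, then recursively upstream;
    # every weight w_ij changes by eta * delta_i * o_j.
    out = acts[-1]
    delta = (target - out) * out * (1 - out)
    new_weights = list(weights)
    for l in reversed(range(len(weights))):
        o_prev = acts[l]
        new_weights[l] = weights[l] + eta * np.outer(o_prev, delta)
        delta = (weights[l] @ delta) * o_prev * (1 - o_prev)   # deltas one layer upstream
    return new_weights

# Toy usage: one training case through a 2-2-1 network.
np.random.seed(0)
weights = [np.random.uniform(-1, 1, (2, 2)), np.random.uniform(-1, 1, (2, 1))]
acts = forward(np.array([1.0, 0.0]), weights)
weights = backward(acts, weights, target=np.array([1.0]))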
20 Learning XOR. [Three plots: sum-squared error vs. epoch for three separate runs.] Training set = all 4 entries of the XOR truth table. Network: 2 (inputs) x 2 (hidden) x 1 (output). Random init of all weights in [-1, 1]. XOR is not linearly separable, so it takes a while! Each run is different due to the random weight init.
21 Learning to Classify Wines. [Diagram: wine properties form the input layer, which feeds a hidden layer, which outputs the wine class.]
22 Wine Runs. [Four plots: sum-squared error vs. epoch for networks of sizes 13x5x1, 13x5x1 (lrate = 0.1), 13x10x1, and 13x25x1 (lrate = 0.3).]
23 WARNING: Notation Change. When working with matrices, it is often easier to let w_ij denote the weight on the connection FROM node i TO node j. Then the normal syntax for matrix row-column indices implies that: the values in row i, written w_i?, are the weights on the outputs of node i; the values in column j, written w_?j, are the weights on the inputs to node j. Example 3x3 weight matrix:
[ w_11  w_12  w_13 ]
[ w_21  w_22  w_23 ]   <- row 2 = weights from node 2
[ w_31  w_32  w_33 ]
(column 3 = weights to node 3)
24 Backpropagation using Tensors. Network: X --V--> Y --W--> Z, with R = XV, Y = g(R), S = YW, Z = f(S). X = [x_1, x_2, x_3], Y = [y_1, y_2], Z = [z_1]; f = g = sigmoid. V is the 3x2 matrix [v_11 v_12; v_21 v_22; v_31 v_32] and W is the 2x1 matrix [w_11; w_21], where v_ij = weight from node i to node j.
25 Goal: Compute ∂Z/∂W. Using the Chain Rule of Tensor Calculus: ∂Z/∂W = ∂f(S)/∂W = [∂f(S)/∂S] [∂S/∂W] = [∂f(S)/∂S] [∂(YW)/∂W]. f(S) is a sigmoid, with derivative f(S)(1 - f(S)), so this simplifies to: z_1(1 - z_1) ∂(YW)/∂W. Using the product rule: z_1(1 - z_1) ∂(YW)/∂W = z_1(1 - z_1) [ (∂Y/∂W) W + Y (∂W/∂W) ]. Y is independent of W, so the first term (∂Y/∂W) W = 0. But ∂W/∂W is not simply 1; it is a higher-order tensor.
26 Computing ∂Z/∂W. Y ∂W/∂W = (y_1, y_2) · ∂W/∂W = [y_1; y_2] (one entry per weight in W). Putting it all together: ∂Z/∂W = z_1(1 - z_1) Y ∂W/∂W = [z_1(1 - z_1) y_1; z_1(1 - z_1) y_2] = [∂Z/∂w_11; ∂Z/∂w_21]. For each case c_k in a minibatch, the forward pass will produce values for X, Y, and Z. Those values will be used to compute actual numeric gradients by filling in these symbolic gradients: (∂Z/∂W)_ck = [z_1(1 - z_1) y_1; z_1(1 - z_1) y_2]_ck
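A small Python/NumPy sketch of slides 24-26: run the 3-2-1 network forward on one case and fill in the symbolic gradient ∂Z/∂W = z_1(1 - z_1) [y_1; y_2] with numbers. The concrete input and the random weights are hypothetical, just to make the shapes visible:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

np.random.seed(1)
X = np.array([0.5, 0.2, 0.9])           # input layer (3 nodes)
V = np.random.uniform(-1, 1, (3, 2))    # weights X -> Y
W = np.random.uniform(-1, 1, (2, 1))    # weights Y -> Z

Y = sigmoid(X @ V)                      # Y = g(R), R = X V
Z = sigmoid(Y @ W)                      # Z = f(S), S = Y W

dZ_dW = (Z * (1 - Z)) * Y.reshape(2, 1) # [dZ/dw_11 ; dZ/dw_21]
print(dZ_dW)                            # a 2 x 1 matrix, matching the shape of W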
27 Tensor Calculations across Several Layers. Goal: Compute ∂Z/∂V. Expanding using the Chain Rule: ∂Z/∂V = ∂f(S)/∂V = [∂f(S)/∂S] [∂S/∂V] = z_1(1 - z_1) ∂(YW)/∂V. Using the Product Rule: ∂(YW)/∂V = (∂Y/∂V) W + Y (∂W/∂V). W is independent of V, so the second term Y (∂W/∂V) = 0; and Y = g(R), so ∂(YW)/∂V = (∂g(R)/∂V) W. Using the Chain Rule, and R = XV: (∂g(R)/∂V) W = [ (∂g(R)/∂R) (∂R/∂V) ] W. g(R) is also a sigmoid, so: ∂g(R)/∂R = g(R)(1 - g(R)) = Y(1 - Y) = [y_1(1 - y_1), y_2(1 - y_2)]
28 Computing ∂Z/∂V. Since R = XV, and using the Product Rule: ∂R/∂V = ∂(XV)/∂V = (∂X/∂V) V + X (∂V/∂V). X is independent of V, so the first term (∂X/∂V) V = 0. ∂V/∂V is a 4-dimensional tensor (3 x 2 x 3 x 2): the derivative of each element of V with respect to each element of V.
29 Computing ∂Z/∂V. Take the dot product of X = [x_1, x_2, x_3] with each internal matrix of ∂V/∂V. The result, X ∂V/∂V, is a 3x2 arrangement of row vectors, one per weight v_ij, where the entry for v_ij is ∂R/∂v_ij: the row with x_i in position j and 0 elsewhere:
[ (x_1, 0)  (0, x_1) ]
[ (x_2, 0)  (0, x_2) ]
[ (x_3, 0)  (0, x_3) ]
30 Computing ∂Z/∂V. Putting the pieces back together: [∂g(R)/∂R] [∂R/∂V] = [y_1(1 - y_1), y_2(1 - y_2)] dotted with each entry of X ∂V/∂V, giving the 3x2 matrix:
[ x_1 y_1(1 - y_1)   x_1 y_2(1 - y_2) ]
[ x_2 y_1(1 - y_1)   x_2 y_2(1 - y_2) ]
[ x_3 y_1(1 - y_1)   x_3 y_2(1 - y_2) ]
31 Computing ∂Z/∂V. (∂g(R)/∂V) W = [ (∂g(R)/∂R) (∂R/∂V) ] W =
[ x_1 y_1(1 - y_1)   x_1 y_2(1 - y_2) ]
[ x_2 y_1(1 - y_1)   x_2 y_2(1 - y_2) ]  · [ w_11 ; w_21 ]
[ x_3 y_1(1 - y_1)   x_3 y_2(1 - y_2) ]
=
[ x_1 y_1(1 - y_1) w_11   x_1 y_2(1 - y_2) w_21 ]
[ x_2 y_1(1 - y_1) w_11   x_2 y_2(1 - y_2) w_21 ]
[ x_3 y_1(1 - y_1) w_11   x_3 y_2(1 - y_2) w_21 ]
= ∂(YW)/∂V
32 Computing ∂Z/∂V ...and finally... ∂Z/∂V = ∂f(S)/∂V = z_1(1 - z_1) ∂(YW)/∂V =
[ z_1(1 - z_1) x_1 y_1(1 - y_1) w_11   z_1(1 - z_1) x_1 y_2(1 - y_2) w_21 ]
[ z_1(1 - z_1) x_2 y_1(1 - y_1) w_11   z_1(1 - z_1) x_2 y_2(1 - y_2) w_21 ]
[ z_1(1 - z_1) x_3 y_1(1 - y_1) w_11   z_1(1 - z_1) x_3 y_2(1 - y_2) w_21 ]
=
[ ∂Z/∂v_11   ∂Z/∂v_12 ]
[ ∂Z/∂v_21   ∂Z/∂v_22 ]
[ ∂Z/∂v_31   ∂Z/∂v_32 ]
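Continuing the same toy network, the sketch below fills in the symbolic ∂Z/∂V derived above and checks one entry against a finite-difference estimate (our own illustration; the specific numbers are arbitrary):

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def net(X, V, W):
    Y = sigmoid(X @ V)
    return sigmoid(Y @ W), Y

np.random.seed(1)
X = np.array([0.5, 0.2, 0.9])
V = np.random.uniform(-1, 1, (3, 2))
W = np.random.uniform(-1, 1, (2, 1))
Z, Y = net(X, V, W)

# dZ/dv_ij = z_1(1 - z_1) * x_i * y_j(1 - y_j) * w_j1, for all i, j at once:
dZ_dV = Z * (1 - Z) * np.outer(X, Y * (1 - Y) * W[:, 0])

# Finite-difference check of a single entry, here v_21 (row 1, column 0 in NumPy):
h = 1e-6
Vp, Vm = V.copy(), V.copy()
Vp[1, 0] += h
Vm[1, 0] -= h
numeric = (net(X, Vp, W)[0] - net(X, Vm, W)[0]) / (2 * h)
print(dZ_dV[1, 0], numeric)            # the two values agree closely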
33 Gradients and the Minibatch. For every case c_k in a minibatch, the values of X, Y, and Z are computed during the forward pass. The symbolic gradients in ∂Z/∂V are filled in using these values of X, Y, and Z (along with W), producing one numeric 3x2 matrix per case: (∂Z/∂V)_ck = [ z_1(1 - z_1) x_i y_j(1 - y_j) w_j1 ]_ck, the matrix of ∂Z/∂v_ij evaluated with that case's values.
34 Gradients and the Minibatch. In most Deep Learning situations, the gradients will be based on a loss function L, not simply the output of the final layer. But that's just one more level of derivative calculations. At the completion of a minibatch, the numeric gradient matrices are added together to yield the complete gradient, which is then used to update the weights. For any weight matrix U in the network and minibatch M, update weight u_ij as follows: (∂L/∂u_ij)_M = Σ_{c_k ∈ M} (∂L/∂u_ij)_ck, then Δu_ij = -η (∂L/∂u_ij)_M, where η = learning rate. Tensorflow and Theano do all of this for you. Every time you write Deep Learning code, be grateful!!
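A minimal sketch of the minibatch bookkeeping described here, in Python/NumPy (the function name and learning rate are our own; in practice Tensorflow or Theano performs this step for you):

import numpy as np

def minibatch_update(U, per_case_grads, eta=0.1):
    # per_case_grads holds one numeric dL/dU matrix per case c_k in the minibatch M.
    grad = np.sum(per_case_grads, axis=0)   # (dL/dU)_M = sum over the cases
    return U - eta * grad                   # Delta u_ij = -eta * (dL/du_ij)_M

# Toy usage with two 3 x 2 per-case gradient matrices:
U = np.zeros((3, 2))
g1, g2 = np.ones((3, 2)), 0.5 * np.ones((3, 2))
print(minibatch_update(U, [g1, g2]))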
35 The General Algorithm: Jacobian Product Chain. [Diagram: weights W feed layer M, which (through intermediate layers) feeds output layer N; outputs and targets determine the loss L.] Goal: Compute ∂L/∂W. ∂L/∂W = [∂L/∂N] [∂N/∂(N-1)] ... [∂(M+1)/∂M] [∂M/∂W]
36 Distant Gradients = Sums of Path Products. [Diagram: node x_2 in layer X influences node z_3 in layer Z along three paths, through y_1, y_2, and y_3 in layer Y.] ∂z_3/∂x_2 = [∂z_3/∂y_1][∂y_1/∂x_2] + [∂z_3/∂y_2][∂y_2/∂x_2] + [∂z_3/∂y_3][∂y_3/∂x_2]. More generally: ∂z_i/∂x_j = Σ_{y_k ∈ Y} [∂z_i/∂y_k][∂y_k/∂x_j]. This results from the multiplication of two Jacobian matrices.
37 The Jacobian Matrix.
J^Z_Y =
[ ∂z_1/∂y_1   ∂z_1/∂y_2   ...   ∂z_1/∂y_n ]
[ ∂z_2/∂y_1   ∂z_2/∂y_2   ...   ∂z_2/∂y_n ]
[ ...                                      ]
[ ∂z_m/∂y_1   ∂z_m/∂y_2   ...   ∂z_m/∂y_n ]
38 Multiplying Jacobian Matrices. Row i of J^Z_Y dotted with column j of J^Y_X gives entry (i, j) of the product: [∂z_i/∂y_1][∂y_1/∂x_j] + [∂z_i/∂y_2][∂y_2/∂x_j] + [∂z_i/∂y_3][∂y_3/∂x_j] + ... = ∂z_i/∂x_j, so J^Z_Y · J^Y_X = J^Z_X. We can do this repeatedly to form J^n_m, the Jacobian relating the activation levels of upstream layer m with those of downstream layer n: R = Identity Matrix; For q = n down to m+1 do: R ← R · J^q_{q-1}
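The loop on this slide translates almost directly into code. Below is a hedged Python/NumPy sketch in which jacobians is assumed to be a list of the per-layer Jacobians J^q_{q-1}, ordered from layer m+1 up to layer n, each stored as a 2-D array:

import numpy as np

def jacobian_chain(jacobians):
    # Form J^n_m by multiplying the per-layer Jacobians J^q_{q-1}
    # for q = n down to m+1, starting from the identity matrix.
    R = np.eye(jacobians[-1].shape[0])   # identity of size |layer n|
    for J in reversed(jacobians):
        R = R @ J
    return R                             # shape: |layer n| x |layer m|

# Toy usage: three random layer-to-layer Jacobians (layer sizes 4 -> 3 -> 2 -> 1).
Js = [np.random.rand(3, 4), np.random.rand(2, 3), np.random.rand(1, 2)]
print(jacobian_chain(Js).shape)          # (1, 4)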
39 Last of the Jacobians. Once we've computed J^n_m, we need one final Jacobian: J^m_w, where w are the weights feeding into layer m. Then we have the standard Jacobian J^n_w needed for updating all weights in matrix w: J^n_w ← J^n_m · J^m_w. For the weights V from layer X to layer Y (from the earlier example), J^Y_V = ∂Y/∂V is a 3x2 arrangement of inner vectors (∂y_1/∂v_ij, ∂y_2/∂v_ij), one per weight v_ij. Since (a) the activation function for Y is sigmoid, and (b) ∂y_k/∂v_ij = 0 when j ≠ k:
J^Y_V = ∂Y/∂V =
[ (y_1(1 - y_1) x_1, 0)   (0, y_2(1 - y_2) x_1) ]
[ (y_1(1 - y_1) x_2, 0)   (0, y_2(1 - y_2) x_2) ]
[ (y_1(1 - y_1) x_3, 0)   (0, y_2(1 - y_2) x_3) ]
40 Last of the Jacobian Multiplications. From the previous example (the network with layer sizes 3-2-1), once we have J^Z_Y and J^Y_V, we can multiply (taking dot products of J^Z_Y with the inner vectors of J^Y_V) to produce J^Z_V: the matrix of gradients that backprop uses to modify the weights in V. J^Z_V = J^Z_Y · J^Y_V, where J^Z_Y = [∂z_1/∂y_1, ∂z_1/∂y_2]. Since (a) the activation function for Z is sigmoid, and (b) sum(z_k) = Σ_i w_ik y_i (the sum of weighted inputs to node z_k): J^Z_Y = [z_1(1 - z_1) w_11, z_1(1 - z_1) w_21], and
J^Z_V =
[ z_1(1 - z_1) x_1 y_1(1 - y_1) w_11   z_1(1 - z_1) x_1 y_2(1 - y_2) w_21 ]
[ z_1(1 - z_1) x_2 y_1(1 - y_1) w_11   z_1(1 - z_1) x_2 y_2(1 - y_2) w_21 ]
[ z_1(1 - z_1) x_3 y_1(1 - y_1) w_11   z_1(1 - z_1) x_3 y_2(1 - y_2) w_21 ]
41 First of the Jacobians. This iterative process is very general, and permits the calculation of J^N_M (M < N) for any layers M and N, or J^N_W for any weights W upstream of N. However, the standard situation in backpropagation is to compute J^L_{W_i}, where L is the objective (loss) function and W_i are the weight matrices. This follows the same procedure as sketched above, but the series of dot products begins with J^L_N: the Jacobian of derivatives of the loss function w.r.t. the activations of the output layer. For example, assume L = Mean Squared Error (MSE), Z is an output layer of 3 sigmoid nodes, and t_i are the target values for those 3 nodes for a particular case c: L(c) = (1/3) Σ_{i=1..3} (z_i - t_i)^2. Taking partial derivatives of L(c) w.r.t. the z_i yields: J^L_Z = ∂L/∂Z = [ (2/3)(z_1 - t_1), (2/3)(z_2 - t_2), (2/3)(z_3 - t_3) ]
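For the MSE example, the loss Jacobian is easy to write down directly; here is a small Python/NumPy sketch of J^L_Z for an output layer of n nodes (the example numbers are our own):

import numpy as np

def mse_loss_jacobian(z, t):
    # L(c) = (1/n) * sum_i (z_i - t_i)^2, so dL/dz_i = (2/n) * (z_i - t_i).
    z, t = np.asarray(z, float), np.asarray(t, float)
    return (2.0 / len(z)) * (z - t)

print(mse_loss_jacobian([0.8, 0.3, 0.6], [1.0, 0.0, 1.0]))   # one partial per output node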
42 First of the Jacobian Multiplications. Continuing the example: assume that layer Y (of size 2) feeds into layer Z, using weights W. Then:
J^Z_Y =
[ z_1(1 - z_1) w_11   z_1(1 - z_1) w_21 ]
[ z_2(1 - z_2) w_12   z_2(1 - z_2) w_22 ]
[ z_3(1 - z_3) w_13   z_3(1 - z_3) w_23 ]
Our first Jacobian multiplication yields J^L_Y, a 1 x 2 row vector: J^L_Z · J^Z_Y = J^L_Y =
[ (2/3)(z_1 - t_1) z_1(1 - z_1) w_11 + (2/3)(z_2 - t_2) z_2(1 - z_2) w_12 + (2/3)(z_3 - t_3) z_3(1 - z_3) w_13 ,
  (2/3)(z_1 - t_1) z_1(1 - z_1) w_21 + (2/3)(z_2 - t_2) z_2(1 - z_2) w_22 + (2/3)(z_3 - t_3) z_3(1 - z_3) w_23 ]
43 Backpropagation with Tensors: The Big Picture. General Algorithm: Assume Layer M is upstream of Layer N (the output layer), so M < N. Assume V is the tensor of weights feeding into Layer M. Assume L is the loss function. Goal: Compute J^L_V = ∂L/∂V. R ← J^L_N (the partial derivatives of the loss function w.r.t. the output layer). For q = N down to M+1 do: R ← R · J^q_{q-1}. J^L_V ← R · J^M_V. Use J^L_V to update the weights in V.
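Putting the whole pipeline together, the general algorithm on this slide can be sketched as one function. This is a schematic illustration only: it assumes J^L_N, the per-layer Jacobians, and J^M_V are already available as 2-D arrays, with the weight Jacobian flattened so that each column corresponds to one weight in V:

import numpy as np

def loss_gradient_for_weights(J_L_N, layer_jacobians, J_M_V):
    # R starts as dL/d(output layer N); multiply in J^q_{q-1} for q = N down to M+1,
    # then one final product with J^M_V yields J^L_V = dL/dV.
    R = J_L_N
    for J in reversed(layer_jacobians):
        R = R @ J
    return R @ J_M_V   # one partial derivative per weight in V

The returned J^L_V can then be reshaped to the shape of V and used in the usual update, Δv_ij = -η ∂L/∂v_ij.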
44 Practical Tips.
1. Only add as many hidden layers and hidden nodes as necessary. Too many means more weights to learn and an increased chance of over-specialization.
2. Scale all input values to the same range, typically [0, 1] or [-1, 1].
3. Use target values of 0.1 (for zero) and 0.9 (for one) to avoid the saturation effects of sigmoids. (See the sketch after this list for tips 2 and 3.)
4. Beware of tricky encodings of input (and decodings of output) values. Don't combine too much info into a single node's activation value (even though it's fun to try), since this can make proper weights difficult (or impossible) to learn.
5. For discrete (e.g., nominal) values, one (input or output) node per value is often most effective. E.g., car model and city of residence -vs- income and education for assessing car-insurance risk.
6. All initial weights should be relatively small: [ ]
7. Bias nodes can be helpful for complicated data sets.
8. Check that all your layer sizes, activation functions, activation ranges, weight ranges, learning rates, etc. make sense in terms of each other and your goals for the ANN. One improper choice can ruin the results.
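As a concrete illustration of tips 2 and 3, here is a small Python/NumPy sketch; the data and function names are made up for the example:

import numpy as np

def scale_inputs(X):
    # Tip 2: min-max scale every input feature into [0, 1].
    X = np.asarray(X, float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def soften_targets(T):
    # Tip 3: replace 0/1 targets with 0.1/0.9 to avoid sigmoid saturation.
    return np.where(np.asarray(T) > 0.5, 0.9, 0.1)

print(scale_inputs([[2.0, 10.0], [4.0, 30.0], [6.0, 50.0]]))
print(soften_targets([0, 1, 1]))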
45 Supervised Learning in the Cerebellum. [Diagram of cerebellar circuitry with labels: Mossy Fibers carrying sensory + cortical inputs, Granular Cells, Golgi Cells, Parallel Fibers, Purkinje Cells, Climbing Fibers from the Inferior Olive carrying somatosensory (touch, pain, body position) + cortical inputs and an efference copy, inhibition of deep cerebellar neurons, and outputs to the spinal cord and cerebral cortex.] Granular cells detect contexts. Parallel fibers and Purkinje cells map contexts to actions. Climbing fibers from the Inferior Olive provide (supervisory) feedback signals for LTD on Parallel-Purkinje synapses.