Backpropagation: The Good, the Bad and the Ugly

1 Backpropagation: The Good, the Bad and the Ugly. The Norwegian University of Science and Technology (NTNU), Trondheim, Norway. October 3, 2017

2 Supervised Learning. Constant feedback from an instructor, indicating not only right/wrong, but also the correct answer for each training case. Many cases (i.e., input-output pairs) to be learned. Weights are modified by a complex procedure (back-propagation) based on output error. Feed-forward networks with back-propagation learning are the standard implementation; 99% of neural network applications use this. Typical usage: problems with (a) lots of input-output training data, and (b) the goal of a mapping (function) from inputs to outputs. Not biologically plausible, although the cerebellum appears to exhibit some aspects. But the result of backprop, a trained ANN that performs some function, can be very useful to neuroscientists as a sufficiency proof.

3 Backpropagation Overview. Training/Test Cases: {(d_1, r_1), (d_2, r_2), (d_3, r_3), ...}. [Figure: a case d_3 is encoded, passed through the network, and decoded into an output r*; the error E = r_3 − r* drives the gradients dE/dw back through the weights.] Feed-Forward Phase: inputs are sent through the ANN to compute outputs. Feedback Phase: error is passed back from the output to the input layers and used to update weights along the way.

4 Training -vs- Testing Cases. [Figure: training cases are presented to the neural net N times, with learning; test cases are presented 1 time, without learning.] Generalization: correctly handling test cases (that the ANN has not been trained on). Over-Training: weights become so fine-tuned to the training cases that generalization suffers: failure on many test cases.

5 Widrow-Hoff (a.k.a. Delta) Rule. [Figure: a node N receives input X over a connection with weight w and produces output Y.] Δw = ηδX, where δ = T − Y. T = target output value; Delta (δ) = error; Eta (η) = learning rate. Goal: change w so as to reduce δ. Intuitive: if δ > 0, then we want to decrease it, so we must increase Y. Thus, we must increase the sum of weighted inputs to N, and we do that by increasing (decreasing) w if X is positive (negative). Similarly for δ < 0. Assumes the derivative of N's transfer function is everywhere non-negative.
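
To make the rule concrete, here is a minimal sketch of the delta rule for a single linear node on a toy dataset; the data, learning rate, and epoch count are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Delta-rule sketch for one linear node: dw = eta * delta * X, with delta = T - Y.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])  # inputs, one row per case
T = np.array([1.0, 1.0, 2.0])                       # target outputs
w = np.zeros(2)                                     # weights into node N
eta = 0.1                                           # learning rate

for epoch in range(200):
    for x, t in zip(X, T):
        y = np.dot(w, x)        # node output (identity transfer function)
        delta = t - y           # error: delta = T - Y
        w += eta * delta * x    # Widrow-Hoff update

print(w)   # approaches [1, 1], which reproduces these targets exactly
```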

6 Gradient Descent. Goal = minimize total error across all output nodes. Method = modify weights throughout the network (i.e., at all levels) to follow the route of steepest descent in error space. [Figure: error E plotted against the weight vector W, with steps of size ΔE descending toward the minimum.] Δw_ij = −η ∂E_i/∂w_ij

7 Computing ∂E_i/∂w_ij. [Figure: node i receives inputs x_1d ... x_nd over weights w_i1 ... w_in, computes sum_id, applies the transfer function f_T to produce o_id, which is compared with the target t_id to give the error E_id.] Sum of Squared Errors (SSE): E_i = (1/2) Σ_{d∈D} (t_id − o_id)². ∂E_i/∂w_ij = (1/2) Σ_{d∈D} 2(t_id − o_id) ∂(t_id − o_id)/∂w_ij = −Σ_{d∈D} (t_id − o_id) (∂o_id/∂w_ij)

8 Computing ∂o_id/∂w_ij. [Same node figure as the previous slide.] Since output = f(sum of weighted inputs): ∂E_i/∂w_ij = −Σ_{d∈D} (t_id − o_id) (∂f_T(sum_id)/∂w_ij), where sum_id = Σ_{k=1}^n w_ik x_kd. Using the Chain Rule, ∂f(g(x))/∂x = (∂f/∂g)(∂g/∂x): ∂f_T(sum_id)/∂w_ij = (∂f_T(sum_id)/∂sum_id)(∂sum_id/∂w_ij)

9 Computing ∂sum_id/∂w_ij - Easy!! ∂sum_id/∂w_ij = ∂(Σ_{k=1}^n w_ik x_kd)/∂w_ij = ∂(w_i1 x_1d + w_i2 x_2d + ... + w_ij x_jd + ... + w_in x_nd)/∂w_ij = ∂(w_i1 x_1d)/∂w_ij + ∂(w_i2 x_2d)/∂w_ij + ... + ∂(w_ij x_jd)/∂w_ij + ... + ∂(w_in x_nd)/∂w_ij = 0 + 0 + ... + x_jd + ... + 0 = x_jd

10 Computing ∂f_T(sum_id)/∂sum_id - Harder for some f_T. f_T = Identity function: f_T(sum_id) = sum_id. Thus ∂f_T(sum_id)/∂sum_id = 1, and ∂f_T(sum_id)/∂w_ij = (∂f_T(sum_id)/∂sum_id)(∂sum_id/∂w_ij) = 1 · x_jd = x_jd. f_T = Sigmoid: f_T(sum_id) = 1/(1 + e^(−sum_id)). Thus ∂f_T(sum_id)/∂sum_id = o_id(1 − o_id), and ∂f_T(sum_id)/∂w_ij = (∂f_T(sum_id)/∂sum_id)(∂sum_id/∂w_ij) = o_id(1 − o_id) x_jd

11 The only non-trivial calculation: ∂f_T(sum_id)/∂sum_id = ∂((1 + e^(−sum_id))^(−1))/∂sum_id = −(1 + e^(−sum_id))^(−2) · ∂(1 + e^(−sum_id))/∂sum_id = −(1 + e^(−sum_id))^(−2) · (−e^(−sum_id)) = e^(−sum_id)/(1 + e^(−sum_id))². But notice that: e^(−sum_id)/(1 + e^(−sum_id))² = f_T(sum_id)(1 − f_T(sum_id)) = o_id(1 − o_id)
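
As a quick sanity check of the identity derived above, here is a short sketch comparing the closed-form derivative σ(s)(1 − σ(s)) with a central finite-difference estimate; the sample points and step size are arbitrary choices.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Compare the closed-form derivative sigmoid(s) * (1 - sigmoid(s))
# with a central finite-difference estimate at a few points.
eps = 1e-6
for s in [-2.0, 0.0, 0.5, 3.0]:
    analytic = sigmoid(s) * (1.0 - sigmoid(s))
    numeric = (sigmoid(s + eps) - sigmoid(s - eps)) / (2 * eps)
    print(f"s = {s:+.1f}   analytic = {analytic:.6f}   numeric = {numeric:.6f}")
```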

12 Putting it all together: ∂E_i/∂w_ij = −Σ_{d∈D} (t_id − o_id) (∂f_T(sum_id)/∂w_ij) = −Σ_{d∈D} (t_id − o_id) (∂f_T(sum_id)/∂sum_id)(∂sum_id/∂w_ij). So for f_T = Identity: ∂E_i/∂w_ij = −Σ_{d∈D} (t_id − o_id) x_jd, and for f_T = Sigmoid: ∂E_i/∂w_ij = −Σ_{d∈D} (t_id − o_id) o_id(1 − o_id) x_jd

13 Weight Updates (f_T = Sigmoid). Batch: update weights after each training epoch: Δw_ij = −η ∂E_i/∂w_ij = η Σ_{d∈D} (t_id − o_id) o_id(1 − o_id) x_jd. The weight changes are actually computed after each training case, but w_ij is not updated until the epoch's end. Incremental: update weights after each training case: Δw_ij = −η ∂E_id/∂w_ij = η (t_id − o_id) o_id(1 − o_id) x_jd. A lower learning rate (η) is recommended here than for the batch method. Results can depend upon case-presentation order, so randomly sort the cases after each epoch.
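
A minimal sketch contrasting the two update modes for a single sigmoid output node; the toy dataset, learning rate, and epoch count are assumptions made for illustration.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)
X = np.array([[0.2, 0.7], [0.9, 0.1], [0.5, 0.5]])   # one row per training case
T = np.array([0.1, 0.9, 0.5])                        # target outputs
eta = 0.5

def train(mode, epochs=500):
    w = np.zeros(2)
    for _ in range(epochs):
        dw_epoch = np.zeros_like(w)
        for d in rng.permutation(len(X)):            # reshuffle the cases each epoch
            o = sigmoid(np.dot(w, X[d]))
            dw = eta * (T[d] - o) * o * (1 - o) * X[d]
            if mode == "incremental":
                w += dw                              # apply after each case
            else:
                dw_epoch += dw                       # accumulate; apply at epoch's end
        if mode == "batch":
            w += dw_epoch
    return w

print("batch:      ", train("batch"))
print("incremental:", train("incremental"))
```

Both modes use the same per-case change η(t_id − o_id)o_id(1 − o_id)x_jd; they differ only in when the accumulated change is applied to the weights.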

14 Backpropagation in Multi-Layered Neural Networks. [Figure: node j computes sum_jd and output o_jd, which feeds downstream nodes 1..n over weights w_1j ... w_nj; the local derivatives ∂o_jd/∂sum_jd, ∂sum_kd/∂o_jd, and ∂E_d/∂sum_kd link sum_jd to the error E_d.] For each node (j) and each training case (d), backpropagation computes an error term: δ_jd = −∂E_d/∂sum_jd, by calculating the influence of sum_jd along each connection from node j to the next downstream layer.

15 Computing δ_jd. [Same figure as the previous slide.] Along the upper path, the contribution to ∂E_d/∂sum_jd is: (∂o_jd/∂sum_jd)(∂sum_1d/∂o_jd)(∂E_d/∂sum_1d). So, summing along all paths: ∂E_d/∂sum_jd = (∂o_jd/∂sum_jd) Σ_{k=1}^n (∂sum_kd/∂o_jd)(∂E_d/∂sum_kd)

16 Computing δ_jd. Just as before, most terms are 0 in the derivative of the sum, so: ∂sum_kd/∂o_jd = w_kj. Assuming f_T = a sigmoid: ∂o_jd/∂sum_jd = ∂f_T(sum_jd)/∂sum_jd = o_jd(1 − o_jd). Thus: δ_jd = −∂E_d/∂sum_jd = −(∂o_jd/∂sum_jd) Σ_{k=1}^n (∂sum_kd/∂o_jd)(∂E_d/∂sum_kd) = −o_jd(1 − o_jd) Σ_{k=1}^n w_kj (−δ_kd) = o_jd(1 − o_jd) Σ_{k=1}^n w_kj δ_kd

17 Computing δ_jd. Note that δ_jd is defined recursively in terms of the δ values in the next downstream layer: δ_jd = o_jd(1 − o_jd) Σ_{k=1}^n w_kj δ_kd. So all δ values in the network can be computed by moving backwards, one layer at a time.
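
A small sketch of the recursion itself: given the δ values of the downstream layer and the weights w_kj leading into it, compute the δ values of a hidden layer in one step. The shapes and numbers here are illustrative assumptions.

```python
import numpy as np

# delta_j = o_j * (1 - o_j) * sum_k w_kj * delta_k, computed for a whole layer at once.
o_hidden   = np.array([0.3, 0.8])            # sigmoid outputs o_j of the hidden layer
delta_down = np.array([0.05, -0.02, 0.10])   # deltas delta_k of the downstream layer
W_down = np.array([[ 0.4, -0.7],             # W_down[k, j] = w_kj: weight from hidden
                   [-0.1,  0.2],             # node j into downstream node k
                   [ 0.9,  0.3]])

delta_hidden = o_hidden * (1 - o_hidden) * (W_down.T @ delta_down)
print(delta_hidden)    # one delta per hidden node, ready for the next layer back
```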

18 Computing ∂E_d/∂w_ij from δ_id - Easy!! [Figure: node j's output o_jd feeds node i over weight w_ij, contributing to sum_id.] The only effect of w_ij upon the error is via its effect upon sum_id, which is: ∂sum_id/∂w_ij = o_jd. So: ∂E_d/∂w_ij = (∂sum_id/∂w_ij)(∂E_d/∂sum_id) = (∂sum_id/∂w_ij)(−δ_id) = −o_jd δ_id

19 Computing Δw_ij. Given an error term δ_id (for node i on training case d), the update of w_ij for all nodes j that feed into i is: Δw_ij = −η ∂E_d/∂w_ij = −η (−o_jd δ_id) = η δ_id o_jd. So given δ_i, you can easily calculate Δw_ij for all incoming arcs.

20 Learning XOR. [Plots: sum-squared error versus epoch for three separate runs.] Epoch = all 4 entries of the XOR truth table. 2 (inputs) x 2 (hidden) x 1 (output) network. Random init of all weights in [-1, 1]. Not linearly separable, so it takes awhile! Each run is different due to the random weight init.
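
A self-contained sketch of this experiment: a 2-2-1 sigmoid network trained on the XOR truth table with incremental updates. The learning rate, epoch count, and the bias weights (added here so the sketch converges reliably) are assumptions beyond what the slide states.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng()
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0.0, 1.0, 1.0, 0.0])            # XOR truth table

# 2-2-1 network; weights (and bias weights) randomly initialized in [-1, 1].
V  = rng.uniform(-1, 1, (2, 2))   # V[j, i]: weight from input i to hidden node j
bh = rng.uniform(-1, 1, 2)        # hidden biases
W  = rng.uniform(-1, 1, 2)        # W[j]: weight from hidden node j to the output
bo = rng.uniform(-1, 1)           # output bias
eta = 0.5

for epoch in range(5000):
    sse = 0.0
    for d in rng.permutation(4):              # incremental updates, shuffled order
        y = sigmoid(V @ X[d] + bh)            # hidden outputs
        o = sigmoid(W @ y + bo)               # network output
        sse += (T[d] - o) ** 2
        delta_o = (T[d] - o) * o * (1 - o)    # output delta
        delta_h = y * (1 - y) * W * delta_o   # hidden deltas (recursion from slide 17)
        W  += eta * delta_o * y               # dw = eta * delta * upstream output
        bo += eta * delta_o
        V  += eta * np.outer(delta_h, X[d])
        bh += eta * delta_h
    if epoch % 1000 == 0:
        print(f"epoch {epoch:4d}   SSE = {sse:.4f}")
```

Because the weights are randomly initialized, different runs trace different error curves, as in the plots above.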

21 Learning to Classify Wines. [Figure: a network mapping wine properties (input layer) through a hidden layer to the wine class (output layer).]

22 Wine Runs. [Plots: sum-squared error versus epoch for four network configurations: 13x5x1, 13x5x1 (lrate = 0.1), 13x10x1, and 13x25x1 (lrate = 0.3).]

23 WARNING: Notation Change. When working with matrices, it is often easier to let w_ij denote the weight on the connection FROM node i TO node j. Then, the normal syntax for matrix row-column indices implies that: the values in row i are written as w_i? and denote the weights on the outputs of node i; the values in column j are written as w_?j and denote the weights on the inputs to node j. [Example 3x3 weight matrix: row 2, (w_21, w_22, w_23), holds the weights from node 2; column 3, (w_13, w_23, w_33), holds the weights to node 3.]

24 Backpropagation using Tensors. Network: X --V--> Y --W--> Z, with R = XV, Y = g(R), S = YW, Z = f(S). X = [x_1, x_2, x_3], Y = [y_1, y_2], Z = [z_1]; f = g = sigmoid. V is the 3x2 matrix of weights v_ij (v_ij = weight from i to j) and W is the 2x1 matrix [w_11; w_21].

25 Goal: Compute ∂Z/∂W. Using the Chain Rule of Tensor Calculus: ∂Z/∂W = ∂f(S)/∂W = (∂f(S)/∂S)(∂S/∂W) = (∂f(S)/∂S) ∂(Y·W)/∂W. f(S) is a sigmoid, with derivative f(S)(1 − f(S)), so simplify to: z_1(1 − z_1) ∂(Y·W)/∂W. Using the product rule: z_1(1 − z_1) ∂(Y·W)/∂W = z_1(1 − z_1) [(∂Y/∂W)·W + Y·(∂W/∂W)]. Y is independent of W, so the first term = 0. But ∂W/∂W is not simply 1:

26 Computing ∂Z/∂W. Y·(∂W/∂W) = (y_1, y_2)·(∂W/∂W) = [y_1; y_2]. Putting it all together: ∂Z/∂W = z_1(1 − z_1) Y·(∂W/∂W) = [z_1(1 − z_1) y_1; z_1(1 − z_1) y_2] = [∂Z/∂w_11; ∂Z/∂w_21]. For each case, c_k, in a minibatch, the forward pass will produce values for X, Y, and Z. Those values will be used to compute actual numeric gradients by filling in these symbolic gradients: (∂Z/∂W)_{c_k} = [z_1(1 − z_1) y_1; z_1(1 − z_1) y_2]_{c_k}
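
A sketch that runs the forward pass for this 3-2-1 example, fills in the symbolic gradient ∂Z/∂W numerically, and checks it against a finite-difference estimate; the random inputs and weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(X, V, W):
    Y = sigmoid(X @ V)        # hidden layer: R = X V, Y = g(R)
    Z = sigmoid(Y @ W)        # output layer: S = Y W, Z = f(S)
    return Y, Z

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 3)          # one case: X = [x1, x2, x3]
V = rng.uniform(-1, 1, (3, 2))    # v_ij = weight from input i to hidden node j
W = rng.uniform(-1, 1, (2, 1))    # w_j1 = weight from hidden node j to the output

Y, Z = forward(X, V, W)
z1 = Z[0]

# Symbolic gradient from the slide: dZ/dW = [z1(1 - z1) y1, z1(1 - z1) y2]^T
dZ_dW = z1 * (1 - z1) * Y.reshape(2, 1)

# Finite-difference check of each entry of W.
eps = 1e-6
numeric = np.zeros_like(W)
for j in range(2):
    Wp = W.copy(); Wp[j, 0] += eps
    Wm = W.copy(); Wm[j, 0] -= eps
    numeric[j, 0] = (forward(X, V, Wp)[1][0] - forward(X, V, Wm)[1][0]) / (2 * eps)

print(np.allclose(dZ_dW, numeric, atol=1e-6))   # expect True
```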

27 Tensor Calculations across Several Layers. Goal: Compute ∂Z/∂V. Expanding using the Chain Rule: ∂Z/∂V = ∂f(S)/∂V = (∂f(S)/∂S)(∂S/∂V) = z_1(1 − z_1) ∂(Y·W)/∂V. Using the Product Rule: ∂(Y·W)/∂V = (∂Y/∂V)·W + Y·(∂W/∂V). W is independent of V, so the second term = 0; and Y = g(R), so the first term = (∂g(R)/∂V)·W. Using the Chain Rule, and R = X·V, yields: (∂g(R)/∂V)·W = [(∂g(R)/∂R)(∂R/∂V)]·W. g(R) is also a sigmoid, so: ∂g(R)/∂R = g(R)(1 − g(R)) = Y(1 − Y) = [y_1(1 − y_1), y_2(1 − y_2)]

28 Computing ∂Z/∂V. Since R = X·V, and using the Product Rule: ∂R/∂V = ∂(X·V)/∂V = (∂X/∂V)·V + X·(∂V/∂V). X is independent of V, so the first term = 0. ∂V/∂V is 4-dimensional: its (i, j) slice is the 3x2 indicator matrix with a 1 in position (i, j) and 0s everywhere else.

29 Computing ∂Z/∂V. Take the dot product of X = [x_1, x_2, x_3] with each internal matrix of ∂V/∂V: X·(∂V/∂V) is the 3x2 array of inner 2-vectors whose (i, j) entry has x_i in position j and 0 elsewhere, i.e. row i = [(x_i, 0), (0, x_i)] for i = 1, 2, 3.

30 Computing ∂Z/∂V. Putting the pieces back together: (∂g(R)/∂R)(∂R/∂V) = [y_1(1 − y_1), y_2(1 − y_2)] applied to X·(∂V/∂V), giving the 3x2 array of inner 2-vectors whose (i, j) entry has x_i y_j(1 − y_j) in position j and 0 elsewhere, i.e. row i = [(x_i y_1(1 − y_1), 0), (0, x_i y_2(1 − y_2))].

31 Computing ∂Z/∂V. (∂g(R)/∂V)·W = [(∂g(R)/∂R)(∂R/∂V)]·W: dotting each inner 2-vector with W = (w_11, w_21) yields the 3x2 matrix [x_1 y_1(1 − y_1) w_11, x_1 y_2(1 − y_2) w_21; x_2 y_1(1 − y_1) w_11, x_2 y_2(1 − y_2) w_21; x_3 y_1(1 − y_1) w_11, x_3 y_2(1 − y_2) w_21] = ∂(Y·W)/∂V

32 Computing ∂Z/∂V ... and finally: ∂Z/∂V = ∂f(S)/∂V = z_1(1 − z_1) ∂(Y·W)/∂V = z_1(1 − z_1) [x_1 y_1(1 − y_1) w_11, x_1 y_2(1 − y_2) w_21; x_2 y_1(1 − y_1) w_11, x_2 y_2(1 − y_2) w_21; x_3 y_1(1 − y_1) w_11, x_3 y_2(1 − y_2) w_21] = [∂Z/∂v_11, ∂Z/∂v_12; ∂Z/∂v_21, ∂Z/∂v_22; ∂Z/∂v_31, ∂Z/∂v_32], i.e. ∂Z/∂v_ij = z_1(1 − z_1) x_i y_j(1 − y_j) w_j1.
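
The same kind of check for the full result: a sketch that fills in ∂Z/∂v_ij = z_1(1 − z_1) x_i y_j(1 − y_j) w_j1 numerically for one case and compares every entry with a finite-difference estimate; the concrete values are again illustrative.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(X, V, W):
    Y = sigmoid(X @ V)        # Y = g(X V)
    Z = sigmoid(Y @ W)        # Z = f(Y W)
    return Y, Z

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, 3)
V = rng.uniform(-1, 1, (3, 2))
W = rng.uniform(-1, 1, (2, 1))
Y, Z = forward(X, V, W)
z1 = Z[0]

# Symbolic result from the slide: dZ/dv_ij = z1(1 - z1) * x_i * y_j(1 - y_j) * w_j1
dZ_dV = z1 * (1 - z1) * np.outer(X, Y * (1 - Y) * W[:, 0])

# Finite-difference check, entry by entry.
eps = 1e-6
numeric = np.zeros_like(V)
for i in range(3):
    for j in range(2):
        Vp = V.copy(); Vp[i, j] += eps
        Vm = V.copy(); Vm[i, j] -= eps
        numeric[i, j] = (forward(X, Vp, W)[1][0] - forward(X, Vm, W)[1][0]) / (2 * eps)

print(np.allclose(dZ_dV, numeric, atol=1e-6))   # expect True
```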

33 Gradients and the Minibatch. For every case c_k in a minibatch, the values of X, Y and Z are computed during the forward pass. The symbolic gradients in ∂Z/∂V are filled in using these values of X, Y and Z (along with W), producing one numeric 3 x 2 matrix per case: (∂Z/∂V)_{c_k}, whose (i, j) entry is z_1(1 − z_1) x_i y_j(1 − y_j) w_j1 evaluated on case c_k.

34 Gradients and the Minibatch. In most Deep Learning situations, the gradients will be based on a loss function, L, not simply the output of the final layer. But that's just one more level of derivative calculations. At the completion of a minibatch, the numeric gradient matrices are added together to yield the complete gradient, which is then used to update the weights. For any weight matrix U in the network, and minibatch M, update weight u_ij as follows: (∂L/∂u_ij)_M = Σ_{c_k ∈ M} (∂L/∂u_ij)_{c_k}, and Δu_ij = −η (∂L/∂u_ij)_M, where η = learning rate. Tensorflow and Theano do all of this for you. Every time you write Deep Learning code, be grateful!!
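
A sketch of this minibatch bookkeeping for the same 3-2-1 network, using a per-case squared-error loss L = (z_1 − t)² as the extra level of derivatives; the minibatch contents, loss choice, and learning rate are assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(3)
V = rng.uniform(-1, 1, (3, 2))
W = rng.uniform(-1, 1, (2, 1))
eta = 0.1
minibatch = [(rng.uniform(0, 1, 3), rng.uniform(0, 1)) for _ in range(8)]  # (X, target) pairs

grad_V = np.zeros_like(V)            # running sums of the per-case gradients
grad_W = np.zeros_like(W)
for X, t in minibatch:
    Y = sigmoid(X @ V)               # forward pass for this case
    z1 = sigmoid(Y @ W)[0]
    dL_dz1 = 2 * (z1 - t)            # derivative of the loss w.r.t. the output
    # Per-case symbolic gradients dL/dW and dL/dV, filled in numerically:
    grad_W += dL_dz1 * z1 * (1 - z1) * Y.reshape(2, 1)
    grad_V += dL_dz1 * z1 * (1 - z1) * np.outer(X, Y * (1 - Y) * W[:, 0])

# Weight update for the whole minibatch: delta_u = -eta * summed gradient.
V -= eta * grad_V
W -= eta * grad_W
```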

35 The General Algorithm: Jacobian Product Chain. [Figure: weights W feed layer M, which lies upstream of layer N; the outputs of layer N are compared with the targets to produce the loss L.] Goal: Compute dL/dW. ∂L/∂W = (∂L/∂N)(∂N/∂(N−1)) ... (∂(M+1)/∂M)(∂M/∂W)

36 Distant Gradients = Sums of Path Products. [Figure: node x_2 of layer X influences node z_3 of layer Z along paths through the nodes y_1, y_2, y_3 of layer Y, each path contributing a product of local derivatives.] More generally: ∂z_3/∂x_2 = (∂z_3/∂y_1)(∂y_1/∂x_2) + (∂z_3/∂y_2)(∂y_2/∂x_2) + (∂z_3/∂y_3)(∂y_3/∂x_2), and ∂z_i/∂x_j = Σ_{y_k ∈ Y} (∂z_i/∂y_k)(∂y_k/∂x_j). This results from the multiplication of two Jacobian matrices.

37 The Jacobian Matrix. J^Z_Y is the m x n matrix whose (i, j) entry is ∂z_i/∂y_j: row i = [∂z_i/∂y_1, ∂z_i/∂y_2, ..., ∂z_i/∂y_n], for i = 1, ..., m.

38 Multiplying Jacobian Matrices. The (i, j) entry of J^Z_Y · J^Y_X is (∂z_i/∂y_1)(∂y_1/∂x_j) + (∂z_i/∂y_2)(∂y_2/∂x_j) + (∂z_i/∂y_3)(∂y_3/∂x_j) + ... = ∂z_i/∂x_j, so J^Z_Y · J^Y_X = J^Z_X. We can do this repeatedly to form J^n_m, the Jacobian relating the activation levels of upstream layer m with those of downstream layer n: R = Identity Matrix; for q = n down to m+1 do: R ← R · J^q_{q−1}; then J^n_m = R.
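
A sketch of the accumulation loop for a small stack of sigmoid layers, where each per-layer Jacobian has entries ∂a_q[j]/∂a_{q−1}[i] = a_q[j](1 − a_q[j]) w_ij; the layer sizes and weights are illustrative assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(4)
sizes = [4, 3, 3, 2]                                  # layer 0 (input) ... layer 3 (output)
Ws = [rng.uniform(-1, 1, (sizes[q - 1], sizes[q]))    # Ws[q-1][i, j]: weight from node i of
      for q in range(1, len(sizes))]                  # layer q-1 to node j of layer q

# Forward pass, keeping every layer's activations.
acts = [rng.uniform(0, 1, sizes[0])]
for W in Ws:
    acts.append(sigmoid(acts[-1] @ W))

def layer_jacobian(q):
    """J of layer q w.r.t. layer q-1: J[j, i] = a_q[j] * (1 - a_q[j]) * Ws[q-1][i, j]."""
    a = acts[q]
    return np.diag(a * (1 - a)) @ Ws[q - 1].T

# Accumulate J_n_m for n = 3 (the output layer) and m = 1.
n, m = 3, 1
R = np.eye(sizes[n])                  # R = identity matrix
for q in range(n, m, -1):             # q = n down to m+1
    R = R @ layer_jacobian(q)         # R <- R . J_{q, q-1}
print(R.shape)                        # (size of layer n) x (size of layer m) = (2, 3)
```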

39 Last of the Jacobians. Once we've computed J^n_m, we need one final Jacobian: J^m_w, where w are the weights feeding into layer m. Then we have the standard Jacobian, J^n_w, needed for updating all weights in matrix w: J^n_w = J^n_m · J^m_w. For the weights V from layer X to Y (from the earlier example), J^Y_V = ∂Y/∂V is the 3x2 array whose (i, j) entry is the inner vector (∂y_1/∂v_ij, ∂y_2/∂v_ij). Since (a) the activation function for Y is sigmoid, and (b) ∂y_k/∂v_ij = 0 when j ≠ k, the (i, j) entry of J^Y_V has y_j(1 − y_j) x_i in position j and 0 elsewhere, i.e. row i = [(y_1(1 − y_1) x_i, 0), (0, y_2(1 − y_2) x_i)]. (On the slide the inner vectors are transposed to fit them on the page.)

40 Last of the Jacobian Multiplications. From the previous example (the network with layer sizes 3-2-1), once we have J^Z_Y and J^Y_V, we can multiply (taking dot products of J^Z_Y with the inner vectors of J^Y_V) to produce J^Z_V: the matrix of gradients that backprop uses to modify the weights in V. J^Z_V = J^Z_Y · J^Y_V, where J^Z_Y = [∂z_1/∂y_1, ∂z_1/∂y_2]. Since (a) the activation function for Z is sigmoid, and (b) sum(z_k) = Σ_i w_ik y_i (where sum(z_k) = sum of weighted inputs to node z_k): J^Z_Y = [z_1(1 − z_1) w_11, z_1(1 − z_1) w_21], and J^Z_V = [z_1(1 − z_1) x_1 y_1(1 − y_1) w_11, z_1(1 − z_1) x_1 y_2(1 − y_2) w_21; z_1(1 − z_1) x_2 y_1(1 − y_1) w_11, z_1(1 − z_1) x_2 y_2(1 − y_2) w_21; z_1(1 − z_1) x_3 y_1(1 − y_1) w_11, z_1(1 − z_1) x_3 y_2(1 − y_2) w_21]

41 First of the Jacobians. This iterative process is very general, and permits the calculation of J^N_M (M < N) for any layers M and N, or J^N_W for any weights (W) upstream of N. However, the standard situation in backpropagation is to compute J^L_{W_i}, where L is the objective (loss) function and the W_i are the weight matrices. This follows the same procedure as sketched above, but the series of dot products begins with J^L_N: the Jacobian of derivatives of the loss function w.r.t. the activations of the output layer. For example, assume L = Mean Squared Error (MSE), Z is an output layer of 3 sigmoid nodes, and the t_i are the target values for those 3 nodes for a particular case (c): L(c) = (1/3) Σ_{i=1}^3 (z_i − t_i)². Taking partial derivatives of L(c) w.r.t. the z_i yields: J^L_Z = ∂L/∂Z = [(2/3)(z_1 − t_1), (2/3)(z_2 − t_2), (2/3)(z_3 − t_3)]

42 First of the Jacobian Multiplications. Continuing the example: assume that layer Y (of size 2) feeds into layer Z, using weights W. Then: J^Z_Y = [z_1(1 − z_1) w_11, z_1(1 − z_1) w_21; z_2(1 − z_2) w_12, z_2(1 − z_2) w_22; z_3(1 − z_3) w_13, z_3(1 − z_3) w_23]. Our first Jacobian multiplication yields J^L_Y, a 1 x 2 row vector: J^L_Z · J^Z_Y = J^L_Y = [(2/3)(z_1 − t_1) z_1(1 − z_1) w_11 + (2/3)(z_2 − t_2) z_2(1 − z_2) w_12 + (2/3)(z_3 − t_3) z_3(1 − z_3) w_13, (2/3)(z_1 − t_1) z_1(1 − z_1) w_21 + (2/3)(z_2 − t_2) z_2(1 − z_2) w_22 + (2/3)(z_3 − t_3) z_3(1 − z_3) w_23]

43 Backpropagation with Tensors: The Big Picture. General Algorithm: Assume Layer M is upstream of Layer N (the output layer), so M < N. Assume V is the tensor of weights feeding into Layer M. Assume L is the loss function. Goal: Compute J^L_V = ∂L/∂V. R ← J^L_N (the partial derivatives of the loss function w.r.t. the output layer). For q = N down to M + 1 do: R ← R · J^q_{q−1}. Then J^L_V ← R · J^M_V. Use J^L_V to update the weights in V.
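
A sketch of the general algorithm for a single case on the 3-2-1 example with an MSE loss: start from J^L_N, multiply layer Jacobians backwards down to layer M, apply J^M_V, and update V. The data, target, and learning rate are illustrative assumptions.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(5)
sizes = [3, 2, 1]                                     # the 3-2-1 example: X, Y, Z
Ws = [rng.uniform(-1, 1, (sizes[q - 1], sizes[q])) for q in range(1, len(sizes))]
x = rng.uniform(0, 1, sizes[0])
target = np.array([0.9])
eta = 0.1

acts = [x]                                            # forward pass, keeping activations
for W in Ws:
    acts.append(sigmoid(acts[-1] @ W))

def layer_jacobian(q):                                # J of layer q w.r.t. layer q-1
    a = acts[q]
    return np.diag(a * (1 - a)) @ Ws[q - 1].T

N = len(sizes) - 1                                    # index of the output layer
M = 1                                                 # V = Ws[0] feeds layer M
R = (2.0 / sizes[N]) * (acts[N] - target)             # J_L_N for MSE: dL/dz_i = (2/n)(z_i - t_i)
R = R.reshape(1, -1)                                  # row vector of shape (1, size of layer N)
for q in range(N, M, -1):                             # R <- R . J_{q, q-1}
    R = R @ layer_jacobian(q)

# Final step, J_M_V: dL/dv_ij = R[j] * a_M[j] * (1 - a_M[j]) * a_{M-1}[i]
aM = acts[M]
dL_dV = np.outer(acts[M - 1], R[0] * aM * (1 - aM))
Ws[0] -= eta * dL_dV                                  # use J_L_V to update the weights in V
print(dL_dV)
```

Repeating this for every weight matrix, and summing the per-case results over a minibatch, gives exactly the update described on the previous slides.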

44 Practical Tips. 1) Only add as many hidden layers and hidden nodes as necessary. Too many means more weights to learn and an increased chance of over-specialization. 2) Scale all input values to the same range, typically [0, 1] or [-1, 1]. 3) Use target values of 0.1 (for zero) and 0.9 (for one) to avoid the saturation effects of sigmoids. 4) Beware of tricky encodings of input (and decodings of output) values. Don't combine too much info into a single node's activation value (even though it's fun to try), since this can make proper weights difficult (or impossible) to learn. 5) For discrete (e.g. nominal) values, one (input or output) node per value is often most effective. E.g. car model and city of residence -vs- income and education for assessing car-insurance risk. 6) All initial weights should be relatively small. 7) Bias nodes can be helpful for complicated data sets. 8) Check that all your layer sizes, activation functions, activation ranges, weight ranges, learning rates, etc. make sense in terms of each other and your goals for the ANN. One improper choice can ruin the results.
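
A small sketch of tips 2 and 3 in code: min-max scaling of each input feature to [0, 1] and softening hard 0/1 targets to 0.1/0.9; the raw data here is a made-up example.

```python
import numpy as np

raw_X = np.array([[12.0, 300.0],      # raw inputs, one row per case,
                  [ 8.0, 150.0],      # with features on very different scales
                  [15.0, 600.0]])
raw_T = np.array([0, 1, 1])           # hard binary targets

lo, hi = raw_X.min(axis=0), raw_X.max(axis=0)
X = (raw_X - lo) / (hi - lo)          # per-feature min-max scaling to [0, 1]
T = np.where(raw_T == 1, 0.9, 0.1)    # softened targets to avoid sigmoid saturation

print(X)
print(T)
```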

45 Supervised Learning in the Cerebellum. [Figure: cerebellar circuit. Mossy fibers carry sensory + cortical inputs to granular cells (regulated by Golgi cells); granular cells send parallel fibers to Purkinje cells; climbing fibers from the inferior olive carry somatosensory (touch, pain, body position) + cortical inputs and an efference copy; Purkinje cells inhibit deep cerebellar neurons, whose outputs go to the spinal cord and to the cerebral cortex.] Granular cells detect contexts. Parallel fibers and Purkinje cells map contexts to actions. Climbing fibers from the Inferior Olive provide (supervisory) feedback signals for LTD on parallel-fiber/Purkinje synapses.
