CS 453X: Class 20. Jacob Whitehill
1 CS 453X: Class 20. Jacob Whitehill
2 More on training neural networks
3 Training neural networks While training neural networks by hand is (arguably) fun, it is completely impractical except for toy examples. How can we find good values for the weights and bias terms automatically? Gradient descent. [Diagram: a 3-layer NN with inputs x1, …, xm, hidden units z1, …, zk, outputs ŷ1, …, ŷl, weights W(1), W(2), and biases b(1), b(2).]
5 Gradient descent: XOR problem Here is how we can conduct gradient descent for the XOR problem. Let's first define: W(1) = [w1 w2; w3 w4], W(2) = [w5 w6], b(1) = [b1; b2], b(2) = b3. Then we can define g so that: ŷ = g(x) = W(2) σ(W(1) x + b(1)) + b(2) = [w5 w6] σ([w1 w2; w3 w4] [x1; x2] + [b1; b2]) + b3 = w5 σ(w1 x1 + w2 x2 + b1) + w6 σ(w3 x1 + w4 x2 + b2) + b3
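The forward computation above can be sketched in a few lines of NumPy. This is a minimal sketch, not code from the lecture; it assumes σ is the ReLU (as used later in these slides), and the particular weight values below are one hypothetical setting that happens to solve XOR.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def forward(x, W1, b1, W2, b2):
    """Compute yhat = W2 @ relu(W1 @ x + b1) + b2 for a 2-2-1 network."""
    z = W1 @ x + b1        # pre-activations of the hidden layer
    h = relu(z)            # hidden activations
    return W2 @ h + b2     # scalar output yhat

# Hypothetical weights that solve XOR with ReLU hidden units:
W1 = np.array([[1., 1.], [1., 1.]])
b1 = np.array([0., -1.])
W2 = np.array([1., -2.])
b2 = 0.

for x1 in (0, 1):
    for x2 in (0, 1):
        yhat = forward(np.array([float(x1), float(x2)]), W1, b1, W2, b2)
        print(x1, x2, yhat)   # yhat is 0, 1, 1, 0 on the four XOR inputs
```

With these weights, both hidden units compute x1 + x2, the second one shifted down by 1, and the output layer subtracts twice the second unit from the first, which reproduces XOR exactly.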
8 Gradient descent: XOR problem From ŷ, we can compute the fMSE cost as: fMSE(ŷ; w1, w2, w3, w4, w5, w6, b1, b2, b3) = (1/2)(ŷ − y)². We then calculate the derivative of fMSE w.r.t. each parameter p using the chain rule as: ∂fMSE/∂p = (∂fMSE/∂ŷ)(∂ŷ/∂p), where: ∂fMSE/∂ŷ = (ŷ − y)
10 Gradient descent: XOR problem Now we just have to differentiate ŷ = g(x) w.r.t. each parameter p, e.g.: ∂ŷ/∂w1 = ∂/∂w1 [w5 σ(w1 x1 + w2 x2 + b1) + w6 σ(w3 x1 + w4 x2 + b2) + b3] = w5 σ′(w1 x1 + w2 x2 + b1) x1. The first term is the only one that depends on w1. It has the form c σ(z), where z is a function of w1. Recall that ∂/∂w1 [c σ(z)] = c σ′(z) ∂z/∂w1. For ReLU: σ′(z) = 0 if z ≤ 0, and 1 if z > 0.
16 Gradient descent: XOR problem Hence: ∂ŷ/∂w1 = 0 if w1 x1 + w2 x2 + b1 ≤ 0, and w5 x1 if w1 x1 + w2 x2 + b1 > 0, since σ′(z) = 0 if z ≤ 0 and 1 if z > 0 for ReLU.
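A piecewise expression like this is easy to sanity-check numerically. The sketch below (not from the lecture; the parameter values are arbitrary) compares the hand-derived ∂ŷ/∂w1 against a central finite-difference approximation, assuming ReLU for σ:

```python
def relu(z):
    return max(z, 0.0)

def yhat(w1, w2, w3, w4, w5, w6, b1, b2, b3, x1, x2):
    return (w5 * relu(w1*x1 + w2*x2 + b1)
            + w6 * relu(w3*x1 + w4*x2 + b2) + b3)

def dyhat_dw1(w1, w2, w3, w4, w5, w6, b1, b2, b3, x1, x2):
    # w5 * sigma'(z1) * x1, where sigma'(z) = 1 if z > 0 else 0 for ReLU
    z1 = w1*x1 + w2*x2 + b1
    return w5 * (1.0 if z1 > 0 else 0.0) * x1

# Arbitrary illustrative parameter values (w1..w6, b1..b3) and input:
params = (0.3, -0.2, 0.5, 0.1, 0.7, -0.4, 0.05, 0.02, 0.1)
x1, x2 = 1.0, 1.0

eps = 1e-6
numeric = (yhat(params[0] + eps, *params[1:], x1, x2)
           - yhat(params[0] - eps, *params[1:], x1, x2)) / (2 * eps)
analytic = dyhat_dw1(*params, x1, x2)
print(abs(numeric - analytic))   # difference should be tiny
```

The finite-difference check breaks down exactly at the ReLU kink (z1 = 0); away from it, the two values agree to many decimal places.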
17 Exercise ∂ŷ/∂b2 = ∂/∂b2 [w5 σ(w1 x1 + w2 x2 + b1) + w6 σ(w3 x1 + w4 x2 + b2) + b3] = w6 σ′(w3 x1 + w4 x2 + b2) = 0 if w3 x1 + w4 x2 + b2 ≤ 0, and w6 if w3 x1 + w4 x2 + b2 > 0.
19 Gradient descent: XOR problem We also need to derive the other partial derivatives ∂ŷ/∂p. Based on these, we can compute all nine gradient terms ∂fMSE/∂p = (ŷ − y) ∂ŷ/∂p, for p ∈ {w1, …, w6, b1, b2, b3}.
20 Gradient descent: XOR problem Given the gradients, we can conduct gradient descent: For each epoch: For each minibatch: Compute all 9 partial derivatives (summed over all examples in the minibatch). Update w1: w1 ← w1 − ε ∂fMSE/∂w1. Update w6: w6 ← w6 − ε ∂fMSE/∂w6. Update b1: b1 ← b1 − ε ∂fMSE/∂b1. Update b3: b3 ← b3 − ε ∂fMSE/∂b3. (And similarly for the remaining parameters.)
21 Gradient descent: (any 3-layer NN) It is more compact and efficient to represent and compute these gradients more abstractly: ∇W(1) fMSE, ∇W(2) fMSE, ∇b(1) fMSE, ∇b(2) fMSE, where ∇W(1) fMSE is the matrix whose (i, j)th entry is ∂fMSE/∂W(1)ij, etc.
22 Gradient descent (any 3-layer NN) Given the gradients, we can conduct gradient descent: For each epoch: For each minibatch: Compute all gradient terms (summed over all examples in the minibatch). Update W(1): W(1) ← W(1) − ε ∇W(1) fMSE. Update W(2): W(2) ← W(2) − ε ∇W(2) fMSE. Update b(1): b(1) ← b(1) − ε ∇b(1) fMSE. Update b(2): b(2) ← b(2) − ε ∇b(2) fMSE.
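The vectorized updates above can be sketched in NumPy. This is an illustrative sketch, not the lecture's code: it assumes ReLU hidden units and the (1/2)(ŷ − y)² cost from the earlier slides, uses the four XOR examples as one minibatch, and checks the analytic gradient of W(1) against finite differences before taking a single gradient-descent step.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR minibatch: columns are examples
X = np.array([[0., 0., 1., 1.],
              [0., 1., 0., 1.]])
y = np.array([0., 1., 1., 0.])

# Randomly initialized parameters of a 2-2-1 network
W1 = rng.normal(size=(2, 2)); b1 = rng.normal(size=2)
W2 = rng.normal(size=2);      b2 = 0.5
lr = 0.1

def fmse(W1, b1, W2, b2):
    H = np.maximum(W1 @ X + b1[:, None], 0)   # hidden activations (ReLU)
    yhat = W2 @ H + b2
    return 0.5 * np.mean((yhat - y) ** 2)

# Analytic gradients, averaged over the minibatch
Z = W1 @ X + b1[:, None]
H = np.maximum(Z, 0)
err = (W2 @ H + b2 - y) / len(y)      # dfMSE/dyhat for each example
gW2 = H @ err
gb2 = err.sum()
dZ = np.outer(W2, err) * (Z > 0)      # backprop through the ReLU
gW1 = dZ @ X.T
gb1 = dZ.sum(axis=1)

# Check gW1 against central finite differences
eps = 1e-6
num = np.zeros_like(W1)
for i in range(2):
    for j in range(2):
        Wp = W1.copy(); Wp[i, j] += eps
        Wm = W1.copy(); Wm[i, j] -= eps
        num[i, j] = (fmse(Wp, b1, W2, b2) - fmse(Wm, b1, W2, b2)) / (2 * eps)
diff = np.abs(num - gW1).max()
print(diff)   # should be tiny

# One gradient-descent update
W1 = W1 - lr * gW1; b1 = b1 - lr * gb1
W2 = W2 - lr * gW2; b2 = b2 - lr * gb2
```

The epoch/minibatch loops from the slide would simply wrap the gradient computation and the four update lines.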
23 Deriving the gradient terms In the XOR problem, we derived each element of each gradient term by hand. For large networks with many layers, this would be prohibitively tedious. Fortunately, we can largely automate the process of computing each gradient term ∇W(1) fMSE, ∇W(2) fMSE, …, ∇W(d) fMSE (for d layers). This procedure is enabled by the chain rule of multivariate calculus.
24 Jacobian matrices & the chain rule of multivariate calculus
25 Jacobian matrices Consider a function f that takes a vector as input and produces a vector as output, e.g.: f([x1; x2]) = [2x1 + x2; x1/x2]. What is the set of f's partial derivatives? ∂f1/∂x1 = 2, ∂f1/∂x2 = 1, ∂f2/∂x1 = 1/x2, ∂f2/∂x2 = −x1/x2². Arranged in a matrix, these form the Jacobian matrix: [2, 1; 1/x2, −x1/x2²].
28 Jacobian matrices For any function f: Rm → Rn, we can define the Jacobian matrix of all partial derivatives: ∂f/∂x is the n × m matrix whose (i, j)th entry is ∂fi/∂xj. Columns correspond to the inputs to f; rows correspond to the outputs of f.
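A Jacobian can also be estimated numerically, which is a handy check on hand derivations. The sketch below (my own illustration, not from the slides) estimates the Jacobian of a small example function f(x) = (2x1 + x2, x1/x2) by central differences and compares it to the analytic matrix:

```python
import numpy as np

def f(x):
    # Example vector-valued function f: R^2 -> R^2 (assumed for illustration)
    return np.array([2*x[0] + x[1], x[0] / x[1]])

def jacobian_fd(fn, x, eps=1e-6):
    """Numerically estimate the Jacobian: rows are outputs, columns are inputs."""
    n = len(fn(x))
    J = np.zeros((n, len(x)))
    for j in range(len(x)):
        d = np.zeros_like(x); d[j] = eps
        J[:, j] = (fn(x + d) - fn(x - d)) / (2 * eps)
    return J

x = np.array([1.0, 2.0])
analytic = np.array([[2.0,    1.0],
                     [1/x[1], -x[0]/x[1]**2]])
print(np.round(jacobian_fd(f, x), 6))
```

Note the row/column convention: each row of the estimated matrix is one output of f, each column one input, matching the definition above.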
29 Chain rule of multivariate calculus Suppose f: Rm → Rn and g: Rk → Rm. Then the Jacobian of the composition f(g(x)) is the matrix product ∂f(g(x))/∂x = (∂f/∂g)(∂g/∂x), where ∂f/∂g is the n × m Jacobian of f and ∂g/∂x is the m × k Jacobian of g. Note that the order matters!
32 Chain rule of multivariate calculus: example Suppose f: R3 → R and g: R2 → R3, and we want the Jacobian of f(g(x)). Two alternative methods to derive it: 1. Substitute g into f and differentiate the resulting expression directly. 2. Apply the chain rule: multiply the (1 × 3) Jacobian of f, evaluated at g(x), by the (3 × 2) Jacobian of g, to obtain the (1 × 2) Jacobian of the composition. Both methods give the same answer.
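The equivalence of the two methods is easy to verify numerically for any concrete choice of f and g. The sketch below uses a hypothetical pair f: R3 → R and g: R2 → R3 (chosen for illustration, not the example from the slides) and checks that the Jacobian of the composition equals the product of the two Jacobians:

```python
import numpy as np

def jacobian_fd(fn, x, eps=1e-6):
    """Central-difference Jacobian: rows are outputs, columns are inputs."""
    n = len(fn(x))
    J = np.zeros((n, len(x)))
    for j in range(len(x)):
        d = np.zeros_like(x); d[j] = eps
        J[:, j] = (fn(x + d) - fn(x - d)) / (2 * eps)
    return J

# Hypothetical g: R^2 -> R^3 and f: R^3 -> R^1, so f(g(x)): R^2 -> R^1
def g(x):
    return np.array([x[0]**2, x[0]*x[1], np.sin(x[1])])

def f(a):
    return np.array([a[0] + 2*a[1] + 3*a[2]])

x = np.array([0.7, -1.2])
lhs = jacobian_fd(lambda t: f(g(t)), x)            # method 1: differentiate f(g(x))
rhs = jacobian_fd(f, g(x)) @ jacobian_fd(g, x)     # method 2: (1x3) @ (3x2) = (1x2)
print(np.allclose(lhs, rhs, atol=1e-5))            # prints True
```

Note that the Jacobian of f must be evaluated at g(x), not at x, and that it multiplies the Jacobian of g from the left, as the slide emphasizes.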
43 f: Rm → R: Jacobian versus gradient Note, for a real-valued function f: Rm → R, the Jacobian matrix is the transpose of the gradient: the gradient ∇x f(x) = [∂f/∂x1; …; ∂f/∂xm] is an m × 1 column vector, while the Jacobian [∂f/∂x1, …, ∂f/∂xm] is a 1 × m row vector.
44 Jacobian matrices Note that we sometimes examine the same function from different perspectives, e.g.: f(x; w) = f(x, y; a, b) = 2ax + b/y. We can consider x, y to be the variables and a, b to be constants, in which case ∂f/∂(x, y) = [2a, −b/y²]. Alternatively, we can consider a, b to be the variables and x, y to be constants, in which case ∂f/∂(a, b) = [2x, 1/y].
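The two perspectives can be checked numerically. The sketch below differentiates f(x, y; a, b) = 2ax + b/y by central differences, once with respect to the inputs (x, y) and once with respect to the parameters (a, b), at arbitrary illustrative values:

```python
def f(x, y, a, b):
    return 2*a*x + b/y

x, y, a, b = 1.5, 2.0, 0.5, 3.0
eps = 1e-6

# Jacobian w.r.t. the inputs (x, y), holding (a, b) fixed:
d_x = (f(x+eps, y, a, b) - f(x-eps, y, a, b)) / (2*eps)   # should be 2a
d_y = (f(x, y+eps, a, b) - f(x, y-eps, a, b)) / (2*eps)   # should be -b/y^2

# Jacobian w.r.t. the parameters (a, b), holding (x, y) fixed:
d_a = (f(x, y, a+eps, b) - f(x, y, a-eps, b)) / (2*eps)   # should be 2x
d_b = (f(x, y, a, b+eps) - f(x, y, a, b-eps)) / (2*eps)   # should be 1/y

print(d_x, d_y, d_a, d_b)
```

When training a neural network, it is the second perspective that matters: the inputs are fixed data and the derivatives are taken with respect to the parameters.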
47 Computing the gradients Jacobian matrices and the chain rule provide a recipe for how to compute all the gradient terms efficiently. Consider the 3-layer NN below: From x, W(1), and b(1), we can compute z. From z and σ, we can compute h = σ(z). From h, W(2), and b(2), we can compute ŷ.
50 Computing the gradients This process is known as forward propagation. It produces all the intermediary (z, h) and final (ŷ) network outputs.
51 Computing the gradients Forward propagation can be applied to a NN of arbitrary depth; in the NN below, with hidden layers 1 and 2 (weights W(1), W(2), W(3) and biases b(1), b(2), b(3)), it would produce z(1), h(1), z(2), h(2), and ŷ.
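Forward propagation over arbitrary depth is just a short loop. Below is a minimal sketch (assuming ReLU hidden layers and a linear output layer; the layer sizes are illustrative) that also records the intermediate z's and h's, since these are exactly the quantities the gradient computation will reuse:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def forward(x, Ws, bs):
    """Forward propagation through a network of arbitrary depth.

    Ws, bs: lists of weight matrices / bias vectors, one per layer.
    Returns yhat plus the intermediate z's and h's needed for the gradients.
    """
    zs, hs = [], []
    h = x
    for W, b in zip(Ws[:-1], bs[:-1]):
        z = W @ h + b          # pre-activations of this hidden layer
        h = relu(z)            # activations of this hidden layer
        zs.append(z); hs.append(h)
    yhat = Ws[-1] @ h + bs[-1]  # linear output layer (no nonlinearity here)
    return yhat, zs, hs

rng = np.random.default_rng(1)
sizes = [3, 4, 4, 2]   # m -> k -> q -> l, as in the two-hidden-layer diagram
Ws = [rng.normal(size=(sizes[i+1], sizes[i])) for i in range(3)]
bs = [rng.normal(size=sizes[i+1]) for i in range(3)]

yhat, zs, hs = forward(rng.normal(size=3), Ws, bs)
print(yhat.shape, [z.shape for z in zs])
```

For the two-hidden-layer network of the slide, zs and hs hold z(1), h(1), z(2), h(2) in order.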
52 Computing the gradients Now, let's look at how to compute each gradient term (e.g., ∂fMSE/∂W(2), ∂fMSE/∂W(1)) for the simple NN below. If we derived each term independently, we would repeat much of the same work: the terms for earlier layers contain the same intermediate derivatives as the terms for later layers. This is redundant computation.
54 Computing the gradients Here is how we can compute all these terms efficiently: proceed backward from the output f toward the input, computing each intermediate derivative once and reusing it for every gradient term that depends on it.
61 Computing the gradients For the simple NN below, let's see how we can use the chain rule to compute each gradient term, starting with ∂ŷ/∂W(2). How does ŷ depend on W(2)? ŷ = W(2) h = W(2)1 h1 + W(2)2 h2 ⟹ ∂ŷ/∂W(2) = [h1, h2] = h^T.
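Because ŷ = W(2) h is linear in W(2), each partial ∂ŷ/∂W(2)i is simply hi, so the row vector of partials is h^T. The sketch below (with hypothetical values for h and W(2)) confirms this by finite differences:

```python
import numpy as np

h = np.array([0.3, 1.5])     # hidden activations (hypothetical values)
W2 = np.array([0.8, -0.6])   # output-layer weights (hypothetical values)
eps = 1e-6

# Finite-difference partials of yhat = W2 . h with respect to each W2 entry
grad = np.zeros(2)
for i in range(2):
    d = np.zeros(2); d[i] = eps
    grad[i] = ((W2 + d) @ h - (W2 - d) @ h) / (2 * eps)

print(grad)   # matches h, i.e. the Jacobian dyhat/dW2 is the row vector h^T
```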
More informationIntroduction to Machine Learning Spring 2018 Note Neural Networks
CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,
More informationMultilayer Neural Networks
Multilayer Neural Networks Introduction Goal: Classify objects by learning nonlinearity There are many problems for which linear discriminants are insufficient for minimum error In previous methods, the
More information. D Matrix Calculus D 1
D Matrix Calculus D 1 Appendix D: MATRIX CALCULUS D 2 In this Appendix we collect some useful formulas of matrix calculus that often appear in finite element derivations D1 THE DERIVATIVES OF VECTOR FUNCTIONS
More informationCSC321 Lecture 6: Backpropagation
CSC321 Lecture 6: Backpropagation Roger Grosse Roger Grosse CSC321 Lecture 6: Backpropagation 1 / 21 Overview We ve seen that multilayer neural networks are powerful. But how can we actually learn them?
More informationVector, Matrix, and Tensor Derivatives
Vector, Matrix, and Tensor Derivatives Erik Learned-Miller The purpose of this document is to help you learn to take derivatives of vectors, matrices, and higher order tensors (arrays with three dimensions
More informationIntroduction to Machine Learning. Recitation 8. w 2, b 2. w 1, b 1. z 0 z 1. The function we want to minimize is the loss over all examples: f =
Introduction to Macine Learning Lecturer: Regev Scweiger Recitation 8 Fall Semester Scribe: Regev Scweiger 8.1 Backpropagation We will develop and review te backpropagation algoritm for neural networks.
More informationSPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks
Topics in Machine Learning-EE 5359 Neural Networks 1 The Perceptron Output: A perceptron is a function that maps D-dimensional vectors to real numbers. For notational convenience, we add a zero-th dimension
More informationLecture 5: Logistic Regression. Neural Networks
Lecture 5: Logistic Regression. Neural Networks Logistic regression Comparison with generative models Feed-forward neural networks Backpropagation Tricks for training neural networks COMP-652, Lecture
More informationComputational Graphs, and Backpropagation
Chapter 1 Computational Graphs, and Backpropagation (Course notes for NLP by Michael Collins, Columbia University) 1.1 Introduction We now describe the backpropagation algorithm for calculation of derivatives
More informationLinear Neural Networks
Chapter 10 Linear Neural Networks In this chapter, we introduce the concept of the linear neural network. 10.1 Introduction and Notation 1. The linear neural cell, or node has the schematic form as shown
More informationCOGS Q250 Fall Homework 7: Learning in Neural Networks Due: 9:00am, Friday 2nd November.
COGS Q250 Fall 2012 Homework 7: Learning in Neural Networks Due: 9:00am, Friday 2nd November. For the first two questions of the homework you will need to understand the learning algorithm using the delta
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationNN V: The generalized delta learning rule
NN V: The generalized delta learning rule We now focus on generalizing the delta learning rule for feedforward layered neural networks. The architecture of the two-layer network considered below is shown
More informationNonlinear equations and optimization
Notes for 2017-03-29 Nonlinear equations and optimization For the next month or so, we will be discussing methods for solving nonlinear systems of equations and multivariate optimization problems. We will
More informationComputational statistics
Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial
More informationClassification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton Motivation Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses
More informationΝεςπο-Ασαυήρ Υπολογιστική Neuro-Fuzzy Computing
Νεςπο-Ασαυήρ Υπολογιστική Neuro-Fuzzy Computing ΗΥ418 Διδάσκων Δημήτριος Κατσαρός @ Τμ. ΗΜΜΥ Πανεπιστήμιο Θεσσαλίαρ Διάλεξη 21η BackProp for CNNs: Do I need to understand it? Why do we have to write the
More information7. The Multivariate Normal Distribution
of 5 7/6/2009 5:56 AM Virtual Laboratories > 5. Special Distributions > 2 3 4 5 6 7 8 9 0 2 3 4 5 7. The Multivariate Normal Distribution The Bivariate Normal Distribution Definition Suppose that U and
More informationMark Gales October y (x) x 1. x 2 y (x) Inputs. Outputs. x d. y (x) Second Output layer layer. layer.
University of Cambridge Engineering Part IIB & EIST Part II Paper I0: Advanced Pattern Processing Handouts 4 & 5: Multi-Layer Perceptron: Introduction and Training x y (x) Inputs x 2 y (x) 2 Outputs x
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More information4. Multilayer Perceptrons
4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output
More informationChapter 2. Optimization. Gradients, convexity, and ALS
Chapter 2 Optimization Gradients, convexity, and ALS Contents Background Gradient descent Stochastic gradient descent Newton s method Alternating least squares KKT conditions 2 Motivation We can solve
More informationSingle layer NN. Neuron Model
Single layer NN We consider the simple architecture consisting of just one neuron. Generalization to a single layer with more neurons as illustrated below is easy because: M M The output units are independent
More information