Neural Networks: Backpropagation
1 Neural Networks: Backpropagation
Machine Learning, Fall 2017
Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
2 Neural Networks
What is a neural network?
Predicting with a neural network
Training neural networks
Practical concerns
3 This lecture
What is a neural network?
Predicting with a neural network
Training neural networks: Backpropagation
Practical concerns
4 Training a neural network
Given:
A network architecture (the layout of neurons, their connectivity and activations)
A dataset of labeled examples S = {(x_i, y_i)}
The goal: learn the weights of the neural network.
Remember: for a fixed architecture, a neural network is a function parameterized by its weights w.
Prediction: ŷ = NN(x, w)
6 Recall: Learning as loss minimization
We have a classifier NN that is completely defined by its weights.
Learn the weights by minimizing a loss L:
    min_w Σ_i L(NN(x_i, w), y_i)
perhaps with a regularizer.
So far, we saw that this strategy worked for:
1. Logistic regression
2. Support vector machines
3. Perceptron
4. LMS regression
Each of these minimizes a different loss function, and all of them are linear models. The same idea works for non-linear models too!
9 Back to our running example
Given an input x, how is the output predicted?
    ŷ = w^2_01 + w^2_11 z_1 + w^2_21 z_2
    z_1 = σ(w_01 + w_11 x_1 + w_21 x_2)
    z_2 = σ(w_02 + w_12 x_1 + w_22 x_2)
(Here a superscript 2 marks a weight of the second, output layer.)
Suppose the true label for this example is a number y. We can write the square loss for this example as:
    L = ½ (y − ŷ)²
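To make the running example concrete, here is a minimal sketch in plain Python of the forward pass and the squared loss for this two-layer network; the weight and input values are made up purely for illustration.

```python
import math

def sigma(s):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical weights and input, just for illustration
w = {"01": 0.1, "11": 0.4, "21": -0.3,        # hidden unit z1
     "02": -0.2, "12": 0.25, "22": 0.7,       # hidden unit z2
     "2_01": 0.05, "2_11": 0.6, "2_21": -0.1} # output layer
x1, x2 = 1.0, 2.0
y_true = 0.5

# Forward pass
z1 = sigma(w["01"] + w["11"] * x1 + w["21"] * x2)
z2 = sigma(w["02"] + w["12"] * x1 + w["22"] * x2)
y_hat = w["2_01"] + w["2_11"] * z1 + w["2_21"] * z2

# Squared loss for this single example
L = 0.5 * (y_true - y_hat) ** 2
print(z1, z2, y_hat, L)
```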
10 Learning as loss minimization
We have a classifier NN that is completely defined by its weights.
Learn the weights by minimizing a loss L:
    min_w Σ_i L(NN(x_i, w), y_i)
perhaps with a regularizer.
How do we solve the optimization problem?
18 Stochastic gradient descent
Goal: min_w Σ_i L(NN(x_i, w), y_i)
Given a training set S = {(x_i, y_i)}, x ∈ R^d:
1. Initialize parameters w
2. For epoch = 1 ... T:
   1. Shuffle the training set
   2. For each training example (x_i, y_i) ∈ S:
      Treat this example as the entire dataset.
      Compute the gradient of the loss ∇L(NN(x_i, w), y_i)
      Update: w ← w − γ_t ∇L(NN(x_i, w), y_i)
3. Return w
γ_t: the learning rate; many tweaks possible.
The objective is not convex, so initialization can be important.
Have we solved everything?
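As a sketch of the recipe above, here is a minimal stochastic gradient descent loop in Python for a linear model with squared loss (the gradient for a neural network would instead come from backpropagation, covered next). The dataset, learning rate and epoch count are all illustrative assumptions.

```python
import random

def sgd(S, d, epochs=20, lr=0.1):
    """S: list of (x, y) pairs, x a list of d features, y a number."""
    w = [0.0] * d                      # 1. initialize parameters
    for epoch in range(epochs):        # 2. for epoch = 1 ... T
        random.shuffle(S)              #    shuffle the training set
        for x, y in S:                 #    treat each example as the dataset
            y_hat = sum(wj * xj for wj, xj in zip(w, x))
            # gradient of the squared loss 0.5*(y - y_hat)^2 w.r.t. w
            grad = [-(y - y_hat) * xj for xj in x]
            w = [wj - lr * gj for wj, gj in zip(w, grad)]  # update step
    return w                           # 3. return w

# Tiny illustrative dataset, roughly y = 2*x1 - x2
S = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
print(sgd(S, d=2))
```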
19 The derivative of the loss function?
If the neural network is a differentiable function, we can find the gradient (or maybe its sub-gradient). This is decided by the activation functions and the loss function.
It was easy for SVMs and logistic regression: only one layer.
But how do we find the (sub-)gradient of the loss L(NN(x_i, w), y_i) for a more complex function?
E.g., a recent paper used a ~150-layer neural network for image classification!
We need an efficient algorithm: Backpropagation
22 Checkpoint: where are we?
If we have a neural network (structure, activations and weights), we can make a prediction for an input.
If we have the true label of the input, then we can define the loss for that example.
If we can take the derivative of the loss with respect to each of the weights, we can take a gradient step in SGD.
Questions?
27 Reminder: Chain rule for derivatives
If z is a function of y, and y is a function of x, then z is a function of x as well.
Question: how do we find dz/dx?
    dz/dx = (dz/dy)(dy/dx)
Slide courtesy Richard Socher
28 Reminder: Chain rule for derivatives
If z = (a function of y_1) + (a function of y_2), and the y_i's are functions of x, then z is a function of x as well.
Question: how do we find dz/dx?
    dz/dx = (∂z/∂y_1)(dy_1/dx) + (∂z/∂y_2)(dy_2/dx)
Slide courtesy Richard Socher
29 Reminder: Chain rule for derivatives
If z is a sum of functions of the y_i's, and the y_i's are functions of x, then z is a function of x as well.
Question: how do we find dz/dx?
    dz/dx = Σ_i (∂z/∂y_i)(dy_i/dx)
Slide courtesy Richard Socher
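A quick numerical sanity check of this multivariable chain rule; the particular functions are made up for illustration. With z = y_1² + 3 y_2, y_1 = sin(x), y_2 = x², the rule predicts dz/dx = 2 y_1 cos(x) + 6x, which we can compare against a finite-difference estimate:

```python
import math

def z_of_x(x):
    y1, y2 = math.sin(x), x ** 2
    return y1 ** 2 + 3.0 * y2

def dz_dx_chain(x):
    # dz/dx = (dz/dy1)(dy1/dx) + (dz/dy2)(dy2/dx)
    y1 = math.sin(x)
    return 2.0 * y1 * math.cos(x) + 3.0 * 2.0 * x

x, eps = 0.7, 1e-6
numeric = (z_of_x(x + eps) - z_of_x(x - eps)) / (2 * eps)
print(dz_dx_chain(x), numeric)   # the two numbers should agree closely
```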
32 Backpropagation
Applying the chain rule to compute the gradient (and remembering partial computations along the way to speed things up).
    L = ½ (y − ŷ)²
    ŷ = w^2_01 + w^2_11 z_1 + w^2_21 z_2
    z_1 = σ(w_01 + w_11 x_1 + w_21 x_2)
    z_2 = σ(w_02 + w_12 x_1 + w_22 x_2)
We want to compute ∂L/∂w for every weight w, in both the output layer and the hidden layer.
36 Backpropagation example: output layer
    L = ½ (y − ŷ)²
    ŷ = w^2_01 + w^2_11 z_1 + w^2_21 z_2
By the chain rule:
    ∂L/∂w^2_01 = (∂L/∂ŷ)(∂ŷ/∂w^2_01)
    ∂L/∂ŷ = −(y − ŷ)
    ∂ŷ/∂w^2_01 = 1
41 Backpropagation example: output layer
    L = ½ (y − ŷ)²
    ŷ = w^2_01 + w^2_11 z_1 + w^2_21 z_2
By the chain rule:
    ∂L/∂w^2_11 = (∂L/∂ŷ)(∂ŷ/∂w^2_11)
    ∂L/∂ŷ = −(y − ŷ)
    ∂ŷ/∂w^2_11 = z_1
We have already computed the partial derivative ∂L/∂ŷ for the previous case. Cache it to speed things up!
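Continuing the earlier numerical sketch (same hypothetical weights and input as before; the output-layer weights are named v01, v11, v21 in code for w^2_01, w^2_11, w^2_21), the output-layer gradients are just these two factors multiplied together, with ∂L/∂ŷ computed once and reused:

```python
import math

def sigma(s):
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical weights and input, as in the forward-pass sketch
w01, w11, w21 = 0.1, 0.4, -0.3           # hidden unit z1
w02, w12, w22 = -0.2, 0.25, 0.7          # hidden unit z2
v01, v11, v21 = 0.05, 0.6, -0.1          # output layer (w^2_01, w^2_11, w^2_21)
x1, x2, y = 1.0, 2.0, 0.5

# Forward pass
z1 = sigma(w01 + w11 * x1 + w21 * x2)
z2 = sigma(w02 + w12 * x1 + w22 * x2)
y_hat = v01 + v11 * z1 + v21 * z2

# dL/dy_hat is shared by all output-layer gradients: cache it
dL_dyhat = -(y - y_hat)

grad_v01 = dL_dyhat * 1.0   # dy_hat/dw^2_01 = 1
grad_v11 = dL_dyhat * z1    # dy_hat/dw^2_11 = z1
grad_v21 = dL_dyhat * z2    # dy_hat/dw^2_21 = z2
print(grad_v01, grad_v11, grad_v21)
```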
43 Backpropagation example: hidden layer derivatives
    L = ½ (y − ŷ)²
    ŷ = w^2_01 + w^2_11 z_1 + w^2_21 z_2
    z_1 = σ(w_01 + w_11 x_1 + w_21 x_2)
    z_2 = σ(w_02 + w_12 x_1 + w_22 x_2)
We want ∂L/∂w_22.
44-51 Backpropagation example: hidden layer
    L = ½ (y − ŷ)²
    ŷ = w^2_01 + w^2_11 z_1 + w^2_21 z_2
Applying the chain rule step by step:
    ∂L/∂w_22 = (∂L/∂ŷ)(∂ŷ/∂w_22)
             = (∂L/∂ŷ) ∂/∂w_22 (w^2_01 + w^2_11 z_1 + w^2_21 z_2)
             = (∂L/∂ŷ) (w^2_11 ∂z_1/∂w_22 + w^2_21 ∂z_2/∂w_22)
Here ∂z_1/∂w_22 = 0, because z_1 is not a function of w_22, so
    ∂L/∂w_22 = (∂L/∂ŷ) w^2_21 ∂z_2/∂w_22
Now z_2 = σ(w_02 + w_12 x_1 + w_22 x_2). Call the argument s, so that z_2 = σ(s). Then
    ∂L/∂w_22 = (∂L/∂ŷ) w^2_21 (∂z_2/∂s)(∂s/∂w_22)
57 Backpropagation example: hidden layer
    ∂L/∂w_22 = (∂L/∂ŷ) w^2_21 (∂z_2/∂s)(∂s/∂w_22), where z_2 = σ(s) and s = w_02 + w_12 x_1 + w_22 x_2.
Each of these partial derivatives is easy:
    ∂L/∂ŷ = −(y − ŷ)
    ∂z_2/∂s = z_2 (1 − z_2)   (because z_2 = σ(s) is the logistic function we have already seen)
    ∂s/∂w_22 = x_2
More important: we have already computed many of these partial derivatives, because we are proceeding from the top of the network to the bottom (i.e. backwards).
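The same factors in code, reusing the cached ∂L/∂ŷ from the output-layer step; as before, the weight and input values are hypothetical and v21 stands for w^2_21:

```python
import math

def sigma(s):
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical weights and input, as in the earlier sketches
w01, w11, w21 = 0.1, 0.4, -0.3          # hidden unit z1
w02, w12, w22 = -0.2, 0.25, 0.7         # hidden unit z2
v01, v11, v21 = 0.05, 0.6, -0.1         # output layer (w^2_01, w^2_11, w^2_21)
x1, x2, y = 1.0, 2.0, 0.5

# Forward pass (cache z2, s and y_hat; they are reused below)
z1 = sigma(w01 + w11 * x1 + w21 * x2)
s = w02 + w12 * x1 + w22 * x2
z2 = sigma(s)
y_hat = v01 + v11 * z1 + v21 * z2

# Chain rule: dL/dw22 = (dL/dy_hat) * w^2_21 * (dz2/ds) * (ds/dw22)
dL_dyhat = -(y - y_hat)                 # already computed for the output layer
dz2_ds = z2 * (1.0 - z2)                # derivative of the logistic function
ds_dw22 = x2
grad_w22 = dL_dyhat * v21 * dz2_ds * ds_dw22
print(grad_w22)
```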
58 The Backpropagation Algorithm
The same algorithm works for multiple layers: repeated application of the chain rule for partial derivatives.
First perform a forward pass from the inputs to the output.
Compute the loss.
From the loss, proceed backwards to compute partial derivatives using the chain rule.
Cache partial derivatives as you compute them; they will be used for the lower layers.
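Putting the whole running example together, here is a minimal sketch of the forward pass and the backward pass for the two-layer network, with a finite-difference check on one weight to confirm the chain-rule gradient. All weight and input values are made up, and the weight-naming scheme is just a convenience for this sketch.

```python
import math

def sigma(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(w, x1, x2):
    z1 = sigma(w["01"] + w["11"] * x1 + w["21"] * x2)
    z2 = sigma(w["02"] + w["12"] * x1 + w["22"] * x2)
    y_hat = w["2_01"] + w["2_11"] * z1 + w["2_21"] * z2
    return z1, z2, y_hat

def backward(w, x1, x2, y):
    """Return dL/dw for every weight, with L = 0.5*(y - y_hat)^2."""
    z1, z2, y_hat = forward(w, x1, x2)          # forward pass
    dL_dyhat = -(y - y_hat)                     # top of the network; cached
    g = {"2_01": dL_dyhat, "2_11": dL_dyhat * z1, "2_21": dL_dyhat * z2}
    # hidden units: dL/dy_hat * dy_hat/dz * dz/ds * ds/dw, with dσ/ds = σ(s)(1-σ(s))
    d1 = dL_dyhat * w["2_11"] * z1 * (1 - z1)
    d2 = dL_dyhat * w["2_21"] * z2 * (1 - z2)
    g.update({"01": d1, "11": d1 * x1, "21": d1 * x2,
              "02": d2, "12": d2 * x1, "22": d2 * x2})
    return g

w = {"01": 0.1, "11": 0.4, "21": -0.3, "02": -0.2, "12": 0.25, "22": 0.7,
     "2_01": 0.05, "2_11": 0.6, "2_21": -0.1}
x1, x2, y = 1.0, 2.0, 0.5
g = backward(w, x1, x2, y)

# Finite-difference check on w_22
eps = 1e-6
loss = lambda w: 0.5 * (y - forward(w, x1, x2)[2]) ** 2
w_plus = dict(w, **{"22": w["22"] + eps})
w_minus = dict(w, **{"22": w["22"] - eps})
print(g["22"], (loss(w_plus) - loss(w_minus)) / (2 * eps))  # should match closely
```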
59 Mechanizing learning
Backpropagation gives you the gradient that will be used for gradient descent.
SGD gives us a generic learning algorithm.
Backpropagation is a generic method for computing partial derivatives: a recursive algorithm that proceeds from the top of the network to the bottom.
Modern neural network libraries implement automatic differentiation using backpropagation. This allows easy exploration of network architectures: you don't have to keep deriving the gradients by hand each time.
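For instance, here is a sketch of the same toy computation using PyTorch's automatic differentiation (assuming PyTorch is installed; the specific values are made up):

```python
import torch

# Hypothetical input, label and weights for the two-layer network
x = torch.tensor([1.0, 2.0])
y = torch.tensor(0.5)
W1 = torch.tensor([[0.4, -0.3], [0.25, 0.7]], requires_grad=True)  # hidden weights
b1 = torch.tensor([0.1, -0.2], requires_grad=True)                 # hidden biases
w2 = torch.tensor([0.6, -0.1], requires_grad=True)                 # output weights
b2 = torch.tensor(0.05, requires_grad=True)                        # output bias

z = torch.sigmoid(W1 @ x + b1)      # forward pass, hidden layer
y_hat = w2 @ z + b2                 # forward pass, output layer
loss = 0.5 * (y - y_hat) ** 2

loss.backward()                     # backpropagation, done automatically
print(W1.grad, b1.grad, w2.grad, b2.grad)
```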
60 Stochastic gradient descent
Goal: min_w Σ_i L(NN(x_i, w), y_i)
Given a training set S = {(x_i, y_i)}, x ∈ R^d:
1. Initialize parameters w
2. For epoch = 1 ... T:
   1. Shuffle the training set
   2. For each training example (x_i, y_i) ∈ S:
      Treat this example as the entire dataset.
      Compute the gradient of the loss ∇L(NN(x_i, w), y_i) using backpropagation
      Update: w ← w − γ_t ∇L(NN(x_i, w), y_i)
3. Return w
γ_t: the learning rate; many tweaks possible.
The objective is not convex, so initialization can be important.
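Finally, a compact sketch of this full recipe, SGD with gradients from backpropagation, using PyTorch's built-in modules (again assuming PyTorch; the architecture, dataset and hyperparameters are all illustrative):

```python
import random
import torch
import torch.nn as nn

# Toy network in the spirit of the running example: 2 inputs, 2 sigmoid hidden units, 1 output
model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # lr plays the role of γ

# Tiny illustrative dataset of (x, y) pairs
S = [(torch.tensor([0.0, 1.0]), torch.tensor([1.0])),
     (torch.tensor([1.0, 0.0]), torch.tensor([0.0])),
     (torch.tensor([1.0, 1.0]), torch.tensor([1.0]))]

for epoch in range(100):              # for epoch = 1 ... T
    random.shuffle(S)                 # shuffle the training set
    for x, y in S:                    # one example at a time
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)   # forward pass + loss
        loss.backward()               # backpropagation
        optimizer.step()              # w <- w - γ ∇L

print([p.data for p in model.parameters()])
```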