Computational Graphs, and Backpropagation. Michael Collins, Columbia University


1 Computational Graphs, and Backpropagation Michael Collins, Columbia University

2 A Key Problem: Calculating Derivatives

  p(y | x; θ, v) = exp(v(y) · φ(x; θ) + γ_y) / Σ_{y' ∈ Y} exp(v(y') · φ(x; θ) + γ_{y'})

where

  φ(x; θ) = g(W x + b)

m is an integer specifying the number of hidden units; W ∈ R^{m×d} and b ∈ R^m are the parameters in θ; g : R^m → R^m is the transfer function.

Key question: given a training example (x_i, y_i), define

  L(θ, v) = −log p(y_i | x_i; θ, v)

How do we calculate derivatives such as dL(θ, v)/dW_{k,j}?

3 A Simple Version of Stochastic Gradient Descent (Continued)

Algorithm:
For t = 1 ... T:
  Select an integer i uniformly at random from {1 ... n}
  Define L(θ, v) = −log p(y_i | x_i; θ, v)
  For each parameter θ_j: θ_j = θ_j − η_t × dL(θ, v)/dθ_j
  For each label y, for each parameter v_k(y): v_k(y) = v_k(y) − η_t × dL(θ, v)/dv_k(y)
  For each label y: γ_y = γ_y − η_t × dL(θ, v)/dγ_y

Output: parameters θ and v

A minimal sketch of this update loop follows, assuming hypothetical helpers grad(params, x, y) (returning gradients of L = −log p(y | x) for one example) and eta(t) (returning the learning rate η_t); the names are illustrative, not Collins' code.

import numpy as np

def sgd(params, grad, examples, T, eta):
    """Simple SGD over a dict of parameter arrays.

    grad(params, x, y) must return a dict of gradients of the single-example
    loss L = -log p(y | x), with the same keys as params (assumed helper)."""
    n = len(examples)
    for t in range(T):
        i = np.random.randint(n)            # select i uniformly at random from {1 ... n}
        x_i, y_i = examples[i]
        g = grad(params, x_i, y_i)          # gradients of L(θ, v) for this example
        for name in params:                 # e.g. θ_j = θ_j - η_t * dL/dθ_j
            params[name] = params[name] - eta(t) * g[name]
    return params
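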

4 Overview

  Introduction
  The chain rule
  Derivatives in a single-layer neural network
  Computational graphs
  Backpropagation in computational graphs
  Justification for backpropagation

5 Partial Derivatives

Assume we have scalar variables z_1, z_2, ..., z_n, and y, and a function f, and we define

  y = f(z_1, z_2, ..., z_n)

Then the partial derivative of f with respect to z_i is written as

  ∂f(z_1, z_2, ..., z_n) / ∂z_i

We will also write the partial derivative as

  ∂y/∂z_i |^f_{z_1 ... z_n}

which can be read as: the partial derivative of y with respect to z_i, under function f, at values z_1 ... z_n

6 Partial Derivatives (continued)

We will also write the partial derivative as

  ∂y/∂z_i |^f_{z_1 ... z_n}

which can be read as: the partial derivative of y with respect to z_i, under function f, at values z_1 ... z_n

The notation including f is non-standard, but helps to alleviate a lot of potential confusion... We will sometimes drop f and/or z_1 ... z_n when this is clear from context.

7 The Chain Rule

Assume we have equations

  y = f(z),  z = g(x),  h(x) = f(g(x))

Then

  dh(x)/dx = df(g(x))/dz × dg(x)/dx

Or equivalently,

  ∂y/∂x |^h_x = ∂y/∂z |^f_{g(x)} × ∂z/∂x |^g_x

8 The Chain Rule

Assume we have equations

  y = f(z),  z = g(x),  h(x) = f(g(x))

then

  dh(x)/dx = df(g(x))/dz × dg(x)/dx

For example, assume f(z) = z^2 and g(x) = x^3. Assume in addition that x = 2. Then:

  z = x^3 = 8,  dg(x)/dx = 3x^2 = 12,  f(z) = z^2 = 64,  df(z)/dz = 2z = 16

from which it follows that

  dh(x)/dx = 16 × 12 = 192
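
As a quick sanity check, the sketch below (my addition, not part of the slides) evaluates this example numerically with a central finite difference; f, g, h are exactly the functions defined above.

def f(z): return z ** 2                     # f(z) = z^2
def g(x): return x ** 3                     # g(x) = x^3
def h(x): return f(g(x))                    # h(x) = f(g(x))

x = 2.0
analytic = 2 * g(x) * 3 * x ** 2            # df/dz * dg/dx = 16 * 12
eps = 1e-5
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
print(analytic, numeric)                    # both approximately 192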

9 The Chain Rule (continued)

Assume we have equations

  y = f(z)
  z_1 = g_1(x), z_2 = g_2(x), ..., z_n = g_n(x)

for some functions f, g_1 ... g_n, where z is a vector z ∈ R^n and x is a vector x ∈ R^m. Define the function

  h(x) = f(g_1(x), g_2(x), ..., g_n(x))

Then we have

  ∂h(x)/∂x_j = Σ_i ∂f(z)/∂z_i × ∂g_i(x)/∂x_j

where z is the vector g_1(x), g_2(x), ..., g_n(x).

10 The Jacobian Matrix

Assume we have a function f : R^n → R^m that takes some vector x ∈ R^n and then returns a vector y ∈ R^m:

  y = f(x)

The Jacobian J ∈ R^{m×n} is defined as the matrix with entries

  J_{i,j} = ∂f_i(x)/∂x_j

Hence the Jacobian contains all partial derivatives of the function.

11 The Jacobian Matrix

Assume we have a function f : R^n → R^m that takes some vector x ∈ R^n and then returns a vector y ∈ R^m:

  y = f(x)

The Jacobian J ∈ R^{m×n} is defined as the matrix with entries

  J_{i,j} = ∂f_i(x)/∂x_j

Hence the Jacobian contains all partial derivatives of the function. We will also use

  ∂y/∂x |^f_x

for vectors y and x to refer to the Jacobian matrix with respect to a function f mapping x to y, evaluated at x.
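
The slides contain no code, but a small finite-difference Jacobian is handy for checking the analytic Jacobians that follow. This helper is my own sketch; num_jacobian is a made-up name, not notation from the notes.

import numpy as np

def num_jacobian(f, x, eps=1e-6):
    """Central-difference estimate of the Jacobian J with J[i, j] = d f_i(x) / d x_j."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(f(x))
    J = np.zeros((y.size, x.size))
    for j in range(x.size):
        d = np.zeros_like(x)
        d[j] = eps
        J[:, j] = (np.asarray(f(x + d)) - np.asarray(f(x - d))) / (2 * eps)
    return J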

12 An Example of the Jacobian: The LOG-SOFTMAX Function

We define LS : R^K → R^K to be the function such that for k = 1 ... K,

  LS_k(l) = log( exp{l_k} / Σ_{k'} exp{l_{k'}} ) = l_k − log Σ_{k'} exp{l_{k'}}

The Jacobian then has entries

  [∂LS(l)/∂l]_{k,k'} = ∂LS_k(l)/∂l_{k'} = [[k = k']] − exp{l_{k'}} / Σ_{k''} exp{l_{k''}}

where [[k = k']] = 1 if k = k', 0 otherwise.
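
A sketch of LS and its Jacobian in numpy (mine, not from the slides); the commented-out check at the end reuses the num_jacobian helper from the previous sketch if it is in scope.

import numpy as np

def log_softmax(l):
    """LS(l)_k = l_k - log sum_k' exp(l_k'), computed in a numerically stable way."""
    c = np.max(l)
    return l - c - np.log(np.sum(np.exp(l - c)))

def log_softmax_jacobian(l):
    """Jacobian with entries [[k = k']] - exp(l_k') / sum_k'' exp(l_k'')."""
    s = np.exp(log_softmax(l))              # softmax probabilities
    return np.eye(l.shape[0]) - s           # row k is e_k - softmax(l)

l = np.array([1.0, -0.5, 2.0])
print(log_softmax_jacobian(l))
# Optional numerical check:
# print(np.max(np.abs(log_softmax_jacobian(l) - num_jacobian(log_softmax, l))))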

13 The Chain Rule (continued)

Assume we have equations

  y = f(z_1, z_2, ..., z_n)
  z_i = g_i(x_1, x_2, ..., x_m) for i = 1 ... n

where y is a vector, z_i for all i are vectors, and x_j for all j are vectors. Define h(x_1 ... x_m) to be the composition of f and g, so y = h(x_1 ... x_m). Then

  ∂y/∂x_j |^h = Σ_{i=1}^{n} ∂y/∂z_i |^f_z × ∂z_i/∂x_j |^{g_i}

where the left-hand side has dimension d(y) × d(x_j), each ∂y/∂z_i has dimension d(y) × d(z_i), each ∂z_i/∂x_j has dimension d(z_i) × d(x_j), and d(v) is the dimensionality of vector v.
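
The sketch below (my own illustration) checks this identity numerically on a small made-up example with two intermediate vectors z_1 = g_1(x) and z_2 = g_2(x); it reuses num_jacobian from the sketch after slide 11, and every factor on the right-hand side is estimated by finite differences.

import numpy as np

A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])
x = np.array([0.5, -0.2])

def g1(x): return A @ x                                 # z_1 in R^3
def g2(x): return np.sin(x)                             # z_2 in R^2
def f(z1, z2): return np.array([z1.sum() * z2[0], z1[0] - z2[1]])
def h(x): return f(g1(x), g2(x))                        # the composition

lhs = num_jacobian(h, x)                                # d(y) x d(x) matrix
rhs = (num_jacobian(lambda z1: f(z1, g2(x)), g1(x)) @ num_jacobian(g1, x)
       + num_jacobian(lambda z2: f(g1(x), z2), g2(x)) @ num_jacobian(g2, x))
print(np.max(np.abs(lhs - rhs)))                        # ~0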

14 Overview

  Introduction
  The chain rule
  Derivatives in a single-layer neural network
  Computational graphs
  Backpropagation in computational graphs
  Justification for backpropagation

15 Derivatives in a Feedforward Network

Definitions: The set of possible labels is Y. We define K = |Y|. g : R^m → R^m is a transfer function. We define LS = LOG-SOFTMAX.

Inputs: x_i ∈ R^d, y_i ∈ Y, W ∈ R^{m×d}, b ∈ R^m, V ∈ R^{K×m}, γ ∈ R^K

Equations:

  z ∈ R^m = W x_i + b
  h ∈ R^m = g(z)
  l ∈ R^K = V h + γ
  q ∈ R^K = LS(l)
  o ∈ R = q_{y_i}
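
A sketch of this forward pass in numpy (my own, not the author's code), assuming tanh as the transfer function g; the shapes follow the definitions above, and log_softmax is redefined so the block stands alone.

import numpy as np

def log_softmax(l):
    c = np.max(l)
    return l - c - np.log(np.sum(np.exp(l - c)))

def forward(W, b, V, gamma, x_i, y_i, g=np.tanh):
    """Forward pass: returns o = q_{y_i} and the intermediate variables."""
    z = W @ x_i + b            # z in R^m
    h = g(z)                   # h in R^m
    l = V @ h + gamma          # l in R^K
    q = log_softmax(l)         # q in R^K
    o = q[y_i]                 # o in R: the log-probability of label y_i
    return o, (z, h, l, q)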

16 Jacobians Involving Matrices

If W ∈ R^{m×d} and z ∈ R^m, the Jacobian ∂z/∂W is a matrix of dimension m × m', where m' = (m × d) is the number of entries in W. So we treat W as a vector with (m × d) elements.

Equations:

  z ∈ R^m = W x_i + b
  h ∈ R^m = g(z)
  l ∈ R^K = V h + γ
  q ∈ R^K = LS(l)
  o ∈ R = q_{y_i}

17 Local Functions

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Leaf variables: W, x_i, b, V, γ, y_i
Intermediate variables: z, h, l, q
Output variable: o

Each intermediate variable has a local function:

  f^z(W, x_i, b) = W x_i + b,  f^h(z) = g(z),  f^l(h) = V h + γ, ...

18 Global Functions

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Leaf variables: W, x_i, b, V, γ, y_i
Intermediate variables: z, h, l, q
Output variable: o

Global functions: for the output variable o, we define f^o to be the function that maps the leaf values W, x_i, b, V, γ, y_i to the output value

  o = f^o(W, x_i, b, V, γ, y_i)

We use similar definitions for f^z(W, x_i, b, V, γ, y_i), f^h(W, x_i, b, V, γ, y_i), etc.

19 Applying the Chain Rule

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Derivative:

  ∂o/∂W |^{f^o} =

20 Applying the Chain Rule

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Derivative:

  ∂o/∂W |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂W |^{f^q}

21 Applying the Chain Rule

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Derivative:

  ∂o/∂W |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂W |^{f^q}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂W |^{f^l}

22 Applying the Chain Rule

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Derivative:

  ∂o/∂W |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂W |^{f^q}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂W |^{f^l}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂h |^{f^l} · ∂h/∂W |^{f^h}

23 Applying the Chain Rule

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Derivative:

  ∂o/∂W |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂W |^{f^q}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂W |^{f^l}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂h |^{f^l} · ∂h/∂W |^{f^h}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂h |^{f^l} · ∂h/∂z |^{f^h} · ∂z/∂W |^{f^z}

24 Applying the Chain Rule

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Derivative:

  ∂o/∂W |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂W |^{f^q}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂W |^{f^l}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂h |^{f^l} · ∂h/∂W |^{f^h}
               = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂h |^{f^l} · ∂h/∂z |^{f^h} · ∂z/∂W |^{f^z}

25 Another Derivative

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

  ∂o/∂V |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂V |^{f^l}

26 A Computational Graph

Equations: z ∈ R^m = W x_i + b;  h ∈ R^m = g(z);  l ∈ R^K = V h + γ;  q ∈ R^K = LS(l);  o ∈ R = q_{y_i}

Derivatives:

  ∂o/∂V |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂V |^{f^l}
  ∂o/∂W |^{f^o} = ∂o/∂q |^{f^o} · ∂q/∂l |^{f^q} · ∂l/∂h |^{f^l} · ∂h/∂z |^{f^h} · ∂z/∂W |^{f^z}
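
To make the Jacobian product concrete, here is a self-contained sketch (mine, not from the slides) that evaluates ∂o/∂V for the network above as a product of three explicit Jacobians, with random values for the parameters and g = tanh as an assumed transfer function; the result is checked entry by entry against finite differences.

import numpy as np

rng = np.random.default_rng(0)
d, m, K = 5, 4, 3
W, b = rng.normal(size=(m, d)), rng.normal(size=m)
V, gamma = rng.normal(size=(K, m)), rng.normal(size=K)
x_i, y_i = rng.normal(size=d), 1

def log_softmax(l):
    c = np.max(l)
    return l - c - np.log(np.sum(np.exp(l - c)))

def output(V):
    """o = q_{y_i} as a function of V, all other inputs held fixed."""
    h = np.tanh(W @ x_i + b)
    return log_softmax(V @ h + gamma)[y_i]

# The three Jacobians in do/dV = (do/dq)(dq/dl)(dl/dV), with V flattened row-major.
h = np.tanh(W @ x_i + b)
s = np.exp(log_softmax(V @ h + gamma))          # softmax probabilities
do_dq = np.eye(K)[y_i][None, :]                 # 1 x K: picks out entry y_i
dq_dl = np.eye(K) - s                           # K x K: log-softmax Jacobian
dl_dV = np.kron(np.eye(K), h)                   # K x (K*m)
grad_V = (do_dq @ dq_dl @ dl_dV).reshape(K, m)

# Finite-difference check of every entry of do/dV
num, eps = np.zeros_like(V), 1e-6
for k in range(K):
    for j in range(m):
        E = np.zeros_like(V); E[k, j] = eps
        num[k, j] = (output(V + E) - output(V - E)) / (2 * eps)
print(np.max(np.abs(grad_V - num)))             # ~0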

27 Overview

  Introduction
  The chain rule
  Derivatives in a single-layer neural network
  Computational graphs
  Backpropagation in computational graphs
  Justification for backpropagation

28 Computational Graphs: a Formal Definition

A computational graph consists of:

  An integer n specifying the number of vertices in the graph.
  An integer l < n specifying the number of leaves in the graph. Vertices 1 ... l are leaves in the graph. Vertex n is a special output vertex.
  A set of directed edges E. Each member of E is an ordered pair (j, i) where j ∈ {1 ... n}, i ∈ {(l + 1) ... n}, and i > j.

For any i we define π(i) to be the set of parents of i in the graph:

  π(i) = {j : (j, i) ∈ E}

29 Computational Graphs (continued)

A variable u^i ∈ R^{d_i} is associated with each vertex in the graph. Here d_i for i = 1 ... n specifies the dimensionality of u^i. We assume d_n = 1, hence the output variable is a scalar.

A function f^i is associated with each non-leaf vertex in the graph (i ∈ {(l + 1) ... n}). The function maps a vector A^i defined as

  A^i = ⟨u^j⟩_{j ∈ π(i)}

to a vector f^i(A^i) ∈ R^{d_i}

30 An Example

Define n = 4, l = 2
Define d_i = 1 for all i (all variables are scalars)
Define E = {(1, 3), (2, 3), (2, 4), (3, 4)}
Define

  f^3(u^1, u^2) = u^1 + u^2
  f^4(u^2, u^3) = u^2 × u^3

31 Two Questions

Note that the computational graph defines a function, which we call f^n, from the values of the leaf variables to the output variable:

  u^n = f^n(u^1 ... u^l)

Given a computational graph, and values for the leaf variables u^1 ... u^l:

  1. How do we compute the output u^n?
  2. How do we compute the partial derivatives ∂u^n/∂u^i |^{f^n} for all i ∈ {1 ... l}?

32 Forward Computation

Input: Values for leaf variables u^1 ... u^l

Algorithm:
For i = (l + 1) ... n:
  u^i = f^i(A^i)
where
  A^i = ⟨u^j⟩_{j ∈ π(i)}
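
A sketch of this forward computation in Python for the example graph above (n = 4, l = 2); the dict-based node/edge representation is my own choice, not notation from the slides.

import operator

# Example graph from the slides: f^3(u1, u2) = u1 + u2, f^4(u2, u3) = u2 * u3
n, l = 4, 2
parents = {3: [1, 2], 4: [2, 3]}                # pi(i) for each non-leaf vertex
f = {3: operator.add, 4: operator.mul}          # local function f^i

def forward(leaf_values):
    """leaf_values: dict {1: u1, 2: u2}; returns all u^i for i = 1 ... n."""
    u = dict(leaf_values)
    for i in range(l + 1, n + 1):               # non-leaf vertices, in topological order
        A = [u[j] for j in parents[i]]          # A^i = <u^j : j in pi(i)>
        u[i] = f[i](*A)
    return u

print(forward({1: 2.0, 2: 3.0}))                # {1: 2.0, 2: 3.0, 3: 5.0, 4: 15.0}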

33 An Example

Define n = 4, l = 2
Define d_i = 1 for all i (all variables are scalars)
Define E = {(1, 3), (2, 3), (2, 4), (3, 4)}
Define

  f^3(u^1, u^2) = u^1 + u^2
  f^4(u^2, u^3) = u^2 × u^3

34 Defining and Calculating Derivatives

For any k ∈ {(l + 1) ... n}, there is a function f^k such that

  u^k = f^k(u^1, u^2, ..., u^l)

We want to calculate, for j = 1 ... l,

  ∂u^n/∂u^j |^{f^n}_{u^1 ... u^l}

35 Computational Graphs (continued)

A function J^{j→i} is associated with each edge (j, i) ∈ E. The function maps a vector A^i defined as

  A^i = ⟨u^j⟩_{j ∈ π(i)}

to a matrix J^{j→i}(A^i) ∈ R^{d_i × d_j}:

  J^{j→i}(A^i) = ∂f^i(A^i)/∂u^j = ∂u^i/∂u^j |^{f^i}_{A^i}

36 Forward Pass

Input: Values for leaf variables u^1 ... u^l

Algorithm:
For i = (l + 1) ... n:
  A^i = ⟨u^j⟩_{j ∈ π(i)}
  u^i = f^i(A^i)

37 Backward Pass

  p^n = 1
  For j = (n − 1) ... 1:
    p^j = Σ_{i : (j,i) ∈ E} p^i J^{j→i}(A^i)

Output: p^i for i = 1 ... l satisfying

  p^i = ∂o/∂u^i |^{f^n}_{u^1 ... u^l}

38 An Example

  p^n = 1
  For j = (n − 1) ... 1:
    p^j = Σ_{i : (j,i) ∈ E} p^i J^{j→i}(A^i)
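
A sketch of the backward pass on the same example graph, continuing the forward-pass sketch after slide 32 (it reuses n and forward from that block); all variables are scalars here, so each edge Jacobian is just a number. This is my own illustration, not code from the notes.

# Edge Jacobians J^{j->i}(A^i) for the example graph
J = {
    (1, 3): lambda u: 1.0,                      # d(u1 + u2)/du1
    (2, 3): lambda u: 1.0,                      # d(u1 + u2)/du2
    (2, 4): lambda u: u[3],                     # d(u2 * u3)/du2
    (3, 4): lambda u: u[2],                     # d(u2 * u3)/du3
}

def backward(u):
    """u: all variable values from the forward pass; returns p^j = du^n/du^j for all j."""
    p = {n: 1.0}
    for j in range(n - 1, 0, -1):
        p[j] = sum(p[i] * Jf(u) for (a, i), Jf in J.items() if a == j)
    return p

u = forward({1: 2.0, 2: 3.0})
print(backward(u))                              # p^1 = u2 = 3.0, p^2 = u2 + u3 = 8.0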

39 Overview

  Introduction
  The chain rule
  Derivatives in a single-layer neural network
  Computational graphs
  Backpropagation in computational graphs
  Justification for backpropagation

40 Products of Jacobians over Paths in the Graph

A directed path between vertices j and k is a sequence of edges (i_1, i_2), (i_2, i_3), ..., (i_{n−1}, i_n) with n ≥ 2 such that each edge is in E, i_1 = j, and i_n = k. For any j, k, we write P(j, k) to denote the set of all directed paths between j and k.

For convenience we define D^{a→b} = J^{a→b}(A^b) for all edges (a, b).

Theorem: for any j ∈ {1 ... l}, k ∈ {(l + 1) ... n},

  ∂u^k/∂u^j |^{f^k} = Σ_{p ∈ P(j,k)} Π_{(a,b) ∈ p} D^{a→b}

41 An Example

  ∂u^k/∂u^j |^{f^k} = Σ_{p ∈ P(j,k)} Π_{(a,b) ∈ p} D^{a→b}
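
The sketch below (my own, continuing the example-graph blocks above and reusing J and forward from them) enumerates the directed paths and sums the products of edge Jacobians; for j = 2, k = 4 the two paths (2,4) and (2,3),(3,4) give u^3 + u^2, matching the backward-pass value p^2.

def paths(j, k):
    """All directed paths from vertex j to vertex k, each as a list of edges."""
    if j == k:
        return [[]]
    return [[(a, b)] + rest
            for (a, b) in J if a == j
            for rest in paths(b, k)]

def sum_over_paths(u, j, k):
    """du^k/du^j as a sum over directed paths of products of edge Jacobians D^{a->b}."""
    total = 0.0
    for path in paths(j, k):
        prod = 1.0
        for (a, b) in path:
            prod *= J[(a, b)](u)
        total += prod
    return total

u = forward({1: 2.0, 2: 3.0})
print(sum_over_paths(u, 2, 4))                  # u2 + u3 = 8.0, matching p^2 above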

42 Proof Sketch

For any j, j', k, we write P(j, j', k) to denote the set of all directed paths between j and k such that the last edge in the sequence is (j', k).

Proof sketch: by induction over the graph. By the chain rule we have

  ∂u^k/∂u^j |^{f^k} = Σ_{j' : (j',k) ∈ E} D^{j'→k} · ∂u^{j'}/∂u^j |^{f^{j'}}
                   = Σ_{j' : (j',k) ∈ E} D^{j'→k} · Σ_{p ∈ P(j,j')} Π_{(a,b) ∈ p} D^{a→b}
                   = Σ_{j' : (j',k) ∈ E} Σ_{p ∈ P(j,j',k)} Π_{(a,b) ∈ p} D^{a→b}
                   = Σ_{p ∈ P(j,k)} Π_{(a,b) ∈ p} D^{a→b}

43 Backward Pass

  p^n = 1
  For j = (n − 1) ... 1:
    p^j = Σ_{i : (j,i) ∈ E} p^i D^{j→i}

Output: p^i for i = 1 ... l satisfying

  p^i = ∂o/∂u^i |^{f^o}_{u^1, u^2, ..., u^l}

44 Correctness of the Backward Pass

Theorem: For all p^i we have

  p^i = Σ_{p ∈ P(i,n)} Π_{(a,b) ∈ p} D^{a→b}

It follows that for any i ∈ {1 ... l},

  p^i = ∂u^n/∂u^i |^{f^n}

45 Proof

Theorem: For all p^i we have

  p^i = Σ_{p ∈ P(i,n)} Π_{(a,b) ∈ p} D^{a→b}

Proof sketch: by induction on i = n, i = (n − 1), i = (n − 2), ..., i = 1.

For i = n we have p^n = 1, so the proposition is true.

For j = (n − 1) ... 1 we have

  p^j = Σ_{i : (j,i) ∈ E} p^i D^{j→i}
      = Σ_{i : (j,i) ∈ E} ( Σ_{p ∈ P(i,n)} Π_{(a,b) ∈ p} D^{a→b} ) D^{j→i}
      = Σ_{p ∈ P(j,n)} Π_{(a,b) ∈ p} D^{a→b}
