Solutions. Part I Logistic regression backpropagation with a single training example


In this part, you are using a stochastic gradient optimizer to train your logistic regression. Consequently, the gradients leading to the parameter updates are computed on a single training example.

a) Forward propagation equations

Before getting into the details of backpropagation, let's spend a few minutes on the forward pass. For one training example x = (x_1, x_2, ..., x_n) of dimension n, the forward propagation is:

z = wx + b
ŷ = a = σ(z)
L = -(y log(ŷ) + (1 - y) log(1 - ŷ))

b) Dimensions of the variables in the forward propagation equations

It's important to note the shapes of the quantities in the previous equations: x is (n, 1), w is (1, n), b is (1, 1), z is (1, 1), a is (1, 1), and L is a scalar.

c) Backpropagation equations

Training our model means updating our weights and biases, w and b, using the gradient of the loss with respect to these parameters. At every step, we need to calculate ∂L/∂w and ∂L/∂b. To do this, we will apply the chain rule, so we need to calculate the following derivatives: ∂L/∂a, ∂a/∂z, ∂z/∂w, and ∂z/∂b.

We will calculate those derivatives to get an expression for ∂L/∂w and ∂L/∂b:

∂L/∂a = -y/a + (1 - y)/(1 - a)
∂a/∂z = a(1 - a)
∂z/∂w = x^T
∂z/∂b = 1

Why did we choose x^T rather than x? We can have a look at the following dimensions, without forgetting that the derivative of a quantity has the same dimensions as that quantity: w is (1, n), so ∂L/∂w must be (1, n); ∂L/∂z is (1, 1); therefore ∂z/∂w must be (1, n), which is x^T.

Then:

∂L/∂z = ∂L/∂a · ∂a/∂z = a - y
∂L/∂w = ∂L/∂z · ∂z/∂w = (a - y) x^T
∂L/∂b = ∂L/∂z · ∂z/∂b = a - y
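These updates are easy to check in code. Here is a minimal NumPy sketch of the single-example forward and backward pass above; the helper name sigmoid and the variables dz, dw, db are our own, not part of the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Single training example: x has shape (n, 1), label y is a scalar (0 or 1).
n = 4
rng = np.random.default_rng(0)
x = rng.normal(size=(n, 1))
y = 1.0
w = rng.normal(size=(1, n))   # weights, shape (1, n)
b = np.zeros((1, 1))          # bias, shape (1, 1)

# Forward pass
z = w @ x + b                 # (1, 1)
a = sigmoid(z)                # (1, 1), this is ŷ
L = -(y * np.log(a) + (1 - y) * np.log(1 - a))  # scalar loss

# Backward pass (chain-rule results derived above)
dz = a - y                    # ∂L/∂z, shape (1, 1)
dw = dz @ x.T                 # ∂L/∂w = (a - y) x^T, shape (1, n)
db = dz                       # ∂L/∂b = a - y, shape (1, 1)
```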

Part II Backpropagation for a batch of m training examples

In this part, you are using a batch gradient optimizer to train your logistic regression. Consequently, the gradients leading to the parameter updates are computed on the entire batch of m training examples.

a) Write down the forward propagation equations leading to J.
b) Analyze the dimensions of all the variables in your forward propagation equations.
c) Write down the backpropagation equations to compute ∂J/∂w and ∂J/∂b.

a) Forward propagation equations

Before getting into the details of backpropagation, let's study the forward pass. For a batch of m training examples, each of dimension n, the forward propagation is:

z = wX + b    (1)
a = σ(z)      (2)
J = (1/m) Σ_{i=1..m} L^(i)

where L^(i) is the binary cross-entropy loss:

L^(i) = -(y^(i) log(a^(i)) + (1 - y^(i)) log(1 - a^(i)))

b) Dimensions of the variables in the forward propagation equations

It's important to note the shapes of the quantities in equations (1) and (2): w ∈ R^(1×n), X ∈ R^(n×m), b ∈ R^(1×m) (but b is really of shape 1×1 and broadcast to 1×m), z ∈ R^(1×m), a ∈ R^(1×m), and J is a scalar.

c) Backpropagation equations

To train our model, we need to update our weights and biases w and b, using the gradient of the cost with respect to these parameters. In other words, we need to calculate ∂J/∂w and ∂J/∂b. To do this, we will apply the chain rule. We can write ∂J/∂w as (∂J/∂a)(∂a/∂z)(∂z/∂w). The first step is to calculate ∂J/∂a.

∂J/∂a^(i) = ∂/∂a^(i) [ -(1/m)(y^(i) log(a^(i)) + (1 - y^(i)) log(1 - a^(i))) ] = (1/m)( -y^(i)/a^(i) + (1 - y^(i))/(1 - a^(i)) )

∂a^(i)/∂z^(i) = a^(i)(1 - a^(i)), which is the derivative of the sigmoid function,

and

∂J/∂w = Σ_{i=1..m} ∂J/∂z^(i) · ∂z^(i)/∂w.

Putting this together,

∂J/∂z^(i) = ∂J/∂a^(i) · ∂a^(i)/∂z^(i) = (1/m)( -y^(i)/a^(i) + (1 - y^(i))/(1 - a^(i)) ) a^(i)(1 - a^(i)) = (1/m)( -y^(i)(1 - a^(i)) + (1 - y^(i)) a^(i) ) = (1/m)(a^(i) - y^(i)).

Therefore,

∂J/∂w = Σ_{i=1..m} (1/m)(a^(i) - y^(i)) ∂z^(i)/∂w.

To evaluate this derivative, we will find the derivative with respect to each element w_p of w. Since z^(i) = wx^(i) + b = Σ_{j=1..n} w_j X_ji + b, we have

∂z^(i)/∂w_p = X_pi,

so

∂J/∂w_p = Σ_{i=1..m} (1/m)(a^(i) - y^(i)) X_pi = (1/m)(A - Y) X_p^T,

where X_p is the row vector corresponding to the p-th row of the X matrix. To get ∂J/∂w we simply stack all these derivatives up, row-wise. This can efficiently be written in matrix form as:

∂J/∂w = (1/m)(A - Y) X^T

Following a very similar procedure, and noting that ∂z^(i)/∂b = 1:

∂J/∂b = (1/m)(A - Y) · 1

where 1 is a column vector of 1's.

Part III Revisiting Backpropagation

There are several possible ways to obtain an optimal set of weights/parameters for a neural network. The naive approach would consist in randomly generating a new set of weights at each iteration step. An improved method would use local information of the loss function (e.g. the gradient) to pick a better guess in the next iteration. Does backpropagation compute a numerical or analytical value of the gradients in a neural network? (Answer on Menti)
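To connect the Menti question with the formulas above, here is a short NumPy sketch (our own illustration, not part of the official handout) that computes the analytical batch gradients ∂J/∂w = (1/m)(A - Y)X^T and ∂J/∂b = (1/m)(A - Y)·1, and compares ∂J/∂w against a numerical finite-difference estimate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """Binary cross-entropy cost J averaged over the batch."""
    A = sigmoid(w @ X + b)                          # (1, m)
    return float(np.mean(-(Y * np.log(A) + (1 - Y) * np.log(1 - A))))

rng = np.random.default_rng(0)
n, m = 3, 5
X = rng.normal(size=(n, m))                         # n features, m examples
Y = rng.integers(0, 2, size=(1, m)).astype(float)
w = rng.normal(size=(1, n))
b = np.zeros((1, 1))

# Analytical gradients from backpropagation
A = sigmoid(w @ X + b)
dw = (A - Y) @ X.T / m                              # ∂J/∂w, shape (1, n)
db = np.sum(A - Y, axis=1, keepdims=True) / m       # ∂J/∂b, shape (1, 1)

# Numerical gradient of J w.r.t. w via central finite differences
eps = 1e-6
dw_num = np.zeros_like(w)
for j in range(n):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[0, j] += eps
    w_minus[0, j] -= eps
    dw_num[0, j] = (cost(w_plus, b, X, Y) - cost(w_minus, b, X, Y)) / (2 * eps)

print(np.allclose(dw, dw_num, atol=1e-5))           # True: the two estimates agree
```

Backpropagation itself produces the analytical value; the finite-difference estimate above is only a numerical check (the usual gradient-checking trick).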

1. You are given the following neural network and your goal is to compute ∂L/∂a_1^[1].

a) What other derivatives do you need to compute before finding ∂L/∂a_1^[1]?
b) What values do you need to cache during the forward propagation in order to compute ∂L/∂a_1^[1]?

A:
a) You need to compute the intermediary derivatives along the path from L to a_1^[1]: ∂L/∂ŷ, ∂ŷ/∂z_1, ∂z_1/∂a_i, ∂a_i/∂z_i, and ∂z_i/∂a_1^[1].
b) The backpropagation recurrences

dz^[L] = da^[L] * g'(z^[L])
dW^[L] = dz^[L] a^[L-1]T
db^[L] = dz^[L]
da^[L-1] = W^[L]T dz^[L]
dz^[l] = W^[l+1]T dz^[l+1] * g'(z^[l])
...

show that you need to cache the pre-activations z^[l] and the activations a^[l] of every layer during the forward pass (the weights W^[l] are already stored).

2. Backpropagation example on a univariate scalar function (e.g. f : R → R):

Let's suppose that you have built a model that uses the following loss function:

L = (ŷ - y)^2, where ŷ = tanh(σ(wx^2 + b))

Assume that all the above variables are scalars. Using backpropagation, calculate ∂L/∂x.

A:
∂L/∂x = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂x = 2(ŷ - y) ∂ŷ/∂x = 2(ŷ - y)(1 - ŷ^2) ∂z/∂x = 2(ŷ - y)(1 - ŷ^2) z(1 - z) · 2wx

where z = σ(wx^2 + b).
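The caching answer in 1b) is easiest to see in code. Below is a minimal NumPy sketch of a two-layer network (the layer sizes, the sigmoid activation g, and all variable names are our own assumptions for illustration): the forward pass stores z^[l] and a^[l], and the backward pass consumes exactly those cached values.

```python
import numpy as np

def g(z):                 # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):           # derivative of the sigmoid
    s = g(z)
    return s * (1 - s)

rng = np.random.default_rng(0)
n_x, n_h, m = 3, 4, 5                        # input size, hidden size, batch size
X = rng.normal(size=(n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1, b1 = rng.normal(size=(n_h, n_x)), np.zeros((n_h, 1))
W2, b2 = rng.normal(size=(1, n_h)), np.zeros((1, 1))

# Forward pass: cache every z[l] and a[l]; they are reused in the backward pass
Z1 = W1 @ X + b1;  A1 = g(Z1)
Z2 = W2 @ A1 + b2; A2 = g(Z2)                # A2 = ŷ
cache = (X, Z1, A1, Z2, A2)

# Backward pass for the batch-averaged binary cross-entropy cost
X, Z1, A1, Z2, A2 = cache
dZ2 = (A2 - Y) / m            # dA2 * g'(Z2) simplifies to (A2 - Y)/m for sigmoid + cross-entropy
dW2 = dZ2 @ A1.T              # needs cached A1
db2 = np.sum(dZ2, axis=1, keepdims=True)
dA1 = W2.T @ dZ2
dZ1 = dA1 * g_prime(Z1)       # needs cached Z1
dW1 = dZ1 @ X.T               # needs cached X (the layer-0 activations)
db1 = np.sum(dZ1, axis=1, keepdims=True)
```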

3. Backpropagation example on a multivariate scalar function (e.g. f : R^n → R):

Let's suppose that you have built a model that uses the following loss function:

L = -y log(ŷ), where ŷ = ReLU(w^T x + b)

a) Assume x ∈ R^n. What's the shape of w?

A: w ∈ R^(n×1)

b) Using backpropagation, obtain ∂L/∂x.

A: We will derive ∂L/∂x_i for i = 1, ..., n:

∂L/∂x_i = ∂(-y log(ŷ))/∂x_i = -y (1/ŷ) ∂ŷ/∂x_i = -y (1/ŷ) f'(z) ∂z/∂x_i = -y (1/ŷ) f'(z) w_i

where f'(z) = 1 if z > 0 and f'(z) = 0 otherwise (with z = w^T x + b). Stacking these components gives ∂L/∂x = -(y/ŷ) f'(z) w.

4. Backpropagation applied to scalar-matrix functions (f : R^(n×m) → R):

The final case that is worth exploring is:

L = ||ŷ - y||_2^2, where ŷ = σ(x) W

a) Assume that ŷ ∈ R^(1×m) and x ∈ R^(1×n). What is the shape of W?

A: W is an n×m matrix. (Note that the shapes of y and x differ from what you are used to in the class notations.)

b) Using backpropagation, calculate ∂L/∂x.

A: ∂L/∂x = 2(ŷ - y) ∂ŷ/∂x = 2(ŷ - y) W^T * z(1 - z), where z = σ(x) and * denotes the elementwise product.
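As a small illustration of examples 3 and 4, here is a short NumPy sketch (the concrete numbers and variable names are our own choices) that evaluates both gradients exactly as derived above:

```python
import numpy as np

sigma = lambda t: 1.0 / (1.0 + np.exp(-t))

# Example 3: L = -y log(ŷ) with ŷ = ReLU(w^T x + b); gradient with respect to x.
x = np.array([[0.5], [-1.0], [2.0], [0.1]])   # (n, 1)
w = np.array([[0.3], [0.2], [0.4], [-0.5]])   # (n, 1)
b, y = 0.1, 1.0
z = (w.T @ x).item() + b                      # scalar, equals 0.8 here, so the ReLU is active
y_hat = max(z, 0.0)
relu_grad = 1.0 if z > 0 else 0.0             # f'(z)
dL_dx = -(y / y_hat) * relu_grad * w          # (n, 1), matches -(y/ŷ) f'(z) w

# Example 4: L = ||ŷ - y||^2 with ŷ = σ(x) W; gradient with respect to the row vector x.
rng = np.random.default_rng(0)
n, m = 3, 2
x_row = rng.normal(size=(1, n))
W = rng.normal(size=(n, m))
y_row = rng.normal(size=(1, m))
z_row = sigma(x_row)                          # (1, n)
y_hat_row = z_row @ W                         # (1, m)
dL_dx_row = (2 * (y_hat_row - y_row) @ W.T) * z_row * (1 - z_row)   # (1, n)
```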