Neural Network Back-propagation HYUNG IL KOO


Hidden Layer Representations Back-propagation can discover useful intermediate representations at the hidden layers of the network, capturing the properties of the input space that are most relevant to learning the target function. When more layers of units are used, more complex features can be learned. However, the representations formed at the hidden layers are often difficult for humans to interpret.

Basic Math

Optimization Find x that minimizes f(x). If f(x) is differentiable, a necessary condition is $f'(x) = 0$. But in many cases, solving this equation in closed form is still a difficult problem.

Gradient descent [Figure: the curve $y = f(x)$ with successive iterates $x_1, x_2, x_3$ and their values $f(x_1), f(x_2)$, moving downhill toward the minimum.]
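
As a minimal illustration of this idea (not part of the original slides), the Python sketch below minimizes a simple one-dimensional function by repeatedly stepping in the negative gradient direction; the function, starting point, and learning rate are arbitrary choices for the example.

```python
# Minimal gradient-descent sketch: minimize f(x) = (x - 3)^2.
# The function, starting point, and learning rate are illustrative choices.

def f(x):
    return (x - 3.0) ** 2

def f_prime(x):
    return 2.0 * (x - 3.0)

x = 0.0          # initial guess x_1
mu = 0.1         # learning rate
for k in range(100):
    x = x - mu * f_prime(x)   # x_{k+1} = x_k - mu * f'(x_k)

print(x)  # approaches the minimizer x = 3
```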

Chain Rule Chain rule with a single variable. Chain rule (multiple variables): for $w = f(x, y, z)$ with $x, y, z$ functions of $t$,
$\frac{dw}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}$, and
$\Delta w \approx \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \frac{\partial f}{\partial z}\Delta z$.
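
The multi-variable chain rule can be checked numerically. The sketch below (added for illustration; the function f and the paths x(t), y(t), z(t) are made-up examples) compares dw/dt computed with the chain rule against a finite-difference estimate.

```python
import math

# Example function and paths; all choices here are illustrative.
def f(x, y, z):
    return x * y + math.sin(z)

def x_of(t): return t ** 2
def y_of(t): return 3.0 * t
def z_of(t): return math.exp(t)

t = 0.7
x, y, z = x_of(t), y_of(t), z_of(t)
# Chain rule: dw/dt = f_x * dx/dt + f_y * dy/dt + f_z * dz/dt
dwdt_chain = y * (2 * t) + x * 3.0 + math.cos(z) * math.exp(t)

# Finite-difference estimate of dw/dt for comparison.
h = 1e-6
w_plus = f(x_of(t + h), y_of(t + h), z_of(t + h))
w_minus = f(x_of(t - h), y_of(t - h), z_of(t - h))
dwdt_fd = (w_plus - w_minus) / (2 * h)

print(dwdt_chain, dwdt_fd)  # the two values should agree closely
```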

Feed-forward neural network

Feed-forward network example: 1st layer [Figure: the inputs and a bias unit (+1) connected to the 1st hidden layer through weights $w_{11}, w_{12}, w_{21}, w_{22}, w_{31}, w_{32}$, followed by a non-linear function.]

Feed-forward network example: 2nd layer [Figure: the 1st hidden layer and a bias unit (+1) connected to the 2nd hidden layer through weights $u_{11}, u_{12}, u_{13}, u_{21}, u_{22}, u_{23}$.]

Forward propagation [Figure: Layer 1 produces the 1st hidden layer, Layer 2 produces the 2nd hidden layer, and a Softmax produces the output.]

Forward propagation (matrix representation) [Figure: the same Layer 1 → Layer 2 → Softmax pipeline, written with matrix-vector products.]
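
A minimal numpy sketch of this forward pass, assuming two fully connected layers with a tanh non-linearity followed by a softmax; the layer sizes and random weights are illustrative choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and randomly initialized parameters.
W1, b1 = rng.standard_normal((4, 3)), np.zeros((4, 1))  # layer 1: 3 -> 4
W2, b2 = rng.standard_normal((2, 4)), np.zeros((2, 1))  # layer 2: 4 -> 2

def softmax(s):
    e = np.exp(s - s.max(axis=0, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=0, keepdims=True)

def forward(x):
    h1 = np.tanh(W1 @ x + b1)   # 1st hidden layer: affine + non-linear function
    y2 = W2 @ h1 + b2           # 2nd layer (affine)
    return softmax(y2)          # output probabilities

x = rng.standard_normal((3, 1))   # a single input column vector
print(forward(x))                 # sums to 1 over the classes
```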

Back-propagation algorithm

Weight update method Parameter update (gradient descent): $w \leftarrow w - \mu \frac{\partial L}{\partial w}$, where $L$ is the loss function and $\mu$ is the learning rate.

Dataflow diagram [Figure: Layer 1 → Layer 2 → Softmax produce the 1st hidden layer, the 2nd hidden layer, and the output, which is compared against the ground truth.]

Back-propagation step: loss function Compute the loss function $L$ between the Softmax output and the ground truth, e.g., cross entropy.
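
For concreteness, a small sketch of the cross-entropy computation under the assumption of a one-hot ground-truth vector (the probabilities below are made-up example numbers):

```python
import numpy as np

def cross_entropy(p, t, eps=1e-12):
    """Cross-entropy loss L = -sum_k t_k * log(p_k).

    p : softmax output probabilities (column vector)
    t : one-hot ground-truth vector of the same shape
    """
    return float(-np.sum(t * np.log(p + eps)))

p = np.array([[0.7], [0.2], [0.1]])   # example network output
t = np.array([[1.0], [0.0], [0.0]])   # example ground truth (one-hot)
print(cross_entropy(p, t))            # -log(0.7) ≈ 0.357
```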

Overview [Figure: the forward pass through Layer 1, Layer 2, and Softmax produces the output; the loss function compares the output with the one-hot ground truth (1, 0, ...); the resulting error is then propagated backward through the same layers.]

Back-propagation: 2nd layer [Figure: the error from the Softmax/loss reaches Layer 2.] The things Layer 2 has to do: weight update (for its own weights) and error propagation (to the 1st hidden layer).

Error propagation (feed-forward network) [Figure: the 1st and 2nd hidden layers with bias units (+1); the derivatives with respect to $z_1$ and $z_2$ are received from the upper layer and propagated backward through the weights.]

Weight updates (feed-forward network) [Figure: the same layers; the derivatives with respect to $z_1$ and $z_2$ received from the upper layer are combined with the layer inputs to form the weight gradients.]

Back-propagation: 1st layer [Figure: the error from Layer 2 reaches Layer 1.] The things Layer 1 has to do: weight update, $w_{ij}^{\text{new}} = w_{ij}^{\text{old}} - \mu \frac{\partial L}{\partial w_{ij}}$; error propagation; input update??? (updating the input itself is discussed later, in the input-optimization section).

Weight updates (feed-forward network) $w_{ij}^{\text{new}} = w_{ij}^{\text{old}} - \mu \frac{\partial L}{\partial w_{ij}}$ [Figure: the 1st hidden layer with a bias unit (+1); the derivatives with respect to $y_1$, $y_2$, and $y_3$ are received from the upper layer.]
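
Putting error propagation and the weight update together, the following numpy sketch performs one back-propagation step for a small two-layer network with a softmax/cross-entropy output. The architecture, tanh non-linearity, and learning rate are assumptions made for the example; it also uses the standard fact that the gradient of cross entropy combined with softmax is (output − ground truth).

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative two-layer network: 3 -> 4 -> 2, tanh non-linearity.
W1, b1 = rng.standard_normal((4, 3)) * 0.1, np.zeros((4, 1))
W2, b2 = rng.standard_normal((2, 4)) * 0.1, np.zeros((2, 1))
mu = 0.1  # learning rate

def softmax(s):
    e = np.exp(s - s.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def train_step(x, t):
    global W1, b1, W2, b2
    # Forward pass.
    y1 = W1 @ x + b1
    h1 = np.tanh(y1)
    y2 = W2 @ h1 + b2
    p = softmax(y2)

    # Backward pass (error propagation).
    d_y2 = p - t                              # dL/dy2 for softmax + cross entropy
    d_h1 = W2.T @ d_y2                        # error propagated to the 1st hidden layer
    d_y1 = (1.0 - np.tanh(y1) ** 2) * d_h1    # through the tanh non-linearity

    # Weight updates: w_new = w_old - mu * dL/dw.
    W2 -= mu * (d_y2 @ h1.T); b2 -= mu * d_y2
    W1 -= mu * (d_y1 @ x.T);  b1 -= mu * d_y1

x = rng.standard_normal((3, 1))
t = np.array([[1.0], [0.0]])   # one-hot ground truth
for _ in range(200):
    train_step(x, t)
```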

Block-based perspective

Basic Math Let $Y = X^{\top} A X$ (a scalar).
Derivative with respect to $X$: $(X + \Delta X)^{\top} A (X + \Delta X) = X^{\top} A X + \Delta X^{\top} A X + X^{\top} A \Delta X + \Delta X^{\top} A \Delta X \approx X^{\top} A X + X^{\top}(A + A^{\top})\Delta X$, so $\frac{\partial Y}{\partial X} = X^{\top}(A + A^{\top})$.
Derivative with respect to $A$: $Y + \Delta Y = X^{\top}(A + \Delta A) X = X^{\top} A X + X^{\top} \Delta A X$, so $\Delta Y = X^{\top} \Delta A X = \mathrm{tr}(X^{\top} \Delta A X) = \mathrm{tr}(X X^{\top} \Delta A)$, and therefore $\frac{\partial Y}{\partial A} = X X^{\top}$.
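
These identities are easy to verify numerically. The short sketch below (added here for illustration) perturbs A slightly and checks that the resulting change in Y = XᵀAX matches tr(XXᵀ ΔA).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X = rng.standard_normal((n, 1))
A = rng.standard_normal((n, n))

Y = float(X.T @ A @ X)

# Small random perturbation of A.
dA = 1e-6 * rng.standard_normal((n, n))
Y_perturbed = float(X.T @ (A + dA) @ X)

# Predicted change using dY/dA = X X^T:  dY = tr(X X^T dA).
dY_pred = float(np.trace(X @ X.T @ dA))

print(Y_perturbed - Y, dY_pred)   # the two values should agree closely
```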

Block-based representation Forward: $Y = WX + b$, $Z = \varphi(Y)$.
Backward: $\frac{\partial L}{\partial X} = W^{\top}\frac{\partial L}{\partial Y}$, $\frac{\partial L}{\partial Y} = \mathrm{diag}(\varphi'(Y))\,\frac{\partial L}{\partial Z}$, $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial Y} X^{\top}$, $\frac{\partial L}{\partial b} = \frac{\partial L}{\partial Y}$.

Forward propagation (block-based representation) [Figure: the Layer 1 and Layer 2 blocks followed by a Softmax block producing the output.]

Backward propagation: 2nd layer [Figure: the output is compared against the ground truth; the resulting gradient flows back through the Softmax block into Layer 2, which propagates the error to Layer 1 and computes its own weight update.]

Backward propagation: 1st layer [Figure: the error propagated from Layer 2 reaches Layer 1, which computes its weight update and propagates the error further back.]

Input Optimization while fixing all weights

Input update (feed-forward network) [Figure: the 1st hidden layer with a bias unit (+1); the derivatives with respect to $y_1$, $y_2$, and $y_3$ are received from the upper layer and propagated back to the input.]

[Figure: an example input image (a building) passed through the network, producing an output and a loss.]

Input update: $I^{\text{new}} = I^{\text{old}} - \mu \frac{\partial L}{\partial I}$ [Figure: the input image (a building) is iteratively updated using the gradient of the loss with respect to the input.]
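
A minimal sketch of input optimization with fixed weights (the small network, target output, and learning rate are illustrative assumptions): the input I is treated as the variable and updated with the gradient of the loss, exactly as the weights were before.

```python
import numpy as np

rng = np.random.default_rng(3)

# A small fixed network (weights are NOT updated here); sizes are illustrative.
W1, b1 = rng.standard_normal((4, 3)) * 0.5, np.zeros((4, 1))
W2, b2 = rng.standard_normal((2, 4)) * 0.5, np.zeros((2, 1))

def softmax(s):
    e = np.exp(s - s.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

t = np.array([[1.0], [0.0]])      # desired (ground-truth) output
I = rng.standard_normal((3, 1))   # the input we optimize
mu = 0.5

for _ in range(200):
    # Forward pass with the fixed weights.
    y1 = W1 @ I + b1
    h1 = np.tanh(y1)
    p = softmax(W2 @ h1 + b2)

    # Backward pass, but only down to the input: dL/dI.
    d_y2 = p - t                                      # softmax + cross entropy
    d_y1 = (1.0 - np.tanh(y1) ** 2) * (W2.T @ d_y2)
    d_I = W1.T @ d_y1

    I = I - mu * d_I    # I_new = I_old - mu * dL/dI
```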


Inceptionism: Going Deeper into Neural Networks https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html


DEEP INSIDE CONVOLUTIONAL NETWORKS Inputs maximizing class score

Inputs maximizing class score Objective function (to be maximized): $S_c(I) - \lambda \lVert I \rVert_2^2$. K. Simonyan, A. Vedaldi, A. Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps," ICLR Workshop 2014.

Inputs maximizing class score Objective function (to be maximized): $S_c(I) - \lambda \lVert I \rVert_2^2$. [Figure: the generated input for the goose class.]

Inputs maximizing class score [Figure: generated inputs for the classes dumbbell, cup, dalmatian, bell pepper, lemon, husky, computer keyboard, kit fox, limousine, washing machine, goose, ostrich.]

DEEP INSIDE CONVOLUTIONAL NETWORKS Saliency visualization

Saliency visualization Linear score model for class c: $S_c(I) \approx w^{\top} I + b$, where $w$ indicates the importance of the corresponding pixels of $I$ for class $c$.

Saliency visualization [Figure: a dog image and its class score $S_c(I)$ used as the objective function.]

Saliency visualization (dog class) Differentiate the objective function with respect to the input at $I_0$: $w = \left.\frac{\partial S_c(I)}{\partial I}\right|_{I_0}$.

Saliency visualization [Figure: the input image $I_0$ and the saliency map computed from $\left.\partial S_c/\partial I\right|_{I_0}$.]
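
In the cited paper the saliency map is obtained from the gradient of the class score with respect to the input image, taking the maximum absolute value over the color channels at each pixel. The sketch below illustrates this step with a toy linear score so that the gradient can be written by hand; in a real ConvNet the gradient ∂S_c/∂I would come from back-propagation, and the image size and score here are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(4)

H, W, C = 8, 8, 3                    # toy image size (illustrative)
I0 = rng.random((H, W, C))           # the input image
w = rng.standard_normal((H, W, C))   # toy linear score: S_c(I) = sum(w * I)

# Gradient of the class score w.r.t. the input image.
# For the toy linear score this is just w; for a real network it would be
# obtained by back-propagating S_c to the input.
dSc_dI = w

# Saliency map: maximum over color channels of the absolute gradient.
saliency = np.abs(dSc_dI).max(axis=2)   # shape (H, W)
print(saliency.shape)
```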

[Figure: saliency maps for example images: yacht, dog, monkey, washing machine, cow, building.]

A NEURAL ALGORITHM OF ARTISTIC STYLE

Loss Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "A Neural Algorithm of Artistic Style." arXiv preprint arXiv:1508.06576 (2015).

Loss
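
The loss of Gatys et al. combines a content term (feature differences) with a style term based on Gram matrices of the features. Below is a hedged numpy sketch of the two terms; the feature arrays are random placeholders, whereas in the actual method they are activations of selected layers of a pretrained CNN (e.g., VGG), which is not reproduced here.

```python
import numpy as np

def gram_matrix(F):
    """Gram matrix of a feature map F with shape (channels, height*width)."""
    return F @ F.T

def content_loss(F_gen, F_content):
    """Squared feature difference between generated and content images."""
    return 0.5 * np.sum((F_gen - F_content) ** 2)

def style_loss(F_gen, F_style):
    """Squared Gram-matrix difference between generated and style images."""
    C, N = F_gen.shape
    G, A = gram_matrix(F_gen), gram_matrix(F_style)
    return np.sum((G - A) ** 2) / (4.0 * C ** 2 * N ** 2)

# Placeholder features (in practice: CNN activations of the three images).
rng = np.random.default_rng(5)
F_content = rng.standard_normal((16, 64))
F_style   = rng.standard_normal((16, 64))
F_gen     = rng.standard_normal((16, 64))

alpha, beta = 1.0, 1e3    # content/style weighting (illustrative)
total = alpha * content_loss(F_gen, F_content) + beta * style_loss(F_gen, F_style)
print(total)
```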

Artistic style [Figure: example style-transfer results combining the content of a photograph with the style of a painting.]

Backups

Vector-by-scalar

Scalar-by-vector

Vector-by-vector

Scalar-by-matrix

Why $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial Y} X^{\top}$? We want to find $\frac{\partial L}{\partial W}$ satisfying $\Delta L = \mathrm{tr}\!\left(\left(\frac{\partial L}{\partial W}\right)^{\!\top} \Delta W\right)$. [Intuitive derivation] From $Y = WX$, $\Delta Y = \Delta W X$, so
$\Delta L = \left(\frac{\partial L}{\partial Y}\right)^{\!\top} \Delta Y = \left(\frac{\partial L}{\partial Y}\right)^{\!\top} \Delta W X = \mathrm{tr}\!\left(\left(\frac{\partial L}{\partial Y}\right)^{\!\top} \Delta W X\right) = \mathrm{tr}\!\left(X\left(\frac{\partial L}{\partial Y}\right)^{\!\top} \Delta W\right)$,
hence $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial Y} X^{\top}$.

Forward: $Y = WX$, $Z = Y + b$, $V = \varphi(Z)$.
Backward: $\frac{\partial L}{\partial X} = W^{\top}\frac{\partial L}{\partial Y}$, $\frac{\partial L}{\partial Y} = \frac{\partial L}{\partial Z}$, $\frac{\partial L}{\partial Z} = \mathrm{diag}(\varphi'(Z))\,\frac{\partial L}{\partial V}$, $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial Y} X^{\top}$, $\frac{\partial L}{\partial b} = \frac{\partial L}{\partial Z}$.

Proof (element-wise operation) Forward: $z_i = \varphi(y_i)$ for $i = 1, \dots, n$. By the chain rule, $\frac{\partial L}{\partial y_i} = \frac{\partial L}{\partial z_i}\,\varphi'(y_i)$, so stacking over $i$ gives $\frac{\partial L}{\partial Y} = \mathrm{diag}(\varphi'(Y))\,\frac{\partial L}{\partial Z}$.

Proof For $Y = WX$: $\Delta L = \mathrm{tr}\!\left(\left(\frac{\partial L}{\partial Y}\right)^{\!\top}\Delta Y\right) = \mathrm{tr}\!\left(\left(\frac{\partial L}{\partial Y}\right)^{\!\top} W \Delta X\right) = \mathrm{tr}\!\left(\left(W^{\top}\frac{\partial L}{\partial Y}\right)^{\!\top}\Delta X\right)$, hence $\frac{\partial L}{\partial X} = W^{\top}\frac{\partial L}{\partial Y}$. Similarly, $\Delta L = \mathrm{tr}\!\left(\left(\frac{\partial L}{\partial Y}\right)^{\!\top}\Delta W X\right) = \mathrm{tr}\!\left(X\left(\frac{\partial L}{\partial Y}\right)^{\!\top}\Delta W\right)$, hence $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial Y} X^{\top}$.

New Layer Design

New layer addition [Figure: a newly added layer (Layer m) inserted between the 1st layer and the Nth layer; it receives inputs $\alpha_1, \alpha_2, \alpha_3$ and produces outputs $\beta_1, \beta_2, \beta_3$, with a Softmax producing the final output.]

New layer design Forward pass: compute the outputs $\beta_1, \beta_2, \beta_3$ from the inputs $\alpha_1, \alpha_2, \alpha_3$. Backward pass: given $\frac{\partial L}{\partial \beta_1}, \frac{\partial L}{\partial \beta_2}, \frac{\partial L}{\partial \beta_3}$ from the upper layer, compute the derivatives with respect to the data, $\frac{\partial L}{\partial \alpha_1}, \frac{\partial L}{\partial \alpha_2}, \frac{\partial L}{\partial \alpha_3}$, and update the layer's weights (if it has any).
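
A new layer therefore only needs to implement these two operations. Below is an illustrative numpy sketch of such an interface, using an element-wise ReLU as the example layer (a layer without weights, so its backward pass only propagates the error); the class name and method signatures are assumptions, not a particular framework's API.

```python
import numpy as np

class ReluLayer:
    """Example of a new layer: element-wise ReLU (no weights to update)."""

    def forward(self, alpha):
        # Forward pass: compute the output beta from the input alpha.
        self.alpha = alpha
        return np.maximum(alpha, 0.0)

    def backward(self, dL_dbeta):
        # Backward pass: given dL/dbeta from the upper layer, compute dL/dalpha.
        # A layer with weights would additionally compute dL/dW here and
        # update its weights, e.g. W -= mu * dL_dW.
        return dL_dbeta * (self.alpha > 0.0)

layer = ReluLayer()
a = np.array([[-1.0], [2.0], [0.5]])
b = layer.forward(a)                   # forward pass
d_a = layer.backward(np.ones_like(b))  # backward pass
print(b.ravel(), d_a.ravel())
```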

Max pooling layer Example: 2×2 max pooling with stride 2.
Input:
 5  6  3  9
 2  1  4 11
17 14  3 21
11 13 12  9
Output:
 6 11
17 21

Derivatives of max Forward pass: $z = \max(x, y)$. Backward pass: $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial x}$ and $\frac{\partial L}{\partial y} = \frac{\partial L}{\partial z}\frac{\partial z}{\partial y}$, where $\frac{\partial z}{\partial x} = 1$ if $x > y$ (otherwise 0) and $\frac{\partial z}{\partial y} = 1$ if $y > x$ (otherwise 0); the incoming gradient $\frac{\partial L}{\partial z}$ flows only to the input that attained the maximum.
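
The same routing rule gives the backward pass of max pooling: each incoming gradient is copied to the position of the maximum within its pooling window and is zero elsewhere. A numpy sketch (assuming 2×2 windows, stride 2, a single channel, and dimensions divisible by 2) follows; it reproduces the example above.

```python
import numpy as np

def maxpool_forward(x):
    """2x2 max pooling with stride 2 on a single-channel input."""
    H, W = x.shape
    out = np.zeros((H // 2, W // 2))
    for i in range(0, H, 2):
        for j in range(0, W, 2):
            out[i // 2, j // 2] = x[i:i+2, j:j+2].max()
    return out

def maxpool_backward(x, dL_dout):
    """Route each incoming gradient to the max position of its window."""
    dL_dx = np.zeros_like(x)
    for i in range(0, x.shape[0], 2):
        for j in range(0, x.shape[1], 2):
            window = x[i:i+2, j:j+2]
            r, c = np.unravel_index(window.argmax(), window.shape)
            dL_dx[i + r, j + c] = dL_dout[i // 2, j // 2]
    return dL_dx

x = np.array([[ 5,  6,  3,  9],
              [ 2,  1,  4, 11],
              [17, 14,  3, 21],
              [11, 13, 12,  9]], dtype=float)
print(maxpool_forward(x))    # [[ 6. 11.] [17. 21.]]  (matches the example above)
print(maxpool_backward(x, np.ones((2, 2))))
```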