Deep Feedforward Networks. Lecture slides for Chapter 6 of Deep Learning (www.deeplearningbook.org). Ian Goodfellow. Last updated 2016-10-04.

Roadmap: Example: Learning XOR; Gradient-Based Learning; Hidden Units; Architecture Design; Back-Propagation

XOR is not linearly separable (Figure 6.1, left: the original x space, with axes x1 and x2)

Rectified Linear Activation: g(z) = max{0, z} (Figure 6.3)

Network Diagrams (Figure 6.2: an example of a feedforward network, drawn in two different styles; left, one node per unit (x1, x2, h1, h2, y) with weight matrices W and w; right, one node per layer (x, h, y))

Solving XOR
f(x; W, c, w, b) = w^T max{0, W^T x + c} + b   (6.3)
W = [[1, 1], [1, 1]]   (6.4)
c = (0, -1)^T   (6.5)
w = (1, -2)^T   (6.6)
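As a quick check of Eqs. 6.3-6.6, here is a minimal sketch (using JAX's NumPy API, and taking b = 0 as in the book's worked solution) that evaluates the network on all four XOR inputs:

```python
import jax.numpy as jnp

# Parameters from Eqs. 6.4-6.6, with b = 0.
W = jnp.array([[1., 1.],
               [1., 1.]])
c = jnp.array([0., -1.])
w = jnp.array([1., -2.])
b = 0.

def f(x):
    # Eq. 6.3: f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
    h = jnp.maximum(0., W.T @ x + c)   # hidden layer with ReLU activation
    return w @ h + b

X = jnp.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
print([float(f(x)) for x in X])        # [0.0, 1.0, 1.0, 0.0], the XOR function
```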

Solving XOR (Figure 6.1: the original x space, where no linear model can represent XOR, and the learned h space, where the points become linearly separable)

Roadmap: Example: Learning XOR; Gradient-Based Learning; Hidden Units; Architecture Design; Back-Propagation

Gradient-Based Learning: specify a model and a cost; design the model and cost so the cost is smooth; minimize the cost using gradient descent or related techniques
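A minimal sketch of this recipe, assuming an illustrative quadratic cost and step size (not from the slides): define a smooth cost, obtain its gradient automatically, and descend.

```python
import jax
import jax.numpy as jnp

def cost(theta):
    # A smooth toy cost; any differentiable model/cost pair fits the same recipe.
    return jnp.sum((theta - jnp.array([1.0, -2.0])) ** 2)

grad_cost = jax.grad(cost)                  # gradient obtained by autodiff
theta = jnp.zeros(2)
for _ in range(100):
    theta = theta - 0.1 * grad_cost(theta)  # one gradient descent step
print(theta)                                # converges toward [1, -2]
```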

Conditional Distributions and Cross-Entropy: J(θ) = -E_{x,y ~ p̂_data} log p_model(y | x)   (6.12)
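A small sketch of Eq. 6.12 for a Bernoulli output distribution, with made-up labels and model probabilities: the cost is the average negative log-likelihood, i.e. binary cross-entropy.

```python
import jax.numpy as jnp

def nll_bernoulli(y, p):
    # J(theta) = -E_{x,y ~ p_data} log p_model(y | x), with p = p_model(y = 1 | x)
    return -jnp.mean(y * jnp.log(p) + (1 - y) * jnp.log(1 - p))

y = jnp.array([1., 0., 1., 1.])        # observed labels
p = jnp.array([0.9, 0.2, 0.8, 0.6])    # model's predicted probabilities
print(nll_bernoulli(y, p))             # the binary cross-entropy cost
```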

Output Types

Output Type | Output Distribution  | Output Layer                  | Cost Function
Binary      | Bernoulli            | Sigmoid                       | Binary cross-entropy
Discrete    | Multinoulli          | Softmax                       | Discrete cross-entropy
Continuous  | Gaussian             | Linear                        | Gaussian cross-entropy (MSE)
Continuous  | Mixture of Gaussians | Mixture density               | Cross-entropy
Continuous  | Arbitrary            | See Part III: GAN, VAE, FVBN  | Various

Mixture Density Outputs (Figure 6.4: samples y plotted against x for a mixture density output)

Don't mix and match: sigmoid output σ(z) with target of 1; plot of cross-entropy loss vs. MSE loss as functions of z
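A brief sketch of why the pairing matters (the probe values of z are illustrative): with a sigmoid output unit and target 1, the cross-entropy loss keeps a strong gradient for very negative z, whereas the MSE gradient all but vanishes there.

```python
import jax
import jax.numpy as jnp

sigmoid = jax.nn.sigmoid
ce_loss  = lambda z: -jnp.log(sigmoid(z))        # cross-entropy, target y = 1
mse_loss = lambda z: (sigmoid(z) - 1.0) ** 2     # mean squared error, target y = 1

for z in [-3.0, 0.0, 3.0]:
    print(z, float(jax.grad(ce_loss)(z)), float(jax.grad(mse_loss)(z)))
# At z = -3 the cross-entropy gradient is about -0.95 (a strong learning signal),
# while the MSE gradient is only about -0.09 and keeps shrinking as z decreases.
```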

Roadmap: Example: Learning XOR; Gradient-Based Learning; Hidden Units; Architecture Design; Back-Propagation

Hidden units: use ReLUs, 90% of the time. For RNNs, see Chapter 10. For some research projects, get creative. Many hidden units perform comparably to ReLUs; new hidden units that perform comparably are rarely interesting.

Roadmap: Example: Learning XOR; Gradient-Based Learning; Hidden Units; Architecture Design; Back-Propagation

Architecture Basics (diagram: inputs x1, x2, hidden units h1, h2, output y): depth is the number of layers, width is the number of units per layer

Universal Approximator Theorem: one hidden layer is enough to represent (not learn) an approximation of any function to an arbitrary degree of accuracy. So why go deeper? A shallow net may need (exponentially) more width, and a shallow net may overfit more.

Exponential Representation Advantage of Depth (Figure 6.5)

Better Generalization with Greater Depth (Figure 6.6: test accuracy (percent) on the y-axis, from 92 to 96.5, versus number of layers, 3 to 11; accuracy increases with depth)

Large, Shallow Models Overfit More (Figure 6.7: test accuracy (percent) versus number of parameters (×10^8) for 3-layer convolutional, 3-layer fully connected, and 11-layer convolutional models)

Roadmap: Example: Learning XOR; Gradient-Based Learning; Hidden Units; Architecture Design; Back-Propagation

Back-Propagation
Back-propagation is just the chain rule of calculus:
dz/dx = (dz/dy)(dy/dx)   (6.44)
and, for vector-valued x ∈ R^m and y ∈ R^n,
∇_x z = (∂y/∂x)^T ∇_y z   (6.46)
But it is a particular implementation of the chain rule: it uses dynamic programming (table filling), avoids recomputing repeated subexpressions, and trades speed against memory.
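A minimal numeric check of Eq. 6.46, with arbitrary illustrative functions g and h: the gradient of z with respect to x equals the transposed Jacobian ∂y/∂x times the gradient of z with respect to y.

```python
import jax
import jax.numpy as jnp

g = lambda x: jnp.stack([jnp.sin(x[0]) * x[1], x[0] + x[1] ** 2, x[0] * x[1]])  # y = g(x)
h = lambda y: jnp.sum(y ** 2)                                                   # z = h(y)

x = jnp.array([0.3, -1.2])
grad_x = jax.grad(lambda x: h(g(x)))(x)      # back-prop straight through to x
J = jax.jacobian(g)(x)                       # Jacobian dy/dx, shape (3, 2)
grad_y = jax.grad(h)(g(x))                   # gradient of z with respect to y
print(jnp.allclose(grad_x, J.T @ grad_y))    # True: Eq. 6.46 holds
```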

Simple Back-Prop Example (network with inputs x1, x2, hidden units h1, h2, output y): forward prop computes the activations and the loss; back-prop computes the derivatives
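A small sketch of that picture for a two-input, two-hidden-unit network with a squared-error loss (parameters chosen arbitrarily for illustration): forward prop stores the activations, back-prop reuses them to compute the derivatives, and the result agrees with automatic differentiation.

```python
import jax
import jax.numpy as jnp

W = jnp.array([[0.5, -0.3], [0.8, 0.1]]); c = jnp.array([0.1, -0.2])
w = jnp.array([1.0, -1.5]); b = 0.2
x = jnp.array([1.0, 2.0]); y = 1.0

def loss(W, c, w, b):
    h = jnp.maximum(0.0, W.T @ x + c)        # hidden activations
    return (w @ h + b - y) ** 2              # squared-error loss

# Forward prop: compute and store the activations.
a = W.T @ x + c
h = jnp.maximum(0.0, a)
yhat = w @ h + b

# Back-prop: reuse the stored activations to compute the derivatives.
dyhat = 2.0 * (yhat - y)                     # dL/dyhat
dw, db = dyhat * h, dyhat                    # dL/dw, dL/db
dh = dyhat * w                               # dL/dh
da = dh * (a > 0)                            # back through the ReLU
dW, dc = jnp.outer(x, da), da                # dL/dW, dL/dc

print(jnp.allclose(dW, jax.grad(loss, argnums=0)(W, c, w, b)))   # True
```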

Computation Graphs (Figure 6.8: examples of computational graphs. (a) Multiplication: using the × operation to compute z = x y. (b) The logistic regression prediction ŷ = σ(x^T w + b). (c) A ReLU layer, H = max{0, XW + b}. (d) Linear regression and weight decay.)

Repeated Subexpressions (Figure 6.9: a chain w → x → y → z with x = f(w), y = f(x), z = f(y))
∂z/∂w = (∂z/∂y)(∂y/∂x)(∂x/∂w)   (6.50-6.51)
      = f'(y) f'(x) f'(w)   (6.52)
      = f'(f(f(w))) f'(f(w)) f'(w)   (6.53)
Back-prop avoids computing the repeated subexpression f(w) twice: it is computed once in the forward pass and reused.
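A tiny sketch of Eqs. 6.50-6.53 with the illustrative choice f = sin: the forward pass stores x = f(w) and y = f(x), and the backward pass of Eq. 6.52 reuses them, whereas the expanded Eq. 6.53 would recompute them.

```python
import jax.numpy as jnp

f, fprime = jnp.sin, jnp.cos   # illustrative f and its derivative

w = 0.7
# Forward pass: compute each intermediate value once and store it.
x = f(w)
y = f(x)
z = f(y)                        # z = f(f(f(w)))

# Backward pass in the form of Eq. 6.52: reuses the stored x and y.
dz_dw = fprime(y) * fprime(x) * fprime(w)
# The expanded form of Eq. 6.53 recomputes f(w) and f(f(w)) inside the derivatives.
dz_dw_naive = fprime(f(f(w))) * fprime(f(w)) * fprime(w)
print(jnp.allclose(dz_dw, dz_dw_naive))   # True, but the first form does less work
```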

Symbol-to-Symbol Differentiation (Figure 6.10: the graph w → x → y → z built from f is extended with new nodes f'(w), f'(x), f'(y) and products giving dz/dy, dz/dx, dz/dw; the derivatives are themselves a computational graph)
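As a brief illustration of the symbol-to-symbol idea (using JAX, not the book's own framework): differentiation produces a new computational graph for the derivative, which can be inspected and differentiated again.

```python
import jax
import jax.numpy as jnp

z_of_w = lambda w: jnp.sin(jnp.sin(jnp.sin(w)))   # z = f(f(f(w))) with f = sin

dz_dw = jax.grad(z_of_w)          # a new function: the derivative has its own graph
print(jax.make_jaxpr(dz_dw)(0.7)) # the added derivative operations, as in Figure 6.10
print(jax.grad(dz_dw)(0.7))       # that graph can itself be differentiated again
```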

Neural Network Loss Function (Figure 6.11: the computational graph used to compute the cost used to train our example network; the total cost J adds the cross-entropy term J_MLE to weight-decay terms built from W^(1) and W^(2))

Hessian-vector Products: Hv = ∇_x [ (∇_x f(x))^T v ]   (6.59)
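A minimal sketch of Eq. 6.59 with an arbitrary illustrative function f: the Hessian-vector product comes from differentiating the scalar (∇_x f(x))^T v, and it matches multiplying by an explicitly formed Hessian.

```python
import jax
import jax.numpy as jnp

f = lambda x: jnp.sum(jnp.sin(x) * x ** 2)   # any scalar-valued function of x

def hvp(f, x, v):
    # Eq. 6.59: Hv = grad_x [ (grad_x f(x))^T v ]; H is never formed explicitly.
    return jax.grad(lambda x: jnp.dot(jax.grad(f)(x), v))(x)

x = jnp.array([0.3, -1.2, 2.0])
v = jnp.array([1.0, 0.0, -1.0])
print(jnp.allclose(hvp(f, x, v), jax.hessian(f)(x) @ v))   # True
```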

Questions