Introduction to Deep Learning


A. G. Schwing & S. Fidler, University of Toronto, 2015. CSC420: Intro to Image Understanding.

Outline:
1. Universality of Neural Networks
2. Learning Neural Networks
3. Deep Learning
4. Applications
5. References

What are neural networks? Let's ask... [Figure: a biological neuron next to a computational network with an input layer (Input #1 through Input #4), a hidden layer, and an output layer.]

What are neural networks? "...neural networks (NNs) are computational models inspired by biological neural networks [...] and are used to estimate or approximate functions..." [Wikipedia]

What are neural networks? Origins: traced back to threshold logic [W. McCulloch and W. Pitts, 1943] and the perceptron [F. Rosenblatt, 1958].

What are neural networks? Use cases:
- Classification
- Playing video games
- Captcha
- Neural Turing Machine (e.g., learning how to sort) [Alex Graves] http://www.technologyreview.com/view/532156/googles-secretive-deepmind-startup-unveils-a-neural-turing-machine/

What are neural networks? Example: input x ∈ R, parameters w_1, w_2, b ∈ R. [Diagram: x is connected to a hidden unit h_1 through weight w_1 and bias b, and h_1 is connected to the output f through weight w_2.]

How to compute the function? Forward propagation (also: forward pass, inference, prediction): given the input x and the parameters w, b, compute the intermediate results (latent variables) in a feed-forward manner until we obtain the output f.

How to compute the function? Example: input x ∈ R, parameters w_1, w_2, b.
h_1 = σ(w_1·x + b)
f = w_2·h_1
Sigmoid function: σ(z) = 1 / (1 + exp(-z))
For x = ln 2, b = ln 3, w_1 = 2, w_2 = 2, what are h_1 and f?
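To make the forward pass concrete, here is a minimal sketch (my addition, not from the slides) that evaluates this one-hidden-unit network; the names x, w1, w2, b mirror the symbols above, and the slide's values are plugged in so the answer can be checked.

```python
import math

def sigmoid(z):
    # sigma(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w1, w2, b):
    # forward pass of the one-hidden-unit network from the slide
    h1 = sigmoid(w1 * x + b)   # hidden unit
    f = w2 * h1                # output
    return h1, f

# values from the slide: x = ln 2, b = ln 3, w1 = 2, w2 = 2
h1, f = forward(x=math.log(2), w1=2.0, w2=2.0, b=math.log(3))
print(h1, f)   # h1 = sigmoid(ln 4 + ln 3) = 12/13, f = 24/13
```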

How to compute the function? Given the parameters, what is f for x = 0, x = 1, x = 2, ...? In closed form, f = w_2·σ(w_1·x + b). [Plot: f over x ∈ [-5, 5]; a scaled sigmoid.]

Let's mess with the parameters:
h_1 = σ(w_1·x + b), f = w_2·h_1, σ(z) = 1 / (1 + exp(-z))
[Plot 1: w_1 = 1.0 fixed, b varies (b = -2, 0, 2); the sigmoid shifts along the x-axis.]
[Plot 2: b = 0 fixed, w_1 varies (w_1 = 0, 0.5, 1.0, 100); the sigmoid flattens or sharpens, approaching a step function for large w_1.]
Keep the step function in mind.

How to use neural networks for binary classification? Feature/measurement: x. Output: how likely is the input to be a cat? [Plot: the sigmoid output f over the feature x.]
With a shifted feature/measurement x, the output curve shifts as well. [Plots: f over x for the previous features and for the shifted features.]
Learning/training means finding the right parameters.

So far we are able to scale and translate sigmoids. How well can we approximate an arbitrary function? With this simple model we are obviously not going very far. [Plots: when the features are good, a simple classifier suffices; when the features are noisy, a more complex classifier is needed.] How can we generalize?

Let's use more hidden variables:
h_1 = σ(w_1·x + b_1)
h_2 = σ(w_3·x + b_2)
f = w_2·h_1 + w_4·h_2
Combining two step functions gives a bump. [Plot: f over x ∈ [-5, 5] with w_1 = 100, b_1 = 40, w_3 = 100, b_2 = 60, w_2 = 1, w_4 = 1.]
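A minimal sketch (my addition) of the bump construction; the signs used here (b_1 = -40, b_2 = -60, w_4 = -1, i.e. a difference of two steep sigmoids) are an assumption, chosen so that the two step functions combine into a bump of height 1 between roughly x = 0.4 and x = 0.6.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def f(x, w1=100.0, b1=-40.0, w3=100.0, b2=-60.0, w2=1.0, w4=-1.0):
    # difference of two steep sigmoids: step up near 0.4, step down near 0.6
    h1 = sigmoid(w1 * x + b1)
    h2 = sigmoid(w3 * x + b2)
    return w2 * h1 + w4 * h2

for x in (0.0, 0.5, 1.0):
    print(x, round(f(x), 3))   # ~0 outside [0.4, 0.6], ~1 inside
```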

So let's simplify: we collapse each pair of hidden nodes into a bump function Bump(x_1, x_2, h) that starts at x_1, ends at x_2, and has height h.

Now we can represent bumps very well. How can we generalize? Sum several bumps:
f = Bump(0.0, 0.2, h_1) + Bump(0.2, 0.4, h_2) + Bump(0.4, 0.6, h_3) + Bump(0.6, 0.8, h_4) + Bump(0.8, 1.0, h_5)
[Plot: a target function on [0, 1] and its approximation by the five bumps.]
More bumps give a more accurate approximation. This corresponds to a single-layer network.
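A minimal sketch (my addition, not from the slides) of this idea: Bump(x1, x2, h) is built from two sigmoids as above and five bumps are summed to approximate a target on [0, 1]; the steepness constant and the target sin(2πx) are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, x1, x2, h, steep=200.0):
    # approximately h on (x1, x2) and 0 elsewhere, built from two sigmoids
    return h * (sigmoid(steep * (x - x1)) - sigmoid(steep * (x - x2)))

def target(x):
    return math.sin(2 * math.pi * x)        # illustrative target function

# one bump per interval, with height = target value at the interval centre
edges = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
heights = [target((a + b) / 2) for a, b in zip(edges[:-1], edges[1:])]

def approx(x):
    return sum(bump(x, a, b, h)
               for a, b, h in zip(edges[:-1], edges[1:], heights))

for x in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(x, round(target(x), 3), round(approx(x), 3))
```

Using more (narrower) intervals makes the piecewise-constant approximation more accurate, which is the point of the slide.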

Universality: in theory we can approximate an arbitrary (continuous) function, so we can learn a really complex cat classifier. Where is the catch?
- Complexity: we might need quite a few hidden units.
- Overfitting: the network can simply memorize the training data.

Generalizations are possible:
- include more input dimensions
- capture more output dimensions
- employ multiple layers for more efficient representations
See http://neuralnetworksanddeeplearning.com/chap4.html for a great read!

How do we find the parameters that give a good approximation? And how do we tell a computer to do that? Intuitive explanation: compute the approximation error at the output, then propagate the error back by computing the individual contribution of each parameter to the error. [Fig. from H. Lee]

Example for backpropagation of the error:
Target function: 5x²
Approximation: f(x, w)
Domain of interest: x ∈ {0, 1, 2, 3}
Error: e(w) = Σ_{x ∈ {0,1,2,3}} (5x² − f(x, w))²
Program of interest: min_w e(w) = min_w Σ_{x ∈ {0,1,2,3}} l(x, w), where l(x, w) = (5x² − f(x, w))².
How to optimize? Gradient descent.

Gradient descent for min_w e(w). Algorithm: start with w_0, t = 0.
1. Compute the gradient g_t = ∂e/∂w evaluated at w = w_t.
2. Update w_{t+1} = w_t − η·g_t.
3. Set t ← t + 1 and repeat.
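A minimal sketch (my addition) of this loop on the earlier example; the approximation f(x, w) = w·x² is an assumed, deliberately simple model so that the true minimizer w = 5 is easy to verify, and eta is the learning rate η.

```python
def f(x, w):
    return w * x * x              # assumed simple model: f(x, w) = w * x^2

def grad_e(w, xs=(0, 1, 2, 3)):
    # e(w) = sum_x (5x^2 - f(x, w))^2  =>  de/dw = sum_x 2*(5x^2 - f)*(-x^2)
    return sum(2.0 * (5 * x * x - f(x, w)) * (-x * x) for x in xs)

w, eta = 0.0, 0.001               # w_0 and step size
for t in range(200):
    w = w - eta * grad_e(w)       # w_{t+1} = w_t - eta * g_t
print(w)                          # converges to 5.0
```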

The chain rule is important for computing gradients:
min_w e(w) = min_w Σ_{x ∈ {0,1,2,3}} l(x, w), with l(x, w) = (5x² − f(x, w))².
Loss function l(x, w): squared loss, log loss, hinge loss, ...
Derivatives:
∂e/∂w = Σ_{x ∈ {0,1,2,3}} ∂l(x, w)/∂w = Σ_{x ∈ {0,1,2,3}} (∂l(x, w)/∂f)·(∂f(x, w)/∂w)

Slightly more complex example: a composite function represented as a directed acyclic graph,
l(x, w) = f_1(w_1, f_2(w_2, f_3(...)))
[Diagram: nodes f_1(w_1, f_2), f_2(w_2, f_3), f_3(...), with edges annotated by the partial derivatives ∂f_1/∂w_1, ∂f_1/∂f_2, ∂f_2/∂w_2, ∂f_2/∂f_3 and their chain-rule products.]
Repeated application of the chain rule allows efficient computation of all gradients.
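To illustrate the repeated chain rule, here is a tiny hand-written reverse pass (my sketch, not the lecture's code) for a two-level composition l = f_1(w_1, f_2(w_2, x)), where the concrete choices f_1(w_1, a) = w_1·a and f_2(w_2, x) = σ(w_2·x) are assumptions made only to keep the derivatives short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_backward(x, w1, w2):
    # forward pass: l = f1(w1, f2(w2, x)), with f2 = sigmoid(w2*x), f1 = w1*f2
    f2 = sigmoid(w2 * x)
    l = w1 * f2
    # backward pass: repeated chain rule, reusing intermediate results
    dl_df1 = 1.0
    dl_dw1 = dl_df1 * f2              # dl/dw1 = f2
    dl_df2 = dl_df1 * w1              # dl/df2 = w1
    df2_dw2 = f2 * (1.0 - f2) * x     # d sigmoid(w2*x)/dw2
    dl_dw2 = dl_df2 * df2_dw2         # chain rule: dl/dw2 = dl/df2 * df2/dw2
    return l, dl_dw1, dl_dw2

print(forward_backward(x=1.0, w1=2.0, w2=0.5))
```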

Backpropagation doesn't work well for deep sigmoid networks:
- Diffusion of the gradient signal (multiplication of many small numbers)
- Many attracting local minima (a random initialization is very far from good points)
- Requires a lot of training samples
- Needs significant computational power
Solution: a two-step approach. Greedy layer-wise pre-training, then full fine-tuning at the end.

Why go deep?
- Representation efficiency (fewer computational units for the same function)
- Hierarchical representation (non-local generalization)
- Combinatorial sharing (re-use of earlier computation)
- Works very well
[Fig. from H. Lee]

To obtain more flexibility/non-linearity we use additional function prototypes:
- Sigmoid
- Rectified linear unit (ReLU)
- Pooling
- Dropout
- Convolutions

Convolutions. What do the numbers mean? See Sanja's lecture 14 for the answers... [Fig. adapted from A. Krizhevsky]

[Figure: f = conv(filter, image); convolving a filter with an image produces a new feature map.]
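A minimal sketch (my addition) of the operation the figure illustrates: a "valid" 2D convolution, implemented as cross-correlation as is conventional in CNN libraries, of a small filter with a toy image.

```python
import numpy as np

def conv2d(image, kernel):
    # "valid" 2D cross-correlation: slide the kernel over the image,
    # multiply element-wise and sum at every position
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 "image"
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])       # toy 2x2 filter
print(conv2d(image, kernel))                       # 3x3 output map
```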

Max Pooling. What is happening here? [Fig. adapted from A. Krizhevsky]
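What is happening is that each window of the feature map is replaced by its maximum. A minimal numpy sketch (my addition, assuming non-overlapping 2x2 windows, the common choice):

```python
import numpy as np

def max_pool(feature_map, size=2):
    # non-overlapping max pooling with a size x size window and stride = size
    H, W = feature_map.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = feature_map[i * size:(i + 1) * size,
                                 j * size:(j + 1) * size]
            out[i, j] = window.max()
    return out

fm = np.array([[1., 3., 2., 0.],
               [4., 6., 1., 2.],
               [0., 1., 5., 7.],
               [2., 3., 8., 4.]])
print(max_pool(fm))   # [[6., 2.], [3., 8.]]
```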


Rectified Linear Unit (ReLU):
- Drops information that is smaller than zero
- Fixes the problem of vanishing gradients to some degree
Dropout:
- Drops information at random
- A kind of regularization, enforcing redundancy
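A minimal sketch (my addition) of these two pieces: ReLU zeroes out negative pre-activations, and dropout zeroes a random subset of activations at training time; the "inverted" rescaling by 1/(1-p) is a common convention rather than something stated on the slide.

```python
import numpy as np

def relu(x):
    # keep positive values, drop (zero out) everything below zero
    return np.maximum(0.0, x)

def dropout(x, p=0.5, training=True, rng=np.random.default_rng(0)):
    # inverted dropout: randomly zero activations with probability p,
    # rescale survivors by 1/(1-p) so the expected activation is unchanged
    if not training:
        return x
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

a = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(a))                  # [0. 0. 0. 0.5 2.]
print(dropout(relu(a), p=0.5))  # some entries zeroed, others scaled by 2
```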

A famous deep learning network called AlexNet. The network won the ImageNet competition in 2012.
- How many parameters? Given an image, what is happening?
- Inference time: about 2 ms per image when processing many images in parallel on the GPU
- Training time: a few days on a single recent GPU
[Fig. adapted from A. Krizhevsky]

Demo

Neural networks have been used for many applications:
- Classification and recognition in computer vision
- Text parsing in natural language processing
- Playing video games
- Stock market prediction
- Captcha
Demos: Russ's website, Antonio's Places website

Classification in Computer Vision: the ImageNet Challenge. http://deeplearning.cs.toronto.edu/ Since it's the end of the semester, let's find the beach...

Classification in Computer Vision: the ImageNet Challenge. http://deeplearning.cs.toronto.edu/ A place to maybe prepare for exams...

Links:
- Tutorials: http://deeplearning.net/tutorial/deeplearning.pdf
- Toronto demo by Russ and students: http://deeplearning.cs.toronto.edu/
- MIT demo by Antonio and students: http://places.csail.mit.edu/demo.html
- Honglak Lee: http://deeplearningworkshopnips2010.files.wordpress.com/2010/09/n workshop-tutorial-final.pdf
- Yann LeCun: http://www.cs.nyu.edu/~yann/talks/lecun-ranzato-icml2013.pdf
- Richard Socher: http://lxmls.it.pt/2014/socher-lxmls.pdf

Videos:
- Video games: https://www.youtube.com/watch?v=mart-xpable
- Captcha: http://singularityhub.com/2013/10/29/tiny-ai-startupvicarious-says-its-solved-captcha/ and https://www.youtube.com/watch?v=lge-dl2juam#t=27
- Stock exchange: http://cs.stanford.edu/people/eroberts/courses/soco/projects/neuralnetworks/applications/stocks.html