Deep Learning (CNNs)


10-601 Introduction to Machine Learning, Machine Learning Department, School of Computer Science, Carnegie Mellon University. Deep Learning (CNNs). Readings: Murphy Ch. 28; Bishop, HTF, Mitchell: none. Matt Gormley, Lecture 21, April 05, 2017

Reminders: Homework 5 (Part II): Peer Review. Release: Wed, Mar. 29; Due: Wed, Apr. 05 at 11:59pm. Expectation: you should spend at most 1 hour on your reviews. Peer Tutoring. Homework 7: Deep Learning. Release: Wed, Apr. 05. Watch for multiple due dates!!

BACKPROPAGATION 3

Background: A Recipe for Machine Learning. 1. Given training data. 2. Choose each of these: decision function, loss function. 3. Define goal. 4. Train with SGD (take small steps opposite the gradient).

Training: Backpropagation. Whiteboard Example: Backpropagation for Calculus Quiz #1. Calculus Quiz #1: Suppose x = 2 and z = 3; what are dy/dx and dy/dz for the function below?

Training: Backpropagation. Automatic Differentiation, Reverse Mode (a.k.a. Backpropagation). Forward computation: 1. Write an algorithm for evaluating the function y = f(x). The algorithm defines a directed acyclic graph, where each variable is a node (i.e. the computation graph). 2. Visit each node in topological order. For variable u_i with inputs v_1, ..., v_N: (a) compute u_i = g_i(v_1, ..., v_N); (b) store the result at the node. Backward computation: 1. Initialize all partial derivatives dy/du_j to 0 and dy/dy = 1. 2. Visit each node in reverse topological order. For variable u_i = g_i(v_1, ..., v_N): (a) we already know dy/du_i; (b) increment dy/dv_j by (dy/du_i)(du_i/dv_j). (The choice of algorithm ensures that computing du_i/dv_j is easy.) Return partial derivatives dy/du_i for all variables.
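
The procedure above maps almost directly onto code. Below is a minimal sketch (not from the lecture) of reverse-mode automatic differentiation over a computation graph; the Node class and helper names are illustrative assumptions.

```python
# Minimal sketch of reverse-mode autodiff (backprop) on a computation graph.
# The Node class and helper names here are illustrative, not from the lecture.

class Node:
    def __init__(self, value=0.0, parents=(), local_grads=()):
        self.value = value              # u_i, computed on the forward pass
        self.parents = parents          # the inputs v_1, ..., v_N
        self.local_grads = local_grads  # du_i/dv_j for each parent, as floats
        self.grad = 0.0                 # dy/du_i, accumulated on the backward pass

def backward(nodes_in_topological_order):
    """Visit nodes in reverse topological order, accumulating dy/dv_j."""
    y = nodes_in_topological_order[-1]
    for node in nodes_in_topological_order:
        node.grad = 0.0                 # 1. initialize all dy/du_j to 0 ...
    y.grad = 1.0                        #    ... and dy/dy = 1
    for u in reversed(nodes_in_topological_order):   # 2. reverse topological order
        for v, du_dv in zip(u.parents, u.local_grads):
            v.grad += u.grad * du_dv    # increment dy/dv_j by (dy/du_i)(du_i/dv_j)

# Tiny usage: y = x1 * x2, so dy/dx1 = x2 and dy/dx2 = x1
x1 = Node(value=2.0)
x2 = Node(value=3.0)
y = Node(value=x1.value * x2.value, parents=(x1, x2), local_grads=(x2.value, x1.value))
backward([x1, x2, y])
print(x1.grad, x2.grad)   # 3.0 2.0
```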

Training: Backpropagation. Simple example: the goal is to compute J = cos(sin(x^2) + 3x^2) on the forward pass and the derivative dJ/dx on the backward pass. Forward: J = cos(u); u = u_1 + u_2; u_1 = sin(t); u_2 = 3t; t = x^2.

Training: Backpropagation. Simple example: the goal is to compute J = cos(sin(x^2) + 3x^2) on the forward pass and the derivative dJ/dx on the backward pass. Forward: J = cos(u); u = u_1 + u_2; u_1 = sin(t); u_2 = 3t; t = x^2. Backward: dJ/du = -sin(u); dJ/du_1 = (dJ/du)(du/du_1), du/du_1 = 1; dJ/du_2 = (dJ/du)(du/du_2), du/du_2 = 1; dJ/dt = (dJ/du_1)(du_1/dt) + (dJ/du_2)(du_2/dt), du_1/dt = cos(t), du_2/dt = 3; dJ/dx = (dJ/dt)(dt/dx), dt/dx = 2x.
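
As a concrete check, here is a short sketch (my own illustration, not lecture code) that computes the same forward and backward passes for J = cos(sin(x^2) + 3x^2) and compares the analytic gradient to a finite-difference estimate.

```python
# Hand-coded forward and backward pass for J = cos(sin(x^2) + 3x^2),
# following the intermediate variables on the slide (a sketch, not lecture code).
import math

def forward_backward(x):
    # Forward
    t  = x ** 2              # t  = x^2
    u1 = math.sin(t)         # u1 = sin(t)
    u2 = 3 * t               # u2 = 3t
    u  = u1 + u2             # u  = u1 + u2
    J  = math.cos(u)         # J  = cos(u)
    # Backward (chain rule, in reverse order)
    dJ_du  = -math.sin(u)
    dJ_du1 = dJ_du * 1.0     # du/du1 = 1
    dJ_du2 = dJ_du * 1.0     # du/du2 = 1
    dJ_dt  = dJ_du1 * math.cos(t) + dJ_du2 * 3.0   # du1/dt = cos(t), du2/dt = 3
    dJ_dx  = dJ_dt * 2 * x   # dt/dx = 2x
    return J, dJ_dx

# Quick check against a finite-difference estimate of dJ/dx
x, eps = 2.0, 1e-6
J, g = forward_backward(x)
numeric = (forward_backward(x + eps)[0] - forward_backward(x - eps)[0]) / (2 * eps)
print(J, g, numeric)   # the analytic and numeric gradients should agree closely
```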

Training: Backpropagation. Case 1: Logistic Regression. (Diagram: inputs x connected directly to the output through parameters θ_1, ..., θ_M.) Forward: J = y* log y + (1 - y*) log(1 - y); y = σ(a) = 1 / (1 + exp(-a)); a = Σ_{j=0}^{D} θ_j x_j. Backward: dJ/dy = y*/y + (1 - y*)/(y - 1); dJ/da = (dJ/dy)(dy/da), dy/da = exp(-a) / (exp(-a) + 1)^2; dJ/dθ_j = (dJ/da)(da/dθ_j), da/dθ_j = x_j; dJ/dx_j = (dJ/da)(da/dx_j), da/dx_j = θ_j.
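
A small NumPy sketch of the same forward/backward computation may help; the variable names mirror the slide's notation, but the code itself is an illustrative assumption, not the course's reference implementation.

```python
# Sketch of the logistic-regression forward/backward pass from the slide.
import numpy as np

def logistic_regression_backprop(x, theta, y_star):
    # Forward
    a = theta @ x                      # a = sum_j theta_j x_j
    y = 1.0 / (1.0 + np.exp(-a))       # y = sigma(a)
    J = y_star * np.log(y) + (1 - y_star) * np.log(1 - y)
    # Backward
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dy_da = np.exp(-a) / (np.exp(-a) + 1) ** 2     # equals y * (1 - y)
    dJ_da = dJ_dy * dy_da
    dJ_dtheta = dJ_da * x              # da/dtheta_j = x_j
    dJ_dx = dJ_da * theta              # da/dx_j     = theta_j
    return J, dJ_dtheta, dJ_dx

x = np.array([1.0, 0.5, -2.0])         # x_0 = 1 plays the role of the bias feature
theta = np.array([0.1, -0.3, 0.8])
print(logistic_regression_backprop(x, theta, y_star=1.0))
```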

Training: Backpropagation. (Diagram: Input, Hidden Layer, Output.) (E) Output (sigmoid): y = 1 / (1 + exp(-b)). (D) Output (linear): b = Σ_{j=0}^{D} β_j z_j. (C) Hidden (sigmoid): z_j = 1 / (1 + exp(-a_j)), for all j. (B) Hidden (linear): a_j = Σ_{i=0}^{M} α_ji x_i, for all j. (A) Input: given x_i, for all i.

Training: Backpropagation. (F) Loss: J = (1/2)(y - y*)^2. (E) Output (sigmoid): y = 1 / (1 + exp(-b)). (D) Output (linear): b = Σ_{j=0}^{D} β_j z_j. (C) Hidden (sigmoid): z_j = 1 / (1 + exp(-a_j)), for all j. (B) Hidden (linear): a_j = Σ_{i=0}^{M} α_ji x_i, for all j. (A) Input: given x_i, for all i.

Training: Backpropagation. Case 2: Neural Network. Forward: J = y* log y + (1 - y*) log(1 - y); y = 1 / (1 + exp(-b)); b = Σ_{j=0}^{D} β_j z_j; z_j = 1 / (1 + exp(-a_j)); a_j = Σ_{i=0}^{M} α_ji x_i. Backward: dJ/dy = y*/y + (1 - y*)/(y - 1); dJ/db = (dJ/dy)(dy/db), dy/db = exp(-b) / (exp(-b) + 1)^2; dJ/dβ_j = (dJ/db)(db/dβ_j), db/dβ_j = z_j; dJ/dz_j = (dJ/db)(db/dz_j), db/dz_j = β_j; dJ/da_j = (dJ/dz_j)(dz_j/da_j), dz_j/da_j = exp(-a_j) / (exp(-a_j) + 1)^2; dJ/dα_ji = (dJ/da_j)(da_j/dα_ji), da_j/dα_ji = x_i; dJ/dx_i = Σ_{j=0}^{D} (dJ/da_j)(da_j/dx_i), da_j/dx_i = α_ji.
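
The same pattern extends to the one-hidden-layer network. The sketch below (again an illustration rather than lecture code, with alpha and beta as the two weight matrices) follows the slide's forward and backward equations.

```python
# Sketch of the one-hidden-layer forward/backward pass from the slide (Case 2).
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def nn_backprop(x, alpha, beta, y_star):
    # Forward
    a = alpha @ x                      # a_j = sum_i alpha_ji x_i
    z = sigmoid(a)                     # z_j = sigma(a_j)
    b = beta @ z                       # b   = sum_j beta_j z_j
    y = sigmoid(b)                     # y   = sigma(b)
    J = y_star * np.log(y) + (1 - y_star) * np.log(1 - y)
    # Backward
    dJ_dy = y_star / y + (1 - y_star) / (y - 1)
    dJ_db = dJ_dy * y * (1 - y)        # dy/db = sigma'(b)
    dJ_dbeta = dJ_db * z               # db/dbeta_j = z_j
    dJ_dz = dJ_db * beta               # db/dz_j    = beta_j
    dJ_da = dJ_dz * z * (1 - z)        # dz_j/da_j  = sigma'(a_j)
    dJ_dalpha = np.outer(dJ_da, x)     # da_j/dalpha_ji = x_i
    dJ_dx = alpha.T @ dJ_da            # dJ/dx_i = sum_j (dJ/da_j) alpha_ji
    return J, dJ_dalpha, dJ_dbeta

x = np.array([1.0, 0.5, -2.0])
alpha = np.random.randn(4, 3) * 0.1    # 4 hidden units, 3 inputs (incl. bias feature)
beta = np.random.randn(4) * 0.1
print(nn_backprop(x, alpha, beta, y_star=1.0))
```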

Training: Backpropagation. Case 2: Neural Network (same forward and backward equations as the previous slide, with each step labeled by layer: Loss, Sigmoid, Linear, Sigmoid, Linear).

Training: Backpropagation. Whiteboard: SGD for Neural Network; Example: Backpropagation for Neural Network.

Training: Backpropagation. Backpropagation (automatic differentiation, reverse mode). Forward computation: 1. Write an algorithm for evaluating the function y = f(x). The algorithm defines a directed acyclic graph, where each variable is a node (i.e. the computation graph). 2. Visit each node in topological order: (a) compute the corresponding variable's value; (b) store the result at the node. Backward computation: 1. Initialize all partial derivatives dy/du_j to 0 and dy/dy = 1. 2. Visit each node in reverse topological order. For variable u_i = g_i(v_1, ..., v_N): (a) we already know dy/du_i; (b) increment dy/dv_j by (dy/du_i)(du_i/dv_j). (The choice of algorithm ensures that computing du_i/dv_j is easy.) Return partial derivatives dy/du_i for all variables.

Background: A Recipe for Machine Learning (Gradients). 1. Given training data. 2. Choose each of these: decision function, loss function. 3. Define goal. 4. Train with SGD (take small steps opposite the gradient). Backpropagation can compute this gradient! And it's a special case of a more general algorithm called reverse-mode automatic differentiation that can compute the gradient of any differentiable function efficiently!

Summary. 1. Neural Networks: provide a way of learning features; are highly nonlinear prediction functions; (can be) a highly parallel network of logistic regression classifiers; discover useful hidden representations of the input. 2. Backpropagation: provides an efficient way to compute gradients; is a special case of reverse-mode automatic differentiation.

DEEP LEARNING 18

Deep Learning Outline. Background: Computer Vision (Image Classification, ILSVRC 2010-2016, Traditional Feature Extraction Methods, Convolution as Feature Extraction). Convolutional Neural Networks (CNNs): Learning Feature Abstractions; Common CNN Layers (Convolutional Layer, Max-Pooling Layer, Fully-connected Layer with tensor input, Softmax Layer, ReLU Layer); Background: Subgradient; Architecture: LeNet; Architecture: AlexNet. Training a CNN: SGD for CNNs; Backpropagation for CNNs.

Motivation: Why is everyone talking about Deep Learning? Because a lot of money is invested in it. DeepMind: acquired by Google for $400 million. DNNResearch: three-person startup (including Geoff Hinton) acquired by Google for an unknown price tag. Enlitic, Ersatz, MetaMind, Nervana, Skylab: Deep Learning startups commanding millions of VC dollars. And because it made the front page of the New York Times.

Motivation: Why is everyone talking about Deep Learning? (Timeline: 1960s, 1980s, 1990s, 2006, 2016.) Deep learning has won numerous pattern recognition competitions, and does so with minimal feature engineering. This wasn't always the case! Since the 1980s the form of the models hasn't changed much, but there are lots of new tricks: more hidden units, better (online) optimization, new nonlinear functions (ReLUs), and faster computers (CPUs and GPUs).

BACKGROUND: COMPUTER VISION 22

Example: Image Classification. ImageNet LSVRC-2011 contest. Dataset: 1.2 million labeled images, 1000 classes. Task: given a new image, label it with the correct class (a multiclass classification problem). Examples from http://image-net.org/


Example: Image Classification. Traditional feature extraction for images: SIFT, HOG.

Example: Image Classification. CNN for Image Classification (Krizhevsky, Sutskever & Hinton, 2012): 15.3% error on the ImageNet LSVRC-2012 contest. Input image (pixels); five convolutional layers (with max-pooling); three fully connected layers; 1000-way softmax.

CNNs for Image Recognition Slide from Kaiming He 29

CONVOLUTION 30

What's a convolution? Basic idea: pick a 3x3 matrix F of weights; slide it over an image and compute the inner product (similarity) of F and the corresponding field of the image, and replace the pixel in the center of the field with the output of the inner product operation. Key point: different convolutions extract different types of low-level features from an image; all that we need to vary to generate these different features is the weights of F. Slide adapted from William Cohen

Background: Image Processing. A convolution matrix is used in image processing for tasks such as edge detection, blurring, sharpening, etc.
Input Image:
0 0 0 0 0 0 0
0 1 1 1 1 1 0
0 1 0 0 1 0 0
0 1 0 1 0 0 0
0 1 1 0 0 0 0
0 1 0 0 0 0 0
0 0 0 0 0 0 0
Convolution:
0 0 0
0 1 1
0 1 0
Convolved Image: (computed on the next slide)

Background: Image Processing. A convolution matrix is used in image processing for tasks such as edge detection, blurring, sharpening, etc.
Input Image:
0 0 0 0 0 0 0
0 1 1 1 1 1 0
0 1 0 0 1 0 0
0 1 0 1 0 0 0
0 1 1 0 0 0 0
0 1 0 0 0 0 0
0 0 0 0 0 0 0
Convolution:
0 0 0
0 1 1
0 1 0
Convolved Image:
3 2 2 3 1
2 0 2 1 0
2 2 1 0 0
3 1 0 0 0
1 0 0 0 0
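
For concreteness, here is a short sketch (not from the slides) that reproduces the convolved image above by sliding the 3x3 kernel over the 7x7 input; like the slides, it computes a sliding inner product (i.e., cross-correlation) with no padding and stride 1.

```python
# Sketch of the "slide the kernel over the image" computation from these slides.
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no flipping, no padding, stride 1)."""
    H, W = image.shape
    k, _ = kernel.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+k, j:j+k] * kernel)   # inner product
    return out

image = np.array([[0,0,0,0,0,0,0],
                  [0,1,1,1,1,1,0],
                  [0,1,0,0,1,0,0],
                  [0,1,0,1,0,0,0],
                  [0,1,1,0,0,0,0],
                  [0,1,0,0,0,0,0],
                  [0,0,0,0,0,0,0]])
kernel = np.array([[0,0,0],
                   [0,1,1],
                   [0,1,0]])
print(convolve2d_valid(image, kernel))   # first row should be 3 2 2 3 1
```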

(The next several slides animate this computation, sliding the 3x3 window over the input and filling in the convolved image one entry at a time, 3, 2, 2, 3, 1, 2, 0, ..., until the full 5x5 result above is produced.)

Background: Image Processing. A convolution matrix is used in image processing for tasks such as edge detection, blurring, sharpening, etc.
Input Image: (same 7x7 image as above)
Identity Convolution:
0 0 0
0 1 0
0 0 0
Convolved Image:
1 1 1 1 1
1 0 0 1 0
1 0 1 0 0
1 1 0 0 0
1 0 0 0 0

Background: Image Processing. A convolution matrix is used in image processing for tasks such as edge detection, blurring, sharpening, etc.
Input Image: (same 7x7 image as above)
Blurring Convolution:
.1 .1 .1
.1 .2 .1
.1 .1 .1
Convolved Image:
.4 .5 .5 .5 .4
.4 .2 .3 .6 .3
.5 .4 .4 .2 .1
.5 .6 .2 .1 0
.4 .3 .1 0 0

What's a convolution? http://matlabtricks.com/post-5/3x3-convolution-kernels-with-online-demo Slide from William Cohen


What's a convolution? Basic idea: pick a 3x3 matrix F of weights; slide it over an image and compute the inner product (similarity) of F and the corresponding field of the image, and replace the pixel in the center of the field with the output of the inner product operation. Key point: different convolutions extract different types of low-level features from an image; all that we need to vary to generate these different features is the weights of F. Slide adapted from William Cohen

Downsampling. Suppose we use a convolution with stride 2: only 9 patches are visited in the input, so there are only 9 pixels in the output.
Input Image:
1 1 1 1 1 0
1 0 0 1 0 0
1 0 1 0 0 0
1 1 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 0
Convolution:
1 1
1 1

(The next several slides step through this stride-2 convolution patch by patch, filling in the 3x3 convolved image: 3, 3, 1, 3, 1, 0, 1, 0, 0.)

Downsampling. Suppose we use a convolution with stride 2: only 9 patches are visited in the input, so there are only 9 pixels in the output.
Input Image: (same 6x6 image as above)
Convolution:
1 1
1 1
Convolved Image:
3 3 1
3 1 0
1 0 0
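
A small sketch of the strided computation (an illustration, not lecture code): with a 2x2 all-ones kernel and stride 2, only 9 patches are visited, and the output matches the 3x3 convolved image above.

```python
# Sketch of a strided convolution: a 2x2 all-ones kernel over the 6x6 input.
import numpy as np

def convolve2d_strided(image, kernel, stride):
    H, W = image.shape
    k, _ = kernel.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.array([[1,1,1,1,1,0],
                  [1,0,0,1,0,0],
                  [1,0,1,0,0,0],
                  [1,1,0,0,0,0],
                  [1,0,0,0,0,0],
                  [0,0,0,0,0,0]])
print(convolve2d_strided(image, np.ones((2, 2)), stride=2))
# [[3. 3. 1.]
#  [3. 1. 0.]
#  [1. 0. 0.]]
```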

CONVOLUTIONAL NEURAL NETS 64

Deep Learning Outline. Background: Computer Vision (Image Classification, ILSVRC 2010-2016, Traditional Feature Extraction Methods, Convolution as Feature Extraction). Convolutional Neural Networks (CNNs): Learning Feature Abstractions; Common CNN Layers (Convolutional Layer, Max-Pooling Layer, Fully-connected Layer with tensor input, Softmax Layer, ReLU Layer); Background: Subgradient; Architecture: LeNet; Architecture: AlexNet. Training a CNN: SGD for CNNs; Backpropagation for CNNs.

Convolutional Neural Network (CNN). Typical layers include: convolutional layer, max-pooling layer, fully-connected (linear) layer, ReLU layer (or some other nonlinear activation function), softmax. These can be arranged into arbitrarily deep topologies. Architecture #1: LeNet-5

Convolutional Layer. CNN key idea: treat the convolution matrix as parameters and learn them!
Input Image: (same 7x7 image as above)
Learned Convolution:
θ11 θ12 θ13
θ21 θ22 θ23
θ31 θ32 θ33
Convolved Image:
.4 .5 .5 .5 .4
.4 .2 .3 .6 .3
.5 .4 .4 .2 .1
.5 .6 .2 .1 0
.4 .3 .1 0 0

Downsampling by Averaging. Downsampling by averaging used to be a common approach; it is a special case of convolution in which the weights are fixed to a uniform distribution. The example below uses a stride of 2.
Input Image: (same 6x6 image as above)
Convolution:
1/4 1/4
1/4 1/4
Convolved Image:
3/4 3/4 1/4
3/4 1/4 0
1/4 0 0

Max-Pooling. Max-pooling is another (common) form of downsampling: instead of averaging, we take the max value within the same range as the equivalently-sized convolution. The example below uses a stride of 2.
Input Image: (same 6x6 image as above)
Max-pooling: y = max(x_{i,j}, x_{i+1,j}, x_{i,j+1}, x_{i+1,j+1})
Max-Pooled Image:
1 1 1
1 1 0
1 0 0
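
The pooling variants differ only in the reduction applied to each patch. The sketch below (illustrative, not lecture code) implements 2x2 pooling with stride 2; np.max reproduces the max-pooled image above, and swapping in np.mean reproduces the averaged result from the previous slide.

```python
# Sketch of 2x2 pooling with stride 2 on the same 6x6 input.
import numpy as np

def pool2d(image, size, stride, reduce_fn=np.max):
    H, W = image.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = reduce_fn(patch)   # max for max-pooling, mean for averaging
    return out

image = np.array([[1,1,1,1,1,0],
                  [1,0,0,1,0,0],
                  [1,0,1,0,0,0],
                  [1,1,0,0,0,0],
                  [1,0,0,0,0,0],
                  [0,0,0,0,0,0]])
print(pool2d(image, size=2, stride=2, reduce_fn=np.max))   # 1 1 1 / 1 1 0 / 1 0 0
print(pool2d(image, size=2, stride=2, reduce_fn=np.mean))  # 3/4 3/4 1/4 / 3/4 1/4 0 / 1/4 0 0
```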

Multi-Class Output. (Diagram: input layer, hidden layer, and a multi-class output layer.)

Multi-Class Output. Softmax Layer: y_k = exp(b_k) / Σ_{l=1}^{K} exp(b_l). (F) Loss: J = Σ_{k=1}^{K} y*_k log(y_k). (E) Output (softmax): y_k = exp(b_k) / Σ_{l=1}^{K} exp(b_l). (D) Output (linear): b_k = Σ_{j=0}^{D} β_kj z_j, for all k. (C) Hidden (nonlinear): z_j = σ(a_j), for all j. (B) Hidden (linear): a_j = Σ_{i=0}^{M} α_ji x_i, for all j. (A) Input: given x_i, for all i.
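
A brief sketch of this multi-class forward pass (illustrative only; it assumes a one-hot y* and small random weights, neither of which comes from the slide) follows the layer equations (A)-(F) above.

```python
# Sketch of the multi-class (softmax) output layer from the slide.
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def softmax(b):
    e = np.exp(b - np.max(b))          # subtract max for numerical stability
    return e / np.sum(e)               # y_k = exp(b_k) / sum_l exp(b_l)

def forward_multiclass(x, alpha, beta, y_star):
    a = alpha @ x                      # (B) hidden linear
    z = sigmoid(a)                     # (C) hidden nonlinearity
    b = beta @ z                       # (D) one linear output b_k per class
    y = softmax(b)                     # (E) softmax output
    J = np.sum(y_star * np.log(y))     # (F) objective from the slide: sum_k y*_k log y_k
    return y, J

x = np.array([1.0, 0.5, -2.0])
alpha = np.random.randn(4, 3) * 0.1    # 4 hidden units, 3 inputs (incl. bias feature)
beta = np.random.randn(3, 4) * 0.1     # K = 3 classes
y_star = np.array([0.0, 1.0, 0.0])     # one-hot label
print(forward_multiclass(x, alpha, beta, y_star))
```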

Training a CNN (Whiteboard): SGD for CNNs; Backpropagation for CNNs.

Common CNN Layers (Whiteboard): ReLU Layer; Background: Subgradient; Fully-connected Layer (w/ tensor input); Softmax Layer; Convolutional Layer; Max-Pooling Layer.

Convolutional Layer 75

Convolutional Layer 76

Max-Pooling Layer 77

Max-Pooling Layer 78

Convolutional Neural Network (CNN). Typical layers include: convolutional layer, max-pooling layer, fully-connected (linear) layer, ReLU layer (or some other nonlinear activation function), softmax. These can be arranged into arbitrarily deep topologies. Architecture #1: LeNet-5

Architecture #2: AlexNet. CNN for Image Classification (Krizhevsky, Sutskever & Hinton, 2012): 15.3% error on the ImageNet LSVRC-2012 contest. Input image (pixels); five convolutional layers (with max-pooling); three fully connected layers; 1000-way softmax.

CNNs for Image Recognition Slide from Kaiming He 81

CNN VISUALIZATIONS 83

3D Visualization of CNN http://scs.ryerson.ca/~aharley/vis/conv/

Convolution of a Color Image. Color images consist of 3 floats per pixel for the RGB (red, green, blue) color values, so the convolution must also be 3-dimensional. (Figure: a 32x32x3 image convolved with a 5x5x3 filter, slid over all spatial locations, produces a 28x28x1 activation map.) Figure from Fei-Fei Li & Andrej Karpathy & Justin Johnson (CS231N)
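
The spatial arithmetic behind the figure is the usual output-size calculation; a one-line sketch (my own, assuming stride 1 and no padding):

```python
# Output spatial size of a convolution: (in_size - filter_size + 2*padding) / stride + 1.
def conv_output_size(in_size, filter_size, stride=1, padding=0):
    return (in_size - filter_size + 2 * padding) // stride + 1

print(conv_output_size(32, 5))   # 28: each 5x5x3 filter yields a 28x28 activation map
```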

Animation of 3D Convolution: http://cs231n.github.io/convolutional-networks/ Figure from Fei-Fei Li & Andrej Karpathy & Justin Johnson (CS231N) 86

MNIST Digit Recognition with CNNs (in your browser) https://cs.stanford.edu/people/karpathy/convnetjs/demo/mnist.html Figure from Andrej Karpathy 87

CNN Summary. CNNs: are used for all aspects of computer vision, and have won numerous pattern recognition competitions; are able to learn interpretable features at different levels of abstraction; typically consist of convolution layers, pooling layers, nonlinearities, and fully connected layers. Other resources: readings on the course website; Andrej Karpathy, CS231n notes, http://cs231n.github.io/convolutional-networks/