Deep Learning. Convolutional Neural Networks (CNNs). Ali Ghodsi. October 30, 2015. Slides are partially based on the book in preparation, Deep Learning.

Convolutional Neural Networks (CNNs). Ali Ghodsi, University of Waterloo. October 30, 2015. Slides are partially based on the book in preparation Deep Learning, by Yoshua Bengio, Ian Goodfellow, and Aaron Courville, 2015.

Convolutional Networks Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.

Convolution This operation is called convolution: $s(t) = \int x(a)\, w(t - a)\, da$. The convolution operation is typically denoted with an asterisk: $s(t) = (x * w)(t)$.

Discrete convolution If we now assume that $x$ and $w$ are defined only on integer $t$, we can define the discrete convolution: $s[t] = (x * w)[t] = \sum_{a=-\infty}^{\infty} x[a]\, w[t - a]$
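
As a quick illustration (not from the slides), the finite discrete convolution can be evaluated with NumPy; the signal x and kernel w below are made up, and both are treated as zero outside the stored values, so the infinite sum reduces to a finite one.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])  # toy signal
w = np.array([0.25, 0.5, 0.25])                     # small smoothing kernel

# s[t] = sum_a x[a] * w[t - a]
s = np.convolve(x, w, mode="full")
print(s)
```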

In practice we often use convolutions over more than one axis at a time: $s[i, j] = (I * K)[i, j] = \sum_m \sum_n I[m, n]\, K[i - m, j - n]$. The input is usually a multidimensional array of data. The kernel is usually a multidimensional array of parameters that should be learned. We assume that these functions are zero everywhere but the finite set of points for which we store the values, so we can implement the infinite summation as a summation over a finite number of array elements.
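
A minimal sketch of the 2-D formula above, assuming a single-channel image and a small kernel (both arrays below are toy examples); the "valid" output keeps only the positions where the kernel fits entirely inside the image.

```python
import numpy as np

def conv2d_valid(I, K):
    # Flip the kernel in both axes, then slide it over the image; this equals
    # s[i, j] = sum_{m,n} I[m, n] K[i - m, j - n] restricted to positions
    # where the kernel lies entirely inside I.
    Kf = K[::-1, ::-1]
    kh, kw = Kf.shape
    H, W = I.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(I[i:i + kh, j:j + kw] * Kf)
    return out

I = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
K = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 kernel
print(conv2d_valid(I, K))
```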

convolution and cross-correlation Convolution is commutative: $s[i, j] = (K * I)[i, j] = \sum_m \sum_n I[i - m, j - n]\, K[m, n]$. Cross-correlation: $s[i, j] = (I \star K)[i, j] = \sum_m \sum_n I[i + m, j + n]\, K[m, n]$. Many machine learning libraries implement cross-correlation but call it convolution. https://www.youtube.com/watch?v=ma0yonjmzli (Fig 9.1: Discrete convolution can be viewed as multiplication by a matrix.)
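
The relationship can be checked numerically; the sketch below assumes SciPy is available and uses random toy arrays: flipping the kernel turns cross-correlation into true convolution.

```python
import numpy as np
from scipy.signal import convolve2d, correlate2d

I = np.random.default_rng(0).standard_normal((5, 5))
K = np.random.default_rng(1).standard_normal((3, 3))

conv = convolve2d(I, K, mode="valid")      # true convolution (kernel flipped)
corr = correlate2d(I, K, mode="valid")     # cross-correlation (no flip)
corr_flipped = correlate2d(I, K[::-1, ::-1], mode="valid")

# Cross-correlation with a flipped kernel equals convolution:
print(np.allclose(conv, corr_flipped))     # True
```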

Convolutions (sequence of figures illustrating the convolution operation). Credit: UFLDL Wiki

Sparse interactions In a feedforward neural network, every output unit interacts with every input unit. Convolutional networks typically have sparse connectivity (sparse weights). This is accomplished by making the kernel smaller than the input.

Sparse interactions When we have $m$ inputs and $n$ outputs, matrix multiplication requires $m \times n$ parameters, and the algorithms used in practice have $O(m \times n)$ runtime (per example). If we limit the number of connections each output may have to $k$, then we need only $k \times n$ parameters and $O(k \times n)$ runtime.

Parameter sharing In a traditional neural net, each element of the weight matrix is multiplied by one element of the input, i.e. it is used once when computing the output of a layer. In CNNs, each member of the kernel is used at every position of the input. Instead of learning a separate set of parameters for every location, we learn only one set.
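
A rough, illustrative count (sizes chosen arbitrarily) of how sparse connectivity and parameter sharing shrink the number of weights:

```python
# Parameter counts for mapping a 256 x 256 single-channel input to an output
# of the same spatial size (sizes here are hypothetical, for illustration).
m = 256 * 256            # number of inputs
n = 256 * 256            # number of outputs

dense_params = m * n     # fully connected: every output sees every input
print(dense_params)      # 4294967296 (~4.3 billion weights)

k = 3 * 3                # each output connects to only a 3 x 3 neighbourhood
sparse_params = k * n    # sparse connectivity alone: k weights per output
print(sparse_params)     # 589824

shared_params = k        # parameter sharing: the same kernel reused everywhere
print(shared_params)     # 9
```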

Equivariance A function $f(x)$ is equivariant to a function $g$ if $f(g(x)) = g(f(x))$. Figure from Wikipedia

Equivariance A convolutional layer has equivariance to translation. For example, $g(x)[i] = x[i - 1]$. If we apply this transformation to $x$ and then apply convolution, the result will be the same as if we applied convolution to $x$ and then applied the transformation to the output.
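
A small numerical check of this property, using a made-up 1-D signal and kernel and NumPy's "same"-size convolution; away from the boundaries (where zero padding interferes) the two orders of operations agree.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)   # a toy 1-D signal
w = rng.standard_normal(3)    # a toy kernel

def shift_right(v):
    # g(x)[i] = x[i - 1], with zero padding at the left boundary
    return np.concatenate(([0.0], v[:-1]))

conv_then_shift = shift_right(np.convolve(x, w, mode="same"))
shift_then_conv = np.convolve(shift_right(x), w, mode="same")

# Away from the boundaries affected by the padding, the two orders agree:
print(np.allclose(conv_then_shift[1:-1], shift_then_conv[1:-1]))  # True
```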

Equivariance For images, convolution creates a 2-D map of where certain features appear in the input. Note that convolution is not equivariant to some other transformations, such as changes in the scale or rotation of an image.

Convolutional Networks The first stage (Convolution): the layer performs several convolutions in parallel to produce a set of preactivations. The second stage (Detector): each preactivation is run through a nonlinear activation function (e.g. rectified linear). The third stage (Pooling): a pooling function replaces the output at each location with a summary statistic of the nearby outputs.
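
A minimal sketch of one such stage in NumPy, assuming a single-channel 2-D input and using max pooling for the third stage; the function and array names are illustrative, and the convolution is implemented as cross-correlation (what most libraries call convolution).

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def conv_layer(image, kernels):
    # One CNN stage: several valid cross-correlations in parallel (stage 1),
    # a ReLU nonlinearity (stage 2), and 2x2 max pooling with stride 2 (stage 3).
    maps = []
    for K in kernels:
        kh, kw = K.shape
        H, W = image.shape
        pre = np.zeros((H - kh + 1, W - kw + 1))
        for i in range(pre.shape[0]):
            for j in range(pre.shape[1]):
                pre[i, j] = np.sum(image[i:i + kh, j:j + kw] * K)
        act = relu(pre)
        h2, w2 = act.shape[0] // 2, act.shape[1] // 2
        pooled = act[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2).max(axis=(1, 3))
        maps.append(pooled)
    return np.stack(maps)

rng = np.random.default_rng(0)
out = conv_layer(rng.standard_normal((8, 8)),
                 [rng.standard_normal((3, 3)) for _ in range(4)])
print(out.shape)  # (4, 3, 3): four pooled feature maps
```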

Popular pooling functions: the maximum of a rectangular neighborhood (max pooling); the average of a rectangular neighborhood; the L2 norm of a rectangular neighborhood; a weighted average based on the distance from the central pixel.

Pooling with downsampling Max-pooling with a pool width of 3 and a stride between pools of 2. This reduces the representation size by a factor of 2, which reduces the computational and statistical burden on the next layer.
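
A tiny sketch of this pooling step on a made-up row of detector outputs (pool width 3, stride 2):

```python
import numpy as np

def max_pool_1d(x, width=3, stride=2):
    # Max over each window of `width` neighbours, moving `stride` steps at a time.
    return np.array([x[i:i + width].max()
                     for i in range(0, len(x) - width + 1, stride)])

x = np.array([0.1, 1.0, 0.2, 0.1, 0.0, 0.3, 0.9, 0.4])
print(max_pool_1d(x))   # [1.  0.2 0.9] -- roughly half as many outputs as inputs
```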

Pooling and translations Pooling helps to make the representation approximately invariant to small translations of the input. In the figure's right panel, the input has been shifted to the right by 1 pixel. Every value in the bottom row has changed, but only half of the values in the top row have changed.

Pooling and translations Invariance to local translation can be a very useful property if we care more about whether some feature is present than exactly where it is. For example: In a face, we need not know the exact location of the eyes.
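
A small made-up example in the spirit of the figure described above: after a one-pixel shift every detector value changes, but most of the max-pooled values do not.

```python
import numpy as np

detector = np.array([0.1, 1.0, 0.2, 0.1, 0.0])          # toy detector outputs
shifted  = np.concatenate(([0.3], detector[:-1]))        # after a 1-pixel shift

def pool(v, width=3):
    # Max over each window of 3 neighbours, stride 1 (no downsampling)
    return np.array([v[i:i + width].max() for i in range(len(v) - width + 1)])

print(pool(detector))  # [1.  1.  0.2]
print(pool(shifted))   # [1.  1.  1. ]  -> only one pooled value changed
```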

Pooling: inputs of varying size Example: we want to classify images of variable size. The input to the classification layer must have a fixed size. In the final pooling layer we can output (for example) four sets of summary statistics, one for each quadrant of the image, regardless of the image size.
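
A sketch of this idea, assuming max pooling over the four quadrants of a single feature map; the input sizes below are arbitrary, and the classifier always receives a vector of length 4.

```python
import numpy as np

def quadrant_max_pool(feature_map):
    # Pool a feature map of any spatial size down to four summary statistics,
    # one max per quadrant, so the classifier always sees a fixed-size input.
    H, W = feature_map.shape
    h, w = H // 2, W // 2
    quads = [feature_map[:h, :w], feature_map[:h, w:],
             feature_map[h:, :w], feature_map[h:, w:]]
    return np.array([q.max() for q in quads])

rng = np.random.default_rng(0)
print(quadrant_max_pool(rng.standard_normal((17, 23))).shape)  # (4,)
print(quadrant_max_pool(rng.standard_normal((64, 48))).shape)  # (4,)
```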

Pooling: inputs of varying size It is also possible to dynamically pool features together, for example by running a clustering algorithm on the locations of interesting features (Boureau et al., 2011), i.e. a different set of pooling regions for each image. Another approach is to learn a single pooling structure that is then applied to all images (Jia et al., 2012).

Convolution and Pooling as an Infinitely Strong Prior A weak prior is a prior distribution with high entropy, such as a Gaussian distribution with high variance. A strong prior has very low entropy, such as a Gaussian distribution with low variance. An infinitely strong prior places zero probability on some parameters and says these parameter values are completely forbidden. We can think of a convolutional net as similar to a fully connected net with an infinitely strong prior over its weights.

Convolution and Pooling as an Infinitely Strong Prior This prior says that the weights for one hidden unit must be identical to the weights of its neighbor, but shifted in space, and that the weights must be zero except in the small, spatially contiguous receptive field assigned to that hidden unit. The use of convolution thus imposes an infinitely strong prior probability distribution over the parameters of a layer: the function the layer should learn contains only local interactions and is equivariant to translation.

Convolution and Pooling as an Infinitely Strong Prior Similarly, the use of pooling is an infinitely strong prior that each unit should be invariant to small translations. If these assumptions are not accurate, convolution and pooling can cause underfitting.

Practical issues The input is usually not just a grid of real values. It is a grid of vector-valued observations. For example, a color image has a red, green, and blue intensity at each pixel.

Practical issues When working with images, we usually think of the input and output of the convolution as 3-D tensors: one index into the different channels and two indices into the spatial coordinates of each channel. Software implementations usually work in batch mode, so they will actually use 4-D tensors, with the fourth axis indexing different examples in the batch.
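
For example (the axis ordering below is just one common library convention, often called NCHW, with the batch index placed on the first axis):

```python
import numpy as np

# A mini-batch of 32 RGB images of size 64 x 64, stored as a 4-D tensor.
# One axis indexes examples in the batch (here axis 0), one the channels
# (red, green, blue), and two the spatial coordinates within each channel.
batch = np.zeros((32, 3, 64, 64))
print(batch.shape)     # (32, 3, 64, 64)
print(batch[0].shape)  # one example: a 3-D tensor of shape (3, 64, 64)
```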

Training Suppose we want to train a convolutional network that incorporates convolution of a kernel stack $K$ applied to a multi-channel image $V$ with stride $s$: $c(K, V, s)$. Suppose we want to minimize some loss function $J(V, K)$. During forward propagation, we use $c$ itself to produce the output $Z$; $Z$ is then propagated through the rest of the network and used to compute $J$.

Training During backpropagation, we will receive a tensor $G$ such that $G_{i,j,k} = \frac{\partial}{\partial Z_{i,j,k}} J(V, K)$. To train the network, we need to compute the derivatives with respect to the weights in the kernel. To do so, we can use a function $g(G, V, s)_{i,j,k,l} = \frac{\partial}{\partial K_{i,j,k,l}} J(V, K) = \sum_{m,n} G_{i,m,n}\, V_{j,\, m \cdot s + k,\, n \cdot s + l}$. If this layer is not the bottom layer of the network, we'll need to compute the gradient with respect to $V$ in order to backpropagate the error farther down. To do so, we can use a function $h(K, G, s)_{i,j,k} = \frac{\partial}{\partial V_{i,j,k}} J(V, K) = \sum_{l,m \,:\, l \cdot s + m = j}\; \sum_{n,p \,:\, n \cdot s + p = k}\; \sum_q K_{q,i,m,p}\, G_{q,l,n}$.
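
The sketch below implements the strided convolution and the kernel-gradient function $g$ directly from the indexing above (0-indexed), and checks one entry against a finite difference for the toy loss $J = \sum Z \odot G$; all shapes and values are made up.

```python
import numpy as np

def conv_forward(K, V, s):
    # Z[i, m, n] = sum_{j,k,l} V[j, m*s + k, n*s + l] * K[i, j, k, l]
    out_c, in_c, kh, kw = K.shape
    _, H, W = V.shape
    oh, ow = (H - kh) // s + 1, (W - kw) // s + 1
    Z = np.zeros((out_c, oh, ow))
    for i in range(out_c):
        for m in range(oh):
            for n in range(ow):
                Z[i, m, n] = np.sum(V[:, m*s:m*s+kh, n*s:n*s+kw] * K[i])
    return Z

def grad_kernel(G, V, s, kh, kw):
    # g(G, V, s)[i, j, k, l] = sum_{m,n} G[i, m, n] * V[j, m*s + k, n*s + l]
    out_c, oh, ow = G.shape
    in_c = V.shape[0]
    dK = np.zeros((out_c, in_c, kh, kw))
    for i in range(out_c):
        for j in range(in_c):
            for k in range(kh):
                for l in range(kw):
                    dK[i, j, k, l] = np.sum(
                        G[i] * V[j, k:k + oh*s:s, l:l + ow*s:s])
    return dK

rng = np.random.default_rng(0)
V = rng.standard_normal((2, 7, 7))      # 2-channel 7x7 input
K = rng.standard_normal((3, 2, 3, 3))   # 3 output kernels of size 3x3
s = 2
Z = conv_forward(K, V, s)
G = rng.standard_normal(Z.shape)        # stands in for dJ/dZ from layers above

dK = grad_kernel(G, V, s, 3, 3)

# Finite-difference check of one kernel entry for the toy loss J = sum(Z * G):
eps = 1e-6
Kp = K.copy(); Kp[1, 0, 2, 1] += eps
numeric = (np.sum(conv_forward(Kp, V, s) * G) - np.sum(Z * G)) / eps
print(np.isclose(numeric, dK[1, 0, 2, 1]))  # True
```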

Random or Unsupervised Features The most expensive part of convolutional network training is learning the features. When performing supervised training with gradient descent, every gradient step requires a complete run of forward propagation and backward propagation through the entire network. One way to reduce this cost is to use features that are not trained in a supervised fashion.

Random or Unsupervised Features There are two basic strategies for obtaining convolution kernels without supervised training. One is to simply initialize them randomly; the other is to learn them with an unsupervised criterion.

Random or Unsupervised Features Random filters often work surprisingly well in convolutional networks: layers consisting of convolution followed by pooling naturally become frequency selective and translation invariant when assigned random weights. This provides an inexpensive way to choose the architecture of a convolutional network: first evaluate the performance of several convolutional network architectures by training only the last layer, then take the best of these architectures and train the entire architecture using a more expensive approach.
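
A hypothetical sketch of this strategy: extract features with random, untrained filters (convolution, ReLU, global max pooling) and train only a linear classifier on top. It assumes scikit-learn is available; the data and labels below are synthetic, purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression   # assumed available

rng = np.random.default_rng(0)
kernels = rng.standard_normal((8, 5, 5))   # 8 random, untrained 5x5 filters

def random_features(img):
    # Random convolution (cross-correlation) + ReLU + global max pooling.
    feats = []
    for K in kernels:
        kh, kw = K.shape
        H, W = img.shape
        pre = np.array([[np.sum(img[i:i + kh, j:j + kw] * K)
                         for j in range(W - kw + 1)]
                        for i in range(H - kh + 1)])
        feats.append(np.maximum(pre, 0.0).max())
    return np.array(feats)

# Toy, made-up data: 40 "images" of size 12x12 with random binary labels.
X_imgs = rng.standard_normal((40, 12, 12))
y = rng.integers(0, 2, size=40)
X = np.stack([random_features(im) for im in X_imgs])

clf = LogisticRegression().fit(X, y)   # only this last, linear layer is trained
print(clf.score(X, y))
```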