
ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton

Motivation. Classification goals: make 1 guess about the label (Top-1 error), or make 5 guesses about the label (Top-5 error). No bounding box.

Database. ImageNet: 15M images, 22K categories, images collected from the Web, RGB images of variable resolution, labeled by human labelers (Amazon's Mechanical Turk crowd-sourcing). ImageNet Large Scale Visual Recognition Challenge (ILSVRC-2010): 1K categories, 1.2M training images (~1000 per category), 50,000 validation images, 150,000 testing images.

Strategy: Deep Learning. Shallow vs. deep architectures: learn a feature hierarchy all the way from pixels to classifier. reference : http://web.engr.illinois.edu/~slazebni/spring14/lec24_cnn.pdf

Neuron - Perceptron. Inputs (raw pixels) x_1, ..., x_d with weights w_1, ..., w_d; output f(w*x + b), where f is a nonlinearity such as the sigmoid. reference : http://en.wikipedia.org/wiki/sigmoid_function#mediaviewer/file:gjl-t(x).svg
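
A minimal NumPy sketch of this computation (the sigmoid choice of f and all numbers here are illustrative assumptions, not taken from the slide):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b):
    # perceptron-style unit: nonlinearity f applied to w*x + b
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.2, 0.5, 0.1])    # raw pixel inputs x_1 ... x_d
w = np.array([0.4, -0.3, 0.8])   # weights w_1 ... w_d
b = 0.1                          # bias
print(neuron(x, w, b))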

Multi-Layer Neural Networks. Input layer, hidden layer, output layer. A nonlinear classifier; learning can be done by gradient descent via the Back-Propagation algorithm.

Feed Forward Operation. Input layer: d features x^{(1)}, ..., x^{(d)} plus a bias unit. Hidden layer: connected to the inputs by weights w_{ji}. Output layer: m outputs z_1, ..., z_m, one for each class, connected to the hidden units by weights v_{kj}.

Notation for Weights. Use w_{ji} to denote the weight between input unit i and hidden unit j: input unit i sends x^{(i)}, hidden unit j receives w_{ji} x^{(i)} and produces y_j. Use v_{kj} to denote the weight between hidden unit j and output unit k: hidden unit j sends y_j, output unit k receives v_{kj} y_j and produces z_k.

Notation for Activation. Use net_j to denote the activation at hidden unit j:
net_j = \sum_{i=1}^{d} x^{(i)} w_{ji} + w_{j0},
from which hidden unit j produces its output y_j. Use net^*_k to denote the activation at output unit k:
net^*_k = \sum_{j=1}^{N_H} y_j v_{kj} + v_{k0},
from which output unit k produces its output z_k.
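
A compact NumPy sketch of these two activations and the resulting outputs (the layer sizes, the sigmoid choice of f, and the random initialization are illustrative assumptions):

import numpy as np

def f(a):
    # nonlinearity used at hidden and output units (sigmoid assumed here)
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W, w0, V, v0):
    net_hidden = W @ x + w0    # net_j = sum_i x^(i) w_ji + w_j0
    y = f(net_hidden)          # hidden outputs y_j
    net_out = V @ y + v0       # net*_k = sum_j y_j v_kj + v_k0
    z = f(net_out)             # network outputs z_k
    return y, z

d, n_hidden, m = 4, 3, 2       # input, hidden, and output sizes
rng = np.random.default_rng(0)
W, w0 = rng.normal(size=(n_hidden, d)), np.zeros(n_hidden)
V, v0 = rng.normal(size=(m, n_hidden)), np.zeros(m)
y, z = forward(rng.normal(size=d), W, w0, V, v0)
print(z)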

Network Training. 1. Initialize the weights w_{ji} and v_{kj} randomly, but not to 0. 2. Iterate until a stopping criterion is reached: choose an input sample x^p; pass it through the MNN with weights w_{ji} and v_{kj} to get the output z = (z_1, ..., z_m); compare the output z with the desired target t and adjust w_{ji} and v_{kj} to move closer to the target t (by backpropagation).

BackPropagation. Learn w_{ji} and v_{kj} by minimizing the training error. What is the training error? Suppose the output of the MNN for sample x is z and the target (desired output for x) is t.
Error on one sample: J(w, v) = \frac{1}{2} \sum_{c=1}^{m} (t_c - z_c)^2.
Training error: J(w, v) = \frac{1}{2} \sum_{i=1}^{n} \sum_{c=1}^{m} \big( t_c^{(i)} - z_c^{(i)} \big)^2.
Use gradient descent: initialize (v^{(0)}, w^{(0)}) randomly, then repeat until convergence:
w^{(t+1)} = w^{(t)} - \eta \nabla_w J(w^{(t)}, v^{(t)}), \quad v^{(t+1)} = v^{(t)} - \eta \nabla_v J(w^{(t)}, v^{(t)}).

BackPropagation: Layered Model.
Activation at hidden unit j: net_j = \sum_{i=1}^{d} x^{(i)} w_{ji} + w_{j0}.
Output at hidden unit j: y_j = f(net_j).
Activation at output unit k: net^*_k = \sum_{j=1}^{N_H} y_j v_{kj} + v_{k0}.
Output at output unit k: z_k = f(net^*_k).
Objective function: J(w, v) = \frac{1}{2} \sum_{c=1}^{m} (t_c - z_c)^2.
The gradients \partial J / \partial v_{kj} and \partial J / \partial w_{ji} are obtained by applying the chain rule through these layers.

BackPropagation of Errors.
\partial J / \partial w_{ji} = - f'(net_j) \, x^{(i)} \sum_{k=1}^{m} (t_k - z_k) \, f'(net^*_k) \, v_{kj}
\partial J / \partial v_{kj} = - (t_k - z_k) \, f'(net^*_k) \, y_j
The error at each output unit z_1, ..., z_m is propagated back through the weights v_{kj} to hidden unit j, and from there to the weights w_{ji} on input unit i. The name "backpropagation" comes from the fact that during training the errors are propagated back from the output layer to the hidden layer.
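
The two gradients above, written out in NumPy for a single sample (the sigmoid f, the layer sizes, and the data are illustrative assumptions; bias gradients are omitted for brevity):

import numpy as np

def f(a):
    return 1.0 / (1.0 + np.exp(-a))

def f_prime(a):
    s = f(a)
    return s * (1.0 - s)

def backprop_gradients(x, t, W, w0, V, v0):
    # forward pass
    net_hidden = W @ x + w0                 # net_j
    y = f(net_hidden)                       # y_j
    net_out = V @ y + v0                    # net*_k
    z = f(net_out)                          # z_k
    # dJ/dv_kj = -(t_k - z_k) f'(net*_k) y_j
    delta_out = -(t - z) * f_prime(net_out)
    grad_V = np.outer(delta_out, y)
    # dJ/dw_ji = -f'(net_j) x^(i) sum_k (t_k - z_k) f'(net*_k) v_kj
    delta_hidden = (V.T @ delta_out) * f_prime(net_hidden)
    grad_W = np.outer(delta_hidden, x)
    return grad_W, grad_V

rng = np.random.default_rng(0)
d, n_hidden, m = 4, 3, 2
x, t = rng.normal(size=d), np.array([1.0, 0.0])
W, w0 = rng.normal(size=(n_hidden, d)), np.zeros(n_hidden)
V, v0 = rng.normal(size=(m, n_hidden)), np.zeros(m)
grad_W, grad_V = backprop_gradients(x, t, W, w0, V, v0)
W -= 0.1 * grad_W   # one gradient-descent step with eta = 0.1
V -= 0.1 * grad_V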

Learning Curves (classification error vs. training time). There is a point at which it is a good time to stop training, since after this time we start to overfit. The stopping criterion is part of the training phase, thus the validation data is part of the training data. To assess how the network will work on unseen examples, we still need test data.
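
A minimal early-stopping check matching this idea (the per-epoch validation errors and the patience value are made up for illustration):

# stop once the validation error has not improved for `patience` consecutive epochs
val_errors = [0.40, 0.31, 0.27, 0.26, 0.27, 0.29, 0.33]   # placeholder learning curve
best, patience, bad = float("inf"), 2, 0
for epoch, err in enumerate(val_errors):
    if err < best:
        best, bad = err, 0
    else:
        bad += 1
        if bad >= patience:
            print("good time to stop training: epoch", epoch)
            break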

Momentum. Gradient descent finds only a local minimum. This is not a problem if J(w) is small at that local minimum; indeed, we do not wish to find w such that J(w) = 0, due to overfitting. It is a problem if J(w) is large at the local minimum. (The figure contrasts a reasonable local minimum, where J(w) is small, with a bad local minimum, where J(w) is large relative to the global minimum.)

Momentum. Momentum is a popular method to avoid local minima that also speeds up descent in plateau regions. The weight change at time t is \Delta w^{(t)} = w^{(t)} - w^{(t-1)}; we add a temporal average of the direction in which the weights have been moving recently:
w^{(t+1)} = w^{(t)} + (1 - \alpha)(-\eta \nabla_w J) + \alpha \, \Delta w^{(t)},
where the first added term is the steepest descent direction and the second is the previous direction. At \alpha = 0 this is equivalent to gradient descent; at \alpha = 1 gradient descent is ignored and the weight update continues in the direction in which it was moving previously (momentum). Usually \alpha is around 0.9.
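
A sketch of this update rule (the quadratic objective, starting point, and learning rate are placeholder assumptions):

import numpy as np

def grad_J(w):
    # gradient of the toy objective J(w) = 0.5 * ||w||^2
    return w

w_prev = np.array([1.0, -2.0])
w = np.array([0.9, -1.8])
eta, alpha = 0.1, 0.9    # learning rate and momentum term (alpha around 0.9)

for _ in range(200):
    # (1 - alpha) * steepest descent direction + alpha * previous direction
    step = (1 - alpha) * (-eta * grad_J(w)) + alpha * (w - w_prev)
    w_prev, w = w, w + step

print(w)   # moves toward the minimum of J at w = 0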

1D Convolution

Neural 1D Convolution Implementation
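
The figure on this slide is not reproduced here; as a hedged illustration, a neural-style 1D convolution slides one small set of shared weights over the input and applies a nonlinearity (the filter, bias, and ReLU choice below are assumptions):

import numpy as np

def conv1d(x, w, b=0.0):
    # slide the shared weights w across x (valid positions only; no kernel flip,
    # as is conventional for neural-network "convolutions")
    k = len(w)
    out = np.array([np.dot(w, x[i:i + k]) + b for i in range(len(x) - k + 1)])
    return np.maximum(out, 0.0)   # ReLU nonlinearity

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
w = np.array([-1.0, 0.0, 1.0])   # a simple edge-like filter
print(conv1d(x, w))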

2D Convolution Matrix reference : http://en.wikipedia.org/wiki/kernel_(image_processing)
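
In the same spirit as the referenced kernel (image processing) page, a minimal 2D convolution sketch (the toy image and the sharpen-style kernel are illustrative assumptions):

import numpy as np

def conv2d(image, kernel):
    # classical 2D convolution over the valid region (kernel flipped)
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
sharpen = np.array([[0., -1., 0.],
                    [-1., 5., -1.],
                    [0., -1., 0.]])
print(conv2d(image, sharpen))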

Convolutional Filter... Input Feature Map reference : http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx

Architecture. Trained with stochastic gradient descent on two NVIDIA GPUs for about a week (5~6 days). 650,000 neurons, 60 million parameters, 630 million connections. The last layer contains 1,000 neurons, which produce a distribution over the 1,000 class labels.

Architecture

Architecture

Architecture

Response-Normalization Layer. a^i_{x,y}: the activity of a neuron computed by applying kernel i at position (x, y). The response-normalized activity b^i_{x,y} is given by the expression below. N: the total # of kernels in the layer. n: hyper-parameter, n = 5. k: hyper-parameter, k = 2. \alpha: hyper-parameter, \alpha = 10^{-4}. \beta: hyper-parameter, \beta = 0.75. This aids generalization even though ReLUs don't require it. It reduces the top-1 error rate by 1.4% and the top-5 error rate by 1.2%.
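
The normalization expression itself did not survive extraction; in the Krizhevsky et al. paper it reads

b^i_{x,y} = a^i_{x,y} \Big/ \Big( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \big( a^j_{x,y} \big)^2 \Big)^{\beta},

where the sum runs over the n kernel maps adjacent to kernel i at the same spatial position.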

Pooling Layer. Non-overlapping / overlapping regions; sum or max pooling (the figure contrasts max and sum). The overlapping scheme reduces the top-1 error rate by 0.4% and the top-5 error rate by 0.3%. reference : http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/fergus_dl_tutorial_final.pptx
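
A small sketch of max pooling over a feature map (the values and pooling sizes are made up; size > stride gives the overlapping case):

import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # take the maximum over each (size x size) region, stepping by `stride`
    h, w = feature_map.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feature_map[i * stride:i * stride + size,
                                    j * stride:j * stride + size].max()
    return out

fm = np.array([[1., 3., 2., 0.],
               [4., 2., 1., 5.],
               [0., 1., 7., 2.],
               [3., 0., 2., 1.]])
print(max_pool(fm))                     # non-overlapping 2x2 regions
print(max_pool(fm, size=3, stride=2))   # overlapping regions (size > stride)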

Architecture

First Layer Visualization

ReLU
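
The slide presumably shows the rectified linear unit used throughout the network; for reference, its definition is f(x) = \max(0, x), which the paper reports trains several times faster than saturating nonlinearities such as tanh.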

Learning rule. Use stochastic gradient descent with a batch size of 128 examples, momentum of 0.9, and weight decay of 0.0005. The update rule for weight w is given below, where i is the iteration index, \epsilon is the learning rate (initialized at 0.01 and reduced three times prior to termination), and \langle \partial L / \partial w |_{w_i} \rangle_{D_i} is the average over the i-th batch D_i of the derivative of the objective with respect to w. Train for 90 cycles through the training set of 1.2 million images.
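
The update rule itself was lost in extraction; in the Krizhevsky et al. paper it is

v_{i+1} := 0.9\, v_i - 0.0005\, \epsilon\, w_i - \epsilon \left\langle \frac{\partial L}{\partial w} \Big|_{w_i} \right\rangle_{D_i}, \qquad w_{i+1} := w_i + v_{i+1},

where v is the momentum variable and \epsilon the learning rate.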

Fighting overfitting - input. This neural net has 60M real-valued parameters and 650,000 neurons. It overfits a lot; therefore train on 224x224 patches extracted randomly from the 256x256 images, and also on their horizontal reflections (at test time, five such patches and their reflections are used).
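
A sketch of this augmentation in NumPy (random 224x224 crop plus an occasional horizontal reflection; the random image is a stand-in for a real training example):

import numpy as np

def random_crop_and_flip(image, crop=224):
    # take a random crop x crop patch and reflect it horizontally half the time
    h, w, _ = image.shape
    top = np.random.randint(0, h - crop + 1)
    left = np.random.randint(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop, :]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1, :]   # horizontal reflection
    return patch

image = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)   # placeholder 256x256 RGB image
print(random_crop_and_flip(image).shape)   # (224, 224, 3)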

Fighting overfitting - Dropout Independently set each hidden unit activity to zero with 0.5 probability Used in the two globally-connected hidden layers at the net's output Doubles the number of iterations required to converge reference : http://www.image-net.org/challenges/lsvrc/2012/supervision.pdf
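
A sketch of dropout with p = 0.5 (plain masking at training time; at test time all units are kept and their activities scaled, following the scheme described for this network; everything else is a placeholder):

import numpy as np

def dropout(activations, p=0.5, train=True):
    if train:
        # independently set each hidden unit's activity to zero with probability p
        mask = (np.random.rand(*activations.shape) >= p)
        return activations * mask
    # at test time keep every unit but scale activities by (1 - p)
    return activations * (1.0 - p)

h = np.random.rand(8)   # placeholder hidden-layer activities
print(dropout(h, train=True))
print(dropout(h, train=False))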

Results - Classification ILSVRC-2010 test set ILSVRC-2012 test set

Results Classification

Results Retrieval

The End Thank you for your attention

References
www.cs.toronto.edu/~fritz/absps/imagenet.pd
https://prezi.com/jiilm_br8uef/imagenetclassification-with-deep-convolutionalneural-networks/
sglab.kaist.ac.kr/~sungeui/ir/.../second/20145481오은수.pptx
http://alex.smola.org/teaching/cmu2013-10-701/slides/14_PrincipalComp.pdf
Hagit Hel-or (Convolution Slide)
http://www.cs.haifa.ac.il/~rita/ml_course/lectures/NN.pdf