
Jakub Hajic Artificial Intelligence Seminar I. 11. 11. 2014

Outline: Key concepts, Deep Belief Networks, Convolutional Neural Networks

A couple of questions: Convolution, Perceptron, Feedforward Neural Network, Backpropagation, Boltzmann Machine, Belief Net

Convolution. Discrete: $(f * g)[n] = \sum_m f[m]\, g[n-m]$. Continuous: $(f * g)(n) = \int f(m)\, g(n-m)\, dm$. Image processing:
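To make the discrete formula concrete, here is a minimal NumPy sketch; the signals f and g are arbitrary illustrative values, and np.convolve computes the same sum.

```python
import numpy as np

# Discrete convolution: (f * g)[n] = sum_m f[m] * g[n - m]
f = np.array([1.0, 2.0, 3.0])
g = np.array([0.5, 0.5])   # e.g. a small smoothing kernel

manual = [sum(f[m] * g[n - m]
              for m in range(len(f)) if 0 <= n - m < len(g))
          for n in range(len(f) + len(g) - 1)]

print(manual)              # [0.5, 1.5, 2.5, 1.5]
print(np.convolve(f, g))   # same result computed by NumPy
```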

Perceptron: $y = \sum_{i=1}^{n} w_i x_i + b$
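A minimal sketch of the formula above with a hard threshold applied to the weighted sum; the weights implementing logical AND are an illustrative assumption, not part of the slide.

```python
import numpy as np

def perceptron(x, w, b):
    """y = step(sum_i w_i * x_i + b): fire (1) if the weighted sum exceeds 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# Hypothetical weights that make the perceptron compute logical AND
w, b = np.array([1.0, 1.0]), -1.5
print([perceptron(np.array(x), w, b) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]
```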

Feedforward Neural Network

Backpropagation. Training algorithm for feedforward neural networks. Gives a formula for the derivative of the cost function w.r.t. the weights in the network. Based on propagating the error of the network backwards through the feedforward network: error at layer k+1 -> error at layer k. Error at the last layer is obtained easily -> error of the entire network is obtainable.
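A minimal sketch of the layer-to-layer error propagation for a tiny 2-3-1 sigmoid network with squared-error loss; the sizes, initialisation and loss are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # input -> hidden
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)   # hidden -> output
x, t = np.array([0.5, -0.2]), np.array([1.0])   # one training example

# Forward pass
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; y = sigmoid(z2)

# Backward pass: error at the last layer is easy, then it is pushed back one layer
delta2 = (y - t) * y * (1 - y)               # dC/dz2 for squared error + sigmoid
delta1 = (W2.T @ delta2) * a1 * (1 - a1)     # error at layer k from error at layer k+1

# Derivatives of the cost w.r.t. the weights
dW2, db2 = np.outer(delta2, a1), delta2
dW1, db1 = np.outer(delta1, x), delta1
```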

Restricted Boltzmann Machine. An artificial neural network capable of learning a probability distribution characterising the (training) data. Two layers, one hidden and one visible; fully connected. Weight matrix, bias units.

Restricted Boltzmann Machine. Probability of states of hidden neurons given the states of visible neurons: $p(h_j = 1 \mid v) = \sigma(b_j + \sum_i v_i w_{ij})$. Probability of states of visible neurons given the states of hidden neurons: $p(v_i = 1 \mid h) = \sigma(a_i + \sum_j h_j w_{ij})$, where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function.
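The two conditionals translate directly into sampling code; a minimal sketch with an arbitrary random weight matrix and binary units (the layer sizes are illustrative).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))   # weight matrix
a, b = np.zeros(n_visible), np.zeros(n_hidden)           # visible / hidden biases

v = rng.integers(0, 2, size=n_visible).astype(float)     # a binary visible vector

# p(h_j = 1 | v) = sigma(b_j + sum_i v_i w_ij)
p_h = sigmoid(b + v @ W)
h = (rng.random(n_hidden) < p_h).astype(float)

# p(v_i = 1 | h) = sigma(a_i + sum_j h_j w_ij)
p_v = sigmoid(a + W @ h)
```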

Belief Network

Hinton et al., 2006

The Problem. Representing complex characteristics of the data requires a complex belief network. A complex (large) belief network is often impossible to train properly.

The Solution. The Deep Belief Network by Hinton et al. (2006): a layered architecture (layers of RBMs) and a greedy training algorithm (layer-by-layer).

Restricted Boltzmann Machine. An artificial neural network capable of learning a probability distribution characterising the (training) data. Two layers, one hidden and one visible; fully connected. Weight matrix, bias units.

Restricted Boltzmann Machine. Probability of states of hidden neurons given the states of visible neurons: $p(h_j = 1 \mid v) = \sigma(b_j + \sum_i v_i w_{ij})$. Probability of states of visible neurons given the states of hidden neurons: $p(v_i = 1 \mid h) = \sigma(a_i + \sum_j h_j w_{ij})$, where $\sigma(x) = 1/(1 + e^{-x})$ is the logistic function.

The (training) Procedure. 1. Train the first layer as an RBM on the input data. 2. Use the output of the layer as a representation of the data, and as the input of the next layer. 3. Train the next layer. 4. Repeat 2. and 3. 5. Fine-tune the network (supervised gradient descent).
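A minimal sketch of the greedy, layer-by-layer phase (steps 1-4); train_rbm and hidden_representation are hypothetical helpers standing in for any RBM trainer, e.g. one based on contrastive divergence.

```python
def pretrain_dbn(data, layer_sizes):
    """Greedy layer-wise pre-training of a stack of RBMs (hypothetical helpers)."""
    rbms, representation = [], data
    for n_hidden in layer_sizes:
        rbm = train_rbm(representation, n_hidden)                     # steps 1 and 3
        representation = hidden_representation(rbm, representation)   # step 2
        rbms.append(rbm)                                              # step 4: repeat
    return rbms   # step 5: fine-tune the stacked network with supervised gradient descent
```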

The Results. MNIST digit recognition: handwritten digits, 28 x 28 grey values + labels; 60,000 digits in the training set, 10,000 in the test set. Hinton et al., 2006: 1.25% test error. Inside the mind of the DBN:

The Usage. Standalone. With a classifier (e.g. logistic regression) on top. As pre-training for supervised models.

Convolution. Discrete: $(f * g)[n] = \sum_m f[m]\, g[n-m]$. Continuous: $(f * g)(n) = \int f(m)\, g(n-m)\, dm$. Image processing:

The Ideas Sparse connectivity

The Ideas Shared weights

The Training. Backpropagation, adapted to learn convolutional layers; also deals with max-pooling layers (send the error to the argmax only).
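A minimal 1-D illustration of the "send the error to the argmax only" rule for max-pooling; the window size and values are arbitrary.

```python
import numpy as np

x = np.array([0.3, 0.9, 0.7, 0.2])
windows = x.reshape(-1, 2)                 # non-overlapping windows of size 2
pooled = windows.max(axis=1)               # forward pass: [0.9, 0.7]

grad_pooled = np.array([1.0, 1.0])         # error arriving from the layer above
grad_x = np.zeros_like(windows)
grad_x[np.arange(len(windows)), windows.argmax(axis=1)] = grad_pooled
print(grad_x.ravel())                      # [0. 1. 1. 0.] - only the argmax positions receive error
```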

The results. MNIST: less than 1% (multiple entries, close to 0.2%). ImageNet ILSVRC object recognition challenge: 1,200,000 training images, 150,000 testing images, various resolutions, labeled (using AMT) as ~1000 different categories; top-5 and top-1 evaluation.

The results. Krizhevsky et al., 2012: downsampling of images to 256x256; 60 million parameters; uses a couple of tricks: rectified linear units, overlapping max-pooling, output: 1000-way softmax, data augmentation (translation & reflection, simulated illumination variation), dropout.

Rectified Linear Units. Sigmoid activation: $1/(1 + e^{-x})$. Tanh activation: $\tanh(x)$. ReLU: $\max(0, x)$.
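The three activations side by side on a few sample points (illustrative values only):

```python
import numpy as np

x = np.linspace(-3, 3, 7)
sigmoid = 1.0 / (1.0 + np.exp(-x))   # 1 / (1 + e^-x)
tanh = np.tanh(x)
relu = np.maximum(0.0, x)            # max(0, x)
print(np.column_stack([x, sigmoid, tanh, relu]))
```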

Max-pooling. Take the maximum of the inputs. A form of downsampling.
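A minimal sketch of 2x2 max-pooling with stride 2 on a toy 4x4 feature map (the values are illustrative); the output is a downsampled 2x2 map.

```python
import numpy as np

fmap = np.arange(16.0).reshape(4, 4)                 # a toy 4x4 feature map
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))   # maximum over each 2x2 block
print(pooled)   # [[ 5.  7.]
                #  [13. 15.]]
```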

Softmax activation. A way to interpret the output of the last layer of a network as a probability distribution. Essential idea: normalize the outputs so they lie between 0 and 1 and sum to 1. Formula: $p_i = \frac{e^{q_i}}{\sum_{j=1}^{m} e^{q_j}}$, where $q_i$ is the input of the i-th unit.
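The formula in code; subtracting max(q) before exponentiating is a standard numerical-stability trick, not part of the slide.

```python
import numpy as np

def softmax(q):
    """p_i = exp(q_i) / sum_j exp(q_j)."""
    e = np.exp(q - np.max(q))   # shift by max(q) to avoid overflow
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())   # entries lie in (0, 1) and sum to 1
```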

Dropout. Training technique. When training, randomly set some of the neuron outputs to zero. Reduces co-adaptation of neurons (reliance on each other).
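A minimal training-time sketch: each output is kept or zeroed independently. The drop probability 0.5 is an illustrative choice, and test-time rescaling is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop=0.5):
    """Randomly set a fraction p_drop of the neuron outputs to zero."""
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask

print(dropout(np.array([0.2, 1.3, -0.7, 0.9, 0.4])))
```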

Data Augmentation. Create more data: e.g. take not only the existing data, but also transformations of the data which make sense (keep the label, belong to the same set, ...). Reduces overfitting.
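A minimal sketch of two label-preserving transformations on a toy image (horizontal reflection and a one-pixel shift); a real pipeline would crop or pad instead of wrapping around.

```python
import numpy as np

image = np.arange(16.0).reshape(4, 4)        # a toy 4x4 "image"

flipped = image[:, ::-1]                     # horizontal reflection
shifted = np.roll(image, shift=1, axis=1)    # shift one pixel right (wraps around)

augmented = [image, flipped, shifted]        # three training examples from one
```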

The finished product. Layer sizes (neurons per layer): 253,440 - 186,624 - 64,896 - 64,896 - 43,264 - 4096 - 4096 - 1000.

Results. Top-5 error: 16.4%; second place: ~26%.

What else? (Same architecture: 253,440 - 186,624 - 64,896 - 64,896 - 43,264 - 4096 - 4096 - 1000.)

Similarity (Krizhevsky, 2012)

Similarity (MUFIN, 2005)

Resources. neuralnetworksanddeeplearning.com - free online book draft. deeplearning.net - Python tutorial to DBNs, CNNs. image-net.org - the ILSVRC challenge. http://yann.lecun.com/exdb/mnist/index.html - MNIST dataset & results.

Papers. Bengio, 2009: Learning Deep Architectures for AI - survey with details. Hinton et al., 2006: A Fast Learning Algorithm for Deep Belief Nets - deep belief networks. Krizhevsky et al., 2012: ImageNet Classification with Deep Convolutional Neural Networks - convolutional network. Schmidhuber, 2014: Deep Learning in Neural Networks: An Overview - huge survey.

Presentations. http://image-net.org/challenges/lsvrc/2014/slides/googlenet.pptx - GoogLeNet, Google's winning implementation with improvements, 2014. http://slideslive.com/38891974/detection-and-localization-objects-in-images - J. Matas (ČVUT) talks about the history of object detection and why it has been turned upside down since Krizhevsky 2012, Prague Computer Science Seminar, 2014. https://cw.felk.cvut.cz/wiki/_media/courses/ae4m33mpv/deep_learning_mpv.pdf - J. Čech: A Shallow Introduction into the Deep Machine Learning, 2014, a nice readable introduction to Krizhevsky and others with lots of examples.