Machine Learning for Computer Vision
8. Neural Networks and Deep Learning
Vladimir Golkov, Technical University of Munich, Computer Vision Group


INTRODUCTION

Nonlinear Coordinate Transformation

http://cs.stanford.edu/people/karpathy/convnetjs/

Dimensionality may change!

Deep Neural Network: Sequence of Many Simple Nonlinear Coordinate Transformations that Disentangle the Data

- The data is sparse (almost lower-dimensional)
- Linear separation of the red and blue classes

The Increasing Complexity of Features

- Input features: RGB
- Layer 1 feature space coordinates: 45° edge yes/no, green patch yes/no
- Layer 2, Layer 3 feature space coordinates: person yes/no, car wheel yes/no

[Zeiler & Fergus, ECCV 2014]

NEURAL NETWORKS

Fully-Connected Layer, a.k.a. Dense Layer

x^(0) is the input feature vector of the neural network (one sample).
x^(L) is the output vector of a neural network with L layers.

Layer number l has:
- Inputs (usually x^(l-1), i.e. the outputs of layer number l-1)
- Weight matrix W^(l) and bias vector b^(l), both trained (e.g. with stochastic gradient descent) such that x^(L) for the training samples minimizes some objective (loss)
- Nonlinearity s_l (fixed in advance, for example ReLU: z ↦ max{0, z})
- Output x^(l) of layer l

Transformation from x^(l-1) to x^(l) performed by layer l:
x^(l) = s_l( W^(l) x^(l-1) + b^(l) )
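As an illustration of the formula above, here is a minimal NumPy sketch of the forward pass of one fully-connected layer (the names relu and dense_forward are chosen for this example, not taken from the slides):

```python
import numpy as np

def relu(z):
    # ReLU nonlinearity: elementwise max{0, z}
    return np.maximum(0.0, z)

def dense_forward(x_prev, W, b, nonlinearity=relu):
    # One fully-connected layer: x^(l) = s_l(W^(l) x^(l-1) + b^(l))
    return nonlinearity(W @ x_prev + b)
```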

Example

W^(l) = [[0, 0.1, -1], [0.2, 0, 1]] (a 2×3 matrix), x^(l-1) = (1, 2, 3)^T, b^(l) = (0, 1.2)^T, i.e. b_1^(l) = 0, b_2^(l) = 1.2

W^(l) x^(l-1) + b^(l) = (0·1 + 0.1·2 + (-1)·3 + 0, 0.2·1 + 0·2 + 1·3 + 1.2)^T = (-2.8, 4.4)^T
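The worked example can be checked directly in NumPy; this computes the pre-activation W^(l) x^(l-1) + b^(l), i.e. the value before the nonlinearity is applied:

```python
import numpy as np

W = np.array([[0.0, 0.1, -1.0],
              [0.2, 0.0,  1.0]])   # 2x3 weight matrix W^(l)
x = np.array([1.0, 2.0, 3.0])      # input x^(l-1)
b = np.array([0.0, 1.2])           # bias b^(l)

print(W @ x + b)                   # [-2.8  4.4]
```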

2D Convolutional Layer

Appropriate for 2D structured data (e.g. images) where we want:
- Locality of feature extraction (far-away pixels do not influence the local output)
- Translation-equivariance (shifting the input in space (the i, j dimensions) yields the same output, shifted in the same way)

x^(l)_{i,j,k} = s_l( b^(l)_k + Σ_{i',j',k'} w^(l)_{i-i', j-j', k', k} · x^(l-1)_{i',j',k'} )

- The size of W along the i, j dimensions is called the filter size
- The size of W along the k' dimension is the number of input channels (e.g. three (red, green, blue) in the first layer)
- The size of W along the k dimension is the number of filters (the number of output channels)

Equivalent layers exist for 1D, 3D, ...
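To make the indexing concrete, here is a deliberately literal (and therefore slow) NumPy sketch of the convolutional-layer formula above; the name conv2d_forward and the zero-padding convention are assumptions of this example, not taken from the slides:

```python
import numpy as np

def conv2d_forward(x_prev, w, b, s=lambda z: np.maximum(0.0, z)):
    """Literal implementation of the convolutional-layer formula above.

    x_prev: x^(l-1), shape (H, W, C_in)        -- spatial indices i', j', input channel k'
    w:      w^(l),   shape (fh, fw, C_in, C_out)
    b:      b^(l),   shape (C_out,)
    Zero padding is assumed for indices outside the image, so the output keeps shape (H, W, C_out).
    """
    H, Wd, _ = x_prev.shape
    fh, fw, _, C_out = w.shape
    out = np.zeros((H, Wd, C_out))
    for i in range(H):
        for j in range(Wd):
            for k in range(C_out):
                acc = b[k]
                for di in range(fh):          # di = i - i'
                    for dj in range(fw):      # dj = j - j'
                        ip, jp = i - di, j - dj
                        if 0 <= ip < H and 0 <= jp < Wd:
                            acc += np.dot(w[di, dj, :, k], x_prev[ip, jp, :])
                out[i, j, k] = acc
    return s(out)
```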

Handcrafted Convolutional Filters

- The filters shown here are handcrafted
- The filters learned by convolutional networks, in contrast, are optimal
  - in terms of the training set and the final loss
  - in the context of all network layers (the weights of all layers are optimized jointly)

https://en.wikipedia.org/wiki/kernel_(image_processing)
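For instance, a classic handcrafted edge-detection kernel of the kind listed on the linked Wikipedia page can be applied with the conv2d_forward sketch from the previous slide; the identity is used instead of ReLU so that negative filter responses are kept:

```python
import numpy as np

# 3x3 edge-detection kernel for a single-channel (grayscale) image
edge = np.array([[-1.0, -1.0, -1.0],
                 [-1.0,  8.0, -1.0],
                 [-1.0, -1.0, -1.0]])
w = edge[:, :, None, None]           # shape (3, 3, C_in=1, C_out=1)
b = np.zeros(1)

image = np.random.default_rng(0).random((64, 64, 1))
edges = conv2d_forward(image, w, b, s=lambda z: z)   # handcrafted filter, not learned
```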

Convolutional Network

Interactive: http://scs.ryerson.ca/~aharley/vis/conv/flat.html

Loss Functions

N-class classification:
- N outputs
- Nonlinearity in the last layer: softmax
- Loss: categorical cross-entropy between outputs x^(L) and targets t (sum over all training samples)

2-class classification:
- 1 output
- Nonlinearity in the last layer: sigmoid
- Loss: binary cross-entropy between outputs x^(L) and targets t (sum over all training samples)

2-class classification (alternative formulation):
- 2 outputs
- Nonlinearity in the last layer: softmax
- Loss: categorical cross-entropy between outputs x^(L) and targets t (sum over all training samples)

Many regression tasks:
- Linear output
- Loss: mean squared error between outputs x^(L) and targets t (sum over all training samples)
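For the N-class case, a minimal NumPy sketch of the softmax nonlinearity and the categorical cross-entropy loss, assuming one-hot encoded targets (a convention of this example, not stated on the slide):

```python
import numpy as np

def softmax(z):
    # Softmax nonlinearity for the last layer (numerically stabilized)
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def categorical_cross_entropy(x_L, t):
    # x_L: network outputs after softmax, shape (num_samples, N)
    # t:   one-hot targets, shape (num_samples, N)
    # Sum over all training samples, as on the slide.
    return -np.sum(t * np.log(x_L + 1e-12))
```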

Neural Network Training Procedure

- Fix the number L of layers
- Fix the sizes of the weight arrays and bias vectors (for a fully-connected layer, this corresponds to the number of neurons)
- Fix the nonlinearities
- Initialize the weights and biases with random numbers
- Repeat:
  - Select a mini-batch (i.e. a small subset) of the training samples
  - Compute the gradient of the loss with respect to all trainable parameters (all weights and biases), using the chain rule ("error backpropagation") to compute the gradients of deeper layers
  - Perform a gradient-descent step (or similar) towards minimizing the error
  (This is called stochastic gradient descent because every mini-batch is a random subset of the entire training set.)
- "Early stopping": stop when the loss on a validation set starts increasing (to avoid overfitting)
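Written out as code, the procedure above becomes a schematic mini-batch SGD loop; backpropagation is abstracted into a hypothetical loss_and_gradients callable, since the slides do not spell out the gradient computation:

```python
import numpy as np

def sgd_train(params, training_data, loss_and_gradients, num_epochs=10,
              batch_size=32, learning_rate=0.01, rng=np.random.default_rng(0)):
    """Schematic stochastic gradient descent (sketch, not a full framework).

    params:             list of trainable NumPy arrays (all weights and biases)
    loss_and_gradients: assumed callable returning (loss, grads) for a mini-batch,
                        with grads matching the structure of params
                        (backpropagation happens inside it).
    """
    n = len(training_data)
    for epoch in range(num_epochs):
        order = rng.permutation(n)                    # random mini-batches
        for start in range(0, n, batch_size):
            batch = [training_data[i] for i in order[start:start + batch_size]]
            loss, grads = loss_and_gradients(params, batch)
            for p, g in zip(params, grads):           # gradient-descent step
                p -= learning_rate * g
    return params
```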

GOOD PRACTICES

How to Avoid Overfitting

- Hard constraints on model size (number of layers, number of weights) and connections
- Weight constraints (e.g. convolutional networks: locality, translation-equivariance)
- Pooling layers (downsampling of feature maps)
- Additional loss terms
- Random perturbations of the training samples (noise, dropout, data augmentation)
- Early stopping
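As one concrete instance of the "random perturbations" item, a small sketch of data augmentation for image batches (random horizontal flips plus additive noise; the function name and noise level are illustrative assumptions):

```python
import numpy as np

def augment_batch(images, rng=np.random.default_rng(), noise_std=0.01):
    # images: float array of shape (batch, height, width, channels)
    out = images.copy()
    flip = rng.random(len(out)) < 0.5
    out[flip] = out[flip, :, ::-1, :]                  # random horizontal flip
    out += rng.normal(0.0, noise_std, size=out.shape)  # additive Gaussian noise
    return out
```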

Do's and Don'ts of Data Representation

- The data representation should be natural (do not outsource known data transformations to the learning)
- Use meaningful features, especially complementary ones
- Data augmentation using natural assumptions

WHAT DID THE NETWORK LEARN?

Visualization: creative, no canonical way.
Look at standard networks to gain intuition.

Image Reconstruction from Deep-Layer Features

[Mahendran & Vedaldi, CVPR 2015] [Dosovitskiy & Brox, CVPR 2016] [Dosovitskiy & Brox, NIPS 2016]

When Should Machine Learning Be Deep?

- The data distribution is complex
- Training data is abundant (or can be augmented to abundance)
- Optionally: the data is structured
  - Neighborhood structure: convolutional networks (images, ...)
  - Sequential structure: recurrent networks (memory, e.g. in time)

THANK YOU!