Introduction to Deep Learning CMPT 733. Steven Bergner

Similar documents
Deep learning / Ian Goodfellow, Yoshua Bengio and Aaron Courville. - Cambridge, MA ; London, Spis treści

CSCI 315: Artificial Intelligence through Deep Learning

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Need for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels

Machine Learning for Computer Vision 8. Neural Networks and Deep Learning. Vladimir Golkov Technical University of Munich Computer Vision Group

Introduction to Neural Networks

Deep Feedforward Networks

Jakub Hajic Artificial Intelligence Seminar I

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann

Introduction to Convolutional Neural Networks 2018 / 02 / 23

Deep Feedforward Networks

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box

ECE521 Lecture 7/8. Logistic Regression

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

PATTERN RECOGNITION AND MACHINE LEARNING

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS

Parameter Norm Penalties. Sargur N. Srihari

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Comments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35

CENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 8. Sinan Kalkan

Deep Learning. Convolutional Neural Network (CNNs) Ali Ghodsi. October 30, Slides are partially based on Book in preparation, Deep Learning

Neural Networks. Nicholas Ruozzi University of Texas at Dallas

DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY

Convolutional Neural Network Architecture

Neural networks and optimization

CSC321 Lecture 5: Multilayer Perceptrons

Deep Neural Networks (3) Computational Graphs, Learning Algorithms, Initialisation

Introduction to (Convolutional) Neural Networks

1 What a Neural Network Computes

Learning Deep Architectures for AI. Part II - Vijay Chakilam

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Index. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,

EVERYTHING YOU NEED TO KNOW TO BUILD YOUR FIRST CONVOLUTIONAL NEURAL NETWORK (CNN)

Machine Learning for Physicists Lecture 1

Lecture 17: Neural Networks and Deep Learning

Neural Networks: Optimization & Regularization

Linear and Logistic Regression. Dr. Xiaowei Huang

ECE521 W17 Tutorial 1. Renjie Liao & Min Bai

SGD and Deep Learning

Course 395: Machine Learning - Lectures

Deep Learning (CNNs)

Nonlinear Classification

STA 414/2104: Lecture 8

Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets)

Machine Learning

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

An overview of deep learning methods for genomics

Classification of Hand-Written Digits Using Scattering Convolutional Network

Large-Scale Feature Learning with Spike-and-Slab Sparse Coding

Artificial Intelligence

Dreem Challenge report (team Bussanati)

Demystifying deep learning. Artificial Intelligence Group Department of Computer Science and Technology, University of Cambridge, UK

Deep learning attracts lots of attention.

Bagging and Other Ensemble Methods

Machine Learning Lecture 12

<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)

Lecture 35: Optimization and Neural Nets

Introduction to Neural Networks

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Neural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav

Advanced Training Techniques. Prajit Ramachandran

Machine Learning Basics

A summary of Deep Learning without Poor Local Minima

CS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes

CSC321 Lecture 2: Linear Regression

Welcome to the Machine Learning Practical Deep Neural Networks. MLP Lecture 1 / 18 September 2018 Single Layer Networks (1) 1

Grundlagen der Künstlichen Intelligenz

@SoyGema GEMA PARREÑO PIQUERAS

Introduction to Convolutional Neural Networks (CNNs)

Compressing deep neural networks

CSC321 Lecture 9: Generalization

CSC321 Lecture 15: Exploding and Vanishing Gradients

CSC321 Lecture 5 Learning in a Single Neuron

Maxout Networks. Hien Quoc Dang

CSC 411 Lecture 10: Neural Networks

Convolutional Neural Networks. Srikumar Ramalingam

Course 10. Kernel methods. Classical and deep neural networks.

CSC321 Lecture 9: Generalization

9 Classification. 9.1 Linear Classifiers

Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat

Deep Learning Lecture 2

Understanding Neural Networks : Part I

Deep Learning: Pre- Requisites. Understanding gradient descent, autodiff, and softmax

Machine Learning. Boris

STA 414/2104: Lecture 8

Spatial Transformation

Deep Learning book, by Ian Goodfellow, Yoshua Bengio and Aaron Courville

Convolutional Neural Networks II. Slides from Dr. Vlad Morariu

Deep Neural Networks (1) Hidden layers; Back-propagation

Machine Learning Lecture 10

Lecture 7 Convolutional Neural Networks

Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions

Introduction to Machine Learning Spring 2018 Note Neural Networks

Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks

Computational statistics

Neural Network Tutorial & Application in Nuclear Physics. Weiguang Jiang ( 蒋炜光 ) UTK / ORNL

Deep Learning & Artificial Intelligence WS 2018/2019

Transcription:

Introduction to Deep Learning CMPT 733 Steven Bergner

Overview Renaissance of artificial neural networks Representation learning vs feature engineering Background Linear Algebra, Optimization Regularization Construction and training of layered learners Frameworks for deep learning 2

Representations matter Transform into the right representation Classify points simply by threshold on radius axis 3

Representations matter Transform into the right representation Classify points simply by threshold on radius axis Single neuron with nonlinearity can do this 4

Depth: layered composition 5

Computational graph 6

Components of learning Hand designed program Input Output Increasingly automated Simple features Abstract features Mapping from features 7

Growing Dataset Size MNIST dataset 8

Basics Linear Algebra and Optimization 9

Linear Algebra Tensor is an array of numbers Multi-dim: 0d scalar, 1d vector, 2d matrix/image, 3d RGB image Matrix (dot) product Dot product of vectors A and B (m = p = 1 in above notation, n=2) 10

Linear Algebra Tensor is an array of numbers Multi-dim: 0d scalar, 1d vector, 2d matrix/image, 3d RGB image Matrix (dot) product Dot product of vectors A and B (m = p = 1 in above notation, n=2) 11

Linear Algebra Tensor is an array of numbers Multi-dim: 0d scalar, 1d vector, 2d matrix/image, 3d RGB image Matrix (dot) product Dot product of vectors A and B (m = p = 1 in above notation, n=2) 12

Linear Algebra Tensor is an array of numbers Multi-dim: 0d scalar, 1d vector, 2d matrix/image, 3d RGB image Matrix (dot) product Dot product of vectors A and B (m = p = 1 in above notation, n=2) 13

Linear algebra: Norms 14

Nonlinearities ReLU Sofplus Logistic Sigmoid [(c) public domain] 15

Approximate Optimization 16

Gradient descent 17

Critical points 18

Critical points Saddle point 1st and 2nd derivative vanish 19

Critical points Saddle point 1st and 2nd derivative vanish Poor conditioning: 1st deriv large in one and small in another direction 20

Tensorflow Playground http://playground.tensorflow.org/ Try out simple network configurations https://cs.stanford.edu/people/karpathy/convnetjs/demo/cla ssify2d.html Visualize linear and non-linear mappings 21

Regularization Reduced generalization error without impacting training error 22

Constrained optimization Unregularized objective 23

Constrained optimization Squared L2 encourages small weights Unregularized objective L2 regularizer 24

Constrained optimization Squared L2 encourages small weights Unregularized objective L1 encourages sparsity of model parameters (weights) L2 regularizer 25

Dataset augmentation 26

Learning curves 27

Learning curves Early stopping before validation error starts to increase 28

Bagging Average multiple models trained on subsets of the data 29

Bagging Average multiple models trained on subsets of the data First subset: learns top loop, Second subset: bottom loop 30

Dropout Random sample of connection weights is set to zero Train diferent network model each time Learn more robust, generalizable features 31

Multitask learning Shared parameters are trained with more data Improved generalization error due to increased statistical strength 32

Components of popular architectures 33

Convolution as edge detector 34

Gabor wavelets (kernels) 35

Gabor wavelets (kernels) Local average, first derivative 36

Gabor wavelets (kernels) Second derivative (curvature) Local average, first derivative 37

Gabor wavelets (kernels) Directional second derivative Second derivative (curvature) Local average, first derivative 38

Gabor-like learned kernels Features extractors provided by pretrained networks 39

Max pooling translation invariance Take max of certain neighbourhood 40

Max pooling translation invariance Take max of certain neighbourhood Ofen combined followed by downsampling 41

Max pooling transform invariance 42

Types of connectivity 43

Types of connectivity 44

Types of connectivity 45

Choosing architecture family 46

Choosing architecture family No structure fully connected 47

Choosing architecture family No structure fully connected Spatial structure convolutional 48

Choosing architecture family No structure fully connected Spatial structure convolutional Sequential structure recurrent 49

Optimization Algorithm Lots of variants address choice of learning rate See Visualization of Algorithms AdaDelta and RMSprop ofen work well 50

Sofware for Deep Learning 51

Current Frameworks Tensorflow / Keras Pytorch DL4J Cafe And many more Most have CPU-only mode but much faster on NVIDIA GPU 52

Development strategy Identify needs: High accuracy or low accuracy? Choose metric Accuracy (% of examples correct), Coverage (% examples processed) Precision TP/(TP+FP), Recall TP/(TP+FN) Amount of error in case of regression Build end-to-end system Start from baseline, e.g. initialize with pre-trained network Refine driven by data 53

Sources I. Goodfellow, Y. Bengio, A. Courville Deep Learning MIT Press 2016 [link] 54