Introduction to Neural Networks

Cuong Tuan Nguyen, Seiji Hotta, Masaki Nakagawa. Tokyo University of Agriculture and Technology. Copyright by Nguyen, Hotta and Nakagawa.

Pattern classification: which category does an input belong to? Example: character recognition for input images. Feature extraction maps the input to a feature vector (x_1, x_2, ..., x_n), and the classifier outputs the category of the input (e.g., a, b, c, ..., x, y, z).

Supervised learning: learn from a training dataset of <input, target> pairs, then test on an unseen dataset to measure generalization ability. During learning the classifier is fit to the training data; at prediction time it outputs a label (e.g., a, b, c, ..., x, y, z) for a new input.

Human neuron. Video: Neural Networks, A Simple Explanation (https://www.youtube.com/watch?v=gck_).

Artificial neuron: inputs $x_1, \ldots, x_n$ arrive over weighted connections with weights $w_1, \ldots, w_n$; the neuron computes $\mathrm{net} = \sum_{i=1}^{n} x_i w_i$ and the activation function gives the output $y = f(\mathrm{net})$.
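As a concrete illustration, here is a minimal NumPy sketch of this neuron computation; the input values, weights, and the choice of ReLU as f are assumptions for the example, not values from the slides.

```python
import numpy as np

def relu(net):
    """ReLU activation: f(net) = max(0, net)."""
    return np.maximum(0.0, net)

def neuron(x, w, f):
    """Artificial neuron: net = sum_i x_i * w_i, output y = f(net)."""
    net = np.dot(x, w)
    return f(net)

# Illustrative input and weights (assumed values).
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.8, 0.2, -0.1])
print(neuron(x, w, relu))  # net = 0.4 - 0.2 - 0.2 = 0.0, so y = 0.0
```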

Activation function: controls when the neuron is activated. Common choices: linear, sigmoid, tanh, ReLU, Leaky ReLU.
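For reference, these activations can be written in a few lines of NumPy; the 0.01 negative slope used for Leaky ReLU below is an assumed common default.

```python
import numpy as np

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):  # alpha = 0.01 is an assumed common default
    return np.where(x > 0, x, alpha * x)
```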

Weighted connections + activation function: a neuron is a feature detector; it is activated for a specific feature. Example (generated with https://playground.tensorflow.org): a ReLU neuron with weights w_1 = -0.82 and w_2 = 0.49 has the decision boundary -0.82 x_1 + 0.49 x_2 = 0 in the (x_1, x_2) plane.

Multi-layer perceptron (MLP): neurons are arranged into layers, and each neuron in a layer receives the same input from the preceding layer. Early layers detect simple features; later layers combine them into complex features. (Generated with https://playground.tensorflow.org)

MLP as a learnable classifier: the output corresponding to an input is determined by the weighted connections, Z = h(X, W), where X is the input, W the weights, and Z the output. These weights are learnable (adjustable). Architecture: input layer (x_1, ..., x_n), hidden layer, output layer (z_1, z_2, ...).
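Since the practice slide at the end uses Keras + TensorFlow, here is a minimal sketch of such an MLP classifier Z = h(X, W); the layer sizes, activations, and number of classes are illustrative assumptions.

```python
import tensorflow as tf

n_features, n_classes = 20, 3  # assumed sizes for illustration
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(n_features,)),              # input layer: x_1 ... x_n
    tf.keras.layers.Dense(16, activation="relu"),             # hidden layer
    tf.keras.layers.Dense(n_classes, activation="softmax"),   # output layer: z_1 ... z_k
])
model.summary()  # the Dense kernels and biases are the learnable weights W
```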

Learning ability of neural networks, linear vs. non-linear: with a linear activation function the network can only learn linear functions; with a non-linear activation function (sigmoid, tanh, ReLU) it can learn non-linear functions.

Learning ability of neural networks. Universal approximation theorem [Hornik, 1991]: an MLP with a single hidden layer can approximate an arbitrary function; for complex functions, however, that hidden layer may need to be very large. A deep neural network instead contains many hidden layers (input layer, hidden layers, output layer) and can extract complex features.

Learning in neural networks: the weighted connections are tuned using the training data <input, target>. Objective: the network should output the correct target for each input in the training dataset.

Learning in neural networks. Loss function (objective function): the difference between output and target. Learning is an optimization process: minimize the loss so that the output matches the target, $L = \|T - Z\| = \|T - h(X, W)\| = l(W)$, where the network maps the input layer (x_1, ..., x_n) through the hidden layer to outputs (z_1, ..., z_k), compared against targets (t_1, ..., t_k).

Learning in neural networks. Gradient of the loss l with respect to W: $\nabla_W l(W) = \left( \frac{\partial l}{\partial w_1}, \ldots, \frac{\partial l}{\partial w_n} \right)$. Weight update moves in the reverse gradient direction: $W_{\mathrm{updated}} = W_{\mathrm{current}} - \eta \, \nabla_W l(W)$, where $\eta$ is the learning rate.
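A small NumPy sketch of this update rule, assuming a gradient function grad_l(W) is provided (e.g., by backpropagation); the toy loss below is only for illustration.

```python
import numpy as np

def gradient_step(W, grad_l, eta=0.1):
    """One gradient-descent update: W <- W - eta * grad_l(W) (eta: learning rate)."""
    return W - eta * grad_l(W)

# Toy example: l(W) = ||W||^2 has gradient 2W, so each step shrinks W toward 0.
W = np.array([1.0, -2.0])
W = gradient_step(W, grad_l=lambda W: 2 * W, eta=0.1)
print(W)  # [ 0.8 -1.6]
```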

Loss functions: probabilistic loss functions such as binary cross-entropy (logistic regression) and cross-entropy (multiclass), or mean squared error.
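Illustrative NumPy versions of these loss functions; the clipping constant eps for numerical stability is an added assumption.

```python
import numpy as np

def binary_cross_entropy(t, z, eps=1e-12):
    """Binary cross-entropy for logistic regression; t, z are in [0, 1]."""
    z = np.clip(z, eps, 1.0 - eps)
    return -np.mean(t * np.log(z) + (1.0 - t) * np.log(1.0 - z))

def cross_entropy(T, Z, eps=1e-12):
    """Multiclass cross-entropy; T is one-hot, Z is a softmax output of shape (samples, classes)."""
    return -np.mean(np.sum(T * np.log(np.clip(Z, eps, 1.0)), axis=1))

def mean_squared_error(T, Z):
    """Mean squared error between targets T and outputs Z."""
    return np.mean((T - Z) ** 2)
```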

Learning and convergence: by updating the weights along the gradient, the loss decreases and converges to a minimum (figure: successive weights w_0, w_1, w_2, w_3 descending the loss curve l(w)).

Learning through all training samples: after each weight update, new training samples are fed to the network to continue learning. When all training samples have been used, the network has completed one epoch; the network must run through many epochs to converge. Weight update strategies: stochastic gradient descent (SGD), batch update, and mini-batch (see the sketch below).
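A schematic mini-batch SGD training loop in NumPy; the helper loss_grad(W, X_batch, T_batch), the batch size, and the epoch count are assumptions for illustration.

```python
import numpy as np

def train_sgd(W, X, T, loss_grad, eta=0.01, batch_size=32, epochs=10):
    """Mini-batch SGD: one pass over all training samples = one epoch.
    batch_size=1 gives per-sample SGD; batch_size=len(X) gives full-batch updates."""
    n = len(X)
    for epoch in range(epochs):
        order = np.random.permutation(n)                # shuffle samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            W = W - eta * loss_grad(W, X[idx], T[idx])  # update on one mini-batch
    return W
```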

Momentum optimizer: learning may get stuck in a local minimum. Momentum: $\Delta w$ retains the latest optimization direction, which may help the optimizer escape the local minimum: $W_{\mathrm{updated}} = W_{\mathrm{current}} - \eta \, \nabla_W l(W) + \alpha \, \Delta w$, where $\eta$ is the learning rate and $\alpha$ the momentum parameter.
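A sketch of this momentum update in NumPy, reusing an assumed gradient function grad_l(W); the values of eta and alpha are illustrative.

```python
import numpy as np

def momentum_step(W, delta_w, grad_l, eta=0.1, alpha=0.9):
    """Momentum update: delta_w retains the latest direction, which can carry
    the optimizer past a local minimum.
        delta_w <- -eta * grad_l(W) + alpha * delta_w
        W       <- W + delta_w
    """
    delta_w = -eta * grad_l(W) + alpha * delta_w
    return W + delta_w, delta_w

# Toy loss l(W) = ||W||^2 with gradient 2W, run for a few steps.
W, delta_w = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(3):
    W, delta_w = momentum_step(W, delta_w, grad_l=lambda W: 2 * W)
```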

Overfitting and generalization: while training, model complexity increases with each epoch. Overfitting: the model becomes over-complex and generalizes poorly, with good performance on the training set but poor performance on the test set (figure: training loss keeps falling over the epochs while test loss and accuracy worsen).

Preventing overfitting, regularization: weight decay, weight noise, and early stopping. Early stopping: evaluate performance on a validation set and stop when there is no further improvement on it (figure: validation loss starts rising while training loss keeps falling). A sketch follows below.
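Early stopping of this kind is available in Keras as a callback; the monitored metric, patience value, and validation split shown are assumptions, not prescribed by the slides.

```python
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",         # watch the loss on the validation set
    patience=5,                 # stop after 5 epochs without improvement (assumed value)
    restore_best_weights=True,  # roll back to the best validation weights
)
# Usage (model, X_train, T_train defined elsewhere):
# model.fit(X_train, T_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```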

Preventing overfitting, regularization: dropout. Randomly drop neurons with a predefined probability. This is a good regularizer: it can be viewed as training a large ensemble of networks, and it also has a Bayesian interpretation.
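In Keras, dropout is inserted as a layer between Dense layers; the drop probability of 0.5 and the layer sizes below are assumed values for illustration.

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),   # randomly drops 50% of the activations during training
    tf.keras.layers.Dense(3, activation="softmax"),
])
```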

Adam optimizer: adapts the learning rate for each weight during training.

GPU implementation: practice with Keras + TensorFlow (see the sketch below).
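A minimal end-to-end sketch tying the pieces above together with the Adam optimizer; the synthetic data and all hyperparameters are illustrative assumptions (TensorFlow runs on the GPU automatically when one is available).

```python
import numpy as np
import tensorflow as tf

# Synthetic data: 1000 samples, 20 features, 3 classes (illustration only).
X = np.random.randn(1000, 20).astype("float32")
T = np.random.randint(0, 3, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # Adam: adaptive learning rates
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
model.fit(X, T, validation_split=0.2, epochs=20, batch_size=32,
          callbacks=[early_stop])
```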