
Deep Learning

Deep learning attracts lots of attention. I believe you have seen lots of exciting results before. [figure: Deep learning trends at Google. Source: SIGMOD / Jeff Dean]

Ups and downs of Deep Learning
1958: Perceptron (linear model)
1969: Perceptron has limitations
1980s: Multi-layer perceptron (not significantly different from DNNs today)
1986: Backpropagation (usually more than 3 hidden layers is not helpful)
1989: 1 hidden layer is "good enough", why deep?
2006: RBM initialization (breakthrough)
2009: GPU
2011: Starts to become popular in speech recognition
2012: Wins the ILSVRC image competition

Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function
Deep Learning is so simple.

Neural Network
[figure: a single neuron computes a weighted sum z of its inputs and passes it through an activation function; connecting many neurons gives a neural network]
Different connections lead to different network structures.
Network parameters θ: all the weights and biases in the neurons.

Fully Connected Feedforward Network
[figure: for the input (1, -1), the first neuron has weights (1, -2) and bias 1, so z = 4 and its output is σ(4) = 0.98; the second neuron has weights (-1, 1) and bias 0, so z = -2 and its output is σ(-2) = 0.12]
Sigmoid function: σ(z) = 1 / (1 + e^(-z))

Fully Connected Feedforward Network
[figure: the input (1, -1) flows through three layers of neurons, giving (0.98, 0.12) after the first layer, (0.86, 0.11) after the second, and the final output (0.62, 0.83)]

Fully Connected Feedforward Network
[figure: for the input (0, 0), the same network produces the output (0.51, 0.85)]
This is a function: input vector in, output vector out.
f([1, -1]) = [0.62, 0.83], f([0, 0]) = [0.51, 0.85]
Given a network structure, we define a function set.

Fully Connected Feedforward Network
[figure: Input Layer (x1 ... xN), Hidden Layers (Layer 1 ... Layer L), Output Layer (y1 ... yM); each node is a neuron, and every layer is fully connected to the next]

Deep = Many hidden layers
AlexNet (2012): 8 layers, 16.4% error
VGG (2014): 19 layers, 7.3% error
GoogleNet (2014): 22 layers, 6.7% error
http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf

Deep = Many hidden layers
Residual Net (2015): 152 layers, 3.57% error (special structure)
For comparison: AlexNet (2012) 16.4%, VGG (2014) 7.3%, GoogleNet (2014) 6.7%
[figure: Taipei 101 (101 floors) shown for scale]

Matrix Operation
The first layer of the earlier example, written as one matrix-vector multiplication plus a bias vector, followed by an element-wise sigmoid:
σ( [[1, -2], [-1, 1]] [1, -1]^T + [1, 0]^T ) = σ( [4, -2]^T ) = [0.98, 0.12]^T
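As a quick check of the matrix form above, a minimal NumPy sketch; the weights, biases, and input are the ones from the slide's example:

```python
import numpy as np

def sigmoid(z):
    # Element-wise sigmoid: sigma(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

W = np.array([[1.0, -2.0],
              [-1.0, 1.0]])   # one row of weights per neuron
b = np.array([1.0, 0.0])      # one bias per neuron
x = np.array([1.0, -1.0])     # input vector

a = sigmoid(W @ x + b)
print(a)                      # -> [0.98201379 0.11920292], i.e. roughly [0.98, 0.12]
```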

Neural Network
[figure: input x = (x1, ..., xN), output y = (y1, ..., yM); weight matrices W^1, W^2, ..., W^L and bias vectors b^1, b^2, ..., b^L between the layers]
a^1 = σ(W^1 x + b^1)
a^2 = σ(W^2 a^1 + b^2)
...
y = σ(W^L a^(L-1) + b^L)

Neural Network
y = f(x) = σ(W^L ... σ(W^2 σ(W^1 x + b^1) + b^2) ... + b^L)
The whole network is a chain of matrix operations, so parallel computing techniques can be used to speed them up.
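A minimal NumPy sketch of that nested computation, written as a loop over layers. The layer sizes and the randomly drawn parameters below are illustrative, not the lecture's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Compute y = f(x) = sigma(W^L ... sigma(W^2 sigma(W^1 x + b^1) + b^2) ... + b^L)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # one layer = one matrix operation + element-wise sigmoid
    return a

# Illustrative 2-3-3-2 network with random parameters (not from the lecture).
rng = np.random.default_rng(0)
sizes = [2, 3, 3, 2]
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.normal(size=m) for m in sizes[1:]]

print(forward(np.array([1.0, -1.0]), weights, biases))
```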

Output Layer
The hidden layers act as a feature extractor, replacing feature engineering; their outputs x1, ..., xK are the extracted features.
The output layer applies Softmax to these features, so it acts as a multi-class classifier producing y1, ..., yM.
[figure: Input Layer, Hidden Layers, Output Layer (= multi-class classifier)]
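Softmax turns the last layer's scores into values that are positive and sum to 1, so they can be read as per-class confidences. A minimal sketch (the input scores are made up for illustration):

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result is positive and sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([3.0, 1.0, -3.0])   # illustrative pre-softmax scores
print(softmax(scores))                # -> approximately [0.88, 0.12, 0.00]
```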

Example Application
Input: a 16 x 16 = 256-dimensional vector; each dimension is one pixel (ink = 1, no ink = 0).
Output: a 10-dimensional vector; each dimension represents the confidence that the image is a particular digit, e.g. y1 = 0.1 ("is 1"), y2 = 0.7 ("is 2"), ..., y10 = 0.2 ("is 0"), so the image is recognized as "2".
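To make the encoding concrete, a small sketch of flattening a 16 x 16 binary image into the 256-dim input and reading the predicted digit off a 10-dim output. The y1, y2, y10 values are the ones from the slide; the blank image and the remaining zeros are illustrative filler:

```python
import numpy as np

# A 16 x 16 image with 1 where there is ink and 0 where there is not
# (here just a blank image for illustration).
image = np.zeros((16, 16))
x = image.reshape(256)          # the 256-dim input vector

# A 10-dim output like the one on the slide: y2 = 0.7 is the largest,
# so the network says the image is a "2".
y = np.array([0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2])
digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]   # y1 means "is 1", ..., y10 means "is 0"
print("predicted digit:", digits[int(np.argmax(y))])   # -> 2
```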

Example Application
Handwriting Digit Recognition: the machine is a neural network.
What is needed is a function with
Input: a 256-dim vector
Output: a 10-dim vector (y1 "is 1", y2 "is 2", ..., y10 "is 0")

Example Application
[figure: Input Layer (x1 ... xN), Hidden Layers (Layer 1 ... Layer L), Output Layer (y1 ... y10, labelled "is 1" ... "is 0")]
A function set containing the candidates for Handwriting Digit Recognition.
You need to decide the network structure so that a good function is in your function set.
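As a hedged sketch of what deciding such a structure can look like in code: the lecture does not prescribe any toolkit, and tf.keras and the hidden-layer sizes below are my own illustrative choices, exactly the kind of structural decision described above.

```python
import tensorflow as tf

# 256-dim input (16 x 16 image), two hidden layers, 10-dim softmax output.
# Choosing the number of layers and neurons defines the function set;
# training then picks one function (one set of weights) out of it.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(256,)),
    tf.keras.layers.Dense(500, activation="sigmoid"),   # hidden-layer sizes are illustrative
    tf.keras.layers.Dense(500, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="softmax"),     # one output per digit
])
model.summary()
```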

FAQ
Q: How many layers? How many neurons for each layer? A: Trial and error + intuition.
Q: Can the structure be automatically determined? A: E.g. Evolutionary Artificial Neural Networks.
Q: Can we design the network structure? A: Convolutional Neural Network (CNN).

Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function
Deep Learning is so simple.

Loss for an Example
Given a set of parameters, the network maps the input x (256-dim) through Softmax to an output y (10-dim), which is compared against the target ŷ (e.g. for the digit "1": ŷ1 = 1, ŷ2 = 0, ..., ŷ10 = 0).
Cross entropy: C(y, ŷ) = -Σ_{i=1}^{10} ŷ_i ln y_i

Total Loss
For all training data x^1, x^2, ..., x^N with targets ŷ^1, ..., ŷ^N, the total loss is
L = Σ_{n=1}^{N} C^n
where C^n is the cross entropy between the network output y^n and the target ŷ^n.
Find a function in the function set that minimizes the total loss L; equivalently, find the network parameters θ that minimize L.
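A minimal NumPy sketch of these definitions: the cross entropy C(y, ŷ) = -Σ ŷ_i ln y_i for one example, and the total loss L as the sum over the training examples. The network outputs and targets below are made-up placeholders:

```python
import numpy as np

def cross_entropy(y, y_hat):
    # C(y, y_hat) = -sum_i y_hat_i * ln(y_i); a small epsilon avoids log(0).
    return -np.sum(y_hat * np.log(y + 1e-12))

# Made-up network outputs (after softmax) and one-hot targets for three examples.
outputs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.3, 0.3, 0.4]])
targets = np.array([[1, 0, 0],
                    [0, 1, 0],
                    [0, 0, 1]])

losses = [cross_entropy(y, y_hat) for y, y_hat in zip(outputs, targets)]
total_loss = sum(losses)          # L = sum_n C^n
print(losses, total_loss)
```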

Three Steps for Deep Learning
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function
Deep Learning is so simple.

Gradient Descent
θ = {w1, w2, ..., b1, ...}: pick initial values, then update each parameter with its partial derivative:
Compute ∂L/∂w1: w1 0.2 → 0.15, i.e. 0.15 = 0.2 - μ ∂L/∂w1
Compute ∂L/∂w2: w2 -0.1 → 0.05, i.e. 0.05 = -0.1 - μ ∂L/∂w2
Compute ∂L/∂b1: b1 0.3 → 0.2, i.e. 0.2 = 0.3 - μ ∂L/∂b1
∇L = [∂L/∂w1, ∂L/∂w2, ..., ∂L/∂b1, ...]^T is called the gradient.

Gradient Descent
Repeat the update, recomputing the gradient at each step:
w1: 0.2 → 0.15 → 0.09 → ...
w2: -0.1 → 0.05 → 0.15 → ...
b1: 0.3 → 0.2 → 0.10 → ...
Each arrow is one step of subtracting μ ∂L/∂w (or μ ∂L/∂b) from the parameter.
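A minimal sketch of that update rule, θ ← θ - μ ∂L/∂θ. The quadratic loss below is only for illustration; in the lecture, L is the total cross-entropy over the training data:

```python
import numpy as np

def loss(theta):
    # Toy loss with its minimum at theta = [1, -2] (illustrative only).
    return np.sum((theta - np.array([1.0, -2.0])) ** 2)

def gradient(theta):
    # Gradient of the toy loss above.
    return 2.0 * (theta - np.array([1.0, -2.0]))

theta = np.array([0.2, -0.1])   # initial parameter values
mu = 0.1                        # learning rate

for step in range(100):
    theta = theta - mu * gradient(theta)   # theta <- theta - mu * dL/dtheta

print(theta, loss(theta))        # theta approaches [1, -2], the loss approaches 0
```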

Gradient Descent
This is how machines learn in deep learning; even AlphaGo uses this approach.
What people imagine vs. what it actually is... I hope you are not too disappointed :p

Backpropagation
Backpropagation: an efficient way to compute ∂L/∂w in a neural network.
Toolkit: libdnn (developed by NTU student 周伯威)
Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/mlds_2015_2/lecture/dnn%20backprop.ecm.mp4/index.html
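The slide only names backpropagation; as a hedged illustration of what "efficiently computing ∂L/∂w" means, here is a minimal sketch of the chain rule for a one-hidden-layer network with a sigmoid hidden layer, a softmax output and cross-entropy loss. The sizes and values are made up, and this is not the libdnn implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
x = np.array([1.0, -1.0])              # input
t = np.array([0.0, 1.0, 0.0])          # one-hot target
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)

# Forward pass.
a1 = sigmoid(W1 @ x + b1)
y = softmax(W2 @ a1 + b2)
loss = -np.sum(t * np.log(y))

# Backward pass (chain rule). For softmax + cross entropy, dL/dz2 = y - t.
dz2 = y - t
dW2, db2 = np.outer(dz2, a1), dz2
da1 = W2.T @ dz2
dz1 = da1 * a1 * (1.0 - a1)            # sigmoid'(z1) = a1 * (1 - a1)
dW1, db1 = np.outer(dz1, x), dz1

# Numerical check of one entry of dW1 with a finite difference.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
loss_p = -np.sum(t * np.log(softmax(W2 @ sigmoid(W1p @ x + b1) + b2)))
print(dW1[0, 0], (loss_p - loss) / eps)   # the two numbers should nearly match
```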

Concluding Remarks
Step 1: define a set of functions (Neural Network)
Step 2: goodness of function
Step 3: pick the best function
What are the benefits of a deep architecture?

Deeper is Better?
Layers x Size | Word Error Rate (%)
1 x 2k        | 24.2
2 x 2k        | 20.4
3 x 2k        | 18.4
4 x 2k        | 17.8
5 x 2k        | 17.2
7 x 2k        | 17.1
1 x 3772      | 22.5
1 x 4634      | 22.6
1 x 16k       | 22.1
Not surprising: more parameters, better performance.
Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech, 2011.

Universality Theorem
Any continuous function f : R^N → R^M can be realized by a network with one hidden layer (given enough hidden neurons).
Reference for the reason: http://neuralnetworksanddeeplearning.com/chap4.html
So why a "deep" neural network and not a "fat" one? (next lecture)

Deep Learning: further resources
My course: Machine Learning and Having It Deep and Structured http://speech.ee.ntu.edu.tw/~tlkagk/courses_mlsd5_2.html
6-hour version: http://www.slideshare.net/tw_dsconf/ss-6224535
"Neural Networks and Deep Learning" written by Michael Nielsen http://neuralnetworksanddeeplearning.com/
"Deep Learning" written by Yoshua Bengio, Ian J. Goodfellow and Aaron Courville http://www.deeplearningbook.org

Acknowledgment
Thanks to Victor Chen for finding typos in the slides.