Deep Learning. Hung-yi Lee 李宏毅

Deep learning attracts lots of attention. I believe you have seen lots of exciting results before. [Figure: deep learning trends at Google. Source: SIGMOD 2016 / Jeff Dean]

Ups and downs of Deep Learning
1958: Perceptron (linear model)
1969: Perceptron has limitations
1980s: Multi-layer perceptron (not significantly different from today's DNNs)
1986: Backpropagation (usually more than 3 hidden layers is not helpful)
1989: 1 hidden layer is "good enough", so why go deep?
2006: RBM initialization
2009: GPU
2011: Starts to become popular in speech recognition
2012: Wins the ILSVRC image competition
2015.2: Image recognition surpassing human-level performance
2016.3: AlphaGo beats Lee Sedol
2016.10: Speech recognition systems as good as humans

Three Steps for Deep Learning. Step 1: define a set of functions (a Neural Network). Step 2: goodness of function. Step 3: pick the best function. Deep learning is so simple.

Neural Network. [Figure: single neurons, each computing a value z, connected into a neural network.] Different connections lead to different network structures. Network parameter θ: all the weights and biases in the neurons.

Fully Connected Feedforward Network. [Figure: worked example. Inputs 1 and -1. The top neuron has weights 1, -2 and bias 1, so z = 4 and its output is σ(4) = 0.98; the bottom neuron has weights -1, 1 and bias 0, so z = -2 and its output is σ(-2) = 0.12.] The activation is the sigmoid function σ(z) = 1 / (1 + e^(-z)).
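A minimal sketch (not from the slides) of this sigmoid activation in Python/NumPy, reproducing the two worked values above; the function name is my own choice.

    import numpy as np

    def sigmoid(z):
        # sigma(z) = 1 / (1 + e^(-z))
        return 1.0 / (1.0 + np.exp(-z))

    print(sigmoid(4.0))   # ~0.982
    print(sigmoid(-2.0))  # ~0.119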

Fully Connected Feedforward Network. [Figure: the full network. Input (1, -1) gives first-layer outputs (0.98, 0.12), second-layer outputs (0.86, 0.11), and final outputs (0.62, 0.83).]

Fully Connected Feedforward Network. [Figure: the same network fed with input (0, 0), producing final outputs (0.51, 0.85).] This is a function with a vector input and a vector output: f([1, -1]) = [0.62, 0.83], f([0, 0]) = [0.51, 0.85]. Given a network structure, we define a function set.

Fully Connected Feedforward Network. [Figure: input x1, x2, ..., xN feeds Layer 1, then Layer 2, ..., up to Layer L, which produces the output y1, y2, ..., yM. The first layer is the Input Layer, the intermediate layers are the Hidden Layers, and the last layer is the Output Layer; each node is a neuron.]

Deep = Many hidden layers. AlexNet (2012): 8 layers, 16.4% error; VGG (2014): 19 layers, 7.3%; GoogLeNet (2014): 22 layers, 6.7%. (Figure from http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)

Deep = Many hidden layers. Residual Net (2015): 152 layers with a special structure, 3.57% error, compared with AlexNet (2012) at 16.4%, VGG (2014) at 7.3%, and GoogLeNet (2014) at 6.7%. [The slide compares the 152-layer network with the 101 floors of Taipei 101.] Ref: https://www.youtube.com/watch?v=dxb6299gpvi

Matrix Operation. The first-layer computation in the worked example can be written as one matrix operation:
σ( [1 -2; -1 1] [1; -1] + [1; 0] ) = σ( [4; -2] ) = [0.98; 0.12]
so y1 = 0.98 and y2 = 0.12.
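A sketch of the same matrix operation in NumPy, reproducing the σ([4; -2]) = [0.98; 0.12] result above.

    import numpy as np

    W = np.array([[1., -2.],
                  [-1.,  1.]])
    b = np.array([1., 0.])
    x = np.array([1., -1.])

    z = W @ x + b                  # [4, -2]
    a = 1.0 / (1.0 + np.exp(-z))   # elementwise sigmoid
    print(z, a)                    # [ 4. -2.] [0.982 0.119]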

Neural Network. With input x = (x1, ..., xN), weight matrices W1, W2, ..., WL and bias vectors b1, b2, ..., bL, each layer is a matrix operation:
a1 = σ(W1 x + b1)
a2 = σ(W2 a1 + b2)
...
y = σ(WL aL-1 + bL)

Neural Network. Composing the layers gives the whole network as one function:
y = f(x) = σ(WL ... σ(W2 σ(W1 x + b1) + b2) ... + bL)
Since every layer is a matrix operation, parallel computing techniques can be used to speed it up.
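A hedged sketch of this layer-by-layer forward pass in NumPy; the layer sizes and the random initialization are illustrative assumptions, not part of the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, weights, biases):
        # a1 = sigma(W1 x + b1), ..., y = sigma(WL aL-1 + bL)
        a = x
        for W, b in zip(weights, biases):
            a = sigmoid(W @ a + b)
        return a

    # Illustrative 256 -> 500 -> 500 -> 10 network with random parameters.
    sizes = [256, 500, 500, 10]
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((m, n)) * 0.1 for n, m in zip(sizes[:-1], sizes[1:])]
    biases  = [np.zeros(m) for m in sizes[1:]]

    y = forward(rng.standard_normal(256), weights, biases)
    print(y.shape)  # (10,)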

Output Layer as Multi-Class Classifier. [Figure: input x1, x2, ..., xK, hidden layers, then a softmax output layer producing y1, ..., yM.] The hidden layers act as a feature extractor that replaces feature engineering; the output layer, with softmax, acts as a multi-class classifier.
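A minimal sketch of the softmax used in the output layer; subtracting the maximum is a standard numerical-stability detail that is not on the slide.

    import numpy as np

    def softmax(z):
        # Exponentiate and normalize so the outputs are positive and sum to 1.
        e = np.exp(z - np.max(z))   # subtract max for numerical stability
        return e / e.sum()

    print(softmax(np.array([3., 1., -3.])))  # ~[0.88, 0.12, 0.00]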

Example Application. Input: a handwritten digit image of 16 x 16 = 256 pixels, represented as a 256-dim vector x (ink = 1, no ink = 0). Output: a 10-dim vector y, where each dimension represents the confidence of a digit, e.g. y1 = 0.1 ("is 1"), y2 = 0.7 ("is 2"), ..., y10 = 0.2 ("is 0"). Here the image is "2" because y2 has the highest confidence.
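A small sketch of this input/output encoding, assuming NumPy; the blank image is hypothetical, and the confidence vector reuses the values quoted above.

    import numpy as np

    image = np.zeros((16, 16))   # hypothetical binary image: 1 = ink, 0 = no ink
    x = image.reshape(256)       # 256-dim input vector

    # Output confidences from the slide's example (y2 = 0.7 is the largest).
    y = np.array([0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2])
    digits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
    print("The image is", digits[int(np.argmax(y))])   # The image is 2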

Example Application: Handwriting Digit Recognition. The machine (a neural network) takes the image and outputs "2". What is needed is a function whose input is a 256-dim vector and whose output is a 10-dim vector.

Example Application. [Figure: input x1, ..., xN feeds Layer 1, Layer 2, ..., Layer L, producing outputs y1 ("is 1"), y2 ("is 2"), ..., y10 ("is 0").] The network structure gives a function set containing the candidates for handwriting digit recognition. You need to decide the network structure so that a good function is in your function set.

FAQ. Q: How many layers? How many neurons for each layer? A: Trial and error plus intuition. Q: Can the structure be automatically determined? A: E.g. evolutionary artificial neural networks. Q: Can we design the network structure? A: E.g. Convolutional Neural Network (CNN).

Three Steps for Deep Learning. Step 1: define a set of functions (a Neural Network). Step 2: goodness of function. Step 3: pick the best function. Deep learning is so simple.

Loss for an Example. Given a set of parameters, feed an example x (a 256-dim vector, here an image of "1") through the network to get the softmax output y = (y1, ..., y10), and compare it with the one-hot target ŷ (here ŷ1 = 1 and all other components are 0). The loss is the cross entropy
l(y, ŷ) = -Σ_{i=1}^{10} ŷ_i ln y_i

Total Loss. For all N training examples x1, x2, ..., xN with network outputs y1, ..., yN and targets ŷ1, ..., ŷN, the total loss is
L = Σ_{n=1}^{N} l_n
where l_n is the cross entropy for example n. Find a function in the function set that minimizes the total loss L; equivalently, find the network parameters θ* that minimize L.
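A sketch of the per-example cross entropy and the total loss over a training set, assuming NumPy; the predictions and one-hot targets below are invented for illustration.

    import numpy as np

    def cross_entropy(y, y_hat):
        # l(y, y_hat) = -sum_i y_hat_i * ln(y_i); y is the softmax output, y_hat the one-hot target
        return -np.sum(y_hat * np.log(y + 1e-12))   # small epsilon avoids log(0)

    # Two invented training examples (network outputs and one-hot targets).
    y1, t1 = np.array([0.7, 0.2, 0.1]), np.array([1., 0., 0.])
    y2, t2 = np.array([0.1, 0.8, 0.1]), np.array([0., 1., 0.])

    L = cross_entropy(y1, t1) + cross_entropy(y2, t2)   # total loss over the training set
    print(L)   # ~0.58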

Three Steps for Deep Learning Step : define Neural a set of Network function Step 2: goodness of function Step 3: pick the best function Deep Learning is so simple

Gradient Descent. Start from random initial parameter values θ, e.g. w1 = 0.2, w2 = -0.1, b1 = 0.3. Compute the partial derivative of L with respect to each parameter and take a small step against it:
w1 <- w1 - μ ∂L/∂w1 (e.g. 0.2 -> 0.15)
w2 <- w2 - μ ∂L/∂w2 (e.g. -0.1 -> 0.05)
b1 <- b1 - μ ∂L/∂b1 (e.g. 0.3 -> 0.2)
where μ is the learning rate. The vector of all these partial derivatives is the gradient, ∇L = (∂L/∂w1, ∂L/∂w2, ..., ∂L/∂b1, ...). Then repeat: compute the gradient at the new parameters and update again (e.g. 0.15 -> 0.09, 0.05 -> 0.15, 0.2 -> 0.10), and keep iterating.

Gradient Descent. This is how machines learn in deep learning; even AlphaGo uses this approach. [Figure: what people imagine the learning looks like vs. what it actually is.] I hope you are not too disappointed :p

Backpropagation. Backpropagation is an efficient way to compute the gradients ∂L/∂w in a neural network. Toolkit: libdnn (developed by 周伯威, a student at National Taiwan University). Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
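A hedged sketch of backpropagation for a one-hidden-layer sigmoid network with squared error, showing how ∂L/∂W is computed layer by layer; the architecture, data, and learning rate are invented for illustration, and real toolkits (such as the libdnn mentioned above) implement this far more generally.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((3, 2)) * 0.5, np.zeros(3)
    W2, b2 = rng.standard_normal((1, 3)) * 0.5, np.zeros(1)
    x, t = np.array([1.0, -1.0]), np.array([1.0])   # invented input and target

    for step in range(100):
        # Forward pass.
        a1 = sigmoid(W1 @ x + b1)
        y  = sigmoid(W2 @ a1 + b2)
        # Backward pass: propagate dL/dz from the output back through each layer.
        delta2 = (y - t) * y * (1 - y)              # squared-error loss, sigmoid output
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)
        # Gradient descent update with learning rate 0.5.
        W2 -= 0.5 * np.outer(delta2, a1); b2 -= 0.5 * delta2
        W1 -= 0.5 * np.outer(delta1, x);  b1 -= 0.5 * delta1
    print(y)   # moves toward the target 1.0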

Three Steps for Deep Learning. Step 1: define a set of functions (a Neural Network). Step 2: goodness of function. Step 3: pick the best function. Deep learning is so simple.

Acknowledgment. Thanks to Victor Chen for finding typos in the slides.