Recurrent Neural Networks. deeplearning.ai. Why sequence models?


Transcription:

Recurrent Neural Networks deeplearning.ai Why sequence models?

Examples of sequence data
- Speech recognition: audio clip → "The quick brown fox jumped over the lazy dog."
- Music generation: nothing (or a genre) → music
- Sentiment classification: "There is nothing to like in this movie." → star rating
- DNA sequence analysis: AGCCCCTGTGAGGAACTAG → the same sequence with the region of interest labeled
- Machine translation: "Voulez-vous chanter avec moi?" → "Do you want to sing with me?"
- Video activity recognition: video frames → "Running"
- Named entity recognition: "Yesterday, Harry Potter met Hermione Granger." → the same sentence with the people's names identified

Recurrent Neural Networks deeplearning.ai Notation

Motivating example (named entity recognition) x: Harry Potter and Hermione Granger invented a new spell. y: 1 1 0 1 1 0 0 0 0 (for each word, is it part of a person's name?)

Representing words x: Harry Potter and Hermione Granger invented a new spell. The inputs are indexed x^{<1>}, x^{<2>}, ..., x^{<9>}. Using vocabulary indices such as a = 1, and = 367, Granger = 4000, Harry = 4075, Hermione = 4200, Invented = 4700, New = 5976, Potter = 6830, Spell = 8376, each x^{<t>} is a one-hot vector: all zeros except a 1 at the position of that word's index.
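As a quick illustration, a one-hot encoder in numpy might look like this (a minimal sketch; the 10,000-word vocabulary size matches the lecture's running example, and the index dictionary just hard-codes the slide's values):

```python
import numpy as np

# Illustrative vocabulary indices from the slide; a real model would
# build this dictionary from the training corpus.
vocab_index = {"a": 1, "and": 367, "granger": 4000, "harry": 4075,
               "hermione": 4200, "invented": 4700, "new": 5976,
               "potter": 6830, "spell": 8376}
vocab_size = 10000

def one_hot(word):
    v = np.zeros((vocab_size, 1))
    v[vocab_index[word]] = 1.0   # 1 at the word's index, 0 elsewhere
    return v

x1 = one_hot("harry")            # x<1> for "Harry"
```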

Recurrent Neural Networks deeplearning.ai Recurrent Neural Network Model

Why not a standard network? (Feed x^{<1>}, ..., x^{<Tx>} into a fully connected network that outputs y^{<1>}, ..., y^{<Ty>}.) Problems: - Inputs and outputs can be different lengths in different examples. - It doesn't share features learned across different positions of the text.

Recurrent Neural Networks He said, "Teddy Roosevelt was a great President." He said, "Teddy bears are on sale!" (Looking only at the words up to "Teddy", the network cannot tell whether "Teddy" is part of a person's name.)

Forward Propagation [diagram: an unrolled RNN; starting from a^{<0>} (a vector of zeros), each step takes x^{<t>} and a^{<t-1>} and produces a^{<t>} and a prediction ŷ^{<t>}, for t = 1, ..., Tx]

Simplified RNN notation
a^{<t>} = g(W_{aa} a^{<t-1>} + W_{ax} x^{<t>} + b_a)
ŷ^{<t>} = g(W_{ya} a^{<t>} + b_y)
In the simplified notation, W_{aa} and W_{ax} are stacked into a single matrix W_a and the vectors a^{<t-1>}, x^{<t>} are stacked on top of each other, giving a^{<t>} = g(W_a [a^{<t-1>}, x^{<t>}] + b_a) and ŷ^{<t>} = g(W_y a^{<t>} + b_y).
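To make the recurrence concrete, here is a minimal numpy sketch of one forward step (the function name, the tanh/softmax activation choices, and the shapes are illustrative assumptions, not code from the course):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One RNN step in the notation above: a_prev is a<t-1> (n_a x 1),
# x_t is x<t> (n_x x 1); returns a<t> and the prediction y_hat<t>.
def rnn_step(a_prev, x_t, Waa, Wax, Wya, ba, by):
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)  # a<t> = g(Waa a<t-1> + Wax x<t> + ba)
    y_t = softmax(Wya @ a_t + by)                 # y_hat<t> = g(Wya a<t> + by)
    return a_t, y_t
```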

Recurrent Neural Networks deeplearning.ai Backpropagation through time

Forward propagation and backpropagation [diagram: the forward pass computes a^{<1>}, ..., a^{<Tx>} and predictions ŷ^{<1>}, ..., ŷ^{<Ty>} from x^{<1>}, ..., x^{<Tx>} left to right; backpropagation follows the same graph in reverse]

Forward propagation and backpropagation
L^{<t>}(ŷ^{<t>}, y^{<t>}) = -y^{<t>} log ŷ^{<t>} - (1 - y^{<t>}) log(1 - ŷ^{<t>})
L(ŷ, y) = Σ_{t=1}^{Ty} L^{<t>}(ŷ^{<t>}, y^{<t>})
Backpropagation through time: the gradients of L flow right to left through the unrolled network, i.e. backwards in time.

Recurrent Neural Networks deeplearning.ai Different types of RNNs

Examples of sequence data (same slide as before): across these applications the input length Tx and the output length Ty can be equal, different, or one of the two can be absent.

Examples of RNN architectures [diagrams only; the architectures are summarized on the next slide]

Summary of RNN types
- One to one: a standard network, no sequence.
- One to many: one input, a sequence of outputs ŷ^{<1>}, ..., ŷ^{<Ty>} (e.g. music generation).
- Many to one: a sequence of inputs x^{<1>}, ..., x^{<Tx>}, one output (e.g. sentiment classification).
- Many to many with Tx = Ty: one output per input (e.g. named entity recognition).
- Many to many with Tx ≠ Ty: an encoder reads the whole input, then a decoder produces the output (e.g. machine translation).

Recurrent Neural Networks deeplearning.ai Language model and sequence generation

What is language modelling? In speech recognition, "The apple and pair salad." and "The apple and pear salad." sound identical; a language model resolves the ambiguity by assigning each sentence a probability, P(The apple and pair salad) versus P(The apple and pear salad), so the recognizer can pick the far more likely second sentence.

Language modelling with an RNN. Training set: a large corpus of English text, e.g. "Cats average 15 hours of sleep a day. <EOS>" or "The Egyptian Mau is a bread of cat. <EOS>" (each sentence is tokenized and an end-of-sentence token <EOS> is appended; a word outside the vocabulary, such as "Mau", becomes <UNK>).

RNN model. Trained on "Cats average 15 hours of sleep a day. <EOS>", the RNN at each step predicts the next word given the words so far, with the softmax cross-entropy loss
L^{<t>}(ŷ^{<t>}, y^{<t>}) = -Σ_i y_i^{<t>} log ŷ_i^{<t>}
L = Σ_t L^{<t>}(ŷ^{<t>}, y^{<t>})
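A minimal sketch of this sequence loss (names are illustrative assumptions: probs[t] is the softmax output ŷ^{<t>} over the vocabulary, targets[t] is the index of the true next word):

```python
import numpy as np

# Cross-entropy at each step, summed over the sequence; with one-hot
# targets, -sum_i y_i log(y_hat_i) reduces to -log of the probability
# assigned to the correct word.
def sequence_loss(probs, targets):
    return sum(-np.log(probs[t][targets[t]]) for t in range(len(targets)))
```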

Recurrent Neural Networks deeplearning.ai Sampling novel sequences

Sampling a sequence from a trained RNN [diagram: start from a^{<0>} = 0 and x^{<1>} = 0, sample a word from the first softmax output ŷ^{<1>}, feed the sampled word back in as x^{<2>}, and keep going until an <EOS> token is sampled]

Character-level language model. Instead of the word-level vocabulary [a, aaron, ..., zulu, <UNK>], use a character vocabulary (letters, digits, space, punctuation); the model then predicts one character at a time, and there are no unknown words.
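A minimal sampling loop, reusing the rnn_step sketch from earlier (the params dictionary, eos_idx, and max_len are illustrative assumptions):

```python
import numpy as np

def sample(params, vocab_size, eos_idx, max_len=50):
    a = np.zeros((params["Waa"].shape[0], 1))   # a<0> = 0
    x = np.zeros((vocab_size, 1))               # x<1> = 0
    indices = []
    for _ in range(max_len):
        a, y = rnn_step(a, x, params["Waa"], params["Wax"],
                        params["Wya"], params["ba"], params["by"])
        idx = np.random.choice(vocab_size, p=y.ravel())  # sample from y_hat<t>
        indices.append(idx)
        if idx == eos_idx:                      # stop at <EOS>
            break
        x = np.zeros((vocab_size, 1))
        x[idx] = 1.0                            # feed the sampled word back in
    return indices
```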

Sequence generation (samples from trained models)
News: "President enrique peña nieto, announced sench's sulk former coming football langston paring." "I was not at all surprised," said hich langston. "Concussion epidemic, to be examined." "The gray football the told some and this has on the uefa icon, should money as."
Shakespeare: "The mortal moon hath her eclipse in love. And subject of this thou art another this fold. When besser be my love to me see sabl's. For whose are ruse of mine eyes heaves."

Recurrent Neural Networks deeplearning.ai Vanishing gradients with RNNs

Vanishing gradients with RNNs [diagram: a long unrolled RNN; it is hard for the gradient of a late output such as ŷ^{<Ty>} to propagate back and influence computations at much earlier time steps, so long-range dependencies like "The cat ... was" vs "The cats ... were" are hard to learn]. Exploding gradients also occur; a standard remedy there is gradient clipping.
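Gradient clipping is simple enough to sketch (a minimal illustration; element-wise clipping to a fixed range is one common variant, and the lecture does not fix a specific scheme):

```python
import numpy as np

# Rescale exploding gradients: clip every entry of every gradient
# array into [-maxval, maxval] before the parameter update.
def clip_gradients(grads, maxval=5.0):
    return {name: np.clip(g, -maxval, maxval) for name, g in grads.items()}
```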

Recurrent Neural Networks deeplearning.ai Gated Recurrent Unit (GRU)

RNN unit a^{<t>} = g(W_a [a^{<t-1>}, x^{<t>}] + b_a)

GRU (simplified) The cat, which already ate, was full. [Cho et al., 2014. On the properties of neural machine translation: Encoder-decoder approaches] [Chung et al., 2014. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling]

Full GRU (adds a relevance gate Γ_r to the simplified version):
c̃^{<t>} = tanh(W_c [Γ_r * c^{<t-1>}, x^{<t>}] + b_c)
Γ_u = σ(W_u [c^{<t-1>}, x^{<t>}] + b_u)
Γ_r = σ(W_r [c^{<t-1>}, x^{<t>}] + b_r)
c^{<t>} = Γ_u * c̃^{<t>} + (1 - Γ_u) * c^{<t-1>}
a^{<t>} = c^{<t>}
The cat, which ate already, was full.
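A minimal numpy sketch of one full GRU step in the notation above (the weight names and shapes are illustrative assumptions; each W maps the stacked vector to the hidden size):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_prev, x_t, Wc, Wu, Wr, bc, bu, br):
    concat = np.vstack([c_prev, x_t])                      # [c<t-1>, x<t>]
    gamma_u = sigmoid(Wu @ concat + bu)                    # update gate
    gamma_r = sigmoid(Wr @ concat + br)                    # relevance gate
    c_tilde = np.tanh(Wc @ np.vstack([gamma_r * c_prev, x_t]) + bc)
    c_t = gamma_u * c_tilde + (1.0 - gamma_u) * c_prev     # gated memory update
    return c_t                                             # a<t> = c<t>
```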

Recurrent Neural Networks deeplearning.ai LSTM (long short-term memory) unit

GRU and LSTM
GRU (recap):
c̃^{<t>} = tanh(W_c [Γ_r * c^{<t-1>}, x^{<t>}] + b_c)
Γ_u = σ(W_u [c^{<t-1>}, x^{<t>}] + b_u)
Γ_r = σ(W_r [c^{<t-1>}, x^{<t>}] + b_r)
c^{<t>} = Γ_u * c̃^{<t>} + (1 - Γ_u) * c^{<t-1>}
a^{<t>} = c^{<t>}
[Hochreiter & Schmidhuber 1997. Long short-term memory]

LSTM units
In the LSTM, the memory cell c^{<t>} and the activation a^{<t>} are no longer equal: a separate forget gate Γ_f replaces the (1 - Γ_u) factor of the GRU above, and an output gate Γ_o produces a^{<t>}:
c̃^{<t>} = tanh(W_c [a^{<t-1>}, x^{<t>}] + b_c)
Γ_u = σ(W_u [a^{<t-1>}, x^{<t>}] + b_u)
Γ_f = σ(W_f [a^{<t-1>}, x^{<t>}] + b_f)
Γ_o = σ(W_o [a^{<t-1>}, x^{<t>}] + b_o)
c^{<t>} = Γ_u * c̃^{<t>} + Γ_f * c^{<t-1>}
a^{<t>} = Γ_o * tanh(c^{<t>})
[Hochreiter & Schmidhuber 1997. Long short-term memory]
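A minimal sketch of one LSTM step in the notation above (weight names and the step signature are illustrative assumptions; the tanh on the cell state in the output matches the pictures slide below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(a_prev, c_prev, x_t, Wc, Wu, Wf, Wo, bc, bu, bf, bo):
    concat = np.vstack([a_prev, x_t])        # [a<t-1>, x<t>]
    c_tilde = np.tanh(Wc @ concat + bc)      # candidate memory
    gamma_u = sigmoid(Wu @ concat + bu)      # update gate
    gamma_f = sigmoid(Wf @ concat + bf)      # forget gate
    gamma_o = sigmoid(Wo @ concat + bo)      # output gate
    c_t = gamma_u * c_tilde + gamma_f * c_prev
    a_t = gamma_o * np.tanh(c_t)
    return a_t, c_t
```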

LSTM in pictures [diagram: a single LSTM unit with its forget, update, and output gates, and three units chained in series; the cell state c^{<t>} runs along the top of the chain, so a value stored early can be passed to much later time steps largely unchanged, with softmax outputs ŷ^{<t>} read from each a^{<t>}]

Recurrent Neural Networks deeplearning.ai Bidirectional RNN

Getting information from the future He said, "Teddy bears are on sale!" He said, "Teddy Roosevelt was a great President." [diagram: a forward-only RNN over the sentence; having read only "He said, Teddy", it cannot tell whether "Teddy" is part of a person's name, because that depends on the words still to come]

Bidirectional RNN (BRNN): one RNN reads the sequence left to right (activations a→^{<t>}) and another reads it right to left (a←^{<t>}); the prediction at each step uses both: ŷ^{<t>} = g(W_y [a→^{<t>}, a←^{<t>}] + b_y).
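A minimal sketch of a bidirectional pass (all names are illustrative assumptions; step_fwd and step_bwd are any recurrent step functions, e.g. RNN, GRU, or LSTM cells, that map (a_prev, x_t) to the new activation):

```python
import numpy as np

def brnn_forward(xs, step_fwd, step_bwd, Wy, by, n_a):
    T = len(xs)
    a_fwd, a_bwd = [None] * T, [None] * T
    a = np.zeros((n_a, 1))
    for t in range(T):                       # left-to-right pass
        a = step_fwd(a, xs[t])
        a_fwd[t] = a
    a = np.zeros((n_a, 1))
    for t in reversed(range(T)):             # right-to-left pass
        a = step_bwd(a, xs[t])
        a_bwd[t] = a
    # Linear scores Wy [a_fwd<t>, a_bwd<t>] + by; apply g (e.g. softmax)
    # to each to obtain y_hat<t>.
    return [Wy @ np.vstack([a_fwd[t], a_bwd[t]]) + by for t in range(T)]
```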

Recurrent Neural Networks deeplearning.ai Deep RNNs

Deep RNN example: stack several recurrent layers, with a^{[l]<t>} denoting the activation of layer l at time t. Each layer takes its own previous activation and the activation of the layer below at the current time step: a^{[l]<t>} = g(W_a^{[l]} [a^{[l]<t-1>}, a^{[l-1]<t>}] + b_a^{[l]}), with a^{[0]<t>} = x^{<t>}.
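A minimal sketch of the stacked forward pass (parameter names, the shared hidden size n_a, and the per-layer step signature are illustrative assumptions):

```python
import numpy as np

def deep_rnn_forward(xs, layer_steps, n_a):
    # layer_steps: one recurrent step function per layer, each mapping
    # (a_prev, inp) to the new activation of that layer.
    L, T = len(layer_steps), len(xs)
    a = [np.zeros((n_a, 1)) for _ in range(L)]   # a[l]<0> = 0
    outputs = []
    for t in range(T):
        inp = xs[t]                  # a[0]<t> = x<t>
        for l, step in enumerate(layer_steps):
            a[l] = step(a[l], inp)   # a[l]<t> from a[l]<t-1> and layer below
            inp = a[l]               # feed upward to the next layer
        outputs.append(inp)          # top-layer activation at time t
    return outputs
```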