Gated Recurrent Networks
1 Gated Recurrent Networks Davide Bacciu Dipartimento di Informatica Università di Pisa Intelligent Systems for Pattern Recognition (ISPR)
2 Dealing with Sequences in NN: variable-size data describing sequentially dependent information over time steps t = 0, 1, 2, ..., N. Neural models need to capture a dynamic context c_t to perform predictions. Recurrent Neural Networks: fully adaptive (Elman, SRN, ...) or randomized approaches (Reservoir Computing). This lecture introduces (deep) gated recurrent networks.
3 Lecture Outline: RNN recap and motivations; learning long-term dependencies is difficult (gradient issues); gated RNNs: Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU); advanced topics: understanding and exploiting memory encoding.
4 Unfolding RNN (Forward Pass): by now you should be familiar with the concept of model unfolding/unrolling on the data (graphics: colah.github.io). [Figure: the recurrent model unfolded over the data x_0, x_1, x_2, ..., x_t into a memory encoding.] Unfolding maps an arbitrary-length sequence x_0, ..., x_t to a fixed-length encoding h_t.
5 Supervised Recurrent Tasks: depending on how inputs, hidden states and outputs are arranged over time, RNNs address element-to-element, sequence-to-item, item-to-sequence and sequence-to-sequence tasks (graphics: karpathy.github.io).
6 A Non-Gated RNN (a.k.a. Vanilla): $y_t = f(W_{out} h_t + b_{out})$, with $h_t = \tanh(g_t)$ and $g_t(h_{t-1}, x_t) = W_h h_{t-1} + W_{in} x_t + b_h$.
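These equations map directly onto a few lines of code. A minimal NumPy sketch of the forward pass follows; the weight names (`W_in`, `W_h`, `W_out`) mirror the slide, while the identity readout `f` and the zero initial state are illustrative assumptions.

```python
import numpy as np

def vanilla_rnn_forward(X, W_in, W_h, b_h, W_out, b_out):
    """Run a vanilla (non-gated) RNN over a sequence.

    X: array of shape (T, input_dim), one input vector per time step.
    Returns the outputs y_1..y_T and the final hidden state h_T.
    """
    hidden_dim = W_h.shape[0]
    h = np.zeros(hidden_dim)               # h_0: initial memory state
    outputs = []
    for x_t in X:
        g_t = W_h @ h + W_in @ x_t + b_h   # pre-activation g_t(h_{t-1}, x_t)
        h = np.tanh(g_t)                   # new hidden state h_t
        y_t = W_out @ h + b_out            # linear readout (f = identity here)
        outputs.append(y_t)
    return np.stack(outputs), h
```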
7 Learning to Encode Input History: the hidden state $h_t$ summarizes information on the history of the input signal up to time $t$.
8 Learning Long-Term Dependencies is Difficult: when the time gap between an observation and the state grows, little residual information about that input remains in the memory. What is the cause? (J. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen, TUM, 1991)
9 Exploding/Vanishing Gradient. Short story: gradients propagated over many stages tend to vanish (often), yielding no learning, or explode (rarely), yielding instability and oscillations. Weights are shared between time steps, so gradient contributions are summed through time. [Figure: unrolled network showing $\frac{\partial L_t}{\partial W}$ accumulating terms $\frac{\partial h_{l+1}}{\partial h_l}$ along the chain $h_1, \ldots, h_t$.] (Bengio, Simard and Frasconi, Learning long-term dependencies with gradient descent is difficult, TNN, 1994)
10 A Closer Look at the Gradient:
$$\frac{\partial L_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t}\,\frac{\partial h_t}{\partial h_k}\,\frac{\partial h_k}{\partial W}$$
Since $W$ is a parameter matrix, the factor $\frac{\partial h_t}{\partial h_k}$ is a Jacobian; inside it the chain rule gives $\frac{\partial h_t}{\partial h_k} = \frac{\partial h_t}{\partial h_{t-1}} \frac{\partial h_{t-1}}{\partial h_{t-2}} \cdots \frac{\partial h_{k+1}}{\partial h_k}$, hence
$$\frac{\partial L_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t} \left( \prod_{l=k}^{t-1} \frac{\partial h_{l+1}}{\partial h_l} \right) \frac{\partial h_k}{\partial W}.$$
The gradient is a recursive product of hidden-activation gradients (Jacobians).
11 Bounding the Gradient (I): given $h_l = \tanh(W_h h_{l-1} + W_{in} x_l)$, then $\frac{\partial h_{l+1}}{\partial h_l} = D_{l+1} W_h^\top$, where the activation Jacobian is $D_{l+1} = \mathrm{diag}\big(1 - \tanh^2(W_h h_l + W_{in} x_{l+1})\big)$. Hence
$$\frac{\partial L_t}{\partial h_k} = \frac{\partial L_t}{\partial h_t} \prod_{l=k}^{t-1} \frac{\partial h_{l+1}}{\partial h_l} = \frac{\partial L_t}{\partial h_t} \prod_{l=k}^{t-1} D_{l+1} W_h^\top.$$
We are interested in the gradient magnitude $\left\| \frac{\partial L_t}{\partial h_k} \right\|$.
12 Bounding the Gradient (II):
$$\left\|\frac{\partial L_t}{\partial h_k}\right\| = \left\|\frac{\partial L_t}{\partial h_t} \prod_{l=k}^{t-1} D_{l+1} W_h^\top\right\| \le \left\|\frac{\partial L_t}{\partial h_t}\right\| \prod_{l=k}^{t-1} \sigma(D_{l+1})\,\sigma(W_h^\top)$$
Bounded by the spectral radius $\sigma$. The product can shrink to zero or increase exponentially depending on the spectral properties: $\sigma < 1$ gives vanishing, $\sigma > 1$ gives exploding gradients.
13 Gradient Clipping for Exploding Gradients: take $g = \frac{\partial L_t}{\partial W}$; if $\|g\| > \theta_0$ then rescale $g \leftarrow \frac{\theta_0}{\|g\|} g$. Rescaling does not work for gradient vanishing, since the total gradient is a sum of time-dependent contributions, $\frac{\partial L_t}{\partial W} = \sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t}\frac{\partial h_t}{\partial h_k}\frac{\partial h_k}{\partial W}$, and preserving the relative contribution of each term keeps the exponentially decayed ones exponentially small.
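In practice, most frameworks already ship this rescaling. A minimal PyTorch sketch of where it goes, between the backward pass and the optimizer step; the tiny RNN, readout, data and threshold value are illustrative assumptions, only `clip_grad_norm_` is the actual library call.

```python
import torch
import torch.nn as nn

model = nn.RNN(input_size=8, hidden_size=32)     # illustrative recurrent model
readout = nn.Linear(32, 1)
optimizer = torch.optim.Adam(list(model.parameters()) + list(readout.parameters()))
criterion = nn.MSELoss()
theta = 1.0                                      # clipping threshold θ0 (hyperparameter)

x = torch.randn(100, 4, 8)                       # (time, batch, features)
y = torch.randn(100, 4, 1)

optimizer.zero_grad()
out, _ = model(x)
loss = criterion(readout(out), y)
loss.backward()                                  # gradients g = ∂L/∂W are now populated
# rescale g so that ||g|| <= θ0 (no-op when the norm is already below the threshold)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=theta)
optimizer.step()
```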
14 Tackling Gradient Issues: the solution seems to be having a Jacobian with $\sigma = 1$ (which activation function?). A linear activation is dominated by the eigenvalues of $W_h$ raised to the power of $t$; a linear state with weight 1 (state identity), $h_t = h_{t-1} + c(x_t)$, has the desired spectral properties but does not work in practice, as it quickly saturates the memory (e.g. with replicated/non-useful inputs and states).
15 Gating Units: the mixture of experts is the origin of gating. [Figure: K local experts whose outputs are combined through a softmax gating network driven by the input x.] (Jacobs et al., Adaptive Mixtures of Local Experts, 1991)
16 Long Short-Term Memory (LSTM) Cell: let's start from the vanilla RNN unit, with hidden state $h_t$ and input potential $g_t$. (S. Hochreiter, J. Schmidhuber, "Long short-term memory", Neural Computation, 1997)
17 LSTM Design Step 1: introduce a linear/identity memory cell $c_t$, which combines the past internal state $c_{t-1}$ with the current input $x_t$.
18 LSTM Design Step 2 (Gates): the input gate $I_t(x_t, h_{t-1})$, a logistic sigmoid, controls how inputs contribute to the internal state.
19 LSTM Design Step 2 (Gates): the forget gate $F_t(x_t, h_{t-1})$, a logistic sigmoid, controls how the past internal state $c_{t-1}$ contributes to $c_t$.
20 LSTM Design Step 2 (Gates): the output gate $O_t(x_t, h_{t-1})$, a logistic sigmoid, controls what part of the internal state is propagated out of the cell.
21 LSTM in Equations.
1) Compute the activation of the input and forget gates: $I_t = \sigma(W_{Ih} h_{t-1} + W_{Iin} x_t + b_I)$ and $F_t = \sigma(W_{Fh} h_{t-1} + W_{Fin} x_t + b_F)$.
2) Compute the input potential and the internal state: $g_t = \tanh(W_h h_{t-1} + W_{in} x_t + b_h)$ and $c_t = F_t \odot c_{t-1} + I_t \odot g_t$.
3) Compute the output gate and the output state: $O_t = \sigma(W_{Oh} h_{t-1} + W_{Oin} x_t + b_O)$ and $h_t = O_t \odot \tanh(c_t)$, where $\odot$ denotes element-wise multiplication.
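Read top to bottom, these three steps form one time-step update. A minimal NumPy sketch of a single LSTM step follows; the weight names mirror the slide, and passing the parameters in a dict is just a convenience for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p is a dict holding the weights named as on the slide."""
    # 1) input and forget gates
    I_t = sigmoid(p["W_Ih"] @ h_prev + p["W_Iin"] @ x_t + p["b_I"])
    F_t = sigmoid(p["W_Fh"] @ h_prev + p["W_Fin"] @ x_t + p["b_F"])
    # 2) input potential and internal state
    g_t = np.tanh(p["W_h"] @ h_prev + p["W_in"] @ x_t + p["b_h"])
    c_t = F_t * c_prev + I_t * g_t          # * is element-wise multiplication
    # 3) output gate and output state
    O_t = sigmoid(p["W_Oh"] @ h_prev + p["W_Oin"] @ x_t + p["b_O"])
    h_t = O_t * np.tanh(c_t)
    return h_t, c_t
```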
22 Deep LSTM: [Figure: three stacked LSTM cells, each layer $i$ feeding its state $h_t^i$ to the layer above, from the input $x_t$ up to the output $y_t$.] LSTM layers extract information at increasing levels of abstraction (enlarging context); a framework-level sketch follows.
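In current frameworks this stacking is usually a single flag. A minimal PyTorch sketch of a 3-layer LSTM with a linear readout; all sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# 3 stacked LSTM layers: layer i consumes the hidden sequence produced by layer i-1
lstm = nn.LSTM(input_size=64, hidden_size=128, num_layers=3)
readout = nn.Linear(128, 10)       # y_t computed from the top-layer state

x = torch.randn(50, 8, 64)         # (time, batch, features)
h_seq, (h_n, c_n) = lstm(x)        # h_seq: top-layer hidden states for every t
y = readout(h_seq)                 # per-step predictions
```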
23 Training LSTM: the original LSTM training algorithm was a mixture of RTRL and BPTT (BPTT on the internal state gradient, RTRL-like truncation on the other recurrent connections), hence no exact gradient calculation. All current LSTM implementations use full BPTT training, introduced by Graves and Schmidhuber in 2005, typically with the Adam or RMSProp optimizer.
24 Regularizing LSTM - Dropout: randomly disconnect units from the network during training. (N. Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014)
27 Regularizing LSTM - Dropout: regulated by a unit-dropping probability hyperparameter; it prevents unit co-adaptation and gives a committee-machine effect, but the prediction phase must be adapted accordingly. In recurrent networks, drop units for the whole sequence! (N. Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting, JMLR 2014)
28 Regularizing LSTM - Dropout: you can also drop single connections rather than whole units (DropConnect). The same guidelines apply: regulated by a dropping hyperparameter, prevents co-adaptation, requires adapting the prediction phase, and the dropped units stay dropped for the whole sequence (see the sketch below).
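A minimal sketch of sequence-level dropout on an LSTM's hidden state: one Bernoulli mask is sampled per sequence and reused at every time step, so the same units are dropped throughout. The cell, dimensions and dropout rate are placeholder assumptions, not the lecture's exact setup.

```python
import torch

def run_with_seq_dropout(cell, X, p_drop=0.3):
    """Apply one dropout mask to h_t at every time step of a sequence.

    cell: an LSTMCell-like module; X: tensor of shape (T, batch, input_dim).
    """
    T, batch, _ = X.shape
    h = torch.zeros(batch, cell.hidden_size)
    c = torch.zeros(batch, cell.hidden_size)
    # one mask for the whole sequence (scaled so expectations match at prediction time)
    mask = (torch.rand(batch, cell.hidden_size) > p_drop).float() / (1 - p_drop)
    outputs = []
    for t in range(T):
        h, c = cell(X[t], (h, c))
        h = h * mask                      # same units dropped at every step
        outputs.append(h)
    return torch.stack(outputs)

# usage: run_with_seq_dropout(torch.nn.LSTMCell(16, 32), torch.randn(10, 4, 16))
```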
29 Practicalities: Minibatch and Truncated BPTT. Minibatch (MB) training sits between averaging the gradient on the full data and stochastic GD sequence-by-sequence: the data is split into minibatches MB1, ..., MBn and a gradient $\frac{\partial L_{MB_i}}{\partial W}$ is computed on each. Truncated gradient propagation stops backpropagation after a fixed number of steps instead of unrolling over the whole sequence $h_1, \ldots, h_T$ (a sketch follows).
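A common way to truncate the gradient in practice is to carry the hidden state forward across chunks while detaching it from the computation graph, so backpropagation never crosses a chunk boundary. A hedged PyTorch sketch; the chunk length, model and loss passed in are illustrative assumptions.

```python
import torch

def train_truncated_bptt(lstm, readout, criterion, optimizer, X, Y, k=35):
    """X, Y: tensors of shape (T, batch, dim); backprop is truncated every k steps."""
    state = None
    for start in range(0, X.size(0), k):
        x_chunk, y_chunk = X[start:start + k], Y[start:start + k]
        out, state = lstm(x_chunk, state)
        # keep the state values but cut the backward graph at the chunk boundary
        state = tuple(s.detach() for s in state)
        loss = criterion(readout(out), y_chunk)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```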
30 Gated Recurrent Unit (GRU): the reset gate acts directly on the output state (no internal state and no output gate); reset and update gates, when coupled, act as input and forget gates:
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t, \qquad \tilde{h}_t = \tanh\big(W_{hh}(r_t \odot h_{t-1}) + W_{hin} x_t + b_h\big)$$
$$z_t = \sigma(W_{zh} h_{t-1} + W_{zin} x_t + b_z), \qquad r_t = \sigma(W_{rh} h_{t-1} + W_{rin} x_t + b_r)$$
(K. Cho et al., Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation, EMNLP 2014)
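The GRU update is compact enough to write out directly. A NumPy sketch of one step, with weight names following the slide; passing the parameters in a dict is just a convenience for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU time step; p holds the weights named as on the slide."""
    z_t = sigmoid(p["W_zh"] @ h_prev + p["W_zin"] @ x_t + p["b_z"])   # update gate
    r_t = sigmoid(p["W_rh"] @ h_prev + p["W_rin"] @ x_t + p["b_r"])   # reset gate
    h_cand = np.tanh(p["W_hh"] @ (r_t * h_prev) + p["W_hin"] @ x_t + p["b_h"])
    return (1 - z_t) * h_prev + z_t * h_cand                          # new state h_t
```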
31 Bidirectional LSTM Character Recognition: the output is a character distribution, with one output for each character plus a no-output symbol, computed by LSTM layers on top of the preprocessed input. (A. Graves, A novel connectionist system for unconstrained handwriting recognition, TPAMI 2009)
32 Bidirectional LSTM Character Recognition: the LSTM layers are bidirectional, one LSTM scanning the input left-to-right and one right-to-left, and their states are combined to produce the character distribution (a framework-level sketch follows). (A. Graves, A novel connectionist system for unconstrained handwriting recognition, TPAMI 2009)
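Bidirectionality is again typically a single flag. A minimal PyTorch sketch; the character count, feature and hidden sizes are placeholder assumptions, and the extra class only hints at the no-output symbol mentioned above.

```python
import torch
import torch.nn as nn

num_chars = 80                       # characters + 1 "no output" symbol (assumed size)
bilstm = nn.LSTM(input_size=32, hidden_size=100, num_layers=2, bidirectional=True)
to_chars = nn.Linear(2 * 100, num_chars)   # left-to-right and right-to-left states concatenated

frames = torch.randn(120, 1, 32)     # preprocessed input frames (time, batch, features)
states, _ = bilstm(frames)           # shape (120, 1, 200)
char_logits = to_chars(states)       # per-frame character distribution (before softmax)
```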
33 Generative Use of LSTM/GRU: a stack of LSTM layers (with bypass connections) predicts the next symbol of the sequence, e.g. 'H' → 'e' → 'l' → 'l'. Teacher forcing is used at training time, refeeding the output at prediction time (sketch below). (A. Graves, Generating Sequences With Recurrent Neural Networks, 2013)
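The two regimes differ only in where the next input comes from. A hedged sketch, assuming a hypothetical `model` that maps a symbol and a recurrent state to next-symbol logits.

```python
import torch
import torch.nn.functional as F

def training_step(model, sequence):
    """Teacher forcing: the ground-truth symbol is always fed as the next input."""
    state, loss = None, 0.0
    for x_t, target in zip(sequence[:-1], sequence[1:]):
        logits, state = model(x_t, state)
        loss = loss + F.cross_entropy(logits, target)
    return loss

def generate(model, seed, length):
    """Refeeding: at prediction time the sampled output becomes the next input."""
    state, x_t, out = None, seed, [seed]
    for _ in range(length):
        logits, state = model(x_t, state)
        x_t = torch.multinomial(F.softmax(logits, dim=-1), 1).squeeze(-1)
        out.append(x_t)
    return out
```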
34 Character Generation Fun - Shakespeare: PANDARUS: Alas, I think he shall be come approached and the day When little srain would be attain'd into being never fed, And who is but a chain and subjects of his death, I should not sleep. Second Senator: They are away this miseries, produced upon my soul, Breaking and strongly should be buried, when I perish The earth and thoughts of many states.
35 Character Generation Fun - Linux Kernel Code: /* * If this error is set, we will need anything right after that BSD. */ static void action_new_function(struct s_stat_info *wb) { unsigned long flags; int lel_idx_bit = e->edd, *sys & ~((unsigned long) *FIRST_COMPAT); buf[0] = 0xFFFFFFFF & (bit << 4); min(inc, slist->bytes); printk(kern_warning "Memory allocated %02x/%02x, " "original MLL instead\n"), min(min(multi_run - s->len, max) * num_data_in), frame_pos, sz + first_seg); div_u64_w(val, inb_p); spin_unlock(&disk->queue_lock); mutex_unlock(&s->sock->mutex); mutex_unlock(&func->mutex); return disassemble(info->pending_bh); }
36 Generate Sad Jokes: a 3-layer LSTM neural network trained to generate English jokes character by character. Why did the boy stop his homework? Because they re bunny boo! Q: Why did the death penis learn string? A: Because he wanted to have some roasts case! What do you get if you cross a famous California little boy with an elephant for players? Market holes.
37 Understanding Memory Representation: at layer 3, neurons show some form of context-induced representation of subsequences (words).
38 Understanding Memory Representation: neurons in early recurrent layers tend to organize according to the sequence suffix.
39 Recursive Gated Networks: a recursive LSTM cell for binary trees, unfolding on parse trees, e.g. for sentiment analysis.
40 Encoder-Decoder Architectures: the sequence-to-sequence framework. A deep LSTM model encodes the input; its LSTM activations are used as an encoding of the full input sequence to seed the generation of the output sentence (sketch below). (I. Sutskever et al., Sequence to Sequence Learning with Neural Networks, NIPS 2014)
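A minimal sketch of the idea: the encoder's final (h, c) state becomes the decoder's initial state, and the decoder then generates the output sequence step by step. The dimensions, vocabulary size and greedy decoding loop are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

emb_dim, hid, vocab = 64, 256, 5000
embed = nn.Embedding(vocab, emb_dim)
encoder = nn.LSTM(emb_dim, hid, num_layers=2)
decoder = nn.LSTM(emb_dim, hid, num_layers=2)
out_proj = nn.Linear(hid, vocab)

def translate(src_tokens, bos_id=1, max_len=30):
    """src_tokens: 1-D LongTensor of source word ids."""
    src = embed(src_tokens).unsqueeze(1)          # (src_len, batch=1, emb_dim)
    _, state = encoder(src)                       # final (h, c) encodes the whole sentence
    y_t, output = torch.tensor([bos_id]), []
    for _ in range(max_len):
        dec_in = embed(y_t).unsqueeze(0)          # (1, 1, emb_dim)
        dec_out, state = decoder(dec_in, state)   # decoder seeded by the encoder state
        y_t = out_proj(dec_out[0]).argmax(dim=-1) # greedy choice of the next word
        output.append(y_t.item())
    return output
```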
41 More Differentiable Compositions: CNN-LSTM composition for image-to-sequence captioning (NeuralTalk), producing descriptions such as "A cat is sitting on a toilet seat" or "A woman holding a teddy bear in front of a mirror". (A. Karpathy and L. Fei-Fei, Deep Visual-Semantic Alignments for Generating Image Descriptions, CVPR)
42 RNN, A Modern View: are RNNs only for sequential/structured data? RNNs can also be seen as general stateful systems.
43 Software: standard LSTM and GRU are available in all deep learning frameworks (Python and others) as well as in Matlab's Neural Network Toolbox. If you want to play with one-element-ahead sequence generation, try out char-rnn implementations.
44 Take Home Messages: learning long-term dependencies can be difficult due to gradient vanishing/explosion; gated RNNs are the solution. Gates are neurons whose output is used to scale another neuron's output; use gates to determine what information can enter (or exit) the internal state. Training gated RNNs is not always straightforward. Deep RNNs can be used in generative mode and can be seeded with neural embeddings. Deep RNNs act as stateful and differentiable machines.
45 Next Lecture: next week we will have two practical lectures on deep learning frameworks (Antonio Carta: PyTorch; Francesco Crecchi: TensorFlow). Coming up next: advanced DL topics (attention models, variational deep learning, ...) and Midterm 2 discussions, Thursday 3rd May, Room N.