Deep Structured Prediction in Handwriting Recognition. Juan José Murillo Fuentes, P. M. Olmos (Univ. Carlos III) and J.C. Jaramillo (Univ. Sevilla)
1 Deep Structured Prediction in Handwriting Recognition. Juan José Murillo Fuentes, P. M. Olmos (Univ. Carlos III) and J.C. Jaramillo (Univ. Sevilla). Computational and Biological Learning Lab, Dept. of Engineering, University of Cambridge. Nov
2 Introduction
3 Deep Structured Prediction? A concept with a very broad meaning. From the ICML 2017 Workshop on Deep Structured Prediction: "many real problems involve highly dependent, structured variables. In such scenarios, it is desired or even necessary to model correlations and dependencies between the multiple input and output variables. Such problems arise in a wide range of domains, from natural language processing, computer vision, computational biology and others." From Wikipedia: "Structured prediction or structured (output) learning is an umbrella term for supervised machine learning techniques that involve predicting structured objects, rather than scalar discrete or real values (...) application domains including bioinformatics, natural language processing, speech recognition, and computer vision."
4 Goals. Objectives: understand deep learning concepts when recurrent networks (LSTM) are involved; overview temporal classification with the CTC; see how the theory can be implemented to automatically transcribe handwritten text.
5 Letter (digits) recognition with CNN
6 Letters (digits) recognition. Let's start with the problem of digit recognition (MNIST database). It is a multiclass classification problem, and here the output structure is simple compared to a whole text line. Straightforward solution: apply a simple NN?
7 MNIST with a simple NN. We use the softmax (output, $\hat{y}_k$) and the cross-entropy (cost function, $L$):
$\hat{y}_k = \dfrac{e^{a_k}}{\sum_{k'=1}^{K} e^{a_{k'}}}, \qquad L(y,\hat{y}) = -\dfrac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} y^{(n)}_k \log \hat{y}^{(n)}_k + \big(1 - y^{(n)}_k\big)\log\big(1 - \hat{y}^{(n)}_k\big)$
Training: back-propagation combined with stochastic gradient descent. Figures: Lipton, Berkowitz (2015), A Critical Review of Recurrent Neural Networks for Sequence Learning.
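As a quick reference, a minimal NumPy sketch of the two expressions above; the function names and the clipping constant are ours, not part of the TF sample code.

```python
import numpy as np

def softmax(a):
    # Subtract the row-wise max for numerical stability; each row then sums to 1.
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(y, y_hat, eps=1e-12):
    # Mean over the N examples of the sum over the K classes, as in L(y, y_hat) above.
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat), axis=1))
```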
8 MNIST with a very simple NN: $y = \sigma(Wx + b)$ (1), where $\sigma(\cdot)$ is the sigmoid function applied entry-wise (outputs between 0 and 1).
9 Learning of the simple NN for MNIST. We used the sample code in TF, but with Adam as optimizer and a larger number of iterations. It is a NN with just input and output layers, trained with minibatches of 100 images. Accuracy of 92% on the test set (number of correct outputs / size of the test set). Results shown with TensorBoard: orange is the test results, thin blue the train results and dark blue the averaged train results.
10 MNIST with multilayer and CNN: architecture. A convolutional NN is, perhaps, the most straightforward way to find and use structure in the prediction. Convolutional NNs can be seen as a fully connected structure where the weights are tied: the same weights are used several times. Multilayer architecture (see the TensorFlow documentation or Görner's blog), with 3,273,504 weights in total, where the MLP has an intermediate layer of 1024 units.
11 MNIST with multilayer and CNN: results. We get an accuracy of 99.3%. The best known solution uses a similar structure with 2 CNN layers to reach 99.79%. The best reported pure-MLP solution (6 layers) has an accuracy of 99.65%.
12 Problems with Long-Term Dependencies
13 Problems with long-term dependencies. Digit recognition: in the recognition of digits the input was restricted to a small image to provide one output out of 10; the size of the digits is uniform; learning local structures is enough; not a hard problem. Two problems in the recognition of letters in text lines: 1. In word and line recognition we should exploit information beyond the local region that a convolution can provide: we need to recover long-range dependencies. 2. Furthermore, we have to detect the positions of the characters (labellings) within the image: we need to recover the sequence of letters. To cope with these problems: 1. We use recurrent structures, long short-term memory (LSTM), as an evolution of recurrent NNs (RNN). 2. We use the connectionist temporal classification (CTC).
14 RNN and LSTM
15 RNN. In digit recognition, a CNN first looks for good local features encoding the digits. In a word or a line we should also use information from the previous and following letters. Recurrent NNs (RNN) are networks where every unit locally processes part of the input while including the result of the processing of the previous unit. They may use as the previous result either the output of the unit or its state. When using the state, they can provide an output at every unit or only at the end. While CNNs perform a filtering, RNNs update a state or memory:
$h^{(t)} = \tanh\!\big(b + W_{hh}\, h^{(t-1)} + W_{x}\, x^{(t)}\big), \qquad y^{(t)} = c + W_{yh}\, h^{(t)}$
Figure: A simple Recurrent Neural Network (unrolled over $x^{(1)}, \ldots, x^{(6)}$).
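A minimal sketch of this recursion in NumPy; the weight names and shapes are illustrative, not taken from any particular library.

```python
import numpy as np

def rnn_forward(X, W_hh, W_xh, W_yh, b, c):
    # X has shape (T, input_dim); the state h starts at zero and is updated at every step.
    h = np.zeros(W_hh.shape[0])
    H, Y = [], []
    for x_t in X:
        h = np.tanh(b + W_hh @ h + W_xh @ x_t)   # h(t) = tanh(b + W_hh h(t-1) + W_x x(t))
        H.append(h)
        Y.append(c + W_yh @ h)                   # y(t) = c + W_yh h(t)
    return np.stack(H), np.stack(Y)
```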
16 The challenge of long-term dependencies. Gradients propagated over many stages tend to either vanish (most of the time) or explode (rarely, but with much damage to the optimization) (Goodfellow et al. 2016). This can also be seen when expanding the state in terms of the previous ones,
$h^{(t)} = \tanh\!\big(b + W_{hh}\tanh\!\big(b + W_{hh}\tanh\!\big(b + W_{hh}(\ldots) + W_x x^{(t-3)}\big) + W_x x^{(t-2)}\big) + W_x x^{(t-1)}\big)$
In the simplest linear case $h^{(t)} = (W_{hh})^{t}\, h^{(0)} + \ldots$
Figure: Repeated function composition, by Goodfellow, Bengio and Courville, 2016.
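The linear case can be checked numerically: scaling the recurrent matrix slightly below or above 1 makes the propagated state (and hence the gradient) vanish or explode after enough steps. A small sketch with made-up sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
h0 = rng.standard_normal(32)

for scale in (0.9, 1.1):                     # largest singular value just below / above 1
    Q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
    W_hh = scale * Q                         # scaled orthogonal matrix: all singular values equal `scale`
    h = h0.copy()
    for _ in range(100):
        h = W_hh @ h                         # linear case: h(t) = (W_hh)^t h(0)
    print(scale, np.linalg.norm(h) / np.linalg.norm(h0))   # roughly scale**100: vanishes or explodes
```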
17 Strategies to gain long-term dependency in RNNs. We comment on three strategies: 1. Skip connections: add direct connections between variables far apart; difficult to tune, and we have the same problem, only for a larger delay. Remark: this idea was successfully applied through layers (not through time) by the ImageNet winner of 2015. 2. Removing connections: remove length-one connections and replace them by longer ones. 3. Leaky units: units with linear self-connections. We accumulate a running average $\mu^{(t)}$ of some value $v^{(t)}$ by
$\mu^{(t)} \leftarrow \alpha\, \mu^{(t-1)} + (1-\alpha)\, v^{(t)}$ (2)
Using leaky units is the key idea of the LSTM cells.
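The leaky-unit update (2) is just an exponential moving average; a tiny sketch (the value of alpha is arbitrary):

```python
def leaky_update(mu, v, alpha=0.95):
    # alpha close to 1: long memory of past values; alpha close to 0: follow the newest value.
    return alpha * mu + (1 - alpha) * v

mu = 0.0
for v in [1.0, 1.0, 0.0, 0.0, 1.0]:
    mu = leaky_update(mu, v)
```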
18 LSTM cell. (Diagram: cell state $c^{(t)}$ surrounded by the input, forget and output gates, all fed by $x^{(t)}$ and $h^{(t-1)}$.)
$i^{(t)} = \sigma\!\big(W_x^i x^{(t)} + W_h^i h^{(t-1)} + b^i\big)$
$f^{(t)} = \sigma\!\big(W_x^f x^{(t)} + W_h^f h^{(t-1)} + b^f\big)$
$o^{(t)} = \sigma\!\big(W_x^o x^{(t)} + W_h^o h^{(t-1)} + b^o\big)$
$\tilde{c}^{(t)} = \tanh\!\big(W_x^c x^{(t)} + W_h^c h^{(t-1)} + b^c\big)$
$c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}$
$h^{(t)} = \tanh\!\big(c^{(t)}\big) \odot o^{(t)}$
This is the typical representation of the LSTM. It is the folded scheme and may be hard to understand. The gates act as leaky units: the input gate decides what part of the processed input $\tilde{c}$ is stored in the new state, the forget gate decides what part of the state (memory) $c$ remains, and the output gate decides what part of the memory exits. Then, in the next unit we process the next input, the previous memory $c$ and the previous output $h$.
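A single LSTM step written out in NumPy, following the equations above; packing the weights in dictionaries and the key names are our own convention.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    i = sigmoid(W['xi'] @ x + W['hi'] @ h_prev + b['i'])        # input gate
    f = sigmoid(W['xf'] @ x + W['hf'] @ h_prev + b['f'])        # forget gate
    o = sigmoid(W['xo'] @ x + W['ho'] @ h_prev + b['o'])        # output gate
    c_tilde = np.tanh(W['xc'] @ x + W['hc'] @ h_prev + b['c'])  # candidate (processed input)
    c = f * c_prev + i * c_tilde                                # keep part of the memory, add part of the input
    h = np.tanh(c) * o                                          # the output gate decides what leaves the cell
    return h, c
```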
19 LSTM vs RNN, unfolded. Figure: Unfolded RNN (top) and LSTM (bottom), from C. Olah's blog.
20 LSTM unfolded, from C. Olah's blog. (a) State flow. (b) Forget gate: $f^{(t)} = \sigma(W_x^f x^{(t)} + W_h^f h^{(t-1)} + b^f)$. (c) Input gate: $i^{(t)} = \sigma(W_x^i x^{(t)} + W_h^i h^{(t-1)} + b^i)$. (d) State update: $c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}$. (e) Output: $h^{(t)} = \tanh(c^{(t)}) \odot o^{(t)}$.
21 Further comments on RNNs. Bidirectional (Bi-RNN): in sequences we can have one RNN running from start to end and another from end to beginning (Figure: Bidirectional RNN). Multidimensional: in images we may have one RNN starting from every corner, e.g.
$h^{NW}_{i,j} = \mathrm{LSTM}\big(x_{i,j},\, h^{NW}_{i,j-1},\, h^{NW}_{i-1,j},\, h^{NW}_{i-1,j\pm 1}\big), \qquad h^{SE}_{i,j} = \mathrm{LSTM}\big(x_{i,j},\, h^{SE}_{i,j+1},\, h^{SE}_{i+1,j},\, h^{SE}_{i+1,j\pm 1}\big)$
and similarly for the other two corners. Stacking: we can have several RNNs arranged in several layers. See A. Karpathy's blog for amazing results with RNNs and also nice interpretations.
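A sketch of the bidirectional idea: run the same recursion left-to-right and right-to-left and concatenate the two states, so the feature at time t sees both the past and the future of the sequence (the parameter packing is ours).

```python
import numpy as np

def rnn_pass(X, W_hh, W_xh, b):
    # Plain forward recursion used by both directions.
    h = np.zeros(W_hh.shape[0])
    out = []
    for x_t in X:
        h = np.tanh(b + W_hh @ h + W_xh @ x_t)
        out.append(h)
    return np.stack(out)

def bidirectional(X, fwd_params, bwd_params):
    h_fwd = rnn_pass(X, *fwd_params)
    h_bwd = rnn_pass(X[::-1], *bwd_params)[::-1]    # run on the reversed input, then re-align in time
    return np.concatenate([h_fwd, h_bwd], axis=1)   # feature at t = [forward state, backward state]
```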
22 Connectionist Temporal Classification
23 Labelling alignment is a problem with RNNs. Since the RNN only outputs local classifications, a post-processing stage is required to give the final label sequence. Suppose we have the following image and we want to recognise the text: must the training data be pre-segmented (locating where every letter is)? Labelling unsegmented sequence data, known as temporal classification, is a well-known problem in real-world sequence learning. Suppose that we feed an RNN with the image above so that it produces 120 outputs, but we have 44 letters as labels: "monasteries, manors, townships, or wards and". In training we need to translate from the labels (44) to the outputs of the RNN (120). We will need two temporal indexes: u for labels and t for the outputs of the RNN. Several consecutive output indexes of the RNN correspond to the same label index.
24 Introduction to CTC (I). The connectionist temporal classification (CTC) does not require pre-segmented training data or external post-processing to extract the label sequence from the network outputs. It brought the phoneme error rate on TIMIT to a record value (Graves et al., ICASSP 2013). CTC models all aspects of the sequence with a single neural network:
25 Introduction to CTC (II). We want something on top of an RNN that translates from probabilities for labels at the T times of a sequence (image, audio, ...) into a sequence of U labels (letters, phonemes, ...). It avoids segmentation, i.e. providing the position of every label u in the training sequence (image, audio, ...). We want to use backpropagation to train the whole system.
26 Notation. $A$ is the label alphabet, e.g. the characters in the Latin alphabet; $A' = A \cup \{\text{blank}\}$. $X = (x^{(1)}, x^{(2)}, \ldots, x^{(T)})$ is the input sequence to the RNN. $Y = (y^{(1)}, y^{(2)}, \ldots, y^{(T)})$ is the RNN output, where $y^{(t)}_k \in [0,1]$, $k = 1, \ldots, |A'|$, and $\sum_{k=1}^{|A'|} y^{(t)}_k = 1$ (softmax layer at the RNN outputs); $y^{(t)}_k$ is the probability of observing label $k$ at time $t$. $\hat{L} = (\hat{l}_1, \hat{l}_2, \ldots, \hat{l}_U)$ is the estimated output label sequence, $\hat{l}_j \in A$. Objective: the recurrent neural network is defined by some weights $w_i$; determine a mapping $h(X) = \hat{L}$ from one sequence of length T to another of (unknown) length U, after training with a set of pairs (X, L).
27 From network outputs to labellings. Define the probability distribution over all possible paths $\pi$ in the set $A'^T$:
$p(\pi \mid X) = \prod_{t=1}^{T} y^{(t)}_{\pi_t}, \qquad \pi \in A'^T$
RNN and CTC: the expression above gives the input-output (X-Y) response of the RNN; we next build the CTC output on top of it. Different paths may provide the same output L due to repeated outputs and blanks. A many-to-one map $B : A'^T \to A^{\le T}$ is defined: remove repeated labels, then remove blanks, e.g. $B(a{-}ab{-}) = B({-}aa{-}{-}abb) = aab$. A new label is output when the network switches from predicting no label to predicting a label, or from predicting one label to another. For any $L \in A^U$ with $U \le T$,
$p(L \mid X) = \sum_{\pi \in B^{-1}(L)} p(\pi \mid X)$
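The map B is easy to state in code; a short sketch using '-' for the blank, checked on the example above:

```python
def collapse(path, blank='-'):
    # B: first remove repeated labels, then remove blanks.
    out = []
    for s in path:
        if not out or s != out[-1]:
            out.append(s)
    return ''.join(s for s in out if s != blank)

assert collapse('a-ab-') == 'aab'
assert collapse('-aa--abb') == 'aab'
```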
28 From network outputs to labellings. In $p(L \mid X) = \sum_{\pi \in B^{-1}(L)} p(\pi \mid X)$, $L \in A^U$. Collapsing: the CTC collapses different paths into the same label sequence L, which makes it possible to use unsegmented data. Solution: the CTC solution is the most probable labelling of the input sequence,
$h(X) = \arg\max_{L \in A^{\le T}} p(L \mid X)$
29 Output. Two different problems: Training: estimate the probability $p(L \mid X)$ for some L. Output: find the L providing the largest $p(L \mid X)$. Solutions for the CTC output: Solution 1, best path decoding (trivial): let $\pi^{*} = \arg\max_{\pi} p(\pi \mid X)$ (the sequence of highest RNN outputs), then $\hat{L} \approx B(\pi^{*})$. Solution 2, prefix search decoding: based on dynamic programming, it calculates the probabilities of successive extensions of labelling prefixes. Solution 2 is harder: a forward-backward algorithm can be applied, but the complexity grows exponentially with the length of the input sequence, so heuristics are used, such as breaking the input into subsequences.
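Solution 1 amounts to an argmax per frame followed by the collapse above; a sketch, where reserving index 0 for the blank is our assumption:

```python
import numpy as np

def best_path_decode(y, alphabet, blank=0):
    # y: (T, |A'|) softmax outputs of the RNN; alphabet[k] is the character for index k.
    path = y.argmax(axis=1)
    out, prev = [], blank
    for k in path:
        if k != blank and k != prev:   # emit only when switching to a (new) label
            out.append(alphabet[k])
        prev = k
    return ''.join(out)
```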
30 Notes on the best solution. Example: suppose you have two consecutive time instants t and t+1 and you want to know the most probable label sequence when the blank has probabilities $y^{t}_{-} = 0.7$ and $y^{t+1}_{-} = 0.6$, A has probabilities $y^{t}_{A} = 0.3$ and $y^{t+1}_{A} = 0.4$, and all other probabilities are zero. The most probable labelling is A because it adds the probabilities of the paths -A, AA and A-, giving 0.58, compared with 0.42 for the path --. (Figure: A. Graves, Ph.D. thesis, 2008.) Hence Solution 1 would output the empty labelling '-' while Solution 2 outputs 'A'.
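The numbers in the example can be checked by enumerating the four length-2 paths:

```python
# Index 0 = blank ('-'), index 1 = 'A'.
y = [[0.7, 0.3],   # frame t
     [0.6, 0.4]]   # frame t+1

p_A     = y[0][0] * y[1][1] + y[0][1] * y[1][1] + y[0][1] * y[1][0]   # paths -A, AA, A-
p_empty = y[0][0] * y[1][0]                                           # path --
print(p_A, p_empty)   # 0.58 vs 0.42: best path decoding picks '--', prefix search picks 'A'
```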
31 Training. Training: estimate the probability $p(L \mid X)$ for some given L. The problem is easier than computing the output: here we know L and its length. Note that the CTC itself has nothing to be trained; it provides a translation from the RNN outputs to a labelling, and we now discuss how the RNN is trained through it by backpropagation. The objective function uses the target labellings in the training set (L, X),
$\mathcal{L} = -\ln \prod_{(L,X)} p(L \mid X) = -\sum_{(L,X)} \ln p(L \mid X)$ (3)
Given the derivatives with respect to the RNN outputs, the weight gradients of the RNN can be computed with standard backpropagation; for a training pair (L, X),
$\dfrac{\partial \ln p(L \mid X)}{\partial y^{(t)}_k} = \dfrac{1}{p(L \mid X)\, y^{(t)}_k} \sum_{u :\, l'_u = k} \alpha(t,u)\, \beta(t,u)$ (4)
We need to compute $p(L \mid X)$ efficiently: this is solved with a dynamic programming algorithm, with a forward and a backward pass that provide $\alpha(t,u)$ and $\beta(t,u)$.
32 CTC forward-backward (I). Given L, we define L' as L with blanks added at the beginning and end and inserted between every pair of labels. For t = 1, ..., T and u = 1, ..., 2U+1 we define the following two sets of paths and accumulated probabilities:
$V(t,u) = \big\{\pi \in A'^{\,t} : B(\pi) = L_{1:\lfloor u/2 \rfloor},\ \pi_t = l'_u\big\}, \qquad \alpha(t,u) = \sum_{\pi \in V(t,u)} \prod_{i=1}^{t} y^{(i)}_{\pi_i}$
$W(t,u) = \big\{\pi \in A'^{\,T-t} : B(\hat{\pi} + \pi) = L\ \ \forall\, \hat{\pi} \in V(t,u)\big\}, \qquad \beta(t,u) = \sum_{\pi \in W(t,u)} \prod_{i=t}^{T} y^{(i)}_{\pi_i}$
Figure: A. Graves et al., Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (2006).
33 CTC forward-backward (II). Both can be computed recursively from neighbouring values in t and u:
$\alpha(t,u) = f\big(\alpha(t-1,u),\ \alpha(t-1,u-1),\ \alpha(t-1,u-2)\big)$
$\beta(t,u) = g\big(\beta(t+1,u),\ \beta(t+1,u+1),\ \beta(t+1,u+2)\big)$
$p(L \mid X) = \alpha(T, 2U) + \alpha(T, 2U+1)$
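A compact sketch of the forward (alpha) recursion, assuming per-frame softmax outputs y of shape (T, |A'|), integer labels, and index 0 reserved for the blank; it returns p(L|X) as alpha(T, 2U) + alpha(T, 2U+1). In practice the recursion is run in log space (or with per-step rescaling) to avoid underflow.

```python
import numpy as np

def ctc_forward(y, label, blank=0):
    T = y.shape[0]
    ext = [blank]                        # L': blanks at the ends and between every pair of labels
    for l in label:
        ext += [l, blank]
    S = len(ext)                         # S = 2U + 1

    alpha = np.zeros((T, S))
    alpha[0, 0] = y[0, ext[0]]           # start with a blank ...
    if S > 1:
        alpha[0, 1] = y[0, ext[1]]       # ... or with the first label

    for t in range(1, T):
        for u in range(S):
            a = alpha[t - 1, u]
            if u > 0:
                a += alpha[t - 1, u - 1]
            # Skipping the previous blank is only allowed between two different labels.
            if u > 1 and ext[u] != blank and ext[u] != ext[u - 2]:
                a += alpha[t - 1, u - 2]
            alpha[t, u] = a * y[t, ext[u]]

    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]
```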
34 Line Recognition
35 IAM lines database. English, 300 dpi, PNG, 256 grey levels. 5545 training lines, 616 validation lines, 2772 test lines. We measure the CER (character error rate, %).
36 IAM line recognition. TensorFlow has the LSTM, Bi-LSTM and CTC implemented, but not the MD-LSTM. We use the framework implemented by researchers at the University of Aachen (Germany), where the MD-LSTM layer is implemented in CUDA; programming the MD-LSTM can run into memory problems (even on GPUs). Following state-of-the-art works, we train solutions with 3 and 5 layers (Conv + MD-LSTM), with an architecture of the type shown. Figure: structure adapted from Pham et al. 2014.
37 IAM line recognition. In particular, for the intermediate layers we use the following (5-layer option), with dropout (forward) of 25%, Adam as optimizer, and minibatches of 10 images. Figure: structure following V. Paul et al., Int. Conf. on Frontiers in Handwriting Recognition.
38 IAM line recognition: CER. Each epoch takes 1 h 50 min on an NVIDIA Tesla P-series GPU. The model reaches a CER of 5.7%. Note that no language rules have been used here.
39 IAM line recognition: CTC. CTC output after the transcription of a line of the test set.
40 IAM line recognition: transcribed text.
41 Conclusions
42 Conclusions. For structured inputs/outputs (handwriting, speech, ...) the LSTM is widely exploited. Temporal classification is a problem solved with the CTC, with no need for a grammar or dictionary (these can later be used to further improve the solution). CNNs, LSTM and CTC are available in TensorFlow; the multidimensional LSTM is not (CUDA was used). Future? Current work on augmented RNNs includes (see C. Olah's blog on this topic): Neural Turing Machines (Graves et al., Neural Turing Machines, 2014); attentional interfaces: attention has been included before the CTC layer in Bluche 2016... THANK YOU for your attention
43 Readings
44 Readings.
I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016. Chapters 6 to 9 for concepts on deep learning; Chapter 10 in particular for recurrent networks (but see below for the LSTM).
M. Görner, TensorFlow and Deep Learning without a PhD. Quick review of the main concepts and examples on DL.
Z. C. Lipton, J. Berkowitz, C. Elkan, A Critical Review of Recurrent Neural Networks for Sequence Learning.
K. Cho, Natural Language Understanding with Distributed Representation, 2016. For a detailed explanation of the LSTM.
C. Olah, Understanding LSTM Networks. For a good explanation of the LSTM.
A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Ph.D. thesis. For the explanation of the CTC.
T. Bluche, Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition, NIPS 2016. State-of-the-art solution to the handwriting recognition problem.