
1 Deep Structured Prediction in Handwriting Recognition Juan José Murillo Fuentes, P. M. Olmos (Univ. Carlos III) and J.C. Jaramillo (Univ. Sevilla) Computational and Biological Learning Lab Dep. of Engineering University of Cambridge Nov

2 Introduction

3 Deep Structured Prediction in Handwriting Recognition 3/44 Deep Structured Prediction? A defined concept with a very broad meaning. In the ICML 17 Workshop on Deep Structured Prediction: "many real problems involve highly dependent, structured variables. In such scenarios, it is desired or even necessary to model correlations and dependencies between the multiple input and output variables. Such problems arise in a wide range of domains, from natural language processing, computer vision, computational biology and others." In Wikipedia: "Structured prediction or structured (output) learning is an umbrella term for supervised machine learning techniques that involves predicting structured objects, rather than scalar discrete or real values (...) application domains including bioinformatics, natural language processing, speech recognition, and computer vision."

4 Deep Structured Prediction in Handwriting Recognition 4/44 Goals Objectives: Understand deep learning concepts when recurrent networks (LSTM) are involved. Give an overview of temporal classification with CTC. See how the theory can be put to work in automatically transcribing handwritten text.

5 Letter (digits) recognition with CNN

6 Deep Structured Prediction in Handwriting Recognition 6/44 Letters (digits) recognition Let's start with the problem of digit recognition (MNIST database). Multiclass classification problem. Here the output structure is simple compared to a whole text line. Straightforward solution: apply a simple NN?

7 Deep Structured Prediction in Handwriting Recognition 7/44 MNIST with Simple NN We use softmax (output, $\hat{y}_k$) and cross-entropy (cost function, $L$): $\hat{y}_k = \frac{e^{a_k}}{\sum_{k'=1}^{K} e^{a_{k'}}}$, $L(y,\hat{y}) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\big[y_k^{(n)}\log \hat{y}_k^{(n)} + (1-y_k^{(n)})\log(1-\hat{y}_k^{(n)})\big]$. Training: back-propagation combined with stochastic gradient descent. Figures: Lipton, Berkowitz (2015), A Critical Review of Recurrent Neural Networks for Sequence Learning.
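
A minimal NumPy sketch of these two expressions (the batch layout and the names logits and targets are illustrative assumptions, not taken from the slides):

import numpy as np

def softmax(logits):
    # Row-wise softmax: y_hat_k = exp(a_k) / sum_k' exp(a_k'), shifted for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(targets, y_hat, eps=1e-12):
    # L = -(1/N) * sum_n sum_k [ y log(y_hat) + (1 - y) log(1 - y_hat) ]
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.mean(np.sum(targets * np.log(y_hat)
                           + (1.0 - targets) * np.log(1.0 - y_hat), axis=1))

# Toy check: N = 2 samples, K = 3 classes, one-hot targets.
logits = np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(cross_entropy(targets, softmax(logits)))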

8 Deep Structured Prediction in Handwriting Recognition 8/44 MNIST with a very simple NN $y = \sigma(Wx + b)$ (1), where $\sigma(\cdot)$ is the sigmoid function applied entry-wise (outputs between 0 and 1).

9 Deep Structured Prediction in Handwriting Recognition 9/44 Learning of simple NN for MNIST We used the sample code in TF, but using Adam as optimizer and a larger number of iterations. It is a NN with just input-output layers, trained with minibatches of 100 images. Accuracy of 92% on the test set (number of correct outputs / size of the test set). Results shown with TensorBoard: orange is the test results, thin blue the train results and dark blue the averaged train results.
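
For reference, a training setup along these lines can be sketched with the Keras API shipped with TensorFlow; the single softmax layer, Adam and the 100-image minibatches follow the slide, while the preprocessing and epoch count are assumptions (this is not the exact TF sample code used by the authors):

import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

# Input-output network only: 784 inputs -> 10 softmax outputs.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Adam optimizer and cross-entropy loss, minibatches of 100 images.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=100, epochs=10)
print(model.evaluate(x_test, y_test))   # test accuracy should land near the 92% quoted above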

10 Deep Structured Prediction in Handwriting Recognition 10/44 MNIST with Multilayer and CNN: architecture A convolutional NN is, perhaps, the most straightforward way to find and use structure in the prediction. Convolutional NNs can be seen as a connected structure where weights are tied: the same weights are used several times across the input. Multilayer architecture (see the TensorFlow documentation or Görner's blog), with 3,273,504 weights, where the MLP part has an intermediate layer of 1024 units.
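
The quoted weight count matches the classic TensorFlow tutorial architecture (two 5x5 convolutions with 32 and 64 maps, then a 1024-unit fully connected layer), so a plausible Keras sketch of that assumed architecture is:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, padding="same", activation="relu",
                           input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(2),              # 28x28 -> 14x14
    tf.keras.layers.Conv2D(64, 5, padding="same", activation="relu"),
    tf.keras.layers.MaxPooling2D(2),              # 14x14 -> 7x7
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()

Counting only the multiplicative weights (no biases) gives 800 + 51,200 + 3,211,264 + 10,240 = 3,273,504, the figure quoted above.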

11 Deep Structured Prediction in Handwriting Recognition 11/44 MNIST with Multilayer and CNN: results We get an accuracy of 99.3%. The best known solution uses a similar structure with 2 CNN layers to get 99.79% (see the published rankings for a list of the best solutions). The best reported NN-only solution (an MLP with 6 layers) has an accuracy of 99.65%.

12 Problems with Long-Term Dependencies

13 Deep Structured Prediction in Handwriting Recognition 13/44 Problems with long-term dependencies Digit recognition: the input was restricted to a few pixels to provide one output out of 10; the size of the digits is uniform; learning local structures is enough; not a hard problem. Two problems in the recognition of letters in text lines: 1. In word and line recognition we should better exploit information beyond the few pixels that a convolution can provide: we need to recover long-range dependencies. 2. Furthermore, we have to detect the positions of the characters (labellings) within the image: we need to recover the sequence of letters. To cope with these problems: 1. we use recurrent structures, long short-term memory (LSTM), an evolution of recurrent NNs (RNNs); 2. we use the connectionist temporal classification (CTC).

14 RNN and LSTM

15 Deep Structured Prediction in Handwriting Recognition 15/44 RNN In digit recognition, CNNs first look for good local features encoding the digits. In a word or a line we should also use information from the previous and following letters. Recurrent NNs (RNNs) are networks where every unit locally processes part of the input but includes the result of the processing in the previous unit. They may use as the previous result the output of the unit or its state. In case of using its state, they can provide an output at every unit or only at the end. While CNNs perform a filtering, RNNs update a state or memory: $h^{(t)} = \tanh(b + W_{hh} h^{(t-1)} + W_{xx} x^{(t)})$, $y^{(t)} = c + W_{yh} h^{(t)}$. Figure: a simple Recurrent Neural Network, unrolled over inputs $x^{(1)},\dots,x^{(6)}$, states $h^{(1)},\dots,h^{(6)}$ and outputs $y^{(1)},\dots,y^{(6)}$.
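
A NumPy sketch of this recursion, unrolled over a toy input sequence (the dimensions, the random weights and the zero initial state are arbitrary assumptions):

import numpy as np

def rnn_forward(xs, W_hh, W_xx, W_yh, b, c, h0):
    """Vanilla RNN: h_t = tanh(b + W_hh h_{t-1} + W_xx x_t), y_t = c + W_yh h_t."""
    h, ys = h0, []
    for x in xs:                       # xs: array of input vectors x^(1..T)
        h = np.tanh(b + W_hh @ h + W_xx @ x)
        ys.append(c + W_yh @ h)
    return np.array(ys), h

rng = np.random.default_rng(0)
d_x, d_h, d_y, T = 3, 5, 2, 6
xs = rng.normal(size=(T, d_x))
ys, h_T = rnn_forward(xs,
                      W_hh=rng.normal(size=(d_h, d_h)) * 0.1,
                      W_xx=rng.normal(size=(d_h, d_x)),
                      W_yh=rng.normal(size=(d_y, d_h)),
                      b=np.zeros(d_h), c=np.zeros(d_y), h0=np.zeros(d_h))
print(ys.shape)   # (6, 2): one output per time step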

16 Deep Structured Prediction in Handwriting Recognition 16/44 The challenge of long-term dependencies Gradients propagated over many stages tend to either vanish (most of the time) or explode (rarely, but with much damage to the optimization) (Goodfellow et al. 2016). This can also be seen when computing the state from previous ones, $h^{(t)} = \tanh\big(b + W_{hh}\tanh\big(b + W_{hh}\tanh(b + W_{hh}(\dots) + W_{xx}x^{(t-2)}) + W_{xx}x^{(t-1)}\big) + W_{xx}x^{(t)}\big)$. In the simplest linear case, $h^{(t)} = (W_{hh})^t h^{(0)} + \dots$ Figure: Repeated function composition, by Goodfellow, Bengio and Courville, 2016.
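
A toy numerical illustration of the linear case (not from the slides): repeatedly applying W_hh shrinks or blows up the state depending on whether its spectral radius is below or above 1.

import numpy as np

rng = np.random.default_rng(1)
h0 = rng.normal(size=5)
for scale in (0.5, 1.5):                                    # spectral radius below / above 1
    W = scale * np.linalg.qr(rng.normal(size=(5, 5)))[0]    # scaled orthogonal matrix
    h = h0.copy()
    for _ in range(50):                                     # h^(t) = (W_hh)^t h^(0) in the linear case
        h = W @ h
    print(scale, np.linalg.norm(h))                         # around 1e-15 (vanishes) vs around 1e9 (explodes)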

17 Deep Structured Prediction in Handwriting Recognition 17/44 Strategies to gain long-term dependency in RNN We comment on three strategies. 1. Add skip connections: direct connections between variables far apart; difficult to tune, and we have the same problem but for a larger delay. Remark: this idea was successfully applied through layers (not through time) by the ImageNet winner 2015. 2. Removing connections: remove length-one connections and replace them by longer ones. 3. Leaky units: units with linear self-connections. We accumulate a running average $\mu^{(t)}$ of some value $v^{(t)}$ by $\mu^{(t)} \leftarrow \alpha\mu^{(t-1)} + (1-\alpha)v^{(t)}$ (2). Using leaky units is the key idea behind LSTM cells.

18 Deep Structured Prediction in Handwriting Recognition 18/44 LSTM cell Figure: LSTM cell with input, forget and output gates acting on the cell $c^{(t)}$. $i^{(t)} = \sigma(W_x^i x^{(t)} + W_h^i h^{(t-1)} + b_i)$, $f^{(t)} = \sigma(W_x^f x^{(t)} + W_h^f h^{(t-1)} + b_f)$, $o^{(t)} = \sigma(W_x^o x^{(t)} + W_h^o h^{(t-1)} + b_o)$, $\tilde{c}^{(t)} = \tanh(W_x^c x^{(t)} + W_h^c h^{(t-1)} + b_c)$, $c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}$, $h^{(t)} = \tanh(c^{(t)}) \odot o^{(t)}$. This is the typical representation of the LSTM. It is the folded scheme and may be hard to understand. The gates act as leaky units: the input gate decides what part of the processed input $\tilde{c}$ is stored in the new state, the forget gate decides what part of the state (memory) $c$ remains, and the output gate decides what part of the memory exits. Then, in the next unit we process the next input, the previous memory $c$ and the previous output $h$.
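
The same gate equations written as a single NumPy step; the weight naming and the toy dimensions are assumptions, and the elementwise product ⊙ becomes *:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM step. p is a dict of weights: Wxi, Whi, bi, Wxf, Whf, bf, ..."""
    i = sigmoid(p["Wxi"] @ x + p["Whi"] @ h_prev + p["bi"])        # input gate
    f = sigmoid(p["Wxf"] @ x + p["Whf"] @ h_prev + p["bf"])        # forget gate
    o = sigmoid(p["Wxo"] @ x + p["Who"] @ h_prev + p["bo"])        # output gate
    c_tilde = np.tanh(p["Wxc"] @ x + p["Whc"] @ h_prev + p["bc"])  # candidate cell
    c = f * c_prev + i * c_tilde          # keep part of the memory, add part of the input
    h = np.tanh(c) * o                    # expose part of the memory as output
    return h, c

rng = np.random.default_rng(2)
d_x, d_h = 3, 4
p = {k: rng.normal(size=(d_h, d_x if k[1] == "x" else d_h)) * 0.1
     for k in ("Wxi", "Whi", "Wxf", "Whf", "Wxo", "Who", "Wxc", "Whc")}
p.update({k: np.zeros(d_h) for k in ("bi", "bf", "bo", "bc")})
h, c = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h), p)
print(h.shape, c.shape)   # (4,) (4,)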

19 Deep Structured Prediction in Handwriting Recognition 19/44 LSTM vs RNN Unfolded Figure: Unfolded RNN (top) and LSTM (bottom), from C. Olah's blog.

20 Deep Structured Prediction in Handwriting Recognition 20/44 LSTM unfolded, from C. Olah's blog (a) State flow. (b) Forget gate: $f^{(t)} = \sigma(W_x^f x^{(t)} + W_h^f h^{(t-1)} + b_f)$. (c) Input gate: $i^{(t)} = \sigma(W_x^i x^{(t)} + W_h^i h^{(t-1)} + b_i)$. (d) State update: $c^{(t)} = f^{(t)} \odot c^{(t-1)} + i^{(t)} \odot \tilde{c}^{(t)}$. (e) Output: $h^{(t)} = \tanh(c^{(t)}) \odot o^{(t)}$.

21 Deep Structured Prediction in Handwriting Recognition 21/44 Further Comments on RNN Bidirectional (Bi-RNN): in sequences we can have one RNN running from start to end and another from end to beginning. Figure: bidirectional RNN over inputs $x^{(1)},\dots,x^{(6)}$ with forward states $h^{(t)}$ and backward states $\hat{h}^{(t)}$. Multidimensional: in images we may have one RNN starting from every corner, $h^{NW}_{i,j} = \mathrm{LSTM}\big(x_{i,j}, h^{NW}_{i,j-1}, h^{NW}_{i-1,j}, h^{NW}_{i-1,j\pm1}\big)$, $h^{NE}_{i,j} = \mathrm{LSTM}\big(x_{i,j}, h^{NE}_{i,j+1}, h^{NE}_{i-1,j}, h^{NE}_{i-1,j\pm1}\big)$, $h^{SW}_{i,j} = \mathrm{LSTM}\big(x_{i,j}, h^{SW}_{i,j-1}, h^{SW}_{i+1,j}, h^{SW}_{i+1,j\pm1}\big)$, $h^{SE}_{i,j} = \mathrm{LSTM}\big(x_{i,j}, h^{SE}_{i,j+1}, h^{SE}_{i+1,j}, h^{SE}_{i+1,j\pm1}\big)$. Stacking: we can have several RNNs arranged in several layers. See A. Karpathy's blog for amazing results with RNNs and also nice interpretations.
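
A minimal NumPy sketch of the bidirectional idea, assumed here for the 1-D case only (not the multidimensional LSTM used later): run one recurrence from start to end, another from end to beginning, and concatenate the two states at every position.

import numpy as np

def rnn(xs, W_h, W_x, b):
    h, hs = np.zeros(W_h.shape[0]), []
    for x in xs:
        h = np.tanh(b + W_h @ h + W_x @ x)
        hs.append(h)
    return np.array(hs)

def bidirectional_rnn(xs, fwd, bwd):
    h_fwd = rnn(xs, *fwd)                  # h^(1..T), start to end
    h_bwd = rnn(xs[::-1], *bwd)[::-1]      # hat-h^(1..T), end to beginning, re-aligned
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(3)
xs = rng.normal(size=(6, 3))               # T = 6 inputs of dimension 3
make = lambda: (rng.normal(size=(4, 4)) * 0.1, rng.normal(size=(4, 3)), np.zeros(4))
print(bidirectional_rnn(xs, make(), make()).shape)   # (6, 8): forward and backward states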

22 Connectionist Temporal Classification

23 Deep Structured Prediction in Handwriting Recognition 23/44 Labelling alignment is a problem with RNNs Since the RNN only outputs local classifications, a post-processing stage is required to give the final label sequence. Suppose we have the following image and we want to recognise the text. Must the training data be pre-segmented (i.e., locate where every letter is)? Labelling unsegmented sequence data, denoted temporal classification, is a well-known problem in real-world sequence learning. Suppose that we feed an RNN with the image above to provide 120 outputs, but we have 44 letters as labels: "monasteries, manors, townships, or wards and". In the training we need to translate from labels (44) to outputs of the RNN (120). We will need two temporal indices, u for labels and t for outputs of the RNN. Several consecutive output indices of the RNN correspond to the same label index.

24 Deep Structured Prediction in Handwriting Recognition 24/44 Introduction to CTC (I) The connectionist temporal classification (CTC) does not require pre-segmented training data, or external post-processing to extract the label sequence from the network outputs. It brought the phoneme error rate on TIMIT to a record value (Graves et al., ICASSP 2013). CTC models all aspects of the sequence with a single neural network.

25 Deep Structured Prediction in Handwriting Recognition 25/44 Introduction to CTC (II) We want something on top of an RNN that translates from probabilities for labels at T times of a sequence (image, audio, ...) into a sequence of U labels (letters, phonemes, ...). It avoids segmentation, i.e. providing the position of every label, u, in the training sequence (image, audio, ...). We want to use backpropagation to train the whole system.

26 Deep Structured Prediction in Handwriting Recognition 26/44 Notation $A$ is the label alphabet, e.g. characters in the Latin alphabet. $A' = A \cup \{\text{blank}\}$. $X = (x^{(1)}, x^{(2)}, \dots, x^{(T)})$ is the input sequence to the RNN. $Y = (y^{(1)}, y^{(2)}, \dots, y^{(T)})$ is the RNN output, where $y^{(t)}_k \in [0,1]$, $k = 1, \dots, |A'|$, and $\sum_{k=1}^{|A'|} y^{(t)}_k = 1$ (softmax layer at the RNN outputs). $y^{(t)}_k$ is the probability of observing label $k$ at time $t$. $\hat{L} = (\hat{l}_1, \hat{l}_2, \dots, \hat{l}_U)$ is the estimated output label sequence, $\hat{l}_j \in A$. Objective: the recurrent neural network is defined by some weights $w_i$. Determine a mapping $h(X) = \hat{L}$ from one sequence of length T to another of (unknown) length U, after training with a set of pairs $(X, L)$.

27 Deep Structured Prediction in Handwriting Recognition 27/44 From Network Outputs to Labellings Define the probability distribution over all possible paths $\pi$ in the set $A'^T$: $p(\pi|X) = \prod_{t=1}^{T} y^{(t)}_{\pi_t}$, $\pi \in A'^T$. RNN and CTC: the RNN computes the input-output (X-Y) response; we next build the CTC output on top of it. Different paths $\pi$ may provide the same output L due to repeated outputs and blanks. A many-to-one map $B: A'^T \to A^{\leq T}$ is defined: remove repeated labels, then remove blanks, e.g. $B(a\text{-}ab\text{-}) = B(\text{-}aa\text{-}abb) = aab$, with - the blank. Equivalently: output a new label when the network switches from predicting no label to predicting a label, or from predicting one label to another. For any $L \in A^U$ with $U \leq T$, $p(L|X) = \sum_{\pi \in B^{-1}(L)} p(\pi|X)$.
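
A small Python sketch of the map B and of p(π|X), using '-' for the blank; encoding paths as strings and the toy probabilities are illustrative assumptions:

import numpy as np
from itertools import groupby

def B(path, blank="-"):
    """Collapse a path: remove repeated labels, then remove blanks."""
    collapsed = [k for k, _ in groupby(path)]          # merge runs of equal symbols
    return "".join(k for k in collapsed if k != blank)

def path_prob(path, probs, alphabet):
    """p(pi|X) = prod_t y^(t)_{pi_t} for per-frame probabilities probs[t, k]."""
    return np.prod([probs[t, alphabet.index(k)] for t, k in enumerate(path)])

print(B("a-ab-"), B("-aa-abb"))              # both collapse to 'aab'

probs = np.array([[0.7, 0.3], [0.6, 0.4]])   # T = 2 frames over the alphabet ['-', 'A']
print(path_prob("-A", probs, ["-", "A"]))    # 0.7 * 0.4 = 0.28

p(L|X) is then the sum of path_prob over all paths in the pre-image B^{-1}(L).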

28 Deep Structured Prediction in Handwriting Recognition 28/44 From Network Outputs to Labellings In $p(L|X) = \sum_{\pi \in B^{-1}(L)} p(\pi|X)$, $L \in A^U$. Collapsing: the CTC collapses different paths into the same label sequence L, which makes it possible to use unsegmented data. Solution: the CTC solution is the most probable labelling of the input sequence, $h(X) = \arg\max_{L \in A^{\leq T}} p(L|X)$.

29 Deep Structured Prediction in Handwriting Recognition 29/44 Output Two different problems: Training: estimate the probability $p(L|X)$ for some L. Output: find the L providing the largest $p(L|X)$. Solutions to the CTC output: Solution 1, Best Path Decoding (trivial): let $\pi^* = \arg\max_{\pi} p(\pi|X)$ (the sequence of highest RNN outputs), then $\hat{L} \approx B(\pi^*)$. Solution 2, Prefix Search Decoding: based on dynamic programming to calculate the probabilities of successive extensions of labelling prefixes. Solution 2 is harder to solve. A forward-backward algorithm can be applied, but the complexity grows exponentially with the length of the input sequence, so heuristics are used: breaking the input into subsequences.
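
Solution 1 in code: take the per-frame argmax and collapse it with B. The alphabet, the blank symbol and the toy probabilities (which anticipate the example of the next slide) are assumptions:

import numpy as np
from itertools import groupby

def best_path_decode(probs, alphabet, blank="-"):
    """Greedy CTC decoding: pi* = per-frame argmax, then L ~ B(pi*)."""
    path = [alphabet[k] for k in probs.argmax(axis=1)]
    collapsed = [k for k, _ in groupby(path)]
    return "".join(k for k in collapsed if k != blank)

alphabet = ["-", "A"]
probs = np.array([[0.7, 0.3],      # two-frame example, see the next slide
                  [0.6, 0.4]])
print(repr(best_path_decode(probs, alphabet)))   # '': the best path is '--', which collapses to nothing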

30 Deep Structured Prediction in Handwriting Recognition 30/44 Notes on the best solution Example: suppose you have two consecutive time instants t and t+1 and you want to know the most probable label sequence when blank has probabilities $y^{(t)}_- = 0.7$ and $y^{(t+1)}_- = 0.6$, A has probabilities $y^{(t)}_A = 0.3$ and $y^{(t+1)}_A = 0.4$, and all other probabilities are zero. The most probable labelling is A because it adds the probabilities of the options -A, AA and A-, $0.7 \cdot 0.4 + 0.3 \cdot 0.4 + 0.3 \cdot 0.6 = 0.58$, compared to the probability of '- -', $0.7 \cdot 0.6 = 0.42$. Figure: A. Graves, Ph.D. 2008. Hence Solution 1 would provide the empty labelling (the best single path is '- -'), while Solution 2 provides A.

31 Deep Structured Prediction in Handwriting Recognition 31/44 Training Training: estimate the probability $p(L|X)$ for some given L. The problem is easier than computing the output: here we know L and its length. Note that the CTC itself has nothing to be trained: it allows for a translation from RNN outputs to labellings, and we discuss how we can train the RNN through it by backpropagation. The objective function uses the target labellings in the training set $(L,X)$: $\mathcal{L} = -\ln \prod_{(L,X)} p(L|X) = -\sum_{(L,X)} \ln p(L|X)$ (3). Given the derivatives with respect to the RNN outputs, the weight gradients of the RNN can be computed with standard backpropagation; for some training pair $(L,X)$, $\frac{\partial \ln p(L|X)}{\partial y^{(t)}_k} = \frac{1}{p(L|X)\, y^{(t)}_k} \sum_{u: l'_u = k} \alpha(t,u)\beta(t,u)$ (4). We need to efficiently compute $p(L|X)$: this is solved with a dynamic programming (forward-backward) algorithm that provides $\alpha(t,u)$ and $\beta(t,u)$.

32 Deep Structured Prediction in Handwriting Recognition 32/44 CTC Forward-Backward (I) Given L, we define $L'$ as L with blanks added to the beginning and end and inserted between every pair of labels. For $t = 1,\dots,T$ and $u = 1,\dots,2U+1$ we define the following two sets of paths and accumulated probabilities: $V(t,u) = \{\pi \in A'^{t} : B(\pi) = L_{1:\lfloor u/2 \rfloor},\ \pi_t = l'_u\}$ with $\alpha(t,u) = \sum_{\pi \in V(t,u)} \prod_{i=1}^{t} y^{(i)}_{\pi_i}$, and $W(t,u) = \{\pi \in A'^{T-t} : B(\hat{\pi} + \pi) = L\ \ \forall \hat{\pi} \in V(t,u)\}$ with $\beta(t,u) = \sum_{\pi \in W(t,u)} \prod_{i=t}^{T} y^{(i)}_{\pi_i}$. Figure: A. Graves et al., Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks (2006).

33 Deep Structured Prediction in Handwriting Recognition 33/44 CTC Forward-Backward (II) Both can be computed recursively from previous values in t and u: $\alpha(t,u) = f\big(\alpha(t-1,u),\ \alpha(t-1,u-1),\ \alpha(t-1,u-2)\big)$, $\beta(t,u) = g\big(\beta(t+1,u),\ \beta(t+1,u+1),\ \beta(t+1,u+2)\big)$, and $p(L|X) = \alpha(T,2U) + \alpha(T,2U+1)$.
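
A NumPy sketch of the α recursion and of $p(L|X) = \alpha(T,2U) + \alpha(T,2U+1)$; real implementations work in log space, and the function and variable names below are assumptions:

import numpy as np

def ctc_forward(probs, label, blank=0):
    """p(L|X) via the CTC forward variables alpha(t, u).

    probs: (T, K) per-frame label probabilities (RNN softmax outputs).
    label: list of label indices (no blanks), length U.
    """
    T, _ = probs.shape
    ext = [blank]                       # extended label L': blanks around every label
    for l in label:
        ext += [l, blank]
    S = len(ext)                        # S = 2U + 1

    alpha = np.zeros((T, S))
    alpha[0, 0] = probs[0, blank]       # start in the initial blank ...
    alpha[0, 1] = probs[0, ext[1]]      # ... or in the first label

    for t in range(1, T):
        for u in range(S):
            a = alpha[t - 1, u] + (alpha[t - 1, u - 1] if u >= 1 else 0.0)
            # Skipping the in-between blank is allowed only between different labels.
            if u >= 2 and ext[u] != blank and ext[u] != ext[u - 2]:
                a += alpha[t - 1, u - 2]
            alpha[t, u] = a * probs[t, ext[u]]

    return alpha[T - 1, S - 1] + alpha[T - 1, S - 2]   # end in the final blank or the last label

# Two-frame example of slide 30: blank is index 0, 'A' is index 1.
probs = np.array([[0.7, 0.3], [0.6, 0.4]])
print(ctc_forward(probs, [1]))          # 0.58 = p('A'|X), versus 0.42 for the empty labelling

On the slide-30 example this returns 0.58, i.e. exactly the sum over the paths -A, AA and A-.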

34 Line Recognition

35 Deep Structured Prediction in Handwriting Recognition 35/44 IAM lines database English, 300 dpi, PNG, 256 grey levels. 5545 training lines, 616 validation lines, 2772 test lines. We measure the CER (character error rate, %).

36 Deep Structured Prediction in Handwriting Recognition 36/44 IAM line recognition TensorFlow has the LSTM, Bi-LSTM and CTC implemented, but not the MDLSTM. We use the framework implemented by researchers at the University of Aachen (Germany), where the MDLSTM layer is implemented in CUDA. Programming the MDLSTM may run into memory problems (even on GPUs). Following state-of-the-art works we train a solution with 3 and 5 layers (Conv+MDLSTM), with an architecture of the type: Figure: structure adapted from Pham et al. 2014.

37 Deep Structured Prediction in Handwriting Recognition 37/44 IAM line recognition In particular, for the intermediate layers we use the following (5-layer option), with dropout (forward) of 25%, Adam as optimizer and minibatches of 10 images. Figure: structure following V. Paul et al., Int. Conf. on Frontiers in Handwriting Recognition.
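
The MDLSTM layer is not available in stock TensorFlow, but the overall wiring of such a pipeline (convolutional features, a recurrent layer over the width axis, a softmax over A ∪ {blank}, and the CTC loss) can be sketched with standard layers. Everything below is an illustrative assumption rather than the Aachen framework of the slides: the image size, the alphabet size, a bidirectional 1-D LSTM standing in for the MDLSTM, and the use of tf.nn.ctc_loss.

import tensorflow as tf

NUM_CLASSES = 80                     # |A| + 1 including the blank (assumed alphabet size)
HEIGHT, WIDTH = 64, 1024             # assumed size of a normalised line image

def build_model():
    img = tf.keras.Input(shape=(HEIGHT, WIDTH, 1))            # grey-level line image
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(img)
    x = tf.keras.layers.MaxPooling2D(2)(x)                    # (H/2, W/2, 32)
    x = tf.keras.layers.Permute((2, 1, 3))(x)                 # width first: it becomes time
    x = tf.keras.layers.Reshape((WIDTH // 2, (HEIGHT // 2) * 32))(x)
    x = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True))(x)  # stand-in for the MDLSTM
    logits = tf.keras.layers.Dense(NUM_CLASSES)(x)            # per-frame scores over A'
    return tf.keras.Model(img, logits)

def ctc_loss(labels, logits, label_length, logit_length):
    # tf.nn.ctc_loss sums p(pi|X) over B^{-1}(L); logits are batch-major here.
    return tf.reduce_mean(tf.nn.ctc_loss(
        labels=labels, logits=logits,
        label_length=label_length, logit_length=logit_length,
        logits_time_major=False, blank_index=NUM_CLASSES - 1))

Training would then minimize this loss with Adam on minibatches of 10 line images, mirroring the setup described above.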

38 Deep Structured Prediction in Handwriting Recognition 38/44 IAM line recognition: CER Each epoch takes 1 h 50 min on a Tesla P-series GPU. It gets a CER of 5.7%. Note that no language rules have been used here.

39 Deep Structured Prediction in Handwriting Recognition 39/44 IAM line recognition: CTC CTC output after the transcription of a line of the test set.

40 Deep Structured Prediction in Handwriting Recognition 40/44 IAM line recognition: transcribed text

41 Conclusions

42 Deep Structured Prediction in Handwriting Recognition 42/44 Conclusions For structured inputs/outputs (handwriting, speech, ...) the LSTM is widely exploited. Temporal classification is a problem solved with CTC, with no need for a grammar or dictionary; these can later be used to further improve the solution. CNNs, LSTMs and the CTC are available in TensorFlow; the multidimensional LSTM is not (CUDA was used). Future? Current works on augmented RNNs include (see C. Olah's blog on this topic): Neural Turing Machines (Graves et al., Neural Turing Machines, 2014); attentional interfaces: attention has been included in Bluche 2016 before the CTC layer... THANK YOU for your attention

43 Readings

44 Deep Structured Prediction in Handwriting Recognition 44/44 Readings I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press 2016: Chapters 6 to 9 for concepts on deep learning, Chapter 10 in particular for recurrent networks (but see below for LSTM). M. Görner, TensorFlow and Deep Learning without a PhD: quick review of the main concepts and examples on DL. Z. C. Lipton, J. Berkowitz, C. Elkan, A Critical Review of Recurrent Neural Networks for Sequence Learning. K. Cho, Natural Language Understanding with Distributed Representation, 2016: for a detailed explanation of the LSTM. C. Olah, Understanding LSTM Networks: for a good explanation of the LSTM. A. Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Ph.D. thesis: for the explanation of the CTC. T. Bluche, Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition, NIPS 2016: state-of-the-art solution to the handwriting recognition problem.
