Chinese Character Handwriting Generation in TensorFlow


Heri Zhao, Jiayu Ye, Ke Xu

Abstract—Recurrent neural networks (RNNs) have proved successful in several sequence generation tasks. An RNN can produce a reasonable prediction of the next element of a sequence by remembering previous states. For example, trained on Wikipedia text, an RNN can predict the next word in a sentence or even generate new passages; trained on Linux kernel code, it can imitate C code. Generating realistic English handwriting strokes is also one of its demonstrated capabilities. We were therefore curious whether the same technique applies to Chinese characters, which involve far more distinct characters and more complicated stroke sequences. We experimented with a joint Mixture Density Network and RNN model to generate handwritten Chinese characters.

I. INTRODUCTION

Chinese character recognition has achieved great results with traditional convolutional neural networks[1], but drawing handwritten characters is a completely different task. Given the success of generating handwritten English text[2], we aim to craft a model for handwritten Chinese characters. As a writing system with thousands of years of history, Chinese poses a drastically different challenge from English. First, there are tens of thousands of characters in the dictionary, compared to only 26 letters in English. There are also written styles that diverge greatly from the canonical representation (the standard font), and each character contains many more strokes than an English letter.

Fortunately, the recurrent neural network (RNN) provides a generic model for sequential data. Researchers have used RNNs to generate fake Chinese characters[3], which shows great potential for generating real, human-recognizable characters. That model generates new strokes based on the state stored in the RNN and produces meaningful parts composed of a non-trivial number of strokes. A Mixture Density Network (MDN) is applied there to model the multi-valued outputs[4]. Other researchers have already proposed a framework to generate handwritten Chinese characters conditioned on a specific character[5]. That work presents a generative model that uses an RNN and a Gaussian Mixture Model (GMM) to predict the next pen positions and pen states; the GMM there is a modified version of the Mixture Density Network. Besides the primitive input strokes, the RNN is also conditioned on a condensed representation of each character, so that the system draws specific, human-recognizable characters.

Given these successful explorations, we experimented with a model that uses a generalized MDN to synthesize handwritten Chinese characters in TensorFlow[6]. The generative model is trained on KanjiVG[7] data with the goal of producing realistic strokes.

The rest of this paper is organized as follows. Section II introduces the dataset. Sections III and IV describe the baseline and oracle models. Section V describes the core generative RNN model. Section VI explains the evaluation metrics. Section VII compares the proposed model with state-of-the-art approaches. Section VIII reports experimental results.

II. DATASET AND REPRESENTATION

The dataset used for training is KanjiVG, where each Chinese character is represented as a variable-length sequence of points in the order of the pen movements:

[[x_1, y_1, eos_1, eoc_1], ..., [x_n, y_n, eos_n, eoc_n]]   (1)

x_i and y_i are the xy-coordinates of each point, eos_i (end of stroke) is a single bit indicating whether the current point is the last point of a stroke, and eoc_i (end of character) is a single bit indicating whether the character has ended.
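To make the representation concrete, the short Python sketch below encodes a hypothetical two-stroke character as a list of [x, y, eos, eoc] points; the coordinates are invented for illustration, and the division by 15 mirrors the coordinate scaling described below.

SCALE = 15.0  # scaling coefficient described in this section

# A made-up two-stroke character in the Eq. (1) representation.
char_points = [
    [54.0, 12.0, 0, 0],   # first point of stroke 1
    [54.0, 88.0, 1, 0],   # last point of stroke 1 (eos = 1)
    [20.0, 50.0, 0, 0],   # first point of stroke 2
    [88.0, 50.0, 1, 1],   # last point of stroke 2 and of the character (eoc = 1)
]

# Scale coordinates down before training; they are scaled back up for SVG output.
scaled = [[x / SCALE, y / SCALE, eos, eoc] for x, y, eos, eoc in char_points]
print(scaled)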

This representation is better than an image-based one, because the sequence contains not only the spatial information but also the temporal order of the strokes, which is very useful when generating characters later. Here is a visualized example of the character 卒, which contains 8 strokes:

Fig. 1: A visualized example with stroke orders

The xy-coordinates are scaled down by a coefficient of 15 to help fit the training parameters, and are scaled back up to visualize the result in SVG format.

III. BASELINE: ENHANCED N-GRAM

Using the given dataset, the baseline algorithm tries to predict the next stroke based on the current stroke, given all of the strokes of a character as prior knowledge. The cost depends not only on how frequently a (pre-stroke, post-stroke) combination occurs in the training set, but also on the distance between the previous stroke and the next stroke. The n-gram model is trained and validated on 185 random characters from the dataset.

A. Algorithm

The algorithm is a stroke-order search problem, which can be solved by any standard search algorithm:
- State: a tuple [previous stroke, list of strokes that have not yet been selected]
- Start state: the ground-truth first stroke
- End state: the stroke list is empty
- SuccAndCost: given a state, return (action, new state, cost) based on the bigram cost plus the distance cost (a sketch is given at the end of this section)

B. Distance Cost

A stroke distance cost is introduced in order to use spatial information when predicting the next stroke. The distance is simply the Manhattan distance between the center points of two strokes (s_1, s_2), weighted by a coefficient k:

DistanceCost = k · ManhattanDistance(s_1, s_2)   (2)

This cost takes spatial information into consideration and prefers the closest stroke when predicting, which is a very common stroke movement in Chinese.

C. Results

Applying uniform cost search to the stroke-reordering problem with the distance cost (k = 3), Table I shows the results of the baseline model when predicting characters with 3 to 7 strokes.

TABLE I: Baseline Results

#strokes in char | Random accuracy   | Bigram accuracy | Bigram w/ distance cost
3                | 1/(3-1)! = 50%    | 70.37%          | 85.19%
4                | 1/(4-1)! = 16.7%  | 29.63%          | 58.18%
5                | 1/(5-1)! = 4.16%  | 13.04%          | 22.06%
6                | 1/(6-1)! = 0.83%  | 1.8%            | 13.51%
7                | 1/(7-1)! = 0.001% | N/A             | 8.00%

More complicated characters with more than 7 strokes are not tested in this model, because the accuracy would be very close to the random accuracy, meaning the model would not give any meaningful predictions. The baseline model performs very well for small numbers of strokes, but underperforms as the number of strokes increases.

D. Baseline Take-Away

Based on the results of the baseline model, it is not very interesting to only predict the order of strokes. Section V therefore describes another method, which performs character generation instead of only prediction. The generative method not only provides the order of the strokes but also generates stylized characters. The focus of the rest of the report is on this generative model, and the evaluation likewise focuses on how similar the generated characters are to the original characters, instead of only on stroke order.
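To make the baseline concrete, here is a minimal Python sketch of the successor function, under the assumptions that each stroke is summarized by its center point and that the bigram cost is available as a precomputed lookup; the function and variable names (succ_and_cost, bigram_cost, and so on) are ours, not taken from the original implementation.

import numpy as np

K = 3  # distance-cost coefficient from Eq. (2)

def stroke_center(stroke):
    # Center point of a stroke given as a list of [x, y, eos, eoc] points.
    pts = np.asarray(stroke, dtype=float)[:, :2]
    return pts.mean(axis=0)

def manhattan(c1, c2):
    # Manhattan distance between two stroke centers.
    return abs(c1[0] - c2[0]) + abs(c1[1] - c2[1])

def succ_and_cost(state, bigram_cost):
    # state = (previous_stroke, remaining_strokes); bigram_cost is a
    # hypothetical lookup returning the frequency-based cost of writing
    # stroke `nxt` right after stroke `prev`.
    prev, remaining = state
    for i, nxt in enumerate(remaining):
        cost = bigram_cost(prev, nxt) + K * manhattan(stroke_center(prev),
                                                      stroke_center(nxt))
        new_state = (nxt, remaining[:i] + remaining[i + 1:])
        yield nxt, new_state, cost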

IV. ORACLE

For the problem of stroke prediction and character generation, the oracle is a human writer, which in this case amounts to the character dataset itself. For stroke-order prediction, the oracle takes all the stroke information from the dataset (a Chinese dictionary) and behaves like a dictionary lookup, cheating by looking at the correct answers. For handwriting generation, the oracle uses the characters in the dataset directly as output, without adding any variation. Therefore the oracle's accuracy is 100% for stroke prediction, and its loss is 0 for the RNN model. Fig. 2 shows the baseline and oracle results, as well as their comparison.

Fig. 2: Baseline/Oracle Comparison

V. CONDITIONAL GENERATIVE RECURRENT NEURAL NETWORK

A. Recurrent Neural Network (RNN)

A recurrent neural network is a general model for representing sequential data: the hidden state is fed back into the hidden node in a loop. At time t, the input x_t and the previous hidden state h_{t-1} are fed into the RNN to produce the output y_t and the new hidden state h_t. The hidden state h_t represents compressed information about the previous input sequence. However, it is well known that RNNs were hard to train until Long Short-Term Memory appeared[9].

B. Long Short-Term Memory (LSTM)

The intuition behind the LSTM is that a cell state C_t is added to the RNN flow to keep track of internal state. The state is controlled by three kinds of gates: the forget gate f_t, the input gate i_t, and the output gate o_t. As formally described in [9],

f_t = σ(W_f [h_{t-1}, x_t] + b_f)   (3)
i_t = σ(W_i [h_{t-1}, x_t] + b_i)   (4)
C̃_t = tanh(W_C [h_{t-1}, x_t] + b_C)   (5)
C_t = f_t ∗ C_{t-1} + i_t ∗ C̃_t   (6)
o_t = σ(W_o [h_{t-1}, x_t] + b_o)   (7)
h_t = o_t ∗ tanh(C_t)   (8)

All the W are weight matrices to be trained, as are the bias terms b.

C. Gated Recurrent Unit (GRU)

The GRU provides a simpler abstraction of the cell data and hidden state: it uses a single state that combines the roles of the output and forget gates. This simplified model nevertheless provides similar performance[10]. As formally described in [10],

z_t = σ(W_z [h_{t-1}, x_t])   (9)
r_t = σ(W_r [h_{t-1}, x_t])   (10)
h̃_t = tanh(W [r_t ∗ h_{t-1}, x_t])   (11)
h_t = (1 − z_t) ∗ h_{t-1} + z_t ∗ h̃_t   (12)

Given the complex input data representation described in Section II, we prefer to build on this simpler model. Applied to our data, x_t ∈ R^{1×4} is a single data entry with the position and eos/eoc information.
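The GRU update in Eqs. (9)-(12) can be written in a few lines of numpy; the following is a minimal single-step sketch with randomly initialized weights and toy dimensions, not the TensorFlow cell used in our experiments.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, W_z, W_r, W):
    # One GRU step following Eqs. (9)-(12): update gate z_t, reset gate r_t,
    # candidate state, and the new hidden state h_t.
    hx = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ hx)
    r_t = sigmoid(W_r @ hx)
    h_cand = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))
    return (1.0 - z_t) * h_prev + z_t * h_cand

# Toy sizes: a per-point input of size 4 ([x, y, eos, eoc]) and a hidden size of 8.
rng = np.random.default_rng(0)
H, D = 8, 4
W_z = rng.normal(size=(H, H + D))
W_r = rng.normal(size=(H, H + D))
W = rng.normal(size=(H, H + D))
h = gru_step(rng.normal(size=D), np.zeros(H), W_z, W_r, W)
print(h.shape)  # (8,)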

Fig. 3: A visualized example of a GRU unit, from [9]

D. Adjusted GRU with class information

Besides the input data, class information is also fed into the GRU. Currently a one-hot vector c ∈ R^{#classes} is used to let the RNN condition on the specific class. The modified version of the GRU is as follows:

z_t = σ(W_z [h_{t-1}, x_t] + M_z c)   (13)
r_t = σ(W_r [h_{t-1}, x_t] + M_r c)   (14)
h̃_t = tanh(W [r_t ∗ h_{t-1}, x_t])   (15)
h_t = (1 − z_t) ∗ h_{t-1} + z_t ∗ h̃_t   (16)

M_z and M_r are additional parameters tuned in the RNN so that its output depends on the character class.

E. Mixture Density Network

When writing a Chinese character, there is no single exact location for the next point given the previous sequence. The problem is therefore a multi-valued output problem on which a traditional feed-forward neural network would not perform well, which is where the Mixture Density Network comes into play[4]. At a high level, the MDN is a generic way of describing multi-valued data with a Gaussian mixture model whose parameters are produced by an embedded neural network. The inference is defined as the probability distribution of the target t conditioned on the input x (cf. Eq. (22) of [4]):

p(t | x) = Σ_{i=1}^{m} α_i(x) φ_i(t | x)   (17)

Here α_i represents the importance (mixing weight) of the i-th Gaussian, and φ_i is a Gaussian density. It is then the RNN's responsibility to generate the parameters µ and σ of the m Gaussian distributions. The training loss is the negative log of the overall likelihood (cf. Eq. (26) of [2]):

L(x) = − Σ_{t=1}^{T} log( Σ_j π_t^j N(x_{t+1} | µ_t^j, σ_t^j) )   (18)

Note that we drop the end-of-character and end-of-stroke terms from the loss to keep the model simple.
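As a concrete, simplified illustration of the loss in Eq. (18), the numpy sketch below evaluates the mixture negative log-likelihood of a batch of next points under isotropic 2-D Gaussians; the shapes and values are toy assumptions, and the actual model computes this loss symbolically inside the TensorFlow graph.

import numpy as np

def mdn_nll(pi, mu, sigma, x_next):
    # pi:     (T, M)    mixture weights per time step (each row sums to 1)
    # mu:     (T, M, 2) predicted means for the next (x, y) point
    # sigma:  (T, M)    isotropic standard deviations (> 0)
    # x_next: (T, 2)    ground-truth next points
    diff = x_next[:, None, :] - mu                    # (T, M, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)              # (T, M)
    gauss = np.exp(-sq_dist / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
    mix = np.sum(pi * gauss, axis=-1)                 # mixture likelihood per step
    return -np.sum(np.log(mix + 1e-8))                # Eq. (18), with a small epsilon

# Toy check with T = 5 time steps and M = 3 mixture components.
rng = np.random.default_rng(0)
T, M = 5, 3
pi = rng.dirichlet(np.ones(M), size=T)
mu = rng.normal(size=(T, M, 2))
sigma = np.abs(rng.normal(size=(T, M))) + 0.1
x_next = rng.normal(size=(T, 2))
print(mdn_nll(pi, mu, sigma, x_next))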

VI. EVALUATION METRICS

Due to the nature of a generative model, evaluation cannot fully capture the errors the model makes. We implemented a preliminary verification method that computes the total location difference of the strokes:

L(c) = log( Σ Δlocation )   (19)

Note that if the strokes are not in the right order, this location loss is also high. Furthermore, we use an existing Chinese character classifier to provide a more formal measure. Let c be the class; the classifier outputs a predicted class given a character image, and the error counts the classes whose generated image is misclassified:

L = Σ_c 1[Classifier(Image_c) ≠ c]   (20)

VII. LITERATURE REVIEW

This section compares the proposed approach with the state-of-the-art approach for English handwriting synthesis[2] and the state-of-the-art approach for Chinese character drawing[5]. We compare them in terms of dataset, model, and results.

A. Dataset

1) Compared with the English synthesis approach: Although the English approach reaches very high accuracy and performs very well on an English handwriting dataset, it is probably not directly applicable to Chinese characters. Chinese characters are far more complicated than English sentences, which are built from a fixed set of letters; each Chinese character is composed of multiple different strokes in different combinations. Besides, Chinese characters carry more spatial information, more like a graph than an ordered sequence of letters.

2) Compared with the Chinese drawing approach: The Chinese drawing approach uses an online handwritten Chinese character dataset containing more than two million training characters written by humans, with multiple samples (written by different people) for each character. Our approach, in contrast, uses a dataset of printed-style characters, where each character corresponds to only one sample. This makes training more difficult due to the dataset limitation. Fig. 4 shows the difference between the printed-style characters and the online handwritten characters.

Fig. 4: Comparison of datasets: (a) printed-style characters, (b) online handwritten characters [11]

B. Specific GMM and general MDN

As mentioned above, we employ the more generic MDN model with less complexity. Compared to the Chinese character drawing paper[5], we truncate the end-of-character and end-of-stroke data in the training phase. Also, instead of a dense character-embedding matrix, we use only a one-hot vector to represent the class information.

C. Inside the GRU or a filtering layer

In the sequence generation paper[2], the author proposes an additional filtering layer in the RNN so that the network is conditioned on a specific character. However, that mechanism targets cursive writing over a window of several English characters. To avoid this complication, we embed the one-hot class vector directly inside the RNN cell. Given that English sequence generation is a purely generative task, there is no direct evaluation metric to compare against.

D. Result comparison

The Chinese drawing paper reaches a mean accuracy of 93.98% over 3755 classes. Due to time constraints, we only experimented with 5 classes, which yields a mean accuracy of 14%.

VIII. EXPERIMENTS AND ANALYSIS

In this section, we present the experiments we ran and how we reached the final results by simplifying features and pruning the training set.

A. A taste of the full dataset

Initially, we used the full dataset, which contains more than 10,000 characters. Since we use a one-hot vector to represent the class information, this yields a huge vector and very poor results when sampling conditional characters: because the per-stroke data has only 4 or 5 dimensions (depending on whether the end-of-stroke and end-of-character bits are included), the massive class vector dilutes the weights assigned to the actual stroke input in the RNN. Hence we reduced the dataset to gain more intuition.

B. Results on a limited dataset

We then limited the dataset to only 5 characters to verify our ideas. This produces some occluded or prolonged samples that are human-recognizable to some extent. We believe this is because the dataset is polluted with images scaled without a fixed aspect ratio.
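For reference, "sampling" here means drawing the next pen position from the Gaussian mixture that the network outputs at each step; the sketch below shows one way to do this, with invented toy mixture parameters.

import numpy as np

def sample_next_point(pi, mu, sigma, rng):
    # pi: (M,) mixture weights, mu: (M, 2) means, sigma: (M,) std deviations.
    # Pick one Gaussian according to pi, then sample an (x, y) point from it.
    j = rng.choice(len(pi), p=pi)
    return rng.normal(loc=mu[j], scale=sigma[j], size=2)

# Toy mixture with M = 3 components.
rng = np.random.default_rng(0)
pi = np.array([0.2, 0.5, 0.3])
mu = np.array([[0.0, 1.0], [2.0, 2.0], [-1.0, 0.5]])
sigma = np.array([0.3, 0.1, 0.2])
print(sample_next_point(pi, mu, sigma, rng))  # next (x, y) to append to the sequence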

C. Removing training noise and modifying batching

We then looked into the data generation process: it generates randomly rescaled data to enlarge the number of training examples. Since we want to generate real Chinese characters comparable to the canonical representation, we removed this random sampling. Some inconsistent batching mechanisms were also removed, since we do not want the RNN to learn the transition from one character to another.

D. Removing eos and eoc

We removed the end-of-stroke and end-of-character bits from the training set, so that the model can focus purely on generating the next xy-coordinates. The model converges after around 8,000 batches and is finally able to generate human-recognizable characters. Fig. 5 shows 20 results purely generated by the RNN for the character 七.

Fig. 5: Results purely generated by the RNN

E. Training Loss and Explanation

Table II shows the training results for each of the different attempts.

TABLE II: Training Loss and Convergence

Experiment | Convergence | Training loss | Batches until convergence
Full       | No          | > 30,000      | N/A
Limited    | No          | > 30,000      | N/A
No eos&eoc | Yes         | 1.1           | 8,000

IX. CONCLUSIONS

This paper presents a joint Mixture Density Network and RNN model for generating handwritten Chinese characters. Although there is existing work on drawing characters, and considering all the difficulties stated above, we tried several different methods and show some preliminary results here. The final model only applies to a very limited dataset and set of character classes. There is still much future work to do on this interesting topic.

REFERENCES

[1] Y. H. Zhang, "Deep Convolutional Network for Handwritten Chinese Character Recognition," CS231n.
[2] A. Graves, "Generating Sequences With Recurrent Neural Networks," arXiv preprint, v5 [cs.NE], 5 Jun.
[3] S. Otoro, "Recurrent Net Dreams Up Fake Chinese Characters in Vector Format with TensorFlow," studio otoro.
[4] C. M. Bishop, "Mixture Density Networks," Neural Computing Research Group Report NCRG/94/004.
[5] X.-Y. Zhang, F. Yin, Y.-M. Zhang, C.-L. Liu, and Y. Bengio, "Drawing and Recognizing Chinese Characters with Recurrent Neural Network," arXiv preprint, v1 [cs.CV], 21 Jun 2016.
[6] Google Research, "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems."
[7] KanjiVG: a description of the sinographs (or kanji) used by the Japanese language.
[8] Colah, "Understanding LSTM Networks."
[9] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, 9(8), 1997.
[10] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling," arXiv preprint, v1 [cs.NE], 11 Dec 2014.
[11] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang, "CASIA Online and Offline Chinese Handwriting Databases," National Laboratory of Pattern Recognition (NLPR).
