Chinese Character Handwriting Generation in TensorFlow

Heri Zhao, Jiayu Ye, Ke Xu

Abstract: Recurrent neural networks (RNNs) have proved successful at several sequence generation tasks. By remembering previous states, an RNN can produce reasonable predictions of the next element of a sequence. For example, trained on Wikipedia text, an RNN can predict the next word in a sentence or even compose new passages; trained on Linux kernel source, it can imitate C code; and it can also generate realistic English handwriting strokes. We were therefore curious whether the same technique applies to Chinese characters, which involve a far larger character set and much more complicated stroke sequences. We experimented with a joint Mixture Density Network and RNN model to generate handwritten Chinese characters.

I. INTRODUCTION

Chinese character recognition has achieved strong results with convolutional neural networks [1], but drawing handwritten characters is a completely different task. Given the success of generating handwritten English text [2], we aim to build a model for handwritten Chinese characters. As a writing system with thousands of years of history, Chinese presents a drastically different challenge from English. First, its dictionary contains tens of thousands of characters, compared to only 26 letters in English. There are also written styles that diverge greatly from the canonical representation (the standard font), and each character contains many more strokes than an English letter.

Fortunately, the recurrent neural network (RNN) provides a generic model for sequential data. Researchers have used RNNs to generate fake Chinese characters [3], which shows great potential for generating real, human-recognizable characters. That model generates new strokes based on the state stored in the RNN and produces meaningful parts composed of a non-trivial number of strokes; a mixture density network (MDN) is applied to model the multi-valued outputs [4]. Other researchers have already proposed a framework that generates handwritten Chinese characters conditioned on a specific character [5]. Their generative model uses an RNN and a Gaussian Mixture Model (GMM) to predict the next pen positions and pen states, where the GMM is a modified version of the Mixture Density Network. Besides the raw input strokes, their RNN is also conditioned on a condensed representation of each character so that the system draws specific, human-recognizable characters.

Given these successful explorations, we experimented with a model that uses a generalized MDN to synthesize handwritten Chinese characters in TensorFlow [6]. The generative model is trained on KanjiVG [7] data with the goal of producing realistic strokes.

The rest of this paper is organized as follows. Section II introduces the dataset. Sections III and IV describe the baseline and oracle models. Section V describes the core generative RNN model. Section VI explains the evaluation metrics. Section VII compares the proposed model with state-of-the-art approaches. Section VIII reports experimental results.

II. DATASET AND REPRESENTATION

The dataset used for training is KanjiVG, where each Chinese character is represented as a variable-length sequence of points in the order of pen movements:

    [[x_1, y_1, eos_1, eoc_1], \ldots, [x_n, y_n, eos_n, eoc_n]]    (1)

Here x_i and y_i are the coordinates of each point, eos_i (end of stroke) is a single bit indicating whether the current point is the last point of a stroke, and eoc_i (end of character) is a single bit indicating whether the character ends at this point. This representation is preferable to an image-based one because the sequence contains not only the spatial information but also the temporal order of the strokes, which is very useful when generating characters later.

Fig. 1 shows a visualized example of the character 卒, which contains 8 strokes.

Fig. 1: A visualized example with stroke orders

The xy-coordinates are scaled down by a factor of 15 to help the training parameters fit, and scaled back up when the output is visualized in SVG format.
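To make the representation of Eq. (1) concrete, here is a minimal sketch (not the project's actual preprocessing code) of how a character could be encoded as rows of [x, y, eos, eoc]; the toy two-stroke character and the helper name encode_character are hypothetical, while the factor-of-15 scaling follows the description above.

```python
# Minimal sketch of the point-sequence encoding in Eq. (1).
# The stroke coordinates below are made up for illustration;
# KanjiVG characters would supply the real pen trajectories.

SCALE = 15.0  # coordinates are scaled down by a factor of 15 for training

def encode_character(strokes):
    """Convert a list of strokes (each a list of (x, y) points) into
    rows of [x, y, eos, eoc] as described in Section II."""
    points = []
    for s_idx, stroke in enumerate(strokes):
        for p_idx, (x, y) in enumerate(stroke):
            eos = 1 if p_idx == len(stroke) - 1 else 0           # end of stroke
            eoc = 1 if eos and s_idx == len(strokes) - 1 else 0  # end of character
            points.append([x / SCALE, y / SCALE, eos, eoc])
    return points

# A hypothetical two-stroke "character": a horizontal and a vertical line.
toy_strokes = [
    [(10, 50), (60, 50), (110, 50)],   # stroke 1
    [(60, 10), (60, 60), (60, 110)],   # stroke 2
]

for row in encode_character(toy_strokes):
    print(row)
```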

III. BASELINE: ENHANCED N-GRAM

Using the given dataset, the baseline algorithm tries to predict the next stroke from the current stroke, given all of the strokes of a character as prior knowledge. The cost depends not only on how frequently a (pre-stroke, post-stroke) combination occurs in the training set, but also on the distance between the previous stroke and the next stroke. The n-gram model is trained and validated on 185 random characters from the dataset.

A. Algorithm

The algorithm casts stroke ordering as a search problem, which can be solved by any standard search algorithm:

State: a pair [previous stroke, list of strokes not yet selected]
Start state: the ground-truth first stroke, with all remaining strokes unselected
End state: the list of remaining strokes is empty
SuccAndCost: given a state, return (action, new state, cost), where the cost is the bigram cost plus the distance cost

B. Distance Cost

A stroke distance cost is introduced to exploit spatial information when predicting the next stroke. The distance is simply the Manhattan distance between the center points of two strokes s_1 and s_2, weighted by a coefficient k:

    \text{DistanceCost} = k \cdot \text{ManhattanDistance}(s_1, s_2)    (2)

This cost takes spatial information into account and prefers the closest stroke when predicting, which is a very common stroke movement in Chinese handwriting. A sketch of the resulting search procedure is given below.
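As an illustration of Sections III-A and III-B, the following is a minimal uniform-cost-search sketch; the bigram_cost table, the penalty for unseen bigrams, and the stroke/center data structures are hypothetical stand-ins for the actual training pipeline, and only the k = 3 distance weighting is taken from the text above.

```python
import heapq
from itertools import count

K = 3.0  # distance-cost coefficient (k = 3 in the baseline experiments)

def manhattan(c1, c2):
    """Manhattan distance between two stroke center points."""
    return abs(c1[0] - c2[0]) + abs(c1[1] - c2[1])

def order_strokes(stroke_ids, centers, bigram_cost, first):
    """Uniform cost search over stroke orderings (Section III-A).

    stroke_ids:  list of stroke identifiers for one character
    centers:     dict id -> (x, y) stroke center point
    bigram_cost: dict (prev_id, next_id) -> cost from the bigram statistics
    first:       ground-truth first stroke id (the start state)
    """
    tie = count()  # tie-breaker so the heap never has to compare states
    start = (first, frozenset(s for s in stroke_ids if s != first))
    frontier = [(0.0, next(tie), start, [first])]
    seen = set()
    while frontier:
        cost, _, (prev, remaining), order = heapq.heappop(frontier)
        if not remaining:                # end state: every stroke has been placed
            return order, cost
        if (prev, remaining) in seen:
            continue
        seen.add((prev, remaining))
        for nxt in remaining:            # SuccAndCost: bigram cost + distance cost
            step = bigram_cost.get((prev, nxt), 10.0)  # penalty for unseen bigrams (assumed)
            step += K * manhattan(centers[prev], centers[nxt])
            heapq.heappush(frontier,
                           (cost + step, next(tie), (nxt, remaining - {nxt}), order + [nxt]))
    return None, float("inf")
```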

C. Results

Applying uniform cost search to the stroke-reordering problem with the distance cost (k = 3), Table I shows the baseline results for characters with 3 to 7 strokes.

#Strokes | #Correct | #Total | Random Accuracy   | Bigram Accuracy | Bigram w/ Distance Cost
3        | 23       | 27     | 1/(3-1)! = 50%    | 70.37%          | 85.19%
4        | 32       | 55     | 1/(4-1)! = 16.7%  | 29.63%          | 58.18%
5        | 15       | 68     | 1/(5-1)! = 4.16%  | 13.04%          | 22.06%
6        | 15       | 111    | 1/(6-1)! = 0.83%  | 1.8%            | 13.51%
7        | 10       | 125    | 1/(7-1)! = 0.14%  | N/A             | 8.00%

TABLE I: Baseline Results

More complicated characters with more than 7 strokes were not tested with this model, because their accuracy would be very close to the random accuracy, meaning the model would not give any meaningful predictions. The baseline model performs very well for small numbers of strokes, but degrades quickly as the number of strokes increases.

D. Baseline Take-Away

Based on these results, only predicting the order of strokes is not very interesting. Section V therefore describes a different method that performs character generation instead of prediction only. The generative method not only provides the order of the strokes but also generates stylized characters. The rest of this report focuses on the generative model, and the evaluation likewise focuses on how similar the generated characters are to the original characters rather than only on stroke order.

IV. ORACLE

For both stroke prediction and character generation, the oracle is a human being, which in this case amounts to the character dataset itself. For stroke-order prediction, the oracle takes all stroke information from the dataset (a Chinese dictionary) and behaves like a dictionary lookup, effectively peeking at the correct answers. For handwriting generation, the oracle outputs the dataset characters directly, without adding any variation. Therefore the oracle's accuracy is 100% for stroke prediction, and its loss is 0 for the RNN model. Fig. 2 shows the baseline and oracle results, as well as their comparison.

Fig. 2: Baseline/Oracle Comparison

V. CONDITIONAL GENERATIVE RECURRENT NEURAL NETWORK

A. Recurrent Neural Network (RNN)

A recurrent neural network is a general model for sequential data: the hidden state is fed back into the hidden node in a loop. At time t, the input x_t and the previous hidden state h_{t-1} are fed into the RNN to produce the output y_t and the new hidden state h_t. The hidden state h_t represents compressed information about the previous input sequence. However, plain RNNs were notoriously hard to train until Long Short-Term Memory appeared [9].

B. Long Short-Term Memory (LSTM)

The intuition behind the LSTM is that a cell state C_t is added to the RNN flow to keep track of internal state. The state is controlled by three kinds of gates: the forget gate f_t, the input gate i_t, and the output gate o_t. As formally described in [9],

    f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)    (3)
    i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)    (4)
    \tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)    (5)
    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t    (6)
    o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)    (7)
    h_t = o_t \odot \tanh(C_t)    (8)

All the W are weight matrices to be trained, as are the bias terms b.

C. Gated Recurrent Unit (GRU)

The GRU provides a simpler way to handle the cell data and hidden state: it uses a single state that combines the roles of the output and forget gates. Despite the simplification, it provides similar performance [10]. As described in [10],

    z_t = \sigma(W_z [h_{t-1}, x_t])    (9)
    r_t = \sigma(W_r [h_{t-1}, x_t])    (10)
    \tilde{h}_t = \tanh(W [r_t \odot h_{t-1}, x_t])    (11)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (12)

Given the complex input data representation described in Section II, we prefer to modify this simpler model for our task. Applied to our data, x_t \in \mathbb{R}^{1 \times 4} is a single data entry with the position and eos/eoc information.

Fig. 3: A visualized example of a GRU unit, from [9]

D. Adjusted GRU with Class Information

Besides the input data, class information is also fed into the GRU. Currently a one-hot vector c \in \mathbb{R}^{\#\text{classes}} is used to condition the RNN on a specific class. The modified GRU is as follows:

    z_t = \sigma(W_z [h_{t-1}, x_t] + M_z c)    (13)
    r_t = \sigma(W_r [h_{t-1}, x_t] + M_r c)    (14)
    \tilde{h}_t = \tanh(W [r_t \odot h_{t-1}, x_t])    (15)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (16)

M_z and M_r are additional parameters to be tuned so that the output of the RNN depends on the character class.
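To illustrate Equations (13)-(16), here is a minimal NumPy sketch of one step of the class-conditioned GRU cell; the layer sizes, initialization, and class/helper names are assumptions made for the example, and the real model is built on TensorFlow's RNN machinery rather than hand-written NumPy.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class ClassConditionedGRUCell:
    """One step of the adjusted GRU of Eqs. (13)-(16), written out in NumPy.
    Sizes and random initialization are illustrative only."""

    def __init__(self, input_size=4, hidden_size=128, num_classes=5, rng=None):
        rng = rng or np.random.default_rng(0)
        cat = input_size + hidden_size          # width of the [h_{t-1}, x_t] concatenation
        scale = 0.1
        self.W_z = rng.normal(0, scale, (hidden_size, cat))
        self.W_r = rng.normal(0, scale, (hidden_size, cat))
        self.W   = rng.normal(0, scale, (hidden_size, cat))
        self.M_z = rng.normal(0, scale, (hidden_size, num_classes))
        self.M_r = rng.normal(0, scale, (hidden_size, num_classes))

    def step(self, x_t, h_prev, c):
        """x_t: (input_size,) stroke point, h_prev: (hidden_size,), c: one-hot (num_classes,)."""
        hx = np.concatenate([h_prev, x_t])
        z = sigmoid(self.W_z @ hx + self.M_z @ c)                       # Eq. (13)
        r = sigmoid(self.W_r @ hx + self.M_r @ c)                       # Eq. (14)
        h_tilde = np.tanh(self.W @ np.concatenate([r * h_prev, x_t]))   # Eq. (15)
        return (1.0 - z) * h_prev + z * h_tilde                         # Eq. (16)

# Usage: feed one [x, y, eos, eoc] point conditioned on class 2 of 5.
cell = ClassConditionedGRUCell()
h = np.zeros(128)
c = np.eye(5)[2]
h = cell.step(np.array([0.7, 3.3, 0.0, 0.0]), h, c)
```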

E. Mixture Density Network

When writing a Chinese character, we do not care about the exact location of the next point given the previous sequence; the task is a multi-valued output problem at which a traditional feed-forward network does not perform well. This is where the Mixture Density Network comes in [4]. At a high level, the MDN is a generic way of describing multi-valued data with a Gaussian mixture whose parameters are produced by a neural network. The inference is defined as the probability distribution of the target t conditioned on the input x (cf. Eq. (22) of [4]):

    p(t \mid x) = \sum_{i=1}^{m} \alpha_i(x)\, \phi_i(t \mid x)    (17)

Here \alpha_i represents the mixing weight of the i-th Gaussian component and \phi_i is a Gaussian density. It is then the RNN's responsibility to generate the parameters \mu and \sigma for the m Gaussian components. The training loss is defined as the negative log of the overall likelihood (cf. Eq. (26) of [2]):

    L(x) = -\sum_{t=1}^{T} \log\Big(\sum_{j} \pi_t^j\, \mathcal{N}(x_{t+1} \mid \mu_t^j, \sigma_t^j)\Big)    (18)

Note that we truncate the end-of-character and end-of-stroke terms of the loss to keep the model simple.
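The following is a small sketch of the mixture loss of Equation (18) for 2-D pen offsets, assuming the network emits per-step mixture weights, means, and isotropic standard deviations; this tensor layout and the isotropic-sigma simplification are assumptions for illustration, not necessarily the exact formulation used in our TensorFlow implementation.

```python
import numpy as np

def mdn_negative_log_likelihood(pi, mu, sigma, targets):
    """Negative log-likelihood of Eq. (18) for a single sequence.

    pi:      (T, m)    mixture weights per time step (rows sum to 1)
    mu:      (T, m, 2) component means for the next (x, y) point
    sigma:   (T, m)    per-component isotropic std deviations (assumed form)
    targets: (T, 2)    the actual next points x_{t+1}
    """
    diff = targets[:, None, :] - mu                    # (T, m, 2)
    sq_dist = np.sum(diff ** 2, axis=-1)               # (T, m)
    # Isotropic 2-D Gaussian density N(x_{t+1} | mu, sigma^2 I)
    comp = np.exp(-0.5 * sq_dist / sigma ** 2) / (2.0 * np.pi * sigma ** 2)
    mixture = np.sum(pi * comp, axis=-1)               # sum over components j
    return -np.sum(np.log(mixture + 1e-8))             # sum of -log terms over t

# Tiny usage example with made-up parameters for T = 2 steps, m = 3 components.
T, m = 2, 3
pi = np.full((T, m), 1.0 / m)
mu = np.zeros((T, m, 2))
sigma = np.ones((T, m))
targets = np.array([[0.1, -0.2], [0.0, 0.3]])
print(mdn_negative_log_likelihood(pi, mu, sigma, targets))
```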

VI. EVALUATION METRICS

Because of the generative nature of the model, evaluation cannot fully capture the errors it makes. We implemented a preliminary verification method that computes the total location difference of each stroke:

    L(c) = \log\Big(\sum \Delta\text{location}\Big)    (19)

Note that if the strokes are generated in the wrong order, this location loss remains high. Furthermore, we use an existing Chinese character classifier to provide a more formal measure. Let c be the class; the classifier outputs a class \hat{c} given a character image. The error over all classes is then

    L(\text{all } c) = \sum_{c} \mathbf{1}\big[\hat{c} \neq c\big], \quad \hat{c} = \text{Classifier}(\text{Image}_c)    (20)
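A minimal sketch of these two checks is shown below; it assumes the generated and reference characters are available as index-aligned lists of stroke center points and that an external classifier is available as a callable from image to predicted class, both of which are illustrative assumptions rather than the exact evaluation code.

```python
import numpy as np

def location_loss(generated_strokes, reference_strokes):
    """Eq. (19): log of the total location difference between matched strokes.
    Each argument is a list of (x, y) stroke center points, index-aligned."""
    total = sum(abs(gx - rx) + abs(gy - ry)
                for (gx, gy), (rx, ry) in zip(generated_strokes, reference_strokes))
    return np.log(total + 1e-8)

def classifier_error(images_by_class, classifier):
    """Eq. (20): count how many generated character images an external
    classifier assigns to the wrong class. `classifier` is assumed to be a
    callable mapping an image to a predicted class label."""
    return sum(1 for true_class, image in images_by_class.items()
               if classifier(image) != true_class)
```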

VII. LITERATURE REVIEW

This section compares the proposed approach with the state-of-the-art English handwriting synthesis approach [2] and the state-of-the-art Chinese character drawing approach [5], in terms of dataset, model, and results.

A. Dataset

1) Compared with the English synthesis approach: Although the English approach reaches very high accuracy and performs very well on an English handwriting dataset, it is probably not directly applicable to Chinese characters. Chinese characters are far more complicated than English sentences, which are built from a fixed set of letters; each Chinese character is composed of multiple different strokes in different combinations. Besides, Chinese characters carry more spatial information, like a graph, rather than forming an ordered letter sequence.

2) Compared with the Chinese drawing approach: The Chinese drawing approach uses an online handwritten Chinese character dataset that contains more than two million training characters written by humans, with multiple samples (written by different people) for each character. Our approach instead uses a dataset of printed-style characters, where each character corresponds to only one sample. This makes training more difficult because of the dataset limitation. Fig. 4 shows the difference between the printed-style characters and the online handwritten characters.

Fig. 4: Comparison of datasets. (a) Printed-style characters. (b) Online handwritten characters [11].

B. Specific GMM vs. General MDN

As mentioned above, we employ the more generic MDN model with less complexity. Compared to the Chinese character drawing paper [5], we truncate the end-of-character and end-of-stroke data in the training phase. Also, instead of using a dense character embedding matrix, we use only a one-hot vector to represent the class information.

C. Inside the GRU vs. a Filtering Layer

In the sequence generation paper [2], the author proposes an additional filtering layer in the RNN so that the network is conditioned on a specific character. However, that mechanism targets cursive writing over a window of several English characters. To avoid this complication, we embed the one-hot class vector directly inside the RNN cell. Given that English sequence generation is a purely generative task, there is no direct evaluation metric to compare against.

D. Result Comparison

The Chinese paper reaches a mean accuracy of 93.98% over 3,755 classes. Due to time constraints, we only experimented with 5 classes, which yields a mean accuracy of 14%.

VIII. EXPERIMENTS AND ANALYSIS

In this section we present our experiments and describe how we reached the final results by simplifying features and pruning the training set.

A. Tasting the Full Dataset

Initially, we used the full dataset, which contains more than 10,000 characters. Since we use a one-hot vector to represent the class information, this produces a huge vector and very poor results when sampling conditioned characters: because each stroke point has only 4 or 5 values (depending on whether end-of-stroke and end-of-character are included), the massive class vector dilutes the weight given to the actual stroke inputs in the RNN. Hence we reduced the dataset to gain more intuition.

B. Results on a Limited Dataset

We then limited the dataset to just 5 characters to verify our ideas. This produces some occluded or stretched samples that are human-recognizable to some extent. We believe this is because the dataset is polluted by images scaled without a fixed aspect ratio.

C. Removing Training Noise and Modifying Batching

Looking into the data generation process, we found that it produces randomly scaled data to enlarge the number of training examples. Since we want to generate Chinese characters comparable to the canonical representation, we removed this random sampling. We also removed an inconsistent batching mechanism, since we do not want the RNN to learn the transition from one character to another.

D. Removing eos and eoc

We removed the end-of-stroke and end-of-character bits from the training set so that the model can focus solely on generating the next xy-coordinates. The model converges after around 8,000 batches and is finally able to generate human-recognizable characters. Fig. 5 shows 20 results generated purely by the RNN for the character 七.

Fig. 5: Results purely generated by RNN

E. Training Loss and Explanation

Table II shows the training results for each of the attempts.

Experiment  | Convergence | Training Loss | Batches until Convergence
Full        | No          | > 30,000      | N/A
Limited     | No          | > 30,000      | N/A
No eos&eoc  | Yes         | 1.1           | 8,000

TABLE II: Training Loss and Convergence

IX. CONCLUSIONS

This paper presents a joint Mixture Density Network and RNN model for generating handwritten Chinese characters. Although there is existing work on drawing characters, and considering all the difficulties stated above, we tried several different methods and showed some preliminary results. The final model only applies to a very limited dataset and set of character classes. There is still much future work to do on this interesting topic.

REFERENCES

[1] Y. H. Zhang, "Deep Convolutional Network for Handwritten Chinese Character Recognition," CS231n course project.
[2] A. Graves, "Generating Sequences With Recurrent Neural Networks," arXiv:1308.0850v5 [cs.NE], 5 Jun 2014.
[3] D. Ha, "Recurrent Net Dreams Up Fake Chinese Characters in Vector Format with TensorFlow," Studio Otoro blog.
[4] C. M. Bishop, "Mixture Density Networks," Neural Computing Research Group Report NCRG/94/004.
[5] X.-Y. Zhang, F. Yin, Y.-M. Zhang, C.-L. Liu, and Y. Bengio, "Drawing and Recognizing Chinese Characters with Recurrent Neural Network," arXiv:1606.06539v1 [cs.CV], 21 Jun 2016.
[6] Google Research, "TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems."
[7] KanjiVG: A description of the sinographs (or kanji) used by the Japanese language.
[8] C. Olah, "Understanding LSTM Networks."
[9] S. Hochreiter and J. Schmidhuber, "Long Short-Term Memory," Neural Computation, 9(8):1735-1780, 1997.
[10] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, "Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling," arXiv:1412.3555v1 [cs.NE], 11 Dec 2014.
[11] C.-L. Liu, F. Yin, D.-H. Wang, and Q.-F. Wang, "CASIA Online and Offline Chinese Handwriting Databases," National Laboratory of Pattern Recognition (NLPR).