Deep Learning in Natural Language Processing

Yuta Tsuboi (IBM Research – Tokyo, 5-6-52 Toyosu, Koto-ku, Tokyo 135-8511; yutat@jp.ibm.com)


1. Introduction

[The introductory prose was in Japanese and is lost in this extraction. In outline, it reviews the recent success of deep learning (Deep Learning) in fields such as image recognition and motivates its application to natural language processing (Natural Language Processing; NLP).]

2. Word Representations

In the classical bag-of-words (Bag of words) view, a word is encoded as a one-hot vector: exactly one component is 1 and all the others are 0. Every pair of distinct words is then equally dissimilar, and a phrase can only be represented as the sum of its parts, even when, as with "White House", its meaning is not. Distributed representations instead encode each word as a dense real-valued vector, much as a color is encoded by its RGB components rather than by a separate symbol per color, so that similar words can receive nearby vectors. A further argument for depth is representational efficiency: deep architectures can express some functions exponentially more compactly than shallow ones [1].
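As a concrete illustration (not from the original article), the Python/NumPy sketch below contrasts a one-hot encoding with a dense embedding lookup; the four-word vocabulary, the dimension d = 3, and the random matrix E stand in for quantities that would be learned.

```python
import numpy as np

vocab = {"white": 0, "house": 1, "neural": 2, "networks": 3}
V, d = len(vocab), 3  # vocabulary size, embedding dimension (illustrative)

def one_hot(word):
    """Bag-of-words style encoding: all distinct words are equally distant."""
    v = np.zeros(V)
    v[vocab[word]] = 1.0
    return v

# A distributed representation: each word is a dense vector,
# analogous to encoding a color by its RGB components.
rng = np.random.default_rng(0)
E = rng.normal(size=(V, d))  # embedding matrix (would be learned in practice)

def embed(word):
    return E[vocab[word]]

print(one_hot("white") @ one_hot("house"))  # 0.0: one-hot vectors carry no similarity
print(embed("white") @ embed("house"))      # real-valued: similarity can be learned
```

Because the one-hot dot product is zero for every pair of distinct words, no notion of similarity can be read off it; the dense vectors have no such limitation.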

Figure 2 contrasts these representations on the example sentence "Neural networks improve the accuracy.": the bag of words discards word order entirely, whereas a syntactic analysis distinguishes the noun phrase (NP) from the verb phrase (VP), and dense vectors encode each word the way RGB values encode a color. [The figures themselves are not recoverable from the extraction.] Deep neural networks over such dense representations, which brought breakthrough accuracy to image recognition [2], are the subject of the remainder of this article.

3. Recurrent Neural Networks

3.1 Basic RNNs

A recurrent neural network (Recurrent Neural Networks; RNN) maps an input sequence X = (x_1, x_2, \dots, x_{T_X}) of length T_X to an output sequence Y = (y_1, y_2, \dots, y_{T_Y}). An L-layer RNN computes

    \hat{y}_t = o(W_o h_t^L),
    h_t^l = f^l(W^l [h_t^{l-1}; h_{t-1}^l]), \quad \{l \mid 1 \le l \le L\},

where h_t^l \in \mathbb{R}^{N_l} is the hidden state of layer l at time t, the input enters as h_t^0 = x_t, and [\cdot\,;\cdot] denotes vector concatenation. The weight matrices W^l \in \mathbb{R}^{N_l \times (N_l + N_{l-1})} and W_o together form the parameters \theta; f^l is a nonlinear activation function and o is the output function. The boundary state h_0^l is fixed, for example to the zero vector. Because h_{t-1} summarizes the entire input prefix, an RNN can condition on unboundedly long history, the property exploited by RNN language models [3]. Unrolled over time, the recurrence reuses the same weights W^l at every step t.
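Rendered as code, the single-layer case (L = 1) of these equations looks as follows. This is an illustrative NumPy sketch, not code from the article; tanh for f and softmax for o are assumed (common) choices, and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
N_in, N_h, N_out, T = 4, 5, 3, 6   # illustrative sizes

W   = rng.normal(scale=0.1, size=(N_h, N_in + N_h))  # W^1
W_o = rng.normal(scale=0.1, size=(N_out, N_h))       # output weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

X = rng.normal(size=(T, N_in))   # input sequence x_1 .. x_T
h = np.zeros(N_h)                # boundary state h_0 = 0

for t in range(T):
    # h_t = f(W [x_t; h_{t-1}]), with f = tanh
    h = np.tanh(W @ np.concatenate([X[t], h]))
    y_hat = softmax(W_o @ h)     # \hat{y}_t = o(W_o h_t)
    print(t, y_hat.round(3))
```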

Word representations learned by an RNN language model exhibit striking linear regularities: in [4], the vector arithmetic king − man + woman lands near the vector for queen (Figure 2). [The plot itself is not recoverable.]

A plain RNN reads its input from t = 1 to t = T_X, so h_t^l depends only on x_1, ..., x_t. A bidirectional (Bi-directional) RNN adds a second chain run in the reverse direction, so that every position sees the whole input:

    \hat{y}_t = o(W_o [\overrightarrow{h}_t^L; \overleftarrow{h}_t^L]),
    \overrightarrow{h}_t^l = f^l(\overrightarrow{W}^l [\overrightarrow{h}_t^{l-1}; \overleftarrow{h}_t^{l-1}; \overrightarrow{h}_{t-1}^l]),
    \overleftarrow{h}_t^l = f^l(\overleftarrow{W}^l [\overrightarrow{h}_t^{l-1}; \overleftarrow{h}_t^{l-1}; \overleftarrow{h}_{t+1}^l]),

where \overrightarrow{h}, \overrightarrow{W} belong to the forward RNN and \overleftarrow{h}, \overleftarrow{W} to the backward RNN, and the boundary state \overleftarrow{h}_{T_X+1}^l is fixed analogously to h_0^l. The two chains are combined at the output; bidirectional RNNs have been used, for example, for translation modeling [5].

3.2 Encoder–Decoder Models

As formulated above, an RNN emits one output per input position and so assumes T_X = T_Y; it cannot directly handle tasks where the lengths differ (T_X ≠ T_Y), machine translation being the prime example. Encoder–decoder models [6–8] therefore use two RNNs (Figure 3): an encoder RNN that reads the input sequence, and a decoder RNN that generates the output sequence from the encoder's states. The decoder 1) emits a symbol at each step t, 2) feeds it back as its next input, and stops once it emits the special end-of-sentence symbol (End of Sentence; EOS), so the output length need not match the input length.
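To make the encoder–decoder loop concrete, here is a minimal, untrained NumPy sketch of greedy decoding. It is a hedged illustration, not the exact models of [6–8]: the vocabulary size, the shared embedding matrix E, the reuse of the EOS id as the start symbol, and the max_len cutoff are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, V, EOS = 8, 5, 0              # state size, vocab size, EOS token id (assumptions)
W_enc = rng.normal(scale=0.1, size=(d, d + d))
W_dec = rng.normal(scale=0.1, size=(d, d + d))
E     = rng.normal(scale=0.1, size=(V, d))   # shared embeddings (an assumption)
W_o   = rng.normal(scale=0.1, size=(V, d))

def step(W, x, h):
    """One RNN step: h' = tanh(W [x; h])."""
    return np.tanh(W @ np.concatenate([x, h]))

def translate(src_ids, max_len=10):
    # 1) Encoder: fold the source sentence into a single state.
    h = np.zeros(d)
    for i in src_ids:
        h = step(W_enc, E[i], h)
    # 2) Decoder: emit greedily, feeding each output back in, until EOS.
    out, prev = [], EOS           # start symbol (a BOS token is also common)
    for _ in range(max_len):
        h = step(W_dec, E[prev], h)
        prev = int(np.argmax(W_o @ h))
        if prev == EOS:
            break
        out.append(prev)
    return out

print(translate([3, 1, 4, 2]))   # untrained weights, so the output is arbitrary
```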

Long-range dependencies make plain RNNs hard to train, so in practice the recurrence is usually implemented with gated units, the Long Short-Term Memory (LSTM) [9] or the Gated Recurrent Unit (GRU) [7], which add multiplicative gates that control what is kept, updated, and emitted at each step. Figure 3 shows three encoder–decoder variants. [The figure itself is not recoverable.] In Figure 3(a), following [6], the encoder's final state initializes the decoder, which then generates from a beginning-of-sentence (BOS) symbol onward. In Figure 3(b), following [7], the encoder's summary vector is additionally fed to the decoder at every step t. In Figure 3(c), following [8], the decoder instead computes at each step t a weighted average of all encoder states, an attention mechanism that learns which input positions matter for the current output position. The same framework extends beyond translation: with an image-processing CNN as the encoder it yields caption generators [10, 11], and with linearized parse trees as the output language it performs syntactic parsing [12].
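The attention step of Figure 3(c) is easy to state in code. The sketch below is a simplification, not the exact model of [8]: it scores each encoder state by a plain dot product with the decoder state, whereas [8] scores each pair with a small feedforward network; the sizes and the function name attend are illustrative.

```python
import numpy as np

def attend(enc_states, dec_state):
    """Attention as a weighted sum of encoder states.

    enc_states: (T_X, d) array of encoder hidden states.
    dec_state:  (d,) current decoder state.
    """
    scores = enc_states @ dec_state          # one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax: weights sum to 1
    context = weights @ enc_states           # weighted sum, shape (d,)
    return context, weights

rng = np.random.default_rng(2)
H = rng.normal(size=(6, 4))                  # 6 encoder states, d = 4
s = rng.normal(size=4)
context, w = attend(H, s)
print(w.round(3), context.round(3))          # w shows which inputs are attended to
```

The context vector is then fed into the decoder step alongside the previous output, so each target position can draw on a different part of the source sentence.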
4. Convolutional Neural Networks

A convolutional neural network (Convolutional Neural Networks; CNN) applies one shared linear map to every window of w consecutive positions:

    z_t^l = W^l [h_{t-\lfloor w/2 \rfloor}^{l-1}; \cdots; h_t^{l-1}; \cdots; h_{t+\lfloor w/2 \rfloor}^{l-1}],

again with h_t^0 = x_t (Figure 4). Because the number of window vectors z_t^l grows with the sequence length, a fixed-size representation is obtained by max-pooling over time, taking each coordinate's maximum across all positions:

    h_i^l = \max\left(\{z_{t,i}^l\}_{t=1}^{T}\right),

where z_{t,i}^l denotes the i-th component of z_t^l. CNNs of this form were applied to core NLP tasks such as tagging in [13], to relation classification in [14], and to single-relation question answering in [15]. They are also effective for sentence classification [16] and, applied at the character level, for part-of-speech tagging [17] and sentiment analysis of short texts [18]. The network of [19] replaces the single maximum with k-max-pooling, which keeps the k largest values per coordinate in their original order, so that several salient positions, e.g. both halves of a pattern such as "with ... that", can survive pooling. Compared with the RNNs of Section 3, a CNN layer sees only a bounded context window, but it is easy to parallelize; [20] combines the two, encoding the source sentence with a CNN and generating the translation with an RNN decoder.
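A minimal NumPy rendering of the two equations above, with illustrative sizes (T = 7, window w = 3); the zero-padding at the sentence boundaries is an implementation detail the article does not specify.

```python
import numpy as np

rng = np.random.default_rng(3)
T, d_in, d_out, w = 7, 4, 5, 3               # illustrative sizes; window width w = 3

W = rng.normal(scale=0.1, size=(d_out, w * d_in))
X = rng.normal(size=(T, d_in))               # h_t^0 = x_t

# Convolution: the same W is applied to every window of w positions
# (zero-padded at the boundaries so every t has a full window).
pad = w // 2
Xp = np.vstack([np.zeros((pad, d_in)), X, np.zeros((pad, d_in))])
Z = np.stack([W @ Xp[t:t + w].reshape(-1) for t in range(T)])  # (T, d_out)

# Max-pooling over time: coordinate-wise maximum across positions,
# h_i = max_t z_{t,i}, yielding a fixed-size vector for any T.
h = Z.max(axis=0)
print(Z.shape, h.shape)  # (7, 5) (5,)
```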
5. Recursive Neural Networks

Recursive neural networks (Recursive Neural Networks) generalize the chain structure of the RNNs of Section 3 to trees and, more generally, to directed acyclic graphs (Directed Acyclic Graph; DAG) [21] (Figure 5). At each internal node of a binary parse tree, the representations h_L and h_R of the left and right children are composed into the parent representation

    h = f(W [h_L; h_R]),

so that a vector is built bottom-up for every phrase and finally for the whole sentence. Recursive networks have been used for sentiment analysis over parse trees [22], factoid question answering [23], and grounding sentences in images [24]; stacking several such composition layers yields deep recursive networks [25] (Figure 6). [Figures 5 and 6 are not recoverable from the extraction.]
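The bottom-up composition is a short recursion in code. The sketch below is illustrative rather than any of the models of [21–25]: the toy binary parse, the random leaf embeddings, and tanh for f are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 4
W = rng.normal(scale=0.5, size=(d, 2 * d))   # shared composition weights

def compose(h_left, h_right):
    """Parent vector h = f(W [h_L; h_R]), with f = tanh."""
    return np.tanh(W @ np.concatenate([h_left, h_right]))

def encode(tree, leaf_vecs):
    """Recursively encode a binary parse tree.

    tree: either a leaf word (str) or a pair (left_subtree, right_subtree).
    leaf_vecs: dict mapping words to their embedding vectors.
    """
    if isinstance(tree, str):
        return leaf_vecs[tree]
    left, right = tree
    return compose(encode(left, leaf_vecs), encode(right, leaf_vecs))

words = {w: rng.normal(size=d) for w in ["neural", "networks", "improve", "accuracy"]}
# ((neural networks) (improve accuracy)): a toy binary parse of the example sentence
tree = (("neural", "networks"), ("improve", "accuracy"))
print(encode(tree, words).round(3))
```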
6. Feedforward Neural Networks

When the input can be packed into a single fixed-size vector, for example the concatenated embeddings of a fixed window of words, an ordinary feedforward network (Feedforward Neural Networks) suffices:

    \hat{y} = o(W_o h^L), \qquad h^l = f^l(W^l h^{l-1}).

Window-based feedforward networks have been used for part-of-speech tagging [26, 27]; the neural probabilistic language model of [28] is of this form, as are the joint translation models of [29] and the fast, accurate dependency parser of [30].
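These two equations amount to a few matrix products. The following self-contained NumPy sketch uses illustrative layer sizes; the tanh activation and softmax output are assumptions in the spirit of [26–28], not details from the article.

```python
import numpy as np

rng = np.random.default_rng(5)
layer_sizes = [12, 8, 8]          # N_0 (input) and two hidden layers (illustrative)
n_out = 3                         # number of output classes (illustrative)

Ws  = [rng.normal(scale=0.3, size=(m, n))
       for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]  # W^1 .. W^L
W_o = rng.normal(scale=0.3, size=(n_out, layer_sizes[-1]))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x):
    h = x                         # h^0 = x, e.g. concatenated window embeddings
    for W in Ws:
        h = np.tanh(W @ h)        # h^l = f(W^l h^{l-1}), f = tanh here
    return softmax(W_o @ h)       # \hat{y} = o(W_o h^L), o = softmax here

x = rng.normal(size=layer_sizes[0])
print(forward(x).round(3))        # a probability distribution over 3 classes
```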
7. Concluding Remarks

This article has surveyed the main neural architectures used in NLP: recurrent networks (Section 3), convolutional networks (Section 4), recursive networks (Section 5), and feedforward networks (Section 6). [The closing discussion is largely lost in extraction; it cited the fast feature-map approximations of [31, 32] and referred readers to the Japanese-language surveys [33] and [34].]

References

[1] O. Delalleau and Y. Bengio, Shallow vs. deep sum-product networks, In Advances in Neural Information Processing Systems (NIPS), pp. 666–674, 2011.
[2] A. Krizhevsky, I. Sutskever and G. E. Hinton, ImageNet classification with deep convolutional neural networks, In Advances in Neural Information Processing Systems (NIPS), pp. 1097–1105, 2012.
[3] T. Mikolov, M. Karafiát, L. Burget, J. Černocký and S. Khudanpur, Recurrent neural network based language model, In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), pp. 1045–1048, 2010.
[4] T. Mikolov, W. Yih and G. Zweig, Linguistic regularities in continuous space word representations, In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pp. 746–751, 2013.
[5] M. Sundermeyer, T. Alkhouli, J. Wuebker and H. Ney, Translation modeling with bidirectional recurrent neural networks, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 14–25, 2014.
[6] I. Sutskever, O. Vinyals and Q. V. Le, Sequence to sequence learning with neural networks, In Advances in Neural Information Processing Systems (NIPS), pp. 3104–3112, 2014.
[7] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk and Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724–1734, 2014.
[8] D. Bahdanau, K. Cho and Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv:1409.0473, 2014.
[9] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation, 9(8), pp. 1735–1780, 1997.
[10] O. Vinyals, A. Toshev, S. Bengio and D. Erhan, Show and tell: A neural image caption generator, arXiv:1411.4555, 2014.
[11] A. Karpathy and L. Fei-Fei, Deep visual-semantic alignments for generating image descriptions, arXiv:1412.2306, 2014.
[12] O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever and G. Hinton, Grammar as a foreign language, arXiv:1412.7449, 2014.
[13] R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu and P. P. Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research, 12, pp. 2493–2537, 2011.
[14] D. Zeng, G. Zhou and J. Zhao, Relation classification via convolutional deep neural network, In Proceedings of the International Conference on Computational Linguistics (COLING), pp. 2335–2344, 2014.
[15] W. Yih, X. He and C. Meek, Semantic parsing for single-relation question answering, In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 643–648, 2014.
[16] Y. Kim, Convolutional neural networks for sentence classification, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751, 2014.
[17] C. N. dos Santos and B. Zadrozny, Learning character-level representations for part-of-speech tagging, In Proceedings of the International Conference on Machine Learning (ICML), pp. 1818–1826, 2014.

[18] C. N. dos Santos and M. Gatti, Deep convolutional neural networks for sentiment analysis of short texts, In Proceedings of the International Conference on Computational Linguistics (COLING), pp. 69–78, 2014.
[19] N. Kalchbrenner, E. Grefenstette and P. Blunsom, A convolutional neural network for modelling sentences, In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 655–665, 2014.
[20] N. Kalchbrenner and P. Blunsom, Recurrent continuous translation models, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1700–1709, 2013.
[21] R. Socher, Recursive deep learning for natural language processing and computer vision, PhD thesis, Stanford University, 2014.
[22] R. Socher, A. Perelygin, J. Wu, J. Chuang, C. Manning, A. Ng and C. Potts, Recursive deep models for semantic compositionality over a sentiment treebank, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1631–1642, 2013.
[23] M. Iyyer, J. Boyd-Graber, L. Claudino, R. Socher and H. Daumé III, A neural network for factoid question answering over paragraphs, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 633–644, 2014.
[24] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning and A. Y. Ng, Grounded compositional semantics for finding and describing images with sentences, Transactions of the Association for Computational Linguistics, 2, pp. 207–218, 2014.
[25] O. Irsoy and C. Cardie, Deep recursive neural networks for compositionality in language, In Advances in Neural Information Processing Systems (NIPS), pp. 2096–2104, 2014.
[26] J. Ma, Y. Zhang, T. Xiao and J. Zhu, Tagging the web: Building a robust web tagger with neural network, In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 144–154, 2014.
[27] Y. Tsuboi, Neural networks leverage corpus-wide information for part-of-speech tagging, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 938–950, 2014.
[28] Y. Bengio, R. Ducharme, P. Vincent and C. Janvin, A neural probabilistic language model, Journal of Machine Learning Research, 3, pp. 1137–1155, 2003.
[29] J. Devlin, R. Zbib, Z. Huang, T. Lamar, R. Schwartz and J. Makhoul, Fast and robust neural network joint models for statistical machine translation, In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pp. 1370–1380, 2014.
[30] D. Chen and C. Manning, A fast and accurate dependency parser using neural networks, In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 740–750, 2014.
[31] Q. V. Le, T. Sarlós and A. J. Smola, Fastfood: Computing Hilbert space expansions in loglinear time, In Proceedings of the International Conference on Machine Learning (ICML), pp. 244–252, 2013.
[32] N. Pham and R. Pagh, Fast and scalable polynomial kernels via explicit feature maps, In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 239–247, 2013.
[33] D. Bollegala, [Japanese-language article; title lost in extraction], 29(2), pp. 195–201.
[34] [Japanese-language article; author and title lost in extraction], 19(3), pp. 8–9.