
Word Embedding

1. Introduction

1.1 About BEDORE

The author is with BEDORE (2-35-10, 4F, 113-0033; katayama@bedore.jp), where word embeddings are used in natural-language processing work.

1.2 What is an embedding?

An embedding is a mapping from discrete symbols, such as words, to dense real-valued vectors. A word embedding therefore assigns every word in the vocabulary a point in a continuous vector space, so that relations between words, such as similarity, can be expressed as geometric relations between vectors.

Word embeddings attracted wide attention when Word2Vec was proposed by Mikolov et al. in 2013 [1]. Word2Vec learns vectors in which semantic regularities appear as simple arithmetic, the best-known example being queen − woman + man ≈ king (Figure 1). Section 2 of this article reviews Word2Vec and the embedding methods that have followed it since 2013, and Section 3 discusses how to use such embeddings in practice.

1.3 Embeddings beyond words

Not only words but also larger units such as sentences and documents can be embedded. Skip-thought Vectors [2], for example, use a Recurrent Neural Network (RNN) to learn sentence vectors in a spirit similar to Word2Vec. The rest of this article, however, concentrates on word embeddings.

2. How word embeddings are obtained

2.1 One-hot vectors and bag-of-words

The simplest way to turn a word into a vector is the one-hot representation: each word in the vocabulary is assigned a vector whose dimensionality equals the vocabulary size and which contains a single 1. Summing the one-hot vectors of the words in a sentence or document gives its bag-of-words (BoW) vector. Tables 1 and 2 show the one-hot vector of every token and the resulting BoW vector for the sentences "The pen is mightier than the sword" and "I wear my pen as others do their sword" (tokens are lemmatized). One-hot and BoW vectors are easy to build, but they are high-dimensional and sparse, and every pair of distinct words is equally dissimilar, so they carry no information about word meaning.

Table 1: One-hot vector of each token and the BoW vector for "The pen is mightier than the sword".

  word     w1 w2 w3 w4 w5 w6 w7 | BoW
  the       1  0  0  0  0  1  0 |  2
  pen       0  1  0  0  0  0  0 |  1
  be        0  0  1  0  0  0  0 |  1
  mighty    0  0  0  1  0  0  0 |  1
  than      0  0  0  0  1  0  0 |  1
  sword     0  0  0  0  0  0  1 |  1
  I         0  0  0  0  0  0  0 |  0
  wear      0  0  0  0  0  0  0 |  0
  my        0  0  0  0  0  0  0 |  0
  as        0  0  0  0  0  0  0 |  0
  other     0  0  0  0  0  0  0 |  0
  do        0  0  0  0  0  0  0 |  0
  their     0  0  0  0  0  0  0 |  0

Table 2: One-hot vector of each token and the BoW vector for "I wear my pen as others do their sword".

  word     w1 w2 w3 w4 w5 w6 w7 w8 w9 | BoW
  the       0  0  0  0  0  0  0  0  0 |  0
  pen       0  0  0  1  0  0  0  0  0 |  1
  be        0  0  0  0  0  0  0  0  0 |  0
  mighty    0  0  0  0  0  0  0  0  0 |  0
  than      0  0  0  0  0  0  0  0  0 |  0
  sword     0  0  0  0  0  0  0  0  1 |  1
  I         1  0  0  0  0  0  0  0  0 |  1
  wear      0  1  0  0  0  0  0  0  0 |  1
  my        0  0  1  0  0  0  0  0  0 |  1
  as        0  0  0  0  1  0  0  0  0 |  1
  other     0  0  0  0  0  1  0  0  0 |  1
  do        0  0  0  0  0  0  1  0  0 |  1
  their     0  0  0  0  0  0  0  1  0 |  1

2.2 The distributional hypothesis

The starting point for going beyond one-hot vectors is the distributional hypothesis: words that occur in similar contexts tend to have similar meanings [3]. Lexical resources built on this idea, such as those distributed through NICT's ALAGIN forum, are also available [4]. Methods that exploit the hypothesis are commonly divided into count-based and predictive approaches [5]. Count-based methods start from a corpus such as Wikipedia and build a word-context matrix X whose entry X_ij records how often word i occurs with context j, where a context is, for example, the surrounding words within a fixed n-gram window. With v vocabulary words and c context types, X is a v × c matrix.
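To make these representations concrete, here is a minimal Python sketch (not taken from the article) that builds the one-hot and BoW vectors of Tables 1 and 2 and a small count-based word-word co-occurrence matrix. The lemmatized token lists and the window size of 2 are illustrative choices.

```python
import numpy as np

# Lemmatized tokens of the two example sentences from Tables 1 and 2.
sent1 = ["the", "pen", "be", "mighty", "than", "the", "sword"]
sent2 = ["i", "wear", "my", "pen", "as", "other", "do", "their", "sword"]

vocab = sorted(set(sent1) | set(sent2))
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """One-hot vector of a single word over the shared vocabulary."""
    v = np.zeros(len(vocab), dtype=int)
    v[index[word]] = 1
    return v

def bag_of_words(tokens):
    """BoW vector = sum of the one-hot vectors of the tokens."""
    return sum(one_hot(w) for w in tokens)

print(dict(zip(vocab, bag_of_words(sent1).tolist())))

# Count-based starting point: a word-word co-occurrence matrix X that counts
# pairs appearing within a window of +/-2 tokens.
window = 2
X = np.zeros((len(vocab), len(vocab)), dtype=int)
for tokens in (sent1, sent2):
    for t, w in enumerate(tokens):
        left = tokens[max(0, t - window):t]
        right = tokens[t + 1:t + 1 + window]
        for u in left + right:
            X[index[w], index[u]] += 1
```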

The two families differ in how they turn co-occurrence information into vectors. In Hinton's terminology, a one-hot vector is a local representation, in which each word corresponds to exactly one dimension, whereas an embedding is a distributed representation, in which meaning is spread over many dimensions. Relatedly, the vectors produced by count-based methods are often called distributional representations, while those produced by predictive methods are called distributed representations [6].

2.2.1 LSI

A representative count-based method is Latent Semantic Indexing (LSI) [7]. Starting from the matrix X above (or from a word-document matrix), whose entry X_ij is the co-occurrence count of word i and context j, LSI computes a rank-r approximation X_r of X by truncated singular value decomposition,

$$X \approx X_r = U \Sigma V^{\top},$$

where U is a v × r matrix, V is a c × r matrix, and Σ is an r × r diagonal matrix holding the r largest singular values. The i-th row u_i of U can be used as an r-dimensional representation of word i, and the j-th row v_j of V as a representation of context (or document) j.

LSI has probabilistic descendants: probabilistic LSI (pLSI) [8] and Latent Dirichlet Allocation (LDA) [9]. With m latent topics, pLSI assumes the generative process (i) choose a document d, (ii) choose a topic given d, and (iii) choose a word given the topic. The resulting word-topic and document-topic distributions play roles analogous to U and V in LSI, and the word-topic distribution gives each word an m-dimensional representation. LDA extends pLSI by placing Dirichlet priors on these distributions.
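The LSI step itself is a few lines of linear algebra. The following sketch illustrates it under stated assumptions: a small random count matrix stands in for a real word-context matrix X, and NumPy's SVD is truncated to rank r to obtain r-dimensional word and context vectors.

```python
import numpy as np

# Illustrative LSI sketch (not the article's code): factorize a small
# word-context count matrix X (v x c) with a rank-r truncated SVD.
rng = np.random.default_rng(0)
v, c, r = 13, 20, 5                      # vocabulary size, context count, target rank
X = rng.poisson(0.3, size=(v, c)).astype(float)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
U_r, S_r, Vt_r = U[:, :r], S[:r], Vt[:r, :]

word_vectors = U_r * S_r                 # r-dimensional representation of word i (row u_i)
context_vectors = Vt_r.T * S_r           # r-dimensional representation of context j (row v_j)

def cosine(a, b):
    """Cosine similarity of two vectors in the latent space."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

print(cosine(word_vectors[0], word_vectors[1]))
```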

2.2.2 Word2Vec

The representative predictive method is Word2Vec, proposed by Mikolov et al. in 2013 [1]; here we focus on the Skip-gram with Negative Sampling (SGNS) variant. Given a training corpus w_1, ..., w_T and a context window of size c, Skip-gram maximizes the average log-probability of the surrounding words given the center word,

$$\frac{1}{T}\sum_{t=1}^{T}\ \sum_{-c \le j \le c,\; j \ne 0}\log P(w_{t+j}\mid w_t). \qquad (1)$$

Each word w has two vectors, an input vector v_w and an output vector v'_w; libraries such as the Python package gensim [10] return the input vectors as the word embeddings. The conditional probability in (1) is modeled with a softmax over the whole vocabulary of size V,

$$P(w_O \mid w_I)=\frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{V}\exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}. \qquad (2)$$

Skip-gram can be viewed as a two-layer neural network. A layer computes f(Wx + b) for an input x, a weight matrix W, a bias b and an activation f; here the biases are zero. The input is the one-hot vector of a word w_k, so multiplying by the first weight matrix W, whose rows are the input vectors, simply selects the hidden vector h = v_{w_k}. The second layer multiplies h by a matrix W' whose columns are the output vectors and applies the softmax function, softmax(x)_i = exp(x_i) / Σ_j exp(x_j), over the V outputs. The l-th output is therefore

$$g(w_k)_l=\frac{\exp\!\left({v'_{w_l}}^{\top} v_{w_k}\right)}{\sum_{w=1}^{V}\exp\!\left({v'_{w}}^{\top} v_{w_k}\right)},$$

which coincides with P(w_O | w_I) in (2) when w_k = w_I and w_l = w_O.

Evaluating this softmax requires a sum over all V words and is expensive for realistic vocabularies. Negative Sampling [11] avoids it: with the sigmoid σ(x) = 1/(1 + exp(−x)), the term log P(w_O | w_I) is replaced by

$$\log\sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right)+\sum_{i=1}^{k}\mathbb{E}_{w_i\sim P_n(w)}\!\left[\log\sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right], \qquad (3)$$

where P_n(w) is a noise distribution, in practice the unigram distribution raised to the 3/4 power, and k is the number of negative samples, typically on the order of 5 to 20. Instead of normalizing over the whole vocabulary, each observed word-context pair is contrasted with only k sampled negative words.
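In practice SGNS is rarely implemented by hand; the gensim package mentioned above [10] provides it directly. The sketch below shows a minimal training call (parameter names follow gensim 4.x; the two-sentence corpus and the hyperparameter values are placeholders, not settings from the article).

```python
from gensim.models import Word2Vec

# Toy corpus: lists of tokens, one list per sentence.
corpus = [
    ["the", "pen", "be", "mighty", "than", "the", "sword"],
    ["i", "wear", "my", "pen", "as", "other", "do", "their", "sword"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=300,   # dimensionality of v_w
    window=5,          # context size c in Eq. (1)
    sg=1,              # 1 = Skip-gram (0 = CBOW)
    negative=5,        # k negative samples per positive pair, Eq. (3)
    ns_exponent=0.75,  # P_n(w): unigram distribution raised to the 3/4 power
    min_count=1,
)

vec = model.wv["pen"]                       # input vector v_w of a word
print(model.wv.most_similar("pen", topn=3)) # nearest neighbours by cosine similarity
```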

2.3 Developments since Word2Vec

Word2Vec triggered a wave of follow-up work, and by 2017 a number of embedding methods building on or reinterpreting it had appeared. This section introduces representative ones.

2.3.1 GloVe

GloVe, proposed in 2014 [12], combines the count-based and predictive viewpoints. Let X_ij be the number of times word w_j occurs within a window of word w_i. GloVe fits word vectors and biases by minimizing the weighted least-squares objective

$$\sum_{i,j=1}^{V} f(X_{ij})\left({v'_{w_j}}^{\top} v_{w_i}+b_i+b'_j-\log X_{ij}\right)^{2},$$

where the b terms are biases, each word has an input and an output vector as in Word2Vec, and f is a weighting function that damps the influence of very frequent and very rare co-occurrences.

2.3.2 SGNS as shifted PMI

Levy and Goldberg [13] showed that SGNS is closely related to count-based methods: it implicitly factorizes a word-context matrix whose entries are the pointwise mutual information,

$$\mathrm{PMI}(x,y)=\log\frac{P(x,y)}{P(x)\,P(y)},$$

shifted by a constant that depends on the number of negative samples (the "shifted PMI"). This result connects the predictive and count-based families, and embedding methods that explicitly factorize PMI-based matrices, such as LexVec [14], have been built on top of it.

2.3.3 fastText

fastText, released by Facebook in 2016 [15], enriches Skip-gram with sub-word information. Each word is decomposed into character n-grams (sub-words), typically of length 3 to 6, in addition to the word itself. With boundary markers, the sub-words of "egg" are the 3-grams <eg, egg, gg>, the 4-grams <egg and egg>, and the 5-gram <egg>. The vector of a word is the sum of the vectors of its sub-words, so morphologically related words share parameters and even unseen words can be assigned a vector. For example, for the query "english-born", fastText returns "british-born" and "polish-born" as nearest neighbours, whereas plain Skip-gram returns less related words such as "most-capped" and "ex-scotland".
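The sub-word decomposition is easy to reproduce. The following sketch extracts character n-grams with boundary markers in the way described above; it is an illustration, not the fastText implementation.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams (sub-words) of a word with boundary markers '<' and '>'."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        grams += [marked[i:i + n] for i in range(len(marked) - n + 1)]
    return grams

# For "egg": ['<eg', 'egg', 'gg>', '<egg', 'egg>', '<egg>']
print(char_ngrams("egg", 3, 6))

# In fastText the vector of a word is the sum of the vectors of its sub-words
# (plus the word itself), so rare or unseen words still get a representation.
```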

2.3.4 Character-based embeddings

One can go below sub-words and embed individual characters, feeding the character sequence of a word or sentence to a neural network such as an RNN (character-based embedding). For English this is straightforward: the roughly 10^2 characters of ASCII can be represented as 128-dimensional one-hot vectors. For languages such as Japanese, whose character inventory is far larger, character-based approaches are also attractive because they avoid explicit word segmentation with tools such as MeCab; one proposal renders each character as an image and learns character representations, including the compositionality of visually similar characters, with a Convolutional Neural Network [16].

2.3.5 Combining and extending word embeddings

Several lines of work combine existing embeddings or enrich them with external resources. Multiple distributed representations of the same word, for example embeddings trained with different methods or corpora, can be combined into a single representation [17], and meta-embeddings [18] learn such a combination from several pretrained embedding sets. Lexical resources such as WordNet can also be exploited: AutoExtend [19] extends word embeddings such as Word2Vec to embeddings of WordNet synsets and lexemes, and Poincaré Embeddings [20] embed hierarchical structures such as the WordNet hypernym hierarchy into hyperbolic space.

3. Using word embeddings in practice

3.1 Preprocessing Japanese text

To train word embeddings for Japanese, for example on Wikipedia with gensim, the text must first be segmented into words. Morphological analyzers such as ChaSen, JUMAN and MeCab are commonly used. With MeCab, the choice of dictionary matters: the standard ipadic dictionary can be replaced by mecab-ipadic-neologd [21], which adds large numbers of neologisms and named entities harvested from the web. For word embeddings, neologd is often preferable because new words and proper nouns are kept as single tokens instead of being split into fragments.

3.2 Obtaining the embeddings

Unlike one-hot vectors, which only require a vocabulary, embeddings such as Word2Vec have to be trained on a corpus. A common approach is to train on a large general-purpose corpus such as Wikipedia; when the embeddings are used in a specific application, adapting them with in-domain text can improve them further (see Section 3.4).
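A minimal tokenization sketch follows, assuming MeCab and its Python binding are installed; the neologd dictionary path and the example sentence are placeholders.

```python
import MeCab

# Segment Japanese text with MeCab, comparing the default ipadic dictionary
# with mecab-ipadic-neologd [21]. The neologd path depends on your installation.
NEOLOGD_DIR = "/usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd"  # assumed path
wakati_neologd = MeCab.Tagger(f"-Owakati -d {NEOLOGD_DIR}")
wakati_ipadic = MeCab.Tagger("-Owakati")

def tokenize(text, tagger=wakati_neologd):
    """Return the surface tokens (wakati-gaki) of one sentence."""
    return tagger.parse(text).strip().split()

sentence = "単語の分散表現は自然言語処理で広く使われている。"  # example sentence
print(tokenize(sentence, wakati_ipadic))
print(tokenize(sentence, wakati_neologd))

# The resulting token lists, e.g. over a Wikipedia dump plus in-domain text,
# can be passed directly to gensim's Word2Vec as in the earlier sketch.
```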

3.3 Updating the embeddings

In a running service, new words and new text keep arriving, and retraining SGNS from scratch every time is costly. Kaji and Kobayashi [22] proposed an incremental SGNS that updates the embeddings, including the negative-sampling noise distribution, as new data arrives, which makes it practical to keep embeddings up to date.

3.4 Experiments: FAQ classification

To compare embeddings in a practical setting, Word2Vec and fastText were evaluated on an in-house FAQ task at BEDORE: given a user question, predict the corresponding FAQ out of about 400 entries, with performance measured by 3-fold cross-validation. Embeddings of 300 dimensions were trained on Wikipedia plus BEDORE's in-domain text, tokenized with MeCab and mecab-ipadic-neologd; the fine-tuned variants additionally adapt the pretrained vectors to the in-domain data. A sentence vector could be formed, for example, as a TF-IDF-weighted average of the word vectors, but here the embedded word sequence of a question is fed into a Recurrent Neural Network with Long Short-Term Memory (LSTM) units, followed by a softmax over the FAQ classes. As a baseline, a 905-dimensional bag-of-words representation of the question is fed into a feed-forward neural network with a softmax output (BoW+NN). Table 3 summarizes the results.

Table 3: FAQ classification results (3-fold cross-validation).

  Method                        Score
  Word2Vec + LSTM               0.39
  fastText + LSTM               0.41
  fine-tuned Word2Vec + LSTM    0.43
  fine-tuned fastText + LSTM    0.42
  BoW + NN                      0.32

The embedding-based models clearly outperform the BoW baseline, and fine-tuning the embeddings on in-domain data improves both Word2Vec and fastText.
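As an illustration of the embedding-plus-LSTM setup, the sketch below builds a simple classifier in PyTorch. It is not the authors' architecture: the vocabulary size, sequence length, hidden size and the random matrix standing in for the pretrained Word2Vec/fastText vectors are all assumptions.

```python
import torch
import torch.nn as nn

# Placeholder sizes; `pretrained` stands for a (vocab_size x 300) matrix
# built from Word2Vec or fastText vectors. Random here so the snippet runs.
vocab_size, embed_dim, hidden_dim, num_classes = 20000, 300, 128, 400
pretrained = torch.randn(vocab_size, embed_dim)

class FaqClassifier(nn.Module):
    def __init__(self, freeze_embeddings=True):
        super().__init__()
        # freeze=True keeps the pretrained vectors fixed during training.
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=freeze_embeddings)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_classes)  # softmax applied via cross-entropy loss

    def forward(self, token_ids):                      # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)                    # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(emb)                   # final hidden state of the LSTM
        return self.out(h_n[-1])                       # unnormalized class scores

model = FaqClassifier()
scores = model(torch.randint(0, vocab_size, (2, 30)))  # 2 questions of 30 token ids
print(scores.shape)                                    # torch.Size([2, 400])
```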

4. Conclusion

This article surveyed word-embedding methods, starting from one-hot and count-based representations and centering on Word2Vec and its successors, and discussed practical issues such as tokenization, training, updating and fine-tuning when embeddings are used in real applications.

References

[1] T. Mikolov, K. Chen, G. Corrado and J. Dean, Efficient estimation of word representations in vector space, arXiv:1301.3781, 2013.
[2] R. Kiros, Y. Zhu, R. R. Salakhutdinov, R. Zemel, R. Urtasun, A. Torralba and S. Fidler, Skip-thought vectors, In Advances in Neural Information Processing Systems, 28, pp. 3294–3302, 2015.
[3] M. Sahlgren, The distributional hypothesis, Italian Journal of Linguistics, 20, pp. 33–53, 2008.
[4] ALAGIN, Advanced LAnGuage INformation Forum, 2011.
[5] M. Baroni, G. Dinu and G. Kruszewski, Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors, In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pp. 238–247, 2014.
[6] J. Turian, L. Ratinov and Y. Bengio, Word representations: A simple and general method for semi-supervised learning, In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 384–394, 2010.
[7] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer and R. Harshman, Indexing by latent semantic analysis, Journal of the American Society for Information Science, 41, pp. 391–407, 1990.
[8] T. Hofmann, Probabilistic latent semantic indexing, In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57, 1999.
[9] D. M. Blei, A. Y. Ng and M. I. Jordan, Latent Dirichlet allocation, Journal of Machine Learning Research, 3, pp. 993–1022, 2003.
[10] R. Řehůřek and P. Sojka, Software framework for topic modelling with large corpora, In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, 2010.
[11] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado and J. Dean, Distributed representations of words and phrases and their compositionality, In Advances in Neural Information Processing Systems, 26, pp. 3111–3119, 2013.
[12] J. Pennington, R. Socher and C. D. Manning, GloVe: Global vectors for word representation, In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pp. 1532–1543, 2014.
[13] O. Levy and Y. Goldberg, Neural word embedding as implicit matrix factorization, In Advances in Neural Information Processing Systems, 27, pp. 2177–2185, 2014.
[14] A. Salle, A. Villavicencio and M. Idiart, Matrix factorization using window sampling and negative sampling for improved word representations, In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 419–424, 2016.
[15] P. Bojanowski, E. Grave, A. Joulin and T. Mikolov, Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics, 5, pp. 135–146, 2017.
[16] F. Liu, H. Lu, C. Lo and G. Neubig, Learning character-level compositionality with visual features, arXiv:1704.04859, 2017.
[17] J. Garten, K. Sagae, V. Ustun and M. Dehghani, Combining distributed vector representations for words, In Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pp. 95–101, 2015.
[18] W. Yin and H. Schütze, Learning word meta-embeddings, In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pp. 1351–1360, 2016.
[19] S. Rothe and H. Schütze, AutoExtend: Extending word embeddings to embeddings for synsets and lexemes, arXiv:1507.01127, 2015.
[20] M. Nickel and D. Kiela, Poincaré embeddings for learning hierarchical representations, arXiv:1705.08039, 2017.
[21] T. Sato, Neologism dictionary based on the language resources on the web for MeCab, https://github.com/neologd/mecab-ipadic-neologd, 2015.
[22] N. Kaji and H. Kobayashi, Incremental skip-gram model with negative sampling, arXiv:1704.03956, 2017.