Title: Morphological Analysis of Classical Chinese Texts and Its Application (古典中国語(漢文)の形態素解析とその応用)
Author(s): Yasuoka, Koichi; Wittern, Christian; Morioka, Tomohiko; Ikeda, Takumi; Yamazaki, Naoki; Nikaido, Yoshihiro; Suzuki, Shingo; Moro, Shigeki
Citation: IPSJ Journal (情報処理学会論文誌) (2018), 59(2): 323-331
Issue Date: 2018-02-15
URL: http://hdl.handle.net/2433/229121
The copyright of this material belongs to the Information Processing Society of Japan (IPSJ). It is published on this web site with the permission of IPSJ, the copyright holder. Anyone wishing to make a derivative work of, or distribute, any part or whole of it must comply with copyright law and the Code of Ethics of the IPSJ. All rights reserved.
Type: Journal Article
Textversion: publisher
Kyoto University
Morphological Analysis of Classical Chinese Texts and Its Application

Koichi Yasuoka 1,a) Christian Wittern 1,b) Tomohiko Morioka 1,c) Takumi Ikeda 1,d) Naoki Yamazaki 2,e) Yoshihiro Nikaido 2,f) Shingo Suzuki 3,g) Shigeki Moro 4,h)

Received: May 9, 2017, Accepted: November 7, 2017

Abstract: We propose a method for analyzing classical Chinese texts, using our own morphological analyzer based on MeCab. We introduce a new four-level word-class system to represent the predicate-object structure of classical Chinese. To build a classical Chinese corpus for MeCab, we constructed a MeCab-corpus editor based on XEmacs CHISE. To manage the corpus effectively and to refactor the four-level word-class system, we converted it into Linked Data on the WWW. As an application of our morphological analysis of classical Chinese texts, we attempted to extract named entities: place names, job titles, and personal names. Place names could be extracted from classical Chinese texts almost perfectly, but extracting job titles and personal names proved difficult.

Keywords: classical Chinese corpus, linked data, named entity extraction

1 Kyoto University, Kyoto 606-8501, Japan
2 Kansai University, Suita, Osaka 564-8680, Japan
3 Osaka University, Minoh, Osaka 562-8558, Japan
4 Hanazono University, Kyoto 604-8456, Japan
a) yasuoka@kanji.zinbun.kyoto-u.ac.jp
b) wittern@zinbun.kyoto-u.ac.jp
c) tomo@kanji.zinbun.kyoto-u.ac.jp
d) ikeda@zinbun.kyoto-u.ac.jp
e) ymzknk@kansai-u.ac.jp
f) nikaido@kansai-u.ac.jp
g) suzukish@lang.osaka-u.ac.jp
h) s-moro@hanazono.ac.jp

© 2018 Information Processing Society of Japan

1.
Fig. 1 MeCab-corpus editor for classical Chinese.

2008 4 2013 4 [1] [2], [3] [4] [5], [6]

2. MeCab [7] MeCab MeCab MeCab v n p MeCab 1 2 IPA [8], [9] MeCab IPA MeCab MeCab 4
Fig. 2 A new four-level word-class system for classical Chinese.

MeCab 2010 4 3 B 22300087 MeCab XEmacs CHISE [10]
Table 1 F-measures on MeCab-corpora for classical Chinese.

        M             K             R
M       100           97/90/88/80   97/87/85/82
K       89/85/82/83   100           95/88/83/79
R       93/86/83/80   85/73/72/64   100

MeCab 1 v,,,,*,*,,,,* v,,,,*,*,,,, v,,,,*,*,,,, n,,,,*,*,,,,* n,,,,*,*,,,,* MeCab MeCab 4 2 1 2 3 4 n v p 3 9 [11] 44 88 MeCab MeCab MeCab [12] M 69 K 68 R 320 MeCab 5,500 MeCab 0.994 F / // 1 R R K 46,000 1 3.9

3. Linked Data MeCab MeCab Linked Data [13] MeCab 3 1 CHISE 3 3 Linked Data WWW [14] n,,, n,,,
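Table 1 reports each F-measure as four slash-separated values, but the paper's own definition did not survive in this text. The following is only a minimal sketch, assuming the conventional definition: the harmonic mean of precision and recall over words, where a predicted word counts as correct when its character span, and optionally its word-class label truncated to a given number of levels, matches the gold standard. The names `spans` and `f_measure`, the `levels` parameter, and the sub-level labels in the sample data are hypothetical; only the top-level classes `n` and `v` appear in the text. Whether increasing `levels` corresponds to the four values in each table cell is likewise an assumption.

```python
def spans(tokens, levels=0):
    """Convert (surface, word_class) pairs into a set of
    (start, end, label) triples. `label` keeps only the first
    `levels` comma-separated fields of the word class."""
    result, pos = set(), 0
    for surface, word_class in tokens:
        label = tuple(word_class.split(",")[:levels])
        result.add((pos, pos + len(surface), label))
        pos += len(surface)
    return result


def f_measure(gold, pred, levels=0):
    """Harmonic mean of precision and recall, as a percentage."""
    g, p = spans(gold, levels), spans(pred, levels)
    correct = len(g & p)
    if correct == 0:
        return 0.0
    precision = correct / len(p)
    recall = correct / len(g)
    return 100 * 2 * precision * recall / (precision + recall)


# Hypothetical data: sub-level labels such as "person" are invented here.
gold = [("孟子", "n,person"), ("見", "v,action"),
        ("梁", "n,place"), ("惠王", "n,person")]
pred = [("孟子", "n,person"), ("見", "v,action"),
        ("梁", "n,person"), ("惠王", "n,person")]

for levels in (0, 1, 2):
    print(levels, f_measure(gold, pred, levels))
```

With `levels=0` only word boundaries are compared; larger values also require the leading word-class fields to agree, so the score can only stay the same or fall as `levels` grows.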
Fig. 3 Linked Data around.

4. MeCab 2013 4 3 B 25280122

4.1 MeCab n,,, n,,, 2 2 MeCab 2 10% 10%n,,, 2 90% MeCab 2 MeCab 3 MeCab 1 1 MeCab v,,, v,,, n,,,
Table 2 F-measures on MeCab-dictionaries for classical Chinese.

        P             M             R
α       96/86/85/76   93/90/90/77   96/83/81/71
β       96/89/88/84   93/90/90/76   96/83/81/71
γ       96/86/84/73   93/90/90/77   94/81/79/69

1 MeCab MeCab 2 1 n,,, v,,, MeCab 46,000 MeCab 2,000 6,300 400 v,,,,*,*,,,,* v,,,,*,*,,,, n,,,,*,*,,,,* 3 MeCab [15] α MeCab β α 1 γ α α 111 β 1,240 γ 0 1 P 88 P [12] M 69 R 320 α β γ 2,000 MeCab 0.996 F / // 2 P α β F α γ F P P F n,,,,*,*,,,,* v,,,,*,*,,,, n,,,,*,*,,,,* n,,,,*,*,,,,* n,,,,*,*,,,,* n,,,,*,*,,,,* 1 α γ M β F M β R α β F γ F R γ F β R β P β M R
β MeCab

4.2 MeCab MeCab n,,,,*,*,,,,* n,,,,*,*,,,,* v,,,,*,*,,,, v,,,,*,*,,,, n,,,,*,*,,,,* *1 1 *1 v,,, v,,, v,,,

4.3 MeCab n,,, n,,, n,,, n,,, 16 6 n,,,,*,*,,,,* n,,,,*,*,,,,* 10 10 9 n,,, 1 n,,,* 1
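The feature strings quoted above (for example `n,,,,*,*,,,,*`) follow MeCab's default output format, in which each token is printed as a surface form, a tab, and a comma-separated feature list, with `EOS` closing each sentence. The sketch below shows one way such output could be parsed and filtered by word class, for instance as a first step toward picking out named-entity candidates. It assumes that the first four feature fields hold the four-level word class; that reading is inferred from the abstract and the surviving `n,,,` patterns, not confirmed by the text. The names `Token`, `parse_mecab_output`, and `select`, and the placeholder labels `L2a`, `L3a`, and so on, are hypothetical, and quoted fields containing commas are not handled.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Token(NamedTuple):
    surface: str
    word_class: Tuple[str, ...]  # first four feature fields (assumed layout)
    rest: Tuple[str, ...]        # remaining feature fields, left uninterpreted


def parse_mecab_output(text: str) -> List[Token]:
    """Parse MeCab's default 'surface<TAB>f1,f2,...' lines, skipping EOS."""
    tokens = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue
        surface, _, features = line.partition("\t")
        fields = features.split(",")
        tokens.append(Token(surface, tuple(fields[:4]), tuple(fields[4:])))
    return tokens


def select(tokens: Sequence[Token], prefix: Sequence[str]) -> List[Token]:
    """Keep tokens whose word class begins with the given levels."""
    n = len(prefix)
    return [t for t in tokens if t.word_class[:n] == tuple(prefix)]


# Hypothetical analyzer output; the L2a... labels stand in for the
# sub-level names that were lost from the text.
sample = (
    "孟子\tn,L2a,L3a,L4a,*,*,x,y,z,*\n"
    "見\tv,L2b,L3b,L4b,*,*,x,y,z,\n"
    "EOS\n"
)

tokens = parse_mecab_output(sample)
for token in select(tokens, ["n"]):
    print(token.surface, token.word_class)
```

Matching on a prefix of the word class lets the same function select a whole top-level class (`["n"]`) or a narrower subclass (`["n", "L2a"]`) without changing the parser.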
5. F [8], [12], [15] F F F

References
[1] Vol.21, No.3, pp.8-18 (2007).
[2] Jiang, W., Huang, L., Liu, Q. and Lü, Y.: A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging, Proc. ACL-08, pp.897-904 (2008).
[3] Shen, M., Liu, H., Kawahara, D. and Kurohashi, S.: Chinese Morphological Analysis with Character-level POS Tagging, Proc. ACL-2014, pp.253-258 (2014).
[4] Huang, L., Peng, Y., Wang, H. and Wu, Z.: Statistical Part-of-Speech Tagging for Classical Chinese, Proc. TSD 2002, pp.115-122 (2002).
[5] Yasuoka, K., Yamazaki, N., Wittern, C., Nikaido, Y. and Morioka, T.: A Morphological Analysis of Classical Chinese Texts, Proc. Digital Humanities 2014, pp.410-412 (2014).
[6] Wittern, C., 27 pp.3-14 (2016).
[7] MeCab Vol.2008-CH-79, pp.17-22 (2008).
[8] MeCab Vol.2009-CH-84, No.3, pp.1-5 (2009).
[9] Morioka, T.: A Prototype of a Classical Chinese Morphological Analyzer based on MeCab, Proc. Osaka Symposium on Digital Humanities 2011, p.36 (2011).
[10] 23 pp.75-83 (2012).
[11] Pulleyblank, E.G.: Outline of Classical Chinese Grammar, UBC Press (1995).
[12] 2012 pp.39-46 (2012).
[13] Linked Data 2013 pp.187-194 (2013).
[14] CHISE 25 pp.33-46 (2014).
[15] Wittern, C. 2014 pp.63-68 (2014).