Acquiring Strongly-related Events using Predicate-argument Co-occurring Statistics and Caseframe

Tomohide Shibata (1) and Sadao Kurohashi (1)
(1) Kyoto University

Abstract. This paper proposes a method for automatically acquiring strongly related events from a large corpus using predicate-argument co-occurrence statistics and case frames. The co-occurrence of two predicate-argument structures is measured with an association rule mining method, which also judges how important each argument is to its predicate-argument structure. The arguments of the two predicate-argument structures in a pair are then aligned using case frames. We conducted experiments on a Web corpus of 1.6 billion sentences. The accuracy of the extracted event pairs was 96%, and the accuracy of the argument alignment was 79.1%. About 20 thousand event pairs were acquired.

1.
[...] 1) [...] 2) 3) [...]

    PA1              PA2
    A1:{...}         A1:{...}
    A2:{...}         A2:{...}
                     A3:{...}

[...] Chambers 4),5) [...]

(1) a. [...]
    b. [...]

[...] (1-a) [...] PA2 [...] (1-b) [...] PA1 [...] Chambers [...] PA1 A2:{...} [...] PA2 A3:{...} [...]

© 2011 Information Processing Society of Japan

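[...] A2:{...} A3:{...} [...] PA1 A2:{...} [...] PA2 [...] PA1 A1:{...} [...] PA2 [...]
[...] Section 2 [...] Section 3 [...] Section 4 [...] Section 5 [...] Section 6 [...] Section 7 [...]

2.

2.1
WordNet 6) [...] WordNet [...] LifeNet 7) [...] 8 [...] 41 [...] EventNet [...] Openmind Commonsense Knowledge Base 8) [...] Regneri [...] Amazon Mechanical Turk 9) [...] 22 [...] 493 [...]

2.2
Lin 10) [...] "X is the author of Y" [...] "X wrote Y" [...] X, Y [...] Chambers 4),5) [...] "accused X", "X claimed", "X argued", "dismissed X" [...] 12) [...] 13) [...] 14) [...]

3.
[...] Figure 1 [...] Web [...] PA1 [...] PA2 [...] PA1 [...] PA2 [...] 15) [...] 1) [...]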

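[Figure 1 (caption lost): [...] PA1 [...] PA2 [...]
  Web corpus
  Step: extraction of predicate-argument structure pairs
    PA1: "he-GA wallet-WO pick up", "wallet-WO pick up", "driver-GA wallet-WO pick up"
    PA2: "police-NI deliver", "police-NI deliver", "deliver"
  Step: co-occurrence calculation for predicate-argument structure pairs
    PA1 "wallet-WO pick up", PA2 "police-NI deliver"
  Case frames
    pick up:10   GA {man, girl, ...}     WO {wallet, phone, ...}
    deliver:20   GA {man, person, ...}   WO {wallet, money, ...}   NI {police, police box, ...}
  Step: argument alignment based on case frames
    PA1 "pick up":  A1:{person, man, ...} GA   A2:{wallet, ...} WO
    PA2 "deliver":  A1:{person, man, ...} GA   A2:{wallet, ...} WO   A3:{police} NI]

[...] PA1 [...] PA2 [...] PA1 [...] PA2 [...]

4.
(2) a. [...]
    b. [...]

[Table 2 (caption and class members lost): class IDs 77, 105, 502, 956, 1829, 1901.]

[...] Figure 1 [...] (PA1, PA2) [...] PA1 [...] PA2 [...] PA2 [...] PA1 [...] PA2 [...]
[...] 16) [...] 2,000 [...] Table 2 [...] n [...] P(c|n) [...] c [...] 77 [...] PA1: [...], PA2: [...] PA1: [...]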

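[...] 77 [...], PA2: [...] 77 [...] PA1: [...], PA2: [...] PA1: [...] 77 [...], PA2: [...]

5.
[...] Section 4 [...] 15) [...]

5.1
[...] 15) [...] Let $I = \{I_1, I_2, \ldots, I_m\}$ be the set of items, let a transaction $t$ be a set of items ($t \subseteq I$), and let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of transactions. An association rule has the form $X \Rightarrow Y$ ($X, Y \subset I$, $X \cap Y = \phi$), where $X$ is called the antecedent (left-hand side, lhs) and $Y$ the consequent (right-hand side, rhs). The following three measures are used: support, confidence, and lift.

$$support(X \Rightarrow Y) = \frac{C(X \cup Y)}{|T|} \qquad (1)$$

$$confidence(X \Rightarrow Y) = \frac{C(X \cup Y)}{C(X)} \qquad (2)$$

$$lift(X \Rightarrow Y) = \frac{confidence(X \Rightarrow Y)}{support(Y)} = \frac{support(X \Rightarrow Y)}{support(X)\, support(Y)} \qquad (3)$$

Here $C(X)$ is the number of transactions that contain $X$. Support is the fraction of transactions that contain both $X$ and $Y$, confidence is the fraction of the transactions containing $X$ that also contain $Y$, and lift is the ratio of the confidence to the fraction of transactions that contain $Y$.

[Table 3 (caption and entries lost): [...] PA1 [...] PA2 [...]]

[...] Apriori 17) [...] abc [...] $t_1$ [...] abcd [...] $t_2$ [...] $t_1$ [...] $t_2$ [...] Apriori [...] support [...] confidence [...]

5.2
[...] Apriori [...] Apriori [...] Section 4 [...] Table 3 [...] X [...] PA1 [...] PA1 [...] 0 [...] Y [...] PA2 [...] PA2 [...] 0 [...] lift [...] lift-min [...] lift-max [...] lift-max [...] Apriori [...] 3 [...]
( 1 ) [...]
( 2 ) [...]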
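The three measures of Section 5.1 can be sketched in a few lines of Python. This is only an illustration of equations (1) to (3) by brute-force counting, not the Apriori implementation 17) used in the paper. The item encoding (strings such as `PA1:pick_up` or `PA2:ni=police`) and the toy transactions are hypothetical, since the paper's own transaction format did not survive extraction.

```python
def rule_measures(transactions, lhs, rhs):
    """Return (support, confidence, lift) of the rule lhs => rhs.

    transactions: list of sets of items (the set T)
    lhs, rhs:     disjoint collections of items (X and Y)
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    if lhs & rhs:
        raise ValueError("lhs and rhs must be disjoint")
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions")
    # C(X), C(Y), C(X u Y): number of transactions containing the item set
    c_lhs = sum(1 for t in transactions if lhs <= t)
    c_rhs = sum(1 for t in transactions if rhs <= t)
    c_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = c_both / n                                  # equation (1)
    confidence = c_both / c_lhs if c_lhs else 0.0         # equation (2)
    lift = confidence / (c_rhs / n) if c_rhs else 0.0     # equation (3)
    return support, confidence, lift


# Hypothetical toy data: one transaction per co-occurring pair (PA1, PA2).
toy_transactions = [
    {"PA1:pick_up", "PA1:wo=wallet", "PA2:deliver", "PA2:ni=police"},
    {"PA1:pick_up", "PA1:wo=wallet", "PA2:deliver"},
    {"PA1:pick_up", "PA1:wo=stone", "PA2:throw"},
    {"PA1:eat", "PA2:sleep"},
]

measures = rule_measures(
    toy_transactions,
    lhs={"PA1:pick_up", "PA1:wo=wallet"},
    rhs={"PA2:deliver"},
)
print(measures)  # (0.5, 1.0, 2.0)
```

In this toy example the rule holds in 2 of 4 transactions (support 0.5), always holds when the antecedent is present (confidence 1.0), and the consequent is twice as likely given the antecedent as overall (lift 2.0).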

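[Table 4 (caption, predicates, and example words lost; frequencies in parentheses):
  [...]:1    [...](2), [...](2), [...](3513), [...](80), ...
  [...]:10   [...](4), [...](2), [...](580), [...](136), ...
  [...]:1    [...](164), [...](144), [...](103400), [...](4797), ...
  [...]:20   [...](11), [...](8), [...](8), [...](6), [...](2587), ...]

[...] PA1 [...] PA2 [...] PA1 [...]

6.
[...] Section 5 [...] PA1 [...] PA2 [...] PA1, PA2 [...] Web [...] 1) [...] Table 4 [...] PA1 [...] $cf_1$ [...] PA2 [...] $cf_2$ [...] PA1 [...] PA2 [...]
( 1 ) [...] PA1 [...] PA2 [...] Section 5 [...] (2) [...] PA2 [...]
( 2 ) [...]

$$\operatorname*{argmax}_{cf_1,\, cf_2} \; \max_{a} \; sim(arg_1, a(arg_1)) \qquad (4)$$

Here $a$ is an alignment between the arguments of PA1 and those of PA2, $arg_1$ is an argument of PA1, and $a(arg_1)$ is the argument of PA2 that $a$ aligns with $arg_1$. The similarity $sim$ between $arg_1$ and $a(arg_1)$ is computed as the cosine [...]. [...]:10 [...]:20 [...] sim [...] 2 [...] cosine [...]:10 (4, 2, 2, ...) [...]:20 (11, 8, 0, ...) [...] PA1 [...] PA2 [...] PA1 [...] 10 [...] PA2 [...] 20 [...] PA1 [...] PA2 [...]

7.

7.1
[...] 1 [...] 60 [...] 60 [...] 16 [...]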
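The case-frame selection and argument alignment of equation (4) can be sketched as below. Several points are assumptions rather than the paper's procedure: a case frame is represented as a dictionary from case slot to a `Counter` of example words, an alignment is scored by summing the cosine similarities of its aligned slots (the surviving equation shows no explicit aggregation), a slot may stay unaligned, and all candidate alignments are enumerated by brute force. The frames, words, and counts are hypothetical toy data.

```python
import math
from collections import Counter
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two word-frequency Counters."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_alignment(cf1, cf2):
    """Return (score, mapping) for the best one-to-one slot alignment.

    Assumption: the score of an alignment is the SUM of the cosine
    similarities of its aligned slots. None means "left unaligned".
    """
    slots1 = list(cf1)
    targets = list(cf2) + [None] * len(slots1)
    best_score, best_map = -1.0, {}
    for perm in permutations(targets, len(slots1)):
        score = sum(cosine(cf1[s], cf2[t])
                    for s, t in zip(slots1, perm) if t is not None)
        if score > best_score:
            best_score, best_map = score, dict(zip(slots1, perm))
    return best_score, best_map


def best_caseframe_pair(frames1, frames2):
    """Outer argmax of equation (4) over all case-frame pairs.

    Returns (id1, id2, score, mapping).
    """
    candidates = [(id1, id2, *best_alignment(f1, f2))
                  for id1, f1 in frames1.items()
                  for id2, f2 in frames2.items()]
    return max(candidates, key=lambda c: c[2])


# Hypothetical case frames (slot names follow the Japanese case markers).
pick_up_frames = {
    "pick_up:1": {"ga": Counter(bird=2), "wo": Counter(seed=3)},
    "pick_up:10": {"ga": Counter(man=4, girl=2),
                   "wo": Counter(wallet=5, phone=1)},
}
deliver_frames = {
    "deliver:20": {"ga": Counter(man=11, person=8),
                   "wo": Counter(wallet=8, money=6),
                   "ni": Counter(police=25, police_box=3)},
}

best = best_caseframe_pair(pick_up_frames, deliver_frames)
print(best[0], best[1], best[3])
```

Brute-force enumeration is adequate here because a case frame has only a handful of slots; a larger slot inventory would call for an assignment algorithm instead.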

Table 5 [...]

                          Correct        Incorrect
    Event pair            96 (96.0%)     4 (4.0%)
    Argument alignment    76 (79.1%)     20 (20.8%)

[Table 7 (caption lost): [...] (Table 6 [...]) [...] PA1 [...] PA2 [...]]

[Table 6 (caption lost): [...] (Table 5 [...]) [...] PA1 [...] PA2 [...]; eight example event pairs, (1) to (8), with argument sets of the form A1:{...}, A2:{...}, A3:{...}; the example words are lost.]

[...] JUMAN (footnote 1) [...] KNP (footnote 2) [...] Section 4 [...] Section 5.2 [...] Apriori [...] support: $1.0 \times 10^{-7}$, confidence: $1.0 \times 10^{-3}$, lift-min, lift-max: 10, 10,000 [...] 16 [...] 1) [...] 30,000 [...] 1 [...] 251 [...] 4.7 [...]

Footnote 1: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman.html
Footnote 2: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/knp.html

7.2

7.2.1
[...] Section 5 [...] 2 [...] 100 [...] Table 5 [...] 96% [...] Table 6 [...] (Table 6 (8)) [...]

7.2.2
[...] 96 [...] Table 5 [...] 79.1% [...] Table 7 [...] Table 7 (6) [...] PA1 [...] A1 [...] PA2 [...] A1 [...] A1 [...] A3 [...] A1:{...} A2:{...} A3:{...} [...] A2:{...} A3:{...} [...]

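[...] Table 7 (7) [...] PA2 [...] PA1 [...] PA1 [...] PA2 [...] A2:{...} [...] A2:{...} [...] A1:{...} [...]

[Table 8 (caption and row and column labels lost): [...] PA1 [...] PA2 [...]; values in their original order:
  0.163 (3,768 / 23,180)
  0.282 (549 / 1,944)
  0.176 (474 / 2,689)
  0.272 (753 / 2,764)
  0.483 (7,106 / 14,713)
  0.321 (1,054 / 3,284)
  0.163 (344 / 2,113)
  0.338 (1,042 / 3,086)
  0.282 (549 / 1,944)]

7.2.3
[...] 4) [...] (F [...] 0.75 18)) [...] Web [...] 2 [...] For verbs $w$ and $v$ and dependencies $d$ and $g$, let $e(w, d)$ denote the event consisting of verb $w$ and dependency $d$, and $e(v, g)$ the event consisting of verb $v$ and dependency $g$ [...]. The association between $e(w, d)$ and $e(v, g)$ is measured by pointwise mutual information:

$$pmi(e(w, d), e(v, g)) = \log \frac{P(e(w, d), e(v, g))}{P(e(w, d))\, P(e(v, g))}$$

[...] $k$ [...] ($k$ [...] 5) [...] Table 8 [...] PA1 [...] PA2 [...] PA1, PA2 [...]

7.2.4
[...] 2 [...] 3 [...] Section 5 [...] 2 [...] (lift) [...] Chambers [...] 2 [...] 1 [...]

8.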
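The PMI score of Section 7.2.3 can be computed from pair counts as sketched below. The estimation of the probabilities is an assumption: the joint probability is the relative frequency of the event pair, and the marginal probability of an event is the relative frequency of pairs in which it takes part, counted on either side. The events and counts are hypothetical toy data.

```python
import math
from collections import Counter


def pmi_scores(pair_counts):
    """Return {(e1, e2): pmi} for counted event pairs.

    pair_counts maps a pair of events to its co-occurrence count, where each
    event is a (verb, dependency) tuple such as ("pick_up", "obj").
    Assumption: P(e1, e2) = count / total, and P(e) = (count of all pairs
    that contain e) / total.
    """
    total = sum(pair_counts.values())
    marginal = Counter()
    for (e1, e2), count in pair_counts.items():
        marginal[e1] += count
        marginal[e2] += count
    scores = {}
    for (e1, e2), count in pair_counts.items():
        p_joint = count / total
        p_e1 = marginal[e1] / total
        p_e2 = marginal[e2] / total
        scores[(e1, e2)] = math.log(p_joint / (p_e1 * p_e2))
    return scores


# Hypothetical counts of event pairs.
toy_pair_counts = {
    (("pick_up", "obj"), ("deliver", "obj")): 8,
    (("pick_up", "obj"), ("throw", "obj")): 2,
    (("eat", "subj"), ("sleep", "subj")): 10,
}

toy_scores = pmi_scores(toy_pair_counts)
for pair, score in sorted(toy_scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Like lift in equation (3), PMI compares the observed joint frequency with the frequency expected under independence; PMI is simply the logarithm of that ratio.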

[...] 3 [...] RTE (Recognizing Textual Entailment) [...]

References

1) Kawahara, D. and Kurohashi, S.: A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis, Proceedings of the HLT-NAACL 2006, pp.176-183 (2006).
2) Bean, D. and Riloff, E.: Unsupervised Learning of Contextual Role Knowledge for Coreference Resolution, HLT-NAACL 2004: Main Proceedings, pp.297-304 (2004).
3) Gerber, M. and Chai, J.: Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp.1583-1592 (2010).
4) Chambers, N. and Jurafsky, D.: Unsupervised Learning of Narrative Event Chains, Proceedings of ACL-08: HLT, pp.789-797 (2008).
5) Chambers, N. and Jurafsky, D.: Unsupervised Learning of Narrative Schemas and their Participants, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp.602-610 (2009).
6) Miller, G. A.: WordNet: A Lexical Database for English, Communications of the ACM (1995).
7) Singh, P. and Williams, W.: LifeNet: A Propositional Model of Ordinary Human Activity, Proceedings of Workshop on Distributed and Collaborative Knowledge Capture (2003).
8) Espinosa, J. and Lieberman, H.: EventNet: Inferring Temporal Relations Between Commonsense Events, Proceedings of the 4th Mexican International Conference on Artificial Intelligence, pp.61-69 (2005).
9) Regneri, M., Koller, A. and Pinkal, M.: Learning Script Knowledge with Web Experiments, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp.979-988 (2010).
10) Lin, D. and Pantel, P.: Discovery of Inference Rules for Question Answering, Natural Language Engineering, Vol.7, No.4, pp.343-360 (2001).
11) Szpektor, I. and Dagan, I.: Learning Entailment Rules for Unary Templates, Proceedings of the 22nd International Conference on Computational Linguistics (COLING), pp.849-856 (2008).
12) Fujiki, T., Nanba, H. and Okumura, M.: Automatic Acquisition of Script Knowledge from a Text Collection, Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), pp.91-94 (2003).
13) Torisawa, K.: Acquiring Inference Rules with Temporal Constraints by using Japanese Coordinated Sentences and Noun-Verb Co-occurrences, Proceedings of Human Language Technology Conference/North American chapter of the Association for Computational Linguistics annual meeting (HLT-NAACL06), pp.57-64 (2006).
14) Abe, S., Inui, K. and Matsumoto, Y.: Two-phased event relation acquisition: coupling the relation-oriented and argument-oriented approaches, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp.1-8 (2008).
15) Agrawal, R., Imielinski, T. and Swami, A.: Mining association rules between sets of items in large databases, Proceedings of the ACM-SIGMOD 1993 International Conference on Management of Data, pp.207-216 (1993).
16) Kazama, J. and Torisawa, K.: Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations, Proceedings of ACL-08: HLT, pp.407-415 (2008).
17) Borgelt, C. and Kruse, R.: Induction of Association Rules: Apriori Implementation, Proceedings of 15th Conference on Computational Statistics, pp.395-400 (2002).
18) Sasano, R., Kawahara, D. and Kurohashi, S.: Improving Coreference Resolution Using Bridging Reference Resolution and Automatically Acquired Synonyms, Discourse Anaphora and Anaphor Resolution Colloquium, pp.125-136 (2007).