Acquiring Strongly-related Events using Predicate-argument Co-occurring Statistics and Caseframe

Transcription:

Tomohide Shibata and Sadao Kurohashi (Kyoto University)

This paper proposes a method for automatically acquiring strongly related events from a large corpus using predicate-argument co-occurrence statistics and case frames. The co-occurrence measure is calculated with an association rule mining method, and the importance of each argument in a predicate-argument structure is judged. The arguments in each pair of predicate-argument structures are then aligned using case frames. We conducted experiments on a Web corpus of 1.6G sentences. The accuracy of the extracted event pairs was 96%, and the accuracy of the argument alignment was 79.1%. About 20 thousand event pairs were acquired.

© 2011 Information Processing Society of Japan

1. [Japanese text of this section lost in extraction. Surviving fragments: citations 1), 2) and 3); a pair of predicate-argument structures PA1 and PA2 with argument sets A1, A2 and A3; Chambers et al. 4),5); example (1) with parts (1-a) and (1-b).]

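[Section 1, continued: Japanese text lost in extraction. Surviving fragments: the argument sets A1, A2 and A3 of PA1 and PA2, and an outline that points to Sections 2 through 7.]

2.

2.1 [Japanese text lost in extraction. Surviving fragments: WordNet 6); LifeNet 7); EventNet; the Openmind Commonsense Knowledge Base 8); Regneri 9) and Amazon Mechanical Turk; the numbers 8, 41, 22 and 493 without their context.]

2.2 [Japanese text lost in extraction. Surviving fragments: Lin 10) with the paraphrase pair "X is the author of Y" and "X wrote Y"; Chambers 4),5) with the events "accused X", "X claimed", "X argued" and "dismissed X"; citations 12), 13) and 14).]

3. [Japanese text lost in extraction. Surviving fragments: Figure 1; a Web corpus; the predicate-argument structure pair PA1 and PA2; citations 15) and 1).]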

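Figure 1 [caption lost in extraction; labels translated from Japanese, with the case particles ga (nominative), wo (accusative) and ni (dative) kept as glosses]

Web corpus
  -> Extraction of predicate-argument structure pairs
       PA1: "he-ga wallet-wo pick up", "wallet-wo pick up", "driver-ga wallet-wo pick up"
       PA2: "police-ni deliver", "police-ni deliver", "deliver"
  -> Co-occurrence calculation for predicate-argument structure pairs
       PA1 "wallet-wo pick up"  =>  PA2 "police-ni deliver"
  -> Case-frame-based argument alignment
       Case frame "pick up:10"   ga: {man, girl, ...}    wo: {wallet, phone, ...}
       Case frame "deliver:20"   ga: {man, person, ...}  wo: {wallet, money, ...}  ni: {police, police box, ...}
       PA1: A1:{person, man, ...}-ga  A2:{wallet, ...}-wo  pick up
       PA2: A1:{person, man, ...}-ga  A2:{wallet, ...}-wo  A3:{police}-ni  deliver

4. [Japanese text lost in extraction. Surviving fragments: example (2) with parts a. and b.; Table 2, which lists word classes by ID (77, 105, 502, 956, 1829, 1901) with their member words lost; citation 16) together with the figure 2,000; the probability P(c | n) of a class c; an example of PA1 and PA2 involving class 77.]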

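[Section 4, continued: example fragments pairing PA1 and PA2 with class ID 77; the Japanese words are lost.]

5. [Japanese text lost in extraction. Surviving fragments: a reference to Section 4 and citation 15).]

5.1 [Heading lost in extraction; citation 15).]

Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of items, let a transaction $t$ be a set of items ($t \subseteq I$), and let $T = \{t_1, t_2, \ldots, t_n\}$ be the set of transactions. An association rule is written $X \Rightarrow Y$ ($X, Y \subseteq I$, $X \cap Y = \emptyset$), where $X$ is the antecedent (left-hand side, lhs) and $Y$ is the consequent (right-hand side, rhs). Three measures are defined: support, confidence and lift.

\[ \mathrm{support}(X \Rightarrow Y) = \frac{C(X \cup Y)}{|T|} \tag{1} \]

\[ \mathrm{confidence}(X \Rightarrow Y) = \frac{C(X \cup Y)}{C(X)} \tag{2} \]

\[ \mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{confidence}(X \Rightarrow Y)}{\mathrm{support}(Y)} = \frac{\mathrm{support}(X \Rightarrow Y)}{\mathrm{support}(X)\,\mathrm{support}(Y)} \tag{3} \]

Here $C(X)$ is the number of transactions in $T$ that contain $X$, and $\mathrm{support}(X) = C(X)/|T|$.

Table 3 [contents lost in extraction; the surviving fragments show PA1 and PA2 entries separated by hyphens.]

[Japanese text lost in extraction. Surviving fragments: the Apriori algorithm 17); an example with itemset abc in transaction t_1 and abcd in transaction t_2; the support and confidence thresholds of Apriori.]

5.2 [Japanese text lost in extraction. Surviving fragments: a modification of Apriori in which X is associated with PA1 and Y with PA2; the lift thresholds lift-min and lift-max; two numbered items (1) and (2); references to Tables 3 and 4.]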
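The three measures of Section 5.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation (the paper uses a modified Apriori 17)); it simply evaluates Equations (1) to (3) by brute-force counting. The function names and the toy transactions, including the English item labels, are hypothetical and chosen only for illustration.

```python
def count(itemset, transactions):
    """C(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)


def rule_measures(x, y, transactions):
    """Return (support, confidence, lift) of the rule X => Y, Eqs. (1)-(3)."""
    x, y = frozenset(x), frozenset(y)
    n = len(transactions)
    c_xy = count(x | y, transactions)   # C(X u Y)
    c_x = count(x, transactions)        # C(X)
    c_y = count(y, transactions)        # C(Y)
    support = c_xy / n
    confidence = c_xy / c_x if c_x else 0.0
    lift = confidence / (c_y / n) if c_y else 0.0
    return support, confidence, lift


# Hypothetical toy data: each transaction holds one PA1 item and one PA2 item.
transactions = [
    frozenset({"PA1:pick_up(wallet)", "PA2:deliver(police)"}),
    frozenset({"PA1:pick_up(wallet)", "PA2:deliver(police)"}),
    frozenset({"PA1:pick_up(wallet)", "PA2:go_home"}),
    frozenset({"PA1:eat(lunch)", "PA2:go_home"}),
]

support, confidence, lift = rule_measures(
    {"PA1:pick_up(wallet)"}, {"PA2:deliver(police)"}, transactions
)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the consequent occurs with the antecedent more often than its overall frequency would predict, which is why the method thresholds on lift in addition to support and confidence.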

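Table 4 [case frame examples; the predicates and words were lost in extraction and only the frame IDs and frequencies survive]
  frame :1    (2), (2), (3513), (80), ...
  frame :10   (4), (2), (580), (136), ...
  frame :1    (164), (144), (103400), (4797), ...
  frame :20   (11), (8), (8), (6), (2587), ...

6. [Japanese text lost in extraction. Surviving fragments: a reference to Section 5; PA1 and PA2; case frames built from the Web 1); Table 4; the case frame cf_1 of PA1 and the case frame cf_2 of PA2; two numbered steps (1) and (2), the first referring to Section 5 and example (2).]

\[ \operatorname*{argmax}_{cf_1,\, cf_2} \; \max_{a} \; \mathrm{sim}(arg_1, a(arg_1)) \tag{4} \]

Here $a$ is an alignment between the arguments of PA1 and PA2, $arg_1$ is an argument of PA1, $a(arg_1)$ is the argument of PA2 that $a$ aligns with $arg_1$, and $\mathrm{sim}$ is the cosine similarity between $arg_1$ and $a(arg_1)$. For the case frames :10 and :20, the similarity is the cosine of two frequency vectors, (4, 2, 2, ...) for :10 and (11, 8, 0, ...) for :20.

[Japanese text lost in extraction. Surviving fragments: PA1 with case frame 10 and PA2 with case frame 20.]

7.

7.1 [Japanese text lost in extraction. Surviving numbers: 60 and 16.]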
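The case-frame alignment of Equation (4) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that an alignment maps each case slot of PA1 to a distinct slot of PA2, and that the score of an alignment is the sum of the cosine similarities of the aligned slots (the summation is an assumption, since it does not survive in the extracted equation). The frame contents are made-up English glosses loosely modelled on Figure 1, and the frame "deliver:1" is entirely hypothetical.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two word-frequency dictionaries."""
    dot = sum(c * v.get(w, 0) for w, c in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def best_alignment(cf1, cf2):
    """Best one-to-one mapping of cf1 slots onto cf2 slots (inner max over a).

    Assumption: cf1 has no more slots than cf2; otherwise no mapping is tried.
    """
    slots1, slots2 = list(cf1), list(cf2)
    best_score, best_map = -1.0, {}
    for perm in permutations(slots2, len(slots1)):
        score = sum(cosine(cf1[s], cf2[t]) for s, t in zip(slots1, perm))
        if score > best_score:
            best_score, best_map = score, dict(zip(slots1, perm))
    return best_score, best_map


def best_frame_pair(frames1, frames2):
    """Outer argmax over case-frame pairs (cf1, cf2)."""
    best = None
    for id1, cf1 in frames1.items():
        for id2, cf2 in frames2.items():
            score, mapping = best_alignment(cf1, cf2)
            if best is None or score > best[0]:
                best = (score, id1, id2, mapping)
    return best


# Hypothetical case frames: slot -> {word: frequency}.
frames_pa1 = {
    "pick_up:10": {"ga": {"man": 4, "girl": 2},
                   "wo": {"wallet": 580, "phone": 136}},
}
frames_pa2 = {
    "deliver:1": {"ga": {"company": 164}, "wo": {"package": 103400}},
    "deliver:20": {"ga": {"man": 11, "person": 8},
                   "wo": {"wallet": 8, "money": 6},
                   "ni": {"police": 2587}},
}

result = best_frame_pair(frames_pa1, frames_pa2)
print(result)
```

Enumerating permutations is adequate here because a case frame has only a handful of slots; a larger slot inventory would call for an assignment solver instead.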

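Table 5 [caption lost in extraction; row and column labels inferred from the abstract]
                        Correct       Incorrect
  Event pairs           96 (96.0%)    4 (4.0%)
  Argument alignment    76 (79.1%)    20 (20.8%)

Table 6 [examples (1) to (8) of acquired pairs of PA1 and PA2; the Japanese words were lost in extraction and only the argument-set skeletons A1, A2 and A3 survive.]

Table 7 [caption partly lost; it refers to Table 6 and to PA1 and PA2.]

7.1 (continued) [Japanese text lost in extraction. Surviving fragments: the tools JUMAN (footnote 1) and KNP (footnote 2); a reference to Section 4; the Apriori parameters of Section 5.2, namely support 1.0 × 10^-7, confidence 1.0 × 10^-3, lift-min 10 and lift-max 10,000; the 1.6G-sentence corpus; the case frames of 1); the numbers 30,000, 251 and 4.7 without their context.]

Footnote 1: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/juman.html
Footnote 2: http://nlp.kuee.kyoto-u.ac.jp/nl-resource/knp.html

7.2

7.2.1 [Japanese text lost in extraction. Surviving fragments: Table 5; 100 evaluated pairs; an accuracy of 96% (Table 5); Table 6, with a pointer to its example (8).]

7.2.2 [Japanese text lost in extraction. Surviving fragments: the 96 correct pairs; Table 5; an alignment accuracy of 79.1%; Table 7; a discussion of example (6) of Table 7 in which argument A1 of PA1, argument A1 of PA2 and argument A3 are mentioned.]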

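[Section 7.2.2, continued: Japanese text lost in extraction. Surviving fragments: example (7) of Table 7, PA2 and PA1, and the argument sets A2 and A1.]

Table 8 [caption and row and column labels lost in extraction, apart from PA1 and PA2; surviving cells in extraction order]
  0.163 (3,768 / 23,180)
  0.282 (549 / 1,944)
  0.176 (474 / 2,689)
  0.272 (753 / 2,764)
  0.483 (7,106 / 14,713)
  0.321 (1,054 / 3,284)
  0.163 (344 / 2,113)
  0.338 (1,042 / 3,086)
  0.282 (549 / 1,944)

7.2.3 [Japanese text lost in extraction. Surviving fragments: a comparison with Chambers 4); an F-measure of 0.75 attributed to 18); a Web corpus; two verbs w and v with dependencies d and g, forming the events e(w, d) and e(v, g).]

\[ \mathrm{pmi}(e(w, d), e(v, g)) = \log \frac{P(e(w, d), e(v, g))}{P(e(w, d))\, P(e(v, g))} \]

[Japanese text lost in extraction. Surviving fragments: the top k results, with k and the value 5; Table 8; PA1 and PA2.]

7.2.4 [Japanese text lost in extraction. Surviving fragments: references to Tables 2 and 3, item (5), lift, and Chambers.]

8.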
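The PMI score used for the comparison in Section 7.2.3 can be sketched in Python as below. This is a minimal illustration, not the evaluation code of the paper or of Chambers and Jurafsky 4): it assumes that probabilities are estimated by relative frequency over ordered event pairs, with separate marginals for the first and second position. The event tuples (verb, dependency) and their counts are invented for the example.

```python
import math
from collections import Counter


def pmi_scores(pair_counts):
    """PMI for each ordered event pair, with relative-frequency estimates."""
    total = sum(pair_counts.values())
    left, right = Counter(), Counter()
    for (e1, e2), c in pair_counts.items():
        left[e1] += c    # marginal count of the first event
        right[e2] += c   # marginal count of the second event
    return {
        (e1, e2): math.log((c / total) / ((left[e1] / total) * (right[e2] / total)))
        for (e1, e2), c in pair_counts.items()
    }


def top_k(scores, event, k):
    """The k events with the highest PMI given `event` as the first event."""
    candidates = [(e2, s) for (e1, e2), s in scores.items() if e1 == event]
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:k]


# Hypothetical counts of co-occurring events e(w, d) and e(v, g).
pair_counts = {
    (("pick_up", "obj"), ("deliver", "obj")): 8,
    (("pick_up", "obj"), ("go", "subj")): 2,
    (("eat", "subj"), ("go", "subj")): 6,
}

scores = pmi_scores(pair_counts)
ranked = top_k(scores, ("pick_up", "obj"), k=1)
print(ranked)
```

Unlike lift with explicit lift-min and lift-max bounds, raw PMI has no built-in guard against rare events, so a frequency cutoff is usually applied before ranking.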

[Section 8, continued: Japanese text lost in extraction. Surviving fragments: a reference to Section 3 and RTE (Recognizing Textual Entailment).]

References

1) Kawahara, D. and Kurohashi, S.: A Fully-Lexicalized Probabilistic Model for Japanese Syntactic and Case Structure Analysis, Proceedings of the HLT-NAACL 2006, pp.176-183 (2006).
2) Bean, D. and Riloff, E.: Unsupervised Learning of Contextual Role Knowledge for Coreference Resolution, HLT-NAACL 2004: Main Proceedings, pp.297-304 (2004).
3) Gerber, M. and Chai, J.: Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp.1583-1592 (2010).
4) Chambers, N. and Jurafsky, D.: Unsupervised Learning of Narrative Event Chains, Proceedings of ACL-08: HLT, pp.789-797 (2008).
5) Chambers, N. and Jurafsky, D.: Unsupervised Learning of Narrative Schemas and their Participants, Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp.602-610 (2009).
6) Miller, G. A.: WordNet: A lexical database for English, Communications of the ACM (1995).
7) Singh, P. and Williams, W.: LifeNet: A Propositional Model of Ordinary Human Activity, Proceedings of Workshop on Distributed and Collaborative Knowledge Capture (2003).
8) Espinosa, J. and Lieberman, H.: EventNet: Inferring Temporal Relations Between Commonsense Events, Proceedings of the 4th Mexican International Conference on Artificial Intelligence, pp.61-69 (2005).
9) Regneri, M., Koller, A. and Pinkal, M.: Learning Script Knowledge with Web Experiments, Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp.979-988 (2010).
10) Lin, D. and Pantel, P.: Discovery of Inference Rules for Question Answering, Natural Language Engineering, Vol.7, No.4, pp.343-360 (2001).
11) Szpektor, I. and Dagan, I.: Learning Entailment Rules for Unary Templates, Proceedings of the 22nd International Conference on Computational Linguistics (COLING), pp.849-856 (2008).
12) Fujiki, T., Nanba, H. and Okumura, M.: Automatic Acquisition of Script Knowledge from a Text Collection, Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2003), pp.91-94 (2003).
13) Torisawa, K.: Acquiring Inference Rules with Temporal Constraints by using Japanese Coordinated Sentences and Noun-Verb Co-occurrences, Proceedings of Human Language Technology Conference/North American chapter of the Association for Computational Linguistics annual meeting (HLT-NAACL06), pp.57-64 (2006).
14) Abe, S., Inui, K. and Matsumoto, Y.: Two-phased event relation acquisition: coupling the relation-oriented and argument-oriented approaches, Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp.1-8 (2008).
15) Agrawal, R., Imielinski, T. and Swami, A.: Mining association rules between sets of items in large databases, Proceedings of the ACM-SIGMOD 1993 International Conference on Management of Data, pp.207-216 (1993).
16) Kazama, J. and Torisawa, K.: Inducing Gazetteers for Named Entity Recognition by Large-Scale Clustering of Dependency Relations, Proceedings of ACL-08: HLT, pp.407-415 (2008).
17) Borgelt, C. and Kruse, R.: Induction of Association Rules: Apriori Implementation, Proceedings of 15th Conference on Computational Statistics, pp.395-400 (2002).
18) Sasano, R., Kawahara, D. and Kurohashi, S.: Improving Coreference Resolution Using Bridging Reference Resolution and Automatically Acquired Synonyms, Discourse Anaphora and Anaphor Resolution Colloquium, pp.125-136 (2007).