Automatically Evaluating Text Coherence using Anaphora and Coreference Resolution

Similar documents
Capturing Salience with a Trainable Cache Model for Zero-anaphora Resolution

Chinese Zero Pronoun Resolution: A Joint Unsupervised Discourse-Aware Model Rivaling State-of-the-Art Resolvers

Acquiring Strongly-related Events using Predicate-argument Co-occurring Statistics and Caseframe

Measuring Variability in Sentence Ordering for News Summarization

Discourse Structure and Co-Reference: An Empirical Study

An Empirical Investigation of the Relation Between Discourse Structure and Co-Reference

Task Oriented Dialog, Classic Models

Recognizing Implicit Discourse Relations through Abductive Reasoning with Large-scale Lexical Knowledge

2002 Journal of Software, )

Mining coreference relations between formulas and text using Wikipedia

A Fully-Lexicalized Probabilistic Model for Japanese Zero Anaphora Resolution

a) b) (Natural Language Processing; NLP) (Deep Learning) Bag of words White House RGB [1] IBM

Utilizing Portion of Patent Families with No Parallel Sentences Extracted in Estimating Translation of Technical Terms

Joint Inference for Event Timeline Construction

Resolving Entity Coreference in Croatian with a Constrained Mention-Pair Model. Goran Glavaš and Jan Šnajder TakeLab UNIZG

Effectiveness of complex index terms in information retrieval

A Deterministic Word Dependency Analyzer Enhanced With Preference Learning

Latent Variable Models in NLP

Margin-based Decomposed Amortized Inference

Penn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark

A Support Vector Method for Multivariate Performance Measures

Coreference Resolution with ILP-based Weighted Abduction

Chunking with Support Vector Machines

Minimally Supervised Event Causality Identification

Semantics and Pragmatics of NLP Pronouns

Automatic Generation of Shogi Commentary with a Log-Linear Language Model

Text Mining. March 3, March 3, / 49

Multi-dimensional Dependency Grammar as Multigraph Description

arxiv: v2 [cs.cl] 25 Sep 2016

An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition

Statistical Models of the Annotation Process

Right Frontier Constraint for discourses in non canonical order

Prenominal Modifier Ordering via MSA. Alignment

A Study for Evaluating the Importance of Various Parts of Speech (POS) for Information Retrieval (IR)

Toponym Disambiguation using Ontology-based Semantic Similarity

Linguistically Aware Coreference Evaluation Metrics

Semantic Role Labeling via Tree Kernel Joint Inference

A Discriminative Model for Semantics-to-String Translation

Unsupervised Coreference Resolution in a Nonparametric Bayesian Model

Tuning as Linear Regression

Classification of Publications Based on Statistical Analysis of Scientific Terms Distributions

The Winograd Schema Challenge and Reasoning about Correlation

Which Step Do I Take First? Troubleshooting with Bayesian Models

An Introduction to String Re-Writing Kernel

Domain Adaptation for Word Sense Disambiguation under the Problem of Covariate Shift

TnT Part of Speech Tagger

Multi-Task Structured Prediction for Entity Analysis: Search Based Learning Algorithms

Scalable Term Selection for Text Categorization

Spatial Role Labeling CS365 Course Project

2. Tree-to-String [6] (Phrase Based Machine Translation, PBMT)[1] [2] [7] (Targeted Self-Training) [7] Tree-to-String [3] Tree-to-String [4]

Machine Learning for Interpretation of Spatial Natural Language in terms of QSR

SYNTHER A NEW M-GRAM POS TAGGER

NTT/NAIST s Text Summarization Systems for TSC-2

Transition-Based Parsing

CS460/626 : Natural Language Processing/Speech, NLP and the Web

Learning and Inference over Constrained Output

A Boosting-based Algorithm for Classification of Semi-Structured Text using Frequency of Substructures

Kernels for Structured Natural Language Data

CLRG Biocreative V

A Model for Multimodal Reference Resolution

Part A. P (w 1 )P (w 2 w 1 )P (w 3 w 1 w 2 ) P (w M w 1 w 2 w M 1 ) P (w 1 )P (w 2 w 1 )P (w 3 w 2 ) P (w M w M 1 )

BLANC: Implementing the Rand index for coreference evaluation

BnO at NTCIR-10 RITE: A Strong Shallow Approach and an Inference-based Textual Entailment Recognition System

Evaluation Strategies

Lecture 13: Structured Prediction

Improved Decipherment of Homophonic Ciphers

Optimum parameter selection for K.L.D. based Authorship Attribution for Gujarati

A Boosted Semi-Markov Perceptron

Global Models of Document Structure Using Latent Permutations

Linking people in videos with their names using coreference resolution (Supplementary Material)

Coreference Resolution with! ILP-based Weighted Abduction

Learning Word Representations by Jointly Modeling Syntagmatic and Paradigmatic Relations

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview

Fertilization of Case Frame Dictionary for Robust Japanese Case Analysis

CMU at SemEval-2016 Task 8: Graph-based AMR Parsing with Infinite Ramp Loss

Identification of Bursts in a Document Stream

International Journal "Information Theories & Applications" Vol.14 /

Chapter 14 (Partially) Unsupervised Parsing

Polyhedral Outer Approximations with Application to Natural Language Parsing

CLEF 2017: Multimodal Spatial Role Labeling (msprl) Task Overview

An Improved Stemming Approach Using HMM for a Highly Inflectional Language

P R + RQ P Q: Transliteration Mining Using Bridge Language

On a New Model for Automatic Text Categorization Based on Vector Space Model

A Randomized Approach for Crowdsourcing in the Presence of Multiple Views

ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature

Detecting Heavy Rain Disaster from Social and Physical Sensors

Measuring Term Specificity Information for Assessing Sentiment Orientation of Documents in a Bayesian Learning Framework

Scoring Coreference Partitions of Predicted Mentions: A Reference Implementation

Learning Features from Co-occurrences: A Theoretical Analysis

Structure and Complexity of Grammar-Based Machine Translation

From Language towards. Formal Spatial Calculi

Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics

10/17/04. Today s Main Points

Nantonac Collaborative Filtering Recommendation Based on Order Responces. 1 SD Recommender System [23] [26] Collaborative Filtering; CF

CS Lecture 18. Topic Models and LDA

Information Extraction and GATE. Valentin Tablan University of Sheffield Department of Computer Science NLP Group

Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs

Click Prediction and Preference Ranking of RSS Feeds

3.8.1 Sum Individuals and Discourse Referents for Sum Individuals

Creating a Gold Standard Corpus for the Extraction of Chemistry-Disease Relations from Patent Texts

Transcription:

1 1 Barzilay 1) Automatically Evaluating Text Coherence using Anaphora and Coreference Resolution Ryu Iida 1 and Takenobu Tokunaga 1 We propose a metric for automatically evaluating discourse coherence of a text using the outputs of coreference and zero-anaphora resolution models. According to the idea that one tends to frequently and appropriately utilise anaphoric or coreference relations when writing a coherent text, we introduce a metric of discourse coherence based on the weighted frequency of automatically identified coreference and zero-anaphoric relations. We empirically evaluated our metric by comparing it to other existing approaches such as Barzilay et al. 1) using Japanese newspaper articles as a target data set. The results indicates that our metric, especially one based on the outputs of noun phrase coreference resolution, better reflects discourse coherence of texts the baseline model. 1 Graduate School of Information Science and Engineering, Tokyo Institute of Technology 1. 14) Barzilay entity-grid 1) entity-grid Barzilay 87.2% 90.4% 21) Barzilay entity-grid 8) F 0.346 entity-grid 1

2 3 4 5 4 6 7 8 2. 1),2),11) 13),16) Barzilay entity-grid 1) 5) Karamanis 11) Cb Barzilay entity-grid 1) 1 entity-grid 2 2 S S 20 ranking SVM 10) 87.2% 90.4% 21) entity-grid S 1 S 2 S 3 1 entity-grid... S 1 X S X O... S 2 S... S 3 S X... S O X // 2 1 entity-grid Lin 13) Penn Discourse Treebank PDTB 17) entity-grid Barzilay 3. : 2 PDTB ( 1 ) ( 2 ) 2

coherence(t ) = 1 N N score(j) (1) j score(j) = log max P (coref i, j) (2) i T T 1 j T N i j P (coref i, j) entity-grid entity-grid 1 2 7 4. 3 T j 2 4.1 4) 3) Ng 15) Ng 15) 6) 22) 2 Ng 2 Denis 4) Ng 2 Denis SVM 20) C4.5 18) MegaM 3 4.2 3 23) NAIST 7) F 0.346 1 6 7 2 Ng 6) 3 http://www.cs.utah.edu/ hal/megam/ 3

1 F 7,593 0.345 0.348 0.346 1,632 0.632 0.566 0.597 2 NAIST 1.4β 1,753 24,263 651,986 18,526 10,206 696 9,287 250,901 7,593 4,396 1 2 5) Iida 8) 3 2 Ng 15) Iida 8) 5 696 1 7,593 1,632 1 2 8) 2 F 6 5. NAIST 23) 9),19) 1 1 11 1 8 1 14 17 10 12 2 2 entity-grid 6. 1: 4 5 1 4

3 F original 0.477 0.792 0.595 random 1 0.409 0.751 0.530 random 2 0.405 0.740 0.523 random 3 0.412 0.746 0.531 random 4 0.413 0.746 0.532 random 5 0.406 0.744 0.525 4 F original 0.632 0.566 0.597 random 1 0.639 0.507 0.566 random 2 0.665 0.499 0.570 random 3 0.663 0.502 0.572 random 4 0.644 0.506 0.567 random 5 0.641 0.497 0.560 3 F original random n 20 n 3 F 1 4 4 7. 2: Barzilay entity-grid 1) 2 1 213 156 Barzilay 1) 20 2 1 20 5 15 5 N/A random 0.500 0 entity-grid (-coref) 0.673 2 (a) entity-grid (+coref) 0.707 2 np: ant 0.733 1 np: ant + ana 0.732 1 (b) np: ant + ana scm 0.761 0 zero: ant 0.523 325 zero: ant + ana 0.517 326 (c) zero: ant + ana scm 0.631 272 (a) + (b) 0.782 0 (a) + (c) 0.729 1 (a) + (b) + (c) 0.794 0 np zero ant Ng 15) ant+ana Denis 4) ant+ana scm Denis Barzilay entity-grid 2 +coref coref 2 entity-grid 21) 4 Barzilay 21) entity-grid 5 N/A 2 5

SVM 5 entity-grid entity-grid 7 Denis 4) 5 np: ant + ana scm 3 4 S 3 S 3 5 entity-grid 5 (a)+(c) entity-grid entity-grid entity-grid 3 3 5 (a)+(b)+(c) entity-grid 9 : : coherence(t )=8.973 coherence(t )=2.846 S 1 1 S 1 (= S 4) S 2 2 1 S 2 (= S 6) S 3 S 4 3 2 S 3 (= S 1) S 4 = (S 2) 1 1 S 5 3 4 S 5 (= S 5) S 6 4 S 6 (= S 3) 3 8. entity-grid 3 9 entity-grid 6

: : coherence(t )=3.369 coherence(t )=4.883 S 1 S 1 (= S 4) S 2 S 3 S 4 S 5 1 1 S 2 (= S 6) S 3 (= S 5) S 4 (= S 3) S 5 (= S 1) 2 2 2 1 1 S 6 S 6 (= S 2) 4 entity-grid A : 23680014 7 1) Barzilay, R. and Lapata, M.: Modeling Local Coherence: An Entity-Based Approach, Computational Linguistics, Vol.34, No.1, pp.1 34 (2008). 2) Bollegala, D., Okazaki, N. and Ishizuka, M.: A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization, In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp.385 392 (2006). 3) Cai, J. and Strube, M.: End-to-End Coreference Resolution via Hypergraph Partitioning, In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp.143 151 (2010). 4) Denis, P. and Baldridge, J.: Joint Determination of Anaphoricity and Coreference Resolution using Integer Programming, Proc. of HLT/NAACL, pp.236 243 (2007). 5) Grosz, B.J., Joshi, A.K. and Weinstein, S.: Centering: A framework for modeling the local coherence of discourse, Computational Linguistics, Vol.21, No.2, pp. 203 226 (1995). 6) Iida, R., Inui, K. and Matsumoto, Y.: Anaphora resolution by antecedent identification followed by anaphoricity determination, ACM Transactions on Asian Language Information Processing (TALIP), Vol.4, No.4, pp.417 434 (2005). 7) Iida, R., Komachi, M., Inui, K. and Matsumoto, Y.: Annotating a Japanese Text Corpus with Predicate-Argument and Coreference Relations, Proceeding of the ACL Workshop Linguistic Annotation Workshop, pp.132 139 (2007). 8) Iida, R. and Poesio, M.: A Cross-Lingual ILP Solution to Zero Anaphora Resolution, In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT 2011), pp.804 813 (2011). 9) Imamura, K., Saito, K. and Izumi, T.: Discriminative Approach to Predicatec 2011 Information Processing Society of Japan

Argument Structure Analysis with Zero-Anaphora Resolution, Proceedings of ACL- IJCNLP, Short Papers, pp.85 88 (2009). 10) Joachims, T.: Optimizing Search Engines Using Clickthrough Data, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), pp. 133 142 (2002). 11) Karamanis, N., Poesio, M., Mellish, C. and Oberlander, J.: Evaluating centeringbased metrics of coherence using a reliably annotated corpus, In Proceedings of ACL 2004, pp.391 398 (2004). 12) Lapata, M.: Probabilistic Text Structuring: Experiments with Sentence Ordering, In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pp.545 552 (2003). 13) Lin, Z., Ng, H.T. and Kan, M.-Y.: Automatically Evaluating Text Coherence Using Discourse Relations, Proceeding of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language TEchnologies (ACL-HLT), pp. 997 1006 (2011). 14) Mann, W.C. and Thompson, S.A.: Rhetorical Structure Theory: Toward a functional theory of text organization, Text, Vol.8, No.3, pp.243 281 (1988). 15) Ng, V. and Cardie, C.: Improving Machine Learning Approaches to Coreference Resolution, In Proceedings of the 40th ACL, pp.104 111 (2002). 16) Okazaki, N., Matsuo, Y. and Ishizuka, M.: Improving Chronological Sentence Ordering by Precedence Relation, In Proceedings of Coling 2004, pp.750 756 (2004). 17) Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A. and Webber, B.: The Penn Discourse Treebank 2.0, In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008) (2008). 18) Quinlan, J. R.: C4.5: Programs for Machine Learning, The Morgan Kaufmann Series in Machine Learning, Morgan Kaufmann (1993). 19) Taira, H., Fujita, S. and Nagata, M.: Predicate Argument Structure Analysis Using Transformation Based Learning, Proceedings of the ACL 2010 Conference Short Papers, pp.162 167 (2010). 20) Vapnik, V. N.: Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing Communications, and control, John Wiley & Sons (1998). 21) entity grid Vol.17, No.1, pp.161 182 (2010). 22) Vol.45, No.3, pp.906 918 (2004). 23) : NAIST Vol.17, No.2, pp.25 50 (2010). 8