Automatic Evaluation in Text Summarization

Size: px

Start display at page:

Download "Automatic Evaluation in Text Summarization"

Lorraine Kelley
5 years ago
Views:

1 1 Automatic Evaluation in Text Summarization Hidetsugu Nanba Tsutomu Hirao Hiroshima City University NTT NTT Communication Science Laboratories keywords: text summarization, summary evaluation Summary Recently, the evaluation of computer-produced summaries has become recognized as one of the problem areas that must be addressed in the field of automatic text summarization, and a number of automatic and manual methods have been proposed Manual methods evaluate summaries correctly, because humans evaluate them, but are costly On the other hand, automatic methods, which use evaluation tools or programs, are low cost, although these methods cannot evaluate summaries as accurately as manual methods To solve this problem, new automatic methods have been investigated, which can reduce the errors of traditional automatic methods by using several evaluation results obtained manually In this paper, we survey the state of the art of automatic evaluation methods for text summarization 1 Luhn[Luhn 58] () ( ) ( ) ( ) BLEU[Papineni 02] BLEU ROUGE[Lin 04b] 2005 ACL ()

2 a Luhn [Luhn 58] Luhn F 1990 ( ) [Mani 99, Nanba 00] (snipet) [Tombros 98] Text Summarization Challenge(TSC) 1 Tipster SUMMAC Document Understanding Conference(DUC) 2 ( ) Nenkova SCU(Summary Content Unit) () [Nenkova 07] SCU SCU R C () (N N ) 3 1 ROUGE ROUGE-N [Lin 03] 3 N e n-gram(c) ROUGE-N(C,R) = e n-gram(r) Count clip (e) Count(e) n-gram(c) N n-gram(r) N Count(e) N Count clip (e) N Count(e n-gram(c)) Count(e n-gram(r)) Lin N 1 4 (1) N= 1,2 3 DUC

3 3 ROUGE-N Lin ( ) ROUGE-S [Lin 04a, Lin 04b] ROUGE-S(C,R) = (1 + γ2 )P S (C,R)R S (C,R) R S (C,R) + γ 2 P S (C,R) (2) P S (C,R) R S (C,R) e bigram(c) skip-bigram(c) P S (C,R) = e bigram(c) skip-bigram(c) e bigram(c) skip-bigram(c) R S (C,R) = e bigram(r) skip-bigram(r) Count clip (e) Count(e) (3) Count clip (e) Count(e) (4) ROUGE-S ROUGE-SU [Lin 04a, Lin 04b] ROUGE-S ROUGE-SU ROUGE-N 3 2 ESK ROUGE [ 06] (Extended String Subsequence Kernek: ESK) ESK Lodhi String Subsequence Kernel (SSK)[Lodhi 02] Cancedda Word Sequence Kernel (WSK)[Cancedda 03] ESK d ESK 1 ROUGE-1 3 ROUGE-2 1 ROUGE-S 3 ROUGE-SU 4 ESK d=2 7 + λ + 2λ 2 + λ 3 + λ 4 + λ 5 + λ 6 + λ 9 [ 06] λ(0 λ 1) 1 λ 2 λ 2 ESK R C Becoming a cosmonaut:{spaceman} is my great dream:{dream} Becoming an astronaut:{spaceman} is my ambition:{dream} cosmonaut astronaut SPACEMAN ambition dream DREAM Becoming DREAM R a cosmonaut:spaceman is my great 5 C 4 λ 5 λ 4 ESK d=2 (C,R) C R C R 15 ESK d=2 (C,R) = λ 9 + λ 2 + λ 4 + λ 6 + λ λ 2 + λ λ = 7 + λ + 2λ 2 + λ 3 + λ 4 + λ 5 + λ 6 + λ 9 1 R C ROUGE-1 2 ROUGE-S SU ESK d=2 ROUGE ESK 1 ESK λ = 0 ROUGE [ 06] λ 05

4 a Basic Elements ROUGE ESK Basic Elements (BE) [Hovy 06] BE BE BE () <, 1, 2 > R C BE R : <,, > <,, > C : <,, > <,, > [ 06] 2 R 1 BE C 2 BE BE ROUGE ( ) [ 03] n C 1 C n C 1 C n H(C 1,R) H(C n,r) R C 1 C n H(C 1,R) H(C n,r) X C i 4 sim i n i=1 H(C i,r) sim i X sim i X [Yasuda 03] Nanba [Nanba 06] (Yasuda ) Yasuda Yasuda A B 2 A B A B A B A B 05 X X (1) S 1 S n (2) T 1 T m 4 3

5 5 (3) () (1) X T 1 T m (2) 3 ROUGE X (3) S 1 S n T 1 T m (4) ROUGE S 1 S n (5) 2 4 X S 1 S n (6) ( 3 ) 5 (7) ( ) 05 X Nanba TSC2 Yasuda ROUGE Yasuda ROUGE 4 3 Yasuda ( 1 ) [ 07] n H(C i,r) ROUGE-1 ROUGE- S ESK H(C i,r) H(C 1, R) = β 0 + β 1 ROUGE-1(C 1, R) + β 2 ROUGE-S(C 1, R) + β 3 ESK(C 1, R) + ϵ 1 H(C 2, R) = β 0 + β 1 ROUGE-1(C 2, R) + β 2 ROUGE-S(C 2, R) + β 3 ESK(C 2, R) + ϵ 2 (5) H(C n, R) = β 0 + β 1 ROUGE-1(C n, R) + β 2 ROUGE-S(C n, R) + β 3 ESK(C n, R) + ϵ n 2 1 ROUGE-1(C 1, R) ROUGE-S(C 1, R) ESK(C 1, R) 1 ROUGE-1(C 2, R) ROUGE-S(C 2, R) ESK(C 2, R) X= ROUGE-1(C n, R) ROUGE-S(C n, R) ESK(C n, R) (5) y = Xβ + ϵ ˆβ ˆβ = (X T X) 1 X T y (6) ˆβ ROUGE (6) 32 [ 07] E = {ROUGE-1,ROUGE-S,ESK} {ROUGE-1} {ROUGE-S} {ESK} {ROUGE-1, ROUGE-S} {ROUGE-1, ESK} {ROUGE-S, ESK} {ROUGE-1, ROUGE-S, ESK} M 1,M 2,,M 7 AIC [Akaike 73] AIC c (Corrected-AIC) [Sugiura 78] p H(C i,r) ϵ i β =(β 0,β 1,β 2,β 3 ) T 2p(p + 1) AIC c(m) = AIC(M) + y = (H(C 1,R),H(C 2,R),,H(C n,r)) T n p 1 ϵ = (ϵ 1,ϵ 2,,ϵ n ) X AIC(M) (7) 3 7 5,

6 a AIC c AIC c AIC c AIC c M 1 S M 2 S M 3 S M 4 S M 5 S M 6 S M 7 S AIC(M) = 2MLL + 2p, (8) MLL = n 2 {(1 + log(2πσ2 )) + log( R p n )} R p () AIC c AIC c best (= min i AIC c (M i )) AIC c (= AIC c (M i ) AIC c best ) τ M 1 M 7 AIC c AIC c 2 τ = 5 M 2,M 4,M 5,M 7 ( )/4 = 072 [ 06] TSC3 DUC2004 ROUGE-2 ESK ( [Hirao 04] ) 5 [Akaike 73] Akaike, H: Information Theory as an Extension of the Maximum Likelihood Principle, in Proc of the 2nd International Symposium on Information Theory, pp (1973) [Cancedda 03] Cancedda, N, Gaussier, E, Goutte, C, and Renders, J-M: Word Sequence Kernels, Journal of Machine Learning Research, Vol 3, No Feb, pp (2003) [Hirao 04] Hirao, T, Okumura, M, Fukushima, T, and Nanba, H: Text Summarization Challenge 3 Text Summarization Evaluation at NTCIR Workshop 4, in Working Notes of the 4th NTCIR Workhop Meeting, pp (2004) [Hovy 06] Hovy, E, Lin, C-Y, Zhou, L, and Fukumoto, J: Automated Summarization Evaluation with Basic Elements, in Proc of the 5th Conference on Language Resources and Evaluation (2006)

7 7 [Lin 03] Lin, C-Y and Hovy, E: Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics, in Proc of the 4th Meeting of the North American Chapter of the Association for Computational Linguistics and Human Language Technology, pp (2003) [Lin 04a] Lin, C-Y: ROUGE: A Package for Automatic Evaluation of Summaries, in Proc of Workshop on Text Summarization Branches Out, Post Conference Workshop of ACL 2004, pp (2004) [Lin 04b] Lin, C-Y and Och, F: Automatic Evaluation of Machine Translation Quality Using Longest Common Subsequence and Skip-Bigram Statistics, in Proc of the 42nd Annual Meeting of the Association for Computational Linguistics, pp (2004) [Lodhi 02] Lodhi, H, Saunders, C, Shawe-Taylor, J, Cristianini, N, and Watkins, C: Text Classification Using String Kernel, Journal of Machine Learning Research, Vol 2, No Feb, pp (2002) [Luhn 58] Luhn, H: The Automatic Creation of Literature Abstracts, IBM Journal of Research and Development, Vol 2, No 2, pp (1958) [Mani 99] Mani, I, Gates, B, and Bloedom, E: Improving Summaries by Revising Them, in Proc of the 37th Annual Meeting of the Association for Computational Linguistics, pp (1999) [Nanba 00] Nanba, H and Okumura, M: Producing More Readable Extracts by Revising Them, in Proc of the 18th International Conference on Computational Linguistics, pp (2000) [Nanba 06] Nanba, H and Okumura, M: An Automatic Method for Summary Evaluation Using Multiple Evaluation Results by a Manual Method, in Proc of COLING/ACL 206 Main Conference Poster Sessions, pp (2006) [Nenkova 07] Nenkova, A, Passonneau, R, and McKeown, K: The Pyramid Method: Incorporating Human Content Selection Variation in Summarization Evaluation, ACM Transactions on Speech and Language Processing, Vol 4, Issue 2 (2007) [Papineni 02] Papineni, K, Roukos, S, Ward, T, and W- J, Z: BLEU: a Method for Automatic Evaluation of Machine Translation, in Proc of the 40th Annual Meeting of the Association for Computational Linguistics, pp (2002) [Sugiura 78] Sugiura, N: Further Analysis of the Data by Akaike s Information Criterion and the Finite Corrections, Theory and Methods, Vol A7, No 1, pp (1978) [Tombros 98] Tombros, A and Sanderson, M: Advantages of Query Biased Summarization in Information Retrieval, in Proc of the 21st Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, pp 2 10 (1998) [Yasuda 03] Yasuda, K, Sugaya, F, Takezawa, T, Yamamoto, S and Yanagida, M: Applications of Automatic Evaluation Methods to Measuring a Capability of Speech Translation System, in Proc of the 10th Conference of the European Chapter of the Association for Computational Linguistics, pp (2003) [ 03], Arrigan, T,,, NL-158 pp (2003) [ 06],,,, Vol 47, No 6, pp (2006) [ 07],,,,, Vol 22, No 2B, pp (2007) [ 06],,,, Basic Element, FI-084, pp (2006) ( ) ACL ACM nanba@hiroshima-cuacjp nanba ( )NTT 2000 NTT () ACL

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation

ORANGE: a Method for Evaluating Automatic Evaluation Metrics for Machine Translation Chin-Yew Lin and Franz Josef Och Information Sciences Institute University of Southern California 4676 Admiralty Way