Exploring Content Models for Multi-Document Summarization

Similar documents
U.S. Plant Patents and the Imazio Decision

The Crucible. Background

The Crucible. By Arthur Miller

EDITORIAL NOTE: PERSONAL/COMMERCIAL DETAILS ONLY HAVE BEEN DELETED. IN THE DISTRICT COURT AT AUCKLAND CIV [2016] NZDC 12626

MOTION REGARDING PARENTING TIME FOC 65

1 Evaluation of SMT systems: BLEU

10-1 Sequences as Functions. Determine whether each sequence is arithmetic. Write yes or no. 1. 8, 2, 12, 22

The League of Nations

Delta RV Eighth Grade Social Studies Revised-2010

Rubrics and Exemplars: Grade 8 Responses to Biography & Literature

Presents Clever Alice From "The Fairy Book" by Miss Mulock - 1 -

E!!! E

The units on the average rate of change in this situation are. change, and we would expect the graph to be. ab where a 0 and b 0.

The Great Gatsby. SAT Prep Test Examples Jestice March 2017

GEOGRAPHIC INFORMATION SYSTEM ANALYST I GEOGRAPHIC INFORMATION SYSTEM ANALYST II

BIG IDEAS. Area of Learning: SOCIAL STUDIES Urban Studies Grade 12. Learning Standards. Curricular Competencies

Beliefs Mini-Story Text

1 Propositional Logic

T L S H. Doug Johnson

SPIRITUAL GIFTS. ( ) ( ) 1. Would you describe yourself as an effective public speaker?

{ Jurafsky & Martin Ch. 6:! 6.6 incl.

I. How we believe II. Pattern recognition III. Polygraphs/memory IV. Astrology. 6-Nov-14 Phys 192 Lecture 9 1

Lesson 4: Efficiently Adding Integers and Other Rational Numbers

Case: 2:17-cv ALM-KAJ Doc #: 1 Filed: 10/26/17 Page: 1 of 10 PAGEID #: 1 UNITED STATES DISTRICT COURT SOUTHERN DISTRICT OF OHIO EASTERN DIVISION

Language Models. CS6200: Information Retrieval. Slides by: Jesse Anderton

Internet Appendix for Sentiment During Recessions

The Role of Pre-trial Settlement in International Trade Disputes (1)

Miss Johnson s Plant Experiment

Evaluating Text Summarization Systems with a Fair Baseline from Multiple Reference Summaries

PENGUIN READERS. Five Famous Fairy Tales

Latent Variable Models in NLP

Trick or Treat UNIT 19 FICTION. #3893 Nonfiction & Fiction Paired Texts 100 Teacher Created Resources

Punctuated Equilibrium and Institutional Friction in Comparative Perspective

REVIEW OF THE EMERGING ISSUES TASK FORCE

Grade 5 Lesson 14 Use the biography by Walter Dean Myers entitled James Forten on pages in your student reader to answer the questions below.

Generating Settlements. Through Lateral Thinking Techniques

A fast and simple algorithm for training neural probabilistic language models

Two-Variable Regression Model: The Problem of Estimation

Chapter 19 Classwork Famous Scientist Biography Isaac...

Evaluation. Brian Thompson slides by Philipp Koehn. 25 September 2018

Which pages are those of job ads?

Saturday Science Lesson Plan Fall 2008

2-5 Dividing Integers

coven Emily Lisa Benjamin High Noon Books Novato, CA

Little Saddleslut (Greek version of Cinderella)

Student Activities. The Murder at the Vicarage. English Readers. Part 1 (Chapters 1 8) 4 Vocabulary Find seven words connected to the church.

THE FEDERAL DEATH PENALTY SYSTEM

Q1. Amina is making designs with two different shapes.

Notes on Mechanism Designy

Math 6 Mid-Winter Recess

IN THE SUPREME COURT OF BELIZE, A.D KIRK HALL BOWEN & BOWEN LTD.

Exam III Review Math-132 (Sections 7.1, 7.2, 7.3, 7.4, 7.5, 7.6, 8.1, 8.2, 8.3)

Read the selection and choose the best answer to each question. Then fill in the answer on your answer document. Lights Out

a n d n d n i n n d a o n e o t f o b g b f h n a f o r b c d e f g c a n h i j b i g y i t n d p n a p p s t e v i s w o m e b

Organic Chemistry Syllabus

Physics 105 Spring 2017

CRF for human beings

An extended summary of the NCGR/Berkeley Double-Blind Test of Astrology undertaken by Shawn Carlson and published in 1985

Homework 4, Part B: Structured perceptron

What is Text mining? To discover the useful patterns/contents from the large amount of data that can be structured or unstructured.

HUMAN GEOGRAPHY AND CONTEMPORARY SOCIETY Global Patterns and Processes Spring 2009

Grade Common Core Math

Kesten C. Green Univ. of South Australia, Adelaide. March 23, Washington, D.C. 12 th International Conference on Climate Change

4. What does it mean once the letter "d" is formed when you draw a line on the moon?

The Shunammite Woman s Land Restored 2 Kings 8:1-6

Categorization ANLP Lecture 10 Text Categorization with Naive Bayes

ANLP Lecture 10 Text Categorization with Naive Bayes

A Discriminative Model for Semantics-to-String Translation

BIOGRAPHY OF MICHAEL FARADAY PART - 1. By SIDDHANT AGNIHOTRI B.Sc (Silver Medalist) M.Sc (Applied Physics) Facebook: sid_educationconnect

Amy's Halloween Secret

Calculator Exam 2009 University of Houston Math Contest. Name: School: There is no penalty for guessing.

Language Modeling. Michael Collins, Columbia University

TIPS PLANNING FORM FOR INFANTS AND TODDLERS

Physics 103 Astronomy Syllabus and Schedule Fall 2016

A Probabilistic Model for Canonicalizing Named Entity Mentions. Dani Yogatama Yanchuan Sim Noah A. Smith

OREGON POPULATION FORECAST PROGRAM

N-grams. Motivation. Simple n-grams. Smoothing. Backoff. N-grams L545. Dept. of Linguistics, Indiana University Spring / 24

Applied Natural Language Processing

GOVERNMENT MAPPING WORKSHOP RECOVER Edmonton s Urban Wellness Plan Mapping Workshop December 4, 2017

Houston County School System Mathematics

Combining Difference-in-difference and Matching for Panel Data Analysis

Do you know the man that dropped out of school and still was one of the greatest physicists of the 20th century? That is Albert Einstein.

The Making Of A Great Contraction. With A Liquidity Trap And A Jobless Recovery. Columbia University

Chapter 9: Apportionment

4. The table shows the number of toll booths driven through compared to the cost of using a Toll Tag.

Comparative Summarization via Latent Dirichlet Allocation

Explorers 4 Teacher s notes for the Comprehension Test: The Snow Queen

Do not copy, post, or distribute

Appendix A: Topline questionnaire

Explorers 3 Teacher s notes for the Comprehension Test: The Magic Flute

Princes of Prophecy Alex s Program. by L. Ann Marie and some help from SOS

Name: Period: - Unit 2: Solving Equations. o One and two step equations: o Multi-step equations : o Multi-Step equations with fractions: /20

AP Human Geography. Course Materials

Writing: Looking Out the Window

Annual PHA Plan. Effective July 1, 2017 June 30, Decatur Housing Authority

Statistical Methods for NLP

Houston County School System Mathematics

The Source of True Wisdom. Proverbs 1. Memory Verse: Proverbs 2:6


Identify the scale of measurement most appropriate for each of the following variables. (Use A = nominal, B = ordinal, C = interval, D = ratio.

Transcription:

Exploring Content Models for Multi-Document Summarization Aria Haghighi UC Berkeley Lucy Vanderwende MSR Redmond

Summarization

Summarization D

Summarization D S

Sentence Extraction D S

Sentence Extraction D S

Summarization

Summarization Representation P C ( ) Sotomayor: 0.16 court: 0.13 supreme: 0.13 nominee: 0.11...

Summarization Representation Extraction P C ( ) Sotomayor: 0.16 court: 0.13 supreme: 0.13 nominee: 0.11...

SumBasic: Representation [Nenkova & Vanderwende] 2006

SumBasic: Representation Simple Unigram MLE [Nenkova & Vanderwende] 2006

SumBasic: Representation Simple Unigram MLE P C (w) = ˆP D (w) Sotomayor: 0.15 Washington: 0.13 supreme: 0.12 Obama: 0.10... [Nenkova & Vanderwende] 2006

SumBasic: Extraction Score(S) = 1 S w S P C(w)

SumBasic: Extraction Sentence Score Score(S) = 1 S w S P C(w)

SumBasic: Extraction Sentence Score Score(S) = 1 S w S P C(w) Obama announced the nomination of Sonia Sotomayor

SumBasic: Extraction Sentence Score Score(S) = 1 S w S P C(w) Obama announced the nomination of Sonia Sotomayor 0.12 0.01 0.05 0.04 0.15

SumBasic: Extraction Sentence Score Score(S) = 1 S w S P C(w) Score: 0.074 Obama announced the nomination of Sonia Sotomayor 0.12 0.01 0.05 0.04 0.15

SumBasic: Extraction Score(S) = 1 S w S P C(w) S = {}

SumBasic: Extraction Score(S) = 1 S w S P C(w) S = {} while words(s) <L:

SumBasic: Extraction Score(S) = 1 S w S P C(w) S = {} while words(s) <L: S = max S/ S Score(S)

SumBasic: Extraction Score(S) = 1 S w S P C(w) S = {} while words(s) <L: S = max S/ S Score(S) S = S S

SumBasic: Extraction Score(S) = 1 S w S P C(w) S = {} while words(s) <L: S = max S/ S Score(S) S = S S P C (w) =P C (w) 2, for w S

Experimental Results

Experimental Results DUC 2006

Experimental Results DUC 2006 50 document sets, 25 docs each

Experimental Results DUC 2006 50 document sets, 25 docs each Max 250 tokens

Experimental Results DUC 2006 50 document sets, 25 docs each Max 250 tokens ROUGE-2

Experimental Results DUC 2006 50 document sets, 25 docs each Max 250 tokens ROUGE-2 Recall over bigrams w/o stop words against human summaries

Experimental Results DUC 2006 50 document sets, 25 docs each Max 250 tokens ROUGE-2 Recall over bigrams w/o stop words against human summaries Bad at summary quality, decent at content

SumBasic: Performance SumBasic 5.3 4 7 10

SumBasic: Extraction Issues

SumBasic: Extraction Issues What are we optimizing?

SumBasic: Extraction Issues What are we optimizing? Without word length, when to stop?

SumBasic: Extraction Issues What are we optimizing? Without word length, when to stop? Not Recall Oriented

SumBasic: Extraction Issues What are we optimizing? Without word length, when to stop? Not Recall Oriented No direct penalty for missing freq. words

KLSum: Extraction P C ( ) Sotomayor: 0.15 Washington: 0.13 supreme: 0.12

KLSum: Extraction P C ( ) Sotomayor: 0.15 Washington: 0.13 supreme: 0.12

KLSum: Extraction P C ( ) Sotomayor: 0.15 Washington: 0.13 supreme: 0.12 P S ( ) Sotomayor: 0.20 Obama: 0.14 Washington: 0.11

KLSum: Extraction P C ( ) Sotomayor: 0.15 Washington: 0.13 supreme: 0.12 P S ( ) Sotomayor: 0.20 Obama: 0.14 Washington: 0.11

KLSum: Extraction P C ( ) Sotomayor: 0.15 Washington: 0.13 supreme: 0.12 P S ( ) Sotomayor: 0.18 Washington: 0.11 supreme: 0.10

KLSum: Extraction P C ( ) S = min KL(P CP S ) S:words(S) L P S ( ) Sotomayor: 0.15 Washington: 0.13 supreme: 0.12 Sotomayor: 0.18 Washington: 0.11 supreme: 0.10 See Paper For Details

KLSum: Performance SumBasic 5.3 KLSum 6.0 4 7 10

Improving Representation

Improving Representation Flexible stop words

Improving Representation Flexible stop words e.g. stock in financial document sets

Improving Representation Flexible stop words e.g. stock in financial document sets No pref. for multi-document usage

Improving Representation Flexible stop words e.g. stock in financial document sets No pref. for multi-document usage Many docs indicate content importance

Improving Representation Flexible stop words e.g. stock in financial document sets No pref. for multi-document usage Many docs indicate content importance Prefer words in early sentences

Adding Topics Background Distribution

Adding Topics Background Distribution φ B the: 0.12 of: 0.09... washington: 0.04 policy: 0.03

Adding Topics Content Distribution

Adding Topics Content Distribution φ C Sotomayor: 0.16 supreme: 0.13 Obama: 12 court: 11 nominee: 10...

Adding Topics Document-Specific Distribution similar to [Daume & Marcu, 2006]

Adding Topics Document-Specific Distribution φ D Rush: 0.12 radical: 0.11 right: 0.09 court: 11 nominee: 10 similar to [Daume & Marcu, 2006]

Adding Topics Document-Specific Distribution φ D φ D Rush: 0.12 radical: 0.11 right: 0.09 court: 11 nominee: 10 Hutchinson: 0.14 past: 0.11 comments: 0.09 circuit: 11 statement: 10 similar to [Daume & Marcu, 2006]

Adding Topics Document-Specific Distribution φ D φ D Rush: 0.12 radical: 0.11 right: 0.09 court: 11 nominee: 10 Hutchinson: 0.14 past: 0.11 comments: 0.09 circuit: 11 statement: 10 φ D media: 0.12 believe: 0.11 rights: 0.09 cover: 0.11 fox: 0.10 similar to [Daume & Marcu, 2006]

Adding Topics Document-Specific Distribution φ D φ D Rush: 0.12 radical: 0.11 right: 0.09 court: 11 nominee: 10 Hutchinson: 0.14 past: 0.11 comments: 0.09 circuit: 11 statement: 10 φ D φ D media: 0.12 believe: 0.11 rights: 0.09 cover: 0.11 fox: 0.10 race: 0.13 Berkeley: 0.11 firemen: 0.09 panel: 11 decisions: 10 similar to [Daume & Marcu, 2006]

Adding Topics Document Level....

Adding Topics Document Level B D C. ψ t...

Adding Topics Document Level B D C. ψ t.. B D C. ψ t

Adding Topics Sentence B D C ψ t

Adding Topics Sentence B D C Word ψ t W

Adding Topics Sentence B D C Word Z ψ t W

Adding Topics Sentence B D C Word {B,D,C} Z ψ t W

Adding Topics Sentence B D C Word Z ψ t W

Adding Topics Sentence B D C Word Z ψ t W φ B φ D φ C

Adding Topics Representation Extraction φ C Sotomayor: 0.16 supreme: 0.13 Obama: 0.12 court: 0.11 nominee: 0.10...

Adding Topics Representation Extraction φ C Sotomayor: 0.16 supreme: 0.13 Obama: 0.12 court: 0.11 nominee: 0.10...

Performance SumBasic 5.3 KLSum 6.0 +Topic 6.3 4 7 10

Adding Bigrams Each sentence is a bag of bigrams

Adding Bigrams Each sentence is a bag of bigrams φ C Obama announced:0.09 Sonia Sotomayor: 0.08 Supreme Court: 0.05...

Performance SumBasic 5.3 KLSum 6.0 +Topic 6.3 +Bigram 7.8 4 7 10

Structured Content Models

Structured Content Models General Content Sotomayor: 0.16 supreme: 0.13 court: 0.12...

Structured Content Models General Content Sotomayor: 0.16 supreme: 0.13 court: 0.12... Specific Content Sub-Stories

Structured Content Models General Content Sotomayor: 0.16 supreme: 0.13 court: 0.12... Specific Content Sub-Stories born: 0.15 puerto: 0.13 mother: 0.12 father: 0.10... confirmation: 0.16 Republican: 0.11 senators: 0.08 Limbaugh: 0.07... race: 0.11 identity: 0.09 firefighters: 0.07 discrimination: 0.05...

Example Sub-Story Biography Sonia Sotomayor Born to Puerto Rican parents who moved to the Bronx in New York during World War Two.

Example Sub-Story Biography Sonia Sotomayor Born to Puerto Rican parents born: 0.15 puerto: 0.13 mother: 0.12 who moved to the Bronx in New York during World War Two. father: 0.10...

Example Sub-Story Confirmation Senate Republicans have combed over Sotomayor s record on the federal bench ahead of confirmation hearings.

Example Sub-Story Confirmation confirmation: 0.16 Republican: 0.11 senators: 0.08 Senate Republicans have combed over Sotomayor s record on the federal bench ahead of confirmation hearings. Limbaugh: 0.07...

Structured Content Models φ C0 Sotomayor: 0.16 supreme: 0.13 Obama: 0.12... φ C1 φ C2 φ C3 born: 0.15 puerto: 0.13 mother: 0.12 father: 0.10... confirmation: 0.16 Republican: 0.11 senators: 0.08 Limbaugh: 0.07... race: 0.11 identity: 0.09 firefighters: 0.07 discrimination: 0.05...

Structured Topics General or Specific?....

Structured Topics General or Specific? B D C.. ψ t..

Structured Topics General or Specific? B D C G S.... ψ t ψ G

Structured Topics General or Specific? B D C G S.... ψ t B D C ψ G ψ t

Structured Topics General or Specific? B D C G S.... ψ t B D C ψ G G S ψ t ψ G

Structured Topics What specific topic?

Structured Topics What specific topic? Z S

Structured Topics What specific topic? Z S {1,2,3}

Structured Topics What specific topic? Z S 1

Structured Topics What specific topic? Z S 1 Sticky Specific Topics P (Z S Z S )= σ (1 σ)θ Z S Z S o.w. if Z S = Z S

Structured Topics What specific topic? Z S 1 Sticky Specific Topics P (Z S Z S )= σ (1 σ)θ Z S Z S o.w. if Z S = Z S

Structured Topics G S Sentence Z S B D C Word ψ G Z ψ t φ B φ D φ C0 W φ C1 φ C2 φ C3

Structured Topics Representation Extraction φ C0 Sotomayor: 0.16 supreme: 0.13 Obama: 0.12 court: 0.11 nominee: 0.10...

Structured Topics Representation Extraction φ C0 Sotomayor: 0.16 supreme: 0.13 Obama: 0.12 court: 0.11 nominee: 0.10...

Performance SumBasic 5.3 +Topic 6.3 +Struct 6.4 +Bigram 8.3 4 7 10

HierSum: Manual Evaluation Pairwise Comparison 23 participants, 69 judgments Each participant sees reference summary and two model summaries Pythy: State-of-the-art discriminative system. Highest automatic score and highperforming on manual content evaluation

Manual Evaluation Question Pythy Ours Overall 20 49 Redundancy 21 48 Coherence 15 54 Focus 28 41

DUC 07 Results SumBasic 5.9 Pythy 8.7 Ours 9.3 4 7 10

Example General Summary Former House Speaker Newt Gingrich is asking a judge to force his estranged wife to turn over money he says she is hoarding. On Thursday, accusations of wrongdoing and the mining of dirt in the former U.S. House speaker's divorce case gave way to a secret settlement between Gingrich and his wife of 18 years, Marianne Gingrich. Gingrich filed for divorce July 29 amid allegations he is having an affair with 33-year-old congressional aide Callista Bisek. Newt Gingrich's attorney, Randy Evans, says Marianne Gingrich has refused to even discuss a settlement until she questions Bisek.

Example Topical Summary Callista Bisek Marianne Gingrich also wants to depose Callista Bisek, a congressional aide with whom the former U.S. House speaker has a relationship. And in motions filed last week in Superior Court for the District of Columbia, Callista Bisek, a clerk for the House Agriculture Committee, asked a judge to overturn a Georgia court order requiring her to answer questions about her relationship with Gingrich. Mayoue has said he intends to question Bisek about all aspects of her relationship with Newt Gingrich.

Example Topical Summary Gingrich Bio / Post-Speaker Life Gingrich is best known leading the Republican Party's takeover of the House in 1994. During that so-called Republican Revolution, Gingrich emphasized that "family values" should be a core pillar in American society. Since resigning as speaker and from the congressional seat he held for 20 years, Gingrich has been making a living giving speeches, sitting on corporate boards, consulting and appearing as a political analyst on Fox News. U.S. Rep. J.D. Hayworth (R-Ariz.) argued that Gingrich's new job as a political commentator for Fox News makes it inappropriate to include him in political gatherings. "Time marches on. He's gone on to other pursuits," Hayworth said.

Conclusion KL objective for sentence extraction summarization Topic models can yield state-of-the-art automatic and manual summary eval Also structured content models for topical summarization

Conclusion Thanks! Questions? If not, I have more examples...

Topical Summarization

Topical Summarization