A Hierarchical Bayesian Model for Unsupervised Induction of Script Knowledge


Lea Frermann, Ivan Titov, Manfred Pinkal (April 28th, 2014)

Contents
1 Introduction
2 Technical Background
3 The Script Model
4 Evaluation & Results
5 Conclusion

Scripts

"[...] A script is a predetermined, stereotyped sequence of actions that define a well-known situation." (Schank and Abelson, 1975)

Example situation: Eating in a Restaurant
- Look at menu
- Order your food
- Wait for your food
- Eat food
- Pay

Event Sequence Description (ESD): an explicit instantiation of a script.

Scripts are a way of providing AI/NLP systems with world knowledge, e.g. for coherence estimation and summarization.

Script Characteristics

ESD 1: Look at menu, Order your food, Wait for your food, Eat food, Pay
ESD 2: Check the menu, Order the meal, Wait for meal, Talk to friends, Have meal, Pay the bill

Goals:
- learn sequential ordering constraints
- learn event types (paraphrases)
- learn participant types (paraphrases), modeled as latent variables

Event Sequence Descriptions (ESDs)

- collected from non-expert annotators (web experiments)
- noisy (grammar/spelling)
- variable number and detail of event descriptions
- few ESDs per scenario

Example ESD 1: get menucard, search for items, order items, eat items, pay the bill, quit restaurant

Example ESD 2: enter the front door, let hostess seat you, tell waitress your drink order, tell waitress your food order, wait for food, eat and drink, get check from waitress, give waitress credit card, take charge slip from waitress, sign slip and add in a tip, leave slip on table, put up credit card, exit the restaurant

Unsupervised Learning of Scripts from Natural Text

Chambers and Jurafsky (2008)
- infer script-like templates ("Narrative Chains") from news text
- learn event sets and ordering information in a 3-step process: (1) identification of relevant events, (2) temporal classification, (3) clustering

Chambers and Jurafsky (2009)
- learn events and participants jointly (but no ordering)

- script information is often left implicit in natural text (world knowledge)

Unsupervised Learning of Scripts from ESDs

Regneri et al. (2010)
- collect sets of event sequence descriptions for various scripts
- learn event types and orderings: align events across descriptions based on semantic similarity, and compute a graph representation using Multiple Sequence Alignment (MSA)

Regneri et al. (2011)
- learn participant types based on those event graphs

Limitations: pipeline architecture; MSA-based graphs cannot encode some script characteristics (e.g. event optionality).

The Proposed Model (I)

A Bayesian Script Model
- joint learning of event types and ordering constraints from ESDs
- Generalized Mallows Model (GMM) for modeling ordering constraints

Bayesian models of ordering in NLP
- document-level ordering constraints in structured text (Wikipedia articles): Chen et al. (2009) integrate a GMM into a standard topic model

The Proposed Model (II)

Informed prior knowledge from WordNet
- alleviates the problem of limited training data
- encode correlations between words, based on WordNet similarity, in the language model priors
- correlations are encoded through a logistic normal distribution, as in the Correlated Topic Model (Blei and Lafferty, 2006)

2 Technical Background

The Mallows Model (Mallows, 1957)

A probability distribution over permutations of items.

- distance measure between two permutations π_1 and π_2: d(π_1, π_2)
- parameters:
  - σ, the canonical ordering (identity ordering [1, 2, 3, ..., n])
  - ρ > 0, a dispersion parameter (a distance penalty)

The probability of an observed permutation π:

P(π; ρ, σ) = exp(-ρ · d(π, σ)) / ψ(ρ)
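
To make the definition concrete, here is a minimal sketch (not the authors' code) that evaluates this probability for small permutations. It assumes the Kendall tau distance as d, the standard choice for the Mallows model, and computes the normalizer ψ(ρ) by brute force over all permutations, which is only feasible for small n (for the Kendall tau distance ψ(ρ) also has a closed form).

```python
from itertools import permutations
from math import exp

def kendall_tau(pi, sigma):
    """Number of item pairs that pi and sigma order differently."""
    pos_pi = {item: i for i, item in enumerate(pi)}
    pos_sigma = {item: i for i, item in enumerate(sigma)}
    items = list(sigma)
    return sum(
        1
        for i in range(len(items))
        for j in range(i + 1, len(items))
        if (pos_pi[items[i]] - pos_pi[items[j]])
        * (pos_sigma[items[i]] - pos_sigma[items[j]]) < 0
    )

def mallows_prob(pi, sigma, rho):
    """P(pi; rho, sigma) with the normalizer computed by brute force (small n only)."""
    psi = sum(exp(-rho * kendall_tau(p, sigma)) for p in permutations(sigma))
    return exp(-rho * kendall_tau(pi, sigma)) / psi

sigma = (1, 2, 3, 4)                              # canonical ordering
print(mallows_prob((1, 2, 3, 4), sigma, 1.0))     # most probable: distance 0
print(mallows_prob((4, 3, 2, 1), sigma, 1.0))     # least probable: maximal distance
```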

The Generalized Mallows Model (Fligner and Verducci, 1986)

Generalization to item-specific dispersion parameters ρ = [ρ_1, ρ_2, ...] for the items in π = [π_1, π_2, ...]:

GMM(π; ρ, σ) ∝ exp(-Σ_i ρ_i · d(π_i, σ_i)) = Π_i exp(-ρ_i · d(π_i, σ_i))

Relation to our model:
- items π_i correspond to event types
- item-specific dispersions model event type-specific temporal flexibility
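
A sketch (again not the authors' code) of how the GMM can be evaluated, using the standard inversion-count representation: each item's count of out-of-order predecessors is penalized by its own dispersion ρ_i, so the distribution factorizes and the per-item normalizers are cheap to compute. The permutations and ρ values below are made up for illustration.

```python
from math import exp

def inversion_vector(pi, sigma):
    """v[i] = number of items ranked after sigma[i] that appear before it in pi;
    sum(v) equals the Kendall tau distance d(pi, sigma)."""
    rank = {item: r for r, item in enumerate(sigma)}
    v = []
    for i, item in enumerate(sigma):
        pos = pi.index(item)
        v.append(sum(1 for other in pi[:pos] if rank[other] > i))
    return v

def gmm_prob(pi, sigma, rho):
    """GMM probability with item-specific dispersions rho[i]; the normalizer
    factorizes because each inversion count varies independently in {0..n-1-i}."""
    n = len(sigma)
    v = inversion_vector(pi, sigma)
    prob = 1.0
    for i in range(n):
        psi_i = sum(exp(-rho[i] * k) for k in range(n - i))
        prob *= exp(-rho[i] * v[i]) / psi_i
    return prob

sigma = (1, 2, 3, 4)
rho = [2.0, 0.1, 2.0, 2.0]                 # item 2 is temporally flexible (small dispersion)
print(gmm_prob((1, 3, 2, 4), sigma, rho))  # inversion charged to flexible item 2: mild penalty
print(gmm_prob((2, 1, 3, 4), sigma, rho))  # inversion charged to strict item 1: strong penalty
```

Giving one item a small dispersion makes orderings that displace it only mildly less probable, which is exactly the event type-specific temporal flexibility mentioned on the slide (e.g. a "wait" event that can float within the sequence).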

3 The Script Model

Generative Story I: Ordering Generation

Draw an event type permutation π:

for each ESD d:
    π ~ GMM(ρ, ν)

Conjugate prior: GMM_0

Example (Cooking Pasta), π: 1 get, 3 boil, 4 put, 7 wait, 2 grate, 5 add, 6 drain

Generative Story II: Event Type Generation

Realize each event type e with success probability θ_e:

for each ESD d:
    π ~ GMM(ρ, ν)
    for each event type e: t_e ~ Binomial(θ_e)

Conjugate prior: Beta

Example (Cooking Pasta), realized event types t: 1 get, 3 boil, 4 put, 2 grate, 6 drain (wait and add are not realized in this ESD, illustrating event optionality)

Generative Story III: Participant Type Generation

For each realized event e in t, realize participant type p with success probability φ_e^p:

for each ESD d:
    π ~ GMM(ρ, ν)
    for each event type e: t_e ~ Binomial(θ_e)
    for each realized event e in t:
        for each participant type p: u_e^p ~ Binomial(φ_e^p)

Conjugate prior: Beta

Example (Cooking Pasta), realized participants u (candidate participant types: pasta, water, cheese, pot, salt, stove, strainer):
1 get: pot
3 boil: water
4 put: pasta
2 grate: cheese
6 drain: water, pot

Generative Story IV: Lexical Realization

Draw a lexical realization for each realized event and participant:

for each ESD d:
    π ~ GMM(ρ, ν)
    for each event type e: t_e ~ Binomial(θ_e)
    for each realized event e in t:
        for each participant type p: u_e^p ~ Binomial(φ_e^p)
    for each realized event e in t: w_e ~ Mult(ϑ_e)
    for each realized participant p in u_e: w_p ~ Mult(ϖ_p)

Priors: informed asymmetric Dirichlet priors

Example (Cooking Pasta): fetch saucepan, boil water, add noodles, grate cheese, drain water from pot
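
The four stages can be exercised end to end by ancestral sampling. The sketch below is purely illustrative and not the authors' implementation: the scenario, vocabularies, and all parameter values (RHO, THETA, PHI, the word lists) are invented, and the GMM is sampled through its inversion-count representation.

```python
import random
from math import exp

# Toy parameters for a made-up "cooking pasta" scenario; every value here is
# illustrative only (the real model infers these quantities from data).
EVENT_TYPES = ["get", "grate", "boil", "put", "add", "drain", "wait"]  # canonical order sigma
RHO = {e: 1.5 for e in EVENT_TYPES}
RHO["wait"] = 0.2                                     # "wait" floats freely in the ordering
THETA = {e: 0.8 for e in EVENT_TYPES}                 # P(event type is realized in an ESD)
PHI = {e: {"pot": 0.4, "water": 0.4, "pasta": 0.4} for e in EVENT_TYPES}  # participant probs
EVENT_WORDS = {e: [e, e + "s"] for e in EVENT_TYPES}  # trivial per-event lexica
PART_WORDS = {"pot": ["pot", "saucepan"], "water": ["water"], "pasta": ["pasta", "noodles"]}

def sample_inversions(rho_list):
    """Sample the GMM's inversion counts; count i follows a truncated geometric-like law."""
    n = len(rho_list)
    v = []
    for i, rho in enumerate(rho_list):
        weights = [exp(-rho * k) for k in range(n - i)]
        v.append(random.choices(range(n - i), weights=weights)[0])
    return v

def permutation_from_inversions(sigma, v):
    """Rebuild the permutation by inserting items in reverse canonical order."""
    pi = []
    for i in reversed(range(len(sigma))):
        pi.insert(v[i], sigma[i])
    return pi

def generate_esd():
    v = sample_inversions([RHO[e] for e in EVENT_TYPES])
    pi = permutation_from_inversions(EVENT_TYPES, v)          # Story I: ordering
    esd = []
    for e in pi:
        if random.random() < THETA[e]:                        # Story II: event optionality
            parts = [p for p, pr in PHI[e].items() if random.random() < pr]  # Story III
            words = [random.choice(EVENT_WORDS[e])]           # Story IV: lexical realization
            words += [random.choice(PART_WORDS[p]) for p in parts]
            esd.append(" ".join(words))
    return esd

print(generate_esd())
```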

Informed Asymmetric Dirichlet Priors

Tie together the prior values ("pseudo counts") of semantically related words via a multivariate normal N(0, Σ), where Σ encodes semantic similarity (shared WordNet synsets):

            saucepan  pot  cheese  noodles  pasta
saucepan        7      6     1       0       1
pot             6     12     0       0       1
cheese          1      0    13       3       3
noodles         0      0     3       7       5
pasta           1      1     3       5       6
...

δ ~ N(0, Σ)
φ ~ Dirichlet(δ)
w ~ Multinomial(φ)
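
A small sketch of how such a correlated prior could be drawn, using the similarity counts above as the covariance matrix. One caveat: Dirichlet parameters must be positive, and the slide only states δ ~ N(0, Σ) and φ ~ Dirichlet(δ), so mapping the Gaussian draw through exp below is an assumption of this sketch (in the spirit of the logistic normal construction mentioned earlier).

```python
import numpy as np

vocab = ["saucepan", "pot", "cheese", "noodles", "pasta"]
# Similarity counts from the slide (shared WordNet synsets), used as the covariance.
sim = np.array([[ 7,  6,  1,  0,  1],
                [ 6, 12,  0,  0,  1],
                [ 1,  0, 13,  3,  3],
                [ 0,  0,  3,  7,  5],
                [ 1,  1,  3,  5,  6]], dtype=float)

rng = np.random.default_rng(0)
# Correlated draw: semantically related words receive correlated prior values.
delta = rng.multivariate_normal(mean=np.zeros(len(vocab)), cov=sim)
# Exponentiate to obtain positive Dirichlet pseudo-counts (an assumption of this sketch).
alpha = np.exp(delta)
phi = rng.dirichlet(alpha)          # language model for one event/participant type
word = rng.choice(vocab, p=phi)     # w ~ Multinomial(phi)
print(dict(zip(vocab, alpha.round(2))), word)
```

Because saucepan/pot and noodles/pasta have large covariance entries, their pseudo-counts rise and fall together, so a language model that favours "pot" will also tend to assign mass to "saucepan".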

Inference

- collapsed Gibbs sampling for approximate inference
- slice sampling for the continuous distributions (the GMM and N(0, Σ))

Parameters to be estimated (after collapsing):
- latent ESD labels z = {t, u, π}
- GMM dispersion parameters ρ
- language model prior parameters δ (participants) and γ (events)
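
Slice sampling requires only pointwise evaluation of an unnormalized log density, which is why it suits the continuous parameters here. Below is a minimal univariate slice sampler (the stepping-out and shrinkage procedure) applied to a made-up conditional for a dispersion-like parameter ρ; the target density is an illustration only, not the model's actual collapsed conditional.

```python
import math
import random

def slice_sample(x0, log_density, w=1.0, max_steps=50):
    """One univariate slice-sampling update (stepping out + shrinkage)."""
    log_y = log_density(x0) + math.log(random.random())      # log of the slice height
    left = x0 - w * random.random()                           # randomly placed interval
    right = left + w
    steps = 0
    while log_density(left) > log_y and steps < max_steps:    # step out to the left
        left -= w
        steps += 1
    steps = 0
    while log_density(right) > log_y and steps < max_steps:   # step out to the right
        right += w
        steps += 1
    while True:                                               # shrink until accepted
        x1 = random.uniform(left, right)
        if log_density(x1) > log_y:
            return x1
        if x1 < x0:
            left = x1
        else:
            right = x1

# Made-up conditional for a dispersion-like parameter rho > 0 (illustration only):
# an Exponential(1) prior times a toy likelihood exp(-2 * rho), i.e. Exponential(3) overall.
def log_cond(rho):
    return -float("inf") if rho <= 0 else -3.0 * rho

rho, samples = 1.0, []
for _ in range(2000):
    rho = slice_sample(rho, log_cond)
    samples.append(rho)
print(sum(samples) / len(samples))   # should be close to 1/3, the mean of Exponential(3)
```

In the full model the same update would be run for each ρ_i and for the prior parameters δ and γ, with log_cond replaced by the corresponding collapsed conditional.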

4 Evaluation & Results

Data I

Collection of ESDs (Regneri et al., 2010): sets of explicit descriptions of event sequences, created by non-experts via web experiments.

Scenario name                   ESDs   Avg. events
Answer the telephone             55      4.47
Buy from vending machine         32      4.53
Make scrambled eggs              20     10.30
Eat in fast food restaurant      15      8.93
Take a shower                    21     11.29
...

Test set: 10 scenarios; separate development set: 5 scenarios.

Evaluation Setup (Regneri et al., 2010)

Binary event paraphrase classification:
- [get pot, fetch saucepan] → true
- [add pasta, add water] → false

Binary follow-up classification:
- [fetch pot, put pasta into pot] → true
- [put pasta into pot, fetch pot] → false

Metrics:
precision = |true_system ∩ true_gold| / |true_system|
recall = |true_system ∩ true_gold| / |true_gold|
F1 = 2 · precision · recall / (precision + recall)
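
The metrics reduce to set operations over the pairs labeled true. A small helper (a sketch, not the evaluation script used in the paper), with the slide's example pairs and made-up gold labels:

```python
def prf(system_true, gold_true):
    """Precision, recall and F1 over the pairs each side labels 'true'."""
    overlap = len(system_true & gold_true)
    precision = overlap / len(system_true) if system_true else 0.0
    recall = overlap / len(gold_true) if gold_true else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy paraphrase judgements (pairs of event descriptions); the labels are illustrative.
gold = {("get pot", "fetch saucepan"), ("eat food", "have meal")}
system = {("get pot", "fetch saucepan"), ("add pasta", "add water")}
print(prf(system, gold))   # (0.5, 0.5, 0.5)
```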

The Event Paraphrase Task

[Bar chart: Precision, Recall, and F1 on the event paraphrase task, comparing R10 (Regneri et al., 2010) with BS (the proposed Bayesian Script model).]

The Event Ordering Task

[Bar chart: Precision, Recall, and F1 on the event ordering task, comparing R10 (Regneri et al., 2010) with BS (the proposed Bayesian Script model).]

Influence of Model Components

[Bar charts: F1 on the event paraphrase task and the event ordering task for the full model (Full) versus a variant without the GMM ordering component (-GMM), across the Return Food, Vending, Shower, and Microwave scenarios.]

Influence of Model Components

[Bar charts: F1 on the event paraphrase task and the event ordering task for the full model (Full) versus a variant without the informed prior (-inf.prior), across the Return Food (15), Vending (32), Shower (21), and Microwave (59) scenarios.]

Induced Clustering

Cooking food in the microwave: {get}, {open, take}, {put, place}, {close}, {set, select, enter, turn}, {start}, {wait}, {remove, take, open}, {push, press, turn}

5 Conclusion


Conclusion

A hierarchical Bayesian script model:
- joint model of event types and ordering constraints
- competitive performance with a recent pipeline-based model
- inclusion of word similarity knowledge as correlations in the language model priors
- explicitly targets apparent script characteristics (event optionality, event type-specific temporal flexibility)
- GMM as an effective model for event ordering constraints

Thank you!

References

A. Barr and E. A. Feigenbaum, editors. 1981. The Handbook of Artificial Intelligence, volume 1. Addison-Wesley.

D. Blei and J. Lafferty. 2006. Correlated topic models. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 147-154. MIT Press, Cambridge, MA.

N. Chambers and D. Jurafsky. 2008. Unsupervised learning of narrative event chains. In Proceedings of ACL-08: HLT, pages 789-797. Association for Computational Linguistics, Columbus, Ohio.

N. Chambers and D. Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proceedings of ACL-09 and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 602-610. Association for Computational Linguistics, Stroudsburg, PA, USA.

H. Chen, S. R. K. Branavan, R. Barzilay, and D. R. Karger. 2009. Content modeling using latent permutations. Journal of Artificial Intelligence Research, 36(1):129-163.

M. Fligner and J. Verducci. 1986. Distance based ranking models. Journal of the Royal Statistical Society, Series B, 48:359-369.

C. L. Mallows. 1957. Non-null ranking models. Biometrika, 44:114-130.

M. Regneri, A. Koller, and M. Pinkal. 2010. Learning script knowledge with web experiments. In Proceedings of ACL 2010. Association for Computational Linguistics, Uppsala, Sweden.

M. Regneri, A. Koller, J. Ruppenhofer, and M. Pinkal. 2011. Learning script participants from unlabeled data. In Proceedings of RANLP 2011. Hissar, Bulgaria.

R. C. Schank and R. P. Abelson. 1975. Scripts, plans and knowledge. In Proceedings of the Fourth International Joint Conference on Artificial Intelligence, pages 151-157. Reprinted in P. N. Johnson-Laird and P. C. Wason, editors, Thinking: Readings in Cognitive Science.