Language to Logical Form with Neural Attention


1 Language to Logical Form with Neural Attention Mirella Lapata and Li Dong Institute for Language, Cognition and Computation School of Informatics University of Edinburgh 1 / 32

2 Semantic Parsing: Natural Language (NL) → Parser → Machine-Executable Language 2 / 32

3 Semantic Parsing: Querying a Database NL: What are the capitals of states bordering Texas? → Parser → DB query: λx. capital(y, x) ∧ state(y) ∧ next_to(y, Texas) 3 / 32

4 Semantic Parsing: Instructing a Robot NL: at the chair, move forward three steps past the sofa → Parser → LF: λa. pre(a, x.chair(x)) ∧ move(a) ∧ len(a, 3) ∧ dir(a, forward) ∧ past(a, y.sofa(y)) (sentence → logical form; courtesy of Artzi et al.) 4 / 32

5 Semantic Parsing: Answering Questions with Freebase NL: Who are the male actors in Titanic? → Parser → KB query: λx. ∃y. gender(male, x) ∧ cast(titanic, x, y) [Figure: search-engine knowledge panel for "Titanic" (1997) showing the cast, e.g. Leonardo DiCaprio, Kate Winslet, Billy Zane] 5 / 32

6 Supervised Approaches Induce parsers from sentences paired with logical forms Question: Who are the male actors in Titanic? Logical Form: λx. ∃y. gender(male, x) ∧ cast(titanic, x, y) Parsing (Ge and Mooney, 2005; Lu et al., 2008; Zhao and Huang, 2015) Inductive logic programming (Zelle and Mooney, 1996; Tang and Mooney, 2000; Thompson and Mooney, 2003) Machine translation (Wong and Mooney, 2006; Wong and Mooney, 2007; Andreas et al., 2013) CCG grammar induction (Zettlemoyer and Collins, 2005; Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2010; Kwiatkowski et al., 2011) 6 / 32

7 Indirect Supervision Induce parsers from questions paired with side information Question Who are the male actors in Titanic? Answer {DICAPRIO, BILLYZANE... } Answers to questions (Clarke et al., 2010; Liang et al., 2013) System demonstrations (Chen and Mooney, 2011; Goldwasser and Roth, 2011; Artzi and Zettlemoyer, 2013) Distant supervision (Cai and Yates, 2013; Reddy et al., 2014) 7 / 32

8 Indirect Supervision Induce parsers from questions paired with side information Question: Who are the male actors in Titanic? Logical Form (latent): λx. ∃y. gender(male, x) ∧ cast(titanic, x, y) Answer: {DICAPRIO, BILLYZANE... } Answers to questions (Clarke et al., 2010; Liang et al., 2013) System demonstrations (Chen and Mooney, 2011; Goldwasser and Roth, 2011; Artzi and Zettlemoyer, 2013) Distant supervision (Cai and Yates, 2013; Reddy et al., 2014) 7 / 32

9 In General Developing semantic parsers requires linguistic expertise: high-quality lexicons based on the underlying grammar formalism; manually-built templates based on the underlying grammar formalism; grammar-based features pertaining to the logical form and the natural language. Domain- and representation-specific! 8 / 32

10 Our Goal: All Purpose Semantic Parsing Question: Who are the male actors in Titanic? Logical Form: λx. ∃y. gender(male, x) ∧ cast(titanic, x, y) 9 / 32

11 Our Goal: All Purpose Semantic Parsing Question: Who are the male actors in Titanic? Logical Form: λx. ∃y. gender(male, x) ∧ cast(titanic, x, y) Learn from NL descriptions paired with meaning representations; use minimal domain (and grammar) knowledge; the model is general and can be used across meaning representations. 9 / 32

12 Problem Formulation
The model maps a natural language input q = x_1 ... x_|q| to a logical form representation of its meaning a = y_1 ... y_|a|:
p(a|q) = ∏_{t=1}^{|a|} p(y_t | y_{<t}, q), where y_{<t} = y_1 ... y_{t-1}
Encoder: encodes the natural language input q into a vector representation.
Decoder: generates y_1, ..., y_|a| conditioned on the encoding vector.
Vinyals et al. (2015a,b), Kalchbrenner and Blunsom (2013), Cho et al. (2014), Sutskever et al. (2014), Karpathy and Fei-Fei (2015) 10 / 32
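A minimal sketch of this factorization (assuming PyTorch; the function and tensor names are illustrative, not the authors' code): the log-likelihood of a logical form is simply the sum of per-token log-probabilities.

import torch
import torch.nn.functional as F

def sequence_log_likelihood(logits, targets):
    """log p(a|q) = sum_t log p(y_t | y_<t, q).
    logits:  (T, V) scores over the logical-form vocabulary, one row per step t
             (already conditioned on y_<t and q).
    targets: (T,) gold token ids y_1 ... y_T of the logical form a."""
    log_probs = F.log_softmax(logits, dim=-1)                                # (T, V)
    token_log_probs = log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # (T,)
    return token_log_probs.sum()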

13 Encoder-Decoder Framework Input utterance "what microsoft jobs do not require a bscs?" → Sequence Encoder → Attention Layer → Sequence/Tree Decoder → logical form answer(j,(company(j,'microsoft'),job(j),not((req_deg(j,'bscs'))))) 11 / 32

14 SEQ2SEQ Model
h_t^l = LSTM(h_{t-1}^l, h_t^{l-1})
Encoder: h_t^0 = W_q e(x_t)
Decoder: h_t^0 = W_a e(y_{t-1})
p(y_t | y_{<t}, q) = softmax(W_o h_t^L)^T e(y_t)
12 / 32
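As a rough illustration of these equations, here is a minimal SEQ2SEQ sketch (assuming PyTorch; the single layer, dimensions, and names are illustrative and not the exact configuration of the paper):

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=200):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)    # plays the role of W_q e(x_t)
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)    # plays the role of W_a e(y_{t-1})
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)             # W_o

    def forward(self, src, tgt_in):
        # Encode the utterance q; the final LSTM state conditions the decoder.
        _, state = self.encoder(self.src_embed(src))
        # Decode with teacher forcing: tgt_in is <s> y_1 ... y_{T-1}.
        dec_h, _ = self.decoder(self.tgt_embed(tgt_in), state)
        return self.out(dec_h)                           # logits for p(y_t | y_<t, q)

Training applies a softmax cross-entropy loss over these logits, matching the softmax(W_o h_t^L)^T e(y_t) formulation above.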

15 SEQ2TREE MODEL
(lambda $0 e (and (> (departure_time $0) 1600:ti) (from $0 dallas:ci)))
root: lambda $0 e <n>
  <n>: and <n> <n>
    <n>: > <n> 1600:ti
      <n>: departure_time $0
    <n>: from $0 dallas:ci
13 / 32

21 SEQ2TREE MODEL: the tree is decoded top-down, one token sequence per subtree
lambda $0 e <n> </s>
  and <n> <n> </s>
    > <n> 1600:ti </s>
      departure_time $0 </s>
    from $0 dallas:ci </s>
(Figure legend: <n> nonterminal, start decoding, parent feeding, encoder unit, decoder unit.)
14 / 32

22 Parent Feeding Connections
A SEQ2TREE decoding example for the logical form "A B (C)":
first sequence y_1 = A, y_2 = B, y_3 = <n>, y_4 = </s>; second sequence y_5 = C, y_6 = </s>.
The hidden vector of the parent nonterminal is concatenated with the inputs and fed to the decoder.
p(a|q) = p(y_1 y_2 y_3 y_4 | q) · p(y_5 y_6 | y_{≤3}, q)
15 / 32
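A one-step sketch of parent feeding (assuming PyTorch; names are illustrative, not the authors' code): the parent nonterminal's hidden vector is concatenated with the current input embedding before the decoder cell is applied.

import torch
import torch.nn as nn

dim = 200
cell = nn.LSTMCell(2 * dim, dim)   # input = [token embedding ; parent hidden vector]

def decode_step(y_prev_embed, parent_hidden, state):
    """One SEQ2TREE decoder step with parent feeding.
    y_prev_embed:  (B, dim) embedding of the previously generated token
    parent_hidden: (B, dim) hidden vector of the parent nonterminal <n>
    state:         (h, c) current decoder LSTM state"""
    inp = torch.cat([y_prev_embed, parent_hidden], dim=-1)   # parent feeding
    h, c = cell(inp, state)
    return h, (h, c)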

23 Attention Mechanism Attention scores are computed from the current decoder hidden vector and all hidden vectors of the encoder. The encoder-side context vector c_t is obtained as a weighted sum of the encoder hidden vectors and is further used to predict y_t. Bahdanau et al. (2015), Luong et al. (2015b), Xu et al. (2015) 16 / 32
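A sketch of this attention step with dot-product scoring (assuming PyTorch; shapes and names are illustrative):

import torch

def attend(dec_h, enc_h):
    """dec_h: (B, D) current decoder hidden vector
       enc_h: (B, S, D) all encoder hidden vectors
       returns the context vector c_t of shape (B, D)"""
    scores = torch.bmm(enc_h, dec_h.unsqueeze(-1)).squeeze(-1)   # (B, S) attention scores
    weights = torch.softmax(scores, dim=-1)                      # normalized distribution
    return torch.bmm(weights.unsqueeze(1), enc_h).squeeze(1)     # weighted sum = c_t

The context vector c_t is then combined with the decoder hidden vector to predict y_t.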

24 Training and Inference
Our goal is to maximize the likelihood of the gold logical forms given natural language utterances as input: minimize − ∑_{(q,a) ∈ D} log p(a|q), where D is the set of all natural language-logical form training pairs.
At test time, we predict the logical form for an input utterance q by â = argmax_a p(a|q).
Iterating over all possible logical forms a to obtain the optimal prediction is impractical, but since p(a|q) is decomposed token by token we can use greedy/beam search.
17 / 32
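A greedy-search sketch of this decoding (assuming PyTorch and the Seq2Seq sketch above; bos_id, eos_id, and max_len are illustrative). Beam search keeps the k best partial logical forms at each step instead of only the single best one.

import torch

def greedy_decode(model, src, bos_id=1, eos_id=2, max_len=100):
    _, state = model.encoder(model.src_embed(src))                # encode utterance q
    y = torch.tensor([[bos_id]])
    output = []
    for _ in range(max_len):
        dec_h, state = model.decoder(model.tgt_embed(y), state)
        y = model.out(dec_h[:, -1]).argmax(dim=-1, keepdim=True)  # most likely y_t
        if y.item() == eos_id:
            break
        output.append(y.item())
    return output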

25 Argument Identification Many LFs contain named entities and numbers, i.e. rare words. It does not make sense to replace them with a special unknown-word symbol. Instead, identify entities and numbers in input questions and replace them with their type names and unique IDs. jobs with a salary of 40000 → job(ans), salary_greater_than(ans, 40000, year) Pre-processed examples are used as training data. After decoding, a post-processing step maps all type_i markers back to the corresponding logical constants. 18 / 32

26 Argument Identification Many LFs contain named entities and numbers, i.e. rare words. It does not make sense to replace them with a special unknown-word symbol. Instead, identify entities and numbers in input questions and replace them with their type names and unique IDs. jobs with a salary of num0 → job(ans), salary_greater_than(ans, num0, year) Pre-processed examples are used as training data. After decoding, a post-processing step maps all type_i markers back to the corresponding logical constants. 18 / 32
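A sketch of this pre-/post-processing for numbers in plain Python (only digits are handled here and the marker names are illustrative; real argument identification also matches named entities against a lexicon or the knowledge base):

import re

def preprocess(question):
    """Replace numbers with typed placeholders num0, num1, ... and remember them."""
    mapping, counter = {}, 0
    def repl(match):
        nonlocal counter
        marker = "num%d" % counter
        mapping[marker] = match.group(0)
        counter += 1
        return marker
    return re.sub(r"\d+", repl, question), mapping

def postprocess(logical_form, mapping):
    """Recover the original constants after decoding."""
    for marker, value in mapping.items():
        logical_form = logical_form.replace(marker, value)
    return logical_form

q, m = preprocess("jobs with a salary of 40000")     # -> "jobs with a salary of num0"
print(postprocess("job(ans), salary_greater_than(ans, num0, year)", m))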

27 Semantic Parsing Datasets
JOBS (avg. length 9.80): what microsoft jobs do not require a bscs? → answer(company(j,'microsoft'),job(j),not((req_deg(j,'bscs'))))
GEO (avg. length 7.60): what is the population of the state with the largest area? → (population:i (argmax $0 (state:t $0) (area:i $0)))
ATIS: dallas to san francisco leaving after 4 in the afternoon please → (lambda $0 e (and (> (departure_time $0) 1600:ti) (from $0 dallas:ci) (to $0 san_francisco:ci)))
JOBS: queries to job listings; 500 training, 140 test instances. GEO: queries to a US geography database; 680 training, 200 test instances. ATIS: queries to a flight booking system; 4,480 training, 480 development, 450 test instances.
19 / 32

28 Semantic Parsing Datasets
IFTTT (avg. length 6.95): turn on heater when temperature drops below 58 degree
TRIGGER: Weather - Current_temperature_drops_below - ((Temperature (58)) (Degrees_in (f)))
ACTION: WeMo_Insight_Switch - Turn_on - ((Which_switch? ("")))
If-this-then-that recipes from the IFTTT website (Quirk et al., 2015). Recipes are simple programs with exactly one trigger and one action; 77,495 training, 5,171 development, and 4,294 test examples. IFTTT programs are represented as abstract syntax trees; NL descriptions are provided by users.
20 / 32

29 Evaluation Results on JOBS [Bar chart: accuracy (%) on JOBS for COCKTAIL01, PRECISE03, ZC05, DCS+L13, TISP15, and SEQ2SEQ / SEQ2TREE with and without attention and argument identification] 21 / 32

30 Evaluation Results on GEO [Bar chart: accuracy (%) on GEO for SCISSOR05, KRISP06, WASP06, λ-WASP06, LNLZ08, ZC05, ZC07, UBL10, FUBL11, KCAZ13, DCS+L13, TISP15, and SEQ2SEQ / SEQ2TREE with and without attention and argument identification] 22 / 32

31 Evaluation Results on ATIS [Bar chart: accuracy (%) on ATIS for ZC07, UBL10, FUBL11, GUSP-FULL13, GUSP++13, TISP15, and SEQ2SEQ / SEQ2TREE with and without attention and argument identification] 23 / 32

32 Results on IFTTT [Bar chart: accuracy (%) on IFTTT for retrieval, phrasal, sync, classifier, and posclass baselines and SEQ2SEQ / SEQ2TREE with and without attention and argument identification] (a) Omit non-English 24 / 32

33 Results on IFTTT [Bar chart: accuracy (%) on IFTTT for retrieval, phrasal, sync, classifier, and posclass baselines and SEQ2SEQ / SEQ2TREE with and without attention and argument identification] (b) Omit non-English & unintelligible 25 / 32

34 Results on IFTTT [Bar chart: accuracy (%) on IFTTT for retrieval, phrasal, sync, classifier, and posclass baselines and SEQ2SEQ / SEQ2TREE with and without attention and argument identification] (c) 3 turkers agree with gold 26 / 32

35 Example Output: SEQ2SEQ
which jobs pay num0 that do not require a degid0 → job(ANS), salary_greater_than(ANS, num0, year), \+((req_deg(ANS, degid0)))
what's first class fare round trip from ci0 to ci1 → (lambda $0 e (exists $1 (and (round_trip $1) (class_type $1 first:cl) (from $1 ci0) (to $1 ci1) (= (fare $1) $0))))
[Figure: attention alignments between input words and output logical-form tokens]
27 / 32

36 Example Output: SEQ2TREE
what is the earliest flight from ci0 to ci1 tomorrow → argmin $0 (and (flight $0) (from $0 ci0) (to $0 ci1) (tomorrow $0)) (departure_time $0)
what is the highest elevation in the co0 → argmax $0 (and (place:t $0) (loc:t $0 co0)) (elevation:i $0)
[Figure: attention alignments between input words and output logical-form tokens]
28 / 32

37 What Have We Learned? An encoder-decoder neural network model for mapping natural language descriptions to meaning representations. SEQ2TREE and attention improve performance. The model is general and could transfer to other tasks/architectures. Future work: learn a model from question-answer pairs without access to target logical forms. 29 / 32

38 Summary For more details see Dong and Lapata (ACL, 2016) Data and code from index.php?page=code 30 / 32

39 Training Details Arguments were identified with plain string matching. Parameters were updated with the RMSProp algorithm (batch size set to 20). Gradients were clipped, dropout was applied, and parameters were randomly initialized from a uniform distribution. A two-layer LSTM was used for IFTTT. Dimensions of the hidden vectors and word embeddings were selected from {150, 200, 250}. Early stopping was used to determine the number of epochs. Input sentences were reversed (Sutskever et al., 2014). All experiments were conducted on a single GTX 980 GPU. 31 / 32
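A sketch of one training step with these settings (assuming PyTorch and the Seq2Seq sketch above; the learning rate and clipping threshold are illustrative, not the values used in the paper):

import torch
import torch.nn as nn

model = Seq2Seq(src_vocab=1000, tgt_vocab=500, dim=200)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.002)
criterion = nn.CrossEntropyLoss()                                # per-token -log p(y_t | y_<t, q)

def train_step(src, tgt_in, tgt_out, clip=5.0):
    optimizer.zero_grad()
    logits = model(src, tgt_in)                                  # (B, T, V)
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)     # gradient clipping
    optimizer.step()
    return loss.item()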

40 Decoding for SEQ2TREE
Algorithm 1 Decoding for SEQ2TREE
Input: q: natural language utterance
Output: â: decoding result
1: // Push the encoding result to a queue
2: Q.init({hid : SeqEnc(q)})
3: // Decode until no more nonterminals remain
4: while (c ← Q.pop()) do
5:   // Call the sequence decoder
6:   c.child ← SeqDec(c.hid)
7:   // Push new nonterminals to the queue
8:   for n ← nonterminal in c.child do
9:     Q.push({hid : HidVec(n)})
10: â ← convert decoding tree to output sequence
32 / 32
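A plain-Python sketch of Algorithm 1 (seq_enc, seq_dec, and hid_vec stand in for the encoder, the sequence decoder, and the function returning the hidden vector of a generated nonterminal; the node representation is illustrative):

from collections import deque

NONTERMINAL = "<n>"

def seq2tree_decode(q, seq_enc, seq_dec, hid_vec):
    """Top-down decoding: every popped node is expanded into one token sequence."""
    root = {"hid": seq_enc(q)}
    queue = deque([root])                        # push the encoding result to a queue
    while queue:                                 # decode until no more nonterminals remain
        c = queue.popleft()
        c["child"] = seq_dec(c["hid"])           # call the sequence decoder
        for n in c["child"]:                     # push new nonterminals to the queue
            if n["token"] == NONTERMINAL:
                n["hid"] = hid_vec(n)
                queue.append(n)
    return root                                  # convert the decoding tree to the output sequence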
