Language to Logical Form with Neural Attention
1 Language to Logical Form with Neural Attention
Mirella Lapata and Li Dong
Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh
2 Semantic Parsing
Natural Language (NL) → Parser → Machine-Executable Language
3 Semantic Parsing: Querying a Database
NL: What are the capitals of states bordering Texas?
DB: λx. ∃y. capital(y, x) ∧ state(y) ∧ next_to(y, Texas)
4 Semantic Parsing: Instructing a Robot
NL: at the chair, move forward three steps past the sofa
LF: λa. pre(a, ιx.chair(x)) ∧ move(a) ∧ len(a, 3) ∧ dir(a, forward) ∧ past(a, ιy.sofa(y))
(sentence → logical form; example courtesy of Artzi et al.)
5 Semantic Parsing: Answering Questions with Freebase
NL: Who are the male actors in Titanic?
KB: λx. ∃y. gender(male, x) ∧ cast(titanic, x, y)
[Figure: search results for "titanic 1997", showing the knowledge panel for James Cameron's Titanic (1997) and its cast: Leonardo DiCaprio, Kate Winslet, Billy Zane, Gloria Stuart, Kathy Bates, ...]
6 Supervised Approaches
Induce parsers from sentences paired with logical forms.
Question: Who are the male actors in Titanic?
Logical Form: λx. ∃y. gender(male, x) ∧ cast(titanic, x, y)
- Parsing (Ge and Mooney, 2005; Lu et al., 2008; Zhao and Huang, 2015)
- Inductive logic programming (Zelle and Mooney, 1996; Tang and Mooney, 2000; Thompson and Mooney, 2003)
- Machine translation (Wong and Mooney, 2006; Wong and Mooney, 2007; Andreas et al., 2013)
- CCG grammar induction (Zettlemoyer and Collins, 2005; Zettlemoyer and Collins, 2007; Kwiatkowski et al., 2010; Kwiatkowski et al., 2011)
7 Indirect Supervision
Induce parsers from questions paired with side information.
Question: Who are the male actors in Titanic?
Logical Form (latent): λx. ∃y. gender(male, x) ∧ cast(titanic, x, y)
Answer: {DICAPRIO, BILLYZANE, ...}
- Answers to questions (Clarke et al., 2010; Liang et al., 2013)
- System demonstrations (Chen and Mooney, 2011; Goldwasser and Roth, 2011; Artzi and Zettlemoyer, 2013)
- Distant supervision (Cai and Yates, 2013; Reddy et al., 2014)
8 In General
Developing semantic parsers requires linguistic expertise:
- high-quality lexicons based on the underlying grammar formalism
- manually built templates based on the underlying grammar formalism
- grammar-based features pertaining to the logical form and the natural language
The resulting parsers are domain- and representation-specific.
9 Our Goal: All-Purpose Semantic Parsing
Question: Who are the male actors in Titanic?
Logical Form: λx. ∃y. gender(male, x) ∧ cast(titanic, x, y)
- Learn from NL descriptions paired with meaning representations
- Use minimal domain (and grammar) knowledge
- The model is general and can be used across meaning representations
10 Problem Formulation
The model maps a natural language input q = x_1 … x_{|q|} to a logical form representation of its meaning a = y_1 … y_{|a|}:
p(a|q) = ∏_{t=1}^{|a|} p(y_t | y_{<t}, q), where y_{<t} = y_1 … y_{t−1}
- Encoder: encodes the natural language input q into a vector representation
- Decoder: generates y_1, …, y_{|a|} conditioned on the encoding vector
cf. Vinyals et al. (2015a,b), Kalchbrenner and Blunsom (2013), Cho et al. (2014), Sutskever et al. (2014), Karpathy and Fei-Fei (2015)
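To make the factorization concrete, here is a tiny worked example (the per-token probabilities are made up for illustration): multiplying the conditionals p(y_t | y_{<t}, q), or equivalently summing their logs, yields the sequence probability p(a|q).

```python
import math

# Assumed per-token conditionals p(y_t | y_<t, q) for a 4-token logical form.
token_probs = [0.9, 0.8, 0.95, 0.99]
log_p = sum(math.log(p) for p in token_probs)  # chain rule in log space
p_a_given_q = math.exp(log_p)                  # 0.9 * 0.8 * 0.95 * 0.99 ≈ 0.677
```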
11 Encoder-Decoder Framework
Input utterance: what microsoft jobs do not require a bscs?
Sequence encoder → attention layer → sequence/tree decoder
Logical form: answer(j, (company(j,'microsoft'), job(j), not((req_deg(j,'bscs')))))
12 SEQ2SEQ Model
h_t^l = LSTM(h_{t−1}^l, h_t^{l−1})
where h_t^0 = W_q e(x_t) for the encoder and h_t^0 = W_a e(y_{t−1}) for the decoder
p(y_t | y_{<t}, q) = softmax(W_o h_t^L)^T e(y_t)
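As a minimal sketch of the output layer above (not the authors' code; `W_o` and the toy dimensions are illustrative), the decoder's top-layer hidden vector is projected to vocabulary logits and normalized with a softmax:

```python
import numpy as np

def output_distribution(W_o, h_top):
    """p(y_t | y_<t, q) = softmax(W_o h_t^L): a distribution over output tokens."""
    logits = W_o @ h_top          # (vocab_size,)
    logits -= logits.max()        # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy usage: vocabulary of 5 tokens, hidden size 4.
rng = np.random.default_rng(0)
W_o = rng.normal(size=(5, 4))     # output projection matrix
h_top = rng.normal(size=4)        # decoder's top-layer hidden vector h_t^L
probs = output_distribution(W_o, h_top)
y_t = int(probs.argmax())         # greedy choice of the next token
```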
13 SEQ2TREE Model
(lambda $0 e (and (> (departure_time $0) 1600:ti) (from $0 dallas:ci)))
The logical form is generated top-down; each subtree is first emitted as a nonterminal token <n> and expanded later:
root: lambda $0 e <n>
  <n>: and <n> <n>
    <n>: > <n> 1600:ti
      <n>: departure_time $0
    <n>: from $0 dallas:ci
14 SEQ2TREE Model: Decoding
[Figure: the tree above is decoded level by level, one token sequence per node, each terminated by </s>:
lambda $0 e <n> </s> | and <n> <n> </s> | > <n> 1600:ti </s> | from $0 dallas:ci </s> | departure_time $0 </s>
Legend: <n> = nonterminal; arrows mark start-decoding and parent-feeding connections between encoder and decoder units.]
15 Parent Feeding Connections
[Figure: a SEQ2TREE decoding example for the logical form "A B (C)": the decoder emits y_1 = A, y_2 = B, y_3 = <n>, y_4 = </s> at steps t_1, …, t_4, then expands the nonterminal y_3 into y_5 = C, y_6 = </s> at steps t_5, t_6.]
The hidden vector of the parent nonterminal is concatenated with the inputs and fed to the LSTM.
p(a|q) = p(y_1 y_2 y_3 y_4 | q) · p(y_5 y_6 | y_3, q)
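A minimal sketch of the parent-feeding connection (the function name and dimensions are ours, not the released code): the decoder input at step t is the previous token's embedding concatenated with the hidden vector of the parent nonterminal.

```python
import numpy as np

def decoder_input(prev_token_embedding, parent_hidden):
    """Parent feeding: concatenate the embedding of y_{t-1} with the hidden
    vector stored at the parent nonterminal, then feed the result to the LSTM."""
    return np.concatenate([prev_token_embedding, parent_hidden])

# Toy usage with illustrative sizes: 3-dim embeddings, 4-dim hidden states.
e_prev = np.zeros(3)                   # embedding of the previous token
h_parent = np.ones(4)                  # hidden vector of the parent <n> node
x_t = decoder_input(e_prev, h_parent)  # 7-dim vector fed to the decoder LSTM
```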
16 Attention Mechanism
- Attention scores are computed from the current decoder hidden vector and all hidden vectors of the encoder.
- The encoder-side context vector c_t is obtained as a weighted sum of the encoder hidden vectors and is then used to predict y_t.
cf. Bahdanau et al. (2015), Luong et al. (2015b), Xu et al. (2015)
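The slide does not spell out the scoring function, so the sketch below assumes simple dot-product scores (one common choice, cf. Luong et al., 2015): each encoder hidden vector is scored against the current decoder state, the scores are normalized, and the context vector c_t is their weighted sum.

```python
import numpy as np

def softmax(x):
    x = x - x.max()               # numerical stability
    e = np.exp(x)
    return e / e.sum()

def attention_context(encoder_states, decoder_state):
    """Compute attention weights over encoder states and the context vector c_t."""
    scores = encoder_states @ decoder_state  # (src_len,) dot-product scores
    weights = softmax(scores)                # attention distribution over inputs
    return weights @ encoder_states          # c_t: weighted sum of encoder states

# Toy usage: 6 source positions, hidden size 4.
rng = np.random.default_rng(1)
H = rng.normal(size=(6, 4))        # all encoder hidden vectors
h_t = rng.normal(size=4)           # current decoder hidden vector
c_t = attention_context(H, h_t)    # used together with h_t to predict y_t
```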
17 Training and Inference
Goal: maximize the likelihood of the generated logical forms given the natural language utterances as input:
minimize −∑_{(q,a)∈D} log p(a|q)
where D is the set of all natural language-logical form training pairs.
At test time, we predict the logical form for an input utterance q by
â = argmax_{a′} p(a′|q)
Iterating over all possible logical forms to find the optimal prediction is impractical; because p(a|q) decomposes token by token, we use greedy search or beam search instead.
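A sketch of greedy search over the factored model (the `step` interface is a hypothetical stand-in for one decoder step; beam search would keep the k highest-scoring prefixes instead of a single one):

```python
import numpy as np

def greedy_decode(step, h0, eos_id, max_len=100):
    """Greedy search for argmax_a p(a|q) under the factorization
    p(a|q) = prod_t p(y_t | y_<t, q). `step(h, y_prev)` is assumed to advance
    the decoder one token and return (new_state, next_token_distribution)."""
    h, y_prev, output = h0, None, []
    for _ in range(max_len):
        h, probs = step(h, y_prev)
        y = int(np.argmax(probs))   # locally most probable token
        if y == eos_id:             # stop at the end-of-sequence symbol
            break
        output.append(y)
        y_prev = y
    return output
```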
18 Argument Identification
- Many logical forms contain named entities and numbers, i.e., rare words; it does not make sense to replace them with a special unknown-word symbol.
- Instead, we identify entities and numbers in the input questions and replace them with their type names and unique IDs:
  jobs with a salary of 40000 → jobs with a salary of num0
  job(ans), salary_greater_than(ans, 40000, year) → job(ans), salary_greater_than(ans, num0, year)
- The pre-processed examples are used as training data.
- After decoding, a post-processing step maps all type_i markers back to the corresponding logical constants.
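A toy version of this pre- and post-processing (plain string matching, as the training-details slide notes; the helper names are ours). It handles numbers only, but entities work analogously with a type-specific marker:

```python
import re

def anonymize(question, logical_form):
    """Replace numbers in the question and logical form with typed IDs
    (num0, num1, ...) and remember the mapping for post-processing."""
    mapping = {}
    for i, number in enumerate(re.findall(r"\d+", question)):
        marker = f"num{i}"
        mapping[marker] = number
        question = question.replace(number, marker, 1)
        logical_form = logical_form.replace(number, marker)
    return question, logical_form, mapping

def deanonymize(logical_form, mapping):
    """Post-processing: restore the original constants after decoding."""
    for marker, value in mapping.items():
        logical_form = logical_form.replace(marker, value)
    return logical_form

q, lf, m = anonymize("jobs with a salary of 40000",
                     "job(ans), salary_greater_than(ans, 40000, year)")
# q  -> "jobs with a salary of num0"
# lf -> "job(ans), salary_greater_than(ans, num0, year)"
assert deanonymize(lf, m) == "job(ans), salary_greater_than(ans, 40000, year)"
```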
19 Semantic Parsing Datasets
JOBS (avg. length 9.80): what microsoft jobs do not require a bscs?
  answer(company(j, 'microsoft'), job(j), not((req_deg(j, 'bscs'))))
GEO (avg. length 7.60): what is the population of the state with the largest area?
  (population:i (argmax $0 (state:t $0) (area:i $0)))
ATIS: dallas to san francisco leaving after 4 in the afternoon please
  (lambda $0 e (and (> (departure_time $0) 1600:ti) (from $0 dallas:ci) (to $0 san_francisco:ci)))
- JOBS: queries to job listings; 500 training, 140 test instances.
- GEO: queries to a US geography database; 680 training, 200 test instances.
- ATIS: queries to a flight booking system; 4,480 training, 480 development, 450 test instances.
20 Semantic Parsing Datasets (continued)
IFTTT (avg. length 6.95): turn on heater when temperature drops below 58 degree
  TRIGGER: Weather - Current_temperature_drops_below - ((Temperature (58)) (Degrees_in (f)))
  ACTION: WeMo_Insight_Switch - Turn_on - ((Which_switch? ("")))
- If-this-then-that recipes from the IFTTT website (Quirk et al., 2015)
- Recipes are simple programs with exactly one trigger and one action
- 77,495 training, 5,171 development, and 4,294 test examples
- IFTTT programs are represented as abstract syntax trees; the NL descriptions are provided by users.
21 Evaluation Results on JOBS
[Bar chart, accuracy (%): COCKTAIL01, PRECISE03, ZC05, DCS+L13, TISP15, and SEQ2SEQ/SEQ2TREE with attention and argument identification; scores range from 78% to 91%.]
22 Evaluation Results on GEO
[Bar chart, accuracy (%): SCISSOR05, KRISP06, WASP06, λ-WASP06, LNLZ08, ZC05, ZC07, UBL10, FUBL11, KCAZ13, DCS+L13, TISP15, and SEQ2SEQ/SEQ2TREE with attention and argument identification; scores range from 72% to 89%.]
23 Evaluation Results on ATIS
[Bar chart, accuracy (%): ZC07, UBL10, FUBL11, GUSP-FULL13, GUSP++13, TISP15, and SEQ2SEQ/SEQ2TREE with attention and argument identification; scores range from 71% to 85%.]
24 Results on IFTTT
[Bar chart (a), omitting non-English descriptions, accuracy (%): retrieval, phrasal, sync, classifier, and posclass baselines at roughly 35-37%; SEQ2SEQ/SEQ2TREE with attention and argument identification at roughly 48-50%.]
25 Results on IFTTT
[Bar chart (b), omitting non-English and unintelligible descriptions, accuracy (%): baselines at roughly 38-40%; SEQ2SEQ/SEQ2TREE with attention and argument identification at roughly 57-60%.]
26 Results on IFTTT
[Bar chart (c), examples where at least 3 turkers agree with the gold standard, accuracy (%): baselines at roughly 43-49%; SEQ2SEQ/SEQ2TREE with attention and argument identification at roughly 65-74%.]
27 Example Output: SEQ2SEQ
[Attention heatmaps aligning input words with output tokens for two examples:]
which jobs pay num0 that do not require a degid0
  job(ANS), salary_greater_than(ANS, num0, year), \+((req_deg(ANS, degid0)))
what s first class fare round trip from ci0 to ci1
  (lambda $0 e (exists $1 (and (round_trip $1) (class_type $1 first:cl) (from $1 ci0) (to $1 ci1) (= (fare $1) $0))))
28 Example Output: SEQ2TREE
[Attention heatmaps aligning input words with output tokens for two examples:]
what is the earliest flight from ci0 to ci1 tomorrow
  argmin $0 (and (flight $0) (from $0 ci0) (to $0 ci1) (tomorrow $0)) (departure_time $0)
what is the highest elevation in the co0
  argmax $0 (and (place:t $0) (loc:t $0 co0)) (elevation:i $0)
29 What Have We Learned?
- An encoder-decoder neural network model maps natural language descriptions to meaning representations.
- SEQ2TREE and attention improve performance.
- The model is general and could transfer to other tasks/architectures.
- Future work: learn a model from question-answer pairs without access to target logical forms.
30 Summary
For more details see Dong and Lapata (ACL, 2016). Data and code available from index.php?page=code
31 Training Details
- Arguments were identified with plain string matching.
- Parameters were updated with the RMSProp algorithm (batch size set to 20).
- Gradients were clipped, dropout was applied, and parameters were randomly initialized from a uniform distribution.
- A two-layer LSTM was used for IFTTT.
- Dimensions of the hidden vector and word embedding were selected from {150, 200, 250}.
- Early stopping was used to determine the number of epochs.
- Input sentences were reversed (Sutskever et al., 2014).
- All experiments were conducted on a single GTX 980 GPU.
32 Decoding for SEQ2TREE
Algorithm 1: Decoding for SEQ2TREE
Input: q: natural language utterance
Output: â: decoding result
1: Q.init({hid : SeqEnc(q)})          ▷ push the encoding result to a queue
2: while c ← Q.pop() do               ▷ decode until no more nonterminals
3:     c.child ← SeqDec(c.hid)        ▷ call the sequence decoder
4:     for each nonterminal n in c.child do
5:         Q.push({hid : HidVec(n)})  ▷ push new nonterminals to the queue
6: â ← convert decoding tree to output sequence
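A runnable Python skeleton of Algorithm 1 (the `seq_enc`, `seq_dec`, and `hid_vec` callables stand in for the trained encoder, sequence decoder, and nonterminal-state extractor; the dictionary node representation is our assumption):

```python
from collections import deque

NONTERMINAL = "<n>"

def seq2tree_decode(q, seq_enc, seq_dec, hid_vec):
    """Queue-based SEQ2TREE decoding: repeatedly pop a node, decode its child
    sequence, and enqueue any newly produced nonterminals for expansion."""
    root = {"hid": seq_enc(q), "child": None}
    queue = deque([root])
    while queue:                                  # decode until no more nonterminals
        node = queue.popleft()
        node["child"] = seq_dec(node["hid"])      # list of tokens and <n> nodes
        for tok in node["child"]:
            if isinstance(tok, dict) and tok.get("label") == NONTERMINAL:
                tok["hid"] = hid_vec(tok)         # hidden vector of the nonterminal
                queue.append(tok)
    return linearize(root)

def linearize(node):
    """Convert the decoding tree back into an output token sequence."""
    out = []
    for tok in node["child"]:
        if isinstance(tok, dict):                 # expanded nonterminal: recurse
            out.extend(["("] + linearize(tok) + [")"])
        else:
            out.append(tok)
    return out
```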