Driving Semantic Parsing from the World's Response
Driving Semantic Parsing from the World's Response
James Clarke, Dan Goldwasser, Ming-Wei Chang, Dan Roth
Cognitive Computation Group, University of Illinois at Urbana-Champaign
CoNLL 2010
What is Semantic Parsing?
"I'd like a coffee with no sugar and just a little milk" → Meaning representation: make(coffee, sugar=0, milk=0.3)
Supervised Learning Problem
Training data of (text, meaning) pairs → training algorithm → model.
Challenges: this is a structured prediction problem. Should part of the structure be modeled as hidden?
Lots of previous work
Multiple approaches to the problem:
- KRISP (Kate & Mooney 2006): SVM-based parser using string kernels.
- Zettlemoyer & Collins 2005, 2007: probabilistic parser based on relaxed CCG grammars.
- WASP (Wong & Mooney 2006, 2007): based on synchronous CFGs.
- Ge & Mooney 2009: integrated syntactic and semantic parser.
Assumption: a training set consisting of natural language and meaning representation pairs.
Using the World's response
"I'd like a coffee with no sugar and just a little milk" → make(coffee, sugar=0, milk=0.3) → the World responds: Good! or Bad!
Question: Can we use feedback based on the response to provide supervision?
This work
We aim to reduce the burden of annotation for semantic parsing. We focus on:
- Using the World's response to learn a semantic parser.
- Developing new training algorithms to support this learning paradigm.
- A lightweight semantic parsing model that doesn't require annotated data.
This results in learning a semantic parser using zero annotated meaning representations.
Outline
1 Semantic Parsing
2 Learning: DIRECT approach, AGGRESSIVE approach
3 Semantic Parsing Model
4 Experiments
Semantic Parsing
INPUT x: What is the largest state that borders Texas?
HIDDEN y: the intermediate derivation connecting x and z
OUTPUT z: largest(state(next_to(texas)))
Response r: New Mexico

F : X → Z, where ẑ = F_w(x) = argmax_{y ∈ Y, z ∈ Z} wᵀ Φ(x, y, z)

Model: the nature of inference and feature functions. Learning strategy: how we obtain the weights.
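To make the formulation concrete, here is a minimal sketch of the inference step, under the simplifying assumption that candidate (y, z) pairs can be enumerated; the actual model searches this space with a typed inference procedure rather than by brute force, and `phi` and `candidates` are stand-ins for the paper's feature function and candidate generator.

```python
import numpy as np

def infer(x, candidates, phi, w):
    """Return the (y, z) pair maximizing w . phi(x, y, z).

    candidates: iterable of (y, z) pairs for sentence x (assumed enumerable here).
    phi: feature function mapping (x, y, z) to a numpy vector.
    w: current weight vector.
    """
    best, best_score = None, float("-inf")
    for y, z in candidates:
        score = float(np.dot(w, phi(x, y, z)))
        if score > best_score:
            best, best_score = (y, z), score
    return best
```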
Learning
Inputs: natural language sentences; a feedback function Feedback : X × Z → {+1, −1}; zero meaning representations.

Feedback(x, z) = +1 if execute(z) = r, and −1 otherwise.

Goal: a weight vector that scores the correct meaning representation higher than all other meaning representations.
Response-driven learning: input text → predict a meaning representation → apply it to the World → receive feedback.
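A minimal sketch of the binary feedback signal, assuming a hypothetical `execute` function that runs a meaning representation against the domain (for GEOQUERY, a geographical database):

```python
def feedback(z, r, execute):
    """+1 if executing meaning representation z yields the expected response r."""
    return +1 if execute(z) == r else -1
```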
Learning Strategies
repeat
    for all input sentences x_i do
        solve the inference problem: y_i, z_i = argmax_{y, z} wᵀ Φ(x_i, y, z)
        query the Feedback function to label the prediction +1 or −1
    end for
    learn a new w using the feedback
until convergence
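Putting the pieces together, a sketch of the generic response-driven training loop, reusing the `infer` and `feedback` sketches above; `learn` stands in for whichever strategy (DIRECT or AGGRESSIVE) produces the next weight vector, and convergence is checked naively via label stability, which is an assumption rather than the paper's criterion.

```python
def response_driven_training(sentences, responses, candidates, phi, w, execute, learn):
    """Generic loop: predict, query feedback, retrain, repeat."""
    prev_labels = None
    while True:
        labeled = []
        for x, r in zip(sentences, responses):
            y, z = infer(x, candidates(x), phi, w)   # solve the inference problem
            f = feedback(z, r, execute)              # query the feedback function
            labeled.append((x, y, z, f))
        labels = [f for _, _, _, f in labeled]
        if labels == prev_labels:                    # naive convergence check (assumed)
            return w
        prev_labels = labels
        w = learn(labeled, phi, w)                   # learn a new w using feedback
```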
DIRECT Approach: Binary Learning
Input text → predict a meaning representation → apply it to the World → receive feedback.
DIRECT: learn a binary classifier to discriminate between good and bad meaning representations. Use each (x, y, z) as a training example with its label f taken from the feedback. Find w such that f · wᵀ Φ(x, y, z) > 0.
Each point is represented by Φ(x, y, z), normalized by the length of x. [Figure: feedback-labeled points in feature space, separated by the learned weight vector w.]
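A sketch of one DIRECT-style training pass, assuming a simple perceptron in place of whatever binary learner the authors actually used; each feedback-labeled triple becomes a binary example with features Φ(x, y, z) normalized by sentence length (here approximated by whitespace tokenization, an assumption).

```python
import numpy as np

def direct_learn(labeled, phi, w, epochs=10):
    """Train a binary separator so that f * (w . phi) > 0 for each example."""
    for _ in range(epochs):
        for x, y, z, f in labeled:
            v = phi(x, y, z) / max(len(x.split()), 1)  # normalize by sentence length
            if f * np.dot(w, v) <= 0:                  # misclassified: perceptron update
                w = w + f * v
    return w
```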
The DIRECT learner runs the generic loop: for each sentence, solve the inference problem (y, z = argmax wᵀ Φ(x, y, z)), query the feedback function, learn a new w from the newly labeled (x, y, z) examples, and repeat until convergence. [Figure: the separating weight vector w is refined over successive iterations.]
AGGRESSIVE Approach: Structured Learning
Input text → predict a meaning representation → apply it to the World → receive feedback.
AGGRESSIVE: positive feedback is a good indicator of the correct meaning representation, so use the items with positive feedback as training data for a structured learner. Implicitly consider all other meaning representations for these examples as bad: find w such that wᵀ Φ(x, y⁺, z⁺) > wᵀ Φ(x, y, z) for every competing (y, z).
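A sketch of the AGGRESSIVE update under the same perceptron assumption: only positively labeled triples are kept, and each is pushed above the current argmax over competing structures. The perceptron here just illustrates the constraint; the paper's actual structured learner may differ.

```python
def aggressive_learn(labeled, phi, w, candidates, epochs=10):
    """Structured updates from positive-feedback examples only."""
    positives = [(x, y, z) for x, y, z, f in labeled if f == +1]
    for _ in range(epochs):
        for x, y_good, z_good in positives:
            y_hat, z_hat = infer(x, candidates(x), phi, w)  # current best competitor
            if (y_hat, z_hat) != (y_good, z_good):
                # enforce w . phi(x, y_good, z_good) > w . phi(x, y_hat, z_hat)
                w = w + phi(x, y_good, z_good) - phi(x, y_hat, z_hat)
    return w
```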
The AGGRESSIVE learner runs the same loop: solve the inference problem for each sentence, query the feedback function, keep only the examples that received positive feedback, train the structured learner on them, and repeat until convergence.
Summary of Learning Strategies
DIRECT uses both positive and negative feedback as examples to train a binary classifier. AGGRESSIVE adapts the feedback signal and uses only positive feedback to train a structured predictor.
Model
ẑ = F_w(x) = argmax_{y ∈ Y, z ∈ Z} wᵀ Φ(x, y, z)
First-order decisions: map lexical items, e.g. "largest" → largest.
Second-order decisions: composition, e.g. next_to(state(·)) vs. state(next_to(·)).
The inference procedure leverages the typing information of the domain.
First-order Decisions
"How many people live in the state of Texas?" Goal: population(state(texas)). Each word is mapped to a candidate predicate or constant: loc, texas, next_to, state, population, or null.
A simple lexicon bootstraps the process, e.g. Texas → texas, state → state, population → population, in → loc, and next / borders / adjacent → next_to.
Lexical resources help us move beyond the lexicon, e.g. wordnet_sim(people, population). Context helps disambiguate between choices.
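A sketch of first-order candidate generation, with a hypothetical bootstrap lexicon for GEOQUERY and a stand-in `similarity` function where the paper uses WordNet-based similarity; the threshold and fallback to null are illustrative assumptions.

```python
# Hypothetical bootstrap lexicon: surface words -> domain predicates/constants.
LEXICON = {
    "texas": "texas", "state": "state", "population": "population",
    "in": "loc", "next": "next_to", "borders": "next_to", "adjacent": "next_to",
}

def first_order_candidates(word, predicates, similarity, threshold=0.5):
    """Candidate predicates for one word: exact lexicon hits, then similar ones."""
    if word in LEXICON:
        return [LEXICON[word]]
    # Fall back on lexical similarity (WordNet in the paper; any scorer here).
    scored = [(p, similarity(word, p)) for p in predicates]
    return [p for p, s in scored if s >= threshold] or ["null"]
```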
Second-order Decisions
How do we compose the predicates and constants? This is domain dependent: encode the typing information inherent in the domain into the inference procedure, e.g. population(state(·)) vs. state(population(·)).
Features: dependency path distance, word position distance, and predicate bigrams, e.g. next_to(state(·)) vs. state(next_to(·)).
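A sketch of the three second-order feature families named on the slide; the position indices and dependency distance are placeholders for whatever parser and indexing the authors used.

```python
def second_order_features(p1, p2, pos1, pos2, dep_distance):
    """Features for composing predicate p1 over p2, e.g. state(next_to(.))."""
    return {
        f"bigram:{p1}>{p2}": 1.0,                    # predicate bigram
        "word_position_distance": abs(pos1 - pos2),  # distance between trigger words
        "dependency_path_distance": dep_distance,    # distance in the dependency tree
    }
```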
Evaluation
Domain: GEOQUERY, U.S. geographical questions.
Response 250: 250 (x, r) pairs, with zero meaning representations. Query 250: 250 sentences x.
Evaluation metric: accuracy, the percentage of meaning representations that return the correct answer.
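A sketch of this response-level accuracy metric, again assuming a hypothetical `execute` against the domain database:

```python
def response_accuracy(predictions, responses, execute):
    """Fraction of predicted meaning representations returning the correct answer."""
    correct = sum(execute(z) == r for z, r in zip(predictions, responses))
    return correct / len(responses)
```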
Learning Behavior
Accuracy on R250 and Q250 for NOLEARN (22.2), DIRECT, AGGRESSIVE, and SUPERVISED. NOLEARN is used to initialize both learning approaches.
Q: How good is our model when trained in a fully supervised manner? A: 80% on test data. Other supervised methods range from 60% to 85% accuracy.
Q: Is it possible to learn without any meaning representations? A: Yes! The learners cover more and more of the Response data set, ending only 7% below the SUPERVISED upper bound.
Learning Behavior
[Figure: accuracy on the Response set over learning iterations, from initialization, for AGGRESSIVE and DIRECT.]
AGGRESSIVE correctly interprets 16% of the sentences that DIRECT does not, and 9% vice versa, leaving only 9% incorrect.
Learning from Indirect Supervision
This is similar to other indirect learning protocols:
- Learning a binary classifier with hidden explanation: supervision is only required for the binary signal, with no labeled structures (Chang, Goldwasser, Roth, Srikumar, NAACL 2010).
- Structured learning with binary and structured labels: a mix of supervision over binary and structured data, where the binary label indicates whether the input has a good structure (Chang, Goldwasser, Roth, Srikumar, ICML 2010).
Conclusions
Contributions:
- Response-driven learning: a new learning paradigm that doesn't rely on annotated meaning representations. Supervision is given at the response level, a natural supervision signal.
- Two learning algorithms capable of working within response-driven learning.
- A shallow semantic parsing model.
Future work: Can we combine the two learning algorithms? Other semantic parsing domains? Response-driven learning for other tasks?
More information