Dependency Parsing. Statistical NLP Fall 2016, Lecture 9 (Slav Petrov, Google)
1 Dependency Parsing. Statistical NLP Fall 2016, Lecture 9: Dependency Parsing. Slav Petrov, Google. Example dependency tree: "They solved the problem with statistics" (PRON VERB DET NOUN ADP NOUN), with arcs ROOT, nsubj(solved, They), dobj(solved, problem), det(problem, the), prep(solved, with), pobj(with, statistics). CoNLL Format. Dutch example: "Cathy zag hen zwaaien" ("Cathy saw them wave"), with labels ROOT, su, obj1, vc, mod, punct. (Non-)Projectivity: crossing arcs are needed to account for non-projective constructions; these are fairly rare in English but can be common in other languages (e.g. Czech).
2 Formal Conditions. Styles of Dependency Parsing. Transition-based (tr): fast, greedy, linear-time inference algorithms, trained for greedy search; can be extended with beam search. Graph-based (gr): slower, exhaustive dynamic-programming inference algorithms; support higher-order factorizations. Accuracy/time trade-off: greedy tr is O(n) and k-best tr is O(k n) [Nivre et al.]; 1st- and 2nd-order gr are O(n^3), 3rd-order gr is O(n^4) [McDonald et al.]. Arc-Factored Models. Graph-based parsing assumes that scores factor over the tree (arc-factored models): Score(tree) = sum over edges of the arc scores. Example: Score("I washed dishes with detergent") = Score(washed -> I) + Score(washed -> dishes) + Score(washed -> with) + Score(with -> detergent).
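Arc-factored scoring can be sketched in a few lines; the arc scores below are made-up numbers for the example sentence, not values from any trained model:

```python
# Hypothetical arc scores for "I washed dishes with detergent";
# indices: 0=ROOT, 1=I, 2=washed, 3=dishes, 4=with, 5=detergent.
scores = {
    (0, 2): 10.0,  # ROOT -> washed
    (2, 1): 7.0,   # washed -> I
    (2, 3): 6.5,   # washed -> dishes
    (2, 4): 4.0,   # washed -> with
    (4, 5): 5.5,   # with -> detergent
}

def tree_score(heads, scores):
    """Arc-factored score: the sum of the (head, modifier) arc scores.

    heads[m] is the head of modifier m, for m = 1..n; heads[0] is unused.
    """
    return sum(scores[(h, m)] for m, h in enumerate(heads) if m > 0)

heads = [None, 2, 0, 2, 2, 4]
print(tree_score(heads, scores))  # 33.0
```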
3-4 Dependency Representation. [Slides: worked examples of the dependency representation, marking heads and modifiers for each arc.]
5 Graph-Based Parsing. Arc-Factored Projective Parsing: the Eisner Algorithm.
6-8 Eisner First-Order Rules. The algorithm combines two span types:
- An incomplete span with head h and modifier m is built by joining a complete span (h, r) with a complete span (r+1, m) and adding the arc h -> m.
- A complete span (h, e) is built by joining an incomplete span (h, m) with a complete span (m, e).
In practice there is also a left-arc (mirror-image) version of each rule. [Slides: step-by-step first-order parsing example.] Eisner Algorithm Pseudo Code. Maximum Spanning Trees (MSTs): can use MST algorithms for non-projective parsing!
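The combination rules can be turned into the standard O(n^3) dynamic program. A minimal sketch (best-score only, no backpointers; score[h][m] is an assumed dense arc-score matrix with index 0 as ROOT):

```python
def eisner(score):
    """First-order Eisner algorithm for projective, arc-factored parsing.

    score[h][m] is the score of arc h -> m; index 0 is ROOT.
    Returns the score of the best projective dependency tree.
    """
    n = len(score)
    NEG = float("-inf")
    # spans [s][t][d]: d=0 means the head is on the right (t),
    #                  d=1 means the head is on the left (s).
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    complete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        complete[i][i][0] = complete[i][i][1] = 0.0

    for span in range(1, n):
        for s in range(n - span):
            t = s + span
            # Incomplete spans: join complete (s, r) and (r+1, t), add an arc.
            best = max(complete[s][r][1] + complete[r + 1][t][0]
                       for r in range(s, t))
            incomplete[s][t][0] = best + score[t][s]  # arc t -> s
            incomplete[s][t][1] = best + score[s][t]  # arc s -> t
            # Complete spans: join an incomplete span with a complete one.
            complete[s][t][0] = max(complete[s][r][0] + incomplete[r][t][0]
                                    for r in range(s, t))
            complete[s][t][1] = max(incomplete[s][r][1] + complete[r][t][1]
                                    for r in range(s + 1, t + 1))
    return complete[0][n - 1][1]  # full span headed by ROOT

score = [[0.0, 1.0, 5.0],
         [0.0, 0.0, 2.0],
         [0.0, 4.0, 0.0]]
print(eisner(score))  # best projective tree: ROOT -> w2, w2 -> w1 (5 + 4 = 9.0)
```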
9-10 Chu-Liu-Edmonds. For each non-root node, greedily pick the highest-scoring incoming arc. If the result contains a cycle: find the cycle and contract it into a single node, recalculate the weights of arcs entering and leaving it, and repeat. An MST of the contracted graph expands to an MST of the original graph (theorem). Final MST. Chu-Liu-Edmonds pseudocode.
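The find-cycle / contract / recalculate loop can be sketched as follows (a dense-graph sketch; the input convention, with -inf for self-arcs and arcs into the root, is an assumption of this sketch, and no attempt is made at the optimized O(n^2) variant):

```python
def _find_cycle(heads):
    """Return one cycle in the head-pointer graph, or None."""
    n = len(heads)
    for start in range(1, n):
        seen, node = set(), start
        while node != 0 and node not in seen:
            seen.add(node)
            node = heads[node]
        if node != 0:  # the first repeated node lies on the cycle
            cycle, cur = [node], heads[node]
            while cur != node:
                cycle.append(cur)
                cur = heads[cur]
            return cycle
    return None

def chu_liu_edmonds(score):
    """Maximum spanning arborescence rooted at node 0.

    score[h][m] is the weight of arc h -> m; self-arcs and arcs into
    the root should be float('-inf'). Returns heads[m] for every node.
    """
    n = len(score)
    # 1. Greedy step: best incoming arc for every non-root node.
    heads = [0] * n
    for m in range(1, n):
        heads[m] = max((h for h in range(n) if h != m),
                       key=lambda h: score[h][m])
    cycle = _find_cycle(heads)
    if cycle is None:
        return heads
    # 2. Contract the cycle into node c and recalculate arc weights.
    cyc = set(cycle)
    cyc_score = sum(score[heads[v]][v] for v in cycle)
    rest = [v for v in range(n) if v not in cyc]   # rest[0] == 0 (root)
    c = len(rest)
    NEG = float("-inf")
    new_score = [[NEG] * (c + 1) for _ in range(c + 1)]
    in_arc, out_arc = {}, {}
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            if u != v:
                new_score[i][j] = score[u][v]
        # Entering the cycle at v: pay for breaking the cycle arc into v.
        best_in = max(cycle, key=lambda v: score[u][v] - score[heads[v]][v])
        new_score[i][c] = (cyc_score + score[u][best_in]
                           - score[heads[best_in]][best_in])
        in_arc[i] = best_in
        # Leaving the cycle into u: best cycle-internal source.
        best_out = max(cycle, key=lambda w: score[w][u])
        new_score[c][i] = score[best_out][u]
        out_arc[i] = best_out
    # 3. Solve the contracted graph recursively, then expand.
    sub = chu_liu_edmonds(new_score)
    final = [0] * n
    for j, v in enumerate(rest):
        if v != 0:
            final[v] = rest[sub[j]] if sub[j] != c else out_arc[j]
    enter = in_arc[sub[c]]            # cycle node where the chosen arc enters
    final[enter] = rest[sub[c]]
    for v in cycle:                   # keep the other cycle arcs
        if v != enter:
            final[v] = heads[v]
    return final

NEG = float("-inf")
score = [[NEG, 6.0, 1.0],
         [NEG, NEG, 4.0],
         [NEG, 8.0, NEG]]
print(chu_liu_edmonds(score))  # greedy picks the 1<->2 cycle; result: [0, 0, 1]
```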
11 Arc Weights. Arc feature ideas for f(i, j, k):
- Identities of the words wi and wj and the label lk
- Part-of-speech tags of the words wi and wj and the label lk
- Part-of-speech tags of the words surrounding and between wi and wj
- Number of words between wi and wj, and their orientation
- Combinations of the above
First-Order Feature Computation. (Structured) Perceptron. [Slide: a long list of instantiated feature templates for one candidate arc, such as [VBD ADP], [NNS VBD ADP NNP], [VERB IN left 5], combining POS tags, words, arc direction, and binned distance.]
12-14 Transition-Based Dependency Parsing. Process the sentence left to right; different transition strategies are available; delay decisions by pushing words onto a stack.
Arc-Standard Transition Strategy [Nivre 03]. A configuration is (stack, buffer, arc set A):
- Initial configuration: ([0], [1 ... n], {})
- Terminal configuration: ([0], [], A)
- shift: (sigma, [i|beta], A) => ([sigma|i], beta, A)
- left-arc (label l): ([sigma|i|j], beta, A) => ([sigma|j], beta, A + {j -l-> i})
- right-arc (label l): ([sigma|i|j], beta, A) => ([sigma|i], beta, A + {i -l-> j})
Arc-Standard Example: "I booked a flight to Lisbon". SHIFT, SHIFT, LEFT-ARC(nsubj) (I <- booked); SHIFT, SHIFT, LEFT-ARC(det) (a <- flight); SHIFT, SHIFT, RIGHT-ARC(pobj) (to -> Lisbon); RIGHT-ARC(prep) (flight -> to); RIGHT-ARC(dobj) (booked -> flight).
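The derivation above can be replayed mechanically. A sketch with a hand-written oracle action list for the example sentence (labels follow the slides; the final RIGHT-ARC(root) attaching "booked" to ROOT is implied by the terminal configuration):

```python
def arc_standard(words, actions):
    """Run a sequence of arc-standard transitions (Nivre 2004).

    A configuration is (stack, buffer, arcs); tokens are 1-based, 0 is ROOT.
    """
    stack, buffer, arcs = [0], list(range(1, len(words) + 1)), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        else:
            name, label = act
            j = stack.pop()            # top of stack
            i = stack.pop()            # second element
            if name == "LEFT-ARC":     # j becomes the head of i
                arcs.append((j, label, i))
                stack.append(j)
            elif name == "RIGHT-ARC":  # i becomes the head of j
                arcs.append((i, label, j))
                stack.append(i)
    return stack, buffer, arcs

words = ["I", "booked", "a", "flight", "to", "Lisbon"]
actions = [
    "SHIFT", "SHIFT", ("LEFT-ARC", "nsubj"),   # I <-nsubj- booked
    "SHIFT", "SHIFT", ("LEFT-ARC", "det"),     # a <-det- flight
    "SHIFT", "SHIFT", ("RIGHT-ARC", "pobj"),   # to -pobj-> Lisbon
    ("RIGHT-ARC", "prep"),                     # flight -prep-> to
    ("RIGHT-ARC", "dobj"),                     # booked -dobj-> flight
    ("RIGHT-ARC", "root"),                     # ROOT -root-> booked
]
stack, buffer, arcs = arc_standard(words, actions)
print(stack, buffer)  # [0] [] : terminal configuration reached
```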
15 Arc-Standard Example / Features. Configuration with "a flight" and "I booked" processed and "to Lisbon" remaining: SHIFT? RIGHT-ARC? LEFT-ARC? (Arcs so far: nsubj(booked, I), det(flight, a); target tree adds dobj, prep, pobj.)
Features (ZPar parser):
# From single words
pair { stack.tag stack.word }; stack { word tag }
pair { input.tag input.word }; input { word tag }
pair { input(1).tag input(1).word }; input(1) { word tag }
pair { input(2).tag input(2).word }; input(2) { word tag }
# From word pairs
quad { stack.tag stack.word input.tag input.word }
triple { stack.tag stack.word input.word }
triple { stack.word input.tag input.word }
triple { stack.tag stack.word input.tag }
triple { stack.tag input.tag input.word }
pair { stack.word input.word }
pair { stack.tag input.tag }
pair { input.tag input(1).tag }
# From word triples
triple { input.tag input(1).tag input(2).tag }
triple { stack.tag input.tag input(1).tag }
triple { stack.head(1).tag stack.tag input.tag }
triple { stack.tag stack.child(-1).tag input.tag }
triple { stack.tag stack.child(1).tag input.tag }
triple { stack.tag input.tag input.child(-1).tag }
# Distance
pair { stack.distance stack.word }
pair { stack.distance stack.tag }
pair { stack.distance input.word }
pair { stack.distance input.tag }
triple { stack.distance stack.word input.word }
triple { stack.distance stack.tag input.tag }
Example values in the current configuration: stack top word = flight, stack top POS tag = NOUN, buffer front word = to, child of stack top word = a ...
# Valency
pair { stack.word stack.valence(-1) }
pair { stack.word stack.valence(1) }
pair { stack.tag stack.valence(-1) }
pair { stack.tag stack.valence(1) }
pair { input.word input.valence(-1) }
pair { input.tag input.valence(-1) }
# Unigrams
stack.head(1) { word tag }; stack.label
stack.child(-1) { word tag label }
stack.child(1) { word tag label }
input.child(-1) { word tag label }
# Third order
stack.head(1).head(1) { word tag }; stack.head(1).label
stack.child(-1).sibling(1) { word tag label }
stack.child(1).sibling(-1) { word tag label }
input.child(-1).sibling(1) { word tag label }
triple { stack.tag stack.child(-1).tag stack.child(-1).sibling(1).tag }
triple { stack.tag stack.child(1).tag stack.child(1).sibling(-1).tag }
triple { stack.tag stack.head(1).tag stack.head(1).head(1).tag }
triple { input.tag input.child(-1).tag input.child(-1).sibling(1).tag }
# Label set
pair { stack.tag stack.child(-1).label }
triple { stack.tag stack.child(-1).label stack.child(-1).sibling(1).label }
quad { stack.tag stack.child(-1).label stack.child(-1).sibling(1).label stack.child(-1).sibling(2).label }
pair { stack.tag stack.child(1).label }
triple { stack.tag stack.child(1).label stack.child(1).sibling(-1).label }
quad { stack.tag stack.child(1).label stack.child(1).sibling(-1).label stack.child(1).sibling(-2).label }
pair { input.tag input.child(-1).label }
triple { input.tag input.child(-1).label input.child(-1).sibling(1).label }
quad { input.tag input.child(-1).label input.child(-1).sibling(1).label input.child(-1).sibling(2).label }
SVM / Structured Perceptron hyperparameters: regularization, loss function, hand-crafted features.
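A toy sketch of instantiating a few of these templates for the configuration shown (stack top = flight, buffer front = to); the string encoding of the features is an illustrative choice, not ZPar's actual internal format:

```python
def extract_features(stack, buffer, words, tags):
    """Instantiate a handful of ZPar-style templates as feature strings.

    stack/buffer hold token indices; words/tags map index -> value.
    """
    feats = []
    if stack:
        s0 = stack[-1]
        feats.append(f"stack.word={words[s0]}")
        feats.append(f"stack.tag={tags[s0]}")
        feats.append(f"pair:stack.tag,stack.word={tags[s0]},{words[s0]}")
    if buffer:
        n0 = buffer[0]
        feats.append(f"input.word={words[n0]}")
        feats.append(f"input.tag={tags[n0]}")
    if stack and buffer:
        s0, n0 = stack[-1], buffer[0]
        feats.append(f"pair:stack.word,input.word={words[s0]},{words[n0]}")
        feats.append(f"pair:stack.distance,stack.word={n0 - s0},{words[s0]}")
    return feats

# Configuration from the running example: stack top is "flight" (index 4),
# buffer front is "to" (index 5).
words = {3: "a", 4: "flight", 5: "to"}
tags = {3: "DET", 4: "NOUN", 5: "ADP"}
print(extract_features([4], [5], words, tags))
```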
16 Neural Network Transition-Based Parser [Chen & Manning 14] and [Weiss et al. 15]. Architecture, bottom to top: atomic inputs such as f0 = 1[buffer0-word = "to"], f1 = 1[buffer1-word = "Bilbao"], f2 = 1[buffer0-pos = IN], f3 = 1[stack0-label = pobj]; an embedding layer with separate embeddings for words, POS tags, and arc labels; one hidden layer (Chen & Manning) or two hidden layers (Weiss et al.); and a softmax over parser actions.
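A minimal numpy sketch of this architecture (one hidden layer; all sizes and the random initialization are illustrative, and the ReLU activation follows Weiss et al. rather than Chen & Manning's cube activation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabularies for the three embedding groups (illustrative sizes).
n_words, n_pos, n_labels, dim = 100, 20, 10, 8
E_word = rng.normal(size=(n_words, dim)) * 0.01
E_pos = rng.normal(size=(n_pos, dim)) * 0.01
E_label = rng.normal(size=(n_labels, dim)) * 0.01

n_feats, hidden, n_actions = 3, 16, 4  # e.g. SHIFT, LEFT-ARC, RIGHT-ARC, ...
W1 = rng.normal(size=(hidden, n_feats * dim)) * 0.01
b1 = np.zeros(hidden)
W2 = rng.normal(size=(n_actions, hidden)) * 0.01

def action_probs(word_id, pos_id, label_id):
    """Embed the atomic inputs, concatenate, apply one hidden layer,
    and return a softmax distribution over parser actions."""
    x = np.concatenate([E_word[word_id], E_pos[pos_id], E_label[label_id]])
    h = np.maximum(0.0, W1 @ x + b1)   # ReLU hidden layer
    z = W2 @ h
    p = np.exp(z - z.max())            # numerically stable softmax
    return p / p.sum()

p = action_probs(word_id=5, pos_id=3, label_id=1)
print(p.shape)  # (4,)
```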
17 English Results (WSJ section 23). Methods compared: 3rd-order graph-based (ZM2014); transition-based linear (ZN2011); NN baseline (Chen & Manning 2014); NN with better SGD (Weiss et al. 2015); NN with deeper network (Weiss et al. 2015). [Table: UAS / LAS / beam size per method; the numeric scores were not preserved in the transcription.]
NN Hyperparameters: regularization, loss function, dimensions, activation function, initialization, Adagrad, dropout.
18 NN Hyperparameters: dimensions, regularization, loss function, activation function, initialization, Adagrad, dropout, mini-batch size, initial learning rate, learning-rate schedule, momentum, stopping time, parameter averaging. Optimization matters! Use random restarts and grid search, and pick the best model on held-out data. Splits: tune on WSJ section 24, dev on WSJ section 22, test on WSJ section 23.
Random Restarts: How Much Variance? [Plot: variance of networks on the tuning/dev set; a 2nd hidden layer plus pretraining increases the tune/dev correlation.]
Effect of Embedding Dimensions. [Plot: word-embedding dimension D_words (up to 128) vs. UAS on the WSJ tune set, with D_pos = D_labels = 32, for pretrained 200x200 networks.]
19 Effect of Embedding Dimensions. [Plot: POS/label embedding dimension (D_pos, D_labels) vs. UAS on the WSJ tune set, with D_words = 64, for pretrained 200x200 networks.]
How Important is Lookahead? Running example: "Alice saw Bob eat pizza with Charlie". [Plots: UAS on WSJ section 22 for a local model as the amount of lookahead into the buffer varies.]
20 How Important is Lookahead? [Plots continued: the local feed-forward model is compared against an LSTM and a Bi-LSTM encoder [Kiperwasser & Goldberg '16] on "Alice saw Bob eat pizza with Charlie"; the Bi-LSTM sees the whole sentence and removes the need for explicit lookahead.]
21 Better Beam Search with Local Model. [Schematic plot: UAS vs. lookahead; adding beam search to the local model closes part of the gap.]
Beam Training with Early Updates. Backpropagate through all steps, paths, and layers [Collins and Roark 04; Zhou et al. 15]. The model is globally normalized with respect to the beam: p(y_i) = exp(score(y_i)) / sum_{j=1..Beam} exp(score(y_j)), where score(y_j) is the cumulative score of the j-th path on the beam.
Label Bias. In a local model every decision is normalized to [0, 1], so the model cannot penalize a decision for pushing the overall structure too far from the gold tree; a global model learns to assign credit and blame. Other views: the model needs to learn to model states not along the gold path, and to look into the future and overrule low-entropy (low-branching) states.
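The beam-level normalization can be sketched numerically (the path scores below are made-up values; numpy's logaddexp gives a stable log of the partition sum over the beam):

```python
import numpy as np

def beam_log_likelihood(path_scores, gold_index):
    """CRF-style objective over a beam: the gold path's cumulative score,
    normalized against the sum over all paths currently on the beam."""
    s = np.asarray(path_scores, dtype=float)
    log_z = np.logaddexp.reduce(s)   # log sum_j exp(score_j)
    return s[gold_index] - log_z     # log p(gold | beam)

# Beam of 4 candidate action sequences with cumulative scores;
# the gold path is the highest-scoring one at index 0.
scores = [2.0, 1.0, 0.5, -1.0]
print(beam_log_likelihood(scores, gold_index=0))
```

Maximizing this quantity (equivalently, minimizing its negative) pushes probability mass toward the gold path relative to the competing beam paths, which is what early updates backpropagate through.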
22 Globally Normalized Model. [Plot: UAS vs. lookahead for Local, Local+Beam, and Global models.]
English Results (WSJ section 23). Methods compared: 3rd-order graph-based (ZM2014); transition-based linear (ZN2011); NN baseline (Chen & Manning 2014); NN better SGD (Weiss et al. 2015); NN deeper network (Weiss et al. 2015); NN perceptron (Weiss et al. 2015); NN CRF (Andor et al. 2016); NN CRF semi-supervised (Andor et al.); S-LSTM (Dyer et al. 2015); contrastive NN (Zhou et al. 2015). [Table: UAS / LAS / beam size per method; numbers not preserved in the transcription.]
Tri-Training [Zhou et al. 05; Li et al. 14]. Parse unlabeled data with two different parsers (ZPar and the Berkeley parser) and keep the sentences on which they agree (~40% agreement).
English Out-of-Domain Results. Train on WSJ + Web Treebank + QuestionBank; evaluate on Web. [Bar chart: UAS (%), supervised vs. semi-supervised, for 3rd-order graph (ZM2014), transition-based linear (ZN2011, B=32), transition-based NN (B=1), and transition-based NN (B=8).]
23 Multilingual Results. [Table: UAS (%) on Catalan, Chinese, Czech, English, German, Japanese, and Spanish for tensor-based graph (Lei et al. '14), 3rd-order graph (Zhang & McDonald '14), transition-based NN (Weiss et al. '15), and transition-based CRF (Andor et al. '16); variants without tri-training data and with morphological features.]
SyntaxNet and Parsey McParseface. (Add screenshots once released.) LSTMs vs. SyntaxNet:
             LSTMs    SyntaxNet
Accuracy     +        ++
Efficiency   -        +
End-to-End   ++       -
Recurrence   +        - (yet)
24 Universal Dependencies: Stanford Universal++ Dependencies, Interset++ morphological features, Google Universal++ POS tags.
Summary. Constituency parsing: CKY algorithm, lexicalized grammars, latent-variable grammars, conditional random field parsing, neural network representations. Dependency parsing: Eisner algorithm, maximum spanning tree algorithm, transition-based parsing, neural network representations.
More informationMaxent Models and Discriminative Estimation
Maxent Models and Discriminative Estimation Generative vs. Discriminative models (Reading: J+M Ch6) Introduction So far we ve looked at generative models Language models, Naive Bayes But there is now much
More informationNLP Programming Tutorial 8 - Recurrent Neural Nets
NLP Programming Tutorial 8 - Recurrent Neural Nets Graham Neubig Nara Institute of Science and Technology (NAIST) 1 Feed Forward Neural Nets All connections point forward ϕ( x) y It is a directed acyclic
More informationLecture 12: Algorithms for HMMs
Lecture 12: Algorithms for HMMs Nathan Schneider (some slides from Sharon Goldwater; thanks to Jonathan May for bug fixes) ENLP 17 October 2016 updated 9 September 2017 Recap: tagging POS tagging is a
More informationThe Infinite PCFG using Hierarchical Dirichlet Processes
S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise
More informationEmpirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs
Empirical Methods in Natural Language Processing Lecture 11 Part-of-speech tagging and HMMs (based on slides by Sharon Goldwater and Philipp Koehn) 21 February 2018 Nathan Schneider ENLP Lecture 11 21
More informationNeural networks CMSC 723 / LING 723 / INST 725 MARINE CARPUAT. Slides credit: Graham Neubig
Neural networks CMSC 723 / LING 723 / INST 725 MARINE CARPUAT marine@cs.umd.edu Slides credit: Graham Neubig Outline Perceptron: recap and limitations Neural networks Multi-layer perceptron Forward propagation
More informationACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging
ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers
More informationStructured Prediction
Machine Learning Fall 2017 (structured perceptron, HMM, structured SVM) Professor Liang Huang (Chap. 17 of CIML) x x the man bit the dog x the man bit the dog x DT NN VBD DT NN S =+1 =-1 the man bit the
More informationMachine Learning for NLP
Machine Learning for NLP Uppsala University Department of Linguistics and Philology Slides borrowed from Ryan McDonald, Google Research Machine Learning for NLP 1(50) Introduction Linear Classifiers Classifiers
More informationTransition-based Dependency Parsing with Selectional Branching
Transition-based Dependency Parsing with Selectional Branching Jinho D. Choi Department of Computer Science University of Massachusetts Amherst Amherst, MA, 01003, USA jdchoi@cs.umass.edu Andrew McCallum
More informationNatural Language Processing
Natural Language Processing Info 159/259 Lecture 7: Language models 2 (Sept 14, 2017) David Bamman, UC Berkeley Language Model Vocabulary V is a finite set of discrete symbols (e.g., words, characters);
More informationGraph-based Dependency Parsing. Ryan McDonald Google Research
Graph-based Dependency Parsing Ryan McDonald Google Research ryanmcd@google.com Reader s Digest Graph-based Dependency Parsing Ryan McDonald Google Research ryanmcd@google.com root ROOT Dependency Parsing
More informationHidden Markov Models (HMMs)
Hidden Markov Models HMMs Raymond J. Mooney University of Texas at Austin 1 Part Of Speech Tagging Annotate each word in a sentence with a part-of-speech marker. Lowest level of syntactic analysis. John
More informationDeep Neural Networks (3) Computational Graphs, Learning Algorithms, Initialisation
Deep Neural Networks (3) Computational Graphs, Learning Algorithms, Initialisation Steve Renals Machine Learning Practical MLP Lecture 5 16 October 2018 MLP Lecture 5 / 16 October 2018 Deep Neural Networks
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/jv7vj9 Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationA Tabular Method for Dynamic Oracles in Transition-Based Parsing
A Tabular Method for Dynamic Oracles in Transition-Based Parsing Yoav Goldberg Department of Computer Science Bar Ilan University, Israel yoav.goldberg@gmail.com Francesco Sartorio Department of Information
More informationA Supertag-Context Model for Weakly-Supervised CCG Parser Learning
A Supertag-Context Model for Weakly-Supervised CCG Parser Learning Dan Garrette Chris Dyer Jason Baldridge Noah A. Smith U. Washington CMU UT-Austin CMU Contributions 1. A new generative model for learning
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationArtificial Intelligence
CS344: Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 20-21 Natural Language Parsing Parsing of Sentences Are sentences flat linear structures? Why tree? Is
More informationStatistical Machine Learning Theory. From Multi-class Classification to Structured Output Prediction. Hisashi Kashima.
http://goo.gl/xilnmn Course website KYOTO UNIVERSITY Statistical Machine Learning Theory From Multi-class Classification to Structured Output Prediction Hisashi Kashima kashima@i.kyoto-u.ac.jp DEPARTMENT
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationDynamic-oracle Transition-based Parsing with Calibrated Probabilistic Output
Dynamic-oracle Transition-based Parsing with Calibrated Probabilistic Output Yoav Goldberg Computer Science Department Bar Ilan University Ramat Gan, Israel yoav.goldberg@gmail.com Abstract We adapt the
More informationPart-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287
Part-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287 Review: Neural Networks One-layer multi-layer perceptron architecture, NN MLP1 (x) = g(xw 1 + b 1 )W 2 + b 2 xw + b; perceptron x is the
More informationFeature Noising. Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager
Feature Noising Sida Wang, joint work with Part 1: Stefan Wager, Percy Liang Part 2: Mengqiu Wang, Chris Manning, Percy Liang, Stefan Wager Outline Part 0: Some backgrounds Part 1: Dropout as adaptive
More informationLecture 5: UDOP, Dependency Grammars
Lecture 5: UDOP, Dependency Grammars Jelle Zuidema ILLC, Universiteit van Amsterdam Unsupervised Language Learning, 2014 Generative Model objective PCFG PTSG CCM DMV heuristic Wolff (1984) UDOP ML IO K&M
More informationNatural Language Processing
Natural Language Processing Info 159/259 Lecture 12: Features and hypothesis tests (Oct 3, 2017) David Bamman, UC Berkeley Announcements No office hours for DB this Friday (email if you d like to chat)
More informationLatent Variable Models in NLP
Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable
More information