Recurrent neural network grammars
|
|
- Rosa Riley
- 5 years ago
- Views:
Transcription
1 Widespread phenomenon: Polarity items can only appear in certain contexts Recurrent neural network grammars lide credits: Chris Dyer, Adhiguna Kuncoro Example: anybody is a polarity item that tends to appear only in specific contexts: lecture that I gave did not appeal to anybody but not: * lecture that I gave appealed to anybody We might infer that the licensing context is the word not appearing among the preceding words, and you could use an RNN to model this. However: * lecture that I did not give appealed to anybody Language is hierarchical licensing context depends on recursive structure (syntax) One theory of hierarchy Generate symbols sequentially using an RNN Add some control symbols to rewrite the history periodically lecture that I gave did not appeal to anybody Periodically compress a sequence into a single constituent Augment RNN with an operation to compress recent history into a single vector (-> reduce ) RNN predicts next symbol based on the history of compressed elements and non-compressed terminals ( shift or generate ) RNN must also predict control symbols that decide how big constituents are lecture appealed to anybody We call such models recurrent neural network grammars. that I did not give
2 (Ordered) tree traversals are sequences Terminals tack Action NT() ( NT() ( ( GEN() ( ( GEN(hungry) hungry ( ( hungry GEN(cat) ( ( REDUCE ( ( ) ( ( ) ( meows ). ) Terminals hungry tack ( ( hungry ( ( Action NT() ( NT() ( ( GEN() ( ( ( ( ) ( ( ) Compress into a single composite symbol GEN(hungry) GEN(cat) REDUCE Terminals hungry tack Action NT() ( NT() ( ( GEN() ( ( GEN(hungry) ( ( hungry GEN(cat) ( ( REDUCE ( ( ) NT()
3 Terminals tack Action Terminals tack Action NT() NT() ( NT() ( NT() ( ( GEN() ( ( GEN() ( ( GEN(hungry) ( ( GEN(hungry) hungry ( ( hungry GEN(cat) hungry ( ( hungry GEN(cat) ( ( REDUCE ( ( REDUCE ( ( ) NT() ( ( ) NT() ( ( ) ( ( ( ) (??? Q: What information can we use to predict the next action, and how can we encode it with an RNN? Terminals hungry tack ( ( hungry ( ( ( ( ) ( ( ) ( Action NT() ( NT() ( ( GEN() ( ( A: We can use an RNN for each of: 1. Previous terminal symbols 2. Previous actions 3. Current contents GEN(hungry) GEN(cat) REDUCE NT() Terminals hungry meows meows tack Action NT() ( NT() ( ( GEN() ( ( GEN(hungry) ( ( hungry GEN(cat) ( ( REDUCE ( ( ) NT() ( ( ) ( GEN(meows) ( ( ) ( meows REDUCE ( ( ) ( meows) GEN(.) ( ( ) ( meows). REDUCE ( ( ) ( meows).)
4 Terminals hungry meows meows tack Action NT() ( NT() ( ( GEN() ( ( GEN(hungry) ( ( hungry GEN(cat) ( ( REDUCE Final ( symbol ( hungry is cat) NT() (a vector ( representation ( ) of) ( GEN(meows) the complete tree. ( ( ) ( meows REDUCE ( ( ) ( meows) GEN(.) ( ( ) ( meows). REDUCE ( ( ) ( meows).) yntactic Composition Need representation for: ( ) ( hungry cat ) Recursion tack symbols composed recursively mirror corresponding tree structure Need representation for: ( ) ( (ADJP very hungry) cat) {z } v Effect tack encodes -down syntactic recency, rather than left-to-right string recency ( v cat )
5 tack RNNs tack RNNs Augment a sequential RNN with a pointer Two constant-time operations push - read input, add to of, connect to current location of the pointer pop - move pointer to its parent A summary of contents is obtained by accessing the output of the RNN at location of the pointer Note: push and pop are discrete actions here (cf. Grefenstette et al., 2015) y 0 ; PUH tack RNNs tack RNNs y 0 y 1 POP y 0 y 1 ; x 1 ; x 1
6 tack RNNs tack RNNs y 0 y 1 PUH y 0 y 1 y 2 POP ; x 1 ; x 1 x 2 tack RNNs tack RNNs y 0 y 1 y 2 y 0 y 1 y 2 PUH ; x 1 x 2 ; x 1 x 2
7 tack RNNs evolution of the LTM over y 0 y 1 y 2 y 3 ; x 1 x 2 x 3 ( ( ) ( meows ). ) evolution of the LTM over evolution of the LTM over ( ( ) ( meows ). ) ( ( ) ( meows ). )
8 evolution of the LTM over evolution of the LTM over ( ( ) ( meows ). ) ( ( ) ( meows ). ) evolution of the LTM over evolution of the LTM over ( ( ) ( meows ). ) ( ( ) ( meows ). )
9 evolution of the LTM over evolution of the LTM over ( ( ) ( meows ). ) ( ( ) ( meows ). ) evolution of the LTM over evolution of the LTM over ( ( ) ( meows ). ) ( ( ) ( meows ). )
10 evolution of the LTM over Each word is conditioned on history represented by a trio of RNNs p(meows history) ( ( ) ( meows ). ) ( ( ) ( meows ). ) Train with backpropagation through structure In training, backpropagate through these three RNNs) And recursively through this structure. ( ( ) ( meows ). ) This network is dynamic. Don t derive gradients by hand that s error prone. Use automatic differentiation instead Complete model sentence tree Model is dynamic: variable number of context-dependent actions at each step allowable actions at this step equence of actions (completely defines x and y) Actions up to time t bias history action embedding embedding
11 Complete model Parameter Estimation RNNGs jointly model sequences of words together with a tree structure, p (x, y) Any parse tree can be converted to a sequence of actions (depth first traversal) and vice versa (subject to wellformedness constraints) output (buffer) action history We use trees from the Penn Treebank We could treat the non-generation actions as latent variables or learn them with RL, effectively making this a problem of grammar induction. Future work Inference An RNNG is a joint distribution p(x,y) over strings (x) and parse trees (y) We are interested in two inference questions: What is p(x) for a given x? [language modeling] What is max p(y x) for a given x? [parsing] y Unfortunately, the dynamic programming algorithms we often rely on are of no help here We can use importance sampling to do both by sampling from a discriminatively trained model English PTB (Parsing) Petrov and Klein (2007) hindo et al (2012) ingle model hindo et al (2012) Ensemble Vinyals et al (2015) PTB only Vinyals et al (2015) Ensemble Type F1 G 90.1 G 91.1 ~G 92.4 D Discriminative D 89.8 Generative (I) G 92.4
12 Importance ampling Assume we ve got a conditional distribution q(y x) s.t. (i) p(x, y) > 0 =) q(y x) > 0 (ii) y q(y x) is tractable and (iii) q(y x) is tractable Let the importance weights p(x) = X y2y(x) p(x, y) = = E y q(y x) w(x, y) X y2y(x) w(x, y) = p(x, y) q(y x) w(x, y)q(y x) Importance ampling p(x) = X y2y(x) p(x, y) = = E y q(y x) w(x, y) X y2y(x) w(x, y)q(y x) Replace this expectation with its Monte Carlo estimate. y (i) q(y x) for i 2 {1, 2,...,N} E q(y x) w(x, y) MC 1 NX w(x, y (i) ) N i=1 English PTB (LM) Perplexity 5-gram IKN LTM + Dropout Generative (I) Do we need a? Kuncoro et al., Oct 2017 Both and action history encode the same information, but expose it to the classifier in different ways. Chinese CTB (LM) Perplexity 5-gram IKN LTM + Dropout Generative (I) Leaving out is harmful; using it on its own works slightly better than complete model!
13 RNNG as a mini-linguist Replace composition with one that computes attention over objects in the composed sequence, using embedding of NT for similarity. What does this learn? RNNG as a mini-linguist Replace composition with one that computes attention over objects in the composed sequence, using embedding of NT for similarity. What does this learn? RNNG as a mini-linguist Replace composition with one that computes attention over objects in the composed sequence, using embedding of NT for similarity. What does this learn? RNNG as a mini-linguist Replace composition with one that computes attention over objects in the composed sequence, using embedding of NT for similarity. What does this learn?
14 RNNG as a mini-linguist Replace composition with one that computes attention over objects in the composed sequence, using embedding of NT for similarity. What does this learn? ummary Language is hierarchical, and this inductive bias can be encoded into an RNN-style model. RNNGs work by simulating a tree traversal like a pushdown automaton, but with continuous rather than finite history. Modeled by RNNs encoding (1) previous tokens, (2) previous actions, and (3) contents. A LTM evolves with contents. final representation computed by a LTM has a down recency bias, rather than left-to-right bias, which might be useful in modeling sentences. Effective for parsing and language modeling, and seems to capture linguistic intuitions about headedness.
Parsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford)
Parsing Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) S N VP V NP D N John hit the ball Levels of analysis Level Morphology/Lexical POS (morpho-synactic), WSD Elements
More informationNatural Language Processing
Natural Language Processing Info 59/259 Lecture 4: Text classification 3 (Sept 5, 207) David Bamman, UC Berkeley . https://www.forbes.com/sites/kevinmurnane/206/04/0/what-is-deep-learning-and-how-is-it-useful
More informationImproving Neural Parsing by Disentangling Model Combination and Reranking Effects. Daniel Fried*, Mitchell Stern* and Dan Klein UC Berkeley
Improving Neural Parsing by Disentangling Model Combination and Reranking Effects Daniel Fried*, Mitchell Stern* and Dan Klein UC erkeley Top-down generative models Top-down generative models S VP The
More informationLecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1
Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS 164 -- Berkley University 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient
More informationLecture Notes on Inductive Definitions
Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 August 28, 2003 These supplementary notes review the notion of an inductive definition and give
More informationThis kind of reordering is beyond the power of finite transducers, but a synchronous CFG can do this.
Chapter 12 Synchronous CFGs Synchronous context-free grammars are a generalization of CFGs that generate pairs of related strings instead of single strings. They are useful in many situations where one
More informationAlgorithms for Syntax-Aware Statistical Machine Translation
Algorithms for Syntax-Aware Statistical Machine Translation I. Dan Melamed, Wei Wang and Ben Wellington ew York University Syntax-Aware Statistical MT Statistical involves machine learning (ML) seems crucial
More informationChapter 14 (Partially) Unsupervised Parsing
Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with
More informationNeural Networks in Structured Prediction. November 17, 2015
Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving
More informationFeatures of Statistical Parsers
Features of tatistical Parsers Preliminary results Mark Johnson Brown University TTI, October 2003 Joint work with Michael Collins (MIT) upported by NF grants LI 9720368 and II0095940 1 Talk outline tatistical
More informationMarrying Dynamic Programming with Recurrent Neural Networks
Marrying Dynamic Programming with Recurrent Neural Networks I eat sushi with tuna from Japan Liang Huang Oregon State University Structured Prediction Workshop, EMNLP 2017, Copenhagen, Denmark Marrying
More informationConditional Language Modeling. Chris Dyer
Conditional Language Modeling Chris Dyer Unconditional LMs A language model assigns probabilities to sequences of words,. w =(w 1,w 2,...,w`) It is convenient to decompose this probability using the chain
More informationCS460/626 : Natural Language
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 23, 24 Parsing Algorithms; Parsing in case of Ambiguity; Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 8 th,
More informationLecture Notes on Inductive Definitions
Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 September 2, 2004 These supplementary notes review the notion of an inductive definition and
More informationThe Noisy Channel Model and Markov Models
1/24 The Noisy Channel Model and Markov Models Mark Johnson September 3, 2014 2/24 The big ideas The story so far: machine learning classifiers learn a function that maps a data item X to a label Y handle
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationMaschinelle Sprachverarbeitung
Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other
More informationLECTURER: BURCU CAN Spring
LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can
More informationDeep Sequence Models. Context Representation, Regularization, and Application to Language. Adji Bousso Dieng
Deep Sequence Models Context Representation, Regularization, and Application to Language Adji Bousso Dieng All Data Are Born Sequential Time underlies many interesting human behaviors. Elman, 1990. Why
More informationImproved Learning through Augmenting the Loss
Improved Learning through Augmenting the Loss Hakan Inan inanh@stanford.edu Khashayar Khosravi khosravi@stanford.edu Abstract We present two improvements to the well-known Recurrent Neural Network Language
More information1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,
1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u, v, x, y, z as per the pumping theorem. 3. Prove that
More informationAdministrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example
Administrivia Test I during class on 10 March. Bottom-Up Parsing Lecture 11-12 From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS14 Lecture 11 1 2/20/08 Prof. Hilfinger CS14 Lecture 11 2 Bottom-Up
More informationA fast and simple algorithm for training neural probabilistic language models
A fast and simple algorithm for training neural probabilistic language models Andriy Mnih Joint work with Yee Whye Teh Gatsby Computational Neuroscience Unit University College London 25 January 2013 1
More informationNatural Language Processing : Probabilistic Context Free Grammars. Updated 5/09
Natural Language Processing : Probabilistic Context Free Grammars Updated 5/09 Motivation N-gram models and HMM Tagging only allowed us to process sentences linearly. However, even simple sentences require
More informationProcessing/Speech, NLP and the Web
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 25 Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th March, 2011 Bracketed Structure: Treebank Corpus [ S1[
More informationRecurrent Neural Network
Recurrent Neural Network Xiaogang Wang xgwang@ee..edu.hk March 2, 2017 Xiaogang Wang (linux) Recurrent Neural Network March 2, 2017 1 / 48 Outline 1 Recurrent neural networks Recurrent neural networks
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationProbabilistic Context-free Grammars
Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationNatural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science
Natural Language Processing CS 6840 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Parsing Define a probabilistic model of syntax P(T S):
More informationParsing with Context-Free Grammars
Parsing with Context-Free Grammars CS 585, Fall 2017 Introduction to Natural Language Processing http://people.cs.umass.edu/~brenocon/inlp2017 Brendan O Connor College of Information and Computer Sciences
More informationPenn Treebank Parsing. Advanced Topics in Language Processing Stephen Clark
Penn Treebank Parsing Advanced Topics in Language Processing Stephen Clark 1 The Penn Treebank 40,000 sentences of WSJ newspaper text annotated with phrasestructure trees The trees contain some predicate-argument
More informationBringing machine learning & compositional semantics together: central concepts
Bringing machine learning & compositional semantics together: central concepts https://githubcom/cgpotts/annualreview-complearning Chris Potts Stanford Linguistics CS 244U: Natural language understanding
More informationObject Detection Grammars
Object Detection Grammars Pedro F. Felzenszwalb and David McAllester February 11, 2010 1 Introduction We formulate a general grammar model motivated by the problem of object detection in computer vision.
More informationCMSC 330: Organization of Programming Languages. Pushdown Automata Parsing
CMSC 330: Organization of Programming Languages Pushdown Automata Parsing Chomsky Hierarchy Categorization of various languages and grammars Each is strictly more restrictive than the previous First described
More informationLatent Variable Models in NLP
Latent Variable Models in NLP Aria Haghighi with Slav Petrov, John DeNero, and Dan Klein UC Berkeley, CS Division Latent Variable Models Latent Variable Models Latent Variable Models Observed Latent Variable
More informationQuiz 1, COMS Name: Good luck! 4705 Quiz 1 page 1 of 7
Quiz 1, COMS 4705 Name: 10 30 30 20 Good luck! 4705 Quiz 1 page 1 of 7 Part #1 (10 points) Question 1 (10 points) We define a PCFG where non-terminal symbols are {S,, B}, the terminal symbols are {a, b},
More informationImproving Sequence-to-Sequence Constituency Parsing
Improving Sequence-to-Sequence Constituency Parsing Lemao Liu, Muhua Zhu and Shuming Shi Tencent AI Lab, Shenzhen, China {redmondliu,muhuazhu, shumingshi}@tencent.com Abstract Sequence-to-sequence constituency
More informationPushdown Automata (Pre Lecture)
Pushdown Automata (Pre Lecture) Dr. Neil T. Dantam CSCI-561, Colorado School of Mines Fall 2017 Dantam (Mines CSCI-561) Pushdown Automata (Pre Lecture) Fall 2017 1 / 41 Outline Pushdown Automata Pushdown
More informationShift-Reduce parser E + (E + (E) E [a-z] In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.
Bottom-up Parsing Bottom-up Parsing Until now we started with the starting nonterminal S and tried to derive the input from it. In a way, this isn t the natural thing to do. It s much more logical to start
More informationDependency Parsing. Statistical NLP Fall (Non-)Projectivity. CoNLL Format. Lecture 9: Dependency Parsing
Dependency Parsing Statistical NLP Fall 2016 Lecture 9: Dependency Parsing Slav Petrov Google prep dobj ROOT nsubj pobj det PRON VERB DET NOUN ADP NOUN They solved the problem with statistics CoNLL Format
More informationMusic as a Formal Language
Music as a Formal Language Finite-State Automata and Pd Bryan Jurish moocow@ling.uni-potsdam.de Universität Potsdam, Institut für Linguistik, Potsdam, Germany pd convention 2004 / Jurish / Music as a formal
More informationOutline. CS21 Decidability and Tractability. Machine view of FA. Machine view of FA. Machine view of FA. Machine view of FA.
Outline CS21 Decidability and Tractability Lecture 5 January 16, 219 and Languages equivalence of NPDAs and CFGs non context-free languages January 16, 219 CS21 Lecture 5 1 January 16, 219 CS21 Lecture
More informationProbabilistic Graphical Models: MRFs and CRFs. CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov
Probabilistic Graphical Models: MRFs and CRFs CSE628: Natural Language Processing Guest Lecturer: Veselin Stoyanov Why PGMs? PGMs can model joint probabilities of many events. many techniques commonly
More informationNLP Homework: Dependency Parsing with Feed-Forward Neural Network
NLP Homework: Dependency Parsing with Feed-Forward Neural Network Submission Deadline: Monday Dec. 11th, 5 pm 1 Background on Dependency Parsing Dependency trees are one of the main representations used
More informationAlgorithms for NLP
Regular Expressions Chris Dyer Algorithms for NLP 11-711 Adapted from materials from Alon Lavie Goals of Today s Lecture Understand the properties of NFAs with epsilon transitions Understand concepts and
More informationEmpirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model
Empirical Methods in Natural Language Processing Lecture 10a More smoothing and the Noisy Channel Model (most slides from Sharon Goldwater; some adapted from Philipp Koehn) 5 October 2016 Nathan Schneider
More informationAspects of Tree-Based Statistical Machine Translation
Aspects of Tree-Based Statistical Machine Translation Marcello Federico Human Language Technology FBK 2014 Outline Tree-based translation models: Synchronous context free grammars Hierarchical phrase-based
More informationAspects of Tree-Based Statistical Machine Translation
Aspects of Tree-Based tatistical Machine Translation Marcello Federico (based on slides by Gabriele Musillo) Human Language Technology FBK-irst 2011 Outline Tree-based translation models: ynchronous context
More informationMathematics for linguists
1/13 Mathematics for linguists Gerhard Jäger gerhard.jaeger@uni-tuebingen.de Uni Tübingen, WS 2009/2010 November 26, 2009 2/13 The pumping lemma Let L be an infinite regular language over a finite alphabete
More informationStatistical Methods for NLP
Statistical Methods for NLP Stochastic Grammars Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(22) Structured Classification
More informationCMPT-825 Natural Language Processing. Why are parsing algorithms important?
CMPT-825 Natural Language Processing Anoop Sarkar http://www.cs.sfu.ca/ anoop October 26, 2010 1/34 Why are parsing algorithms important? A linguistic theory is implemented in a formal system to generate
More informationConditional Language modeling with attention
Conditional Language modeling with attention 2017.08.25 Oxford Deep NLP 조수현 Review Conditional language model: assign probabilities to sequence of words given some conditioning context x What is the probability
More informationDeep Learning Sequence to Sequence models: Attention Models. 17 March 2018
Deep Learning Sequence to Sequence models: Attention Models 17 March 2018 1 Sequence-to-sequence modelling Problem: E.g. A sequence X 1 X N goes in A different sequence Y 1 Y M comes out Speech recognition:
More informationWhat we have done so far
What we have done so far DFAs and regular languages NFAs and their equivalence to DFAs Regular expressions. Regular expressions capture exactly regular languages: Construct a NFA from a regular expression.
More informationParsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing
L445 / L545 / B659 Dept. of Linguistics, Indiana University Spring 2016 1 / 46 : Overview Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the
More informationParsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.
: Overview L545 Dept. of Linguistics, Indiana University Spring 2013 Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the problem as searching
More informationCS395T: Structured Models for NLP Lecture 19: Advanced NNs I
CS395T: Structured Models for NLP Lecture 19: Advanced NNs I Greg Durrett Administrivia Kyunghyun Cho (NYU) talk Friday 11am GDC 6.302 Project 3 due today! Final project out today! Proposal due in 1 week
More informationHarvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs
Harvard CS 121 and CSCI E-207 Lecture 10: CFLs: PDAs, Closure Properties, and Non-CFLs Harry Lewis October 8, 2013 Reading: Sipser, pp. 119-128. Pushdown Automata (review) Pushdown Automata = Finite automaton
More informationFoundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model
Foundations of Natural Language Processing Lecture 5 More smoothing and the Noisy Channel Model Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipop Koehn) 30 January
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 295-P, Spring 213 Prof. Erik Sudderth Lecture 11: Inference & Learning Overview, Gaussian Graphical Models Some figures courtesy Michael Jordan s draft
More informationCS395T: Structured Models for NLP Lecture 19: Advanced NNs I. Greg Durrett
CS395T: Structured Models for NLP Lecture 19: Advanced NNs I Greg Durrett Administrivia Kyunghyun Cho (NYU) talk Friday 11am GDC 6.302 Project 3 due today! Final project out today! Proposal due in 1 week
More informationSlide credit from Hung-Yi Lee & Richard Socher
Slide credit from Hung-Yi Lee & Richard Socher 1 Review Recurrent Neural Network 2 Recurrent Neural Network Idea: condition the neural network on all previous words and tie the weights at each time step
More informationX-bar theory. X-bar :
is one of the greatest contributions of generative school in the filed of knowledge system. Besides linguistics, computer science is greatly indebted to Chomsky to have propounded the theory of x-bar.
More informationArtificial Intelligence
CS344: Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 20-21 Natural Language Parsing Parsing of Sentences Are sentences flat linear structures? Why tree? Is
More informationParsing Algorithms. CS 4447/CS Stephen Watt University of Western Ontario
Parsing Algorithms CS 4447/CS 9545 -- Stephen Watt University of Western Ontario The Big Picture Develop parsers based on grammars Figure out properties of the grammars Make tables that drive parsing engines
More informationAn Efficient Context-Free Parsing Algorithm. Speakers: Morad Ankri Yaniv Elia
An Efficient Context-Free Parsing Algorithm Speakers: Morad Ankri Yaniv Elia Yaniv: Introduction Terminology Informal Explanation The Recognizer Morad: Example Time and Space Bounds Empirical results Practical
More informationProbabilistic Context-Free Grammars. Michael Collins, Columbia University
Probabilistic Context-Free Grammars Michael Collins, Columbia University Overview Probabilistic Context-Free Grammars (PCFGs) The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar
More informationCFLs and Regular Languages. CFLs and Regular Languages. CFLs and Regular Languages. Will show that all Regular Languages are CFLs. Union.
We can show that every RL is also a CFL Since a regular grammar is certainly context free. We can also show by only using Regular Expressions and Context Free Grammars That is what we will do in this half.
More informationBayesian (Nonparametric) Approaches to Language Modeling
Bayesian (Nonparametric) Approaches to Language Modeling Frank Wood unemployed (and homeless) January, 2013 Wood (University of Oxford) January, 2013 1 / 34 Message Whole Distributions As Random Variables
More informationSyntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis
Context-free grammars Derivations Parse Trees Left-recursive grammars Top-down parsing non-recursive predictive parsers construction of parse tables Bottom-up parsing shift/reduce parsers LR parsers GLR
More informationOctober 6, Equivalence of Pushdown Automata with Context-Free Gramm
Equivalence of Pushdown Automata with Context-Free Grammar October 6, 2013 Motivation Motivation CFG and PDA are equivalent in power: a CFG generates a context-free language and a PDA recognizes a context-free
More informationProf. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan
Language Processing Systems Prof. Mohamed Hamada Software Engineering La. The University of izu Japan Syntax nalysis (Parsing) 1. Uses Regular Expressions to define tokens 2. Uses Finite utomata to recognize
More informationRandom Generation of Nondeterministic Tree Automata
Random Generation of Nondeterministic Tree Automata Thomas Hanneforth 1 and Andreas Maletti 2 and Daniel Quernheim 2 1 Department of Linguistics University of Potsdam, Germany 2 Institute for Natural Language
More informationStructural Induction
Structural Induction In this lecture we ll extend the applicability of induction to many universes, ones where we can define certain kinds of objects by induction, in addition to proving their properties
More informationLearning to translate with neural networks. Michael Auli
Learning to translate with neural networks Michael Auli 1 Neural networks for text processing Similar words near each other France Spain dog cat Neural networks for text processing Similar words near each
More informationGenerative Models for Sentences
Generative Models for Sentences Amjad Almahairi PhD student August 16 th 2014 Outline 1. Motivation Language modelling Full Sentence Embeddings 2. Approach Bayesian Networks Variational Autoencoders (VAE)
More informationTHEORY OF COMPILATION
Lecture 04 Syntax analysis: top-down and bottom-up parsing THEORY OF COMPILATION EranYahav 1 You are here Compiler txt Source Lexical Analysis Syntax Analysis Parsing Semantic Analysis Inter. Rep. (IR)
More informationDecoding and Inference with Syntactic Translation Models
Decoding and Inference with Syntactic Translation Models March 5, 2013 CFGs S NP VP VP NP V V NP NP CFGs S NP VP S VP NP V V NP NP CFGs S NP VP S VP NP V NP VP V NP NP CFGs S NP VP S VP NP V NP VP V NP
More informationA Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister
A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model
More informationThe Infinite PCFG using Hierarchical Dirichlet Processes
S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise S NP VP NP PRP VP VBD NP NP DT NN PRP she VBD heard DT the NN noise
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationLecture 7. Logic. Section1: Statement Logic.
Ling 726: Mathematical Linguistics, Logic, Section : Statement Logic V. Borschev and B. Partee, October 5, 26 p. Lecture 7. Logic. Section: Statement Logic.. Statement Logic..... Goals..... Syntax of Statement
More informationChapter 4: Bottom-up Analysis 106 / 338
Syntactic Analysis Chapter 4: Bottom-up Analysis 106 / 338 Bottom-up Analysis Attention: Many grammars are not LL(k)! A reason for that is: Definition Grammar G is called left-recursive, if A + A β for
More informationLecture 21: Spectral Learning for Graphical Models
10-708: Probabilistic Graphical Models 10-708, Spring 2016 Lecture 21: Spectral Learning for Graphical Models Lecturer: Eric P. Xing Scribes: Maruan Al-Shedivat, Wei-Cheng Chang, Frederick Liu 1 Motivation
More informationNEURAL LANGUAGE MODELS
COMP90042 LECTURE 14 NEURAL LANGUAGE MODELS LANGUAGE MODELS Assign a probability to a sequence of words Framed as sliding a window over the sentence, predicting each word from finite context to left E.g.,
More informationNatural Language Processing
SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University September 27, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class
More informationSyntactic Analysis. Top-Down Parsing
Syntactic Analysis Top-Down Parsing Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in Compilers class at University of Southern California (USC) have explicit permission to make
More informationCSE 105 THEORY OF COMPUTATION
CSE 105 THEORY OF COMPUTATION Spring 2016 http://cseweb.ucsd.edu/classes/sp16/cse105-ab/ Today's learning goals Sipser Ch 2 Define push down automata Trace the computation of a push down automaton Design
More informationSoft Inference and Posterior Marginals. September 19, 2013
Soft Inference and Posterior Marginals September 19, 2013 Soft vs. Hard Inference Hard inference Give me a single solution Viterbi algorithm Maximum spanning tree (Chu-Liu-Edmonds alg.) Soft inference
More informationContext-Free Grammars (and Languages) Lecture 7
Context-Free Grammars (and Languages) Lecture 7 1 Today Beyond regular expressions: Context-Free Grammars (CFGs) What is a CFG? What is the language associated with a CFG? Creating CFGs. Reasoning about
More informationImproved TBL algorithm for learning context-free grammar
Proceedings of the International Multiconference on ISSN 1896-7094 Computer Science and Information Technology, pp. 267 274 2007 PIPS Improved TBL algorithm for learning context-free grammar Marcin Jaworski
More informationDeep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017
Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion
More informationReview. Earley Algorithm Chapter Left Recursion. Left-Recursion. Rule Ordering. Rule Ordering
Review Earley Algorithm Chapter 13.4 Lecture #9 October 2009 Top-Down vs. Bottom-Up Parsers Both generate too many useless trees Combine the two to avoid over-generation: Top-Down Parsing with Bottom-Up
More informationSTA 414/2104: Machine Learning
STA 414/2104: Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistics! rsalakhu@cs.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 9 Sequential Data So far
More informationBayesian Tools for Natural Language Learning. Yee Whye Teh Gatsby Computational Neuroscience Unit UCL
Bayesian Tools for Natural Language Learning Yee Whye Teh Gatsby Computational Neuroscience Unit UCL Bayesian Learning of Probabilistic Models Potential outcomes/observations X. Unobserved latent variables
More informationWeek 5: Logistic Regression & Neural Networks
Week 5: Logistic Regression & Neural Networks Instructor: Sergey Levine 1 Summary: Logistic Regression In the previous lecture, we covered logistic regression. To recap, logistic regression models and
More informationAdvanced Natural Language Processing Syntactic Parsing
Advanced Natural Language Processing Syntactic Parsing Alicia Ageno ageno@cs.upc.edu Universitat Politècnica de Catalunya NLP statistical parsing 1 Parsing Review Statistical Parsing SCFG Inside Algorithm
More informationIntroduction to Predicate Logic Part 1. Professor Anita Wasilewska Lecture Notes (1)
Introduction to Predicate Logic Part 1 Professor Anita Wasilewska Lecture Notes (1) Introduction Lecture Notes (1) and (2) provide an OVERVIEW of a standard intuitive formalization and introduction to
More informationNatural Language Processing
Natural Language Processing Language models Based on slides from Michael Collins, Chris Manning and Richard Soccer Plan Problem definition Trigram models Evaluation Estimation Interpolation Discounting
More information