Probabilistic Linguistics
1 Matilde Marcolli MAT1509HS: Mathematical and Computational Linguistics University of Toronto, Winter 2019, T 4-6 and W 4, BA6180
2 Bernoulli measures
finite set $A$ = alphabet; strings of arbitrary (finite) length $A^\star = \bigcup_m A^m$
Alphabet: letters, phonemes, lexical list of words, ...
$\Lambda_A$ = infinite strings in the alphabet $A$; cylinder sets
$\Lambda_A(w) = \{\alpha \in \Lambda_A \,:\, \alpha_i = w_i, \ i = 1, \ldots, m\}$ with $w \in A^m$
(also called the $\omega$-language $A^\omega$)
$\Lambda_A$ is a Cantor set with the topology generated by the cylinder sets
Bernoulli measure: $P = (p_a)_{a \in A}$ a probability measure, $p_a \geq 0$ for all $a \in A$ and $\sum_{a \in A} p_a = 1$
Gives a measure $\mu_P$ on $\Lambda_A$ with $\mu_P(\Lambda_A(w)) = p_{w_1} \cdots p_{w_m}$
meaning: in a word $w = w_1 \cdots w_m$ each letter $w_i \in A$ is an independent random variable drawn with probabilities $P = (p_a)_{a \in A}$
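The Bernoulli measure of a cylinder set is just a product of letter probabilities. A minimal sketch in Python (the three-letter alphabet and its probabilities are illustrative choices, not from the slides):

```python
from functools import reduce

def bernoulli_measure(word, p):
    """mu_P(Lambda_A(w)) = p_{w_1} * ... * p_{w_m}: letters are independent draws."""
    return reduce(lambda acc, a: acc * p[a], word, 1.0)

# toy alphabet A = {a, b, c} with P = (p_a) summing to 1
p = {"a": 0.5, "b": 0.3, "c": 0.2}
print(bernoulli_measure("ab", p))  # 0.5 * 0.3
```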
3 Markov measures
same as above: $A$ alphabet, $A^\star$ finite strings, $\Lambda_A$ infinite strings
$\pi = (\pi_a)_{a \in A}$ probability distribution: $\pi_a \geq 0$ and $\sum_a \pi_a = 1$
$P = (p_{ab})_{a,b \in A}$ stochastic matrix: $p_{ab} \geq 0$ and $\sum_b p_{ab} = 1$
Perron–Frobenius eigenvector: $\pi P = \pi$
Markov measure $\mu_{\pi,P}$ on $\Lambda_A$ with $\mu_{\pi,P}(\Lambda_A(w)) = \pi_{w_1} p_{w_1 w_2} \cdots p_{w_{m-1} w_m}$
meaning: in a word $w_1 \cdots w_m$ the letters follow one another according to a Markov chain model, with probability $p_{ab}$ of having $a$ and $b$ as consecutive letters
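A sketch of the Markov measure of a cylinder set; the two-letter alphabet, the stochastic matrix, and the stationary distribution solving $\pi P = \pi$ are toy choices made up for illustration:

```python
def markov_measure(word, pi, P):
    """mu_{pi,P}(Lambda_A(w)) = pi_{w_1} * p_{w_1 w_2} * ... * p_{w_{m-1} w_m}."""
    prob = pi[word[0]]
    for a, b in zip(word, word[1:]):
        prob *= P[a][b]
    return prob

# toy stochastic matrix (rows sum to 1) and its Perron-Frobenius eigenvector
P = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.5, "b": 0.5}}
pi = {"a": 5 / 6, "b": 1 / 6}  # satisfies pi P = pi
print(markov_measure("ab", pi, P))  # (5/6) * 0.1
```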
4 Example of Markov Chain
5 support of the Markov measure $\mu_{\pi,P}$: the subset $\Lambda_{A,A} \subset \Lambda_A$ with $A = (A_{ab})$ the matrix with entries $A_{ab} = 0$ if $p_{ab} = 0$ and $A_{ab} = 1$ if $p_{ab} \neq 0$
$\Lambda_{A,A} = \{\alpha \in \Lambda_A \,:\, A_{\alpha_i \alpha_{i+1}} = 1, \ \forall i\}$
both Bernoulli and Markov measures are shift invariant
$\sigma: \Lambda_A \to \Lambda_A$, $\sigma(a_1 a_2 \cdots a_m \cdots) = a_2 a_3 \cdots a_{m+1} \cdots$
the shift map is a good model for many properties in the theory of dynamical systems: widely studied examples
Markov's original aim was modeling natural languages:
A. A. Markov (1913), Ein Beispiel statistischer Forschung am Text "Eugen Onegin" zur Verbindung von Proben in Ketten
6 Shannon Entropy
Claude E. Shannon, Warren Weaver, The Mathematical Theory of Communication, University of Illinois Press (reprinted 1998)
Entropy measures the uncertainty associated to a prediction of the result of an experiment (equivalently, the amount of information one gains from performing the experiment)
Shannon entropy of a Bernoulli measure: $S(\mu_P) = -\sum_{a \in A} p_a \log(p_a)$
Entropy of a Markov measure (Kolmogorov–Sinai entropy): $S(\mu_{\pi,P}) = -\sum_{a,b \in A} \pi_a \, p_{ab} \log(p_{ab})$
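Both entropy formulas are short to compute; a sketch using the natural logarithm (the toy distributions are illustrative, chosen so the answers are known in closed form):

```python
import math

def bernoulli_entropy(p):
    """S(mu_P) = -sum_a p_a log p_a."""
    return -sum(pa * math.log(pa) for pa in p.values() if pa > 0)

def markov_entropy(pi, P):
    """Kolmogorov-Sinai entropy S(mu_{pi,P}) = -sum_{a,b} pi_a p_ab log p_ab."""
    return -sum(pi[a] * P[a][b] * math.log(P[a][b])
                for a in P for b in P[a] if P[a][b] > 0)

uniform = {c: 0.25 for c in "abcd"}
print(bernoulli_entropy(uniform))  # log 4: uniform distribution maximizes entropy

# when every row of P equals pi, the chain is i.i.d. and the two entropies agree
P = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}
pi = {"a": 0.5, "b": 0.5}
print(markov_entropy(pi, P))  # log 2
```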
7 Relative entropy and cross-entropy
Relative entropy (Kullback–Leibler divergence) of $P = (p_a)$ and $Q = (q_a)$:
$KL(P \| Q) = \sum_{a \in A} p_a \log\left(\frac{p_a}{q_a}\right)$
cross-entropy of probabilities $P = (p_a)$ and $Q = (q_a)$:
$S(P, Q) = S(P) + KL(P \| Q) = -\sum_{a \in A} p_a \log(q_a)$
the expected message-length per datum when a wrong distribution $Q$ is assumed while the data follow the distribution $P$
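The identity $S(P,Q) = S(P) + KL(P\|Q)$ can be checked numerically; a sketch with toy distributions:

```python
import math

def entropy(P):
    return -sum(p * math.log(p) for p in P.values() if p > 0)

def kl(P, Q):
    """KL(P || Q) = sum_a p_a log(p_a / q_a); nonnegative, zero iff P = Q."""
    return sum(P[a] * math.log(P[a] / Q[a]) for a in P if P[a] > 0)

def cross_entropy(P, Q):
    """S(P, Q) = -sum_a p_a log q_a."""
    return -sum(P[a] * math.log(Q[a]) for a in P if P[a] > 0)

P = {"a": 0.5, "b": 0.5}
Q = {"a": 0.25, "b": 0.75}  # the "wrong" model distribution
print(cross_entropy(P, Q) - (entropy(P) + kl(P, Q)))  # 0 up to rounding
```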
8 Entropy and cross-entropy for a message $W$ (thought of as a random variable) with $V(W)$ the set of possible values:
$S(W) = -\sum_{w \in V(W)} P(w) \log P(w) = \mathrm{Average}_{w \in V(W)}(\#\text{Bits Required for } w)$
$P_M$ = probabilities estimated using a model
$S(W, P_M) = -\sum_{w \in V(W)} P(w) \log P_M(w)$ cross-entropy
Cross-entropy gives a model evaluator
9 Entropy and cross-entropy of languages: asymptotic limit of the per-word entropy as the length grows
$S(L, P) = -\lim_{n} \frac{1}{n} \sum_{w \in W_n(L)} P(w) \log P(w)$
for a language $L \subset A^\star$ with $W_n(L) = L \cap A^n$
cross-entropy: the same with respect to a model $P_M$
$S(L, P, P_M) = -\lim_{n} \frac{1}{n} \sum_{w \in W_n(L)} P(w) \log P_M(w)$
10 Other notions of entropy of languages
language structure functions: $s_L(m) = \# W_m(L)$ (number of strings of length $m$ in the language)
Generating function for the language structure functions: $G_L(t) = \sum_m s_L(m) \, t^m$
$\rho$ = radius of convergence of the series $G_L(t)$
Entropy: $S(L) = -\log \rho(G_L(t))$
Example: for $L = A^\star$ with $\#A = N$, and $P$ the uniform distribution, both $S(L, P) = S(L) = \log N$
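By the root test, the radius of convergence satisfies $1/\rho = \lim_m s_L(m)^{1/m}$, so the entropy is $\lim_m \frac{1}{m} \log s_L(m)$; a numerical sketch for $L = A^\star$ with $\#A = N$, where $s_L(m) = N^m$:

```python
import math

def language_entropy(s_L, m=200):
    """S(L) = -log rho ~ (1/m) log s_L(m) for large m (root test)."""
    return math.log(s_L(m)) / m

N = 4
print(language_entropy(lambda m: N ** m))  # log N, exactly, for the full language A*
```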
11 Trigram model
can consider further dependencies between letters beyond consecutive ones
Example, trigram: how the next letter in a word depends on the previous two
$P(w) = P(w_0) \, P(w_1 | w_0) \, P(w_2 | w_0 w_1) \cdots P(w_m | w_{m-2} w_{m-1})$
build the model probabilities by counting frequencies of sequences $w_{j-2} w_{j-1} w_j$ of specific choices of three words over corpora of texts
... problem: the model suppresses grammatical but unlikely combinations of words, which occur infrequently: the problem of sparse data
possible solution: smoothing out the probabilities
$P(w_m | w_{m-2} w_{m-1}) = \lambda_1 P_f(w_m) + \lambda_2 P_f(w_m | w_{m-1}) + \lambda_3 P_f(w_m | w_{m-2} w_{m-1})$
with respectively the frequencies $P_f$ of the single word, the pair, and the triple
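The interpolation step can be sketched as follows; the frequency dictionaries and the $\lambda$ weights are hand-made toy values standing in for corpus counts:

```python
def smoothed_trigram(w2, w1, w0, P1, P2, P3, lambdas=(0.1, 0.3, 0.6)):
    """P(w0 | w2 w1) = l1 Pf(w0) + l2 Pf(w0 | w1) + l3 Pf(w0 | w2 w1).
    Unseen pairs/triples contribute 0, so lower-order frequencies take over."""
    l1, l2, l3 = lambdas
    return (l1 * P1.get(w0, 0.0)
            + l2 * P2.get((w1, w0), 0.0)
            + l3 * P3.get((w2, w1, w0), 0.0))

P1 = {"planes": 0.01}                    # unigram frequency
P2 = {("flying", "planes"): 0.2}         # bigram frequency
P3 = {}                                  # the triple never occurred: sparse data
print(smoothed_trigram("are", "flying", "planes", P1, P2, P3))  # 0.1*0.01 + 0.3*0.2
```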
12 Hidden Markov Models
these more general types of dependencies (like trigrams) extend Markov chains to Hidden Markov Models
first step: construct a Markov chain for $P(w) = \prod_{j=0}^m P(w_j | w_{j-2} w_{j-1})$ by taking states consisting of pairs of consecutive letters $ab \in A \times A$, and an edge for each $c \in A$ with transition probability $P(c | ab)$ from $ab$ to $bc$
13 Example:
14 second step: introduce the combination (with $\sum_i \lambda_i = 1$)
$\lambda_1 P(w_m) + \lambda_2 P(w_m | w_{m-1}) + \lambda_3 P(w_m | w_{m-2} w_{m-1})$
in the diagram, by replacing the edges out of every node of the Markov chain with a diagram with additional states marked by $\lambda_i$, and additional edges corresponding to all the contributing probabilities $P(a)$, $P(a|b)$ and $P(c|ab)$, with the edges into the $\lambda_i$ states marked by the empty output symbol $\epsilon$
... the presence of $\epsilon$-transitions with no output symbol implies this is a Hidden Markov Model, with hidden states and visible states
15 Example: replacing the part of the diagram connecting node ab to nodes ba and bb
16 Shannon's series approximations to English
Random texts composed according to probability distributions that better and better approximate English texts
Step 0: random text from the English alphabet (plus a blank symbol): letters drawn with the uniform Bernoulli distribution
Step 1: random text with the Bernoulli distribution based on the frequency of letters in English
17 Step 2: random text with a Markov distribution over two consecutive letters from English frequencies (digram model)
Step 3: trigram model
18 Can also do the same on an alphabet of words instead of letters
Step 1: random text with a Bernoulli distribution based on the frequency of words in sample texts of English
Step 2: Markov distribution for consecutive words (digram)
19 Stone house poet Mr. Shih who ate ten lions
Text presented at the tenth Macy Conference on Cybernetics (1953) by the linguist Yuen Ren Chao
Shannon information as a written text very different from Shannon information as a spoken text
20 for an interesting example in English, consider the word buffalo
- buffalo! = bully! (VP)
- Buffalo buffalo = bison bully (NP VP)
- Buffalo buffalo buffalo = those bison that are from the city of Buffalo bully
- Buffalo buffalo buffalo buffalo = those bison that are from the city of Buffalo bully other bison
in fact, arbitrarily long sentences consisting solely of the word buffalo are grammatical
(Example by Thomas Tymoczko, logician and philosopher of mathematics)
21 Bison from Buffalo, that bison from Buffalo bully, themselves bully bison from Buffalo
Where is the information content? As a string (in an alphabet of words) it has Shannon information zero... the information is in the parse trees
22 Probabilistic Context Free Grammars
$G = (V_N, V_T, P, S, \mathbb{P})$
$V_N$ and $V_T$ disjoint finite sets: non-terminal and terminal symbols
$S \in V_N$ start symbol
$P$ finite rewriting system on $V_N \cup V_T$: production rules $A \to \alpha$ with $A \in V_N$ and $\alpha \in (V_N \cup V_T)^\star$
Probabilities $\mathbb{P}(A \to \alpha)$ with $\sum_\alpha \mathbb{P}(A \to \alpha) = 1$: the ways to expand the same non-terminal $A$ add up to probability one
23 Probabilities of parse trees
$T_G = \{T\}$ family of parse trees $T$ for a context-free grammar $G$
if $G$ is probabilistic, can assign probabilities to all the possible parse trees $T(w)$ for a given string $w$ in $L_G$:
$P(w) = \sum_{T = T(w)} P(w, T) = \sum_{T = T(w)} P(T) \, P(w | T) = \sum_{T = T(w)} P(T)$
the last equality because the tree includes the terminals (labels of the leaves), so $P(w | T(w)) = 1$
Probabilities account for the syntactic ambiguities of parse trees in context-free languages
24 Subtree independence assumption
a vertex $v$ in an oriented rooted planar tree $T$ spans a subset $\Omega(v)$ of the set of leaves of $T$ if $\Omega(v)$ is the set of leaves reached by an oriented path in $T$ starting at $v$
denote by $A_{kl}$ a non-terminal labeling a vertex in a parse tree $T$ that spans the subset $w_k \ldots w_l$ of the string $w = w_1 \ldots w_n$ parsed by $T = T(w)$
1. $P(A_{kl} \to w_k \ldots w_l \mid \text{anything outside of } k \leq j \leq l) = P(A_{kl} \to w_k \ldots w_l)$
2. $P(A_{kl} \to w_k \ldots w_l \mid \text{anything above } A_{kl} \text{ in the tree}) = P(A_{kl} \to w_k \ldots w_l)$
25 Example: a tree with root $A$ and children $B$ and $C$, where $B$ spans $w_1 w_2 w_3$ and $C$ spans $w_4 w_5$
$P(T) = P(A, B, C, w_1, w_2, w_3, w_4, w_5 \mid A)$
$= P(B, C \mid A) \, P(w_1, w_2, w_3 \mid A, B, C) \, P(w_4, w_5 \mid A, B, C, w_1, w_2, w_3)$
$= P(B, C \mid A) \, P(w_1, w_2, w_3 \mid B) \, P(w_4, w_5 \mid C)$
$= P(A \to BC) \, P(B \to w_1 w_2 w_3) \, P(C \to w_4 w_5)$
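This factorization of $P(T)$ into rule probabilities is straightforward to code; a sketch where parse trees are nested tuples and the grammar probabilities are toy choices mirroring the example above:

```python
def tree_probability(tree, rule_prob):
    """P(T) = product of P(A -> alpha) over the production rules used in T.
    A tree is (label, child, ...); leaves are plain strings (terminals)."""
    if isinstance(tree, str):
        return 1.0  # a terminal leaf contributes no rule
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(label, rhs)]
    for c in children:
        p *= tree_probability(c, rule_prob)
    return p

rules = {("A", ("B", "C")): 1.0,
         ("B", ("w1", "w2", "w3")): 0.5,
         ("C", ("w4", "w5")): 0.4}
T = ("A", ("B", "w1", "w2", "w3"), ("C", "w4", "w5"))
print(tree_probability(T, rules))  # 1.0 * 0.5 * 0.4
```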
26 Sentence probabilities in PCFGs
Fact: context-free grammars can always be put in Chomsky normal form, where all the production rules are of the form $N \to w$ or $N \to N_1 N_2$, with $N, N_1, N_2$ non-terminals and $w$ a terminal
Parse trees for a CFG in Chomsky normal form have either an internal node marked with a non-terminal $N$ and one output to a leaf with a terminal $w$, or a node with a non-terminal $N$ and two outputs with non-terminals $N_1$ and $N_2$
27
28 assume the CFG is in Chomsky normal form
inside probabilities: $\beta_j(k, l) := P(w_{k,l} \mid N^j_{k,l})$, the probability of the string of terminals inside (outputs of) the oriented tree with vertex (root) $N^j_{k,l}$
outside probabilities: $\alpha_j(k, l) := P(w_{1,k-1}, N^j_{k,l}, w_{l+1,n})$, the probability of everything that is outside the tree with root $N^j_{k,l}$
29
30 Recursive formula for the inside probabilities
$\beta_j(k,l) = P(w_{k,l} \mid N^j_{k,l})$
$= \sum_{p,q,m} P(w_{k,m}, N^p_{k,m}, w_{m+1,l}, N^q_{m+1,l} \mid N^j_{k,l})$
$= \sum_{p,q,m} P(N^p_{k,m}, N^q_{m+1,l} \mid N^j_{k,l}) \, P(w_{k,m} \mid N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l}) \, P(w_{m+1,l} \mid w_{k,m}, N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l})$
$= \sum_{p,q,m} P(N^p_{k,m}, N^q_{m+1,l} \mid N^j_{k,l}) \, P(w_{k,m} \mid N^p_{k,m}) \, P(w_{m+1,l} \mid N^q_{m+1,l})$
$= \sum_{p,q,m} P(N^j \to N^p N^q) \, \beta_p(k,m) \, \beta_q(m+1,l)$
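The recursion translates directly into a CKY-style dynamic program over spans; a sketch for a grammar already in Chomsky normal form (the tiny lexicon and rule set are invented for illustration):

```python
def inside_probabilities(words, lex, rules, nonterminals):
    """beta[(N, k, l)] = P(w_{k,l} | N spans k..l), computed bottom-up.
    lex[(N, w)] = P(N -> w); rules[(N, (Np, Nq))] = P(N -> Np Nq)."""
    n = len(words)
    beta = {}
    for k in range(n):  # base case: single terminals via lexical rules
        for N in nonterminals:
            beta[(N, k, k)] = lex.get((N, words[k]), 0.0)
    for span in range(2, n + 1):  # recursion on increasing span length
        for k in range(n - span + 1):
            l = k + span - 1
            for N in nonterminals:
                beta[(N, k, l)] = sum(
                    p * beta[(Np, k, m)] * beta[(Nq, m + 1, l)]
                    for (M, (Np, Nq)), p in rules.items() if M == N
                    for m in range(k, l))
    return beta

lex = {("A", "a"): 0.7, ("B", "b"): 0.6}
rules = {("S", ("A", "B")): 1.0}
beta = inside_probabilities(["a", "b"], lex, rules, ["S", "A", "B"])
print(beta[("S", 0, 1)])  # 1.0 * 0.7 * 0.6
```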
31 Training Probabilistic Context-Free Grammars
simpler case of a Markov chain: consider a transition $s_i \xrightarrow{w_k} s_j$ from state $s_i$ to state $s_j$ labeled by $w_k$
given a large training corpus, count the number of times the given transition occurs: counting function $C(s_i \xrightarrow{w_k} s_j)$
model the probabilities on the frequencies obtained from these counting functions:
$P_M(s_i \xrightarrow{w_k} s_j) = \dfrac{C(s_i \xrightarrow{w_k} s_j)}{\sum_{l,m} C(s_i \xrightarrow{w_m} s_l)}$
a similar procedure exists for Hidden Markov Models
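The counting estimator for the Markov-chain case can be sketched as follows; the tiny training "corpus" of transition triples is invented for illustration:

```python
from collections import Counter

def estimate_transitions(observed):
    """P_M(s_i -w-> s_j) = C(s_i -w-> s_j) / sum_{l,m} C(s_i -w_m-> s_l).
    observed: list of (state_from, label, state_to) triples from a corpus."""
    counts = Counter(observed)
    out_of = Counter(s for (s, _, _) in observed)  # total transitions leaving each state
    return {t: c / out_of[t[0]] for t, c in counts.items()}

corpus = [("A", "x", "B"), ("A", "x", "B"), ("A", "y", "C"), ("B", "z", "A")]
P_M = estimate_transitions(corpus)
print(P_M[("A", "x", "B")])  # 2 of the 3 transitions out of A
```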
32 in the case of Probabilistic Context Free Grammars: use a training corpus to estimate the probabilities of the production rules
$P_M(N^i \to w^j) = \dfrac{C(N^i \to w^j)}{\sum_k C(N^i \to w^k)}$
At the internal (hidden) nodes the counting function is related to the probabilities by
$C(N^j \to N^p N^q) := \sum_{k,l,m} P(N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l} \mid w_{1,n})$
$= \dfrac{1}{P(w_{1,n})} \sum_{k,l,m} P(N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l}, w_{1,n})$
$= \dfrac{1}{P(w_{1,n})} \sum_{k,l,m} \alpha_j(k,l) \, P(N^j \to N^p N^q) \, \beta_p(k,m) \, \beta_q(m+1,l)$
33 Training corpora and the syntactic-semantic interface
the existence of different parse trees for the same sentence is a sign of semantic ambiguity
training a Probabilistic Context Free Grammar over a large corpus can (sometimes) resolve ambiguities by assigning different probabilities
Example: two parsings of the sentence "They are flying planes": They (are flying) planes or They are (flying planes). This type of ambiguity might not be resolved by training over a corpus
Example: the three following sentences all have different parsings, but the most likely parsing can be picked out by training over a corpus
- Time flies like an arrow
- Fruit flies like a banana
- Time flies like a stopwatch
34 Probabilistic Tree Adjoining Grammars
$I$ = set of initial trees of the TAG; $A$ = set of auxiliary trees of the TAG
each tree has a subset of leaf nodes marked as nodes where substitution can occur
adjunction can occur at any node marked by a non-terminal (other than those marked for substitution)
$s(\tau)$ = set of substitution nodes of the tree $\tau$; $\alpha(\tau)$ = set of adjunction nodes of $\tau$
$S(\tau, \tau', \eta)$ = substitution of the tree $\tau'$ into $\tau$ at the node $\eta$; $A(\tau, \tau', \eta)$ = adjunction of the tree $\tau'$ into the tree $\tau$ at the node $\eta$; $A(\tau, \emptyset, \eta)$ = no adjunction performed at the node $\eta$
$\Omega$ = set of all substitution and adjunction events
35 Probabilistic TAG (PTAG): $(I, A, P_I, P_S, P_A)$ with
$P_I : I \to \mathbb{R}$, $P_S : \Omega \to \mathbb{R}$, $P_A : \Omega \to \mathbb{R}$
$\sum_{\tau \in I} P_I(\tau) = 1$
$\sum_{\tau' \in I} P_S(S(\tau, \tau', \eta)) = 1$, for all $\eta \in s(\tau)$, $\tau \in I \cup A$
$\sum_{\tau' \in A \cup \{\emptyset\}} P_A(A(\tau, \tau', \eta)) = 1$, for all $\eta \in \alpha(\tau)$, $\tau \in I \cup A$
$P_I(\tau)$ = probability that a derivation begins with the tree $\tau$
$P_S(S(\tau, \tau', \eta))$ = probability of substituting $\tau'$ into $\tau$ at $\eta$
$P_A(A(\tau, \tau', \eta))$ = probability of adjoining $\tau'$ to $\tau$ at $\eta$
36 Probability of a derivation in a PTAG:
$P(\tau) = P_I(\tau_0) \prod_{i=1}^N P_{op_i}(op_i(\tau_{i-1}, \tau'_i, \eta_i))$
if $\tau$ is obtained from an initial tree $\tau_0$ through a sequence of $N$ substitutions and adjunctions $op_i$
similar to probabilistic context-free grammars
37 Some References:
1. Eugene Charniak, Statistical Language Learning, MIT Press, 1993
2. Rens Bod, Jennifer Hay, Stefanie Jannedy (Eds.), Probabilistic Linguistics, MIT Press, 2003
3. Philip Resnik, Probabilistic tree-adjoining grammar as a framework for statistical natural language processing, Proc. of COLING-92 (1992)
More informationAutomata Theory (2A) Young Won Lim 5/31/18
Automata Theory (2A) Copyright (c) 2018 Young W. Lim. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later
More informationA non-turing-recognizable language
CS 360: Introduction to the Theory of Computing John Watrous, University of Waterloo A non-turing-recognizable language 1 OVERVIEW Thus far in the course we have seen many examples of decidable languages
More informationInformation Theory. Week 4 Compressing streams. Iain Murray,
Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 4 Compressing streams Iain Murray, 2014 School of Informatics, University of Edinburgh Jensen s inequality For convex functions: E[f(x)]
More informationd(ν) = max{n N : ν dmn p n } N. p d(ν) (ν) = ρ.
1. Trees; context free grammars. 1.1. Trees. Definition 1.1. By a tree we mean an ordered triple T = (N, ρ, p) (i) N is a finite set; (ii) ρ N ; (iii) p : N {ρ} N ; (iv) if n N + and ν dmn p n then p n
More informationThe effect of non-tightness on Bayesian estimation of PCFGs
The effect of non-tightness on Bayesian estimation of PCFGs hay B. Cohen Department of Computer cience Columbia University scohen@cs.columbia.edu Mark Johnson Department of Computing Macquarie University
More informationProbabilistic Language Modeling
Predicting String Probabilities Probabilistic Language Modeling Which string is more likely? (Which string is more grammatical?) Grill doctoral candidates. Regina Barzilay EECS Department MIT November
More informationContext-Free Grammars. 2IT70 Finite Automata and Process Theory
Context-Free Grammars 2IT70 Finite Automata and Process Theory Technische Universiteit Eindhoven May 18, 2016 Generating strings language L 1 = {a n b n n > 0} ab L 1 if w L 1 then awb L 1 production rules
More informationBasic Text Analysis. Hidden Markov Models. Joakim Nivre. Uppsala University Department of Linguistics and Philology
Basic Text Analysis Hidden Markov Models Joakim Nivre Uppsala University Department of Linguistics and Philology joakimnivre@lingfiluuse Basic Text Analysis 1(33) Hidden Markov Models Markov models are
More informationTHE MEASURE-THEORETIC ENTROPY OF LINEAR CELLULAR AUTOMATA WITH RESPECT TO A MARKOV MEASURE. 1. Introduction
THE MEASURE-THEORETIC ENTROPY OF LINEAR CELLULAR AUTOMATA WITH RESPECT TO A MARKOV MEASURE HASAN AKIN Abstract. The purpose of this short paper is to compute the measure-theoretic entropy of the onedimensional
More informationFinite Presentations of Pregroups and the Identity Problem
6 Finite Presentations of Pregroups and the Identity Problem Alexa H. Mater and James D. Fix Abstract We consider finitely generated pregroups, and describe how an appropriately defined rewrite relation
More informationAttendee information. Seven Lectures on Statistical Parsing. Phrase structure grammars = context-free grammars. Assessment.
even Lectures on tatistical Parsing Christopher Manning LA Linguistic Institute 7 LA Lecture Attendee information Please put on a piece of paper: ame: Affiliation: tatus (undergrad, grad, industry, prof,
More informationFormal Languages, Grammars and Automata Lecture 5
Formal Languages, Grammars and Automata Lecture 5 Helle Hvid Hansen helle@cs.ru.nl http://www.cs.ru.nl/~helle/ Foundations Group Intelligent Systems Section Institute for Computing and Information Sciences
More informationIBM Model 1 for Machine Translation
IBM Model 1 for Machine Translation Micha Elsner March 28, 2014 2 Machine translation A key area of computational linguistics Bar-Hillel points out that human-like translation requires understanding of
More informationDefinition: A binary relation R from a set A to a set B is a subset R A B. Example:
Chapter 9 1 Binary Relations Definition: A binary relation R from a set A to a set B is a subset R A B. Example: Let A = {0,1,2} and B = {a,b} {(0, a), (0, b), (1,a), (2, b)} is a relation from A to B.
More informationGeometry of Phylogenetic Inference
Geometry of Phylogenetic Inference Matilde Marcolli CS101: Mathematical and Computational Linguistics Winter 2015 References N. Eriksson, K. Ranestad, B. Sturmfels, S. Sullivant, Phylogenetic algebraic
More informationChapter 6. Properties of Regular Languages
Chapter 6 Properties of Regular Languages Regular Sets and Languages Claim(1). The family of languages accepted by FSAs consists of precisely the regular sets over a given alphabet. Every regular set is
More informationIn this chapter, we explore the parsing problem, which encompasses several questions, including:
Chapter 12 Parsing Algorithms 12.1 Introduction In this chapter, we explore the parsing problem, which encompasses several questions, including: Does L(G) contain w? What is the highest-weight derivation
More informationCompiling Techniques
Lecture 5: Top-Down Parsing 6 October 2015 The Parser Context-Free Grammar (CFG) Lexer Source code Scanner char Tokeniser token Parser AST Semantic Analyser AST IR Generator IR Errors Checks the stream
More informationChapter 14 (Partially) Unsupervised Parsing
Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with
More informationNon-context-Free Languages. CS215, Lecture 5 c
Non-context-Free Languages CS215, Lecture 5 c 2007 1 The Pumping Lemma Theorem. (Pumping Lemma) Let be context-free. There exists a positive integer divided into five pieces, Proof for for each, and..
More informationContext-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing
Context-Free Parsing: CKY & Earley Algorithms and Probabilistic Parsing Natural Language Processing CS 4120/6120 Spring 2017 Northeastern University David Smith with some slides from Jason Eisner & Andrew
More informationTo make a grammar probabilistic, we need to assign a probability to each context-free rewrite
Notes on the Inside-Outside Algorithm To make a grammar probabilistic, we need to assign a probability to each context-free rewrite rule. But how should these probabilities be chosen? It is natural to
More informationQuiz 1, COMS Name: Good luck! 4705 Quiz 1 page 1 of 7
Quiz 1, COMS 4705 Name: 10 30 30 20 Good luck! 4705 Quiz 1 page 1 of 7 Part #1 (10 points) Question 1 (10 points) We define a PCFG where non-terminal symbols are {S,, B}, the terminal symbols are {a, b},
More informationACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging
ACS Introduction to NLP Lecture 2: Part of Speech (POS) Tagging Stephen Clark Natural Language and Information Processing (NLIP) Group sc609@cam.ac.uk The POS Tagging Problem 2 England NNP s POS fencers
More informationFORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY
15-453 FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY REVIEW for MIDTERM 1 THURSDAY Feb 6 Midterm 1 will cover everything we have seen so far The PROBLEMS will be from Sipser, Chapters 1, 2, 3 It will be
More information