Probabilistic Linguistics


1 Matilde Marcolli MAT1509HS: Mathematical and Computational Linguistics University of Toronto, Winter 2019, T 4-6 and W 4, BA6180

2 Bernoulli measures

finite set $A$ = alphabet; strings of arbitrary (finite) length: $A^* = \cup_m A^m$

Alphabet: letters, phonemes, lexical list of words, ...

$\Lambda_A$ = infinite strings in the alphabet $A$; cylinder sets
$$\Lambda_A(w) = \{\alpha \in \Lambda_A \,:\, \alpha_i = w_i, \ i = 1, \dots, m\}$$
with $w \in A^m$ ($\Lambda_A$ is also called the $\omega$-language $A^\omega$)

$\Lambda_A$ is a Cantor set with the topology generated by the cylinder sets

Bernoulli measure: $P = (p_a)_{a \in A}$ a probability distribution, $p_a \geq 0$ for all $a \in A$ and $\sum_{a \in A} p_a = 1$

Gives a measure $\mu_P$ on $\Lambda_A$ with
$$\mu_P(\Lambda_A(w)) = p_{w_1} \cdots p_{w_m}$$

meaning: in a word $w = w_1 \cdots w_m$ each letter $w_i \in A$ is an independent random variable drawn with probabilities $P = (p_a)_{a \in A}$
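A minimal sketch of a Bernoulli measure in code (the alphabet, probabilities, and function names are illustrative, not from the lecture): it computes the measure of a cylinder set and draws a finite prefix of an infinite string.

```python
import random

# Toy Bernoulli measure on the full shift over A = {a, b, c}:
# each letter is an independent draw with probabilities P.
P = {"a": 0.5, "b": 0.3, "c": 0.2}

def cylinder_probability(w, P):
    """mu_P(Lambda_A(w)) = p_{w_1} * ... * p_{w_m}."""
    prob = 1.0
    for letter in w:
        prob *= P[letter]
    return prob

def sample_prefix(P, m):
    """Draw the first m letters of an infinite Bernoulli string."""
    letters, weights = zip(*P.items())
    return "".join(random.choices(letters, weights=weights, k=m))

print(cylinder_probability("ab", P))  # 0.5 * 0.3 = 0.15
print(sample_prefix(P, 10))
```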

3 Markov measures

same as above: $A$ alphabet, $A^*$ finite strings, $\Lambda_A$ infinite strings

$\pi = (\pi_a)_{a \in A}$ probability distribution: $\pi_a \geq 0$ and $\sum_a \pi_a = 1$

$P = (p_{ab})_{a,b \in A}$ stochastic matrix: $p_{ab} \geq 0$ and $\sum_b p_{ab} = 1$

Perron-Frobenius eigenvector: $\pi P = \pi$

Markov measure $\mu_{\pi,P}$ on $\Lambda_A$ with
$$\mu_{\pi,P}(\Lambda_A(w)) = \pi_{w_1} \, p_{w_1 w_2} \cdots p_{w_{m-1} w_m}$$

meaning: in a word $w_1 \cdots w_m$ letters follow one another according to a Markov chain model, with probability $p_{ab}$ of having $a$ and $b$ as consecutive letters
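The same kind of sketch for a Markov measure, assuming a toy two-letter alphabet; the stationary vector below is computed by hand to satisfy $\pi P = \pi$ for this particular matrix.

```python
# Row-stochastic transition matrix P over A = {a, b}, stored as nested dicts.
Pm = {"a": {"a": 0.9, "b": 0.1},
      "b": {"a": 0.2, "b": 0.8}}
pi = {"a": 2/3, "b": 1/3}   # solves pi P = pi for this P

def markov_cylinder_probability(w, pi, Pm):
    """mu_{pi,P}(Lambda_A(w)) = pi_{w_1} p_{w_1 w_2} ... p_{w_{m-1} w_m}."""
    prob = pi[w[0]]
    for a, b in zip(w, w[1:]):
        prob *= Pm[a][b]
    return prob

print(markov_cylinder_probability("aab", pi, Pm))  # (2/3) * 0.9 * 0.1
```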

4 Example of Markov Chain

5 support of the Markov measure

support of $\mu_{\pi,P}$: the subshift $\Lambda_{A,\mathsf{A}} \subset \Lambda_A$, where $\mathsf{A} = (\mathsf{A}_{ab})$ is the 0/1 matrix with entries $\mathsf{A}_{ab} = 0$ if $p_{ab} = 0$ and $\mathsf{A}_{ab} = 1$ if $p_{ab} \neq 0$:
$$\Lambda_{A,\mathsf{A}} = \{\alpha \in \Lambda_A \,:\, \mathsf{A}_{\alpha_i \alpha_{i+1}} = 1, \ \forall i\}$$

both Bernoulli and Markov measures are invariant under the shift map
$$\sigma: \Lambda_A \to \Lambda_A, \qquad \sigma(a_1 a_2 a_3 \cdots) = a_2 a_3 a_4 \cdots$$

the shift map is a good model for many properties in the theory of dynamical systems: a widely studied class of examples

Markov's original aim was modeling natural languages:
A. A. Markov (1913), Ein Beispiel statistischer Forschung am Text "Eugen Onegin" zur Verbindung von Proben in Ketten

6 Shannon Entropy

Claude E. Shannon, Warren Weaver, The Mathematical Theory of Communication, University of Illinois Press, 1949 (reprinted 1998)

Entropy measures the uncertainty associated to the prediction of the result of an experiment (equivalently, the amount of information one gains from performing the experiment)

Shannon entropy of a Bernoulli measure:
$$S(\mu_P) = -\sum_{a \in A} p_a \log(p_a)$$

Entropy of a Markov measure (Kolmogorov-Sinai entropy):
$$S(\mu_{\pi,P}) = -\sum_{a,b \in A} \pi_a \, p_{ab} \log(p_{ab})$$
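Both formulas in a short sketch, reusing toy distributions like those above (the numbers are illustrative; natural log is used, matching the slides' notation):

```python
import math

P = {"a": 0.5, "b": 0.3, "c": 0.2}
Pm = {"a": {"a": 0.9, "b": 0.1}, "b": {"a": 0.2, "b": 0.8}}
pi = {"a": 2/3, "b": 1/3}

def bernoulli_entropy(P):
    """S(mu_P) = -sum_a p_a log(p_a); zero-probability letters contribute 0."""
    return -sum(p * math.log(p) for p in P.values() if p > 0)

def markov_entropy(pi, Pm):
    """Kolmogorov-Sinai entropy: -sum_{a,b} pi_a p_ab log(p_ab)."""
    return -sum(pi[a] * p * math.log(p)
                for a, row in Pm.items() for p in row.values() if p > 0)

print(bernoulli_entropy(P), markov_entropy(pi, Pm))
```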

7 Relative entropy and cross-entropy

Relative entropy (Kullback-Leibler divergence) of $P = (p_a)$ and $Q = (q_a)$:
$$KL(P \| Q) = \sum_{a \in A} p_a \log\!\left(\frac{p_a}{q_a}\right)$$

cross-entropy of the probabilities $P = (p_a)$ and $Q = (q_a)$:
$$S(P, Q) = S(P) + KL(P \| Q) = -\sum_{a \in A} p_a \log(q_a)$$

meaning: expected message-length per datum when a wrong distribution $Q$ is assumed while the data follow the distribution $P$
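A sketch checking the identity $S(P,Q) = S(P) + KL(P\|Q)$ numerically on two toy distributions over the same alphabet:

```python
import math

P = {"a": 0.5, "b": 0.3, "c": 0.2}
Q = {"a": 0.4, "b": 0.4, "c": 0.2}

def entropy(P):
    return -sum(p * math.log(p) for p in P.values() if p > 0)

def kl(P, Q):
    """KL(P || Q) = sum_a p_a log(p_a / q_a)."""
    return sum(p * math.log(p / Q[a]) for a, p in P.items() if p > 0)

def cross_entropy(P, Q):
    """S(P, Q) = -sum_a p_a log(q_a)."""
    return -sum(p * math.log(Q[a]) for a, p in P.items() if p > 0)

print(abs(cross_entropy(P, Q) - (entropy(P) + kl(P, Q))) < 1e-12)  # True
```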

8 Entropy and Cross-entropy

for a message $W$ (thought of as a random variable) with $V(W)$ the set of possible values:
$$S(W) = -\sum_{w \in V(W)} P(w) \log P(w) = \mathrm{Average}_{w \in V(W)}(\#\text{Bits Required for } w)$$

$P_M$ = probabilities estimated using a model; cross-entropy:
$$S(W, P_M) = -\sum_{w \in V(W)} P(w) \log P_M(w)$$

Cross-entropy gives a model evaluator

9 Entropy and Cross-entropy of languages

asymptotic limit of the per-word entropy as the length grows, for a language $L \subset A^*$ with $W_n(L) = L \cap A^n$:
$$S(L, P) = -\lim_{n \to \infty} \frac{1}{n} \sum_{w \in W_n(L)} P(w) \log P(w)$$

cross-entropy: the same with respect to a model $P_M$:
$$S(L, P, P_M) = -\lim_{n \to \infty} \frac{1}{n} \sum_{w \in W_n(L)} P(w) \log P_M(w)$$

10 Other notions of Entropy of languages

language structure functions: $s_L(m) = \# W_m(L)$ (the number of strings of length $m$ in the language)

Generating function for the language structure functions:
$$G_L(t) = \sum_m s_L(m) \, t^m$$

$\rho$ = radius of convergence of the series $G_L(t)$

Entropy: $S(L) = -\log \rho$

Example: for $L = A^*$ with $\#A = N$, and $P$ the uniform distribution, both $S(L, P) = S(L) = \log N$
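By Cauchy-Hadamard, $1/\rho = \limsup_m s_L(m)^{1/m}$, so $S(L) = -\log \rho$ is the exponential growth rate of $s_L(m)$. A sketch estimating it from the structure function (the full language $L = A^*$ is used as the worked example from the slide):

```python
import math

def structure_function_entropy(s_L, m):
    """Estimate S(L) = lim (1/m) log s_L(m) from the count at length m."""
    return math.log(s_L(m)) / m

N = 4                                  # alphabet size, L = A*
s_full = lambda m: N ** m              # s_L(m) = N^m for the full language
print(structure_function_entropy(s_full, 50))  # -> log 4 ~ 1.386
```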

11 Trigram model

can consider further dependencies between letters, beyond consecutive ones

Example trigram: how the next letter in a word depends on the previous two
$$P(w) = P(w_0) \, P(w_1 | w_0) \, P(w_2 | w_0 w_1) \cdots P(w_m | w_{m-2} w_{m-1})$$

build the model probabilities by counting frequencies of sequences $w_{j-2} w_{j-1} w_j$ of specific choices of three words over corpora of texts...

problem: the model suppresses grammatical but unlikely combinations of words, which occur infrequently (the problem of sparse data)

possible solution: smoothing out the probabilities
$$P(w_m | w_{m-2} w_{m-1}) = \lambda_1 P_f(w_m) + \lambda_2 P_f(w_m | w_{m-1}) + \lambda_3 P_f(w_m | w_{m-2} w_{m-1})$$
with $P_f$ respectively the frequencies of single word, pair, and triple
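A sketch of this linear-interpolation smoothing on a toy corpus (corpus, counts, and the $\lambda_i$ values are illustrative assumptions):

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
uni = Counter(corpus)
bi  = Counter(zip(corpus, corpus[1:]))
tri = Counter(zip(corpus, corpus[1:], corpus[2:]))
lam1, lam2, lam3 = 0.2, 0.3, 0.5          # must sum to 1

def p_smoothed(w, u, v):
    """lambda1 P_f(w) + lambda2 P_f(w|v) + lambda3 P_f(w|u v)."""
    p1 = uni[w] / len(corpus)
    p2 = bi[(v, w)] / uni[v] if uni[v] else 0.0
    p3 = tri[(u, v, w)] / bi[(u, v)] if bi[(u, v)] else 0.0
    return lam1 * p1 + lam2 * p2 + lam3 * p3

print(p_smoothed("sat", "the", "cat"))
```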

12 Hidden Markov Models

these more general types of dependencies (like the trigram) extend Markov Chains to Hidden Markov Models

first step: construct a Markov Chain for
$$P(w) = \prod_{j=0}^{m} P(w_j | w_{j-2} w_{j-1})$$
by taking states consisting of pairs of consecutive letters $ab \in A \times A$ and, for each $c \in A$, an edge from $ab$ to $bc$ with transition probability $P(c | ab)$
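A sketch of the pair-state construction; the conditional probabilities `p_tri` are an assumed input (e.g. from smoothed corpus counts as above):

```python
# Encode a trigram model as a Markov chain on pair states (a, b),
# with an edge (a, b) -> (b, c) of probability P(c | ab).
p_tri = {("the", "cat"): {"sat": 0.6, "ran": 0.4}}

def pair_chain_edges(p_tri):
    """List the edges of the pair-state Markov chain."""
    edges = []
    for (a, b), dist in p_tri.items():
        for c, p in dist.items():
            edges.append(((a, b), (b, c), p))
    return edges

for src, dst, p in pair_chain_edges(p_tri):
    print(src, "->", dst, "with prob", p)
```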

13 Example (figure)

14 second step: introduce the combination (with $\sum_i \lambda_i = 1$)
$$\lambda_1 P(w_m) + \lambda_2 P(w_m | w_{m-1}) + \lambda_3 P(w_m | w_{m-2} w_{m-1})$$
in the diagram, by replacing the edges out of every node of the Markov chain with a diagram with additional states marked by $\lambda_i$ and additional edges corresponding to all the probabilities contributing: $P(a)$, $P(a|b)$ and $P(c|ab)$, with the edges into the $\lambda_i$ states marked by the empty output symbol $\epsilon$

... the presence of $\epsilon$-transitions with no output symbol implies this is a Hidden Markov Model, with hidden states and visible states

15 Example: replacing the part of the diagram connecting node ab to nodes ba and bb

16 Shannon's series approximations to English

Random texts composed according to probability distributions that better and better approximate English texts

Step 0: random text from the English alphabet (plus a blank symbol): letters drawn with the uniform Bernoulli distribution

Step 1: random text with the Bernoulli distribution based on the frequency of letters in English

17 Step 2: random text with a Markov distribution over two consecutive letters from English frequencies (digram model)

Step 3: trigram model

18 Can also do the same on an alphabet of words instead of letters

Step 1: random text with a Bernoulli distribution based on the frequency of words in sample texts of English

Step 2: Markov distribution for consecutive words (digram)
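A sketch of the word-level Steps 1 and 2 on a toy corpus (the corpus and start word are illustrative; Shannon used English letter and word frequency tables):

```python
import random
from collections import Counter, defaultdict

words = "the cat sat on the mat and the cat ran off".split()

# Step 1: Bernoulli text from word frequencies.
freq = Counter(words)
def bernoulli_text(n):
    toks, wts = zip(*freq.items())
    return " ".join(random.choices(toks, weights=wts, k=n))

# Step 2: digram (Markov) text from consecutive-word counts.
digram = defaultdict(Counter)
for a, b in zip(words, words[1:]):
    digram[a][b] += 1

def markov_text(n, start="the"):
    out, cur = [start], start
    for _ in range(n - 1):
        if not digram[cur]:            # dead end: no observed successor
            break
        toks, wts = zip(*digram[cur].items())
        cur = random.choices(toks, weights=wts)[0]
        out.append(cur)
    return " ".join(out)

print(bernoulli_text(8))
print(markov_text(8))
```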

19 "Stone house poet Mr. Shih who ate ten lions"

Text presented at the tenth Macy Conference on Cybernetics (1953) by the linguist Yuen Ren Chao

Shannon information as a written text very different from Shannon information as a spoken text

20 for an interesting example in English, consider the word buffalo:
- buffalo! = bully! (VP)
- Buffalo buffalo = bison bully (NP VP)
- Buffalo buffalo buffalo = those bison that are from the city of Buffalo bully
- Buffalo buffalo buffalo buffalo = those bison that are from the city of Buffalo bully other bison

in fact, arbitrarily long sentences consisting solely of the word buffalo are grammatical

(Example by Thomas Tymoczko, logician and philosopher of mathematics)

21 "Bison from Buffalo, that bison from Buffalo bully, themselves bully bison from Buffalo"

Where is the information content? as a string (in an alphabet of words) it has Shannon information zero... the information is in the parse trees

22 Probabilistic Context Free Grammars

$G = (V_N, V_T, R, S, P)$

$V_N$ and $V_T$ disjoint finite sets: non-terminal and terminal symbols

$S \in V_N$ start symbol

$R$ finite rewriting system on $V_N \cup V_T$: production rules $A \to \alpha$ with $A \in V_N$ and $\alpha \in (V_N \cup V_T)^*$

Probabilities $P(A \to \alpha)$ with
$$\sum_\alpha P(A \to \alpha) = 1$$
the probabilities of the ways to expand the same non-terminal $A$ add up to one
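A minimal sketch of a PCFG as a data structure, with a check of the normalization condition; the grammar and its rule probabilities are hypothetical toy values:

```python
# Each non-terminal maps to a list of (right-hand side, probability) pairs.
pcfg = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.7), (("NP", "PP"), 0.3)],
    "VP": [(("V", "NP"), 0.6), (("VP", "PP"), 0.4)],
}

def check_normalized(pcfg, tol=1e-9):
    """Verify sum_alpha P(A -> alpha) = 1 for every non-terminal A."""
    for lhs, rules in pcfg.items():
        total = sum(p for _, p in rules)
        assert abs(total - 1.0) < tol, f"{lhs} rules sum to {total}"

check_normalized(pcfg)
```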

23 Probabilities of parse trees

$T_G = \{T\}$ family of parse trees $T$ for a context-free grammar $G$

if $G$ is probabilistic, one can assign probabilities to all the possible parse trees $T(w)$ for a given string $w$ in $L_G$:
$$P(w) = \sum_{T = T(w)} P(w, T) = \sum_{T = T(w)} P(T) \, P(w | T) = \sum_{T = T(w)} P(T)$$
the last equality because the tree includes the terminals (the labels of the leaves), so $P(w | T(w)) = 1$

Probabilities account for syntactic ambiguities of parse trees in context-free languages

24 Subtree independence assumption

a vertex $v$ in an oriented rooted planar tree $T$ spans a subset $\Omega(v)$ of the set of leaves of $T$ if $\Omega(v)$ is the set of leaves reached by an oriented path in $T$ starting at $v$

denote by $A_{kl}$ a non-terminal labeling a vertex in a parse tree $T$ that spans the subset $w_k \ldots w_l$ of the string $w = w_1 \ldots w_n$ parsed by $T = T(w)$

1. $P(A_{kl} \to w_k \ldots w_l \mid \text{anything outside of } k \leq j \leq l) = P(A_{kl} \to w_k \ldots w_l)$
2. $P(A_{kl} \to w_k \ldots w_l \mid \text{anything above } A_{kl} \text{ in the tree}) = P(A_{kl} \to w_k \ldots w_l)$

25 Example: a tree with root $A$ and children $B$, $C$, where $B$ spans $w_1 w_2 w_3$ and $C$ spans $w_4 w_5$:
$$P(T) = P(A, B, C, w_1, w_2, w_3, w_4, w_5 | A)$$
$$= P(B, C | A) \, P(w_1, w_2, w_3 | A, B, C) \, P(w_4, w_5 | A, B, C, w_1, w_2, w_3)$$
$$= P(B, C | A) \, P(w_1, w_2, w_3 | B) \, P(w_4, w_5 | C)$$
$$= P(A \to BC) \, P(B \to w_1 w_2 w_3) \, P(C \to w_4 w_5)$$

26 Sentence probabilities in PCFGs

Fact: context-free grammars can always be put in Chomsky normal form, where all the production rules are of the form
$$N \to w, \qquad N \to N_1 N_2$$
where $N, N_1, N_2$ are non-terminal and $w$ is terminal

Parse trees for a CFG in Chomsky normal form have either an internal node marked with a non-terminal $N$ and one output to a leaf with a terminal $w$, or a node with a non-terminal $N$ and two outputs with non-terminals $N_1$ and $N_2$

27 (figure)

28 assume the CFG is in Chomsky normal form

inside probabilities:
$$\beta_j(k, l) := P(w_{k,l} | N^j_{k,l})$$
probability of the string of terminals inside (the outputs of) the oriented tree with root $N^j_{k,l}$

outside probabilities:
$$\alpha_j(k, l) := P(w_{1,k-1}, N^j_{k,l}, w_{l+1,n})$$
probability of everything that is outside the tree with root $N^j_{k,l}$

29 (figure: inside and outside probabilities)

30 Recursive formula for inside probabilities

$$\beta_j(k, l) = P(w_{k,l} | N^j_{k,l}) = \sum_{p,q,m} P(w_{k,m}, N^p_{k,m}, w_{m+1,l}, N^q_{m+1,l} | N^j_{k,l})$$
$$= \sum_{p,q,m} P(N^p_{k,m}, N^q_{m+1,l} | N^j_{k,l}) \, P(w_{k,m} | N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l}) \, P(w_{m+1,l} | w_{k,m}, N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l})$$
$$= \sum_{p,q,m} P(N^p_{k,m}, N^q_{m+1,l} | N^j_{k,l}) \, P(w_{k,m} | N^p_{k,m}) \, P(w_{m+1,l} | N^q_{m+1,l})$$
$$= \sum_{p,q,m} P(N^j \to N^p N^q) \, \beta_p(k, m) \, \beta_q(m+1, l)$$
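A minimal sketch of this recursion as a CKY-style dynamic program; the toy grammar in Chomsky normal form and the sentence are illustrative assumptions, not from the slides:

```python
binary = {("S", "NP", "VP"): 1.0,    # P(N -> N1 N2)
          ("VP", "V", "NP"): 1.0}
lexical = {("NP", "John"): 0.5, ("NP", "Mary"): 0.5,  # P(N -> w)
           ("V", "saw"): 1.0}

def inside(words):
    n = len(words)
    beta = {}  # beta[(j, k, l)] = P(w_{k..l} | N^j spanning positions k..l)
    # Base case: lexical rules N -> w on single positions.
    for k, w in enumerate(words):
        for (N, term), p in lexical.items():
            if term == w:
                beta[(N, k, k)] = beta.get((N, k, k), 0.0) + p
    # Recursion: sum over rules N -> N1 N2 and split points m.
    for span in range(2, n + 1):
        for k in range(n - span + 1):
            l = k + span - 1
            for (N, N1, N2), p in binary.items():
                total = sum(p * beta.get((N1, k, m), 0.0)
                              * beta.get((N2, m + 1, l), 0.0)
                            for m in range(k, l))
                if total > 0.0:
                    beta[(N, k, l)] = beta.get((N, k, l), 0.0) + total
    return beta

beta = inside("John saw Mary".split())
print(beta.get(("S", 0, 2)))   # P(sentence) = 0.5 * 1.0 * 0.5 = 0.25
```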

31 Training Probabilistic Context-Free Grammars

simpler case of a Markov chain: consider a transition $s_i \xrightarrow{w_k} s_j$ from state $s_i$ to state $s_j$ labeled by $w_k$

given a large training corpus, count the number of times the given transition occurs: counting function $C(s_i \xrightarrow{w_k} s_j)$

model the probabilities on the frequencies obtained from these counting functions:
$$P_M(s_i \xrightarrow{w_k} s_j) = \frac{C(s_i \xrightarrow{w_k} s_j)}{\sum_{l,m} C(s_i \xrightarrow{w_m} s_l)}$$

a similar procedure exists for Hidden Markov Models
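A sketch of this count-and-normalize estimation; the corpus of labeled transitions is a toy example:

```python
from collections import Counter

# Training corpus: (state, label, next_state) triples.
transitions = [("s0", "a", "s1"), ("s0", "a", "s1"),
               ("s0", "b", "s0"), ("s1", "a", "s0")]

C = Counter(transitions)
out_of = Counter(t[0] for t in transitions)  # total transitions leaving each state

def P_M(si, wk, sj):
    """C(si -w_k-> sj) divided by the count of all transitions out of si."""
    return C[(si, wk, sj)] / out_of[si]

print(P_M("s0", "a", "s1"))   # 2/3
```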

32 in the case of Probabilistic Context Free Grammars: use a training corpus to estimate the probabilities of the production rules
$$P_M(N^i \to w^j) = \frac{C(N^i \to w^j)}{\sum_k C(N^i \to w^k)}$$

At the internal (hidden) nodes the counting function is related to the probabilities by
$$C(N^j \to N^p N^q) := \sum_{k,l,m} P(N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l} | w_{1,n}) = \frac{1}{P(w_{1,n})} \sum_{k,l,m} P(N^j_{k,l}, N^p_{k,m}, N^q_{m+1,l}, w_{1,n})$$
$$= \frac{1}{P(w_{1,n})} \sum_{k,l,m} \alpha_j(k, l) \, P(N^j \to N^p N^q) \, \beta_p(k, m) \, \beta_q(m+1, l)$$

33 Training corpora and the syntactic-semantic interface

the existence of different parse trees for the same sentence is a sign of semantic ambiguity

training a Probabilistic Context Free Grammar over a large corpus can (sometimes) resolve ambiguities by assigning different probabilities

Example: two parsings of the sentence: They are flying planes
They (are flying) planes or They are (flying planes). This type of ambiguity might not be resolved by training over a corpus

Example: the three following sentences all have different parsings, but the most likely parsing can be picked out by training over a corpus:
- Time flies like an arrow
- Fruit flies like a banana
- Time flies like a stopwatch

34 Probabilistic Tree Adjoining Grammars

$I$ = set of initial trees of the TAG; $A$ = set of auxiliary trees of the TAG

each tree has a subset of leaf nodes marked as nodes where substitution can occur

adjunction can occur at any node marked by a non-terminal (other than those marked for substitution)

$s(\tau)$ = set of substitution nodes of the tree $\tau$; $\alpha(\tau)$ = set of adjunction nodes of $\tau$

$S(\tau, \tau', \eta)$ = substitution of the tree $\tau'$ into $\tau$ at the node $\eta$; $A(\tau, \tau', \eta)$ = adjunction of the tree $\tau'$ into the tree $\tau$ at the node $\eta$; $A(\tau, \emptyset, \eta)$ = no adjunction performed at the node $\eta$

$\Omega$ = set of all substitution and adjunction events

35 Probabilistic TAG (PTAG): $(I, A, P_I, P_S, P_A)$

$P_I : I \to \mathbb{R}$, $P_S : \Omega \to \mathbb{R}$, $P_A : \Omega \to \mathbb{R}$, with
$$\sum_{\tau \in I} P_I(\tau) = 1$$
$$\sum_{\tau' \in I} P_S(S(\tau, \tau', \eta)) = 1, \qquad \eta \in s(\tau), \ \tau \in I \cup A$$
$$\sum_{\tau' \in A \cup \{\emptyset\}} P_A(A(\tau, \tau', \eta)) = 1, \qquad \eta \in \alpha(\tau), \ \tau \in I \cup A$$

$P_I(\tau)$ = probability that a derivation begins with the tree $\tau$
$P_S(S(\tau, \tau', \eta))$ = probability of substituting $\tau'$ into $\tau$ at $\eta$
$P_A(A(\tau, \tau', \eta))$ = probability of adjoining $\tau'$ to $\tau$ at $\eta$

36 Probability of a derivation in a PTAG:
$$P(\tau) = P_I(\tau_0) \prod_{i=1}^{N} P_{op_i}(op_i(\tau_{i-1}, \tau'_i, \eta_i))$$
if $\tau$ is obtained from an initial tree $\tau_0$ through a sequence of $N$ substitutions and adjunctions $op_i$

similar to probabilistic context-free grammars
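A sketch of this product formula; the tree names, node names, and probabilities below are illustrative placeholders, not an actual PTAG:

```python
P_I = {"t0": 1.0}                       # initial-tree probabilities
P_op = {("subst", "t0", "t1", "n1"): 0.4,    # P_S and P_A, keyed by operation
        ("adjoin", "t0", "t2", "n2"): 0.25}

def derivation_probability(initial, ops):
    """P(tau) = P_I(tau_0) * prod_i P_op(op_i)."""
    prob = P_I[initial]
    for op in ops:
        prob *= P_op[op]
    return prob

print(derivation_probability("t0", [("subst", "t0", "t1", "n1"),
                                    ("adjoin", "t0", "t2", "n2")]))  # 0.1
```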

37 Some References:

1. Eugene Charniak, Statistical Language Learning, MIT Press, 1993.
2. Rens Bod, Jennifer Hay, Stefanie Jannedy (Eds.), Probabilistic Linguistics, MIT Press, 2003.
3. Philip Resnik, Probabilistic tree-adjoining grammar as a framework for statistical natural language processing, Proc. of COLING-92 (1992).
