Random Generation of Nondeterministic Tree Automata

Size: px
Start display at page:

Download "Random Generation of Nondeterministic Tree Automata"

Transcription

1 Random Generation of Nondeterministic Tree Automata Thomas Hanneforth 1 and Andreas Maletti 2 and Daniel Quernheim 2 1 Department of Linguistics University of Potsdam, Germany 2 Institute for Natural Language Processing University of Stuttgart, Germany maletti@ims.uni-stuttgart.de Hanoi, Vietnam (TTATT 2013)

2 Outline Motivation Nondeterministic Tree Automata Random Generation Analysis

3 Tree Substitution Grammar with Latent Variables Experiment [SHINDO et al., ACL 2012 best paper] F1 score grammar w 40 full CFG = LTL 62.7 TSG [POST, GILDEA, 2009] = xltl 82.6 TSG [COHN et al., 2010] = xltl CFGlv [COLLINS, 1999] = NTA CFGlv [PETROV, KLEIN, 2007] = NTA CFGlv [PETROV, 2010] = NTA 91.8 TSGlv (single) = RTG TSGlv (multiple) = RTG Discriminative Parsers CARRERAS et al., CHARNIAK, JOHNSON, HUANG,

4 Tree Substitution Grammar with Latent Variables Experiment [SHINDO et al., ACL 2012 best paper] F1 score grammar w 40 full CFG = LTL 62.7 TSG [POST, GILDEA, 2009] = xltl 82.6 TSG [COHN et al., 2010] = xltl CFGlv [COLLINS, 1999] = NTA CFGlv [PETROV, KLEIN, 2007] = NTA CFGlv [PETROV, 2010] = NTA 91.8 TSGlv (single) = RTG TSGlv (multiple) = RTG Discriminative Parsers CARRERAS et al., CHARNIAK, JOHNSON, HUANG,

5 Berkeley Parser Example parse S NP VP DT VBZ NP This is DT JJ NN a silly sentence from

6 Berkeley Parser Example productions S-1 ADJP-2 S S-1 ADJP-1 S S-1 VP-5 VP S-2 VP-5 VP S-1 PP-7 VP S-9 NP Formalism Berkeley parser = CFG (local tree grammar) + relabeling (+ weights)

7 Typical NTA Sizes English BERKELEY parser grammar (1,133 states and 4,267,277 transitions) English EGRET parser grammar Chinese EGRET parser grammar 153 MB 107 MB 98 MB EGRET = HUI ZHANG s C++ reimplementation of the BERKELEY parser (Java)

8 Algorithm testing Observations even efficient algorithms run slow on such data often require huge amounts of memory impossible for inefficient algorithms

9 Algorithm testing Observations even efficient algorithms run slow on such data often require huge amounts of memory impossible for inefficient algorithms realistic, but difficult to use as test data

10 Algorithm testing Observations even efficient algorithms run slow on such data often require huge amounts of memory impossible for inefficient algorithms realistic, but difficult to use as test data Testing on random NTA straightforward to implement straightforward to scale

11 Algorithm testing Observations even efficient algorithms run slow on such data often require huge amounts of memory impossible for inefficient algorithms realistic, but difficult to use as test data Testing on random NTA straightforward to implement straightforward to scale but what is the significance of the results?

12 Outline Motivation Nondeterministic Tree Automata Random Generation Analysis

13 Tree automaton Definition (THATCHER AND WRIGHT, 1965) A tree automaton is a tuple A = (Q, Σ, I, R) with alphabet Q ranked alphabet Σ I Q finite set R Σ(Q) Q states terminals final states rules Remark Instead of (l, q) we write l q

14 Regular Tree Grammar Example Q = {q 0, q 1, q 2, q 3, q 4, q 5, q 6 } Σ = {VP, S,... } F = {q 0 } and the following rules: VP q 4 S q 5 q 1 q 3 q 1 q 4 q 0 S q 6 q 2 q 0

15 Regular Tree Grammar Definition (Derivation semantics) Sentential forms: ξ, ζ T Σ (Q) ξ A ζ if there exist position w pos(ξ) and rule l q R ξ = ξ[l] w ζ = ξ[q] w

16 Regular Tree Grammar Definition (Derivation semantics) Sentential forms: ξ, ζ T Σ (Q) ξ A ζ if there exist position w pos(ξ) and rule l q R ξ = ξ[l] w ζ = ξ[q] w Definition (Recognized tree language) L(A) = {t T Σ f F : t A f }

17 Outline Motivation Nondeterministic Tree Automata Random Generation Analysis

18 Previous Approaches HÉAM et al for deterministic tree-walking automata (and deterministic top-down tree automata)

19 Previous Approaches HÉAM et al for deterministic tree-walking automata (and deterministic top-down tree automata) focus on generating automata uniformly at random (for estimating average-case complexity)

20 Previous Approaches HÉAM et al for deterministic tree-walking automata (and deterministic top-down tree automata) focus on generating automata uniformly at random (for estimating average-case complexity) generator used for evaluation of conversion from det. TWA to NTA

21 Previous Approaches HUGOT et al for tree automata with global equality constraints

22 Previous Approaches HUGOT et al for tree automata with global equality constraints focus on avoiding trivial cases (removal of unreachable states, minimum height requirement)

23 Previous Approaches HUGOT et al for tree automata with global equality constraints focus on avoiding trivial cases (removal of unreachable states, minimum height requirement) generator used for evaluation of emptiness checker

24 Our Approach Goals randomly generate non-trivial NTA generator (potentially) usable for all NTA algorithms

25 Our Approach Goals randomly generate non-trivial NTA generator (potentially) usable for all NTA algorithms When is an NTA non-trivial? large number of states large number of rules

26 Our Approach Goals randomly generate non-trivial NTA generator (potentially) usable for all NTA algorithms When is an NTA non-trivial? large number of states large number of rules

27 Our Approach Goals randomly generate non-trivial NTA generator (potentially) usable for all NTA algorithms When is an NTA non-trivial? large number of states large number of rules its language contains large trees

28 Our Approach Goals randomly generate non-trivial NTA generator (potentially) usable for all NTA algorithms When is an NTA non-trivial? large number of states large number of rules its language contains large trees

29 Our Approach Goals randomly generate non-trivial NTA generator (potentially) usable for all NTA algorithms When is an NTA non-trivial? large number of states large number of rules its language contains large trees its language has many MYHILL-NERODE congruence classes canonical NTA has many states (canonical NTA = equivalent minimal deterministic NTA)

30 Our Approach Restrictions binary trees (all RTL can be such encoded with linear overhead)

31 Our Approach Restrictions binary trees (all RTL can be such encoded with linear overhead) each state is final with probability.5

32 Our Approach Restrictions binary trees (all RTL can be such encoded with linear overhead) each state is final with probability.5 uniform probability for binary/nullary rules

33 Our Approach Restrictions binary trees (all RTL can be such encoded with linear overhead) each state is final with probability.5 uniform probability for binary/nullary rules three parameters 1. input binary ranked alphabet Σ = Σ 2 Σ 0 2. number n of states of generated NTA scaling 3. nullary rule probability d 0 for all nullary rules 4. binary rule probability d 2 for all binary rules

34 Our Approach Algorithm 1. Generate n states [n] = {1,..., n}

35 Our Approach Algorithm 1. Generate n states [n] = {1,..., n} 2. Make q final with probability 0.5 q [n]

36 Our Approach Algorithm 1. Generate n states [n] = {1,..., n} 2. Make q final with probability 0.5 q [n] 3. Add rule α q with probability d 0 α Σ 0, q [n]

37 Our Approach Algorithm 1. Generate n states [n] = {1,..., n} 2. Make q final with probability 0.5 q [n] 3. Add rule α q with probability d 0 α Σ 0, q [n] 4. Add rule σ(q 1, q 2 ) q with probability d 2 σ Σ 2, q, q 1, q 2 [n]

38 Our Approach Algorithm 1. Generate n states [n] = {1,..., n} 2. Make q final with probability 0.5 q [n] 3. Add rule α q with probability d 0 α Σ 0, q [n] 4. Add rule σ(q 1, q 2 ) q with probability d 2 σ Σ 2, q, q 1, q 2 [n] 5. Reject if it is not trim

39 Our Approach Algorithm 1. Generate n states [n] = {1,..., n} 2. Make q final with probability 0.5 q [n] 3. Add rule α q with probability d 0 α Σ 0, q [n] 4. Add rule σ(q 1, q 2 ) q with probability d 2 σ Σ 2, q, q 1, q 2 [n] 5. Reject if it is not trim Evaluation 1. Determinize 2. Minimize 3. Number of obtained states = complexity of the original random NTA

40 Outline Motivation Nondeterministic Tree Automata Random Generation Analysis

41 Determinization Definition (Power-set construction) P(A) = (P(Q), Σ, F, R ) with F = {S Q S F } α {q Q α q R} R α Σ 0 σ(s 1, S 2 ) {q Q σ(q 1, q 2 ) q R, q 1 S 1, q 2 S 2 } R σ Σ 2, S 1, S 2 Q

42 Determinization Definition (Power-set construction) P(A) = (P(Q), Σ, F, R ) with F = {S Q S F } α {q Q α q R} R α Σ 0 σ(s 1, S 2 ) {q Q σ(q 1, q 2 ) q R, q 1 S 1, q 2 S 2 } R σ Σ 2, S 1, S 2 Q Note will be the guiding definition for the analytical analysis

43 Analytical Analysis Intuition power-set construction should create each state S Q given states S 1, S 2 selected uniformly at random, each state q Q should occur in target of σ(s 1, S 2 ) with probability.5 (the same intuition is also used for string automata) this intuition will create large NTA after determinization (but that they remain large after minimization is non-trivial) we will confirm the intuition experimentally

44 Analytical Analysis Theorem If d 2 = 4(1 n2.5) and d0 =.5, then the intuition is met. Proof. Let S 1, S 2 Q be selected uniformly at random σ Σ 2, q Q π(q σ(s 1, S 2 )) = 1 π(q / σ(s 1, S 2 )) = 1 ( ) 1 π(q 1 S 1 ) π(q 2 S 2 ) π(σ(q 1, q 2 ) q R) q 1,q 2 Q = 1 ( 1 d 2 4 ) n 2 = 1 ( n 2.5 ) n 2 = 1 ( n 2.5) n 2 = 1 2

45 Analytical Predictions n d 2 d 2 CI n d 2 d 2 CI

46 Empirical Evaluation Setup Σ = {σ (2), α (0) } evaluation for random NTA with various densities d 2 (at least 40 random NTA per data point d 2 ) logarithmic scale for d 2 (enough datapoints on both sides of the spike)

47 Empirical Evaluation mean # of states in DTA 8 states transition density

48 Empirical Evaluation states 3000 mean # of states in DTA transition density

49 Empirical Evaluation Observations (almost perfect) log-normal distributions we can determine the mean (empirical and analytical)

50 Empirical Evaluation Observations (almost perfect) log-normal distributions we can determine the mean (empirical and analytical) hardest instances

51 Empirical Evaluation Observations (almost perfect) log-normal distributions we can determine the mean (empirical and analytical) hardest instances outside hardest instances: all trivial

52 Empirical Evaluation Observations (almost perfect) log-normal distributions we can determine the mean (empirical and analytical) hardest instances outside hardest instances: all trivial only test on random NTA for hardest density

53 Analytical Predictions n d 2 d 2 CI n d 2 d 2 CI

54 Analytical Predictions + Empirical Evaluation n d 2 d 2 CI n d 2 d 2 CI

55 Analytical Predictions + Empirical Evaluation n d 2 d 2 CI n d 2 d 2 CI [.577,.680] [.032,.053] [.209,.316] [.027,.043] [.102,.174] [.023,.034] [.064,.114] [.021,.030] [.048,.085] [.018,.025] [.038,.066] [.016,.022] CI = confidence interval; 95% confidence level

56 Conclusion Use random NTA carefully!

Pushdown Automata (Pre Lecture)

Pushdown Automata (Pre Lecture) Pushdown Automata (Pre Lecture) Dr. Neil T. Dantam CSCI-561, Colorado School of Mines Fall 2017 Dantam (Mines CSCI-561) Pushdown Automata (Pre Lecture) Fall 2017 1 / 41 Outline Pushdown Automata Pushdown

More information

Applications of Tree Automata Theory Lecture VI: Back to Machine Translation

Applications of Tree Automata Theory Lecture VI: Back to Machine Translation Applications of Tree Automata Theory Lecture VI: Back to Machine Translation Andreas Maletti Institute of Computer Science Universität Leipzig, Germany on leave from: Institute for Natural Language Processing

More information

Applications of Tree Automata Theory Lecture V: Theory of Tree Transducers

Applications of Tree Automata Theory Lecture V: Theory of Tree Transducers Applications of Tree Automata Theory Lecture V: Theory of Tree Transducers Andreas Maletti Institute of Computer Science Universität Leipzig, Germany on leave from: Institute for Natural Language Processing

More information

Natural Language Processing

Natural Language Processing SFU NatLangLab Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class Simon Fraser University September 27, 2018 0 Natural Language Processing Anoop Sarkar anoopsarkar.github.io/nlp-class

More information

CPS 220 Theory of Computation Pushdown Automata (PDA)

CPS 220 Theory of Computation Pushdown Automata (PDA) CPS 220 Theory of Computation Pushdown Automata (PDA) Nondeterministic Finite Automaton with some extra memory Memory is called the stack, accessed in a very restricted way: in a First-In First-Out fashion

More information

Statistical Machine Translation of Natural Languages

Statistical Machine Translation of Natural Languages 1/26 Statistical Machine Translation of Natural Languages Heiko Vogler Technische Universität Dresden Germany Graduiertenkolleg Quantitative Logics and Automata Dresden, November, 2012 1/26 Weighted Tree

More information

Parsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford)

Parsing. Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) Parsing Based on presentations from Chris Manning s course on Statistical Parsing (Stanford) S N VP V NP D N John hit the ball Levels of analysis Level Morphology/Lexical POS (morpho-synactic), WSD Elements

More information

Probabilistic Context-free Grammars

Probabilistic Context-free Grammars Probabilistic Context-free Grammars Computational Linguistics Alexander Koller 24 November 2017 The CKY Recognizer S NP VP NP Det N VP V NP V ate NP John Det a N sandwich i = 1 2 3 4 k = 2 3 4 5 S NP John

More information

UNIT-VIII COMPUTABILITY THEORY

UNIT-VIII COMPUTABILITY THEORY CONTEXT SENSITIVE LANGUAGE UNIT-VIII COMPUTABILITY THEORY A Context Sensitive Grammar is a 4-tuple, G = (N, Σ P, S) where: N Set of non terminal symbols Σ Set of terminal symbols S Start symbol of the

More information

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed

HKN CS/ECE 374 Midterm 1 Review. Nathan Bleier and Mahir Morshed HKN CS/ECE 374 Midterm 1 Review Nathan Bleier and Mahir Morshed For the most part, all about strings! String induction (to some extent) Regular languages Regular expressions (regexps) Deterministic finite

More information

Lecture 7: Introduction to syntax-based MT

Lecture 7: Introduction to syntax-based MT Lecture 7: Introduction to syntax-based MT Andreas Maletti Statistical Machine Translation Stuttgart December 16, 2011 SMT VII A. Maletti 1 Lecture 7 Goals Overview Tree substitution grammars (tree automata)

More information

Lecture 9: Decoding. Andreas Maletti. Stuttgart January 20, Statistical Machine Translation. SMT VIII A. Maletti 1

Lecture 9: Decoding. Andreas Maletti. Stuttgart January 20, Statistical Machine Translation. SMT VIII A. Maletti 1 Lecture 9: Decoding Andreas Maletti Statistical Machine Translation Stuttgart January 20, 2012 SMT VIII A. Maletti 1 Lecture 9 Last time Synchronous grammars (tree transducers) Rule extraction Weight training

More information

Chapter 3. Regular grammars

Chapter 3. Regular grammars Chapter 3 Regular grammars 59 3.1 Introduction Other view of the concept of language: not the formalization of the notion of effective procedure, but set of words satisfying a given set of rules Origin

More information

Push-down Automata = FA + Stack

Push-down Automata = FA + Stack Push-down Automata = FA + Stack PDA Definition A push-down automaton M is a tuple M = (Q,, Γ, δ, q0, F) where Q is a finite set of states is the input alphabet (of terminal symbols, terminals) Γ is the

More information

Finite-state Machines: Theory and Applications

Finite-state Machines: Theory and Applications Finite-state Machines: Theory and Applications Unweighted Finite-state Automata Thomas Hanneforth Institut für Linguistik Universität Potsdam December 10, 2008 Thomas Hanneforth (Universität Potsdam) Finite-state

More information

Computational Models - Lecture 4

Computational Models - Lecture 4 Computational Models - Lecture 4 Regular languages: The Myhill-Nerode Theorem Context-free Grammars Chomsky Normal Form Pumping Lemma for context free languages Non context-free languages: Examples Push

More information

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016

CKY & Earley Parsing. Ling 571 Deep Processing Techniques for NLP January 13, 2016 CKY & Earley Parsing Ling 571 Deep Processing Techniques for NLP January 13, 2016 No Class Monday: Martin Luther King Jr. Day CKY Parsing: Finish the parse Recognizer à Parser Roadmap Earley parsing Motivation:

More information

Büchi Automata and their closure properties. - Ajith S and Ankit Kumar

Büchi Automata and their closure properties. - Ajith S and Ankit Kumar Büchi Automata and their closure properties - Ajith S and Ankit Kumar Motivation Conventional programs accept input, compute, output result, then terminate Reactive program : not expected to terminate

More information

Pushdown Automata (2015/11/23)

Pushdown Automata (2015/11/23) Chapter 6 Pushdown Automata (2015/11/23) Sagrada Familia, Barcelona, Spain Outline 6.0 Introduction 6.1 Definition of PDA 6.2 The Language of a PDA 6.3 Euivalence of PDA s and CFG s 6.4 Deterministic PDA

More information

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 5 : DFA minimization

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 5 : DFA minimization COMP-33 Theory of Computation Fall 27 -- Prof. Claude Crépeau Lec. 5 : DFA minimization COMP 33 Fall 27: Lectures Schedule 4. Context-free languages 5. Pushdown automata 6. Parsing 7. The pumping lemma

More information

CISC4090: Theory of Computation

CISC4090: Theory of Computation CISC4090: Theory of Computation Chapter 2 Context-Free Languages Courtesy of Prof. Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Spring, 2014 Overview In Chapter

More information

What we have done so far

What we have done so far What we have done so far DFAs and regular languages NFAs and their equivalence to DFAs Regular expressions. Regular expressions capture exactly regular languages: Construct a NFA from a regular expression.

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where

Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x where Recitation 11 Notes Context Free Grammars Definition: A grammar G = (V, T, P,S) is a context free grammar (cfg) if all productions in P have the form A x A V, and x (V T)*. Examples Problem 1. Given the

More information

Introduction to Formal Languages, Automata and Computability p.1/42

Introduction to Formal Languages, Automata and Computability p.1/42 Introduction to Formal Languages, Automata and Computability Pushdown Automata K. Krithivasan and R. Rama Introduction to Formal Languages, Automata and Computability p.1/42 Introduction We have considered

More information

Introduction to Computers & Programming

Introduction to Computers & Programming 16.070 Introduction to Computers & Programming Theory of computation: What is a computer? FSM, Automata Prof. Kristina Lundqvist Dept. of Aero/Astro, MIT Models of Computation What is a computer? If you

More information

FORMAL LANGUAGES, AUTOMATA AND COMPUTATION

FORMAL LANGUAGES, AUTOMATA AND COMPUTATION FORMAL LANGUAGES, AUTOMATA AND COMPUTATION DECIDABILITY ( LECTURE 15) SLIDES FOR 15-453 SPRING 2011 1 / 34 TURING MACHINES-SYNOPSIS The most general model of computation Computations of a TM are described

More information

Intro to Theory of Computation

Intro to Theory of Computation Intro to Theory of Computation LECTURE 7 Last time: Proving a language is not regular Pushdown automata (PDAs) Today: Context-free grammars (CFG) Equivalence of CFGs and PDAs Sofya Raskhodnikova 1/31/2016

More information

Nondeterministic Finite Automata

Nondeterministic Finite Automata Nondeterministic Finite Automata Lecture 6 Section 2.2 Robb T. Koether Hampden-Sydney College Mon, Sep 5, 2016 Robb T. Koether (Hampden-Sydney College) Nondeterministic Finite Automata Mon, Sep 5, 2016

More information

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing

Parsing with CFGs L445 / L545 / B659. Dept. of Linguistics, Indiana University Spring Parsing with CFGs. Direction of processing L445 / L545 / B659 Dept. of Linguistics, Indiana University Spring 2016 1 / 46 : Overview Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the

More information

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46.

Parsing with CFGs. Direction of processing. Top-down. Bottom-up. Left-corner parsing. Chart parsing CYK. Earley 1 / 46. : Overview L545 Dept. of Linguistics, Indiana University Spring 2013 Input: a string Output: a (single) parse tree A useful step in the process of obtaining meaning We can view the problem as searching

More information

Multiword Expression Identification with Tree Substitution Grammars

Multiword Expression Identification with Tree Substitution Grammars Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic

More information

CMP 309: Automata Theory, Computability and Formal Languages. Adapted from the work of Andrej Bogdanov

CMP 309: Automata Theory, Computability and Formal Languages. Adapted from the work of Andrej Bogdanov CMP 309: Automata Theory, Computability and Formal Languages Adapted from the work of Andrej Bogdanov Course outline Introduction to Automata Theory Finite Automata Deterministic Finite state automata

More information

DM17. Beregnelighed. Jacob Aae Mikkelsen

DM17. Beregnelighed. Jacob Aae Mikkelsen DM17 Beregnelighed Jacob Aae Mikkelsen January 12, 2007 CONTENTS Contents 1 Introduction 2 1.1 Operations with languages...................... 2 2 Finite Automata 3 2.1 Regular expressions/languages....................

More information

Outline. CS21 Decidability and Tractability. Machine view of FA. Machine view of FA. Machine view of FA. Machine view of FA.

Outline. CS21 Decidability and Tractability. Machine view of FA. Machine view of FA. Machine view of FA. Machine view of FA. Outline CS21 Decidability and Tractability Lecture 5 January 16, 219 and Languages equivalence of NPDAs and CFGs non context-free languages January 16, 219 CS21 Lecture 5 1 January 16, 219 CS21 Lecture

More information

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 10 : Context-Free Grammars COMP-330 Theory of Computation Fall 2017 -- Prof. Claude Crépeau Lec. 10 : Context-Free Grammars COMP 330 Fall 2017: Lectures Schedule 1-2. Introduction 1.5. Some basic mathematics 2-3. Deterministic finite

More information

Intro to Theory of Computation

Intro to Theory of Computation Intro to Theory of Computation 1/19/2016 LECTURE 3 Last time: DFAs and NFAs Operations on languages Today: Nondeterminism Equivalence of NFAs and DFAs Closure properties of regular languages Sofya Raskhodnikova

More information

Improving Neural Parsing by Disentangling Model Combination and Reranking Effects. Daniel Fried*, Mitchell Stern* and Dan Klein UC Berkeley

Improving Neural Parsing by Disentangling Model Combination and Reranking Effects. Daniel Fried*, Mitchell Stern* and Dan Klein UC Berkeley Improving Neural Parsing by Disentangling Model Combination and Reranking Effects Daniel Fried*, Mitchell Stern* and Dan Klein UC erkeley Top-down generative models Top-down generative models S VP The

More information

SYLLABUS. Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 3 : REGULAR EXPRESSIONS AND LANGUAGES

SYLLABUS. Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 3 : REGULAR EXPRESSIONS AND LANGUAGES Contents i SYLLABUS UNIT - I CHAPTER - 1 : AUT UTOMA OMATA Introduction to Finite Automata, Central Concepts of Automata Theory. CHAPTER - 2 : FINITE AUT UTOMA OMATA An Informal Picture of Finite Automata,

More information

Lecture Notes on Inductive Definitions

Lecture Notes on Inductive Definitions Lecture Notes on Inductive Definitions 15-312: Foundations of Programming Languages Frank Pfenning Lecture 2 September 2, 2004 These supplementary notes review the notion of an inductive definition and

More information

Theory of Computation (I) Yijia Chen Fudan University

Theory of Computation (I) Yijia Chen Fudan University Theory of Computation (I) Yijia Chen Fudan University Instructor Yijia Chen Homepage: http://basics.sjtu.edu.cn/~chen Email: yijiachen@fudan.edu.cn Textbook Introduction to the Theory of Computation Michael

More information

Accept or reject. Stack

Accept or reject. Stack Pushdown Automata CS351 Just as a DFA was equivalent to a regular expression, we have a similar analogy for the context-free grammar. A pushdown automata (PDA) is equivalent in power to contextfree grammars.

More information

INF Introduction and Regular Languages. Daniel Lupp. 18th January University of Oslo. Department of Informatics. Universitetet i Oslo

INF Introduction and Regular Languages. Daniel Lupp. 18th January University of Oslo. Department of Informatics. Universitetet i Oslo INF28 1. Introduction and Regular Languages Daniel Lupp Universitetet i Oslo 18th January 218 Department of Informatics University of Oslo INF28 Lecture :: 18th January 1 / 33 Details on the Course consists

More information

Nondeterministic Finite Automata

Nondeterministic Finite Automata Nondeterministic Finite Automata Not A DFA Does not have exactly one transition from every state on every symbol: Two transitions from q 0 on a No transition from q 1 (on either a or b) Though not a DFA,

More information

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation

Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation David Vilar, Daniel Stein, Hermann Ney IWSLT 2008, Honolulu, Hawaii 20. October 2008 Human Language Technology

More information

Composition Closure of ε-free Linear Extended Top-down Tree Transducers

Composition Closure of ε-free Linear Extended Top-down Tree Transducers Composition Closure of ε-free Linear Extended Top-down Tree Transducers Zoltán Fülöp 1, and Andreas Maletti 2, 1 Department of Foundations of Computer Science, University of Szeged Árpád tér 2, H-6720

More information

Probabilistic Context-Free Grammars. Michael Collins, Columbia University

Probabilistic Context-Free Grammars. Michael Collins, Columbia University Probabilistic Context-Free Grammars Michael Collins, Columbia University Overview Probabilistic Context-Free Grammars (PCFGs) The CKY Algorithm for parsing with PCFGs A Probabilistic Context-Free Grammar

More information

A Supertag-Context Model for Weakly-Supervised CCG Parser Learning

A Supertag-Context Model for Weakly-Supervised CCG Parser Learning A Supertag-Context Model for Weakly-Supervised CCG Parser Learning Dan Garrette Chris Dyer Jason Baldridge Noah A. Smith U. Washington CMU UT-Austin CMU Contributions 1. A new generative model for learning

More information

Lecture 17: Language Recognition

Lecture 17: Language Recognition Lecture 17: Language Recognition Finite State Automata Deterministic and Non-Deterministic Finite Automata Regular Expressions Push-Down Automata Turing Machines Modeling Computation When attempting to

More information

Lecture 1: Finite State Automaton

Lecture 1: Finite State Automaton Lecture 1: Finite State Automaton Instructor: Ketan Mulmuley Scriber: Yuan Li January 6, 2015 1 Deterministic Finite Automaton Informally, a deterministic finite automaton (DFA) has finite number of s-

More information

Theory of Computation (IV) Yijia Chen Fudan University

Theory of Computation (IV) Yijia Chen Fudan University Theory of Computation (IV) Yijia Chen Fudan University Review language regular context-free machine DFA/ NFA PDA syntax regular expression context-free grammar Pushdown automata Definition A pushdown automaton

More information

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0

Handout 8: Computation & Hierarchical parsing II. Compute initial state set S 0 Compute initial state set S 0 Massachusetts Institute of Technology 6.863J/9.611J, Natural Language Processing, Spring, 2001 Department of Electrical Engineering and Computer Science Department of Brain and Cognitive Sciences Handout

More information

Natural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science

Natural Language Processing CS Lecture 06. Razvan C. Bunescu School of Electrical Engineering and Computer Science Natural Language Processing CS 6840 Lecture 06 Razvan C. Bunescu School of Electrical Engineering and Computer Science bunescu@ohio.edu Statistical Parsing Define a probabilistic model of syntax P(T S):

More information

Chapter 14 (Partially) Unsupervised Parsing

Chapter 14 (Partially) Unsupervised Parsing Chapter 14 (Partially) Unsupervised Parsing The linguistically-motivated tree transformations we discussed previously are very effective, but when we move to a new language, we may have to come up with

More information

Computability and Complexity

Computability and Complexity Computability and Complexity Sequences and Automata CAS 705 Ryszard Janicki Department of Computing and Software McMaster University Hamilton, Ontario, Canada janicki@mcmaster.ca Ryszard Janicki Computability

More information

CP405 Theory of Computation

CP405 Theory of Computation CP405 Theory of Computation BB(3) q 0 q 1 q 2 0 q 1 1R q 2 0R q 2 1L 1 H1R q 1 1R q 0 1L Growing Fast BB(3) = 6 BB(4) = 13 BB(5) = 4098 BB(6) = 3.515 x 10 18267 (known) (known) (possible) (possible) Language:

More information

Computational Models - Lecture 4 1

Computational Models - Lecture 4 1 Computational Models - Lecture 4 1 Handout Mode Iftach Haitner and Yishay Mansour. Tel Aviv University. April 3/8, 2013 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice

More information

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u, 1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u, v, x, y, z as per the pumping theorem. 3. Prove that

More information

CS Pushdown Automata

CS Pushdown Automata Chap. 6 Pushdown Automata 6.1 Definition of Pushdown Automata Example 6.2 L ww R = {ww R w (0+1) * } Palindromes over {0, 1}. A cfg P 0 1 0P0 1P1. Consider a FA with a stack(= a Pushdown automaton; PDA).

More information

Finite State Transducers

Finite State Transducers Finite State Transducers Eric Gribkoff May 29, 2013 Original Slides by Thomas Hanneforth (Universitat Potsdam) Outline 1 Definition of Finite State Transducer 2 Examples of FSTs 3 Definition of Regular

More information

A* Search. 1 Dijkstra Shortest Path

A* Search. 1 Dijkstra Shortest Path A* Search Consider the eight puzzle. There are eight tiles numbered 1 through 8 on a 3 by three grid with nine locations so that one location is left empty. We can move by sliding a tile adjacent to the

More information

COM364 Automata Theory Lecture Note 2 - Nondeterminism

COM364 Automata Theory Lecture Note 2 - Nondeterminism COM364 Automata Theory Lecture Note 2 - Nondeterminism Kurtuluş Küllü March 2018 The FA we saw until now were deterministic FA (DFA) in the sense that for each state and input symbol there was exactly

More information

2017/08/29 Chapter 1.2 in Sipser Ø Announcement:

2017/08/29 Chapter 1.2 in Sipser Ø Announcement: Nondeterministic Human-aware Finite Robo.cs Automata 2017/08/29 Chapter 1.2 in Sipser Ø Announcement: q Piazza registration: http://piazza.com/asu/fall2017/cse355 q First poll will be posted on Piazza

More information

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 9 : Myhill-Nerode Theorem and applications

COMP-330 Theory of Computation. Fall Prof. Claude Crépeau. Lec. 9 : Myhill-Nerode Theorem and applications COMP-33 Theory of Computation Fall 217 -- Prof. Claude Crépeau Lec. 9 : Myhill-Nerode Theorem and applications COMP 33 Fall 212: Lectures Schedule 1-2. Introduction 1.5. Some basic mathematics 2-3. Deterministic

More information

Roger Levy Probabilistic Models in the Study of Language draft, October 2,

Roger Levy Probabilistic Models in the Study of Language draft, October 2, Roger Levy Probabilistic Models in the Study of Language draft, October 2, 2012 224 Chapter 10 Probabilistic Grammars 10.1 Outline HMMs PCFGs ptsgs and ptags Highlight: Zuidema et al., 2008, CogSci; Cohn

More information

Chapter 1. Formal Definition and View. Lecture Formal Pushdown Automata on the 28th April 2009

Chapter 1. Formal Definition and View. Lecture Formal Pushdown Automata on the 28th April 2009 Chapter 1 Formal and View Lecture on the 28th April 2009 Formal of PA Faculty of Information Technology Brno University of Technology 1.1 Aim of the Lecture 1 Define pushdown automaton in a formal way

More information

CPS 220 Theory of Computation

CPS 220 Theory of Computation CPS 22 Theory of Computation Review - Regular Languages RL - a simple class of languages that can be represented in two ways: 1 Machine description: Finite Automata are machines with a finite number of

More information

Backward and Forward Bisimulation Minimisation of Tree Automata

Backward and Forward Bisimulation Minimisation of Tree Automata Backward and Forward Bisimulation Minimisation of Tree Automata Johanna Högberg Department of Computing Science Umeå University, S 90187 Umeå, Sweden Andreas Maletti Faculty of Computer Science Technische

More information

Chapter Five: Nondeterministic Finite Automata

Chapter Five: Nondeterministic Finite Automata Chapter Five: Nondeterministic Finite Automata From DFA to NFA A DFA has exactly one transition from every state on every symbol in the alphabet. By relaxing this requirement we get a related but more

More information

LECTURER: BURCU CAN Spring

LECTURER: BURCU CAN Spring LECTURER: BURCU CAN 2017-2018 Spring Regular Language Hidden Markov Model (HMM) Context Free Language Context Sensitive Language Probabilistic Context Free Grammar (PCFG) Unrestricted Language PCFGs can

More information

Final exam study sheet for CS3719 Turing machines and decidability.

Final exam study sheet for CS3719 Turing machines and decidability. Final exam study sheet for CS3719 Turing machines and decidability. A Turing machine is a finite automaton with an infinite memory (tape). Formally, a Turing machine is a 6-tuple M = (Q, Σ, Γ, δ, q 0,

More information

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata Salil Vadhan October 4, 2012 Reading: Sipser, 2.2. Another example of a CFG (with proof) L = {x {a, b} : x has the same # of a s and

More information

Theory of Computation

Theory of Computation Thomas Zeugmann Hokkaido University Laboratory for Algorithmics http://www-alg.ist.hokudai.ac.jp/ thomas/toc/ Lecture 3: Finite State Automata Motivation In the previous lecture we learned how to formalize

More information

Context-Free Languages

Context-Free Languages CS:4330 Theory of Computation Spring 2018 Context-Free Languages Pushdown Automata Haniel Barbosa Readings for this lecture Chapter 2 of [Sipser 1996], 3rd edition. Section 2.2. Finite automaton 1 / 13

More information

Introduction to Computational Linguistics

Introduction to Computational Linguistics Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Bender (prev. years) University of Washington May 3, 2018 1 / 101 Midterm Project Milestone 2: due Friday Assgnments 4& 5 due dates

More information

Hyper-Minimization. Lossy compression of deterministic automata. Andreas Maletti. Institute for Natural Language Processing University of Stuttgart

Hyper-Minimization. Lossy compression of deterministic automata. Andreas Maletti. Institute for Natural Language Processing University of Stuttgart Hyper-Minimization Lossy compression of deterministic automata Andreas Maletti Institute for Natural Language Processing University of Stuttgart andreas.maletti@ims.uni-stuttgart.de Debrecen August 19,

More information

Computational Models - Lecture 4 1

Computational Models - Lecture 4 1 Computational Models - Lecture 4 1 Handout Mode Iftach Haitner. Tel Aviv University. November 21, 2016 1 Based on frames by Benny Chor, Tel Aviv University, modifying frames by Maurice Herlihy, Brown University.

More information

Computational Models - Lecture 3

Computational Models - Lecture 3 Slides modified by Benny Chor, based on original slides by Maurice Herlihy, Brown University. p. 1 Computational Models - Lecture 3 Equivalence of regular expressions and regular languages (lukewarm leftover

More information

Pushdown Automata: Introduction (2)

Pushdown Automata: Introduction (2) Pushdown Automata: Introduction Pushdown automaton (PDA) M = (K, Σ, Γ,, s, A) where K is a set of states Σ is an input alphabet Γ is a set of stack symbols s K is the start state A K is a set of accepting

More information

Spectral Unsupervised Parsing with Additive Tree Metrics

Spectral Unsupervised Parsing with Additive Tree Metrics Spectral Unsupervised Parsing with Additive Tree Metrics Ankur Parikh, Shay Cohen, Eric P. Xing Carnegie Mellon, University of Edinburgh Ankur Parikh 2014 1 Overview Model: We present a novel approach

More information

CISC 4090 Theory of Computation

CISC 4090 Theory of Computation 9/2/28 Stereotypical computer CISC 49 Theory of Computation Finite state machines & Regular languages Professor Daniel Leeds dleeds@fordham.edu JMH 332 Central processing unit (CPU) performs all the instructions

More information

Foundations of Informatics: a Bridging Course

Foundations of Informatics: a Bridging Course Foundations of Informatics: a Bridging Course Week 3: Formal Languages and Semantics Thomas Noll Lehrstuhl für Informatik 2 RWTH Aachen University noll@cs.rwth-aachen.de http://www.b-it-center.de/wob/en/view/class211_id948.html

More information

Finite Automata. Finite Automata

Finite Automata. Finite Automata Finite Automata Finite Automata Formal Specification of Languages Generators Grammars Context-free Regular Regular Expressions Recognizers Parsers, Push-down Automata Context Free Grammar Finite State

More information

Part I: Definitions and Properties

Part I: Definitions and Properties Turing Machines Part I: Definitions and Properties Finite State Automata Deterministic Automata (DFSA) M = {Q, Σ, δ, q 0, F} -- Σ = Symbols -- Q = States -- q 0 = Initial State -- F = Accepting States

More information

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22

Parsing. Probabilistic CFG (PCFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 22 Parsing Probabilistic CFG (PCFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 22 Table of contents 1 Introduction 2 PCFG 3 Inside and outside probability 4 Parsing Jurafsky

More information

Finite-state machines (FSMs)

Finite-state machines (FSMs) Finite-state machines (FSMs) Dr. C. Constantinides Department of Computer Science and Software Engineering Concordia University Montreal, Canada January 10, 2017 1/19 Finite-state machines (FSMs) and state

More information

CISC 4090 Theory of Computation

CISC 4090 Theory of Computation CISC 4090 Theory of Computation Context-Free Languages and Push Down Automata Professor Daniel Leeds dleeds@fordham.edu JMH 332 Languages: Regular and Beyond Regular: Captured by Regular Operations a b

More information

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET Regular Languages and FA A language is a set of strings over a finite alphabet Σ. All languages are finite or countably infinite. The set of all languages

More information

Compositions of Bottom-Up Tree Series Transformations

Compositions of Bottom-Up Tree Series Transformations Compositions of Bottom-Up Tree Series Transformations Andreas Maletti a Technische Universität Dresden Fakultät Informatik D 01062 Dresden, Germany maletti@tcs.inf.tu-dresden.de May 17, 2005 1. Motivation

More information

Statistical Methods for NLP

Statistical Methods for NLP Statistical Methods for NLP Stochastic Grammars Joakim Nivre Uppsala University Department of Linguistics and Philology joakim.nivre@lingfil.uu.se Statistical Methods for NLP 1(22) Structured Classification

More information

5 Context-Free Languages

5 Context-Free Languages CA320: COMPUTABILITY AND COMPLEXITY 1 5 Context-Free Languages 5.1 Context-Free Grammars Context-Free Grammars Context-free languages are specified with a context-free grammar (CFG). Formally, a CFG G

More information

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is

Theory of Computation p.1/?? Theory of Computation p.2/?? Unknown: Implicitly a Boolean variable: true if a word is Abstraction of Problems Data: abstracted as a word in a given alphabet. Σ: alphabet, a finite, non-empty set of symbols. Σ : all the words of finite length built up using Σ: Conditions: abstracted as a

More information

ECS 120: Theory of Computation UC Davis Phillip Rogaway February 16, Midterm Exam

ECS 120: Theory of Computation UC Davis Phillip Rogaway February 16, Midterm Exam ECS 120: Theory of Computation Handout MT UC Davis Phillip Rogaway February 16, 2012 Midterm Exam Instructions: The exam has six pages, including this cover page, printed out two-sided (no more wasted

More information

October 6, Equivalence of Pushdown Automata with Context-Free Gramm

October 6, Equivalence of Pushdown Automata with Context-Free Gramm Equivalence of Pushdown Automata with Context-Free Grammar October 6, 2013 Motivation Motivation CFG and PDA are equivalent in power: a CFG generates a context-free language and a PDA recognizes a context-free

More information

Processing/Speech, NLP and the Web

Processing/Speech, NLP and the Web CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 25 Probabilistic Parsing) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th March, 2011 Bracketed Structure: Treebank Corpus [ S1[

More information

CMPT-825 Natural Language Processing. Why are parsing algorithms important?

CMPT-825 Natural Language Processing. Why are parsing algorithms important? CMPT-825 Natural Language Processing Anoop Sarkar http://www.cs.sfu.ca/ anoop October 26, 2010 1/34 Why are parsing algorithms important? A linguistic theory is implemented in a formal system to generate

More information

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007. Proofs, Strings, and Finite Automata CS154 Chris Pollett Feb 5, 2007. Outline Proofs and Proof Strategies Strings Finding proofs Example: For every graph G, the sum of the degrees of all the nodes in G

More information

Finite-State Transducers

Finite-State Transducers Finite-State Transducers - Seminar on Natural Language Processing - Michael Pradel July 6, 2007 Finite-state transducers play an important role in natural language processing. They provide a model for

More information

Classes and conversions

Classes and conversions Classes and conversions Regular expressions Syntax: r = ε a r r r + r r Semantics: The language L r of a regular expression r is inductively defined as follows: L =, L ε = {ε}, L a = a L r r = L r L r

More information

CMPT307: Complexity Classes: P and N P Week 13-1

CMPT307: Complexity Classes: P and N P Week 13-1 CMPT307: Complexity Classes: P and N P Week 13-1 Xian Qiu Simon Fraser University xianq@sfu.ca Strings and Languages an alphabet Σ is a finite set of symbols {0, 1}, {T, F}, {a, b,..., z}, N a string x

More information