Random Generation of Nondeterministic Tree Automata


Random Generation of Nondeterministic Tree Automata
Thomas Hanneforth¹, Andreas Maletti², and Daniel Quernheim²
¹ Department of Linguistics, University of Potsdam, Germany
² Institute for Natural Language Processing, University of Stuttgart, Germany
maletti@ims.uni-stuttgart.de
Hanoi, Vietnam (TTATT 2013)

Outline
Motivation
Nondeterministic Tree Automata
Random Generation
Analysis

Tree Substitution Grammar with Latent Variables
Experiment [SHINDO et al., ACL 2012 best paper]
F1 scores (w ≤ 40 / full):
CFG = LTL: 62.7
TSG [POST, GILDEA, 2009] = xLTL: 82.6
TSG [COHN et al., 2010] = xLTL: 85.4 / 84.7
CFGlv [COLLINS, 1999] = NTA: 88.6 / 88.2
CFGlv [PETROV, KLEIN, 2007] = NTA: 90.6 / 90.1
CFGlv [PETROV, 2010] = NTA: 91.8
TSGlv (single) = RTG: 91.6 / 91.1
TSGlv (multiple) = RTG: 92.9 / 92.4
Discriminative Parsers:
CARRERAS et al., 2008: 91.1
CHARNIAK, JOHNSON, 2005: 92.0 / 91.4
HUANG, 2008: 92.3 / 91.7


Berkeley Parser
Example parse
(S (NP (DT This)) (VP (VBZ is) (NP (DT a) (JJ silly) (NN sentence))))
from http://tomato.banatao.berkeley.edu:8080/parser/parser.html

Berkeley Parser
Example productions
S-1 → ADJP-2 S-1    0.0035453455987323125 × 10⁰
S-1 → ADJP-1 S-1    2.108608433271444 × 10⁻⁶
S-1 → VP-5 VP-3     1.6367163259885093 × 10⁻⁴
S-2 → VP-5 VP-3     9.724998692152419 × 10⁻⁸
S-1 → PP-7 VP-0     1.0686659961009547 × 10⁻⁵
S-9 → NP-3          0.012551243773149695 × 10⁰
Formalism: Berkeley parser = CFG (local tree grammar) + relabeling (+ weights)

Typical NTA Sizes
English BERKELEY parser grammar (1,133 states and 4,267,277 transitions): 153 MB
English EGRET parser grammar: 107 MB
Chinese EGRET parser grammar: 98 MB
EGRET = HUI ZHANG's C++ reimplementation of the BERKELEY parser (Java)

Algorithm testing
Observations
even efficient algorithms run slowly on such data
often require huge amounts of memory
impossible for inefficient algorithms
realistic, but difficult to use as test data
Testing on random NTA
straightforward to implement
straightforward to scale
but what is the significance of the results?

Outline
Motivation
Nondeterministic Tree Automata
Random Generation
Analysis

Tree automaton
Definition (THATCHER AND WRIGHT, 1965)
A tree automaton is a tuple A = (Q, Σ, I, R) with
Q: alphabet (states)
Σ: ranked alphabet (terminals)
I ⊆ Q: final states
R ⊆ Σ(Q) × Q: finite set of rules
Remark: instead of (l, q) we write l → q

Regular Tree Grammar
Example
Q = {q0, q1, q2, q3, q4, q5, q6}, Σ = {VP, S, ...}, F = {q0},
and the following rules: VP q4 S q5 q1 q3 q1 q4 q0 S q6 q2 q0


Regular Tree Grammar
Definition (Derivation semantics)
Sentential forms: ξ, ζ ∈ T_Σ(Q)
ξ ⇒_A ζ if there exist a position w ∈ pos(ξ) and a rule l → q ∈ R such that
ξ = ξ[l]_w and ζ = ξ[q]_w
Definition (Recognized tree language)
L(A) = {t ∈ T_Σ | ∃f ∈ F : t ⇒*_A f}
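The derivation semantics can equivalently be read bottom-up: a tree is recognized iff some final state can be derived at its root. The following Python sketch illustrates that reading (it is not from the slides); the nested-tuple encoding of trees and the encoding of rules as sets of tuples are assumptions that are reused in the later sketches.

```python
def reachable_states(tree, nullary, binary):
    """States the automaton can reach at the root of `tree` (bottom-up).
    Trees are tuples: ('alpha',) for a nullary symbol, ('sigma', left, right)
    for a binary one. `nullary` is a set of (alpha, q) rules and `binary`
    a set of (sigma, q1, q2, q) rules."""
    if len(tree) == 1:
        return {q for (a, q) in nullary if a == tree[0]}
    sym, left, right = tree
    left_states = reachable_states(left, nullary, binary)
    right_states = reachable_states(right, nullary, binary)
    return {q for (s, q1, q2, q) in binary
            if s == sym and q1 in left_states and q2 in right_states}

def recognizes(tree, nullary, binary, final):
    """t is in L(A) iff some final state is reachable at the root of t."""
    return bool(reachable_states(tree, nullary, binary) & final)

# tiny example: alpha -> 1, sigma(1, 1) -> 2, final state 2
nullary = {('alpha', 1)}
binary = {('sigma', 1, 1, 2)}
print(recognizes(('sigma', ('alpha',), ('alpha',)), nullary, binary, {2}))  # True
print(recognizes(('alpha',), nullary, binary, {2}))                         # False
```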

Outline
Motivation
Nondeterministic Tree Automata
Random Generation
Analysis

Previous Approaches
HÉAM et al. 2009
for deterministic tree-walking automata (and deterministic top-down tree automata)
focus on generating automata uniformly at random (for estimating average-case complexity)
generator used for evaluation of conversion from det. TWA to NTA

Previous Approaches
HUGOT et al. 2010
for tree automata with global equality constraints
focus on avoiding trivial cases (removal of unreachable states, minimum height requirement)
generator used for evaluation of emptiness checker

Our Approach
Goals
randomly generate non-trivial NTA
generator (potentially) usable for all NTA algorithms
When is an NTA non-trivial?
large number of states
large number of rules
its language contains large trees
its language has many MYHILL-NERODE congruence classes
canonical NTA has many states (canonical NTA = equivalent minimal deterministic NTA)

Our Approach
Restrictions
binary trees (every regular tree language can be encoded this way with linear overhead)
each state is final with probability .5
uniform probability for binary/nullary rules
parameters:
1. input binary ranked alphabet Σ = Σ2 ∪ Σ0
2. number n of states of the generated NTA (scaling)
3. nullary rule probability d0 (for all nullary rules)
4. binary rule probability d2 (for all binary rules)

Our Approach
Algorithm
1. Generate n states [n] = {1, ..., n}
2. Make q final with probability 0.5, for all q ∈ [n]
3. Add rule α → q with probability d0, for all α ∈ Σ0 and q ∈ [n]
4. Add rule σ(q1, q2) → q with probability d2, for all σ ∈ Σ2 and q, q1, q2 ∈ [n]
5. Reject if it is not trim
Evaluation
1. Determinize
2. Minimize
3. Number of obtained states = complexity of the original random NTA
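A minimal Python sketch of this generation procedure (not the authors' implementation) is given below; the function name and the tuple encoding of rules are assumptions, and the trimness check of step 5 is omitted and would have to be added as a separate reachability/usefulness test.

```python
import random

def random_nta(sigma2, sigma0, n, d0, d2, seed=None):
    """Generate a random NTA with states [n] = {1, ..., n}:
    each state is final with probability 0.5, every possible nullary rule
    alpha -> q is added with probability d0, and every possible binary rule
    sigma(q1, q2) -> q with probability d2.
    Step 5 of the slide (reject if not trim) is omitted in this sketch."""
    rng = random.Random(seed)
    states = set(range(1, n + 1))
    final = {q for q in states if rng.random() < 0.5}
    nullary = {(a, q) for a in sigma0 for q in states if rng.random() < d0}
    binary = {(s, q1, q2, q)
              for s in sigma2
              for q1 in states for q2 in states for q in states
              if rng.random() < d2}
    return states, final, nullary, binary
```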

Outline
Motivation
Nondeterministic Tree Automata
Random Generation
Analysis


Determinization
Definition (Power-set construction)
P(A) = (P(Q), Σ, F′, R′) with
F′ = {S ⊆ Q | S ∩ F ≠ ∅}
α → {q ∈ Q | α → q ∈ R} ∈ R′ for all α ∈ Σ0
σ(S1, S2) → {q ∈ Q | σ(q1, q2) → q ∈ R, q1 ∈ S1, q2 ∈ S2} ∈ R′ for all σ ∈ Σ2 and S1, S2 ⊆ Q
Note: this will be the guiding definition for the analytical analysis
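The following Python sketch implements the power-set construction for the rule encoding used in the generation sketch above; unlike the definition, it only builds subset states reachable from the nullary rules rather than all of P(Q). Function and variable names are assumptions for illustration.

```python
def determinize(sigma2, sigma0, nullary, binary, final):
    """Power-set construction restricted to reachable subset states.
    `nullary` is a set of (alpha, q) rules, `binary` a set of
    (sigma, q1, q2, q) rules, `final` the set of final states."""
    # index binary rules by (sigma, q1, q2) for fast lookup
    by_children = {}
    for s, q1, q2, q in binary:
        by_children.setdefault((s, q1, q2), set()).add(q)

    det_rules = {}                 # (alpha,) or (sigma, S1, S2) -> target subset
    det_states = set()
    for a in sigma0:               # alpha -> {q | alpha -> q in R}
        S = frozenset(q for (b, q) in nullary if b == a)
        det_rules[(a,)] = S
        det_states.add(S)

    changed = True
    while changed:                 # saturate binary rules over discovered subsets
        changed = False
        for s in sigma2:
            for S1 in list(det_states):
                for S2 in list(det_states):
                    key = (s, S1, S2)
                    if key in det_rules:
                        continue
                    T = frozenset(q for q1 in S1 for q2 in S2
                                  for q in by_children.get((s, q1, q2), ()))
                    det_rules[key] = T
                    if T not in det_states:
                        det_states.add(T)
                        changed = True

    det_final = {S for S in det_states if S & final}
    return det_states, det_final, det_rules
```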

Analytical Analysis
Intuition
power-set construction should create each state S ⊆ Q
given states S1, S2 selected uniformly at random, each state q ∈ Q should occur in the target of σ(S1, S2) with probability .5
(the same intuition is also used for string automata)
this intuition will create large NTA after determinization (but that they remain large after minimization is non-trivial)
we will confirm the intuition experimentally

Analytical Analysis
Theorem
If d2 = 4(1 − .5^(1/n²)) and d0 = .5, then the intuition is met.
Proof.
Let S1, S2 ⊆ Q be selected uniformly at random, and let σ ∈ Σ2 and q ∈ Q. Then
π(q ∈ σ(S1, S2)) = 1 − π(q ∉ σ(S1, S2))
= 1 − ∏_{q1, q2 ∈ Q} (1 − π(q1 ∈ S1) · π(q2 ∈ S2) · π(σ(q1, q2) → q ∈ R))
= 1 − (1 − d2/4)^(n²)
= 1 − (1 − 1 + .5^(1/n²))^(n²)
= 1 − (.5^(1/n²))^(n²)
= 1/2
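A quick Python sanity check of this formula (illustrative, not from the paper); the values it produces match the analytical predictions tabulated below.

```python
def hardest_binary_density(n):
    """Binary-rule density from the theorem: d2 = 4 * (1 - 0.5 ** (1 / n**2)).
    With d0 = 0.5, a uniformly random pair S1, S2 then puts each state q
    into the target of sigma(S1, S2) with probability 1/2."""
    return 4 * (1 - 0.5 ** (1 / n ** 2))

for n in range(2, 14):
    print(n, round(hardest_binary_density(n), 3))
# prints .636 for n=2, .297 for n=3, ..., .016 for n=13
```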

Analytical Predictions
n    d2
2    .636
3    .297
4    .170
5    .109
6    .076
7    .056
8    .043
9    .034
10   .028
11   .023
12   .019
13   .016

Empirical Evaluation
Setup
Σ = {σ^(2), α^(0)}
evaluation for random NTA with various densities d2 (at least 40 random NTA per data point d2)
logarithmic scale for d2 (enough data points on both sides of the spike)
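A sketch of such an evaluation loop follows, assuming the random_nta and determinize sketches given earlier; it omits the trimness filter and the minimization step, so it measures determinized rather than canonical sizes, and the parameter values merely mirror the setup above.

```python
def evaluate(n=8, points=20, samples=40):
    """For each density d2 on a log scale between 0.001 and 1, generate
    `samples` random NTA and report the mean number of subset states
    after determinization."""
    sigma2, sigma0 = {'sigma'}, {'alpha'}
    for i in range(points):
        d2 = 10 ** (-3 + 3 * i / (points - 1))
        sizes = []
        for _ in range(samples):
            _, final, nullary, binary = random_nta(sigma2, sigma0, n, 0.5, d2)
            det_states, _, _ = determinize(sigma2, sigma0, nullary, binary, final)
            sizes.append(len(det_states))
        print(f"d2 = {d2:.4f}   mean DTA states = {sum(sizes) / samples:.1f}")
```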

Empirical Evaluation
[Plot: mean number of states in the DTA vs. transition density d2 (log scale), for random NTA with 8 states]

Empirical Evaluation
[Plot: mean number of states in the DTA vs. transition density d2 (log scale), for random NTA with 12 states]

Empirical Evaluation
Observations
(almost perfect) log-normal distributions
we can determine the mean (empirical and analytical)
hardest instances
outside hardest instances: all trivial
only test on random NTA for the hardest density

Analytical Predictions + Empirical Evaluation
n    d2 (analytical)   d2 (empirical)   CI
2    .636              .626             [.577, .680]
3    .297              .257             [.209, .316]
4    .170              .133             [.102, .174]
5    .109              .086             [.064, .114]
6    .076              .064             [.048, .085]
7    .056              .050             [.038, .066]
8    .043              .041             [.032, .053]
9    .034              .034             [.027, .043]
10   .028              .028             [.023, .034]
11   .023              .025             [.021, .030]
12   .019              .021             [.018, .025]
13   .016              .019             [.016, .022]
CI = confidence interval; 95% confidence level

Conclusion
Use random NTA carefully!