CroMo - Morphological Analysis for Croatian

Size: px
Start display at page:

Download "CroMo - Morphological Analysis for Croatian"

Transcription

1 - Morphological Analysis for Croatian Damir Ćavar 1, Ivo-Pavao Jazbec 2 and Tomislav Stojanov 2 Linguistics Department, University of Zadar 1 Institute of Croatian Language and Linguistics 2 FSMNLP 2008

2

3 Scenario Synchronic and diachronic study of language change and acquisition models Language data from a long period of time, and three major dialects in Croatia implying: Variation wrt. e.g. string-based morphology or feature bundles Ongoing discovery wrt. string combinatorics and features Research questions require quantitative and qualitative information: of phonological, morphological, syntactic and semantic tokens and feature bundles, and their correlation and variation at various stages over time

4 Morphological segmentation and annotation and lemmatization, and... Segmenting words: isponapijali su se they got drunk a little bit to satisfaction is po napija li Annotating segments: aspect prefix aspect prefix from stem-lemma napiti plural participle Extending the annotation: to a certain saturation a little bit get drunk from root-lemma piti past event

5 FSA Architecture Mapping of morph-groups to DFSAs (Mealy or Moore machine): 0 č p š v root (-index 1 5 i t e 2 3 a 4 v root )-index 0 n p v pref (-index 1 3 a o 2 4 v pref )-index asp v pref )-index asp o 8 v suf )-index pres 1st pl m 2 v suf )-index pres 1st sg š 3 v suf )-index pres 2st sg 0 ε t j 1 4 v suf )-index pres 3rd sg e 5 v suf )-index pres 2nd pl v suf (-index 6 v suf )-index 2nd sg imper u 7 v suf )-index pres 3rd pl

6 FSA Architecture Mapping ambiguity on emission: emission tuple 1 to n Label DFSAs with variable names Use rules referring to variable names for modeling of morphotactic regularities: verbaspectprefs*. verbatiroots. verbinflsuf

7 FSA Architecture Generating potentially cyclic DFSAs: m š v suf )-index pres 1st sg o v suf )-index pres 2st sg 19 v suf )-index pres 1st p p 0 3 n ε ε o 4 ε v pref )-index asp 5 č p š v root (-index 6 8 i t e 7 9 a 10 v root )-index ε 11 ε t j v suf )-index pres 3rd sg e 16 v suf )-index pres 2nd v pref (-index 1 a 2 ε v pref )-index asp v suf (-index 17 u v suf )-index 2nd sg imper 18 v suf )-index pres 3rd

8 FSA Architecture Ambiguity mapped on emission tuple: e 4 [ v suff )-index; adj suff ) 0 g [ v root (-index; adj root (-index ] 1 o 2 r 3 [ v root )-index; adj root )-index ] [ v suff (-index; adj suff (-index ]

9 FSA Architecture Lemmatization as a rule: Rightmost root is the semantic head Root-lemma: generate canonical word-form from the right-most root neprijatelja ne + prijatelj + a NEG + N-root + ACC not friend = enemy friend not compositional! but useful for semantic field analysis! root-lemma: neprijatelja prijatelj Stem/base-lemma: generate canonical word-form from the stem without inflectional suffixes base-lemma: neprijatelja neprijatelj

10 FSA Architecture Lemmatization (Hack): emission of byte-offset for suffix-elimination pointer to suffix string Clean solution: g:(g, (FV,...)) 0 1 o:(o, ()) 2 r:(r, ()) 3 e:(a, (FV,...)) 4

11 Implementation C++ wrapper for final application Ragel code (automaton definition) generated from morpheme DBs and rules, with associated feature bundles (extended version of Ragel, ( V. 6.1) for handling ambiguity via introduction of multiple emission symbols = emission tuples) Ragel generated C code (jump-code) Morpheme tables Rules Ragel code Code Code DOT Binary

12 Implementation Emission (feature bundles): as one bit-vector Features mapped from the General Ontology for Linguistic Description (upper ontology) possibility: reasoning over linguistic concepts and features Optimization: mapping of concepts and their relations on a compressed bit-vector, maintaining inheritance and implicatures top-node concept sub-class terminal-classes

13 Hardware: dual core 2.4 GHz Lexical base: 120,000 morphemes (and allomorphs) Speed: approx. 50,000 tokens per second with average morpheme count of 2.5 per token Size: binary footprint approx. 5 MB Compilation (tables Ragel + C; Ragel C + DOT; gcc bin): approx. 5 minutes, min. 4 GB RAM for monolithic architecture

14 Interoperability issues addressed: GOLD platform independent code code-page independence Extensible (turnaround time of some minutes) Minimally invasive and minimalistic Open-source

Introduction to Computers & Programming

Introduction to Computers & Programming 16.070 Introduction to Computers & Programming Theory of computation: What is a computer? FSM, Automata Prof. Kristina Lundqvist Dept. of Aero/Astro, MIT Models of Computation What is a computer? If you

More information

Finite-state Machines: Theory and Applications

Finite-state Machines: Theory and Applications Finite-state Machines: Theory and Applications Unweighted Finite-state Automata Thomas Hanneforth Institut für Linguistik Universität Potsdam December 10, 2008 Thomas Hanneforth (Universität Potsdam) Finite-state

More information

Introduction to Formal Languages, Automata and Computability p.1/51

Introduction to Formal Languages, Automata and Computability p.1/51 Introduction to Formal Languages, Automata and Computability Finite State Automata K. Krithivasan and R. Rama Introduction to Formal Languages, Automata and Computability p.1/51 Introduction As another

More information

Part I: Definitions and Properties

Part I: Definitions and Properties Turing Machines Part I: Definitions and Properties Finite State Automata Deterministic Automata (DFSA) M = {Q, Σ, δ, q 0, F} -- Σ = Symbols -- Q = States -- q 0 = Initial State -- F = Accepting States

More information

Embedded Systems Development

Embedded Systems Development Embedded Systems Development Lecture 2 Finite Automata & SyncCharts Daniel Kästner AbsInt Angewandte Informatik GmbH kaestner@absint.com Some things I forgot to mention 2 Remember the HISPOS registration

More information

Syntax Analysis Part I

Syntax Analysis Part I 1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2013 2 Position of a Parser in the Compiler Model Source Program Lexical Analyzer

More information

Syntax Analysis Part I

Syntax Analysis Part I 1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van Engelen, Florida State University, 2007-2013 2 Position of a Parser in the Compiler Model Source Program Lexical Analyzer

More information

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example

Administrivia. Test I during class on 10 March. Bottom-Up Parsing. Lecture An Introductory Example Administrivia Test I during class on 10 March. Bottom-Up Parsing Lecture 11-12 From slides by G. Necula & R. Bodik) 2/20/08 Prof. Hilfinger CS14 Lecture 11 1 2/20/08 Prof. Hilfinger CS14 Lecture 11 2 Bottom-Up

More information

arxiv: v1 [cs.ds] 9 Apr 2018

arxiv: v1 [cs.ds] 9 Apr 2018 From Regular Expression Matching to Parsing Philip Bille Technical University of Denmark phbi@dtu.dk Inge Li Gørtz Technical University of Denmark inge@dtu.dk arxiv:1804.02906v1 [cs.ds] 9 Apr 2018 Abstract

More information

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4

Syntax Analysis Part I. Position of a Parser in the Compiler Model. The Parser. Chapter 4 1 Syntax Analysis Part I Chapter 4 COP5621 Compiler Construction Copyright Robert van ngelen, Flora State University, 2007 Position of a Parser in the Compiler Model 2 Source Program Lexical Analyzer Lexical

More information

Theory of Computation

Theory of Computation Theory of Computation (Feodor F. Dragan) Department of Computer Science Kent State University Spring, 2018 Theory of Computation, Feodor F. Dragan, Kent State University 1 Before we go into details, what

More information

Finite-state machines (FSMs)

Finite-state machines (FSMs) Finite-state machines (FSMs) Dr. C. Constantinides Department of Computer Science and Software Engineering Concordia University Montreal, Canada January 10, 2017 1/19 Finite-state machines (FSMs) and state

More information

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26 Parsing Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Winter 2017/18 1 / 26 Table of contents 1 Context-Free Grammars 2 Simplifying CFGs Removing useless symbols Eliminating

More information

Chapter 5. Finite Automata

Chapter 5. Finite Automata Chapter 5 Finite Automata 5.1 Finite State Automata Capable of recognizing numerous symbol patterns, the class of regular languages Suitable for pattern-recognition type applications, such as the lexical

More information

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1

Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS Berkley University 1 Lecture VII Part 2: Syntactic Analysis Bottom-up Parsing: LR Parsing. Prof. Bodik CS 164 -- Berkley University 1 Bottom-Up Parsing Bottom-up parsing is more general than topdown parsing And just as efficient

More information

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing Introduction to Bottom-Up Parsing Outline Review LL parsing Shift-reduce parsing The LR parsing algorithm Constructing LR parsing tables Compiler Design 1 (2011) 2 Top-Down Parsing: Review Top-down parsing

More information

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University.

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University. Lexical Analysis Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University http://compilers.cs.uni-saarland.de Compiler Construction Core Course 2017 Saarland University Today

More information

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis

Syntactical analysis. Syntactical analysis. Syntactical analysis. Syntactical analysis Context-free grammars Derivations Parse Trees Left-recursive grammars Top-down parsing non-recursive predictive parsers construction of parse tables Bottom-up parsing shift/reduce parsers LR parsers GLR

More information

Supervisory Control: Advanced Theory and Applications

Supervisory Control: Advanced Theory and Applications Supervisory Control: Advanced Theory and Applications Dr Rong Su S1-B1b-59, School of EEE Nanyang Technological University Tel: +65 6790-6042, Email: rsu@ntu.edu.sg EE6226, Discrete Event Systems 1 Introduction

More information

Course Runtime Verification

Course Runtime Verification Course Martin Leucker (ISP) Volker Stolz (Høgskolen i Bergen, NO) INF5140 / V17 Chapters of the Course Chapter 1 Recall in More Depth Chapter 2 Specification Languages on Words Chapter 3 LTL on Finite

More information

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing Outline Introduction to Bottom-Up Parsing Review LL parsing Shift-reduce parsing he LR parsing algorithm Constructing LR parsing tables Compiler Design 1 (2011) 2 op-down Parsing: Review op-down parsing

More information

Parsing -3. A View During TD Parsing

Parsing -3. A View During TD Parsing Parsing -3 Deterministic table-driven parsing techniques Pictorial view of TD and BU parsing BU (shift-reduce) Parsing Handle, viable prefix, items, closures, goto s LR(k): SLR(1), LR(1) Problems with

More information

Foundations of

Foundations of 91.304 Foundations of (Theoretical) Computer Science Chapter 1 Lecture Notes (Section 1.3: Regular Expressions) David Martin dm@cs.uml.edu d with some modifications by Prof. Karen Daniels, Spring 2012

More information

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3},

(b) If G=({S}, {a}, {S SS}, S) find the language generated by G. [8+8] 2. Convert the following grammar to Greibach Normal Form G = ({A1, A2, A3}, Code No: 07A50501 R07 Set No. 2 III B.Tech I Semester Examinations,MAY 2011 FORMAL LANGUAGES AND AUTOMATA THEORY Computer Science And Engineering Time: 3 hours Max Marks: 80 Answer any FIVE Questions All

More information

Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts

Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts Domenico Cantone Simone Faro Emanuele Giaquinta Department of Mathematics and Computer Science, University of Catania, Italy 1 /

More information

This lecture covers Chapter 5 of HMU: Context-free Grammars

This lecture covers Chapter 5 of HMU: Context-free Grammars This lecture covers Chapter 5 of HMU: Context-free rammars (Context-free) rammars (Leftmost and Rightmost) Derivations Parse Trees An quivalence between Derivations and Parse Trees Ambiguity in rammars

More information

6.863J Natural Language Processing Lecture 2: Automata, Two-level phonology, & PC-Kimmo (the Hamlet lecture)

6.863J Natural Language Processing Lecture 2: Automata, Two-level phonology, & PC-Kimmo (the Hamlet lecture) 6.863J Natural Language Processing Lecture 2: Automata, Two-level phonology, & PC-Kimmo (the Hamlet lecture) Instructor: Robert C. Berwick berwick@ai.mit.edu The Menu Bar Administrivia web page: www.ai.mit.edu/courses/6.863/

More information

Finite State Transducers

Finite State Transducers Finite State Transducers Eric Gribkoff May 29, 2013 Original Slides by Thomas Hanneforth (Universitat Potsdam) Outline 1 Definition of Finite State Transducer 2 Examples of FSTs 3 Definition of Regular

More information

Two-phase Implementation of Morphological Analysis

Two-phase Implementation of Morphological Analysis Two-phase Implementation of Morphological Analysis Arvi Hurskainen Institute for Asian and African Studies, Box 59 FIN-00014 University of Helsinki, Finland arvi.hurskainen@helsinki.fi Abstract SALAMA

More information

Advanced Automata Theory 10 Transducers and Rational Relations

Advanced Automata Theory 10 Transducers and Rational Relations Advanced Automata Theory 10 Transducers and Rational Relations Frank Stephan Department of Computer Science Department of Mathematics National University of Singapore fstephan@comp.nus.edu.sg Advanced

More information

Alan Bundy. Automated Reasoning LTL Model Checking

Alan Bundy. Automated Reasoning LTL Model Checking Automated Reasoning LTL Model Checking Alan Bundy Lecture 9, page 1 Introduction So far we have looked at theorem proving Powerful, especially where good sets of rewrite rules or decision procedures have

More information

Introduction: Computer Science is a cluster of related scientific and engineering disciplines concerned with the study and application of computations. These disciplines range from the pure and basic scientific

More information

Splitting the structured paths in stratified graphs. Application in Natural Language Generation

Splitting the structured paths in stratified graphs. Application in Natural Language Generation DOI: 10.2478/auom-2014-0031 An. Şt. Univ. Ovidius Constanţa Vol. 22(2),2014, 57 67 Splitting the structured paths in stratified graphs. Application in Natural Language Generation Dana Dănciulescu and Mihaela

More information

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I

GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I GEETANJALI INSTITUTE OF TECHNICAL STUDIES, UDAIPUR I Internal Examination 2017-18 B.Tech III Year VI Semester Sub: Theory of Computation (6CS3A) Time: 1 Hour 30 min. Max Marks: 40 Note: Attempt all three

More information

Context-free Grammars and Languages

Context-free Grammars and Languages Context-free Grammars and Languages COMP 455 002, Spring 2019 Jim Anderson (modified by Nathan Otterness) 1 Context-free Grammars Context-free grammars provide another way to specify languages. Example:

More information

Finite Automata. Finite Automata

Finite Automata. Finite Automata Finite Automata Finite Automata Formal Specification of Languages Generators Grammars Context-free Regular Regular Expressions Recognizers Parsers, Push-down Automata Context Free Grammar Finite State

More information

5 Context-Free Languages

5 Context-Free Languages CA320: COMPUTABILITY AND COMPLEXITY 1 5 Context-Free Languages 5.1 Context-Free Grammars Context-Free Grammars Context-free languages are specified with a context-free grammar (CFG). Formally, a CFG G

More information

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing Introduction to Bottom-Up Parsing Outline Review LL parsing Shift-reduce parsing The LR parsing algorithm Constructing LR parsing tables 2 Top-Down Parsing: Review Top-down parsing expands a parse tree

More information

September 11, Second Part of Regular Expressions Equivalence with Finite Aut

September 11, Second Part of Regular Expressions Equivalence with Finite Aut Second Part of Regular Expressions Equivalence with Finite Automata September 11, 2013 Lemma 1.60 If a language is regular then it is specified by a regular expression Proof idea: For a given regular language

More information

Discrete Mathematics. CS204: Spring, Jong C. Park Computer Science Department KAIST

Discrete Mathematics. CS204: Spring, Jong C. Park Computer Science Department KAIST Discrete Mathematics CS204: Spring, 2008 Jong C. Park Computer Science Department KAIST Today s Topics Sequential Circuits and Finite-State Machines Finite-State Automata Languages and Grammars Nondeterministic

More information

Finite-State Machines (Automata) lecture 12

Finite-State Machines (Automata) lecture 12 Finite-State Machines (Automata) lecture 12 cl a simple form of computation used widely one way to find patterns 1 A current D B A B C D B C D A C next 2 Application Fields Industry real-time control,

More information

Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars

Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars Harvard CS 121 and CSCI E-207 Lecture 9: Regular Languages Wrap-Up, Context-Free Grammars Salil Vadhan October 2, 2012 Reading: Sipser, 2.1 (except Chomsky Normal Form). Algorithmic questions about regular

More information

Introduction to Language Theory and Compilation: Exercises. Session 2: Regular expressions

Introduction to Language Theory and Compilation: Exercises. Session 2: Regular expressions Introduction to Language Theory and Compilation: Exercises Session 2: Regular expressions Regular expressions (RE) Finite automata are an equivalent formalism to regular languages (for each regular language,

More information

Examples of Regular Expressions. Finite Automata vs. Regular Expressions. Example of Using flex. Application

Examples of Regular Expressions. Finite Automata vs. Regular Expressions. Example of Using flex. Application Examples of Regular Expressions 1. 0 10, L(0 10 ) = {w w contains exactly a single 1} 2. Σ 1Σ, L(Σ 1Σ ) = {w w contains at least one 1} 3. Σ 001Σ, L(Σ 001Σ ) = {w w contains the string 001 as a substring}

More information

Introduction to Bottom-Up Parsing

Introduction to Bottom-Up Parsing Outline Introduction to Bottom-Up Parsing Review LL parsing Shift-reduce parsing he LR parsing algorithm Constructing LR parsing tables 2 op-down Parsing: Review op-down parsing expands a parse tree from

More information

CPSC 421: Tutorial #1

CPSC 421: Tutorial #1 CPSC 421: Tutorial #1 October 14, 2016 Set Theory. 1. Let A be an arbitrary set, and let B = {x A : x / x}. That is, B contains all sets in A that do not contain themselves: For all y, ( ) y B if and only

More information

Theory of Finite Automata

Theory of Finite Automata According to the Oxford English Dictionary, an automaton (plural is automata ) is a machine whose responses to all possible inputs are specified in advance. For this course, the term finite automaton can

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

FINITE STATE AUTOMATA

FINITE STATE AUTOMATA FINITE STATE AUTOMATA States An FSA has a finite set of states A system has a limited number of configurations Examples {On, Off}, {1,2,3,4,,k} {TV channels} States can be graphically represented as follows:

More information

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata. Code No: R09220504 R09 Set No. 2 II B.Tech II Semester Examinations,December-January, 2011-2012 FORMAL LANGUAGES AND AUTOMATA THEORY Computer Science And Engineering Time: 3 hours Max Marks: 75 Answer

More information

Non-emptiness Testing for TMs

Non-emptiness Testing for TMs 180 5. Reducibility The proof of unsolvability of the halting problem is an example of a reduction: a way of converting problem A to problem B in such a way that a solution to problem B can be used to

More information

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering

Tasks of lexer. CISC 5920: Compiler Construction Chapter 2 Lexical Analysis. Tokens and lexemes. Buffering Tasks of lexer CISC 5920: Compiler Construction Chapter 2 Lexical Analysis Arthur G. Werschulz Fordham University Department of Computer and Information Sciences Copyright Arthur G. Werschulz, 2017. All

More information

Synthesizing Asynchronous Burst-Mode Machines without the Fundamental-Mode Timing Assumption

Synthesizing Asynchronous Burst-Mode Machines without the Fundamental-Mode Timing Assumption Synthesizing synchronous urst-mode Machines without the Fundamental-Mode Timing ssumption Gennette Gill Montek Singh Univ. of North Carolina Chapel Hill, NC, US Contribution Synthesize robust asynchronous

More information

Pattern-Matching for Strings with Short Descriptions

Pattern-Matching for Strings with Short Descriptions Pattern-Matching for Strings with Short Descriptions Marek Karpinski marek@cs.uni-bonn.de Department of Computer Science, University of Bonn, 164 Römerstraße, 53117 Bonn, Germany Wojciech Rytter rytter@mimuw.edu.pl

More information

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r

UNIT-II. NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: SIGNIFICANCE. Use of ε-transitions. s t a r t. ε r. e g u l a r Syllabus R9 Regulation UNIT-II NONDETERMINISTIC FINITE AUTOMATA WITH ε TRANSITIONS: In the automata theory, a nondeterministic finite automaton (NFA) or nondeterministic finite state machine is a finite

More information

Transducers for bidirectional decoding of prefix codes

Transducers for bidirectional decoding of prefix codes Transducers for bidirectional decoding of prefix codes Laura Giambruno a,1, Sabrina Mantaci a,1 a Dipartimento di Matematica ed Applicazioni - Università di Palermo - Italy Abstract We construct a transducer

More information

Logarithmic space. Evgenij Thorstensen V18. Evgenij Thorstensen Logarithmic space V18 1 / 18

Logarithmic space. Evgenij Thorstensen V18. Evgenij Thorstensen Logarithmic space V18 1 / 18 Logarithmic space Evgenij Thorstensen V18 Evgenij Thorstensen Logarithmic space V18 1 / 18 Journey below Unlike for time, it makes sense to talk about sublinear space. This models computations on input

More information

Properties of Context-Free Languages. Closure Properties Decision Properties

Properties of Context-Free Languages. Closure Properties Decision Properties Properties of Context-Free Languages Closure Properties Decision Properties 1 Closure Properties of CFL s CFL s are closed under union, concatenation, and Kleene closure. Also, under reversal, homomorphisms

More information

A Note on Sequential Rule-Based POS Tagging

A Note on Sequential Rule-Based POS Tagging A Note on Sequential Rule-Based POS Tagging Sylvain Schmitz To cite this version: Sylvain Schmitz. A Note on Sequential Rule-Based POS Tagging. Andreas Maletti. FSMNLP, 2011, Blois, France. Association

More information

Einführung in die Computerlinguistik

Einführung in die Computerlinguistik Einführung in die Computerlinguistik Context-Free Grammars (CFG) Laura Kallmeyer Heinrich-Heine-Universität Düsseldorf Summer 2016 1 / 22 CFG (1) Example: Grammar G telescope : Productions: S NP VP NP

More information

Deterministic Finite Automaton (DFA)

Deterministic Finite Automaton (DFA) 1 Lecture Overview Deterministic Finite Automata (DFA) o accepting a string o defining a language Nondeterministic Finite Automata (NFA) o converting to DFA (subset construction) o constructed from a regular

More information

(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix.

(NB. Pages are intended for those who need repeated study in formal languages) Length of a string. Formal languages. Substrings: Prefix, suffix. (NB. Pages 22-40 are intended for those who need repeated study in formal languages) Length of a string Number of symbols in the string. Formal languages Basic concepts for symbols, strings and languages:

More information

A practical introduction to active automata learning

A practical introduction to active automata learning A practical introduction to active automata learning Bernhard Steffen, Falk Howar, Maik Merten TU Dortmund SFM2011 Maik Merten, learning technology 1 Overview Motivation Introduction to active automata

More information

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007.

Proofs, Strings, and Finite Automata. CS154 Chris Pollett Feb 5, 2007. Proofs, Strings, and Finite Automata CS154 Chris Pollett Feb 5, 2007. Outline Proofs and Proof Strategies Strings Finding proofs Example: For every graph G, the sum of the degrees of all the nodes in G

More information

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad-500 014 Subject: FORMAL LANGUAGES AND AUTOMATA THEORY Class : CSE II PART A (SHORT ANSWER QUESTIONS) UNIT- I 1 Explain transition diagram, transition

More information

Operational Semantics

Operational Semantics Operational Semantics Semantics and applications to verification Xavier Rival École Normale Supérieure Xavier Rival Operational Semantics 1 / 50 Program of this first lecture Operational semantics Mathematical

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write: Languages A language is a set (usually infinite) of strings, also known as sentences Each string consists of a sequence of symbols taken from some alphabet An alphabet, V, is a finite set of symbols, e.g.

More information

Maschinelle Sprachverarbeitung

Maschinelle Sprachverarbeitung Maschinelle Sprachverarbeitung Parsing with Probabilistic Context-Free Grammar Ulf Leser Content of this Lecture Phrase-Structure Parse Trees Probabilistic Context-Free Grammars Parsing with PCFG Other

More information

Theory of computation: initial remarks (Chapter 11)

Theory of computation: initial remarks (Chapter 11) Theory of computation: initial remarks (Chapter 11) For many purposes, computation is elegantly modeled with simple mathematical objects: Turing machines, finite automata, pushdown automata, and such.

More information

Multiword Expression Identification with Tree Substitution Grammars

Multiword Expression Identification with Tree Substitution Grammars Multiword Expression Identification with Tree Substitution Grammars Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning Stanford University EMNLP 2011 Main Idea Use syntactic

More information

3. Complete the following table of equivalent values. Use binary numbers with a sign bit and 7 bits for the value

3. Complete the following table of equivalent values. Use binary numbers with a sign bit and 7 bits for the value EGC22 Digital Logic Fundamental Additional Practice Problems. Complete the following table of equivalent values. Binary. Octal 35.77 33.23.875 29.99 27 9 64 Hexadecimal B.3 D.FD B.4C 2. Calculate the following

More information

CS 530: Theory of Computation Based on Sipser (second edition): Notes on regular languages(version 1.1)

CS 530: Theory of Computation Based on Sipser (second edition): Notes on regular languages(version 1.1) CS 530: Theory of Computation Based on Sipser (second edition): Notes on regular languages(version 1.1) Definition 1 (Alphabet) A alphabet is a finite set of objects called symbols. Definition 2 (String)

More information

Markov chains and the number of occurrences of a word in a sequence ( , 11.1,2,4,6)

Markov chains and the number of occurrences of a word in a sequence ( , 11.1,2,4,6) Markov chains and the number of occurrences of a word in a sequence (4.5 4.9,.,2,4,6) Prof. Tesler Math 283 Fall 208 Prof. Tesler Markov Chains Math 283 / Fall 208 / 44 Locating overlapping occurrences

More information

Computational Models - Lecture 3

Computational Models - Lecture 3 Slides modified by Benny Chor, based on original slides by Maurice Herlihy, Brown University. p. 1 Computational Models - Lecture 3 Equivalence of regular expressions and regular languages (lukewarm leftover

More information

Robustness Analysis of Networked Systems

Robustness Analysis of Networked Systems Robustness Analysis of Networked Systems Roopsha Samanta The University of Texas at Austin Joint work with Jyotirmoy V. Deshmukh and Swarat Chaudhuri January, 0 Roopsha Samanta Robustness Analysis of Networked

More information

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1

More information

A Computational Approach to Minimalism

A Computational Approach to Minimalism A Computational Approach to Minimalism Alain LECOMTE CLIPS-IMAG BP-53 38041 Grenoble, cedex 9, France Email: Alain.Lecomte@upmf-grenoble.fr Abstract The aim of this paper is to recast minimalist principles

More information

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar

CSC 4181Compiler Construction. Context-Free Grammars Using grammars in parsers. Parsing Process. Context-Free Grammar CSC 4181Compiler Construction Context-ree Grammars Using grammars in parsers CG 1 Parsing Process Call the scanner to get tokens Build a parse tree from the stream of tokens A parse tree shows the syntactic

More information

Chapter 3. Regular grammars

Chapter 3. Regular grammars Chapter 3 Regular grammars 59 3.1 Introduction Other view of the concept of language: not the formalization of the notion of effective procedure, but set of words satisfying a given set of rules Origin

More information

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS

CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS CSCI 2200 Foundations of Computer Science Spring 2018 Quiz 3 (May 2, 2018) SOLUTIONS 1. [6 POINTS] For language L 1 = {0 n 1 m n, m 1, m n}, which string is in L 1? ANSWER: 0001111 is in L 1 (with n =

More information

Formal Definition of Computation. August 28, 2013

Formal Definition of Computation. August 28, 2013 August 28, 2013 Computation model The model of computation considered so far is the work performed by a finite automaton Finite automata were described informally, using state diagrams, and formally, as

More information

Pushdown Automata: Introduction (2)

Pushdown Automata: Introduction (2) Pushdown Automata: Introduction Pushdown automaton (PDA) M = (K, Σ, Γ,, s, A) where K is a set of states Σ is an input alphabet Γ is a set of stack symbols s K is the start state A K is a set of accepting

More information

Ukkonen's suffix tree construction algorithm

Ukkonen's suffix tree construction algorithm Ukkonen's suffix tree construction algorithm aba$ $ab aba$ 2 2 1 1 $ab a ba $ 3 $ $ab a ba $ $ $ 1 2 4 1 String Algorithms; Nov 15 2007 Motivation Yet another suffix tree construction algorithm... Why?

More information

COMPUTER SCIENCE TRIPOS

COMPUTER SCIENCE TRIPOS CST.2016.2.1 COMPUTER SCIENCE TRIPOS Part IA Tuesday 31 May 2016 1.30 to 4.30 COMPUTER SCIENCE Paper 2 Answer one question from each of Sections A, B and C, and two questions from Section D. Submit the

More information

CSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs

CSE 311: Foundations of Computing. Lecture 23: Finite State Machine Minimization & NFAs CSE : Foundations of Computing Lecture : Finite State Machine Minimization & NFAs State Minimization Many different FSMs (DFAs) for the same problem Take a given FSM and try to reduce its state set by

More information

Compiler Design 1. LR Parsing. Goutam Biswas. Lect 7

Compiler Design 1. LR Parsing. Goutam Biswas. Lect 7 Compiler Design 1 LR Parsing Compiler Design 2 LR(0) Parsing An LR(0) parser can take shift-reduce decisions entirely on the basis of the states of LR(0) automaton a of the grammar. Consider the following

More information

THEORY OF COMPUTATION

THEORY OF COMPUTATION Finite Automata and Regular Expressions: Formal Languages and Regular expressions, Deterministic and Non-Deterministic Finite Automata, Finite Automata with ε-moves, Equivalence of NFA and DFA, Minimization

More information

What is this course about?

What is this course about? What is this course about? Examining the power of an abstract machine What can this box of tricks do? What is this course about? Examining the power of an abstract machine Domains of discourse: automata

More information

Accelerated Natural Language Processing Lecture 3 Morphology and Finite State Machines; Edit Distance

Accelerated Natural Language Processing Lecture 3 Morphology and Finite State Machines; Edit Distance Accelerated Natural Language Processing Lecture 3 Morphology and Finite State Machines; Edit Distance Sharon Goldwater (based on slides by Philipp Koehn) 20 September 2018 Sharon Goldwater ANLP Lecture

More information

COMPUTER SCIENCE TRIPOS

COMPUTER SCIENCE TRIPOS CST.2014.2.1 COMPUTER SCIENCE TRIPOS Part IA Tuesday 3 June 2014 1.30 to 4.30 pm COMPUTER SCIENCE Paper 2 Answer one question from each of Sections A, B and C, and two questions from Section D. Submit

More information

Bottom-Up Syntax Analysis

Bottom-Up Syntax Analysis Bottom-Up Syntax Analysis Mooly Sagiv http://www.cs.tau.ac.il/~msagiv/courses/wcc13.html Textbook:Modern Compiler Design Chapter 2.2.5 (modified) 1 Pushdown automata Deterministic fficient Parsers Report

More information

Computational Models - Lecture 4

Computational Models - Lecture 4 Computational Models - Lecture 4 Regular languages: The Myhill-Nerode Theorem Context-free Grammars Chomsky Normal Form Pumping Lemma for context free languages Non context-free languages: Examples Push

More information

Theoretical Computer Science

Theoretical Computer Science Theoretical Computer Science Zdeněk Sawa Department of Computer Science, FEI, Technical University of Ostrava 17. listopadu 15, Ostrava-Poruba 708 33 Czech republic September 22, 2017 Z. Sawa (TU Ostrava)

More information

Compiler Construction Lent Term 2015 Lectures (of 16)

Compiler Construction Lent Term 2015 Lectures (of 16) Compiler Construction Lent Term 2015 Lectures 13 --- 16 (of 16) 1. Return to lexical analysis : application of Theory of Regular Languages and Finite Automata 2. Generating Recursive descent parsers 3.

More information

Finite State Automata and Simple Conceptual Graphs with Binary Conceptual Relations

Finite State Automata and Simple Conceptual Graphs with Binary Conceptual Relations Finite State Automata and Simple Conceptual Graphs with Binary Conceptual Relations Galia Angelova and Stoyan Mihov Institute for Parallel Processing, Bulgarian Academy of Sciences 25A Acad. G. Bonchev

More information

Enhancing Active Automata Learning by a User Log Based Metric

Enhancing Active Automata Learning by a User Log Based Metric Master Thesis Computing Science Radboud University Enhancing Active Automata Learning by a User Log Based Metric Author Petra van den Bos First Supervisor prof. dr. Frits W. Vaandrager Second Supervisor

More information

McCreight's suffix tree construction algorithm

McCreight's suffix tree construction algorithm McCreight's suffix tree construction algorithm b 2 $baa $ 5 $ $ba 6 3 a b 4 $ aab$ 1 Motivation Recall: the suffix tree is an extremely useful data structure with space usage and construction time in O(n).

More information

Lecture 4. Finite Automata and Safe State Machines (SSM) Daniel Kästner AbsInt GmbH 2012

Lecture 4. Finite Automata and Safe State Machines (SSM) Daniel Kästner AbsInt GmbH 2012 Lecture 4 Finite Automata and Safe State Machines (SSM) Daniel Kästner AbsInt GmbH 2012 Initialization Analysis 2 Is this node well initialized? node init1() returns (out: int) let out = 1 + pre( 1 ->

More information

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata

Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata Harvard CS 121 and CSCI E-207 Lecture 10: Ambiguity, Pushdown Automata Salil Vadhan October 4, 2012 Reading: Sipser, 2.2. Another example of a CFG (with proof) L = {x {a, b} : x has the same # of a s and

More information