Language-Processing Problems. Roland Backhouse DIMACS, 8th July, 2003

Similar documents
Applications of Regular Algebra to Language Theory Problems. Roland Backhouse February 2001

Galois Connections. Roland Backhouse 3rd December, 2002

Introduction to Kleene Algebras

Fusion on Languages. Roland Backhouse. University of Nottingham

Automata Theory and Formal Grammars: Lecture 1

Regular algebra applied to language problems

Chapter 0 Introduction. Fourth Academic Year/ Elective Course Electrical Engineering Department College of Engineering University of Salahaddin

Deterministic Finite Automata. Non deterministic finite automata. Non-Deterministic Finite Automata (NFA) Non-Deterministic Finite Automata (NFA)

Equational Theory of Kleene Algebra

Finite Presentations of Pregroups and the Identity Problem

Dynamic Programming. Shuang Zhao. Microsoft Research Asia September 5, Dynamic Programming. Shuang Zhao. Outline. Introduction.

What is this course about?

Peter Wood. Department of Computer Science and Information Systems Birkbeck, University of London Automata and Formal Languages

Definition: A binary relation R from a set A to a set B is a subset R A B. Example:

1. (a) Explain the procedure to convert Context Free Grammar to Push Down Automata.

Axioms of Kleene Algebra

Universität Augsburg

Mathematical Preliminaries. Sipser pages 1-28

Theory of Computer Science

The Turing Machine. Computability. The Church-Turing Thesis (1936) Theory Hall of Fame. Theory Hall of Fame. Undecidability

Introduction. Foundations of Computing Science. Pallab Dasgupta Professor, Dept. of Computer Sc & Engg INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR

Einführung in die Computerlinguistik

FORMAL LANGUAGES, AUTOMATA AND COMPUTABILITY

cse303 ELEMENTS OF THE THEORY OF COMPUTATION Professor Anita Wasilewska

C6.2 Push-Down Automata

An Overview of Residuated Kleene Algebras and Lattices Peter Jipsen Chapman University, California. 2. Background: Semirings and Kleene algebras

Problem 1: Suppose A, B, C and D are finite sets such that A B = C D and C = D. Prove or disprove: A = B.

C1.1 Introduction. Theory of Computer Science. Theory of Computer Science. C1.1 Introduction. C1.2 Alphabets and Formal Languages. C1.

1. Draw a parse tree for the following derivation: S C A C C A b b b b A b b b b B b b b b a A a a b b b b a b a a b b 2. Show on your parse tree u,

CHAPTER 1. Relations. 1. Relations and Their Properties. Discussion

Context Free Languages. Automata Theory and Formal Grammars: Lecture 6. Languages That Are Not Regular. Non-Regular Languages

What we have done so far

Chapter 1. Formal Definition and View. Lecture Formal Pushdown Automata on the 28th April 2009

3515ICT: Theory of Computation. Regular languages

Section Summary. Relations and Functions Properties of Relations. Combining Relations

EXAM. CS331 Compiler Design Spring Please read all instructions, including these, carefully

AC68 FINITE AUTOMATA & FORMULA LANGUAGES DEC 2013

Computational Models - Lecture 4

Plan for 2 nd half. Just when you thought it was safe. Just when you thought it was safe. Theory Hall of Fame. Chomsky Normal Form

NP-Complete problems

Non-context-Free Languages. CS215, Lecture 5 c

Primitive recursive functions: decidability problems

Finite State Machines Transducers Markov Models Hidden Markov Models Büchi Automata

5 Context-Free Languages

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Symposium on Theoretical Aspects of Computer Science 2008 (Bordeaux), pp

60-354, Theory of Computation Fall Asish Mukhopadhyay School of Computer Science University of Windsor

1. (12 points) Give a table-driven parse table for the following grammar: 1) X ax 2) X P 3) P DT 4) T P 5) T ε 7) D 1

Notes for Comp 497 (Comp 454) Week 10 4/5/05

St.MARTIN S ENGINEERING COLLEGE Dhulapally, Secunderabad

On Fixed Point Equations over Commutative Semirings

Parsing. Context-Free Grammars (CFG) Laura Kallmeyer. Winter 2017/18. Heinrich-Heine-Universität Düsseldorf 1 / 26

Music as a Formal Language

Decidable and undecidable languages

Lecture #14: NP-Completeness (Chapter 34 Old Edition Chapter 36) Discussion here is from the old edition.

CS311 Computational Structures. NP-completeness. Lecture 18. Andrew P. Black Andrew Tolmach. Thursday, 2 December 2010

CS 373: Theory of Computation. Fall 2010

Closure Properties of Regular Languages. Union, Intersection, Difference, Concatenation, Kleene Closure, Reversal, Homomorphism, Inverse Homomorphism

Theory of Computation

MA/CSSE 474 Theory of Computation

Automata Theory - Quiz II (Solutions)

Chapter 3. Regular grammars

Summer School on Introduction to Algorithms and Optimization Techniques July 4-12, 2017 Organized by ACMU, ISI and IEEE CEDA.

Non-emptiness Testing for TMs

Pushdown Automata. Reading: Chapter 6

Lexical Analysis. Reinhard Wilhelm, Sebastian Hack, Mooly Sagiv Saarland University, Tel Aviv University.

NP Completeness and Approximation Algorithms

CDM Parsing and Decidability

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2018

Computation Histories

A Weak Bisimulation for Weighted Automata

Embedded systems specification and design

Theory of Computation

THEORY OF COMPUTATION (AUBER) EXAM CRIB SHEET

UNIT-I. Strings, Alphabets, Language and Operations

BASIC MATHEMATICAL TECHNIQUES

XMA2C011, Annual Examination 2012: Worked Solutions

Formal Definition of a Finite Automaton. August 26, 2013

Languages. Languages. An Example Grammar. Grammars. Suppose we have an alphabet V. Then we can write:

Classes of Boolean Functions

This lecture covers Chapter 7 of HMU: Properties of CFLs

6.1 The Pumping Lemma for CFLs 6.2 Intersections and Complements of CFLs

X-MA2C01-1: Partial Worked Solutions

1 More finite deterministic automata

Properties of Context-Free Languages. Closure Properties Decision Properties

FLAC Context-Free Grammars

Computational Models Lecture 2 1

3130CIT Theory of Computation

A parsing technique for TRG languages

Fall, 2017 CIS 262. Automata, Computability and Complexity Jean Gallier Solutions of the Practice Final Exam

3 Propositional Logic

Finite Automata Theory and Formal Languages TMV027/DIT321 LP4 2017

Pushdown Automata. Notes on Automata and Theory of Computation. Chia-Ping Chen

CYK Algorithm for Parsing General Context-Free Grammars

Computational Models Lecture 2 1

NP-Completeness and Boolean Satisfiability

Automata Theory CS F-08 Context-Free Grammars

Automata and Languages

Notes for Comp 497 (454) Week 10

1.A Sets, Relations, Graphs, and Functions 1.A.1 Set a collection of objects(element) Let A be a set and a be an elements in A, then we write a A.

Transcription:

1 Language-Processing Problems Roland Backhouse DIMACS, 8th July, 2003

Introduction 2 Factors and the factor matrix were introduced by Conway (1971). He used them very effectively in, for example, constructing biregulators. Conway s discussion is wordy, making it difficult to understand. There are also occasional errors which are difficult to detect and add to the confusion. ( The theorem does prevent E from occurring twice should read The theorem does not prevent E from occurring twice. )

KMP Failure Function (pattern aabaa) 3 node i 1 2 3 4 5 failure node f(i) 0 1 0 1 2 Factor Graph (language Σ aabaa) ε ε a b a b a a 0 1 2 3 4 5 ε ε ε

Language Problems 4 S ::= ass ε. Is-empty S = φ ({a} = φ S = φ S = φ) {ε} = φ. Nullable Shortest word length ε S (ε {a} ε S ε S) ε {ε}. #S = (#a + #S + #S) #ε. Non-Example aa S (aa {a} aa S aa S) aa {ε}.

Fusion 5 Many problems are expressed in the form evaluate generate where generate generates a (possibly infinite) candidate set of solutions, and evaluate selects a best solution. Examples: shortest path, (x ) L. Solution method is to fuse the generation and evaluation processes, eliminating the need to generate all candidate solutions.

Conditions for Fusion 6 Fusion is made possible when evaluate is an adjoint in a Galois connection, generate is expressed as a fixed point. Algorithms for solving resulting fixed point equation include brute-force iteration, Knuth s generalisation of Dijkstra s shortest path algorithm.. Solution method typically involves generalising the problem.

Galois Connections 7 Suppose A = (A, ) and B = (B, ) are partially ordered sets and suppose F A B and G B A. Then (F, G) is a Galois connection of A and B iff, for all x B and y A, F(x) y x G(y). Examples Negation: p q p q. Ceiling function: x n x n. Maximum: x y z x z y z. Even (divisible by two): if b 2 b 1 fi \ m b even(m).

Parsing 8 x S b S if b Σ b Σ {x} fi. Shortest Word (Path) Let Σ k denote the set of all words over alphabet Σ of length at least k. Let #S denote the length of a shortest word in the language S. #S k S Σ k. (Most common application is when S is the set of paths from one node to another in a graph.)

Fusion Theorem 9 provided that F(µ g) = µ h F is a lower adjoint in a Galois connection of and (see brief summary of definition below) F g = h F. Galois Connection F(x) y x G(y). F is called the lower adjoint and G the upper adjoint.

Language Recognition 10 Problem: For given word x and grammar G, determine x L(G). That is, implement (x ) L. Language L(G) is the least fixed point (with respect to the subset relation) of a monotonic function. (x ) is the lower adjoint in a Galois connection of languages (ordered by the subset relation) and booleans (ordered by implication). (Recall, x S b S if b Σ b Σ {x} fi.)

Nullable Languages 11 Problem: For given grammar G, determine ε L(G). (ε ) L Solution: Easily expressed as a fixed point computation. Works because: The function (x ) is a lower adjoint in a Galois connection (for all x, but in particular for x = ε). For all languages S and T, ε S T ε S ε T.

Problem Generalisation 12 Problem: For given grammar G, determine whether all words in L(G) have even length. I.e. implement alleven L. The function alleven is a lower adjoint in a Galois connection. Specifically, for all languages S and T, alleven(s) b S if b Σ b (Σ Σ) fi. Nevertheless, fusion doesn t work (directly) because there is no such that, for all languages S and T, alleven(s T) alleven(s) alleven(t). Solution: Generalise by tupling: compute simultaneously alleven and allodd.

General Context-Free Parsing 13 Problem: For given grammar G, determine x L(G). (x ) L. Not (in general) expressible as a fixed point computation. Fusion fails because: for all x, x ε, there is no such that, for all languages S and T, x S T (x S) (x T). CYK: Let F(S) denote the relation i, j:: x[i..j) S. Works because: The function F is a lower adjoint. For all languages S and T, F(S T) = F(S) F(T) where B C denotes the composition of relations B and C.

Language Inclusion 14 Problem: For fixed (regular) language E and varying S, determine S E. Example: Emptiness test: S φ. Example: Pattern Matching: given pattern P, for each prefix t of text T, evaluate: {t} Σ {P}. Example: All words are of even length: S (Σ Σ).

Language Inclusion 15 Problem: For fixed (regular) language E and varying S, determine S E. Function ( E) is a lower adjoint. Specifically, S E b S if b E b Σ fi. But, for E φ and E Σ, there is no such that, for all languages S and T, S T E (S E) (T E). Solution (Oege de Moor): Use factor theory to derive generalisation.

Factors 16 For all languages S, T and U, S T U T S\U, S T U S U/T. Note: Hence, write S\(U/T) = (S\U)/T. S\U/T.

Left and Right Factors 17 Define the functions and by X = E/X, X = X\E. By definition, the range of is the set of left factors of E and the range of is the set of right factors of E. We also have the Galois connection: X Y Y X. Hence, X = X, X = X, E = E = E.

The Factor Matrix 18 Let L denote the set of left factors of E. Define the factor matrix of E to be the binary operator \ restricted to L L. Thus entries in the matrix take the form L 0 \L 1 where L 0 and L 1 are left factors of E. The factor matrix of E is denoted by [E]. It is a reflexive, transitive matrix. [E] = [E]. The row and column containing individual factors, the left factors, the right factors, and E itself, is given by: U\E/V = U \ V, V = E \ V, U = U \ E, E = E \ E.

Using the Factor Matrix 19 Problem: For fixed regular language E and varying S, determine S E. Generalisation: For fixed regular language E and varying S, determine the relation S [E]. (Formally, the relation L, M:: S L\M where L and M range over the left factors of E.) Works because: S T [E] (S [E]) (T [E]). where B C denotes the composition of relations B and C.

Proof 20 We have to show that S T U \ W V :: S U \ V T V \ W. First, S T E = { unit of conjunction } S T E true = { factors, T = E/T; cancellation } S T T T E = { factors, T = T \ E } S T T T. Whence:

S T U \ W 21 = { factors, definition of W } U S T W E = { above, with S,T := U S, T W } U S (T W) T W (T W) = { factors } S U \ (T W) T (T W) / W = { U /W = U \ W } S U \ (T W) T (T W) \ W. { one-point rule } V :: S U \ V T V \ W { Leibniz } V :: S T U \ V V \ W { cancellation, } S T U \ W.

Summary 22 Use of fusion as programming method. Problem generalisation involves generalising the algebra in the solution domain. Factor theory as basis for language inclusion problems. Challenges Efficient computation of factor matrices. Extension to non-regular languages.

References 23 J.H. Conway, Regular Algebra and Finite Machines, Chapman and Hall, London, 1971. Backhouse, R.C and Carré, B.A. Regular algebra applied to path-finding problems, J. Institute of Mathematics and its Applications, vol. 15, pp. 161 186, 1975. Backhouse, R.C. and Lutz, R.K., Factor graphs, failure functions and bi-trees, Automata, Languages and Programming, LNCS 52, pp. 61 75, 1977. Roland Backhouse, Fusion on Languages, 10th European Symposium on Programming, LNCS 2028, pp. 107 121, 2001. O. de Moor, S. Drape, D. Lacey and G. Sittampalam. Incremental program analysis via language factors. (www.comlab.ox.ac.uk/oucl/work/oege.demoor/pubs.htm) For related publications on fixed points, Galois connections and mathematics of program construction, see www.cs.nott.ac.uk/~rcb/papers