Peter Wood. Department of Computer Science and Information Systems, Birkbeck, University of London. Automata and Formal Languages

Department of Computer Science and Information Systems, Birkbeck, University of London
ptw@dcs.bbk.ac.uk

Analysing problems/languages
- computability/solvability/decidability: is there an algorithm?
- computational complexity: is it practical?
- expressive power: are there things that cannot be expressed?
- formal languages provide well-studied models

- given a finite alphabet (set) of symbols Σ, e.g., Σ = {0, 1}
- a string is a sequence (concatenation) of symbols, e.g., 0101
- the set of all finite strings over Σ is denoted by Σ*, e.g., Σ* = {ε, 0, 1, 00, 01, 10, 11, ...}
- a language L over Σ is just a subset of Σ*
  - e.g., L_1: strings with an even number of 1s
  - e.g., L_0: strings representing valid Java programs (over an alphabet of all legal symbols in Java)
- are there finite representations for infinite languages?
- yes: grammars (generative) and automata (recognition)

Automata
- an automaton is a device (machine) for recognising (accepting) a language
- automata provide models of computation
- an automaton comprises states and transitions between states
- an automaton is given a string as input
- automaton M accepts a string w by halting in an accept/final state when given w as input
- the language L(M) accepted by automaton M is the set of all strings which M accepts

Types of Automata
- finite state automaton
  - deterministic
  - nondeterministic
- pushdown automaton
- linear-bounded automaton
- Turing machine

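A Finite State Automaton
- L_1 (strings with an even number of 1s) can be recognised by the following FSA
  - 2 states: s_even and s_odd
  - 4 transitions: reading 0 leaves the state unchanged; reading 1 moves from s_even to s_odd and from s_odd to s_even
  - s_even is both the initial and final state
- the FSA recognises 011: starting in s_even, it reads 0 (stays in s_even), 1 (moves to s_odd), 1 (moves back to s_even), and halts in the final state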
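The run on 011 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the transition table described on the slide; the state names "even" and "odd" and the function names `run` and `accepts` are illustrative choices, not part of the original slides.

```python
# Transition table for the two-state automaton; the state names
# "even" and "odd" stand in for s_even and s_odd.
TRANSITIONS = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
START = "even"
ACCEPTING = {"even"}


def run(word):
    """Return the list of states visited while reading word."""
    state = START
    trace = [state]
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
        trace.append(state)
    return trace


def accepts(word):
    """A word is accepted if the run ends in an accepting state."""
    return run(word)[-1] in ACCEPTING


print(run("011"))      # ['even', 'even', 'odd', 'even']
print(accepts("011"))  # True
print(accepts("010"))  # False
```

Because the automaton is deterministic, each input has exactly one run, so acceptance only requires inspecting the last state.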

Grammars
- grammars generate languages using:
  - symbols from alphabet Σ (called terminals)
  - a set N of nonterminals (one designated as starting)
  - a set P of productions, each of the form U → V, where U and V are (loosely) strings over Σ ∪ N
- a string (sequence of terminals) w is generated by G if there is a derivation of w using G, starting from the starting nonterminal of G
- the language generated by grammar G, denoted L(G), is the set of strings which can be derived using G

Grammar Example
- L_1 (strings with an even number of 1s) can be generated by a grammar with productions
  S → ε
  S → 0S
  S → 1T
  T → 0T
  T → 1S
  where S is the starting nonterminal
- a derivation of 01010 is given by
  S ⇒ 0S ⇒ 01T ⇒ 010T ⇒ 0101S ⇒ 01010S ⇒ 01010
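The derivation can also be found mechanically. The sketch below searches for a derivation of a target string using the five productions above; the function name `derive` and the dictionary encoding of the productions are assumptions made for illustration. It relies on the grammar being right-linear, so it is not a general parser.

```python
from collections import deque

# Productions of the example grammar; "" encodes the empty string ε.
PRODUCTIONS = {
    "S": ["", "0S", "1T"],
    "T": ["0T", "1S"],
}


def derive(target, start="S"):
    """Breadth-first search for a derivation of target.

    Returns the list of sentential forms from start to target, or None
    if target is not generated. Relies on the grammar being
    right-linear: a nonterminal can only be the last symbol of a form.
    """
    queue = deque([[start]])
    while queue:
        steps = queue.popleft()
        form = steps[-1]
        if form == target:
            return steps
        if not form or form[-1] not in PRODUCTIONS:
            continue  # only terminals left, but not the target
        prefix, nonterminal = form[:-1], form[-1]
        if not target.startswith(prefix):
            continue  # the terminal prefix already disagrees with target
        for body in PRODUCTIONS[nonterminal]:
            queue.append(steps + [prefix + body])
    return None


print(" => ".join(derive("01010")))
# S => 0S => 01T => 010T => 0101S => 01010S => 01010
print(derive("1"))  # None: a single 1 is an odd number of 1s
```

The prefix check keeps the search finite: every production other than S → ε adds a terminal, so forms longer than the target are discarded.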

Uses of Grammars
- to specify syntax of programming languages
- in natural language understanding
- in pattern recognition
- to specify schemas (types) for tree-structured data, e.g., XML, JSON
- in data compression
- ...

Hierarchy of Grammars and Languages
- restrictions on productions give different types of grammars
  - regular (type 3)
  - context-free (type 2)
  - context-sensitive (type 1)
  - phrase-structure (type 0)
- for context-free, e.g., the left side must be a single nonterminal
- no restrictions for phrase-structure
- a language is of type i iff there is a grammar of type i which generates it

Examples of Language Hierarchy
- varying expressive power: regular ⊂ context-free ⊂ context-sensitive ⊂ phrase-structure
- L_1 (strings over {0, 1} with an even number of 1s) is regular
- L_2 = {0^n 1^n | n ≥ 0} is context-free, but not regular
- L_3 = {ww | w ∈ {0, 1}*} is context-sensitive, but not context-free
- there exists a phrase-structure (recursive) language which is not context-sensitive
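To make L_2 and L_3 concrete, here are direct membership tests for both. They are a sketch for intuition only: the function names `in_l2` and `in_l3` are hypothetical, and the tests are ordinary programs, not the pushdown or linear-bounded automata that characterise these classes.

```python
def in_l2(word):
    """Membership in L_2 = {0^n 1^n | n >= 0}, using a single counter.

    The counter plays the role of a pushdown stack: it can grow without
    bound, which a finite state automaton has no way of doing.
    """
    count = 0
    i = 0
    while i < len(word) and word[i] == "0":
        count += 1
        i += 1
    while i < len(word) and word[i] == "1":
        count -= 1
        i += 1
    # The whole word must be consumed and the counts must balance.
    return i == len(word) and count == 0


def in_l3(word):
    """Membership in L_3 = {ww | w in {0,1}*}: compare the two halves."""
    if any(symbol not in "01" for symbol in word):
        return False
    half, remainder = divmod(len(word), 2)
    return remainder == 0 and word[:half] == word[half:]


print(in_l2("000111"), in_l2("0101"))  # True False
print(in_l3("0101"), in_l3("0110"))    # True False
```

The L_2 test needs to remember how many 0s it has seen, and the L_3 test needs to compare positions that are arbitrarily far apart, which gives some intuition for why each language falls outside the class below it.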

Complexity of Grammar Problems

Problem                 Type 3    Type 2    Type 1    Type 0
Is w ∈ L(G)?            P         P         PSPACE    U
Is L(G) empty?          P         P         U         U
Is L(G_1) = L(G_2)?     PSPACE    U         U         U

- P: decidable in polynomial time
- PSPACE: decidable in polynomial space (and complete for PSPACE: at least as hard as NP-complete)
- U: undecidable
- so the type of grammar has a significant effect on complexity

Relationships between Languages and Automata

A language is         iff accepted by
regular               finite-state automaton
context-free          pushdown automaton
context-sensitive     linear-bounded automaton
phrase-structure      Turing machine

Regular Expressions
- algebraic notation for denoting regular languages
- use · (concatenation), | (union) and * (closure) operators
- L_1 is denoted by the RE 0* (0* 1 0* 1 0*)*
- given RE R, the set of strings it denotes is L(R)
- used for pattern matching in text
- used in query languages for XML or RDF
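Assuming the expression for L_1 reads 0*(0*10*10*)*, it can be checked with Python's `re` module. The sketch below compares the expression against the definition of L_1 on all short strings; the names `EVEN_ONES` and `in_l1` are illustrative.

```python
import re
from itertools import product

# The expression 0*(0*10*10*)* written in Python's regex syntax;
# concatenation is implicit and * is the closure operator.
EVEN_ONES = re.compile(r"0*(0*10*10*)*")


def in_l1(word):
    """True if the whole word matches the expression."""
    return EVEN_ONES.fullmatch(word) is not None


# Cross-check against the definition of L_1 for all strings up to length 8.
for n in range(9):
    for symbols in product("01", repeat=n):
        word = "".join(symbols)
        assert in_l1(word) == (word.count("1") % 2 == 0)

print(in_l1("0110"))  # True
print(in_l1("0100"))  # False
```

`fullmatch` is used rather than `search` because L(R) contains whole strings, not strings that merely have a matching substring.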

Using Regular Expressions to Query Graphs
Graphs (networks) are widely used for representing data:
- social networks
- transportation and other networks
- geographical information
- semistructured data (e.g., XML and JSON)
- (hyper)document structure
- semantic associations in criminal investigations
- bibliographic citation analysis
- pathways in biological processes
- knowledge representation (e.g., semantic web)
- program analysis
- workflow systems
- data provenance
- ...

Using Regular Expressions to Query Graphs (my PhD thesis!)
- usually regular expressions are used for string search
- consider data represented by a directed graph of labelled nodes and labelled edges
- regular expressions can express the paths we are interested in
- a path is matched as a sequence of edge labels rather than a sequence of symbols (characters)
- a query using regular expression R can ask for all nodes connected by a path whose concatenation of edge labels is in L(R)

Graph G (where nodes represent people and places):
[Figure: directed graph with person nodes a, b, c, place nodes SA, CT, UK, and edges labelled citizenof, bornin, livesin and locatedin]

The regular expression R = citizenof | ((bornin | livesin) · locatedin*) asks for paths of edges between a person x and a place y such that
- x is a citizenof y, or
- x is bornin or livesin y, or
- x is bornin or livesin a place that is locatedin y

Path query evaluation
REGULAR PATH PROBLEM: Given graph G, pair of nodes x and y and regular expression R, is there a path from x to y satisfying R?
Algorithm:
- construct a nondeterministic finite automaton (NFA) M accepting L(R)
- assume M has initial state s_0 and final state s_f
- consider G as an NFA with initial state x and final state y
- form the intersection (or "product") I of G and M
- check if there is a path from (x, s_0) to (y, s_f)
Each step can be done in PTIME, so REGULAR PATH PROBLEM has PTIME complexity.

NFA M for R = citizenof | ((bornin | livesin) · locatedin*):
- s_0 to s_1 on bornin or livesin
- s_1 to s_1 on locatedin
- s_0 to s_f on citizenof
- s_1 to s_f on ε

Intersection of G and M
[Figure: product graph over the nodes (a, s_0), (b, s_0), (c, s_0), (SA, s_1), (CT, s_1), (UK, s_1), (SA, s_f), (CT, s_f), (UK, s_f), with edges labelled citizenof, bornin, livesin, locatedin and ε]
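The product construction can be sketched in Python as a breadth-first search that builds the nodes of the intersection on demand. The edge list for G below is an assumed reading of the example figure (the livesin edge in particular is a guess), and the names `GRAPH`, `NFA`, `successors` and `regular_path_exists` are illustrative.

```python
from collections import deque

# Edge-labelled graph as (source, label, target) triples.
# Assumed reading of the example figure; the livesin edge is a guess.
GRAPH = [
    ("a", "citizenof", "SA"),
    ("b", "bornin", "CT"),
    ("c", "livesin", "CT"),
    ("c", "bornin", "UK"),
    ("CT", "locatedin", "SA"),
]

# NFA M for R = citizenof | ((bornin | livesin) . locatedin*);
# None marks an ε-transition.
NFA = [
    ("s0", "bornin", "s1"),
    ("s0", "livesin", "s1"),
    ("s1", "locatedin", "s1"),
    ("s0", "citizenof", "sf"),
    ("s1", None, "sf"),
]


def successors(node, state):
    """Yield the product nodes reachable from (node, state) in one step."""
    for q, label, q2 in NFA:
        if q != state:
            continue
        if label is None:
            yield node, q2  # ε-move: M advances, G stays put
        else:
            for u, edge_label, v in GRAPH:
                if u == node and edge_label == label:
                    yield v, q2  # G and M advance on the same label


def regular_path_exists(x, y, start="s0", final="sf"):
    """Breadth-first search from (x, start) to (y, final) in the product."""
    seen = {(x, start)}
    queue = deque(seen)
    while queue:
        node, state = queue.popleft()
        if (node, state) == (y, final):
            return True
        for pair in successors(node, state):
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False


print(regular_path_exists("b", "SA"))  # True: bornin CT, then locatedin SA
print(regular_path_exists("a", "CT"))  # False
```

The product has at most (nodes of G) × (states of M) nodes and each is visited once, which is where the polynomial bound comes from. This sketch scans the full edge lists at every step for brevity; an indexed adjacency structure would be the usual choice for larger graphs.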

Simple path queries
- a path p is simple if no node is repeated on p
- REGULAR SIMPLE PATH PROBLEM: Given graph G, pair of nodes x and y and regular expression R, is there a simple path from x to y satisfying R?
- REGULAR SIMPLE PATH PROBLEM is NP-complete [Mendelzon & Wood (1989)]
- there can be a path from x to y satisfying R but no simple path satisfying R, e.g., R = (c d) a c d b
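The gap between paths and simple paths shows up even on a two-node cycle. The sketch below is a brute-force backtracking search, not the algorithm from the cited paper; the graph, the single-character edge labels and the function name `simple_path_exists` are all hypothetical.

```python
import re

# Hypothetical two-node graph with a cycle: x -a-> y and y -b-> x.
EDGES = {
    "x": [("a", "y")],
    "y": [("b", "x")],
}


def simple_path_exists(x, y, pattern):
    """Backtracking search over simple paths from x to y.

    Edge labels are single characters, so a path's label sequence can
    be checked against pattern with re.fullmatch. The search may visit
    exponentially many paths in the worst case.
    """
    def extend(node, visited, labels):
        if node == y and re.fullmatch(pattern, labels):
            return True
        for label, target in EDGES.get(node, []):
            if target not in visited:
                if extend(target, visited | {target}, labels + label):
                    return True
        return False

    return extend(x, {x}, "")


print(simple_path_exists("x", "y", "a"))    # True: the single edge x -a-> y
print(simple_path_exists("x", "y", "aba"))  # False
```

The path x, y, x, y spells aba, so the regular path query for aba succeeds on this graph, but that path repeats both nodes and no simple path spells aba.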

Approaches to deal with this problem
- what causes the problem? the presence of cycles
- obvious first step is to consider graphs without cycles: DAGs
- then might look at restricted forms of REs
  - we looked at those corresponding to languages closed under abbreviations
- then one might consider a combination of graphs and REs
  - we looked at graphs whose cycle structure does not conflict with the RE
- finally showed that conflict-freedom is a generalisation:
  - no RE conflicts with any DAG
  - an RE closed under abbreviations never conflicts with any graph

Other approaches
- in general, may also run experiments to measure actual running times
- may also develop approximation algorithms
  - can sometimes find a PTIME algorithm with a performance guarantee (e.g., for TSP with distances satisfying the triangle inequality, one that finds a tour at most twice the optimal distance)
  - other times finding such an approximation is itself NP-hard
- use heuristic approaches

- is my system/language more powerful than others?
- is my system/language more efficient than others?
- expressive power or computational complexity can be studied by relating them to formal language theory: languages, grammars, automata, ...
- there is a tradeoff between expressive power and computational complexity
- consider restrictions of difficult problems, or giving up exact solutions

References
- Aho, Hopcroft and Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974
- Garey and Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W. H. Freeman and Company, 1979
- Lewis and Papadimitriou, Elements of the Theory of Computation, Prentice-Hall, 1981
- Mendelzon and Wood, Finding Simple Paths in Graph Databases, SIAM J. Computing, Vol. 24, No. 6, 1995, pp. 1235-1258
- Sipser, Introduction to the Theory of Computation, PWS Publishing Company, 1997