Regular Languages and Applications

Yo-Sub Han
Department of Computer Science, Yonsei University
SNU 4/14

Regular Languages
An old and well-known topic in CS:
Kleene's theorem in 1956
FA (finite-state automaton) constructions: Thompson automata and position automata in the 1960s
Pattern matching problem in the 1970s
... in the 1980s
REVISITED since the mid-1990s: state complexity, prime decomposition, pattern matching
New applications: XML, bioinformatics

Overview
Basic Notions
Position Construction and XML DTD
Regular-Expression Pattern Matching
State Complexity
Future Directions and Conclusions

Regular Expressions
Regular expressions are a very convenient form for representing (possibly infinite) sets of strings, called regular sets. Given a finite alphabet Σ, a regular expression over Σ is defined recursively as follows:
1. ∅, the empty-set symbol, is a regular expression.
2. λ, the empty-string symbol, is a regular expression.
3. a ∈ Σ is a regular expression.
4. E + F (union), where E and F are regular expressions, is a regular expression.
5. E · F (catenation), where E and F are regular expressions, is a regular expression.
6. E* (Kleene star), where E is a regular expression, is a regular expression.

Finite-state Automata (FAs)
A finite-state automaton A is specified by a tuple (Q, Σ, δ, s, F):
Q: a finite set of states
Σ: a finite alphabet
δ: a set of transition rules of the form δ(p, a) = q
s ∈ Q: the start state
F ⊆ Q: a set of final states
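The tuple definition can be exercised with a small simulator. The following is a minimal sketch in Python and is not taken from the slides: it assumes a nondeterministic δ without λ-transitions, stored as a dictionary from (state, symbol) to a set of states, and the three-state automaton EXAMPLE_DELTA (strings over {a, b} that end in ab) is a hypothetical example.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed encoding: delta maps (state, symbol) to the set of successor states.
Delta = Dict[Tuple[str, str], Set[str]]


def accepts(delta: Delta, start: str, finals: Set[str], word: Iterable[str]) -> bool:
    """Track the set Q of current states while reading word."""
    current = {start}
    for symbol in word:
        current = {q for p in current for q in delta.get((p, symbol), set())}
        if not current:  # no run survives, so the word is rejected
            return False
    return bool(current & finals)  # accepted iff Q contains a final state


# Hypothetical automaton for strings over {a, b} that end in "ab".
EXAMPLE_DELTA: Delta = {
    ("q1", "a"): {"q1", "q2"},
    ("q1", "b"): {"q1"},
    ("q2", "b"): {"q3"},
}

print(accepts(EXAMPLE_DELTA, "q1", {"q3"}, "aab"))  # True
print(accepts(EXAMPLE_DELTA, "q1", {"q3"}, "aba"))  # False
```

Keeping a set of current states, rather than a single state, is what lets the same loop handle nondeterministic transition rules.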

Finite-state Automata - example
[Figure: step-by-step run of an automaton with states q1, ..., q10, start state s = q1 and final states F = {q9, q10} on an input T. The set Q of current states goes through {q2}, {q3}, {q4, q6}, {q7}, {q8, q9} and {q9}. The run ends with Q = {q9}, which contains a final state, so T is accepted.]

REs into Finite-state Automata
The well-known Thompson construction, by Ken Thompson in 1968.
[Figure: the Thompson automata for the base cases E = λ, E = ∅ and E = a, and for E1 + E2, E1 · E2 and E*, built from M(E1), M(E2) and M(E) linked by λ-transitions.]
Easy to understand and build up.
Too many λ-transitions.

Position Automata - another automaton construction
Proposed independently by Glushkov and by McNaughton and Yamada in 1960.
The construction is based on the positions of the characters in a given regular expression.

Position Automata - an example
E = (a + b)* c (a + b)
Marked expression: (1 + 2)* 3 (4 + 5)
[Figure: the position automaton is built step by step. State 0 is the start state and states 1 to 5 correspond to the five positions of the marked expression. Each transition into a position is then relabelled with the character at that position, so that all three transitions into state 3 carry the label c.]

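Position Automata
The construction looks nice!
All in-transitions of a state have the same label.
The number of states is |E| + 1, where |E| is the number of character occurrences in E.
Fewer states than the Thompson automaton and thus usually faster!
E = (a + b)* c (a + b)
[Figure: the position automaton of E, with states 0 to 5; all three transitions into state 3 are labelled c.]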
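The position construction can be sketched with the usual first, last and follow sets. The code below is an illustrative Python sketch, not the author's implementation: the tuple encoding of expressions and the function name position_automaton are assumptions, and the example expression is (a + b)* c (a + b), which is how the slide example is read here.

```python
from typing import Dict, Set, Tuple

# Assumed expression encoding (hypothetical, for illustration only):
# ("sym", c), ("alt", e, f), ("cat", e, f), ("star", e)


def position_automaton(expr):
    """Return (symbols, delta, finals) of the position automaton of expr.

    State 0 is the start state; state p >= 1 is the p-th character
    occurrence of expr, counted from left to right.
    """
    symbols: Dict[int, str] = {}      # position -> its character
    follow: Dict[int, Set[int]] = {}  # position -> positions that may follow it

    def visit(e):
        """Return (nullable, first, last) of the subexpression e."""
        kind = e[0]
        if kind == "sym":
            p = len(symbols) + 1
            symbols[p] = e[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "alt":
            n1, f1, l1 = visit(e[1])
            n2, f2, l2 = visit(e[2])
            return n1 or n2, f1 | f2, l1 | l2
        if kind == "cat":
            n1, f1, l1 = visit(e[1])
            n2, f2, l2 = visit(e[2])
            for p in l1:  # the end of e[1] can be followed by the start of e[2]
                follow[p] |= f2
            first = (f1 | f2) if n1 else f1
            last = (l1 | l2) if n2 else l2
            return n1 and n2, first, last
        if kind == "star":
            _, f, l = visit(e[1])
            for p in l:  # the end of e[1] can loop back to its start
                follow[p] |= f
            return True, f, l
        raise ValueError(f"unknown node kind: {kind!r}")

    nullable, first, last = visit(expr)
    follow[0] = set(first)
    finals = set(last) | ({0} if nullable else set())
    delta: Dict[Tuple[int, str], Set[int]] = {}
    for p, successors in follow.items():
        for q in successors:
            # every transition into q is labelled with the character at q
            delta.setdefault((p, symbols[q]), set()).add(q)
    return symbols, delta, finals


A_OR_B = ("alt", ("sym", "a"), ("sym", "b"))
E = ("cat", ("cat", ("star", A_OR_B), ("sym", "c")), A_OR_B)  # (a + b)* c (a + b)
symbols, delta, finals = position_automaton(E)
print(len(symbols) + 1)  # 6 states, i.e. |E| + 1
print(sorted(p for (p, c), qs in delta.items() if 3 in qs))  # [0, 1, 2]
```

Because a transition is always labelled with the character of its target position, the property that all in-transitions of a state share one label holds by construction.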

Where do position automata lead us?

One-Unambiguous Regular Languages
Proposed by Brüggemann-Klein and Wood.
A regular language L is one-unambiguous if there is a regular expression E such that L = L(E) and the position automaton of E is deterministic.
Given a one-unambiguous regular expression E and an input string w, we can read w using one-symbol lookahead with respect to E.
Example: E = SEO(UL)*N
[Figure: the input SEOULULN is read one character at a time; each character matches exactly one position of E.]
Not all regular expressions are one-unambiguous, for example E = SEO(UL)*UNI.
Not all regular languages are one-unambiguous: some regular languages cannot be defined by any one-unambiguous regular expression, e.g. L((a + b)* a (a + b)^k), k ≥ 1.
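Whether the position automaton of an expression is deterministic can be read off its follow sets: no state may have two successor positions that carry the same character. The sketch below is an illustration with hand-written tables for SEO(UL)*N and SEO(UL)*UNI; the function name conflicts, the table names and the left-to-right position numbering are assumptions made for this example.

```python
def conflicts(symbols, follow):
    """List (state, character, positions) triples that break determinism."""
    found = []
    for state in sorted(follow):
        by_char = {}
        for q in sorted(follow[state]):
            by_char.setdefault(symbols[q], []).append(q)
        for char, targets in by_char.items():
            if len(targets) > 1:  # two candidate positions for one input symbol
                found.append((state, char, targets))
    return found


# SEO(UL)*N with positions S1 E2 O3 U4 L5 N6 (state 0 is the start state).
SEOUL_N_SYMBOLS = {1: "S", 2: "E", 3: "O", 4: "U", 5: "L", 6: "N"}
SEOUL_N_FOLLOW = {0: {1}, 1: {2}, 2: {3}, 3: {4, 6}, 4: {5}, 5: {4, 6}, 6: set()}

# SEO(UL)*UNI with positions S1 E2 O3 U4 L5 U6 N7 I8.
SEOUL_UNI_SYMBOLS = {1: "S", 2: "E", 3: "O", 4: "U", 5: "L", 6: "U", 7: "N", 8: "I"}
SEOUL_UNI_FOLLOW = {
    0: {1}, 1: {2}, 2: {3}, 3: {4, 6}, 4: {5}, 5: {4, 6}, 6: {7}, 7: {8}, 8: set(),
}

print(conflicts(SEOUL_N_SYMBOLS, SEOUL_N_FOLLOW))      # []
print(conflicts(SEOUL_UNI_SYMBOLS, SEOUL_UNI_FOLLOW))  # [(3, 'U', [4, 6]), (5, 'U', [4, 6])]
```

In the second expression, a U read after O or after L could be the U inside the loop or the U of UNI, so one symbol of lookahead does not determine the position.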

One-Unambiguous Regular Languages - XML DTD example
<?xml version="1.0"?>
<!DOCTYPE BOOK [
<!ELEMENT p (#PCDATA)>
<!ELEMENT BOOK (OPENER,SUBTITLE?,INTRODUCTION?,(SECTION | PART)+)>
<!ELEMENT OPENER (TITLE_TEXT)*>
<!ELEMENT TITLE_TEXT (#PCDATA)>
<!ELEMENT SUBTITLE (#PCDATA)>
<!ELEMENT INTRODUCTION (HEADER, p+)+>
<!ELEMENT PART (HEADER, CHAPTER+)>
<!ELEMENT SECTION (HEADER, p+)>
<!ELEMENT HEADER (#PCDATA)>
<!ELEMENT CHAPTER (CHAPTER_NUMBER, CHAPTER_TEXT)>
<!ELEMENT CHAPTER_NUMBER (#PCDATA)>
<!ELEMENT CHAPTER_TEXT (p)+>
]>
The parenthesized expressions in the ELEMENT declarations are content models. For example:
BOOK ::= OPENER (SUBTITLE + λ) (INTRODUCTION + λ) (SECTION + PART)(SECTION + PART)*
This is a one-unambiguous regular expression!!

One-Unambiguous Regular Languages vs XML DTD
Regular expressions for content models of DTDs are one-unambiguous.
XML DTDs are LL(1) grammars [Wood 96].
LL(k) grammars have a proper hierarchy [AU 72].
k-unambiguous regular languages??
We have k-lookahead for processing an input string.
[Figure: the input XMLINSTANCE is read with a 6-symbol lookahead window that starts at the current state.]

k-lookahead Regular Languages
Two ways of defining k-lookahead regular languages:
The first is based on a lookahead of at most k ≥ 1 symbols to determine the next, at most one, matching position in a given regular expression: deterministic k-lookahead regular expressions.
The second is similar, except that when we use a lookahead of k symbols, we must match the next k positions uniquely: k-block-deterministic regular expressions.

Deterministic k-lookahead regular languages
[Figure: at state q_i, the k-lookahead window covers input positions i+1 to i+k. After reading the symbol at position i+1, the automaton is at state q_{i+1} and the window covers positions i+2 to i+k+1.]

Deterministic k-lookahead regular languages
A regular language L is deterministic k-lookahead if there is a deterministic k-lookahead regular expression for L.
A regular expression is deterministic k-lookahead if its position automaton is deterministic k-lookahead.
Example: E = (a + b)* a
[Figure: the position automaton of E, with states 0 to 3, run step by step on an input terminated by the end marker #.]
E is deterministic 2-lookahead.

Deterministic k-lookahead regular languages
Thm. L((a + b)* a (a + b)^k), for k ≥ 0, is deterministic (k+1)-lookahead.
[Figure: example automata for k = 1 and k = 2.]
There exists a hierarchy for deterministic k-lookahead regular languages.
[Figure: nested language classes for lookahead k, k-1, k-2, k-3, ..., 1.]

k-block-deterministic regular languages
[Figure: at state q_i, the k-lookahead window covers input positions i+1 to i+k. After reading the whole block from i+1 to i+k, the next k-lookahead window starts at position i+k+1.]

k-block-deterministic regular languages
We define a regular language L to be k-block-deterministic if there exists a k-block automaton A = (Q, Σ, Γ, δ, s, F) that satisfies the following conditions:
1. A is a position automaton over Γ.
2. A is a deterministic block automaton.
3. L = L(A).
It is easy to verify that a position automaton A for a 1-deterministic regular language is 1-block-deterministic.

k-block-deterministic regular languages
Thm. There is a proper hierarchy in k-block-deterministic regular languages.
Sketch of Proof. A (k-1)-block-deterministic regular language is k-block-deterministic by definition. Thus, it is enough to show that there is a k-block-deterministic regular language that is not (k-1)-block-deterministic.
[Figure: nested language classes for block size k, k-1, k-2, k-3, ..., 1.]

[Figure: an automaton A with states q1, q2, q3, q4 and q5.]

Two Ways...
Thm. k-block-deterministic regular languages are a proper subfamily of deterministic k-lookahead regular languages.
[Figure: k-block determinism contained in k-lookahead determinism.]
Generalizations of One-Deterministic Regular Languages, Yo-Sub Han and Derick Wood, Information and Computation, Vol. 206, 1117-1125, 2008

XML DTD vs XML Schema
There's no "vs": XML Schemas are much more flexible and powerful.
Thus, they are also much more difficult and confusing.
XML DTD: 1-lookahead determinism
XML Schema: k-lookahead determinism?

Pattern Matching - an application of regular languages
Given a regular expression pattern P and a text T, find all substrings of T that are in L(P).
T = AGCTAATCCCTGAGAGTCCAGTTAGTCCCAT
P = T(AG + C)*T
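The problem statement can be made concrete with a brute-force check of every substring. This is only a baseline sketch using Python's re module, not one of the algorithms discussed in the talk; the name all_matches is an assumption, the pattern is written in Python syntax (| for union), and the star in T(AG + C)*T is how the slide example is read here.

```python
import re

TEXT = "AGCTAATCCCTGAGAGTCCAGTTAGTCCCAT"
PATTERN = r"T(AG|C)*T"  # the slide's P = T(AG + C)*T in Python regex syntax


def all_matches(pattern, text):
    """Return every half-open (start, end) with text[start:end] in L(pattern)."""
    compiled = re.compile(pattern)
    return [
        (i, j)
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if compiled.fullmatch(text[i:j])
    ]


for i, j in all_matches(PATTERN, TEXT):
    print(i, j, TEXT[i:j])
# 6 11 TCCCT
# 16 22 TCCAGT
# 21 23 TT
# 22 26 TAGT
```

The double loop tests a quadratic number of substrings, which is why the automaton-based scans that report positions in a single pass over T are of interest.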

Pattern Matching
New domains: Web, bioinformatics, huge databases, images or source code

Pattern Matching - related work
Given a text T and a regular expression E:
The recognition problem: we can report all end positions of matching substrings of T in O(mn) time [Aho] or in O(mn / log n) time [Myers].
The identification problem: we can report all (start, end) positions of matching substrings of T in O(mn^2) time [Aho].

Pattern Matching - recognition problem
Given E over Σ, we prepend Σ* to E; this allows matching to begin at any position in T.
[Figure: an automaton for Σ*E scanning an example text T.]
Given E and T, we can find all end positions of matching substrings of T in O(mn) time using O(m) space, where |E| = m and |T| = n [Aho].
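The effect of prepending Σ* can be sketched by re-adding the start state to the current state set before every input symbol. The Python code below is a minimal illustration under stated assumptions: the automaton is λ-free and stored as a dictionary, PATTERN_DELTA is a hand-coded position automaton for T(AG + C)*T (positions T1 A2 G3 C4 T5), and the function name end_positions is made up for this example.

```python
TEXT = "AGCTAATCCCTGAGAGTCCAGTTAGTCCCAT"

# Hand-coded position automaton of T(AG + C)*T; state 0 is the start state.
PATTERN_DELTA = {
    (0, "T"): {1},
    (1, "A"): {2}, (1, "C"): {4}, (1, "T"): {5},
    (2, "G"): {3},
    (3, "A"): {2}, (3, "C"): {4}, (3, "T"): {5},
    (4, "A"): {2}, (4, "C"): {4}, (4, "T"): {5},
}
PATTERN_FINALS = {5}


def end_positions(delta, start, finals, text):
    """Return the 0-based indices at which some matching substring ends."""
    current = set()
    ends = []
    for j, symbol in enumerate(text):
        current.add(start)  # the prepended Σ*: a match may begin right here
        current = {q for p in current for q in delta.get((p, symbol), ())}
        if current & finals:
            ends.append(j)
    return ends


print(end_positions(PATTERN_DELTA, 0, PATTERN_FINALS, TEXT))  # [10, 21, 22, 25]
```

The state set never holds more states than the automaton has, which is where the O(m) space bound comes from.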

Pattern Matching - identification problem
Given E over Σ, we prepend Σ* to E; this allows matching to begin at any position in T.
For each matching end position found by the recognition step, we scan T backwards from that position with the reversed expression E^R to find the corresponding start positions.
[Figure: forward scan of an example text T with E, followed by backward scans with E^R from each matching end position.]
Running time = (number of matching end positions) × O(mn) = O(n) × O(mn) = O(mn^2).
We can solve the identification problem in O(mn^2) worst-case time using O(m) space [Aho].
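The two-phase idea, a forward scan for end positions followed by one backward scan per end position, can be sketched as follows. This is an illustrative Python sketch, not the cited algorithm verbatim: the automaton encoding, the hand-coded position automaton PATTERN_DELTA for T(AG + C)*T, and the names reverse and identify are assumptions.

```python
TEXT = "AGCTAATCCCTGAGAGTCCAGTTAGTCCCAT"

# Hand-coded position automaton of T(AG + C)*T; state 0 is the start state.
PATTERN_DELTA = {
    (0, "T"): {1},
    (1, "A"): {2}, (1, "C"): {4}, (1, "T"): {5},
    (2, "G"): {3},
    (3, "A"): {2}, (3, "C"): {4}, (3, "T"): {5},
    (4, "A"): {2}, (4, "C"): {4}, (4, "T"): {5},
}
PATTERN_FINALS = {5}


def reverse(delta):
    """Flip every transition p --c--> q into q --c--> p (automaton for E^R)."""
    rev = {}
    for (p, symbol), targets in delta.items():
        for q in targets:
            rev.setdefault((q, symbol), set()).add(p)
    return rev


def identify(delta, start, finals, text):
    """Return all inclusive (start, end) index pairs of matching substrings."""
    rev = reverse(delta)
    matches = []
    current = set()
    for j, symbol in enumerate(text):
        current.add(start)  # forward scan with the prepended Σ*
        current = {q for p in current for q in delta.get((p, symbol), ())}
        if not current & finals:
            continue
        back = set(finals)  # backward scan from the end position j
        for i in range(j, -1, -1):
            back = {p for q in back for p in rev.get((q, text[i]), ())}
            if not back:
                break
            if start in back:
                matches.append((i, j))
    return sorted(matches)


print(identify(PATTERN_DELTA, 0, PATTERN_FINALS, TEXT))
# [(6, 10), (16, 21), (21, 22), (22, 25)]
```

Each backward scan can run over the whole prefix of T, and there can be up to n end positions, which gives the quadratic factor in the running time.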

Prefix and Infix
Given two strings x and y over Σ, we say that
x is a prefix of y if there exists z ∈ Σ* such that xz = y;
x is an infix of y if there exist u, v ∈ Σ* such that uxv = y; we often call x a substring of y.
Example: for y = seoul, seo is a prefix of y and eou is an infix of y.
We define a pattern P to be
prefix-free if no string in P is a prefix of any other string in P;
infix-free if no string in P is an infix of any other string in P.
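For a finite set of strings, both definitions translate directly into a pairwise check. The Python sketch below is only meant to make the definitions concrete; the function names are assumptions, and it does not apply to infinite languages given by a regular expression.

```python
def is_prefix_free(strings):
    """True if no string in the set is a prefix of another string in the set."""
    words = set(strings)
    return not any(x != y and y.startswith(x) for x in words for y in words)


def is_infix_free(strings):
    """True if no string in the set is a substring of another string in the set."""
    words = set(strings)
    return not any(x != y and x in y for x in words for y in words)


print(is_prefix_free({"seo", "seoul"}))  # False: seo is a prefix of seoul
print(is_prefix_free({"eou", "seoul"}))  # True
print(is_infix_free({"eou", "seoul"}))   # False: eou is an infix of seoul
```

Every prefix is also an infix, so an infix-free set is always prefix-free, while the second example shows that the converse fails.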

Infix-free Regular-Expression Matching
[Figure: Venn diagram with L_IN inside L_PRE inside L_REG (infix-free, prefix-free and regular languages).]
Given an infix-free regular expression E and a text T:
[Figure: the recognition process scans T (positions 1 to 12) with E and marks the end positions of matching substrings; a second scan with the reversed expression E^R marks the start positions.]
Because of infix-freeness, each pair of (start, end) marks, taken from left to right, must be a matching substring.
We can find all matching substrings in O(mn) time [HWW07].
Prefix-Free Regular Languages and Pattern Matching, Yo-Sub Han, Yajun Wang and Derick Wood, Theoretical Computer Science, Vol. 389, 307-317, 2007
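The pairing argument can be sketched with two linear scans: one forward scan for end positions and one scan of the reversed text with the reversed automaton for start positions. The code below is an illustrative Python sketch and not the implementation from [HWW07]; the helper hits, the function infix_free_matches and the hand-coded position automaton for T(AG + C)*T are assumptions. That pattern is infix-free because every string in its language contains T only as its first and last character.

```python
TEXT = "AGCTAATCCCTGAGAGTCCAGTTAGTCCCAT"

# Hand-coded position automaton of T(AG + C)*T; state 0 is the start state.
PATTERN_DELTA = {
    (0, "T"): {1},
    (1, "A"): {2}, (1, "C"): {4}, (1, "T"): {5},
    (2, "G"): {3},
    (3, "A"): {2}, (3, "C"): {4}, (3, "T"): {5},
    (4, "A"): {2}, (4, "C"): {4}, (4, "T"): {5},
}
PATTERN_FINALS = {5}


def hits(delta, sources, targets, symbols):
    """Indices k where a run begun in sources at some index <= k reaches targets."""
    current = set()
    found = []
    for k, symbol in enumerate(symbols):
        current |= sources  # a run may begin at any index
        current = {q for p in current for q in delta.get((p, symbol), ())}
        if current & targets:
            found.append(k)
    return found


def infix_free_matches(delta, start, finals, text):
    """Pair start and end positions left to right (valid for infix-free patterns)."""
    ends = hits(delta, {start}, finals, text)
    reversed_delta = {}
    for (p, symbol), targets in delta.items():
        for q in targets:
            reversed_delta.setdefault((q, symbol), set()).add(p)
    n = len(text)
    backward = hits(reversed_delta, finals, {start}, text[::-1])
    starts = sorted(n - 1 - k for k in backward)  # map back to indices in text
    return list(zip(starts, ends))


print(infix_free_matches(PATTERN_DELTA, 0, PATTERN_FINALS, TEXT))
# [(6, 10), (16, 21), (21, 22), (22, 25)]
```

Matches of an infix-free pattern cannot nest, so the sorted start positions and the sorted end positions line up one to one; for a pattern that is not infix-free this pairing would be wrong.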

Prefix-free Regular-Expression Matching
$L_{IN} \subset L_{PRE} \subset L_{REG}$
If $E$ is infix-free, we have an $O(mn)$ running time algorithm.
If $E$ is a (normal) regular expression, we have an $O(mn^2)$ running time algorithm.
If $E$ is prefix-free, then there are at most $n$ matching substrings of $T$ that belong to $L(E)$, where $n$ is the size of $T$.
[Figure: a text T with positions up to 13 and two candidate matches that start at the same position.]
If two matching substrings started at the same position of $T$, the shorter one would be a prefix of the longer one. This contradicts that $L(E)$ is prefix-free.
Can we have an $O(mn)$ time algorithm?
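The bound of at most n matching substrings can be checked on a small case. The sketch below is not the algorithm of [HWW07]. It is a naive matcher under stated assumptions: the pattern a b* c is a hypothetical prefix-free example, written by hand as a partial DFA, and the names `DELTA`, `START`, `ACCEPT` and `prefix_free_matches` are illustrative. Because the language is prefix-free, the scan from each start position can stop at the first accepting state, so at most one match starts at each position.

```python
# Hypothetical prefix-free pattern a b* c as a partial DFA
# (a missing entry means the dead state).
DELTA = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
START, ACCEPT = 0, {2}


def prefix_free_matches(text):
    """Return (start, end) pairs such that text[start:end] is in the language."""
    matches = []
    for start in range(len(text)):
        state = START
        for end in range(start, len(text)):
            state = DELTA.get((state, text[end]))
            if state is None:
                # Dead state: nothing in the language starts here.
                break
            if state in ACCEPT:
                # Prefix-freeness: no longer match can start at `start`.
                matches.append((start, end + 1))
                break
    return matches


print(prefix_free_matches("abcabbcac"))  # [(0, 3), (3, 7), (7, 9)]
```

This naive scan can take quadratic time in the text length; it only illustrates why the number of matches is bounded by n.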

Prefix-free Regular-Expression Matching
Sketch of our algorithm:
[Figure: text T with positions 1 to 15, scanned by E and by E^R; labels: "the recognition process", "parallel processing starts", "set of states for 12", "set of states for 15".]
Running time = (number of matching end positions) $\times\, O(mn) = O(n) \times O(mn) = O(mn^2)$.
Because of prefix-freeness, no two processes can have the same state of $E$ at the same time. This implies that a single reverse scan is enough to find the corresponding start positions for each end position.

Prefix-free Regular-Expression Matching
Given a prefix-free regular expression $E$ and a text $T$, we can identify all matching substrings of $T$ that belong to $L(E)$ in $O(mn)$ worst-case time [HWW07].
[HWW07] Prefix-Free Regular Languages and Pattern Matching, Yo-Sub Han, Yajun Wang and Derick Wood, Theoretical Computer Science, Vol. 389, 307-317, 2007.

State Complexity
What is the state complexity of a regular language $L$?
State complexity is a descriptional complexity of $L$.
$L$ has a unique minimal DFA $A$.
We define the state complexity of $L$ to be the number of states in $A$.
We can estimate the needed resources.
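As an illustration of this definition, the state complexity of a language can be computed from any complete DFA for it by discarding unreachable states and merging equivalent ones. The sketch below uses Moore-style partition refinement, which is one standard way to do this and is not taken from the slides; the function name `state_complexity` and the example DFA are assumptions made for the illustration. The sink state, if any, is counted, as in the rest of these slides.

```python
def state_complexity(alphabet, delta, start, finals):
    """Number of states of the minimal complete DFA equivalent to the input DFA.

    `delta` maps (state, letter) to a state and must be total.
    """
    # Step 1: keep only the states reachable from the start state.
    reachable, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for x in alphabet:
            t = delta[(s, x)]
            if t not in reachable:
                reachable.add(t)
                stack.append(t)
    # Step 2: refine the partition {final, non-final} until it is stable.
    cls = {s: int(s in finals) for s in reachable}
    count = len(set(cls.values()))
    while True:
        ids, new_cls = {}, {}
        for s in reachable:
            # Signature: own class plus the classes of all successors.
            sig = (cls[s],) + tuple(cls[delta[(s, x)]] for x in alphabet)
            new_cls[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        cls, count = new_cls, len(ids)


# A redundant 4-state DFA for "words over {a, b} that end with a".
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 3, (1, "b"): 0,
    (2, "a"): 3, (2, "b"): 2,
    (3, "a"): 1, (3, "b"): 0,
}
print(state_complexity("ab", delta, 0, {1, 3}))  # 2
```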

State Complexity Problem
Given two (arbitrary) regular languages $L_1$ and $L_2$, what is the state complexity of $L_1 \cap L_2$?
Upper bound: at most $m_1 m_2 = f(m_1, m_2)$.
Lower bound: present two (general) $L_1$ and $L_2$ such that the state complexity of $L_1 \cap L_2$ always reaches the upper bound.
Tight bound: UB = LB, the state complexity of the intersection of two regular languages is $f(m_1, m_2)$.
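The lower-bound step can be sketched in code: pick candidate witness languages, build the product DFA for the intersection, minimize it, and compare the size with $m_1 m_2$. The witnesses below (the number of a's is divisible by $m_1$, the number of b's is divisible by $m_2$) are a well-known choice assumed here for illustration and are not taken from the slides; all function names are hypothetical.

```python
def mod_counter(letter, k, alphabet):
    """Complete k-state DFA: accepts words whose number of `letter` is 0 mod k."""
    delta = {(s, x): ((s + 1) % k if x == letter else s)
             for s in range(k) for x in alphabet}
    return range(k), delta, 0, {0}


def intersection_product(alphabet, a, b):
    """Product DFA for L(a) ∩ L(b): a pair is final when both components are."""
    (q1, d1, s1, f1), (q2, d2, s2, f2) = a, b
    delta = {((p, q), x): (d1[(p, x)], d2[(q, x)])
             for p in q1 for q in q2 for x in alphabet}
    finals = {(p, q) for p in f1 for q in f2}
    return delta, (s1, s2), finals


def minimal_size(alphabet, delta, start, finals):
    """Size of the minimal complete DFA (reachability + partition refinement)."""
    reachable, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for x in alphabet:
            t = delta[(s, x)]
            if t not in reachable:
                reachable.add(t)
                stack.append(t)
    cls = {s: int(s in finals) for s in reachable}
    count = len(set(cls.values()))
    while True:
        ids, new_cls = {}, {}
        for s in reachable:
            sig = (cls[s],) + tuple(cls[delta[(s, x)]] for x in alphabet)
            new_cls[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        cls, count = new_cls, len(ids)


def witness_size(m1, m2):
    a = mod_counter("a", m1, "ab")
    b = mod_counter("b", m2, "ab")
    return minimal_size("ab", *intersection_product("ab", a, b))


print(witness_size(3, 4))  # 12, which meets the upper bound 3 * 4
```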

State Complexity - Motivation
[Timeline: 1970s, 2011]
In recent years, there have been many new applications of FAs, such as in natural language and speech processing, software engineering, and image generation and encoding, that need a large number of states.
The Bell Labs multilingual TTS system: 26.6MB for German, 30.0MB for French and 39.0MB for Chinese.

State Complexity - Motivation
New helper: FA manipulation software systems such as Grail+, Automate and FireLite.
We calculate the upper bound.
We guess a lower bound and verify it (helper), and repeat this step until we find a matching lower bound.

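State Complexity
operation | finite languages | regular languages
$L_1 \cup L_2$ | $O(mn)$ | $mn$
$L_1 \cap L_2$ | $O(mn)$ | $mn$
$\Sigma^* \setminus L_1$ | $m$ | $m$
$L_1 \cdot L_2$ | $(m - n + 3)2^{n-2} - 1$ | $(2m - 1)2^{n-1}$
$L_1^*$ | $2^{m-3} + 2^{m-4}$, for $m \ge 4$ | $2^{m-1} + 2^{m-2}$
$L_1^R$ | $3 \cdot 2^{p-1} - 1$ if $m = 2p$; $2^p - 1$ if $m = 2p - 1$ | $2^m$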

Union of Finite Languages
Given two minimal DFAs $A$ and $B$ for non-empty finite languages $L_1$ and $L_2$, we can construct a DFA for $L(A) \cup L(B)$ based on the Cartesian product of states as follows:
Let $A = (Q_1, \Sigma, \delta_1, s_1, F_1)$ and $B = (Q_2, \Sigma, \delta_2, s_2, F_2)$.
$M = (Q_1 \times Q_2, \Sigma, \delta, (s_1, s_2), F)$, where for all $p \in Q_1$, $q \in Q_2$ and $a \in \Sigma$,
$\delta((p, q), a) = (\delta_1(p, a), \delta_2(q, a))$ and $F = (F_1 \times Q_2) \cup (Q_1 \times F_2)$.
$M$ is deterministic.
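The construction can be written down directly. The following is a minimal sketch, assuming complete DFAs stored as dictionaries; the two example languages {ab} and {a, bb} and all helper names (`complete`, `union_product`, `accepts`) are chosen for illustration and do not appear in the slides.

```python
def complete(alphabet, n_states, edges, sink):
    """Total transition table: every move not listed in `edges` goes to `sink`."""
    delta = {(s, x): sink for s in range(n_states) for x in alphabet}
    delta.update(edges)
    return delta


def union_product(alphabet, a, b):
    """Product DFA M with F = (F1 x Q2) ∪ (Q1 x F2)."""
    (q1, d1, s1, f1), (q2, d2, s2, f2) = a, b
    states = [(p, q) for p in q1 for q in q2]
    delta = {((p, q), x): (d1[(p, x)], d2[(q, x)])
             for (p, q) in states for x in alphabet}
    finals = {(p, q) for (p, q) in states if p in f1 or q in f2}
    return states, delta, (s1, s2), finals


def accepts(dfa, word):
    _, delta, state, finals = dfa
    for x in word:
        state = delta[(state, x)]
    return state in finals


# A accepts {ab}; B accepts {a, bb}. State 3 is the sink in both.
A = (range(4), complete("ab", 4, {(0, "a"): 1, (1, "b"): 2}, 3), 0, {2})
B = (range(4), complete("ab", 4, {(0, "a"): 2, (0, "b"): 1, (1, "b"): 2}, 3), 0, {2})
M = union_product("ab", A, B)

words = ["", "a", "b", "ab", "bb", "aa", "abb"]
print([w for w in words if accepts(M, w)])  # ['a', 'ab', 'bb']
```

The product has all 16 pair-states; the slides that follow count how many of them can survive minimization.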

Union of Finite Languages - Cartesian Product of States
[Figure: grid of the product states, from (1,1), (1,2), ..., (1,n) in the first row down to (m,1), ..., (m,n) in the last row.]
The $(m-1)$th state in $A$ is the final state whose out-transitions go to the sink state, the $m$th state.
For state $(i, j)$ in $M$, $L_{i,j}(M) = L_i(A) \cup L_j(B)$.
The states $(1, j)$ for $j \ge 2$ and $(i, 1)$ for $i \ge 2$, which are $m + n - 2$ states in total, are unreachable from state $(1,1)$ since $A$ and $B$ are non-returning.
The states $(m-1, n-1)$, $(m-1, n)$ and $(m, n-1)$ are equivalent since $L_{m-1,n-1} = L_{m-1,n} = L_{m,n-1} = \{\lambda\}$, so merging them saves 2 states.
Lemma 1. $mn - (m + n - 2) - 2 = mn - (m + n)$ states are sufficient for $L(A) \cup L(B)$.

Union of Finite Languages
Lemma 1. $mn - (m + n)$ states are sufficient for $L(A) \cup L(B)$.
The next question is whether or not the bound is reachable in general.
The answer is YES and NO.

Union of Finite Languages
Lemma 2. The upper bound $mn - (m + n)$ cannot be reached with a fixed alphabet when $m$ and $n$ are arbitrarily large.
Proof. Let $A$ have states $\{p_0, p_1, \ldots, p_{m-1}\}$ and $B$ have states $\{q_0, q_1, \ldots, q_{n-1}\}$, and let $t = |\Sigma|$. We order the states such that if $p_j$ is reachable from $p_i$, then $i < j$. Let $i \in \{1, \ldots, m-1\}$. Any string that reaches $p_i$ from $p_0$ can go through only the states $p_1, \ldots, p_{i-1}$ in between and cannot visit the same state twice. Hence, there are at most
$t + t^2 + \cdots + t^i = \frac{t(t^i - 1)}{t - 1} =: D(i)$
strings that can reach $p_i$ from $p_0$.
Since $M$ is deterministic, for any fixed $i$ with $1 \le i < m - 1$, at most $D(i)$ of the pair-states $(p_i, q_j)$ are reachable from $(p_0, q_0)$ in $M$. Thus, if $n - 2 > D(i)$, then some pair-states with $p_i$ as the first component are not reachable. Therefore, the bound $mn - (m + n)$ is not reachable.
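A worked instance of the counting bound may help. It assumes that $t$ denotes the alphabet size $|\Sigma|$ and picks the values $t = 2$ and $i = 3$ purely for illustration:

```latex
D(i) \;=\; \sum_{k=1}^{i} t^{k} \;=\; \frac{t\,(t^{i} - 1)}{t - 1},
\qquad
D(3)\big|_{t=2} \;=\; 2 + 4 + 8 \;=\; \frac{2\,(2^{3} - 1)}{2 - 1} \;=\; 14 .
```

With a binary alphabet, at most 14 of the pair-states $(p_3, q_j)$ can be reachable, so as soon as $n - 2 > 14$ some of them are missing and the bound $mn - (m + n)$ cannot be met.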

Union of Finite Languages
Lemma 2. The upper bound $mn - (m + n)$ cannot be reached with a fixed alphabet when $m$ and $n$ are arbitrarily large.
What if the size of the alphabet is NOT fixed?
Lemma 3. The upper bound $mn - (m + n)$ is reachable if the size of the alphabet can depend on $m$ and $n$.

Union of Finite Languages
Lemma 3. The upper bound $mn - (m + n)$ is reachable if the size of the alphabet can depend on $m$ and $n$.
We prove the lemma by presenting two finite languages whose union reaches the bound.
Let $\Sigma = \{b, c\} \cup \{a_{i,j} \mid 1 \le i \le m-2,\ 1 \le j \le n-2 \text{ and } (i, j) \ne (m-2, n-2)\}$.
Let $A = (Q_1, \Sigma, \delta_1, p_0, \{p_{m-2}\})$, where $Q_1 = \{p_0, p_1, \ldots, p_{m-1}\}$ and $\delta_1$ is defined as follows:
$\delta_1(p_i, b) = p_{i+1}$, for $0 \le i \le m-2$.
$\delta_1(p_0, a_{i,j}) = p_i$, for $1 \le i \le m-2$ and $1 \le j \le n-2$, $(i, j) \ne (m-2, n-2)$.
Let $B = (Q_2, \Sigma, \delta_2, q_0, \{q_{n-2}\})$, where $Q_2 = \{q_0, q_1, \ldots, q_{n-1}\}$ and $\delta_2$ is defined as follows:
$\delta_2(q_i, c) = q_{i+1}$, for $0 \le i \le n-2$.
$\delta_2(q_0, a_{i,j}) = q_j$, for $1 \le j \le n-2$ and $1 \le i \le m-2$, $(i, j) \ne (m-2, n-2)$.

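Union of Finite Languages
[Figure, DFA A: states 0 to 5 connected in a chain by b; from state 0, the letters $a_{11}, a_{12}, a_{13}$ lead to state 1, $a_{21}, a_{22}, a_{23}$ to state 2, $a_{31}, a_{32}, a_{33}$ to state 3, and $a_{41}, a_{42}$ to state 4.]
[Figure, DFA B: states 0 to 4 connected in a chain by c; from state 0, the letters $a_{11}, a_{21}, a_{31}, a_{41}$ lead to state 1, $a_{12}, a_{22}, a_{32}, a_{42}$ to state 2, and $a_{13}, a_{23}, a_{33}$ to state 3.]
An example of two minimal DFAs for finite languages whose sizes are 6 and 5, respectively, where state 5 above and state 4 below are sink states.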
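The example can be checked mechanically. The sketch below builds the two DFAs of Lemma 3, forms the reachable part of the product for the union, minimizes it by partition refinement, and counts the states. It assumes that every transition not listed in the definition goes to the sink state (as the figure suggests), encodes the letter $a_{i,j}$ as the tuple `("a", i, j)`, and uses hypothetical function names.

```python
def lemma3_dfas(m, n):
    """Complete DFAs A (m states) and B (n states); the last state is the sink."""
    indexed = [("a", i, j) for i in range(1, m - 1) for j in range(1, n - 1)
               if (i, j) != (m - 2, n - 2)]
    alphabet = ["b", "c"] + indexed
    # Assumption: all unspecified transitions lead to the sink.
    delta_a = {(p, x): m - 1 for p in range(m) for x in alphabet}
    delta_b = {(q, x): n - 1 for q in range(n) for x in alphabet}
    for p in range(m - 1):
        delta_a[(p, "b")] = p + 1      # chain p_0 -> ... -> p_{m-1} on b
    for q in range(n - 1):
        delta_b[(q, "c")] = q + 1      # chain q_0 -> ... -> q_{n-1} on c
    for (_, i, j) in indexed:
        delta_a[(0, ("a", i, j))] = i  # a_{i,j} sends p_0 to p_i
        delta_b[(0, ("a", i, j))] = j  # a_{i,j} sends q_0 to q_j
    return alphabet, (delta_a, {m - 2}), (delta_b, {n - 2})


def minimal_union_size(alphabet, a, b):
    """Number of states of the minimal DFA for L(A) ∪ L(B)."""
    (da, fa), (db, fb) = a, b

    def step(s, x):
        return (da[(s[0], x)], db[(s[1], x)])

    reachable, stack = {(0, 0)}, [(0, 0)]
    while stack:
        s = stack.pop()
        for x in alphabet:
            t = step(s, x)
            if t not in reachable:
                reachable.add(t)
                stack.append(t)
    cls = {s: int(s[0] in fa or s[1] in fb) for s in reachable}
    count = len(set(cls.values()))
    while True:
        ids, new_cls = {}, {}
        for s in reachable:
            sig = (cls[s],) + tuple(cls[step(s, x)] for x in alphabet)
            new_cls[s] = ids.setdefault(sig, len(ids))
        if len(ids) == count:
            return count
        cls, count = new_cls, len(ids)


print(minimal_union_size(*lemma3_dfas(6, 5)))  # 19 = 6*5 - (6+5)
```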

Union of Finite Languages
Let $L = L(A) \cup L(B)$. We show that there exists a set $R$ consisting of $mn - (m + n)$ strings over $\Sigma$ that are pairwise inequivalent modulo the right invariant congruence of $L$.
Let $R = R_1 \cup R_2 \cup R_3$, where
$R_1 = \{b^i \mid 0 \le i \le m-1\}$.
$R_2 = \{c^j \mid 1 \le j \le n-3\}$. (Note that $R_2$ does not include the strings $c^0$, $c^{n-2}$ and $c^{n-1}$.)
$R_3 = \{a_{i,j} \mid 1 \le i \le m-2 \text{ and } 1 \le j \le n-2 \text{ and } (i, j) \ne (m-2, n-2)\}$.
It is easy to verify that all strings in $R$ are pairwise inequivalent. (The complete proof is given in the proceedings.)
Then, $|R| = mn - (m + n)$.
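The size of $R$ follows from adding the sizes of the three parts, which are disjoint because they use different letters (and only $R_1$ contains the empty string $b^0$):

```latex
|R| \;=\; \underbrace{m}_{|R_1|} \;+\; \underbrace{(n - 3)}_{|R_2|}
      \;+\; \underbrace{(m-2)(n-2) - 1}_{|R_3|}
    \;=\; m + n - 3 + mn - 2m - 2n + 3
    \;=\; mn - (m + n).
```

For the example sizes $m = 6$ and $n = 5$ this gives $6 + 2 + 11 = 19$.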

Union of Finite Languages
Theorem 1. Given two minimal DFAs $A$ and $B$ for finite languages, $mn - (m + n)$ states are necessary and sufficient in the worst case for the minimal DFA of $L(A) \cup L(B)$, where $m = |A|$ and $n = |B|$.

Union of Finite Languages
Lemma 2 shows that the upper bound is unreachable if $|\Sigma|$ is fixed, whereas Lemma 3 shows that the upper bound is reachable if $|\Sigma|$ depends on $m$ and $n$.
Then, what is the state complexity of union with a fixed-size alphabet?

Union of Finite Languages
Lemma 4. There exist DFAs $A$ and $B$, with $m$ and $n$ states respectively, that recognize finite languages over a fixed alphabet $\Sigma$ such that the minimal DFA for $L(A) \cup L(B)$ requires $c \cdot (\min\{m, n\})^2$ states, for some constant $c$.
Proof. Let $s \ge 1$ be arbitrary and $r = \lceil \log s \rceil$. We define the finite language
$L_1 = \{w_1 w_2 \mid |w_1| = 2r,\ w_2 = odd(w_1) \in \{a, b\}^*,\ even(w_1) \in \{c, d\}^*\}$.
$L_1$ can be recognized by a DFA $A$ with at most $10s$ states.

Union of Finite Languages
$L_1 = \{w_1 w_2 \mid |w_1| = 2r,\ w_2 = odd(w_1) \in \{a, b\}^*,\ even(w_1) \in \{c, d\}^*\}$.
[Figure: an expanding tree for the $w_1$ part followed by a merging tree for the $w_2$ part; the transitions are labeled with letters from {a, b} and {c, d}.]
A DFA $A$ that recognizes $L_1$ when $r = 3$. We omit the sink state and its in-transitions.

Union of Finite Languages
Symmetrically, we define
$L_2 = \{w_1 w_2 \mid |w_1| = 2r,\ odd(w_1) \in \{a, b\}^*,\ w_2 = even(w_1) \in \{c, d\}^*\}$.
The language $L_2$ consists of strings $uv$, where $|u| = 2r$, the odd characters of $u$ are in $\{a, b\}$, the even characters of $u$ are in $\{c, d\}$, and $even(u)$ coincides with $v$.
By a similar argument, $L_2$ can be recognized by a DFA $B$ with at most $10s$ states.

Union of Finite Languages
Now let $L = L_1 \cup L_2$. Let $u_1$ and $u_2$ be distinct strings of length $2r$ such that $odd(u_i) \in \{a, b\}^*$ and $even(u_i) \in \{c, d\}^*$ for $i = 1, 2$.
If $odd(u_1) \ne odd(u_2)$: $u_1 \cdot odd(u_1) \in L_1 \subseteq L$ but $u_2 \cdot odd(u_1) \notin L$. Hence, $u_1$ and $u_2$ are not equivalent modulo the right invariant congruence of $L$.
If $even(u_1) \ne even(u_2)$: $u_1 \cdot even(u_1) \in L_2 \subseteq L$ but $u_2 \cdot even(u_1) \notin L$.
The above implies that the right invariant congruence of $L$ has at least $2^r \cdot 2^r \ge s^2$ different classes.
Therefore, if $m = n = 10s$ is the size of the minimal DFAs for the finite languages $L_1$ and $L_2$, then we know that the minimal DFA for $L = L_1 \cup L_2$ needs at least $\frac{1}{100} n^2$ states.
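The distinguishing argument can be replayed by brute force for small $r$. The sketch below enumerates every well-formed prefix $u$ of length $2r$ and checks that the suffix named in the proof separates each pair. It assumes that positions are counted from 1, so `odd(u)` takes the 1st, 3rd, ... characters; the function names are hypothetical.

```python
from itertools import product


def odd(u):
    """Characters at positions 1, 3, 5, ... (1-indexed)."""
    return u[0::2]


def even(u):
    """Characters at positions 2, 4, 6, ... (1-indexed)."""
    return u[1::2]


def well_formed(u, r):
    return (len(u) == 2 * r
            and set(odd(u)) <= set("ab")
            and set(even(u)) <= set("cd"))


def in_L(w, r):
    """Membership in L = L1 ∪ L2: w = u v with v = odd(u) or v = even(u)."""
    u, v = w[:2 * r], w[2 * r:]
    return well_formed(u, r) and v in (odd(u), even(u))


def count_inequivalent_prefixes(r):
    """Number of well-formed prefixes, after checking they are pairwise separated."""
    prefixes = ["".join(x for pair in zip(o, e) for x in pair)
                for o in product("ab", repeat=r)
                for e in product("cd", repeat=r)]
    for u1, u2 in product(prefixes, repeat=2):
        if u1 == u2:
            continue
        # Suffix chosen as in the proof: odd(u1) if the odd parts differ,
        # otherwise even(u1).
        suffix = odd(u1) if odd(u1) != odd(u2) else even(u1)
        if not in_L(u1 + suffix, r) or in_L(u2 + suffix, r):
            raise ValueError(f"{u1!r} and {u2!r} are not separated by {suffix!r}")
    return len(prefixes)


print(count_inequivalent_prefixes(2))  # 16 = 2^2 * 2^2
```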

RECAP
Structural properties of the k-lookahead determinism that might lead to an efficient XML Schema parser
Fast regular-expression pattern matching algorithms
State Complexity

Future Directions and Conclusions
Hierarchy of k-lookahead determinism
XML Schema parser
Regular-expression pattern matching system for source codes
Pattern matching + indexing
State complexity
[Diagram axis: pure theory, practical application]

THANK YOU
ANY QUESTIONS?