Streamed Validation of XML Documents

January 29, 2009

Structure

Preliminaries
  Unranked Trees
  Recognizable Languages
DTD Document Type Definition
  Simple DTDs
  Specialized DTDs
  Strong Validation
  Validating well-formed XML Documents
References

Unranked Trees: From XML to Unranked Trees

<bookCollection>
  <book>
    <title>The Lord of the Rings</title>
  </book>
  <book>
    <related>
      <title>The Lord of the Rings</title>
    </related>
    <title>The History of Middle-earth</title>
  </book>
</bookCollection>

The element structure of this document is the tree

bookCollection
├── book
│   └── title
└── book
    ├── related
    │   └── title
    └── title

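Abbreviating the tags by single letters (r = bookCollection, a = book, b = related, c = title) gives the tree

r
├── a
│   └── c
└── a
    ├── b
    │   └── c
    └── c

Formal representation: Σ = {r, a, b, c} and r(a(c()), a(b(c()), c())) = t ∈ T_Σ

String representation: r a c c̄ ā a b c c̄ b̄ c c̄ ā r̄ = [t] ∈ [T_Σ], where x̄ denotes the closing tag of x.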
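The step from a tree t to its string representation [t] can be sketched in a few lines of Python. The encoding is an assumption made for this illustration only: a tree is a (label, children) pair, and the closing tag x̄ is written "/x".

```python
from typing import List, Tuple

# Hypothetical encoding for this sketch: a tree is a (label, children) pair.
Tree = Tuple[str, list]


def string_representation(tree: Tree) -> List[str]:
    """Return [t] as a list of tags; the closing tag of x is written "/x"."""
    label, children = tree
    tokens = [label]                      # opening tag
    for child in children:                # children in document order
        tokens.extend(string_representation(child))
    tokens.append("/" + label)            # closing tag
    return tokens


# The tree t = r(a(c()), a(b(c()), c())) from the slide.
t = ("r", [("a", [("c", [])]),
           ("a", [("b", [("c", [])]), ("c", [])])])

print(" ".join(string_representation(t)))
```

The output lists the tags in exactly the order in which a streaming parser would see them, which is why validation on [t] is called streamed validation.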

Recognizable Languages: Myhill-Nerode Theorem

Let L be a language over an alphabet Σ. We define the Nerode relation ≡_L ⊆ Σ* × Σ* as follows: for every u, v ∈ Σ*:

u ≡_L v ⟺ ∀w ∈ Σ*: (uw ∈ L ⟺ vw ∈ L)

The Nerode relation partitions Σ* into equivalence classes.

Theorem (Myhill-Nerode Theorem)
A language L is recognizable iff the Nerode relation partitions Σ* into finitely many equivalence classes. [Baader, 2007]

Simple DTDs: DTD Document Type Definition

Definition
A DTD is a tuple (Σ, r, P) where Σ is an alphabet, r ∈ Σ is called the root label, and P ⊆ { a → R | a ∈ Σ, R ∈ Reg_Σ } is a finite set of so-called productions.

Notation:
D_d ... set of trees satisfying the DTD d
L(d) = [D_d] ... set of string representations of the trees in D_d

Simple DTDs: Example

A DTD which is satisfied by the tree r(a(c()), a(b(c()), c())) can be d = (Σ, r, P) where Σ = {r, a, b, c} and

P = {r → a*, a → bc + c, b → c, c → ε}

So L(d) = {r} {a c c̄ ā, a b c c̄ b̄ c c̄ ā}* {r̄}.
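Checking whether a tree satisfies a DTD amounts to matching, at every node, the word formed by the children's labels against the content model of the node's label. The following is a minimal sketch, assuming one-letter labels, the (label, children) tree encoding, and content models written as Python regular expressions (the "+" of the slide becomes "|"). The function name `satisfies` is made up for this illustration.

```python
import re

# Productions of the example DTD as Python regexes over one-letter labels.
PRODUCTIONS = {"r": "a*", "a": "bc|c", "b": "c", "c": ""}
ROOT = "r"


def satisfies(tree, productions=PRODUCTIONS, root=ROOT):
    """Return True iff `tree` has the root label and every node obeys its production."""

    def check(node):
        label, children = node
        model = productions.get(label)
        if model is None:                 # label without a production
            return False
        word = "".join(child[0] for child in children)
        if re.fullmatch(model, word) is None:
            return False
        return all(check(child) for child in children)

    return tree[0] == root and check(tree)


t = ("r", [("a", [("c", [])]),
           ("a", [("b", [("c", [])]), ("c", [])])])
print(satisfies(t))
```

This check works on the whole tree in memory. The slides that follow ask when the same decision can be made by a finite automaton reading only the tag stream.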

Specialized DTDs

Definition (specialized DTD)
A specialized DTD over Σ is a tuple d = (Σ, Σ', d', µ) where Σ and Σ' are alphabets, d' is a DTD over Σ', and µ: Σ' → Σ is a mapping.

Specialized DTDs: Example

A specialized DTD which is satisfied only by the tree r(a(c()), a(b(c()), c())) is d = (Σ, Σ', d', µ) where

Σ = {r, a, b, c},
Σ' = {r, x, y, b, c},
d' = (Σ', r, P'),
P' = {r → xy, x → c, y → bc, b → c, c → ε},
∀α ∈ Σ': µ(α) = a if α ∈ {x, y}, and µ(α) = α otherwise.

So L(d') = {r x c c̄ x̄ y b c c̄ b̄ c c̄ ȳ r̄} and L(d) = {r a c c̄ ā a b c c̄ b̄ c c̄ ā r̄}.
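The role of µ can be illustrated by a small sketch: a tree over Σ' that satisfies d' is relabelled node by node, and the result is the tree over Σ accepted by the specialized DTD. The tree encoding and the names `satisfies_d_prime` and `apply_mu` are assumptions of this illustration, not part of the slides.

```python
import re

# d' over Σ' = {r, x, y, b, c}, content models as Python regexes.
PRODUCTIONS_PRIME = {"r": "xy", "x": "c", "y": "bc", "b": "c", "c": ""}
MU = {"x": "a", "y": "a"}  # every other label is mapped to itself


def satisfies_d_prime(tree):
    """Check every node of a (label, children) tree against P'."""
    label, children = tree
    model = PRODUCTIONS_PRIME.get(label)
    word = "".join(child[0] for child in children)
    if model is None or re.fullmatch(model, word) is None:
        return False
    return all(satisfies_d_prime(child) for child in children)


def apply_mu(tree):
    """Relabel a tree over Σ' into a tree over Σ."""
    label, children = tree
    return (MU.get(label, label), [apply_mu(child) for child in children])


t_prime = ("r", [("x", [("c", [])]),
                 ("y", [("b", [("c", [])]), ("c", [])])])
in_d_prime = t_prime[0] == "r" and satisfies_d_prime(t_prime)
t = apply_mu(t_prime)
print(in_d_prime, t)
```

The two specialized labels x and y let d' distinguish the first a-child from the second, which a plain DTD over Σ cannot do.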

Strong Validation

Definition
We call a (specialized) DTD d strongly recognizable iff L(d) is recognizable.

Strong Validation: Example (non-recursive DTD)

Again consider the DTD d = (Σ, r, P) where Σ = {r, a, b, c} and

P = {r → a*, a → bc + c, b → c, c → ε}

The DTD d is not recursive, and the language L(d) can be represented by the regular expression r (a c c̄ ā + a b c c̄ b̄ c c̄ ā)* r̄. Hence, this DTD is strongly recognizable.

Strong Validation: Example (recursive DTD)

Let d = (Σ, r, P) where Σ = {r, a} and P = {r → a, a → a + ε}. The DTD d is recursive because a occurs in its own production. Moreover, L(d) = {r a^n ā^n r̄ | n ≥ 1}, which is not a regular language. Hence, d is not strongly recognizable.
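Why this language is not recognizable can be made concrete with the Nerode relation: the prefixes r a^i and r a^j with i ≠ j are separated by the suffix ā^i r̄, so there are infinitely many equivalence classes. The sketch below checks this for small i and j. The token encoding ("/a" for ā) and the helper names are assumptions of the illustration.

```python
def in_language(tokens):
    """Membership in L(d) = { r a^n ā^n r̄ | n >= 1 }, with "/x" written for x̄."""
    n = tokens.count("a")
    return n >= 1 and tokens == ["r"] + ["a"] * n + ["/a"] * n + ["/r"]


def separated(i, j):
    """True iff the suffix ā^i r̄ distinguishes the prefixes r a^i and r a^j."""
    suffix = ["/a"] * i + ["/r"]
    u = ["r"] + ["a"] * i
    v = ["r"] + ["a"] * j
    return in_language(u + suffix) != in_language(v + suffix)


# Every pair of distinct prefixes in the tested range lies in different classes.
pairs = [(i, j) for i in range(1, 8) for j in range(1, 8) if i != j]
print(all(separated(i, j) for i, j in pairs))
```

A finite check like this is only an illustration. The argument itself holds for all i ≠ j, because r a^j ā^i r̄ is in L(d) only when i = j.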

Strong Validation: Theorem

Theorem
A specialized DTD is strongly recognizable iff it is non-recursive. [Segoufin & Vianu, 2002]
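The theorem reduces strong recognizability to a purely syntactic test. One way to sketch that test is a cycle search in the dependency graph of the productions, where a label points to every label occurring in its content model. The sketch assumes one-letter lower-case labels and content models given as regex strings. It also simplifies by treating every syntactic occurrence as a dependency and by not restricting the search to labels reachable from the root.

```python
import re


def dependency_graph(productions):
    """Map each label to the set of labels occurring in its content model."""
    return {label: set(re.findall(r"[a-z]", model))
            for label, model in productions.items()}


def is_recursive(productions):
    """Depth-first search for a cycle in the dependency graph."""
    graph = dependency_graph(productions)
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {label: WHITE for label in graph}

    def visit(label):
        colour[label] = GREY
        for succ in graph.get(label, ()):
            state = colour.get(succ, WHITE)
            if state == GREY:             # back edge: label depends on itself
                return True
            if state == WHITE and visit(succ):
                return True
        colour[label] = BLACK
        return False

    return any(colour[label] == WHITE and visit(label) for label in graph)


non_recursive = {"r": "a*", "a": "bc|c", "b": "c", "c": ""}
recursive = {"r": "a", "a": "a|"}
print(is_recursive(non_recursive), is_recursive(recursive))
```

For a specialized DTD the same test would be run on the productions of d' over Σ'.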

Strong Validation: Proof, Step 1

Let d = (Σ, Σ', d', µ) be a specialized DTD.

Step 1: d is strongly recognizable ⟹ d is non-recursive.

Let d be strongly recognizable. Then there exists an FSA A which accepts L(d).

Suppose a ∈ Σ' is recursive with respect to d'. Then d' and d are recursive and there exists a tree t ∈ D_d' such that a repeats on a path of t. So [t] has the form

[t] = r u₁ a v₁ a w ā v₂ ā u₂ r̄

where u₁u₂ and v₁v₂ are well-balanced words. Since a is recursive we can repeat the parts a v₁ and v₂ ā, and for all n > 0 the strings

[t_n] = r u₁ (a v₁)^n a w ā (v₂ ā)^n u₂ r̄

are also in L(d'), so A accepts µ([t_n]).

However, with the Myhill-Nerode theorem we can show that L(d) is not regular: there is an infinite number of equivalence classes of strings over Σ ∪ Σ̄ because

∀i, j ≥ 1: i ≠ j ⟹ µ(r u₁ (a v₁)^i) ≢_L(d) µ(r u₁ (a v₁)^j).

This contradicts the assumption that L(d) is regular and that A recognizes L(d). Hence, d and d' cannot be recursive.

Strong Validation: Proof, Step 2: FSA Construction

Let d = (Σ, Σ', d', µ) be a specialized DTD where

Σ = {r, a, b},
Σ' = {ρ, α, β},
d' = (Σ', ρ, P'),
P' = {ρ → α, α → β + ε, β → ε}, and
µ(ρ) = r, µ(α) = a, µ(β) = b.

Since d is not recursive there exists a strongly validating FSA. Our automata A_b for every b ∈ Σ' are:

[Figure: A_ρ with initial state q_{0,ρ} and transitions labelled r, α, r̄]
[Figure: A_α with initial state q_{0,α} and transitions labelled a, β, ā]
[Figure: A_β with initial state q_{0,β} and transitions labelled b, b̄]

Now we build the target automaton step by step. Our A_0 is equal to A_ρ. The following automata are:

[Figure: A_1, obtained from A_0 by substituting A_α for the α-transition; transitions labelled r, ε, a, β, ā, ε, r̄]
[Figure: A_2, obtained from A_1 by substituting A_β for the β-transition; transitions labelled r, ε, a, ε, b, b̄, ε, ā, ε, r̄]

Since A_2 no longer contains any symbols from Σ', the following automata A_3, A_4, ... will be the same. So A_2 is the desired automaton.
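The substitution idea behind A_0, A_1, A_2 can be imitated with regular expressions instead of automata: each label of Σ' in a content model is replaced by the expression for its whole subtree, which terminates precisely because the DTD is non-recursive. This is a sketch under stated assumptions, not the construction from the slides. The letters R, A, B stand in for ρ, α, β, tags are written <x> and </x>, and the productions are the ones read off the garbled slide.

```python
import re

# Assumed reading of P' with R, A, B standing in for ρ, α, β.
PRODUCTIONS_PRIME = {"R": "A", "A": "B|", "B": ""}
MU = {"R": "r", "A": "a", "B": "b"}
ROOT = "R"


def expand(label):
    """Regex for the tag strings of all subtrees rooted at `label`.

    Every label of Σ' in the content model is replaced by its own expansion,
    mirroring the step from A_k to A_(k+1). Recursion ends only for
    non-recursive DTDs.
    """
    tag = MU[label]
    body = re.sub(r"[A-Z]", lambda m: expand(m.group()), PRODUCTIONS_PRIME[label])
    return f"(?:<{tag}>(?:{body})</{tag}>)"


VALIDATOR = re.compile(expand(ROOT))


def is_valid(stream):
    """Validate a tag stream against the specialized DTD."""
    return VALIDATOR.fullmatch(stream) is not None


print(is_valid("<r><a><b></b></a></r>"), is_valid("<r><a></a></r>"), is_valid("<r></r>"))
```

Python's `re` module is a backtracking matcher rather than a finite automaton, but the expanded expression uses only regular operators, so an equivalent FSA exists and could read the stream tag by tag.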

Validating well-formed XML Documents: Example (recognizable DTD)

Consider the DTD d = (Σ, r, P) where Σ = {r, a} and P = {r → a, a → a + ε} again.

There is a regular language L_R such that L(d) = [T_Σ] ∩ L_R.

Let L_R for example be L(r a⁺ ā* r̄). Then L_R = {r a^m ā^n r̄ | m ≥ 1, n ≥ 0} and

[T_Σ] ∩ L_R = {r a^n ā^n r̄ | n ≥ 1} = L(d).

Or let L_R for example be L(r a⁺ ā⁺ r̄), which has the same intersection with [T_Σ]. So L_R is ambiguous, that is, not uniquely determined.
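The identity L(d) = [T_Σ] ∩ L_R says that once the input is known to be well-formed, a regular test suffices. The sketch below separates the two parts: a stack-based well-formedness check for [T_Σ] and a regular expression for L_R = L(r a⁺ ā* r̄). As an assumption of this illustration, opening tags are lower-case letters and the closing tag x̄ is the corresponding upper-case letter.

```python
import re

# L_R = L(r a+ ā* r̄); "A" stands for ā and "R" for r̄ in this encoding.
L_R = re.compile(r"ra+A*R")


def is_well_formed(word):
    """True iff `word` is the string representation of exactly one tree."""
    stack = []
    for position, symbol in enumerate(word):
        if symbol.islower():
            stack.append(symbol)                      # opening tag
        else:
            if not stack or stack.pop() != symbol.lower():
                return False                          # mismatched closing tag
            if not stack and position != len(word) - 1:
                return False                          # root closed too early
    return bool(word) and not stack


def in_L_d(word):
    """Membership in L(d) = [T_Σ] ∩ L_R."""
    return is_well_formed(word) and L_R.fullmatch(word) is not None


print(in_L_d("raaAAR"), in_L_d("raaAR"))
```

The word "raaAR" shows the division of labour: it lies in L_R but is rejected because it is not well-formed, so the regular part alone would not be a strong validator.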

Validating well-formed XML Documents: Example (not recognizable DTD)

Let d = (Σ, b, P) be a DTD where Σ = {a, b, c} and P = {b → ab + bc + ε, a → ε, c → ε}.

[Figure: Graphical representation for a tree in D_d.]

References

Baader, Franz. (2007, March). Skript zur Lehrveranstaltung Grundlagen der Theoretischen Informatik.

Segoufin, Luc, & Vianu, Victor. (2002). Validating streaming XML documents. Pages 53-64 of: Symposium on Principles of Database Systems. Association for Computing Machinery.