
Regular Expressions, au point

Andrea Asperti
Department of Computer Science, University of Bologna
Mura Anteo Zamboni 7, 40127, Bologna, ITALY
asperti@cs.unibo.it

Claudio Sacerdoti Coen
Department of Computer Science, University of Bologna
Mura Anteo Zamboni 7, 40127, Bologna, ITALY
sacerdot@cs.unibo.it

Enrico Tassi
Microsoft Research-INRIA Joint Center
enrico.tassi@inria.fr

arXiv:1010.2604v1 [cs.FL] 13 Oct 2010

Abstract
We introduce a new technique for constructing a finite state deterministic automaton from a regular expression, based on the idea of marking a suitable set of positions inside the expression, intuitively representing the possible points reached after the processing of an initial prefix of the input string. Pointed regular expressions join the elegance and the symbolic appeal of Brzozowski's derivatives with the effectiveness of McNaughton and Yamada's labelling technique, essentially combining the best of the two approaches.

Categories and Subject Descriptors F.1.1 [Models of Computation]
General Terms Theory
Keywords Regular expressions, Finite States Automata, Derivatives

1. Introduction
There is hardly a subject in Theoretical Computer Science that, in view of its relevance and elegance, has been so thoroughly investigated as the notion of regular expression and its relation with finite state automata (see e.g. [1, 2] for some recent surveys). All the studies in this area have been traditionally inspired by two pioneering, foundational works: Brzozowski's theory of derivatives [3], and McNaughton and Yamada's algorithm [4]. The main advantages of derivatives are that they are syntactically appealing, easy to grasp and easy to prove correct (see [5] for a recent revisitation). On the other side, McNaughton and Yamada's approach results in a particularly efficient algorithm, still used by most pattern matchers like the popular grep and egrep utilities. The relation between the two approaches has been deeply investigated too, starting from the seminal work by Berry and Sethi [6], where it is shown how to refine Brzozowski's method to get to the efficient algorithm (Berry and Sethi's algorithm has been further improved by later authors [7, 8]).
Regular expressions are such a small world that it comes as no surprise that all the different approaches, in the end, turn out to be equivalent; still, their philosophy, their underlying intuition, and the techniques to be deployed can be sensibly different. Without any pretension to say anything really original on the subject, we introduce in this paper a notion of pointed regular expression, that provides a cheap palliative for derivatives and allows a simple, direct and efficient construction of the deterministic finite automaton. Remarkably, the formal correspondence between pointed expressions and Brzozowski's derivatives is unexpectedly entangled (see Section 4.1), testifying to the novelty and the not-so-trivial nature of the notion.

The idea of pointed expressions was suggested by an attempt at formalizing the theory of regular languages by means of an interactive prover¹. At first, we started considering derivatives, since they looked more suitable to the kind of symbolic manipulations that can be easily dealt with by means of these tools. However, the need to consider sets of derivatives and, especially, to reason modulo associativity, commutativity and idempotence of sum, prompted us to look for an alternative notion. Now, it is clear that, in some sense, the derivative of a regular expression e is a set of subexpressions of e²: the only, crucial, difference is that we cannot forget their context. So, the natural solution is to point at subexpressions inside the original term. This immediately leads to the notion of pointed regular expression (pre), that is just a normal regular expression where some positions (it is enough to consider individual characters) have been pointed out. Intuitively, the points mark the positions inside the regular expression which have been reached after reading some prefix of the input string, or better the positions where the processing of the remaining string has to be started. Each pointed expression for e represents a state of the deterministic automaton associated with e; since we obviously have only a finite number of possible labellings, the number of states of the automaton is finite.
Pointed regular expressions allow the direct construction of the DFA [9] associated with a regular expression, in a way that is simple, intuitive, and efficient (the task is traditionally considered as very involved in the literature: see e.g. [1], pag. 71). In the imposing bibliography on regular expressions, as far as we could discover, the only author mentioning a notion close to ours is Watson [10, 11]. However, he only deals with single points, while the most interesting properties of pres derive from their implicit additive nature (such as the possibility to compute the move operation by a single pass on the marked expression: see definition 21).

¹ The rule of the game was to avoid overkilling, i.e. not to make it more complex than deserved.
² This is also the reason why, in the end, we only have a finite number of derivatives.

2. Regular expressions

DEFINITION 1. A regular expression over the alphabet $\Sigma$ is an expression $e$ generated by the following grammar:
$$E ::= \emptyset \mid \epsilon \mid a \mid E + E \mid EE \mid E^*$$
with $a \in \Sigma$.

DEFINITION 2. The language $L(e)$ associated with the regular expression $e$ is defined by the following rules:
$$\begin{aligned}
L(\emptyset) &= \emptyset \\
L(\epsilon) &= \{\epsilon\} \\
L(a) &= \{a\} \\
L(e_1 + e_2) &= L(e_1) \cup L(e_2) \\
L(e_1 e_2) &= L(e_1) \cdot L(e_2) \\
L(e^*) &= L(e)^*
\end{aligned}$$
where $\epsilon$ is the empty string, $L_1 \cdot L_2 = \{\, l_1 l_2 \mid l_1 \in L_1, l_2 \in L_2 \,\}$ is the concatenation of $L_1$ and $L_2$, and $L^*$ is the so called Kleene's closure of $L$: $L^* = \bigcup_{i=0}^{\infty} L^i$, with $L^0 = \{\epsilon\}$ and $L^{i+1} = L \cdot L^i$.

DEFINITION 3 (nullable). A regular expression $e$ is said to be nullable if $\epsilon \in L(e)$.

The fact of being nullable is decidable; it is easy to prove that the characteristic function $\nu(e)$ can be computed by the following rules:
$$\begin{aligned}
\nu(\emptyset) &= \mathit{false} \\
\nu(\epsilon) &= \mathit{true} \\
\nu(a) &= \mathit{false} \\
\nu(e_1 + e_2) &= \nu(e_1) \lor \nu(e_2) \\
\nu(e_1 e_2) &= \nu(e_1) \land \nu(e_2) \\
\nu(e^*) &= \mathit{true}
\end{aligned}$$

DEFINITION 4. A deterministic finite automaton (DFA) is a quintuple $(Q, \Sigma, q_0, t, F)$ where
- $Q$ is a finite set of states;
- $\Sigma$ is the input alphabet;
- $q_0 \in Q$ is the initial state;
- $t : Q \times \Sigma \to Q$ is the state transition function;
- $F \subseteq Q$ is the set of final states.

The transition function $t$ is extended to strings in the following way:

DEFINITION 5. Given a function $t : Q \times \Sigma \to Q$, the function $t^* : Q \times \Sigma^* \to Q$ is defined as follows:
$$\begin{aligned}
t^*(q, \epsilon) &= q \\
t^*(q, aw') &= t^*(t(q, a), w')
\end{aligned}$$

DEFINITION 6. Let $A = (Q, \Sigma, q_0, t, F)$ be a DFA; the language recognized by $A$ is defined as follows:
$$L(A) = \{\, w \mid t^*(q_0, w) \in F \,\}$$

3. Pointed regular expressions

DEFINITION 7.
1. A pointed item over the alphabet $\Sigma$ is an expression $e$ generated by the following grammar:
$$E ::= \emptyset \mid \epsilon \mid a \mid \bullet a \mid E + E \mid EE \mid E^*$$
with $a \in \Sigma$;
2. A pointed regular expression (pre) is a pair $\langle e, b \rangle$ where $b$ is a boolean and $e$ is a pointed item.

The term $\bullet a$ is used to point to a position inside the regular expression, preceding the given occurrence of $a$. In a pointed regular expression, the boolean must be intuitively understood as the possibility to have a trailing point at the end of the expression.

DEFINITION 8. The carrier $|e|$ of an item $e$ is the regular expression obtained from $e$ by removing all the points.
Similarly, the carrier of a pointed regular expression is the carrier of its item.

In the sequel, we shall often use the same notation for functions defined over items or pres, leaving to the reader the simple disambiguation task. Moreover, we use the notation $\epsilon(b)$, where $b$ is a boolean, with the following meaning:
$$\epsilon(\mathit{true}) = \{\epsilon\} \qquad \epsilon(\mathit{false}) = \emptyset$$

DEFINITION 9.
1. The language $L_p(e)$ associated with the item $e$ is defined by the following rules:
$$\begin{aligned}
L_p(\emptyset) &= \emptyset \\
L_p(\epsilon) &= \emptyset \\
L_p(a) &= \emptyset \\
L_p(\bullet a) &= \{a\} \\
L_p(e_1 + e_2) &= L_p(e_1) \cup L_p(e_2) \\
L_p(e_1 e_2) &= L_p(e_1) \cdot L(|e_2|) \cup L_p(e_2) \\
L_p(e^*) &= L_p(e) \cdot L(|e|^*)
\end{aligned}$$
2. For a pointed regular expression $\langle e, b \rangle$ we define
$$L_p(\langle e, b \rangle) = L_p(e) \cup \epsilon(b)$$

EXAMPLE 10.
$$L_p((a + \bullet b)^*) = L(b(a + b)^*)$$
Indeed,
$$\begin{aligned}
L_p((a + \bullet b)^*) &= L_p(a + \bullet b) \cdot L(|(a + \bullet b)^*|) \\
&= (L_p(a) \cup L_p(\bullet b)) \cdot L((a + b)^*) \\
&= \{b\} \cdot L((a + b)^*) \\
&= L(b(a + b)^*)
\end{aligned}$$

Let us incidentally observe that, as shown by the previous example, pointed regular expressions can provide a more compact syntax for denoting languages than traditional regular expressions. This may have important applications to the investigation of the descriptional complexity (succinctness) of regular languages (see e.g. [12, 13, 14]).

EXAMPLE 11. If $e$ contains no point (i.e. $e = |e|$) then $L_p(e) = \emptyset$.

LEMMA 12. If $e$ is a pointed item then $\epsilon \notin L_p(e)$. Hence, $\epsilon \in L_p(\langle e, b \rangle)$ if and only if $b = \mathit{true}$.

Proof. A trivial structural induction on $e$.

3.1 Broadcasting points

Intuitively, a regular expression $e$ must be understood as a pointed expression with a single point in front of it. Since however we only allow points over symbols, we must broadcast this initial point inside the expression, which essentially corresponds to the $\epsilon$-closure operation on automata. We use the notation $\bullet(\cdot)$ to denote such an operation.

The broadcasting operator is also required to lift the item constructors (choice, concatenation and Kleene's star) from items to pres: for example, to concatenate a pre $\langle e_1, \mathit{true} \rangle$ with another pre $\langle e_2, b_2 \rangle$, we must first broadcast the trailing point of the first expression inside $e_2$ and then pre-pend $e_1$; similarly for the star operation.
We could define first the broadcasting function $\bullet(\cdot)$ and then the lifted constructors; however, both the definition and the theory of the broadcasting function are simplified by making it co-recursive (that is, mutually recursive) with the lifted constructors.
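Anticipating Definition 13, the mutual recursion between the broadcasting function and the lifted constructors can be sketched in Python as follows. This is only an illustrative sketch: the tuple encoding of items and the names `broadcast`, `lifted_sum`, `lifted_cat` and `lifted_star` are assumptions of the example, not notation taken from the paper.

```python
# Items are nested tuples (a representation assumed for this sketch):
#   ("empty",)  ("eps",)  ("sym", ch, pointed)
#   ("alt", e1, e2)  ("cat", e1, e2)  ("star", e)
# A pre is a Python pair (item, trailing_point).

def broadcast(e):
    """Push a point placed in front of item e down to the symbols it reaches."""
    tag = e[0]
    if tag == "empty":
        return e, False
    if tag == "eps":
        return e, True                      # the point traverses epsilon
    if tag == "sym":
        return ("sym", e[1], True), False   # the point stops before the symbol
    if tag == "alt":
        return lifted_sum(broadcast(e[1]), broadcast(e[2]))
    if tag == "cat":
        return lifted_cat(broadcast(e[1]), (e[2], False))
    inner, _ = broadcast(e[1])              # tag == "star"
    return ("star", inner), True


def lifted_sum(p, q):
    """Sum of two pres: work in parallel on both branches."""
    return ("alt", p[0], q[0]), p[1] or q[1]


def lifted_cat(p, q):
    """Concatenation of two pres."""
    if not p[1]:
        return ("cat", p[0], q[0]), q[1]
    inner, reached_end = broadcast(q[0])    # the trailing point of p enters q
    return ("cat", p[0], inner), q[1] or reached_end


def lifted_star(p):
    """Kleene star of a pre."""
    if not p[1]:
        return ("star", p[0]), False
    inner, _ = broadcast(p[0])              # the trailing point re-enters the body
    return ("star", inner), True


# Usage on the expression (a + eps)(b*a + b)b
a, b, eps = ("sym", "a", False), ("sym", "b", False), ("eps",)
regex = ("cat", ("cat", ("alt", a, eps), ("alt", ("cat", ("star", b), a), b)), b)
item, trailing = broadcast(regex)
```

In this run the point stops in front of the first `a`, enters and traverses `b*`, and stops in front of the second `a` and the inner `b`; it never reaches the final `b`, so `trailing` is `False`.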

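DEFINITION 13.
1. The function $\bullet(\cdot)$ from pointed items to pres is defined as follows:
$$\begin{aligned}
\bullet(\emptyset) &= \langle \emptyset, \mathit{false} \rangle \\
\bullet(\epsilon) &= \langle \epsilon, \mathit{true} \rangle \\
\bullet(a) &= \langle \bullet a, \mathit{false} \rangle \\
\bullet(\bullet a) &= \langle \bullet a, \mathit{false} \rangle \\
\bullet(e_1 + e_2) &= \bullet(e_1) \oplus \bullet(e_2) \\
\bullet(e_1 e_2) &= \bullet(e_1) \odot \langle e_2, \mathit{false} \rangle \\
\bullet(e^*) &= \langle e'^*, \mathit{true} \rangle \quad \text{where } \bullet(e) = \langle e', b' \rangle
\end{aligned}$$
2. The lifted constructors are defined as follows:
$$\langle e_1', b_1' \rangle \oplus \langle e_2', b_2' \rangle = \langle e_1' + e_2', b_1' \lor b_2' \rangle$$
$$\langle e_1', b_1' \rangle \odot \langle e_2', b_2' \rangle = \begin{cases}
\langle e_1' e_2', b_2' \rangle & \text{when } b_1' = \mathit{false} \\
\langle e_1' e_2'', b_2' \lor b_2'' \rangle & \text{when } b_1' = \mathit{true} \text{ and } \bullet(e_2') = \langle e_2'', b_2'' \rangle
\end{cases}$$
$$\langle e', b' \rangle^{\circledast} = \begin{cases}
\langle e'^*, \mathit{false} \rangle & \text{when } b' = \mathit{false} \\
\langle e''^*, \mathit{true} \rangle & \text{when } b' = \mathit{true} \text{ and } \bullet(e') = \langle e'', b'' \rangle
\end{cases}$$

The apparent complexity of the previous definition should not hide the extreme simplicity of the broadcasting operation: on a sum we proceed in parallel; on a concatenation $e_1 e_2$, we first work on $e_1$ and in case we reach its end we pursue broadcasting inside $e_2$; in case of $e^*$ we broadcast the point inside $e$, recalling that we shall eventually have a trailing point.

EXAMPLE 14. Suppose we broadcast a point inside
$$(a + \epsilon)(b^* a + b)b$$
We start working in parallel on the first occurrence of $a$ (where the point stops), and on $\epsilon$, which gets traversed. We have hence reached the end of $a + \epsilon$ and we must pursue broadcasting inside $(b^* a + b)b$. Again, we work in parallel on the two additive subterms $b^* a$ and $b$; the first point is allowed both to enter the star and to traverse it, stopping in front of $a$; the second point just stops in front of $b$. No point reached the end of $b^* a + b$, hence no further propagation is possible. In conclusion:
$$\bullet((a + \epsilon)(b^* a + b)b) = \langle (\bullet a + \epsilon)((\bullet b)^* \bullet a + \bullet b)b, \mathit{false} \rangle$$

DEFINITION 15. The broadcasting function is extended to pres in the obvious way:
$$\bullet(\langle e, b \rangle) = \langle e', b \lor b' \rangle \quad \text{where } \bullet(e) = \langle e', b' \rangle$$

As we shall prove in Corollary 18, broadcasting an initial point may reach the end of an expression $e$ if and only if $e$ is nullable.

The following theorem characterizes the broadcasting function and also shows that the semantics of the lifted constructors on pres is coherent with the corresponding constructors on items.

THEOREM 16.
1. $L_p(\bullet e) = L_p(e) \cup L(|e|)$
2. $L_p(e_1 \oplus e_2) = L_p(e_1) \cup L_p(e_2)$
3. $L_p(e_1 \odot e_2) = L_p(e_1) \cdot L(|e_2|) \cup L_p(e_2)$
4. $L_p(e^{\circledast}) = L_p(e) \cdot L(|e|^*)$

We first give the proof of 2., followed by the simultaneous proof of 1. and 3., and we conclude with the proof of 4.

Proof. [of 2.] We need to prove $L_p(e_1 \oplus e_2) = L_p(e_1) \cup L_p(e_2)$.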
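$$\begin{aligned}
L_p(\langle e_1, b_1 \rangle \oplus \langle e_2, b_2 \rangle)
&= L_p(\langle e_1 + e_2, b_1 \lor b_2 \rangle) \\
&= L_p(e_1 + e_2) \cup \epsilon(b_1) \cup \epsilon(b_2) \\
&= L_p(e_1) \cup \epsilon(b_1) \cup L_p(e_2) \cup \epsilon(b_2) \\
&= L_p(\langle e_1, b_1 \rangle) \cup L_p(\langle e_2, b_2 \rangle)
\end{aligned}$$

Proof. [of 1. and 3.] We prove 1. ($L_p(\bullet e) = L_p(e) \cup L(|e|)$) by induction on the structure of $e$, assuming that 3. holds on terms structurally smaller than $e$.

- $L_p(\bullet(\emptyset)) = L_p(\langle \emptyset, \mathit{false} \rangle) = \emptyset = L_p(\emptyset) \cup L(|\emptyset|)$.
- $L_p(\bullet(\epsilon)) = L_p(\langle \epsilon, \mathit{true} \rangle) = \{\epsilon\} = L_p(\epsilon) \cup L(|\epsilon|)$.
- $L_p(\bullet(a)) = L_p(\langle \bullet a, \mathit{false} \rangle) = \{a\} = L_p(a) \cup L(|a|)$.
- $L_p(\bullet(\bullet a)) = L_p(\langle \bullet a, \mathit{false} \rangle) = \{a\} = L_p(\bullet a) \cup L(|\bullet a|)$.
- Let $e = e_1 + e_2$. By induction hypothesis we know that
$$L_p(\bullet(e_i)) = L_p(e_i) \cup L(|e_i|)$$
Thus, by 2., we have
$$\begin{aligned}
L_p(\bullet(e_1 + e_2)) &= L_p(\bullet(e_1) \oplus \bullet(e_2)) \\
&= L_p(\bullet(e_1)) \cup L_p(\bullet(e_2)) \\
&= L_p(e_1) \cup L(|e_1|) \cup L_p(e_2) \cup L(|e_2|) \\
&= L_p(e_1 + e_2) \cup L(|e_1 + e_2|)
\end{aligned}$$
- Let $e = e_1 e_2$. By induction hypothesis we know that
$$L_p(\bullet(e_i)) = L_p(e_i) \cup L(|e_i|)$$
Thus, by 3. over the structurally smaller terms $e_1$ and $e_2$,
$$\begin{aligned}
L_p(\bullet(e_1 e_2)) &= L_p(\bullet(e_1) \odot \langle e_2, \mathit{false} \rangle) \\
&= L_p(\bullet(e_1)) \cdot L(|e_2|) \cup L_p(e_2) \\
&= (L_p(e_1) \cup L(|e_1|)) \cdot L(|e_2|) \cup L_p(e_2) \\
&= L_p(e_1) \cdot L(|e_2|) \cup L(|e_1|) \cdot L(|e_2|) \cup L_p(e_2) \\
&= L_p(e_1 e_2) \cup L(|e_1 e_2|)
\end{aligned}$$
- Let $e = e_1^*$ and let $\bullet(e_1) = \langle e_1', b_1' \rangle$. By induction hypothesis we know that
$$L_p(\bullet(e_1)) = L_p(e_1') \cup \epsilon(b_1') = L_p(e_1) \cup L(|e_1|)$$
and in particular, since by Lemma 12 $\epsilon \notin L_p(e_1')$,
$$L_p(e_1') = L_p(e_1) \cup (L(|e_1|) \setminus \epsilon(b_1'))$$
Then,
$$\begin{aligned}
L_p(\bullet(e_1^*)) &= L_p(\langle e_1'^*, \mathit{true} \rangle) \\
&= L_p(e_1'^*) \cup \{\epsilon\} \\
&= L_p(e_1') \cdot L(|e_1|^*) \cup \{\epsilon\} \\
&= (L_p(e_1) \cup (L(|e_1|) \setminus \epsilon(b_1'))) \cdot L(|e_1|^*) \cup \{\epsilon\} \\
&= L_p(e_1) \cdot L(|e_1|^*) \cup (L(|e_1|) \setminus \epsilon(b_1')) \cdot L(|e_1|^*) \cup \{\epsilon\} \\
&= L_p(e_1) \cdot L(|e_1|^*) \cup L(|e_1|^*) \\
&= L_p(e_1^*) \cup L(|e_1^*|)
\end{aligned}$$

Having proved 1. for $e$ assuming that 3. holds on terms structurally smaller than $e$, we now assume that 1. holds for $e_1$ and $e_2$ in order to prove 3.:
$$L_p(e_1 \odot e_2) = L_p(e_1) \cdot L(|e_2|) \cup L_p(e_2)$$
We distinguish the two cases of the definition of $\odot$: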

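$$\begin{aligned}
L_p(\langle e_1', \mathit{false} \rangle \odot \langle e_2', b_2' \rangle)
&= L_p(\langle e_1' e_2', b_2' \rangle) \\
&= L_p(e_1' e_2') \cup \epsilon(b_2') \\
&= L_p(e_1') \cdot L(|e_2'|) \cup L_p(e_2') \cup \epsilon(b_2') \\
&= L_p(e_1) \cdot L(|e_2|) \cup L_p(e_2)
\end{aligned}$$
$$\begin{aligned}
L_p(\langle e_1', \mathit{true} \rangle \odot \langle e_2', b_2' \rangle)
&= L_p(\langle e_1' e_2'', b_2' \lor b_2'' \rangle) \\
&= L_p(e_1' e_2'') \cup \epsilon(b_2') \cup \epsilon(b_2'') \\
&= L_p(e_1') \cdot L(|e_2''|) \cup L_p(e_2'') \cup \epsilon(b_2') \cup \epsilon(b_2'') \\
&= L_p(e_1') \cdot L(|e_2'|) \cup L_p(e_2') \cup L(|e_2'|) \cup \epsilon(b_2') \\
&= (L_p(e_1') \cup \epsilon(\mathit{true})) \cdot L(|e_2'|) \cup L_p(e_2') \cup \epsilon(b_2') \\
&= L_p(e_1) \cdot L(|e_2|) \cup L_p(e_2)
\end{aligned}$$
where $\bullet(e_2') = \langle e_2'', b_2'' \rangle$, so that $L_p(e_2'') \cup \epsilon(b_2'') = L_p(e_2') \cup L(|e_2'|)$ by 1.

Proof. [of 4.] We need to prove $L_p(e^{\circledast}) = L_p(e) \cdot L(|e|^*)$. We distinguish the two cases of the definition of $\cdot^{\circledast}$:
$$\begin{aligned}
L_p(\langle e', \mathit{false} \rangle^{\circledast})
&= L_p(\langle e'^*, \mathit{false} \rangle) \\
&= L_p(e'^*) \\
&= L_p(e') \cdot L(|e'|^*) \\
&= (L_p(e') \cup \epsilon(\mathit{false})) \cdot L(|e'|^*) \\
&= L_p(e) \cdot L(|e|^*)
\end{aligned}$$
$$\begin{aligned}
L_p(\langle e', \mathit{true} \rangle^{\circledast})
&= L_p(\langle e''^*, \mathit{true} \rangle) \\
&= L_p(e''^*) \cup \{\epsilon\} \\
&= L_p(e'') \cdot L(|e''|^*) \cup \{\epsilon\} \\
&= (L_p(e') \cup L(|e'|)) \cdot L(|e'|^*) \cup \{\epsilon\} \\
&= L_p(e') \cdot L(|e'|^*) \cup L(|e'|) \cdot L(|e'|^*) \cup \{\epsilon\} \\
&= L_p(e') \cdot L(|e'|^*) \cup L(|e'|^*) \\
&= (L_p(e') \cup \epsilon(\mathit{true})) \cdot L(|e'|^*) \\
&= L_p(e) \cdot L(|e|^*)
\end{aligned}$$
where $\bullet(e') = \langle e'', b'' \rangle$.

COROLLARY 17. For any regular expression $e$, $L(e) = L_p(\bullet e)$.

Another important corollary is that an initial point reaches the end of a (pointed) expression $e$ if and only if $e$ is able to generate the empty string.

COROLLARY 18. $\bullet e = \langle e', \mathit{true} \rangle$ if and only if $\epsilon \in L(|e|)$.

Proof. By theorem 16 we know that $L_p(\bullet e) = L_p(e) \cup L(|e|)$. So, if $\epsilon \in L_p(\bullet e)$, since by Lemma 12 $\epsilon \notin L_p(e)$, it must be $\epsilon \in L(|e|)$. Conversely, if $\epsilon \in L(|e|)$ then $\epsilon \in L_p(\bullet e)$; if $\bullet e = \langle e', b \rangle$, this is possible only provided $b = \mathit{true}$.

To conclude this section, let us prove the idempotence of the $\bullet(\cdot)$ function (it will only be used in Section 5, and can be skipped at a first reading). To this aim we need a technical lemma whose straightforward proof by case analysis is omitted.

LEMMA 19.
1. $\bullet(e_1 \oplus e_2) = \bullet(e_1) \oplus \bullet(e_2)$
2. $\bullet(e_1 \odot e_2) = \bullet(e_1) \odot e_2$

THEOREM 20. $\bullet(\bullet(e)) = \bullet(e)$

Proof. The proof is by induction on $e$.
- $\bullet(\bullet(\emptyset)) = \bullet(\langle \emptyset, \mathit{false} \rangle) = \langle \emptyset, \mathit{false} \lor \mathit{false} \rangle = \bullet(\emptyset)$
- $\bullet(\bullet(\epsilon)) = \bullet(\langle \epsilon, \mathit{true} \rangle) = \langle \epsilon, \mathit{true} \lor \mathit{true} \rangle = \bullet(\epsilon)$
- $\bullet(\bullet(a)) = \bullet(\langle \bullet a, \mathit{false} \rangle) = \langle \bullet a, \mathit{false} \lor \mathit{false} \rangle = \bullet(a)$
- $\bullet(\bullet(\bullet a)) = \bullet(\langle \bullet a, \mathit{false} \rangle) = \langle \bullet a, \mathit{false} \lor \mathit{false} \rangle = \bullet(\bullet a)$
- If $e$ is $e_1 + e_2$ then
$$\bullet(\bullet(e_1 + e_2)) = \bullet(\bullet(e_1) \oplus \bullet(e_2)) = \bullet(\bullet(e_1)) \oplus \bullet(\bullet(e_2)) = \bullet(e_1) \oplus \bullet(e_2) = \bullet(e_1 + e_2)$$
- If $e$ is $e_1 e_2$ then
$$\bullet(\bullet(e_1 e_2)) = \bullet(\bullet(e_1) \odot \langle e_2, \mathit{false} \rangle) = \bullet(\bullet(e_1)) \odot \langle e_2, \mathit{false} \rangle = \bullet(e_1) \odot \langle e_2, \mathit{false} \rangle = \bullet(e_1 e_2)$$
- If $e$ is $e_1^*$, let $\bullet(e_1) = \langle e', b' \rangle$ and let $\bullet(e') = \langle e'', b'' \rangle$.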
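By induction hypothesis,
$$\langle e', b' \rangle = \bullet(e_1) = \bullet(\bullet(e_1)) = \bullet(\langle e', b' \rangle) = \langle e'', b' \lor b'' \rangle$$
and thus $e' = e''$. Finally
$$\bullet(\bullet(e_1^*)) = \bullet(\langle e'^*, \mathit{true} \rangle) = \langle e''^*, \mathit{true} \lor \mathit{true} \rangle = \langle e'^*, \mathit{true} \rangle = \bullet(e_1^*)$$

3.2 The move operation

We now define the move operation, which corresponds to the advancement of the state in response to the processing of an input character $a$. The intuition is clear: we have to look at the points inside $e$ preceding the given character $a$, let the point traverse the character, and broadcast it. All other points must be removed.

DEFINITION 21.
1. The function $\mathit{move}(e, a)$, taking as input a pointed item $e$ and a character $a \in \Sigma$ and giving back a pointed regular expression, is defined as follows, by induction on the structure of $e$:
$$\begin{aligned}
\mathit{move}(\emptyset, a) &= \langle \emptyset, \mathit{false} \rangle \\
\mathit{move}(\epsilon, a) &= \langle \epsilon, \mathit{false} \rangle \\
\mathit{move}(b, a) &= \langle b, \mathit{false} \rangle \\
\mathit{move}(\bullet a, a) &= \langle a, \mathit{true} \rangle \\
\mathit{move}(\bullet b, a) &= \langle b, \mathit{false} \rangle \quad \text{if } b \neq a \\
\mathit{move}(e_1 + e_2, a) &= \mathit{move}(e_1, a) \oplus \mathit{move}(e_2, a) \\
\mathit{move}(e_1 e_2, a) &= \mathit{move}(e_1, a) \odot \mathit{move}(e_2, a) \\
\mathit{move}(e^*, a) &= \mathit{move}(e, a)^{\circledast}
\end{aligned}$$
2. The move function is extended to pres by just ignoring the trailing point: $\mathit{move}(\langle e, b \rangle, a) = \mathit{move}(e, a)$.

EXAMPLE 22. Let us consider the pre $(\bullet a + \epsilon)((\bullet b)^* \bullet a + \bullet b)b$ and the two moves w.r.t. the characters $a$ and $b$. For $a$, we have two possible positions (all other points get erased); the innermost point stops in front of the final $b$, the other one is broadcast inside $(b^* a + b)b$, so
$$\mathit{move}((\bullet a + \epsilon)((\bullet b)^* \bullet a + \bullet b)b, a) = \langle (a + \epsilon)((\bullet b)^* \bullet a + \bullet b) \bullet b, \mathit{false} \rangle$$
For $b$, we have two positions too. The innermost point still stops in front of the final $b$, while the other point reaches the end of $b^*$ and must go back through $b^* a$:
$$\mathit{move}((\bullet a + \epsilon)((\bullet b)^* \bullet a + \bullet b)b, b) = \langle (a + \epsilon)((\bullet b)^* \bullet a + b) \bullet b, \mathit{false} \rangle$$

THEOREM 23. For any pointed regular expression $e$ and string $w$,
$$w \in L_p(\mathit{move}(e, a)) \Leftrightarrow aw \in L_p(e)$$

Proof. The proof is by induction on the structure of $e$.
- if $e$ is atomic, and $e$ is not a pointed symbol, then both $L_p(\mathit{move}(e, a))$ and $L_p(e)$ are empty, and hence both sides are false for any $w$;
- if $e = \bullet a$ then $L_p(\mathit{move}(\bullet a, a)) = L_p(\langle a, \mathit{true} \rangle) = \{\epsilon\}$ and $L_p(\bullet a) = \{a\}$;
- if $e = \bullet b$ with $b \neq a$ then $L_p(\mathit{move}(\bullet b, a)) = L_p(\langle b, \mathit{false} \rangle) = \emptyset$ and $L_p(\bullet b) = \{b\}$; hence for any string $w$, both sides are false;
- if $e = e_1 + e_2$, by induction hypothesis $w \in L_p(\mathit{move}(e_i, a)) \Leftrightarrow aw \in L_p(e_i)$, hence,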

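$$\begin{aligned}
& w \in L_p(\mathit{move}(e_1 + e_2, a)) \\
\Leftrightarrow\ & w \in L_p(\mathit{move}(e_1, a) \oplus \mathit{move}(e_2, a)) \\
\Leftrightarrow\ & w \in L_p(\mathit{move}(e_1, a)) \cup L_p(\mathit{move}(e_2, a)) \\
\Leftrightarrow\ & (w \in L_p(\mathit{move}(e_1, a))) \lor (w \in L_p(\mathit{move}(e_2, a))) \\
\Leftrightarrow\ & (aw \in L_p(e_1)) \lor (aw \in L_p(e_2)) \\
\Leftrightarrow\ & aw \in L_p(e_1) \cup L_p(e_2) \\
\Leftrightarrow\ & aw \in L_p(e_1 + e_2)
\end{aligned}$$
- suppose $e = e_1 e_2$; by induction hypothesis $w \in L_p(\mathit{move}(e_i, a)) \Leftrightarrow aw \in L_p(e_i)$, hence,
$$\begin{aligned}
& w \in L_p(\mathit{move}(e_1 e_2, a)) \\
\Leftrightarrow\ & w \in L_p(\mathit{move}(e_1, a) \odot \mathit{move}(e_2, a)) \\
\Leftrightarrow\ & w \in L_p(\mathit{move}(e_1, a)) \cdot L(|e_2|) \cup L_p(\mathit{move}(e_2, a)) \\
\Leftrightarrow\ & w \in L_p(\mathit{move}(e_1, a)) \cdot L(|e_2|) \lor w \in L_p(\mathit{move}(e_2, a)) \\
\Leftrightarrow\ & (\exists w_1, w_2,\ w = w_1 w_2 \land w_1 \in L_p(\mathit{move}(e_1, a)) \land w_2 \in L(|e_2|)) \lor w \in L_p(\mathit{move}(e_2, a)) \\
\Leftrightarrow\ & (\exists w_1, w_2,\ w = w_1 w_2 \land a w_1 \in L_p(e_1) \land w_2 \in L(|e_2|)) \lor aw \in L_p(e_2) \\
\Leftrightarrow\ & (aw \in L_p(e_1) \cdot L(|e_2|)) \lor (aw \in L_p(e_2)) \\
\Leftrightarrow\ & aw \in L_p(e_1) \cdot L(|e_2|) \cup L_p(e_2) \\
\Leftrightarrow\ & aw \in L_p(e_1 e_2)
\end{aligned}$$
- suppose $e = e_1^*$; by induction hypothesis $w \in L_p(\mathit{move}(e_1, a)) \Leftrightarrow aw \in L_p(e_1)$, hence,
$$\begin{aligned}
& w \in L_p(\mathit{move}(e_1^*, a)) \\
\Leftrightarrow\ & w \in L_p(\mathit{move}(e_1, a)^{\circledast}) \\
\Leftrightarrow\ & w \in L_p(\mathit{move}(e_1, a)) \cdot L(|\mathit{move}(e_1, a)|^*) \\
\Leftrightarrow\ & \exists w_1, w_2,\ w = w_1 w_2 \land w_1 \in L_p(\mathit{move}(e_1, a)) \land w_2 \in L(|e_1|^*) \\
\Leftrightarrow\ & \exists w_1, w_2,\ w = w_1 w_2 \land a w_1 \in L_p(e_1) \land w_2 \in L(|e_1|^*) \\
\Leftrightarrow\ & aw \in L_p(e_1) \cdot L(|e_1|^*) \\
\Leftrightarrow\ & aw \in L_p(e_1^*)
\end{aligned}$$

We extend the move operation to strings as usual.

DEFINITION 24.
$$\mathit{move}^*(e, \epsilon) = e \qquad \mathit{move}^*(e, aw) = \mathit{move}^*(\mathit{move}(e, a), w)$$

THEOREM 25. For any pointed regular expression $e$ and all strings $\alpha, \beta$,
$$\beta \in L_p(\mathit{move}^*(e, \alpha)) \Leftrightarrow \alpha\beta \in L_p(e)$$

Proof. A trivial induction on the length of $\alpha$, using theorem 23.

COROLLARY 26. For any pointed regular expression $e$ and any string $\alpha$,
$$\alpha \in L_p(e) \Leftrightarrow \exists e',\ \mathit{move}^*(e, \alpha) = \langle e', \mathit{true} \rangle$$

Proof. By Theorem 25 and Lemma 12.

3.3 From regular expressions to DFAs

DEFINITION 27. To any regular expression $e$ we may associate a DFA $D_e = (Q, \Sigma, q_0, t, F)$ defined in the following way:
- $Q$ is the set of all possible pointed expressions having $e$ as carrier;
- $\Sigma$ is the alphabet of the regular expression;
- $q_0$ is $\bullet e$;
- $t$ is the move operation of definition 21;
- $F$ is the subset of pointed expressions $\langle e, b \rangle$ with $b = \mathit{true}$.

THEOREM 28.
$$L(D_e) = L(e)$$

Proof. By definition,
$$w \in L(D_e) \Leftrightarrow \mathit{move}^*(\bullet(e), w) = \langle e', \mathit{true} \rangle$$
for some $e'$. By Corollary 26, this is possible if and only if $w \in L_p(\bullet(e))$, and by corollary 17, $L_p(\bullet(e)) = L(e)$.

REMARK 29.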
The fact that the set $Q$ of states of $D_e$ is finite is obvious: its cardinality is at most $2^{n+1}$ where $n$ is the number of symbols in $e$. This is one of the advantages of pointed regular expressions w.r.t. derivatives, whose finite nature only holds after a suitable quotient, and is a relatively complex property to prove (see [3]).

The automaton $D_e$ just defined may have many inaccessible states. We can provide another algorithmic and direct construction that yields the same automaton restricted to the accessible states only.

DEFINITION 30. Let $e$ be a regular expression and let $q_0$ be $\bullet e$. Let also
$$\begin{aligned}
Q_0 &:= \{q_0\} \\
Q_{n+1} &:= Q_n \cup \{\, e' \mid e' \notin Q_n \land \exists a. \exists e \in Q_n.\ \mathit{move}(e, a) = e' \,\}
\end{aligned}$$
Since every $Q_n$ is a subset of the finite set of pointed regular expressions, there is an $m$ such that $Q_{m+1} = Q_m$. We associate to $e$ the DFA $D_e = (Q_m, \Sigma, q_0, t, F)$ where $F$ and $t$ are defined as for the previous construction.

Figure 1. DFA for $(a + \epsilon)(b^* a + b)b$ (nine states, numbered 1 to 9, each labelled with a pointed item).

In Figure 1 we describe the DFA associated with the regular expression $(a + \epsilon)(b^* a + b)b$. The graphical description of the automaton is the traditional one, with nodes for states and labelled arcs for transitions. Unreachable states are not shown. Final states are emphasized by a double circle: since a state $\langle e, b \rangle$ is final if and only if $b$ is true, we may just label nodes with the item (for instance, the pairs of states 6, 8 and 7, 9 only differ for the fact that 6 and 7 are final, while 8 and 9 are not).

3.4 Admissible relations and minimization

The automaton in Figure 1 is minimal. This is not always the case. For instance, for the expression $(ac + bc)^*$ we obtain the automaton of Figure 2, and it is easy to see that the two states corresponding to the pres $(a \bullet c + bc)^*$ and $(ac + b \bullet c)^*$ are equivalent (a way to prove it is to observe that they define the same language). The latter remark motivates the following definition.

Figure 2. DFA for $(ac + bc)^*$

DEFINITION 31. An equivalence relation $\approx$ over pres having the same carrier is admissible when for all $e_1$ and $e_2$
- if $e_1 \approx e_2$ then $L_p(e_1) = L_p(e_2)$
- if $e_1 \approx e_2$ then for all $a$, $\mathit{move}(e_1, a) \approx \mathit{move}(e_2, a)$

DEFINITION 32. To any regular expression $e$ and admissible equivalence relation $\approx$ over pres over $e$, we can directly associate the DFA $D_e/_\approx = (Q/_\approx, \Sigma, [q_0]_{/\approx}, \mathit{move}_{/\approx}, F/_\approx)$ where $\mathit{move}_{/\approx}$ is the move operation lifted to equivalence classes thanks to the second admissibility condition.

Instead of working with equivalence classes, for formalization and implementation purposes it is simpler to work on representatives of equivalence classes. Instead of choosing a priori a representative of each equivalence class, we can slightly modify the algorithmic construction of definition 30 so that it dynamically identifies the representatives of the equivalence classes. It is sufficient to read each element of $Q_n$ as a representative of its equivalence class and to change the test $e' \notin Q_n$ so that the new state $e'$ is compared to the representatives in $Q_n$ up to $\approx$:

DEFINITION 33. In definition 30 change the definition of $Q_{n+1}$ as follows:
$$Q_{n+1} := Q_n \cup \{\, e' \mid \exists a. \exists e \in Q_n.\ \mathit{move}(e, a) = e' \land \nexists e'' \in Q_n.\ e' \approx e'' \,\}$$
The transition function $t$ is defined as $t(e, a) = e'$ where $\mathit{move}(e, a) = e''$ and $e'$ is the unique state of $Q_m$ such that $e' \approx e''$.

In an actual implementation, the transition function $t$ is computed together with the sets $Q_n$ at no additional cost.

THEOREM 34. Replacing each state $e$ of the automaton of definition 33 with $[e]_{/\approx}$, we obtain the restriction of the automaton of definition 32 to the accessible states.

We still need to prove that quotienting over $\approx$ does not change the language recognized by the automaton.

THEOREM 35. $L(D_e/_\approx) = L(e)$

Proof. By theorem 28, it is sufficient to prove $L(D_e) = L(D_e/_\approx)$ or, equivalently, that for all $w$,
$$\mathit{move}^*_{/\approx}([q_0]_{/\approx}, w) \in F/_\approx \Leftrightarrow \mathit{move}^*(q_0, w) \in F$$
We show this to hold by proving by induction over $w$ that for all $q$
$$[\mathit{move}^*(q, w)]_{/\approx} = \mathit{move}^*_{/\approx}([q]_{/\approx}, w)$$
Base case: $\mathit{move}^*_{/\approx}([q]_{/\approx}, \epsilon) = [q]_{/\approx} = [\mathit{move}^*(q, \epsilon)]_{/\approx}$.
Inductive step: by condition (2) of admissibility, for all $q_1 \in [q]_{/\approx}$, we have $\mathit{move}(q_1, a) \approx \mathit{move}(q, a)$ and thus $\mathit{move}_{/\approx}([q]_{/\approx}, a) = [\mathit{move}(q, a)]_{/\approx}$. Hence
$$\begin{aligned}
\mathit{move}^*_{/\approx}([q]_{/\approx}, aw)
&= \mathit{move}^*_{/\approx}(\mathit{move}_{/\approx}([q]_{/\approx}, a), w) \\
&= \mathit{move}^*_{/\approx}([\mathit{move}(q, a)]_{/\approx}, w) \\
&= [\mathit{move}^*(\mathit{move}(q, a), w)]_{/\approx} \\
&= [\mathit{move}^*(q, aw)]_{/\approx}
\end{aligned}$$

The set of admissible equivalence relations over $e$ is a bounded lattice, ordered by refinement, whose bottom element is syntactic identity and whose top element is $e_1 \approx e_2$ iff $L_p(e_1) = L_p(e_2)$. Moreover, if $\approx_1 < \approx_2$ (the first relation is a strict refinement of the second one), the number of states of $D_e/_{\approx_1}$ is strictly larger than the number of states of $D_e/_{\approx_2}$.

THEOREM 36. If $\approx$ is the top element of the lattice, then $D_e/_\approx$ is the minimal automaton that recognizes $L(e)$.

Proof. By Theorem 35, $D_e/_\approx$ recognizes $L(e)$ and has no unreachable states. By absurd, let $D' = (Q', \Sigma, q_0', t', F')$ be another smaller automaton that recognizes $L(e)$. Since the two automata are different, recognize the same language and have no unreachable states, there exist two words $w_1, w_2$ such that $t'^*(q_0', w_1) = t'^*(q_0', w_2)$ but
$$[e_1]_{/\approx} = \mathit{move}^*_{/\approx}([q_0]_{/\approx}, w_1) \neq \mathit{move}^*_{/\approx}([q_0]_{/\approx}, w_2) = [e_2]_{/\approx}$$
where $e_1$ and $e_2$ are any two representatives of their equivalence classes and thus $e_1 \not\approx e_2$. By definition of $\approx$, $L_p(e_1) \neq L_p(e_2)$. Without loss of generality, let $w_3 \in L_p(e_1) \setminus L_p(e_2)$. We have $w_1 w_3 \in L(e)$ and $w_2 w_3 \notin L(e)$ because $D_e/_\approx$ recognizes $L(e)$, which is absurd since $t'^*(q_0', w_1 w_3) = t'^*(q_0', w_2 w_3)$ and $D'$ also recognizes $L(e)$.

The previous theorem tells us that it is possible to associate to each state of an automaton for $e$ (and in particular to the minimal automaton) a pre $e'$ over $e$ so that the language recognized by the automaton in the state $e'$ is $L_p(e')$, which provides a very suggestive labelling of states.

The characterization of the minimal automaton we just gave does not seem to entail an original algorithmic construction, since the definition of $\approx$ does not suggest any new effective way for computing it. However, similarly to what has been done for derivatives (where we have similar problems), it is interesting to investigate admissible relations that are easier to compute and tend to produce small automata in most practical cases.
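The constructions of definitions 30 and 33 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the tuple encoding of items, every function name, and the `equiv` parameter (a caller-supplied admissible relation, defaulting to syntactic identity) are hypothetical choices made for illustration. States are discovered one at a time rather than in rounds $Q_n$.

```python
# Items are nested tuples (a representation assumed for this sketch):
#   ("empty",)  ("eps",)  ("sym", ch, pointed)
#   ("alt", e1, e2)  ("cat", e1, e2)  ("star", e)
# A pre is a Python pair (item, trailing_point).

def lifted_sum(p, q):
    return ("alt", p[0], q[0]), p[1] or q[1]

def lifted_cat(p, q):
    if not p[1]:
        return ("cat", p[0], q[0]), q[1]
    inner, reached_end = broadcast(q[0])   # trailing point of p enters q
    return ("cat", p[0], inner), q[1] or reached_end

def lifted_star(p):
    if not p[1]:
        return ("star", p[0]), False
    return ("star", broadcast(p[0])[0]), True

def broadcast(e):
    tag = e[0]
    if tag == "empty":
        return e, False
    if tag == "eps":
        return e, True
    if tag == "sym":
        return ("sym", e[1], True), False
    if tag == "alt":
        return lifted_sum(broadcast(e[1]), broadcast(e[2]))
    if tag == "cat":
        return lifted_cat(broadcast(e[1]), (e[2], False))
    return ("star", broadcast(e[1])[0]), True

def move(e, ch):
    tag = e[0]
    if tag in ("empty", "eps"):
        return e, False
    if tag == "sym":
        # a pointed occurrence of ch is traversed; any other point is erased
        return ("sym", e[1], False), e[2] and e[1] == ch
    if tag == "alt":
        return lifted_sum(move(e[1], ch), move(e[2], ch))
    if tag == "cat":
        return lifted_cat(move(e[1], ch), move(e[2], ch))
    return lifted_star(move(e[1], ch))

def build_dfa(regex, alphabet, equiv=lambda p, q: p == q):
    """Reachable part of the DFA, keeping one representative per class of equiv."""
    q0 = broadcast(regex)
    states, delta, todo = [q0], {}, [q0]
    while todo:
        q = todo.pop()
        for ch in alphabet:
            target = move(q[0], ch)
            rep = next((r for r in states if equiv(r, target)), None)
            if rep is None:                # no known representative: new state
                rep = target
                states.append(rep)
                todo.append(rep)
            delta[q, ch] = rep
    return q0, states, delta

def accepts(dfa, word):
    q, _, delta = dfa
    for ch in word:
        q = delta[q, ch]
    return q[1]                            # final iff the trailing point is set

# Usage on (a + eps)(b*a + b)b, the expression of Figure 1
a, b, eps = ("sym", "a", False), ("sym", "b", False), ("eps",)
regex = ("cat", ("cat", ("alt", a, eps), ("alt", ("cat", ("star", b), a), b)), b)
dfa = build_dfa(regex, "ab")
```

With the default `equiv` the loop computes the accessible states of definition 30; passing a coarser admissible relation merges states on the fly, as in definition 33. The linear scan over `states` keeps the sketch short and is not meant to be efficient.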
In particular, in the next section, we shall investigate one important relation providing a common quotient between the automata built with pres and with Brzozowski's derivatives.

4. Read back

Intuitively, a pointed regular expression corresponds to a set of regular expressions. In this section we shall formally investigate this "read back" function; this will allow us to establish a more syntactic relation between traditional regular expressions and their pointed version, and to compare our technique for building a DFA with that based on derivatives.

In the following sections we shall frequently deal with sets of regular expressions (to be understood additively), which we prefer to the treatment of regular expressions up to associativity, commutativity and idempotence of the sum (ACI) that is for instance typical of the traditional theory of derivatives (this also clarifies that ACI-rewriting is only used at the top level). It is hence useful to extend some syntactic operations, and especially concatenation, to sets of regular expressions, with the usual distributive meaning: if $e$ is a regular expression and $S$ is a set of regular expressions, then
$$Se = \{\, e'e \mid e' \in S \,\}$$
We define $eS$ and $S_1 S_2$ in a similar way. Moreover, every function on regular expressions is implicitly lifted to sets of regular expressions by taking its image. For example,
$$L(S) = \bigcup_{e \in S} L(e)$$
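The lifting of concatenation to sets can be sketched in Python as follows. The tuple encoding of regular expressions and all function names are assumptions of this sketch. The function `nullable_set` reads the nullability of a set additively, in line with $L(S) = \bigcup_{e \in S} L(e)$: a set is treated as nullable when one of its members is.

```python
# Regular expressions as nested tuples (an encoding assumed for this sketch):
#   ("empty",)  ("eps",)  ("sym", ch)  ("alt", e1, e2)  ("cat", e1, e2)  ("star", e)

def cat(e1, e2):
    return ("cat", e1, e2)

def set_cat_right(S, e):
    """S e = { e' e | e' in S }"""
    return {cat(s, e) for s in S}

def set_cat_left(e, S):
    """e S = { e e' | e' in S }"""
    return {cat(e, s) for s in S}

def set_cat(S1, S2):
    """S1 S2 = { e1 e2 | e1 in S1, e2 in S2 }"""
    return {cat(s1, s2) for s1 in S1 for s2 in S2}

def nullable(e):
    """The characteristic function nu of Definition 3."""
    tag = e[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(e[1]) or nullable(e[2])
    return nullable(e[1]) and nullable(e[2])   # tag == "cat"

def nullable_set(S):
    """epsilon belongs to L(S) iff some member of S is nullable."""
    return any(nullable(e) for e in S)

# Usage: {a, b} c = {ac, bc}
a, b, c, eps = ("sym", "a"), ("sym", "b"), ("sym", "c"), ("eps",)
ac_bc = set_cat_right({a, b}, c)
```

Python sets give idempotence and commutativity of the sum for free, which is the reason the text prefers sets to expressions considered up to ACI.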

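DEFINITION 37. We associate to each item $e$ a set of regular expressions $R(e)$ defined by the following rules:
$$\begin{aligned}
R(\emptyset) &= \emptyset \\
R(\epsilon) &= \emptyset \\
R(a) &= \emptyset \\
R(\bullet a) &= \{a\} \\
R(e_1 + e_2) &= R(e_1) \cup R(e_2) \\
R(e_1 e_2) &= R(e_1)|e_2| \cup R(e_2) \\
R(e^*) &= R(e)|e^*|
\end{aligned}$$
$R$ is extended to a pointed regular expression $\langle e, b \rangle$ as follows:
$$R(\langle e, b \rangle) = R(e) \cup \epsilon(b)$$
Note that, for any item $e$, no regular expression in $R(e)$ is nullable.

EXAMPLE 38. Since
$$\bullet((a + \epsilon)b^*) = \langle (\bullet a + \epsilon)(\bullet b)^*, \mathit{true} \rangle$$
we have
$$R(\bullet((a + \epsilon)b^*)) = \{ab^*, bb^*, \epsilon\}$$

The parallel between the syntactic read-back function $R$ and the semantics $L_p$ of definition 9 is clear by inspection of the rules. Hence the following lemma can be proved by a trivial induction over $e$.

LEMMA 39. $L(R(e)) = L_p(e)$

COROLLARY 40. For any regular expression $e$, $L(R(\bullet(e))) = L(e)$

The previous corollary states that $R$ and $\bullet(\cdot)$ are semantically inverse functions. Syntactically, they associate to each expression $e$ an interesting "look-ahead" normal form, constituted (up to associativity of concatenation) by a set of expressions of the kind $a e_a$ (plus $\epsilon$ if $e$ is nullable), where $e_a$ is a derivative of $e$ w.r.t. $a$ (although syntactically different from Brzozowski's derivatives, defined in the next section).

This look-ahead normal form ($\mathit{nf}$) has an interest in its own, and can be simply defined by structural induction over $e$.

DEFINITION 41.
$$\begin{aligned}
\mathit{nf}(\emptyset) &= \mathit{nf}(\epsilon) = \emptyset \\
\mathit{nf}(a) &= \{a\} \\
\mathit{nf}(e_1 + e_2) &= \mathit{nf}(e_1) \cup \mathit{nf}(e_2) \\
\mathit{nf}(e_1 e_2) &= \begin{cases}
\mathit{nf}(e_1) e_2 & \text{if } \nu(e_1) = \mathit{false} \\
\mathit{nf}(e_1) e_2 \cup \mathit{nf}(e_2) & \text{if } \nu(e_1) = \mathit{true}
\end{cases} \\
\mathit{nf}(e^*) &= \mathit{nf}(e) e^*
\end{aligned}$$

REMARK 42. It is easy to prove that, for each $e$, the set $\mathit{nf}(e)$ is made, up to associativity of concatenation, only of expressions of the form $a$ or $ae'$. In particular no expression in $\mathit{nf}(e)$ is nullable!

The previous remark motivates the following definition.

DEFINITION 43.
$$\mathit{nf}_\epsilon(e) = \mathit{nf}(e) \cup \epsilon(\nu(e))$$

The main properties of $\mathit{nf}_\epsilon$ are expressed by the following two lemmas, whose simple proof is left to the reader.

LEMMA 44.
$$\begin{aligned}
\mathit{nf}_\epsilon(\emptyset) &= \emptyset \\
\mathit{nf}_\epsilon(\epsilon) &= \{\epsilon\} \\
\mathit{nf}_\epsilon(a) &= \{a\} \\
\mathit{nf}_\epsilon(e_1 + e_2) &= \mathit{nf}_\epsilon(e_1) \cup \mathit{nf}_\epsilon(e_2) \\
\mathit{nf}_\epsilon(e_1 e_2) &= \begin{cases}
\mathit{nf}_\epsilon(e_1) e_2 & \text{if } \nu(e_1) = \mathit{false} \\
\mathit{nf}(e_1) e_2 \cup \mathit{nf}_\epsilon(e_2) & \text{if } \nu(e_1) = \mathit{true}
\end{cases} \\
\mathit{nf}_\epsilon(e^*) &= \mathit{nf}(e) e^* \cup \epsilon(\nu(e^*))
\end{aligned}$$

THEOREM 45.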
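$$L(e) = L(\mathit{nf}_\epsilon(e))$$

THEOREM 46. For any pointed regular expression $e$,
$$R(\bullet(e)) = \mathit{nf}_\epsilon(|e|) \cup R(e)$$

Proof. Let $\bullet(e) = \langle e', b' \rangle$; then $\epsilon \in R(\bullet(e))$ iff $b' = \mathit{true}$, iff $\nu(|e|) = \mathit{true}$. Hence the goal reduces to proving that $R(e') = \mathit{nf}(|e|) \cup R(e)$. We proceed by induction on the structure of $e$.
- $e = \emptyset$: $\bullet(\emptyset) = \langle \emptyset, \mathit{false} \rangle$ and $R(\emptyset) = \emptyset = \mathit{nf}(\emptyset)$
- $e = \epsilon$: $\bullet(\epsilon) = \langle \epsilon, \mathit{true} \rangle$ and $R(\epsilon) = \emptyset = \mathit{nf}(\epsilon)$
- $e = a$: $\bullet(a) = \langle \bullet a, \mathit{false} \rangle$ and $R(\bullet a) = \{a\} = \mathit{nf}(a)$
- $e = \bullet a$: $\bullet(\bullet a) = \langle \bullet a, \mathit{false} \rangle$ and $R(\bullet a) = \{a\} = \mathit{nf}(a) = \mathit{nf}(|\bullet a|) = \mathit{nf}(|\bullet a|) \cup R(\bullet a)$
- $e = e_1 + e_2$: let $\bullet(e_1 + e_2) = \langle e_1' + e_2', b' \rangle$; then
$$\begin{aligned}
R(e_1' + e_2') &= R(e_1') \cup R(e_2') \\
&= \mathit{nf}(|e_1|) \cup R(e_1) \cup \mathit{nf}(|e_2|) \cup R(e_2) \\
&= \mathit{nf}(|e_1 + e_2|) \cup R(e_1 + e_2)
\end{aligned}$$
- $e = e_1 e_2$. Let $\bullet(e_i) = \langle e_i', b_i' \rangle$. If $b_1' = \mathit{false}$ then $\bullet(e_1 e_2) = \langle e_1' e_2, \mathit{false} \rangle$; moreover we know that $|e_1|$ is not nullable. We have then:
$$\begin{aligned}
R(e_1' e_2) &= R(e_1')|e_2| \cup R(e_2) \\
&= (\mathit{nf}(|e_1|) \cup R(e_1))|e_2| \cup R(e_2) \\
&= \mathit{nf}(|e_1|)|e_2| \cup R(e_1)|e_2| \cup R(e_2) \\
&= \mathit{nf}(|e_1 e_2|) \cup R(e_1 e_2)
\end{aligned}$$
If $b_1' = \mathit{true}$ then $\bullet(e_1 e_2) = \langle e_1' e_2', b_2' \rangle$; moreover we know that $|e_1|$ is nullable.
$$\begin{aligned}
R(e_1' e_2') &= R(e_1')|e_2'| \cup R(e_2') \\
&= (\mathit{nf}(|e_1|) \cup R(e_1))|e_2| \cup \mathit{nf}(|e_2|) \cup R(e_2) \\
&= \mathit{nf}(|e_1|)|e_2| \cup \mathit{nf}(|e_2|) \cup R(e_1)|e_2| \cup R(e_2) \\
&= \mathit{nf}(|e_1 e_2|) \cup R(e_1 e_2)
\end{aligned}$$
- $e = e_1^*$. Let $\bullet(e_1) = \langle e_1', b_1' \rangle$; then $\bullet(e_1^*) = \langle e_1'^*, \mathit{true} \rangle$;
$$\begin{aligned}
R(e_1'^*) &= R(e_1')|e_1'^*| \\
&= (\mathit{nf}(|e_1|) \cup R(e_1))|e_1^*| \\
&= \mathit{nf}(|e_1|)|e_1^*| \cup R(e_1)|e_1^*| \\
&= \mathit{nf}(|e_1^*|) \cup R(e_1^*)
\end{aligned}$$

COROLLARY 47. For all regular expressions $e$, $R(\bullet(e)) = \mathit{nf}_\epsilon(e)$

To conclude this section, in analogy with what we did for the semantic function in Theorem 16, we express the behaviour of $R$ in terms of the lifted algebraic constructors. This will be useful in Theorem 51.

LEMMA 48.
1. $R(e_1 \oplus e_2) = R(e_1) \cup R(e_2)$
2. $R(\langle e_1, \mathit{false} \rangle \odot e_2) = R(e_1)|e_2| \cup R(e_2)$
3. $R(\langle e_1, \mathit{true} \rangle \odot e_2) = R(e_1)|e_2| \cup \mathit{nf}_\epsilon(|e_2|) \cup R(e_2)$
4. $R(\langle e_1, \mathit{false} \rangle^{\circledast}) = R(e_1)|e_1^*|$
5. $R(\langle e_1, \mathit{true} \rangle^{\circledast}) = R(e_1)|e_1^*| \cup \mathit{nf}_\epsilon(|e_1^*|)$

Proof. Let $e_i = \langle e_i', b_i' \rangle$:
1.
$$\begin{aligned}
R(e_1 \oplus e_2) &= R(\langle e_1', b_1' \rangle \oplus \langle e_2', b_2' \rangle) \\
&= R(\langle e_1' + e_2', b_1' \lor b_2' \rangle) \\
&= R(e_1' + e_2') \cup \epsilon(b_1' \lor b_2') \\
&= R(e_1') \cup R(e_2') \cup \epsilon(b_1') \cup \epsilon(b_2') \\
&= R(e_1') \cup \epsilon(b_1') \cup R(e_2') \cup \epsilon(b_2') \\
&= R(e_1) \cup R(e_2)
\end{aligned}$$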

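2.
$$\begin{aligned}
R(\langle e_1, \mathit{false} \rangle \odot \langle e_2', b_2' \rangle) &= R(\langle e_1 e_2', b_2' \rangle) \\
&= R(e_1)|e_2'| \cup R(e_2') \cup \epsilon(b_2') \\
&= R(e_1)|e_2| \cup R(e_2)
\end{aligned}$$
3. let $\bullet(e_2') = \langle e_2'', b_2'' \rangle$
$$\begin{aligned}
R(\langle e_1, \mathit{true} \rangle \odot \langle e_2', b_2' \rangle) &= R(\langle e_1 e_2'', b_2' \lor b_2'' \rangle) \\
&= R(e_1)|e_2''| \cup R(e_2'') \cup \epsilon(b_2'') \cup \epsilon(b_2') \\
&= R(e_1)|e_2'| \cup R(\bullet(e_2')) \cup \epsilon(b_2') \\
&= R(e_1)|e_2'| \cup \mathit{nf}_\epsilon(|e_2'|) \cup R(e_2') \cup \epsilon(b_2') \\
&= R(e_1)|e_2| \cup \mathit{nf}_\epsilon(|e_2|) \cup R(e_2)
\end{aligned}$$
4.
$$R(\langle e_1, \mathit{false} \rangle^{\circledast}) = R(\langle e_1^*, \mathit{false} \rangle) = R(e_1^*) = R(e_1)|e_1^*|$$
5. let $\bullet(e_1) = \langle e_1', b_1' \rangle$; then $R(\bullet(e_1)) = R(e_1') \cup \epsilon(b_1') = \mathit{nf}_\epsilon(|e_1|) \cup R(e_1)$, and $R(e_1') = \mathit{nf}(|e_1|) \cup R(e_1)$.
$$\begin{aligned}
R(\langle e_1, \mathit{true} \rangle^{\circledast}) &= R(\langle e_1'^*, \mathit{true} \rangle) \\
&= R(e_1')|e_1^*| \cup \epsilon(\mathit{true}) \\
&= (R(e_1) \cup \mathit{nf}(|e_1|))|e_1^*| \cup \epsilon(\mathit{true}) \\
&= R(e_1)|e_1^*| \cup \mathit{nf}(|e_1|)|e_1^*| \cup \epsilon(\mathit{true}) \\
&= R(e_1)|e_1^*| \cup \mathit{nf}_\epsilon(|e_1^*|)
\end{aligned}$$

4.1 Relation with Brzozowski's Derivatives

We are now ready to formally investigate the relation between pointed expressions and Brzozowski's derivatives. As we shall see, they give rise to quite different constructions and the relation is less obvious than expected. Let us start by recalling the formal definition.

DEFINITION 49.
$$\begin{aligned}
\partial_a(\emptyset) &= \emptyset \\
\partial_a(\epsilon) &= \emptyset \\
\partial_a(a) &= \epsilon \\
\partial_a(b) &= \emptyset \quad \text{if } b \neq a \\
\partial_a(e_1 + e_2) &= \partial_a(e_1) + \partial_a(e_2) \\
\partial_a(e_1 e_2) &= \begin{cases}
\partial_a(e_1) e_2 & \text{if not } \nu(e_1) \\
\partial_a(e_1) e_2 + \partial_a(e_2) & \text{if } \nu(e_1)
\end{cases} \\
\partial_a(e^*) &= \partial_a(e) e^*
\end{aligned}$$

DEFINITION 50.
$$\partial_\epsilon(e) = e \qquad \partial_{aw}(e) = \partial_w(\partial_a(e))$$

In general, given a regular expression $e$ over the alphabet $\Sigma$, the set $\{\partial_w(e) \mid w \in \Sigma^*\}$ of all its derivatives is not finite. In order to get a finite set we must suitably quotient derivatives according to algebraic equalities between regular expressions. The choice of a different set of equations gives rise to different quotients, and hence to different automata. Since for finiteness it is enough to consider associativity, commutativity and idempotence of the sum (ACI), the traditional theory of Brzozowski's derivatives is defined according to these laws (although this is probably not the best choice from a practical point of view).

As a practical example, in Figure 3 we describe the automaton obtained using derivatives relative to the expression $(ac + bc)^*$ (compare it with the automaton of Figure 2). Also, note that the vertically aligned states are equivalent.

Let us remark, first of all, the heavy use of ACI.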
For instance
$$\partial_a((ac + bc)^*) = (\epsilon c + \emptyset c)(ac + bc)^*$$
while
$$\partial_b((ac + bc)^*) = (\emptyset c + \epsilon c)(ac + bc)^*$$
and they can be assimilated only up to commutativity of the sum.

Figure 3. Automaton with Brzozowski's derivatives

As another example, (( + )( + ) + ( + ɛ)( + ) ) = ( + )( + ) + (( + )( + ) + (ɛ + )( + ) ) and the latter expression can be reduced to ( + )( + ) + (ɛ + )( + ) only using associativity and idempotence of the sum.

The second important remark is that, in general, it is not true that we may obtain the pre-automaton by quotienting the derivative one (nor the other way round). For instance, from the initial state, the two arcs labelled $a$ and $b$ lead to a single state in the automaton of Figure 3, but to different states in the automaton of Figure 2.

A natural question is hence to understand if there exists a common algebraic quotient between the two constructions (not exploiting minimization). As we shall see, this can be achieved by identifying states with the same readback in the case of pres, and states with similar look-ahead normal form in the case of derivatives. For instance, in the case of the two automata of Figures 2 and 3, we would obtain the common quotient of Figure 4.

Figure 4. A quotient of the two automatons (states $\{ac(ac + bc)^*, bc(ac + bc)^*, \epsilon\}$, $\{c(ac + bc)^*\}$ and $\{\}$)

The general picture is described by the commuting diagram of Figure 5, whose proof will be the object of the next section (in Figure 5, $w$ obviously stands for the string $a_1 \ldots a_n$).

4.2 Formal proof of the commuting diagram in Figure 5

Part of the diagram has already been proved: the leftmost triangle, used to relate the initial states of the two automata, is Corollary 47; the two triangles at the right, used to relate the final states, just state the trivial properties that $\epsilon \in R(\langle e, b \rangle)$ if and only if $b = \mathit{true}$ (since no expression in $R(e)$ is nullable), and $\epsilon \in \mathit{nf}_\epsilon(e)$ if and only if $e$ is nullable (see Remark 42).

We start by proving the upper part.
We prove it for a pointed item $e$ and leave the obvious generalization to a pointed expression to the reader (the move operation does not depend on the presence of a trailing point, and similarly the derivative of $\epsilon$ is empty).

THEOREM 51. For any pointed item $e$, $R(\mathit{move}(e, a)) = \mathit{nf}_\epsilon(\partial_a(R(e)))$

Proof. By induction on the structure of $e$:

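[Figure 5. Pointed regular expressions and Brzozowski's derivatives]

the cases $\emptyset$, $\epsilon$, $a$ and $b$ are trivial

if $e = \bullet a$ then $\mathit{move}(\bullet a, a) = \langle a, \mathit{true}\rangle$ and $R(\langle a, \mathit{true}\rangle) = \{\epsilon\}$. On the other side, $\mathit{nf}_\epsilon(\partial_a(R(\bullet a))) = \mathit{nf}_\epsilon(\partial_a(\{a\})) = \mathit{nf}_\epsilon(\{\epsilon\}) = \{\epsilon\}$.

if $e = e_1 + e_2$, then
$R(\mathit{move}(e_1 + e_2, a))$
$= R(\mathit{move}(e_1, a) \oplus \mathit{move}(e_2, a))$
$= R(\mathit{move}(e_1, a)) \cup R(\mathit{move}(e_2, a))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1))) \cup \mathit{nf}_\epsilon(\partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1 + e_2)))$

let $e = e_1 e_2$, and let us suppose that $\mathit{move}(e_1, a) = \langle e_1', \mathit{false}\rangle$ and thus $R(\mathit{move}(e_1, a)) = R(e_1')$ and $\nu(\partial_a(R(e_1))) = \mathit{false}$. Then
$R(\mathit{move}(e_1 e_2, a))$
$= R(\mathit{move}(e_1, a) \odot \mathit{move}(e_2, a))$
$= R(\mathit{move}(e_1, a))|\mathit{move}(e_2, a)| \cup R(\mathit{move}(e_2, a))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)))|e_2| \cup \mathit{nf}_\epsilon(\partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1))|e_2| \cup \partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)|e_2|) \cup \partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)|e_2| \cup R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1 e_2)))$

If $\mathit{move}(e_1, a) = \langle e_1', \mathit{true}\rangle$ then $R(\mathit{move}(e_1, a)) = R(e_1') \cup \{\epsilon\} = \mathit{nf}_\epsilon(\partial_a(R(e_1)))$. In particular $R(e_1') = \mathit{nf}(\partial_a(R(e_1)))$ and $\nu(\partial_a(R(e_1))) = \mathit{true}$. We have then:
$R(\mathit{move}(e_1 e_2, a))$
$= R(\mathit{move}(e_1, a) \odot \mathit{move}(e_2, a))$
$= R(e_1')|\mathit{move}(e_2, a)| \cup \mathit{nf}_\epsilon(|\mathit{move}(e_2, a)|) \cup R(\mathit{move}(e_2, a))$
$= R(e_1')|e_2| \cup \mathit{nf}_\epsilon(|e_2|) \cup R(\mathit{move}(e_2, a))$
$= \mathit{nf}(\partial_a(R(e_1)))|e_2| \cup \mathit{nf}_\epsilon(|e_2|) \cup \mathit{nf}_\epsilon(\partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1))|e_2|) \cup \mathit{nf}_\epsilon(\partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1))|e_2| \cup \partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)|e_2|) \cup \partial_a(R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)|e_2| \cup R(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1 e_2)))$

let $e = e_1^*$, and let us suppose that $\mathit{move}(e_1, a) = \langle e_1', \mathit{false}\rangle$. Thus $\epsilon \notin \mathit{nf}_\epsilon(\partial_a(R(e_1)))$. Then
$R(\mathit{move}(e_1^*, a))$
$= R(\mathit{move}(e_1, a)^{\circledast})$
$= R(e_1')|e_1^*|$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)))|e_1^*|$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1))|e_1^*|)$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)|e_1^*|))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1^*)))$

If $\mathit{move}(e_1, a) = \langle e_1', \mathit{true}\rangle$ then $R(\mathit{move}(e_1, a)) = R(e_1') \cup \{\epsilon\} = \mathit{nf}_\epsilon(\partial_a(R(e_1)))$. In particular $R(e_1') = \mathit{nf}(\partial_a(R(e_1)))$ and $\nu(\partial_a(R(e_1))) = \mathit{true}$ since $\epsilon \in \mathit{nf}_\epsilon(\partial_a(R(e_1)))$.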
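We have then:
$R(\mathit{move}(e_1^*, a))$
$= R(\mathit{move}(e_1, a)^{\circledast})$
$= R(e_1')|e_1^*| \cup \mathit{nf}_\epsilon(|e_1^*|)$
$= \mathit{nf}(\partial_a(R(e_1)))|e_1^*| \cup \mathit{nf}_\epsilon(|e_1^*|)$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1))|e_1^*|)$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1)|e_1^*|))$
$= \mathit{nf}_\epsilon(\partial_a(R(e_1^*)))$

We pass now to prove the lower part of the diagram in Figure 5, namely that for any regular expression $e$, $\mathit{nf}_\epsilon(\partial_a(e)) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}_\epsilon(e)))$. Since, however, $\mathit{nf}_\epsilon(\partial_a(\mathit{nf}_\epsilon(e))) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e)))$ (the derivative of $\epsilon$ is empty), this is equivalent to proving the following result.

THEOREM 52. $\mathit{nf}_\epsilon(\partial_a(e)) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e)))$

Proof. The proof is by induction on $e$. Any induction hypothesis over a regular expression $e_1$ can be strengthened to $\mathit{nf}_\epsilon(\partial_a(e_1) e_2) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1)) e_2)$ for all $e_2$ since
$\mathit{nf}_\epsilon(\partial_a(e_1) e_2)$
$= \mathit{nf}_\epsilon(\partial_a(e_1)) e_2 \cup (\mathit{nf}_\epsilon(e_2)$ if $\nu(\partial_a(e_1)))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1))) e_2 \cup (\mathit{nf}_\epsilon(e_2)$ if $\nu(\partial_a(\mathit{nf}(e_1))))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1)) e_2)$
(observe that $\nu(\partial_a(e_1)) = \nu(\partial_a(\mathit{nf}(e_1)))$ since the languages denoted by $\partial_a(e_1)$ and $\partial_a(\mathit{nf}(e_1))$ are equal). We must consider the following cases.

If $e$ is $\emptyset$, $\epsilon$, or a symbol different from $a$ then both sides of the equation are empty.

If $e$ is $a$, $\mathit{nf}_\epsilon(\partial_a(a)) = \mathit{nf}_\epsilon(\epsilon) = \{\epsilon\} = \mathit{nf}_\epsilon(\partial_a(\{a\})) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(a)))$

If $e$ is $e_1 + e_2$,
$\mathit{nf}_\epsilon(\partial_a(e_1 + e_2))$
$= \mathit{nf}_\epsilon(\partial_a(e_1) + \partial_a(e_2))$
$= \mathit{nf}_\epsilon(\partial_a(e_1)) \cup \mathit{nf}_\epsilon(\partial_a(e_2))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1))) \cup \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1) \cup \mathit{nf}(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1 + e_2)))$

If $e$ is $e_1 e_2$ and $\nu(e_1) = \mathit{false}$,
$\mathit{nf}_\epsilon(\partial_a(e_1 e_2)) = \mathit{nf}_\epsilon(\partial_a(e_1) e_2) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1)) e_2) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1) e_2)) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1 e_2)))$

If $e$ is $e_1 e_2$ and $\nu(e_1) = \mathit{true}$,
$\mathit{nf}_\epsilon(\partial_a(e_1 e_2))$
$= \mathit{nf}_\epsilon(\partial_a(e_1) e_2) \cup \mathit{nf}_\epsilon(\partial_a(e_2))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1)) e_2) \cup \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1) e_2 \cup \mathit{nf}(e_2)))$
$= \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1 e_2)))$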

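If $e$ is $e_1^*$,
$\mathit{nf}_\epsilon(\partial_a(e_1^*)) = \mathit{nf}_\epsilon(\partial_a(e_1) e_1^*) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1)) e_1^*) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1) e_1^*)) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}(e_1^*)))$

LEMMA 53. $R(e) = \mathit{nf}_\epsilon(R(e))$

Proof. We proceed by induction over $e$:
$R(\emptyset) = \emptyset = \mathit{nf}_\epsilon(\emptyset) = \mathit{nf}_\epsilon(R(\emptyset))$
$R(\epsilon) = \emptyset = \mathit{nf}_\epsilon(\emptyset) = \mathit{nf}_\epsilon(R(\epsilon))$
$R(a) = \emptyset = \mathit{nf}_\epsilon(\emptyset) = \mathit{nf}_\epsilon(R(a))$
$R(\bullet a) = \{a\} = \mathit{nf}_\epsilon(\{a\}) = \mathit{nf}_\epsilon(R(\bullet a))$
$R(e_1 + e_2) = R(e_1) \cup R(e_2) = \mathit{nf}_\epsilon(R(e_1)) \cup \mathit{nf}_\epsilon(R(e_2)) = \mathit{nf}_\epsilon(R(e_1) \cup R(e_2)) = \mathit{nf}_\epsilon(R(e_1 + e_2))$
$R(e_1 e_2) = R(e_1)|e_2| \cup R(e_2) = \mathit{nf}_\epsilon(R(e_1))|e_2| \cup \mathit{nf}_\epsilon(R(e_2)) = \mathit{nf}_\epsilon(R(e_1)|e_2|) \cup \mathit{nf}_\epsilon(R(e_2)) = \mathit{nf}_\epsilon(R(e_1)|e_2| \cup R(e_2)) = \mathit{nf}_\epsilon(R(e_1 e_2))$
$R(e^*) = R(e)|e^*| = \mathit{nf}_\epsilon(R(e))|e^*| = \mathit{nf}_\epsilon(R(e)|e^*|) = \mathit{nf}_\epsilon(R(e^*))$

We are now ready to prove the commutation of the outermost diagram.

THEOREM 54. For any pointed item $e$, $R(\mathit{move}^*(e, w)) = \mathit{nf}_\epsilon(\partial_w(R(e)))$

Proof. The proof is by induction on the structure of $w$. In the base case, $R(\mathit{move}^*(e, \epsilon)) = R(e) = \mathit{nf}_\epsilon(R(e)) = \mathit{nf}_\epsilon(\partial_\epsilon(R(e)))$. In the inductive step, by Theorem 52,
$R(\mathit{move}^*(e, aw))$
$= R(\mathit{move}^*(\mathit{move}(e, a), w))$
$= \mathit{nf}_\epsilon(\partial_w(R(\mathit{move}(e, a))))$
$= \mathit{nf}_\epsilon(\partial_w(\mathit{nf}_\epsilon(\partial_a(R(e)))))$
$= \mathit{nf}_\epsilon(\partial_w(\partial_a(R(e))))$
$= \mathit{nf}_\epsilon(\partial_{aw}(R(e)))$

COROLLARY 55. For any regular expression $e$, $R(\mathit{move}^*(\bullet e, w)) = \mathit{nf}_\epsilon(\partial_w(e))$

Proof. $R(\mathit{move}^*(\bullet e, w)) = \mathit{nf}_\epsilon(\partial_w(R(\bullet e))) = \mathit{nf}_\epsilon(\partial_w(\mathit{nf}_\epsilon(e))) = \mathit{nf}_\epsilon(\partial_w(e))$

Another important consequence of Theorems 51 and 52 is that $R$ and $\mathit{nf}_\epsilon$ are admissible relations (respectively, over pres and over derivatives).

THEOREM 56. $kn(R(\cdot))$ (the kernel of $R(\cdot)$) is an admissible equivalence relation over pres.

Proof. By Lemma 39 we derive that for all pres $e_1$, $e_2$, if $R(e_1) = R(e_2)$ then $L_p(e_1) = L_p(e_2)$. We also need to prove that for all pres $e_1$, $e_2$ and all symbols $a$, if $R(e_1) = R(e_2)$ then $R(\mathit{move}(e_1, a)) = R(\mathit{move}(e_2, a))$. By Theorem 51, $R(\mathit{move}(e_1, a)) = \mathit{nf}_\epsilon(\partial_a(R(e_1))) = \mathit{nf}_\epsilon(\partial_a(R(e_2))) = R(\mathit{move}(e_2, a))$

THEOREM 57. $kn(\mathit{nf}_\epsilon(e))$ is an admissible equivalence relation over regular expressions.

Proof. By Lemma 45 we derive that for all regular expressions $e_1$, $e_2$, if $\mathit{nf}_\epsilon(e_1) = \mathit{nf}_\epsilon(e_2)$ then $L(e_1) = L(e_2)$.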
We also need to prove that for all regular expressions $e_1$, $e_2$ and all symbols $a$, if $\mathit{nf}_\epsilon(e_1) = \mathit{nf}_\epsilon(e_2)$ then $\mathit{nf}_\epsilon(\partial_a(e_1)) = \mathit{nf}_\epsilon(\partial_a(e_2))$. By Theorem 52, $\mathit{nf}_\epsilon(\partial_a(e_1)) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}_\epsilon(e_1))) = \mathit{nf}_\epsilon(\partial_a(\mathit{nf}_\epsilon(e_2))) = \mathit{nf}_\epsilon(\partial_a(e_2))$

THEOREM 58. For each regular expression $e$, let $D_e = (Q, \Sigma, \bullet e, t, F)$ be the automaton for $e$ built according to Definition 30 and let $D_e^\delta = (Q_\delta, \Sigma, e, t_\delta, F_\delta)$ be the automaton for $e$ obtained with derivatives. Let $kn(R)$ and $kn(\mathit{nf}_\epsilon)$ be the kernels of $R$ and $\mathit{nf}_\epsilon$ respectively. Then $D_e / kn(R) = D_e^\delta / kn(\mathit{nf}_\epsilon)$.

Proof. The result holds by commutation of Figure 5, which is granted by the previous results, in particular by Corollary 55, Theorem 56, Theorem 57, and the commutation of the triangles relative to the initial and final states.

Theorem 58 relates our finite automata with the infinite-state ones obtained via Brzozowski's derivatives before quotienting the automata states by means of ACI to make them finite. The following easy lemma shows that $kn(\mathit{nf}_\epsilon)$ is an equivalence relation coarser than ACI (it identifies all ACI-equivalent expressions), and thus Theorem 58 also holds for the standard finite Brzozowski automata, since we can quotient with ACI first.

LEMMA 59. Let $e_1$ and $e_2$ be regular expressions. If $e_1 =_{ACI} e_2$ then $\mathit{nf}_\epsilon(e_1) = \mathit{nf}_\epsilon(e_2)$.

5. Merging

By Theorem 16, $L_p(\bullet e) = L_p(e) \cup L(|e|)$. A more syntactic way to look at this result is to observe that $\bullet(e)$ can be obtained by merging together the points in $e$ and $\bullet(|e|)$, and that the language defined by merging two pointed expressions $e_1$ and $e_2$ is just the union of the two languages $L_p(e_1)$ and $L_p(e_2)$. The merging operation, which we shall denote with $\dagger$, also provides the relation between deterministic and nondeterministic automata where, as in Watson [10, 11], we may label states with expressions with a single point (for lack of space, we shall not explicitly address the latter issue in this paper; it is however a simple consequence of Theorem 67). Finally, the merging operation will allow us to explain why the technique of pointed expressions cannot be (naively) generalized to intersection and complement (see Section 5.1).

DEFINITION 60.
Let $e_1$ and $e_2$ be two items on the same carrier $e$. The merge of $e_1$ and $e_2$ is defined by the following rules by recursion over the structure of $e$:
$\emptyset \dagger \emptyset = \emptyset$
$\epsilon \dagger \epsilon = \epsilon$
$a \dagger a = a$
$\bullet a \dagger a = \bullet a$
$a \dagger \bullet a = \bullet a$
$\bullet a \dagger \bullet a = \bullet a$
$(e^1_1 + e^1_2) \dagger (e^2_1 + e^2_2) = (e^1_1 \dagger e^2_1) + (e^1_2 \dagger e^2_2)$
$(e^1_1 e^1_2) \dagger (e^2_1 e^2_2) = (e^1_1 \dagger e^2_1)(e^1_2 \dagger e^2_2)$
$e_1^* \dagger e_2^* = (e_1 \dagger e_2)^*$

The definition is extended to pres as follows: $\langle e_1, b_1\rangle \dagger \langle e_2, b_2\rangle = \langle e_1 \dagger e_2, b_1 \vee b_2\rangle$

THEOREM 61. $\dagger$ is commutative, associative and idempotent.

Proof. Trivial by induction over the structure of the carrier of the arguments.

THEOREM 62. $L_p(e_1 \dagger e_2) = L_p(e_1) \cup L_p(e_2)$

Proof. Trivial by induction on the common carrier of the items of $e_1$ and $e_2$.

All the constructions we presented so far commute with the merge operation. Since merging essentially corresponds to the subset construction over automata, the following theorems constitute the proof of correctness of the subset construction.
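The merge of Definition 60 can be sketched in a few lines of Python. This is only an illustrative sketch, not the formalization used by the authors: the tuple encoding of items and the names `sym`, `cat`, `plus`, `star`, `merge` and `merge_pre` are assumptions introduced here. A point on a symbol is represented by a boolean flag, so merging two items on the same carrier amounts to taking the disjunction of the flags position by position.

```python
# Hypothetical encoding of pointed items (an assumption of this sketch):
#   ("empty",), ("eps",), ("sym", ch, pointed), ("plus", l, r),
#   ("cat", l, r), ("star", e)
# A pre is a pair (item, bool).


def sym(ch, pointed=False):
    return ("sym", ch, pointed)


def cat(left, right):
    return ("cat", left, right)


def plus(left, right):
    return ("plus", left, right)


def star(e):
    return ("star", e)


def merge(e1, e2):
    """Merge two items on the same carrier, following Definition 60."""
    if e1[0] != e2[0]:
        raise ValueError("items must share the same carrier")
    tag = e1[0]
    if tag == "sym":
        if e1[1] != e2[1]:
            raise ValueError("items must share the same carrier")
        # The result is pointed as soon as one of the two arguments is.
        return ("sym", e1[1], e1[2] or e2[2])
    if tag in ("plus", "cat"):
        return (tag, merge(e1[1], e2[1]), merge(e1[2], e2[2]))
    if tag == "star":
        return ("star", merge(e1[1], e2[1]))
    return e1  # "empty" and "eps" carry no points


def merge_pre(p1, p2):
    """Extension of the merge to pres: merge the items, or the booleans."""
    (e1, b1), (e2, b2) = p1, p2
    return (merge(e1, e2), b1 or b2)


# Three items on the carrier (ac + bc)*, each with a single point.
x = star(plus(cat(sym("a", True), sym("c")), cat(sym("b"), sym("c"))))
y = star(plus(cat(sym("a"), sym("c")), cat(sym("b", True), sym("c"))))
z = star(plus(cat(sym("a"), sym("c", True)), cat(sym("b"), sym("c"))))

xy = merge(x, y)  # both a and b are pointed in the result
```

On these examples the properties of Theorem 61 can be checked directly, for instance `merge(x, y) == merge(y, x)` and `merge(x, x) == x`; this checks instances only and is of course not a proof.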

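THEOREM 63. $(\mathbf{e}^1_1 \oplus \mathbf{e}^2_1) \dagger (\mathbf{e}^1_2 \oplus \mathbf{e}^2_2) = (\mathbf{e}^1_1 \dagger \mathbf{e}^1_2) \oplus (\mathbf{e}^2_1 \dagger \mathbf{e}^2_2)$

Proof. Trivial by expansion of definitions.

THEOREM 64.
1. for $e_1$ and $e_2$ items on the same carrier, $\bullet(e_1 \dagger e_2) = \bullet(e_1) \dagger \langle e_2, \mathit{false}\rangle$
2. for $\mathbf{e}_1$ and $\mathbf{e}_2$ pres on the same carrier, $\bullet(\mathbf{e}_1 \dagger \mathbf{e}_2) = \bullet(\mathbf{e}_1) \dagger \mathbf{e}_2$
3. $(\mathbf{e}^1_1 \odot \mathbf{e}^2_1) \dagger (\mathbf{e}^1_2 \odot \mathbf{e}^2_2) = (\mathbf{e}^1_1 \dagger \mathbf{e}^1_2) \odot (\mathbf{e}^2_1 \dagger \mathbf{e}^2_2)$

COROLLARY 65. $\bullet(\mathbf{e}_1 \dagger \mathbf{e}_2) = \mathbf{e}_1 \dagger \bullet(\mathbf{e}_2) = \bullet(\mathbf{e}_1) \dagger \bullet(\mathbf{e}_2)$

Proof [of the corollary]. The corollary is a simple consequence of commutativity of $\dagger$ and idempotence of $\bullet(\cdot)$:
$\bullet(\mathbf{e}_1 \dagger \mathbf{e}_2) = \bullet(\mathbf{e}_2 \dagger \mathbf{e}_1) = \bullet(\mathbf{e}_2) \dagger \mathbf{e}_1 = \mathbf{e}_1 \dagger \bullet(\mathbf{e}_2)$
$\bullet(\mathbf{e}_1 \dagger \mathbf{e}_2) = \bullet(\bullet(\mathbf{e}_1 \dagger \mathbf{e}_2)) = \bullet(\bullet(\mathbf{e}_1) \dagger \mathbf{e}_2) = \bullet(\mathbf{e}_1) \dagger \bullet(\mathbf{e}_2)$

Proof [of 1.]. We first prove $\bullet(e_1 \dagger e_2) = \bullet(e_1) \dagger \langle e_2, \mathit{false}\rangle$ by induction over the structure of the common carrier of $e_1$ and $e_2$, assuming that 3. holds on terms whose carrier is structurally smaller than $e$.

If $e_1$ is $\emptyset$, $\epsilon$, $a$, $\bullet a$ then trivial.

If $e_1$ is $e^1_1 + e^2_1$ and $e_2$ is $e^1_2 + e^2_2$:
$\bullet((e^1_1 + e^2_1) \dagger (e^1_2 + e^2_2))$
$= \bullet((e^1_1 \dagger e^1_2) + (e^2_1 \dagger e^2_2))$
$= \bullet(e^1_1 \dagger e^1_2) \oplus \bullet(e^2_1 \dagger e^2_2)$
$= (\bullet(e^1_1) \dagger \langle e^1_2, \mathit{false}\rangle) \oplus (\bullet(e^2_1) \dagger \langle e^2_2, \mathit{false}\rangle)$
$= (\bullet(e^1_1) \oplus \bullet(e^2_1)) \dagger (\langle e^1_2, \mathit{false}\rangle \oplus \langle e^2_2, \mathit{false}\rangle)$
$= \bullet(e^1_1 + e^2_1) \dagger \langle e^1_2 + e^2_2, \mathit{false}\rangle$

If $e_1$ is $e^1_1 e^2_1$ and $e_2$ is $e^1_2 e^2_2$ then, using 3. on items whose carrier is structurally smaller than $e_1$,
$\bullet((e^1_1 e^2_1) \dagger (e^1_2 e^2_2))$
$= \bullet((e^1_1 \dagger e^1_2)(e^2_1 \dagger e^2_2))$
$= \bullet(e^1_1 \dagger e^1_2) \odot \langle e^2_1 \dagger e^2_2, \mathit{false}\rangle$
$= (\bullet(e^1_1) \dagger \langle e^1_2, \mathit{false}\rangle) \odot (\langle e^2_1, \mathit{false}\rangle \dagger \langle e^2_2, \mathit{false}\rangle)$
$= (\bullet(e^1_1) \odot \langle e^2_1, \mathit{false}\rangle) \dagger (\langle e^1_2, \mathit{false}\rangle \odot \langle e^2_2, \mathit{false}\rangle)$
$= \bullet(e^1_1 e^2_1) \dagger \langle e^1_2 e^2_2, \mathit{false}\rangle$

If $e_1$ is ${e^1_1}^*$ and $e_2$ is ${e^1_2}^*$, let $\bullet(e^1_1 \dagger e^1_2) = \langle e', b'\rangle$ and $\bullet(e^1_1) = \langle e'', b''\rangle$. By induction hypothesis,
$\langle e', b'\rangle = \bullet(e^1_1 \dagger e^1_2) = \bullet(e^1_1) \dagger \langle e^1_2, \mathit{false}\rangle = \langle e'', b''\rangle \dagger \langle e^1_2, \mathit{false}\rangle$
Then
$\bullet({e^1_1}^* \dagger {e^1_2}^*) = \bullet((e^1_1 \dagger e^1_2)^*) = \langle e'^*, \mathit{true}\rangle = \langle e''^*, \mathit{true}\rangle \dagger \langle {e^1_2}^*, \mathit{false}\rangle = \bullet({e^1_1}^*) \dagger \langle {e^1_2}^*, \mathit{false}\rangle$

Proof [of 2.]. Let $\langle e^i_j, b^i_j\rangle = \mathbf{e}^i_j$. By definition of $\dagger$, we have $\mathbf{e}^1_1 \dagger \mathbf{e}^2_1 = \langle e^1_1 \dagger e^2_1, b^1_1 \vee b^2_1\rangle$. For all $b$ and $e$, let $\bullet_b(e) := \langle e, \mathit{false}\rangle$ if $b = \mathit{false}$, and $\bullet_b(e) := \bullet(e)$ otherwise. Thus for all $e_1$, $e_2$, $b_1$, $b_2$, letting $\langle e_2', b_2'\rangle := \bullet_{b_1}(e_2)$, the following holds: $\langle e_1, b_1\rangle \odot \langle e_2, b_2\rangle = \langle e_1 e_2', b_2 \vee b_2'\rangle$. Let $\langle {e'}^i_2, {b'}^i_2\rangle := \bullet_{b^i_1}(e^i_2)$. By property 1.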
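we have:
$\langle {e'}^1_2 \dagger {e'}^2_2, {b'}^1_2 \vee {b'}^2_2\rangle = \bullet_{b^1_1}(e^1_2) \dagger \bullet_{b^2_1}(e^2_2) = \bullet_{b^1_1 \vee b^2_1}(e^1_2 \dagger e^2_2)$
Thus
$(\mathbf{e}^1_1 \dagger \mathbf{e}^2_1) \odot (\mathbf{e}^1_2 \dagger \mathbf{e}^2_2)$
$= \langle e^1_1 \dagger e^2_1, b^1_1 \vee b^2_1\rangle \odot (\mathbf{e}^1_2 \dagger \mathbf{e}^2_2)$
$= \langle (e^1_1 \dagger e^2_1)({e'}^1_2 \dagger {e'}^2_2), b^1_2 \vee b^2_2 \vee {b'}^1_2 \vee {b'}^2_2\rangle$
$= \langle (e^1_1 {e'}^1_2) \dagger (e^2_1 {e'}^2_2), (b^1_2 \vee {b'}^1_2) \vee (b^2_2 \vee {b'}^2_2)\rangle$
$= \langle e^1_1 {e'}^1_2, b^1_2 \vee {b'}^1_2\rangle \dagger \langle e^2_1 {e'}^2_2, b^2_2 \vee {b'}^2_2\rangle$
$= (\mathbf{e}^1_1 \odot \mathbf{e}^1_2) \dagger (\mathbf{e}^2_1 \odot \mathbf{e}^2_2)$

THEOREM 66. $(\mathbf{e}_1 \dagger \mathbf{e}_2)^{\circledast} = \mathbf{e}_1^{\circledast} \dagger \mathbf{e}_2^{\circledast}$

Proof. Let $\mathbf{e}_1 = \langle e^1_1, b_1\rangle$ and $\mathbf{e}_2 = \langle e^1_2, b_2\rangle$. Thus $(\langle e^1_1, b_1\rangle \dagger \langle e^1_2, b_2\rangle)^{\circledast} = \langle e^1_1 \dagger e^1_2, b_1 \vee b_2\rangle^{\circledast}$. Let us define $e'$, $e_1'$ and $e_2'$ by cases on $b_1$ and $b_2$ with the property that $e' = e_1' \dagger e_2'$:

If $b_1 = b_2 = \mathit{false}$ then let $e_i' = e^1_i$ and $e' = e^1_1 \dagger e^1_2$. Obviously $e' = e_1' \dagger e_2'$.

If $b_1 = \mathit{true}$ and $b_2 = \mathit{false}$ then let $\bullet(e^1_1) = \langle e_1', b_1'\rangle$, let $e_2' = e^1_2$ and let $\bullet(e^1_1 \dagger e^1_2) = \bullet(e^1_1) \dagger \langle e^1_2, \mathit{false}\rangle = \langle e', b'\rangle$. Hence $e_1' \dagger e^1_2 = e_1' \dagger e_2' = e'$.

The case $b_1 = \mathit{false}$ and $b_2 = \mathit{true}$ is handled dually to the previous one.

If $b_1 = \mathit{true}$ and $b_2 = \mathit{true}$ then let $\bullet(e^1_i) = \langle e_i', b_i'\rangle$ and let $\bullet(e^1_1 \dagger e^1_2) = \bullet(e^1_1) \dagger \bullet(e^1_2) = \langle e', b'\rangle$. Hence $e_1' \dagger e_2' = e'$.

In all cases,
$\langle e^1_1 \dagger e^1_2, b_1 \vee b_2\rangle^{\circledast}$
$= \langle e'^*, b_1 \vee b_2\rangle$
$= \langle (e_1' \dagger e_2')^*, b_1 \vee b_2\rangle$
$= \langle e_1'^* \dagger e_2'^*, b_1 \vee b_2\rangle$
$= \langle e_1'^*, b_1\rangle \dagger \langle e_2'^*, b_2\rangle$
$= \langle e^1_1, b_1\rangle^{\circledast} \dagger \langle e^1_2, b_2\rangle^{\circledast}$

THEOREM 67. $\mathit{move}(e_1 \dagger e_2, a) = \mathit{move}(e_1, a) \dagger \mathit{move}(e_2, a)$

Proof. The proof is by induction on the structure of $e$.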
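the cases $\emptyset$, $\epsilon$ and $b$ (with $b \neq a$) are trivial by computation

the case $a$ has four sub-cases: if $e_1$ and $e_2$ are both $a$, then $\mathit{move}(a \dagger a, a) = \langle a, \mathit{false}\rangle = \mathit{move}(a, a) \dagger \mathit{move}(a, a)$; otherwise at least one of $e_1$ or $e_2$ is $\bullet a$ and $\mathit{move}(e_1 \dagger e_2, a) = \mathit{move}(\bullet a, a) = \langle a, \mathit{true}\rangle = \mathit{move}(e_1, a) \dagger \mathit{move}(e_2, a)$

if $e$ is $e_1 + e_2$ then
$\mathit{move}((e^1_1 + e^2_1) \dagger (e^1_2 + e^2_2), a)$
$= \mathit{move}((e^1_1 \dagger e^1_2) + (e^2_1 \dagger e^2_2), a)$
$= \mathit{move}(e^1_1 \dagger e^1_2, a) \oplus \mathit{move}(e^2_1 \dagger e^2_2, a)$
$= (\mathit{move}(e^1_1, a) \dagger \mathit{move}(e^1_2, a)) \oplus (\mathit{move}(e^2_1, a) \dagger \mathit{move}(e^2_2, a))$
$= (\mathit{move}(e^1_1, a) \oplus \mathit{move}(e^2_1, a)) \dagger (\mathit{move}(e^1_2, a) \oplus \mathit{move}(e^2_2, a))$
$= \mathit{move}(e^1_1 + e^2_1, a) \dagger \mathit{move}(e^1_2 + e^2_2, a)$

if $e$ is $e_1 e_2$ then
$\mathit{move}((e^1_1 e^2_1) \dagger (e^1_2 e^2_2), a)$
$= \mathit{move}((e^1_1 \dagger e^1_2)(e^2_1 \dagger e^2_2), a)$
$= \mathit{move}(e^1_1 \dagger e^1_2, a) \odot \mathit{move}(e^2_1 \dagger e^2_2, a)$
$= (\mathit{move}(e^1_1, a) \dagger \mathit{move}(e^1_2, a)) \odot (\mathit{move}(e^2_1, a) \dagger \mathit{move}(e^2_2, a))$
$= (\mathit{move}(e^1_1, a) \odot \mathit{move}(e^2_1, a)) \dagger (\mathit{move}(e^1_2, a) \odot \mathit{move}(e^2_2, a))$
$= \mathit{move}(e^1_1 e^2_1, a) \dagger \mathit{move}(e^1_2 e^2_2, a)$

if $e$ is $e_1^*$ then
$\mathit{move}({e^1_1}^* \dagger {e^1_2}^*, a)$
$= \mathit{move}((e^1_1 \dagger e^1_2)^*, a)$
$= \mathit{move}(e^1_1 \dagger e^1_2, a)^{\circledast}$
$= (\mathit{move}(e^1_1, a) \dagger \mathit{move}(e^1_2, a))^{\circledast}$
$= \mathit{move}(e^1_1, a)^{\circledast} \dagger \mathit{move}(e^1_2, a)^{\circledast}$
$= \mathit{move}({e^1_1}^*, a) \dagger \mathit{move}({e^1_2}^*, a)$

5.1 Intersection and complement

Pointed expressions cannot be generalized in a trivial way to the operations of intersection and complement. Suppose we extend the definition of the language in the obvious way, letting $L_p(e_1 \cap e_2) =$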

$L_p(e_1) \cap L_p(e_2)$ and $L_p(\neg e) = \overline{L_p(e)}$. The problem is that merging is no longer additive, and Theorem 16 does not hold any more. For instance, consider the two expressions $e_1 = \bullet a \cap a$ and $e_2 = a \cap \bullet a$. Clearly $L_p(e_1) = L_p(e_2) = \emptyset$, but $L_p(e_1 \dagger e_2) = L_p(\bullet a \cap \bullet a) = \{a\}$. To better understand the problem, let e = ( ), and let us consider the result of move(e, ). Since move(e, ) = ( ) ), true, we should broadcast a new point inside ( ) ), hence move(e, ) = ( ) ), which is obviously wrong.

The problems in extending the technique to intersection and complement are not due to some easily avoidable deficiency of the approach but have a deep theoretical reason: indeed, even if these operators do not increase the expressive power of regular expressions, they can have a drastic impact on succinctness, making them much harder to handle. For instance, it is well known that expressions with complements can provide descriptions of certain languages which are non-elementarily more compact than standard regular expressions [15]. Gelade [12] has recently proved that for any natural number $n$ there exists a regular expression with intersection of size $O(n)$ such that any DFA accepting its language has double-exponential size, i.e. it contains at least $2^{2^n}$ states (see also [13]). Hence, marking positions with points is not enough, just because we would not have enough states. Since the problem is due to a loss of information during merging, we are currently investigating the possibility of exploiting colored points. An important goal of this approach would be to provide simple, completely syntactic explanations for space bounds of different classes of languages.

6. Conclusions

We introduced in this paper the notion of pointed regular expression, and investigated its main properties and its relation with Brzozowski's derivatives. Points are used to mark the positions inside the regular expression which have been reached after reading some prefix of the input string, and where the processing of the remaining string should start. In particular, each pointed expression has a clear semantics.
Since each pointed expression for $e$ represents a state of the deterministic automaton associated with $e$, this means we may associate a semantics to each state in terms of the specification $e$ and not of the behaviour of the automaton. This allows a direct, intuitive and easily verifiable construction of the deterministic automaton for $e$.

A major advantage of pointed expressions is from the didactical point of view. Relying on an electronic device, it is a real pleasure to see points moving inside the regular expression in response to an input symbol. Students immediately grasp the idea, and are able to manually build the automata, and to understand the meaning of their states, after a single lesson. Moreover, if you have a really short time, you can altogether skip the notion of nondeterministic automata.

Regular expressions have received renewed interest in recent years, mostly due to their use in XML-languages. Pointed expressions seem to open a huge range of novel perspectives and original approaches in the field, starting from the challenging generalization of the approach to different operators such as counting, intersection, and interleaving (e.g. exploiting colors for points, see Section 5.1). A large amount of research has been recently devoted to the so-called succinctness problem, namely the investigation of the descriptional complexity of regular languages (see e.g. [12, 13, 14]). Since, as observed in Example 10, pointed expressions can provide a more compact description for regular languages than traditional regular expressions, it looks interesting to better investigate this issue (which seems to be related to the so-called star-height [16] of the language).

It could also be worth investigating variants of the notion of pointed expression, allowing different positioning of points inside the expressions. Merging must be better investigated, and the whole equational theory of pointed expressions, both with different and (especially) fixed carriers, must be entirely developed.

As explained in the introduction, the notion of pointed expression was suggested by an attempt at formalizing the theory of regular languages by means of an interactive prover.
This testifies to the relevance of the choice of good data structures not just for the design of algorithms but also for the formal investigation of a given field, and is a nice example of the kind of interesting feedback one may expect from the interplay with automated devices for proof development.

References

[1] G. Rozenberg and A. Salomaa, eds., Handbook of formal languages, vol. 1: word, language, grammar. New York, NY, USA: Springer-Verlag New York, Inc., 1997.
[2] K. Ellul, B. Krawetz, J. Shallit, and M. wei Wang, Regular expressions: New results and open problems, Journal of Automata, Languages and Combinatorics, vol. 10, no. 4, pp. 407-437, 2005.
[3] J. A. Brzozowski, Derivatives of regular expressions, J. ACM, vol. 11, no. 4, pp. 481-494, 1964.
[4] R. McNaughton and H. Yamada, Regular expressions and state graphs for automata, IEEE Transactions on Electronic Computers, vol. 9, no. 1, pp. 39-47, 1960.
[5] S. Owens, J. H. Reppy, and A. Turon, Regular-expression derivatives re-examined, J. Funct. Program., vol. 19, no. 2, pp. 173-190, 2009.
[6] G. Berry and R. Sethi, From regular expressions to deterministic automata, Theor. Comput. Sci., vol. 48, no. 3, pp. 117-126, 1986.
[7] A. Brüggemann-Klein, Regular expressions into finite automata, Theor. Comput. Sci., vol. 120, no. 2, pp. 197-213, 1993.
[8] C.-H. Chang and R. Paige, From regular expressions to dfa's using compressed nfa's, in Combinatorial Pattern Matching, Third Annual Symposium, CPM 92, Tucson, Arizona, USA, April 29 - May 1, 1992, Proceedings, vol. 644 of Lecture Notes in Computer Science, pp. 90-110, Springer, 1992.
[9] S. C. Kleene, Representation of events in nerve nets and finite automata, in Automata Studies (C. E. Shannon and J. McCarthy, eds.), pp. 3-42, Princeton University Press, 1956.
[10] B. W. Watson, A taxonomy of algorithms for constructing minimal acyclic deterministic finite automata, South African Computer Journal, vol. 27, pp. 12-17, 2001.
[11] B. W. Watson, Directly constructing minimal dfas: combining two algorithms by Brzozowski, South African Computer Journal, vol. 29, pp. 17-23, 2002.
[12] W.
Gelade, Succinctness of regular expressions with interleaving, intersection and counting, Theor. Comput. Sci., vol. 411, no. 31-33, pp. 2987-2998, 2010.
[13] H. Gruber and M. Holzer, Finite automata, digraph connectivity, and regular expression size, in ICALP, vol. 5126 of Lecture Notes in Computer Science, pp. 39-50, Springer, 2008.
[14] M. Holzer and M. Kutrib, Nondeterministic finite automata - recent results on the descriptional and computational complexity, Int. J. Found. Comput. Sci., vol. 20, no. 4, pp. 563-580, 2009.
[15] A. R. Meyer and L. J. Stockmeyer, The equivalence problem for regular expressions with squaring requires exponential space, in 13th Annual Symposium on Switching and Automata Theory (FOCS), pp. 125-129, IEEE, 1972.
[16] L. C. Eggan, Transition graphs and the star-height of regular events, Michigan Mathematical Journal, vol. 10, no. 4, pp. 385-397, 1963.
