In Liner Time rom Regulr Expressions to NFAs e = ǫ ǫ ǫ ǫ e = e = ǫ ǫ e = ǫ ǫ e = ǫ ǫ ǫ Thompson s Algorithm Produces O(n) sttes or regulr expressions o length n Ken Thompson 27 / 282
Berry-Sethi Approch Berry-Sethi Algorithm Produces exctly n+1 sttes without ǫ-trnsitions Gerrd Berry Rvi Sethi nd demonstrtes Equlity Systems nd Attriute Grmmrs Ide: The utomton trcks (conceptionlly vi mrker ), in the syntx tree o regulr expression, which suexpressions in e re rechle consuming the rest o input w 28 / 282
Berry-Sethi Approch Glushkov Algorithm Produces exctly n+1 sttes without ǫ-trnsitions nd demonstrtes Equlity Systems nd Attriute Grmmrs Viktor M Glushkov Ide: The utomton trcks (conceptionlly vi mrker ), in the syntx tree o regulr expression, which suexpressions in e re rechle consuming the rest o input w 28 / 282
Berry-Sethi Approch or exmple: () () * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch or exmple: w = : * 29 / 282
Berry-Sethi Approch In generl: Input is only consumed t the leves Nvigting the tree does not consume input ǫ-trnsitions For orml construction we need identiiers or sttes For node n s identiier we tke the suexpression, corresponding to the sutree dominted y n There re possily identicl suexpressions in one regulr expression == we enumerte the leves 30 / 282
Berry-Sethi Approch or exmple: * 31 / 282
Berry-Sethi Approch or exmple: * 2 0 1 3 4 31 / 282
Berry-Sethi Approch or exmple: * 2 0 1 3 4 31 / 282
Berry-Sethi Approch (nive version) Construction (nive version): Sttes: r, r with r nodes o e; Strt stte: e; Finl stte: e ; Trnsitions: or leves r i x we require: ( r,x,r ) The letover trnsitions re: r Trnsitions r 1 r 2 ( r,ǫ, r 1 ) ( r,ǫ, r 2 ) (r 1,ǫ,r ) (r 2,ǫ,r ) r 1 r 2 ( r,ǫ, r 1 ) (r 1,ǫ, r 2 ) (r 2,ǫ,r ) r Trnsitions r1 ( r,ǫ,r ) ( r,ǫ, r 1 ) (r 1,ǫ, r 1 ) (r 1,ǫ,r ) r 1? ( r,ǫ,r ) ( r,ǫ, r 1 ) (r 1,ǫ,r ) 32 / 282
Berry-Sethi Approch Discussion: Most trnsitions nvigte through the expression The resulting utomton is in generl nondeterministic 33 / 282
Berry-Sethi Approch Discussion: Most trnsitions nvigte through the expression The resulting utomton is in generl nondeterministic Strtegy or the sophisticted version: Avoid generting ǫ-trnsitions 33 / 282
Berry-Sethi Approch Discussion: Ide: Most trnsitions nvigte through the expression The resulting utomton is in generl nondeterministic Strtegy or the sophisticted version: Avoid generting ǫ-trnsitions Pre-compute helper ttriutes during D(epth)F(irst)S(erch)! 33 / 282
Berry-Sethi Approch Discussion: Ide: Most trnsitions nvigte through the expression The resulting utomton is in generl nondeterministic Strtegy or the sophisticted version: Avoid generting ǫ-trnsitions Pre-compute helper ttriutes during D(epth)F(irst)S(erch)! Necessry node-ttriutes: irst the set o red sttes elow r, which my e reched irst, when descending into r next the set o red sttes to the right o r, which my e reched irst in the trversl ter r lst the set o red sttes elow r, which my e reched lst when descending into r empty cn the suexpression r consume ǫ? 33 / 282
Berry-Sethi Approch: 1st step empty[r] = t i nd only i ǫ [[r]] or exmple: * 2 0 1 3 4 34 / 282
Berry-Sethi Approch: 1st step empty[r] = t i nd only i ǫ [[r]] or exmple: * 2 0 1 3 4 34 / 282
Berry-Sethi Approch: 1st step empty[r] = t i nd only i ǫ [[r]] or exmple: * 2 0 1 3 4 34 / 282
Berry-Sethi Approch: 1st step empty[r] = t i nd only i ǫ [[r]] or exmple: t * 2 0 1 3 4 34 / 282
Berry-Sethi Approch: 1st step empty[r] = t i nd only i ǫ [[r]] or exmple: t * 2 0 1 3 4 34 / 282
Berry-Sethi Approch: 1st step Implementtion: DFS post-order trversl or leves r i x we ind empty[r] = (x ǫ) Otherwise: empty[r 1 r 2 ] = empty[r 1 ] empty[r 2 ] empty[r 1 r 2 ] = empty[r 1 ] empty[r 2 ] empty[r1] = t empty[r 1?] = t 35 / 282
Berry-Sethi Approch: 2nd step The my-set o irst reched red sttes: The set o red sttes, tht my e reched rom r (ie while descending into r) vi sequences o ǫ-trnsitions: irst[r] = {i in r ( r,ǫ, i x ) δ,x ǫ} or exmple: t * 2 0 1 3 4 36 / 282
Berry-Sethi Approch: 2nd step The my-set o irst reched red sttes: The set o red sttes, tht my e reched rom r (ie while descending into r) vi sequences o ǫ-trnsitions: irst[r] = {i in r ( r,ǫ, i x ) δ,x ǫ} or exmple: 0 0 t * 2 1 2 3 4 1 3 4 36 / 282
Berry-Sethi Approch: 2nd step The my-set o irst reched red sttes: The set o red sttes, tht my e reched rom r (ie while descending into r) vi sequences o ǫ-trnsitions: irst[r] = {i in r ( r,ǫ, i x ) δ,x ǫ} or exmple: 0 0 01 t * 2 34 1 2 3 4 1 3 4 36 / 282
Berry-Sethi Approch: 2nd step The my-set o irst reched red sttes: The set o red sttes, tht my e reched rom r (ie while descending into r) vi sequences o ǫ-trnsitions: irst[r] = {i in r ( r,ǫ, i x ) δ,x ǫ} or exmple: 0 0 01 01 2 t * 2 34 2 1 3 4 1 3 4 36 / 282
Berry-Sethi Approch: 2nd step The my-set o irst reched red sttes: The set o red sttes, tht my e reched rom r (ie while descending into r) vi sequences o ǫ-trnsitions: irst[r] = {i in r ( r,ǫ, i x ) δ,x ǫ} or exmple: 0 0 01 012 01 2 t * 2 34 2 1 3 4 1 3 4 36 / 282
Berry-Sethi Approch: 2nd step Implementtion: DFS post-order trversl or leves r i x we ind irst[r] = {i x ǫ} Otherwise: irst[r 1 r 2 ] = { irst[r 1 ] irst[r 2 ] irst[r1 ] irst[r irst[r 1 r 2 ] = 2 ] i empty[r 1 ] = t irst[r 1 ] i empty[r 1 ] = irst[r1] = irst[r 1 ] irst[r 1?] = irst[r 1 ] 37 / 282
Berry-Sethi Approch: 3rd step The my-set o next red sttes: The set o red sttes within the sutrees right o r, tht my e reched next vi sequences o ǫ-trnsitions next[r] = {i (r,ǫ, i x ) δ,x ǫ} or exmple: 0 0 01 01 t * 012 2 2 34 1 2 3 4 1 3 4 38 / 282
Berry-Sethi Approch: 3rd step The my-set o next red sttes: The set o red sttes within the sutrees right o r, tht my e reched next vi sequences o ǫ-trnsitions next[r] = {i (r,ǫ, i x ) δ,x ǫ} or exmple: 0 0 01 01 t * 012 2 2 34 1 2 3 4 1 3 4 38 / 282
Berry-Sethi Approch: 3rd step The my-set o next red sttes: The set o red sttes within the sutrees right o r, tht my e reched next vi sequences o ǫ-trnsitions next[r] = {i (r,ǫ, i x ) δ,x ǫ} or exmple: 0 0 01 01 2 t * 012 2 2 34 3 4 1 2 3 4 1 3 4 38 / 282
Berry-Sethi Approch: 3rd step The my-set o next red sttes: The set o red sttes within the sutrees right o r, tht my e reched next vi sequences o ǫ-trnsitions next[r] = {i (r,ǫ, i x ) δ,x ǫ} or exmple: 0 1 2 0 0 01 01 2 t * 012 2 2 34 3 4 1 2 3 4 1 3 4 38 / 282
Berry-Sethi Approch: 3rd step The my-set o next red sttes: The set o red sttes within the sutrees right o r, tht my e reched next vi sequences o ǫ-trnsitions next[r] = {i (r,ǫ, i x ) δ,x ǫ} or exmple: 012 01 2 2 t * 2 34 0 1 2 01 3 4 0 1 2 3 4 0 1 2 0 1 2 0 1 3 4 38 / 282
Berry-Sethi Approch: 3rd step Implementtion: DFS pre-order trversl For the root, we ind: next[e] = Aprt rom tht we distinguish, sed on the context: r Equlities r 1 r 2 next[r 1 ] = next[r] next[r 2 ] = next[r] { irst[r2 ] next[r] i empty[r r 1 r 2 next[r 1 ] = 2 ] = t irst[r 2 ] i empty[r 2 ] = next[r 2 ] = next[r] r1 next[r 1 ] = irst[r 1 ] next[r] r 1? next[r 1 ] = next[r] 39 / 282
Berry-Sethi Approch: 4th step The my-set o lst reched red sttes: The set o red sttes, which my e reched lst during the trversl o r connected to the root vi ǫ-trnsitions only: lst[r] = {i in r ( i x,ǫ,r ) δ,x ǫ} or exmple: 012 01 2 2 t * 2 34 0 1 2 01 3 4 0 1 2 3 4 0 1 2 0 1 2 0 1 3 4 40 / 282
Berry-Sethi Approch: 4th step The my-set o lst reched red sttes: The set o red sttes, which my e reched lst during the trversl o r connected to the root vi ǫ-trnsitions only: lst[r] = {i in r ( i x,ǫ,r ) δ,x ǫ} or exmple: 012 01 2 2 t * 2 2 34 0 1 2 01 3 4 0 0 1 1 2 3 3 4 4 0 1 2 0 1 2 0 1 3 4 40 / 282
Berry-Sethi Approch: 4th step The my-set o lst reched red sttes: The set o red sttes, which my e reched lst during the trversl o r connected to the root vi ǫ-trnsitions only: lst[r] = {i in r ( i x,ǫ,r ) δ,x ǫ} or exmple: 012 34 01 01 2 34 2 t * 2 2 34 34 0 1 2 01 01 3 4 0 0 1 1 2 3 3 4 4 0 1 2 0 1 2 0 1 3 4 40 / 282
Berry-Sethi Approch: 4th step Implementtion: DFS post-order trversl or leves r i x we ind lst[r] = {i x ǫ} Otherwise: lst[r 1 r 2 ] = { lst[r 1 ] lst[r 2 ] lst[r1 ] lst[r lst[r 1 r 2 ] = 2 ] i empty[r 2 ] = t lst[r 2 ] i empty[r 2 ] = lst[r1] = lst[r 1 ] lst[r 1?] = lst[r 1 ] 41 / 282
Berry-Sethi Approch: (sophisticted version) Construction (sophisticted version): Crete n utomnton sed on the syntx tree s new ttriutes: Sttes: { e} {i i le} Strt stte: e Finl sttes: lst[e] i empty[e] = { e} lst[e] otherwise Trnsitions: ( e,, i ) i i irst[e] nd i lled with (i,,i ) i i next[i] nd i lled with We cll the resulting utomton A e 42 / 282
Berry-Sethi Approch or exmple: 0 1 2 3 4 Remrks: This construction is known s Berry-Sethi- or Glushkov-construction It is used or XML to deine Content Models The result my not e, wht we hd in mind 43 / 282