Regular Expressions
Chris Dyer
Algorithms for NLP 11-711
Adapted from materials from Alon Lavie
Goals of Today's Lecture
Understand the properties of NFAs with epsilon transitions
Understand the concepts and definitions of regular expressions (REs)
Understand the relationships among REs, FSAs, and regular languages
NFA with Epsilons
An NFA with ε-transitions is an NFA that may change states without reading an input symbol.
[Figure: an ε-NFA with states $q_0$, $q_1$, $q_2$, transitions labeled a, b, c, and two ε-transitions.]
NFA with Epsilons
A nondeterministic finite automaton with ε-transitions is a 5-tuple $M = \langle Q, \Sigma, \delta, q_0, F \rangle$ where
$Q$ is a finite set of states
$\Sigma$ is a finite alphabet
$\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$ is the transition relation (for an NFA without ε-transitions it is $\delta : Q \times \Sigma \to 2^Q$)
$q_0 \in Q$ is the start (initial) state
$F \subseteq Q$ is the set of final (accept) states
$L(M)$ is the language of $M$, i.e. the set of strings $M$ accepts
Definitions
Let $CL_\varepsilon(q) = \{p \in Q \mid p \text{ is reachable from } q \text{ by } \varepsilon\text{-moves}\}$, where $q$ itself counts as reachable (by zero ε-moves).
We can generalize this to a set $P \subseteq Q$: $CL_\varepsilon(P) = \bigcup_{p \in P} CL_\varepsilon(p)$
Generalized transition definition:
$\hat{\delta}(q, \varepsilon) = CL_\varepsilon(q)$
$\hat{\delta}(q, x\sigma) = CL_\varepsilon(\delta(\hat{\delta}(q, x), \sigma))$
Here $\delta$ is applied to a set of states by taking the union of $\delta(p, \sigma)$ over its members. The definition may be further generalized to sets of start states.
The generalized definition is different from the base one: $\hat{\delta}(q, a)$ need not equal $\delta(q, a)$.
Formal definition of $L(M)$: $L(M) = \{w \in \Sigma^* \mid \hat{\delta}(q_0, w) \cap F \neq \emptyset\}$
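The closure and the generalized transition function can be sketched in Python. This is a minimal sketch, not the lecture's own code: the dictionary encoding of δ, the empty string standing for ε, the names `eps_closure`, `delta_hat`, and `accepts`, and the example automaton `DELTA` are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

EPS = ""  # assumption: the empty string stands for an epsilon label

# delta maps (state, label) -> set of successor states; a missing key means "no move".
Delta = Dict[Tuple[str, str], Set[str]]


def eps_closure(delta: Delta, states: Set[str]) -> Set[str]:
    """CL_eps(P): every state reachable from P by zero or more epsilon-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def delta_hat(delta: Delta, q: str, w: str) -> Set[str]:
    """Generalized transition function delta_hat(q, w)."""
    current = eps_closure(delta, {q})        # delta_hat(q, eps) = CL_eps(q)
    for sigma in w:
        moved: Set[str] = set()
        for p in current:                    # delta lifted to a set of states
            moved |= delta.get((p, sigma), set())
        current = eps_closure(delta, moved)  # CL_eps(delta(delta_hat(q, x), sigma))
    return current


def accepts(delta: Delta, q0: str, finals: Set[str], w: str) -> bool:
    """w is in L(M) iff delta_hat(q0, w) intersects F."""
    return bool(delta_hat(delta, q0, w) & finals)


# Hypothetical example automaton: q0 loops on a, q1 on b, q2 on c,
# with epsilon-moves q0 -> q1 -> q2, so it accepts a*b*c*.
DELTA: Delta = {
    ("q0", "a"): {"q0"}, ("q0", EPS): {"q1"},
    ("q1", "b"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "c"): {"q2"},
}
```

On this example, `delta_hat(DELTA, "q0", "a")` contains all three states while `DELTA[("q0", "a")]` contains only `q0`, which illustrates how the generalized definition can differ from the base one.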
" NFA& -NFA Equivalence Theorem. For every NFA A with epsilon moves there is an equivalent NFA A 0 without, s.t. L(A) =L(A 0 ) 6
" NFA& -NFA Equivalence Theorem. For every NFA A with epsilon moves there is an equivalent NFA A 0 without, s.t. L(A) =L(A 0 ) Proof. This is a constructive proof. 6
" NFA& -NFA Equivalence Theorem. For every NFA A with epsilon moves there is an equivalent NFA A 0 without, s.t. L(A) =L(A 0 ) Proof. This is a constructive proof. Construction. Given A = hq,,,q 0,Fi We construct A 0 = hq,, 0,q 0,F 0 i 6
" NFA& -NFA Equivalence Theorem. For every NFA equivalent NFA A with epsilon moves there is an A 0 without, s.t. L(A) =L(A 0 ) Proof. This is a constructive proof. Construction. Given A = hq,,,q 0,Fi We construct A 0 = hq,, 0,q 0,F 0 i ( F 0 F [ {q 0 } if CL " (q 0 ) \ F 6= ; = F otherwise 6
" NFA& -NFA Equivalence Theorem. For every NFA equivalent NFA A with epsilon moves there is an A 0 without, s.t. L(A) =L(A 0 ) Proof. This is a constructive proof. Construction. Given A = hq,,,q 0,Fi We construct A 0 = hq,, 0,q 0,F 0 i ( F 0 F [ {q 0 } if CL " (q 0 ) \ F 6= ; = F otherwise Using the generalized transition definition, 6
" NFA& -NFA Equivalence Theorem. For every NFA equivalent NFA A with epsilon moves there is an A 0 without, s.t. L(A) =L(A 0 ) Proof. This is a constructive proof. Construction. Given A = hq,,,q 0,Fi We construct A 0 = hq,, 0,q 0,F 0 i ( F 0 F [ {q 0 } if CL " (q 0 ) \ F 6= ; = F otherwise Using the generalized transition definition, 0 (q, )=ˆ(q, ) 6
" NFA& -NFA Equivalence It remains to show: 0 (q 0, x) =ˆ(q 0, x) (i) base: x =1 0 (q, a) =ˆ(q, a) by definition of 0 7
Regular Expression
A regular expression is a way of describing the languages accepted by FSAs. Defined recursively:
1. $\emptyset$ is an RE denoting the empty set
2. $\varepsilon$ is an RE denoting the set $\{\varepsilon\}$
3. For each $a \in \Sigma$, $a$ is an RE denoting $\{a\}$
4. If $r$ and $s$ are REs denoting the languages $R$ and $S$, then
   $(r \mid s)$ denotes $R \cup S$
   $(rs)$ denotes $R.S$
   $r^*$ denotes $R^*$
Precedence means parentheses can sometimes be omitted: $*$ binds tighter than $.$, which binds tighter than $\mid$.
Examples
$(0 \mid 1)^*$ denotes all finite words over $\Sigma = \{0, 1\}$
$0^* \mid 1^*$ denotes all finite words containing only 0s or only 1s
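Assuming the two example expressions are `(0|1)*` and `0*|1*`, they can be checked with Python's `re` module, whose `|`, `*`, and juxtaposition operators coincide with union, Kleene star, and concatenation for patterns this simple. The helper name `in_language` is hypothetical. `fullmatch` is used so that the whole word, not just a prefix, must belong to the language.

```python
import re

# (0|1)* : every finite word over {0, 1}, including the empty word
ALL_WORDS = re.compile(r"(0|1)*")
# 0*|1* : words made only of 0s, or only of 1s
UNIFORM_WORDS = re.compile(r"0*|1*")


def in_language(pattern: "re.Pattern[str]", word: str) -> bool:
    """True iff the entire word is matched by the pattern."""
    return pattern.fullmatch(word) is not None


print(in_language(ALL_WORDS, "0110"))      # a mixed word over {0, 1}
print(in_language(UNIFORM_WORDS, "0110"))  # mixes 0s and 1s
print(in_language(UNIFORM_WORDS, "111"))   # only 1s
```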
REs and ε-NFAs
Theorem. For every RE $r$ there is an ε-NFA $A$ s.t. $L(r) = L(A)$.
Proof. We will construct $A$ compositionally using induction on the number of operators in $r$.
Base cases. $r$ has 0 operators.
$r = \emptyset$: states $q_0$ and $q_f$ with no transition between them
$r = \varepsilon$: $q_0 \xrightarrow{\varepsilon} q_f$
$r = a$: $q_0 \xrightarrow{a} q_f$
Note: we assume there is exactly one final state.
REs and ε-NFAs
Inductive step. We assume the hypothesis is true for all REs with at most $n$ operators, and then prove it is true for REs with $n+1$ operators. There are three cases to be dealt with:
(1) $r = r_1 \mid r_2$
(2) $r = r_1 r_2$
(3) $r = r_1^*$
Case 1: $r = r_1 \mid r_2$
By the inductive hypothesis, there are two epsilon NFAs $A_1$ and $A_2$, with $L(A_1) = L(r_1)$ and $L(A_2) = L(r_2)$.
[Figure: $A_1$ with start state $q_{01}$ and final state $q_{f1}$; $A_2$ with start state $q_{02}$ and final state $q_{f2}$.]
Construct the following $A$.
[Figure: a new start state $q_0$ with ε-transitions to $q_{01}$ and $q_{02}$; ε-transitions from $q_{f1}$ and $q_{f2}$ to a new final state $q_f$.]
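Case 1: $r = r_1 \mid r_2$
Formally, if
$A_1 = \langle Q_1, \Sigma, \delta_1, q_{01}, \{q_{f1}\} \rangle$
$A_2 = \langle Q_2, \Sigma, \delta_2, q_{02}, \{q_{f2}\} \rangle$
then
$A = \langle Q_1 \cup Q_2 \cup \{q_0\} \cup \{q_f\}, \Sigma, \delta, q_0, \{q_f\} \rangle$
$\delta(q_0, \varepsilon) = \{q_{01}, q_{02}\}$
$\delta(q_{f1}, \varepsilon) = \{q_f\}$
$\delta(q_{f2}, \varepsilon) = \{q_f\}$
$\delta(q, \sigma) = \delta_1(q, \sigma), \ \forall q \in Q_1 \setminus \{q_{f1}\}, \ \sigma \in \Sigma \cup \{\varepsilon\}$
$\delta(q, \sigma) = \delta_2(q, \sigma), \ \forall q \in Q_2 \setminus \{q_{f2}\}, \ \sigma \in \Sigma \cup \{\varepsilon\}$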
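The union construction can be sketched in Python as follows. Everything named here is an assumption for illustration: the `Fragment` tuple, the integer state names drawn from a counter (which keeps the state sets of $A_1$ and $A_2$ disjoint), the empty string as the ε label, and the helper functions. One deliberate difference from the formal definition: the sketch adds the ε-move to $q_f$ alongside any transitions the old final states already have, rather than replacing them.

```python
from itertools import count
from typing import Dict, NamedTuple, Set, Tuple

EPS = ""  # assumption: the empty string stands for an epsilon label
Delta = Dict[Tuple[int, str], Set[int]]
_fresh = count()  # hypothetical helper: hands out unused integer state names


class Fragment(NamedTuple):
    """An epsilon-NFA with one start state and exactly one final state."""
    delta: Delta
    start: int
    final: int


def symbol(a: str) -> Fragment:
    """Base case r = a (pass EPS for r = epsilon): q0 --a--> qf."""
    q0, qf = next(_fresh), next(_fresh)
    return Fragment({(q0, a): {qf}}, q0, qf)


def union(a1: Fragment, a2: Fragment) -> Fragment:
    """Case 1, r = r1 | r2: new q0 and qf wired to A1 and A2 by epsilon-moves."""
    q0, qf = next(_fresh), next(_fresh)
    delta: Delta = {}
    for d in (a1.delta, a2.delta):  # keep every transition of A1 and A2
        for key, targets in d.items():
            delta[key] = set(targets)
    delta[(q0, EPS)] = {a1.start, a2.start}
    delta.setdefault((a1.final, EPS), set()).add(qf)
    delta.setdefault((a2.final, EPS), set()).add(qf)
    return Fragment(delta, q0, qf)


def accepts(frag: Fragment, w: str) -> bool:
    """Simulate the epsilon-NFA on w using epsilon-closures."""
    def closure(states: Set[int]) -> Set[int]:
        seen, stack = set(states), list(states)
        while stack:
            for p in frag.delta.get((stack.pop(), EPS), set()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    current = closure({frag.start})
    for sigma in w:
        moved: Set[int] = set()
        for p in current:
            moved |= frag.delta.get((p, sigma), set())
        current = closure(moved)
    return frag.final in current


A_OR_B = union(symbol("a"), symbol("b"))     # automaton for a | b
A_OR_EPS = union(symbol("a"), symbol(EPS))   # automaton for a | epsilon
```

Running `accepts(A_OR_B, "a")` and `accepts(A_OR_B, "ab")` gives a quick sanity check that the combined automaton accepts exactly the words of either component on these inputs.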
Case 1: $r = r_1 \mid r_2$
It remains to show that $L(A) = L(A_1) \cup L(A_2)$.
How to do this? Set containment: show $L(A) \subseteq L(A_1) \cup L(A_2)$ and $L(A_1) \cup L(A_2) \subseteq L(A)$.
Cases 2 & 3
The strategy for showing this proceeds as with Case 1.
Refer to the textbook for details.