CS 301 Lecture 04: Regular Expressions
Stephen Checkoway
January 29, 2018

Review from last time
An NFA is N = (Q, Σ, δ, q0, F), where δ : Q × Σ_ε → P(Q) maps a state and an alphabet symbol (or ε) to a set of states.
We run an NFA on an input w by keeping track of all possible states the NFA could be in.
We can convert an NFA to a DFA by letting each state of the DFA represent a set of states in the NFA.

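As a rough illustration of "keeping track of all possible states", here is a minimal Python sketch. The representation (a dict from (state, symbol) pairs to sets of states, with the empty string standing in for ε) and the example NFA for strings ending in "ab" are assumptions made for this sketch, not something taken from the slides.

```python
from typing import Dict, Set, Tuple

EPS = ""  # assumption: the empty string labels an epsilon move

Delta = Dict[Tuple[str, str], Set[str]]


def eps_closure(states: Set[str], delta: Delta) -> Set[str]:
    """All states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(delta: Delta, start: str, accept: Set[str], w: str) -> bool:
    """Run the NFA on w, tracking the set of states it could be in."""
    current = eps_closure({start}, delta)
    for symbol in w:
        moved: Set[str] = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = eps_closure(moved, delta)
    return bool(current & accept)


# Hypothetical NFA over {a, b} accepting strings that end in "ab".
delta: Delta = {
    ("q0", "a"): {"q0", "q1"},
    ("q0", "b"): {"q0"},
    ("q1", "b"): {"q2"},
}

print(nfa_accepts(delta, "q0", {"q2"}, "aab"))  # True
print(nfa_accepts(delta, "q0", {"q2"}, "aba"))  # False
```

The set `current` plays the role of a single DFA state in the subset construction.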
Building new languages using regular operations
Use regular operations to build new languages.
A = {w | w starts and ends with the same symbol}
B = { ^k | k ≥ 1}
C = { , , }
D = C
E = A (B C)
F = (D C) (B E)

Describing complex languages using simpler ones
Use regular operations to break complex languages down into simpler ones.
A = {w | w starts and ends with the same symbol}
B = { ^k | k ≥ 1}
C = { , , }
C = { } ∪ { } ∪ { } = { } ∪ ({ } ∘ { }) ∪ ({ } ∘ { } ∘ { })
B = { } { } { }
A = {a} ∪ {b} ∪ ({a} ∘ Σ* ∘ {a}) ∪ ({b} ∘ Σ* ∘ {b})
  = {a} ∪ {b} ∪ ({a} ∘ ({a} ∪ {b})* ∘ {a}) ∪ ({b} ∘ ({a} ∪ {b})* ∘ {b})
We broke each language down into languages containing {a}, {b}, or {ε} and combined them using the three regular operations ∪, ∘, and *.

Regular expressions
The braces aren't adding anything since all of our sets are singletons; let's drop them.
Similarly, let's drop the ∘, much as we drop multiplication symbols.
Let's also replace ∪ with | (which we read as "or").
This gives us regular expressions (regex).
A = {a} ∪ {b} ∪ ({a} ∘ ({a} ∪ {b})* ∘ {a}) ∪ ({b} ∘ ({a} ∪ {b})* ∘ {b}) = a | b | a(a | b)*a | b(a | b)*b
B = { } { } { } =
C = { } ∪ ({ } ∘ { }) ∪ ({ } ∘ { } ∘ { }) =
Order of operations: *, ∘, |
Parentheses are used for grouping.
We underline the expression to differentiate the string from the regular expression.

Regular expressions
Six types of regular expressions: three base types, three recursive types.

Regex      Language         Notes
∅          ∅                very rarely used
ε          {ε}
t          {t}              for each t ∈ Σ
R1 | R2    L(R1) ∪ L(R2)    R1 and R2 are regex
R1R2       L(R1) ∘ L(R2)    R1 and R2 are regex
R*         L(R)*            R is a regex

As shorthand, we'll use Σ to mean a | b (or similar for other alphabets).
A = a | b | aΣ*a | bΣ*b

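The six cases can be read as a recursive definition of L(R). The sketch below is one possible encoding, not the lecture's: the class names and the `language` function are made up for illustration, and because L(R) may be infinite the function only returns the strings of length at most n. It assumes Σ = {a, b} and reads the slide's example as A = a | b | aΣ*a | bΣ*b.

```python
from dataclasses import dataclass
from typing import Set, Union


@dataclass(frozen=True)
class Empty:          # the regex ∅
    pass


@dataclass(frozen=True)
class Eps:            # the regex ε
    pass


@dataclass(frozen=True)
class Sym:            # a single symbol t ∈ Σ
    t: str


@dataclass(frozen=True)
class Alt:            # R1 | R2
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Cat:            # R1R2
    left: "Regex"
    right: "Regex"


@dataclass(frozen=True)
class Star:           # R*
    inner: "Regex"


Regex = Union[Empty, Eps, Sym, Alt, Cat, Star]


def language(r: Regex, n: int) -> Set[str]:
    """The strings of L(r) that have length at most n."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.t} if n >= 1 else set()
    if isinstance(r, Alt):
        return language(r.left, n) | language(r.right, n)
    if isinstance(r, Cat):
        return {x + y
                for x in language(r.left, n)
                for y in language(r.right, n)
                if len(x + y) <= n}
    # Star: keep concatenating until no new strings of length <= n appear.
    base = language(r.inner, n)
    result, frontier = {""}, {""}
    while frontier:
        frontier = {x + y for x in frontier for y in base
                    if len(x + y) <= n} - result
        result |= frontier
    return result


sigma = Alt(Sym("a"), Sym("b"))
A = Alt(Alt(Sym("a"), Sym("b")),
        Alt(Cat(Sym("a"), Cat(Star(sigma), Sym("a"))),
            Cat(Sym("b"), Cat(Star(sigma), Sym("b")))))

print(sorted(language(A, 3)))
# ['a', 'aa', 'aaa', 'aba', 'b', 'bab', 'bb', 'bbb']
```

Every string printed starts and ends with the same symbol, which matches the set-builder description of A.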
Technicalities
Technically, a regular expression generates or describes a (regular) language; it is not a language itself.
Given a regular expression R, the language L(R) is the set of strings generated by R.
E.g., R = a* generates strings ε, a, aa, ...
L(R) = {a^k | k ≥ 0}
A DFA M recognizes a (regular) language L(M), but we don't identify M with its language.
Similarly, we shouldn't identify a regular expression R with its language L(R); however, it is customary to do so.
Still, even if we let { } = , that doesn't mean   is the same as  !

Kleene star
a* = {a^k | k ≥ 0}
(a | b | c)* = {w | w contains any number of a, b, or c in any order}
(  |  )* = {w | w is the concatenation of 0 or more   or  }
a*b* = {a^m b^n | m, n ≥ 0}
∅* = {ε} = ε
ε* = {ε} = ε
Σ* (regex) = Σ* (language) = {w | w is a string over Σ}

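These equalities can be spot-checked by brute force on short strings. The sketch below uses Python's `re` module, whose syntax is a superset of the star, concatenation, and | operators used here; the helper names `strings_upto` and `matched` are made up for this example, and a check up to a fixed length is only evidence, not a proof.

```python
import re
from itertools import product


def strings_upto(alphabet: str, n: int):
    """Yield every string over `alphabet` of length 0..n."""
    for k in range(n + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)


def matched(pattern: str, alphabet: str, n: int) -> set:
    """Strings of length <= n that the whole pattern generates."""
    return {w for w in strings_upto(alphabet, n) if re.fullmatch(pattern, w)}


# a*b* should generate exactly the strings a^m b^k.
expected = {"a" * m + "b" * k
            for m in range(4) for k in range(4) if m + k <= 3}
print(matched(r"a*b*", "ab", 3) == expected)

# (a|b|c)* should generate every string over {a, b, c}.
print(matched(r"(a|b|c)*", "abc", 3) == set(strings_upto("abc", 3)))
```

Both comparisons print True for strings of length at most 3.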
Regular expression examples
ΣΣ = {w | |w| = 2}
(ΣΣ)* = {w | |w| is even}
(b | ab)* = {w | every a in w is followed by at least one b}
( ) =
b*ab* = {w | w contains exactly one a}

Question 1
What strings are in the language given by the regular expression (  |  )(  |  )?
 ,  ,  ,

Question 2
True or false: if languages A and B each contain 2 strings, then A ∘ B contains 4 strings.
False.
Counterexample: A = B = { , }. A ∘ B = { , , }
Another counterexample: A = { , } and B = { , }. A ∘ B = { , , }

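A small sketch of why the count can fall short of 4: different pairs (x, y) can concatenate to the same string. The concrete languages below are hypothetical examples chosen for this sketch (the empty Python string stands for ε); they are not necessarily the ones on the slide.

```python
def concat(A: set, B: set) -> set:
    """A ∘ B: every string xy with x in A and y in B."""
    return {x + y for x in A for y in B}


# Hypothetical counterexample 1: A = B = {ε, a}.
A = {"", "a"}
B = {"", "a"}
print(sorted(concat(A, B)))   # ['', 'a', 'aa']  (ε·a and a·ε coincide)

# Hypothetical counterexample 2: A = {a, aa}, B = {ε, a}.
A2 = {"a", "aa"}
B2 = {"", "a"}
print(sorted(concat(A2, B2)))  # ['a', 'aa', 'aaa']
```

In both cases the four pairs produce only three distinct strings.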
Question 3
Is   in the language given by ( )?
Yes.   =

Question 4
Write a regex for the language {w |   is a substring of w}.
Σ*  Σ*

Question 5
Write a regex for the language {w | the second symbol of w is   or the third to last symbol of w is  }.
Σ Σ* | Σ* ΣΣ

Question 6
Let Σ = {0, 1, ..., 9, -} and D = 0 | 1 | ... | 9. What strings are generated by the following regular expression?
((1- | ε)DDD- | ε)DDD-DDDD
U.S. phone numbers. We can rewrite this regex as
1-DDD-DDD-DDDD | DDD-DDD-DDDD | DDD-DDDD

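For comparison, here is how the same expression might be written with Python's `re` module. This is a sketch under two assumptions: that the separator symbol in Σ is a hyphen, and that the final group has four digits as in the rewritten form. The empty alternative `(1-|)` plays the role of `(1- | ε)`; the names `D`, `PHONE`, and `is_phone` are made up for the example.

```python
import re

D = "[0-9]"  # the shorthand D = 0 | 1 | ... | 9

# ((1- | ε)DDD- | ε)DDD-DDDD, with an empty alternative standing in for ε.
PHONE = re.compile(f"((1-|){D}{D}{D}-|){D}{D}{D}-{D}{D}{D}{D}")


def is_phone(s: str) -> bool:
    """True if the whole string s is generated by the regex."""
    return PHONE.fullmatch(s) is not None


print(is_phone("1-555-555-0100"))  # True
print(is_phone("555-555-0100"))    # True
print(is_phone("555-0100"))        # True
print(is_phone("1-555-0100"))      # False: a leading 1- needs an area code
```

The last case shows the effect of nesting: the `1-` prefix sits inside the optional area-code group, so it cannot appear without it.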
Question 7
If R is a regular expression, then the language generated by R* is either infinite or contains exactly one string.
Under what condition on R is R* infinite?
When R* contains exactly one string, what is the string and what is R?
R* is infinite if L(R) contains at least one nonempty string.
R* contains exactly one string, namely ε, when R = ∅ or R = ε.

Regular expression manipulation
Let R1, R2, and R3 be regular expressions.
R1 | ∅ = R1
R1ε = R1
(R1 | R2)R3 = R1R3 | R2R3
R1(R2 | R3) = R1R2 | R1R3
(R1*)* = R1*
(R1 | R2)* = (R1*R2*)*
Theorem: Every regular expression R can be rewritten as an equivalent regular expression R1 | R2 | ... | Rk such that none of the Ri contain an   or ( )

Converting regular expressions to NFAs
Theorem: Every regular expression R can be converted to an equivalent NFA N, i.e., L(N) = L(R).
Proof idea: Induction on the structure of the regex.
We need to construct NFAs directly for the three base cases ∅, ε, and t for t ∈ Σ.
Then we handle the three inductive cases R1 | R2, R1R2, and R1*.
For the inductive cases, we assume there exist NFAs for R1 and R2 and use them to build NFAs for the three inductive cases.

Converting regular expressions to NFAs
Proof.
Base cases.
1. R = ∅
2. R = ε
3. R = t for t ∈ Σ
Inductive cases.
4. R = R1 | R2
5. R = R1R2
6. R = R1*
By the inductive hypothesis, there exist NFAs N1 and N2 such that L(N1) = L(R1) and L(N2) = L(R2).
Since regular languages are closed under union, concatenation, and Kleene star, L(R) is regular, so there exists some NFA N such that L(N) = L(R).

Converting regular expressions to NFAs
The proof of the inductive cases applied previous theorems to show that some NFA exists.
But we know how to perform the constructions explicitly:
[Figure: NFA constructions for the union and concatenation of N1 and N2, and for the Kleene star of N1.]

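The explicit constructions can be sketched in a few lines of Python. This follows the usual textbook construction with ε-moves (new start state for union and star, ε-edges from accept states for concatenation and star) and may differ in detail from the lecture's figures. The `NFA` class, the builder names, and the example regex (ab | a)* are all assumptions made for this sketch. Each builder call allocates fresh states, so a fragment should not be used twice.

```python
from itertools import count

_ids = count()  # source of fresh state numbers


class NFA:
    def __init__(self, start, accepts, edges):
        self.start = start
        self.accepts = set(accepts)
        self.edges = list(edges)  # (source, label, target); label None is ε


def empty():                      # base case R = ∅
    return NFA(next(_ids), set(), [])


def epsilon():                    # base case R = ε
    s = next(_ids)
    return NFA(s, {s}, [])


def symbol(t):                    # base case R = t
    s, f = next(_ids), next(_ids)
    return NFA(s, {f}, [(s, t, f)])


def union(n1, n2):                # R1 | R2: new start with ε to both starts
    s = next(_ids)
    edges = n1.edges + n2.edges + [(s, None, n1.start), (s, None, n2.start)]
    return NFA(s, n1.accepts | n2.accepts, edges)


def concat(n1, n2):               # R1R2: ε from N1's accepts to N2's start
    edges = n1.edges + n2.edges + [(f, None, n2.start) for f in n1.accepts]
    return NFA(n1.start, n2.accepts, edges)


def star(n1):                     # R1*: new accepting start, loop back via ε
    s = next(_ids)
    edges = (n1.edges + [(s, None, n1.start)]
             + [(f, None, n1.start) for f in n1.accepts])
    return NFA(s, n1.accepts | {s}, edges)


def accepts(nfa, w):
    """Simulate the NFA on w by tracking the set of reachable states."""
    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in nfa.edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({nfa.start})
    for ch in w:
        current = closure({dst for src, label, dst in nfa.edges
                           if src in current and label == ch})
    return bool(current & nfa.accepts)


# Hypothetical example: the regex (ab | a)*.
n = star(union(concat(symbol("a"), symbol("b")), symbol("a")))
print(accepts(n, "aab"), accepts(n, "ba"))  # True False
```

The resulting NFA is correct but not minimal: every operator adds states or ε-edges even where a hand-built automaton would not need them.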
Regular expressions describe regular languages
The language of a regular expression is regular.
This follows directly from the previous theorem:
Regular expression → NFA → DFA → regular language

Regular expression to NFA
[Figure: worked example. The NFA for a regular expression R is built in seven steps, starting from NFAs for its subexpressions (steps 1 to 6) and ending with the NFA for R (step 7).]

Not the smallest possible NFA
[Figure: step-by-step runs of the NFA on sample inputs; one run ends in "Accepted" and two end in "Rejected".]

Converting from NFAs to regex
Theorem: Every NFA (and thus every DFA) can be converted to an equivalent regular expression.
Proof idea
1. Convert the NFA to a new type of finite automaton whose edges are labeled with regular expressions.
2. Remove states and update transitions one at a time from the new automaton to produce an equivalent automaton.
3. When only the start and (single) accept state remain, the transition between them is the regular expression.

Generalized NFA (GNFA)
A GNFA is a finite automaton with a single accept state, no transitions to the start state, no transitions from the accept state, and each transition labeled with a regular expression.
[Figure: example GNFA with start state q0, states q1 and q2, and accept state q_a.]

GNFA acceptance
A GNFA transitions from one state to the next by reading a block of input symbols generated by the regex.
[Figure: the example GNFA (q0, q1, q2, q_a) reading an input block by block; the run ends in "Accepted".]

Removing states in a GNFA
1. Select a state r to remove, other than the start or accept states (r ∈ Q ∖ {q0, q_a}).
2. For each q, s ∈ Q ∖ {r} (including the case q = s) we have transitions q → r labeled R1, r → r labeled R2, r → s labeled R3, and q → s labeled R4.
   If a transition is missing from the GNFA, then the corresponding regex is ∅.
   Remove state r and replace regex R4 with R1R2*R3 | R4.
[Figure: states q, r, s with edges R1, R2, R3, R4 before removal, and the single edge R1R2*R3 | R4 from q to s after removal; a second pair of diagrams shows the case q = s.]

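One removal step can be sketched as follows. The representation is an assumption made for this example: the GNFA is a dict from (source, target) pairs to regex strings in Python `re` syntax, a missing key stands for ∅, and the empty string stands for ε. Labels that contain a top-level | are expected to be parenthesized already; `alt` adds its own parentheses. The example GNFA and its edge labels are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

Edges = Dict[Tuple[str, str], str]  # missing key: the label is ∅


def cat(r1: Optional[str], r2: Optional[str]) -> Optional[str]:
    """Concatenation; anything concatenated with ∅ is ∅."""
    if r1 is None or r2 is None:
        return None
    return r1 + r2


def star(r: Optional[str]) -> str:
    """Kleene star; ∅* and ε* are both ε (the empty string here)."""
    return f"({r})*" if r else ""


def alt(r1: Optional[str], r2: Optional[str]) -> Optional[str]:
    """Union; ∅ is the identity."""
    if r1 is None:
        return r2
    if r2 is None:
        return r1
    return f"({r1}|{r2})"


def remove_state(edges: Edges, states: List[str], r: str):
    """Remove r, relabeling each q -> s edge with R1 R2* R3 | R4."""
    remaining = [q for q in states if q != r]
    loop = star(edges.get((r, r)))                     # R2*
    new_edges: Edges = {}
    for q in remaining:
        for s in remaining:
            through = cat(cat(edges.get((q, r)), loop), edges.get((r, s)))
            label = alt(through, edges.get((q, s)))    # R1 R2* R3 | R4
            if label is not None:
                new_edges[(q, s)] = label
    return new_edges, remaining


# Hypothetical GNFA: q0 -ε-> q1 -(a|b)-> q2 -ε-> qa, plus q2 -a-> q1.
edges: Edges = {("q0", "q1"): "", ("q1", "q2"): "(a|b)",
                ("q2", "q1"): "a", ("q2", "qa"): ""}
states = ["q0", "q1", "q2", "qa"]

edges, states = remove_state(edges, states, "q1")
edges, states = remove_state(edges, states, "q2")
final = edges[("q0", "qa")]
print(final)                              # (a|b)(a(a|b))*
print(bool(re.fullmatch(final, "bab")))   # True
```

This naive string building never simplifies, so real outputs grow quickly; it is meant only to make the R1R2*R3 | R4 update concrete.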
Remove state q1
[Figure: worked example. For each pair of remaining states in the example GNFA, the labels R1, R2, R3, R4 are read off and the edge is relabeled R1R2*R3 | R4; the result is a GNFA on q0, q2, and q_a.]

Remove state q2
[Figure: the GNFA on q0, q2, and q_a before removing q2, and the resulting GNFA with only q0 and q_a, joined by a single edge labeled with the final regex.]

Converting a GNFA to a regular expression
Remove states one at a time until only the start and accept states remain.
The one remaining transition is an equivalent regex.
[Figure: GNFA G with states q0, q1, q2, q_a.]
L(G) = (( )( ) )(( ) ) (( ) ) (( )( ) )

Converting an NFA (or DFA) to a GNFA
1. Add a new start state with an epsilon transition to the original start state.
2. Add a new accept state with epsilon transitions from the original accept states.
3. Convert multiple transitions between a pair of nodes to a single regex, using | to separate them.

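The three steps above can be sketched as a small Python function. The input format (a list of (source, symbol, target) triples, with the empty string for ε) and the names `q_start` and `q_accept` are assumptions made for this example; the function also assumes those two names do not clash with existing states. The sample NFA is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def nfa_to_gnfa(transitions: Iterable[Tuple[str, str, str]],
                start: str, accepts: Iterable[str]):
    """Return (edges, new_start, new_accept) for an equivalent GNFA.

    edges maps (source, target) to a regex string; "" stands for ε.
    """
    # Step 3: collect all symbols on parallel edges, then join them with |.
    grouped: Dict[Tuple[str, str], List[str]] = {}
    for q, t, s in transitions:
        grouped.setdefault((q, s), []).append(t)
    edges: Dict[Tuple[str, str], str] = {}
    for pair, symbols in grouped.items():
        unique = sorted(set(symbols))
        edges[pair] = unique[0] if len(unique) == 1 else "(" + "|".join(unique) + ")"

    # Step 1: new start state with an ε-transition to the old start state.
    edges[("q_start", start)] = ""
    # Step 2: new accept state with ε-transitions from the old accept states.
    for f in accepts:
        edges[(f, "q_accept")] = ""
    return edges, "q_start", "q_accept"


# Hypothetical NFA: q1 goes to q2 on a or b, and q2 goes back to q1 on a.
gnfa, s0, sa = nfa_to_gnfa(
    [("q1", "a", "q2"), ("q1", "b", "q2"), ("q2", "a", "q1")],
    start="q1", accepts={"q2"})
print(gnfa[("q1", "q2")])  # (a|b)
```

Because the new start state has no incoming edges and the new accept state has no outgoing edges, the result satisfies the GNFA conditions even if the original start state was also an accept state.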
Converting an NFA (or DFA) to a regular expression
Theorem: Every NFA (and thus every DFA) can be converted to an equivalent regular expression.
Proof. Given an NFA N, convert it to an equivalent GNFA G. Convert G to an equivalent regular expression. (Some details missing, but see the book.)

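Example
[Figure: NFA with states q1 and q2; one edge carries two comma-separated symbols.]
First, convert to a GNFA. [Figure: q0, q1, q2, q_a]
Next, remove q1. [Figure: q0, q2, q_a]
Next, remove q2. [Figure: q0, q_a]
Equivalent regular expression: (a | b)( (a | b))* = Σ( Σ)*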