CS103 Handout 32 Fall 2016 November 11, 2016 Problem Set 7

What can you do with regular expressions? What are the limits of regular languages? On this problem set, you'll find out!

As always, please feel free to drop by office hours, ask on Piazza, or send us emails if you have any questions. We'd be happy to help out.

Good luck, and have fun!

Due Friday, November 18 at the start of class.

Skills developed in this problem set:
  Designing and testing regular expressions.
  Switching between different representations of a regular language to prove results about those languages.
  Using the state-elimination algorithm to convert finite automata into regular expressions.
  Understanding the definition of distinguishability at a conceptual level.
  Using the Myhill-Nerode theorem to prove that languages are or are not regular.
  Building an intuition for what makes a language regular or nonregular.
  Developing a nuanced understanding of the proof of the Myhill-Nerode theorem and using it to generalize the proof of that result.
  Exploring the nuances of closure properties and their limits.

Problem One: Designing Regular Expressions (14 Points)

Below is a list of alphabets and languages over those alphabets. For each language, write a regular expression for that language.

Please use our online tool to design, test, and submit your regular expressions. Typed or handwritten solutions will not be accepted. To use it, visit the CS103 website and click the Regex Editor link under the Resources header. As before, make a note in your GradeScope submission of which team member submitted your answers to this question so that we know where to look. Also, as a reminder, please test your submissions thoroughly, since we'll be grading them with an autograder.

i. Let Σ = {a, b} and let L = { w ∈ Σ* | w does not contain […] as a substring }. Write a regular expression for L.

ii. Let Σ = {a, b} and let L = { w ∈ Σ* | w does not contain […] as a substring }. Write a regular expression for L.

iii. Suppose you are taking a walk with your dog on a leash of length two. Let Σ = {y, d} and let L = { w ∈ Σ* | w represents a walk with your dog on a leash where you and your dog both end up at the same location }. For example, the string yyddddyy is in L because you and your dog are never more than two steps apart and both of you end up four steps ahead of where you started; similarly, ddydyy ∈ L. However, yyyyddd ∉ L, since halfway through your walk you are three steps ahead of your dog; ddyd ∉ L, because your dog ends up two steps ahead of you; and ddyddyyy ∉ L, because at one point in your walk your dog is three steps ahead of you. Write a regular expression for L.

iv. Let Σ = {a, b} and let L = { w ∈ Σ* | w […] }. Write a regular expression for L.

v. Let Σ = {M, D, C, L, X, V, I} and let L = { w ∈ Σ* | w is a number less than 1,000 represented in Roman numerals }. For example, CMXCIX ∈ L, since it represents the number 999, as are the strings L (50), VIII (8), DCLXVI (666), CXXXVII (137), and CDXII (412). However, we have VIIII ∉ L (you'll never have four I's in a row; use IX or IV instead), that MI ∉ L (it's a Roman numeral, but it's for a number that's too large), that VX ∉ L (this isn't a valid Roman numeral), and that IM ∉ L (the notation of using a smaller digit to subtract from a larger one only lets you use I to prefix V and X, or X to prefix L and C, or C to prefix D and M). The Romans didn't have a way of expressing the number 0, so to make your life easier we'll say that ε ∈ L and that the empty string represents 0. (Oh, those silly Romans.) Write a regular expression for L.

Problem Two: Finite and Cofinite Languages (6 Points)

A language L is called finite if L contains finitely many strings. More precisely, a language L is a finite language if |L| is a natural number. A language L is called cofinite if its complement is a finite language; that is, L is cofinite if |L̄| is a natural number.

i. Prove that any finite language is regular.

ii. Prove that any cofinite language is regular.
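Looking back at Problem One, part (iii): since the autograder is strict, it can help to have a reference to test against before you submit. Below is a minimal sketch in Python that simulates the walk directly. It assumes that y means you take one step forward and d means your dog takes one step forward, which is how the examples above read. The names on_leash_walk and regex_mismatches are our own inventions for illustration, and Python's regex syntax may not match the syntax used by the course Regex Editor, so treat this as a sanity check rather than a grader.

```python
import re
from itertools import product


def on_leash_walk(w, leash=2):
    """Return True if w (over {y, d}) is a legal walk that ends together."""
    gap = 0  # your position minus your dog's position
    for ch in w:
        if ch == "y":
            gap += 1
        elif ch == "d":
            gap -= 1
        else:
            return False  # character outside the alphabet {y, d}
        if abs(gap) > leash:
            return False  # you and your dog are too far apart
    return gap == 0  # both must finish at the same location


def regex_mismatches(pattern, max_len=6):
    """List every string up to max_len where the pattern and the simulation disagree."""
    compiled = re.compile(pattern)
    disagreements = []
    for n in range(max_len + 1):
        for chars in product("yd", repeat=n):
            w = "".join(chars)
            if (compiled.fullmatch(w) is not None) != on_leash_walk(w):
                disagreements.append(w)
    return disagreements
```

For instance, calling regex_mismatches("(yd|dy)*", max_len=4) on this deliberately incomplete pattern reports yydd and ddyy, two legal walks that the pattern misses. An empty result only says that no disagreement was found up to max_len; it is not a proof of correctness.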

Problem Three: State Elimination (6 Points)

The state elimination algorithm gives a way to transform a finite automaton (DFA or NFA) into a regular expression. It's a really beautiful algorithm once you get the hang of it, so we thought that we'd let you try it out on a particular example.

Let Σ = {a, b} and let L = { w ∈ Σ* | w has an even number of a's and an even number of b's }. Below is a finite automaton for L that we've prepared for the state elimination algorithm by adding in a new start state q_start and a new accept state q_acc:

[Figure: finite automaton with states q_start, q₀, q₁, q₂, q₃, and q_acc, including two ε-transitions.]

We'd like you to use the state elimination algorithm to produce a regular expression for L.

i. Run two steps of the state elimination algorithm on the above automaton. Specifically, first remove state q₁, then remove state q₂. Show your result at this point.

ii. Finish the state elimination algorithm by removing q₃, then q₀. What regular expression do you get for L?

Problem Four: Distinguishable Strings (6 Points)

The Myhill-Nerode theorem is one of the trickier and more nuanced theorems we've covered this quarter. This question explores what the theorem means and, importantly, what it doesn't mean.

Let Σ = {a, b} and let L = { w ∈ Σ* | |w| is even }.

i. Show that L is a regular language.

ii. Prove that there is an infinite set S ⊆ Σ* where there are infinitely many pairs of distinct strings x, y ∈ S such that x ≢_L y.

iii. Prove that there is no infinite set S ⊆ Σ* where all pairs of distinct strings x, y ∈ S satisfy x ≢_L y.

The distinction between parts (ii) and (iii) is important for understanding the Myhill-Nerode theorem. A language is nonregular not if you can find infinitely many pairs of distinguishable strings, but rather if you can find infinitely many strings that are all pairwise distinguishable. This is a subtle distinction, but it's an important one!
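To make the definition of distinguishability in Problem Four concrete: two strings x and y are distinguishable relative to L when some suffix z puts exactly one of xz and yz in L. The sketch below searches for such a suffix by brute force. The names in_even_length and distinguishing_suffix are hypothetical, and the search is bounded by max_len, so a result of None only means that no short distinguishing suffix was found, not that none exists.

```python
from itertools import product


def in_even_length(w):
    """Membership test for L = { w | |w| is even }."""
    return len(w) % 2 == 0


def distinguishing_suffix(x, y, in_lang, alphabet="ab", max_len=4):
    """Return a suffix z with exactly one of x+z, y+z in the language, else None."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            z = "".join(chars)
            if in_lang(x + z) != in_lang(y + z):
                return z  # z separates x from y
    return None  # nothing found within the length bound
```

For example, distinguishing_suffix("a", "", in_even_length) returns the empty string, because a ∉ L while ε ∈ L. Trying a few pairs of your own is a reasonable way to build intuition before writing the proofs for parts (ii) and (iii).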

Problem Five: Balanced Parentheses (12 Points)

Let Σ = {(, )} and consider the language L₁ = { w ∈ Σ* | w is a string of balanced parentheses }. For example, we have () ∈ L₁, (()) ∈ L₁, (()())() ∈ L₁, ε ∈ L₁, and (())((()())) ∈ L₁, but )( ∉ L₁, (() ∉ L₁, and ((()))) ∉ L₁. This question explores properties of this language.

i. Prove that L₁ is not a regular language. One consequence of this result (which you don't need to prove) is that most languages that support some sort of nested parentheses, such as most programming languages and HTML, aren't regular and so can't be parsed using regular expressions.

Let's say that the nesting depth of a string of balanced parentheses is the maximum number of unmatched open parentheses at any point inside the string. For example, the string ((())) has nesting depth three, the string (()())() has nesting depth two, and the string ε has nesting depth zero.

Consider the language L₂ = { w ∈ Σ* | w is a string of balanced parentheses and w's nesting depth is at most four }. For example, ((())) ∈ L₂, (()()) ∈ L₂, and (((())))(()) ∈ L₂, but ((((())))) ∉ L₂ because although it's a string of balanced parentheses, the nesting goes five levels deep.

ii. Design a DFA for L₂, showing that L₂ is regular. A consequence of this result (which, again, you don't need to prove) is that while you can't parse general programs or HTML with regular expressions, you can (in principle) parse programs with low nesting depth or HTML documents without deeply-nested tags using regular expressions. Please submit this DFA using the DFA editor on the course website.

iii. Look back at your proof from part (i) of this problem. Imagine that you were to take that exact proof and blindly replace every instance of L₁ with L₂. This would give you an (incorrect) proof that L₂ is nonregular (we know it has to be wrong because L₂ is indeed regular). Where would the error be in that proof? Be as specific as possible.

iv. Without making reference to DFAs, NFAs, regular expressions, or the Myhill-Nerode theorem, explain, intuitively, why L₁ is nonregular while L₂ is regular.
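The definition of nesting depth in Problem Five can be checked mechanically, which is handy when testing a DFA for L₂ against sample strings. Here is a minimal sketch; the function names nesting_depth, in_L1, and in_L2 are ours, chosen only for illustration, and the code is a plain program rather than a DFA.

```python
def nesting_depth(w):
    """Return the nesting depth of w if it is balanced, or None if it is not."""
    depth = 0    # unmatched open parentheses seen so far
    deepest = 0  # largest value depth has reached
    for ch in w:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return None  # a close parenthesis with nothing to match
        else:
            return None  # character outside the alphabet
    return deepest if depth == 0 else None


def in_L1(w):
    """Balanced parentheses of any depth."""
    return nesting_depth(w) is not None


def in_L2(w, limit=4):
    """Balanced parentheses whose nesting depth is at most limit."""
    depth = nesting_depth(w)
    return depth is not None and depth <= limit
```

On the handout's examples, nesting_depth reports three for ((())), two for (()())(), and zero for the empty string.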

Problem Six: Tautonyms (8 Points)

A tautonym is a word that consists of the same string repeated twice. For example, the words bulbul, caracara, and dikdik are all tautonyms (the first two are species of birds, and the last is the cutest animal you'll ever see), as is the word hotshots (people who aren't very fun to be around).

Let Σ = {a, b} and consider the following language:

L = { ww | w ∈ Σ* }

This is the language of all tautonyms over Σ. Below is an incorrect proof that L is not regular:

Proof: Let S = { aⁿ | n ∈ ℕ }. This set is infinite because it contains one string for each natural number. We claim that any two strings in S are distinguishable relative to L. To see this, consider any two distinct strings aⁿ and aᵐ in the set S. Then aⁿaⁿ ∈ L but aᵐaⁿ ∉ L, so aⁿ ≢_L aᵐ. This means that S is an infinite set of strings that are pairwise distinguishable relative to L. Therefore, by the Myhill-Nerode theorem, L is not regular.

Although this language is indeed nonregular, this proof is incorrect.

i. What's wrong with this proof? Be specific.

ii. Although the above proof is incorrect, the language L isn't regular. Prove this.
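Membership in the tautonym language from Problem Six is easy to test directly, which is useful for checking individual claims of the form "this string is in L" or "this string is not in L". The helper below is a small sketch; the name is_tautonym is our own.

```python
def is_tautonym(s):
    """Return True if s has the form ww for some string w."""
    half, remainder = divmod(len(s), 2)
    if remainder != 0:
        return False  # an odd-length string cannot split into two equal halves
    return s[:half] == s[half:]
```

Note that the empty string counts as a tautonym under this definition, since choosing w = ε gives ww = ε.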

Problem Seven: State Lower Bounds (6 Points)

The Myhill-Nerode theorem we proved in lecture is actually a special case of a more general theorem about regular languages that can be used to prove lower bounds on the number of states necessary to construct a DFA for a given language.

i. Let L be a language over Σ. Suppose there's a finite set S such that any two distinct strings x, y ∈ S are distinguishable relative to L (that is, x ≢_L y). Prove that any DFA for L must have at least |S| states. (You sometimes hear this referred to as lower-bounding the size of any DFA for L.)

Consider this language from Problem Two, part (iii) of Problem Set Six:

L₁ = { w ∈ {a, b}* | w contains at least two […]'s with exactly five characters between them }

It's possible to build a seven-state NFA for this particular language, but any DFA for this language will have to have a huge number of states.

ii. Let S = {a, b}⁶. Prove that any pair of distinct strings in S are distinguishable relative to L₁. This shows that any DFA for L₁ must have at least 64 states, since there are 64 strings in S.

Problem Eight: Closure Properties Revisited (9 Points)

When building up the regular expressions, we explored several closure properties of the regular languages. This problem explores some of their nuances.

The regular languages are closed under complementation: If L is regular, so is L̄.

i. Prove or disprove: the nonregular languages are closed under complementation.

The regular languages are closed under union: If L₁ and L₂ are regular, so is L₁ ∪ L₂.

ii. Prove or disprove: the nonregular languages are closed under union.

We know that the union of any two regular languages is regular. Using induction, we can show that the union of any finite number of regular languages is also regular. As a result, we say that the regular languages are closed under finite union.

An infinite union is the union of infinitely many sets. For example, the rational numbers can be expressed as the infinite union { x / 1 | x ∈ ℤ } ∪ { x / 2 | x ∈ ℤ } ∪ { x / 3 | x ∈ ℤ } ∪ … out to infinity.

iii. Prove or disprove: the regular languages are closed under infinite union.
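The inductive argument for finite union mentioned in Problem Eight has a direct computational reading: fold the two-language union over a list, one language at a time. The sketch below does this for Python regex patterns. The name union_pattern and the sample patterns are hypothetical, and Python's re syntax is standing in for the course's regular expression notation, so this is an illustration of the idea rather than a proof.

```python
import re
from functools import reduce


def union_pattern(patterns):
    """Combine a nonempty list of patterns into one pattern for their union."""
    patterns = list(patterns)
    if not patterns:
        raise ValueError("need at least one pattern")
    # Each step unions the pattern built so far with the next one, which
    # mirrors the inductive step: (L1 ∪ ... ∪ Lk) ∪ L(k+1).
    return reduce(lambda left, right: f"(?:{left})|(?:{right})", patterns)


# Hypothetical sample languages: a*, b*, and (ab)*.
combined = union_pattern(["a*", "b*", "(?:ab)*"])
```

With these sample patterns, re.fullmatch(combined, "abab") succeeds while re.fullmatch(combined, "ba") returns None. The fold only ever handles a finite list, so it says nothing either way about part (iii).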

Extra Credit Problem: Fooling Sets (1 Point Extra Credit)

In Problem Seven, you saw how to use distinguishability to lower-bound the size of DFAs for a particular language. Unfortunately, distinguishability is not a powerful enough technique to lower-bound the sizes of NFAs. In fact, it's in general quite hard to bound NFA sizes; there's a $1,000,000 prize for anyone who finds a polynomial-time algorithm that, given an arbitrary NFA, converts it to the smallest possible equivalent NFA!

Although it's generally difficult to lower-bound the sizes of NFAs, there are some techniques we can use to find lower bounds on the sizes of NFAs. Let L be a language over Σ. A generalized fooling set for L is a set F ⊆ Σ* × Σ* with the following properties:

  For any (x, y) ∈ F, we have xy ∈ L.
  For any distinct pairs (x₁, y₁), (x₂, y₂) ∈ F, we have x₁y₂ ∉ L or x₂y₁ ∉ L (this is an inclusive OR).

As an example, consider this language L₁:

L₁ = { w ∈ {a, b}* | w contains at least two […]'s with exactly five characters between them }

The following set is a generalized fooling set for L₁:

F₁ = { ([…], […]), ([…], […]), ([…], […]), ([…], […]), ([…], […]), ([…], […]), ([…], ε) }

It's worth investigating why, exactly, this is a generalized fooling set for L₁.

Prove that if there is a generalized fooling set F for some language L that contains n pairs of strings, then any NFA for L must have at least n states.
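The two properties in the definition of a generalized fooling set can be checked mechanically for any finite set of pairs, given a membership test for the language. The sketch below does exactly that. The name is_generalized_fooling_set is our own, and the demonstration language (strings of length exactly three) is a hypothetical stand-in chosen for simplicity; it is not the language L₁ from this problem.

```python
from itertools import combinations


def is_generalized_fooling_set(pairs, in_lang):
    """Check both fooling-set properties for a finite collection of (x, y) pairs."""
    pairs = set(pairs)  # the second property only concerns distinct pairs
    # Property 1: every pair glues together into a string in the language.
    if any(not in_lang(x + y) for x, y in pairs):
        return False
    # Property 2: for distinct pairs, at least one cross-combination is rejected.
    for (x1, y1), (x2, y2) in combinations(pairs, 2):
        if in_lang(x1 + y2) and in_lang(x2 + y1):
            return False
    return True


def has_length_three(w):
    """Hypothetical demo language: all strings of length exactly three."""
    return len(w) == 3


demo_set = [("", "aaa"), ("a", "aa"), ("aa", "a"), ("aaa", "")]
```

Here is_generalized_fooling_set(demo_set, has_length_three) returns True. Adding the pair ("b", "aa") breaks the second property, because both aaa and baa have length three.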