Regular Expressions
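Regular Languages
Fundamental Question: Cardinality
-- Alphabet: $\Sigma$ is finite.
-- Strings: $\Sigma^*$ is countable.
-- Languages: $\mathcal{P}(\Sigma^*)$ is uncountable.
-- The number of finite automata is countable. For a fixed state set $Q$ there are only finitely many choices:
   -- $|Q|^{|Q|\cdot|\Sigma|}$ transition functions $\delta: Q \times \Sigma \to Q$
   -- $2^{|Q|}$ sets of accepting states
   -- $|Q|$ starting states
-- Hence most languages are not recognized by any finite automaton.
Problem: Characterize those languages recognized by finite automata.
Solution: Regular languages.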
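As a rough check of the cardinality argument, the Python sketch below multiplies the three counts for a fixed set of $n$ labeled states over an alphabet of size $k$. The function name `count_dfas` is illustrative only. Every value it returns is finite, so the set of all automata is a countable union of finite sets.

```python
def count_dfas(num_states: int, alphabet_size: int) -> int:
    """Number of DFAs on a fixed, labeled state set Q with |Q| = num_states."""
    n, k = num_states, alphabet_size
    transition_functions = n ** (n * k)  # delta: Q x Sigma -> Q
    accepting_sets = 2 ** n              # F may be any subset of Q
    start_states = n                     # q0 may be any element of Q
    return transition_functions * accepting_sets * start_states

# Finitely many DFAs for each n, so the union over n = 1, 2, 3, ... is countable.
for n in range(1, 4):
    print(n, count_dfas(n, 2))
```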
Concatenation of Languages
Definitions and Notation
-- $L_1 L_2 = \{xy \mid x \in L_1 \text{ and } y \in L_2\}$
-- $L^2 = LL$
-- $L^k = LL^{k-1}$
Examples
-- $L_1 = \{0, 1\}$, $L_2 = \{a, b\}$
-- $L_1 L_2 = \{0a, 0b, 1a, 1b\}$
-- $L_2 L_1 = \{a0, b0, a1, b1\}$, so $L_2 L_1 \neq L_1 L_2$
-- $L_1^2 = L_1 L_1 = \{00, 01, 10, 11\}$
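For finite languages the definitions translate directly into set comprehensions. The sketch below models a language as a Python set of strings. It assumes the usual convention $L^0 = \{\varepsilon\}$ (here the empty string `""`), which the explicit definition of $L^*$ as $\bigcup_{k \ge 0} L^k$ relies on. The helper names `concat` and `power` are hypothetical.

```python
def concat(l1: set[str], l2: set[str]) -> set[str]:
    """L1 L2 = {xy | x in L1 and y in L2}."""
    return {x + y for x in l1 for y in l2}

def power(lang: set[str], k: int) -> set[str]:
    """L^0 = {""} (epsilon); L^k = L L^(k-1)."""
    result = {""}
    for _ in range(k):
        result = concat(lang, result)
    return result

L1 = {"0", "1"}
L2 = {"a", "b"}
print(sorted(concat(L1, L2)))  # ['0a', '0b', '1a', '1b']
print(sorted(concat(L2, L1)))  # ['a0', 'a1', 'b0', 'b1']
print(sorted(power(L1, 2)))    # ['00', '01', '10', '11']
```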
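Kleene Closure
Recursive Definition
Base Cases:
-- $\varepsilon \in L^*$, where $\varepsilon$ is the empty string
-- $\omega \in L^*$ whenever $\omega \in L$
Recursive Step:
-- $\omega \in L^*$ and $\upsilon \in L^*$ $\Rightarrow$ $\omega\upsilon \in L^*$
Explicit Definitions
-- $L^* = \{x_1 x_2 \cdots x_n \mid x_k \in L \text{ and } n \ge 0\}$
-- $L^* = \{\varepsilon\} \cup L \cup L^2 \cup \cdots = \bigcup_{k \ge 0} L^k$
-- $L^+ = LL^* = L \cup L^2 \cup \cdots = \bigcup_{k \ge 1} L^k$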
Kleene Closure (continued)
Example 1: $L = \{0, 1\}$. Then $L^*$ is the set of all bit strings, including the empty string.
Example 2: $L = \{10, 1\}$. Then $L^* = \{\varepsilon, 1, 10, 11, 110, 101, \ldots\}$.
Example 3: $\varphi^* = \{\varepsilon\}$.
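$L^*$ is infinite whenever $L$ contains a nonempty string, so a program can only list it up to a length bound. The sketch below applies the recursive step repeatedly: each round appends one more word of $L$ to the strings found in the previous round. It reproduces Examples 2 and 3. The name `star_up_to` and the length cutoff are assumptions made for illustration.

```python
def star_up_to(lang: set[str], max_len: int) -> set[str]:
    """All strings of L* whose length is at most max_len."""
    result = {""}      # epsilon is always in L*
    frontier = {""}
    while frontier:
        # Append one more word of L to each string discovered in the previous round.
        frontier = {w + x for w in frontier for x in lang
                    if len(w + x) <= max_len} - result
        result |= frontier
    return result

print(sorted(star_up_to({"10", "1"}, 3), key=lambda s: (len(s), s)))
# ['', '1', '10', '11', '101', '110', '111']
print(star_up_to(set(), 5))  # {''}: the star of the empty language is {epsilon}
```

Subtracting `result` keeps the loop finite. There are only finitely many strings up to `max_len`, and each round must contribute new ones.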
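Regular Expressions
Regular Expressions (Syntax)
Base Cases: $\varphi$, $\varepsilon$, $a$ (for each $a \in \Sigma$)
Recursive Cases: if $r$ and $s$ are regular expressions, then so are
-- $(r)$, $r \cup s$, $rs$, $r^*$
Languages for Regular Expressions (Interpretation)
Base Cases
-- $L(\varphi) = \varphi$, $L(\varepsilon) = \{\varepsilon\}$, $L(a) = \{a\}$
Recursive Cases
-- $L((r)) = L(r)$
-- $L(r \cup s) = L(r) \cup L(s)$
-- $L(rs) = L(r)L(s)$
-- $L(r^*) = L(r)^*$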
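The interpretation rules are a recursive function on the syntax tree of $r$. The sketch below uses a hypothetical tuple encoding for the tree. Because $L(r)$ can be infinite, it returns only the strings of length at most $n$. There is no node for $(r)$, since parentheses only group.

```python
# Hypothetical encoding of regular-expression syntax trees:
#   ("empty",) for phi, ("eps",) for epsilon, ("sym", a) for a letter a,
#   ("union", r, s), ("cat", r, s), ("star", r).

def lang(r: tuple, n: int) -> set[str]:
    """The strings of L(r) of length <= n, following the interpretation rules."""
    kind = r[0]
    if kind == "empty":
        return set()                                  # L(phi) = phi
    if kind == "eps":
        return {""}                                   # L(eps) = {eps}
    if kind == "sym":
        return {r[1]} if len(r[1]) <= n else set()    # L(a) = {a}
    if kind == "union":
        return lang(r[1], n) | lang(r[2], n)          # L(r u s) = L(r) u L(s)
    if kind == "cat":
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}                   # L(rs) = L(r) L(s)
    if kind == "star":
        base = lang(r[1], n)                          # L(r*) = L(r)*
        result, frontier = {""}, {""}
        while frontier:
            frontier = {w + x for w in frontier for x in base
                        if len(w + x) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node {kind!r}")

zero, one = ("sym", "0"), ("sym", "1")
r = ("cat", ("star", ("union", zero, one)), one)      # (0 u 1)* 1
print(sorted(lang(r, 2), key=lambda s: (len(s), s)))  # ['1', '01', '11']
```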
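Observations
-- $\{\varepsilon\} \neq \varphi$, because for any nonempty language $L$, $L\{\varepsilon\} = L \neq \varphi = L\varphi$.
-- $\varepsilon = \varphi^*$
-- $r^+ = rr^*$, since $L(r^+) = L(r)^+ = L(r)L(r)^*$.
-- $L(r \cup r) = L(r) \cup L(r) = L(r)$, so regular expressions for a language are NOT unique.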
Examples of Regular Languages
Examples
-- $L = (0 \cup 1)^* = \{x \mid x \text{ is any string of zeros and ones}\}$
-- $L = (0 \cup 1)^* 01101 (0 \cup 1)^* = \{x \mid x \text{ is any bit string containing } 01101\}$
-- $L = 0^* 1^* 2^* = \{x \mid x \text{ consists of zeros followed by ones followed by twos}\}$
-- $L = 1^* 0 1^* 0 1^* = \{x \mid x \text{ is a string with exactly two zeros}\}$
-- $L = 1^* (01^*01^*)^* = \{x \mid x \text{ is a string with an even number of zeros}\}$
Observations
Regular expressions are not unique. A language can be represented by infinitely many different regular expressions.
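These descriptions can be spot-checked by brute force. The sketch below uses Python's standard `re` module, which writes $\cup$ as `|` and uses `*` for the Kleene star. It tests the expressions $(0 \cup 1)^*01101(0 \cup 1)^*$, $1^*01^*01^*$, and $1^*(01^*01^*)^*$ against every bit string up to length 8. `re.fullmatch` is used because membership in $L(r)$ means the whole string matches. Passing a finite check does not prove equality, but it catches mistakes quickly.

```python
import re
from itertools import product

contains_01101 = re.compile(r"(0|1)*01101(0|1)*")
exactly_two_zeros = re.compile(r"1*01*01*")
even_zeros = re.compile(r"1*(01*01*)*")

def bit_strings(max_len: int):
    """Yield every bit string of length 0..max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)

for w in bit_strings(8):
    assert (contains_01101.fullmatch(w) is not None) == ("01101" in w)
    assert (exactly_two_zeros.fullmatch(w) is not None) == (w.count("0") == 2)
    assert (even_zeros.fullmatch(w) is not None) == (w.count("0") % 2 == 0)
print("all three expressions agree with their descriptions up to length 8")
```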
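More Examples
Dice: Rules for Winning
-- $\Sigma = \{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12\}$
-- $r = 7 \cup 11 \cup 4(2 \cup 3 \cup 5 \cup 6 \cup 8 \cup 9 \cup 10 \cup 11 \cup 12)^*4 \cup \cdots$ (the term shown handles the point 4; the points 5, 6, 8, 9, 10 are handled the same way)
Programming Languages
-- $L = a \cup b \cup \cdots \cup z$
-- $D = 0 \cup 1 \cup 2 \cup 3 \cup 4 \cup 5 \cup 6 \cup 7 \cup 8 \cup 9$
-- Identifier $= L(L \cup D)^*$
-- Integer $= D^+$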
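The identifier and integer patterns can be written almost symbol for symbol with Python's `re` module. In the sketch below, the character classes `[a-z]` and `[0-9]` play the roles of $L$ and $D$, and `+` is $D^+$. Following the slide, only lowercase letters are allowed. Real programming languages usually also permit capitals and underscores.

```python
import re

LETTER = "[a-z]"   # L = a u b u ... u z
DIGIT = "[0-9]"    # D = 0 u 1 u ... u 9
IDENTIFIER = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")  # L(L u D)*
INTEGER = re.compile(f"{DIGIT}+")                         # D+

def classify(token: str) -> str:
    """Label a token as an identifier, an integer, or neither."""
    if IDENTIFIER.fullmatch(token):
        return "identifier"
    if INTEGER.fullmatch(token):
        return "integer"
    return "neither"

for token in ["x", "count2", "2count", "42", "", "4a"]:
    print(repr(token), classify(token))
```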
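Rules for Languages Defined by Regular Expressions
Rules
-- $r \cup s = s \cup r$
-- $(r \cup s) \cup t = r \cup (s \cup t)$
-- $r \cup \varphi = r$
-- $r\varphi = \varphi = \varphi r$
-- $r\varepsilon = r = \varepsilon r$
-- $r(st) = (rs)t$
-- $r(s \cup t) = rs \cup rt$
-- $(r^*)^* = r^*$
-- $\varphi^* = \varepsilon^* = \varepsilon$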
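Rules That Fail for Languages Defined by Regular Expressions
Inequalities (these are not identities in general)
-- $rs \neq sr$
-- $r \cup (st) \neq (r \cup s)(r \cup t)$ (not a Boolean algebra)
-- $(r \cup s)^* \neq r^* \cup s^*$
-- $(rs)^* \neq r^* s^*$
Counterexamples
-- $r = \{u \mid u \text{ has only zeros}\}$, $s = \{v \mid v \text{ has only ones}\}$ (and $t = s$ for the second inequality)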
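The counterexamples can be confirmed by searching for a witness string. In the sketch below, $r$ is written as `0*` and $s$ as `1*`, and $t = s$ is used for the distributive law. The hypothetical helper `first_difference` returns the shortest bit string on which the two sides disagree, or `None` if none exists up to the length bound.

```python
import re
from itertools import product

def first_difference(p: str, q: str, max_len: int = 6):
    """Shortest bit string (length <= max_len) matched by exactly one of p, q."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            if bool(re.fullmatch(p, w)) != bool(re.fullmatch(q, w)):
                return w
    return None

# r = 0* (only zeros), s = 1* (only ones), t = s
print(first_difference("0*1*", "1*0*"))               # rs vs sr: '01'
print(first_difference("0*|1*1*", "(0*|1*)(0*|1*)"))  # r u st vs (r u s)(r u t): '01'
print(first_difference("(0*|1*)*", "0*|1*"))          # (r u s)* vs r* u s*: '01'
print(first_difference("(0*1*)*", "0*1*"))            # (rs)* vs r*s*: '10'
```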
Fundamental Theorem of Regular Languages
Theorem 1: A language $L$ can be recognized by a DFA if and only if $L$ can be represented by a regular expression.
Proof: See Diagrams!
($\Leftarrow$) Regular expressions $\to$ DFA. Structural induction.
Base Cases:
-- $\varphi$: $M$ has no accepting states.
-- $\varepsilon$: The initial state is accepting; all other states are rejecting.
-- $a$: $M$ has three states: a start state, a state that accepts $a$, and a state that rejects everything else.
Recursive Steps:
-- Union. Key ideas: Run the machines simultaneously and nondeterministically. Build a new start state that transitions simultaneously to $M_1$ and $M_2$.
-- Concatenation. Key ideas: Follow $M_1$ with $M_2$, nondeterministically. For each transition into an accepting state of $M_1$, build a transition on the same symbol into the starting state of $M_2$.
-- Star. Key ideas: Build a new transition from each accepting state, mimicking the transitions from the original starting state. Introduce a new starting state to accept $\varepsilon$.
-- These steps produce nondeterministic machines; the subset construction converts each one into an equivalent DFA.
($\Rightarrow$) DFA $\to$ Regular expressions.
Base Cases (figure: automata for $\varphi$, $\varepsilon$, and $a$)
Union (figure: a new start state with $\varepsilon$-transitions into $M_1$ and $M_2$)
Concatenation (figure: $M_1$ followed by $M_2$, linked by $\varepsilon$-transitions)
Kleene Closure (figure: $M_1$ with added $\varepsilon$-transitions)
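The inductive constructions can be sketched in a few dozen lines of Python. This version follows the figures and links the pieces with $\varepsilon$-transitions, which is Thompson's construction. It differs from the $\varepsilon$-free rewiring described in the key ideas. The tuple encoding of regular expressions and all function names are assumptions made for illustration. `accepts` runs the resulting NFA by tracking the set of states it could be in, which is the subset construction carried out lazily.

```python
from itertools import count

# Hypothetical regex encoding: ("empty",), ("eps",), ("sym", a),
# ("union", r, s), ("cat", r, s), ("star", r).
# An NFA is (start, accept, edges); an edge is (from, label, to), label None = epsilon.
_fresh = count()  # supplies new, never-reused state numbers

def build(r):
    """Return an epsilon-NFA (start, accept, edges) recognizing L(r)."""
    kind = r[0]
    s, f = next(_fresh), next(_fresh)
    if kind == "empty":                 # phi: the accept state is unreachable
        return s, f, []
    if kind == "eps":
        return s, f, [(s, None, f)]
    if kind == "sym":
        return s, f, [(s, r[1], f)]
    if kind == "union":                 # new start state branching into M1 and M2
        s1, f1, e1 = build(r[1])
        s2, f2, e2 = build(r[2])
        return s, f, e1 + e2 + [(s, None, s1), (s, None, s2),
                                (f1, None, f), (f2, None, f)]
    if kind == "cat":                   # M1's accept state leads into M2's start
        s1, f1, e1 = build(r[1])
        s2, f2, e2 = build(r[2])
        return s, f, e1 + e2 + [(s, None, s1), (f1, None, s2), (f2, None, f)]
    if kind == "star":                  # repeat M1 any number of times, or skip it
        s1, f1, e1 = build(r[1])
        return s, f, e1 + [(s, None, s1), (f1, None, s1),
                           (s, None, f), (f1, None, f)]
    raise ValueError(f"unknown node {kind!r}")

def epsilon_closure(edges, states):
    """All states reachable from `states` using only epsilon-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, label, t in edges:
            if p == q and label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(nfa, w):
    """Simulate the NFA on w, tracking every state it could be in."""
    start, accept, edges = nfa
    current = epsilon_closure(edges, {start})
    for ch in w:
        moved = {t for p, label, t in edges if p in current and label == ch}
        current = epsilon_closure(edges, moved)
    return accept in current

zero, one = ("sym", "0"), ("sym", "1")
ends_in_one = build(("cat", ("star", ("union", zero, one)), one))  # (0 u 1)* 1
print([w for w in ["", "0", "1", "01", "10", "0111"] if accepts(ends_in_one, w)])
# ['1', '01', '0111']
```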
($\Rightarrow$) DFA $\to$ Regular Expressions
Algorithm. Basic idea: remove states.
1. Create a new start state (no transitions in, one $\varepsilon$-transition out).
2. Create a new accepting state ($\varepsilon$-transitions in, no transitions out).
3. Create a single transition from each state to each state:
   a. Collapse multiple transitions between the same pair of states by union.
   b. Insert $\varphi$-transitions where no transitions occur.
4. Rip out state $r$ by inserting new transitions from state $p$ to state $q$:
   $R_{\text{new}}(p, q) = R_{\text{old}}(p, q) \cup R(p, r)R(r, r)^*R(r, q)$
5. Repeat Step 4 until only the start and accept states remain.
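($\Rightarrow$) DFA $\to$ Regular Expressions (continued)
Observations
1. Each step of the algorithm labels every transition with a regular expression.
2. At each step of the algorithm, the new machine accepts exactly the same language as the original machine.
3. The final machine has a single transition, from the start state to the accept state. Its label is a regular expression for the language of the original DFA.
Examples: 2 states, 3 states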
Example 1 (figure: two-state DFA with states $q_0$ and $q_1$; transition labels 1, 0, and 0,1)
Example 2 (figure: three-state DFA with states $q_0$, $p$, and $r$; transition labels 1, 0, and 0,1)
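The state-removal algorithm can be sketched in Python by storing one label per ordered pair of states. In this sketch, labels are Python `re` pattern strings, `None` stands for a $\varphi$-transition (a missing edge), and `""` stands for $\varepsilon$. The encoding and helper names are assumptions, and the output is correct but far from minimal. The final check compares the result against the even-number-of-zeros DFA on all short bit strings.

```python
import re
from itertools import product

def union(a, b):
    """R1 u R2 on edge labels; None encodes phi (no transition)."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def concat(*parts):
    """R1 R2 ...; any phi factor makes the result phi; "" encodes epsilon."""
    if any(p is None for p in parts):
        return None
    return "".join(f"(?:{p})" for p in parts if p)

def star(a):
    """R*, using phi* = epsilon* = epsilon."""
    return f"(?:{a})*" if a else ""

def dfa_to_regex(states, alphabet, delta, start, accepting):
    S, A = object(), object()     # steps 1-2: fresh start and accept states
    R = {}                        # (p, q) -> label; a missing key means phi
    def add(p, q, label):         # step 3: collapse parallel edges by union
        R[(p, q)] = union(R.get((p, q)), label)
    add(S, start, "")
    for q in accepting:
        add(q, A, "")
    for q in states:
        for a in alphabet:
            add(q, delta[(q, a)], re.escape(a))
    remaining = [S, *states, A]
    for r in states:              # steps 4-5: rip out each original state
        remaining.remove(r)
        loop = star(R.get((r, r)))
        for p in remaining:
            for q in remaining:
                R[(p, q)] = union(R.get((p, q)),
                                  concat(R.get((p, r)), loop, R.get((r, q))))
    return R.get((S, A))          # None means L(M) is empty

# Example DFA (assumed for illustration): bit strings with an even number of zeros.
states = ["even", "odd"]
delta = {("even", "0"): "odd", ("even", "1"): "even",
         ("odd", "0"): "even", ("odd", "1"): "odd"}
regex = dfa_to_regex(states, "01", delta, "even", {"even"})
for n in range(7):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert bool(re.fullmatch(regex, w)) == (w.count("0") % 2 == 0)
print(regex)
```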
Regular Languages and Finite State Automata
Observations
-- Regular expressions are easier to construct when order matters:
   -- Keywords
   -- Spam
   -- Low-level lexical analysis
-- Finite state automata are easier to construct when order does not matter:
   -- Vending machines
Regular Languages
Definition: A language $L$ is called regular if $L$ is the language represented by a regular expression, or equivalently, if $L$ is the language accepted by a DFA.
Theorem 2: Regular languages are closed under union ($\cup$), intersection ($\cap$), concatenation, star ($^*$), and complement ($^c$).
Proof: We proved this result for union, concatenation, and star in Theorem 1.
-- Complement: If $L(M) = L$, then $L(M') = L^c$, where $M'$ is $M$ with accepting states $F' = Q - F = F^c$.
-- Intersection: If $L(M) = L_1$ and $L(N) = L_2$, then $L(M \times N) = L_1 \cap L_2$, where the product machine $M \times N$ has accepting states $F = F_1 \times F_2$. This also follows by De Morgan's laws, since $L_1 \cap L_2 = (L_1^c \cup L_2^c)^c$.
QED
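A Non-Regular Language
Lemma: $L = \{0^n 1^n \mid n \ge 0\}$ is not regular.
Proof: Suppose $L$ is regular, and $M$ is a DFA that recognizes $L$. Write $\delta(w)$ for the state $M$ reaches from its start state after reading $w$. Let $M$ have $p$ states, and consider the string $0^{p+1}1^{p+1}$. By the pigeonhole principle, $M$ must be in the same state $q$ at least twice during its traversal of $0^{p+1}$, say after reading $0^r$ and after reading $0^{r+k}$. Thus $M$ transitions from $q$ back to $q$ on the substring $0^k$, where $1 \le k \le p+1$. Therefore
-- $\delta(0^r 0^k 0^{p+1-r}) = \delta(0^{p+1})$,
so $M$ cannot distinguish between $0^{p+1}$ and $0^{p+k+1}$; $M$ loses count! Hence
-- $\delta(0^{p+1+k}1^{p+1}) = \delta(0^{p+1}1^{p+1})$,
so $M$ accepts $0^{p+1+k}1^{p+1}$ exactly when it accepts $0^{p+1}1^{p+1}$. But $0^{p+1}1^{p+1} \in L$, while $0^{p+1+k}1^{p+1} \notin L$ because $k \ge 1$. Contradiction.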
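The pigeonhole step can be made concrete. Given any DFA with $p$ states, the sketch below reads $0^{p+1}$ and returns the first $r$ and $k$ with $\delta(0^r) = \delta(0^{r+k})$. It then checks that the machine ends in the same state on $0^{p+1}1^{p+1}$ and $0^{p+1+k}1^{p+1}$. The 4-state machine used here is a hypothetical example that counts zeros up to 3 and then saturates. The argument does not depend on which DFA is chosen.

```python
def find_loop(delta, start, p):
    """Read 0^(p+1) on a p-state DFA; return (r, k) with delta(0^r) == delta(0^(r+k))."""
    seen = {start: 0}         # state -> number of zeros read when first visited
    q = start
    for i in range(1, p + 2):
        q = delta[(q, "0")]
        if q in seen:
            return seen[q], i - seen[q]
        seen[q] = i
    raise AssertionError("unreachable: p + 2 visits among p states must repeat")

def final_state(delta, start, w):
    """The state delta(w) reached from the start state after reading w."""
    q = start
    for a in w:
        q = delta[(q, a)]
    return q

# Hypothetical 4-state DFA: counts zeros up to 3, then saturates; ones change nothing.
p = 4
delta = {(i, "0"): min(i + 1, 3) for i in range(p)} | {(i, "1"): i for i in range(p)}
r, k = find_loop(delta, 0, p)
w1 = "0" * (p + 1) + "1" * (p + 1)
w2 = "0" * (p + 1 + k) + "1" * (p + 1)
assert final_state(delta, 0, w1) == final_state(delta, 0, w2)  # M "loses count"
print(r, k)
```

Whatever accepting set this machine is given, it treats `w1` and `w2` the same way, even though only `w1` belongs to $L$.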