Regular languages, regular expressions, & finite automata (intro)
CS 350, Fall 2018
gilray.org/classes/fall2018/cs350/
A language is a set of strings, e.g. L = {hello, bonjour, konnichiwa, …}, over an alphabet Σ = {a, b, c, …, y, z}.
$\Sigma^* = \{ w_1 \cdots w_k \mid k \in \mathbb{N},\ w_i \in \Sigma \}$ (taking $k = 0$ gives the empty string ϵ)
Examples: abcabc ∈ Σ*, ϵ ∈ Σ*, aaaa ∈ Σ*
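Because Σ* is infinite, one way to get a feel for it is to list its strings in order of length. The sketch below is a minimal illustration, assuming a small alphabet and a length cutoff; the function name strings_up_to is hypothetical.

```python
from itertools import product

def strings_up_to(sigma, max_len):
    """Yield every string in sigma* of length <= max_len, shortest first.

    Sketch only: sigma* is infinite, so we cut it off at max_len.
    """
    for k in range(max_len + 1):              # k = 0 yields the empty string ϵ
        for chars in product(sigma, repeat=k):
            yield "".join(chars)

words = list(strings_up_to("ab", 2))
print(words)  # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

Over an alphabet of size m there are exactly m^k strings of length k, which is why the list grows quickly with the cutoff.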
Regular languages
Regular expressions: E
$\forall a \in \Sigma.\ L(a) = \{a\}$
All characters in the alphabet are regular expressions. The interpretation of the regex a is the singleton set containing just the string a.
There is also an empty/null language, Ø, and an empty-string language, ϵ: L(Ø) = {} and L(ϵ) = {ϵ}.
Composite regular expressions are built inductively from other regular expressions, bottoming out in the null, empty-string, or single-character REs. A minimal and sufficient set of composite forms is: disjunction of REs, concatenation (composition) of REs, and Kleene star of an RE.
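$\varnothing \in E \qquad \epsilon \in E \qquad \dfrac{a \in \Sigma}{a \in E} \qquad \dfrac{e \in E}{(e) \in E}$
$\dfrac{e_0 \in E \quad e_1 \in E}{e_0 e_1 \in E} \qquad \dfrac{e_0 \in E \quad e_1 \in E}{e_0 \mid e_1 \in E} \qquad \dfrac{e \in E}{e^* \in E}$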
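$e \in E ::= w \mid \varnothing \mid \epsilon \mid (e) \mid e_0 e_1 \mid e_0 + e_1 \mid e^*$, where $w \in \Sigma = \{\ldots\}$
There are three base cases (a character w, Ø, and ϵ) and four inductive cases (grouping, concatenation, disjunction, and Kleene star). Both | and + are commonly used to signify disjunction in regexes.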
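The grammar maps naturally onto one data type per case. A minimal Python sketch, assuming hypothetical class names (Null, Eps, Char, Cat, Alt, Star); grouping (e) gets no class of its own, since the tree structure already records it.

```python
from dataclasses import dataclass

# Hypothetical names: one class per form of the grammar (grouping is implicit).
@dataclass(frozen=True)
class Null:          # Ø
    pass

@dataclass(frozen=True)
class Eps:           # ϵ
    pass

@dataclass(frozen=True)
class Char:          # a single character w ∈ Σ
    c: str

@dataclass(frozen=True)
class Cat:           # e0 e1
    left: object
    right: object

@dataclass(frozen=True)
class Alt:           # e0 | e1
    left: object
    right: object

@dataclass(frozen=True)
class Star:          # e*
    inner: object

def show(e):
    """Render a regex; every Cat/Alt is wrapped in parens, so precedence never matters."""
    if isinstance(e, Null):
        return "Ø"
    if isinstance(e, Eps):
        return "ϵ"
    if isinstance(e, Char):
        return e.c
    if isinstance(e, Cat):
        return f"({show(e.left)}{show(e.right)})"
    if isinstance(e, Alt):
        return f"({show(e.left)}|{show(e.right)})"
    if isinstance(e, Star):
        return f"{show(e.inner)}*"
    raise TypeError(f"not a regex: {e!r}")

# 0(00)*, built bottom-up from the grammar's cases
odd_zeros = Cat(Char("0"), Star(Cat(Char("0"), Char("0"))))
print(show(odd_zeros))  # (0(00)*)
```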
https://docs.python.org/3/reference/grammar.html
Interpreting Regexes
Precedence: Kleene star (*) binds tightest, then concatenation (ab), then disjunction (a | b). Thus, a | bc | bcd* is the same as (a) | (bc) | (bc(d*)).
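One way to convince yourself of these precedence rules is to ask Python's re module, which uses the same precedence, whether the flat and the fully parenthesized patterns accept the same strings. A brute-force sketch over short strings (the alphabet {a,b,c,d} and the length bound 5 are arbitrary choices):

```python
import re
from itertools import product

flat = re.compile(r"a|bc|bcd*")
grouped = re.compile(r"(a)|(bc)|(bc(d*))")

# Compare the two patterns on every string over {a,b,c,d} of length <= 5.
for k in range(6):
    for chars in product("abcd", repeat=k):
        s = "".join(chars)
        assert bool(flat.fullmatch(s)) == bool(grouped.fullmatch(s)), s
print("same language on all tested strings")
```

Agreement on finitely many strings is evidence rather than proof, but it quickly exposes a misplaced parenthesis.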
$L(a) = \{a\} \qquad L(\epsilon) = \{\epsilon\} \qquad L(\varnothing) = \{\}$
Juxtaposition is interpreted as language concatenation, disjunction as language union, and Kleene star as Kleene closure:
$L(e_0 e_1) = L(e_0)\,L(e_1) \qquad L(e_0 \mid e_1) = L(e_0) \cup L(e_1) \qquad L(e_0^*) = L(e_0)^*$
Language concatenation: $L_0 L_1 = \{ s_0 s_1 \mid s_0 \in L_0,\ s_1 \in L_1 \}$
Kleene closure of a language: $L^* = \{ s_1 \cdots s_k \mid k \in \mathbb{N},\ s_i \in L \}$ (with $k = 0$ contributing ϵ)
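These equations can be executed directly if we only ask for the strings of L(e) up to some length n, which keeps every set finite. A minimal sketch, assuming a hypothetical nested-tuple encoding of regexes; it illustrates the semantics and is not meant as an efficient matcher.

```python
def lang(e, n):
    """Strings of L(e) with length <= n.

    Hypothetical encoding: ("null",), ("eps",), ("char", c),
    ("cat", e0, e1), ("alt", e0, e1), ("star", e0).
    """
    tag = e[0]
    if tag == "null":
        return set()                                  # L(Ø) = {}
    if tag == "eps":
        return {""}                                   # L(ϵ) = {ϵ}
    if tag == "char":
        return {e[1]} if n >= 1 else set()            # L(a) = {a}
    if tag == "cat":                                  # L(e0 e1) = L(e0) L(e1)
        left, right = lang(e[1], n), lang(e[2], n)
        return {s0 + s1 for s0 in left for s1 in right if len(s0) + len(s1) <= n}
    if tag == "alt":                                  # L(e0|e1) = L(e0) ∪ L(e1)
        return lang(e[1], n) | lang(e[2], n)
    if tag == "star":                                 # L(e0*) = L(e0)*
        inner = lang(e[1], n)
        result, frontier = {""}, {""}                 # k = 0 pieces: ϵ
        while frontier:                               # append one more piece per round
            frontier = {s0 + s1 for s0 in frontier for s1 in inner
                        if len(s0) + len(s1) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown regex form: {e!r}")

# 0(00)*
odd_zeros = ("cat", ("char", "0"), ("star", ("cat", ("char", "0"), ("char", "0"))))
print(sorted(lang(odd_zeros, 5)))  # ['0', '000', '00000']
```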
Kleene closure can also be defined as a fixed point! $L(e)^*$ is the least language $L$ such that $f_e(L) = L$, where $f_e(L) = L\,L(e) \cup \{\epsilon\}$.
$(L^*)^* = L^*$: this means Kleene star is idempotent.
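The fixed-point view suggests an algorithm: start from the empty language and apply f_e until nothing changes. The sketch below does this with strings truncated to a maximum length so the iteration terminates; the names f and least_fixed_point are hypothetical, and L(e) is passed in as an explicit finite set.

```python
def f(L, Le, max_len):
    """f_e(L) = L L(e) ∪ {ϵ}, keeping only strings of length <= max_len."""
    return {s0 + s1 for s0 in L for s1 in Le if len(s0) + len(s1) <= max_len} | {""}

def least_fixed_point(Le, max_len):
    """Iterate f_e starting from Ø; the first repeat is the least fixed point (L(e)*, truncated)."""
    L = set()                      # Ø
    while True:
        nxt = f(L, Le, max_len)
        if nxt == L:               # f_e(L) = L
            return L
        L = nxt

star_a = least_fixed_point({"a"}, 3)
print(sorted(star_a))              # ['', 'a', 'aa', 'aaa']

# Idempotence, (L*)* = L*, also holds on the truncated sets.
assert least_fixed_point(star_a, 3) == star_a
```

Starting from Ø matters: when ϵ ∈ L(e), larger languages can also satisfy f_e(L) = L, and only the least one is L(e)*.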
Try an example: L is the language of odd-length strings of zeros. Give a regex for L.
Formally, $L = \{ s_1 \cdots s_{2k+1} \mid k \in \mathbb{N},\ s_i = 0 \}$.
Answer: 0(00)*
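A quick sanity check of the answer, using Python's re.fullmatch (which requires the whole string to match) against a direct reading of the formal definition; the test strings are arbitrary choices:

```python
import re

ODD_ZEROS = re.compile(r"0(00)*")

def is_odd_zeros(s):
    """Direct reading of the definition: all zeros and odd length (so non-empty)."""
    return len(s) % 2 == 1 and set(s) == {"0"}

for s in ["0" * n for n in range(12)] + ["", "1", "010", "00100"]:
    assert bool(ODD_ZEROS.fullmatch(s)) == is_odd_zeros(s), s
```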
Try an example: L is the language of all strings over the alphabet {0,1} where every 1 has an adjacent 1.
Answer: (0 | 111*)* = (0 | 11+)*, where e+ is syntactic sugar for ee*.
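The answer can be checked by brute force against a direct implementation of "every 1 has an adjacent 1". A sketch, assuming strings up to length 9 are enough to build confidence (this is testing, not a proof); in Python syntax the disjunction is written |.

```python
import re
from itertools import product

PAIRED_ONES = re.compile(r"(0|11+)*")

def every_one_has_neighbor(s):
    """Direct check: each '1' has a '1' immediately to its left or right."""
    return all(
        c != "1"
        or (i > 0 and s[i - 1] == "1")
        or (i + 1 < len(s) and s[i + 1] == "1")
        for i, c in enumerate(s)
    )

for k in range(10):
    for chars in product("01", repeat=k):
        s = "".join(chars)
        assert bool(PAIRED_ONES.fullmatch(s)) == every_one_has_neighbor(s), s
```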
Try an example: L is the language of odd decimal integers greater than zero. Give a regex for L.
(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*(1|3|5|7|9) | 1|3|5|7|9
With character classes: [1-9][0-9]*(1|3|5|7|9) | 1|3|5|7|9, where [a-g] is a character class and is syntactic sugar for (a|b|c|d|e|f|g).
With an optional prefix: ([1-9][0-9]*)?(1|3|5|7|9), where e? is syntactic sugar for (e | ϵ).
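Python's re supports both character classes and ?, so the final answer can be tested directly. The sketch compares it with integer parity for 0 through 1999 and checks a few strings that should be rejected (the range and the sample strings are arbitrary choices):

```python
import re

ODD_POSITIVE = re.compile(r"([1-9][0-9]*)?(1|3|5|7|9)")

# Decimal renderings of n: accepted exactly when n is odd (0 is rejected too).
for n in range(2000):
    assert bool(ODD_POSITIVE.fullmatch(str(n))) == (n % 2 == 1)

# Leading zeros, signs, even numbers, and the empty string are rejected.
for s in ["007", "-3", "10", ""]:
    assert ODD_POSITIVE.fullmatch(s) is None
```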
Regexes in Python
import re
r'a' ≡ a    r'ab' ≡ ab    r'b' ≡ b    r'a|b' ≡ a | b
r'a*' ≡ a*    r'b+' ≡ b+    r'c?' ≡ c?
r'\d' ≡ r'[0-9]' ≡ (0|1|2|3|4|5|6|7|8|9) for ASCII input; with str patterns, \d also matches other Unicode decimal digits unless the re.ASCII flag is set
r'z{2,4}' ≡ (zz|zzz|zzzz)
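The equivalences above can be spot-checked with re.fullmatch. The sketch below also demonstrates the Unicode caveat for \d, using U+0663 (ARABIC-INDIC DIGIT THREE) as one example of a non-ASCII decimal digit:

```python
import re

# On ASCII characters, \d and [0-9] accept exactly the same single characters.
for ch in map(chr, range(128)):
    assert bool(re.fullmatch(r"\d", ch)) == bool(re.fullmatch(r"[0-9]", ch))

# With str patterns, \d also matches other Unicode decimal digits,
# such as U+0663, unless re.ASCII is given.
assert re.fullmatch(r"\d", "\u0663")
assert re.fullmatch(r"[0-9]", "\u0663") is None
assert re.fullmatch(r"\d", "\u0663", re.ASCII) is None

# z{2,4} accepts exactly the strings zz, zzz, and zzzz.
for n in range(7):
    s = "z" * n
    assert bool(re.fullmatch(r"z{2,4}", s)) == bool(re.fullmatch(r"zz|zzz|zzzz", s))
```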
>>> import re
>>> m = re.match(r'\d', '5')
>>> m.group(0)
'5'
>>> m = re.match(r'\d', '')
>>> m is None
True
>>> m = re.match(r'(\d)\d\d', '456')
>>> m.group(0)
'456'
>>> m.group(1)
'4'
re.match anchors only at the start of the string, so on '4567' it still matches the prefix '456':
>>> m = re.match(r'(\d)\d\d', '4567')
>>> m.group(0)
'456'
>>> m.group(1)
'4'
Anchoring the pattern with ^ and $ requires the whole string to match:
>>> m = re.match(r'^(\d)\d\d$', '4567')
>>> m is None
True
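Since Python 3.4, re.fullmatch offers another way to require the whole string to match. The sketch below contrasts it with ^...$. Note that $ (without re.MULTILINE) also matches just before a trailing newline, so the two are not quite identical:

```python
import re

# re.match only anchors at the start, so '4567' yields the prefix match '456'.
assert re.match(r"(\d)\d\d", "4567").group(0) == "456"

# Requiring the whole string to match, two ways:
assert re.match(r"^(\d)\d\d$", "4567") is None
assert re.fullmatch(r"(\d)\d\d", "4567") is None

# They differ on a trailing newline: $ may match just before it.
assert re.match(r"^(\d)\d\d$", "456\n") is not None
assert re.fullmatch(r"(\d)\d\d", "456\n") is None

m = re.fullmatch(r"(\d)\d\d", "456")
print(m.group(0), m.group(1))   # 456 4
```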
Finite Automata
Every automaton has a finite set of states, exactly one of which is the designated start state. The start state is marked by an incoming arrow.
[Figure: start state q0 with an incoming arrow]
There may also be zero or more final states, also called accept states. These are drawn with an extra circle around them.
[Figure: states q0 and q1, with the accept state double-circled]
The start state may also be an accept state.
[Figure: q0 as both the start state and an accept state]
These give automata for two of the simplest languages: a lone non-accepting start state q0 recognizes L(Ø), and a lone accepting start state q0 recognizes L(ϵ).
Edges are labeled with characters from Σ. This DFA is equivalent to the RE a, for a ∈ Σ: it reads the character a and then accepts.
[Figure: q0 --a--> q1, with q1 accepting]
[Figure: q0 --a--> q1 --b--> q2, with q1 and q2 accepting]
This encodes the language {a, ab}.
[Figure: q0 --a--> q1, q1 loops on a, q1 --b--> q2, q2 loops on b; q1 and q2 accepting]
This encodes the language a+b* (that is, aa*b*).
Edges not shown implicitly lead to a non-accepting dead state.
[Figure: the a+b* DFA with its dead state drawn: q0 --b--> dead, q2 --a--> dead, and dead loops on a,b]
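A DFA is straightforward to run from a transition table. A minimal sketch for the a+b* automaton, assuming its edges are q0 -a-> q1, q1 -a-> q1, q1 -b-> q2, q2 -b-> q2 with q1 and q2 accepting; a missing table entry plays the role of the implicit dead state. The names DELTA, START, ACCEPT, and accepts are hypothetical.

```python
# Transition table for the a+b* DFA: (state, char) -> next state.
# Any missing entry implicitly goes to the non-accepting dead state.
DELTA = {
    ("q0", "a"): "q1",
    ("q1", "a"): "q1",
    ("q1", "b"): "q2",
    ("q2", "b"): "q2",
}
START = "q0"
ACCEPT = {"q1", "q2"}

def accepts(delta, start, accept, s):
    state = start
    for ch in s:
        state = delta.get((state, ch))
        if state is None:        # fell into the implicit dead state, which never accepts
            return False
    return state in accept

print(accepts(DELTA, START, ACCEPT, "aabb"))  # True
print(accepts(DELTA, START, ACCEPT, "aba"))   # False
```

Leaving the dead state out of the table keeps it small; returning early is safe because the dead state loops to itself on every character.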
Try an example: L is the language of all strings over {0,1} that contain an even number of 1s.
[Figure: states q0 and q1, each looping on 0; 1 moves q0 to q1 and q1 to q0; q0 is the start state and the accept state]
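The two-state parity DFA can be written as a table and checked against simply counting 1s. A sketch, with the hypothetical names DELTA and even_ones, and an arbitrary length bound of 8 for the brute-force check:

```python
from itertools import product

# Parity DFA: 0 keeps the state, 1 flips between q0 (even so far) and q1 (odd so far).
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

def even_ones(s):
    state = "q0"                      # start state, which is also the accept state
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:             # symbol outside {0,1}: implicit dead state
            return False
    return state == "q0"

# Brute-force check against counting, for every binary string of length <= 8.
for k in range(9):
    for chars in product("01", repeat=k):
        s = "".join(chars)
        assert even_ones(s) == (s.count("1") % 2 == 0), s
```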
Try an example: L is the language of all strings over {0,1,2} that contain an odd number of 1s and no 2s.
[Figure: the same two states q0 and q1, each looping on 0, with 1 moving between them; q1 is now the accept state, and any 2 implicitly leads to the dead state]
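Compared with the even-1s automaton, only the accept state and the alphabet change. A sketch under the convention that a missing table entry means the dead state (the names DELTA and odd_ones_no_twos are hypothetical), checked against a direct count over all strings of length at most 6:

```python
from itertools import product

# Same two states as the even-1s DFA, but q1 is the accept state.
# There are deliberately no entries for "2": reading a 2 falls into the dead state.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q0",
}

def odd_ones_no_twos(s):
    state = "q0"
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:             # dead state: no way back to acceptance
            return False
    return state == "q1"

for k in range(7):
    for chars in product("012", repeat=k):
        s = "".join(chars)
        expected = s.count("1") % 2 == 1 and "2" not in s
        assert odd_ones_no_twos(s) == expected, s
```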
Deterministic Finite Automata (DFA) and Non-deterministic Finite Automata (NFA)
Equivalent models of regular languages
[Figure: RE converts to NFA; NFA converts to DFA; DFA minimizes to a minimal DFA; DFA converts to GNFA; GNFA converts to RE]