CMSC/LING 723, LBSC 744
Kristy Hollingshead Seitz
Institute for Advanced Computer Studies, University of Maryland
Lecture 2: 6 September 2011

Agenda
- HW0 questions? Due Thursday before class!
- When in doubt, keep it simple...

Regular Expressions
- A meta-language for specifying simple classes of strings
- Very useful in searching and matching text strings
- They are everywhere: implementations in the shell (sed, awk, bash, grep), Perl, Java, Python, ...

Regular Expressions (crash course)
  [a-z]          exactly one lowercase letter
  [a-z]*         zero or more lowercase letters
  [a-z]+         one or more lowercase letters
  [a-z]?         zero or one lowercase letters
  [a-zA-Z0-9]    one lowercase or uppercase letter, or digit
  [^(]           match anything that is not '('

Examples of Regular Expressions
Basic regular expressions:
  /happy/        happy
  /[abcd]/       a, b, c, d
  /[a-d]/        a, b, c, d
  /[^a-d]/       e, f, g, ..., z
  /[Tt]he/       The, the
  /(dog|cat)/    dog, cat
Special metacharacters:
  /colou?r/      color, colour
  /oo*h!/        oh!, ooh!, oooh!, ...
  /oo+h!/        ooh!, oooh!, ooooh!, ...
  /beg.n/        began, begin, begun, begbn, ...
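The example patterns above can be checked directly in Python. This is a minimal sketch using the standard `re` module; the helper name `matches` and the test strings are illustrative assumptions, not part of the lecture. It uses `re.fullmatch`, so a pattern must cover the whole string rather than just a substring.

```python
import re

# Patterns come from the slides; the candidate strings are illustrative.
examples = [
    (r"colou?r", ["color", "colour", "colouur"]),
    (r"oo*h!", ["oh!", "ooh!", "h!"]),
    (r"oo+h!", ["oh!", "ooh!", "oooh!"]),
    (r"beg.n", ["began", "begin", "begn"]),
    (r"[Tt]he", ["The", "the", "THE"]),
    (r"(dog|cat)", ["dog", "cat", "cow"]),
]


def matches(pattern, text):
    """Return True if the entire string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# For each pattern, keep only the candidate strings it accepts.
results = {
    pattern: [s for s in candidates if matches(pattern, s)]
    for pattern, candidates in examples
}

for pattern, accepted in results.items():
    print(f"/{pattern}/ accepts {accepted}")
```

Note that `?`, `*`, and `+` apply only to the immediately preceding character (or group), which is why /oo+h!/ rejects "oh!" while /oo*h!/ accepts it.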

Equivalence Relations
We can say the following:
- Regular expressions describe a regular language
- Regular expressions can be implemented by finite-state automata
- Regular languages can be generated by regular grammars
[figure relating Finite-State Automata, Regular Expressions, Regular Languages, Regular Grammars]

Chomsky Hierarchy
  Language            Mechanisms                         Examples
  Regular             Regular expressions                x^n y
                      Regular grammars                   Morphology
                      Finite-state automata              Phonology
                      Finite-state transducers           Taggers
                      WFSAs/WFSTs
  Context-free        Context-free grammars (CFGs)       a^n b^n
                      Pushdown automata                  Most syntax
  Context-sensitive   Unification grammars               a^n b^m c^n d^m
                      Lexicalized formalisms             Cross-serial dependencies
                      (e.g., TAG, CCG)

Context-free
[figure]

Context-sensitive: Unification
[figure]
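To make the a^n b^n row of the Chomsky hierarchy table concrete: a regular expression such as /a*b*/ can check the shape of the string but cannot require the two counts to agree, which is why a^n b^n is not a regular language. The sketch below contrasts the two checks; both function names are hypothetical, and n >= 0 is assumed.

```python
import re


def looks_like_a_star_b_star(s):
    """Regular approximation: some a's followed by some b's, counts unconstrained."""
    return re.fullmatch(r"a*b*", s) is not None


def is_an_bn(s):
    """Exact check for a^n b^n (n >= 0): the number of a's must equal the number of b's."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


for s in ["aabb", "aab", "abab"]:
    print(s, looks_like_a_star_b_star(s), is_an_bn(s))
```

The string "aab" passes the regular approximation but fails the exact check, since matching the counts needs unbounded memory that a finite set of states cannot provide.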

Context-sensitive: Cross-serial Dependencies
[figures]

Sheep-speech Automaton
- Language: baa! baaa! baaaa! baaaaa! ...
- Regular expression: /baa+!/
- Finite-state automaton: [figure, arcs labeled b, a, a, a, !]

Natural Language Automaton
[figure]

Finite-State Automata (FSA)
- Formal definitions
- What are they?
- What do they do?
- How do they work?

FSA: What are they?
- Q: finite set of N states
    Q = {q0, q1, q2, q3, q4}
    The start state: q0
    The set of final states: F = {q4}
- Σ: finite input alphabet of symbols
    Σ = {a, b, !}
- δ(q, i): transition function
    Given state q and input symbol i, return new state q'

FSA: State Transition Table
           Input
  State    b    a    !
  0        1    -    -
  1        -    2    -
  2        -    3    -
  3        -    3    4
  4        -    -    -
  ("-" marks no transition; state 4 is the final state)
  Example: δ(q3, !) = q4

FSA: What do they do?
Given a string, an FSA either rejects or accepts it:
  ba!       reject
  baa!      accept
  baz!      reject
  baaa!     accept
  baaaa!    accept
  baa       reject
  moooo     reject
Applications in NLP?
- Grammaticality (on the word level)
- Morphology (sub-word level)
- Orthography (character-level)
- Phonology (phoneme-level)

FSA: How do they work?
- Input baaa!: q0 -> q1 -> q2 -> q3 -> q3 -> q4, ACCEPT
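The state transition table and the accepting trace above can be sketched as a table-driven recognizer. This is a minimal sketch of deterministic recognition for the sheep-talk automaton /baa+!/, not the lecture's own code; the names `TRANSITIONS`, `START`, `FINAL`, and `d_recognize` are assumptions made for illustration.

```python
# Transition table for the sheep-talk FSA /baa+!/.
# A missing (state, symbol) entry means "no transition".
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
START = 0
FINAL = {4}


def d_recognize(tape):
    """Return True if the deterministic FSA accepts the whole tape."""
    state = START
    for symbol in tape:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False  # no legal move on this symbol: reject
        state = TRANSITIONS[key]
    # Accept only if the input ends in a final state.
    return state in FINAL


for s in ["baaa!", "baa!", "ba!!!", "baa", "moooo"]:
    print(s, "accept" if d_recognize(s) else "reject")
```

The recognizer never backs up: each input symbol is read once, so the running time is linear in the length of the tape.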

FSA: How do they work?
- Input ba!!!: q0 -> q1 -> q2, then no transition on "!", REJECT

D-RECOGNIZE
[figure]

Accept or Generate?
- Formal languages are sets of strings
    Strings composed of symbols drawn from a finite alphabet
- Finite-state automata define formal languages
    Without having to enumerate all the strings in the language
- Two views of FSAs:
    Acceptors to tell you if a string is in the language
    Generators to produce all and only the strings in the language

Simple NLP with FSAs
[figure]

Introducing Non-Determinism
- Deterministic vs. non-deterministic FSAs for /baa+!/ [figures]
- Epsilon (ε) transitions [figure]
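The "generator" view of an FSA can be illustrated by walking the same sheep-talk automaton breadth-first and emitting every string that ends in a final state. This is a minimal sketch under stated assumptions: the transition table repeats the /baa+!/ automaton, and the length bound `max_length` is an addition needed because the language itself is infinite. The names are hypothetical.

```python
from collections import deque

# Sheep-talk FSA /baa+!/ as (state, symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
START = 0
FINAL = {4}


def generate(max_length):
    """Yield every string in the language with at most max_length symbols."""
    queue = deque([(START, "")])
    while queue:
        state, prefix = queue.popleft()
        if state in FINAL:
            yield prefix
        if len(prefix) < max_length:
            # Extend the prefix along every arc leaving the current state.
            for (source, symbol), target in TRANSITIONS.items():
                if source == state:
                    queue.append((target, prefix + symbol))


print(list(generate(6)))
```

The same table serves both views: the acceptor follows arcs chosen by the input, while the generator follows every arc and records the symbols it passes.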

Using NFSAs to Accept Strings
- What does it mean?
    Accept: there exists at least one path (need not be all paths)
    Reject: no paths exist
- General approaches:
    Backup: add markers at choice points, then possibly revisit unexplored arcs at a marked choice point
    Look-ahead: look ahead in the input to provide clues
    Parallelism: look at alternatives in parallel
- Recognition with NFSAs as search through a state space
    The agenda holds (state, tape position) pairs

ND-RECOGNIZE
[figure]

State Orderings
- Stack (LIFO): depth-first
- Queue (FIFO): breadth-first

ND-RECOGNIZE: Example
[figure: search trace ending in ACCEPT]

What's the point?
- NFSAs and DFSAs are equivalent
    For every NFSA, there is an equivalent DFSA (and vice versa)
- Equivalence between regular expressions and FSAs
    Easy to show with NFSAs
- Why use NFSAs?
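Recognition as search over (state, tape position) pairs can be sketched as follows. This is a minimal sketch, not the ND-RECOGNIZE pseudocode from the slide: it assumes a non-deterministic sheep-talk automaton in which state 2 has two arcs on "a", it omits ε-transitions, and the names `NFSA` and `nd_recognize` are hypothetical. The `depth_first` flag switches the agenda between stack (LIFO) and queue (FIFO) order.

```python
from collections import deque

# Non-deterministic sheep-talk automaton: (state, symbol) -> possible next states.
NFSA = {
    (0, "b"): [1],
    (1, "a"): [2],
    (2, "a"): [2, 3],  # choice point: stay in state 2 or move on to state 3
    (3, "!"): [4],
}
START = 0
FINAL = {4}


def nd_recognize(tape, depth_first=True):
    """Return True if at least one path through the NFSA accepts the tape."""
    agenda = deque([(START, 0)])  # (state, tape position) pairs still to explore
    while agenda:
        # Stack order gives depth-first search; queue order gives breadth-first.
        state, pos = agenda.pop() if depth_first else agenda.popleft()
        if pos == len(tape):
            if state in FINAL:
                return True  # one accepting path is enough
            continue
        for next_state in NFSA.get((state, tape[pos]), []):
            agenda.append((next_state, pos + 1))
    return False  # agenda exhausted: no path accepts


for s in ["baaa!", "baa!", "ba!", "baa"]:
    print(s, nd_recognize(s, depth_first=True), nd_recognize(s, depth_first=False))
```

Both orderings return the same answer; they differ only in the order in which alternative paths are explored. Because every agenda entry advances the tape position, the search always terminates for this ε-free automaton.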

Finite-State Transducers (FSTs)
- A two-tape automaton that recognizes or generates pairs of strings
- Think of an FST as an FSA with two symbol strings on each arc
    One symbol string from each tape

Four-fold view of FSTs
- As a recognizer
- As a generator
- As a translator
- As a set relater

Regular Language: Definition
- ∅ is a regular language
- ∀a ∈ Σ ∪ ε, {a} is a regular language
- If L1 and L2 are regular languages, then so are:
    L1 · L2 = {xy | x ∈ L1, y ∈ L2}, the concatenation of L1 and L2
    L1 ∪ L2, the union or disjunction of L1 and L2
    L1*, the Kleene closure of L1

Regular Languages: Starting Points
- Regular languages/FSAs as sets
- Set math
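The three closure operations in the definition can be tried out on small finite languages represented as Python sets of strings. This is a minimal sketch with hypothetical function names; since L1* is infinite whenever L1 contains a non-empty string, the Kleene closure here is truncated to at most `max_repeats` copies, which is an assumption made purely so the result stays finite.

```python
def concatenation(l1, l2):
    """L1 · L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in l1 for y in l2}


def union(l1, l2):
    """L1 ∪ L2, the union (disjunction) of the two languages."""
    return l1 | l2


def kleene_closure(l1, max_repeats):
    """Finite approximation of L1*: zero up to max_repeats concatenated copies of L1."""
    result = {""}   # zero copies: the empty string
    current = {""}
    for _ in range(max_repeats):
        current = concatenation(current, l1)
        result |= current
    return result


# Build a finite slice of sheep talk, baa+!, from single-symbol languages.
b, a, bang = {"b"}, {"a"}, {"!"}
sheep = concatenation(
    concatenation(concatenation(concatenation(b, a), a), kleene_closure(a, 3)),
    bang,
)
print(sorted(sheep, key=len))
```

Starting from the single-symbol languages {b}, {a}, and {!}, only concatenation and Kleene closure are needed to build sheep talk, which mirrors how the regular expression /baa+!/ is assembled.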

Regular Languages: Concatenation
[figure]

Regular Languages: Disjunction
[figure]

Regular Languages: Kleene Closure
[figure]

Agenda: Summary