Introduction to Computational Linguistics Olga Zamaraeva (2018) Based on Bender (prev. years) University of Washington May 3, 2018 1 / 101
Midterm Project Milestone 2: due Friday. Assignments 4 & 5: due dates moved. Start as soon as you can, though. 2 / 101
Overview We will focus on the concept of parsing and why naive parsing is inefficient. Dynamic programming idea. Then we will briefly cover CKY, and Assignment 4 asks you to study it in detail. Next time: probabilistic parsing (also needed for Assignment 4) 3 / 101
Parsing Recognizing a string as input and assigning structure to it. Syntactic parsing: assigning syntactic structure. Semantic parsing: assigning semantic structure. 4 / 101
Parsing: Making explicit structure that is inherent (implicit) in natural language strings What is that structure? Why would we need it? 5 / 101
Bottom Up vs. Top Down parsing S / \??? a b c d d f 6 / 101
Recursive Top-Down Parsing slide from: https://www.cs.cornell.edu/courses/cs4740/2012sp/lectures/parsing-intro-4pp.pdf 7 / 101
Recursive Top-Down Parsing What is the time complexity of what we just saw? 12 / 101
Recursive Top-Down Parsing What is the time complexity of what we just saw? Exponential in n, the length of the input: as the input gets longer, the parsing time grows exponentially (which is very, very bad). 13 / 101
Optional exercise: Recursive Parsing Running Time. Grammar: S → A; A → a A b | a A c | ε. Consider a string $a^n c^n$. Convince yourself that you will expand A $2^n$ times. 14 / 101
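One way to check the exercise empirically is to count expansions in a naive backtracking recognizer. The sketch below assumes the grammar reads S → A, A → a A b | a A c | ε, and the names `count_expansions` and `parse_A` are made up for this illustration. Nothing is stored between attempts, so the inner A is parsed once for each alternative.

```python
def count_expansions(s):
    """Recognize s with naive backtracking and count how often A is expanded.

    Assumed grammar (a reading of the exercise): S -> A, A -> a A b | a A c | epsilon.
    """
    calls = 0

    def parse_A(i):
        # Return every position j such that A can derive s[i:j].
        nonlocal calls
        calls += 1
        ends = {i}  # A -> epsilon consumes nothing
        if i < len(s) and s[i] == "a":
            # Try A -> a A b: parse the inner A from scratch.
            for j in parse_A(i + 1):
                if j < len(s) and s[j] == "b":
                    ends.add(j + 1)
            # Try A -> a A c: nothing was stored, so parse the inner A again.
            for j in parse_A(i + 1):
                if j < len(s) and s[j] == "c":
                    ends.add(j + 1)
        return ends

    recognized = len(s) in parse_A(0)
    return recognized, calls


for n in (1, 2, 3, 10):
    print(n, count_expansions("a" * n + "c" * n))
```

With this particular way of counting, the number of expansions is 3, 7, 15 and 2047 for n = 1, 2, 3 and 10: it roughly doubles with every extra `a`, which is the exponential growth the exercise asks about.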
Example 15 / 101
Bottom-up parsing Book the flight through Houston 16 / 101
Bottom-up parsing N D N P PN Book the flight through Houston 17 / 101
Bottom-up parsing D P PN N the N through Houston Book flight 18 / 101
Bottom-up parsing NP PP N D P PN Book the N through Houston flight No more possibilities! Backtrack... 19 / 101
Bottom-up parsing D PP N the N P PN Book flight through Houston 20 / 101
Bottom-up parsing D N the PP Book N P PN flight through Houston 21 / 101
Bottom-up parsing NP N D Book the PP N P PN flight through Houston No more possibilities! Backtrack... Up to where?.. 22 / 101
Bottom-up parsing Backtrack to the very beginning, actually! Book the flight through Houston 23 / 101
Bottom-up parsing V D N P PN Book the flight through Houston 24 / 101
Bottom-up parsing V D P PN Book the N through Houston flight 25 / 101
Bottom-up parsing V NP PP Book D P PN the N through Houston flight 26 / 101
Bottom-up parsing VP PP V NP P PN Book D through Houston the N flight 27 / 101
Bottom-up parsing VP VP PP V NP P PN Book D through Houston the N flight 28 / 101
Bottom-up parsing S VP VP PP V NP P PN Book D through Houston the N flight 29 / 101
Bottom-up parsing Or, we could have instead done: V D PP Book the N P PN flight through Houston 30 / 101
Bottom-up parsing V D Book the PP N P PN flight through Houston 31 / 101
Bottom-up parsing V NP Book D the PP N P PN flight through Houston 32 / 101
Bottom-up parsing VP V NP Book D the PP N P PN flight through Houston 33 / 101
Bottom-up parsing S VP V NP Book D the PP N P PN flight through Houston 34 / 101
How do we make sure we get both trees? Go through all possibilities for productions 35 / 101
Top-Down Parsing S 36 / 101
Top-Down Parsing S NP VP 37 / 101
Top-Down Parsing S NP VP Pronoun 38 / 101
Top-Down Parsing S NP VP 39 / 101
Top-Down Parsing S NP VP PN 40 / 101
Top-Down Parsing S NP VP 41 / 101
Top-Down Parsing S NP VP D 42 / 101
Top-Down Parsing S NP VP 43 / 101
Top-Down Parsing S 44 / 101
Top-Down Parsing S Aux NP VP 45 / 101
Top-Down Parsing S 46 / 101
Top-Down Parsing S VP 47 / 101
Top-Down Parsing S VP V 48 / 101
Top-Down Parsing S VP V Book Yes, but we have more input still... 49 / 101
Top-Down Parsing S VP V NP 50 / 101
Top-Down Parsing S VP V NP Book 51 / 101
Top-Down Parsing S VP V NP Book Pronoun 52 / 101
Top-Down Parsing S VP V NP Book 53 / 101
Top-Down Parsing S VP V NP Book PN 54 / 101
Top-Down Parsing S VP V NP Book 55 / 101
Top-Down Parsing S VP V NP Book D 56 / 101
Top-Down Parsing S VP V NP Book D the 57 / 101
Top-Down Parsing S VP V NP Book D the N 58 / 101
Top-Down Parsing S VP V NP Book D the N flight Yes, but we have more input still... 59 / 101
Top-Down Parsing S VP V NP Book D the N 60 / 101
Top-Down Parsing S VP V NP Book D the N N 61 / 101
Top-Down Parsing S VP V NP Book D the N N flight Nope... Backtrack again... 62 / 101
Top-Down Parsing S VP V NP Book D the 63 / 101
Top-Down Parsing S VP V NP Book D the PP 64 / 101
Top-Down Parsing S VP V NP Book D the PP N 65 / 101
Top-Down Parsing S VP V NP Book D the PP N flight 66 / 101
Top-Down Parsing S VP V NP Book D the PP N P NP flight 67 / 101
Top-Down Parsing S VP V NP Book D the PP N P NP flight through 68 / 101
Top-Down Parsing S VP V NP Book D the PP N P NP flight through Pronoun 69 / 101
Top-Down Parsing S VP V NP Book D the PP N P NP flight through 70 / 101
Top-Down Parsing S VP V NP Book D the PP N P NP flight through PN 71 / 101
Top-Down Parsing S V VP NP Book D the PP N P NP flight through PN Houston 72 / 101
Top-Down Parsing Could we have gotten the second tree by top-down parsing? 73 / 101
Top-Down Parsing Could we have gotten the second tree by top-down parsing? Yes; it is a matter of which rule happened to be on the top of the stack. We grabbed VP → V NP, but the option VP → VP PP is also on the stack somewhere. Thus the returned parse depends on the arbitrary order in which the rules are listed in the grammar. 74 / 101
Bottom Up vs. Top Down parsing Top-down parsers do not waste time exploring hypotheses not leading to S... but do waste time exploring hypotheses not matching the input. Bottom-up parsers do not waste time exploring hypotheses not matching the input... but do waste time exploring hypotheses not leading to S. Both can take exponential time (in the worst case; easier to show on an abstract CFG). Some recursive parsers are $O(n^4)$. An answer to poor time complexity: dynamic programming, $O(n^3)$. 75 / 101
Example: The Fibonacci numbers Recursive definition: f(0) = 0; f(1) = 1; f(n) = f(n-1) + f(n-2). 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657... f(100) = 354224848179261915075 76 / 101
The Fibonacci numbers: naive implementation Since we have a recursive definition, let's implement the Fibonacci numbers printer recursively!

    def fibonacci(n):
        # Base cases: f(0) = 0 and f(1) = 1
        if n in [0, 1]:
            return n
        # Recursive case, straight from the definition
        return fibonacci(n - 1) + fibonacci(n - 2)

What's the problem with this? 77 / 101
The Fibonacci numbers: better implementation

    def fibonacci(n):
        # Start with an empty table of already-computed values
        return fibonacci_helper(n, {})

    def fibonacci_helper(n, memo):
        if n in [0, 1]:
            return n
        # Compute f(n) only if it is not in the table yet
        if n not in memo:
            memo[n] = fibonacci_helper(n - 1, memo) + fibonacci_helper(n - 2, memo)
        return memo[n]

78 / 101
Dynamic programming: fill in a table with solutions to subproblems. Then we can just look up the precomputed solution instantly. No need to perform the same computation many times. 79 / 101
Fibonacci numbers f(0) = 0 f(1) = 1 f(2) = f(1) + f(0) what's f(1)? what's f(0)? 80 / 101
Fibonacci numbers f(3) = f(2) + f(1) f(2) = f(1) + f(0) f(1) = 1 f(0) = 0 81 / 101
Fibonacci numbers f(4) = f(3) + f(2) f(3) = f(2) + f(1) f(2) = f(1) + f(0) f(1) = 1 f(0) = 0 82 / 101
Fibonacci numbers f(5) = f(4) + f(3) f(4) = f(3) + f(2) f(3) = f(2) + f(1) f(2) = f(1) + f(0) f(1) = 1 f(0) = 0 etc... (deep recursion; slow; do the same computation again and again) 83 / 101
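To make "the same computation again and again" concrete, here is a small sketch that counts function calls for the plain recursive definition and for a memoized variant. The helper names `naive_calls` and `memo_calls` are invented for this illustration; the logic mirrors the two implementations shown earlier.

```python
def naive_calls(n):
    """Return (f(n), number of calls) for the plain recursive definition."""
    if n in (0, 1):
        return n, 1
    a, calls_a = naive_calls(n - 1)
    b, calls_b = naive_calls(n - 2)
    return a + b, 1 + calls_a + calls_b


def memo_calls(n):
    """Return (f(n), number of calls) when computed values are stored in a table."""
    memo = {}
    calls = 0

    def helper(k):
        nonlocal calls
        calls += 1
        if k in (0, 1):
            return k
        if k not in memo:  # compute each subproblem only once
            memo[k] = helper(k - 1) + helper(k - 2)
        return memo[k]

    return helper(n), calls


print(naive_calls(20))  # (6765, 21891)
print(memo_calls(20))   # (6765, 39)
```

Both versions return the same value, but the naive call count grows like the Fibonacci numbers themselves, while the memoized count grows linearly in n.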
Fibonacci numbers f(0) = ? Table: n = 0 1 2 3 4 5; f(n) = (empty) 84 / 101
Fibonacci numbers f(0) = ? Not in the table, so compute: f(0) = 0 (or rather, return the base case); fill in the cell. Table: n = 0 1 2 3 4 5; f(n) = (empty) 85 / 101
Fibonacci numbers f(1) = ? Not in the table, so compute: f(1) = 1 (or rather, return the base case); fill in the cell. Table: n = 0 1 2 3 4 5; f(n) = 0 86 / 101
Fibonacci numbers f(2) = ? Not in the table, so compute: f(2) = f(2-1) + f(2-2) = f(1) + f(0). But both f(1) and f(0) are already in the table! No need to compute! Just look up! Fill in the cell. Table: n = 0 1 2 3 4 5; f(n) = 0 1 87 / 101
Fibonacci numbers f(3) = ? Not in the table, so compute: f(3) = f(3-1) + f(3-2) = f(2) + f(1). But both f(2) and f(1) are already in the table! No need to compute! Just look up! Fill in the cell. Table: n = 0 1 2 3 4 5; f(n) = 0 1 1 88 / 101
Fibonacci numbers f(4) = ? Not in the table, so compute: f(4) = f(4-1) + f(4-2) = f(3) + f(2). But both f(3) and f(2) are already in the table! No need to compute! Just look up! Fill in the cell. Table: n = 0 1 2 3 4 5; f(n) = 0 1 1 2 89 / 101
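The cell-by-cell walk above can also be written as a loop that fills the table from left to right. This is a minimal sketch; the function name `fibonacci_table` is an assumption made for this example and does not come from the slides.

```python
def fibonacci_table(n):
    """Return the table [f(0), ..., f(n)], filled in from left to right."""
    table = [0, 1][: n + 1]  # base cases (only f(0) when n == 0)
    for i in range(2, n + 1):
        # Both smaller subproblems are already in the table: just look them up.
        table.append(table[i - 1] + table[i - 2])
    return table


print(fibonacci_table(5))  # [0, 1, 1, 2, 3, 5]
```

Each cell is computed once from two lookups, so filling the table up to n takes a number of additions linear in n.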
Dynamic programming for parsing Once a constituent has been discovered, store the information. Example: the CKY algorithm (Cocke-Kasami-Younger) 90 / 101
Chomsky Normal Form All productions must conform to one of two forms: A → B C or A → w, i.e. only binary branching trees (and leaves) 91 / 101
Chomsky Normal Form To convert to CNF: Copy all conforming rules as is. Flatten unit productions: a rule whose right-hand side is just N, together with N → cat | dog, becomes a rule rewriting directly to cat | dog. Introduce dummy rules to get rid of mixed terminal-nonterminal RHS (right-hand side): INF → to VP becomes TO → to and INF → TO VP. Introduce dummy rules to expand rules whose RHS has more than 2 nonterminals: A → B C D becomes A → X D, X → B C. 92 / 101
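The last conversion step (splitting a long right-hand side with dummy nonterminals) can be sketched as follows. The function name `binarize` and the dummy-symbol naming scheme (`A_X1`, `A_X2`, ...) are assumptions made for this illustration; a full converter would also need to keep dummy names unique across the whole grammar and handle the other steps.

```python
def binarize(lhs, rhs):
    """Split lhs -> rhs (a sequence of nonterminals) into rules with at most two RHS symbols.

    Returns a list of (left-hand side, right-hand side tuple) pairs.
    """
    rhs = list(rhs)
    rules = []
    k = 0
    while len(rhs) > 2:
        k += 1
        dummy = f"{lhs}_X{k}"  # hypothetical naming scheme for dummy symbols
        # Group the two leftmost symbols under a new dummy nonterminal.
        rules.append((dummy, (rhs[0], rhs[1])))
        rhs = [dummy] + rhs[2:]
    rules.append((lhs, tuple(rhs)))
    return rules


print(binarize("A", ["B", "C", "D"]))
# [('A_X1', ('B', 'C')), ('A', ('A_X1', 'D'))]
```

This reproduces the pattern A → X D, X → B C, with `A_X1` playing the role of X. Grouping from the left is one choice; grouping from the right would work equally well.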
Original grammar 93 / 101
Chomsky Normal Form 94 / 101
CKY: Main idea A 2-dimensional array (aka a table) can encode the structure of the tree. Each cell [i,j] contains all constituents that span positions i through j of the input string: 0 Book 1 that 2 flight 3. Cell [0,n] must have the Start symbol if we have a parse... and can have more than one! 95 / 101
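The table idea can be sketched as a small CKY-style recognizer. The toy CNF grammar below is an assumption made up for this example (it is not the grammar from the slides), and `cky_recognize` is an illustrative name. Cells are indexed by fencepost positions, as in 0 Book 1 that 2 flight 3.

```python
from collections import defaultdict


def cky_recognize(words, lexical, binary, start="S"):
    """Fill table[i, j] with every nonterminal that spans words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    # Width-1 spans come from the lexical rules A -> w.
    for i, word in enumerate(words):
        table[i, i + 1] = set(lexical.get(word, ()))
    # Wider spans combine two smaller, already filled cells via A -> B C.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for left in table[i, k]:
                    for right in table[k, j]:
                        table[i, j] |= binary.get((left, right), set())
    return start in table[0, n], table


# Toy grammar in CNF (an assumption for this sketch).
lexical = {"Book": {"V", "N"}, "that": {"D"}, "flight": {"N"}}
binary = {("D", "N"): {"NP"}, ("V", "NP"): {"S", "VP"}}

ok, table = cky_recognize("Book that flight".split(), lexical, binary)
print(ok, table[1, 3], table[0, 3])
```

The three nested loops over i, j and k are where the cubic dependence on the input length comes from. For this toy grammar, cell [1,3] holds NP and cell [0,3] holds S and VP.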
Limitations of classical CKY It is a recognizer. Turn a recognizer into a parser by storing all tree paths leading to S... but returning all possible trees is again exponential time! Also, we modified the grammar! 100 / 101
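One way to see the ambiguity without listing every tree is to store, in each cell, how many ways each constituent can be built. The sketch below uses a hypothetical CNF grammar for "Book the flight through Houston" (rule set and the name `count_parses` are assumptions for this example); counting stays polynomial even when enumerating all trees does not.

```python
from collections import defaultdict


def count_parses(words, lexical, binary, start="S"):
    """Count the distinct parse trees rooted in `start` using a CKY-style table."""
    n = len(words)
    counts = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for symbol in lexical.get(word, ()):
            counts[i, i + 1][symbol] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for (left, right), parents in binary.items():
                    ways = counts[i, k].get(left, 0) * counts[k, j].get(right, 0)
                    if ways:
                        for parent in parents:
                            counts[i, j][parent] += ways
    return counts[0, n].get(start, 0)


# Hypothetical CNF grammar (S -> VP has been flattened into the S rules).
lexical = {"Book": {"V"}, "the": {"D"}, "flight": {"N"},
           "through": {"P"}, "Houston": {"PN"}}
binary = {
    ("D", "N"): {"NP"},
    ("N", "PP"): {"N"},
    ("P", "PN"): {"PP"},
    ("V", "NP"): {"VP", "S"},
    ("VP", "PP"): {"VP", "S"},
}

print(count_parses("Book the flight through Houston".split(), lexical, binary))  # 2
```

The two parses correspond to attaching the PP to the verb phrase or to the noun. Storing back-pointers instead of counts would let the trees themselves be read off the table.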
Limitations of classical CKY: Solutions Probabilistic parsing: train a probabilistic grammar and then return the most probable parse. Modify CKY to be able to recover the original grammar. Employ e.g. partial parsing to accommodate a CFG directly (not in CNF). 101 / 101