Driving Semantic Parsing from the World's Response
Driving Semantic Parsing from the World's Response
James Clarke, Dan Goldwasser, Ming-Wei Chang, Dan Roth
Cognitive Computation Group, University of Illinois at Urbana-Champaign
CoNLL 2010
What is Semantic Parsing?
"I'd like a coffee with no sugar and just a little milk" → Meaning representation: make(coffee, sugar=0, milk=0.3)
Supervised Learning Problem
Training data of (text, meaning) pairs → training algorithm → model.
Challenges: this is a structured prediction problem. Should part of the structure be modeled as hidden?
Lots of previous work
Multiple approaches to the problem:
- KRISP (Kate & Mooney 2006): SVM-based parser using string kernels.
- Zettlemoyer & Collins 2005, 2007: probabilistic parser based on relaxed CCG grammars.
- WASP (Wong & Mooney 2006, 2007): based on synchronous CFGs.
- Ge & Mooney 2009: integrated syntactic and semantic parser.
Assumption: a training set consisting of natural language and meaning representation pairs.
Using the World's response
"I'd like a coffee with no sugar and just a little milk" → make(coffee, sugar=0, milk=0.3) → the World responds: Good! or Bad!
Question: Can we use feedback based on the response to provide supervision?
This work
We aim to reduce the burden of annotation for semantic parsing. We focus on:
- Using the World's response to learn a semantic parser.
- Developing new training algorithms to support this learning paradigm.
- A lightweight semantic parsing model that doesn't require annotated data.
This results in learning a semantic parser using zero annotated meaning representations.
Outline
1 Semantic Parsing
2 Learning: DIRECT approach, AGGRESSIVE approach
3 Semantic Parsing Model
4 Experiments
Semantic Parsing
INPUT x: What is the largest state that borders Texas?
HIDDEN y: the intermediate derivation connecting x and z
OUTPUT z: largest(state(next_to(texas)))
Response r: New Mexico

F : X → Z, where ẑ = F_w(x) = argmax_{y ∈ Y, z ∈ Z} wᵀ Φ(x, y, z)

Model: the nature of inference and feature functions. Learning strategy: how we obtain the weights.
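To make the formulation concrete, here is a minimal sketch of the inference step, under the simplifying assumption that candidate (y, z) pairs can be enumerated; the actual model searches this space with a typed inference procedure rather than by brute force, and `phi` and `candidates` are stand-ins for the paper's feature function and candidate generator.

```python
import numpy as np

def infer(x, candidates, phi, w):
    """Return the (y, z) pair maximizing w . phi(x, y, z).

    candidates: iterable of (y, z) pairs for sentence x (assumed enumerable here).
    phi: feature function mapping (x, y, z) to a numpy vector.
    w: current weight vector.
    """
    best, best_score = None, float("-inf")
    for y, z in candidates:
        score = float(np.dot(w, phi(x, y, z)))
        if score > best_score:
            best, best_score = (y, z), score
    return best
```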
Learning
Inputs: natural language sentences; a feedback function Feedback : X × Z → {+1, −1}; zero meaning representations.

Feedback(x, z) = +1 if execute(z) = r, and −1 otherwise.

Goal: a weight vector that scores the correct meaning representation higher than all other meaning representations.
Response-driven learning: input text → predict a meaning representation → apply it to the World → receive feedback.
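A minimal sketch of the binary feedback signal, assuming a hypothetical `execute` function that runs a meaning representation against the domain (for GEOQUERY, a geographical database):

```python
def feedback(z, r, execute):
    """+1 if executing meaning representation z yields the expected response r."""
    return +1 if execute(z) == r else -1
```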
Learning Strategies
repeat
    for all input sentences x_i do
        solve the inference problem: y_i, z_i = argmax_{y, z} wᵀ Φ(x_i, y, z)
        query the Feedback function to label the prediction +1 or −1
    end for
    learn a new w using the feedback
until convergence
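Putting the pieces together, a sketch of the generic response-driven training loop, reusing the `infer` and `feedback` sketches above; `learn` stands in for whichever strategy (DIRECT or AGGRESSIVE) produces the next weight vector, and convergence is checked naively via label stability, which is an assumption rather than the paper's criterion.

```python
def response_driven_training(sentences, responses, candidates, phi, w, execute, learn):
    """Generic loop: predict, query feedback, retrain, repeat."""
    prev_labels = None
    while True:
        labeled = []
        for x, r in zip(sentences, responses):
            y, z = infer(x, candidates(x), phi, w)   # solve the inference problem
            f = feedback(z, r, execute)              # query the feedback function
            labeled.append((x, y, z, f))
        labels = [f for _, _, _, f in labeled]
        if labels == prev_labels:                    # naive convergence check (assumed)
            return w
        prev_labels = labels
        w = learn(labeled, phi, w)                   # learn a new w using feedback
```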
DIRECT Approach: Binary Learning
Input text → predict a meaning representation → apply it to the World → receive feedback.
DIRECT: learn a binary classifier to discriminate between good and bad meaning representations. Use each (x, y, z) as a training example with its label f taken from the feedback. Find w such that f · wᵀ Φ(x, y, z) > 0.
Each point is represented by Φ(x, y, z), normalized by the length of x. [Figure: feedback-labeled points in feature space, separated by the learned weight vector w.]
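A sketch of one DIRECT-style training pass, assuming a simple perceptron in place of whatever binary learner the authors actually used; each feedback-labeled triple becomes a binary example with features Φ(x, y, z) normalized by sentence length (here approximated by whitespace tokenization, an assumption).

```python
import numpy as np

def direct_learn(labeled, phi, w, epochs=10):
    """Train a binary separator so that f * (w . phi) > 0 for each example."""
    for _ in range(epochs):
        for x, y, z, f in labeled:
            v = phi(x, y, z) / max(len(x.split()), 1)  # normalize by sentence length
            if f * np.dot(w, v) <= 0:                  # misclassified: perceptron update
                w = w + f * v
    return w
```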
The DIRECT learner runs the generic loop: for each sentence, solve the inference problem (y, z = argmax wᵀ Φ(x, y, z)), query the feedback function, learn a new w from the newly labeled (x, y, z) examples, and repeat until convergence. [Figure: the separating weight vector w is refined over successive iterations.]
AGGRESSIVE Approach: Structured Learning
Input text → predict a meaning representation → apply it to the World → receive feedback.
AGGRESSIVE: positive feedback is a good indicator of the correct meaning representation, so use the items with positive feedback as training data for a structured learner. Implicitly consider all other meaning representations for these examples as bad: find w such that wᵀ Φ(x, y⁺, z⁺) > wᵀ Φ(x, y, z) for every competing (y, z).
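A sketch of the AGGRESSIVE update under the same perceptron assumption: only positively labeled triples are kept, and each is pushed above the current argmax over competing structures. The perceptron here just illustrates the constraint; the paper's actual structured learner may differ.

```python
def aggressive_learn(labeled, phi, w, candidates, epochs=10):
    """Structured updates from positive-feedback examples only."""
    positives = [(x, y, z) for x, y, z, f in labeled if f == +1]
    for _ in range(epochs):
        for x, y_good, z_good in positives:
            y_hat, z_hat = infer(x, candidates(x), phi, w)  # current best competitor
            if (y_hat, z_hat) != (y_good, z_good):
                # enforce w . phi(x, y_good, z_good) > w . phi(x, y_hat, z_hat)
                w = w + phi(x, y_good, z_good) - phi(x, y_hat, z_hat)
    return w
```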
The AGGRESSIVE learner runs the same loop: solve the inference problem for each sentence, query the feedback function, keep only the examples that received positive feedback, train the structured learner on them, and repeat until convergence.
Summary of Learning Strategies
DIRECT uses both positive and negative feedback as examples to train a binary classifier. AGGRESSIVE adapts the feedback signal and uses only positive feedback to train a structured predictor.
Model
ẑ = F_w(x) = argmax_{y ∈ Y, z ∈ Z} wᵀ Φ(x, y, z)
First-order decisions: map lexical items, e.g. "largest" → largest.
Second-order decisions: composition, e.g. next_to(state(·)) vs. state(next_to(·)).
The inference procedure leverages the typing information of the domain.
First-order Decisions
"How many people live in the state of Texas?" Goal: population(state(texas)). Each word is mapped to a candidate predicate or constant: loc, texas, next_to, state, population, or null.
A simple lexicon bootstraps the process, e.g. Texas → texas, state → state, population → population, in → loc, and next / borders / adjacent → next_to.
Lexical resources help us move beyond the lexicon, e.g. wordnet_sim(people, population). Context helps disambiguate between choices.
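A sketch of first-order candidate generation, with a hypothetical bootstrap lexicon for GEOQUERY and a stand-in `similarity` function where the paper uses WordNet-based similarity; the threshold and fallback to null are illustrative assumptions.

```python
# Hypothetical bootstrap lexicon: surface words -> domain predicates/constants.
LEXICON = {
    "texas": "texas", "state": "state", "population": "population",
    "in": "loc", "next": "next_to", "borders": "next_to", "adjacent": "next_to",
}

def first_order_candidates(word, predicates, similarity, threshold=0.5):
    """Candidate predicates for one word: exact lexicon hits, then similar ones."""
    if word in LEXICON:
        return [LEXICON[word]]
    # Fall back on lexical similarity (WordNet in the paper; any scorer here).
    scored = [(p, similarity(word, p)) for p in predicates]
    return [p for p, s in scored if s >= threshold] or ["null"]
```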
Second-order Decisions
How do we compose the predicates and constants? This is domain dependent: encode the typing information inherent in the domain into the inference procedure, e.g. population(state(·)) vs. state(population(·)).
Features: dependency path distance, word position distance, and predicate bigrams, e.g. next_to(state(·)) vs. state(next_to(·)).
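A sketch of the three second-order feature families named on the slide; the position indices and dependency distance are placeholders for whatever parser and indexing the authors used.

```python
def second_order_features(p1, p2, pos1, pos2, dep_distance):
    """Features for composing predicate p1 over p2, e.g. state(next_to(.))."""
    return {
        f"bigram:{p1}>{p2}": 1.0,                    # predicate bigram
        "word_position_distance": abs(pos1 - pos2),  # distance between trigger words
        "dependency_path_distance": dep_distance,    # distance in the dependency tree
    }
```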
Evaluation
Domain: GEOQUERY, U.S. geographical questions.
Response 250: 250 (x, r) pairs, with zero meaning representations. Query 250: 250 sentences x.
Evaluation metric: accuracy, the percentage of meaning representations that return the correct answer.
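A sketch of this response-level accuracy metric, again assuming a hypothetical `execute` against the domain database:

```python
def response_accuracy(predictions, responses, execute):
    """Fraction of predicted meaning representations returning the correct answer."""
    correct = sum(execute(z) == r for z, r in zip(predictions, responses))
    return correct / len(responses)
```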
Learning Behavior
Accuracy on R250 and Q250 for NOLEARN (22.2), DIRECT, AGGRESSIVE, and SUPERVISED. NOLEARN is used to initialize both learning approaches.
Q: How good is our model when trained in a fully supervised manner? A: 80% on test data. Other supervised methods range from 60% to 85% accuracy.
Q: Is it possible to learn without any meaning representations? A: Yes! The learners cover more and more of the Response data set, ending only 7% below the SUPERVISED upper bound.
Learning Behavior
[Figure: accuracy on the Response set over learning iterations, from initialization, for AGGRESSIVE and DIRECT.]
AGGRESSIVE correctly interprets 16% of the sentences that DIRECT does not, and 9% vice versa, leaving only 9% incorrect.
Learning from Indirect Supervision
This is similar to other indirect learning protocols:
- Learning a binary classifier with hidden explanation: supervision is only required for the binary signal, with no labeled structures (Chang, Goldwasser, Roth, Srikumar, NAACL 2010).
- Structured learning with binary and structured labels: a mix of supervision over binary and structured data, where the binary label indicates whether the input has a good structure (Chang, Goldwasser, Roth, Srikumar, ICML 2010).
Conclusions
Contributions:
- Response-driven learning: a new learning paradigm that doesn't rely on annotated meaning representations. Supervision is given at the response level, a natural supervision signal.
- Two learning algorithms capable of working within response-driven learning.
- A shallow semantic parsing model.
Future work: Can we combine the two learning algorithms? Other semantic parsing domains? Response-driven learning for other tasks?
More information