What's so Hard about Natural Language Understanding?


1 What's so Hard about Natural Language Understanding? Alan Ritter, Computer Science and Engineering, The Ohio State University. Collaborators: Jiwei Li, Dan Jurafsky (Stanford), Bill Dolan, Michel Galley, Jianfeng Gao (MSR), Colin Cherry (Google), Jeniya Tabassum (Ohio State), Alexander Konovalov (Ohio State), Wei Xu (Ohio State), Brendan O'Connor (UMass)



6 Q: Why are we so good at Speech, MT (but bad at NLU)? People naturally translate and transcribe. Q: Large, End-to-End Datasets for NLU? Web-scale Conversations? Web-scale Structured Data?


8 Data-Driven Conversation. Twitter: ~500 Million Public SMS-Style Conversations per Month. Goal: Learn conversational agents directly from massive volumes of data.


14 [Ritter, Cherry, Dolan EMNLP 2011] Noisy Channel Model. Input: Who wants to come over for dinner tomorrow? Output: Yum! I want to be there tomorrow!
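The noisy channel idea named on this slide can be illustrated with a toy scorer. This is a minimal sketch, not the phrase-based system of Ritter, Cherry, Dolan (EMNLP 2011): it simply picks the reply r that maximizes P(status | r) · P(r) over a hand-written candidate list. The function name, the candidate list, and every probability below are hypothetical values chosen for illustration.

```python
import math

def noisy_channel_decode(status, candidates, channel_prob, reply_prior):
    """Return the candidate reply maximizing log P(status | reply) + log P(reply)."""
    def score(reply):
        # Bayes decomposition: P(reply | status) is proportional to
        # P(status | reply) * P(reply)
        return math.log(channel_prob[(status, reply)]) + math.log(reply_prior[reply])
    return max(candidates, key=score)

STATUS = "Who wants to come over for dinner tomorrow?"
CANDIDATES = ["Yum! I want to be there tomorrow!", "i don't know"]

# Hypothetical channel model P(status | reply): how well the reply explains the status.
CHANNEL = {
    (STATUS, CANDIDATES[0]): 0.02,
    (STATUS, CANDIDATES[1]): 0.001,
}
# Hypothetical reply prior P(reply): generic replies are more frequent overall.
PRIOR = {CANDIDATES[0]: 0.01, CANDIDATES[1]: 0.05}

best_reply = noisy_channel_decode(STATUS, CANDIDATES, CHANNEL, PRIOR)

if __name__ == "__main__":
    print(best_reply)
```

In this toy setting the generic reply has the higher prior, but the channel term outweighs it, so the on-topic reply wins.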

15 Neural Conversation [Sordoni et al. 2015] [Xu et al. 2016] [Wen et al. 2016] [Li et al. 2016] [Kannan et al. 2016] [Serban et al. 2016]


24 How old are you? (Slide Credit: Jiwei Li)
Bad Action: i'm ?
Outcome: i don't know what you're talking about / you don't know what you're saying / i don't know what you're talking about

25 Deep Reinforcement Learning [Li, Monroe, Ritter, Galley, Gao, Jurafsky EMNLP 2016] State: How old are you? Encoding: how old are you

26 Deep Reinforcement Learning [Li, Monroe, Ritter, Galley, Gao, Jurafsky EMNLP 2016] How old are you? Action: i'm 16. Encoding: how old are you EOS. Decoding: I'm 16. EOS

27 Learning: Policy Gradient, REINFORCE Algorithm (Williams, 1992). What we want to learn: the encoding/decoding network that maps the input to an action. How old are you? Action: i'm 16. Encoding: how old are you EOS. Decoding: I'm 16. EOS
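The policy gradient update behind REINFORCE can be sketched on a much smaller problem than a neural dialogue model. The example below is an assumption-laden toy, not the setup from the cited paper: the "policy" is a softmax over a fixed list of candidate actions, each action has a fixed hypothetical reward, and a running-average baseline is used to reduce variance. The names `reinforce` and `softmax` and all hyperparameters are illustrative.

```python
import math
import random

def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce(rewards, steps=2000, lr=0.1, seed=0):
    """Toy REINFORCE: learn a softmax policy over len(rewards) actions."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)
    baseline = 0.0  # running average of observed rewards
    for _ in range(steps):
        probs = softmax(logits)
        # Sample an action from the current policy.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = rewards[action]
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # Gradient of log pi(action) w.r.t. logit k is 1[k == action] - probs[k].
        for k in range(len(logits)):
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
    return softmax(logits)

if __name__ == "__main__":
    # Hypothetical rewards: only the first candidate action is rewarded.
    print(reinforce([1.0, 0.0, 0.0]))
```

In the dialogue setting the action would be a whole generated response and the reward would come from a separate scoring function; here both are collapsed into a three-armed bandit so the update rule is easy to follow.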

30 Q: Rewards? A: Turing Test. Adversarial Learning (Goodfellow et al., 2014)

33 Adversarial Learning for Neural Dialogue [Li, Monroe, Shi, Jean, Ritter, Jurafsky EMNLP 2016] (Alternate Between Training Generator and Discriminator). Real-world conversations: sample human response. Response Generator: generate response. Discriminator: Real or Fake? Training signal: REINFORCE Algorithm (Williams, 1992)

34 Adversarial Learning Improves Response Generation (Slide Credit: Jiwei Li)
Human Evaluator (vs vanilla generation model): Adversarial Win 62%, Adversarial Lose 18%, Tie 20%
Machine Evaluator [Bowman et al. 2016], Adversarial Success (how often can you fool a machine): Adversarial Learning 8.0%, Standard Seq2Seq model 4.9%

38 Q: Why are we so good at Speech, MT (but bad at NLU)? People naturally translate and transcribe. Q: Large, End-to-End Datasets for NLU? Web-scale Conversations? (Generates fluent open-domain replies. Really Natural Language Understanding?) Web-scale Structured Data?

39 Learning from Distant Supervision [Mintz et al. 2009]
1) Named Entity Recognition. Challenge: highly ambiguous labels [Ritter et al. EMNLP 2011]
2) Relation Extraction. Challenge: missing data [Ritter et al. TACL 2013]
3) Time Normalization. Challenge: diversity in noisy text [Tabassum, Ritter, Xu EMNLP 2016]
4) Event Extraction. Challenge: lack of negative examples [Ritter et al. WWW 2015] [Konovalov et al. WWW 2017]
$$O(\theta) = \underbrace{\sum_{i}^{N} \log p_\theta(y_i \mid x_i)}_{\text{Log Likelihood}} - \underbrace{\lambda^{U} D\left(\tilde{p} \,\|\, \hat{p}_\theta^{\text{unlabeled}}\right)}_{\text{Label regularization}}$$
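The objective on this slide combines a log likelihood over labeled examples with a label regularization term over unlabeled data. A minimal sketch of how such an objective could be evaluated for a binary classifier follows. It assumes that D is a KL divergence, that the target distribution is a user-specified expected rate of positive labels, and that the model-side distribution is the average predicted positive probability over the unlabeled examples. Those choices, and all function and parameter names, are assumptions for illustration rather than details stated on the slide.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def regularized_objective(labeled_probs, unlabeled_pos_probs, target_pos_rate, lam):
    """Log likelihood minus lam * KL(target || model expectation on unlabeled data).

    labeled_probs: model probability of the gold label for each labeled example.
    unlabeled_pos_probs: model probability of the positive class per unlabeled example.
    target_pos_rate: assumed prior belief about the fraction of positives.
    """
    log_likelihood = sum(math.log(p) for p in labeled_probs)
    mean_pos = sum(unlabeled_pos_probs) / len(unlabeled_pos_probs)
    p_target = [target_pos_rate, 1.0 - target_pos_rate]
    p_model = [mean_pos, 1.0 - mean_pos]
    return log_likelihood - lam * kl_divergence(p_target, p_model)

if __name__ == "__main__":
    # Model expectation matches the target: no penalty is applied.
    print(regularized_objective([0.5, 0.5], [0.1, 0.1], 0.1, lam=1.0))
    # Model labels most unlabeled data as positive: the penalty lowers the objective.
    print(regularized_objective([0.5, 0.5], [0.9, 0.9], 0.1, lam=1.0))
```

The regularizer gives the model a signal on unlabeled data, which is one way to compensate for the lack of negative examples listed under item 4.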


42 Time Normalization [Tabassum, Ritter, Xu EMNLP 2016] Distant Supervision (no human labels or rules!) State-of-the-art time resolvers: {TempEX, HeidelTime, SUTime, UWTime} [Figure label: 1 Jan 2016]

52 Distant Supervision Assumption [Figure: event database entry "Mercury Transit, May 9, 2016" shown with date labels May 9 and May 10]

56 Multiple Instance Learning Tagger [Hoffmann et al. 2011] Event Database entry: [Mercury, 5/9/2016]. Words: $w_1, w_2, w_3, \ldots, w_n$. Word Level Tags: $z_1, z_2, z_3, \ldots, z_n$, scored by a Local Classifier $\exp(\theta \cdot f(w_i, z_i))$. Sentence Level Tags: $t_1, t_2, t_3, t_4$ (values such as Mon, Sun, Past, Present, Future), linked to the word level tags by a Deterministic OR. Maximize Conditional Likelihood: $\sum_{z} P(z, t \mid w; \theta)$
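The deterministic OR and the marginal likelihood on this slide can be sketched by brute force on a very short sentence. The code below is a toy under stated assumptions: each word's tag distribution is given directly as a normalized dictionary (standing in for the local classifier), word tags are treated as independent given the words, and the sum over z is done by enumeration, which is only feasible for tiny inputs. The function names and the probabilities are hypothetical.

```python
from itertools import product

def deterministic_or(word_tags, tag_values):
    """Sentence-level tag v is on iff at least one word-level tag equals v."""
    return {v: any(z == v for z in word_tags) for v in tag_values}

def marginal_likelihood(word_probs, observed, tag_values):
    """Sum P(z | w) over all word-tag assignments z whose OR equals `observed`.

    word_probs: one dict per word mapping each tag (including "NA") to a probability.
    observed: dict mapping each sentence-level tag value to True or False.
    """
    total = 0.0
    label_sets = [list(p.keys()) for p in word_probs]
    for z in product(*label_sets):
        # Assignments inconsistent with the sentence-level tags contribute zero.
        if deterministic_or(z, tag_values) != observed:
            continue
        prob = 1.0
        for p, zi in zip(word_probs, z):
            prob *= p[zi]
        total += prob
    return total

# Hypothetical per-word tag distributions for a two-word sentence.
WORD_PROBS = [
    {"NA": 0.9, "Future": 0.1},
    {"NA": 0.2, "Future": 0.8},
]

if __name__ == "__main__":
    print(deterministic_or(["NA", "NA", "Future", "NA", "Future"],
                           ["Past", "Present", "Future"]))
    print(marginal_likelihood(WORD_PROBS, {"Future": True}, ["Future"]))
```

Training would adjust the classifier parameters to increase this marginal for the sentence-level tags supplied by the event database. A practical implementation would replace the enumeration with a more efficient inference procedure.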

57 Missing Data Problem. Sentence Level Tags: TL=Future, MOY=May, DOM=9, DOW=Mon

58 Missing Data Extension. Words: $w_1, w_2, w_3, \ldots, w_n$. Word Level Tags: $z_1, z_2, z_3, \ldots, z_n$. Aggregated Sentence Level Tags: $t_1, t_2, t_3, t_4$ [Event Database]

62 Missing Data Extension: Missing Data Problem in Distant Supervision [Ritter et al. TACL 2013]. Words: $w_1, w_2, w_3, \ldots, w_n$. Word Level Tags: $z_1, z_2, z_3, \ldots, z_n$. Mentioned in Text: $t'_1, t'_2, t'_3, t'_4$. Implied by Event Date: $m_1, m_2, m_3, m_4$ [Event Database]. Encourage Agreement between the two.

63 Example Tags
Word: Im     Hella  excited  for        tomorrow
Tag:  NA     NA     Future   NA         Future

Word: Thnks  for    a        Christmas  party  on  fri
Tag:  NA     NA     NA       December   NA     NA  Friday

65 Evaluation: 17% increase in F-score over SUTime

71 Where can we find NLU? Follow the data! Opportunistically Gathered Data: Twitter Events (Time Normalization), Billions of Internet Conversations. Design Models for the Data (rather than the other way around). Thank You!
