What's so Hard about Natural Language Understanding?
Alan Ritter, Computer Science and Engineering, The Ohio State University
Collaborators: Jiwei Li, Dan Jurafsky (Stanford); Bill Dolan, Michel Galley, Jianfeng Gao (MSR); Colin Cherry (Google); Jeniya Tabassum, Alexander Konovalov, Wei Xu (Ohio State); Brendan O'Connor (UMass)
Q: Why are we so good at Speech and MT (but bad at NLU)?
A: People naturally translate and transcribe.
Q: Are there large, end-to-end datasets for NLU? Web-scale conversations? Web-scale structured data?
Data-Driven Conversation
Twitter: ~500 million public SMS-style conversations per month
Goal: Learn conversational agents directly from massive volumes of data.
Noisy Channel Model [Ritter, Cherry, Dolan EMNLP 2011]
Input: Who wants to come over for dinner tomorrow?
Output: Yum! I want to be there tomorrow!
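The noisy-channel idea above can be sketched in a few lines: candidate responses are ranked by p(response) · p(stimulus | response), a fluency prior times a channel model linking stimulus and response. The probabilities below are toy, hand-set values, not the phrase-based translation model from the paper.

```python
import math

# Minimal noisy-channel ranking sketch (toy probabilities, NOT the
# phrase-based model from the paper). Candidate responses are ranked by
# p(response) * p(stimulus | response).

def score(response, prior, channel, stimulus):
    # log p(response) + log p(stimulus | response)
    return math.log(prior[response]) + math.log(channel[(stimulus, response)])

stimulus = "who wants to come over for dinner tomorrow?"
candidates = ["yum! i want to be there tomorrow!", "no idea."]

# toy, hand-set probabilities
prior = {"yum! i want to be there tomorrow!": 0.02, "no idea.": 0.10}
channel = {(stimulus, "yum! i want to be there tomorrow!"): 0.30,
           (stimulus, "no idea."): 0.01}

best = max(candidates, key=lambda r: score(r, prior, channel, stimulus))
```

The generic reply wins on the prior alone; the channel term is what pulls the ranking toward a response that is actually about the stimulus.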
Neural Conversation
[Sordoni et al. 2015] [Xu et al. 2016] [Wen et al. 2016] [Li et al. 2016] [Kannan et al. 2016] [Serban et al. 2016]
How old are you?
i 'm 16.  (Bad Action)
16?
i don 't know what you 're talking about  (Outcome)
you don 't know what you 're saying
i don 't know what you 're talking about
Slide Credit: Jiwei Li
Deep Reinforcement Learning [Li, Monroe, Ritter, Galley, Gao, Jurafsky EMNLP 2016]
State: "how old are you" (encoding)
Action: "i 'm 16. EOS" (decoding)
Learning: Policy Gradient — REINFORCE Algorithm (Williams, 1992)
What we want to learn: the policy that maps an encoded state ("how old are you") to a decoded action ("i 'm 16. EOS").
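As a sketch of how REINFORCE works, here it is on a toy two-action problem: sample an action from a softmax policy, then move each logit by reward times the score-function gradient of log π(a). The bandit setup, rewards, and learning rate are illustrative stand-ins for the dialogue setting, where the "action" is a generated reply.

```python
import math
import random

# REINFORCE (Williams, 1992) on a toy two-action problem.
random.seed(0)
theta = [0.0, 0.0]    # one logit per action
rewards = [0.0, 1.0]  # action 1 is the rewarded one
lr = 0.5

def policy(theta):
    z = [math.exp(t) for t in theta]
    s = sum(z)
    return [p / s for p in z]

for _ in range(200):
    probs = policy(theta)
    a = 0 if random.random() < probs[0] else 1  # sample an action
    r = rewards[a]
    for i in range(2):
        # gradient of log pi(a) w.r.t. logit i: 1[i == a] - pi(i)
        theta[i] += lr * r * ((1.0 if i == a else 0.0) - probs[i])

final = policy(theta)  # the learned policy strongly prefers action 1
```

The same update trains the seq2seq policy: the sampled "action" is a whole response, and r is the conversation-level reward.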
Q: Rewards?
A: Turing Test → Adversarial Learning (Goodfellow et al., 2014)
Adversarial Learning for Neural Dialogue [Li, Monroe, Shi, Jean, Ritter, Jurafsky EMNLP 2016]
(Alternate between training the Generator and the Discriminator)
Real-world conversations → sample human response → Discriminator: real or fake?
Response Generator → generate response → Discriminator; the generator is trained with the REINFORCE algorithm (Williams, 1992).
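The alternating loop can be sketched in miniature: a generator (softmax over two canned replies) and a discriminator (one "human-ness" logit per reply) are updated in turn, and the generator's REINFORCE reward is the discriminator's probability that its sampled reply is human. All names and numbers below are toy stand-ins for the neural models in the paper.

```python
import math
import random

# Adversarial dialogue training in miniature (toy setup).
random.seed(1)
replies = ["i don't know", "yum! see you tomorrow!"]
human_reply = 1          # index of the reply humans actually give
g_theta = [0.0, 0.0]     # generator logits
d_theta = [0.0, 0.0]     # discriminator logit per reply

def softmax(t):
    z = [math.exp(x) for x in t]
    s = sum(z)
    return [x / s for x in z]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

for _ in range(300):
    # Discriminator step: push the human reply up, a sampled fake reply down.
    probs = softmax(g_theta)
    fake = 0 if random.random() < probs[0] else 1
    d_theta[human_reply] += 0.1 * (1.0 - sigmoid(d_theta[human_reply]))
    d_theta[fake] -= 0.1 * sigmoid(d_theta[fake])
    # Generator step: REINFORCE with reward = D's "human" probability.
    probs = softmax(g_theta)
    a = 0 if random.random() < probs[0] else 1
    reward = sigmoid(d_theta[a])
    for i in range(2):
        g_theta[i] += 0.1 * reward * ((1.0 if i == a else 0.0) - probs[i])

g_final = softmax(g_theta)  # the generator drifts toward the human reply
```

Because the generator's output is a discrete sample, the discriminator's score cannot be backpropagated directly; REINFORCE is what lets it act as a reward instead.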
Adversarial Learning Improves Response Generation
Human Evaluator (vs. vanilla generation model): Adversarial Win 62%, Adversarial Lose 18%, Tie 20%
Machine Evaluator — Adversarial Success (how often can you fool a machine) [Bowman et al. 2016]: Adversarial Learning 8.0%, Standard Seq2Seq model 4.9%
Slide Credit: Jiwei Li
Q: Why are we so good at Speech and MT (but bad at NLU)?
A: People naturally translate and transcribe.
Q: Are there large, end-to-end datasets for NLU?
Web-scale conversations? → Generates fluent open-domain replies. But is this really natural language understanding?
Web-scale structured data?
Learning from Distant Supervision [Mintz et al. 2009]
1) Named Entity Recognition — Challenge: highly ambiguous labels [Ritter et al. EMNLP 2011]
2) Relation Extraction — Challenge: missing data [Ritter et al. TACL 2013]
3) Time Normalization — Challenge: diversity in noisy text [Tabassum, Ritter, Xu EMNLP 2016]
4) Event Extraction — Challenge: lack of negative examples [Ritter et al. WWW 2015] [Konovalov et al. WWW 2017]

O(θ) = Σᵢ₌₁ᴺ log p_θ(yᵢ | xᵢ)  [log likelihood]  −  λ_U · D(p̄_θ ‖ p̂_unlabeled)  [label regularization]
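The objective above can be computed directly once its two pieces are in hand: supervised log-likelihood on labeled examples, minus a label-regularization term, the KL divergence between the model's average predicted label distribution on unlabeled data and a target prior. The weight name `lambda_u` and all numbers below are illustrative.

```python
import math

def kl(p, q):
    # KL divergence D(p || q) for discrete distributions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# model probability assigned to the gold label of each labeled example
p_gold = [0.9, 0.7, 0.8]
log_likelihood = sum(math.log(p) for p in p_gold)

p_bar = [0.6, 0.4]  # model's average label distribution on unlabeled data
p_hat = [0.3, 0.7]  # target prior over the two labels
lambda_u = 0.5      # regularization strength (assumed name)

objective = log_likelihood - lambda_u * kl(p_bar, p_hat)
```

The penalty vanishes only when the model's aggregate predictions on unlabeled data match the expected label proportions, which is what keeps highly ambiguous distant labels from collapsing the classifier.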
Time Normalization via Distant Supervision (no human labels or rules!) [Tabassum, Ritter, Xu EMNLP 2016]
State-of-the-art time resolvers: TempEx, HeidelTime, SUTime, UWTime → normalized date, e.g. 1 Jan 2016
Distant Supervision Assumption
Mercury Transit, May 9, 2016
[Figure: timeline of tweets around the event date — 8 May, 9 May, 10 May]
Multiple Instance Learning Tagger [Hoffmann et al. 2011]
Event database entry: (Mercury, 5/9/2016)
Words: w₁, w₂, w₃, …, wₙ
Word-level tags: z₁, z₂, z₃, …, zₙ — local classifier: exp(θ · f(wᵢ, zᵢ))
Deterministic OR aggregates word-level tags into sentence-level tags t₁…t₄: day of week (Mon–Sun), day of month (1–31), month (1–12), tense (Past/Present/Future)
Maximize conditional likelihood: Σ_z P(z, t | w, θ)
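The deterministic-OR aggregation is simple to state in code: a sentence-level tag fires exactly when at least one word-level tag takes that value. The words and latent tags below are made-up examples, not model output.

```python
# Deterministic-OR aggregation: sentence-level tag set is the union of the
# non-NA word-level tags. Example words and tags are invented.
words = ["Mercury", "transit", "on", "fri", "in", "May"]
z = ["NA", "NA", "NA", "Friday", "NA", "May"]  # latent word-level tags

def deterministic_or(word_tags):
    # a sentence-level tag is on iff some word-level tag takes that value
    return {t for t in word_tags if t != "NA"}

sentence_tags = deterministic_or(z)
```

This is what lets sentence-level supervision from the event database constrain the word-level tagger without saying which word carries each tag.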
Missing Data Problem
Sentence-level tags: TL = Future, MOY = May, DOM = 9, DOW = Mon
Missing Data Extension — the Missing Data Problem in Distant Supervision [Ritter et al. TACL 2013]
Words w₁…wₙ with word-level tags z₁…zₙ
Sentence-level tags t′₁…t′₄: mentioned in text
Tags m₁…m₄: implied by the event date [Event Database]
The model encourages agreement between mentioned and implied tags, without forcing every database fact to appear in the text.
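One way to picture the missing-data extension: tags implied by the database event date and tags actually mentioned in text are linked softly, so an unmentioned database fact is treated as missing rather than forced onto some word, while an outright contradiction is penalized hard. The field names and penalty values below are illustrative, not the paper's parameters.

```python
# Soft agreement between database-implied tags (m) and text-mentioned tags
# (t'). Field names and penalties are illustrative.
implied = {"DOW": "Mon", "DOM": "9", "MOY": "May"}  # from the event date
mentioned = {"MOY": "May"}                           # extracted from text

def agreement_score(implied, mentioned, miss_penalty=1.0, clash_penalty=5.0):
    score = 0.0
    for field, m_val in implied.items():
        if field not in mentioned:
            score -= miss_penalty    # unmentioned: mild penalty (missing data)
        elif mentioned[field] == m_val:
            score += 1.0             # agreement is rewarded
        else:
            score -= clash_penalty   # contradiction is penalized hard
    return score

s = agreement_score(implied, mentioned)
```

A tweet that mentions only the month still scores reasonably; one that asserts a conflicting date does not.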
Example Tags
Word: Im | Hella | excited | for | tomorrow
Tag:  NA | NA    | Future  | NA  | Future

Word: Thnks | for | a  | Christmas | party | on | fri
Tag:  NA    | NA  | NA | December  | NA    | NA | Friday
Evaluation: 17% increase in F-score over SUTime
Where can we find NLU? Follow the data!
Opportunistically gathered data: Twitter events (time normalization); billions of Internet conversations
Design models for the data (rather than the other way around)
Thank You!