Attention Based Joint Model with Negative Sampling for New Slot Values Recognition. By: Mulan Hou


1 Attention Based Joint Model with Negative Sampling for New Slot Values Recognition By: Mulan Hou

2 CONTENTS Introduction Related work Motivation Proposed model Experiments Conclusion

3 CHAPTER 1 Introduction

4 Introduction [Diagram: a task-oriented dialogue system. User input -> Natural Language Understanding -> Dialogue Manager -> Natural Language Generation -> System output.]

6 Introduction [Diagram: the input side only. User input -> Natural Language Understanding -> Dialogue Manager.]

7 Introduction [Diagram: User input -> Natural Language Understanding -> slot-value pair -> Dialogue Manager.]

8 Introduction E.g. the NLU maps the user input "Can I have a video chat on my phone?" to the slot-value pair Function = FaceTime and passes it to the DM.

9 Introduction [Diagram: the NLU produces a NEW slot-value; the Dialogue Manager queries a database that stores the standard values of each slot (SLOT: Standard Value0, Standard Value1, ...).]

10 Introduction E.g. for "Can I scroll the screen to have a screenshot on my phone?" the NLU outputs Function = Smart Screenshot, a value that is not in the predefined set {FaceTime, WeChat}.

11 Introduction Existing approaches: sequence labeling extracts raw text from utterances, while classification maps utterances to standard slot values; either way, new values must be distinguished from the standard ones stored in the database. Problem: recognition of new slot values suffers from the lack of training data. Our answer: an attention based joint model with negative sampling.

12 CHAPTER 2 Related work: Sequence labeling, Pipeline based, Classification based

13 Related work Sequence labeling: each word gets a label, e.g. "Can I have a video chat on my phone?" is tagged O O B I I I O O O O. Pipeline based methods: (1) extract the raw text from the utterance ("have a video chat"), then (2) normalize the text into a standard slot value (FaceTime). Classification based methods: map the utterance "Can I have a video chat on my phone?" directly to FaceTime.
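
To make the pipeline's first step concrete, here is a minimal sketch (my own illustration, not from the slides; the function name is hypothetical) that extracts value spans from a BIO tag sequence:

```python
def extract_spans(words, tags):
    """Collect the word spans marked by B/I tags (pipeline step 1)."""
    spans, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":                    # a new value span starts
            if current:
                spans.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:      # the open span continues
            current.append(word)
        else:                             # an O tag closes any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

words = "Can I have a video chat on my phone ?".split()
tags = ["O", "O", "B", "I", "I", "I", "O", "O", "O", "O"]
print(extract_spans(words, tags))  # ['have a video chat']
```

Step 2 then normalizes each extracted span, e.g. "have a video chat", into a standard value such as FaceTime.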

14 Related work Drawbacks: sequence labeling needs extra normalization operations; pipeline based methods are prone to accumulating errors; classification based methods are unable to deal with new slot values and lose local information.

15 Related work
Sequence labeling:
Xuezhe Ma and Eduard Hovy. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. 2016.
Gokhan Tur, Dilek Hakkani-Tur et al. Sentence simplification for spoken language understanding. 2011.
Kaisheng Yao, Baolin Peng et al. Recurrent neural networks for language understanding.
Kaisheng Yao, Baolin Peng et al. Spoken language understanding using long short-term memory neural networks.
Pipeline based methods:
F. Lefèvre. Dynamic Bayesian networks and discriminative classifiers for multi-stage semantic interpretation. 2007.
Peter Z. Yeh, Benjamin Douglas et al. A speech-driven second screen application for TV program discovery.
Classification based methods:
Rahul Bhagat, Anton Leuski, and Eduard Hovy. Statistical shallow semantic parsing despite little training data.
François Mairesse, Milica Gasic et al. Spoken language understanding from unaligned data using discriminative classification models.
Pedro Mota, Luísa Coheur, Ana Mendes. Natural language understanding as a classification process: report of initial experiments and results. 2012.

16 CHAPTER 3 Motivation

17 Motivation Sequence labeling takes local information into consideration; classification based methods obtain normalized slot values directly. Combining the two gives an attention based joint model, and adding negative sampling makes it able to deal with new slot values.

18 CHAPTER 4 Proposed model: Attention based joint model, Negative sampling

19 Proposed model Attention based joint model [Diagram: the model maps an utterance to one of the slot's standard values (Standard Value0, Standard Value1, Standard Value2), or to UNK (a new value) or NULL (no value).]

21 Proposed model Attention based joint model [Architecture diagram: an embedding layer encodes the utterance, a bidirectional recurrent layer feeds both a sequence tagger and an attention layer, and a classifier on top outputs a standard value, UNK, or NULL.]

22-23 Proposed model Attention based joint model [Architecture diagram, shown in two build steps: word embeddings w_t feed a bidirectional layer producing hidden states h_t; a sequence tagger predicts tags s_t; an attention layer scores the hidden states with an align function followed by a softmax, yielding context vectors v_t collected into H; the classifier concatenates the attention output with the final hidden state h_T and applies W_y.]

24 Proposed model Attention based joint model. Training objective: the tagging loss averages over the N utterances and the T_i positions of each utterance i, $L_{tagging} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{T_i}\sum_{t=1}^{T_i} L(s_t^i, \hat{s}_t^i)$; the classification loss is $L_{classification} = \frac{1}{N}\sum_{i=1}^{N} L(y^i, \hat{y}^i)$; the joint loss interpolates them, $L = g \, L_{tagging} + (1-g) \, L_{classification}$.
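
The slides give no code, so here is a minimal PyTorch-style sketch of the joint model and its loss as a rough illustration. The layer sizes, the single-vector attention, and the omission of the concatenation with h_T are my simplifications, not the authors' exact design:

```python
import torch
import torch.nn as nn

class JointModel(nn.Module):
    """Sketch: BiLSTM encoder, a per-token tagging head, and an
    attention-pooled classification head over values + UNK + NULL."""
    def __init__(self, vocab_size, n_tags, n_values, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, bidirectional=True, batch_first=True)
        self.tagger = nn.Linear(2 * dim, n_tags)   # sequence tagging head
        self.attn = nn.Linear(2 * dim, 1)          # align function (simplified)
        self.clf = nn.Linear(2 * dim, n_values)    # classification head

    def forward(self, x):                          # x: (batch, seq_len) word ids
        h, _ = self.rnn(self.emb(x))               # h: (batch, seq_len, 2*dim)
        tag_logits = self.tagger(h)                # per-position tag scores s_t
        a = torch.softmax(self.attn(h), dim=1)     # attention weights over positions
        ctx = (a * h).sum(dim=1)                   # context vector (batch, 2*dim)
        return tag_logits, self.clf(ctx)

def joint_loss(tag_logits, tags, cls_logits, labels, g=0.5):
    """L = g * L_tagging + (1 - g) * L_classification."""
    ce = nn.CrossEntropyLoss()
    l_tag = ce(tag_logits.reshape(-1, tag_logits.size(-1)), tags.reshape(-1))
    l_cls = ce(cls_logits, labels)
    return g * l_tag + (1 - g) * l_cls
```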

25 Proposed model Negative sampling. Models fail to recognize new slot values without corresponding training data, so we construct negative samples from the existing slot values to simulate new ones. Existing slot value: "Can I have a video chat on my phone?" (Slot = FaceTime). New slot value: "Can I scroll the screen to have a screenshot on my phone?" (Slot = Smart Screenshot). Both share the context "Can I ... on my phone?": filling it with random words yields a non-value (negative) sample.

27 Proposed model Negative sampling. Words are sampled from the vocabulary with a distribution U(word) based on count(word). Old values in the data serve as templates, e.g. "a video chat" in "Can I have a video chat on my phone?" (tags O O B-func I-func I-func I-func O O O O, value FaceTime) or "Chinese food". Sampled words fill the template: "Can I You use battery let on my phone?" keeps the same tag sequence but its value label becomes UNK; a longer sampled span (tags O O O O O O B-func I-func I-func I-func I-func I-func I-func O O) is likewise labeled UNK.
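
A minimal sketch of this construction (my illustration under the slide's description; drawing words independently from the count-based distribution and keeping the span length fixed are assumptions):

```python
import random
from collections import Counter

def build_sampler(corpus):
    """Unigram sampler: P(word) proportional to count(word)."""
    counts = Counter(w for sent in corpus for w in sent)
    words, weights = zip(*counts.items())
    return lambda k: random.choices(words, weights=weights, k=k)

def make_negative(words, tags, sampler):
    """Refill the B/I value span with sampled words and relabel as UNK."""
    span = [i for i, t in enumerate(tags) if t != "O"]
    new_words = list(words)
    for i, w in zip(span, sampler(len(span))):  # keep span length, swap words
        new_words[i] = w
    return new_words, tags, "UNK"               # tags unchanged, class label UNK

corpus = ["you use battery let on my phone".split(),
          "can i have a video chat on my phone".split()]
sampler = build_sampler(corpus)
words = "Can I have a video chat on my phone ?".split()
tags = ["O", "O", "B-func", "I-func", "I-func", "I-func", "O", "O", "O", "O"]
print(make_negative(words, tags, sampler))
```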

28 CHAPTER 5 Experiments: Results, Analyses

29 Experiments Results. Datasets: DSTC (English) and Service (Chinese). DSTC is an English dataset from a public contest; we use DSTC2 and DSTC3 together. It collects 5510 dialogues about hotel and restaurant booking, and only the slot food is used. Service is a Chinese dialogue dataset, mainly about consultation for cell phones, with a single slot, Function. [Tables: statistics of the two datasets (original data, null and negative samples, overall size) and of the value types (old vs. new) for the train/dev/test splits of both corpora.]

30 Experiments Results. Baselines (without negative samples): _FM, a pipeline based method that labels words with slot-value tags and then normalizes them into standard values by Fuzzy Matching; _C, a classification based method that encodes the utterance and uses a fully-connected layer as a classifier. [Diagram: in _FM the tagger outputs s_t feed Fuzzy Matching; in _C the hidden states feed a softmax layer W_y.]
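
The Fuzzy Matching step of the _FM baseline could look like the following minimal sketch, using Python's standard difflib as a stand-in matcher (the slides do not specify the actual matcher, and the cutoff value is an assumption):

```python
import difflib

STANDARD_VALUES = ["FaceTime", "WeChat"]  # predefined values for the slot

def normalize(raw_text, values=STANDARD_VALUES, cutoff=0.4):
    """Map an extracted span to the closest standard value, or None."""
    match = difflib.get_close_matches(raw_text, values, n=1, cutoff=cutoff)
    return match[0] if match else None

print(normalize("face time"))         # close match -> 'FaceTime'
print(normalize("smart screenshot"))  # no close standard value -> None
```

A new value such as "smart screenshot" finds no close standard value, which is exactly where the pipeline baseline breaks down.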

31 Experiments Results. [Table (a): F1 scores of classification for _FM, _C, and AJM_NS (ours) on DSTC and Service, split into all / NEW / OLD / NULL.]

33 Experiments Results. [Tables: (a) F1 scores of classification as above; (b) F1 scores of sequence labeling for _FM and AJM_NS (ours) on DSTC and Service, split into all / NEW / OLD.]

35 Experiments Results. Comparison inside the model: attention mechanism (AJM) & negative sampling (JM_NS). [Table: F1 on DSTC and Service (all / NEW / OLD / NULL) for the full model (AJM_NS), for -Attention only (JM_NS), and for -NS only (AJM).]

36 Experiments Results. Comparison inside the model, continued. [Confusion matrices over NEW / OLD / NULL on DSTC and Service, comparing AJM with AJM_NS.]

37 Experiments Results. Comparison inside the model, continued. [Table: classification F1 of the baselines trained with negative samples (_FM_NS, _C_NS, JM_NS) on DSTC and Service, split into all / NEW / OLD / NULL.]

38 Experiments Results. Comparison inside the model, continued: the attention mechanism. [Diagram: attention scores e_t over hidden states h_t produce context vectors v_t.]

39 Experiments Results. Attention mechanism, continued. [Attention heatmaps for Full (AJM_NS) vs. -Attention (JM_NS) with true and predicted tag sequences: a DSTC example, "i want an indonesian restaurant in the north part of town" (true tags O O O B-food O O O O O O O), and a Service example containing the function value A7 (tags O O B-func I-func I-func I-func O O O).]

40 CHAPTER 6 Conclusion

41 Conclusion We propose an attention based joint model with negative sampling. It maps the utterance to standard slot values directly, without extra normalization operations. Negative sampling over the existing values of a certain slot S enables our model to effectively recognize new slot values. The joint model, coordinated by the attention mechanism, further improves performance. Experimental results demonstrate that our model achieves impressive improvements on new slot values with little damage on the other sub-datasets.

42 THANK YOU
