Task-Oriented Dialogue System (Young, 2000)


2 Review

Task-Oriented Dialogue System (Young, 2000) 3
http://rsta.royalsocietypublishing.org/content/358/1769/1389.short
[Pipeline diagram]
Speech Signal → Speech Recognition → Hypothesis: "are there any action movies to see this weekend"
Text Input: "Are there any action movies to see this weekend?" → Language Understanding (LU): Domain Identification, User Intent Detection, Slot Filling → Semantic Frame: request_movie, genre=action, date=this weekend
Semantic Frame → Dialogue Management (DM): Dialogue State Tracking (DST), Dialogue Policy, backed by a Database / Knowledge Providers → System Action/Policy: request_location
System Action/Policy → Natural Language Generation (NLG) → Text response: "Where are you located?"

5 Conventional LU

6 Language Understanding (LU)
Pipelined: 1. Domain Classification, 2. Intent Classification, 3. Slot Filling

7 LU Domain/Intent Classification
As an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), ..., (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.
Example: "find me a cheap taiwanese restaurant in oakland"
Domain: Movies, Restaurants, Music, Sports
Intent: find_movie, buy_tickets (Movies); find_restaurant, find_price, book_table (Restaurants); find_lyrics, find_singer (Music)

8 Conventional Approach
Data: dialogue utterances annotated with domains/intents
Model: machine learning classification model, e.g. support vector machine (SVM)
Prediction: domains/intents

Theory: Support Vector Machine 9
http://www.csie.ntu.edu.tw/~htlin/mooc/
SVM is a maximum-margin classifier: input data points are mapped into a high-dimensional feature space where the data is linearly separable. Support vectors are the input data points that lie on the margin.
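
For reference, the standard hard-margin formulation behind this maximum-margin idea (not spelled out on the slide) is:

```latex
% Hard-margin SVM: maximize the margin 2/||w|| by minimizing ||w||^2,
% subject to every training point (x_i, y_i), with y_i in {-1, +1},
% lying on the correct side of the margin.
\min_{w,\,b}\ \tfrac{1}{2}\lVert w \rVert^2
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, n
```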

10 Theory: Support Vector Machine
http://www.csie.ntu.edu.tw/~htlin/mooc/
Multiclass SVM: extended using the one-versus-rest approach. SVM_1, ..., SVM_k each produce a score S_1, ..., S_k for their class; the scores are then transformed into probabilities P_1, ..., P_k. The domain/intent can be decided based on the estimated scores.
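
A minimal sketch of this one-versus-rest setup, assuming scikit-learn and a toy bag-of-words featurizer (the training utterances, labels, and feature choice here are illustrative, not from the slides):

```python
# One-versus-rest SVMs for domain classification (illustrative sketch).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Toy training data: utterances annotated with a domain label (made up here).
utterances = [
    "find me a cheap taiwanese restaurant in oakland",
    "are there any action movies to see this weekend",
    "play the latest song by that singer",
]
domains = ["Restaurants", "Movies", "Music"]

# Bag-of-words features; one binary SVM per class (one-versus-rest).
featurizer = CountVectorizer()
X = featurizer.fit_transform(utterances)
clf = OneVsRestClassifier(LinearSVC()).fit(X, domains)

# Each SVM_k returns a score S_k; the slide then maps scores to probabilities
# (e.g. via Platt scaling / CalibratedClassifierCV) and picks the maximum.
scores = clf.decision_function(featurizer.transform(["book a table for two tonight"]))
print(dict(zip(clf.classes_, scores[0])))
```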

11 LU Slot Filling
As a sequence tagging task: given a collection of tagged word sequences, S = {((w_{1,1}, w_{1,2}, ..., w_{1,n_1}), (t_{1,1}, t_{1,2}, ..., t_{1,n_1})), ((w_{2,1}, w_{2,2}, ..., w_{2,n_2}), (t_{2,1}, t_{2,2}, ..., t_{2,n_2})), ...} where t_i ∈ M, the goal is to estimate tags for a new word sequence.
Example:
Word:       flights  from  Boston  to  New        York       today
Entity Tag: O        O     B-city  O   B-city     I-city     O
Slot Tag:   O        O     B-dept  O   B-arrival  I-arrival  B-date

12 Conventional Approach
Data: dialogue utterances annotated with slots
Model: machine learning tagging model, e.g. conditional random fields (CRF)
Prediction: slots and their values

13 Theory: Conditional Random Fields
A linear-chain CRF assumes that the label at time step t depends on the label at the previous time step t-1 and on the input sequence.
Training maximizes the log probability log p(y|x) with respect to the parameters λ. At prediction time, slots are tagged by the y that maximizes p(y|x).
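
For reference, the standard linear-chain CRF distribution that this slide alludes to (with feature functions f_k and weights λ_k) can be written as:

```latex
% Linear-chain CRF: the conditional probability of a tag sequence y given the
% input sequence x, with feature functions f_k scored at each position over
% (previous tag, current tag, input), weighted by the parameters \lambda_k.
p(y \mid x) =
  \frac{1}{Z(x)}
  \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, x, t) \Big)
```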

14 Neural Network Based LU

15 A Single Neuron
Inputs x_1, ..., x_N with weights w_1, ..., w_N and bias b are combined into z = w_1 x_1 + w_2 x_2 + ... + w_N x_N + b.
Activation function (sigmoid): σ(z) = 1 / (1 + e^{-z}); output y = σ(z).
w, b are the parameters of this neuron.

A Single Neuron 16
A single neuron defines a function f: R^N → R. Example: deciding whether a handwritten digit is "2": output y ≥ 0.5 means "2", y < 0.5 means not "2".
A single neuron can only handle binary classification.
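
A minimal numpy sketch of such a neuron (the feature vector, weights, and the 0.5 threshold are illustrative):

```python
import numpy as np

def neuron(x, w, b):
    """A single sigmoid neuron: y = sigma(w . x + b)."""
    z = np.dot(w, x) + b                  # weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-z))       # sigmoid activation

# Illustrative binary decision: is this input a "2"?
x = np.array([0.3, 0.7, 0.1])             # toy feature vector
w = np.array([1.5, -2.0, 0.5])            # toy weights (the neuron's parameters)
b = 0.1
y = neuron(x, w, b)
print("is '2'" if y >= 0.5 else "is not '2'", y)
```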

17 A Layer of Neurons
Handwriting digit classification: a layer of neurons defines f: R^N → R^M, here 10 neurons for 10 classes (y_1: "1 or not", y_2: "2 or not", y_3: "3 or not", ...).
A layer of neurons can handle multiple possible outputs; the predicted class is the one whose neuron gives the maximum output.

Deep Neural Networks (DNN) 18
Fully connected feedforward network f: R^N → R^M: input vector x = (x_1, ..., x_N), hidden layers 1 through L, output vector y = (y_1, ..., y_M).
Deep NN: multiple hidden layers.
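
A compact numpy sketch of such a fully connected feedforward network (the layer sizes and random initialization are placeholders):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    """Apply L fully connected layers: a_l = sigma(W_l a_{l-1} + b_l)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# f: R^N -> R^M with two hidden layers (sizes chosen arbitrarily here).
sizes = [20, 16, 16, 10]                  # N=20 inputs, M=10 outputs
rng = np.random.default_rng(0)
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

y = feedforward(rng.normal(size=sizes[0]), weights, biases)
print(y.shape)  # (10,)
```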

19 Recurrent Neural Network (RNN)
Recurrence: s_t = σ(U x_t + W s_{t-1}), where σ is a nonlinearity such as tanh or ReLU, and the output o_t is computed from s_t.
An RNN can learn accumulated sequential information (time series).
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/
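
A minimal numpy sketch of this recurrence (the dimensions and random parameters are illustrative):

```python
import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Vanilla RNN: s_t = tanh(U x_t + W s_{t-1}), o_t = V s_t."""
    s, states, outputs = s0, [], []
    for x in xs:                      # one step per input in the sequence
        s = np.tanh(U @ x + W @ s)    # accumulated sequential information
        states.append(s)
        outputs.append(V @ s)
    return states, outputs

# Toy dimensions: 5-dim inputs, 8-dim hidden state, 3-dim outputs.
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(8, 5)), rng.normal(size=(8, 8)), rng.normal(size=(3, 8))
xs = [rng.normal(size=5) for _ in range(4)]
states, outputs = rnn_forward(xs, U, W, V, s0=np.zeros(8))
print(len(outputs), outputs[0].shape)  # 4 time steps, each with a 3-dim output
```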

20 Model Training
All model parameters can be updated by SGD: at each time step the predicted output y_t is compared against the target, and the error is backpropagated.
http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/

21 BPTT
Forward pass: starting from the initial state, compute the hidden states s_1, s_2, s_3, s_4 from the inputs x_1, ..., x_4, and the outputs o_1, ..., o_4.
Backward pass: compute the gradients of the per-step costs C^(4), C^(3), C^(2), C^(1), propagating the error back through time.
The model is trained by comparing the correct sequence tags y_1, ..., y_4 with the predicted ones.
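
In equations, the total cost is the sum of the per-step costs, and BPTT accumulates the corresponding gradients through time (a standard formulation, not spelled out on the slide):

```latex
% Total sequence cost as a sum of per-step costs, and the BPTT gradient:
% each C^{(t)} depends on all earlier hidden states through the recurrence,
% so its gradient w.r.t. the recurrent weights W sums over time steps k <= t.
C = \sum_{t=1}^{T} C^{(t)},
\qquad
\frac{\partial C}{\partial W}
  = \sum_{t=1}^{T} \sum_{k=1}^{t}
    \frac{\partial C^{(t)}}{\partial s_t}\,
    \frac{\partial s_t}{\partial s_k}\,
    \frac{\partial s_k}{\partial W}
```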

22 Deep Learning Approach
Data: dialogue utterances annotated with semantic frames (user intents & slots)
Model: deep learning model (classification/tagging), e.g. recurrent neural networks (RNN)
Prediction: user intents, slots and their values

23 Classification Model
As an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), ..., (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.
Input: each utterance u_i is represented as a feature vector f_i
Output: a domain/intent label c_i for each input utterance
How to represent a sentence using a feature vector?

24 Sequence Tagging Model
As a sequence tagging task: given a collection of tagged word sequences, S = {((w_{1,1}, w_{1,2}, ..., w_{1,n_1}), (t_{1,1}, t_{1,2}, ..., t_{1,n_1})), ((w_{2,1}, w_{2,2}, ..., w_{2,n_2}), (t_{2,1}, t_{2,2}, ..., t_{2,n_2})), ...} where t_i ∈ M, the goal is to estimate tags for a new word sequence.
Input: each word w_{i,j} is represented as a feature vector f_{i,j}
Output: a slot label t_i for each word in the utterance
How to represent a word using a feature vector?

25 Word Representation
Atomic symbols: one-hot representation, e.g. car = [0 0 0 0 0 0 1 0 0 0].
Issue: difficult to compute similarity (e.g. comparing "car" and "motorcycle"): [0 0 0 0 0 0 1 0 0 0] AND [0 0 1 0 0 0 0 0 0 0] = 0.

26 Word Representation
Neighbor-based: low-dimensional dense word embeddings.
Idea: words with similar meanings often have similar neighbors.
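
A small numpy illustration of the contrast between the two representations (all vector values here are made up):

```python
import numpy as np

# One-hot vectors: every pair of distinct words has zero similarity.
car_onehot        = np.array([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
motorcycle_onehot = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
print(car_onehot @ motorcycle_onehot)          # 0: no notion of relatedness

# Dense embeddings (values invented for illustration): similar words end up
# close together, so cosine similarity becomes informative.
car_emb        = np.array([0.8, 0.1, 0.6])
motorcycle_emb = np.array([0.7, 0.2, 0.5])
banana_emb     = np.array([-0.4, 0.9, 0.0])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(car_emb, motorcycle_emb))         # high: related vehicles
print(cosine(car_emb, banana_emb))             # low: unrelated words
```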

Chinese Input Unit of Representation 27
Character: feed each character to each time step.
Word: word segmentation is required, e.g. 你知道美女與野獸電影的評價如何嗎? ("Do you know how the reviews of the movie Beauty and the Beast are?") is segmented as 你 / 知道 / 美女與野獸 / 電影 / 的 / 評價 / 如何 / 嗎.
Can the two types of information be fused together for better performance?

28 LU Domain/Intent Classification
As an utterance classification task: given a collection of utterances u_i with labels c_i, D = {(u_1, c_1), ..., (u_n, c_n)} where c_i ∈ C, train a model to estimate labels for new utterances u_k.
Example: "find me a cheap taiwanese restaurant in oakland"
Domain: Movies, Restaurants, Music, Sports
Intent: find_movie, buy_tickets (Movies); find_restaurant, find_price, book_table (Restaurants); find_lyrics, find_singer (Music)

Deep Neural Networks for Domain/Intent Classification I (Sarikaya et al., 2011) 29
http://ieeexplore.ieee.org/abstract/document/5947649/
Deep belief nets (DBN): unsupervised pre-training of the weights, followed by fine-tuning with back-propagation. Compared against MaxEnt, SVM, and boosting.

Deep Neural Networks for Domain/Intent Classification II (Tur et al., 2012; Deng et al., 2012) 30
http://ieeexplore.ieee.org/abstract/document/6289054/; http://ieeexplore.ieee.org/abstract/document/6424224/
Deep convex networks (DCN): simple classifiers are stacked to learn complex functions; feature selection of salient n-grams; extension to kernel-DCN.

Deep Neural Networks for Domain/Intent Classification III (Ravuri and Stolcke, 2015) 31
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/rnnlm_addressee.pdf
RNNs and LSTMs for utterance classification. Word hashing to deal with the large number of singletons, e.g. "Kat": #Ka, Kat, at#; each character n-gram is associated with a bit in the input encoding.
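
A small sketch of this character-trigram word hashing (the # boundary markers follow the slide's "Kat" example; the toy vocabulary and the bit-vector encoding below are assumptions about how the hash is consumed):

```python
def char_ngrams(word, n=3):
    """Split a word into boundary-marked character n-grams, e.g. Kat -> #Ka, Kat, at#."""
    padded = f"#{word}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

print(char_ngrams("Kat"))   # ['#Ka', 'Kat', 'at#']

# Each distinct n-gram gets one bit in the input encoding; a word is the set
# of bits for its n-grams, so rare singletons still share features with
# similar words.
vocab = {ng: i for i, ng in enumerate(sorted({g for w in ["Kat", "Kate", "cat"]
                                              for g in char_ngrams(w)}))}

def encode(word):
    bits = [0] * len(vocab)
    for g in char_ngrams(word):
        if g in vocab:
            bits[vocab[g]] = 1
    return bits

print(encode("Kat"))
```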

32 LU Slot Filling
As a sequence tagging task: given a collection of tagged word sequences, S = {((w_{1,1}, w_{1,2}, ..., w_{1,n_1}), (t_{1,1}, t_{1,2}, ..., t_{1,n_1})), ((w_{2,1}, w_{2,2}, ..., w_{2,n_2}), (t_{2,1}, t_{2,2}, ..., t_{2,n_2})), ...} where t_i ∈ M, the goal is to estimate tags for a new word sequence.
Example:
Word:       flights  from  Boston  to  New        York       today
Entity Tag: O        O     B-city  O   B-city     I-city     O
Slot Tag:   O        O     B-dept  O   B-arrival  I-arrival  B-date

33 Recurrent Neural Nets for Slot Tagging I (Yao et al., 2013; Mesnil et al., 2015)
http://131.107.65.14/en-us/um/people/gzweig/pubs/interspeech2013rnnlu.pdf; http://dl.acm.org/citation.cfm?id=2876380
Variations: (a) LSTM: RNNs with LSTM cells; (b) LSTM-LA: input is a sliding window of n-grams; (c) bLSTM: bi-directional LSTMs. Each variant maps the input words w_0, ..., w_n through hidden states h_0, ..., h_n to the tags y_0, ..., y_n.
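
A minimal PyTorch sketch of the bi-directional variant (c), emitting one slot label per input word (the embedding/hidden sizes are arbitrary and the training loop is omitted):

```python
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Bi-directional LSTM slot tagger: one slot label per input word."""
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)    # forward + backward states
        self.out = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, word_ids):                    # (batch, seq_len)
        h, _ = self.lstm(self.embed(word_ids))      # (batch, seq_len, 2*hidden)
        return self.out(h)                          # per-word tag scores

# Toy usage: 7 words ("flights from Boston to New York today"), IOB slot tags.
model = BiLSTMTagger(vocab_size=1000, num_tags=7)
words = torch.randint(0, 1000, (1, 7))
scores = model(words)
print(scores.argmax(-1))   # predicted tag id for each word
```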

Recurrent Neural Nets for Slot Tagging II (Kurata et al., 2016; Simonnet et al., 2015) 34
http://www.aclweb.org/anthology/d16-1223
Encoder-decoder networks: leverage sentence-level information by encoding the whole utterance before decoding the tag sequence.
Attention-based encoder-decoder: uses attention (as in MT) in the encoder-decoder network; the attention is estimated with a feed-forward network whose inputs are the encoder states h and the decoder state s_t at time t.
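
One common way to realize that feed-forward attention is additive attention (this exact parameterization is an assumption; the slide only says the scores come from a feed-forward network over h and s):

```latex
% Additive (feed-forward) attention over encoder states h_i given decoder
% state s_t: scores e_{t,i}, normalized weights \alpha_{t,i}, context c_t.
e_{t,i} = v^\top \tanh(W_h h_i + W_s s_t), \qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_{j} \exp(e_{t,j})}, \qquad
c_t = \sum_{i} \alpha_{t,i}\, h_i
```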

Joint Semantic Frame Parsing 35
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/06/is16_multijoint.pdf; https://arxiv.org/abs/1609.01454
Sequence-based (Hakkani-Tur et al., 2016): slot filling and intent prediction are produced in the same output sequence; e.g. for the input "taiwanese food please <EOS>", the network emits the slot tags B-type O O and, at the final time step, the intent FIND_REST.
Parallel (Liu and Lane, 2016): intent prediction and slot filling are performed in two branches.
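
A rough PyTorch sketch of the joint idea, with one shared encoder feeding a slot-tagging head and an intent head (this illustrates the architecture family only; it is not a reimplementation of either cited paper):

```python
import torch
import torch.nn as nn

class JointLU(nn.Module):
    """Shared LSTM encoder with two branches: per-word slot tags + one intent."""
    def __init__(self, vocab_size, num_slots, num_intents,
                 emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.slot_head = nn.Linear(hidden_dim, num_slots)      # slot filling branch
        self.intent_head = nn.Linear(hidden_dim, num_intents)  # intent branch

    def forward(self, word_ids):
        h, _ = self.encoder(self.embed(word_ids))   # (batch, seq_len, hidden)
        slot_scores = self.slot_head(h)              # one tag distribution per word
        intent_scores = self.intent_head(h[:, -1])   # intent from the final state
        return slot_scores, intent_scores

# Training would sum the two cross-entropy losses, e.g.
# loss = ce(slot_scores.transpose(1, 2), slot_tags) + ce(intent_scores, intent)
model = JointLU(vocab_size=1000, num_slots=10, num_intents=5)
slots, intent = model(torch.randint(0, 1000, (1, 4)))  # "taiwanese food please <EOS>"
print(slots.shape, intent.shape)
```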

Milestone 1 Language Understanding 36
3) Collect and annotate data
4) Use a machine learning method to train your system
   Conventional: SVM for domain/intent classification, CRF for slot filling
   Deep learning: LSTM for domain/intent classification and slot filling
5) Test your system performance

Concluding Remarks 37
The full task-oriented dialogue pipeline, as in the overview slide: Speech Signal → Speech Recognition → Text Input → Language Understanding (LU: Domain Identification, User Intent Detection, Slot Filling) → Semantic Frame (request_movie, genre=action, date=this weekend) → Dialogue Management (DM: Dialogue State Tracking, Dialogue Policy), backed by a Database / Knowledge Providers → System Action/Policy (request_location) → Natural Language Generation (NLG) → Text response ("Where are you located?").