Recent Developments in Statistical Dialogue Systems

Steve Young
Machine Intelligence Laboratory, Information Engineering Division
Cambridge University Engineering Department, Cambridge, UK
Braunschweig, September 2012

Contents
- Review of Basic Ideas and Current Limitations
- Semantic Decoding
- Fast Learning
- Parameter Optimisation and Structure Learning

Spoken Dialog Systems (SDS)
Input path: the user's speech ("I want to find a restaurant") passes through the recognizer (waveforms to words) and the semantic decoder (words to dialog acts, e.g. inform(venuetype=restaurant)) into the dialog control, which consults a database.
Output path: dialog control issues a dialog act such as request(food), which the message generator turns into words ("What kind of food would you like?") and the synthesizer speaks back to the user.

A Statistical Spoken Dialogue System
User: "I want an Italian"
ASR N-best list: "I'd like Italian" {0.4}, "I want an Italian" {0.2}, "I'd like Indian" {0.2}, "In the east" {0.1}
Semantic decoder (trained by supervised learning, driven by the ontology): inform(food=Italian) {0.6}, inform(food=Indian) {0.2}, inform(area=east) {0.1}, null() {0.1}
This evidence is fed into a Bayesian belief network; belief propagation updates the belief state.
A stochastic policy, optimised by reinforcement learning against a reward function (rewards: success/fail), maps the belief state to a decision such as confirm(food=Italian), request(area).
Response generator: "You're looking for an Italian restaurant, whereabouts?"
The whole system is a partially observable Markov decision process (POMDP).

The POMDP SDS Framework
Speech x_t passes through the speech recogniser (words w_t) and the semantic decoder (user acts u_t), giving the observation
    o_t = p(u_t | x_t) = p(u_t | w_t) p(w_t | x_t)
Belief state update (state s_t, previous system action a_{t-1}):
    b_t(s_t) = \eta \, p(o_t | s_t, a_{t-1}) \sum_{s_{t-1}} P(s_t | s_{t-1}, a_{t-1}) \, b_{t-1}(s_{t-1})
Stochastic policy over system actions, with parameters \theta and belief-state features \phi:
    a_t \sim \pi(a_t | b_t) = \frac{e^{\theta \cdot \phi_{a_t}(b_t)}}{\sum_a e^{\theta \cdot \phi_a(b_t)}}
Reward function, maximised with respect to \theta:
    R = E\Big[ \sum_t \sum_{s_t} r(s_t, a_t) \, b_t(s_t) \Big]
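
To make the update and the policy concrete, here is a minimal numerical sketch in Python/NumPy. It is illustrative only: the toy state space, the transition and observation arrays, and the feature functions are hypothetical stand-ins for the real dialogue model.

```python
import numpy as np

def belief_update(b_prev, T, O, a_prev, o):
    """One POMDP belief update:
    b_t(s) = eta * p(o | s, a_prev) * sum_s' P(s | s', a_prev) * b_{t-1}(s').
    T[a][i, j] = P(s_t = j | s_{t-1} = i, a);  O[a][j, o] = p(o | s_t = j, a)."""
    predicted = T[a_prev].T @ b_prev          # marginalise over the previous state
    unnorm = O[a_prev][:, o] * predicted      # weight by the observation likelihood
    return unnorm / unnorm.sum()              # eta normalises to a distribution

def softmax_policy(theta, features, b):
    """Sample a_t ~ pi(a | b), with pi(a | b) proportional to exp(theta . phi_a(b)).
    'features' is a list of functions, one per action, returning phi_a(b)."""
    scores = np.array([theta @ f(b) for f in features])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return np.random.choice(len(features), p=probs), probs

# Toy example: two states, two actions, two observations (all values made up).
T = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.7, 0.3], [0.3, 0.7]])]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]), np.array([[0.6, 0.4], [0.4, 0.6]])]
b = belief_update(np.array([0.5, 0.5]), T, O, a_prev=0, o=1)
features = [lambda b: np.append(b, 1.0), lambda b: np.append(1.0 - b, 1.0)]
a, probs = softmax_policy(np.zeros(3), features, b)
```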

Dialogue State
Tourist information domain: type = {bar, restaurant}, food = {French, Chinese, none}.
The dialogue state is a dynamic Bayesian network with, per slot, a goal node (g_type, g_food), a user act node (u_type, u_food) and a history/memory node (h_type, h_food), with observations (o_type, o_food) at time t and links into the next time slice t+1. User behaviour and recognition/understanding errors are modelled explicitly, and all nodes are conditioned on the previous system action.
J. Williams (2007). "POMDPs for Spoken Dialog Systems." Computer Speech and Language 21(2)

Belief Monitoring (Tracking)
The marginal beliefs over g_type (Bar, Restaurant, none) and g_food (French, Chinese, none) are tracked across turns.
At t=1 the observation is inform(type=bar, food=French) {0.6}, inform(type=restaurant, food=French) {0.3}, so probability mass moves onto type=bar and food=French.
At t=2 the system asks confirm(type=restaurant, food=French) and the user replies affirm() {0.9}, sharpening the belief further.
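
A toy version of this two-turn update for the 'type' slot, assuming a static user goal and treating the SLU confidences directly as evidence weights (the real system uses the full dynamic Bayesian network with memory and history nodes), might look like:

```python
import numpy as np

values = ["bar", "restaurant", "none"]
belief = np.array([1/3, 1/3, 1/3])          # uniform prior over the user's goal for 'type'

def update(belief, likelihood):
    """Static-goal Bayes update: b'(g) proportional to p(evidence | g) * b(g)."""
    post = likelihood * belief
    return post / post.sum()

# Turn 1: SLU hypotheses inform(type=bar) {0.6}, inform(type=restaurant) {0.3}, rest {0.1}.
# Crude evidence model: share the residual mass equally across all values.
belief = update(belief, np.array([0.6 + 0.1 / 3, 0.3 + 0.1 / 3, 0.1 / 3]))

# Turn 2: after confirm(type=restaurant) the user says affirm() with confidence 0.9.
belief = update(belief, np.array([0.05, 0.9, 0.05]))

print(dict(zip(values, belief.round(3))))
```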

Choosing the next action - the Policy
The slot beliefs (g_type over Bar/Restaurant/none, g_food over French/Chinese/none) are quantised into binary feature vectors \phi_1, \phi_2, \phi_3, ..., which together with the policy vector \theta define a stochastic policy over summary actions:
    \pi(a | b, \theta) = \frac{e^{\theta \cdot \phi_a(b)}}{\sum_{a'} e^{\theta \cdot \phi_{a'}(b)}}
A summary action is sampled from this distribution (all possible summary actions: inform, select, confirm, etc.) and then mapped back to a full dialogue act using the belief state, e.g. inform -> inform(type=bar) {0.4}, select -> select(type=bar, type=restaurant).
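
The quantisation into grid features and the mapping from a summary action back to a full act could be sketched as below; the bin count, slot names and act strings are illustrative, not the actual system's.

```python
import numpy as np

def summary_features(belief_type, belief_food, n_bins=5):
    """Quantise the top marginal belief of each slot into a one-hot grid (the phi vectors)."""
    feats = []
    for b in (belief_type, belief_food):
        one_hot = np.zeros(n_bins)
        one_hot[min(int(np.max(b) * n_bins), n_bins - 1)] = 1.0
        feats.append(one_hot)
    return np.concatenate(feats)

def map_summary_action(action, belief_type, type_values):
    """Map a summary action back to a full dialogue act using the belief state."""
    ranked = [type_values[i] for i in np.argsort(belief_type)[::-1]]
    if action == "inform":
        return f"inform(type={ranked[0]})"
    if action == "confirm":
        return f"confirm(type={ranked[0]})"
    if action == "select":
        return f"select(type={ranked[0]}, type={ranked[1]})"
    return f"{action}()"

phi = summary_features(np.array([0.4, 0.35, 0.25]), np.array([0.6, 0.3, 0.1]))
act = map_summary_action("select", np.array([0.4, 0.35, 0.25]), ["bar", "restaurant", "none"])
```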

Let's Go 2010 Control Test Results
[Scatter plot: predicted success rate vs word error rate (WER) for all qualifying systems, with success/fail outcomes marked.]
All baseline qualifying systems: average success = 64.8%, average WER = 42.4%.
Cambridge system: 89% success at 33% WER; baseline: 65% success at 42% WER; another entry: 75% success at 34% WER.
B. Thomson, "Bayesian Update of State for the Let's Go Spoken Dialogue Challenge," SLT 2010

Demo of Cambridge Restaurant Information

Issues with the 2010 System Design
- Poor coverage of the N-best list of semantic hypotheses
- Hand-crafting of the summary belief space
- Slow policy optimisation and reliance on user simulation
- Dependence on hand-crafted dialogue model parameters
- Dependence on a static ontology/database

N-best Semantic Decoding
Conventional N-best decoding: speech -> ASR -> N-best word strings -> semantic decoder (applied N times) -> N-best dialogue acts -> merge duplicates -> M-best dialogue acts (M << N).
Confusion network decoding: speech -> ASR -> word confusion network -> feature extraction (n-gram features) -> semantic decoder (applied once, with optional context) -> M-best dialogue acts (M << N).

Confusion Network Decoder (Mairesse/Henderson)
The word confusion network (e.g. competing hypotheses I / I'd / Is / In, were / want / water / well, Indian / Italian / in) is converted into input features: expected n-gram counts
    x_i = E\{C(\text{n-gram}_i)\}^{1/|\text{n-gram}_i|}, \quad n = 1, 2, 3
together with a binary context vector in which b_j = 1 if item j appeared in the last system act.
A bank of multi-way SVMs, one per semantic item, scores dialogue-act components such as inform, venue=restaurant, area=centre, food=Italian {0.91}, food=Indian {0.42}, area=east {0.1}, from which the M-best semantic trees are assembled.
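
As a sketch of the feature extraction, expected n-gram counts can be accumulated directly from the slot posteriors of a confusion network, with the normalisation following the slide's x_i = E{C(n-gram_i)}^{1/|n-gram_i|}. The tiny confusion network and its posteriors below are made up for illustration, and a real implementation would also append the binary context vector.

```python
import itertools
import numpy as np

def expected_ngram_counts(confnet, n_max=3):
    """Expected n-gram counts over a word confusion network.
    confnet is a list of time slots, each a list of (word, posterior) pairs.
    The expected count of an n-gram spanning slots t..t+n-1 is the product of
    the posteriors of its words, summed over all start positions."""
    counts = {}
    for n in range(1, n_max + 1):
        for t in range(len(confnet) - n + 1):
            for path in itertools.product(*confnet[t:t + n]):
                words = tuple(w for w, _ in path)
                prob = np.prod([p for _, p in path])
                counts[words] = counts.get(words, 0.0) + prob
    # slide's normalisation: take the |n-gram|-th root of the expected count
    return {ng: c ** (1.0 / len(ng)) for ng, c in counts.items()}

confnet = [[("i", 0.7), ("i'd", 0.3)],
           [("want", 0.6), ("like", 0.4)],
           [("italian", 0.8), ("indian", 0.2)]]
features = expected_ngram_counts(confnet)
```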

Confusion Network Decoder Evaluation
[Plots: predicted F-score vs WER and predicted ICE vs WER, comparing the confusion network decoder (+ context) with the Phoenix decoder.]
Comparison of item retrieval on a corpus of 4.8k utterances: N-best hand-crafted Phoenix decoder vs confusion network decoder (trained on 1k utterances).

Live dialogue system    N-best Phoenix    Confusion Net
F-score                 0.80              0.82
ICE                     2.2               1.26
Average reward          10.6              11.15

Discriminative Spoken Language Understanding using Word Confusion Networks, Henderson et al., IEEE SLT 2012

Policy Optimization
Policy parameters are chosen to maximize the expected reward
    J(\theta) = E\Big[ \frac{1}{T} \sum_t r(s_t, a_t) \,\Big|\, \pi_\theta \Big]
Natural gradient ascent works well:
    \tilde{\nabla} J(\theta) = F_\theta^{-1} \nabla J(\theta)
where F_\theta is the Fisher information matrix. The gradient is estimated by sampling dialogues, and in practice the Fisher information matrix does not need to be computed explicitly. This is the Natural Actor-Critic algorithm.
However, it is (a) slow (~100k dialogues) and (b) requires summary-space approximations.
J. Peters and S. Schaal (2008). "Natural Actor-Critic." Neurocomputing 71(7-9)
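
The following is a deliberately simplified sketch of a sampled natural-gradient update, not the full episodic Natural Actor-Critic of Peters and Schaal: the gradient is a plain REINFORCE estimate and the Fisher matrix is approximated by the empirical outer product of score vectors. The grad_log_pi callback and the episode format are assumptions of the sketch.

```python
import numpy as np

def natural_gradient_step(episodes, grad_log_pi, theta, lr=0.1, reg=1e-3):
    """Simplified natural policy-gradient step (not the full Natural Actor-Critic).
    episodes: list of trajectories, each a list of (b, a, r) tuples.
    grad_log_pi(theta, b, a): gradient of log pi_theta(a | b) w.r.t. theta."""
    g = np.zeros_like(theta)
    F = reg * np.eye(len(theta))
    for traj in episodes:
        ret = sum(r for _, _, r in traj)                 # episode return
        for b, a, _ in traj:
            score = grad_log_pi(theta, b, a)
            g += score * ret / len(episodes)             # REINFORCE gradient estimate
            F += np.outer(score, score) / len(episodes)  # empirical Fisher information
    return theta + lr * np.linalg.solve(F, g)            # F^{-1} grad J, no explicit inverse
```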

Q-functions and the SARSA algorithm
Traditional reinforcement learning is commonly based on finding the optimal Q-function:
    Q^*(b, a) = \max_\pi E_\pi\Big[ \sum_{\tau = t+1}^{T} r(b_\tau, a_\tau) \Big]
The optimal deterministic policy is then simply
    \pi^*(b) = \arg\max_a Q^*(b, a)
Q^* can be found sequentially using the SARSA algorithm:
    b = b_0; choose action a \epsilon-greedily from \pi(b)
    for each dialogue turn:
        take action a, observe reward r and next state b'
        choose action a' \epsilon-greedily from \pi(b')
        Q(b, a) <- Q(b, a) + \lambda [ r + Q(b', a') - Q(b, a) ]
        b = b'; a = a'
    end
Eventually, Q -> Q^*.
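
A minimal tabular SARSA loop matching the pseudocode above, over a discretised belief space; the env interface (reset/step) and the use of hashable belief keys are assumptions for the sketch.

```python
import random
from collections import defaultdict

def sarsa(env, actions, episodes=1000, lam=0.1, gamma=1.0, eps=0.1):
    """Tabular SARSA on a (discretised) belief space; 'env' is assumed to expose
    reset() -> b and step(a) -> (b_next, reward, done) (hypothetical interface)."""
    Q = defaultdict(float)

    def egreedy(b):
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(b, a)])

    for _ in range(episodes):
        b = env.reset()
        a = egreedy(b)
        done = False
        while not done:
            b_next, r, done = env.step(a)
            a_next = egreedy(b_next)
            # SARSA temporal-difference update towards r + gamma * Q(b', a')
            Q[(b, a)] += lam * (r + gamma * Q[(b_next, a_next)] - Q[(b, a)])
            b, a = b_next, a_next
    return Q
```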

Gaussian Process based Learning (Milica Gasic)
For POMDPs the belief space is continuous and direct representations of Q are intractable. However, Q can be approximated as a zero-mean Gaussian process by designing a kernel to represent the correlations between points in belief x action space. Thus:
    Q(b, a) \sim \mathcal{GP}\big(0,\; k((b, a), (b, a))\big)
Given a sequence of belief-action pairs B_t = [(b_0, a_0), (b_1, a_1), ..., (b_t, a_t)] and rewards r_t = [r_0, r_1, ..., r_t], there is a closed-form solution for the posterior:
    Q(b, a) \mid B_t, r_t \sim \mathcal{N}\big(\bar{Q}(b, a),\; \mathrm{cov}((b, a), (b, a))\big)
This suggests a SARSA-like sequential optimisation:
    b = b_0; choose action a \epsilon-greedily from Q(b, a)
    for each dialogue turn:
        take action a, observe reward r and next state b'
        choose action a' \epsilon-greedily from Q(b', a')
        update the posterior covariance estimate
        b = b'; a = a'
    end
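
The closed-form posterior can be sketched with ordinary GP regression of observed returns onto visited (b, a) points. This is a simplification of true GP-SARSA, which models temporal differences between successive points rather than Monte-Carlo returns, but it shows where the posterior mean and variance used for action selection come from. The kernel matrices are assumed to be supplied by a kernel over belief x action space.

```python
import numpy as np

def gp_q_posterior(K, returns, K_star, k_star_star, noise=0.1):
    """Posterior mean and variance of Q at new (b, a) points, treating observed
    per-visit returns as noisy targets (a simplification of full GP-SARSA).
    K: kernel matrix over visited (b, a) pairs; K_star: kernel between new and
    visited points; k_star_star: prior variances at the new points."""
    A = K + noise * np.eye(len(returns))
    alpha = np.linalg.solve(A, returns)
    mean = K_star @ alpha
    var = k_star_star - np.einsum("ij,ji->i", K_star, np.linalg.solve(A, K_star.T))
    return mean, var
```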

Benefits of GP-SARSA
- Sequential estimation of the distribution of Q (not just Q itself)
- Each new data point can impact the whole distribution via the covariance function, giving very efficient use of training data
- Much faster learning than gradient methods such as the Natural Actor-Critic (NAC)
[Learning-curve comparison on the TownInfo system: GP vs NAC.]
\epsilon-greedy exploration:
    a = argmax_a Q(b, a) with probability 1 - \epsilon; a random action with probability \epsilon

Benefits of GP-SARSA
The variance of Q is known at each stage, allowing more intelligent exploration.
Variance exploration:
    a = argmax_a Q(b, a) with probability 1 - \epsilon; a = argmax_a cov((b, a), (b, a)) with probability \epsilon
Stochastic policy:
    sample Q_i(b, a_i) \sim \mathcal{N}\big(\bar{Q}(b, a_i), \mathrm{cov}((b, a_i), (b, a_i))\big) for each action, then take a = argmax_{a_i} Q_i(b, a_i)
[Plot: average reward vs number of training dialogues on the TownInfo system, for variance exploration and the stochastic policy.]
Well trained within 3k dialogues, and the summary-space mapping is no longer needed.
On-line policy optimisation of SDS via live interaction with human subjects, Gasic et al., ASRU 2011
Gaussian processes for policy optimisation of large scale POMDP-based SDS, Gasic et al., SLT 2012
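
Given the GP posterior mean and variance per action, the two exploration schemes above reduce to a few lines. This is only a sketch; q_mean and q_var are assumed to be per-action arrays produced by the GP posterior.

```python
import numpy as np

def variance_exploration(q_mean, q_var, eps=0.1):
    """With probability 1-eps exploit the posterior mean; otherwise pick the
    action whose Q estimate is most uncertain."""
    if np.random.rand() < eps:
        return int(np.argmax(q_var))
    return int(np.argmax(q_mean))

def stochastic_policy(q_mean, q_var):
    """Sample one Q value per action from its posterior and act greedily on the
    samples (Thompson-sampling-style exploration)."""
    samples = np.random.normal(q_mean, np.sqrt(q_var))
    return int(np.argmax(samples))
```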

Parameter Estimation (Blaise Thomson)
The dialogue model is a dynamic Bayesian network with the following conditional distributions:
    Goal transition model: p(g_{t+1} | g_t, a_t)
    User behaviour: p(u_t | g_t, a_{t-1})
    Recognition errors: p(o_t | u_t)
    Memory/history: p(h_t | u_t, h_{t-1}, a_{t-1})
These parameters are typically hand-crafted. How can we learn them from data?

Factor Graphs and Expectation Propagation
The dialogue model, including its parameters {\theta_i}, is expressed as a factor graph, e.g. a factor f_1(S_1) with S_1 = (a_{t-1}, g_t, u_t). The posterior then factorises as
    b(s_t | o_t, s_{t-1}, a_{t-1}) \propto \prod_{k=1}^{K} f_k(S_k)
- Exact computation is intractable
- It can be approximated using belief propagation; we use Expectation Propagation (EP)
- Using EP, factors can be discrete and continuous
- Hence, parameters can be added and updated simultaneously
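
EP itself is too involved for a short sketch, but the idea of treating a model parameter as an extra node that is updated from data can be illustrated in the simplest conjugate case, e.g. a single confusion probability in the error model p(o_t | u_t) with a Beta prior. This is a hypothetical stand-in for illustration, not the system's actual EP update.

```python
def update_error_parameter(alpha, beta, matched):
    """Beta-Bernoulli update of a single error-model probability p(observation
    matches the true user act). 'matched' is a list of booleans from annotated
    turns; this conjugate update stands in for what EP does jointly over all
    factor parameters."""
    correct = sum(matched)
    alpha += correct
    beta += len(matched) - correct
    return alpha, beta, alpha / (alpha + beta)   # posterior mean estimate

alpha, beta, p_correct = update_error_parameter(1.0, 1.0, [True, True, False, True])
```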

Effect of Parameter Learning on TownInfo System

Structure Learning
The ability to learn parameters can be extended to learn structure. Let G = {g_k} be a set of Bayesian networks (or factor graphs) and D the corpus data. The search loop is:
    1. Compute the evidence p(D | g_k) for all g_k in G (the evidence can be computed efficiently using expectation propagation)
    2. Augment G by perturbing each g_k
    3. Prune G
    4. Stop? If not, repeat; otherwise select the g_k with the highest evidence
Potential to learn: (a) additional values for an existing variable, (b) new hidden variables, (c) new links between variables.
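
A sketch of this search loop, with evidence(g) (an approximation to p(D | g), e.g. from EP) and perturb(g) (local structure changes such as adding a value, a hidden variable, or a link) assumed to be provided by the caller:

```python
def structure_search(initial_graphs, evidence, perturb, beam=5, max_iters=10):
    """Greedy structure search following the slide's loop: score each candidate
    graph by its (approximate) evidence p(D | g), keep the best ones, expand
    them by local perturbations, and stop when the evidence stops improving."""
    candidates = list(initial_graphs)
    best, best_score = None, float("-inf")
    for _ in range(max_iters):
        scored = sorted(candidates, key=evidence, reverse=True)[:beam]   # prune
        if evidence(scored[0]) <= best_score:
            break                                                        # stop criterion
        best, best_score = scored[0], evidence(scored[0])
        candidates = scored + [g2 for g in scored for g2 in perturb(g)]  # augment
    return best
```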

Conclusions
- Statistical dialogue systems based on POMDPs are viable, offer increased robustness to noise, and require no hand-crafting
- Good progress is being made on increasing accuracy and speeding up learning
- Learning directly from human users, rather than depending on user simulators, is now possible
- Current systems are built from static ontologies for closed domains
- Next steps will include building more flexible systems capable of dynamically adapting to new information content
Acknowledgement: this work is the result of a team effort: Catherine Breslin, Milica Gasic, Matt Henderson, Dongho Kim, Martin Szummer, Blaise Thomson, Pirros Tsiakoulis, and former members of the CUED Dialogue Systems Group.