smart reply and implicit semantics Matthew Henderson and Brian Strope Google AI

2 collaborators include: Rami Al-Rfou, Yun-hsuan Sung, Laszlo Lukacs, Ruiqi Guo, Sanjiv Kumar, Balint Miklos, Ray Kurzweil, and many others

3 Machine learning works when it generalizes to things unseen in training.
training: How do you commute to work? -> I ride my bike.
training: What's your favorite color? -> I like red.
testing: Do you like red bikes? ->

4 generalization strategies
explicit semantics: discrete frames, slots and values
implicit semantics: continuous vectors

5 explicit semantics: specified by humans (often for a task); debuggable; fundamental to understanding (?)
implicit semantics: not specified; derived during training; emergent; natural efficiency for compression and generalization (?)

6 explicit and implicit semantics: analog / digital
(explicit) discrete digital text input -> discrete logical computation -> discrete digital text output
(implicit) discrete digital text input -> continuous high-dim analog computation -> discrete digital text output

7 signal processing (the opposite)
real world -> continuous analog input -> discrete computation -> continuous analog output -> real world

8 implicit semantics: why go back to continuous?
digital channel -> implicit semantics -> digital channel
discrete digital text input -> continuous high-dim analog computation -> discrete digital text output

9 channels of communication are digital
ideas -> digital channel -> implicit semantics -> digital channel -> ideas
continuous high-dim analog thinking -> discrete digital text input -> continuous high-dim analog computation -> discrete digital text output -> continuous high-dim analog thinking
ideas and even reasoning can be continuous

12 training task with semantic pressure: next sentence prediction, reply prediction
context: I saw a really good band last night.
candidates: It often rains in the winter. / On Thursdays we like to go out. / They played upbeat dance music. / The tree looks good to me. / Did you get a new car? / My son likes to windsurf. / Looking forward to lunch.
the model must pick the true next sentence (They played upbeat dance music.) out of the distractors

13 an initial application: smart reply

14 Smart Reply for Inbox & Gmail
a feature that suggests short responses to emails
the initial system used an LSTM to read the input email, and did a beam search over the whitelist
metric: 'suggest conversion', the percentage of times shown suggestions are clicked
INNOVATION + ASSISTANCE 14
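To make "beam search over the whitelist" concrete, here is a minimal sketch, assuming the whitelist is stored as a prefix trie of tokens and that the LSTM is replaced by a hypothetical scorer `toy_log_prob(prefix, token)`. The names `build_trie`, `trie_beam_search`, `LIKED`, and the `</s>` end marker are illustrative, not taken from the talk. The key property is that hypotheses can only follow trie edges, so every finished output is a whitelisted response.

```python
import math
from typing import Callable, Dict, List, Tuple

END = "</s>"  # hypothetical end-of-response marker


def build_trie(whitelist: List[List[str]]) -> Dict[str, dict]:
    """Build a prefix trie over tokenized whitelist responses; each response ends in END."""
    root: Dict[str, dict] = {}
    for tokens in whitelist:
        node = root
        for tok in tokens + [END]:
            node = node.setdefault(tok, {})
    return root


def trie_beam_search(
    trie: Dict[str, dict],
    log_prob: Callable[[Tuple[str, ...], str], float],
    beam_width: int = 3,
    max_len: int = 20,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Beam search that may only extend prefixes along the trie, so every
    finished hypothesis is a whitelisted response. Returns (score, tokens), best first."""
    beam = [(0.0, (), trie)]  # (cumulative log prob, tokens so far, trie node)
    finished: List[Tuple[float, Tuple[str, ...]]] = []
    for _ in range(max_len):
        candidates = []
        for score, prefix, node in beam:
            for tok, child in node.items():
                new_score = score + log_prob(prefix, tok)
                if tok == END:
                    finished.append((new_score, prefix))  # a complete whitelist response
                else:
                    candidates.append((new_score, prefix + (tok,), child))
        if not candidates:
            break
        # Keep only the best `beam_width` partial responses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    finished.sort(key=lambda f: f[0], reverse=True)
    return finished


# Hypothetical stand-in for the LSTM: a toy scorer that prefers a few tokens.
LIKED = {"thank", "you", "!", END}


def toy_log_prob(prefix: Tuple[str, ...], tok: str) -> float:
    return math.log(0.9 if tok in LIKED else 0.1)


whitelist = [["thanks", "!"], ["thank", "you", "!"], ["sounds", "good"]]
results = trie_beam_search(build_trie(whitelist), toy_log_prob)
for score, tokens in results:
    print(f"{score:.3f}", " ".join(tokens))
```

Each step needs one scorer call per surviving prefix and trie edge, so the cost grows with response length and beam width.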

15 The direct Smart Reply system
input email -> whitelist of responses -> N-best list
trained to give a high score for the response found in the data and a low score for random responses
the final score of an email and a response is a dot product of two vectors
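The direct system's scoring rule, one dot product per whitelisted response followed by an N-best cut, fits in a few lines. This sketch assumes the email and response vectors have already been produced by the encoders; the 2-d vectors and the helper names `dot` and `n_best` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def n_best(
    x: Sequence[float],
    response_vectors: List[Sequence[float]],
    responses: List[str],
    n: int = 3,
) -> List[Tuple[float, str]]:
    """Score every whitelist response by x . y and keep the n highest."""
    scored = ((dot(x, y), r) for y, r in zip(response_vectors, responses))
    return heapq.nlargest(n, scored, key=lambda pair: pair[0])


# Hypothetical 2-d vectors standing in for encoder outputs.
email_vec = [1.0, 0.0]
responses = ["Thanks!", "Sounds good.", "No, sorry."]
response_vectors = [[0.9, 0.1], [0.2, 0.8], [-0.5, 0.5]]
top = n_best(email_vec, response_vectors, responses, n=2)
print(top)
```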

16 Training a dot-product model
the network encodes a batch of input emails to vectors $x_1, x_2, \ldots, x_N$ and responses to vectors $y_1, y_2, \ldots, y_N$
all pairwise scores (shown for a batch of 5):
$$\begin{pmatrix} x_1 \cdot y_1 & x_1 \cdot y_2 & \cdots & x_1 \cdot y_5 \\ x_2 \cdot y_1 & x_2 \cdot y_2 & \cdots & x_2 \cdot y_5 \\ \vdots & \vdots & \ddots & \vdots \\ x_5 \cdot y_1 & x_5 \cdot y_2 & \cdots & x_5 \cdot y_5 \end{pmatrix}$$
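In code, the whole batch of scores is one product of the email matrix with the transposed response matrix. A pure-Python sketch with a hypothetical 3-example batch; `score_matrix`, `X`, and `Y` are illustrative names, and a real system would run this on an accelerator as a single matrix multiply.

```python
from typing import List

Matrix = List[List[float]]


def score_matrix(xs: Matrix, ys: Matrix) -> Matrix:
    """Return S with S[i][j] = x_i . y_j: every email in the batch scored
    against every response in the batch (i.e. S = X Y^T)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]


# Hypothetical batch of N = 3 email vectors and their paired response vectors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
S = score_matrix(X, Y)
# Row i, column i holds the true (email, response) pair; the other
# entries in row i act as random negatives for email i.
for row in S:
    print(row)
```

Reusing the other responses in the batch as negatives means no extra encoder passes are needed to get N - 1 random responses per email.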

17 Training a dot-product model
the N x N matrix of all scores is a fast matrix product
10% absolute improvement in 1-of-100 ranking accuracy over binary classification
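The talk does not spell out 1-of-100 ranking accuracy; we assume it is the fraction of test emails whose true response outscores the 99 other candidates scored alongside it. Under that assumption, here is a sketch of the metric over a square score matrix with true pairs on the diagonal. `one_of_k_accuracy` is a hypothetical name, and a 3x3 matrix stands in for 100x100.

```python
from typing import List


def one_of_k_accuracy(scores: List[List[float]]) -> float:
    """scores[i][i] is the true response for example i and the other entries
    in row i are distractors; count rows where the true response wins outright."""
    hits = 0
    for i, row in enumerate(scores):
        if all(row[i] > s for j, s in enumerate(row) if j != i):
            hits += 1
    return hits / len(scores)


# Hypothetical 3x3 example (the slide's metric would use 100 candidates per row).
S = [[3.0, 1.0, 0.0],
     [2.0, 1.0, 0.0],
     [0.0, 0.0, 5.0]]
print(one_of_k_accuracy(S))  # rows 0 and 2 are hits
```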

18 $x_i = \mathrm{DNN}(\text{n-grams of email } i)$
$y_i = \mathrm{DNN}(\text{n-grams of response } i)$
$S_{ij} = x_i \cdot y_j$
$P(\text{response } j \mid \text{email } i) \propto e^{S_{ij}}$
$-\log P(\text{example } i) = -S_{ii} + \log \sum_j e^{S_{ij}}$ ("dot product loss")
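The dot product loss is a softmax cross-entropy over each row of the score matrix, with the diagonal entry as the correct class. Here is a stdlib-only sketch; averaging over the batch is our assumption, since the slide only gives the per-example term, and `logsumexp` / `dot_product_loss` are illustrative names.

```python
import math
from typing import List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot_product_loss(S: List[List[float]]) -> float:
    """Batch mean of -S_ii + log sum_j exp(S_ij): a softmax over each row of
    the score matrix, with the diagonal entry as the correct class."""
    n = len(S)
    return sum(-S[i][i] + logsumexp(S[i]) for i in range(n)) / n


print(dot_product_loss([[0.0, 0.0], [0.0, 0.0]]))    # uninformative: log 2
print(dot_product_loss([[10.0, 0.0], [0.0, 10.0]]))  # confident and correct: near 0
```

Subtracting the row maximum inside `logsumexp` keeps `exp` from overflowing when scores grow large during training.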

19 Precomputation for the dot product model
input email -> x; precomputed whitelist of responses -> Y
the representations of the whitelist Y can be precomputed, so scoring is just scores = x · Y
approximate nearest neighbor search can speed up the top-N search
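The slide does not say which nearest neighbor method was used. As a swapped-in illustration, here is a simple bucketed (inverted-file style) maximum inner product search over a precomputed whitelist: vectors are grouped offline by their best-matching centroid, and a query scores only the most promising buckets. The class `BucketedWhitelistIndex`, its `n_probe` parameter, and the hand-picked centroids are all hypothetical.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


class BucketedWhitelistIndex:
    """Hypothetical whitelist index built once, offline. Each precomputed response
    vector is filed under the centroid it has the highest inner product with; a
    query scans only the n_probe most promising buckets instead of the whole list."""

    def __init__(self, centroids: List[Vector], vectors: List[Vector], responses: List[str]):
        self.centroids = centroids
        self.buckets: Dict[int, List[Tuple[Vector, str]]] = {c: [] for c in range(len(centroids))}
        for vec, resp in zip(vectors, responses):
            best = max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))
            self.buckets[best].append((vec, resp))

    def search(self, x: Vector, n: int, n_probe: int = 1) -> List[Tuple[float, str]]:
        # Pick the buckets whose centroids score highest against the query...
        probe = heapq.nlargest(n_probe, range(len(self.centroids)),
                               key=lambda c: dot(x, self.centroids[c]))
        # ...then score exactly, but only inside those buckets.
        scored = [(dot(x, vec), resp) for c in probe for vec, resp in self.buckets[c]]
        return heapq.nlargest(n, scored, key=lambda pair: pair[0])


# Hypothetical 2-d vectors; real centroids would come from clustering the whitelist.
index = BucketedWhitelistIndex(
    centroids=[[1.0, 0.0], [0.0, 1.0]],
    vectors=[[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]],
    responses=["Thanks!", "Thank you!", "Sounds good.", "See you then."],
)
print(index.search([1.0, 0.2], n=2))
```

With `n_probe=1` the query above scans only 2 of the 4 responses; raising `n_probe` trades speed back for recall, which is the usual knob in approximate search.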

20 Multi-loss dot product model
diagram: body -> response, subject line -> response; the per-feature scores (x · y) combine into the final score
each feature predicts the response on its own, then the per-feature predictions are combined
originally used to inspect the importance of each feature
gives extra depth and hierarchy
10% absolute improvement in 1-of-100 ranking accuracy over concatenating input features and using a single loss
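One way to read "each feature predicts the response on its own, then are combined" is as a separate dot-product loss per input feature plus a loss on the combined score. The sketch below assumes the combination is a plain sum of per-feature score matrices; the real model's combination layer is not described on the slide, and `multi_loss` and the toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def score_matrix(xs: Matrix, ys: Matrix) -> Matrix:
    """S[i][j] = x_i . y_j."""
    return [[sum(a * b for a, b in zip(x, y)) for y in ys] for x in xs]


def logsumexp(values: List[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def dot_product_loss(S: Matrix) -> float:
    """Batch mean of -S_ii + log sum_j exp(S_ij)."""
    return sum(-S[i][i] + logsumexp(S[i]) for i in range(len(S))) / len(S)


def multi_loss(features: Dict[str, Matrix], ys: Matrix) -> Tuple[float, Dict[str, float]]:
    """One dot-product loss per input feature, plus one on the combined score.
    ASSUMPTION: the combined score is the plain sum of per-feature scores."""
    per_feature = {name: score_matrix(xs, ys) for name, xs in features.items()}
    n = len(ys)
    combined = [[sum(S[i][j] for S in per_feature.values()) for j in range(n)]
                for i in range(n)]
    losses = {name: dot_product_loss(S) for name, S in per_feature.items()}
    losses["final"] = dot_product_loss(combined)
    return sum(losses.values()), losses


# Hypothetical batch of 2: the body vectors match their responses, the
# subject-line vectors carry no signal.
features = {"body": [[1.0, 0.0], [0.0, 1.0]], "subject": [[0.0, 0.0], [0.0, 0.0]]}
responses = [[1.0, 0.0], [0.0, 1.0]]
total, losses = multi_loss(features, responses)
print(losses)
```

Comparing the per-feature losses is one way the "inspect importance of each feature" use could look: in the toy batch the subject-line loss stays at log 2, the value for a feature that cannot tell the two candidates apart.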

21 Multi-loss dot product model (with precomputed Y)
the whitelist representations Y can still be precomputed; each feature predicts the response on its own, then the per-feature predictions are combined into the final score

22 Latency (relative to the LSTM system)
LSTM: beam search over a prefix trie of the whitelist (baseline)
DNN: score everything on the whitelist with one fully-connected DNN (5x latency)
Dot product + DNN: use the dot product model as a first pass to select 100, then score with the DNN (0.1x)
Dot product only: use the improved multi-loss dot product model in one pass of scoring (0.02x)
Approximate search: speed up the top-N search in dot product space using an efficient nearest neighbor search (0.01x)
(non-LSTM systems can achieve suggest conversion around 4% higher than the LSTM)
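The "Dot product + DNN" row is a two-stage cascade: a cheap dot-product pass narrows the whitelist to 100 candidates, and the expensive model scores only those. A minimal sketch, assuming a hypothetical `rescore` callable in place of the DNN (here just `len`, purely as a toy) and hypothetical vectors; `two_stage` is an illustrative name.

```python
import heapq
from typing import Callable, List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_stage(
    x: Sequence[float],
    response_vectors: List[Sequence[float]],
    responses: List[str],
    rescore: Callable[[str], float],
    first_pass_k: int = 100,
    n: int = 3,
) -> List[str]:
    """Cheap pass: keep the first_pass_k responses with the highest x . y.
    Expensive pass: rank only those survivors with `rescore`."""
    survivors = heapq.nlargest(first_pass_k, range(len(responses)),
                               key=lambda i: dot(x, response_vectors[i]))
    return heapq.nlargest(n, (responses[i] for i in survivors), key=rescore)


# Hypothetical data. `len` is a toy stand-in for the second-stage DNN, which in
# the real system would score the (email, response) pair.
email_vec = [1.0, 0.0]
responses = ["Thanks!", "Thank you so much!", "Great gifts, thanks a lot!", "No."]
response_vectors = [[0.9, 0.0], [0.8, 0.0], [0.1, 0.0], [-1.0, 0.0]]

print(two_stage(email_vec, response_vectors, responses, rescore=len, first_pass_k=2, n=1))
print(two_stage(email_vec, response_vectors, responses, rescore=len, first_pass_k=4, n=1))
```

With `first_pass_k=2` the longest response never reaches the second stage, so the cascade returns "Thank you so much!"; with `first_pass_k=4` it returns "Great gifts, thanks a lot!". The first-pass size bounds both the latency and what the expensive model can ever pick.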

23 Response biases
example email: Thank you so much for the wonderful gifts.
candidate responses: Glad you liked the gifts. / Our pleasure! / You are very welcome! / You're welcome! / Thank you!
the initial direct system got about half the number of clicks of the LSTM baseline
a language model bias improves clicks
a probability-of-click model on actual Smart Reply emails helps more
combinations improve the click rate above the LSTM baseline
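The slide reports that adding a language model bias and a click-probability model changes which responses win, but not how they are combined. Here is a sketch assuming an additive combination in log space with hand-set weights; `biased_ranking`, the weights, and every number below are invented for illustration.

```python
import math
from typing import Dict, List


def biased_ranking(
    dot_scores: Dict[str, float],
    lm_log_prob: Dict[str, float],
    click_prob: Dict[str, float],
    lm_weight: float = 1.0,
    click_weight: float = 1.0,
    n: int = 3,
) -> List[str]:
    """Rank responses by dot score + lm_weight * log P_LM(r) + click_weight * log P(click | r).
    ASSUMPTION: an additive log-space combination with hand-set weights."""
    def final_score(r: str) -> float:
        return (dot_scores[r]
                + lm_weight * lm_log_prob[r]
                + click_weight * math.log(click_prob[r]))
    return sorted(dot_scores, key=final_score, reverse=True)[:n]


# Invented numbers for the email "Thank you so much for the wonderful gifts."
dot_scores = {"Glad you liked the gifts.": 2.0, "Our pleasure!": 1.5, "You're welcome!": 1.4}
lm_log_prob = {"Glad you liked the gifts.": -9.0, "Our pleasure!": -5.0, "You're welcome!": -3.0}
click_prob = {"Glad you liked the gifts.": 0.05, "Our pleasure!": 0.1, "You're welcome!": 0.3}

print(biased_ranking(dot_scores, lm_log_prob, click_prob, lm_weight=0.0, click_weight=0.0))
print(biased_ranking(dot_scores, lm_log_prob, click_prob, lm_weight=0.5, click_weight=1.0))
```

With the biases switched off, the specific "Glad you liked the gifts." ranks first; with them on, the generic "You're welcome!" moves to the top.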

24 Quality and latency progress
[figure: latency relative to LSTM (%) and conversion rate relative to LSTM (%) over 8 weeks; annotations: tokenization bug, response bias from LM, multiloss, DNN, p(click), improved LSTM, efficient search, dot product & exhaustive search]

25 conclusions
implicitly semantic representations are useful
beam search isn't always necessary (simple works too)
having user quality signals (like clicks) can be very helpful
Thank you!
