Improving Lexical Choice in Neural Machine Translation. Toan Q. Nguyen & David Chiang


1 Improving Lexical Choice in Neural Machine Translation. Toan Q. Nguyen & David Chiang

2 Overview
Problem: rare word mistranslation
Model 1: softmax with cosine similarity (fixnorm)
Model 2: direct connections (fixnorm+lex)
Experiments & results

3 Rare word mistranslation
src: Ammo muammolar hali ko'p, deydi amerikalik olim Entoni Fauchi.
ref: But still there are many problems, says American scientist Anthony Fauci.
NMT: But there is still a lot of problems, says James Chan.
NMT tends to translate words that seem natural in the context but do not reflect the content of the source sentence (Arthur et al., 2016).

4 Softmax
[diagram: query vector h and two output embedding vectors, A and B]

5 Softmax using dot product
[diagram: query vector h scored against A and B by dot product]

6 Softmax using dot product
A might be closer in direction; still, B wins (its norm is larger)

7 Softmax using dot product
[same diagram, animation step]

8 Softmax using dot product
[diagram: query vector h]

9 Softmax using dot product
Source: Ammo muammolar hali ko'p, deydi amerikalik olim Entoni Fauchi.
Reference: But still there are many problems, says American scientist Anthony Fauci.
NMT: But there is still a lot of problems, says James Chan.
[diagram: query vector scored against the output embeddings of Anthony Fauci (Director, NIAID) and Margaret Chan (former Director, WHO)]

10 Softmax using cosine similarity (fixnorm)
Source: Ammo muammolar hali ko'p, deydi amerikalik olim Entoni Fauchi.
Reference: But still there are many problems, says American scientist Anthony Fauci.
NMT: But there is still a lot of problems, says James Chan.
Fixed-norm: But there is still a lot of problems, says American scientist.
Solution: fix the magnitudes of the vectors to a constant, using weight normalization (Salimans and Kingma, 2016)
[diagram: query vector with the embeddings of Anthony Fauci (Director, NIAID) and Margaret Chan (former Director, WHO)]
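To make the geometry concrete, here is a minimal numeric sketch (PyTorch, with made-up vectors, not the authors' code): under dot-product scoring the longer vector B outscores the better-aligned A, while a fixnorm-style cosine score with a constant scale r reverses the outcome. The value r = 5.0 is an assumption for illustration.

```python
import torch
import torch.nn.functional as F

# Query vector h and two output embeddings: A points in nearly the same
# direction as h but is short (rare word); B is misaligned but long
# (frequent word). All values are invented for illustration.
h = torch.tensor([1.0, 0.0])
A = torch.tensor([0.9, 0.1])   # well aligned, small norm
B = torch.tensor([2.0, 1.5])   # poorly aligned, large norm

# Standard softmax over dot products: B's larger norm lets it win.
dot_logits = torch.stack([h @ A, h @ B])
print(F.softmax(dot_logits, dim=0))   # B gets the higher probability

# fixnorm-style scoring: cosine similarity (norms effectively fixed),
# scaled by an assumed constant r.
r = 5.0
cos_logits = r * torch.stack([
    F.cosine_similarity(h, A, dim=0),
    F.cosine_similarity(h, B, dim=0),
])
print(F.softmax(cos_logits, dim=0))   # now the aligned vector A wins
```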

11 Direct Connections (fixnorm+lex)
[diagram: encoder reads the source word Fauchi; the decoder outputs Chan]
Pro: word choice depends on source and target context
Con: word choice depends on source and target context

12 Direct Connections (fixnorm+lex)
[diagram: encoder reads the source word Fauchi; a more direct path lets the decoder output Fauci]
Pro: word choice depends on source and target context
Con: word choice depends on source and target context
Solution: add a more direct path that depends only on the source word
Reference: But still there are many problems, says American scientist Anthony Fauci.
Fixed-norm + lexical: But there are still problems, says American scientist Anthony Fauci.

13 Direct Connections (fixnorm+lex)
[architecture diagram]

14 Direct Connections (fixnorm+lex)
[diagram: encoder states h_s over "Ammo ... olim Entoni Fauchi"; decoder state h_t and attention context c_t produce the NMT logits l_nmt and the output Fauci after "<bos> Anthony"]

15 Direct Connections (fixnorm+lex)
[diagram: as above, plus a lexical context c_e (an attention-weighted sum of source word embeddings) that produces lexical logits l_lex, combined with l_nmt]
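A rough sketch of this direct connection, assuming c_e is the attention-weighted sum of source word embeddings and a single tanh layer maps it to lexical logits that are added to the decoder's logits; the module and weight names below are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LexicalShortcut(nn.Module):
    # Hypothetical sketch: predict target words directly from
    # attention-pooled source word embeddings (the c_e -> l_lex path).
    def __init__(self, emb_dim, vocab_size):
        super().__init__()
        self.transform = nn.Linear(emb_dim, emb_dim)  # tanh layer weights
        self.out = nn.Linear(emb_dim, vocab_size)     # lexical output layer

    def forward(self, src_embs, attn):
        # src_embs: (src_len, emb_dim) source word embeddings
        # attn:     (src_len,) attention weights for this target step
        c_e = attn @ src_embs                  # pooled source embedding c_e
        l_lex = self.out(torch.tanh(self.transform(c_e)))
        return l_lex                           # lexical logits

# Combining with the usual decoder path for one step (shapes assumed):
#   logits = l_nmt + shortcut(src_embs, attn)
#   p_t = torch.softmax(logits, dim=-1)
```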

16 Experiments
We compare our models against the following baselines:
Moses
NMT + tied embeddings (Inan et al., 2017; Press and Wolf, 2017)
Arthur: NMT + tied embeddings + the discrete lexicon of Arthur et al. (2016)
NMT systems: global attention + general scoring + input feeding (Luong et al., 2015a)
Datasets: 9 datasets, 8 language pairs, ranging from 0.1M to 8M words
Training: Adadelta, dropout; checkpoint selected by dev BLEU

17 Results (word-based)
[bar chart: BLEU for NMT, Moses/bitext LM, Arthur, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, Vi, and Ja(KFTT), ordered from lower to higher resource]

24 Alignments & replacement
Koehn and Knowles (2017): alignments are sometimes shifted
This could affect unknown-word replacement (Luong et al., 2015b)
[figure from Koehn and Knowles (2017)]


36 Byte Pair Encoding (BPE)
communists → communi + sts
agglomerációs → agglomer + ációs
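As a reminder of how BPE segmentation works, here is a toy sketch: a word is split into characters and an ordered list of learned merges is applied greedily. The two merges below are invented for illustration; real merge tables (e.g. from subword-nmt) are learned from corpus statistics.

```python
def bpe_segment(word, merges):
    """Greedily apply BPE merges to a word given as a string.
    `merges` is an ordered list of symbol pairs learned from data."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]   # merge the adjacent pair
            else:
                i += 1
    return symbols

# Invented toy merge list; a real table has thousands of entries.
merges = [("s", "t"), ("st", "s")]
print(bpe_segment("communists", merges))
# ['c', 'o', 'm', 'm', 'u', 'n', 'i', 'sts'] -> the tail 'sts' split off
```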

37 Results (BPE)
[bar chart: BLEU for NMT, Moses, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, and Vi, ordered from lower to higher resource]

43 What's different about BPE?

44 What's different about BPE?
src: Here 's V@@ la@@ di@@ mi@@ r T@@ sa@@ st@@ sin from T@@ art@@ u in E@@ st@@ onia.
ref: Đây là Vladimir Tsastsin đến từ Tartu, Estonia
NMT: Đây là V@@ la@@ di@@ mi@@ r T@@ sa@@ st@@ sin ở E@@ st@@ onia. (Đây là Vladimir Tsastsin ở Estonia.)

45 Conclusion
We present two simple and effective solutions for rare word mistranslation
Word-based: improvements on all tested languages, up to +5.5 BLEU
BPE-based: improvements on low-resource languages, up to +1.9 BLEU

46 Code: improving_lexical_choice_in_nmt
Thanks!

47 Train perplexity
[bar chart: training perplexity for NMT, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, Vi]

52 Dev perplexity
[bar chart: dev perplexity for NMT, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, Vi]
