Improving Lexical Choice in Neural Machine Translation. Toan Q. Nguyen & David Chiang
1 Improving Lexical Choice in Neural Machine Translation Toan Q. Nguyen & David Chiang 1
2 Overview Problem: rare word mistranslation. Model 1: softmax over cosine similarity (fixnorm). Model 2: direct connections (fixnorm+lex). Experiments & Results 2
3 Rare word mistranslation
src: Ammo muammolar hali ko'p, deydi amerikalik olim Entoni Fauchi.
ref: But still there are many problems, says American scientist Anthony Fauci.
NMT: But there is still a lot of problems, says James Chan.
NMT tends to translate words that seem natural in the context, but do not reflect the content of the source sentence (Arthur et al., 2016). 3
4 Softmax [figure: query vector h with output embeddings A and B] 4
5 Softmax using dot product [figure: query vector h, embeddings A and B] 5
6 Softmax using dot product [figure: A might be closer in direction; still, B wins] 6
7 Softmax using dot product [figure: query vector h, embeddings A and B] 7
8 Softmax using dot product [figure: query vector h] 8
9 Softmax using dot product
Source: Ammo muammolar hali ko'p, deydi amerikalik olim Entoni Fauchi.
Reference: But still there are many problems, says American scientist Anthony Fauci.
NMT: But there is still a lot of problems, says James Chan.
[figure: query vector h; Anthony Fauci, Director, NIAID; Margaret Chan, Former Director, WHO] 9
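The point of slides 4-8 can be sketched numerically: with plain dot-product logits, an embedding's norm can dominate its direction, so a frequent word's large-norm embedding can beat a rare word's embedding even when the rare word points closer to the query. A toy numpy demo with made-up vectors (A stands in for a rare word, B for a frequent one):

```python
import numpy as np

# Toy demo with invented vectors: under dot-product logits, the frequent
# word's large-norm embedding B outscores the rare word's embedding A,
# even though A points closer to the query direction h.
h = np.array([1.0, 0.2])   # decoder query vector
A = np.array([0.9, 0.3])   # rare word: small norm, direction similar to h
B = np.array([3.0, -1.0])  # frequent word: large norm, worse direction

cos = lambda u, v: (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(A @ h < B @ h)          # True: B wins on dot product (norm dominates)
print(cos(A, h) > cos(B, h))  # True: A wins on direction alone
```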
10 Softmax using cosine similarity (fixnorm)
Source: Ammo muammolar hali ko'p, deydi amerikalik olim Entoni Fauchi.
Reference: But still there are many problems, says American scientist Anthony Fauci.
NMT: But there is still a lot of problems, says James Chan.
Fixed-norm: But there is still a lot of problems, says American scientist.
Solution: fix the magnitudes of the vectors to a constant, with weight normalization (Salimans and Kingma, 2016).
[figure: query vector h; Anthony Fauci, Director, NIAID; Margaret Chan, Former Director, WHO] 10
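A minimal sketch of the fixnorm idea: normalize both the query and the output embeddings so the logits are scaled cosine similarities, bounded by a scale constant r. Variable names and the value r=5.0 here are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def fixnorm_logits(h, W, r=5.0):
    """fixnorm sketch: logits are scaled cosine similarities.
    h: query vector, shape (d,); W: output embeddings, shape (V, d).
    r is a scale hyperparameter; the value 5.0 is illustrative."""
    h_unit = h / np.linalg.norm(h)
    W_unit = W / np.linalg.norm(W, axis=1, keepdims=True)
    return r * (W_unit @ h_unit)  # unit dot product = cosine, so |logit| <= r

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
h = rng.normal(size=8)
# Embeddings with wildly varying norms, as frequent vs. rare words would have:
W = rng.normal(size=(100, 8)) * rng.uniform(0.1, 10.0, size=(100, 1))
l = fixnorm_logits(h, W)  # the norms no longer affect the logits
p = softmax(l)
print(np.all(np.abs(l) <= 5.0))
```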
11 Direct Connections (fixnorm+lex)
[figure: encoder-decoder translating source word "Fauchi" as "Chan"]
Pro: word choice depends on source and target context.
Con: word choice depends on source and target context. 11
12 Direct Connections (fixnorm+lex)
[figure: encoder-decoder translating source word "Fauchi" as "Fauci"]
Pro: word choice depends on source and target context.
Con: word choice depends on source and target context.
Solution: add a more direct path that depends only on the source word.
Reference: But still there are many problems, says American scientist Anthony Fauci.
Fixed-norm + lexical: But there are still problems, says American scientist Anthony Fauci. 12
13 Direct Connections (fixnorm+lex) 13
14 Direct Connections (fixnorm+lex)
[figure: standard path; logits l_nmt computed from the attentional state (h_t, c_t) over encoder states h_s; source: Ammo ... olim Entoni Fauchi; output: <bos> Anthony Fauci] 13
15 Direct Connections (fixnorm+lex)
[figure: added lexical path; logits l_lex computed from the lexical context c_e and summed with l_nmt] 13
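The direct lexical path can be sketched as follows. The shapes, the names (lex_logits, W_lex), and the tanh nonlinearity are illustrative assumptions rather than the exact architecture; the idea shown is that attention weights combine the source word embeddings directly, bypassing the recurrent decoder state:

```python
import numpy as np

def lex_logits(attn, src_embed, W_lex, b_lex):
    """Direct lexical path sketch (shapes and names are assumptions).
    attn: attention weights over S source positions, shape (S,)
    src_embed: source word embeddings, shape (S, d)
    W_lex: output projection, shape (V, d); b_lex: bias, shape (V,)"""
    c_e = attn @ src_embed  # lexical context: attention-weighted source embeddings
    h_lex = np.tanh(c_e)    # simple feed-forward nonlinearity on the context
    return W_lex @ h_lex + b_lex

# The output distribution sums both paths: p(y) = softmax(l_nmt + l_lex).
rng = np.random.default_rng(1)
S, d, V = 4, 8, 50
attn = np.full(S, 1.0 / S)         # uniform attention for the toy run
src_embed = rng.normal(size=(S, d))
W_lex, b_lex = rng.normal(size=(V, d)), np.zeros(V)
l_nmt = rng.normal(size=V)         # stand-in for the decoder's own logits
l = l_nmt + lex_logits(attn, src_embed, W_lex, b_lex)
print(l.shape)  # (50,)
```

Because the lexical path sees only the source words and the attention weights, a rare source word keeps a short route to the output layer even when the decoder state is dominated by frequent-word statistics.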
16 Experiments
We compare our models against the following baselines: Moses; NMT + tied embeddings (Inan et al., 2017; Press and Wolf, 2017); Arthur: NMT + tied embeddings + the discrete lexicon of Arthur et al. (2016).
NMT systems: global attention + general scoring + feed input (Luong et al., 2015a).
Datasets: 9 datasets, 8 language pairs, ranging from 0.1M to 8M words.
Training: Adadelta, dropout; checkpoints selected by dev BLEU. 14
17 Results (word-based)
[bar chart: BLEU of NMT, Moses/bitext LM, Arthur, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, Vi, and Ja(KFTT)*, ordered from lower resource to higher resource] 15
24 Alignments & replacement. Koehn and Knowles (2017): alignments are sometimes shifted, which could affect replacement (Luong et al., 2015b).
36 Byte Pair Encoding (BPE): communists → … + sts; agglomerációs → … + ációs 18
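BPE segmentation itself can be sketched with a toy greedy merge loop. The merge list below is invented for illustration; real BPE learns its merge table from corpus statistics:

```python
def bpe_segment(word, merges):
    """Toy greedy BPE sketch: start from characters and apply each learned
    merge in priority order. The merges here are invented for illustration;
    real BPE derives them from corpus frequencies."""
    symbols = list(word)
    for a, b in merges:
        i = 0
        while i < len(symbols) - 1:
            if symbols[i] == a and symbols[i + 1] == b:
                symbols[i:i + 2] = [a + b]  # fuse the adjacent pair into one symbol
            else:
                i += 1
    return symbols

print(bpe_segment("lower", [("l", "o"), ("lo", "w"), ("e", "r")]))
# → ['low', 'er']
```

A rare word that never got its own merges thus falls apart into smaller, more frequent subword units, which is exactly why BPE changes the rare-word picture for NMT.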
37 Results (BPE)
[bar chart: BLEU of NMT, Moses, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, and Vi, ordered from lower resource to higher resource] 19
43 What's different about BPE? 20
44 What's different about BPE?
src: Here 's V@@ la@@ di@@ mi@@ r T@@ sa@@ st@@ sin from T@@ art@@ u in E@@ st@@ onia .
ref: Đây là Vladimir Tsastsin đến từ Tartu, Estonia
NMT: Đây là V@@ la@@ di@@ mi@@ r T@@ sa@@ st@@ sin ở E@@ st@@ onia . (Đây là Vladimir Tsastsin ở Estonia.)
(gloss: "This is Vladimir Tsastsin in Estonia"; the output copies the subwords but drops Tartu.) 21
45 Conclusion. We present two simple and effective solutions for rare word mistranslation. Word-based: improvements on all tested languages, up to +5.5 BLEU. BPE-based: improvements on low-resource languages, up to +1.9 BLEU. 22
46 Code: improving_lexical_choice_in_nmt. Thanks! 23
47 Train perplexity
[chart: training perplexity of NMT, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, and Vi] 24
52 Dev perplexity
[chart: dev perplexity of NMT, fixnorm, and fixnorm+lex on Ta, Ur, Ha, Tu, Uz, Hu, and Vi] 25
More informationDeep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017
Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion
More informationCHEMICAL NAMES STANDARDIZATION USING NEURAL SEQUENCE TO SEQUENCE MODEL
CHEMICAL NAMES STANDARDIZATION USING NEURAL SEQUENCE TO SEQUENCE MODEL Anonymous authors Paper under double-blind review ABSTRACT Chemical information extraction is to convert chemical knowledge in text
More informationPervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction Maha Elbayad 1,2 Laurent Besacier 1 Jakob Verbeek 2 Univ. Grenoble Alpes, CNRS, Grenoble INP, Inria, LIG, LJK,
More informationarxiv: v1 [cs.cl] 22 Jun 2015
Neural Transformation Machine: A New Architecture for Sequence-to-Sequence Learning arxiv:1506.06442v1 [cs.cl] 22 Jun 2015 Fandong Meng 1 Zhengdong Lu 2 Zhaopeng Tu 2 Hang Li 2 and Qun Liu 1 1 Institute
More informationSparse vectors recap. ANLP Lecture 22 Lexical Semantics with Dense Vectors. Before density, another approach to normalisation.
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Previous lectures: Sparse vectors recap How to represent
More informationANLP Lecture 22 Lexical Semantics with Dense Vectors
ANLP Lecture 22 Lexical Semantics with Dense Vectors Henry S. Thompson Based on slides by Jurafsky & Martin, some via Dorota Glowacka 5 November 2018 Henry S. Thompson ANLP Lecture 22 5 November 2018 Previous
More informationNatural Language Processing
Natural Language Processing Info 159/259 Lecture 12: Features and hypothesis tests (Oct 3, 2017) David Bamman, UC Berkeley Announcements No office hours for DB this Friday (email if you d like to chat)
More informationMemory-Augmented Attention Model for Scene Text Recognition
Memory-Augmented Attention Model for Scene Text Recognition Cong Wang 1,2, Fei Yin 1,2, Cheng-Lin Liu 1,2,3 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences
More informationarxiv: v2 [cs.cl] 20 Jun 2017
Po-Sen Huang * 1 Chong Wang * 1 Dengyong Zhou 1 Li Deng 2 arxiv:1706.05565v2 [cs.cl] 20 Jun 2017 Abstract In this paper, we propose Neural Phrase-based Machine Translation (NPMT). Our method explicitly
More informationIntroduction to Deep Learning CMPT 733. Steven Bergner
Introduction to Deep Learning CMPT 733 Steven Bergner Overview Renaissance of artificial neural networks Representation learning vs feature engineering Background Linear Algebra, Optimization Regularization
More informationCSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
More informationStructured deep models: Deep learning on graphs and beyond
Structured deep models: Deep learning on graphs and beyond Hidden layer Hidden layer Input Output ReLU ReLU, 25 May 2018 CompBio Seminar, University of Cambridge In collaboration with Ethan Fetaya, Rianne
More informationCross-Lingual Language Modeling for Automatic Speech Recogntion
GBO Presentation Cross-Lingual Language Modeling for Automatic Speech Recogntion November 14, 2003 Woosung Kim woosung@cs.jhu.edu Center for Language and Speech Processing Dept. of Computer Science The
More informationAnalysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation
Analysing Soft Syntax Features and Heuristics for Hierarchical Phrase Based Machine Translation David Vilar, Daniel Stein, Hermann Ney IWSLT 2008, Honolulu, Hawaii 20. October 2008 Human Language Technology
More informationA Convolutional Neural Network-based
A Convolutional Neural Network-based Model for Knowledge Base Completion Dat Quoc Nguyen Joint work with: Dai Quoc Nguyen, Tu Dinh Nguyen and Dinh Phung April 16, 2018 Introduction Word vectors learned
More informationDeep learning for Natural Language Processing and Machine Translation
Deep learning for Natural Language Processing and Machine Translation 2015.10.16 Seung-Hoon Na Contents Introduction: Neural network, deep learning Deep learning for Natural language processing Neural
More informationAttention Based Joint Model with Negative Sampling for New Slot Values Recognition. By: Mulan Hou
Attention Based Joint Model with Negative Sampling for New Slot Values Recognition By: Mulan Hou houmulan@bupt.edu.cn CONTE NTS 1 2 3 4 5 6 Introduction Related work Motivation Proposed model Experiments
More informationCSC321 Lecture 20: Reversible and Autoregressive Models
CSC321 Lecture 20: Reversible and Autoregressive Models Roger Grosse Roger Grosse CSC321 Lecture 20: Reversible and Autoregressive Models 1 / 23 Overview Four modern approaches to generative modeling:
More informationApplications of Tree Automata Theory Lecture VI: Back to Machine Translation
Applications of Tree Automata Theory Lecture VI: Back to Machine Translation Andreas Maletti Institute of Computer Science Universität Leipzig, Germany on leave from: Institute for Natural Language Processing
More informationMinimum Error Rate Training Semiring
Minimum Error Rate Training Semiring Artem Sokolov & François Yvon LIMSI-CNRS & LIMSI-CNRS/Univ. Paris Sud {artem.sokolov,francois.yvon}@limsi.fr EAMT 2011 31 May 2011 Artem Sokolov & François Yvon (LIMSI)
More informationFast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation
Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation Joern Wuebker, Hermann Ney Human Language Technology and Pattern Recognition Group Computer Science
More informationtext classification 3: neural networks
text classification 3: neural networks CS 585, Fall 2018 Introduction to Natural Language Processing http://people.cs.umass.edu/~miyyer/cs585/ Mohit Iyyer College of Information and Computer Sciences University
More informationSequence Modeling with Neural Networks
Sequence Modeling with Neural Networks Harini Suresh y 0 y 1 y 2 s 0 s 1 s 2... x 0 x 1 x 2 hat is a sequence? This morning I took the dog for a walk. sentence medical signals speech waveform Successes
More informationFIRST EXPERIMENTS WITH NEURAL TRANSLATION
FIRST EXPERIMENTS WITH NEURAL TRANSLATION OF INFORMAL MATHEMATICS TO FORMAL Qingxiang Wang Cezary Kaliszyk Josef Urban University of Innsbruck Czech Technical University in Prague ICMS 2018 July 25, 2018
More informationNeural'Machine'Transla/on''
NeuralMachineTransla/on ThangLuong Lecture@CS224N (ThankstoChrisManning,AbigailSee,and RussellStewartforcommentsanddiscussions.) Sixiroastedhusband MeatMuscleStupid BeanSprouts Let sbacktrack (FromChrisManning)
More informationMulti-Task Word Alignment Triangulation for Low-Resource Languages
Multi-Task Word Alignment Triangulation for Low-Resource Languages Tomer Levinboim and David Chiang Department of Computer Science and Engineering University of Notre Dame {levinboim.1,dchiang}@nd.edu
More informationTrajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition Jianshu Zhang, Yixing Zhu, Jun Du and Lirong Dai National Engineering Laboratory for Speech and Language Information
More informationRecurrent Neural Networks with Flexible Gates using Kernel Activation Functions
2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 18) Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions Authors: S. Scardapane, S. Van Vaerenbergh,
More informationRecap: Language models. Foundations of Natural Language Processing Lecture 4 Language Models: Evaluation and Smoothing. Two types of evaluation in NLP
Recap: Language models Foundations of atural Language Processing Lecture 4 Language Models: Evaluation and Smoothing Alex Lascarides (Slides based on those from Alex Lascarides, Sharon Goldwater and Philipp
More informationarxiv: v2 [cs.cl] 1 Jan 2019
Variational Self-attention Model for Sentence Representation arxiv:1812.11559v2 [cs.cl] 1 Jan 2019 Qiang Zhang 1, Shangsong Liang 2, Emine Yilmaz 1 1 University College London, London, United Kingdom 2
More informationCOMS 4705, Fall Machine Translation Part III
COMS 4705, Fall 2011 Machine Translation Part III 1 Roadmap for the Next Few Lectures Lecture 1 (last time): IBM Models 1 and 2 Lecture 2 (today): phrase-based models Lecture 3: Syntax in statistical machine
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationarxiv: v1 [cs.cl] 30 Oct 2018
Journal of Computer Science and Cybernetics, V.xx, N.x (20xx), 1 DOI no. 10.15625/1813-9663/xx/x/xxxx NEURAL MACHINE TRANSLATION BETWEEN VIETNAMESE AND ENGLISH: AN EMPIRICAL STUDY Hong-Hai Phan-Vu 1, Viet-Trung
More informationSemantics with Dense Vectors. Reference: D. Jurafsky and J. Martin, Speech and Language Processing
Semantics with Dense Vectors Reference: D. Jurafsky and J. Martin, Speech and Language Processing 1 Semantics with Dense Vectors We saw how to represent a word as a sparse vector with dimensions corresponding
More informationBidirectional Attention for SQL Generation. Tong Guo 1 Huilin Gao 2. Beijing, China
Bidirectional Attention for SQL Generation Tong Guo 1 Huilin Gao 2 1 Lenovo AI Lab 2 China Electronic Technology Group Corporation Information Science Academy, Beijing, China arxiv:1801.00076v6 [cs.cl]
More informationCS230: Lecture 8 Word2Vec applications + Recurrent Neural Networks with Attention
CS23: Lecture 8 Word2Vec applications + Recurrent Neural Networks with Attention Today s outline We will learn how to: I. Word Vector Representation i. Training - Generalize results with word vectors -
More informationUniversität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Machine Translation. Uwe Dick
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Machine Translation Uwe Dick Google Translate Rosetta Stone Hieroglyphs Demotic Greek Machine Translation Automatically translate
More informationDiscriminative Training
Discriminative Training February 19, 2013 Noisy Channels Again p(e) source English Noisy Channels Again p(e) p(g e) source English German Noisy Channels Again p(e) p(g e) source English German decoder
More informationStephen Scott.
1 / 35 (Adapted from Vinod Variyam and Ian Goodfellow) sscott@cse.unl.edu 2 / 35 All our architectures so far work on fixed-sized inputs neural networks work on sequences of inputs E.g., text, biological
More informationExtractive Question Answering Using Match-LSTM and Answer Pointer
Extractive Question Answering Using Match-LSTM and Answer Pointer Alexander Haigh, Cameron van de Graaf, Jake Rachleff Department of Computer Science Stanford University Stanford, CA 94305 Codalab username:
More informationLanguage Model Rest Costs and Space-Efficient Storage
Language Model Rest Costs and Space-Efficient Storage Kenneth Heafield Philipp Koehn Alon Lavie Carnegie Mellon, University of Edinburgh July 14, 2012 Complaint About Language Models Make Search Expensive
More informationLEARNING SPARSE STRUCTURED ENSEMBLES WITH STOCASTIC GTADIENT MCMC SAMPLING AND NETWORK PRUNING
LEARNING SPARSE STRUCTURED ENSEMBLES WITH STOCASTIC GTADIENT MCMC SAMPLING AND NETWORK PRUNING Yichi Zhang Zhijian Ou Speech Processing and Machine Intelligence (SPMI) Lab Department of Electronic Engineering
More informationA Discriminative Model for Semantics-to-String Translation
A Discriminative Model for Semantics-to-String Translation Aleš Tamchyna 1 and Chris Quirk 2 and Michel Galley 2 1 Charles University in Prague 2 Microsoft Research July 30, 2015 Tamchyna, Quirk, Galley
More informationDeep Neural Machine Translation with Linear Associative Unit
Deep Neural Machine Translation with Linear Associative Unit Mingxuan Wang 1 Zhengdong Lu 2 Jie Zhou 2 Qun Liu 4,5 1 Mobile Internet Group, Tencent Technology Co., Ltd wangmingxuan@ict.ac.cn 2 DeeplyCurious.ai
More informationA TIME-RESTRICTED SELF-ATTENTION LAYER FOR ASR
A TIME-RESTRICTED SELF-ATTENTION LAYER FOR ASR Daniel Povey,2, Hossein Hadian, Pegah Ghahremani, Ke Li, Sanjeev Khudanpur,2 Center for Language and Speech Processing, Johns Hopkins University, Baltimore,
More informationstatistical machine translation
statistical machine translation P A R T 3 : D E C O D I N G & E V A L U A T I O N CSC401/2511 Natural Language Computing Spring 2019 Lecture 6 Frank Rudzicz and Chloé Pou-Prom 1 University of Toronto Statistical
More informationTowards Binary-Valued Gates for Robust LSTM Training
Zhuohan Li 1 Di He 1 Fei Tian 2 Wei Chen 2 Tao Qin 2 Liwei Wang 1 3 Tie-Yan Liu 2 Abstract Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims
More informationNeural Architectures for Image, Language, and Speech Processing
Neural Architectures for Image, Language, and Speech Processing Karl Stratos June 26, 2018 1 / 31 Overview Feedforward Networks Need for Specialized Architectures Convolutional Neural Networks (CNNs) Recurrent
More informationMachine Learning Basics
Security and Fairness of Deep Learning Machine Learning Basics Anupam Datta CMU Spring 2019 Image Classification Image Classification Image classification pipeline Input: A training set of N images, each
More informationNatural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, & Finale
Natural Language Processing (CSEP 517): Machine Translation (Continued), Summarization, & Finale Noah Smith c 2017 University of Washington nasmith@cs.washington.edu May 22, 2017 1 / 30 To-Do List Online
More informationGenerative Models for Sentences
Generative Models for Sentences Amjad Almahairi PhD student August 16 th 2014 Outline 1. Motivation Language modelling Full Sentence Embeddings 2. Approach Bayesian Networks Variational Autoencoders (VAE)
More informationarxiv: v1 [cs.cl] 22 Jun 2017
Neural Machine Translation with Gumbel-Greedy Decoding Jiatao Gu, Daniel Jiwoong Im, and Victor O.K. Li The University of Hong Kong AIFounded Inc. {jiataogu, vli}@eee.hku.hk daniel.im@aifounded.com arxiv:1706.07518v1
More information