CS230: Lecture 8 Word2Vec applications + Recurrent Neural Networks with Attention

Transcription

1 CS230: Lecture 8 Word2Vec applications + Recurrent Neural Networks with Attention

2 Today's outline
We will learn how to:
- Generalize results with word vectors
- Augment an RNN with attention mechanisms
I. Word Vector Representation: i. Training, ii. Operations, iii. Applications: debiasing / restaurant reviews
II. Attention: i. Machine Translation, ii. Image Captioning

3 Word Vector Representation
Vocabulary: $V_{eng}$ = {"a", "abs", ..., "football", ..., "zone", "zoo"}. Each word has a one-hot representation, e.g. $v_{football}$, and a word-vector representation, e.g. $e_{football}$. How do you get the vector representation?

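4 Word Vector Representation: Training
1. Data? Slide a window over text such as "Stanford is going to beat Cal next week" and collect (target word x, nearby word y) pairs: (Stanford, is), (is, Stanford), (is, going), (going, is), (going, to), ...
2. Model: the one-hot vector $v_{football}$ goes through $z^{[1]} = W^{[1]} v + b^{[1]}$, where $W^{[1]}$ has shape = (2, V) and its columns are the word vectors $e_{\text{a}}, e_{\text{abs}}, \ldots, e_{\text{zone}}, e_{\text{zoo}}$. The network outputs a probability for every vocabulary word: p("a" | "football"), p("abs" | "football"), p("abyss" | "football"), ..., p("zebra" | "football"), p("zone" | "football"), p("zoo" | "football").
3. Loss? $L = -\sum_{k} y_k \log(\hat{y}_k)$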
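The data and loss steps on this slide can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the window size of 1, the toy embedding size `n = 4`, the random initial weights, and the second layer `W2`, `b2` that maps back to vocabulary scores are all choices made here, not values taken from the lecture.

```python
import numpy as np

def skipgram_pairs(tokens, window=1):
    """Collect (target, nearby) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

tokens = "Stanford is going to beat Cal next week".split()
pairs = skipgram_pairs(tokens, window=1)

vocab = sorted(set(tokens))
index = {w: k for k, w in enumerate(vocab)}
V, n = len(vocab), 4  # n = 4 is an arbitrary toy embedding size

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(n, V)), np.zeros(n)  # z1 = W1 v + b1
W2, b2 = rng.normal(scale=0.1, size=(V, n)), np.zeros(V)  # assumed output layer

def forward_loss(target, nearby):
    v = np.zeros(V)
    v[index[target]] = 1.0                # one-hot target word
    z1 = W1 @ v + b1                      # the target's column of W1 (b1 is zero here)
    y_hat = softmax(W2 @ z1 + b2)         # p(word | target) for every vocabulary word
    loss = -np.log(y_hat[index[nearby]])  # cross-entropy with a one-hot label y
    return y_hat, loss

y_hat, loss = forward_loss(*pairs[0])
print(pairs[:5])
print(float(loss))
```

Training would repeat `forward_loss` over all pairs and update `W1`, `b1`, `W2`, `b2` by gradient descent; the columns of `W1` are then read off as the word vectors.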

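5 Word Vector Representation: Embedding matrix
$W^{[1]} = [\, e_{\text{a}} \;\; e_{\text{abs}} \;\; \cdots \;\; e_{\text{zone}} \;\; e_{\text{zoo}} \,]$ and $W^{[1]} v_{football} = e_{football}$: multiplying the embedding matrix by a one-hot vector selects that word's column.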
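A small check of the identity $W^{[1]} v_{football} = e_{football}$, assuming a toy five-word vocabulary and made-up matrix entries (neither comes from the lecture):

```python
import numpy as np

vocab = ["a", "abs", "football", "zone", "zoo"]   # toy vocabulary
V, n = len(vocab), 3                              # n = 3 is an arbitrary embedding size
W1 = np.arange(n * V, dtype=float).reshape(n, V)  # made-up embedding matrix, one column per word

k = vocab.index("football")
v_football = np.zeros(V)
v_football[k] = 1.0              # one-hot representation

e_matmul = W1 @ v_football       # matrix product with the one-hot vector
e_lookup = W1[:, k]              # direct column lookup

print(e_matmul, e_lookup)
```

Both expressions give the same vector, which is why implementations usually index the column directly instead of multiplying by a V-dimensional one-hot vector.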

6 Word Vector Representation: Visualization and Operations
Scatter plot of word vectors (words shown): France, dog, cat, lion, zoo, parc, Vegas, casino, Paris, Italia, Rome, baguette, goal, ball, football, maths, deep-learning, computer-science.
Operations on vectors: man : king = woman : x, so x = queen.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean: Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean: Efficient Estimation of Word Representations in Vector Space
Laurens van der Maaten, Geoffrey Hinton: Visualizing Data using t-SNE
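One common way to carry out such an operation is to form the query vector king - man + woman and return its nearest neighbour by cosine similarity. The sketch below assumes hand-picked three-dimensional toy vectors (roughly "royalty", "gender", "sport" axes); they are not trained embeddings, and the helper name `analogy` is hypothetical.

```python
import numpy as np

# Hand-picked toy vectors for illustration only; real vectors come from training.
emb = {
    "man":      np.array([0.1,  1.0, 0.0]),
    "woman":    np.array([0.1, -1.0, 0.0]),
    "king":     np.array([1.0,  1.0, 0.0]),
    "queen":    np.array([1.0, -1.0, 0.0]),
    "football": np.array([0.0,  0.0, 1.0]),
    "zoo":      np.array([0.2,  0.0, 0.3]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, emb):
    """Solve a : b = c : x as the word closest to emb[b] - emb[a] + emb[c]."""
    query = emb[b] - emb[a] + emb[c]
    candidates = [w for w in emb if w not in (a, b, c)]  # exclude the query words
    return max(candidates, key=lambda w: cosine(emb[w], query))

print(analogy("man", "king", "woman", emb))
```

Excluding the three input words from the candidates is a design choice that most analogy evaluations make; without it the nearest neighbour is often one of the inputs.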

7 Word Vector Representation: Bias
Sexist bias: man : king = woman : x gives x = queen, but also man - woman = computer programmer - homemaker.
Tolga Bolukbasi et al.: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

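8 Word Vector Representation: Application
Sentiment analysis on restaurant reviews.
1. Data? 5 reviews, each a pair of review (x) and label y (negative/positive), e.g. "Tonight was awesome" (positive) and "Worst entrée ever.." (negative).
2. Architecture? Each word ("Tonight", "was", "awesome") is mapped from its one-hot vector $v_{word}$ to its word vector $e_{word}$; the word vectors are AVERAGED into a single vector $e$; the prediction is $\hat{y} = \sigma(W e + b)$, read as positive when $\hat{y} > 0.5$. Generalizes very well because of word vector representations.
3. Loss? $L = -\left[\, y \log(\hat{y}) + (1 - y) \log(1 - \hat{y}) \,\right]$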
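The architecture above (average the word vectors, apply a sigmoid unit, score with the binary cross-entropy) can be sketched as follows. The embedding matrix `E`, the weights `W` and `b`, and the helper names are hypothetical stand-ins; in practice `E` would hold pretrained word vectors and `W`, `b` would be learned from the labelled reviews. The sketch also assumes each review contains at least one known word.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(review, E, index, W, b):
    """Average the word vectors of the known words, then apply sigmoid(W e + b)."""
    words = [w.strip(".,!?").lower() for w in review.split()]
    ids = [index[w] for w in words if w in index]
    e = E[:, ids].mean(axis=1)   # AVERAGE of the word vectors
    return sigmoid(W @ e + b)

def bce(y, y_hat):
    """Binary cross-entropy: -(y log y_hat + (1 - y) log(1 - y_hat))."""
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

vocab = ["tonight", "was", "awesome", "worst", "entrée", "ever"]
index = {w: k for k, w in enumerate(vocab)}

rng = np.random.default_rng(0)
E = rng.normal(size=(3, len(vocab)))  # toy embeddings, one column per word
W, b = rng.normal(size=3), 0.0        # untrained classifier parameters

y_hat = predict("Tonight was awesome", E, index, W, b)
label = int(y_hat > 0.5)              # 1 = positive, 0 = negative
loss = bce(1, y_hat)                  # loss if the true label is positive
print(float(y_hat), label, float(loss))
```

Because the classifier only sees the averaged vector, a review using an unseen but similar word (one whose vector lies near "awesome") tends to get a similar score, which is the generalization the slide refers to.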

9 Word Vector Representation: Application
To remember:
- In NLP, words are often represented by meaningful vectors
- These vectors are learned by training a neural network
- We can do operations on these vectors
- They can be biased, depending on the dataset used to train them
- They generalize well to words and sentences not seen in the labelled data

10 Attention: Motivation
Neural Machine Translation. [Figure: encoder-decoder network. The encoder reads the source sentence "is in the kitchen <eos>" into hidden states $h_j$; the decoder states $s_i$ output "dans la cuisine <eos>", one word per step.]

11 Attention: Motivation
Neural Machine Translation. [Figure: encoder-decoder network with encoder hidden states $h_j$ and decoder states $s_i$, translating "is in the kitchen <eos>" into "dans la cuisine <eos>".] Inverting input works better?

12 Neural Machine Translation with Attention: Problems & Ideas
Some problems:
- The encoder encodes all the information in the source sentence into a fixed-length vector
- Yet some specific parts of the source sentence seem more useful than others for predicting a given part of the output sentence
- Poor performance on long sentences
Ideas:
- We'd like to spread the information encoded from the source sentence and selectively retrieve the relevant parts at each prediction of the output sentence
- Why don't we use every hidden state $h_j$ from the encoding part?

13 Attention: Motivation
[Figure: encoder-decoder network with encoder hidden states $h_j$ and decoder states $s_i$, translating "is in the kitchen <eos>" into "dans la cuisine <eos>".] Context vector: $c_1 = \sum_{j=1}^{6} \alpha_{1,j} h_j$

14 Attention: Motivation
[Figure: encoder states $h_j$ over the reversed input "kitchen the in is <eos>" and decoder states $s_i$ producing "dans la cuisine <eos>".] A context vector is generated at each step during decoding: $c_1 = \sum_{j=1}^{6} \alpha_{1,j} h_j$
- $h_4$ contains information about the input sentence up to "is", with a stronger focus on the parts closer to "is".
- $\alpha_{2,3} = f(s_1, h_3)$ = probability that the target word $y_2$ is translated from the source word $x_3$ ("in") (= a score that went through a function).
- $c_1$ = the expected annotation over all annotations $h_j$ with probabilities $\alpha_{1,j}$.
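To make "expected annotation" concrete, the sketch below turns a set of alignment scores into probabilities and takes the probability-weighted sum of the encoder states. The five two-dimensional states and the scores are invented for illustration, and normalizing the scores with a softmax is an assumption of this sketch.

```python
import numpy as np

def context_vector(scores, H):
    """Normalize scores into probabilities alpha, then return sum_j alpha_j * h_j."""
    e = np.exp(scores - scores.max())   # stable softmax
    alpha = e / e.sum()
    return alpha, alpha @ H

# Toy encoder states h_1..h_5 (one per row) and toy alignment scores.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 0.0],
              [0.0, 2.0]])
scores = np.array([0.1, 0.2, 3.0, 0.1, 0.0])

alpha, c = context_vector(scores, H)
print(alpha.round(3), c.round(3))
```

With a dominant score on the third state, the context vector lies close to that state; with equal scores it reduces to the plain average of all states, which is the fixed-summary behaviour attention is meant to improve on.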

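15 Neural Machine Translation with Attention: Architecture
$c_i$ = the expected annotation over all annotations $h_j$ with probabilities $\alpha_{i,j}$, where $\alpha_{i,j} = f(s_{i-1}, h_j)$.
[Figure: first decoding step. The attention mechanism reads the encoder states $h_1, \ldots, h_5$ and outputs the context $c_1$ for the first decoder state $s_1$.]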

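16 Neural Machine Translation with Attention: Architecture
$c_i$ = the expected annotation over all annotations $h_j$ with probabilities $\alpha_{i,j}$, where $\alpha_{i,j} = f(s_{i-1}, h_j)$.
[Figure: second decoding step. The attention mechanism reads the encoder states $h_1, \ldots, h_5$ and the previous decoder state $s_1$, and outputs the context $c_2$ for $s_2$.]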

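17 Neural Machine Translation with Attention: Architecture
$c_i$ = the expected annotation over all annotations $h_j$ with probabilities $\alpha_{i,j}$, where $\alpha_{i,j} = f(s_{i-1}, h_j)$.
[Figure: third decoding step, after "dans" has been generated. The attention mechanism reads the encoder states $h_1, \ldots, h_5$ and the previous decoder state $s_2$, and outputs the context $c_3$ for $s_3$.]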

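18 Neural Machine Translation with Attention: Architecture
[Figure: fourth decoding step, after "dans la" has been generated. The attention mechanism reads the encoder states $h_1, \ldots, h_5$ and the previous decoder state $s_3$, and outputs the context $c_4$ for $s_4$.]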

19 Neural Machine Translation with Attention: Architecture
[Figure: all four decoding steps unrolled, after "dans la" has been generated. One attention mechanism per step reads the encoder states $h_1, \ldots, h_5$ and the previous decoder state, and outputs the contexts $c_1, \ldots, c_4$ for $s_1, \ldots, s_4$.]
How to train this? Same as machine translation, but also taking derivatives with respect to the attention parameters.

20 Neural Machine Translation with Attention: Training
What are the parameters?
Encoder: $W^{[1]}$ = embedding matrix, and an LSTM:
$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$
$i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$
$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$
$h_t = o_t \odot \tanh(C_t)$
Attention:
$c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j$
$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}$
$e_{ij} = v_a^{T} \tanh(W_a s_{i-1} + U_a h_j)$
Decoder: an LSTM with equations of the same form as the encoder's.
Loss function: $L = -\sum_{\text{batch}} \log P(y \mid x; \theta) + \lambda \sum_{i \,\in\, \text{patch}} \Big(1 - \sum_{t \,\in\, \text{output}} \alpha_{ti}\Big)^{2}$
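The attention equations can be sketched directly in NumPy. Everything numeric below is a toy assumption (dimensions, random parameters, random states); the score follows the additive form $e_{ij} = v_a^T \tanh(W_a s_{i-1} + U_a h_j)$ of Bahdanau et al. The penalty function is one reading of the $\lambda$ term in the loss, in the style of the regularizer of Xu et al., where each input position is encouraged to receive total attention close to 1 over the output steps; treat it as an assumption rather than the lecture's exact definition.

```python
import numpy as np

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """Return (alpha_i, c_i) for one decoding step.

    s_prev: previous decoder state s_{i-1}, shape (n_s,)
    H:      encoder states h_1..h_Tx as rows, shape (T_x, n_h)
    """
    e = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a  # scores e_ij, shape (T_x,)
    w = np.exp(e - e.max())                        # stable softmax over j
    alpha = w / w.sum()
    return alpha, alpha @ H                        # c_i = sum_j alpha_ij h_j

def attention_penalty(A, lam):
    """lam * sum_i (1 - sum_t A[t, i])^2, with A[t, i] = alpha_{ti}."""
    return lam * float(((1.0 - A.sum(axis=0)) ** 2).sum())

# Toy sizes and random parameters (assumptions for illustration).
T_x, n_h, n_s, n_a = 6, 4, 3, 5
rng = np.random.default_rng(0)
H = rng.normal(size=(T_x, n_h))
s_prev = rng.normal(size=n_s)
W_a = rng.normal(size=(n_a, n_s))
U_a = rng.normal(size=(n_a, n_h))
v_a = rng.normal(size=n_a)

alpha, c = additive_attention(s_prev, H, W_a, U_a, v_a)
print(alpha.round(3), c.round(3))
print(attention_penalty(np.eye(3), lam=0.1))  # every position attended exactly once
```

`W_a`, `U_a` and `v_a` are the attention parameters that receive gradients during training, alongside the encoder and decoder LSTM weights.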

21 Neural Machine Translation with Attention: Training
Visualizing Attention
Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate

22 Image Captioning with Attention
Neural Machine Translation. [Figure: encoder-decoder network. The encoder reads "is in the kitchen <eos>" and the decoder outputs "dans la cuisine <eos>".]

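23 Image Captioning with Attention
Image Captioning. [Figure: the same encoder-decoder layout, with an image as the encoder input; the decoder outputs from the translation example ("dans la cuisine <eos>") are still shown.]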

24 Image Captioning with Attention
Image Captioning with no attention. [Figure: a CNN encoder reads the image and the decoder generates the caption "a bird flying over a ...".]
Vinyals et al.: Show and Tell: A Neural Image Caption Generator

25 Image Captioning with Attention
Image Captioning with attention. [Figure: a CNN encoder turns the image into annotation vectors $\{a_1, a_2, \ldots, a_L\}$ (shown as $a_1, \ldots, a_6$); their average feeds the decoder; at each decoding step an attention mechanism over the annotation vectors produces a context $c_i$ for the decoder state $s_i$, and the decoder generates "A bird is flying."]
Vinyals et al.: Show and Tell: A Neural Image Caption Generator
Kelvin Xu et al.: Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
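The step from a CNN output to annotation vectors can be sketched as a reshape: each spatial location of the feature map becomes one vector $a_l$. The 2 x 3 x 4 feature map below is a made-up stand-in for a real CNN activation, and the helper name is hypothetical. In Xu et al., the mean of the annotation vectors is used to initialize the decoder state, which is how the "average" box is read here.

```python
import numpy as np

def annotation_vectors(feature_map):
    """Flatten an (H, W, D) feature map into L = H * W annotation vectors of size D."""
    H, W, D = feature_map.shape
    return feature_map.reshape(H * W, D)

# Made-up feature map: 2 x 3 spatial grid with 4 channels, giving a_1..a_6.
feature_map = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
A = annotation_vectors(feature_map)

a_mean = A.mean(axis=0)                  # average annotation (decoder initialization input)
alpha = np.full(len(A), 1.0 / len(A))    # uniform attention as a baseline
context = alpha @ A                      # weighted sum of annotation vectors

print(A.shape, a_mean, context)
```

With uniform weights the context equals the plain average; a learned attention mechanism replaces `alpha` with step-dependent weights so that each generated word can focus on different image regions.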


Better Conditional Language Modeling. Chris Dyer

Better Conditional Language Modeling. Chris Dyer Better Conditional Language Modeling Chris Dyer Conditional LMs A conditional language model assigns probabilities to sequences of words, w =(w 1,w 2,...,w`), given some conditioning context, x. As with

More information

Deep Learning and Lexical, Syntactic and Semantic Analysis. Wanxiang Che and Yue Zhang

Deep Learning and Lexical, Syntactic and Semantic Analysis. Wanxiang Che and Yue Zhang Deep Learning and Lexical, Syntactic and Semantic Analysis Wanxiang Che and Yue Zhang 2016-10 Part 2: Introduction to Deep Learning Part 2.1: Deep Learning Background What is Machine Learning? From Data

More information

Edinburgh Research Explorer

Edinburgh Research Explorer Edinburgh Research Explorer Nematus: a Toolkit for Neural Machine Translation Citation for published version: Sennrich R Firat O Cho K Birch-Mayne A Haddow B Hitschler J Junczys-Dowmunt M Läubli S Miceli

More information

Convolutional Neural Networks

Convolutional Neural Networks Convolutional Neural Networks Books» http://www.deeplearningbook.org/ Books http://neuralnetworksanddeeplearning.com/.org/ reviews» http://www.deeplearningbook.org/contents/linear_algebra.html» http://www.deeplearningbook.org/contents/prob.html»

More information

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture

Neural Networks for Machine Learning. Lecture 2a An overview of the main types of neural network architecture Neural Networks for Machine Learning Lecture 2a An overview of the main types of neural network architecture Geoffrey Hinton with Nitish Srivastava Kevin Swersky Feed-forward neural networks These are

More information

Towards Universal Sentence Embeddings

Towards Universal Sentence Embeddings Towards Universal Sentence Embeddings Towards Universal Paraphrastic Sentence Embeddings J. Wieting, M. Bansal, K. Gimpel and K. Livescu, ICLR 2016 A Simple But Tough-To-Beat Baseline For Sentence Embeddings

More information

Implicitly-Defined Neural Networks for Sequence Labeling

Implicitly-Defined Neural Networks for Sequence Labeling Implicitly-Defined Neural Networks for Sequence Labeling Michaeel Kazi MIT Lincoln Laboratory 244 Wood St, Lexington, MA, 02420, USA michaeel.kazi@ll.mit.edu Abstract We relax the causality assumption

More information

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35

Neural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35 Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How

More information

APPLIED DEEP LEARNING PROF ALEXIEI DINGLI

APPLIED DEEP LEARNING PROF ALEXIEI DINGLI APPLIED DEEP LEARNING PROF ALEXIEI DINGLI TECH NEWS TECH NEWS HOW TO DO IT? TECH NEWS APPLICATIONS TECH NEWS TECH NEWS NEURAL NETWORKS Interconnected set of nodes and edges Designed to perform complex

More information

Lecture 15: Exploding and Vanishing Gradients

Lecture 15: Exploding and Vanishing Gradients Lecture 15: Exploding and Vanishing Gradients Roger Grosse 1 Introduction Last lecture, we introduced RNNs and saw how to derive the gradients using backprop through time. In principle, this lets us train

More information

Bayesian Networks (Part I)

Bayesian Networks (Part I) 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Bayesian Networks (Part I) Graphical Model Readings: Murphy 10 10.2.1 Bishop 8.1,

More information

Structured Neural Networks (I)

Structured Neural Networks (I) Structured Neural Networks (I) CS 690N, Spring 208 Advanced Natural Language Processing http://peoplecsumassedu/~brenocon/anlp208/ Brendan O Connor College of Information and Computer Sciences University

More information

Lecture 6: Neural Networks for Representing Word Meaning

Lecture 6: Neural Networks for Representing Word Meaning Lecture 6: Neural Networks for Representing Word Meaning Mirella Lapata School of Informatics University of Edinburgh mlap@inf.ed.ac.uk February 7, 2017 1 / 28 Logistic Regression Input is a feature vector,

More information

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS

CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting

More information

CS224N: Natural Language Processing with Deep Learning Winter 2018 Midterm Exam

CS224N: Natural Language Processing with Deep Learning Winter 2018 Midterm Exam CS224N: Natural Language Processing with Deep Learning Winter 2018 Midterm Exam This examination consists of 17 printed sides, 5 questions, and 100 points. The exam accounts for 20% of your total grade.

More information

Feature Design. Feature Design. Feature Design. & Deep Learning

Feature Design. Feature Design. Feature Design. & Deep Learning Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately

More information

Neural Networks Language Models

Neural Networks Language Models Neural Networks Language Models Philipp Koehn 10 October 2017 N-Gram Backoff Language Model 1 Previously, we approximated... by applying the chain rule p(w ) = p(w 1, w 2,..., w n ) p(w ) = i p(w i w 1,...,

More information

NEURAL LANGUAGE MODELS

NEURAL LANGUAGE MODELS COMP90042 LECTURE 14 NEURAL LANGUAGE MODELS LANGUAGE MODELS Assign a probability to a sequence of words Framed as sliding a window over the sentence, predicting each word from finite context to left E.g.,

More information

Recurrent Neural Networks

Recurrent Neural Networks Charu C. Aggarwal IBM T J Watson Research Center Yorktown Heights, NY Recurrent Neural Networks Neural Networks and Deep Learning, Springer, 218 Chapter 7.1 7.2 The Challenges of Processing Sequences Conventional

More information

Multi-Source Neural Translation

Multi-Source Neural Translation Multi-Source Neural Translation Barret Zoph and Kevin Knight Information Sciences Institute Department of Computer Science University of Southern California {zoph,knight}@isi.edu Abstract We build a multi-source

More information

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition

Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition Jianshu Zhang, Jun Du and Lirong Dai National Engineering Laboratory for Speech and Language Information Processing

More information

arxiv: v1 [stat.ml] 18 Nov 2017

arxiv: v1 [stat.ml] 18 Nov 2017 MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks arxiv:1711.06788v1 [stat.ml] 18 Nov 2017 Minmin Chen Google Mountain view, CA 94043 minminc@google.com Abstract We introduce

More information

Tackling the Limits of Deep Learning for NLP

Tackling the Limits of Deep Learning for NLP Tackling the Limits of Deep Learning for NLP Richard Socher Salesforce Research Caiming Xiong, Romain Paulus, Stephen Merity, James Bradbury, Victor Zhong Einstein s Deep Learning Einstein Vision Einstein

More information