Sequence Modeling with Neural Networks
Slide 1: Sequence Modeling with Neural Networks. Harini Suresh. [diagram: an unrolled RNN with inputs x_0, x_1, x_2, states s_0, s_1, s_2, and outputs y_0, y_1, y_2]
Slide 2: What is a sequence? "This morning I took the dog for a walk." Other examples: medical signals, a speech waveform.
Slide 3: Successes of deep models: machine translation (left) and question answering (right). [figure: example screenshots]
Slide 4: Successes of deep models (continued). [figure]
Slide 5: A sequence modeling problem: predict the next word.
Slides 6-8: A sequence modeling problem. "This morning I took the dog for a walk." Given these words, predict what comes next.
Slides 9-10: Idea: use a fixed window. "This morning I took the dog for a walk." Given these 2 words ("for a"), predict the next word: the window is encoded as one-hot feature vectors indicating what each word is, and that fixed-size vector is fed to the model to make the prediction.
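A minimal sketch of the fixed-window encoding in numpy; the tiny vocabulary and the window size are made up for illustration:

```python
import numpy as np

# Toy vocabulary for illustration only.
vocab = ["this", "morning", "i", "took", "the", "dog", "for", "a", "walk"]
word_to_idx = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a one-hot vector indicating which vocabulary word this is."""
    v = np.zeros(len(vocab))
    v[word_to_idx[word]] = 1.0
    return v

# Fixed window of 2 words: concatenate their one-hot vectors into one feature vector.
window = ["for", "a"]
features = np.concatenate([one_hot(w) for w in window])
print(features.shape)  # (18,) -- 2 words x 9-word vocabulary
```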
Slide 11: Problem: we can't model long-term dependencies. "In France, I had a great time and I learnt some of the language." We need information from the far past and future to accurately guess the correct word.
Slide 12: Idea: use the entire sequence, as a set of counts (a bag of words). "This morning I took the dog for a" becomes a vector of word counts, which is fed to the model to make the prediction.
Slides 13-14: Problem: counts don't preserve order. "The food was good, not bad at all." vs. "The food was bad, not good at all."
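A small check of the ordering problem, using the slide's two sentences: they have opposite meanings but identical word counts, so a bag-of-words representation cannot tell them apart.

```python
from collections import Counter

a = "the food was good not bad at all".split()
b = "the food was bad not good at all".split()
# Same multiset of words, so the two bags of words are equal.
print(Counter(a) == Counter(b))  # True
```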
Slide 15: Idea: use a really big fixed window. "This morning I took the dog for a walk." Given these 7 words, predict the next word: the whole window ("morning I took the dog ...") is encoded as one long feature vector and fed to the model for the prediction.
Slides 16-18: Problem: no parameter sharing. Each input position in the window has its own separate parameters, so the phrase "this morning" is handled by different parameters depending on where it appears. Things we learn about the sequence won't transfer if they appear at different points in the sequence.
Slides 19-20: To model sequences, we need: 1. to deal with variable-length sequences; 2. to maintain sequence order; 3. to keep track of long-term dependencies; 4. to share parameters across the sequence. Let's turn to recurrent neural networks.
Slides 21-22: An example network with input, hidden, and output layers; let's take a look at one hidden unit. [diagram: feedforward network]
Slides 23-24: RNNs remember their previous state. At t = 0, the input x_0 ("it") produces the state s_0; at t = 1, the input x_1 ("was") and the previous state are used to compute s_1.
Slides 25-27: Unfolding the RNN across time. [diagram: states s_0, s_1, s_2, ... computed from inputs x_0, x_1, x_2, ...] Notice that we use the same parameters at every timestep, and that s_n can contain information from all past timesteps.
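A minimal sketch of the unrolled forward pass in numpy; the dimensions and weight names (W, U, V) are illustrative, but the key point from the slide is visible in the loop: the same parameters are reused at every timestep, and each state is computed from the previous one.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 4, 8, 4

W = rng.normal(size=(hidden_dim, hidden_dim))   # state-to-state weights
U = rng.normal(size=(hidden_dim, input_dim))    # input-to-state weights
V = rng.normal(size=(output_dim, hidden_dim))   # state-to-output weights

def forward(xs):
    """Run the RNN over a sequence of input vectors, returning states and outputs."""
    s = np.zeros(hidden_dim)
    states, outputs = [], []
    for x in xs:                      # same W, U, V at every step
        s = np.tanh(W @ s + U @ x)    # new state depends on old state and input
        states.append(s)
        outputs.append(V @ s)
    return states, outputs

xs = [rng.normal(size=input_dim) for _ in range(5)]  # a length-5 toy sequence
states, outputs = forward(xs)
print(len(states), states[-1].shape)  # 5 (8,)
```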
Slides 28-29: How do we train an RNN? Backpropagation (through time)!
Slide 30: Remember backpropagation: 1. take the derivative (gradient) of the loss with respect to each parameter; 2. shift the parameters in the opposite direction of the gradient in order to minimize the loss.
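A toy illustration of that update rule, with a made-up loss and learning rate: each step moves the parameter a little in the direction opposite to its gradient.

```python
# Minimize the toy loss J(p) = (p - 3)^2 by gradient descent.
p = 0.0
learning_rate = 0.1
for _ in range(50):
    grad = 2 * (p - 3)            # dJ/dp
    p = p - learning_rate * grad  # shift parameter opposite to the gradient
print(round(p, 3))  # close to 3, the minimizer
```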
Slides 31-32: We have a loss at each timestep, since we're making a prediction at each timestep. [diagram: unrolled RNN with a loss J_0, J_1, J_2, ... attached to each output y_0, y_1, y_2, ..., which are produced from the states s_0, s_1, s_2, ... through the output weights V]
Slide 33: We sum the losses across time. The loss at time t is J_t(θ), where θ denotes our parameters (like the weights); the total loss is J(θ) = Σ_t J_t(θ).
Slide 34: What are our gradients? We sum the gradients across time: for each parameter P, ∂J/∂P = Σ_t ∂J_t/∂P.
Slides 35-43: Let's try it out for the recurrent weight matrix W with the chain rule. Take a single timestep t, say t = 2: the chain rule gives ∂J_2/∂W = (∂J_2/∂y_2)(∂y_2/∂s_2)(∂s_2/∂W). But wait: s_2 is computed from s_1, and s_1 also depends on W, so we can't just treat it as a constant!
Slides 44-47: How does s_2 depend on W? Directly, through its own update, but also indirectly: s_2 is computed from s_1, which was computed from s_0, and both of those states were themselves computed using W.
Slides 48-49: Backpropagation through time: ∂J_t/∂W = Σ_{k=0}^{t} (∂J_t/∂y_t)(∂y_t/∂s_t)(∂s_t/∂s_k)(∂s_k/∂W), summing the contributions of W in all previous timesteps to the error at timestep t.
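A minimal sketch of backpropagation through time for the recurrent weights, assuming the simple update s_t = tanh(W s_{t-1} + U x_t), outputs y = V s_T, and a squared-error loss at the last timestep only; all shapes and data are toy values. The backward loop is where the sum over earlier timesteps happens: every step adds its own contribution to dW.

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, d_out, T = 3, 5, 2, 4

U = rng.normal(size=(d_h, d_in)) * 0.1
W = rng.normal(size=(d_h, d_h)) * 0.1
V = rng.normal(size=(d_out, d_h)) * 0.1
xs = [rng.normal(size=d_in) for _ in range(T)]
target = rng.normal(size=d_out)

# Forward pass: keep every state so we can reuse it going backwards.
states = [np.zeros(d_h)]
for x in xs:
    states.append(np.tanh(W @ states[-1] + U @ x))
y = V @ states[-1]
loss = 0.5 * np.sum((y - target) ** 2)

# Backward pass: the gradient w.r.t. W sums contributions from every timestep,
# because each earlier state s_k also depends on W.
dW = np.zeros_like(W)
ds = V.T @ (y - target)                  # dJ/ds_T
for t in range(T, 0, -1):
    dpre = ds * (1 - states[t] ** 2)     # back through the tanh at step t
    dW += np.outer(dpre, states[t - 1])  # this timestep's contribution to dJ/dW
    ds = W.T @ dpre                      # propagate the error to the previous state
print(loss, np.linalg.norm(dW))
```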
Slide 50: Why are RNNs hard to train?
Slides 51-59: Problem: the vanishing gradient. The factor ∂s_t/∂s_k expands into a product of single-step terms, ∂s_t/∂s_k = Π_{j=k+1}^{t} ∂s_j/∂s_{j-1}, so as the gap between timesteps gets bigger (e.g. k = 0 and a large t), this product gets longer and longer. What are each of these terms? Each ∂s_j/∂s_{j-1} involves the weights W, which are sampled from a standard normal distribution at initialization and are mostly < 1, and the derivative of the activation f, where f = tanh or sigmoid, so f' < 1. We're multiplying a lot of small numbers together.
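A small numeric illustration of that argument (the weight scale and the stand-in activation derivatives are made up): multiplying many factors that are each a bit smaller than 1 in norm makes the product collapse exponentially with the gap between timesteps.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
W = rng.normal(size=(d, d)) * 0.1       # small random weights, as at initialization

grad = np.eye(d)
for k in range(1, 31):
    # Each step multiplies by W^T and by an activation derivative (at most 1;
    # here a random stand-in between 0 and 1).
    act_deriv = np.diag(rng.uniform(0.0, 1.0, size=d))
    grad = grad @ (W.T @ act_deriv)
    if k % 10 == 0:
        print(k, np.linalg.norm(grad))   # the norm shrinks towards zero
```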
Slide 60: We're multiplying a lot of small numbers together. So what? Errors due to timesteps further back have increasingly smaller gradients. So what? The parameters become biased towards capturing shorter-term dependencies.
Slide 61: "In France, I had a great time and I learnt some of the language." Our parameters are not trained to capture long-term dependencies, so the word we predict will mostly depend on the previous few words, not on much earlier ones.
Slide 62: Solution #1: activation functions. The ReLU derivative is 1 for positive inputs, unlike the tanh and sigmoid derivatives, which are below 1, so it prevents f' from shrinking the gradients.
Slide 63: Solution #2: initialization. Weights initialized to the identity matrix and biases initialized to zeros prevent the recurrent weights from shrinking the gradients at the start of training.
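A minimal sketch of that initialization trick; the hidden size is illustrative. Starting the recurrent weights at the identity makes each ∂s_t/∂s_{t-1} close to 1 early in training.

```python
import numpy as np

hidden_dim = 8
W = np.eye(hidden_dim)    # recurrent weights initialized to the identity matrix
b = np.zeros(hidden_dim)  # biases initialized to zeros
```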
Slide 64: A different type of solution: more complex cells.
Slide 65: Solution #3: gated cells. Rather than each node being just a simple RNN cell, make each node a more complex unit with gates controlling what information is passed through (simple RNN vs. LSTM, GRU, etc.). Long short-term memory (LSTM) cells are able to keep track of information throughout many timesteps.
Slides 66-70: Solution #3: more on LSTMs. [diagram: LSTM cell with the cell state flowing from c_j to c_j+1] The cell: forgets irrelevant parts of the previous state; selectively updates the cell state values; and outputs certain parts of the cell state.
Slide 71: Solution #3: more on LSTMs. [diagram: LSTM cell showing the forget gate, input gate, and output gate acting on the cell state between c_j and c_j+1; from MIT 6.S191 Intro to Deep Learning, IAP 2017]
Slide 72: Why do LSTMs help? 1. The forget gate allows information to pass through unchanged. 2. The cell state is separate from what is outputted. 3. The cell state c_j depends on c_j-1 through addition, so the derivatives don't expand into a long product!
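A minimal sketch of a single LSTM step with forget, input, and output gates; the weight names and sizes are illustrative and bias terms are omitted for brevity. The key line is the additive cell-state update.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
d_in, d_h = 4, 6
Wf, Wi, Wo, Wc = (rng.normal(size=(d_h, d_in + d_h)) * 0.1 for _ in range(4))

def lstm_step(x, s_prev, c_prev):
    z = np.concatenate([x, s_prev])
    f = sigmoid(Wf @ z)           # forget gate: drop irrelevant parts of the old cell state
    i = sigmoid(Wi @ z)           # input gate: decide which values to update
    o = sigmoid(Wo @ z)           # output gate: decide what to expose
    c_tilde = np.tanh(Wc @ z)     # candidate values
    c = f * c_prev + i * c_tilde  # cell state updated by ADDITION, not a long product
    s = o * np.tanh(c)            # hidden state / output
    return s, c

s, c = np.zeros(d_h), np.zeros(d_h)
x = rng.normal(size=d_in)
s, c = lstm_step(x, s, c)
print(s.shape, c.shape)  # (6,) (6,)
```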
Slide 73: Possible task: classification (e.g. sentiment). :( :)
Slide 74: Possible task: classification (e.g. sentiment). [diagram: the RNN reads the inputs x_0 ... x_n (e.g. "don't fly ... luggage") into states s_0 ... s_n; the final state s_n is passed through V to produce y = "negative"] Here y is a probability distribution over the possible classes (like positive, negative, neutral), i.e. a softmax.
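A minimal sketch of that classification readout; the class names, shapes, and the final state here are illustrative. After the RNN has read the whole sequence, a single softmax over classes is applied to the last state.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

classes = ["negative", "neutral", "positive"]
hidden_dim = 8
rng = np.random.default_rng(4)
V = rng.normal(size=(len(classes), hidden_dim)) * 0.1

s_n = rng.normal(size=hidden_dim)  # stand-in for the final RNN state after reading the sentence
y = softmax(V @ s_n)               # probability distribution over classes
print(classes[int(np.argmax(y))], y.round(3))
```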
Slide 75: Possible task: music generation. [audio example] RNN music by Francesco Marchesani, Computer Science Engineer, PoliMi.
Slide 76: Possible task: music generation. [diagram: the RNN is fed <start>, E, D as inputs x_0, x_1, x_2 and produces outputs y_0, y_1, y_2 = E, D, F#] Each y_i is actually a probability distribution over the possible next notes, i.e. a softmax.
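A minimal sketch of the generation loop this diagram implies: at each step we get a softmax over possible next notes, sample one, and feed it back in as the next input. The note set is made up, and a random distribution stands in for the RNN's actual output y_t = softmax(V s_t).

```python
import numpy as np

rng = np.random.default_rng(5)
notes = ["C", "D", "E", "F#", "G", "A", "B"]

def next_note_distribution(prev_note):
    # Stand-in for the RNN's softmax output; ignores prev_note for illustration.
    logits = rng.normal(size=len(notes))
    e = np.exp(logits - logits.max())
    return e / e.sum()

melody = ["<start>"]
for _ in range(8):
    p = next_note_distribution(melody[-1])        # distribution over next notes
    melody.append(notes[rng.choice(len(notes), p=p)])  # sample and feed back in
print(" ".join(melody[1:]))
```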
Slides 77-78: Possible task: machine translation. [diagram: an encoder RNN reads "the dog eats" into states s_0, s_1, s_2; a decoder RNN then produces "le chien mange <end>", with each decoder step conditioned on the final encoder state s_2 together with the previously generated word (<go>, le, chien, mange)]
Slide 79: Problem: a single encoding is limiting. All the decoder knows about the input sentence is contained in one fixed-length vector, s_2. [same encoder-decoder diagram]
Slides 80-82: Solution: attend over all encoder states. [diagram: at each decoder step, the decoder is conditioned on a weighted combination s_* of all encoder states s_0, s_1, s_2, paired with the previously generated word (<go>, le, chien, mange)]
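A minimal sketch of attending over encoder states; the scoring function (a plain dot product) and all values are illustrative assumptions, not the specific attention variant used on the slide. Instead of handing the decoder one fixed vector, we score every encoder state against the current decoder state and take a weighted average.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(6)
d_h, src_len = 8, 3
encoder_states = rng.normal(size=(src_len, d_h))  # s_0, s_1, s_2 from the encoder
decoder_state = rng.normal(size=d_h)              # current decoder state

scores = encoder_states @ decoder_state           # one score per source position
weights = softmax(scores)                         # attention weights sum to 1
context = weights @ encoder_states                # weighted sum the decoder attends to
print(weights.round(3), context.shape)
```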
Slide 83: Now we can model sequences! We covered: why recurrent neural networks; training them with backpropagation through time; solving the vanishing gradient problem with activation functions, initialization, and gated cells (like LSTMs); building models for classification, music generation, and machine translation; and using attention mechanisms.
Slide 84: And there's lots more to do: extending our models to time series and waveforms; complex language models to generate long text or books; language models to generate code; controlling cars and robots; predicting stock market trends; summarizing books and articles; handwriting generation; multilingual translation models; and many more!