Advanced Cutting Edge Research Seminar. Dialogue System with Deep Neural Networks


1 Advanced Cutting Edge Research Seminar: Dialogue System with Deep Neural Networks
Assistant Professor Koichiro Yoshino
Nara Institute of Science and Technology, Augmented Human Communication Laboratory
PRESTO, Japan Science and Technology Agency

2 Course works
1. Basis of spoken dialogue systems
   - Types and modules of spoken dialogue systems
2. Deep learning for spoken dialogue systems
   - Basis of deep learning (deep neural networks)
   - Recent approaches of deep learning for spoken dialogue systems
3. Dialogue management using reinforcement learning
   - Basis of reinforcement learning
   - Statistical dialogue management using an intention dependency graph
4. Dialogue management using deep reinforcement learning
   - Implementation of a deep Q-network for dialogue management

3 Basis of deep neural networks
- Perceptron: a simple binary classifier
- Multi-layer perceptron: a combination of binary classifiers
- Deep neural networks
- Ways to apply deep neural networks: what kind of problem can be solved with a DNN?

4 Simple perceptron
- The simplest unit of neural networks: it takes several inputs x and produces one output y.
- A perceptron makes a single decision given several inputs:
  y = sign( Σ_i w_i φ_i(x) + b )
  where y is the binary output (+1 or -1), w_i is the weight for feature φ_i, φ_i is a feature function on the input x, and b is the bias.
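A minimal sketch of this decision rule in Python (the bag-of-words features and the weights below are illustrative assumptions, not values from the slides):

    import numpy as np

    def perceptron_decide(x, w, b):
        """Single perceptron: y = sign(sum_i w_i * phi_i(x) + b)."""
        score = np.dot(w, x) + b
        return 1 if score >= 0 else -1

    # Illustrative features: phi(x) = [count of "good"/"cool", count of "bad"]
    w = np.array([1.0, -1.0])   # weight per feature
    b = 0.0                     # bias
    print(perceptron_decide(np.array([1, 0]), w, b))  # "this room is good" -> +1
    print(perceptron_decide(np.array([0, 1]), w, b))  # "this room is bad"  -> -1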

5 Properties of the simple perceptron
- Problems the perceptron can solve: linearly separable binary classification.
  Positive: "This room is good", "This building is cool"; Negative: "This room is bad"
  (features such as good/cool vs. bad separate the two classes with a single line)
- Problems the perceptron cannot solve: non-linear problems.
  Positive: "very good", "not bad"; Negative: "very bad", "not good"
  (no single line over the features very, not, good, bad separates these examples)

6 Solutions for non-linear problems
- Use several classifiers: the multi-layer perceptron (MLP), a feed-forward network.
- The classifiers of the 1st layer take the same input but learn different weights.
- If we have two linear separating planes, we can classify the examples of the non-linear problem (x_1: very good, x_2: very bad, x_3: not bad, x_4: not good).

7 Multi-layer perceptron (MLP)
- 1st layer
  - Input: the same as the single perceptron
  - Output: features for the decision of the 2nd layer
  - Each perceptron may learn a mapping from the input feature space to a new feature space (φ_1(x), φ_2(x)); kernel methods do a similar thing
- 2nd layer
  - Input: the features that come from the 1st layer
  - Output: the classification result
- In the new feature space (φ_1(x), φ_2(x)), the examples x_1, ..., x_4 become linearly separable into positive and negative regions.
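A minimal sketch of such a two-layer network on the not/very example above; the weights are hand-picked purely for illustration, not learned:

    import numpy as np

    def sign(v):
        return np.where(v >= 0, 1, -1)

    # Input features phi(x) = [has "not", has "bad"]; weights are hand-picked, not learned.
    W1 = np.array([[ 1.0, -1.0],    # hidden unit 1: fires for "not" without "bad"
                   [-1.0,  1.0]])   # hidden unit 2: fires for "bad" without "not"
    b1 = np.array([-0.5, -0.5])
    w2 = np.array([-1.0, -1.0])     # 2nd layer: positive only if neither hidden unit fires
    b2 = -1.0

    def mlp(x):
        h = sign(W1 @ x + b1)        # 1st layer: map to the new feature space
        return sign(w2 @ h + b2)     # 2nd layer: final decision

    for feats, text in [([0, 0], "very good"), ([1, 1], "not bad"),
                        ([0, 1], "very bad"), ([1, 0], "not good")]:
        print(text, mlp(np.array(feats)))   # +1, +1, -1, -1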

8 Deep neural networks
- A deep neural network (DNN) is a deeply layered multi-layer perceptron: input x_1, ..., x_4, several hidden layers h^1, h^2, h^3, and output y.
- It can learn the mapping between X and y (y = f(X)) even if the mapping is complex.
- The restricted Boltzmann machine (RBM) was a key technique to train the model: pre-train the mapping of each layer (X and H^1, H^1 and H^2, ...) and then fine-tune the entire network with backpropagation.

9 Variation of neural networks: recurrent neural network (RNN)
- A recurrent neural network is a neural network that has a recursion:
  h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h)
  y^p_t = softmax(W_hy h_t + b_y)
  where tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}) and softmax(z)_i = e^{z_i} / Σ_k e^{z_k}.
- This structure works well for sequential input (x_1, x_2, ..., x_t), where t is a time step.
- The hidden state h_{t-1} passed into time step t acts as a memory of the previous inputs.
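A minimal numpy sketch of this recurrence (the dimensions and random inputs are illustrative):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    d_x, d_h, d_y = 4, 8, 3          # illustrative sizes
    rng = np.random.default_rng(0)
    W_xh, W_hh = rng.normal(size=(d_h, d_x)), rng.normal(size=(d_h, d_h))
    W_hy, b_h, b_y = rng.normal(size=(d_y, d_h)), np.zeros(d_h), np.zeros(d_y)

    h = np.zeros(d_h)                         # h_0
    for x_t in rng.normal(size=(5, d_x)):     # sequence x_1 ... x_5
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)   # h_t keeps a memory of the past
        y_t = softmax(W_hy @ h + b_y)              # output distribution at step t
        print(y_t)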

10 Variation of neural networks: convolutional neural network (CNN)
- Pipeline: convolution → activation → max-pooling → full connection and softmax.
- CNNs are a state-of-the-art algorithm for classification.
  c_{i,j} = Σ_{s=0}^{m-1} Σ_{t=0}^{n-1} w_{s,t} x_{i+s,j+t} + b_c
  a_{i,j} = tanh(c_{i,j}),  p_{i,j} = max-pooling over a_{i,j}
  o = tanh(W_po p + b_o),  y = softmax(W_oy o + b_y)
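A minimal sketch of a single-filter convolution, tanh activation, and max-pooling (the input size and filter are illustrative; a real CNN uses many filters and a full softmax layer, as in the formulas above):

    import numpy as np

    def conv2d(x, w, b):
        m, n = w.shape
        H, W = x.shape
        c = np.zeros((H - m + 1, W - n + 1))
        for i in range(c.shape[0]):
            for j in range(c.shape[1]):
                # c_{i,j} = sum_{s,t} w_{s,t} * x_{i+s, j+t} + b
                c[i, j] = np.sum(w * x[i:i+m, j:j+n]) + b
        return c

    x = np.random.default_rng(1).normal(size=(6, 5))   # e.g. sentence length x embedding dim
    w, b = np.ones((3, 5)) / 15.0, 0.0                  # one 3x5 filter
    a = np.tanh(conv2d(x, w, b))                        # activation
    p = a.max()                                         # max-pooling over the feature map
    print(p)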

11 Ways to apply deep learning
- Deep learning can learn the mapping between X and y if we have large-scale aligned data:
  - speech sounds and their phonemes
  - transcribed utterances and dialogue states
  - beliefs and action-value functions
  - states and actions
  - actions and utterances
- Of course it is not so simple, but successful deep learning work solves mapping problems that were hard to solve with existing frameworks.

12 Tasks of spoken dialogue systems
- Speech recognition: "I'd like to take Kintetsu-line from Ikoma stat."
- Spoken language understanding (SLU): $FROM=Ikoma, $LINE=Kintetsu
- Dialogue state tracking: $FROM=Ikoma, $TO_GO=???, $LINE=Kintetsu
- Action decision (DM, using the model and knowledge base): 1 ask $TO_GO, 2 inform $NEXT
- Language generation (LG): "Where will you go?"
- End-to-end dialogue

13 Speech recognition with DNN in the early stage
- Conventional ASR architecture:
  argmax_W P(W|X) = argmax_W P(X|W) P(W)
  where W is the word sequence, X is the speech signal, P(X|W) is the acoustic model, and P(W) is the language model.
- The acoustic model can be a GMM-HMM or a DNN-HMM (phoneme states over acoustic frames x_1, x_2, x_3).

14 Speech recognition with DNN in the early stage
- DNN-HMM simply replaces the generative probability of the GMM for a phoneme with a discriminative probability that classifies the phoneme from the speech frame.
- The rest of the architecture (HMM-based phoneme sequence selection and n-gram language modeling) stayed the same, yet it reduced speech recognition errors by 20-30%.

15 Language model and recurrent neural network
- A language model calculates the likelihood of a word sequence W:
  P(W) = P(w_1) P(w_2 | w_1) ... P(w_n | w_1, ..., w_{n-1})
- Existing language models are N-gram models that approximate the history with the previous words:
  P(W) ≈ Π_i P(w_i | w_{i-N+1}, ..., w_{i-1})
- The same problem can be solved with an RNN:
  h_t = tanh(W_wh w_{t-1} + W_hh h_{t-1} + b_h)
  w_t = softmax(W_hy h_t + b_y)
- The RNN (and its successors) became the state-of-the-art language model.

16 End-to-end speech recognition system
- Early DNN-based speech recognition systems just replaced some modules with deep neural networks, but recent research tries to train the whole model argmax_W P(W|X) end-to-end, including the pre-processing of ASR.
- Ochiai et al., "Multichannel End-to-end Speech Recognition." In Proc. ICML. (Figure cited from the paper.)

17 Problems of spoken language understanding (SLU) and dialogue state tracking (DST)
- SLU: convert the user utterance into machine-readable expressions.
  "I want to take Kintetsu from Ikoma" → Train_info{$FROM=Ikoma, $LINE=Kintetsu}
- DM: decide the next system action from the SLU result and the dialogue history.
  SLU result: Train_info{$FROM=Ikoma, $LINE=Kintetsu}
  History: $FROM=???, $TO_GO=Namba, $LINE=???
  Dialogue state: Train_info{$FROM=Ikoma, $TO_GO=Namba, $LINE=Kintetsu}
  Action decision: 1 inform $NEXT_TRAIN, 2 ask $TO_GO

18 Simple classification for SLU
- A multichannel CNN over Chinese word, Chinese character, and (translated) English word inputs.
- "A Multichannel Convolutional Neural Network for Cross-language Dialog State Tracking." Shi et al., In Proc. IEEE SLT 2016.

19 CNN for classification
- A CNN requires a fixed-size matrix as input, so two techniques are used:
  - Word embedding converts each word (e.g. "He", "doesn't", "have", ..., "artificial", "intelligence", "class") into a fixed-length meaning vector.
  - 0-padding sets the height of the matrix to the maximum sentence length in the training data and fills with 0 if the sentence is shorter than the maximum.
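A minimal sketch of these two preprocessing steps (the vocabulary, embedding table, and maximum length below are illustrative assumptions):

    import numpy as np

    emb_dim, max_len = 4, 7
    vocab = {"he": 0, "doesn't": 1, "have": 2, "artificial": 3, "intelligence": 4, "class": 5}
    E = np.random.default_rng(2).normal(size=(len(vocab), emb_dim))   # word embedding table

    def to_matrix(words):
        # Embedding: each word becomes a fixed-length vector.
        vecs = [E[vocab[w]] for w in words]
        # 0-padding: pad up to the maximum sentence length with zero vectors.
        while len(vecs) < max_len:
            vecs.append(np.zeros(emb_dim))
        return np.stack(vecs)          # fixed-size (max_len x emb_dim) matrix for the CNN

    print(to_matrix(["he", "doesn't", "have"]).shape)   # (7, 4)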

20 Problem definition of SLU and DST
- SLU: find a dialogue frame F given the words of the utterance W:
  argmax_F P(F | W)
  There are many works on this tagging problem: slot filling, domain/intent classification, dialogue act classification, ...
- DST: find a dialogue state S given the sequence of frames F_{1:t}:
  argmax_S P(S | F_{1:t})
- It can be solved as a joint problem:
  P(S | W_{1:t}) = P(S | F_{1:t}) P(F_{1:t} | W_{1:t})
- Can both be solved with the same (sequential) model? RNNs and long short-term memory networks (LSTMs).

21 RNN-based dialogue state tracking
- "Word-Based Dialog State Tracking with Recurrent Neural Networks." Henderson et al., In Proc. SIGDIAL.

22 LSTM-based dialogue state tracking
- Input: the user utterance ("Is there any activity in Singapore?") as a word sequence through word embeddings, plus other features, fed to an LSTM over turns 1..T.
- Output dialogue state: Task: activity{ Area: Singapore, Price range: - }
- "Dialogue State Tracking using Long Short Term Memory Neural Networks." Yoshino et al., In Proc. IWSDS.

23 Relation between belief update and the RNN-based dialogue state tracker
- RNN: output a dialogue state given the sequence of words (utterances).
  h_t = tanh(W_Xh X_t + W_hh h_{t-1} + b_h)
  X_t is the sequence of words at time t (the observation); the dialogue history is propagated through the hidden layer h_{t-1} (the state transition).
  y^p_t = softmax(W_hy h_t + b_y)
  The output is the dialogue state y^p_t with the highest probability (the belief).
- Belief update:
  b_t(s^j) ∝ P(o_t | s^j_t) Σ_{s^i} P(s^j_t | s^i_{t-1}) b_{t-1}(s^i)
  i.e. observation likelihood × state transition × previous belief.
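A minimal sketch of the classical belief update that the RNN tracker approximates (the transition table and observation likelihoods are illustrative):

    import numpy as np

    def belief_update(b_prev, T, obs_lik):
        """b_t(s_j) ∝ P(o_t|s_j) * sum_i P(s_j|s_i) * b_{t-1}(s_i)."""
        b = obs_lik * (T.T @ b_prev)   # predict with the transition model, weight by the observation
        return b / b.sum()             # normalize to a distribution over states

    b_prev = np.array([0.7, 0.2, 0.1])          # previous belief over 3 dialogue states
    T = np.array([[0.8, 0.1, 0.1],              # T[i, j] = P(s_j | s_i)
                  [0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8]])
    obs_lik = np.array([0.1, 0.8, 0.1])         # P(o_t | s_j) from the SLU result
    print(belief_update(b_prev, T, obs_lik))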

24 Problem of action decision
- Decide the action a_t given the belief b_t over dialogue states.
  User: "I'd like to go to Namba with Kintetsu"
  Belief over states s_1, ..., s_4, e.g. Train_info{$FROM=Ikoma, $TO_GO=Namba, $LINE=Kintetsu}, Train_info{$FROM=Ikoma, $LINE=Kintetsu}, Train_info{$TO_GO=Namba, $LINE=Kintetsu}, ...
  Action candidates a_t: 1 inform $NEXT_TRAIN, 2 ask $TO_GO
- There are two ways:
  - Find the best policy (policy gradient)
  - Find the best Q-function (Q-network)

25 What is a good action?
- Maximize the expected future reward (value function):
  V*(s_t) = max_π V^π(s_t) = max_a Σ_{s_{t+1}} P(s_{t+1} | s_t, a) [ R(s_t, a, s_{t+1}) + γ V^π(s_{t+1}) ]
  Q^π(s_t, a_t) = Σ_{s_{t+1}} P(s_{t+1} | s_t, a_t) [ R(s_t, a_t, s_{t+1}) + γ max_{a_{t+1}} Q^π(s_{t+1}, a_{t+1}) ]
- Policy gradient directly optimizes the score V^π(s_t).
- A Q-network calculates Q^π(s, a) for each action, following the sampling manner of Q-learning.
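A minimal sketch of the Bellman backup behind these definitions, applied to a tiny MDP with known (randomly generated, purely illustrative) transition and reward tables:

    import numpy as np

    n_s, n_a, gamma = 3, 2, 0.9
    rng = np.random.default_rng(3)
    P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a, s'] = P(s'|s, a)
    R = rng.normal(size=(n_s, n_a, n_s))               # R[s, a, s']

    V = np.zeros(n_s)
    for _ in range(100):                                # value iteration
        # Q(s,a) = sum_s' P(s'|s,a) [ R(s,a,s') + gamma * V(s') ];  V(s) = max_a Q(s,a)
        Q = np.einsum('ijk,ijk->ij', P, R + gamma * V[None, None, :])
        V = Q.max(axis=1)
    print(V, Q.argmax(axis=1))   # converged values and greedy actions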

26 Policy gradient
- In policy gradient the policy is not deterministic: π(s, a) is the probability of selecting action a given state s.
  J(θ) = V^{π_θ}(s)
  ∇_θ J(θ) = E_{π_θ}[ ∇_θ log π_θ(s, a) Q^{π_θ}(s, a) ]
- If we learn the parameters θ of the policy π_θ by maximizing J(θ) on existing data, we obtain a policy that maximizes the reward for the existing data sequences.
- We can use deep learning for this parameter learning.
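A minimal sketch of one REINFORCE-style gradient step for a softmax policy with a linear score; the sampled return G stands in for Q^π(s, a), and all values are illustrative:

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    n_a, d_s, lr = 3, 4, 0.1
    theta = np.zeros((n_a, d_s))                 # policy parameters

    def grad_log_pi(theta, s, a):
        pi = softmax(theta @ s)                  # pi_theta(.|s)
        g = -np.outer(pi, s)                     # d/dtheta of log pi_theta(a|s)
        g[a] += s
        return g

    # One sampled (state, action, return) tuple; in practice these come from dialogues.
    s, a, G = np.ones(d_s), 1, 2.0
    theta += lr * grad_log_pi(theta, s, a) * G   # grad J = E[ grad log pi(s,a) * Q(s,a) ]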

27 Q-learning
- Premise: if we can calculate Q(s, a) for every pair, we should decide the action a by max_a Q(s, a).
- Problem: we do not know P(s_{t+1} | s_t, a_t), which is needed to calculate Q(s, a).
- Solution: approximate P(s_{t+1} | s_t, a_t) with sampling and update
  Q(s_t, a_t) ← (1 - α) Q(s_t, a_t) + α [ R(s_t, a_t, s_{t+1}) + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) ]
- The reward is back-propagated from the end of the sampled episode.
- We will build a dialogue manager with this algorithm in the next class.
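A minimal tabular sketch of this update rule; the toy environment step function is an illustrative stand-in, not the dialogue manager of the next class:

    import numpy as np

    n_s, n_a, alpha, gamma, eps = 4, 2, 0.1, 0.9, 0.1
    Q = np.zeros((n_s, n_a))
    rng = np.random.default_rng(4)

    def env_step(s, a):
        """Toy stand-in for the environment: returns (next state, reward, done)."""
        s_next = (s + a + 1) % n_s
        return s_next, (1.0 if s_next == 0 else 0.0), s_next == 0

    for _ in range(500):                         # sampled episodes
        s, done = int(rng.integers(n_s)), False
        while not done:
            a = int(rng.integers(n_a)) if rng.random() < eps else int(Q[s].argmax())
            s_next, r, done = env_step(s, a)
            # Q(s,a) <- (1-alpha) Q(s,a) + alpha [ r + gamma * max_a' Q(s',a') ]
            target = r + (0.0 if done else gamma * Q[s_next].max())
            Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
            s = s_next
    print(Q)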

28 Q-network
- Idea: if we can regress the Q-value at each sampling step, learning becomes efficient.
  L(θ_i) = E_{s,a,r,s'}[ (y - Q(s, a))^2 ]
  y = R(s_t, a_t, s_{t+1}) + γ max_{a_{t+1}} Q(s_{t+1}, a_{t+1})
- Regression, i.e. training the mapping between y and Q(s, a)? Deep learning! (deep Q-network; DQN)
- In dialogue, the input is the belief vector b (e.g. s = s_1: 0.0, s = s_2: 0.9, ..., s = s_n: 0.0), passed through tanh layers to output Q(b, a).
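A minimal sketch of the regression target and squared loss on one sampled transition, using a linear Q-function over the belief vector for brevity (a real DQN would use a multi-layer tanh network and a replay buffer; all values are illustrative):

    import numpy as np

    n_b, n_a, gamma = 5, 3, 0.9                 # belief dimension, number of actions
    rng = np.random.default_rng(5)
    theta = rng.normal(scale=0.1, size=(n_a, n_b))

    def Q(b):                                   # Q(b, .) for all actions
        return theta @ b

    # One sampled transition (b, a, r, b'); in practice drawn from interaction data.
    b, a, r, b_next = rng.random(n_b), 1, 1.0, rng.random(n_b)
    y = r + gamma * Q(b_next).max()             # y = R + gamma * max_a' Q(b', a')
    loss = (y - Q(b)[a]) ** 2                   # L(theta) = (y - Q(b, a))^2
    grad = -2 * (y - Q(b)[a]) * b               # gradient w.r.t. theta[a], y held fixed
    theta[a] -= 0.01 * grad                     # one SGD step
    print(loss)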

29 Joint learning of dialogue state tracking and action decision with deep learning
- "Towards End-to-End Learning for Dialog State Tracking and Management using Deep Reinforcement Learning." Zhao et al., In Proc. SIGDIAL, 2016.
- LSTM-based DST results are used as the input of a DQN:
  - LSTM: calculates b_t from the observations.
  - DQN: optimizes Q(b_t, a_{t+1}; θ) by regression.
- The entire network is then fine-tuned.

30 Language generation
- Generate a sentence given a system action: Ask $TO_GO → "Where will you go?"
- Conventional approach: statistical template-based generation; it is still weak against out-of-vocabulary (and out-of-template) inputs.
- Example of such generation: procedural text from flow graphs ("Generation of procedural text from flow graphs," IPSJ Journal), e.g. "Heat/Ac the pot/T with oil/F. Add/Ac celery/F, green onions/F, and garlic/F. Stir-fry/Ac for about one minute/D."

31 RNN language model for generation
- An RNN can generate a sequence of words by using each generated word as its next input (decoder model):
  h_t = tanh(W_wh w_{t-1} + W_hh h_{t-1} + b_h)
  w_t = softmax(W_hy h_t + b_y)
- Example: inputs x_1 = "He", x_2 = "doesn't", x_3 = "have", ..., x_7 = "in" yield predictions y^p_1 = "doesn't", y^p_2 = "have", y^p_3 = "very", ..., y^p_8 = "himself".
- The dimension of the output layer equals the vocabulary size.
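A minimal sketch of this decoding loop: the arg-max word at each step is fed back as the next input. The weights and vocabulary are random/illustrative, so the output is meaningless; the snippet only shows the mechanics:

    import numpy as np

    vocab = ["<EOS>", "rest", "room", "is", "next", "to", "the", "entrance", "."]
    d_h, V = 16, len(vocab)
    rng = np.random.default_rng(6)
    E = rng.normal(size=(V, d_h))                           # input word embeddings
    W_wh, W_hh = rng.normal(size=(d_h, d_h)), rng.normal(size=(d_h, d_h))
    W_hy, b_h, b_y = rng.normal(size=(V, d_h)), np.zeros(d_h), np.zeros(V)

    h, w = np.zeros(d_h), 0                                 # start from <EOS>
    out = []
    for _ in range(10):                                     # cap the output length
        h = np.tanh(W_wh @ E[w] + W_hh @ h + b_h)           # h_t from previous word and state
        w = int((W_hy @ h + b_y).argmax())                  # w_t = argmax softmax(...)
        if vocab[w] == "<EOS>":
            break
        out.append(vocab[w])
    print(" ".join(out))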

32 Decoder with condition: semantically conditioned LSTM
- The recurrent hidden layer and word embeddings control how to say it (the language-model part).
- A 1-hot vector of the dialogue act and slot values controls what to say (the contents).
- "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems." Wen et al., In Proc. EMNLP.

33 Semantically conditioned LSTM: decoding results for dialogue systems

34 Context-aware NLG
- Sequence-to-sequence modeling of generation.
- Changes the response according to the dialogue context.
- Dusek et al., "A Context-aware Natural Language Generator for Dialogue Systems." In Proc. SIGDIAL.

35 QA style
- The usual pipeline: "I'd like to take Kintetsu-line from Ikoma stat." → SLU ($FROM=Ikoma, $LINE=Kintetsu) → dialogue state ($FROM=Ikoma, $TO_GO=???, $LINE=Kintetsu) → DM with model and knowledge base (1 ask $TO_GO, 2 inform $NEXT) → LG ("Where will you go?").
- QA style skips the management (example-based): the system maps the input directly to a response.

36 Encoder-decoder
- We can combine the encoder and decoder ideas to build a neural network that remembers the input sentence and outputs a response sentence.
- The RNN may remember not only the words but also the order of the words.
- Example: encode "where is the rest room EOS" and decode a reply such as "rest room is next to the entrance. EOS".
- Vinyals, Oriol, and Quoc Le. "A Neural Conversational Model." arXiv preprint, 2015.

37 Attention model
- Gives the decoder an attentional point in the input to be decoded: at each step it decides whether to let each input position through or not (a soft selection via softmax).
- Example: decoding "next to the entrance. EOS" while attending over the encoded input "where is the rest room EOS".

38 Attention model
- The attention weight between encoder position j and decoder step t:
  a_{t,j} = attention_score(h^e_j, h^d_t)
  followed by a softmax over j; this decides how much of each input position to let through.
- Example: generating "next to the entrance. EOS" from the input "where is the rest room EOS".
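A minimal sketch of attention over the encoder states at one decoder step; the dot-product scoring function, dimensions, and random states are illustrative assumptions (the papers use learned scoring functions):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    d_h, T_enc = 8, 6                                  # hidden size, encoder length
    rng = np.random.default_rng(7)
    h_enc = rng.normal(size=(T_enc, d_h))              # h^e_1 ... h^e_T ("where is the rest room EOS")
    h_dec = rng.normal(size=d_h)                       # h^d_t, current decoder state

    scores = h_enc @ h_dec                             # a_{t,j} = attention_score(h^e_j, h^d_t)
    alpha = softmax(scores)                            # normalized attention weights over the input
    context = alpha @ h_enc                            # weighted sum fed to the decoder output
    print(alpha, context.shape)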

39 Chit-chat
- One typical end-to-end modeling of dialogue.
- Serban et al., "Building End-to-End Dialogue Systems Using Generative Hierarchical Neural Network Models." In Proc. AAAI.

40 Memory networks for dialogue systems
- The task is goal-oriented (API calls), but the system is built end-to-end on a memory network.
- The problem was solved perfectly in DSTC6 Track 1.

41 If you are interested in more recent works
- "Deep Learning for Dialogue Systems": a tutorial by Yun-Nung Chen, Asli Celikyilmaz, and Dilek Hakkani-Tur.

42 Summary
- Deep learning has been applied to several tasks of spoken dialogue systems in recent years: speech recognition, understanding, state tracking, action decision, generation, and end-to-end modeling.
- How do we apply deep learning to dialogue (in development)?
  - Clarify your problem to set up the input and output.
  - Find similar systems in existing work from recent (2-3 years) conferences (SIGDIAL, NAACL, ACL, EMNLP, COLING, AAAI, IJCAI, ...).
  - Can you prepare enough paired data of the input and the output?
- How do we apply deep learning to dialogue (in research)?
  - Find a mapping problem that requires high-dimensional or non-linear modeling.
  - Consider the properties of your input and output, even if they cover only part of your problem.
  - Can you prepare enough paired data of the input and the output?

43 Next contents (1/30): dialogue management with Q-learning
- We will see the detailed algorithm and an implementation of a dialogue manager with Q-learning.
- We will discuss the user simulator.
