NEURAL LANGUAGE MODELS


1 COMP90042 LECTURE 14 NEURAL LANGUAGE MODELS

2 LANGUAGE MODELS Assign a probability to a sequence of words. Framed as sliding a window over the sentence, predicting each word from a finite context to the left. E.g., n = 3, a trigram model: P(w_1, w_2, ..., w_m) = ∏_{i=1}^{m} P(w_i | w_{i-2}, w_{i-1}). Training (estimation) is from frequency counts; difficulty with rare events motivates smoothing. But are there better solutions? Neural networks (deep learning) are a popular alternative.
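
As a minimal sketch of this count-based estimation (not from the lecture; the toy corpus and names are invented for illustration), a trigram model can be trained by counting and normalising:

```python
from collections import Counter

# Toy corpus; in practice this would be millions of sentences.
corpus = [["the", "cow", "eats", "grass"], ["the", "sheep", "chews", "grass"]]

trigram_counts, bigram_counts = Counter(), Counter()
for sent in corpus:
    padded = ["<s>", "<s>"] + sent + ["</s>"]
    for i in range(2, len(padded)):
        trigram_counts[tuple(padded[i-2:i+1])] += 1
        bigram_counts[tuple(padded[i-2:i])] += 1

def p(w, u, v):
    """Maximum-likelihood estimate of P(w | u, v); zero for unseen events,
    which is exactly the sparsity problem that smoothing addresses."""
    denom = bigram_counts[(u, v)]
    return trigram_counts[(u, v, w)] / denom if denom else 0.0

print(p("grass", "cow", "eats"))  # 1.0 on this toy corpus
```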

3 OUTLINE Neural network fundamentals Feed-forward neural language models Recurrent neural language models

4 LMS AS CLASSIFIERS LMs can be considered simple classifiers, e.g. a trigram model P(w_i | w_{i-2} = "cow", w_{i-1} = "eats") classifies the likely next word in a sequence. It has a parameter for every combination of w_{i-2}, w_{i-1}, w_i. Can think of this as a specific type of classifier, one with a very simple parameterisation.

5 BIGRAM LM IN PICTURES [Figure: a V x V matrix with rows indexed by w_{i-1} (e.g., eats) and columns by w_i (e.g., grass); the highlighted cell holds P(grass | eats).] Each word type is assigned an id number between 1..V (vocabulary size). The matrix stores the bigram LM parameters, i.e., the conditional probabilities, for a total of V^2 parameters.

6 INFERENCE AS MATRIX MULTIPLICATION Using the parameter matrix, we can query with simple linear algebra: represent eats as a 1-hot vector x; multiplying x by the parameter matrix selects the corresponding row, giving P(w_i | w_{i-1} = eats), including P(grass | eats).
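
A minimal numpy sketch of this idea (the vocabulary and probabilities are invented for illustration): the 1-hot vector for the previous word simply selects a row of the parameter matrix.

```python
import numpy as np

vocab = ["cow", "eats", "grass", "chews"]
V = len(vocab)

# Bigram parameter matrix: row = w_{i-1}, column = w_i, each row sums to 1.
P = np.full((V, V), 1.0 / V)                        # uniform rows for illustration
P[vocab.index("eats")] = [0.05, 0.05, 0.85, 0.05]   # P(. | eats)

x = np.zeros(V)
x[vocab.index("eats")] = 1.0                        # 1-hot vector for "eats"

next_word_probs = x @ P                             # picks out the "eats" row
print(dict(zip(vocab, next_word_probs)))            # P(grass | eats) = 0.85
```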

7 FROM BIGRAMS TO TRIGRAMS Problems: too many parameters (V^3) to be learned from data (growing with LM order); parameters are completely separate for different trigrams even when they share words, e.g., (cow, eats) vs (farmer, eats) [motivating smoothing]; doesn't recognise similar words, e.g., cow vs sheep, eats vs chews. [Figure: a V x V x V cube indexed by w_{i-2} (e.g., sheep, cow, farmer), w_{i-1} (e.g., eats, chews) and w_i, holding entries such as P(grass | cow, eats).]

8 CAN'T WE USE WORD EMBEDDINGS? [Figure: top, the bigram setup, eats as a 1-hot vector of length V multiplied by a V x V parameter matrix to give P(w_i | w_{i-1} = eats); bottom, the embedding setup, a dense d-dimensional embedding of eats multiplied by a d x d parameter matrix, then unembedded and passed through a softmax to give P(w_i | w_{i-1} = eats) over the V words.]

9 LOG-BILINEAR LM Parameters: an embedding matrix (or 2 matrices, for input & output) of size V x d; the weights are now d x d. If d << V then this is a considerable saving vs V^2. d is chosen as a parameter, typically in [100, 1000]. The model is called the log-bilinear LM, and is closely related to word2vec.
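
A rough sketch of the log-bilinear idea with random, untrained parameters (shapes only; all values are invented for illustration): embed the context word into d dimensions, transform with a d x d weight matrix, then un-embed and apply a softmax over the V words.

```python
import numpy as np

V, d = 10000, 100
rng = np.random.default_rng(0)

E_in  = rng.normal(scale=0.1, size=(V, d))   # input embedding matrix
E_out = rng.normal(scale=0.1, size=(d, V))   # output (un-embedding) matrix
W     = rng.normal(scale=0.1, size=(d, d))   # d x d weights

def next_word_distribution(prev_word_id):
    e = E_in[prev_word_id]            # embed the context word, shape (d,)
    h = e @ W                         # bilinear transform, shape (d,)
    scores = h @ E_out                # un-embed to vocabulary scores, shape (V,)
    scores -= scores.max()            # numerical stability
    probs = np.exp(scores)
    return probs / probs.sum()        # softmax over the V words

print(next_word_distribution(42).shape)  # (10000,)
```

Total parameter count is 2(V x d) + d x d, which is far smaller than V^2 when d << V.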

10 FEED FORWARD NEURAL NET LMS Neural networks are a more general approach, based on the same principle: embed the input context words, transform in hidden space, un-embed to a probability over the full vocabulary. A neural network is used to define the transformations, e.g., the feed forward LM (FFLM). [Figure: w_{i-2} and w_{i-1} as 1-hot vectors of length V, each embedded by lookup into d dimensions, combined by a function into a hidden vector, then unembedded with a softmax to give P(w_i) over the V words.]
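
A sketch of a feed-forward LM forward pass over a two-word context, again with made-up shapes and untrained random parameters:

```python
import numpy as np

V, d = 10000, 100
rng = np.random.default_rng(1)

E = rng.normal(scale=0.1, size=(V, d))       # shared embedding lookup table
W = rng.normal(scale=0.1, size=(2 * d, d))   # hidden transformation
b = np.zeros(d)
U = rng.normal(scale=0.1, size=(d, V))       # un-embedding to vocabulary scores

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def fflm(prev2_id, prev1_id):
    x = np.concatenate([E[prev2_id], E[prev1_id]])  # embed both context words
    h = np.tanh(x @ W + b)                          # hidden layer
    return softmax(h @ U)                           # P(w_i | w_{i-2}, w_{i-1})

print(fflm(7, 42).sum())  # ~1.0
```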

11 INTERIM SUMMARY N-gram LMs are cheap to train (just compute counts), but have too many parameters, problems with sparsity and scaling to larger contexts, and don't adequately capture properties of words (grammatical and semantic similarity), e.g., film vs movie. NNLMs are more robust: they force words through low-dimensional embeddings and automatically capture word properties, leading to more robust estimates.

12 NEURAL NETWORKS Deep neural networks provide a mechanism for learning richer models. Based on vector embeddings and compositional functions over these vectors. Word embeddings capture grammatical and semantic similarity: cows ~ sheep, eats ~ chews, etc. Vector composition can allow for combinations of features to be learned (e.g., humans consume meat). Limit the size of the vector representation to keep model capacity under control.

13 COMPONENTS OF NN CLASSIFIER NN = Neural Network, a.k.a. artificial NN, deep learning, multilayer perceptron. Composed of simple functions of vector-valued inputs. [Figure (JM3 Ch 8): inputs x_1 .. x_{d_in} plus a bias input +1, connected by weights W and bias b to hidden units h_1 .. h_{d_h}, connected by weights U to outputs y_1 .. y_{d_out}.]

14 NN UNITS Each unit is a function: given input x, it computes a real-valued (scalar) output h = tanh(∑_j w_j x_j + b). It scales the input (with weights, w) and adds an offset (bias, b), then applies a non-linear function, such as the logistic sigmoid, hyperbolic tangent (tanh), or rectified linear unit.

15 NEURAL NETWORK COMPONENTS Typically have several hidden units, i.e., h_i = tanh(∑_j w_{ij} x_j + b_i), each with its own weights (w_i) and bias term (b_i). This can be expressed using matrix & vector operators: h = tanh(W x + b), where W is a matrix comprising the unit weight vectors, b is a vector of all the bias terms, and tanh is applied element-wise to a vector. Very efficient and simple implementation.
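
In numpy the whole layer is one matrix-vector product followed by an element-wise tanh (a minimal sketch with arbitrary sizes and random values):

```python
import numpy as np

d_in, d_h = 4, 3
rng = np.random.default_rng(2)

W = rng.normal(size=(d_h, d_in))   # one weight vector per hidden unit (rows)
b = rng.normal(size=d_h)           # one bias per hidden unit
x = rng.normal(size=d_in)

h = np.tanh(W @ x + b)             # all hidden units computed at once
print(h.shape)                     # (3,)
```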

16 ANN IN PICTURES [Figure (JM3 Ch 8): a single unit for computing y from x, with inputs x_1, x_2, x_3 weighted by w_1, w_2, w_3 plus a bias b (input +1), summed into z and passed through a non-linearity σ to give activation a and output y.] Typical networks have several units and additional layers, e.g., an output layer for the classification target. [Figure (JM3 Ch 8): inputs x_1 .. x_{d_in} plus bias +1, weights W into hidden units h_1 .. h_{d_h}, weights U into outputs y_1 .. y_{d_out}.]

17 BEHIND THE NON-LINEARITY Non-linearity lends expressive power beyond logistic regression (a linear ANN is otherwise equivalent). [Figure from Neubig (2017): plots of step(x), tanh(x), and relu(x).] Non-linearity allows complex functions to be learned, e.g., logical operations: AND, OR, NOT, etc. A single hidden layer ANN is a universal approximator: it can represent any function with a sufficiently large hidden state.

18 EXAMPLE NETWORKS What do these units do? Using the step activation function, consider binary values for x_1 and x_2. [Figure: two units, h_1 and h_2, each taking inputs x_1, x_2 and a bias input +1.]
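
A sketch of the kind of units the slide asks about (the slide's exact weights are not recoverable from this transcription, so the weights and biases below are one possible choice): with a step activation, suitable weights make a unit compute AND or OR of two binary inputs.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

def unit(x, w, b):
    return step(np.dot(w, x) + b)

# One possible choice of weights/biases (assumed for illustration):
for x1 in (0, 1):
    for x2 in (0, 1):
        x = np.array([x1, x2])
        and_out = unit(x, np.array([1.0, 1.0]), -1.5)  # fires only if both inputs are on
        or_out  = unit(x, np.array([1.0, 1.0]), -0.5)  # fires if either input is on
        print(x1, x2, "AND:", and_out, "OR:", or_out)
```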

19 COUPLING THE OUTPUT LAYER To make this into a classifier, we need to produce a classification output, e.g., a vector of probabilities for the next word. Add another layer which takes h as input and maps it to the classification target (e.g., V). To ensure probabilities, apply the softmax transform, which exponentiates and normalises: applied to a vector v, softmax(v) = [exp(v_1)/∑_i exp(v_i), exp(v_2)/∑_i exp(v_i), ..., exp(v_m)/∑_i exp(v_i)]. Overall: h = tanh(W x + b), y = softmax(U h). [Figure: inputs x_1 .. x_{d_in} plus bias +1, weights W and b into hidden units h_1 .. h_{d_h}, weights U into outputs y_1 .. y_{d_out}.]
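
A tiny illustration of the softmax transform (the score values are invented):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())   # subtract max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(softmax(scores))        # ~[0.705, 0.259, 0.035], sums to 1
```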

20 DEEP STRUCTURES Can stack several hidden layers; e.g., 1. map from 1-hot words, w, to word embeddings, e (lookup); 2. transform e to hidden state h_1 (with non-linearity); 3. transform h_1 to hidden state h_2 (with non-linearity); 4. repeat; 5. transform h_n to the output classification space y (with softmax). Each layer is typically fully-connected to the next lower layer, i.e., each unit is connected to all input elements. Depth allows for more complex functions, e.g., longer logical expressions.
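
A sketch of this stacked computation (the depth, sizes and random values are chosen arbitrarily): an embedding lookup, several fully-connected tanh layers, then a softmax output layer.

```python
import numpy as np

V, d, depth = 1000, 64, 3
rng = np.random.default_rng(3)

E = rng.normal(scale=0.1, size=(V, d))                    # step 1: embedding lookup
layers = [(rng.normal(scale=0.1, size=(d, d)), np.zeros(d))
          for _ in range(depth)]                          # steps 2-4: hidden layers
U = rng.normal(scale=0.1, size=(d, V))                    # step 5: output weights

def forward(word_id):
    h = E[word_id]
    for W, b in layers:                 # each layer fully connected to the previous one
        h = np.tanh(h @ W + b)
    scores = h @ U
    e = np.exp(scores - scores.max())
    return e / e.sum()                  # softmax over the output space

print(forward(5).shape)                 # (1000,)
```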

21 LEARNING WITH SGD How do we learn the parameters from data? Parameters = the sets of weights, biases, embeddings. Consider how well the model fits the training data, in terms of the probability it assigns to the correct output, e.g., ∏_{i=1}^{m} P(w_i | w_{i-2}, w_{i-1}) for a sequence of m words. We want to maximise this probability, or equivalently minimise its negative log; -log P is known as the loss function. Trained using gradient-based methods, much like logistic regression (tools like tensorflow, theano, torch etc. use autodiff to compute gradients automatically).
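
A sketch of the loss computation (the numbers are toy values): at each position, take the probability the model assigned to the word that actually occurred, and sum the negative logs. In practice the gradients of this loss with respect to all parameters would be obtained by autodiff and used for SGD updates.

```python
import numpy as np

# Predicted distributions over a 4-word vocabulary at 3 positions (toy numbers),
# and the ids of the words that actually occurred at those positions.
predicted = np.array([[0.7, 0.1, 0.1, 0.1],
                      [0.2, 0.5, 0.2, 0.1],
                      [0.1, 0.1, 0.1, 0.7]])
observed = np.array([0, 1, 3])

loss = -np.sum(np.log(predicted[np.arange(3), observed]))
print(loss)   # -(log 0.7 + log 0.5 + log 0.7) ~= 1.41
```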

22 FFNNLM Application to language modelling. [Figure (Bengio et al., 2003): indices for the context words w_{t-n+1}, ..., w_{t-2}, w_{t-1} are looked up in a shared embedding matrix C (parameters shared across words); the embeddings C(w_{t-n+1}), ..., C(w_{t-2}), C(w_{t-1}) are combined and passed through a tanh hidden layer, then a softmax output layer (where most of the computation happens) gives the i-th output P(w_t = i | context).]

23 RECURRENT NNLMS What if we structure the network differently, e.g., according to the sequence, with Recurrent Neural Networks (RNNs)?

24 RECURRENT NNLMS Start with an initial hidden state h_0. For each word w_i, in order i = 1..m: embed the word to produce a vector, e_i; compute the hidden state h_i = tanh(W e_i + V h_{i-1} + b); compute the output P(w_{i+1}) = softmax(U h_i + c). Train so as to minimise -∑_i log P(w_i), learning the parameters W, V, U, b, c, h_0.
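
A sketch of this recurrence with random, untrained parameters (shapes and names are invented; the slide's V matrix is renamed Vh here to avoid clashing with the vocabulary size, and a real implementation would use a framework with autodiff):

```python
import numpy as np

V, d = 1000, 64
rng = np.random.default_rng(4)

E  = rng.normal(scale=0.1, size=(V, d))    # word embeddings
W  = rng.normal(scale=0.1, size=(d, d))    # input-to-hidden weights
Vh = rng.normal(scale=0.1, size=(d, d))    # hidden-to-hidden weights (V on the slide)
b  = np.zeros(d)
U  = rng.normal(scale=0.1, size=(d, V))    # hidden-to-output weights
c  = np.zeros(V)
h0 = np.zeros(d)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def rnn_lm_loss(word_ids):
    """Negative log likelihood of a word sequence under the RNN LM."""
    h, loss = h0, 0.0
    for i, w in enumerate(word_ids[:-1]):
        e_i = E[w]                                 # embed w_i
        h = np.tanh(e_i @ W + h @ Vh + b)          # h_i
        p_next = softmax(h @ U + c)                # P(w_{i+1} | w_1 .. w_i)
        loss -= np.log(p_next[word_ids[i + 1]])
    return loss

print(rnn_lm_loss([3, 17, 256, 9]))
```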

25 RNNS Can result in very deep networks, difficult to train due to gradient explosion or vanishing; variant RNNs are designed to behave better, e.g., GRU, LSTM. RNNs are used widely as sentence encodings: the RNN processes the sentence a word at a time, and the final state is used as a fixed-dimensional representation of the sentence (of any length). Can also run another RNN over the reversed sentence, and concatenate both final representations. Used in translation, summarisation, generation, text classification, and more.
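
A sketch of using an RNN as a sentence encoder, including the bidirectional variant that also runs over the reversed sentence and concatenates the two final states (shapes, parameters and the random "embeddings" are illustrative only):

```python
import numpy as np

d_emb, d_h = 64, 32
rng = np.random.default_rng(5)

def make_rnn():
    W = rng.normal(scale=0.1, size=(d_emb, d_h))
    Vh = rng.normal(scale=0.1, size=(d_h, d_h))
    b = np.zeros(d_h)
    def run(embeddings):
        h = np.zeros(d_h)
        for e in embeddings:               # one step per word
            h = np.tanh(e @ W + h @ Vh + b)
        return h                           # final state = sentence representation
    return run

forward_rnn, backward_rnn = make_rnn(), make_rnn()

sentence = rng.normal(size=(7, d_emb))     # 7 word embeddings (any length works)
encoding = np.concatenate([forward_rnn(sentence),
                           backward_rnn(sentence[::-1])])
print(encoding.shape)                      # (64,) fixed size, regardless of sentence length
```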

26 FINAL WORDS NNet models: robust to word variation, typos, etc.; excellent generalization, especially RNNs; flexible, forming the basis for many other models (translation, summarization, generation, tagging, etc.). Cons: much slower than counts, though hardware acceleration helps; need to limit the vocabulary, not so good with rare words; not as good at memorizing fixed sequences; data hungry, not so good on tiny data sets.

27 REQUIRED READING J&M3 Ch. 8; Neubig 2017, Neural Machine Translation and Sequence-to-sequence Models: A Tutorial, Sections 5 & 6.
