Part-of-Speech Tagging + Neural Networks 2 CS 287
|
|
- Alexina Casey
- 5 years ago
- Views:
Transcription
1 Part-of-Speech Tagging + Neural Networks 2 CS 287
2 Review: Bilinear Model Bilinear model, ŷ = f ((x 0 W 0 )W 1 + b) x 0 R 1 d 0 start with one-hot. W 0 R d 0 d in, d 0 = F W 1 R d in d out, b R 1 d out ; model parameters Notes: Bilinear parameter interaction. d 0 >> d in, e.g. d 0 = 10000, d in = 50
3 Review: Bilinear Model: Intuition (x 0 W 0 )W 1 + b [ ] w 0 1,1... w 0 0,d in... w 0 k,1... w 0 k,d in... w 0 d 0,1... w 0 d 0,d in w 1 1, w 1 0,d out w 1 d in, w 1 d in,d out
4 Review: Window Model Goal: predict t 5. Windowed word model. w 1 w 2 [w 3 w 4 w 5 w 6 w 7 ] w 8 w 3, w 4 ; left context w 5 ; Word of interest w 6, w 7 ; right context d win ; size of window (d win = 5)
5 Review: Dense Windowed BoW Features f 1,..., f dwin are words in window Input representation is the concatenation of embeddings Example: Tagging x = [v(f 1 ) v(f 2 )... v(f dwin )] w 1 w 2 [w 3 w 4 w 5 w 6 w 7 ] w 8 x = [v(w 3 ) v(w 4 ) v(w 5 ) v(w 6 ) v(w 7 )] x d in /5 d in /5 d in /5 d in /5 d in /5 Rows of W 1 encode position specific weights.
6 Quiz We are doing tagging with a windowed bilinear model with hinge-loss and no capitalization features. The model has d win = 5, d in = 50, d out = 40, and vocabulary size We are given the input window: The dog walked to the Unfortunately we incorrectly classify walked as NN as opposed to VP, in a bilinear model with a hinge-loss. What is the maximum number of parameters that receive a non-zero gradient?
7 Answer: [ ] w 0 1,1... w 0 0,d in w 0 the,1... w 0 the,d in. w 0 dog,1... w 0 dog,d in. w 0 walked,1... w 0 walked,d in. w 0 to,1... w 0 to,d in. w 0 the,1... w 0 the,d in. w 1 1,1... w 1 1,NN... w 1 1,VP w 1 0,d out w 1 d in,0... w 1 d in,nn... w 1 d in,vp w 1 d in,d out w 0 d 0,1... w 0 d 0,d in W 0 = 5 d in W 1 = d in 2
8 Part-of-Speech Tagging 1 Consider the following windowed model, and assume for now a linear model. [w 1 the w 3 w 4 w 5 ] What information do we have about the tag of w 3? What weight should the features values associated with the in position w 2 take?
9 Part-of-Speech Tagging 2 Next Consider the following windowed model, and assume for now a linear model. [w 1 w 2 w 3 dog w 5 ] What information do we have about the tag of w 3? What weight should the features values associated with dog in position w 4 take?
10 Part-of-Speech Tagging 3 Now finally consider the following windowed model, and assume for now a linear model. [w 1 the w 3 dog w 5 ] What information do we have about the tag of w 3? What weight would we want if we combined both the features values?
11 Contents Neural Networks Backpropagation
12
13 Neural Network One-layer multi-layer perceptron architecture, NN MLP1 (x) = g(xw 1 + b 1 )W 2 + b 2 xw + b; perceptron x is the dense representation in R 1 d in W 1 R d in d hid, b 1 R 1 d hid; first affine transformation W 2 R d hid d out, b 2 R 1 d out ; second affine transformation g : R d hid d hid is an activation non-linearity (often pointwise) g(xw 1 + b 1 ) is the hidden layer
14 Schematic x 1 x 2 x 3. x din h 1 h 2. h dhid z 1 z 2 z 3. z dout
15 Non-Linearities: 0/1 0/1 function: 0/1(t) = 1(t > 0) 01((xW 1 + b 1 ) i ) Intuition: On, if above a threshold
16 Exercise Input layer to NN MLP1 is the sparse indicator features of the word at each position. Design a network to recognize [w 1 the w 3 dog w 5 ] Design a network to recognize where w 2 is not the [w 1 w 2 w 3 dog w 5 ]
17 Feature Conjunctions Many NLP tasks require conjunctive features, examples Sequence-based taggers look at last two-part of speech tags. Chinese part-of-speech taggers look at first character and last tag. Higher-level models (parses) look at tags of words and distances apart (example) For some natural language tasks, conjunctions are painstakingly hard. NNs: Capacity to learn conjunctions and feature combinations. Also possible with other convex models such as SVMs
18 Feature Conjunctions Many NLP tasks require conjunctive features, examples Sequence-based taggers look at last two-part of speech tags. Chinese part-of-speech taggers look at first character and last tag. Higher-level models (parses) look at tags of words and distances apart (example) For some natural language tasks, conjunctions are painstakingly hard. NNs: Capacity to learn conjunctions and feature combinations. Also possible with other convex models such as SVMs
19 Simple Antecedent/Pairwise Features Not Discriminative E.g., is [Lexus sales] the antecedent of [their sales]? Common pairwise features: String/Head Match, Sentences Between, Mention-Antecedent Numbers/Heads/Genders, etc. string-match=false head-match=true φ p ([their sales],[lexus sales]) = sentences-between=0 ment-ant-numbers=plur.,plur..
20 Dealing with the Feature Problem Finding discriminative features is a major challenge for coreference systems [Fernandes et al. 2012; Durrett and Klein 2013] Typical to define (or search for) feature conjunction-schemes to improve predictive performance [Fernandes et al. 2012; Durrett and Klein 2013; Björkelund and Kuhn 2014]. For instance: string-match(x, y) type(x) type(y) [Durrett and Klein 2013], where Nom. type(x) = Prop. citation-form(x) if x is nominal if x is proper if x is pronominal substring-match(head(x), y) substring-match(x, head(y)) coarse-type(y) coarse-type(x) [Björkelund and Kuhn 2014] Not just a problem for Mention Ranking systems!
21 Non-Linearities: 0/1 0/1 function: 0/1(t) = 1(t > 0) Issue: No gradient anywhere
22 Non-Linear Functions: Sigmoid Logistic sigmoid function: σ(t) = exp( t) σ((xw 1 + b 1 ) i ) Intuition: Each hidden dimension ( neuron ) is result of logistic regression.
23 Other Non-Linearities: ReLU Rectified Linear Unit: ReLU(t) = max{0, t} Intuition: Each hidden-unit gives activation margin No gradient (saturation) when below 0.
24 Saturation: Intuition x 1 x 2 x 3. x din h 1 h 2. h dhid z 1 z 2 z 3. z dout
25 Other Non-Linearities: Tanh Hyperbolic Tangeant: tanh(t) = exp(t) exp( t) exp(t) + exp( t) Intuition: Similar to sigmoid, but range between 0 and -1.
26 Other Non-Linearities: Hard Tanh Hyperbolic Tangeant: 1 t < 1 hardtanh(t) = t 1 t 1 1 t > 1 Intuition: Similar to sigmoid, but range between 0 and -1.
27 Other Non-Linearities: Cube Cube non-linearity (directly encourage parameter interaction): cube(t) = t 3 Intuition: Directly encourage higher-order interactions.
28 Tagging from Scratch
29 Function Approximator MLP1 is a universal approximator Can approximate with any desired non-zero amount of error a family of functions that include all continuous functions on a closed and bounded subset of R n, and any function mapping from any finite dimensional discrete space to another (YG) Caveats: Does not give size of hidden layer. Does not specify how hard this is to learn.
30 Deep Neural Networks (DNNs) Can stack MLPs, create deep fully connected networks, NN MLP1 (x) = g(xw 1 + b 1 )W 2 + b 2 NN MLP2 (x) = g(nn MLP1 (x)w 1 + b 1 )W 2 + b 2 Can have multiple hidden layers, etc. Benefit: may be able to find better function Known to be harder to train (although other approaches)
31 Other Layers We will discuss many other neural network layers, convolutional attention-based gated layers...
32 Highway Network y : output from CharCNN Multilayer Perceptron z = g(wy + b) Highway Network (Srivastava, Greff, and Schmidhuber 2015) z = t g(w H y + b H ) + (1 t) y W H, b H : Affine transformation t = σ(w T y + b T ) : transform gate 1 t : carry gate Hierarchical, adaptive composition of character n-grams. Kim, Jernite, Sontag, Rush Character-Aware Neural Language Models 37 / 75
33 Highway Network Input to LSTM Input from CharCNN Kim, Jernite, Sontag, Rush Character-Aware Neural Language Models 38 / 75
34 Contents Neural Networks Backpropagation
35 Sequential Neural Network Sequential neural networks consist of a series of composed functions, Consider a vector-valued parameterized functions f 1,..., f k where f i (x; θ i ) : R n i 1 R n i ; function θ R d i ; function parameters Consider a scalar-valued loss function L(y, ŷ) where L(y, ) : R n k R; loss for input
36 Backpropagation Forward Step (f-prop): Compute L(f k (... f 1 (x 0 ))) Saving intermediary values f i (... f 1 (x 0 ))) Backward Step (b-prop): ni L f i (... f 1 (x 0 )) = f i+1 (... f 1 (x 0 )) j j=1 f i (... f 1 (x 0 )) L f i+1 (... f 1 (x 0 )) j n L i f i+1 (... f 1 (x = θ i 0 )) j L j=1 θ i f i+1 (... f 1 (x 0 )) j
37 Backpropagation Forward Step (f-prop): Compute L(f k (... f 1 (x 0 ))) Saving intermediary values f i (... f 1 (x 0 ))) Backward Step (b-prop): ni L f i (... f 1 (x 0 )) = f i+1 (... f 1 (x 0 )) j j=1 f i (... f 1 (x 0 )) L f i+1 (... f 1 (x 0 )) j n L i f i+1 (... f 1 (x = θ i 0 )) j L j=1 θ i f i+1 (... f 1 (x 0 )) j
38 Backpropagation Forward Step (f-prop): Compute L(f k (... f 1 (x 0 ))) Saving intermediary values f i (... f 1 (x 0 ))) Backward Step (b-prop): ni L f i (... f 1 (x 0 )) = f i+1 (... f 1 (x 0 )) j j=1 f i (... f 1 (x 0 )) L f i+1 (... f 1 (x 0 )) j n L i f i+1 (... f 1 (x = θ i 0 )) j L j=1 θ i f i+1 (... f 1 (x 0 )) j
39 Backpropagation: Data flow f i (... f 1 (x 0 )) f i+1 (f i (... f 1 (x 0 ))) L f i (... f 1 (x 0 )) f i+1 ( ; θ i+1 ) L θ i+1 L f i+1 (... f 1 (x 0 ))
40 Torch Implementation Torch uses declarative unit-based specification of NN Every function is a represented as a unit. Responsibilities: 1. Expose any parameters θ i+1 as tensors 2. Compute f i+1 (x, θ i+1 ) (fprop) 3. Compute any necessary state needed for bprop 4. Compute chain-rule given L f i+1 (...f 1 (x 0 )) and f i (... f 1 (x 0 )) 5. Compute parameter gradient L θ i+1 Contract: forward will always be called before backward.
41 Torch Units input f i (... f 1 (x 0 )) self.output f i+1 (f i (... f 1 (x 0 ))) weights f i+1 ( ; θ i+1 ) self.gradinput L f i (... f 1 (x 0 )) self.gradweight L θ i+1 gradoutput L f i+1 (... f 1 (x 0 ))
42 Loss Criterions prediction ŷ loss L(y, ŷ) weights L(, ) self.gradinput L ŷ target y
43 Torch Internals [Web]
44 Torch Units: Batch input f i (... f 1 (X 0 )) self.output f i+1 (f i (... f 1 (X 0 ))) f i+1 ( ; θ i+1 ) self.gradinput L f i (... f 1 (X 0 )) self.gradweight L θ i+1 gradoutput L f i+1 (... f 1 (X 0 ))
45 Loss Criterions prediction Ŷ loss L(Y, Ŷ) L(, ) self.gradinput L Ŷ target Y
46 Today Benefits of neural networks Training neural networks Next time: Pretraining and word embeddings
Part-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287
Part-of-Speech Tagging + Neural Networks 3: Word Embeddings CS 287 Review: Neural Networks One-layer multi-layer perceptron architecture, NN MLP1 (x) = g(xw 1 + b 1 )W 2 + b 2 xw + b; perceptron x is the
More informationRecurrent Neural Networks 2. CS 287 (Based on Yoav Goldberg s notes)
Recurrent Neural Networks 2 CS 287 (Based on Yoav Goldberg s notes) Review: Representation of Sequence Many tasks in NLP involve sequences w 1,..., w n Representations as matrix dense vectors X (Following
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationLecture 17: Neural Networks and Deep Learning
UVA CS 6316 / CS 4501-004 Machine Learning Fall 2016 Lecture 17: Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions
More informationNEURAL LANGUAGE MODELS
COMP90042 LECTURE 14 NEURAL LANGUAGE MODELS LANGUAGE MODELS Assign a probability to a sequence of words Framed as sliding a window over the sentence, predicting each word from finite context to left E.g.,
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationFrom perceptrons to word embeddings. Simon Šuster University of Groningen
From perceptrons to word embeddings Simon Šuster University of Groningen Outline A basic computational unit Weighting some input to produce an output: classification Perceptron Classify tweets Written
More informationArtificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino
Artificial Neural Networks Data Base and Data Mining Group of Politecnico di Torino Elena Baralis Politecnico di Torino Artificial Neural Networks Inspired to the structure of the human brain Neurons as
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationIntro to Neural Networks and Deep Learning
Intro to Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi UVA CS 6316 1 Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationCh.6 Deep Feedforward Networks (2/3)
Ch.6 Deep Feedforward Networks (2/3) 16. 10. 17. (Mon.) System Software Lab., Dept. of Mechanical & Information Eng. Woonggy Kim 1 Contents 6.3. Hidden Units 6.3.1. Rectified Linear Units and Their Generalizations
More information(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann
(Feed-Forward) Neural Networks 2016-12-06 Dr. Hajira Jabeen, Prof. Jens Lehmann Outline In the previous lectures we have learned about tensors and factorization methods. RESCAL is a bilinear model for
More informationLecture 5 Neural models for NLP
CS546: Machine Learning in NLP (Spring 2018) http://courses.engr.illinois.edu/cs546/ Lecture 5 Neural models for NLP Julia Hockenmaier juliahmr@illinois.edu 3324 Siebel Center Office hours: Tue/Thu 2pm-3pm
More informationLecture 3 Feedforward Networks and Backpropagation
Lecture 3 Feedforward Networks and Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 3, 2017 Things we will look at today Recap of Logistic Regression
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationLecture 11 Recurrent Neural Networks I
Lecture 11 Recurrent Neural Networks I CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 01, 2017 Introduction Sequence Learning with Neural Networks Some Sequence Tasks
More informationIntroduction to Deep Neural Networks
Introduction to Deep Neural Networks Presenter: Chunyuan Li Pattern Classification and Recognition (ECE 681.01) Duke University April, 2016 Outline 1 Background and Preliminaries Why DNNs? Model: Logistic
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 4 October 2017 / 9 October 2017 MLP Lecture 3 Deep Neural Networs (1) 1 Recap: Softmax single
More informationRecurrent Neural Networks. COMP-550 Oct 5, 2017
Recurrent Neural Networks COMP-550 Oct 5, 2017 Outline Introduction to neural networks and deep learning Feedforward neural networks Recurrent neural networks 2 Classification Review y = f( x) output label
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationtext classification 3: neural networks
text classification 3: neural networks CS 585, Fall 2018 Introduction to Natural Language Processing http://people.cs.umass.edu/~miyyer/cs585/ Mohit Iyyer College of Information and Computer Sciences University
More informationAdministration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6
Administration Registration Hw3 is out Due on Thursday 10/6 Questions Lecture Captioning (Extra-Credit) Look at Piazza for details Scribing lectures With pay; come talk to me/send email. 1 Projects Projects
More informationCS489/698: Intro to ML
CS489/698: Intro to ML Lecture 03: Multi-layer Perceptron Outline Failure of Perceptron Neural Network Backpropagation Universal Approximator 2 Outline Failure of Perceptron Neural Network Backpropagation
More information(2pts) What is the object being embedded (i.e. a vector representing this object is computed) when one uses
Contents (75pts) COS495 Midterm 1 (15pts) Short answers........................... 1 (5pts) Unequal loss............................. 2 (15pts) About LSTMs........................... 3 (25pts) Modular
More information4. Multilayer Perceptrons
4. Multilayer Perceptrons This is a supervised error-correction learning algorithm. 1 4.1 Introduction A multilayer feedforward network consists of an input layer, one or more hidden layers, and an output
More informationNeural Networks. Nicholas Ruozzi University of Texas at Dallas
Neural Networks Nicholas Ruozzi University of Texas at Dallas Handwritten Digit Recognition Given a collection of handwritten digits and their corresponding labels, we d like to be able to correctly classify
More informationMachine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6
Machine Learning: Chenhao Tan University of Colorado Boulder LECTURE 6 Slides adapted from Jordan Boyd-Graber, Chris Ketelsen Machine Learning: Chenhao Tan Boulder 1 of 39 HW1 turned in HW2 released Office
More informationDeep Learning Recurrent Networks 2/28/2018
Deep Learning Recurrent Networks /8/8 Recap: Recurrent networks can be incredibly effective Story so far Y(t+) Stock vector X(t) X(t+) X(t+) X(t+) X(t+) X(t+5) X(t+) X(t+7) Iterated structures are good
More informationNatural Language Processing
Natural Language Processing Info 159/259 Lecture 7: Language models 2 (Sept 14, 2017) David Bamman, UC Berkeley Language Model Vocabulary V is a finite set of discrete symbols (e.g., words, characters);
More informationDeep Neural Networks (1) Hidden layers; Back-propagation
Deep Neural Networs (1) Hidden layers; Bac-propagation Steve Renals Machine Learning Practical MLP Lecture 3 2 October 2018 http://www.inf.ed.ac.u/teaching/courses/mlp/ MLP Lecture 3 / 2 October 2018 Deep
More informationLecture 11 Recurrent Neural Networks I
Lecture 11 Recurrent Neural Networks I CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor niversity of Chicago May 01, 2017 Introduction Sequence Learning with Neural Networks Some Sequence Tasks
More informationTask-Oriented Dialogue System (Young, 2000)
2 Review Task-Oriented Dialogue System (Young, 2000) 3 http://rsta.royalsocietypublishing.org/content/358/1769/1389.short Speech Signal Speech Recognition Hypothesis are there any action movies to see
More informationNeural Networks: Backpropagation
Neural Networks: Backpropagation Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others
More informationDeep Learning for NLP
Deep Learning for NLP Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Greg Durrett Outline Motivation for neural networks Feedforward neural networks Applying feedforward neural networks
More informationIntroduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen
Neural Networks - I Henrik I Christensen Robotics & Intelligent Machines @ GT Georgia Institute of Technology, Atlanta, GA 30332-0280 hic@cc.gatech.edu Henrik I Christensen (RIM@GT) Neural Networks 1 /
More informationNONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function
More informationEngineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I
Engineering Part IIB: Module 4F10 Statistical Pattern Processing Lecture 6: Multi-Layer Perceptrons I Phil Woodland: pcw@eng.cam.ac.uk Michaelmas 2012 Engineering Part IIB: Module 4F10 Introduction In
More informationDeep Learning for Natural Language Processing. Sidharth Mudgal April 4, 2017
Deep Learning for Natural Language Processing Sidharth Mudgal April 4, 2017 Table of contents 1. Intro 2. Word Vectors 3. Word2Vec 4. Char Level Word Embeddings 5. Application: Entity Matching 6. Conclusion
More informationMACHINE LEARNING AND PATTERN RECOGNITION Fall 2006, Lecture 4.1 Gradient-Based Learning: Back-Propagation and Multi-Module Systems Yann LeCun
Y. LeCun: Machine Learning and Pattern Recognition p. 1/2 MACHINE LEARNING AND PATTERN RECOGNITION Fall 2006, Lecture 4.1 Gradient-Based Learning: Back-Propagation and Multi-Module Systems Yann LeCun The
More informationNeural Networks: Introduction
Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1
More informationPart-of-Speech Tagging + Neural Networks CS 287
Part-of-Speech Tagging + Neural Networks CS 287 Quiz Last class we focused on hinge loss. L hinge = max{0, 1 (ŷ c ŷ c )} Consider now the squared hinge loss, (also called l 2 SVM) L hinge 2 = max{0, 1
More informationFeedforward Neural Networks
Feedforward Neural Networks Michael Collins 1 Introduction In the previous notes, we introduced an important class of models, log-linear models. In this note, we describe feedforward neural networks, which
More informationNeural Networks Learning the network: Backprop , Fall 2018 Lecture 4
Neural Networks Learning the network: Backprop 11-785, Fall 2018 Lecture 4 1 Recap: The MLP can represent any function The MLP can be constructed to represent anything But how do we construct it? 2 Recap:
More informationNeural Network Language Modeling
Neural Network Language Modeling Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Marek Rei, Philipp Koehn and Noah Smith Course Project Sign up your course project In-class presentation
More informationHomework 3 COMS 4705 Fall 2017 Prof. Kathleen McKeown
Homework 3 COMS 4705 Fall 017 Prof. Kathleen McKeown The assignment consists of a programming part and a written part. For the programming part, make sure you have set up the development environment as
More informationNeural Networks. Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016
Neural Networks Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016 Outline Part 1 Introduction Feedforward Neural Networks Stochastic Gradient Descent Computational Graph
More informationy(x n, w) t n 2. (1)
Network training: Training a neural network involves determining the weight parameter vector w that minimizes a cost function. Given a training set comprising a set of input vector {x n }, n = 1,...N,
More informationNatural Language Processing and Recurrent Neural Networks
Natural Language Processing and Recurrent Neural Networks Pranay Tarafdar October 19 th, 2018 Outline Introduction to NLP Word2vec RNN GRU LSTM Demo What is NLP? Natural Language? : Huge amount of information
More informationNeural Networks. David Rosenberg. July 26, New York University. David Rosenberg (New York University) DS-GA 1003 July 26, / 35
Neural Networks David Rosenberg New York University July 26, 2017 David Rosenberg (New York University) DS-GA 1003 July 26, 2017 1 / 35 Neural Networks Overview Objectives What are neural networks? How
More informationNeural Networks Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav
Neural Networks 30.11.2015 Lecturer: J. Matas Authors: J. Matas, B. Flach, O. Drbohlav 1 Talk Outline Perceptron Combining neurons to a network Neural network, processing input to an output Learning Cost
More informationCS 6501: Deep Learning for Computer Graphics. Basics of Neural Networks. Connelly Barnes
CS 6501: Deep Learning for Computer Graphics Basics of Neural Networks Connelly Barnes Overview Simple neural networks Perceptron Feedforward neural networks Multilayer perceptron and properties Autoencoders
More informationNeural Networks. Bishop PRML Ch. 5. Alireza Ghane. Feed-forward Networks Network Training Error Backpropagation Applications
Neural Networks Bishop PRML Ch. 5 Alireza Ghane Neural Networks Alireza Ghane / Greg Mori 1 Neural Networks Neural networks arise from attempts to model human/animal brains Many models, many claims of
More informationThe XOR problem. Machine learning for vision. The XOR problem. The XOR problem. x 1 x 2. x 2. x 1. Fall Roland Memisevic
The XOR problem Fall 2013 x 2 Lecture 9, February 25, 2015 x 1 The XOR problem The XOR problem x 1 x 2 x 2 x 1 (picture adapted from Bishop 2006) It s the features, stupid It s the features, stupid The
More informationContents. (75pts) COS495 Midterm. (15pts) Short answers
Contents (75pts) COS495 Midterm 1 (15pts) Short answers........................... 1 (5pts) Unequal loss............................. 2 (15pts) About LSTMs........................... 3 (25pts) Modular
More informationMaking Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation
Making Deep Learning Understandable for Analyzing Sequential Data about Gene Regulation Dr. Yanjun Qi Department of Computer Science University of Virginia Tutorial @ ACM BCB-2018 8/29/18 Yanjun Qi / UVA
More informationTopics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound. Lecture 3: Introduction to Deep Learning (continued)
Topics in AI (CPSC 532L): Multimodal Learning with Vision, Language and Sound Lecture 3: Introduction to Deep Learning (continued) Course Logistics - Update on course registrations - 6 seats left now -
More informationMultilayer Perceptron
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Single Perceptron 3 Boolean Function Learning 4
More informationFeed-forward Network Functions
Feed-forward Network Functions Sargur Srihari Topics 1. Extension of linear models 2. Feed-forward Network Functions 3. Weight-space symmetries 2 Recap of Linear Models Linear Models for Regression, Classification
More informationStatistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks
Statistical Machine Learning (BE4M33SSU) Lecture 5: Artificial Neural Networks Jan Drchal Czech Technical University in Prague Faculty of Electrical Engineering Department of Computer Science Topics covered
More informationIntroduction to Convolutional Neural Networks (CNNs)
Introduction to Convolutional Neural Networks (CNNs) nojunk@snu.ac.kr http://mipal.snu.ac.kr Department of Transdisciplinary Studies Seoul National University, Korea Jan. 2016 Many slides are from Fei-Fei
More informationBackpropagation and Neural Networks part 1. Lecture 4-1
Lecture 4: Backpropagation and Neural Networks part 1 Lecture 4-1 Administrative A1 is due Jan 20 (Wednesday). ~150 hours left Warning: Jan 18 (Monday) is Holiday (no class/office hours) Also note: Lectures
More informationNeural Networks in Structured Prediction. November 17, 2015
Neural Networks in Structured Prediction November 17, 2015 HWs and Paper Last homework is going to be posted soon Neural net NER tagging model This is a new structured model Paper - Thursday after Thanksgiving
More informationNeural Networks for NLP. COMP-599 Nov 30, 2016
Neural Networks for NLP COMP-599 Nov 30, 2016 Outline Neural networks and deep learning: introduction Feedforward neural networks word2vec Complex neural network architectures Convolutional neural networks
More informationECE521 Lectures 9 Fully Connected Neural Networks
ECE521 Lectures 9 Fully Connected Neural Networks Outline Multi-class classification Learning multi-layer neural networks 2 Measuring distance in probability space We learnt that the squared L2 distance
More informationComments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms
Neural networks Comments Assignment 3 code released implement classification algorithms use kernels for census dataset Thought questions 3 due this week Mini-project: hopefully you have started 2 Example:
More informationAnalysis of the Learning Process of a Recurrent Neural Network on the Last k-bit Parity Function
Analysis of the Learning Process of a Recurrent Neural Network on the Last k-bit Parity Function Austin Wang Adviser: Xiuyuan Cheng May 4, 2017 1 Abstract This study analyzes how simple recurrent neural
More informationMachine Learning for Signal Processing Neural Networks Continue. Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016
Machine Learning for Signal Processing Neural Networks Continue Instructor: Bhiksha Raj Slides by Najim Dehak 1 Dec 2016 1 So what are neural networks?? Voice signal N.Net Transcription Image N.Net Text
More informationAdvanced Machine Learning
Advanced Machine Learning Lecture 4: Deep Learning Essentials Pierre Geurts, Gilles Louppe, Louis Wehenkel 1 / 52 Outline Goal: explain and motivate the basic constructs of neural networks. From linear
More informationDeep Learning and Lexical, Syntactic and Semantic Analysis. Wanxiang Che and Yue Zhang
Deep Learning and Lexical, Syntactic and Semantic Analysis Wanxiang Che and Yue Zhang 2016-10 Part 2: Introduction to Deep Learning Part 2.1: Deep Learning Background What is Machine Learning? From Data
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationMachine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler
+ Machine Learning and Data Mining Multi-layer Perceptrons & Neural Networks: Basics Prof. Alexander Ihler Linear Classifiers (Perceptrons) Linear Classifiers a linear classifier is a mapping which partitions
More informationCSC321 Lecture 7 Neural language models
CSC321 Lecture 7 Neural language models Roger Grosse and Nitish Srivastava February 1, 2015 Roger Grosse and Nitish Srivastava CSC321 Lecture 7 Neural language models February 1, 2015 1 / 19 Overview We
More informationStatistical Machine Learning from Data
January 17, 2006 Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Multi-Layer Perceptrons Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole
More informationTTIC 31230, Fundamentals of Deep Learning David McAllester, April Vanishing and Exploding Gradients. ReLUs. Xavier Initialization
TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 Vanishing and Exploding Gradients ReLUs Xavier Initialization Batch Normalization Highway Architectures: Resnets, LSTMs and GRUs Causes
More informationIntroduction to Machine Learning Spring 2018 Note Neural Networks
CS 189 Introduction to Machine Learning Spring 2018 Note 14 1 Neural Networks Neural networks are a class of compositional function approximators. They come in a variety of shapes and sizes. In this class,
More informationOptmization Methods for Machine Learning Beyond Perceptron Feed Forward neural networks (FFN)
Optmization Methods for Machine Learning Beyond Perceptron Feed Forward neural networks (FFN) Laura Palagi http://www.dis.uniroma1.it/ palagi Dipartimento di Ingegneria informatica automatica e gestionale
More informationBackpropagation Introduction to Machine Learning. Matt Gormley Lecture 12 Feb 23, 2018
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Backpropagation Matt Gormley Lecture 12 Feb 23, 2018 1 Neural Networks Outline
More informationOther Topologies. Y. LeCun: Machine Learning and Pattern Recognition p. 5/3
Y. LeCun: Machine Learning and Pattern Recognition p. 5/3 Other Topologies The back-propagation procedure is not limited to feed-forward cascades. It can be applied to networks of module with any topology,
More informationNeural Networks and the Back-propagation Algorithm
Neural Networks and the Back-propagation Algorithm Francisco S. Melo In these notes, we provide a brief overview of the main concepts concerning neural networks and the back-propagation algorithm. We closely
More informationMachine Learning for NLP
Machine Learning for NLP Linear Models Joakim Nivre Uppsala University Department of Linguistics and Philology Slides adapted from Ryan McDonald, Google Research Machine Learning for NLP 1(26) Outline
More informationLecture 4 Backpropagation
Lecture 4 Backpropagation CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 5, 2017 Things we will look at today More Backpropagation Still more backpropagation Quiz
More informationArtificial Neuron (Perceptron)
9/6/208 Gradient Descent (GD) Hantao Zhang Deep Learning with Python Reading: https://en.wikipedia.org/wiki/gradient_descent Artificial Neuron (Perceptron) = w T = w 0 0 + + w 2 2 + + w d d where
More informationDeep Learning for NLP
Deep Learning for NLP CS224N Christopher Manning (Many slides borrowed from ACL 2012/NAACL 2013 Tutorials by me, Richard Socher and Yoshua Bengio) Machine Learning and NLP NER WordNet Usually machine learning
More informationCSC321 Lecture 5: Multilayer Perceptrons
CSC321 Lecture 5: Multilayer Perceptrons Roger Grosse Roger Grosse CSC321 Lecture 5: Multilayer Perceptrons 1 / 21 Overview Recall the simple neuron-like unit: y output output bias i'th weight w 1 w2 w3
More informationSequence Modeling with Neural Networks
Sequence Modeling with Neural Networks Harini Suresh y 0 y 1 y 2 s 0 s 1 s 2... x 0 x 1 x 2 hat is a sequence? This morning I took the dog for a walk. sentence medical signals speech waveform Successes
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationNatural Language Processing
Natural Language Processing Info 59/259 Lecture 4: Text classification 3 (Sept 5, 207) David Bamman, UC Berkeley . https://www.forbes.com/sites/kevinmurnane/206/04/0/what-is-deep-learning-and-how-is-it-useful
More informationDEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY
DEEP LEARNING AND NEURAL NETWORKS: BACKGROUND AND HISTORY 1 On-line Resources http://neuralnetworksanddeeplearning.com/index.html Online book by Michael Nielsen http://matlabtricks.com/post-5/3x3-convolution-kernelswith-online-demo
More information11/3/15. Deep Learning for NLP. Deep Learning and its Architectures. What is Deep Learning? Advantages of Deep Learning (Part 1)
11/3/15 Machine Learning and NLP Deep Learning for NLP Usually machine learning works well because of human-designed representations and input features CS224N WordNet SRL Parser Machine learning becomes
More informationLecture 6: Neural Networks for Representing Word Meaning
Lecture 6: Neural Networks for Representing Word Meaning Mirella Lapata School of Informatics University of Edinburgh mlap@inf.ed.ac.uk February 7, 2017 1 / 28 Logistic Regression Input is a feature vector,
More informationNeural Architectures for Image, Language, and Speech Processing
Neural Architectures for Image, Language, and Speech Processing Karl Stratos June 26, 2018 1 / 31 Overview Feedforward Networks Need for Specialized Architectures Convolutional Neural Networks (CNNs) Recurrent
More informationNatural Language Processing
Natural Language Processing Pushpak Bhattacharyya CSE Dept, IIT Patna and Bombay LSTM 15 jun, 2017 lgsoft:nlp:lstm:pushpak 1 Recap 15 jun, 2017 lgsoft:nlp:lstm:pushpak 2 Feedforward Network and Backpropagation
More informationMachine Learning Lecture 10
Machine Learning Lecture 10 Neural Networks 26.11.2018 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Today s Topic Deep Learning 2 Course Outline Fundamentals Bayes
More informationDeep Learning: a gentle introduction
Deep Learning: a gentle introduction Jamal Atif jamal.atif@dauphine.fr PSL, Université Paris-Dauphine, LAMSADE February 8, 206 Jamal Atif (Université Paris-Dauphine) Deep Learning February 8, 206 / Why
More informationDeep Feedforward Networks
Deep Feedforward Networks Liu Yang March 30, 2017 Liu Yang Short title March 30, 2017 1 / 24 Overview 1 Background A general introduction Example 2 Gradient based learning Cost functions Output Units 3
More informationSequence Modelling with Features: Linear-Chain Conditional Random Fields. COMP-599 Oct 6, 2015
Sequence Modelling with Features: Linear-Chain Conditional Random Fields COMP-599 Oct 6, 2015 Announcement A2 is out. Due Oct 20 at 1pm. 2 Outline Hidden Markov models: shortcomings Generative vs. discriminative
More informationNeural networks COMS 4771
Neural networks COMS 4771 1. Logistic regression Logistic regression Suppose X = R d and Y = {0, 1}. A logistic regression model is a statistical model where the conditional probability function has a
More information