Neural Network Language Modeling
1 Neural Network Language Modeling Instructor: Wei Xu Ohio State University CSE 5525 Many slides from Marek Rei, Philipp Koehn and Noah Smith
2 Course Project Sign up for your course project. In-class presentations next Friday, 5 minutes each.
3 Language Modeling (Recap) Goal: - calculate the probability of a sentence - calculate probability of a word in the sentence
4 N-gram Language Modeling (Recap)
5 Zero Probabilities (Recap) When we have sparse statistics, e.g. P(w | denied the): 3 allegations, 2 reports, 1 claims, 1 request (7 total). Steal probability mass to generalize better, e.g. P(w | denied the): 2.5 allegations, 1.5 reports, 0.5 claims, 0.5 request, 2 other (7 total), where "other" covers unseen words such as attack, man, outcome.
6 Smoothing (Recap) Backoff Interpolation Kneser-Ney Smoothing
7 Problems with N-grams Problem 1: N-grams are sparse. - There are V^4 possible 4-grams. With V = 10,000, that's 10^16 possible 4-grams. - We will only see a tiny fraction of them in our training data.
8 Problems with N-grams Problem 2: Words are independent. - N-grams only map together identical words, ignoring similar or related words. - If we have seen "yellow daffodil" in the data, we could use the intuition that "blue" is related to "yellow" to handle "blue daffodil".
9 Vector Representation Let's represent words (or any objects) as vectors. Let's choose them so that similar words have similar vectors.
10 One-hot Word Vectors Each element represents one word.
11 One-hot Word Vectors That's a very large vector! It tells us very little.
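To make this concrete, here is a minimal NumPy sketch (not from the slides; the tiny vocabulary is invented for illustration). Distinct one-hot vectors are always orthogonal, so they encode no notion of similarity:

```python
# A minimal sketch: one-hot word vectors with an invented toy vocabulary.
import numpy as np

vocab = ["the", "cat", "dog", "daffodil", "yellow"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a |V|-dimensional vector with a single 1 at the word's index."""
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

print(one_hot("dog"))                           # [0. 0. 1. 0. 0.]
# Any two distinct one-hot vectors are orthogonal, so they carry no similarity:
print(one_hot("yellow") @ one_hot("daffodil"))  # 0.0
```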
12 Distributed Vectors (Representations) Each element represents a property, and they are shared between the words.
13 Distributed Vectors Groups similar words/objects together.
14 Distributed Vectors Use cosine similarity to calculate similarity between two words
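A minimal NumPy sketch (not from the slides) of cosine similarity between word vectors; the three-dimensional vectors are made up purely for illustration:

```python
# A minimal sketch: cosine similarity between (invented) dense word vectors.
import numpy as np

def cosine(a, b):
    """cos(a, b) = (a . b) / (|a| |b|); 1 = same direction, 0 = orthogonal."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

yellow = np.array([0.9, 0.1, 0.3])
blue   = np.array([0.8, 0.2, 0.35])
table  = np.array([-0.4, 0.7, 0.0])

print(cosine(yellow, blue))   # high: similar words get similar vectors
print(cosine(yellow, table))  # low: unrelated words
```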
16 Distributed Vectors We can infer some information based only on the vector of the word.
17 Distributed Vectors We don t even need to know the labels on the vector elements.
18 Distributed Vectors The vectors are usually not 2 or 3-dimensional; more often they have hundreds of dimensions.
19 Neural Network Language Models Represent each word as a vector, and similar words with similar vectors. Idea: similar contexts have similar words, so we define a model that aims to predict between a word w_t and its context words: P(w_t | context) or P(context | w_t). Optimize the vectors together with the model, so we end up with vectors that perform well for language modeling (aka representation learning).
20 Artificial Neuron Perceptrons = Single-Layer Neural Networks!
21 Artificial Neuron
22 Neural Network Many neurons connected together Multi-Layer Feed-Forward Networks!
23 Neural Network Usually, a neuron is shown as a single unit
24 Neural Network Or a whole layer of neurons is represented as a block
25 Neuron Activation w/ Vectors
26 Feedforward Activation w/ Matrices The same process applies when activating multiple neurons. Now the weights are in a matrix as opposed to a vector. The activation f(z) is applied to each neuron separately.
27 Feedforward Activation w/ Matrices
28 Neural Network
29 Feedforward Activation 1) Take vector from the previous layer 2) Multiply it with the weight matrix 3) Apply the activation function 4) Repeat
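The four steps above can be written in a few lines of NumPy. This is a minimal sketch, not from the slides; the layer sizes, random weights, and choice of sigmoid activation are assumptions for illustration:

```python
# A minimal sketch of a feedforward pass: vector -> weight matrix -> activation -> repeat.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x  = rng.normal(size=4)         # vector from the previous layer (here: the input)
W1 = rng.normal(size=(3, 4))    # weight matrix: 4 inputs -> 3 hidden neurons
W2 = rng.normal(size=(1, 3))    # weight matrix: 3 hidden -> 1 output neuron

h = sigmoid(W1 @ x)             # steps 1-3 for the hidden layer
y = sigmoid(W2 @ h)             # step 4: repeat for the next layer
print(h, y)
```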
30 Feedforward Activation
31 Exercise
32 Input Layer
33 Hidden Layer Computation
34 Output Layer Computation
35 Error Computed output: y = .76. Correct output: t = 1.0. → How do we adjust the weights?
36 Back-propagation Training Gradient descent: the error is a function of the weights, and we want to reduce the error; gradient descent moves towards the error minimum; compute the gradient to get the direction to the error minimum; adjust the weights in the direction of lower error. Back-propagation: first adjust the last set of weights; propagate the error back to each previous layer; adjust their weights.
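To make the gradient-descent idea concrete, here is a minimal sketch (not from the slides) on a toy one-dimensional error function; the function, starting point, and learning rate are invented for illustration:

```python
# A minimal sketch: gradient descent on E(w) = (w - 3)^2, whose gradient is dE/dw = 2(w - 3).
w, mu = 0.0, 0.1          # initial weight and learning rate (illustrative values)
for _ in range(50):
    grad = 2 * (w - 3)    # compute the gradient: direction of increasing error
    w -= mu * grad        # adjust the weight towards lower error
print(w)                  # close to 3, the error minimum
```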
37 Final Layer Update Linear combination of weights: $s = \sum_k w_k h_k$. Activation function: $y = \text{sigmoid}(s)$. Error (L2 norm): $E = \frac{1}{2}(t - y)^2$. Derivative of the error with regard to one weight $w_k$: $\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$.
38 Final Layer Update (1) Linear combination of weights: $s = \sum_k w_k h_k$. Activation function: $y = \text{sigmoid}(s)$. Error (L2 norm): $E = \frac{1}{2}(t - y)^2$. Derivative of the error with regard to one weight $w_k$: $\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$. The error $E$ is defined with respect to $y$: $\frac{dE}{dy} = \frac{d}{dy}\,\frac{1}{2}(t - y)^2 = -(t - y)$.
39 Final Layer Update (2) Linear combination of weights: $s = \sum_k w_k h_k$. Activation function: $y = \text{sigmoid}(s)$. Error (L2 norm): $E = \frac{1}{2}(t - y)^2$. Derivative of the error with regard to one weight $w_k$: $\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$. $y$ with respect to $s$ is $\text{sigmoid}(s)$: $\frac{dy}{ds} = \frac{d\,\text{sigmoid}(s)}{ds} = \text{sigmoid}(s)\,(1 - \text{sigmoid}(s)) = y(1 - y)$.
40 Final Layer Update (3) Linear combination of weights: $s = \sum_k w_k h_k$. Activation function: $y = \text{sigmoid}(s)$. Error (L2 norm): $E = \frac{1}{2}(t - y)^2$. Derivative of the error with regard to one weight $w_k$: $\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k}$. $s$ is the weighted linear combination of the hidden node values $h_k$: $\frac{ds}{dw_k} = \frac{d}{dw_k} \sum_k w_k h_k = h_k$.
41 Putting it Together Derivative of the error with regard to one weight $w_k$: $\frac{dE}{dw_k} = \frac{dE}{dy} \frac{dy}{ds} \frac{ds}{dw_k} = -(t - y)\; y(1 - y)\; h_k$, where $(t - y)$ is the error and $y(1 - y) = y'$ is the derivative of the sigmoid. The weight adjustment is scaled by a fixed learning rate $\mu$: $\Delta w_k = \mu\, (t - y)\, y'\, h_k$.
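As a sanity check, here is a minimal NumPy sketch (not from the slides) of this update rule for a single sigmoid output neuron; the hidden-node values, initial weights, and learning rate are invented numbers:

```python
# A minimal sketch of the final-layer update: Delta w_k = mu (t - y) y' h_k.
import numpy as np

h = np.array([0.9, 0.2, 1.0])      # hidden-node values h_k (illustrative)
w = np.array([0.5, -0.3, 0.1])     # weights into the output neuron (illustrative)
t, mu = 1.0, 10.0                  # target output and learning rate

s = w @ h                          # s = sum_k w_k h_k
y = 1.0 / (1.0 + np.exp(-s))       # y = sigmoid(s)
delta = (t - y) * y * (1 - y)      # (t - y) * y'  with y' = y(1 - y)
w += mu * delta * h                # Delta w_k = mu (t - y) y' h_k
print(y, w)
```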
42 Multiple Output Nodes Our example only had one output node. Typically neural networks have multiple output nodes. The error is computed over all $j$ output nodes: $E = \sum_j \frac{1}{2}(t_j - y_j)^2$. Weights $k \to j$ are adjusted according to the node they point to: $\Delta w_{j \leftarrow k} = \mu\, (t_j - y_j)\, y'_j\, h_k$.
43 Hidden Layer Update In a hidden layer, we do not have a target output value, but we can compute how much each node contributed to downstream error. Definition of the error term of each node: $\delta_j = (t_j - y_j)\, y'_j$. Back-propagate the error term (why this way? there is math to back it up...): $\delta_i = \big(\sum_j w_{j \leftarrow i}\, \delta_j\big)\, y'_i$. Universal update formula: $\Delta w_{j \leftarrow k} = \mu\, \delta_j\, h_k$.
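Putting the output-layer and hidden-layer updates together, here is a minimal NumPy sketch (not from the slides) of one back-propagation step for a network with one hidden layer and multiple output nodes; all sizes, weights, and inputs are invented for illustration:

```python
# A minimal sketch of one back-propagation step using the error terms above:
#   output layer: delta_j = (t_j - y_j) y'_j
#   hidden layer: delta_i = (sum_j w_{j<-i} delta_j) y'_i
#   everywhere:   Delta w = mu * delta * h
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x  = rng.normal(size=4)                # input vector (illustrative)
t  = np.array([0.0, 1.0])              # target outputs, one per output node
W1 = rng.normal(size=(3, 4))           # input -> hidden weights
W2 = rng.normal(size=(2, 3))           # hidden -> output weights
mu = 0.5                               # learning rate (illustrative)

# Forward pass
h = sigmoid(W1 @ x)
y = sigmoid(W2 @ h)

# Compute all error terms first, then update
delta_out = (t - y) * y * (1 - y)              # delta_j = (t_j - y_j) y'_j
delta_hid = (W2.T @ delta_out) * h * (1 - h)   # delta_i = (sum_j w_{j<-i} delta_j) y'_i

W2 += mu * np.outer(delta_out, h)              # Delta w_{j<-k} = mu delta_j h_k
W1 += mu * np.outer(delta_hid, x)
```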
44 Final Layer Update (example) [Figure: network with input nodes A, B, C, hidden nodes D, E, F, and output node G.] Computed output: y = .76. Correct output: t = 1.0. Final layer weight updates (learning rate $\mu = 10$): $\delta_G = (t - y)\, y' = (1 - .76)\, y' = .0434$; $\Delta w_{GD} = \mu\, \delta_G\, h_D = .391$; $\Delta w_{GE} = \mu\, \delta_G\, h_E = .074$; $\Delta w_{GF} = \mu\, \delta_G\, h_F = .434$.
46 Hidden Layer Update (example) [Figure: same network, with computed output y = .76 at node G.] Hidden node D: $\delta_D = \big(\sum_j w_{j \leftarrow i}\, \delta_j\big)\, y'_D = w_{GD}\, \delta_G\, y'_D = .0175$; $\Delta w_{DA} = \mu\, \delta_D\, h_A = .175$; $\Delta w_{DB} = \mu\, \delta_D\, h_B = 0$; $\Delta w_{DC} = \mu\, \delta_D\, h_C = .175$. Hidden node E: $\delta_E = w_{GE}\, \delta_G\, y'_E = .0464$; $\Delta w_{EA} = \mu\, \delta_E\, h_A = .464$; etc.
47 Initialization of Weights Weights are initialized randomly, e.g., uniformly from the interval $[-0.01, 0.01]$. Glorot and Bengio (2010) suggest: for shallow neural networks, $\big[-\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}\big]$, where $n$ is the size of the previous layer; for deep neural networks, $\big[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}, \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\big]$, where $n_j$ is the size of the previous layer and $n_{j+1}$ the size of the next layer.
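A minimal NumPy sketch (not from the slides) of these initialization ranges; the layer sizes are arbitrary illustrative values:

```python
# A minimal sketch of the two uniform initialization ranges above.
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_next = 100, 50   # illustrative layer sizes

# Shallow network: uniform in [-1/sqrt(n), 1/sqrt(n)], n = size of previous layer
lim_shallow = 1 / np.sqrt(n_prev)
W_shallow = rng.uniform(-lim_shallow, lim_shallow, size=(n_next, n_prev))

# Deep network (Glorot & Bengio, 2010):
# uniform in [-sqrt(6)/sqrt(n_j + n_{j+1}), sqrt(6)/sqrt(n_j + n_{j+1})]
lim_deep = np.sqrt(6) / np.sqrt(n_prev + n_next)
W_deep = rng.uniform(-lim_deep, lim_deep, size=(n_next, n_prev))
```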
48 Neural Networks for Classification Predict class: one output node per class. Training data output: a one-hot vector, e.g., $\vec{y} = (0, 0, 1)^T$. Prediction: the predicted class is the output node $y_i$ with the highest value; obtain a posterior probability distribution by soft-max: $\text{softmax}(y_i) = \frac{e^{y_i}}{\sum_j e^{y_j}}$.
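A minimal NumPy sketch (not from the slides) of the softmax above, written with the usual max-subtraction trick for numerical stability; the scores are invented:

```python
# A minimal sketch: softmax over output-node scores for classification.
import numpy as np

def softmax(y):
    """softmax(y_i) = exp(y_i) / sum_j exp(y_j)."""
    e = np.exp(y - np.max(y))   # subtracting the max does not change the result
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # one output node per class (illustrative)
probs = softmax(scores)
print(probs, probs.sum())            # posterior distribution, sums to 1
print(np.argmax(probs))              # predicted class = node with the highest value
```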
49 Feedforward Neural Network Language Model Input: vector representations of the previous words, $E(w_{i-3})$, $E(w_{i-2})$, $E(w_{i-1})$. Output: the conditional probability of $w_j$ being the next word.
50 Feedforward Neural Network Language Model We can also think of the input as a concatenation of the context vectors. The hidden layer $h$ is calculated as in the previous examples. How do we calculate the output?
51 Feedforward Neural Network Language Model Our output vector $o$ has an element for each possible word $w_j$. We take a softmax over that vector. How do we calculate it?
52 Feedforward Neural Network Language Model Our output vector $o$ has an element for each possible word $w_j$. We take a softmax over that vector.
53 Feedforward Neural Network Language Model 1) Multiply the input vectors with the weights 2) Apply the activation function Bengio et al. (2003)
54 Feedforward Neural Network Language Model 3) Multiply the hidden vector with the output weights 4) Apply softmax to the output vector. Now the j-th element of the output vector, $o_j$, contains the probability of $w_j$ being the next word. Bengio et al. (2003)
55 Feedforward Neural Network Language Model Like a log-linear language model with two kinds of features: 1) the concatenation of the context-word embedding vectors, and 2) a tanh-affine transformation of the above. Bengio et al. (2003)
56 Feedforward Neural Network Language Model [Architecture diagram: one-hot word vectors are looked up in the embedding matrix M; an affine layer (u, T) followed by tanh produces the hidden vector; an affine layer (b, A, W) over the embeddings and the hidden vector feeds the output softmax.] Bengio et al. (2003)
57 Neural Network: Definitions Warning: there is no widely accepted standard notation! A feedforward neural network $N$ is defined by: (1) a function family that maps parameter values to functions of the form $N : \mathbb{R}^{d_{\text{in}}} \to \mathbb{R}^{d_{\text{out}}}$; typically non-linear, differentiable with respect to its inputs, assembled through a series of affine transformations and non-linearities composed together, with symbolic/discrete inputs handled through lookups; and (2) parameter values, typically a collection of scalars, vectors, and matrices; we often assume they are linearized into $\mathbb{R}^D$.
58 A Couple of Useful Functions softmax: $\mathbb{R}^k \to \mathbb{R}^k$, $\langle x_1, x_2, \ldots, x_k \rangle \mapsto \big\langle \frac{e^{x_1}}{\sum_{j=1}^k e^{x_j}}, \frac{e^{x_2}}{\sum_{j=1}^k e^{x_j}}, \ldots, \frac{e^{x_k}}{\sum_{j=1}^k e^{x_j}} \big\rangle$. tanh: $\mathbb{R} \to [-1, 1]$, $x \mapsto \frac{e^x - e^{-x}}{e^x + e^{-x}}$; generalized to be elementwise, so that it maps $\mathbb{R}^k \to [-1, 1]^k$. Others include: ReLUs, logistic sigmoids, PReLUs, ...
59 Feedforward Neural Network Language Model (Bengio et al., 2003) Define the n-gram probability as follows: $p(v \mid \langle h_1, \ldots, h_{n-1} \rangle) = \big[\text{softmax}\big( b + \sum_{j=1}^{n-1} e_{h_j}^{\top} M A_j + W \tanh\big( u + \sum_{j=1}^{n-1} e_{h_j}^{\top} M T_j \big) \big)\big]_v$, where each $e_{h_j} \in \mathbb{R}^V$ is a one-hot vector and $H$ is the number of hidden units in the neural network (a hyperparameter). Parameters include: $M \in \mathbb{R}^{V \times d}$, which are called embeddings (row vectors), one for every word in $V$; and feedforward NN parameters $b \in \mathbb{R}^V$, $A \in \mathbb{R}^{(n-1)d \times V}$, $W \in \mathbb{R}^{V \times H}$, $u \in \mathbb{R}^H$, $T \in \mathbb{R}^{(n-1)d \times H}$.
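To make the formula concrete, here is a minimal NumPy sketch (not Bengio et al.'s implementation) of this forward computation; the sizes, random initialization, and word ids are invented for illustration, and A and T are stored as one block per history position to match the $A_j$ and $T_j$ in the formula:

```python
# A minimal sketch of the feedforward NNLM forward pass defined above.
import numpy as np

rng = np.random.default_rng(0)
V, d, H, n = 1000, 30, 50, 4                   # vocab size, embedding dim, hidden units, n-gram order

M = rng.normal(0, 0.01, size=(V, d))           # embeddings, one row per word
b = np.zeros(V)                                # output bias
A = rng.normal(0, 0.01, size=(n - 1, d, V))    # direct-connection blocks A_j
W = rng.normal(0, 0.01, size=(V, H))           # hidden -> output weights
u = np.zeros(H)                                # hidden bias
T = rng.normal(0, 0.01, size=(n - 1, d, H))    # embedding -> hidden blocks T_j

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ngram_prob(history):
    """p(. | h_1, ..., h_{n-1}) for a history of n-1 word ids."""
    m = M[history]                                              # m_{h_j} = e_{h_j}^T M
    hidden = np.tanh(u + sum(m[j] @ T[j] for j in range(n - 1)))
    scores = b + sum(m[j] @ A[j] for j in range(n - 1)) + W @ hidden
    return softmax(scores)

p = ngram_prob([12, 7, 404])   # made-up word ids for the three history words
print(p.shape, p.sum())        # (V,), 1.0
```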
60 Breaking It Down Look up each of the history words $h_j$, $\forall j \in \{1, \ldots, n-1\}$, in $M$; keep two copies. Rename the embedding for $h_j$ as $m_{h_j}$: $e_{h_j}^{\top} M = m_{h_j}$.
61 Breaking It Down Apply an affine transformation to the second copy of the history-word embeddings ($u$, $T$): $u + \sum_{j=1}^{n-1} m_{h_j} T_j$, with $u \in \mathbb{R}^H$ and $T_j \in \mathbb{R}^{d \times H}$.
62 Breaking It Down Apply an affine transformation to the second copy of the history-word embeddings ($u$, $T$) and a tanh nonlinearity: $\tanh\big( u + \sum_{j=1}^{n-1} m_{h_j} T_j \big)$.
63 Breaking It Down Apply an affine transformation to everything ($b$, $A$, $W$): $b + \sum_{j=1}^{n-1} m_{h_j} A_j + W \tanh\big( u + \sum_{j=1}^{n-1} m_{h_j} T_j \big)$, with $b \in \mathbb{R}^V$, $A_j \in \mathbb{R}^{d \times V}$, and $W \in \mathbb{R}^{V \times H}$.
64 Breaking It Down Apply a softmax transformation to make the vector sum to one: $\text{softmax}\big( b + \sum_{j=1}^{n-1} m_{h_j} A_j + W \tanh\big( u + \sum_{j=1}^{n-1} m_{h_j} T_j \big) \big)$.
65 Breaking It Down $\text{softmax}\big( b + \sum_{j=1}^{n-1} m_{h_j} A_j + W \tanh\big( u + \sum_{j=1}^{n-1} m_{h_j} T_j \big) \big)$ Like a log-linear language model with two kinds of features: the concatenation of the context-word embedding vectors $m_{h_j}$, and a tanh-affine transformation of the above. New parameters arise from (i) the embeddings and (ii) the affine transformation inside the nonlinearity.
66 Number of Parameters $D = \underbrace{Vd}_{M} + \underbrace{V}_{b} + \underbrace{(n-1)dV}_{A} + \underbrace{VH}_{W} + \underbrace{H}_{u} + \underbrace{(n-1)dH}_{T}$, with $V$ fixed after OOV processing, $d \in \{30, 60\}$, $H \in \{50, 100\}$, and $n - 1 = 5$. So $D = 461V$ parameters (for $d = 60$, $H = 100$), compared to $O(V^n)$ for classical n-gram models. Forcing $A = 0$ eliminated $300V$ parameters and performed a bit better, but was slower to converge. If we averaged the $m_{h_j}$ instead of concatenating, we'd get down to $221V$ (this is a variant of continuous bag of words; Mikolov et al., 2013).
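A quick sanity check (not from the slides) of the parameter count, using the $d = 60$, $H = 100$ setting; the vocabulary size is an illustrative value:

```python
# A small sketch evaluating D = Vd + V + (n-1)dV + VH + H + (n-1)dH.
def num_params(V, d=60, H=100, n_minus_1=5):
    return V*d + V + n_minus_1*d*V + V*H + H + n_minus_1*d*H

V = 20_000                 # illustrative vocabulary size
print(num_params(V))       # = 461*V plus a small constant (H + (n-1)dH)
print(461 * V)
```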
67 Why does it work?