Statistical NLP for the Web Neural Networks, Deep Belief Networks Sameer Maskey Week 8, October 24, 2012 *some slides from Andrew Rosenberg

Announcements: Please ask HW2-related questions on Courseworks. The HW2 due date has been moved to Oct 30 (next Tuesday). HW3 will be released next week.

Student Projects:
- Hashtag Recommendation for Twitter
- Reviews: How can the reviews help the restaurants improve more efficiently?
- Question Answering System dealing with factual questions in the field of Classical Music
- Automatic Summarization of Video Content
- Mood Sync: Text Mining for Mood Classification of Songs
- Web app for fashion item recognition
- TCoG Twitter Dedupe
- Unsupervised Medical Entity Recognition
- A Web App for Personalized Health News
- Twitter movie tweets sentiment analysis
- An intelligent newsreader service
- Legal Auto Assist

HW2: How to do well on HW2? Understand the concepts clearly. Go through the animation of the forward-backward algorithm in the slides and make sure you understand where each number comes from. Also, take a look at Jason Eisner's Excel spreadsheet; you can check that your algorithm is correct by first running Eisner's example through your code. Make sure you do everything in log probabilities.
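
On the log-probability point: when the forward-backward recurrences need to add probabilities that are stored as logs, the usual trick is a log-sum-exp. A minimal sketch (my own illustration, not part of the assignment code):

```python
import math

def log_sum_exp(log_probs):
    """Stably compute log(sum_i exp(log_probs[i])) so probabilities
    kept in log space can be added without underflow."""
    m = max(log_probs)
    return m + math.log(sum(math.exp(x - m) for x in log_probs))

# Example: adding three very small forward probabilities in log space.
log_alphas = [math.log(1e-300), math.log(2e-300), math.log(3e-300)]
print(log_sum_exp(log_alphas))   # close to log(6e-300), about -689.0
```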

Topics for Today Neural Networks Deep Belief Networks

Neurons: Neurons accept information from multiple inputs and transmit information to other neurons. Inputs are multiplied by weights along the edges, and some function is applied to the set of weighted inputs at each node.
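
As a tiny sketch of that description (the numbers and the choice of a logistic squashing function are illustrative, not from the slides):

```python
import numpy as np

# A single hypothetical neuron: inputs are weighted along the edges,
# summed, and passed through a function at the node.
x = np.array([0.5, -1.0, 2.0])    # incoming inputs
w = np.array([0.1, 0.4, -0.3])    # one weight per input edge
b = 0.05                          # bias

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

print(logistic(np.dot(w, x) + b))  # the neuron's output
```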

Types of Neurons: Linear neuron, logistic neuron, perceptron, and potentially more. Gradient descent training requires a convex loss function.

Multilayer Networks: Cascade neurons together; the output from one layer is the input to the next, and each layer has its own set of weights.

Linear Regression Neural Networks: What happens when we arrange linear neurons in a multilayer network?

Linear Regression Neural Networks: Nothing special happens; the composition of two linear transformations is itself a linear transformation.
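
A quick NumPy check of that claim (shapes chosen arbitrarily):

```python
import numpy as np

# Two stacked linear layers are equivalent to one linear layer:
# W2 (W1 x) = (W2 W1) x for every input x.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # first linear layer
W2 = rng.normal(size=(2, 4))   # second linear layer
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))   # True: the stack collapses
```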

Neural Networks: We want to introduce non-linearities to the network. Non-linearities allow a network to identify complex regions in space.

Linear Separability: A 1-layer network cannot handle XOR. More layers can handle more complicated regions of space, but require more parameters. Each node splits the feature space with a hyperplane. If the second layer is an AND, a 2-layer network can represent any convex hull.
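
For instance, a 2-layer network of threshold units can compute XOR, which no single-layer network can. A sketch with hand-picked weights (my own construction; the picture from [1] on the next slide illustrates the same idea):

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

# Hand-picked weights: the first hidden unit computes OR(x1, x2), the
# second computes NAND(x1, x2), and the output unit ANDs them, giving XOR.
W_hidden = np.array([[1.0, 1.0],
                     [-1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])
w_out = np.array([1.0, 1.0])
b_out = -1.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x) + b_hidden)
    y = step(np.dot(w_out, h) + b_out)
    print(x, int(y))   # prints 0, 1, 1, 0
```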

XOR Problem and Neural Net Solution Picture from [1]

Neural Net Picture from [1]

Feed-Forward Networks: Predictions are fed forward through the network to classify.


Error Backpropagation: We will do gradient descent on the whole network. Training will proceed from the last layer to the first.

Error Backpropagation: Introduce variables over the neural network, distinguishing the input and output of each node.

Error Backpropagation Training: Take the gradient of the last component and iterate backwards.

Error Backpropagation: Empirical risk function (the squared error $\frac{1}{2}\sum_k (y_k - t_k)^2$ used in the derivation below).

Error Backpropagation: Optimize the last-layer weights $w_{kl}$ via the calculus chain rule.

Chain Rule: What is the chain rule saying? If we want to know how the error changes when a weight changes, we can break it into two factors: how the error changes when the input to the node changes, multiplied by how that input changes when the weight changes.
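
Written out (assuming, consistent with the forward-phase slide later, that $h_l = \sum_k a_k w_{kl}$ is the net input to output node $l$):

$$\frac{\partial E}{\partial w_{kl}} \;=\; \frac{\partial E}{\partial h_l}\,\cdot\,\frac{\partial h_l}{\partial w_{kl}}$$

The first factor is how the error changes when the node's input changes; the second is how that input changes when the weight changes.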

Remember: $\frac{\partial}{\partial w_{ik}}\bigl(t_k - \sum_j w_{jk} x_j\bigr) = -x_i$, since the only part of the sum that is a function of $w_{ik}$ is the term with $j = i$.

Error Backpropagation: Optimize the last hidden-layer weights $w_{jk}$ using the multivariate chain rule.

Error Backpropagation: Repeat for all previous layers.

Error Backpropagation: Now that we have well-defined gradients for each parameter, update using gradient descent.
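
In symbols, gradient descent moves each weight a small step against its gradient, with learning rate $\eta$:

$$w \;\leftarrow\; w \,-\, \eta\,\frac{\partial E}{\partial w}$$

Writing $\delta = -\partial E/\partial h$ for the node the weight feeds into, this is the same $w \leftarrow w + \eta\,\delta\cdot(\text{input})$ form used in the backward-phase updates below.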

Error Back-propagation: Error backprop unravels the multivariate chain rule and solves the gradient for each partial component separately. The target values for each layer come from the next layer. This feeds the errors back along the network.

Neural Net Algorithm: Forward Phase (inputs $x_i$, hidden net inputs $h_j$ and activations $a_j$, output net inputs $h_k$ and outputs $y_k$, weights $w_{ij}$ and $w_{jk}$):
$h_j = \sum_i x_i w_{ij}$, $\quad a_j = g(h_j) = \frac{1}{1 + e^{-\beta h_j}}$
$h_k = \sum_j a_j w_{jk}$, $\quad y_k = g(h_k) = \frac{1}{1 + e^{-\beta h_k}}$
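
A minimal NumPy sketch of exactly those four equations (my own function and variable names; biases are omitted, as on the slide):

```python
import numpy as np

def forward(x, W_in, W_out, beta=1.0):
    """Forward phase of a one-hidden-layer network.
    W_in[i, j] plays the role of w_ij, W_out[j, k] of w_jk."""
    def g(h):                          # logistic activation
        return 1.0 / (1.0 + np.exp(-beta * h))
    h_hidden = x @ W_in                # h_j = sum_i x_i w_ij
    a = g(h_hidden)                    # a_j = g(h_j)
    h_out = a @ W_out                  # h_k = sum_j a_j w_jk
    y = g(h_out)                       # y_k = g(h_k)
    return a, y
```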

Neural Networks: Backward Phase
$\delta_{ok} = (t_k - y_k)\, y_k (1 - y_k)$
$\delta_{hj} = a_j (1 - a_j) \sum_k w_{jk}\, \delta_{ok}$
$w_{jk} \leftarrow w_{jk} + \eta\, \delta_{ok}\, a_j$
$w_{ij} \leftarrow w_{ij} + \eta\, \delta_{hj}\, x_i$
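
And a matching sketch of the backward phase (again my own names, with $\beta = 1$ assumed so the deltas take exactly the form shown above):

```python
import numpy as np

def backward_update(x, a, y, t, W_in, W_out, eta=0.1):
    """One backward-phase step: compute the deltas and apply the updates."""
    delta_o = (t - y) * y * (1.0 - y)             # delta_ok
    delta_h = a * (1.0 - a) * (W_out @ delta_o)   # delta_hj
    W_out += eta * np.outer(a, delta_o)           # w_jk += eta * delta_ok * a_j
    W_in += eta * np.outer(x, delta_h)            # w_ij += eta * delta_hj * x_i
    return W_in, W_out
```

Calling forward and then backward_update repeatedly on training pairs is plain stochastic gradient descent on the squared error.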

Deriving Backprop Again. Remember: $\frac{\partial}{\partial w_{ik}}\bigl(t_k - \sum_j w_{jk} x_j\bigr) = -x_i$, since the only part of the sum that is a function of $w_{ik}$ is the term with $j = i$.

Also, the derivative of the activation function:
$g(h) = \frac{1}{1 + e^{-\beta h}}$
$\frac{dg}{dh} = \frac{d}{dh}\,\frac{1}{1 + e^{-\beta h}} = \beta\, g(h)\,(1 - g(h))$

Backpropagation of Error:
$\frac{\partial E}{\partial w_{jk}} = \frac{\partial E}{\partial h_k}\,\frac{\partial h_k}{\partial w_{jk}} = \Bigl(\frac{\partial E}{\partial y_k}\,\frac{\partial y_k}{\partial h_k}\Bigr)\frac{\partial h_k}{\partial w_{jk}}$
where, for the squared error, $\frac{\partial}{\partial y_k}\,\frac{1}{2}\sum_k (y_k - t_k)^2 = (y_k - t_k)$, $\quad \frac{\partial y_k}{\partial h_k} = y_k (1 - y_k)$, $\quad \frac{\partial h_k}{\partial w_{jk}} = \frac{\partial \sum_l w_{lk} a_l}{\partial w_{jk}} = a_j$,
so the gradient-descent update is $w_{jk} \leftarrow w_{jk} - \eta\,\frac{\partial E}{\partial w_{jk}} = w_{jk} + \eta\, \delta_{ok}\, a_j$.
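
One way to sanity-check that derivation is a finite-difference comparison. This sketch (my own construction, with $\beta = 1$ and no biases) checks the analytic gradient $\partial E/\partial w_{jk} = (y_k - t_k)\,y_k(1 - y_k)\,a_j$ for the last-layer weights against a numerical estimate:

```python
import numpy as np

def squared_error(y, t):
    return 0.5 * np.sum((y - t) ** 2)

rng = np.random.default_rng(1)
a = rng.normal(size=4)            # hidden activations a_j
W = rng.normal(size=(4, 2))       # last-layer weights w_jk
t = np.array([0.0, 1.0])          # targets t_k

def output(W):
    return 1.0 / (1.0 + np.exp(-(a @ W)))   # y_k = g(sum_j a_j w_jk)

y = output(W)
analytic = np.outer(a, (y - t) * y * (1 - y))   # dE/dw_jk

eps = 1e-6
numeric = np.zeros_like(W)
for j in range(W.shape[0]):
    for k in range(W.shape[1]):
        Wp = W.copy(); Wp[j, k] += eps
        Wm = W.copy(); Wm[j, k] -= eps
        numeric[j, k] = (squared_error(output(Wp), t) -
                         squared_error(output(Wm), t)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))   # True
```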

Problems with Neural Networks: Neural networks can easily overfit, there are many parameters to estimate, and it is hard to interpret the numbers produced by the hidden layers.

Types of Neural Networks: Convolutional networks, multiple outputs, skip-layer networks, recurrent neural networks.

What is wrong with back-propagation? It requires labeled training data, and almost all data is unlabeled. The learning time does not scale well: it is very slow in networks with multiple hidden layers. It can get stuck in poor local optima.

Backpropagation Problems: Backpropagation does not scale well with many hidden layers, requires a lot of data, and gets stuck easily in poor local minima. Deep Belief Networks use a similar gradient method to adjust the weights, but maximize the likelihood of the data given the model.

Deep Belief Networks in NLP and Speech: Deep networks have been used in a variety of NLP and speech processing tasks. [Collobert and Weston, 2008]: tagging, chunking, words into features. [Mohamed et al., 2009]: ASR phone recognition. [Deselaers et al., 2007]: machine transliteration.

Deep Networks: the joint distribution $p(v, h^1, h^2, h^3, \ldots, h^l)$ is factored into conditionals across layers, such as $p(h^1 \mid h^2)$. (Figure shows the visible nodes and several layers of hidden nodes.)

Conditional Distributions of Layers: the conditionals are given by $p(h^k \mid h^{k+1}) = \prod_i p(h^k_i \mid h^{k+1})$, where $p(h^k_i \mid h^{k+1}) = \mathrm{sig}\bigl(b^k_i + \sum_j W^k_{ij}\, h^{k+1}_j\bigr)$.

Conditional Distribution per Node: $p(h^k_i \mid h^{k+1}) = \mathrm{sig}\bigl(b^k_i + \sum_j W^k_{ij}\, h^{k+1}_j\bigr)$. This is basically a sigmoid function applied to a bias plus a weighted sum of the layer above, where $W^k$ is a weight matrix of size $N \times M$.
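
A small NumPy sketch of that per-node conditional (the sizes and names are illustrative): every node of layer $k$ gets an independent Bernoulli probability from a sigmoid of its bias plus the weighted input from the layer above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_conditional(h_above, W, b):
    """p(h^k_i = 1 | h^{k+1}) = sigmoid(b_i + sum_j W_ij * h^{k+1}_j),
    computed for all nodes i of layer k at once."""
    return sigmoid(b + W @ h_above)

# Layer k has N = 3 nodes, layer k+1 has M = 4 nodes (made-up sizes).
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))              # N x M weight matrix W^k
b = np.zeros(3)                          # biases b^k_i
h_above = rng.integers(0, 2, size=4)     # one binary configuration of layer k+1
print(layer_conditional(h_above, W, b))  # one probability per node of layer k
```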

Reference: [1] Duda, Hart, and Stork, Pattern Classification.