Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat


Binary Classification with a Multi-layer Perceptron
[Figure: a multi-layer perceptron whose input is a bag-of-words feature vector for an example sentence, e.g. φ("A") = 1, φ("site") = 1, φ("located") = 1, φ("Maizuru") = 1, φ(",") = 2, φ("in") = 1, φ("Kyoto") = 1, φ("priest") = 0, φ("black") = 0, producing a single binary output.]

Example: binary classification with a NN
[Figure: four points in the original feature space, φ0(x1) = (-1, 1), φ0(x2) = (1, 1), φ0(x3) = (-1, -1), φ0(x4) = (1, -1), labeled in an XOR-like pattern that is not linearly separable. A hidden layer maps them to a new space, φ1(x1) = (-1, -1), φ1(x2) = (1, -1), φ1(x3) = (-1, 1), φ1(x4) = (-1, -1), where the two classes are linearly separable, so a single output node φ2[0] = y can classify them.]

Example: the Final Net
[Figure: a two-layer network. Inputs φ0[0], φ0[1] feed two hidden nodes φ1[0], φ1[1]; the hidden nodes feed the output node φ2[0]. Weights and biases are ±1, and each node applies tanh.]
Replace sign with a smoother non-linear function (e.g. tanh, sigmoid)
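To make the picture concrete, here is a minimal NumPy sketch of a two-layer network of this shape. The weight values are illustrative only (the exact values on the slide are not fully recoverable from the transcription), and the function name two_layer_net is mine, not from the slides.

```python
import numpy as np

def two_layer_net(phi0):
    """Two-layer network with tanh activations: 2 inputs -> 2 hidden nodes -> 1 output."""
    # Hidden layer: weight matrix W1 and bias b1 (illustrative +/-1 values).
    W1 = np.array([[ 1.0,  1.0],
                   [-1.0, -1.0]])
    b1 = np.array([-1.0, -1.0])
    phi1 = np.tanh(W1 @ phi0 + b1)   # hidden features phi1[0], phi1[1]

    # Output layer: weight vector w2 and bias b2.
    w2 = np.array([1.0, 1.0])
    b2 = 1.0
    return np.tanh(w2 @ phi1 + b2)   # network output y = phi2[0]

print(two_layer_net(np.array([1.0, -1.0])))  # e.g. the point x4 = (1, -1)
```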

Multi-layer Perceptrons are a kind of Neural Network (NN)
[Figure: the same bag-of-words example as above, annotated with the key terminology:]
- Input (aka features)
- Output
- Nodes (aka neurons)
- Layers
- Hidden layers
- Activation function (non-linear)

Neural Networks as Computation Graphs Example & figures by Philipp Koehn

Computation Graphs Make Prediction Easy: Forward Propagation


Neural Networks as Computation Graphs
- Decomposes computation into simple operations over matrices and vectors
- Forward propagation algorithm
  - Produces the network output given an input
  - By traversing the computation graph in topological order
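A minimal sketch of this idea (not from the slides): each node of the graph stores its parent nodes and the operation it applies, and forward propagation just evaluates nodes in topological order. The Node class and forward function are illustrative names I introduce here.

```python
import numpy as np

class Node:
    """One operation in a computation graph: computes a value from its parents' values."""
    def __init__(self, op, parents=()):
        self.op = op                 # function of the parent values
        self.parents = list(parents)
        self.value = None

def forward(nodes):
    """Forward propagation: visit nodes in topological order (parents before children)."""
    for node in nodes:               # `nodes` is assumed to be topologically sorted
        inputs = [p.value for p in node.parents]
        node.value = node.op(*inputs)
    return nodes[-1].value           # value of the final (output) node

# Tiny graph computing y = tanh(W x + b)
x = Node(lambda: np.array([1.0, -1.0]))
W = Node(lambda: np.array([[1.0, 1.0], [-1.0, -1.0]]))
b = Node(lambda: np.array([-1.0, -1.0]))
prod = Node(lambda W_, x_: W_ @ x_, parents=[W, x])
s = Node(lambda p, b_: p + b_, parents=[prod, b])
y = Node(np.tanh, parents=[s])

print(forward([x, W, b, prod, s, y]))
```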

Neural Networks for Multiclass Classification

Multiclass Classification
The softmax function:
  P(y | x) = exp(w · φ(x, y)) / Σ_{y'} exp(w · φ(x, y'))
The numerator scores the current class; the denominator sums over all classes. Exact same function as in multiclass logistic regression.
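As a quick sketch (not from the slides), softmax over a vector of class scores can be computed as follows; subtracting the maximum score before exponentiating is a standard trick for numerical stability.

```python
import numpy as np

def softmax(scores):
    """Softmax over a vector of class scores, e.g. scores[y] = w . phi(x, y)."""
    shifted = scores - np.max(scores)      # subtract max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # class probabilities summing to 1
```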

Example: a feedforward Neural Network for 3-way Classification
[Figure: a feedforward network with a sigmoid hidden layer and a softmax output layer (as in multi-class logistic regression). From Eisenstein, p. 66.]

Designing Neural Networks: Activation functions
- The hidden layer can be viewed as a set of hidden features
- The output of the hidden layer indicates the extent to which each hidden feature is activated by a given input
- The activation function is a nonlinear function that determines the range of hidden feature values
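As a side note (not on the slides), a small sketch of three common activation functions and the ranges they map hidden values into; ReLU is included here as an extra example beyond the tanh and sigmoid used above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes values into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)         # clips negatives to 0, range [0, inf)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z), sep="\n")
```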

Designing Neural Networks: Network structure
2 key decisions:
- Width (number of nodes per layer)
- Depth (number of hidden layers)
More parameters means that the network can learn more complex functions of the input.
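To make the width/depth trade-off concrete, here is a small sketch (not from the slides) that counts the parameters of a fully connected network for given layer sizes; the function name num_parameters is illustrative.

```python
def num_parameters(layer_sizes):
    """Number of weights and biases in a fully connected network.

    layer_sizes, e.g. [d_in, h1, h2, d_out], lists the width of each layer.
    """
    total = 0
    for d_in, d_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += d_in * d_out + d_out   # weight matrix + bias vector
    return total

print(num_parameters([100, 50, 3]))       # one hidden layer of width 50 -> 5203 parameters
print(num_parameters([100, 50, 50, 3]))   # adding a second hidden layer -> 7753 parameters
```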

Neural Networks so far
- Powerful non-linear models for classification
- Predictions are made as a sequence of simple operations
  - matrix-vector operations
  - non-linear activation functions
- Choices in network structure
  - Width and depth
  - Choice of activation function
- Feedforward networks (no loops)
Next: how to train?

Training Neural Networks

How do we estimate the parameters of (aka "train") a neural net?
For training, we need:
- Data: (a large number of) examples paired with their correct class (x, y)
- Loss/error function: quantifies how bad our prediction y is compared to the truth t
Let's use squared error: error = (y − t)²

Stochastic Gradient Descent
We view the error as a function of the trainable parameters, on a given dataset. We want to find parameters that minimize the error.

  w = 0                                              # start with some initial parameter values
  for I iterations:
      for each labeled pair (x, y) in the data:      # go through the training data one example at a time
          w = w − μ · d error(w, x, y) / dw          # take a step down the gradient
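A minimal runnable sketch of this loop (not from the slides), using squared error on a single linear unit so the gradient has a simple closed form; μ is the learning rate and the function name sgd is mine.

```python
import numpy as np

def sgd(data, n_features, iterations=100, mu=0.1):
    """Stochastic gradient descent on squared error for a linear unit y = w . x."""
    w = np.zeros(n_features)           # start with some initial parameter values
    for _ in range(iterations):
        for x, t in data:              # one example at a time
            y = w @ x                  # prediction
            grad = 2 * (y - t) * x     # d/dw of (y - t)^2
            w = w - mu * grad          # take a step down the gradient
    return w

# Toy dataset: learn t = x[0] - x[1]
data = [(np.array([1.0, 0.0]),  1.0),
        (np.array([0.0, 1.0]), -1.0),
        (np.array([1.0, 1.0]),  0.0)]
print(sgd(data, n_features=2))
```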

Computation Graphs Make Training Easy: Computing Error

Computation Graphs Make Training Easy: Computing Gradients

Computation Graphs Make Training Easy: Given forward pass + derivatives for each node


Computation Graphs Make Training Easy: Updating Parameters

Computation Graph: A Powerful Abstraction
To build a system, we only need to:
- Define the network structure
- Define the loss
- Provide data (and set a few more hyperparameters to control training)
Given the network structure:
- Prediction is done by a forward pass through the graph (forward propagation)
- Training is done by a backward pass through the graph (back-propagation)
- Based on simple matrix-vector operations
Forms the basis of neural network libraries: TensorFlow, PyTorch, MXNet, etc.
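As an illustration of this abstraction (not from the slides), a minimal PyTorch sketch: we define the structure and the loss and provide data, and the library handles the forward and backward passes over the computation graph. The layer sizes and data are illustrative.

```python
import torch
import torch.nn as nn

# Define network structure: 2 inputs -> 4 hidden units (tanh) -> 3 classes
model = nn.Sequential(nn.Linear(2, 4), nn.Tanh(), nn.Linear(4, 3))
loss_fn = nn.CrossEntropyLoss()                            # define the loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)    # a few hyperparameters

# Provide data: toy inputs x and class labels y
x = torch.tensor([[1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
y = torch.tensor([0, 1, 2])

for _ in range(100):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)   # forward pass through the graph
    loss.backward()               # backward pass: back-propagation
    optimizer.step()              # update the parameters
print(loss.item())
```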

Neural Networks
- Powerful non-linear models for classification
- Predictions are made as a sequence of simple operations
  - matrix-vector operations
  - non-linear activation functions
- Choices in network structure
  - Width and depth
  - Choice of activation function
- Feedforward networks (no loops)
- Training with the back-propagation algorithm
  - Requires defining a loss/error function
  - Gradient descent + chain rule
  - Easy to implement on top of computation graphs