Neural Networks Yan Shao Department of Linguistics and Philology, Uppsala University 7 December 2016

Outline Part 1: Introduction; Feedforward Neural Networks; Stochastic Gradient Descent; Computational Graph & Backpropagation; Dropout.

Outline Part 2: Word Embeddings; Recurrent Neural Network; Some Use Cases in NLP; TensorFlow and the Assignment.

Introduction. History: 1940s, the first neural network computing model; 1950s, the two-layer network, the perceptron; 1980s, backpropagation; 2009 (2011) to now, the recent great success of deep learning.

Introduction Deep Learning Revolution

Introduction Deep Reinforcement Learning

Introduction Language Modelling, Question Answering, Speech Recognition (Black Mirror, Season 2, Be Right Back)

Introduction Language Modelling, Recurrent Neural Network (http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

Introduction Deep Convolutional Neural Networks (https://devblogs.nvidia.com/parallelforall/mocha-jl-deep-learningjulia/)

Introduction Machine Translation (https://research.googleblog.com/2016/09/a-neural-network-formachine.html)

Feedforward Neural Network. Linear Perceptron: $f(\mathbf{x}) = \sum_i w_i x_i + b$. [Figure: four inputs (Input #1 to Input #4) feeding a single output unit.]

Feedforward Neural Network. Linear Perceptron: $y = \sum_i w_i x_i + b$. [Figure: the same four-input network, with the output written as $y$.]

Feedforward Neural Network Linear Perceptron: y j = w ij x i + b j (3) Input #1 Input #2 Input #3 Input #4 Machine Learning for NLP 13/46

Feedforward Neural Network. $y_j = \sum_i w_{ij} x_i + b_j$, $y_k = \sum_j w_{jk} y_j + b_k$. [Figure: four inputs, an intermediate layer $y_j$, and an output layer $y_k$.]

Introduction. Non-linear Activation: $y = \dfrac{1}{1 + e^{-t}}$ (Sigmoid Function). [Figure: plot of the sigmoid curve.]

Introduction. Non-linear Activation: $y = \tanh x$ (Hyperbolic Tangent). [Figure: plot of the tanh curve.]
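
As a quick illustration (not part of the original slides), here is a minimal NumPy sketch of the two activation functions above; the sample points in x are arbitrary.

```python
import numpy as np

def sigmoid(t):
    """Logistic sigmoid: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

# tanh squashes values into (-1, 1); NumPy provides it as np.tanh.
x = np.linspace(-2.0, 2.0, 5)        # a few sample points (arbitrary)
print(sigmoid(x))                    # sigmoid(0) = 0.5
print(np.tanh(x))                    # tanh(0) = 0.0
```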

Feedforward Neural Network. Stacked Linear Layers: $y_j = \sum_i w_{ij} x_i + b_j$, $y_k = \sum_j w_{jk} y_j + b_k$. [Figure: four inputs, an intermediate linear layer, and an output layer.]
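
To see why stacking purely linear layers gains nothing on its own, the following NumPy sketch (not from the slides; the layer sizes and random weights are illustrative) checks that two stacked linear layers collapse into a single linear map, which is why the next slide adds a non-linear activation.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                 # four inputs, as in the figures
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)   # first linear layer (size 5 is arbitrary)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)   # second linear layer

two_layers = W2 @ (W1 @ x + b1) + b2                   # y_k computed layer by layer
W, b = W2 @ W1, W2 @ b1 + b2                           # fold both layers into one
one_layer = W @ x + b
print(np.allclose(two_layers, one_layer))              # True: still a linear function of x
```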

Feedforward Neural Network. Add Non-linear Activation: $y^h_j = h(\sum_i w_{ij} x_i + b_j)$, $y_k = \sum_j w_{jk} y^h_j + b_k$. [Figure: the same network, with non-linear units $h$ in the intermediate layer.]

Feedforward Neural Network. Softmax function (normalised exponential function): $\sigma(z_j) = \dfrac{e^{z_j}}{\sum_k e^{z_k}}$
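
A small NumPy sketch of the softmax function (not from the slides); subtracting the maximum score before exponentiating is a common numerical-stability trick that I have added.

```python
import numpy as np

def softmax(z):
    """Normalised exponential: maps a score vector z to a probability distribution."""
    z = z - np.max(z)          # subtracting the max avoids overflow; it cancels in the ratio
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))   # entries sum to 1, the largest score dominates
```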

Feedforward Neural Network. Add Softmax: $y^h_j = h(\sum_i w_{ij} x_i + b_j)$, $y^g_k = g(\sum_j w_{jk} y^h_j + b_k)$. [Figure: the network with softmax units ($\sigma$) in the output layer.]

Feedforward Neural Network. $y^h_j = h(\sum_i w_{ij} x_i + b_j)$, $y^g_k = g(\sum_j w_{jk} y^h_j + b_k)$. Input layer, hidden layer, output layer. [Figure: the same network with the three layers labelled.]

Feedforward Neural Network. $y^h_j = h(\sum_i w_{ij} x_i + b_j)$, $y^g_k = g(\sum_j w_{jk} y^h_j + b_k)$. The range of $j$ is the size of the hidden layer. Theorem: if the hidden layer is large enough, such a network can approximate any (continuous) vector-valued function to any degree of precision.
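
Putting the pieces together, here is a hedged NumPy sketch of the full forward pass, assuming a sigmoid hidden activation $h$ and a softmax output $g$; the hidden size of 8 and the 3 output classes are illustrative choices, not from the slides.

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    """One-hidden-layer forward pass: sigmoid hidden layer h, softmax output g."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))   # y^h_j = h(sum_i w_ij x_i + b_j)
    z = W2 @ h + b2                            # pre-softmax scores
    e = np.exp(z - z.max())
    return e / e.sum()                         # y^g_k = g(...), a probability distribution

rng = np.random.default_rng(0)
x = rng.normal(size=4)                         # four inputs, as in the figures
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)  # hidden size 8 (illustrative)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)  # three output classes (illustrative)
print(forward(x, W1, b1, W2, b2))              # sums to 1
```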

Feedforward Neural Network. Training: training data with inputs $[x_i]$ and outputs $[y_i]$; a loss function $L(\theta; y_i, \hat{y}_i)$; (stochastic) gradient descent; backpropagation.

Feedforward Neural Network. Loss function $L(\theta; y_i, \hat{y}_i)$, a.k.a. cost function, energy function, objective function, where $\hat{y}_i = \delta(x_i; \theta)$ is the model prediction. Mean squared error: $\mathrm{MSE} = \frac{1}{n}\sum_i (\hat{y}_i - y_i)^2$. Cross entropy: $H = -\sum_i y_i \log(\hat{y}_i)$.
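
The two loss functions, written out as a small NumPy sketch; the one-hot target and the prediction below are made up for illustration.

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error over the predictions."""
    return np.mean((y_hat - y) ** 2)

def cross_entropy(y_hat, y):
    """Cross entropy between the target distribution y and the prediction y_hat."""
    return -np.sum(y * np.log(y_hat))

y = np.array([0.0, 1.0, 0.0])        # a one-hot target (illustrative)
y_hat = np.array([0.1, 0.8, 0.1])    # a model output, e.g. from a softmax layer
print(mse(y_hat, y), cross_entropy(y_hat, y))
```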

Gradient Descent. To minimise the loss function $L(\theta; y_i, \hat{y}_i)$, at every step $t$ we pick the training sample(s) $i$ and update the parameters $\theta$ as: $\nabla L_t = \frac{\partial L(\theta; y_i, \hat{y}_i)}{\partial \theta}\big|_{\theta=\theta_t}$, $\theta_{t+1} = \theta_t - \eta \nabla L_t$, where $\eta$ (the learning rate) is a small value. We stop iterating when we reach an optimum (local or global). How can we efficiently compute the gradient $\nabla L_t$?
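
A minimal stochastic gradient descent loop on a toy linear least-squares problem; the data, the learning rate $\eta = 0.05$ and the number of steps are all illustrative and not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                     # toy inputs
true_theta = np.array([2.0, -1.0, 0.5])
y = X @ true_theta + 0.01 * rng.normal(size=100)  # toy targets with a little noise

theta, eta = np.zeros(3), 0.05                    # parameters and learning rate
for t in range(2000):
    i = rng.integers(len(X))                      # pick one training sample at random
    grad = 2.0 * (X[i] @ theta - y[i]) * X[i]     # gradient of the squared error on sample i
    theta = theta - eta * grad                    # theta_{t+1} = theta_t - eta * grad
print(theta)                                      # roughly [2.0, -1.0, 0.5]
```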

Computational Graph & Backpropagation Derivative? Partial Derivative?

Computational Graph & Backpropagation. Partial derivatives. Sum rule: $\frac{\partial}{\partial a}(a + b) = \frac{\partial a}{\partial a} + \frac{\partial b}{\partial a} = 1$. Product rule: $\frac{\partial}{\partial u}(uv) = u\frac{\partial v}{\partial u} + v\frac{\partial u}{\partial u} = v$.

Computational Graph & Backpropagation. We can represent numeric computations of any degree of complexity as a data-flow graph whose nodes are computational operations. A simple example: $e = (a + b)(b + 1)$, with $a$ and $b$ as inputs. This expression can be rewritten as: $c = a + b$, $d = b + 1$, $e = c \cdot d$.
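
The forward evaluation of this small graph, written as plain Python (using the values $a = 2$, $b = 1$ from the following slides):

```python
# Forward evaluation of the example graph e = (a + b) * (b + 1)
a, b = 2.0, 1.0     # the input values used on the following slides
c = a + b           # c = 3
d = b + 1           # d = 2
e = c * d           # e = 6
print(c, d, e)
```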

Computational Graph & Backpropagation Represent the expression as a graph: (https://colah.github.io/posts/2015-08-backprop/)

Computational Graph & Backpropagation. Let's set $a = 2$, $b = 1$, so that $c = 3$, $d = 2$ and $e = 6$: (https://colah.github.io/posts/2015-08-backprop/)

Computational Graph & Backpropagation Add partial derivatives on the edges: (https://colah.github.io/posts/2015-08-backprop/)

Computational Graph & Backpropagation. $\frac{\partial e}{\partial a} = 1 \times 2 = 2$; $\frac{\partial e}{\partial b} = 1 \times 2 + 1 \times 3 = 5$. (https://colah.github.io/posts/2015-08-backprop/)
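
The same numbers can be reproduced by applying the sum and product rules edge by edge and then summing over paths; a plain-Python sketch, not from the slides:

```python
# Backward pass through e = (a + b) * (b + 1) with a = 2, b = 1
a, b = 2.0, 1.0
c, d = a + b, b + 1                       # c = 3, d = 2

# Local partial derivatives on each edge of the graph
de_dc, de_dd = d, c                       # e = c * d  (product rule)
dc_da, dc_db = 1.0, 1.0                   # c = a + b  (sum rule)
dd_db = 1.0                               # d = b + 1

# Sum over all paths from e down to each input (chain rule)
de_da = de_dc * dc_da                     # = 2 * 1 = 2
de_db = de_dc * dc_db + de_dd * dd_db     # = 2 * 1 + 3 * 1 = 5
print(de_da, de_db)                       # 2.0 5.0
```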

Computational Graph & Backpropagation. Factoring Paths: $\frac{\partial Z}{\partial X} = \alpha\delta + \alpha\epsilon + \alpha\zeta + \beta\delta + \beta\epsilon + \beta\zeta + \gamma\delta + \gamma\epsilon + \gamma\zeta = (\alpha + \beta + \gamma)(\delta + \epsilon + \zeta)$. (https://colah.github.io/posts/2015-08-backprop/)

Computational Graph & Backpropagation Forward vs. Backward (https://colah.github.io/posts/2015-08-backprop/)

Computational Graph & Backpropagation Forward Propagation (https://colah.github.io/posts/2015-08-backprop/)

Computational Graph & Backpropagation Backpropagation (https://colah.github.io/posts/2015-08-backprop/)

Computational Graph & Backpropagation. Computing gradients over the computational graph with backpropagation (reverse-mode differentiation) can make training a neural network millions of times faster than differentiating with respect to each parameter separately. When we train neural networks with the standard deep learning libraries (Theano, Torch, TensorFlow, etc.), there is always such a graph running behind the scenes.
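
For concreteness, here is a hedged NumPy sketch of one manual backpropagation step for the one-hidden-layer network from earlier, assuming a sigmoid hidden layer, a softmax output and a cross-entropy loss; the sizes, random initialisation and learning rate are illustrative, and a real library would derive these gradient computations from its graph automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                        # one training input
y = np.array([0.0, 1.0, 0.0])                 # one-hot target, 3 classes (illustrative)
W1, b1 = 0.1 * rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(3, 8)), np.zeros(3)

# Forward pass (the values stored at the graph's nodes)
h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))      # sigmoid hidden layer
z = W2 @ h + b2
p = np.exp(z - z.max()); p = p / p.sum()      # softmax output
loss = -np.sum(y * np.log(p))                 # cross entropy

# Backward pass (walk the graph in reverse, reusing the stored values)
dz = p - y                                    # dL/dz for softmax + cross entropy
dW2, db2 = np.outer(dz, h), dz
dh = W2.T @ dz
dpre = dh * h * (1.0 - h)                     # back through the sigmoid
dW1, db1 = np.outer(dpre, x), dpre

eta = 0.1                                     # one SGD update with the gradients
W1 -= eta * dW1; b1 -= eta * db1
W2 -= eta * dW2; b2 -= eta * db2
print(loss)
```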

Dropout Overfitting

Dropout. A regularisation technique to mitigate overfitting: randomly drop units (along with their connections) from the neural network. It is only applied during training.
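
A sketch of how such a dropout layer can be implemented in NumPy; this uses the "inverted dropout" variant (scaling the surviving units at training time), which is one common choice and not necessarily what the original slides assume.

```python
import numpy as np

def dropout(h, rate, rng, training=True):
    """Zero each unit of h with probability `rate`; only applied during training.

    This is the 'inverted dropout' variant: surviving units are scaled by
    1 / (1 - rate), so nothing needs to be rescaled at test time.
    """
    if not training or rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate        # keep a unit with probability 1 - rate
    return h * mask / (1.0 - rate)

rng = np.random.default_rng(0)
h = np.ones(10)                               # stand-in hidden activations
print(dropout(h, rate=0.5, rng=rng))          # about half the entries are zeroed
```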

Dropout. $y^h_j = h(\sum_i w_{ij} x_i + b_j)$, $y^g_k = g(\sum_j w_{jk} y^h_j + b_k)$. [Figure: the full network, with no units dropped yet.]

Dropout. $y^h_j = d(h(\sum_i w_{ij} x_i + b_j))$ with dropout_rate = 0.5; $y^g_k = g(\sum_j w_{jk} y^h_j + b_k)$. [Figure: the same network with a random half of the hidden units dropped.]

Dropout. $y^h_j = d(h(\sum_i w_{ij} x_i + b_j))$ with dropout_rate = 0.5; $y^g_k = g(\sum_j w_{jk} y^h_j + b_k)$. [Figures: the same equations repeated over the next slides, each showing a different random subset of hidden units dropped.]

Next time: Word Embeddings; Recurrent Neural Network; Some Use Cases in NLP; TensorFlow and the Assignment.

Further Readings. Deep learning online courses: Stanford (http://cs224d.stanford.edu//) and Cambridge (https://youtu.be/plhfwt7vaew?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu). More on computational graphs and backpropagation: https://colah.github.io/posts/2015-08-backprop/