Intro to Neural Networks and Deep Learning


Intro to Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi UVA CS 6316 1

Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs in Practice 2

Diagram: network x (x1, x2, x3) → W1 → W2 → w3 → ŷ. 3

Logistic Regression: P(Y=1 | x) = e^(w·x + b) / (1 + e^(w·x + b)). Sigmoid function (aka logistic, logit, S, soft-step). 4

Expanded Logistic Regression: input x (p=3: x1, x2, x3) plus a constant +1; multiply by weights w1, w2, w3 and bias b; the summing function produces z; the sigmoid function produces ŷ = P(Y=1 | x, w). z = w^T x + b (1×1 = (1×p)·(p×1)), ŷ = sigmoid(z) = e^z / (1 + e^z). 5

Neuron: inputs x1, x2, x3 and bias input +1, weights w1, w2, w3 and b1; the summing function Σ produces z. 6

Neurons 7 http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf

Neuron: inputs x1, x2, x3, weights w1, w2, w3 → Σ → z → sigmoid → ŷ. From here on, we leave out the bias for simplicity. z = w^T x (1×1 = (1×p)·(p×1)), ŷ = sigmoid(z) = e^z / (1 + e^z). 8

Block View of a Neuron: input x → [dot product with w (parameterized block)] → z → [sigmoid] → output ŷ. z = w^T x (1×1 = (1×p)·(p×1)), ŷ = sigmoid(z) = e^z / (1 + e^z). 9
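To make the block view concrete, here is a minimal NumPy sketch of a single neuron (the function names are illustrative, not from the slides): it computes z = w^T x and passes z through the sigmoid.

```python
import numpy as np

def sigmoid(z):
    # sigmoid(z) = e^z / (1 + e^z), equivalently 1 / (1 + e^-z)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w):
    # x: input vector of length p, w: weight vector of length p (bias omitted, as on the slide)
    z = np.dot(w, x)      # summing function: z = w^T x
    return sigmoid(z)     # nonlinearity: ŷ = sigmoid(z)

x = np.array([1.0, 2.0, 3.0])    # p = 3 inputs
w = np.array([0.1, -0.2, 0.3])   # one weight per input
print(neuron_forward(x, w))
```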

Neuron Representation: inputs x1, x2, x3 with weights w1, w2, w3 → Σ → nonlinearity → ŷ. Together, the linear transformation and the nonlinearity are typically considered a single neuron. 10

Neuron Representation (block view): x → [w *] → [sigmoid] → ŷ. Together, the linear transformation and the nonlinearity are typically considered a single neuron. 11

Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs in Practice 12

Diagram: network x (x1, x2, x3) → W1 → W2 → w3 → ŷ. 13

1-Layer Neural Network (with 4 neurons): input x (x1, x2, x3, p=3) → weight matrix W → vector z (one Σ per neuron) → element-wise sigmoid → output ŷ (ŷ1 … ŷ4, d=4). Linear + sigmoid = 1 layer. 14

z = W^T x (d×1 = (d×p)·(p×1)), ŷ = sigmoid(z) = e^z / (1 + e^z), applied element-wise to the vector z. 15-16

Block View of a Neural Network: input x → [dot product with W; W is now a matrix] → z (now a vector) → [sigmoid] → output ŷ. z = W^T x (d×1 = (d×p)·(p×1)), ŷ = sigmoid(z) = e^z / (1 + e^z). 17
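The only change from a single neuron is that the weight vector becomes a matrix, so all d neurons are computed at once; a rough sketch, assuming W is stored with shape p×d so that z = W^T x has shape d×1:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(x, W):
    # x: (p,) input vector, W: (p, d) weight matrix -> one column per neuron
    z = W.T @ x            # z = W^T x, shape (d,)
    return sigmoid(z)      # element-wise sigmoid, shape (d,)

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])   # p = 3
W = rng.normal(size=(3, 4))     # d = 4 neurons
print(layer_forward(x, W))      # 4 outputs ŷ1..ŷ4
```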

Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs in Practice 18

Diagram: network x (x1, x2, x3) → W1 → W2 → w3 → ŷ. 19

Multi-Layer Neural Network (Multi-Layer Perceptron (MLP) Network): the weight subscript represents the layer number. x → W1 (hidden layer) → w2 (output layer) → ŷ. A 2-layer NN. 20

Multi-Layer Neural Network (MLP): x → W1 (1st hidden layer) → W2 (2nd hidden layer) → w3 (output layer) → ŷ. A 3-layer NN. 21

Multi-Layer Neural Network (MLP): x → W1 → h1 → W2 → h2 → w3 → ŷ (hidden layer outputs h1, h2). z1 = W1^T x, h1 = sigmoid(z1); z2 = W2^T h1, h2 = sigmoid(z2); z3 = w3^T h2, ŷ = sigmoid(z3). 22

Multi-Class Output MLP: x → W1 → h1 → W2 → h2 → W3 → ŷ (vector of class outputs). z1 = W1^T x, h1 = sigmoid(z1); z2 = W2^T h1, h2 = sigmoid(z2); z3 = W3^T h2, ŷ = sigmoid(z3). 23

Block View of MLP: x → [W1 *] → z1 → h1 (1st hidden layer) → [W2 *] → z2 → h2 (2nd hidden layer) → [W3 *] → z3 → ŷ (output layer). 24
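A short sketch of the MLP forward pass that follows the slide's equations directly (the layer sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, W2, w3):
    # Follows the slide's equations: z1 = W1^T x, h1 = sigmoid(z1),
    # z2 = W2^T h1, h2 = sigmoid(z2), z3 = w3^T h2, ŷ = sigmoid(z3)
    h1 = sigmoid(W1.T @ x)
    h2 = sigmoid(W2.T @ h1)
    return sigmoid(w3 @ h2)

rng = np.random.default_rng(0)
x  = rng.normal(size=3)        # input, p = 3
W1 = rng.normal(size=(3, 4))   # 1st hidden layer: 4 units
W2 = rng.normal(size=(4, 4))   # 2nd hidden layer: 4 units
w3 = rng.normal(size=4)        # output layer: single unit
print(mlp_forward(x, W1, W2, w3))
```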

Deep Neural Networks (i.e., more than one hidden layer): researchers have successfully trained object classifiers with as many as 1000 layers. 25

Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs in Practice 26

Diagram: network x (x1, x2, x3) → W1 → W2 → w3 → ŷ, with loss E(ŷ). 27

Binary Classification Loss: ŷ = P(y=1 | X, W). (This example is for a single sample.) E = loss = -log P(Y = y | X = x) = -y log(ŷ) - (1 - y) log(1 - ŷ), where y is the true output. 28

Regression Loss: the output ŷ is a plain sum Σ (no sigmoid), and E = loss = ½ (y - ŷ)², where y is the true output. 29

Multi-Class Classification Loss: K=3 outputs ŷ1, ŷ2, ŷ3 with ŷi = P(class i | x), computed by the softmax function, a normalizing function which converts each class output to a probability. E = loss = -Σ_{j=1..K} yj ln(ŷj), where yj = 0 for all classes except the true class. 30
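For reference, the three losses above for a single sample, in NumPy; the softmax is written in the standard numerically stable way, which is an implementation detail not shown on the slide:

```python
import numpy as np

def binary_cross_entropy(y, y_hat):
    # E = -y log(ŷ) - (1 - y) log(1 - ŷ)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def squared_error(y, y_hat):
    # E = 1/2 (y - ŷ)^2
    return 0.5 * (y - y_hat) ** 2

def softmax(z):
    # converts K class scores into probabilities that sum to 1
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

def cross_entropy(y_onehot, z):
    # E = -sum_j y_j ln(ŷ_j); y is one-hot (0 for all classes except the true one)
    return -np.sum(y_onehot * np.log(softmax(z)))

print(binary_cross_entropy(1.0, 0.8))                                   # binary label
print(squared_error(2.0, 1.5))                                          # regression target
print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 1.5, -0.3])))   # K = 3 classes
```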

Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs in Practice 31

Diagram: network x (x1, x2, x3) → W1 → W2 → w3 → ŷ, with loss E(ŷ). 32

Training Neural Networks: how do we learn the optimal weights W_L for our task? Gradient descent: W_L(t+1) = W_L(t) - η ∂E/∂W_L(t). But how do we get the gradients of the lower layers? Backpropagation! Repeated application of the chain rule of calculus; locally minimizes the objective; requires all blocks of the network to be differentiable. 33 (LeCun et al., Efficient BackProp, 1998)
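The gradient-descent update on this slide, as a tiny sketch; the gradient array is a placeholder standing in for ∂E/∂W_L(t) that backpropagation would produce, and η is the learning rate:

```python
import numpy as np

eta = 0.1                                           # learning rate η
W_L = np.array([[0.5, -0.3], [0.2, 0.8]])           # current weights W_L(t)
grad_E_W_L = np.array([[0.1, 0.0], [-0.2, 0.05]])   # placeholder for ∂E/∂W_L(t) from backprop

W_L = W_L - eta * grad_E_W_L                        # W_L(t+1) = W_L(t) - η ∂E/∂W_L(t)
print(W_L)
```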

Backpropagation Intro (slides 34-45: worked example figures from http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf)

Backpropagation Intro: tells us that by increasing x by a scale of 1, we decrease f by a scale of 4. 46 http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
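The figures themselves are not transcribed, but the annotation above is consistent with the classic circuit example used in those cs231n slides; assuming that example, f(x, y, z) = (x + y)·z with x = -2, y = 5, z = -4, the chain rule gives ∂f/∂x = z = -4, i.e. increasing x by 1 decreases f by about 4:

```python
# Forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y          # q = 3
f = q * z          # f = -12

# Backward pass (chain rule)
df_dq = z          # ∂f/∂q = z = -4
df_dz = q          # ∂f/∂z = q = 3
df_dx = df_dq * 1  # ∂q/∂x = 1, so ∂f/∂x = -4
df_dy = df_dq * 1  # ∂q/∂y = 1, so ∂f/∂y = -4

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```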

Backpropagation (binary classification example): x (x1, x2, x3) → W1 → w2 → ŷ = P(y=1 | X, W). Example on a 1-hidden-layer NN for binary classification. 47

Backpropagation (binary classification example), slides 48-62: block diagram x → [W1 *] → z1 → h1 → [w2 *] → z2 → ŷ, with E = loss. Gradient descent to minimize the loss needs the gradients of E with respect to w2 and W1; these are unknown at first (= ??), so we exploit the chain rule, working backwards from the loss and reusing quantities already computed along the way.

Local-ness of Backpropagation: each block f sees only its input activations, computes its local gradients, and combines them with the gradients flowing back from above. 63-64 http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf

Example: Sigmoid Block. sigmoid(x) = e^x / (1 + e^x). 65 http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
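A minimal sketch of the sigmoid block as a forward/backward pair; the convenient property is that its local gradient can be written from the forward output alone, dσ/dx = σ(x)(1 - σ(x)):

```python
import numpy as np

def sigmoid_forward(x):
    # forward pass of the sigmoid block
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_backward(s, grad_from_above):
    # local gradient dσ/dx = s (1 - s), multiplied by the gradient flowing back
    return grad_from_above * s * (1.0 - s)

s = sigmoid_forward(2.0)
print(s, sigmoid_backward(s, 1.0))
```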

Deep Learning = Concatenation of Differentiable Parameterized Layers (linear & nonlinearity functions): x → [W1 *] → z1 → h1 (1st hidden layer) → [W2 *] → z2 → h2 (2nd hidden layer) → [W3 *] → z3 → ŷ (output layer). We want to find the optimal weights W to minimize some loss function E! 66

Backprop Whiteboard Demo: inputs x1, x2 (plus constant 1 for the biases), weights w1…w6, biases b1, b2, b3.
f1: z1 = x1·w1 + x2·w3 + b1, z2 = x1·w2 + x2·w4 + b2
f2: h1 = exp(z1) / (1 + exp(z1)), h2 = exp(z2) / (1 + exp(z2))
f3: ŷ = h1·w5 + h2·w6 + b3
f4: E = (y - ŷ)²
Update: w(t+1) = w(t) - η ∂E/∂w(t). What is ∂E/∂w(t)? 67
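A sketch of the whiteboard demo in plain Python, implementing f1-f4 exactly as written above and deriving each gradient by the chain rule; the input, target, and initial weight values are illustrative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative values for the inputs, target, and weights (not from the slides)
x1, x2, y = 1.0, 2.0, 1.0
w1, w2, w3, w4, w5, w6 = 0.1, 0.2, -0.1, 0.3, 0.5, -0.4
b1, b2, b3 = 0.0, 0.0, 0.0

# Forward pass (f1..f4 as on the slide)
z1 = x1 * w1 + x2 * w3 + b1
z2 = x1 * w2 + x2 * w4 + b2
h1, h2 = sigmoid(z1), sigmoid(z2)
y_hat = h1 * w5 + h2 * w6 + b3
E = (y - y_hat) ** 2

# Backward pass: chain rule, from the loss back to every weight
dE_dyhat = -2.0 * (y - y_hat)
dE_dw5, dE_dw6, dE_db3 = dE_dyhat * h1, dE_dyhat * h2, dE_dyhat
dE_dh1, dE_dh2 = dE_dyhat * w5, dE_dyhat * w6
dE_dz1 = dE_dh1 * h1 * (1.0 - h1)      # sigmoid local gradient
dE_dz2 = dE_dh2 * h2 * (1.0 - h2)
dE_dw1, dE_dw3, dE_db1 = dE_dz1 * x1, dE_dz1 * x2, dE_dz1
dE_dw2, dE_dw4, dE_db2 = dE_dz2 * x1, dE_dz2 * x2, dE_dz2

# One gradient-descent step on every parameter: w <- w - η ∂E/∂w
eta = 0.5
w1 -= eta * dE_dw1; w2 -= eta * dE_dw2; w3 -= eta * dE_dw3
w4 -= eta * dE_dw4; w5 -= eta * dE_dw5; w6 -= eta * dE_dw6
b1 -= eta * dE_db1; b2 -= eta * dE_db2; b3 -= eta * dE_db3
print(E, dE_dw1, dE_dw5)
```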

Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs in Practice 68

Nonlinearity Functions (i.e., transfer or activation functions): inputs x1, x2, x3 and +1, multiplied by weights w1, w2, w3 and b1; the summing function produces z; the sigmoid function produces the output. 69

Nonlinearity Functions (i.e., transfer or activation functions): comparison table of Name / Plot / Equation / Derivative (w.r.t. x); one of them usually works best in practice. 70-72 https://en.wikipedia.org/wiki/activation_function#comparison_of_activation_functions
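A sketch of three common rows of such a comparison table (sigmoid, tanh, ReLU) together with their derivatives with respect to x; in practice ReLU is the usual default:

```python
import numpy as np

def sigmoid(x):    return 1.0 / (1.0 + np.exp(-x))
def d_sigmoid(x):  s = sigmoid(x); return s * (1.0 - s)   # σ(x)(1 - σ(x))

def tanh(x):       return np.tanh(x)
def d_tanh(x):     return 1.0 - np.tanh(x) ** 2           # 1 - tanh(x)^2

def relu(x):       return np.maximum(0.0, x)
def d_relu(x):     return (x > 0).astype(float)           # 1 if x > 0 else 0

x = np.linspace(-2, 2, 5)
for name, f, df in [("sigmoid", sigmoid, d_sigmoid),
                    ("tanh", tanh, d_tanh),
                    ("ReLU", relu, d_relu)]:
    print(name, f(x), df(x))
```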

Neurons 1-Layer Neural Network Multi-layer Neural Network Loss Functions Backpropagation Nonlinearity Functions NNs in Practice 73

Neural Net Pipeline: network x → W1 → W2 → w3 → ŷ → E.
1. Initialize weights.
2. For each batch of input samples S:
   a. Run the network forward on S to compute the outputs and the loss.
   b. Run the network backward using the outputs and loss to compute the gradients.
   c. Update the weights using SGD (or a similar method).
3. Repeat step 2 until the loss converges. 74
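An end-to-end sketch of this pipeline for a 1-layer sigmoid network on toy data, trained with mini-batch SGD and the binary cross-entropy loss; all names and hyperparameters here are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: n samples, p features, binary labels
n, p = 200, 3
X = rng.normal(size=(n, p))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)

# 1. Initialize weights
w = rng.normal(scale=0.1, size=p)
eta, batch_size = 0.5, 20

# 3. Repeat step 2 until the loss (roughly) converges
for epoch in range(50):
    for start in range(0, n, batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        # 2a. Forward: outputs and loss
        y_hat = sigmoid(Xb @ w)
        loss = -np.mean(yb * np.log(y_hat + 1e-12) + (1 - yb) * np.log(1 - y_hat + 1e-12))
        # 2b. Backward: gradient of the cross-entropy loss w.r.t. w
        grad = Xb.T @ (y_hat - yb) / len(yb)
        # 2c. SGD update
        w -= eta * grad
print("final loss:", loss)
```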

Non-Convexity of Neural Nets: in very high dimensions, there exist many local minima which are about equally good. (Pascanu et al., On the saddle point problem for non-convex optimization, 2014) 75

Building Deep Neural Nets: compose differentiable blocks f(x) → y. 76 http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf

Building Deep Neural Nets GoogLeNet for Object Classification 77

Block Example Implementation 78 http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf

Advantage of Neural Nets: as long as the network is fully differentiable, we can train the model to automatically learn features for us. 79