Deep Feedforward Networks


Deep Feedforward Networks. Liu Yang. March 30, 2017.

Overview
1 Background: A general introduction; Example
2 Gradient-based learning: Cost functions; Output Units
3 Hidden Units
4 Architecture Design
5 Back-Propagation and other differentiation algorithms
6 Regularization in deep learning

Background: A general introduction

A deep feedforward network is also called a feedforward neural network or MLP (multilayer perceptron). The goal of a feedforward network is to approximate (learn) some target function y = f*(x). It is called feedforward because information flows forward through the network, with no feedback connections from the outputs back into the model. A defining property of a feedforward network is that it composes many different functions in a chain, for example f(x) = f^(3)(f^(2)(f^(1)(x))); a minimal sketch of this composition follows below.
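As a concrete illustration of the chain structure, here is a small sketch in NumPy; the layer sizes, weights, and activation choices are arbitrary placeholders, not values from the slides:

```python
import numpy as np

def f1(x):  # first layer: affine map followed by ReLU (placeholder weights)
    W1, b1 = np.ones((2, 3)), np.zeros(3)
    return np.maximum(0, x @ W1 + b1)

def f2(h):  # second layer: same structure
    W2, b2 = np.ones((3, 3)), np.zeros(3)
    return np.maximum(0, h @ W2 + b2)

def f3(h):  # output layer: plain affine map
    w3, b3 = np.ones(3), 0.0
    return h @ w3 + b3

x = np.array([1.0, 2.0])
y = f3(f2(f1(x)))   # the composition f3(f2(f1(x))) that gives the network its depth
```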

Background: A general introduction. Figure: an example network.

Background: Example. Learning XOR

The XOR function operates on two binary values: it returns 1 when exactly one of the two inputs equals 1, and 0 otherwise. The input set is X = {(0, 0), (0, 1), (1, 1), (1, 0)}. Suppose we would like to fit a model y = f(x; θ) to learn the target function f*. Evaluated over the four points, the loss function is

J(θ) = (1/4) Σ_{x ∈ X} (f*(x) − f(x; θ))²   (1)

A small sketch of this setup follows below.
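The helper name mse_loss and its arguments below are illustrative, not from the slides; the snippet just spells out the four XOR targets and the averaged squared error of Eq. (1):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # the four input points
y = np.array([0, 1, 0, 1])                        # XOR(x1, x2) for each point

def mse_loss(f, theta):
    """Average of (f*(x) - f(x; theta))^2 over the four points, as in Eq. (1)."""
    preds = np.array([f(x, theta) for x in X])
    return np.mean((y - preds) ** 2)

# e.g. a constant predictor outputting theta everywhere:
print(mse_loss(lambda x, theta: theta, 0.5))      # 0.25
```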

Background: Example. Learning XOR

A linear model can be used as a first try: f(x; w, b) = x^T w + b = x_1 w_1 + x_2 w_2 + b. Minimizing the loss gives the solution w = 0, b = 0.5, which outputs 0.5 everywhere (see the check below). Why does the linear model fail?
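As a quick check, a least-squares fit of this linear model on the four XOR points reproduces that degenerate solution; this is only a sketch, using np.linalg.lstsq as one convenient solver:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
y = np.array([0, 1, 0, 1])

A = np.hstack([X, np.ones((4, 1))])            # columns: x1, x2, bias
theta, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||A theta - y||^2
print(theta)                                   # approx [0, 0, 0.5]: w = 0, b = 0.5
```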

Background: Example. Linear approach for XOR

The major limitation of a single-layer perceptron is that the two classes must be linearly separable. In the XOR example, no single line can separate the two classes, although two lines can. A multilayer perceptron can therefore provide a solution.

Background: Example. Basic components

A deep feedforward network involves several basic design choices:
- Cost function
- Output units
- Hidden units
- Architecture design
- Back-propagation algorithm

Gradient-based learning: Cost functions

Learning conditional distributions with maximum likelihood: the cost is the negative log-likelihood,

J(θ) = −E_{x,y ∼ p̂_data} log p_model(y | x)   (2)

Learning conditional statistics (e.g. the conditional mean):

f* = argmin_f E_{x,y ∼ p̂_data} ||y − f(x)||²   (3)

A minimal sketch of the cost in Eq. (2) follows below.
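This sketch assumes the model produces a matrix of class probabilities; the function name and array layout are illustrative, not from the slides:

```python
import numpy as np

def nll(probs, labels):
    """Negative log-likelihood as in Eq. (2).

    probs:  (n, k) array of predicted p_model(y | x) for each example
    labels: (n,) array of integer class labels
    """
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
print(nll(p, np.array([0, 1])))   # -(log 0.7 + log 0.8) / 2, approx 0.290
```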

Gradient-based learning: Output units

- Linear units for Gaussian output distributions: ŷ = W^T h + b
- Sigmoid units for Bernoulli output distributions: ŷ = σ(w^T h + b)
- Softmax units for multinoulli output distributions:

  softmax(z)_i = exp(z_i) / Σ_j exp(z_j)   (4)

  where z_i = log p(y = i | x) (unnormalized log probabilities)
- Other output types: mixture units
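A minimal sketch of the softmax in Eq. (4); subtracting the maximum before exponentiating does not change the result but avoids overflow, a common implementation detail rather than something stated on the slide:

```python
import numpy as np

def softmax(z):
    z = z - np.max(z)          # shift for numerical stability; result is unchanged
    e = np.exp(z)
    return e / np.sum(e)       # exp(z_i) / sum_j exp(z_j)

print(softmax(np.array([2.0, 1.0, 0.1])))   # approx [0.659, 0.242, 0.099]
```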

Hidden units

An activation function is used to compute the hidden layer values. How do we choose the type of hidden unit to use in the hidden layers?
- Rectified linear units (ReLU): g(z) = max{0, z}, used on top of an affine transformation, h = g(W^T x + b)
- Noisy ReLU: g(z) = max(0, z + Y), Y ∼ N(0, σ(z))
- Absolute value rectification: g(z) = |z|
- Leaky ReLU: g(z, α) = max(0, z) + α min(0, z), with α = 0.01
- Parametric ReLU: treat α as a learnable parameter
- Logistic sigmoid and hyperbolic tangent: g(z) = σ(z) or g(z) = tanh(z)
- Other hidden units: RBF, softplus, and hard tanh
A short sketch of several of these activations follows below.
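A sketch of the deterministic activations listed above and of the hidden-layer computation h = g(W^T x + b); the helper names are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def abs_rectify(z):
    return np.abs(z)

def leaky_relu(z, alpha=0.01):           # alpha = 0.01 as on the slide
    return np.maximum(0, z) + alpha * np.minimum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden(x, W, b, g=relu):
    """Hidden layer values h = g(W^T x + b) for a chosen activation g."""
    return g(W.T @ x + b)
```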

Architecture design

Architecture refers to the overall structure of the network: how many units it should have and how these units should be connected to each other. Topics include the universal approximation properties and depth, and other architectural considerations.

Back-Propagation and other differentiation algorithms

Back-propagation allows information from the cost to flow backward through the network in order to compute the gradient. It uses the chain rule to iteratively compute gradients layer by layer, and it requires the activation functions to be differentiable.

Back-Propagation and other differentiation algorithms

Suppose we have a loss function E and a three-layer network y = f(h(x)). Our goal is to minimize the loss function and obtain a solution for the weights w^(1) from the input layer to the hidden layer and the weights w^(2) from the hidden layer to the output unit. The loss is E = (1/2) ||o − t||², where o is the output and t is the target value.

Back-Propagation and other differentiation algorithms

Back-propagated error for the output unit: o_j^(2) is the value of the j-th output, t_j is the target value, and w_ij^(2) is the weight from the i-th hidden unit to the j-th output unit. In the slide's diagram, the right part of each circle is the target function and the left part is the gradient of that function.

Back-Propagation and other differentiation algorithms

Back-propagated error for the hidden layer: o_j^(1) is the value of the j-th hidden unit, t_j is the target value, w_jq^(2) is the weight from the j-th hidden unit to the q-th output unit, and δ_q^(2) is the back-propagated error for the output unit.

Back-Propagation and other differentiation algorithms

The back-propagation algorithm can be divided into two phases:

Phase 1: Propagation
- Forward propagation of a training pattern's input through the neural network in order to generate the network's output value(s).
- Backward propagation of the output activations through the neural network, using the training pattern's target, in order to generate the deltas (the differences between the targeted and actual output values) of all output and hidden neurons.

Phase 2: Weight update
- The weight's output delta and input activation are multiplied to find the gradient of the weight.
- A ratio (percentage, i.e. the learning rate) of the weight's gradient is subtracted from the weight.

A minimal sketch of the weight update follows below.
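This sketch shows Phase 2 for a single weight matrix; the function name and learning rate are illustrative:

```python
import numpy as np

def update_weights(W, input_activation, output_delta, lr=0.1):
    """Phase 2: gradient = outer(input activation, output delta); subtract lr * gradient."""
    grad = np.outer(input_activation, output_delta)   # dE/dW for this layer
    return W - lr * grad
```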

Back-Propagation and other differentiation algorithms

How back-propagation works in a three-layer network. Figure: pseudocode for a stochastic gradient algorithm.
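The slide's pseudocode figure is not reproduced here. The following is a hedged sketch of one such stochastic gradient loop for a three-layer network, assuming sigmoid hidden and output units and the squared error E = (1/2)||o − t||²; all names, initializations, and hyperparameters are illustrative choices, not the slide's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_backprop(X, T, n_hidden=2, lr=0.5, epochs=5000, seed=0):
    """X: (n, d) inputs; T: (n, k) targets. Returns learned weights and biases."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))   # input -> hidden
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, T.shape[1]))   # hidden -> output
    b2 = np.zeros(T.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, T):                       # one training pattern at a time
            h = sigmoid(x @ W1 + b1)                 # forward propagation
            o = sigmoid(h @ W2 + b2)
            delta2 = (o - t) * o * (1 - o)           # BP error at the output
            delta1 = (delta2 @ W2.T) * h * (1 - h)   # BP error at the hidden layer
            W2 -= lr * np.outer(h, delta2); b2 -= lr * delta2
            W1 -= lr * np.outer(x, delta1); b1 -= lr * delta1
    return W1, b1, W2, b2
```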

Back-Propagation and other differentiation algorithms: Example. Learning XOR

Since the linear approach fails, we can instead change the representation of the input space. Figure: left, the original x space; right, the learned h space. In the learned h space a linear model works: a single line separates the two classes.

Back-Propagation and other differentiation algorithms: Example. Learning XOR

How can we apply a nonlinear transformation to obtain the h space? Use a neural network with f^(1)(x) = W^T x and f^(2)(h) = h^T w, where the hidden layer is defined as h = g(W^T x + c) and the activation function g is the rectified linear unit (ReLU): g(z) = max{0, z}.

Back-Propagation and other differentiation algorithms: Example. Learning XOR

The complete network is

f(x; W, c, w, b) = f^(2)(f^(1)(x)) = w^T max{0, W^T x + c} + b   (5)

Now walk through how the model processes a batch of inputs:
- Build the design matrix X for the four points
- First step: compute XW
- Add c
- Compute h
- Multiply by w
A sketch of this walkthrough appears below.
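The sketch below follows the walkthrough step by step using one known exact solution of Eq. (5), the weight values used in the Deep Learning book's XOR example (Goodfellow et al.):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])   # design matrix for the four points
W = np.array([[1, 1], [1, 1]])
c = np.array([0, -1])
w = np.array([1, -2])
b = 0

XW = X @ W                   # first step: XW
A  = XW + c                  # adding c
h  = np.maximum(0, A)        # compute h = max{0, XW + c}
y  = h @ w + b               # multiply by w (and add b)
print(y)                     # -> [0 1 0 1], the XOR values for the four points
```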

Regularization in deep learning

Regularization is widely used in machine learning methods. The goal is to reduce the generalization error but not the training error.

Regularization in deep learning

- Parameter norm penalties: L2 parameter regularization; L1 regularization
- Norm penalties as constrained optimization
- Regularization and under-constrained problems
- Dataset augmentation
- Noise robustness: injecting noise at the output targets
- Semi-supervised learning
- Multitask learning

References

Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2017). Deep Learning. MIT Press.
R. Rojas (1996). Neural Networks. Springer-Verlag.