Learning Deep Architectures for AI. Part I - Vijay Chakilam

Learning Deep Architectures for AI - Yoshua Bengio Part I - Vijay Chakilam

Chapter 0: Preliminaries

Neural Network Models The basic idea behind the neural network approach is to model the response as a nonlinear function of various linear combinations of the predictors. (Diagram: a network of input units, hidden units, and output units.)

Types of Neurons Linear, binary threshold, rectified linear, and sigmoid.
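For reference, a minimal NumPy sketch of these four neuron types (the function names and test values below are illustrative, not from the slides):

```python
import numpy as np

def linear(z):            # linear neuron: output equals the weighted input
    return z

def binary_threshold(z):  # binary threshold neuron: 1 if the input is non-negative, else 0
    return (z >= 0).astype(float)

def rectified_linear(z):  # rectified linear unit (ReLU): max(0, z)
    return np.maximum(0.0, z)

def sigmoid(z):           # logistic sigmoid: smooth, bounded in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (linear, binary_threshold, rectified_linear, sigmoid):
    print(f.__name__, f(z))
```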

Perceptron One of the earliest algorithms for linear classification (Rosenblatt, 1958). Based on finding a separating hyperplane of the data.

Perceptron Architecture Manually engineer features; mostly based on common sense and hand-written programs. Learn how to weight each of the features to get a single scalar quantity. If this quantity is above some threshold, decide that the input vector is a positive example of the target class.

Perceptron Algorithm Add an extra component with value 1 to each input vector. The bias weight on this component is minus the threshold. Now we can have the threshold set to zero. Pick training cases using any policy that ensures that every training case will keep getting picked. If the output unit is correct, leave its weights alone. If the output unit incorrectly outputs a negative class, add the input vector to the weight vector. If the output unit incorrectly outputs a positive class, subtract the input vector from the weight vector. This is guaranteed to find a set of weights that gets the right answer for all the training cases if any such set exists.

Perceptron Updates w_new = w_old + x (applied when the output is incorrectly negative)

Perceptron Updates w_new = w_old − x (applied when the output is incorrectly positive)
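Putting the algorithm and the two updates together, a minimal NumPy sketch, assuming ±1 labels and the constant-1 bias component described above; the toy data set is an illustrative placeholder:

```python
import numpy as np

def perceptron(X, y, epochs=100):
    """Perceptron learning: X has a leading constant-1 column, y is +1/-1."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                    # keep cycling through all training cases
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:           # output unit is wrong (or on the boundary)
                w = w + y_i * x_i              # adds x for a missed positive, subtracts x for a missed negative
                mistakes += 1
        if mistakes == 0:                      # all training cases correct: stop
            break
    return w

# Toy linearly separable data (illustrative): class +1 roughly when x1 + x2 > 1.
X = np.array([[1, 0.0, 0.0], [1, 1.0, 1.0], [1, 0.9, 0.8], [1, 0.1, 0.2]])
y = np.array([-1, 1, 1, -1])
print(perceptron(X, y))
```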

Perceptron Convergence Theorem (Block, 1962, and Novikoff, 1962). Let a sequence of examples (x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(m), y^(m)) be given. Suppose that ‖x^(i)‖ ≤ D for all i, and further that there exists a unit-length vector u (‖u‖_2 = 1) such that y^(i) (u^T x^(i)) ≥ γ for all examples in the sequence. Then the total number of mistakes that the algorithm makes on this sequence is at most (D/γ)^2.
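For reference, a brief sketch of the standard argument behind this bound (following the CS229 notes cited in the references), with w^(k) the weight vector before the k-th mistake and w^(1) = 0:

```latex
\begin{align*}
&\text{On the $k$-th mistake, made on example } (x^{(i)}, y^{(i)}):\quad
  w^{(k+1)} = w^{(k)} + y^{(i)} x^{(i)}.\\[4pt]
&\text{Margin growth:}\quad
  u^{\top} w^{(k+1)} = u^{\top} w^{(k)} + y^{(i)}\, u^{\top} x^{(i)}
  \;\ge\; u^{\top} w^{(k)} + \gamma
  \;\;\Rightarrow\;\; u^{\top} w^{(k+1)} \ge k\gamma.\\[4pt]
&\text{Norm growth:}\quad
  \|w^{(k+1)}\|^{2} = \|w^{(k)}\|^{2} + 2\, y^{(i)}\, w^{(k)\top} x^{(i)} + \|x^{(i)}\|^{2}
  \;\le\; \|w^{(k)}\|^{2} + D^{2}
  \;\;\Rightarrow\;\; \|w^{(k+1)}\|^{2} \le k D^{2},\\
&\text{since } y^{(i)}\, w^{(k)\top} x^{(i)} \le 0 \text{ whenever a mistake is made.}\\[4pt]
&\text{Combining (using } \|u\|_2 = 1\text{):}\quad
  k\gamma \;\le\; u^{\top} w^{(k+1)} \;\le\; \|w^{(k+1)}\| \;\le\; \sqrt{k}\,D
  \;\;\Rightarrow\;\; k \le (D/\gamma)^{2}.
\end{align*}
```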

Chapter 1: Deep Architectures

Limitations of Perceptron There is no value of W and b such that the model produces the right target for every example of the XOR problem. (Figure: a graphical view of XOR; the inputs (0,0) and (1,1) require output 0 while (0,1) and (1,0) require output 1, and no single weight plane separates the two classes.)

Learning representations Adding a hidden layer with a nonlinearity φ solves XOR: h = φ(W^T x + b), y = V^T h + c, with W = [[1, 1], [1, 1]], b = (0, −1)^T, V = (1, −2)^T and c = 0. (Figure: a network with inputs x1, x2, pre-activations u1, u2, hidden units h1, h2, and output y; in the hidden representation the four inputs map to the points (0,0), (1,0) and (2,1), which are linearly separable.)
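A short NumPy check of this solution, assuming φ is the rectified linear unit (as in the standard rectifier construction for XOR; the minus signs in b and V are reconstructed under that assumption):

```python
import numpy as np

W = np.array([[1.0, 1.0],
              [1.0, 1.0]])       # first-layer weights
b = np.array([0.0, -1.0])        # first-layer biases
V = np.array([1.0, -2.0])        # output weights
c = 0.0                          # output bias

def net(x, phi=lambda u: np.maximum(0.0, u)):   # phi assumed to be ReLU
    h = phi(W.T @ x + b)                        # hidden representation
    return V @ h + c

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, net(np.array(x, dtype=float)))     # prints the XOR truth table: 0, 1, 1, 0
```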

Universal Approximation Theorem: A neural network with one hidden layer can approximate any continuous function. More formally, given a continuous function f : C_n → R^m, where C_n is a compact subset of R^n, and given ε > 0, there exists f_NN(x) = Σ_{i=1}^{K} V_i φ(w_i^T x + b_i) + c such that for all x ∈ C_n, |f(x) − f_NN(x)| < ε.
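As a numerical illustration of the form of f_NN (not the constructive proof), a sketch that fixes K random sigmoid hidden units and fits only the output weights V and bias c by least squares; the target function, grid, and K are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 50                                   # number of hidden units (illustrative)
x = np.linspace(-3, 3, 200)              # grid over a compact subset of R
f = np.sin(2 * x)                        # target continuous function (illustrative)

w = rng.normal(scale=3.0, size=K)        # fixed random hidden weights w_i
b = rng.uniform(-3, 3, size=K)           # fixed random hidden biases b_i
H = 1.0 / (1.0 + np.exp(-(x[:, None] * w + b)))   # sigmoid features phi(w_i x + b_i)

# Fit the output weights V and bias c by linear least squares.
A = np.hstack([H, np.ones((x.size, 1))])
coef, *_ = np.linalg.lstsq(A, f, rcond=None)
f_nn = A @ coef                          # f_NN(x) = sum_i V_i phi(w_i x + b_i) + c

print("max |f - f_NN| on the grid:", np.max(np.abs(f - f_nn)))
```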

Universal Approximation Cybenko (1989) and Hornik et al. (1989) proved universal approximation properties for networks with squashing activation functions such as the logistic sigmoid. Leshno et al. (1993) proved universal approximation properties for rectified linear unit activation functions.

Deep vs Shallow Pascanu et al. (2013) compared deep rectifier networks with their shallow counterparts. For a deep model with n_0 inputs and k hidden layers of width n, the maximal number of response regions per parameter behaves as Ω(⌊n/n_0⌋^(k−1) n^(n_0−2)). For a shallow model with n_0 inputs and nk hidden units, the maximal number of response regions per parameter behaves as O(k^(n_0−1) n^(n_0−1)).

Deep vs Shallow Pascanu et al. (2013) showed that the deep model can generate exponentially more regions per parameter in terms of the number of hidden layers k, and at least polynomially more regions per parameter in terms of the layer width n. Montufar et al. (2014) derived a significantly improved lower bound on the maximal number of linear regions.

Chapter 2: Feed-forward Neural Networks

Feed-forward neural networks These are the commonest type of neural network in practical applications. The first layer is the input and the last layer is the output. If there is more than one hidden layer, we call them deep neural networks. They compute a series of transformations that change the similarities between cases. The activities of the neurons in each layer are a non-linear function of the activities in the layer below. (Diagram: a stack of input units, hidden units, and output units.)
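A minimal sketch of such a stack of transformations with logistic hidden units; the layer sizes and random weights are illustrative placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """Each layer is a (W, b) pair; each layer's activities are a
    non-linear function of the activities in the layer below."""
    activities = [x]
    for W, b in layers:
        activities.append(sigmoid(W @ activities[-1] + b))
    return activities

rng = np.random.default_rng(0)
sizes = [3, 4, 4, 2]        # input, two hidden layers, output (illustrative)
layers = [(rng.normal(size=(m, n)), np.zeros(m)) for n, m in zip(sizes[:-1], sizes[1:])]
print(forward(rng.normal(size=3), layers)[-1])   # output-layer activities
```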

Backpropagation First convert the discrepancy between each output and its target value into an error derivative: E = 1/2 Σ_{j ∈ output} (t_j − y_j)^2, so ∂E/∂y_j = −(t_j − y_j). Then compute the error derivatives ∂E/∂y_i in each hidden layer from the error derivatives in the layer above. Then use the error derivatives w.r.t. activities to get the error derivatives w.r.t. the incoming weights.

Backpropagation For a logistic unit j with total input z_j, output y_j, and incoming activity y_i from unit i in the layer below:
∂E/∂y_j = −(t_j − y_j)
∂E/∂z_j = (dy_j/dz_j) ∂E/∂y_j = y_j (1 − y_j) ∂E/∂y_j
∂E/∂w_ij = (∂z_j/∂w_ij) ∂E/∂z_j = y_i ∂E/∂z_j
∂E/∂y_i = Σ_j (dz_j/dy_i) ∂E/∂z_j = Σ_j w_ij ∂E/∂z_j
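A minimal NumPy sketch implementing exactly these equations for a single layer of logistic units with squared error; the activities, weights, and targets are illustrative placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
y_below = rng.random(4)          # activities y_i of the layer below (illustrative)
W = rng.normal(size=(4, 3))      # weights w_ij from unit i to unit j
t = np.array([0.0, 1.0, 0.5])    # targets t_j (illustrative)

# Forward pass: z_j = sum_i w_ij y_i,  y_j = sigmoid(z_j)
z = y_below @ W
y = sigmoid(z)
E = 0.5 * np.sum((t - y) ** 2)

# Backward pass, following the slide's equations:
dE_dy = -(t - y)                         # dE/dy_j = -(t_j - y_j)
dE_dz = y * (1 - y) * dE_dy              # dE/dz_j = y_j (1 - y_j) dE/dy_j
dE_dW = np.outer(y_below, dE_dz)         # dE/dw_ij = y_i dE/dz_j
dE_dy_below = W @ dE_dz                  # dE/dy_i = sum_j w_ij dE/dz_j

print(E, dE_dW.shape, dE_dy_below)
```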

References
Yoshua Bengio, Learning Deep Architectures for AI: http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf
Geoff Hinton, Neural Networks for Machine Learning: https://www.coursera.org/learn/neural-networks
Goodfellow et al., Deep Learning: http://www.deeplearningbook.org/
Montufar et al., On the Number of Linear Regions of Deep Neural Networks: https://arxiv.org/pdf/1402.1869.pdf

References
Pascanu et al., On the Number of Response Regions of Deep Feedforward Neural Networks with Piecewise Linear Activations: https://arxiv.org/pdf/1312.6098.pdf
Perceptron weight-update graphics: https://www.cs.utah.edu/~piyush/teaching/8-9-print.pdf
Proof of the perceptron convergence theorem (Block, 1962; Novikoff, 1962): http://cs229.stanford.edu/notes/cs229-notes6.pdf