Neural Networks: Introduction


Neural Networks: Introduction Machine Learning Fall 2017 Based on slides and material from Geoffrey Hinton, Richard Socher, Dan Roth, Yoav Goldberg, Shai Shalev-Shwartz and Shai Ben-David, and others 1

Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Support Vector Machines Naïve Bayes Logistic Regression 2

Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Support Vector Machines Naïve Bayes Logistic Regression General learning principles Overfitting Mistake-bound learning PAC learning, sample complexity Hypothesis choice & VC dimensions Training and generalization errors Regularized Empirical Loss Minimization Bayesian Learning 3

Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Support Vector Machines Naïve Bayes Logistic Regression Produce linear classifiers General learning principles Overfitting Mistake-bound learning PAC learning, sample complexity Hypothesis choice & VC dimensions Training and generalization errors Regularized Empirical Loss Minimization Bayesian Learning 4

Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Support Vector Machines Naïve Bayes Logistic Regression Produce linear classifiers General learning principles Overfitting Mistake-bound learning PAC learning, sample complexity Hypothesis choice & VC dimensions Training and generalization errors Regularized Empirical Loss Minimization Bayesian Learning Not really resolved How do we train non-linear classifiers? Where do the features come from? 5

Neural Networks What is a neural network? Predicting with a neural network Training neural networks Practical concerns 6

This lecture What is a neural network? The hypothesis class Structure, expressiveness Predicting with a neural network Training neural networks Practical concerns 7

We have seen linear threshold units (figure: features → dot product → threshold) 8

We have seen linear threshold units Prediction: sgn(wᵀx + b) = sgn(Σᵢ wᵢxᵢ + b) (figure: features → dot product → threshold) 9

We have seen linear threshold units Prediction: sgn(wᵀx + b) = sgn(Σᵢ wᵢxᵢ + b) Learning: various algorithms (perceptron, SVM, logistic regression); in general, minimize a loss (figure: features → dot product → threshold) 10

We have seen linear threshold units Prediction: sgn(wᵀx + b) = sgn(Σᵢ wᵢxᵢ + b) Learning: various algorithms (perceptron, SVM, logistic regression); in general, minimize a loss But where do these input features come from? What if the features were outputs of another classifier? 11
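As a concrete illustration (not part of the original slides), a minimal linear threshold unit in Python; the particular weights, bias, and sign convention below are illustrative assumptions.

# A minimal sketch of a linear threshold unit: sgn(w^T x + b)
import numpy as np

def predict_ltu(w, b, x):
    """Return +1 or -1 from a linear threshold unit."""
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([2.0, -1.0])   # illustrative weights
b = 0.5                     # illustrative bias
print(predict_ltu(w, b, np.array([1.0, 1.0])))    # +1, since 2 - 1 + 0.5 > 0
print(predict_ltu(w, b, np.array([-1.0, 1.0])))   # -1, since -2 - 1 + 0.5 < 0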

Features from classifiers 12

Features from classifiers 13

Features from classifiers Each of these connections has its own weight as well 14

Features from classifiers 15

Features from classifiers This is a two-layer feed-forward neural network 16

Features from classifiers This is a two-layer feed-forward neural network The input layer The hidden layer The output layer Think of the hidden layer as learning a good representation of the inputs 17

Features from classifiers This is a two-layer feed-forward neural network The dot product followed by the threshold constitutes a neuron 18

Features from classifiers This is a two-layer feed-forward neural network The dot product followed by the threshold constitutes a neuron Five neurons in this picture (four in the hidden layer and one output) 19

But where do the inputs come from? The input layer What if the inputs were the outputs of a classifier? We can make a three-layer network. And so on. 20
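To make the stacking idea concrete, here is a small sketch (not from the slides, all weights chosen arbitrarily for illustration): four hidden threshold units play the role of learned feature classifiers, and their outputs feed a fifth threshold unit, matching the five-neuron picture above.

# Sketch: the inputs to the final threshold unit are themselves outputs of
# other threshold units, i.e., a two-layer feed-forward network.
import numpy as np

def ltu(w, b, x):
    return 1 if np.dot(w, x) + b >= 0 else -1

# Four hidden "feature classifiers", each with its own weights and bias (illustrative)
W_hidden = np.array([[ 1.0,  2.0],
                     [-1.0,  0.5],
                     [ 0.3, -0.7],
                     [ 2.0,  1.0]])
b_hidden = np.array([0.1, -0.2, 0.0, 0.5])

# One output unit over the four hidden features
w_out, b_out = np.array([1.0, -1.0, 0.5, 1.0]), -0.5

def two_layer_predict(x):
    hidden = np.array([ltu(W_hidden[k], b_hidden[k], x) for k in range(4)])
    return ltu(w_out, b_out, hidden)

print(two_layer_predict(np.array([0.5, -1.0])))   # +1 or -1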

Let us try to formalize this 21

Neural networks A robust approach for approximating real-valued, discrete-valued or vector-valued functions Among the most effective general-purpose supervised learning methods currently known Especially for complex and hard-to-interpret data such as real-world sensory data The Backpropagation algorithm for neural networks has been shown to be successful in many practical problems: handwritten character recognition, speech recognition, object recognition, some NLP problems 22

Biological neurons Neurons: core components of the brain and the nervous system, consisting of 1. Dendrites that collect information from other neurons 2. An axon that generates outgoing spikes The first drawing of brain cells, by Santiago Ramón y Cajal in 1899 23

Biological neurons Neurons: core components of the brain and the nervous system, consisting of 1. Dendrites that collect information from other neurons 2. An axon that generates outgoing spikes Modern artificial neurons are inspired by biological neurons But there are many, many fundamental differences Don't take the similarity too seriously (nor the claims in the news about the emergence of intelligent behavior) The first drawing of brain cells, by Santiago Ramón y Cajal in 1899 24

Artificial neurons Functions that very loosely mimic a biological neuron A neuron accepts a collection of inputs (a vector x) and produces an output by: Applying a dot product with weights w and adding a bias b Applying a (possibly non-linear) transformation called an activation output = activation(wᵀx + b) 25

Artificial neurons Functions that very loosely mimic a biological neuron A neuron accepts a collection of inputs (a vector x) and produces an output by: Applying a dot product with weights w and adding a bias b Applying a (possibly non-linear) transformation called an activation output = activation(wᵀx + b) (figure: dot product → threshold activation) 26

Artificial neurons Functions that very loosely mimic a biological neuron A neuron accepts a collection of inputs (a vector x) and produces an output by: Applying a dot product with weights w and adding a bias b Applying a (possibly non-linear) transformation called an activation output = activation(wᵀx + b) (figure: dot product → threshold activation) Other activations are possible 27

Activation functions Also called transfer functions output = activation(wᵀx + b)

Name of the neuron              Activation function activation(z)
Linear unit                     z
Threshold/sign unit             sign(z)
Sigmoid unit                    1 / (1 + exp(-z))
Rectified linear unit (ReLU)    max(0, z)
Tanh unit                       tanh(z)

Many more activation functions exist (sinusoid, sinc, gaussian, polynomial, ...) 28
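For reference, the activations in the table written out in Python/numpy; the function names are my own choices, the formulas follow the table.

import numpy as np

def linear(z):    return z
def sign_act(z):  return np.sign(z)
def sigmoid(z):   return 1.0 / (1.0 + np.exp(-z))
def relu(z):      return np.maximum(0.0, z)
def tanh_act(z):  return np.tanh(z)

z = np.array([-2.0, 0.0, 2.0])
for f in (linear, sign_act, sigmoid, relu, tanh_act):
    print(f.__name__, f(z))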

A neural network A function that converts inputs to outputs defined by a directed acyclic graph Nodes organized in layers, correspond to neurons Edges carry output of one neuron to another, associated with weights To define a neural network, we need to specify: The structure of the graph How many nodes, the connectivity The activation function on each node The edge weights 29

A neural network A function that converts inputs to outputs defined by a directed acyclic graph Nodes organized in layers, correspond to neurons Edges carry output of one neuron to another, associated with weights (figure: Input → Hidden → Output layers, edges labeled with weights w) To define a neural network, we need to specify: The structure of the graph (how many nodes, the connectivity) The activation function on each node The edge weights 30

A neural network A function that converts inputs to outputs defined by a directed acyclic graph Nodes organized in layers, correspond to neurons Edges carry output of one neuron to another, associated with weights (figure: Input → Hidden → Output layers, edges labeled with weights w) To define a neural network, we need to specify: The structure of the graph (how many nodes, the connectivity) The activation function on each node The edge weights 31

A neural network A function that converts inputs to outputs defined by a directed acyclic graph Nodes organized in layers, correspond to neurons Edges carry output of one neuron to another, associated with weights To define a neural network, we need to specify: The structure of the graph (how many nodes, the connectivity) and the activation function on each node (called the architecture of the network; typically predefined, part of the design of the classifier) The edge weights (figure: Input → Hidden → Output layers, edges labeled with weights w) 32

A neural network A function that converts inputs to outputs defined by a directed acyclic graph Nodes organized in layers, correspond to neurons Edges carry output of one neuron to another, associated with weights To define a neural network, we need to specify: The structure of the graph (how many nodes, the connectivity) and the activation function on each node (called the architecture of the network; typically predefined, part of the design of the classifier) The edge weights (learned from data) (figure: Input → Hidden → Output layers, edges labeled with weights w) 33
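A minimal sketch (not from the slides) of the architecture-versus-weights distinction, assuming a fully connected feed-forward network with sigmoid activations: the layer sizes and activation are the predefined architecture, while the randomly initialized matrices stand in for weights that would normally be learned from data.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, activation=sigmoid):
    """Feed x through a fully connected feed-forward network, layer by layer."""
    h = x
    for W, b in zip(weights, biases):
        h = activation(W @ h + b)
    return h

rng = np.random.default_rng(0)
# Architecture (predefined): 3 inputs -> 4 hidden units -> 1 output
weights = [rng.normal(size=(4, 3)), rng.normal(size=(1, 4))]   # learned in practice
biases  = [rng.normal(size=4),      rng.normal(size=1)]
print(forward(np.array([0.5, -1.0, 2.0]), weights, biases))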

A brief history of neural networks 1943: McCulloch and Pitts showed how linear threshold units can compute logical functions 1949: Hebb suggested a learning rule that has some physiological plausibility 1950s: Rosenblatt, the Perceptron algorithm for a single threshold neuron 1969: Minsky and Papert studied the neuron from a geometrical perspective 1980s: Convolutional neural networks (Fukushima, LeCun), the backpropagation algorithm (various) 2003-today: More compute, more data, deeper networks See also: http://people.idsia.ch/~juergen/deep-learning-overview.html 34

What functions do neural networks express? 35

A single neuron with threshold activation Decision boundary: b + w₁x₁ + w₂x₂ = 0 Prediction = sgn(b + w₁x₁ + w₂x₂) (figure: points labeled + on one side of the line, - on the other) 36

Two layers, with threshold activations In general, convex polygons Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014 37

Three layers with threshold activations In general, unions of convex polygons Figure from Shai Shalev-Shwartz and Shai Ben-David, 2014 38

Neural networks are universal function approximators Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989] Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993] Two-layer threshold networks can express any Boolean function Exercise: Prove this VC dimension of threshold networks with edges E: VC = O(|E| log |E|) VC dimension of sigmoid networks with nodes V and edges E: Upper bound: O(|V|² |E|²) Lower bound: Ω(|E|²) Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness 39

Neural networks are universal function approximators Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989] Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993] Two-layer threshold networks can express any Boolean function Exercise: Prove this VC dimension of threshold networks with edges E: VC = O(|E| log |E|) VC dimension of sigmoid networks with nodes V and edges E: Upper bound: O(|V|² |E|²) Lower bound: Ω(|E|²) 40

Neural networks are universal function approximators Any continuous function can be approximated to arbitrary accuracy using one hidden layer of sigmoid units [Cybenko 1989] Approximation error is insensitive to the choice of activation functions [DasGupta et al 1993] Two-layer threshold networks can express any Boolean function Exercise: Prove this VC dimension of threshold networks with edges E: VC = O(|E| log |E|) VC dimension of sigmoid networks with nodes V and edges E: Upper bound: O(|V|² |E|²) Lower bound: Ω(|E|²) Exercise: Show that if we have only linear units, then multiple layers do not change the expressiveness 41
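As a quick check of the Boolean-expressiveness claim (related to the exercise above), here is a two-layer threshold network computing XOR, a function no single threshold unit can express; the particular weights and the 0/1 step convention are illustrative choices, not from the slides.

# Two-layer threshold network computing XOR over {0,1} inputs.
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # hidden unit 1: OR
    h2 = step(-x1 - x2 + 1.5)    # hidden unit 2: NAND
    return step(h1 + h2 - 1.5)   # output unit: AND of the two hidden outputs

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))   # prints the XOR truth table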