Artificial Neural Networks

Artificial Neural Networks Oliver Schulte - CMPT 310

Neural Networks. Neural networks arise from attempts to model human/animal brains. There are many models and many claims of biological plausibility; we will focus on statistical and computational properties rather than plausibility. An artificial neural network is a general function approximator. The inner or hidden layers compute learned auxiliary functions.

Uses of Neural Networks. Pros: Good for continuous input variables. General continuous function approximators. Highly non-linear. Trainable basis functions. Good to use in continuous domains with little knowledge: when you don't know good features, or you don't know the form of a good functional model. Cons: Not interpretable (black box). Learning is slow. Good generalization can require many data points.

Function Approximation Demos. Home Value of Hockey State: https://user-images.githubusercontent.com/22108101/28182140-eb64b49a-67bf-11e7-97aa-046298f721e5.jpg Function Learning Examples (open in Safari): http://neuron.eng.wayne.edu/bpfunctionapprox/bpfunctionapprox.html

Applications. There are many, many applications. World-Champion Backgammon Player: http://en.wikipedia.org/wiki/td-gammon http://en.wikipedia.org/wiki/backgammon No Hands Across America Tour: http://www.cs.cmu.edu/afs/cs/usr/tjochem/www/nhaa/nhaa_home_page.html Digit Recognition with 99.26% accuracy. Speech Recognition: http://research.microsoft.com/en-us/news/features/speechrecognition-082911.aspx http://deeplearning.net/demos/

Outline Feed-forward Networks Network Training Error Backpropagation Applications

No Hands Across America. [ALVINN steering network diagram: a 30x32 sensor input retina feeds 4 hidden units, which feed 30 output units ranging from Sharp Left through Straight Ahead to Sharp Right.]

Non-linear Activation Functions. Pass the input in_j through a non-linear activation function g(·) to get the output a_j = g(in_j). [Model of an individual neuron: input links carry activations a_i weighted by w_i,j, plus a bias weight w_0,j on a fixed input a_0 = 1; the input function computes in_j = Σ_i w_i,j a_i, the activation function g is applied, and the output a_j is sent along the output links.] From Russell and Norvig, AIMA3e.
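
To make the single-unit computation concrete, here is a minimal Python/NumPy sketch (the input and weight values are made up for illustration, and the logistic sigmoid is just one possible choice of g):

```python
import numpy as np

def unit_output(a, w, g=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Single neuron: in_j = sum_i w_ij * a_i, output a_j = g(in_j).

    a -- activations on the input links, with a[0] = 1 for the bias
    w -- weights w_ij, with w[0] the bias weight w_0,j
    g -- activation function (logistic sigmoid by default)
    """
    in_j = np.dot(w, a)      # input function: weighted sum of the inputs
    return g(in_j)           # activation function applied to that sum

# Example with made-up numbers: two inputs plus the fixed bias input a_0 = 1.
a = np.array([1.0, 0.5, -0.3])    # [a_0, a_1, a_2]
w = np.array([-0.1, 0.8, 0.4])    # [w_0j, w_1j, w_2j]
print(unit_output(a, w))          # a_j = g(w · a)
```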

Network of Neurons. [(a) A network with inputs 1 and 2 connected directly to units 3 and 4 via weights w_1,3, w_1,4, w_2,3, w_2,4. (b) A network with the same inputs, hidden units 3 and 4, and output units 5 and 6 connected via weights w_3,5, w_3,6, w_4,5, w_4,6.]
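
A forward pass through a small two-layer network like the one in (b) can be written in a few lines; the sketch below assumes sigmoid units and omits bias weights, and the numeric values are placeholders rather than weights from the figure:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, W_output):
    """Forward pass: inputs -> hidden units (3, 4) -> output units (5, 6)."""
    h = sigmoid(W_hidden @ x)     # activations of the hidden units
    y = sigmoid(W_output @ h)     # activations of the output units
    return h, y

x = np.array([0.2, 0.7])                 # inputs 1 and 2
W_hidden = np.array([[0.5, -0.3],        # w_1,3  w_2,3
                     [0.1,  0.9]])       # w_1,4  w_2,4
W_output = np.array([[0.7,  0.2],        # w_3,5  w_4,5
                     [-0.4, 0.6]])       # w_3,6  w_4,6
hidden, outputs = forward(x, W_hidden, W_output)
print(hidden, outputs)
```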

Activation Functions. Can use a variety of activation functions. Sigmoidal (S-shaped): logistic sigmoid 1/(1 + exp(−a)) (useful for binary classification); hyperbolic tangent tanh. Softmax: useful for multi-class classification. Rectified Linear Unit (ReLU): max(0, x). ... Should be differentiable for gradient-based learning (later). Can use different activation functions in each unit. See http://aispace.org/neural/.
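
For reference, the activation functions listed above can be written in a few lines of Python/NumPy:

```python
import numpy as np

def logistic(a):
    """Logistic sigmoid 1/(1 + exp(-a)): squashes to (0, 1); binary classification."""
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):
    """Hyperbolic tangent: squashes to (-1, 1)."""
    return np.tanh(a)

def relu(a):
    """Rectified Linear Unit: max(0, x), applied element-wise."""
    return np.maximum(0.0, a)

def softmax(a):
    """Softmax over a vector of scores: useful for multi-class classification."""
    e = np.exp(a - np.max(a))   # subtract the max for numerical stability
    return e / np.sum(e)

z = np.array([-2.0, 0.0, 3.0])
print(logistic(z), tanh(z), relu(z), softmax(z), sep="\n")
```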

Function Composition. Think logic circuits. [Surface plots of h_W(x_1, x_2) over x_1, x_2 in [-4, 4]: two opposite-facing sigmoids = ridge; two ridges = bump.]
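
The ridge-and-bump construction can be checked numerically; a short sketch (the grid range and the steepness factors are arbitrary choices of mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2 = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))

# Two opposite-facing sigmoids along x1 sum to a ridge running along x2.
ridge_1 = sigmoid(5 * (x1 + 1)) + sigmoid(-5 * (x1 - 1)) - 1.0

# A second ridge, perpendicular to the first (varying with x2); adding the two
# and squashing the sum through another sigmoid yields a localized bump.
ridge_2 = sigmoid(5 * (x2 + 1)) + sigmoid(-5 * (x2 - 1)) - 1.0
bump = sigmoid(10 * (ridge_1 + ridge_2 - 1.5))

print(bump.max(), bump[0, 0])   # high near the origin, near zero in the corners
```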

The XOR Problem Revisited. [Diagram of the (x_1, x_2) plane with the four corner points (±1, ±1): the positive region R_1 (z = +1) lies in a band between the two negative regions R_2 (z = -1), so no single linear boundary separates the classes.]

The XOR Problem Solved. [A two-layer network: inputs x_1, x_2 and a bias feed hidden units y_1, y_2 through weights w_ji, and the hidden units feed the output unit z_k through weights w_kj; the accompanying surface plots show each hidden unit computing a linear decision boundary over (x_1, x_2) and the output z combining them to realize XOR.]
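
As a concrete illustration (using my own weights with 0/1 inputs and hard-threshold units, not the exact weights in the figure), a 2-2-1 network computes XOR by combining an OR unit and an AND unit:

```python
import numpy as np

def step(z):
    """Hard threshold; a steep sigmoid behaves essentially the same way."""
    return (z > 0).astype(float)

# Hidden unit 1 computes OR(x1, x2); hidden unit 2 computes AND(x1, x2).
W_hidden = np.array([[1.0, 1.0],
                     [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])

# Output unit computes OR-and-not-AND, which is exactly XOR.
w_out = np.array([1.0, -1.0])
b_out = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(W_hidden @ np.array(x, dtype=float) + b_hidden)
    y = step(w_out @ h + b_out)
    print(x, "->", int(y))    # prints 0, 1, 1, 0
```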

Hidden Units Compute Auxiliary Functions. [Plot: red dots = network function; dashed line = hidden unit activation function; blue dots = data points.] The network function is roughly the sum of the hidden-unit activation functions.

Hidden Units As Feature Extractors. [Figure: sample training patterns (top) and learned input-to-hidden weights (bottom).] FIGURE 6.13. The top images represent patterns from a large training set used to train a 64-2-3 sigmoidal network (64 input nodes, 2 hidden units) for classifying three characters. The bottom figures show the input-to-hidden weights, represented as patterns, at the two hidden units after training. Note that these learned weights indeed describe feature groupings useful for the classification task. In large networks, such patterns of learned weights may be difficult to interpret.

Outline Feed-forward Networks Network Training Error Backpropagation Applications

Network Training. Given a specified network structure, how do we set its parameters (weights)? As usual, we define a criterion to measure how well our network performs and optimize against it. Training data are (x_n, y_n), where each target y_n is a vector, corresponding to a neural net with multiple output nodes. Given a set of weight values w, the network defines a function h_w(x). Can train by minimizing the L2 loss: E(w) = Σ_{n=1}^{N} ||h_w(x_n) − y_n||² = Σ_{n=1}^{N} Σ_k (y_{n,k} − a_{n,k})², where k indexes the output nodes.
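
A direct transcription of this loss into Python/NumPy (h_w is left abstract here: any function that maps an input vector to the vector of output activations):

```python
import numpy as np

def l2_loss(h_w, X, Y):
    """Sum-of-squares error E(w) = sum_n ||h_w(x_n) - y_n||^2.

    h_w -- function mapping one input vector x_n to the output activations
    X   -- training inputs, one row per example x_n
    Y   -- targets, one row per example y_n (one column per output node k)
    """
    return sum(np.sum((h_w(x) - y) ** 2) for x, y in zip(X, Y))
```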

Parameter Optimization. [Plot of an error surface E(w) over weights w_1, w_2, with stationary points w_A, w_B, w_C and the gradient ∇E.] For either of these problems, the error function E(w) is nasty. Nasty = non-convex. Non-convex = has local minima.

Gradient Descent. The function h_w(x) implemented by a network is complicated; there is no closed-form minimizer, so use gradient descent. It isn't obvious how to compute error function derivatives with respect to the hidden weights: this is the credit assignment problem. Backpropagation solves the credit assignment problem.

Outline Feed-forward Networks Network Training Error Backpropagation Applications

Error Backpropagation. Backprop is an efficient method for computing the error derivatives ∂E/∂w_ij for all weights in the network. Intuition: 1. Calculating derivatives for weights connected to output nodes is easy. 2. Treat the derivatives as virtual error, and compute the derivative of the error for nodes in the previous layer. 3. Repeat until you reach the input nodes. This procedure propagates the output error signal backwards through the network. Stochastic Gradient Descent: fix input x ← x_n and target output y ← y_n, resulting in error E_n.

Error at the output nodes. First, feed training example x_n forward through the network, storing all node activations a_i. Calculating derivatives for weights connected to output nodes is easy, like logistic regression with input features a_i. For output node j with activation a_j = g(in_j) = g(Σ_i w_ij a_i): ∂E_n/∂w_ij = ∂/∂w_ij [ ½ (y_j − a_j)² ] = −a_i g′(in_j) (y_j − a_j), which is 0 if there is no error, or if the input a_i from node i is 0. Modified error: Δ[j] = g′(in_j)(y_j − a_j). Gradient descent weight update: w_ij ← w_ij + α a_i Δ[j].
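
In code, the output-layer step looks like this; a sketch assuming logistic sigmoid output units (the function and variable names are mine, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(in_j):
    """g'(in_j) for the logistic sigmoid equals g(in_j) * (1 - g(in_j))."""
    s = sigmoid(in_j)
    return s * (1.0 - s)

def output_weight_update(w, a_prev, in_j, y_j, alpha=0.1):
    """One gradient-descent step for the weights w_ij into output node j.

    w      -- weights into node j (one entry per upstream node i)
    a_prev -- stored activations a_i feeding node j
    in_j   -- stored weighted input of node j
    y_j    -- target value for node j
    """
    a_j = sigmoid(in_j)
    delta_j = sigmoid_grad(in_j) * (y_j - a_j)   # modified error Δ[j]
    return w + alpha * a_prev * delta_j          # w_ij <- w_ij + α a_i Δ[j]
```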

Error at the hidden nodes. Consider a hidden node i connected to downstream nodes j in the next layer. The modified error signal Δ[i] is the node's activation derivative times the weighted sum of the error signals it contributes to. In symbols, Δ[i] = g′(in_i) Σ_j w_ij Δ[j].

Backpropagation Picture (backprop with new notation). [Diagram: output units ω_1, ω_2, ω_3, ..., ω_k, ..., ω_c carry error signals δ_1, δ_2, δ_3, ..., δ_k, ..., δ_c; a hidden unit j with error signal δ_j connects to the output units via weights w_kj and to the input units via weights w_ij.] The error signal at a hidden unit is proportional to the error signals at the units it influences: Δ[j] = g′(in_j) Σ_k w_jk Δ[k].

The Backpropagation Algorithm. 1. Apply the input vector x_n and forward propagate to find all weighted inputs in_i and activation levels a_i. 2. Evaluate the error signals Δ[j] for all output nodes. 3. Backpropagate the Δ[j] to obtain error signals Δ[i] for each hidden node i. 4. Update each weight w_ij using w_ij := w_ij + α a_i Δ[j]. Demo: AIspace http://aispace.org/neural/.
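
Putting the four steps together, here is a minimal sketch of one stochastic-gradient backpropagation step for a single-hidden-layer sigmoid network (bias weights are omitted for brevity, and the variable names and shapes are my own choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, alpha=0.1):
    """One SGD step on example (x, y); W1: hidden x input, W2: output x hidden."""
    # 1. Forward propagate, storing activations.
    a_hidden = sigmoid(W1 @ x)
    a_out = sigmoid(W2 @ a_hidden)

    # 2. Error signals at the output nodes: Δ[j] = g'(in_j)(y_j - a_j),
    #    using g'(in) = a(1 - a) for the sigmoid.
    delta_out = a_out * (1 - a_out) * (y - a_out)

    # 3. Backpropagate: Δ[i] = g'(in_i) * sum_j w_ij Δ[j].
    delta_hidden = a_hidden * (1 - a_hidden) * (W2.T @ delta_out)

    # 4. Update every weight: w_ij <- w_ij + α a_i Δ[j].
    W2 = W2 + alpha * np.outer(delta_out, a_hidden)
    W1 = W1 + alpha * np.outer(delta_hidden, x)
    return W1, W2

# Example usage with random data and weights (3 inputs, 4 hidden units, 2 outputs).
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
W1, W2 = backprop_step(rng.normal(size=3), np.array([1.0, 0.0]), W1, W2)
```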

Other Learning Topics. Regularization: L2-regularizer (weight decay). Prune weights: the Optimal Brain Damage method. Experimenting with network architectures is often key.
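
As a small sketch of what L2 regularization (weight decay) does to the weight update from the previous slides, with a regularization constant lam that I introduce here for illustration:

```python
# Unregularized update:  w_ij <- w_ij + alpha * a_i * delta_j
# Adding an L2 penalty (lam/2) * sum of squared weights to the error makes the
# gradient pick up a lam * w_ij term, so every weight also decays toward zero.
def weight_decay_update(w_ij, a_i, delta_j, alpha=0.1, lam=1e-3):
    return w_ij + alpha * (a_i * delta_j - lam * w_ij)
```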

Outline Feed-forward Networks Network Training Error Backpropagation Applications

Applications of Neural Networks. Many success stories for neural networks: credit card fraud detection, hand-written digit recognition, face detection, autonomous driving (CMU ALVINN).

Hand-written Digit Recognition. MNIST is the standard dataset for hand-written digit recognition: 60,000 training and 10,000 test images.

LeNet-5. [Architecture diagram: INPUT 32x32 → C1: 6 feature maps 28x28 (convolutions) → S2: 6 feature maps 14x14 (subsampling) → C3: 16 feature maps 10x10 (convolutions) → S4: 16 feature maps 5x5 (subsampling) → C5: layer of 120 (full connection) → F6: layer of 84 (full connection) → OUTPUT: 10 (Gaussian connections).] LeNet was developed by Yann LeCun et al. It is a convolutional neural network: local receptive fields (5x5 connectivity), subsampling (2x2), shared weights (reuse of the same 5x5 filter), breaking symmetry. See http://www.codeproject.com/kb/library/neuralnetrecognition.aspx
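
A modern approximation of this architecture can be written in a few lines of Python with Keras; this is a sketch, not LeCun's original implementation (which used scaled tanh units, trainable subsampling layers, and Gaussian/RBF output connections):

```python
import tensorflow as tf

# LeNet-5-style network for 32x32 grayscale digit images.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(6, 5, activation="tanh", input_shape=(32, 32, 1)),  # C1: 6@28x28
    tf.keras.layers.AveragePooling2D(2),                                        # S2: 6@14x14
    tf.keras.layers.Conv2D(16, 5, activation="tanh"),                           # C3: 16@10x10
    tf.keras.layers.AveragePooling2D(2),                                        # S4: 16@5x5
    tf.keras.layers.Conv2D(120, 5, activation="tanh"),                          # C5: 120
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(84, activation="tanh"),                               # F6: 84
    tf.keras.layers.Dense(10, activation="softmax"),                            # OUTPUT: 10 classes
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```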

[Image: the 82 errors made by LeNet-5 (0.82% test error rate); each misclassified test digit is shown with its true and predicted label, e.g. 4→6, 3→5, 8→2.]

Conclusion. Feed-forward networks can be used for regression or classification. Learning is more difficult: the error function is not convex. Use stochastic gradient descent and obtain a (good?) local minimum. Backpropagation provides efficient gradient computation.