AN INTRODUCTION TO NEURAL NETWORKS Scott Kuindersma November 12, 2009

SUPERVISED LEARNING We are given some training data: pairs (x, y) of inputs and target outputs. We must learn a function f that maps x to y. If y is discrete, we call the problem classification; if y is continuous, we call it regression.

ARTIFICIAL NEURAL NETWORKS Artificial neural networks are one technique that can be used to solve supervised learning problems. They are very loosely inspired by biological neural networks; real neural networks are much more complicated, e.g., they use spike timing to encode information. Neural networks consist of layers of interconnected units.

PERCEPTRON UNIT The simplest computational neural unit is called a perceptron. The input of a perceptron is a real vector x, and the output is either 1 or -1. A perceptron can therefore be applied to binary classification problems. Whether or not it will be useful depends on the problem... more on this later...

PERCEPTRON UNIT [MITCHELL 1997]

SIGN FUNCTION The perceptron computes o(x) = sign(w . x), where sign(y) = 1 if y > 0 and -1 otherwise.

EXAMPLE Suppose we have a perceptron with 3 weights w0, w1, w2. On input x1 = 0.5, x2 = 0.0 (with the bias input fixed at x0 = 1), the perceptron outputs sign(w0*x0 + w1*x1 + w2*x2).
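Since the transcription does not preserve the slide's actual weight values, here is a minimal Python sketch of the computation with hypothetical weights:

    import numpy as np

    def perceptron_output(w, x):
        # Perceptron unit: sign(w . x), with sign(y) = 1 if y > 0 and -1 otherwise
        return 1 if np.dot(w, x) > 0 else -1

    # Hypothetical weights [w0, w1, w2]; the slide's actual values are not preserved
    w = np.array([-0.1, 0.6, 0.2])
    x = np.array([1.0, 0.5, 0.0])   # x0 = 1 (bias input), x1 = 0.5, x2 = 0.0
    print(perceptron_output(w, x))  # w . x = -0.1 + 0.3 + 0.0 = 0.2 > 0, so output is 1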

LEARNING RULE Now that we know how to calculate the output of a perceptron, we would like a way to modify the weights so the output matches the training data. This is accomplished via the perceptron learning rule: for a training pair (x, t) with perceptron output o, each weight is updated as wi <- wi + alpha * (t - o) * xi, where, again, x0 = 1. Loop through the training data until (nearly) all examples are classified correctly.
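As a sketch (not the lecture's MATLAB code), the learning rule and its outer loop might look like this in Python; the example data, learning rate, and epoch cap are illustrative assumptions:

    import numpy as np

    def train_perceptron(X, t, alpha=0.1, max_epochs=100):
        # Perceptron learning rule: w_i <- w_i + alpha * (t - o) * x_i
        X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x0 = 1 to every example
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            errors = 0
            for x_i, t_i in zip(X, t):
                o = 1 if np.dot(w, x_i) > 0 else -1
                if o != t_i:
                    w += alpha * (t_i - o) * x_i
                    errors += 1
            if errors == 0:   # all training examples classified correctly
                break
        return w

    # Illustrative linearly separable data: the OR function on inputs in {-1, +1}
    X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
    t = np.array([-1, 1, 1, 1])
    print(train_perceptron(X, t))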

MATLAB EXAMPLE

LIMITATIONS OF THE PERCEPTRON MODEL A perceptron can only distinguish between linearly separable classes of inputs. Consider the following data:

PERCEPTRONS AND BOOLEAN FUNCTIONS Suppose we let the values (1, -1) correspond to true and false, respectively. Can we describe a perceptron capable of computing the AND function? What about OR? NAND? NOR? XOR? Let's think about it geometrically.

BOOLEAN FUNCTIONS, CONT'D [figure: linearly separable decision boundaries for AND, OR, NAND, and NOR]

EXAMPLE: AND Let pand(x1, x2) be the output of the perceptron with weights w0 = -0.3, w1 = 0.5, w2 = 0.5 on input x1, x2:

     x1    x2    pand(x1, x2)
     -1    -1    -1
     -1     1    -1
      1    -1    -1
      1     1     1
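The table follows directly from the weights given above; here is a small Python check (the helper name p_and is just illustrative):

    import numpy as np

    def p_and(x1, x2):
        # Perceptron for AND using the slide's weights w0 = -0.3, w1 = 0.5, w2 = 0.5
        return 1 if np.dot([-0.3, 0.5, 0.5], [1.0, x1, x2]) > 0 else -1

    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print(x1, x2, p_and(x1, x2))   # only (1, 1) yields +1, matching the table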

XOR

XOR XOR cannot be represented by a single perceptron, but it can be represented by a small network of perceptrons: feed x1 and x2 into an OR perceptron and a NAND perceptron, then feed those two outputs into an AND perceptron, so that XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2)).
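A sketch of that construction in Python; the OR and NAND weights below are assumptions chosen in the same style as the AND example, not values from the slides:

    import numpy as np

    def unit(w, x1, x2):
        # Threshold (perceptron) unit with weights [w0, w1, w2] on inputs (1, x1, x2)
        return 1 if np.dot(w, [1.0, x1, x2]) > 0 else -1

    def p_or(x1, x2):   return unit([0.3, 0.5, 0.5], x1, x2)
    def p_nand(x1, x2): return unit([0.3, -0.5, -0.5], x1, x2)
    def p_and(x1, x2):  return unit([-0.3, 0.5, 0.5], x1, x2)

    def p_xor(x1, x2):
        # XOR(x1, x2) = AND(OR(x1, x2), NAND(x1, x2))
        return p_and(p_or(x1, x2), p_nand(x1, x2))

    for x1 in (-1, 1):
        for x2 in (-1, 1):
            print(x1, x2, p_xor(x1, x2))   # +1 exactly when x1 != x2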

PERCEPTRON CONVERGENCE The perceptron learning rule is not guaranteed to converge if the data are not linearly separable. We can remedy this situation by considering a linear unit and applying gradient descent. The linear unit is equivalent to a perceptron without the sign function; that is, its output is given by o(x) = w . x = w0*x0 + w1*x1 + ... + wn*xn, where x0 = 1.

LEARNING RULE DERIVATION Goal: a weight update rule of the form wi <- wi + delta wi. First we define a suitable measure of error. Typically we choose a quadratic function of the weights, such as E(w) = 1/2 * sum over training examples d of (td - od)^2, so that the error surface has a single global minimum.

ERROR SURFACE [MITCHELL 1997]

LEARNING RULE DERIVATION The learning algorithm should update each weight in the direction that decreases the error according to our error function. That is, the weight change should look something like delta wi = -alpha * (partial E / partial wi), a step down the gradient of E with a small positive learning rate alpha. For the squared error above and a linear unit, this works out to delta wi = alpha * sum over d of (td - od) * xid.

GRADIENT DESCENT

GRADIENT DESCENT Good: guaranteed to converge to the minimum-error weight vector regardless of whether the training data are linearly separable (given that α is sufficiently small). Bad: the resulting linear unit still can only correctly classify linearly separable data.
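A minimal sketch of batch gradient descent for a single linear unit using the delta rule derived above; the data, learning rate, and epoch count are illustrative assumptions:

    import numpy as np

    def train_linear_unit(X, t, alpha=0.02, epochs=500):
        # Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2 for a linear unit o = w . x
        X = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x0 = 1
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            o = X @ w                      # linear unit outputs for all training examples
            w += alpha * X.T @ (t - o)     # delta w_i = alpha * sum_d (t_d - o_d) * x_id
        return w

    # Illustrative data: targets generated by the noiseless linear rule t = 1 + 2*x1 - x2
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(20, 2))
    t = 1 + 2 * X[:, 0] - X[:, 1]
    print(train_linear_unit(X, t))   # should approach [1, 2, -1]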

NETWORKS In general, many-layered networks of threshold units are capable of representing a rich variety of nonlinear decision surfaces. However, to use our gradient descent approach on multi-layered networks, we must avoid the non-differentiable sign function. Multiple layers of linear units, on the other hand, can still only represent linear functions. Introducing the sigmoid function...

SIGMOID FUNCTION sigma(y) = 1 / (1 + e^(-y)), a smooth, differentiable "squashing" function with outputs between 0 and 1 and derivative sigma'(y) = sigma(y) * (1 - sigma(y)).
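A brief Python sketch of a sigmoid unit; the weights in the demo call are hypothetical, since the slides' actual values are not preserved in the transcription:

    import numpy as np

    def sigmoid(y):
        # Logistic sigmoid: smooth, differentiable squashing function with outputs in (0, 1)
        return 1.0 / (1.0 + np.exp(-y))

    def sigmoid_unit(w, x):
        # A sigmoid unit outputs sigma(w . x) instead of sign(w . x)
        return sigmoid(np.dot(w, x))

    # Hypothetical weights [w0, w1, w2] on input x1 = 0.5, x2 = 0.0 (with x0 = 1)
    print(sigmoid_unit(np.array([-0.1, 0.6, 0.2]), np.array([1.0, 0.5, 0.0])))   # sigma(0.2) ~ 0.55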

SIGMOID UNIT [MITCHELL 1997]

EXAMPLE Suppose we have a sigmoid unit k with 3 weights. On input x1 = 0.5, x2 = 0.0 (with x0 = 1), the unit outputs sigma(w0*x0 + w1*x1 + w2*x2).

NETWORK OF SIGMOID UNITS [figure: a feed-forward network with inputs x0-x3, a hidden layer of sigmoid units, and an output layer producing o2, o3, o4; weights such as w02 and w31 label the connections]

EXAMPLE [figure: a specific two-layer network of sigmoid units on inputs x0, x1, x2, with numeric weights on each connection]

EXAMPLE [figure: the same network's output plotted as a surface over x1 and x2 in roughly [-2, 2], with output values between about 0.65 and 0.8]

BACK-PROPAGATION Really just the same gradient descent approach applied to our network of sigmoid units. We use the error function E(w) = 1/2 * sum over training examples d and output units k of (tkd - okd)^2.

BACKPROP ALGORITHM
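Since the algorithm slide itself is not preserved in the transcription, here is a compact sketch of stochastic backpropagation for a network with one hidden layer of sigmoid units, in the spirit of Mitchell's presentation; the network size, learning rate, initialization, and XOR demo data are all illustrative assumptions:

    import numpy as np

    def sigmoid(y):
        return 1.0 / (1.0 + np.exp(-y))

    def backprop(X, T, n_hidden=3, alpha=0.5, epochs=5000, seed=0):
        # Stochastic gradient descent on E = 1/2 * sum_k (t_k - o_k)^2 for a one-hidden-layer net
        rng = np.random.default_rng(seed)
        X = np.hstack([np.ones((X.shape[0], 1)), X])             # prepend x0 = 1
        W_hid = rng.uniform(-0.05, 0.05, (n_hidden, X.shape[1]))
        W_out = rng.uniform(-0.05, 0.05, (T.shape[1], n_hidden + 1))
        for _ in range(epochs):
            for x, t in zip(X, T):
                h = np.concatenate(([1.0], sigmoid(W_hid @ x)))  # hidden activations + bias unit
                o = sigmoid(W_out @ h)                           # network outputs
                delta_out = o * (1 - o) * (t - o)                # output-unit error terms
                delta_hid = h[1:] * (1 - h[1:]) * (W_out[:, 1:].T @ delta_out)
                W_out += alpha * np.outer(delta_out, h)          # gradient descent weight updates
                W_hid += alpha * np.outer(delta_hid, x)
        return W_hid, W_out

    # Illustrative use: learn XOR with targets in {0, 1}
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)
    W_hid, W_out = backprop(X, T)
    for x, t in zip(np.hstack([np.ones((4, 1)), X]), T):
        h = np.concatenate(([1.0], sigmoid(W_hid @ x)))
        print(t, sigmoid(W_out @ h))   # outputs should move toward the XOR targets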

BACKPROP CONVERGENCE Unfortunately, there may exist many local minima in the error function. Therefore we cannot guarantee convergence to an optimal solution, as we could in the single linear unit case. Time to convergence is also a concern. Nevertheless, backprop does reasonably well in many cases.

MATLAB EXAMPLE Quadratic decision boundary: a single linear unit vs. a three-sigmoid-unit backprop network... GO!

BACK TO ALVINN ALVINN was a 1989 project at CMU in which an autonomous vehicle learned to drive by watching a person drive. ALVINN's architecture consists of a single-hidden-layer backpropagation network. The input layer of the network is a 30x32-unit two-dimensional "retina" which receives input from the vehicle's video camera. The output layer is a linear representation of the direction the vehicle should travel in order to keep it on the road.

ALVINN

REPRESENTATIONAL POWER OF NEURAL NETWORKS Every boolean function can be represented by a network with two layers of units. Every bounded continuous function can be approximated to arbitrary accuracy by a two-layer network of sigmoid hidden units and linear output units. Any function can be approximated to arbitrary accuracy by a three-layer network with sigmoid hidden units and linear output units.

READING SUGGESTIONS Mitchell, Machine Learning, Chapter 4. Russell and Norvig, Artificial Intelligence: A Modern Approach, Chapter 20.