22c145-Fall 01: Neural Networks
Readings: Chapter 19 of Russell & Norvig
Cesare Tinelli


Brains as Computational Devices. Advantages of brains with respect to digital computers: massively parallel, fault-tolerant, reliable, graceful degradation.

A Neuron. [Figure: a biological neuron, showing dendrites, the cell body or soma, the nucleus, the axon with its axonal arborization, and synapses with axons from other cells.]

Comparing Brains with Computers

                      Computer                           Human Brain
Computational units   1 CPU, 10^5 gates                  10^11 neurons
Storage units         10^9 bits RAM, 10^10 bits disk     10^11 neurons, 10^14 synapses
Cycle time            10^-8 sec                          10^-3 sec
Bandwidth             10^9 bits/sec                      10^14 bits/sec
Neuron updates/sec    10^5                               10^14

Even if a computer is one million times faster than a brain in raw speed, the brain ends up being one billion times faster than the computer at what it does. Example: recognizing a face. Brain: < 1 s (a few hundred of its computing cycles). Computer: billions of cycles.

Artificial Neural Network. A neural network is a graph with nodes, or units, connected by links. Each link has an associated weight, a real number. Typically, each node i outputs a real number, which is fed as input to the nodes connected to i. The output of a node is a function of the weighted sum of the node's inputs.

A Neural Network Unit. [Figure: a unit i receives activation values a_j over its input links, each with weight W_{j,i}; the input function computes the weighted sum in_i; the activation function g produces the output a_i = g(in_i), which is sent along the output links.]

The Input Function. Each incoming link of a unit i feeds an input value, or activation value, a_j coming from another unit. The input function in_i of a unit is simply the weighted sum of the unit's inputs:

in_i(a_1, ..., a_{n_i}) = Σ_{j=1}^{n_i} W_{j,i} a_j

The unit applies the activation function g_i to the result of in_i to produce an output.

Different Activation Functions. [Figure: (a) step function, (b) sign function, (c) sigmoid function, each plotted as a_i against in_i.]

step_t(x) = 1 if x ≥ t, 0 if x < t
sign(x) = +1 if x ≥ 0, −1 if x < 0
sigmoid(x) = 1 / (1 + e^{−x})
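As an illustration, here is a minimal Python sketch of the input function and the three activation functions above; the function names and the optional threshold argument are our own choices, not part of the slides.

import math

def weighted_sum(weights, activations):
    # in_i = Σ_j W_{j,i} a_j, the unit's input function
    return sum(w * a for w, a in zip(weights, activations))

def step(x, t=0.0):
    # step_t(x) = 1 if x >= t, 0 otherwise
    return 1 if x >= t else 0

def sign(x):
    # sign(x) = +1 if x >= 0, -1 otherwise
    return 1 if x >= 0 else -1

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))

# Example: a unit with weights (0.5, -0.3) and input activations (1, 1)
in_i = weighted_sum([0.5, -0.3], [1, 1])     # 0.2
print(step(in_i, t=0.0), sign(in_i), round(sigmoid(in_i), 3))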

Units as Logic Gates. [Figure: three threshold units with step_t activation. AND: two inputs with W = 1 and threshold t = 1.5. OR: two inputs with W = 1 and threshold t = 0.5. NOT: one input with W = −1 and threshold t = −0.5.] Since units can implement the ∧, ∨, ¬ Boolean operators, neural nets are Turing-complete: they can implement any computable function.
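The gates above are easy to check directly. The small Python sketch below is our own construction, using the weights and thresholds from the figure, of a single threshold unit and the three gates.

def unit(weights, threshold, inputs):
    # A single unit with step_t activation: fire iff Σ_j W_j I_j >= t.
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

AND = lambda a, b: unit([1, 1], 1.5, [a, b])
OR  = lambda a, b: unit([1, 1], 0.5, [a, b])
NOT = lambda a:    unit([-1], -0.5, [a])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
print(NOT(0), NOT(1))   # 1 0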

Computing with NNs. Different functions are implemented by different network topologies and unit weights. The lure of NNs is that a network need not be explicitly programmed to compute a certain function f. Given enough nodes and links, an NN can learn the function by itself. It does so by looking at a training set of input/output pairs for f and modifying its topology and weights so that its own input/output behavior agrees with the training pairs. In other words, NNs too learn by induction.

Learning Network Structures. The structure of an NN is given by its nodes and links. The type of function a network can represent depends on the network structure. Fixing the network structure in advance can make the task of learning a certain function impossible. On the other hand, using a large network is also potentially problematic: if a network has too many parameters (i.e., weights), it will simply learn the examples by memorizing them in its weights (overfitting).

Learning Network Structures. There are two ways to modify a network structure in accordance with the training set: 1. optimal brain damage; 2. tiling. Neither technique is completely satisfactory. Often, the network structure is instead defined manually, by trial and error, and learning procedures are then used to learn the network weights only.

Multilayer, Feed-forward Networks. A kind of neural network in which links are unidirectional and form no cycles (the net is a directed acyclic graph); the root nodes of the graph are input units, whose activation value is determined by the environment; the leaf nodes are output units; the remaining nodes are hidden units; units can be divided into layers: a unit in a layer is connected only to units in the next layer.

A Two-layer, Feed-forward Network. [Figure: output units O_i at the top, connected by weights W_{j,i} to hidden units a_j, which are connected by weights W_{k,j} to input units I_k at the bottom.] Notes: this graph is drawn upside-down, with the roots of the graph at the bottom and the (only) leaf at the top. The layer of input units is generally not counted (which is why this is a two-layer net).
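To make the layered structure concrete, here is a small forward-pass sketch in Python, assuming sigmoid activations in both layers; the weight values are made up for illustration.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(weights, inputs, g=sigmoid):
    # weights[u] holds the incoming weights of unit u in this layer;
    # each unit outputs g of its weighted input sum.
    return [g(sum(w * a for w, a in zip(ws, inputs))) for ws in weights]

def forward(W_kj, W_ji, I):
    a_j = layer(W_kj, I)        # hidden activations a_j = g(Σ_k W_{k,j} I_k)
    O_i = layer(W_ji, a_j)      # output values      O_i = g(Σ_j W_{j,i} a_j)
    return a_j, O_i

# Hypothetical weights: 2 input units, 2 hidden units, 1 output unit.
W_kj = [[0.5, -0.4], [0.3, 0.8]]
W_ji = [[1.0, -1.0]]
print(forward(W_kj, W_ji, [1, 0]))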

Multilayer, Feed-forward Networks. They are a powerful computational device: with just one hidden layer, they can approximate any continuous function; with just two hidden layers, they can approximate any computable function. However, the number of units needed per layer may grow exponentially with the number of input units.

Perceptrons. Single-layer, feed-forward networks whose units use a step function as activation function. [Figure: (a) a perceptron network, with input units I_j connected by weights W_{j,i} directly to output units O_i; (b) a single perceptron, with input units I_j connected by weights W_j to one output unit O.]

Perceptrons. Perceptrons caused a great stir when they were invented because it was shown that, if a function is representable by a perceptron, then it is learnable with 100% accuracy, given enough training examples. The problem is that perceptrons can only represent linearly-separable functions. It was soon shown that most of the functions we would like to compute are not linearly separable.

Linearly Separable Functions on a 2-dimensional Space. [Figure: input points in the (I_1, I_2) plane for (a) I_1 and I_2, (b) I_1 or I_2, (c) I_1 xor I_2. A black dot corresponds to an output value of 1; an empty dot corresponds to an output value of 0. A single straight line separates the 1s from the 0s for and and or, but no such line exists for xor.]

A Linearly Separable Function on a 3-dimensional Space. The minority function: return 1 if the input vector contains fewer ones than zeros; return 0 otherwise. [Figure: (a) the separating plane in input space; (b) a unit implementing the function, with weight W = −1 on each of the inputs I_1, I_2, I_3 and threshold t = −1.5.]
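A quick way to convince oneself that these weights work is to enumerate all eight inputs; the short Python check below is our own, using the reconstructed values W = −1 and t = −1.5, and compares the unit's output with the definition of the minority function.

from itertools import product

def minority_unit(inputs, w=-1.0, t=-1.5):
    # A single threshold unit: weight -1 on every input, threshold -1.5.
    return 1 if sum(w * x for x in inputs) >= t else 0

for bits in product((0, 1), repeat=3):
    assert minority_unit(bits) == (1 if sum(bits) < 2 else 0)
    print(bits, minority_unit(bits))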

Learning with NNs. Most NN learning methods are current-best-hypothesis methods.

function NEURAL-NETWORK-LEARNING(examples) returns network
  network ← a network with randomly assigned weights
  repeat
    for each e in examples do
      O ← NEURAL-NETWORK-OUTPUT(network, e)
      T ← the observed output values from e
      update the weights in network based on e, O, and T
    end
  until all examples correctly predicted or stopping criterion is reached
  return network

Each cycle in the procedure above is called an epoch.
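Rendered as Python, the procedure might look like the sketch below; predict and update are placeholders for whatever network model is being trained (the perceptron rule later in these notes is one concrete choice), and the weight initialization range and epoch limit are arbitrary assumptions of our own.

import random

def neural_network_learning(examples, predict, update, n_weights, max_epochs=1000):
    # examples: list of (inputs, target) pairs
    # predict(w, inputs) -> output O; update(w, inputs, O, T) -> new weights
    w = [random.uniform(-0.5, 0.5) for _ in range(n_weights)]
    for _ in range(max_epochs):                              # one pass = one epoch
        if all(predict(w, x) == t for x, t in examples):     # all examples correct
            break
        for x, t in examples:
            o = predict(w, x)
            w = update(w, x, o, t)
    return w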

The Perceptron Learning Method. Weight updating in perceptrons is very simple because each output node is independent of the other output nodes. [Figure: a perceptron network with input units I_j, weights W_{j,i}, and output units O_i, next to a single perceptron with input units I_j, weights W_j, and one output unit O.] With no loss of generality, then, we can consider a perceptron with a single output node.

Normalizing Unit Thresholds. Notice that, if t is the threshold value of the output unit, then

step_t(Σ_{j=1}^{n} W_j I_j) = step_0(Σ_{j=0}^{n} W_j I_j)

where W_0 = t and I_0 = −1. Therefore, we can always assume that the unit's threshold is 0 if we include the actual threshold as the weight of an extra link with a fixed input value. This allows thresholds to be learned like any other weight. Then, we can even allow output values in [0, 1] by replacing step_0 by the sigmoid function.
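The trick is easy to check in code. The sketch below is our own: it compares a step unit with an explicit threshold against the same unit with the threshold folded in as W_0 = t on a fixed extra input I_0 = −1.

def step(x, t=0.0):
    return 1 if x >= t else 0

def with_threshold(weights, inputs, t):
    # step_t(Σ_{j=1}^{n} W_j I_j)
    return step(sum(w * x for w, x in zip(weights, inputs)), t)

def with_bias(weights, inputs, t):
    # step_0(Σ_{j=0}^{n} W_j I_j) with W_0 = t and I_0 = -1
    return step(sum(w * x for w, x in zip([t] + weights, [-1] + inputs)), 0.0)

w, t = [1.0, 1.0], 1.5                 # the AND unit from the earlier slide
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    assert with_threshold(w, x, t) == with_bias(w, x, t)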

The Perceptron Learning Method. If O is the value returned by the output unit for a given example and T is the expected output, then the unit's error is e = T − O. If the error e is positive we need to increase O; otherwise, we need to decrease O.

The Perceptron Learning Method. Since O = g(Σ_{j=0}^{n} W_j I_j), we can change O by changing each W_j. To increase O we should increase W_j if I_j is positive, and decrease W_j if I_j is negative. To decrease O we should decrease W_j if I_j is positive, and increase W_j if I_j is negative. This is done by updating each W_j as follows:

W_j ← W_j + α I_j (T − O)

where α is a positive constant, the learning rate.
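Putting the update rule together with the threshold trick gives a complete perceptron learner. The Python sketch below is our own illustration: the learning rate, weight initialization, and epoch limit are arbitrary choices, and the demo trains on the (linearly separable) or function.

import random

def perceptron_learn(examples, n_inputs, alpha=0.1, max_epochs=100):
    # Weights W_0 .. W_n, with W_0 playing the role of the threshold (I_0 = -1).
    w = [random.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for _ in range(max_epochs):
        converged = True
        for inputs, target in examples:
            x = [-1] + list(inputs)                                   # prepend I_0 = -1
            o = 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else 0
            if o != target:
                converged = False
                # W_j <- W_j + alpha * I_j * (T - O)
                w = [wj + alpha * xj * (target - o) for wj, xj in zip(w, x)]
        if converged:
            break
    return w

examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]       # the OR function
print(perceptron_learn(examples, n_inputs=2))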

Perceptron Learning as Gradient Descent Search. Provided that the learning rate constant is not too high, the perceptron will learn any linearly-separable function. Why? The perceptron learning procedure is a gradient descent search procedure whose search space has no local minima. [Figure: the error Err plotted as a surface over the weight space (W_1, W_2), with two points a and b on the surface.] Each possible configuration of weights for the perceptron is a state in the search space.

Back-Propagation Learning. Extends the main idea of perceptron learning to multilayer networks: assess the blame for a unit's error and divide it among the contributing weights. We start from the units in the output layer and propagate the error back to previous layers until we reach the input layer. Weight updates: output layer, analogous to the perceptron case; hidden layers, by back-propagation.

Updating Weights: Output Layer. [Figure: an output unit i with output O_i = g(in_i), receiving activation a_j over a link with weight W_{j,i}.]

W_{j,i} ← W_{j,i} + α a_j Δ_i

where Δ_i = g'(in_i) (T_i − O_i), T_i is the expected output, g' is the derivative of g, in_i = Σ_j W_{j,i} a_j, and α is the learning rate.

Updating Weights: Hidden Layers. [Figure: a hidden unit j with output a_j = g(in_j), receiving activation a_k over a link with weight W_{k,j} and feeding the units i of the next layer through weights W_{j,i}.]

W_{k,j} ← W_{k,j} + α a_k Δ_j

where Δ_j = g'(in_j) Σ_i W_{j,i} Δ_i and Δ_i is the error of each unit i in the next layer that is connected to unit j.

The Back-propagation Procedure.
Choose a learning rate α.
Repeat until network performance is satisfactory:
  For each training example:
    1. for each output node i, compute Δ_i := g'(in_i) (T_i − O_i);
    2. for each hidden node j, compute Δ_j := g'(in_j) Σ_i W_{j,i} Δ_i;
    3. update each weight W_{r,s} by W_{r,s} ← W_{r,s} + α a_r Δ_s.
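For concreteness, here is a compact Python sketch of the whole procedure for one hidden layer of sigmoid units, following the Δ notation above; the learning rate, the number of epochs, the random initialization, and the omission of bias weights (which would be handled with the fixed extra input described earlier) are simplifications of our own.

import math, random

def g(x):  return 1.0 / (1.0 + math.exp(-x))      # sigmoid activation
def gp(x): return g(x) * (1.0 - g(x))             # its derivative g'

def backprop_train(examples, n_in, n_hid, n_out, alpha=0.5, epochs=5000):
    rnd = lambda: random.uniform(-0.5, 0.5)
    W_kj = [[rnd() for _ in range(n_in)]  for _ in range(n_hid)]   # input -> hidden
    W_ji = [[rnd() for _ in range(n_hid)] for _ in range(n_out)]   # hidden -> output
    for _ in range(epochs):
        for I, T in examples:
            # forward pass
            in_j = [sum(W_kj[j][k] * I[k] for k in range(n_in)) for j in range(n_hid)]
            a_j  = [g(x) for x in in_j]
            in_i = [sum(W_ji[i][j] * a_j[j] for j in range(n_hid)) for i in range(n_out)]
            O    = [g(x) for x in in_i]
            # backward pass: Delta_i = g'(in_i) (T_i - O_i)
            D_i = [gp(in_i[i]) * (T[i] - O[i]) for i in range(n_out)]
            # Delta_j = g'(in_j) * Σ_i W_{j,i} Delta_i  (using the old output weights)
            D_j = [gp(in_j[j]) * sum(W_ji[i][j] * D_i[i] for i in range(n_out))
                   for j in range(n_hid)]
            # weight updates: W_{r,s} <- W_{r,s} + alpha * a_r * Delta_s
            for i in range(n_out):
                for j in range(n_hid):
                    W_ji[i][j] += alpha * a_j[j] * D_i[i]
            for j in range(n_hid):
                for k in range(n_in):
                    W_kj[j][k] += alpha * I[k] * D_j[j]
    return W_kj, W_ji

Calling backprop_train on a list of (input vector, target vector) pairs returns the two weight matrices after training; with bias inputs added as in the threshold trick, such a net can, for example, be trained on xor, which no single perceptron can represent.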

Why Back-propagation Works. Back-propagation learning as well is a gradient descent search in the weight space over a certain error surface. Where W is the vector of all the weights in the network, the error surface is given by

E(W) := (1/2) Σ_i (T_i − O_i)^2

The update for each weight W_{j,i} of a unit i is the opposite of the gradient (slope) of the error surface along the direction W_{j,i}, that is,

a_j Δ_i = − ∂E(W) / ∂W_{j,i}
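This relationship can be verified numerically for a single sigmoid output unit: the analytic quantity a_j Δ_i should match a finite-difference estimate of −∂E/∂W_{j,i}. The check below is our own illustration, with arbitrary weights and inputs.

import math

def g(x):  return 1.0 / (1.0 + math.exp(-x))
def gp(x): return g(x) * (1.0 - g(x))

def E(W, a, T):
    # E(W) = 1/2 (T - O)^2 for one sigmoid unit with input activations a
    O = g(sum(w * x for w, x in zip(W, a)))
    return 0.5 * (T - O) ** 2

W, a, T, eps = [0.4, -0.7, 0.2], [1.0, 0.5, -1.0], 1.0, 1e-6
in_i  = sum(w * x for w, x in zip(W, a))
delta = gp(in_i) * (T - g(in_i))
for j in range(len(W)):
    Wp = W[:]; Wp[j] += eps
    numeric = -(E(Wp, a, T) - E(W, a, T)) / eps      # -dE/dW_j by finite differences
    print(round(a[j] * delta, 6), round(numeric, 6)) # the two columns should agree (up to finite-difference error)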

Why Back-propagation Doesn't Always Work. Producing a new vector W' by adding to each W_{j,i} in W the opposite of E's slope along W_{j,i} guarantees that E(W') ≤ E(W). [Figure: an error surface Err over the weight space (W_1, W_2) with several valleys; points a and b lie in different minima.] In general, however, the error surface may contain local minima. Hence, convergence to an optimal set of weights is not guaranteed in back-propagation learning (contrary to perceptron learning).

Evaluating Back-propagation To assess the goodness of back-propagation learning for multilayer networks one needs to consider the following issues. Expressiveness. Computational efficiency. Generalization power. Sensitivity to noise. Transparency. Background Knowledge.