Ch.8 Neural Networks


Ch.8 Neural Networks
Hantao Zhang, http://www.cs.uiowa.edu/~hzhang/c145
The University of Iowa, Department of Computer Science

Brains as Computational Devices
Motivation: algorithms developed over centuries do not fit the complexity of real-world problems. The human brain is the most sophisticated computer suitable for solving extremely complex problems.
- Reasonable size: 10^11 neurons (neural cells), and only a small portion of these cells are used.
- Simple building blocks: no cell contains too much information.
- Massively parallel: each region of the brain controls specialized tasks.
- Fault-tolerant: information is saved mainly in the connections among neurons.
- Reliable.
- Graceful degradation.

Comparing Brains with Computers
                      Computer                         Human Brain
Computational units   1 CPU, 10^5 gates                10^11 neurons
Storage units         10^9 bits RAM, 10^10 bits disk   10^11 neurons, 10^14 synapses
Cycle time            10^-8 sec                        10^-3 sec
Bandwidth             10^9 bits/sec                    10^14 bits/sec
Neuron updates/sec    10^5                             10^14
Even if a computer is a million times faster than a brain in raw speed, the brain ends up being a billion times faster than the computer at what it does.
Example: recognizing a face. Brain: < 1 s (a few hundred of its cycles). Computer: billions of cycles.

Biological System
[Figure: a biological neuron — cell body (soma), nucleus, dendrites, axon, axonal arborization, and synapses with other cells.]
A neuron does nothing until the collective influence of all its inputs reaches a threshold level. At that point, the neuron produces a full-strength output in the form of a narrow pulse that proceeds from the cell body, down the axon, and into the axon's branches. It fires! Since it either fires or does nothing, it is considered an all-or-nothing device.
A synapse increases or decreases the strength of the connection and causes excitation or inhibition of the subsequent neuron.

Analogy from Biology
[Figure: an artificial unit — input links carry activations a_j weighted by W_{j,i}; the input function sums them into in_i; the activation function g produces the output a_i = g(in_i), sent on the output links.]
Artificial neurons are viewed as nodes connected to other nodes via links that correspond to neural connections. Each link is associated with a weight. The weight determines the nature (+/−) and strength of the node's influence on another. If the influence of all the links is strong enough, the node is activated (similar to the firing of a neuron).

A Neural Network Unit

Artificial Neural Network
A neural network is a graph of nodes (or units), connected by links. Each link has an associated weight, a real number. Typically, each node i has several incoming links and several outgoing links. Each incoming link provides a real number as input to the node, and the node sends one real number through every outgoing link. The output of a node is a function of the weighted sum of the node's inputs.

The Input Function
Each incoming link of a unit i feeds it an input value, or activation value, a_j coming from another unit. The input function in_i of a unit is simply the weighted sum of the unit's inputs:
    in_i(a_1, ..., a_{n_i}) = Σ_{j=1}^{n_i} W_{j,i} a_j
The unit applies the activation function g_i to the result of in_i to produce an output:
    out_i = g_i(in_i) = g_i(Σ_{j=1}^{n_i} W_{j,i} a_j)
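As a concrete illustration of the input and activation functions, here is a minimal C sketch of a single unit; the weights, inputs, and the choice of a threshold step activation are made up for the example.

#include <stdio.h>

#define NUM_IN 3   /* number of incoming links (example value) */

/* Step activation with threshold t: 1 if x >= t, else 0. */
double step(double x, double t) { return (x >= t) ? 1.0 : 0.0; }

/* A unit's output: apply the activation to the weighted sum of its inputs. */
double unit_output(const double w[NUM_IN], const double a[NUM_IN], double t) {
    double in = 0.0;                    /* in_i = sum_j W_{j,i} * a_j */
    for (int j = 0; j < NUM_IN; j++)
        in += w[j] * a[j];
    return step(in, t);                 /* out_i = g_i(in_i) */
}

int main(void) {
    double w[NUM_IN] = {0.5, -0.3, 0.8};   /* illustrative weights */
    double a[NUM_IN] = {1.0, 1.0, 0.0};    /* illustrative activations */
    printf("out = %.1f\n", unit_output(w, a, 0.2));
    return 0;
}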

Typical Activation Functions
[Figure: plots of the step, sign, and sigmoid activation functions, each mapping in_i to a_i.]
(a) Step function: step_t(x) = 1 if x ≥ t, 0 if x < t
(b) Sign function: sign(x) = +1 if x ≥ 0, −1 if x < 0
(c) Sigmoid function: sig(x) = 1 / (1 + e^(−x))

Typical Activation Functions 2
Hard limiter (binary step):
    f_θ(x) = 1 if x > θ; 0 if −θ ≤ x ≤ θ; −1 if x < −θ
Binary sigmoid (exponential sigmoid):
    sig(x) = 1 / (1 + e^(−σx))
where σ controls the saturation of the curve. As σ → ∞, the hard limiter is approached.
Bipolar sigmoid (atan):
    f(x) = tan^(−1)(σx)
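A small C sketch of these activation functions; the threshold and slope parameters are passed in explicitly, which is a choice made for the example.

#include <math.h>

/* Step function with threshold t. */
double step_t(double x, double t) { return (x >= t) ? 1.0 : 0.0; }

/* Sign function. */
double sign_fn(double x) { return (x >= 0.0) ? 1.0 : -1.0; }

/* Hard limiter with dead zone [-theta, theta]. */
double hard_limiter(double x, double theta) {
    if (x > theta)  return 1.0;
    if (x < -theta) return -1.0;
    return 0.0;
}

/* Binary sigmoid; sigma controls how sharp the transition is. */
double binary_sigmoid(double x, double sigma) { return 1.0 / (1.0 + exp(-sigma * x)); }

/* Bipolar sigmoid (arctangent form). */
double bipolar_sigmoid(double x, double sigma) { return atan(sigma * x); }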

Units as Logic Gates
[Figure: three threshold units implementing gates — AND: W = 1, W = 1, t = 1.5; OR: W = 1, W = 1, t = 0.5; NOT: W = −1, t = −0.5.]
Activation function: step_t. Since units can implement the ∧, ∨, ¬ boolean operators, neural nets can implement any boolean function and, with enough units, any computable function.
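A quick C check that these weight/threshold choices behave like the corresponding gates; the helper names are mine, and the NOT unit's weight and threshold are taken to be −1 and −0.5 as reconstructed above.

#include <stdio.h>

static double step(double x, double t) { return (x >= t) ? 1.0 : 0.0; }

static double and_unit(double a, double b) { return step(1.0*a + 1.0*b, 1.5); }
static double or_unit (double a, double b) { return step(1.0*a + 1.0*b, 0.5); }
static double not_unit(double a)           { return step(-1.0*a, -0.5); }

int main(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            printf("a=%d b=%d  AND=%.0f OR=%.0f NOT(a)=%.0f\n",
                   a, b, and_unit(a, b), or_unit(a, b), not_unit(a));
    return 0;
}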

Structures of Neural Networks
Directed:
  Acyclic:
    Feed-forward:
      Multiple layers: nodes are grouped into layers and all links go from one layer to the next layer.
      Single layer: each node sends its output out of the network.
    Tree: ...
    Arbitrary feed: ...
  Cyclic: ...
Undirected: ...

Multilayer, Feed-forward Networks
A kind of neural network in which:
- links are directional and form no cycles (the net is a directed acyclic graph);
- the root nodes of the graph are input units; their activation value is determined by the environment;
- the leaf nodes are output units;
- the remaining nodes are hidden units;
- units can be divided into layers: a unit in a layer is connected only to units in the next layer.

A Two-layer, Feed-forward Network
[Figure: output units O_i at the top, connected by weights W_{j,i} to hidden units a_j, connected by weights W_{k,j} to input units I_k at the bottom.]
Notes: The roots of the graph are at the bottom and the (only) leaf at the top. The layer of input units is generally not counted (which is why this is a two-layer net).

Example
[Figure: a network with input units I_1, I_2, hidden units H_3, H_4, and output unit O_5, with weights w_{1,3}, w_{1,4}, w_{2,3}, w_{2,4}, w_{3,5}, w_{4,5}.]
a_5 = g_5(W_{3,5} a_3 + W_{4,5} a_4)
    = g_5(W_{3,5} g_3(W_{1,3} a_1 + W_{2,3} a_2) + W_{4,5} g_4(W_{1,4} a_1 + W_{2,4} a_2))
where a_i is the output and g_i is the activation function of node i.
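A literal C transcription of this expression; the weight values, inputs, and the identity activation are placeholders chosen only for illustration.

#include <stdio.h>

/* Any activation function could be used here; the identity is just a placeholder. */
static double g(double x) { return x; }

int main(void) {
    /* Illustrative weights and inputs (not from the slides). */
    double w13 = 0.5, w23 = -0.5, w14 = 0.3, w24 = 0.7, w35 = 1.0, w45 = -1.0;
    double a1 = 1.0, a2 = 0.0;

    double a3 = g(w13 * a1 + w23 * a2);            /* hidden unit H_3 */
    double a4 = g(w14 * a1 + w24 * a2);            /* hidden unit H_4 */
    double a5 = g(w35 * a3 + w45 * a4);            /* output unit O_5 */
    printf("a5 = %f\n", a5);
    return 0;
}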

Multilayer, Feed-forward Networks
A powerful computational device:
- with just one hidden layer, they can approximate any continuous function;
- with just two hidden layers, they can approximate any computable function.
However, the number of units needed per layer may grow exponentially with the number of input units.

Perceptrons
Single-layer, feed-forward networks whose units use a step function as the activation function.
[Figure: left, a perceptron network — input units I_j connected by weights W_{j,i} directly to output units O_i; right, a single perceptron — input units I_j connected by weights W_j to one output unit O.]

Perceptrons
Perceptrons caused a great stir when they were invented because it was shown that: if a function is representable by a perceptron, then it is learnable with 100% accuracy, given enough training examples.
The problem is that perceptrons can only represent linearly-separable functions.

Linearly Separable Functions
On a 2-dimensional space:
[Figure: points on the (I_1, I_2) plane for (a) I_1 and I_2, (b) I_1 or I_2, (c) I_1 xor I_2. In (a) and (b) a straight line separates the two output classes; in (c) no such line exists.]
A black dot corresponds to an output value of 1. An empty dot corresponds to an output value of 0.

How to Represent the XOR Function by a NN
[Figure: the same network as before — inputs I_1, I_2, hidden units H_3, H_4, output unit O_5.]
a_5 = g_5(W_{3,5} a_3 + W_{4,5} a_4)
    = g_5(W_{3,5} g_3(W_{1,3} a_1 + W_{2,3} a_2) + W_{4,5} g_4(W_{1,4} a_1 + W_{2,4} a_2))
where a_i is the output and g_i = step_{0.5} is the activation function of node i, W_{1,3} = W_{2,4} = W_{3,5} = W_{4,5} = 1, and W_{1,4} = W_{2,3} = −1.
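A quick C check of this construction; the signs of W_{1,4} and W_{2,3} are reconstructed here as −1, since that is what makes the network compute xor under step_{0.5}.

#include <stdio.h>

static double step05(double x) { return (x >= 0.5) ? 1.0 : 0.0; }

int main(void) {
    double w13 = 1, w24 = 1, w35 = 1, w45 = 1, w14 = -1, w23 = -1;
    for (int a1 = 0; a1 <= 1; a1++)
        for (int a2 = 0; a2 <= 1; a2++) {
            double a3 = step05(w13 * a1 + w23 * a2);   /* fires iff a1=1, a2=0 */
            double a4 = step05(w14 * a1 + w24 * a2);   /* fires iff a1=0, a2=1 */
            double a5 = step05(w35 * a3 + w45 * a4);   /* OR of the two */
            printf("%d xor %d = %.0f\n", a1, a2, a5);
        }
    return 0;
}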

A Linearly Separable Function
On a 3-dimensional space. The minority function: return 1 if the input vector contains fewer ones than zeros; return 0 otherwise.
[Figure: (a) the separating plane in (I_1, I_2, I_3) space; (b) the weights and threshold — W = −1 on each of I_1, I_2, I_3 and threshold t = −1.5.]

Computing with a 2-layer NN

Computing with a 2-layer NN

#define NUM_INPUTS 4
#define NUM_HIDDEN_NEURONS 4
#define NUM_OUTPUT_NEURONS 3

typedef struct {
    /* Inputs to the MLP (+1 for bias) */
    double inputs[NUM_INPUTS+1];
    /* Weights from Hidden to Input Layer (+1 for bias) */
    double w_h_i[NUM_HIDDEN_NEURONS+1][NUM_INPUTS+1];
    /* Hidden layer */
    double hidden[NUM_HIDDEN_NEURONS+1];
    /* Weights from Output to Hidden Layer (+1 for bias) */
    double w_o_h[NUM_OUTPUT_NEURONS][NUM_HIDDEN_NEURONS+1];
    /* Outputs of the MLP */
    double outputs[NUM_OUTPUT_NEURONS];
} mlp_t;

double step(double x) {
    if (x > 0.0) return 1.0;
    else return 0.0;
}

Computing with a 2-layer NN

void feed_forward(mlp_t *mlp) {
    int i, h, out;

    /* Feed the inputs to the hidden layer through the hidden-to-input weights. */
    for (h = 0; h < NUM_HIDDEN_NEURONS; h++) {
        mlp->hidden[h] = 0.0;
        for (i = 0; i < NUM_INPUTS+1; i++) {
            mlp->hidden[h] += (mlp->inputs[i] * mlp->w_h_i[h][i]);
        }
        mlp->hidden[h] = step(mlp->hidden[h]);
    }

    /* Feed the hidden-layer activations to the output layer
     * through the output-to-hidden weights. */
    for (out = 0; out < NUM_OUTPUT_NEURONS; out++) {
        mlp->outputs[out] = 0.0;
        for (h = 0; h < NUM_HIDDEN_NEURONS; h++) {
            mlp->outputs[out] += (mlp->hidden[h] * mlp->w_o_h[out][h]);
        }
        mlp->outputs[out] = step(mlp->outputs[out]);
    }
}
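A minimal usage sketch of the struct and function above; the weight values and input vector below are invented for illustration and are not specified in the slides.

#include <stdio.h>
#include <string.h>

int main(void) {
    mlp_t mlp;
    memset(&mlp, 0, sizeof(mlp));

    /* Example input vector; the extra slot is the bias input, fixed at 1.0. */
    double in[NUM_INPUTS+1] = {1.0, 0.0, 1.0, 0.0, 1.0};
    memcpy(mlp.inputs, in, sizeof(in));

    /* Fill the weight matrices with some made-up values. */
    for (int h = 0; h < NUM_HIDDEN_NEURONS; h++)
        for (int i = 0; i < NUM_INPUTS+1; i++)
            mlp.w_h_i[h][i] = (h + i) % 2 ? 0.5 : -0.5;
    for (int o = 0; o < NUM_OUTPUT_NEURONS; o++)
        for (int h = 0; h < NUM_HIDDEN_NEURONS; h++)
            mlp.w_o_h[o][h] = (o == h % NUM_OUTPUT_NEURONS) ? 1.0 : 0.0;

    feed_forward(&mlp);
    for (int o = 0; o < NUM_OUTPUT_NEURONS; o++)
        printf("output[%d] = %.0f\n", o, mlp.outputs[o]);
    return 0;
}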

Applications of Neural Networks
Signal and Image Processing
- Signal prediction (e.g., weather prediction)
- Adaptive noise cancellation
- Satellite image analysis
- Multimedia processing
Bioinformatics
- Functional classification of proteins and genes
- Clustering of genes based on DNA microarray data

Applications of Neural Networks
Astronomy
- Classification of objects (stars and galaxies)
- Compression of astronomical data
Finance and Marketing
- Stock market prediction
- Fraud detection
- Loan approval
- Product bundling
- Strategic planning

Computing with NNs
Different functions are implemented by different network topologies and unit weights. The lure of NNs is that a network need not be explicitly programmed to compute a certain function f. Given enough nodes and links, a NN can learn the function by itself. It does so by looking at a training set of input/output pairs for f and modifying its topology and weights so that its own input/output behavior agrees with the training pairs. In other words, NNs learn by induction, too.

Learning = Training in NN
Neural networks are trained using data referred to as a training set. The process is one of computing outputs, comparing the outputs with the desired answers, adjusting the weights, and repeating.
The information of a neural network is in its structure, activation functions, and weights. Learning to use different structures and activation functions is very difficult, so learning concentrates on the weights. These weights express the relative strength of an input value or of a value from a connecting unit (i.e., in another layer). It is by adjusting these weights that a neural network learns.

Process for Developing a NN
1. Collect data: ensure that the application is amenable to a NN approach and pick data randomly.
2. Separate the data into a training set and a test set.
3. Define a network structure: are perceptrons sufficient?
4. Select a learning algorithm: often decided by the available tools.
5. Set parameter values: they will affect the length of the training period.
6. Training: determine and revise the weights.
7. Test: if the results are not acceptable, go back to steps 1, 2, ..., or 5.
8. Delivery of the product.

The Perceptron Learning Method
Weight updating in perceptrons is very simple because each output node is independent of the other output nodes.
[Figure: left, a perceptron network with input units I_j, weights W_{j,i}, and output units O_i; right, a single perceptron with input units I_j, weights W_j, and one output unit O.]
With no loss of generality, then, we can consider a perceptron with a single output node.

Normalizing Unit Thresholds
Notice that, if t is the threshold value of the output unit, then
    step_t(Σ_{j=1}^{n} W_j I_j) = step_0(Σ_{j=0}^{n} W_j I_j)
where W_0 = −t and I_0 = 1.
Therefore, we can always assume that the unit's threshold is 0 if we include the actual threshold as the weight of an extra link with a fixed input value. This allows thresholds to be learned like any other weight. Then, we can even allow output values in [0, 1] by replacing step_0 by the sigmoid function.

The Perceptron Learning Method
If O is the value returned by the output unit for a given example and T is the expected output, then the unit's error is
    Err = T − O
If the error Err is positive, we need to increase O; otherwise, we need to decrease O.

The Perceptron Learning Method
Since O = g(Σ_{j=0}^{n} W_j I_j), we can change O by changing each W_j.
Assuming g is monotonic, to increase O we should increase W_j if I_j is positive, and decrease W_j if I_j is negative. Similarly, to decrease O we should decrease W_j if I_j is positive, and increase W_j if I_j is negative.
This is done by updating each W_j as follows:
    W_j ← W_j + α × I_j × Err
where α is a positive constant, the learning rate, and Err = T − O.

Theoretic Background
Learn by adjusting weights to reduce the error on the training set. The squared error for an example with input x and true output y is
    E = (1/2) Err² = (1/2) (y − h_W(x))²
Perform optimization search by gradient descent:
    ∂E/∂W_j = Err × ∂Err/∂W_j = Err × ∂/∂W_j (y − g(Σ_{j=0}^{n} W_j x_j)) = −Err × g′(in) × x_j

Weight update rule
    W_j ← W_j − α ∂E/∂W_j = W_j + α × Err × g′(in) × x_j
E.g., positive error ⇒ increase the network output ⇒ increase the weights on positive inputs and decrease them on negative inputs.
Simple weight update rule (assuming g′(in) is constant):
    W_j ← W_j + α × Err × x_j
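Below is a minimal C sketch of this simple update rule applied to a single perceptron; the training data (a 3-input majority function used as an arbitrary target), the learning rate, and the epoch count are all invented for the illustration.

#include <stdio.h>

#define N_IN 3      /* inputs, plus an extra bias input x[0] = 1 */
#define N_EX 8

static double step0(double x) { return (x >= 0.0) ? 1.0 : 0.0; }

int main(void) {
    /* Truth table of the 3-input majority function (an arbitrary example target). */
    double X[N_EX][N_IN+1], T[N_EX];
    for (int e = 0; e < N_EX; e++) {
        X[e][0] = 1.0;                               /* bias input */
        X[e][1] = (e >> 2) & 1; X[e][2] = (e >> 1) & 1; X[e][3] = e & 1;
        T[e] = (X[e][1] + X[e][2] + X[e][3] >= 2.0) ? 1.0 : 0.0;
    }

    double W[N_IN+1] = {0, 0, 0, 0};
    double alpha = 0.25;

    for (int epoch = 0; epoch < 100; epoch++) {
        for (int e = 0; e < N_EX; e++) {
            double in = 0.0;
            for (int j = 0; j <= N_IN; j++) in += W[j] * X[e][j];
            double err = T[e] - step0(in);           /* Err = T - O */
            for (int j = 0; j <= N_IN; j++)          /* W_j <- W_j + alpha * Err * x_j */
                W[j] += alpha * err * X[e][j];
        }
    }
    printf("learned weights: %.2f %.2f %.2f %.2f\n", W[0], W[1], W[2], W[3]);
    return 0;
}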

A 5-place Minority Function
At first, collect the data (see below), then choose a structure (a perceptron with five inputs and one output) and the activation function (i.e., step_3). Finally, set the parameters (i.e., W_i = 0) and start to learn. Assuming α = 1, we have
    Sum = Σ_{i=1}^{5} W_i I_i,  Out = step_3(Sum),  Err = T − Out,  and  W_j ← W_j + I_j × Err.

      I_1 I_2 I_3 I_4 I_5 | T | W_1 W_2 W_3 W_4 W_5 | Sum | Out | Err
e_1    1   0   0   0   1  | 1 |  0   0   0   0   0  |     |     |
e_2    1   1   0   1   0  | 0 |                     |     |     |
e_3    0   0   0   1   1  | 1 |                     |     |     |
e_4    1   1   1   1   0  | 0 |                     |     |     |
e_5    0   1   0   1   0  | 1 |                     |     |     |
e_6    0   1   1   1   1  | 0 |                     |     |     |
e_7    0   1   0   1   0  | 1 |                     |     |     |
e_8    1   0   1   0   0  | 1 |                     |     |     |

(The blank entries are to be filled in as the learning procedure runs.)

A 5-place Minority Function
The same as the last example, except that α = 0.5 instead of α = 1, and the initial weights are W_i = 1.
    Sum = Σ_{i=1}^{5} W_i I_i,  Out = step_3(Sum),  Err = T − Out,  and  W_j ← W_j + α × I_j × Err.

      I_1 I_2 I_3 I_4 I_5 | T | W_1 W_2 W_3 W_4 W_5 | Sum | Out | Err
e_1    1   0   0   0   1  | 1 |  1   1   1   1   1  |     |     |
e_2    1   1   0   1   0  | 0 |                     |     |     |
e_3    0   0   0   1   1  | 1 |                     |     |     |
e_4    1   1   1   1   0  | 0 |                     |     |     |
e_5    0   1   0   1   0  | 1 |                     |     |     |
e_6    0   1   1   1   1  | 0 |                     |     |     |
e_7    0   1   0   1   0  | 1 |                     |     |     |
e_8    1   0   1   0   0  | 1 |                     |     |     |
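For readers who want to check their hand computations, here is a small C sketch that runs one pass of the perceptron update over these eight examples, following the formulas above; the interpretation of the update rule (including the learning rate) is the one stated for this second exercise.

#include <stdio.h>

#define N 5   /* five inputs */
#define M 8   /* eight training examples */

static double step_t(double x, double t) { return (x >= t) ? 1.0 : 0.0; }

int main(void) {
    /* Training examples from the table: inputs I_1..I_5 and target T. */
    double I[M][N] = {
        {1,0,0,0,1}, {1,1,0,1,0}, {0,0,0,1,1}, {1,1,1,1,0},
        {0,1,0,1,0}, {0,1,1,1,1}, {0,1,0,1,0}, {1,0,1,0,0}
    };
    double T[M] = {1, 0, 1, 0, 1, 0, 1, 1};

    double W[N] = {1, 1, 1, 1, 1};   /* initial weights of the second exercise */
    double alpha = 0.5;

    for (int e = 0; e < M; e++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++) sum += W[j] * I[e][j];
        double out = step_t(sum, 3.0);
        double err = T[e] - out;
        for (int j = 0; j < N; j++) W[j] += alpha * I[e][j] * err;
        printf("e%d: Sum=%.1f Out=%.0f Err=%+.0f  W = %.1f %.1f %.1f %.1f %.1f\n",
               e + 1, sum, out, err, W[0], W[1], W[2], W[3], W[4]);
    }
    return 0;
}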

Multilayer Perceptrons (MLP)
Layers are usually fully connected; the number of hidden units is typically chosen by hand.
[Figure: output units a_i, weights W_{j,i}, hidden units a_j, weights W_{k,j}, input units a_k.]
All continuous functions can be represented with 2 layers, all functions with 3 layers.

Sigmoid Function in MLP
    g(x) = 1 / (1 + e^(−cx))
As c gets bigger, the curve becomes sharper.

Sigmoid Function in Perceptron
    g(x) = 1 / (1 + e^(−cx))
Only linearly-separable functions can be represented.

Sigmoid Function in Two-Layer MLP
    g(x) = 1 / (1 + e^(−cx))
Two hidden nodes produce ridge-like functions.
[Plot: h_W(x_1, x_2) over x_1, x_2 ∈ [−4, 4], forming a ridge rising from about 0 to 0.9.]

Sigmoid Function in Two-Layer MLP
    g(x) = 1 / (1 + e^(−cx))
Four hidden nodes produce bump-like functions.
[Plot: h_W(x_1, x_2) over x_1, x_2 ∈ [−4, 4], forming a localized bump rising to about 1.]

Errors with Sigmoid Functions
    ∂E/∂W_j = −Err × g′(in) × x_j
    W_j ← W_j − α ∂E/∂W_j = W_j + α × Err × g′(in) × x_j
Assuming g(x) = 1 / (1 + e^(−x)),
    g′(x) = e^(−x) / (1 + e^(−x))² = g(x)(1 − g(x))
Eq (8.8) on page 267 of the textbook, g′(u) = u(1 − u), is a typo.
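In code, this convenient form of the derivative means the gradient can be computed from the unit's output alone; a small C sketch:

#include <math.h>

/* Logistic sigmoid. */
double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* Derivative expressed via the function value: g'(x) = g(x) * (1 - g(x)). */
double sigmoid_prime(double x) {
    double g = sigmoid(x);
    return g * (1.0 - g);
}

/* If the output a = g(in) has already been computed, the derivative is just a * (1 - a). */
double sigmoid_prime_from_output(double a) { return a * (1.0 - a); }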

Back-propagation Learning
1. Phase 1: Propagation
   (a) Forward propagation of a training example's input to get the output O.
   (b) Backward propagation of the output error to generate the deltas of all neural nodes (neurons).
2. Phase 2: Weight update
   (a) Multiply a weight's output delta and input activation to get the gradient of the weight.
   (b) Bring the weight in the opposite direction of the gradient by subtracting a ratio of it from the weight.
(Most neuroscientists deny that back-propagation occurs in the brain.)

Back-propagation Learning
Output layer: similar to the single-layer perceptron; let
    Δ_i = Err_i × g′(in_i)   (called E_O in Eq 8.6)
    W_{j,i} ← W_{j,i} + α × a_j × Δ_i
Hidden layer: back-propagate the error from the output layer:
    Δ_j = (Σ_i W_{j,i} Δ_i) × g′(in_j)   (called E_h in Eq 8.7, which has typos)
The update rule is identical:
    W_{k,j} ← W_{k,j} + α × a_k × Δ_j

Back-propagation Learning

Initialize the weights in the network (often randomly)
while (stopping criterion not met)
    For each example e in the training set
        O = neural-net-output(network, e)        \\ forward phase
        T = teacher output for e
        Calculate err = (T - O) at the output units
        \\ backward phase
        Compute delta_wh for all weights from hidden layer to output layer
        Compute delta_wi for all weights from input layer to hidden layer
        Update the weights in the network
Return the network
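To make the procedure concrete, here is a compact, self-contained C sketch of back-propagation for a tiny 2-2-1 sigmoid network trained on xor; the architecture, learning rate, iteration count, and random initialization are all choices made for this illustration, not taken from the slides.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

int main(void) {
    /* xor training set: inputs x1, x2 and target t. */
    double X[4][2] = {{0,0},{0,1},{1,0},{1,1}};
    double T[4]    = {0, 1, 1, 0};

    /* Weights: input->hidden (plus bias column) and hidden->output (plus bias). */
    double w_ih[2][3], w_ho[3];
    srand(1);
    for (int h = 0; h < 2; h++)
        for (int i = 0; i < 3; i++)
            w_ih[h][i] = (rand() / (double)RAND_MAX) - 0.5;
    for (int h = 0; h < 3; h++)
        w_ho[h] = (rand() / (double)RAND_MAX) - 0.5;

    double alpha = 0.5;
    for (int iter = 0; iter < 20000; iter++) {
        int e = iter % 4;
        /* Forward phase. */
        double a[2];
        for (int h = 0; h < 2; h++)
            a[h] = sigmoid(w_ih[h][0]*X[e][0] + w_ih[h][1]*X[e][1] + w_ih[h][2]);
        double out = sigmoid(w_ho[0]*a[0] + w_ho[1]*a[1] + w_ho[2]);

        /* Backward phase: output delta, then hidden deltas. */
        double d_out = (T[e] - out) * out * (1.0 - out);
        double d_hid[2];
        for (int h = 0; h < 2; h++)
            d_hid[h] = w_ho[h] * d_out * a[h] * (1.0 - a[h]);

        /* Weight updates: W <- W + alpha * (upstream activation) * delta. */
        for (int h = 0; h < 2; h++) w_ho[h] += alpha * a[h] * d_out;
        w_ho[2] += alpha * d_out;                      /* bias acts like input 1 */
        for (int h = 0; h < 2; h++) {
            w_ih[h][0] += alpha * X[e][0] * d_hid[h];
            w_ih[h][1] += alpha * X[e][1] * d_hid[h];
            w_ih[h][2] += alpha * d_hid[h];
        }
    }

    /* Note: with an unlucky random initialization a 2-2-1 net can get stuck in a
     * local minimum; rerun with a different seed if the outputs do not approach 0/1. */
    for (int e = 0; e < 4; e++) {
        double a[2];
        for (int h = 0; h < 2; h++)
            a[h] = sigmoid(w_ih[h][0]*X[e][0] + w_ih[h][1]*X[e][1] + w_ih[h][2]);
        double out = sigmoid(w_ho[0]*a[0] + w_ho[1]*a[1] + w_ho[2]);
        printf("%g xor %g -> %.3f\n", X[e][0], X[e][1], out);
    }
    return 0;
}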

Learning XOR Function
Assuming α = 0.5 and g′ = 1:
    S_1 = u·u_1 + v·v_1,  a_1 = step_{0.5}(S_1)
    S_2 = u·u_2 + v·v_2,  a_2 = step_{0.5}(S_2)
    S_3 = a_1·w_1 + a_2·w_2,  Out = step_{0.5}(S_3)
    Err = T − Out
    w_j ← w_j + α·a_j·Err for j = 1, 2
    u_i ← u_i + α·u·w_i·Err and v_i ← v_i + α·v·w_i·Err, for i = 1, 2

Learning XOR Function
Assuming α = 0.5 and g′ = 1:
    S_1 = u·u_1 + v·v_1,  a_1 = step_{0.5}(S_1)
    S_2 = u·u_2 + v·v_2,  a_2 = step_{0.5}(S_2)
    S_3 = a_1·w_1 + a_2·w_2,  Out = step_{0.5}(S_3)
    Err = T − Out
    w_j ← w_j + α·a_j·Err for j = 1, 2
    u_i ← u_i + α·u·w_i·Err and v_i ← v_i + α·v·w_i·Err, for i = 1, 2

      u  v | T | u_1 u_2 v_1 v_2 w_1 w_2 | S_1 | a_1 | S_2 | a_2 | S_3 | Out | Err
e_1   0  0 | 0 |  1   1   1   1   1   1  |     |     |     |     |     |     |
e_2   0  1 | 1 |                         |     |     |     |     |     |     |
e_3   1  0 | 1 |                         |     |     |     |     |     |     |
e_4   1  1 | 0 |                         |     |     |     |     |     |     |
e_1   0  0 | 0 |                         |     |     |     |     |     |     |

Handwritten Digits
[Figure: idealized bitmap images of the ten digits.]
The neural network has 35 input nodes for such an image. The NN is trained on these perfect examples many times.

Handwritten Digits
The neural network has 10 hidden nodes, fully connected to all the input nodes (350 edges), and fully connected to all 10 output nodes (100 edges). Each output node represents one digit. The final result is decided by the maximum output node (winner takes all).

Neural Network Summary
Advantages
- Easy to adapt to unknown situations
- Robustness: fault tolerance due to network redundancy
- Autonomous learning and generalization
Disadvantages
- Poor accuracy
- Large complexity of the network structure
- Over-fitting to training examples (cannot generalize well)
- The solution is a black box (no insight into the problem)

Nearest Neighbor Classification
- Compute the distances from the input to all the examples;
- choose the k examples which are nearest neighbors of the input;
- decide by the majority class of these k neighbors.
Question: can this technique of classification be regarded as a neural network?
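A minimal C sketch of this procedure for two classes, using Euclidean distance; the data layout, constants, and helper names are mine.

#include <stdio.h>
#include <math.h>

#define DIM 2     /* feature dimension (example value) */
#define M   6     /* number of stored examples */
#define K   3     /* number of neighbors */

int knn_classify(const double ex[M][DIM], const int label[M], const double query[DIM]) {
    double d[M];
    int used[M] = {0}, votes[2] = {0, 0};

    for (int i = 0; i < M; i++) {               /* distances to all examples */
        double s = 0.0;
        for (int j = 0; j < DIM; j++) s += (ex[i][j] - query[j]) * (ex[i][j] - query[j]);
        d[i] = sqrt(s);
    }
    for (int n = 0; n < K; n++) {               /* pick the K nearest, one at a time */
        int best = -1;
        for (int i = 0; i < M; i++)
            if (!used[i] && (best < 0 || d[i] < d[best])) best = i;
        used[best] = 1;
        votes[label[best]]++;
    }
    return (votes[1] > votes[0]) ? 1 : 0;       /* majority class */
}

int main(void) {
    double ex[M][DIM] = {{0,0},{0,1},{1,0},{5,5},{5,6},{6,5}};
    int label[M]      = { 0,    0,    0,    1,    1,    1  };
    double q[DIM] = {0.6, 0.4};
    printf("class = %d\n", knn_classify(ex, label, q));
    return 0;
}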

Probabilistic Neural Network (PNN)
- Compute the distances from the input to all the examples;
- accumulate the normalized distances for each class of examples;
- decide on the class with the largest accumulated value (winner takes all).

Probabilistic Neural Network (PNN)
Compute the distances from the input to all the examples:
    h_i = E_i · F = Σ_{j=1}^{n} e_{ij} a_j   (Eq 8.13)
where E_i = (e_{i1}, e_{i2}, ..., e_{in}) is an example and F = (a_1, a_2, ..., a_n) is the input. Some people also use the Euclidean distance: h_i = √(Σ_{j=1}^{n} (e_{ij} − a_j)²).
Accumulate the normalized distances for each class of examples: initially c_j := 0, then
    c_j += e^{(h_i − 1)/γ²} / N_j   for each example E_i of class j,
where c_j is the accumulated normalized distance for class j, N_j is the number of examples of class j, and γ is the smoothing factor. This normalized distance is called a normalized Radial Basis Function (RBF) in probability theory (a kind of probability density). Other normalized distances are also used in practice.
Decide on the class with the largest accumulated value (winner takes all).
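A small C sketch of this scoring scheme using the dot-product form of h_i; the example data, the value of γ, and the class layout are invented for illustration.

#include <stdio.h>
#include <math.h>

#define N_FEAT  3    /* feature dimension (example value) */
#define N_EX    4    /* stored examples */
#define N_CLASS 2

int main(void) {
    /* Stored examples E_i, their classes, and a query input F (all made up). */
    double E[N_EX][N_FEAT] = {{1,0,0},{0.9,0.1,0},{0,1,0},{0,0.8,0.2}};
    int cls[N_EX] = {0, 0, 1, 1};
    double F[N_FEAT] = {0.95, 0.05, 0.0};
    double gamma = 0.3;

    double c[N_CLASS] = {0.0, 0.0};
    int    n[N_CLASS] = {0, 0};
    for (int i = 0; i < N_EX; i++) n[cls[i]]++;

    for (int i = 0; i < N_EX; i++) {
        double h = 0.0;                          /* h_i = E_i . F */
        for (int j = 0; j < N_FEAT; j++) h += E[i][j] * F[j];
        c[cls[i]] += exp((h - 1.0) / (gamma * gamma)) / n[cls[i]];
    }

    int winner = (c[1] > c[0]) ? 1 : 0;          /* winner takes all */
    printf("c[0]=%.4f c[1]=%.4f -> class %d\n", c[0], c[1], winner);
    return 0;
}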

Summary
- Learning is needed for unknown environments (and for lazy designers).
- The learning method depends on the type of performance element, the available feedback, the type of component to be improved, and its representation.
- For supervised learning, the aim is to find a simple hypothesis approximately consistent with the training examples.
- Learning performance = prediction accuracy measured on a test set.
- Many applications: speech, driving, handwriting, credit cards, etc.