AI Programming CS F-20 Neural Networks

Similar documents
Lecture 4: Perceptrons and Multilayer Perceptrons

Neural Networks. Chapter 18, Section 7. TB Artificial Intelligence. Slides from AIMA 1/ 21

Neural networks. Chapter 19, Sections 1 5 1

CS:4420 Artificial Intelligence

Neural networks. Chapter 20, Section 5 1

Neural networks. Chapter 20. Chapter 20 1

Data Mining Part 5. Prediction

Last update: October 26, Neural networks. CMSC 421: Section Dana Nau

Lecture 5: Logistic Regression. Neural Networks

Artifical Neural Networks

Machine Learning. Neural Networks

Neural Networks. Nicholas Ruozzi University of Texas at Dallas

Neural Networks biological neuron artificial neuron 1

y(x n, w) t n 2. (1)

Neural Networks and the Back-propagation Algorithm

Introduction to Natural Computation. Lecture 9. Multilayer Perceptrons and Backpropagation. Peter Lewis

ARTIFICIAL NEURAL NETWORK PART I HANIEH BORHANAZAD

Course 395: Machine Learning - Lectures

Neural Nets Supervised learning

Artificial Neural Networks

Artificial neural networks

Artificial Neural Networks Examination, June 2005

Artificial Intelligence

Artificial Neural Networks Examination, March 2004

CSC321 Lecture 5: Multilayer Perceptrons

Artificial Neural Networks

Artificial Neural Networks" and Nonparametric Methods" CMPSCI 383 Nov 17, 2011!

2015 Todd Neller. A.I.M.A. text figures 1995 Prentice Hall. Used by permission. Neural Networks. Todd W. Neller

SPSS, University of Texas at Arlington. Topics in Machine Learning-EE 5359 Neural Networks

18.6 Regression and Classification with Linear Models

Input layer. Weight matrix [ ] Output layer

22c145-Fall 01: Neural Networks. Neural Networks. Readings: Chapter 19 of Russell & Norvig. Cesare Tinelli 1

Unit III. A Survey of Neural Network Model

Multilayer Neural Networks

CMSC 421: Neural Computation. Applications of Neural Networks

Multilayer Neural Networks

Multilayer Perceptrons and Backpropagation

AN INTRODUCTION TO NEURAL NETWORKS. Scott Kuindersma November 12, 2009

4. Multilayer Perceptrons

Neural Networks (Part 1) Goals for the lecture

Nonlinear Classification

Introduction Neural Networks - Architecture Network Training Small Example - ZIP Codes Summary. Neural Networks - I. Henrik I Christensen

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions

Neural Networks. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Neural Networks DWML, /25

COMP-4360 Machine Learning Neural Networks

Sections 18.6 and 18.7 Analysis of Artificial Neural Networks

Machine Learning

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

CSC242: Intro to AI. Lecture 21

ARTIFICIAL INTELLIGENCE. Artificial Neural Networks

Artificial Neural Networks

Machine Learning

Multilayer Perceptrons (MLPs)

From perceptrons to word embeddings. Simon Šuster University of Groningen

Back-Propagation Algorithm. Perceptron Gradient Descent Multilayered neural network Back-Propagation More on Back-Propagation Examples

Feedforward Neural Nets and Backpropagation

COMP 551 Applied Machine Learning Lecture 14: Neural Networks

Lecture 7 Artificial neural networks: Supervised learning

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)

Artificial Neural Networks. MGS Lecture 2

) (d o f. For the previous layer in a neural network (just the rightmost layer if a single neuron), the required update equation is: 2.

Multilayer Neural Networks. (sometimes called Multilayer Perceptrons or MLPs)

Comments. Assignment 3 code released. Thought questions 3 due this week. Mini-project: hopefully you have started. implement classification algorithms

Sections 18.6 and 18.7 Artificial Neural Networks

Artificial Neural Networks

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Artificial Neural Networks. Edward Gatt

Sections 18.6 and 18.7 Artificial Neural Networks

(Feed-Forward) Neural Networks Dr. Hajira Jabeen, Prof. Jens Lehmann

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Artificial Neural Networks Examination, June 2004

Artificial Neural Network

CSE 190 Fall 2015 Midterm DO NOT TURN THIS PAGE UNTIL YOU ARE TOLD TO START!!!!

Supervised Learning in Neural Networks

Pattern Classification

Multilayer Perceptron

Grundlagen der Künstlichen Intelligenz

Unit 8: Introduction to neural networks. Perceptrons

1 What a Neural Network Computes

C4 Phenomenological Modeling - Regression & Neural Networks : Computational Modeling and Simulation Instructor: Linwei Wang

Neural Networks. Xiaojin Zhu Computer Sciences Department University of Wisconsin, Madison. slide 1

Introduction to Artificial Neural Networks

CSE 352 (AI) LECTURE NOTES Professor Anita Wasilewska. NEURAL NETWORKS Learning

Machine Learning and Data Mining. Multi-layer Perceptrons & Neural Networks: Basics. Prof. Alexander Ihler

Machine Learning (CSE 446): Neural Networks

Classification goals: Make 1 guess about the label (Top-1 error) Make 5 guesses about the label (Top-5 error) No Bounding Box

Revision: Neural Network

Using a Hopfield Network: A Nuts and Bolts Approach

CSC 411 Lecture 10: Neural Networks

ECE 471/571 - Lecture 17. Types of NN. History. Back Propagation. Recurrent (feedback during operation) Feedforward

100 inference steps doesn't seem like enough. Many neuron-like threshold switching units. Many weighted interconnections among units

Pattern Recognition Prof. P. S. Sastry Department of Electronics and Communication Engineering Indian Institute of Science, Bangalore

Introduction to Neural Networks

Simple Neural Nets For Pattern Classification

CS 4700: Foundations of Artificial Intelligence

Lab 5: 16 th April Exercises on Neural Networks

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Neural Networks, Computation Graphs. CMSC 470 Marine Carpuat

Neural Networks: Basics. Darrell Whitley Colorado State University

Transcription:

AI Programming CS662-2008F-20 Neural Networks David Galles Department of Computer Science University of San Francisco

20-0: Symbolic AI Most of this class has been focused on Symbolic AI Focus or symbols and relationships between them Search, logic, decision trees, etc. Assumption: Key requirement for intelligent behavior is the manipulation of symbols Neural networks are a little different: subsymbolic behavior

etirdned amosro ydoblec suelcun noxa lec rehtona morfnoxa noitazirobralanoxa sespanys 20-1: Biological Neurons espanys

20-2: Biological Neurons Biological neurons transmit (more-or-less) electrical signals Each Nerve cell is connected to the outputs of several other neurons When there is a sufficient total input from all inputs, the cell fires, and sends ouputs to other cells Extreme oversimplification, simple version is the model for Artificial Neural Networks

20-3: Artificial Neural Networks -1 Bias weight w 0 w 1 Ouput puts w 2 Σ w 3 Input weights Threshold Function or Activation Function

20-4: Activation Function Neurons are mostly binary Fire, or don t fire Not fire at 37% Model this with an activation function Step Function Sigmoid function: f (x)= 1 1+e x Talk about why the sigmoid function can be better than the step function when we do training

20-5: Single Neuron NNs = 1 = 1 w = -1.5 0 Σ 1 w = 1 1 w = 1 2 w = -0.5 0 Σ 1 w = -1 1 w = 0.5 0 Σ A single Neuron can compute a simple function What functions do each of these neurons compute?

20-6: Single Neuron NNs w = -1.5 0 1 w = -0.5 0 1 w = 0.5 0 = 1 = 1 Σ w = 1 1 w = 1 2 Σ w = -1 1 Σ AND OR NOT A single Neuron can compute a simple function What functions do each of these neurons compute?

20-7: Neural Networks Of course, things get more fun when we connect individual neurons together into a network Ouputs of some neurons feed into the inputs of other neurons Add special input nodes, used for input

20-8: Neural Networks Hidden Nodes Input Nodes Σ Σ Output Nodes Σ Σ Σ Σ Σ Σ

20-9: Neural Networks Feed Forward Networks No cycles, signals flow in one direction Recurrent Networks Cycles in signal propagation Much more complicated: Need to deal with time, learning is much harder We will focus on Feed Forward networks

20-10: Function Approximators Feed Forward Neural Networks are Nonlinear Function Approximators Output of the network is a function of its inputs Activation function is non-linear, allows for representation of non-linear functions Adjust weights, change function Neural Networks are used to efficiently approximate complex functions

20-11: Classification Common use for Neural Networks is classification We ve already seen classification with decision trees and naive bayes Map inputs into one or more outputs Output range is split into discrete classes (like spam and not spam Useful for learning tasks where what to look for is unknown Face recognition Handwriting recognition

20-12: Perceptrons Feed Forward Single-layer network Each input is directly connected to one or more outputs

20-13: Perceptrons Input Units Weights Output Units

20-14: Perceptrons For each perceptron: Threshold firing function is used (not sigmoid) Output function o: o(x 1,... x n )=1if w o + w 1 x 1 + w 2 x 2 +...+w n x n > 0 o(x 1,... x n )=0if w o + w 1 x 1 + w 2 x 2 +...+w n x n 0

20-15: Perceptrons Since each perceptron is independent of others, we can examine each in isolation Output of a single perceptrion: nj=1 W j x j > 0 (or, W x>0) Perceptrons can represent any linearly separable function Perceptrons can only represent linearly separable functions

20-16: Linearly Seperable 2 X 2 1 0 1 X 1 0 0 1 X 1 Function: X1 or X2 Function: X1 and X2

20-17: Linearly Seperable 2 X 2 1 1 0 0 1 X 1 0 0 1 X 1 Function: X1 or X2 Function: X1 and X2

20-18: Linearly Seperable X 2 1 0 0 1 X 1 Function: X1 xor X2

20-19: Perceptron Learning inptuts : in1, in2,..., inj weights : w1, w2,... wn training examples: t1 = (tin1, to1), t2 = (tin2, to2),... do for t in training examples inputs = tin o = compute output with current weights E = to - o for eacxh wi in weight wi = wi + alpha * tin[i] * E while notconverged

20-20: Perceptron Learning If the output signal is too high, weights need to be reduced Turn down weights that contributed to output Weights with zero input are not affected If output is too low, weights need to be increased Turn up weights that contribute to output Zero-input weights not affected Doing a hill-climbing search through weight space

20-21: Perceptron Example Learn the majority function with 3 inputs (plus bias input) out = 1 if j w j in j > 0, 0 otherwise α=0.2 Initially, all weights 0

20-22: Perceptron Example bias inputs expected out 1 1 0 0 0 1 0 1 1 1 1 0 1 0 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0

20-23: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 1 0 1 0 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0

20-24: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0.2 0 0.2 0.2 1 0 1 0 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0

20-25: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0.2 0 0.2 0.2 1 0 1 0 0 0.2 0 0.2 0.2 1 0 0 0 0.2 1 1 1 1 1 1 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0

20-26: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0.2 0 0.2 0.2 1 0 1 0 0 0.2 0 0.2 0.2 1 0 0 0 0.2 1 1 1 1 1 0 0 0 0.2 1 0 0 0 0.2 1 0 0 1 0 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0

20-27: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0.2 0 0.2 0.2 1 0 1 0 0 0.2 0 0.2 0.2 1 0 0 0 0.2 1 1 1 1 1 0 0 0 0.2 1 0 0 0 0.2 1 0 0 1 0 0 0 0 0.2 1-0.2 0 0 0 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0

20-28: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0.2 0 0.2 0.2 1 0 1 0 0 0.2 0 0.2 0.2 1 0 0 0 0.2 1 1 1 1 1 0 0 0 0.2 1 0 0 0 0.2 1 0 0 1 0 0 0 0 0.2 1-0.2 0 0 0 1 1 0 1 1-0.2 0 0 0 0 0 0.2 0 0.2 1 1 1 0 1 1 0 0 0 0

20-29: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0.2 0 0.2 0.2 1 0 1 0 0 0.2 0 0.2 0.2 1 0 0 0 0.2 1 1 1 1 1 0 0 0 0.2 1 0 0 0 0.2 1 0 0 1 0 0 0 0 0.2 1-0.2 0 0 0 1 1 0 1 1-0.2 0 0 0 0 0 0.2 0 0.2 1 1 1 0 1 0 0.2 0 0.2 1 0 0.2 0 0.2 1 0 0 0 0

20-30: Perceptron Example bias inputs expected w0 w1 w2 w3 actual new weights out out 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0.2 0 0.2 0.2 1 0 1 0 0 0.2 0 0.2 0.2 1 0 0 0 0.2 1 1 1 1 1 0 0 0 0.2 1 0 0 0 0.2 1 0 0 1 0 0 0 0 0.2 1-0.2 0 0 0 1 1 0 1 1-0.2 0 0 0 0 0 0.2 0 0.2 1 1 1 0 1 0 0.2 0 0.2 1 0 0.2 0 0.2 1 0 0 0 0 0 0.2 0 0.2 0 0 0.2 0 0.2 Still hasn t converged, need more iterations

20-31: Perceptron Example After 3 more iterations (of all weights): bias inputs expected w0 w1 w2 w3 actual out out 1 1 0 0 0-0.4 0.2 0.4 0.4 0 1 0 1 1 1-0.4 0.2 0.4 0.4 1 1 0 1 0 0-0.4 0.2 0.4 0.4 0 1 1 1 1 1-0.4 0.2 0.4 0.4 1 1 0 0 1 0-0.4 0.2 0.4 0.4 0 1 1 0 1 1-0.4 0.2 0.4 0.4 1 1 1 1 0 1-0.4 0.2 0.4 0.4 1 1 0 0 0 0-0.4 0.2 0.4 0.4 0

20-32: Gradient Descent & Delta Rule What if we can t learn the function exactly? Function is not linearly seperable Want to do as well as possible Minimize the sum of the squared error E= (t d o d ) 2 for d in the training set

20-33: Gradient Descent & Delta Rule Searching through a space of weights Much like local search, define an error E as a function of the weights Find values of weights to minimize E Follow the gradient largest negative change in E Alas, E is discontinuous, hard to differentiate Instead of using actual output, use unthresholded output

20-34: Gradient Descent & Delta Rule Garient descent: follow the steepest slope down the error surface Consider the derivative of E with respect to each weight E= (t d o d ) 2 for d in the training set de dw i = 2(t d o d ) d(t d o d ) dw i First, we will simplify for looking at a single training data point (then we can sum over all of them, since the derivative of a sum is the sum of the derivatives)

20-35: Gradient Descent & Delta Rule For a single training example: de = d(t d o d ) 2 dw i dw i = 2(t d o d ) d(t d o d ) dw i since d( f (x)2 ) dx = 2 f (x) d( f (x)) dx

20-36: Gradient Descent & Delta Rule For a single training example: de = 2(t d o d ) d(t d o d ) dw i dw i = 2(d d o d ) d( x idw i ) dw i t d doesn t involve w i, so d(t t) dw i = 0 o d = w 1 x 1d + w 2 x 2d + w 3 x 3d +..., the only term that involves w i is w i x id

20-37: Gradient Descent & Delta Rule For a single training example: de = 2(d d o d ) d( x idw i ) dw i dw i = 2(t d o d )( x id ) Since d(cx) dx = c

20-38: Gradient Descent & Delta Rule Garient descent: follow the steepest slope down the error surface Consider the derivative of E with respect to each weight E= (t d o d ) 2 for d in the training set de dw i = = 2(t d o d ) d(t d o d ) dw i 2(t d o d )( x id ) Want to go down the gradiant, w i =α d D(t d o d )x id

20-39: Gradient Descent & Delta Rule Garient descent: follow the steepest slope down the error surface Consider the derivative of E with respect to each weight After derivation, updating rule (called the Delta Rule) is: w i =α (t d o d )x id d D D is the training set,αis the training rate, t d is the expected output, and o d is the actual output, x i d is the input along weight w i.

20-40: Incremental Learning Often not practical to compute global weight change for entire training set Instead, update weights incrementally Observe one piece of data, then update Update rule: w i =α(t o)x i Like perceptron learning rule except uses unthresholded output Smaller training rateαtypically used No theoretical guarantees of convergence

20-41: Multilayer Networks While perceptrons have the advantage of a simple learning algorithm, their computational limitations are a problem. What if we add another hidden layer? Computational power increases With one hidden layer, can represent any continuous function With two hidden layers, can represent any function Example: Create a multi-layer network that computes XOR

20-42: XOR 1-1 0.6 0.6 0.5 0.5 1-0.3-0.3 0.4 Input Units 1 --.04 Hidden Units Output Unit

20-43: Multilayer Networks While perceptrons have the advantage of a simple learning algorithm, their computational limitations are a problem. What if we add another hidden layer? Computational power increases With one hidden layer, can represent any continuous function With two hidden layers, can represent any function Problem: How to find the correct weights for hidden nodes?

stinu nedih stinu tupni 20-44: Multilayer Network Example tinutuptuosa i W j,i a j W k,j a k

20-45: More on computing error Backpropagation is an extension of the perceptron learning algorithm to deal with multiple layers of nodes. Our goal is to minimize the error function. To do this, we want to change the weights so as to reduce the error. We define error as a function of the weights like so: E(w) = expected actual E(w) = expected g(input) So, to determine how to change the weights, we compute the derivative of error with respect to the weights. This tells us the slope of the error curve at that

20-46: Backpropagation Nodes use sigmoid activation function, rather than the step function Sigmoid function works in much the same way, but is differentiable. g(input i )= 1 1+e input i. g (input i )=g(input i )(1 g(input i )) (good news here -calculating the derivative only requires knowing the output!)

20-47: More on computing error Recall that our goal is to minimize the error function. To do this, we want to change the weights so as to reduce the error. We define error as a function of the weights like so: E(w) = expected actual E(w) = expected g(input) So, to determine how to change the weights, we compute the derivative of error with respect to the weights. This tells us the slope of the error curve at that point.

20-48: More on computing error E(w) = expected g(input) E(w)=expected g(w i) de dw = 0 g (input)i δ w = de dw = g(input) (1 g(input)) i

20-49: Updating hidden weights Each weight is updated byα i W j,i = W j,i +α a j i

20-50: Backpropagation Updating input-hidden weights: Idea: each hidden node is responsible for a fraction of the error inδ i. Divideδ i according to the strength of the connection between the hidden and output node. For each hidden node j δ j = g(input)(1 g(input)) i outputs W j,i δ i Update rule for input-hidden weights: W k, j = W k, j +α input k δ j

20-51: Backpropagation Algorithm The whole algorithm can be summed up as: While not done: for d in training set Apply inputs of d, propagate forward. for node i in output layer δ i = output (1 output) (t exp output) for each hidden node j δ j = output (1 output) W j,i δ i Adjust each weight W j,i = W j,i +α δ i input j

20-52: Theory vs Practice In the definition of backpropagation, a single update for all weights is computed for all data points at once. Find the update that minimizes total sum of squared error. Guaranteed to converge in this case. Problem: This is often computationally space-intensive. Requires creating a matrix with one row for each data point and inverting it. In practice, updates are done incrementally instead.

20-53: Stopping conditions Unfortunately, incremental updating is not guaranteed to converge. Also, convergence can take a long time. When to stop training? Fixed number of iterations Total error below a set threshold Convergence - no change in weights

20-54: Backpropagation Also works for multiple hidden layers Backpropagation is only guaranteed to converge to a local minimum May not find the absolute best set of weights Low initial weights can help with this Makes the network act more linearly - fewer minima Can also use random restart - train multiple times with different initial weights.

20-55: Momentum Since backpropagation is a hillclimbing algorithm, it is susceptible to getting stuck in plateaus Areas where local weight changes don t produce an improvement in the error function. A common extension to backpropagation is the addition of a momentum term. Carries the algorithm through minima and plateaus. Idea: remember the direction you were going in, and by default keep going that way. Mathematically, this means using the second derivative.

20-56: Momentum Implementing momentum typically means remembering what update was done in the previous iteration. Our update rule becomes: w ji (n)=α j x ji +β w ji (n 1) To consider the effect, imagine that our new delta is zero (we haven t made any improvement) Momentum will keep the weights moving in the same direction. Also gradually increases step size in areas where gradient is unchanging. This speeds up convergence, helps escape plateaus and local minima.

20-57: Design issues One difficulty with neural nets is determining how to encode your problem Inputs must be 1 and 0, or else real-valued numbers. Same for outputs Symbolic variables can be given binary encodings More complex concepts may require care to represent correctly.

20-58: Design issues Like some of the other algorithms we ve studied, neural nets have a number of paramters that must be tuned to get good performance. Number of layers Number of hidden units Learning rate Initial weights Momentum term Training regimen These may require trial and error to determine

20-59: Design issues The more hidden nodes you have, the more complex function you can approximate Is this always a good thing? That is, are more hidden nodes better?

20-60: Overfitting Overfitting Conisder a network with i input nodes, o output nodes, and k hidden nodes Training set has k examples Could end up learning a lookup table

20-61: Overfitting bias bias Training Data 1101 100 0110 010 1111 001 0011 100 0000 010 bias

20-62: Overfitting -2.5 1 1-1 1 1-1 -1 Training Data 1101 100 0110 010 1111 001 0011 100 0000 010

20-63: Overfitting 1 1-1 -1-1.5-1 1-1 Training Data 1101 100 0110 010 1111 001 0011 100 0000 010

20-64: Overfitting 1 1 1 1-3.5-1 -1 1 Training Data 1101 100 0110 010 1111 001 0011 100 0000 010

20-65: Overfitting -1 1-1 -1.5 1-1 Training Data 1101 100 0110 010 1111 001 0011 100 0000 010 1-1

20-66: Overfitting -1-1 -1-1 1 Training Data 1101 100 0110 010 1111 001 0011 100 0000 010-1 -1 0.5

20-67: Number Recognition

20-68: Number Recognition Each pixel is an input unit

20-69: Number Recognition...... Fully Connected...... 0 1 2 3 4 5 6 7 8 9

20-70: Recurrent NNs So far, we ve talked only about feedforward networks. Signals propagate in one direction Output is immediately available Well-understood training algorithms There has also been a great deal of work done on recurrent neural networks. At least some of the outputs are connected back to the inputs.

20-71: Recurrent NNs This is a single-layer recurrent neural network In1 Out1 In2 Out2 In3 Out3

20-72: Hopfield networks A Hopfield network has no special input or output nodes. Every node receives an input and produces an output Every node connected to every other node. Typically, threshold functions are used. Network does not immediately produce an output. Instead, it oscillates Under some easy-to-achieve conditions, the network will eventually stabilize. Weights are found using simulated annealing.

20-73: Hopfield networks Hopfield networks can be used to build an associative memory A portion of a pattern is presented to the network, and the net recalls the entire pattern. Useful for letter recognition Also for optimization problems Often used to model brain activity

20-74: Neural nets - summary Key idea: simple computational units are connected together using weights. Globally complex behavior emerges from their interaction. No direct symbol manipulation Straightforward training methods Useful when a machine that approximates a function is needed No need to understand the learned hypothesis