Artificial Intelligence


1 Artificial Intelligence Jeff Clune Assistant Professor Evolving Artificial Intelligence Laboratory
2 Announcements Keep making progress on your projects!
3 Three Types of Learning: Unsupervised, Supervised, Reinforcement
4 Perceptron Learning Rule. Typically applied one example at a time, choosing samples at random. Guaranteed to find a perfect boundary if the data are linearly separable. [Example run]
5 Perceptron Learning Rule. Not guaranteed to find an optimal solution if the data are not linearly separable: below, it reaches the minimal error many times but keeps changing the weights. [Plot: with non-separable data]
6 Perceptron Learning Rule. Not guaranteed to find an optimal solution if the data are not linearly separable: it reaches the minimal error many times but keeps changing the weights. A decreasing learning rate (setting alpha to 1/numIterations) will find a minimum-error solution. [Plots: with non-separable data; with cooling]
7 Perceptron Learning Rule. [Plots: with linearly separable data; with non-separable data; with cooling]
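The perceptron learning rule on these slides can be sketched in Python. The toy dataset, learning rate, and epoch count below are illustrative assumptions, not from the slides:

```python
import random

def perceptron_train(data, epochs=1000, decay=False):
    """Perceptron learning rule: w <- w + alpha * (y - h(x)) * x,
    applied to one randomly chosen example at a time."""
    n = len(data[0][0])
    w = [0.0] * n
    for t in range(1, epochs + 1):
        x, y = random.choice(data)
        alpha = 1.0 / t if decay else 0.1  # cooling: alpha = 1/numIterations
        # step activation: output 1 if the weighted sum is >= 0
        h = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
        for i in range(n):
            w[i] += alpha * (y - h) * x[i]
    return w

# Linearly separable toy data (OR); the first input is the bias, fixed to 1
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = perceptron_train(data)
```

With `decay=True` the learning rate cools as on slide 6, so on non-separable data the weights settle near a minimum-error solution instead of oscillating.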
8 Logistic Regression (in contrast to linear regression). The threshold (step function) has drawbacks: it is non-differentiable, and it always produces a hard 1 or 0 classification, even near the boundary.
9 Logistic Regression. The threshold (step function) has drawbacks: it is non-differentiable, and it always produces a hard 1 or 0 classification, even near the boundary. Solution: the differentiable logistic function, whose output is a probability of the class, near 0.5 close to the boundary and toward 0 or 1 farther away.
10 Logistic Regression. [Plot: sigmoid for earthquakes vs. explosions]
11 Logistic Regression. No closed-form solution: use the weight update rule (see book for derivation). Linearly separable data: a bit slower than linear regression, but more predictable. Not linearly separable, fixed learning rate: faster than linear regression. Not linearly separable, decaying learning rate: faster than linear regression.
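The weight update rule referenced on slide 11 (w <- w + alpha (y - h) h (1 - h) x, with h = sigmoid(w . x)) can be sketched as follows; the toy data and hyperparameter values are assumptions, not from the slides:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_train(data, alpha=0.5, steps=20000, decay=False):
    """Gradient-descent update for logistic regression:
    w <- w + alpha * (y - h) * h * (1 - h) * x, where h = sigmoid(w . x)."""
    n = len(data[0][0])
    w = [0.0] * n
    for t in range(1, steps + 1):
        x, y = random.choice(data)
        a = alpha * 1000.0 / (1000.0 + t) if decay else alpha  # optional decay
        h = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        g = a * (y - h) * h * (1 - h)  # chain rule through the sigmoid
        for i in range(n):
            w[i] += g * x[i]
    return w

# Same OR-style toy data as before; the first input is the bias, fixed to 1
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
w = logistic_train(data)
```

Unlike the step-function perceptron, the trained model outputs a probability, so examples near the boundary get outputs near 0.5 rather than a hard 0 or 1.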
12 Logistic Regression
13 Neural Networks. Your brain is a neural network.
14
15 Neural Networks. We create computational brains called Artificial Neural Networks, or ANNs.
16 Neural Networks. Nodes: feed-forward, layered.
17 Neural Networks. Nodes: feed-forward (layers) or recurrent (allows memory).
18 Neural Networks. Connections: excitatory or inhibitory, with different weights (magnitudes), typically real-valued.
19 Neural Networks. Nodes: often layered; a function of the inputs (step/threshold, sigmoid, other...).
20 Neural Networks. Nodes: often layered; a function of the inputs (step/threshold, sigmoid, other...). Biases: often treated as another input set to 1. Always use them! Otherwise the decision boundary has to pass through the origin. They are also good when the default output is non-zero, such as a robot that moves forward unless X. Make sure they are on for your projects!
21 Neural Networks. Simple example: a kiwi classifier. Binary inputs (roundness, furriness), weights = ??, bias input fixed to 1, threshold T = 0: f(x) = 1 if the weighted sum >= 0, 0 otherwise.
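The slide leaves the weights as "??"; filling them in with illustrative values (an assumption, chosen so the unit computes an AND of the two features), the kiwi classifier might look like:

```python
def kiwi_classifier(roundness, furriness):
    """Single-unit classifier with a step activation: f = 1 if the
    weighted sum >= 0, else 0. The weights are illustrative (the slide
    leaves them as '??'): both features must be on to output 'kiwi'."""
    bias = 1  # bias treated as an extra input fixed to 1
    w_round, w_furry, w_bias = 1.0, 1.0, -1.5
    s = w_round * roundness + w_furry * furriness + w_bias * bias
    return 1 if s >= 0 else 0

print(kiwi_classifier(1, 1))  # round and furry -> 1 (kiwi)
print(kiwi_classifier(1, 0))  # round but not furry -> 0
```

Note that without the bias weight the sum 1.0*roundness + 1.0*furriness is never negative, so the unit could not say "no"; this is the slide-20 point that a bias lets the decision boundary move off the origin.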
22 Non-Linear Activation Functions. Combining many of them builds an ANN that can represent non-linear functions, because the activation functions themselves are non-linear.
23 Outputs. Can have single or multiple outputs, e.g. one output for each class.
24 Perceptron. Perceptron: a single unit; default activation function: step. With a sigmoid activation function: a sigmoid perceptron. Perceptron network: a single-layer network. McCulloch & Pitts (1943).
25 Perceptron Training Rule. Training with a step activation function: the perceptron learning rule; with a logistic activation function: gradient descent. Both work for linearly separable data; we already knew that, it is just a new metaphor now.
26 Neural Networks. [Figure: a 2D non-linear mapping. Each node outputs 1 if its input sum >= the number in the node; the outputs of neuron1 (left), neuron2 (right), and neuron3 (center) over inputs x, y are shown alongside the output neuron.]
27 Neural Networks. [Figure: 3D plots of the outputs of neuron1 (left), neuron2 (right), and neuron3 (center).]
28 Neural Networks. Cover's theorem: the probability that classes are linearly separable increases when the features are non-linearly mapped to a higher-dimensional feature space [Cover 1965]. The output layer requires linear separability; the purpose of the hidden layers is to make the problem linearly separable!
29 Neural Networks. Multilayer networks thus allow non-linear regression.
30 Neural Networks. Multilayer networks thus allow non-linear regression. A single hidden layer (often very large) can represent any continuous function; two hidden layers can represent any discontinuous function.
31 Neural Networks. Multilayer networks thus allow non-linear regression. A single hidden layer (often very large) can represent any continuous function; two hidden layers can represent any discontinuous function. But how do we train them?
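To make the representational claim concrete before training is discussed, here is a hand-wired two-layer step network computing XOR, a function no single-layer perceptron can represent. The weights are chosen by hand for illustration, not taken from the slides:

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    """Two-layer step network computing XOR. The hidden layer maps the
    inputs into a space where the classes become linearly separable
    (cf. Cover's theorem on slide 28)."""
    h1 = step(x1 + x2 - 0.5)    # OR unit
    h2 = step(-x1 - x2 + 1.5)   # NAND unit
    return step(h1 + h2 - 1.5)  # AND of the hidden units = XOR
```

In the hidden space (h1, h2), the two XOR-positive inputs both land on (1, 1) and the negatives land elsewhere, so a single linear output unit suffices.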
32 Training Multi-Layer Neural Networks. General idea: propagate the error backwards. Called backpropagation.
33 Backpropagation. We can still write the equation of a neuron, e.g. the activation of neuron 5: a5 = g(w0,5 + w3,5 a3 + w4,5 a4), where w0,5 is the bias weight.
34 Backpropagation. We can still write the equation of a neuron; this allows us to calculate the loss and take its derivative.
35 Backpropagation. At the output, the loss equation is the same as before: the previous gradient-descent update, now generalized to multiple output nodes.
36 Backpropagation. But what is the error at hidden nodes? We don't know what the right answer is for hidden nodes.
37 Backpropagation
38 Backpropagation. Detailed pseudocode in the book.
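Since the book's pseudocode is not reproduced on the slide, the following is a minimal sketch of backpropagation for one hidden layer of two sigmoid units and one sigmoid output, trained on XOR; the architecture, learning rate, and epoch count are assumptions:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_xor(epochs=40000, alpha=0.5, seed=0):
    """Backpropagation sketch: compute the error (delta) at the output
    node, then propagate it backwards to the hidden nodes through the
    outgoing weights. Weight layout per unit: [bias, w1, w2]."""
    rng = random.Random(seed)
    wh = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # hidden
    wo = [rng.uniform(-1, 1) for _ in range(3)]                      # output
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    for _ in range(epochs):
        (x1, x2), y = rng.choice(data)
        # forward pass
        h = [sigmoid(w[0] + w[1] * x1 + w[2] * x2) for w in wh]
        o = sigmoid(wo[0] + wo[1] * h[0] + wo[2] * h[1])
        # backward pass: delta at the output node...
        do = (y - o) * o * (1 - o)
        # ...propagated to each hidden node through its outgoing weight
        dh = [do * wo[j + 1] * h[j] * (1 - h[j]) for j in range(2)]
        wo[0] += alpha * do
        wo[1] += alpha * do * h[0]
        wo[2] += alpha * do * h[1]
        for j in range(2):
            wh[j][0] += alpha * dh[j]
            wh[j][1] += alpha * dh[j] * x1
            wh[j][2] += alpha * dh[j] * x2
    def predict(x1, x2):
        h = [sigmoid(w[0] + w[1] * x1 + w[2] * x2) for w in wh]
        return sigmoid(wo[0] + wo[1] * h[0] + wo[2] * h[1])
    return predict

predict = train_xor()  # predict(0, 1) should be near 1 if training converged
```

With only two hidden units, some random initializations get stuck in a local minimum on XOR, which previews the next slide's point: topology choice matters, and there is no perfect answer.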
39 ANNs. How do we choose the topology (number of layers, number of neurons per layer)? Too many neurons = memorization/overfitting. No perfect answer: try different ones and cross-validate, evolve them (take my class in the Spring!), and many others.
40 Deep Learning. Hierarchically composed feature representations. Deep neural networks: ~6+ layers. Each layer transforms the data into a new space, which could be higher-dimensional, lower-dimensional, or just different.
41 Hierarchically composed feature representations
42 Learning features relevant to the data (Lehman, Clune, & Risi 2015; Lee et al. 2007).
43 Deep Learning. Follows biology closely: V1, V2, etc. in the visual cortex go from edges to more abstract features, eventually to a "Halle Berry neuron" that responded to her picture, a line drawing of her, her picture in her Catwoman costume, and her name (typed out). Quiroga et al., Nature, 2005.