Neural Networks
Intro to AI
Bert Huang, Virginia Tech
Outline
- Biological inspiration for artificial neural networks
- Linear vs. nonlinear functions
- Learning with neural networks: back propagation
[Image credits: https://en.wikipedia.org/wiki/neuron#/media/file:gfpneuron.png and https://en.wikipedia.org/wiki/neuron#/media/file:chemical_synapse_schema_cropped.jpg]
Parameterizing p(y | x)
p(y | x) := f(x), where f : R^d → [0, 1]
f(x) := 1 / (1 + exp(−w^T x))
Logistic Function
σ(x) = 1 / (1 + exp(−x))
lim_{x→∞} σ(x) = lim_{x→∞} 1 / (1 + exp(−x)) = 1 / 1 = 1.0
σ(0) = 1 / (1 + exp(0)) = 1 / (1 + 1) = 0.5
lim_{x→−∞} σ(x) = lim_{x→−∞} 1 / (1 + exp(−x)) = 0.0
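To make these values concrete, here is a minimal Python sketch (assuming numpy is available) that evaluates the logistic function at the midpoint and far into both tails:

```python
import numpy as np

def sigmoid(x):
    # Logistic function: sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))     # 0.5: the midpoint
print(sigmoid(100.0))   # ~1.0: approaching the limit as x -> +infinity
print(sigmoid(-100.0))  # ~0.0: approaching the limit as x -> -infinity
```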
From Features to Probability
f(x) := 1 / (1 + exp(−w^T x))
Parameterizing p(y | x)
p(y | x) := f(x), where f : R^d → [0, 1]
f(x) := 1 / (1 + exp(−w^T x))
[Diagram: inputs x1 … x5 connect through weights w1 … w5 to the output unit y]
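A minimal sketch of this model in code: logistic regression as a one-layer network. The weight and feature values below are made up to match the five inputs shown in the diagram:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_proba(w, x):
    # p(y | x) = sigma(w^T x)
    return sigmoid(w @ x)

w = np.array([0.5, -1.2, 0.3, 2.0, -0.7])  # hypothetical weights w1..w5
x = np.array([1.0, 0.0, 1.0, 1.0, 0.5])    # hypothetical features x1..x5
print(predict_proba(w, x))                 # a probability in [0, 1]
```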
Multi-Layered Perceptron
[Diagram: inputs x1 … x5 connect through weights w1 … w5 to a hidden unit h1]
Multi-Layered Perceptron
[Diagram: raw data (pixel values) x1 … x5 feed a learned representation (hidden units h1, h2 detecting shapes, shadows, round regions), which feeds the prediction y (faces)]
Multi-Layered Perceptron
h1 = σ(w11^T x)
h2 = σ(w12^T x)
h = [h1, h2]^T
p(y | x) = σ(w21^T h) = σ(w21^T [σ(w11^T x), σ(w12^T x)]^T)
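A minimal sketch of this two-hidden-unit network's forward pass; the weight vectors below are illustrative values, not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([1.0, 0.0, 1.0, 1.0, 0.5])      # hypothetical inputs x1..x5
w11 = np.array([0.4, -0.2, 0.1, 0.9, -0.5])  # weights into hidden unit h1
w12 = np.array([-0.3, 0.8, 0.2, -0.1, 0.6])  # weights into hidden unit h2
w21 = np.array([1.5, -2.0])                  # weights from h = [h1, h2] to y

h = np.array([sigmoid(w11 @ x), sigmoid(w12 @ x)])  # hidden activations
print(sigmoid(w21 @ h))                             # p(y | x)
```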
Decision Surface: Logistic Regression
Decision Surface: 2-Layer, 2 Hidden Units
Decision Surface: 2-Layer, More Hidden Units
[Panels: 3 hidden units; 10 hidden units]
Decision Surface: More Layers, More Hidden Units
[Panels: 10 layers with 5 hidden units per layer; 4 layers with 10 hidden units per layer]
Training Neural Networks
min_W (1/n) Σ_{i=1}^n ℓ(f(x_i, W), y_i) = L(W)   (average training error)
Gradient descent: W_j ← W_j − α (∂L/∂W_j)
Gradient Descent
W_j ← W_j − α (∂L/∂W_j)
[Figure: at W1 the slope L′(W1) is very positive, so take a big step left; at W2 the slope is almost zero, so take a tiny step left; at W3 the slope is slightly negative, so take a medium step right]
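A minimal gradient descent sketch on a made-up one-dimensional loss L(W) = (W − 3)², using the update rule above:

```python
def dL_dW(W):
    # Derivative of the toy loss L(W) = (W - 3)^2
    return 2.0 * (W - 3.0)

W, alpha = 0.0, 0.1
for _ in range(50):
    W = W - alpha * dL_dW(W)  # large gradient -> large step; near-zero -> tiny step
print(W)                      # converges toward the minimizer W = 3
```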
Approximate Q-Learning
Q̂(s, a) := g(s, a, θ) := θ1 f1(s, a) + θ2 f2(s, a) + … + θd fd(s, a)
θi ← θi + α [R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)] ∂g/∂θi
θi ← θi + α [R(s) + γ max_{a′} Q̂(s′, a′) − Q̂(s, a)] fi(s, a)
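A hedged sketch of this update in code; the feature vectors, reward, and learning parameters below are hypothetical placeholders:

```python
import numpy as np

def q_hat(theta, f):
    # Qhat(s, a) = theta_1 f_1(s,a) + ... + theta_d f_d(s,a)
    return theta @ f

theta = np.zeros(3)
alpha, gamma = 0.1, 0.9
f_sa = np.array([1.0, 0.5, -0.2])        # f_i(s, a) for the action taken
f_next = [np.array([0.2, 0.1, 0.0]),     # f_i(s', a') for each action a' in s'
          np.array([0.9, -0.3, 0.4])]
reward = 1.0                             # R(s)

td_error = reward + gamma * max(q_hat(theta, f) for f in f_next) - q_hat(theta, f_sa)
theta = theta + alpha * td_error * f_sa  # each theta_i update uses dg/dtheta_i = f_i(s, a)
print(theta)
```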
Back Propagation
- Compute hidden unit activations: forward propagation
- Compute gradient at output layer: error
- Propagate error back one layer at a time
- Chain rule via dynamic programming
Chain Rule Review
d f(g(x)) / d x = [d f(g(x)) / d g(x)] · [d g(x) / d x]
[Diagram: x → g(x) → f(g(x))]
Chain Rule on More Complex Function
h(f(a) + g(b))
d h(f(a) + g(b)) / d a = [d h(f(a) + g(b)) / d (f(a) + g(b))] · [d (f(a) + g(b)) / d a]
d h(f(a) + g(b)) / d b = [d h(f(a) + g(b)) / d (f(a) + g(b))] · [d (f(a) + g(b)) / d b]
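One way to sanity-check this rule is to compare the analytic derivative against a finite-difference estimate; the functions f, g, and h below are arbitrary examples chosen for illustration:

```python
import math

f = lambda a: a ** 2       # f'(a) = 2a
g = lambda b: math.sin(b)
h = lambda u: math.exp(u)  # h'(u) = exp(u)

a, b, eps = 0.7, 1.3, 1e-6
analytic = math.exp(f(a) + g(b)) * (2 * a)               # h'(f(a)+g(b)) * f'(a)
numeric = (h(f(a + eps) + g(b)) - h(f(a) + g(b))) / eps  # finite difference in a
print(analytic, numeric)                                 # should agree closely
```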
Back to Neural Networks
[Diagram: raw input x → weights w1 → hidden layer h1 → … → weights w_{n−1} → hidden layer h_{n−1} → weights w_n → final output y]
h1 = f(x, w1)
h_{n−1} = f(h_{n−2}, w_{n−1})
y = f(h_{n−1}, w_n)
With loss L(y), the last layer's gradient follows from the chain rule:
d L / d w_n = (d L / d y) · (d y / d w_n)
Intuition of Back Propagation
- At each layer, calculate how much changing the input changes the final output (the derivative of the final output w.r.t. the layer's input)
- Calculate this directly for the last layer
- For preceding layers, use the calculation from the next layer and work backwards through the network
- Use that derivative to find how changing the weights affects the error of the final output
FYI: Matrix Form
h1 = s(W1 x)
h2 = s(W2 h1)
…
h_{m−1} = s(W_{m−1} h_{m−2})
f(x, W) = s(W_m h_{m−1})
J(W) = ℓ(f(x, W))
(You will not be tested on this matrix form in this course.)
FYI: Matrix Gradient Recipe
Feed-forward propagation:
h1 = s(W1 x), h_i = s(W_i h_{i−1}), f(x, W) = s(W_m h_{m−1}), J(W) = ℓ(f(x, W))
Back propagation:
δ_m = ℓ′(f(x, W)), ∇_{W_m} J = δ_m h_{m−1}^T
δ_i = (W_{i+1}^T δ_{i+1}) ∘ s′(W_i h_{i−1}), ∇_{W_i} J = δ_i h_{i−1}^T
δ_1 gives ∇_{W_1} J = δ_1 x^T
(You will not be tested on this matrix form in this course.)
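A minimal sketch of this recipe as a loop, for a made-up three-layer network; it assumes logistic activations and a cross-entropy loss, under which the output error δ_m simplifies to f(x, W) − y:

```python
import numpy as np

def s(z):
    return 1.0 / (1.0 + np.exp(-z))  # elementwise logistic activation

def s_prime(z):
    return s(z) * (1.0 - s(z))       # its derivative

rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 3)),       # W1: 3 inputs -> 4 hidden units
      rng.normal(size=(4, 4)),       # W2: 4 hidden -> 4 hidden
      rng.normal(size=(1, 4))]       # W3: 4 hidden -> 1 output
x = rng.normal(size=(3, 1))
y = np.array([[1.0]])

# Feed-forward propagation: h_i = s(W_i h_{i-1}), with h_0 = x
hs = [x]
for W in Ws:
    hs.append(s(W @ hs[-1]))

# Back propagation: the error delta flows backward one layer at a time
delta = hs[-1] - y                   # delta_m under the assumed loss/activation
grads = [None] * len(Ws)
for i in range(len(Ws) - 1, -1, -1):
    grads[i] = delta @ hs[i].T       # grad_{W_i} J = delta_i h_{i-1}^T
    if i > 0:
        delta = (Ws[i].T @ delta) * s_prime(Ws[i - 1] @ hs[i - 1])

print([g.shape for g in grads])      # one gradient per weight matrix
```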
Other New Aspects of Deep Learning
- GPU computation
- Differentiable programming
- Automatic differentiation (see the sketch below)
- Neural network structures
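As one illustration of automatic differentiation, here is a toy forward-mode version built on dual numbers; real frameworks are far more general, and the Dual class below is purely illustrative:

```python
class Dual:
    # A number paired with its derivative; arithmetic applies the chain rule.
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

x = Dual(3.0, 1.0)    # seed the derivative dx/dx = 1
y = x * x + x         # f(x) = x^2 + x
print(y.val, y.dot)   # 12.0 and f'(3) = 7.0
```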
Types of Neural Network Structures
- Feed-forward
- Recurrent neural networks (RNNs): good for analyzing sequences (text, time series)
- Convolutional neural networks (convnets, CNNs): good for analyzing spatial data (images, videos)
Outline
- Biological inspiration for artificial neural networks
- Linear vs. nonlinear functions
- Learning with neural networks: back propagation