Admin

Assignment 7
Class 11/22
Schedule for the rest of the semester

NEURAL NETWORKS
David Kauchak
CS158 Fall 2016

Perceptron learning algorithm

repeat until convergence (or for some # of iterations):
    for each training example (f_1, f_2, ..., f_n, label):
        prediction = b + Σ_i w_i f_i
        if prediction * label ≤ 0:   // they don't agree
            for each w_i:
                w_i = w_i + f_i * label
            b = b + label

Why is it called the perceptron learning algorithm if what it learns is a line? Why not the line learning algorithm?
(a Python sketch of the algorithm follows below)

Our Nervous System

[figure: a neuron, with dendrites, axon, and synapses labeled]
What do you know?
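A minimal Python sketch of the perceptron learning algorithm from the slide above. It assumes examples are (features, label) pairs with labels in {-1, +1}; the function and variable names are my own, not from the slides.

```python
# Sketch of the perceptron learning algorithm from the slide.
# Assumes labels are -1/+1 and each example is (list_of_features, label).
def train_perceptron(examples, num_iterations=100):
    n = len(examples[0][0])
    w = [0.0] * n          # one weight per feature
    b = 0.0                # bias

    for _ in range(num_iterations):
        for features, label in examples:
            prediction = b + sum(w_i * f_i for w_i, f_i in zip(w, features))
            if prediction * label <= 0:          # they don't agree
                for i in range(n):
                    w[i] += features[i] * label  # w_i = w_i + f_i * label
                b += label                       # b = b + label
    return w, b

# Example: learn OR of two binary features (labels -1/+1)
data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
```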
Our nervous system: the computer science view

the human brain is a large collection of interconnected neurons
a NEURON is a brain cell
- they collect, process, and disseminate electrical signals
- they are connected via synapses
- they FIRE depending on the conditions of the neighboring neurons

A neuron/perceptron

Inputs x_1, x_2, x_3, x_4 with weights w_1, w_2, w_3, w_4
in = Σ_i w_i x_i
g(in): activation function
Output y

How is this a linear classifier (i.e. perceptron)?

Hard threshold = linear classifier

hard threshold:
    g(in) = 1 if in > b, 0 otherwise

    output = 1 if Σ_i w_i x_i + b > 0, 0 otherwise

Neural Networks

Neural Networks try to mimic the structure and function of our nervous system
People like biologically motivated approaches
Artificial Neural Networks

Node (neuron/perceptron) and Edge (synapse)
Node A --w--> Node B
w is the strength of the signal sent between A and B
If A fires and w is positive, then A stimulates B
If A fires and w is negative, then A inhibits B

our approximation:
    output = 1 if Σ_i w_i x_i + b > 0, 0 otherwise

Other activation functions

hard threshold:
    g(in) = 1 if in > b, 0 otherwise

sigmoid:
    g(x) = 1 / (1 + e^(-ax))

tanh(x)

why other threshold functions?
(a sketch of all three follows below)

Neural network

Individual perceptrons/neurons
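A small Python sketch of the three activation functions named above. The parameter names (threshold b, slope a) follow the slide's notation; the default values are my own assumptions.

```python
import math

def hard_threshold(in_value, b=0.0):
    """g(in) = 1 if in > b, 0 otherwise (the perceptron's activation)."""
    return 1 if in_value > b else 0

def sigmoid(x, a=1.0):
    """g(x) = 1 / (1 + e^(-ax)): a smooth, squashed version of the threshold."""
    return 1.0 / (1.0 + math.exp(-a * x))

def tanh(x):
    """tanh squashes to (-1, 1) instead of (0, 1)."""
    return math.tanh(x)
```

One reason to prefer the smooth versions shows up later in the lecture: gradient-based training needs a differentiable activation, and the hard threshold is not differentiable.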
Neural network

some inputs are provided/entered
each perceptron computes and calculates an answer
those answers become inputs for the next level
finally get the answer after all levels compute
Activation spread

http://www.youtube.com/watch?v=yq7d4rovz6i

Computation (assume 0 bias)

    g(in) = 1 if in > b, 0 otherwise

[figure: worked example of activation spreading through a small network of hard-threshold units]

Computation

(sigmoid activations; reproduced in the sketch below)
inputs -1 and 1:
    -1*0.05 + 1*(-0.02) = -0.07   →  0.483
    -1*0.03 + 1*0.01    = -0.02   →  0.495
    0.483*0.5 + 0.495*1 = 0.7365  →  0.676

Neural networks

Different kinds/characteristics of networks
How are these different?
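A short Python sketch reproducing the second worked computation above. Based on the arithmetic shown, I'm assuming inputs of -1 and 1, hidden weights (0.05, -0.02) and (0.03, 0.01), hidden-to-output weights (0.5, 1), sigmoid activations, and zero biases; the exact network layout on the slide may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# inputs and weights assumed from the worked example on the slide
x1, x2 = -1.0, 1.0

# hidden layer (zero bias)
h1 = sigmoid(x1 * 0.05 + x2 * -0.02)   # sigmoid(-0.07) ≈ 0.483
h2 = sigmoid(x1 * 0.03 + x2 * 0.01)    # sigmoid(-0.02) ≈ 0.495

# output layer (zero bias)
out = sigmoid(h1 * 0.5 + h2 * 1.0)     # sigmoid(0.7365) ≈ 0.676

print(h1, h2, out)
```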
Hidden units/layers

Can have many layers of hidden units of differing sizes
hidden units/layer

To count the number of layers, you count all but the inputs

Feed forward networks

Alternate ways of visualizing

Sometimes the input layer will be drawn with nodes as well
[figures: a 2-layer network and a 3-layer network; the same 2-layer network drawn with and without input nodes]
Multiple outputs

Can be used to model multiclass datasets or more interesting predictors, e.g. images
[figure: input image → output image (edge detection)]

Recurrent network

Output is fed back to the input
Can support memory!
Good for temporal data

NN decision boundary

What does the decision boundary of a perceptron look like?
A line (a linear set of weights)
NN decision boundary

What does the decision boundary of a 2-layer network look like?
Is it linear?
What types of things can and can't it model?

XOR

[figure: 2-layer network: Input x1, Input x2 → two hidden nodes → Output = x1 xor x2, with bias b=1 at each node]

    output = 1 if Σ_i w_i x_i + b > 0, 0 otherwise

x1  x2  x1 xor x2
0   0   0
0   1   1
1   0   1
1   1   0

What does the decision boundary look like?

[figure: the same network with hidden-node input weights of 1 and -1]
What does the decision boundary look like?

[figure: the XOR network with one hidden node's weights highlighted: 1 from x1, -1 from x2]

x1  x2  x1 xor x2
0   0   0
0   1   1
1   0   1
1   1   0

NN decision boundary

What does this perceptron's decision boundary look like?

Let x2 = 0, then: x1 - 0.5 = 0, so x1 = 0.5 (without the bias)
[plot of this boundary line in (x1, x2) space]
NN decision boundary

[figure: the XOR network with the other hidden node's weights highlighted, and its boundary line in (x1, x2) space]

Let x2 = 0, then: x1 - 0.5 = 0, so x1 = 0.5 (without the bias)

What does the decision boundary look like?

Fill in the truth table

[figure: the two hidden-node outputs, out1 and out2, feed the output perceptron]

out1  out2
0     0
0     1
1     0
1     1

What operation does this perceptron perform on the result?
OR

The output perceptron computes an OR of the two hidden-node outputs
(a full sketch of the resulting 2-layer xor network follows below)

out1  out2  output
0     0     0
0     1     1
1     0     1
1     1     1

What does the decision boundary look like?

[figures: the full network, Input x1, Input x2 → hidden nodes → Output = x1 xor x2, with the two hidden-node boundary lines drawn in (x1, x2) space]

x1  x2  x1 xor x2
0   0   0
0   1   1
1   0   1
1   1   0
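A small Python sketch of a 2-layer XOR network in the spirit of the construction above: two hard-threshold hidden units, one detecting "x1 and not x2" and the other "x2 and not x1", combined by an OR output unit. The specific weights and thresholds (0.5) are my own choice of values that realize this construction, not necessarily the ones on the slides.

```python
def hard_threshold(in_value, b):
    # g(in) = 1 if in > b, 0 otherwise
    return 1 if in_value > b else 0

def xor_network(x1, x2):
    # hidden node 1: fires when x1 = 1 and x2 = 0  (weights 1, -1)
    out1 = hard_threshold(1 * x1 + -1 * x2, 0.5)
    # hidden node 2: fires when x1 = 0 and x2 = 1  (weights -1, 1)
    out2 = hard_threshold(-1 * x1 + 1 * x2, 0.5)
    # output node: OR of the two hidden nodes (weights 1, 1)
    return hard_threshold(1 * out1 + 1 * out2, 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))   # matches the xor truth table
```

Each hidden unit contributes one linear split of the feature space; the output unit combines the two half-planes into the non-linear XOR region.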
What does the decision boundary look like?

Input x1, Input x2 → Output = x1 xor x2
linear splits of the feature space
combination of these linear spaces

This decision boundary?

[figure: network with Input x1, Input x2 and nodes with b=1]

    output = 1 if Σ_i w_i x_i + b > 0, 0 otherwise

This decision boundary?

[figure: networks with weights of -1, -1 into nodes with b=0.5]

    output = 1 if Σ_i w_i x_i + b > 0, 0 otherwise
NOR

A node with weights -1, -1 and b=0.5 computes a NOR:

out1  out2  output
0     0     1
0     1     0
1     0     0
1     1     0

What does the decision boundary look like?

Three hidden nodes

Input x1, Input x2 → Output = x1 xor x2
linear splits of the feature space
combination of these linear spaces
NN decision boundaries

For DTs, as the tree gets larger, the model gets more complex
The same is true for neural networks: more hidden nodes = more complexity
Or, in colloquial terms: two-layer networks can approximate any function
Adding more layers adds even more complexity (and much more quickly)

Good rule of thumb:
    number of 2-layer hidden nodes ≤ number of examples / number of dimensions

Training

[figure: the 2-layer XOR network, Input x1, Input x2 → hidden nodes (b=1) → Output = x1 xor x2]

x1  x2  x1 xor x2
0   0   0
0   1   1
1   0   1
1   1   0

How do we learn the weights?

Training multilayer networks

perceptron learning: if the perceptron's output is different than the expected output, update the weights
gradient descent: compare output to label and adjust based on loss function (see the sketch below)

Any other problem with these for general NNs?
[figures: a perceptron/linear model with weights w, and a neural network with many weights between layers]
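As a concrete contrast between the two update rules above, here is a minimal Python sketch of one gradient descent step for a single sigmoid unit trained with the squared error loss used later in the lecture. The learning rate name eta and its value are my own assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gradient_descent_update(w, b, features, label, eta=0.1):
    """One gradient descent step for a single sigmoid unit with squared error.

    loss = 1/2 * (label - output)^2, so
    d(loss)/dw_i = -(label - output) * output * (1 - output) * x_i
    """
    output = sigmoid(b + sum(w_i * x_i for w_i, x_i in zip(w, features)))
    delta = (label - output) * output * (1 - output)   # error scaled by sigmoid slope
    new_w = [w_i + eta * delta * x_i for w_i, x_i in zip(w, features)]
    new_b = b + eta * delta
    return new_w, new_b
```

This works when we know the expected output for the unit, which is exactly what we lack for the hidden nodes of a multilayer network, as the next slides discuss.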
Learning in multilayer networks

Challenge: for multilayer networks, we don't know what the expected output/error is for the internal nodes!

how do we learn these weights?
[figures: perceptron/linear model vs. neural network; the expected output is only known at the output layer]

Backpropagation: intuition

Gradient descent method for learning weights by optimizing a loss function
1. calculate output of all nodes
2. calculate the weights for the output layer based on the error
3. backpropagate errors through hidden layers

Key idea: propagate the error back to this layer
We can calculate the actual error here (at the output)
Backpropagation: intuition

backpropagate the error:
Assume all of these nodes were responsible for some of the error
How can we figure out how much they were responsible for?

With weights w1, w2, w3 into the node where the error is known:
    error for node i is: w_i * error
e.g. the node connected by w3 gets w3 * error

Calculate as normal, using this as the error
The same idea repeats for the weights w4, w5, w6 one layer further back

Backpropagation: the details

Gradient descent method for learning weights by optimizing a loss function
1. calculate output of all nodes
2. calculate the updates directly for the output layer
3. backpropagate errors through hidden layers

What loss function?
Backpropagation: the details

Gradient descent method for learning weights by optimizing a loss function
1. calculate output of all nodes
2. calculate the updates directly for the output layer
3. backpropagate errors through hidden layers

loss = Σ_x 1/2 (y - ŷ)^2     squared error
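To make steps 1-3 concrete, here is a minimal Python sketch of one backpropagation update for a tiny 2-layer network (one hidden layer of sigmoid units and a single sigmoid output) trained on the squared error above. The network shape, learning rate, and variable names are my own assumptions, not taken from the slides.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, y, W_hidden, W_output, eta=0.5):
    """One backpropagation update for a 1-hidden-layer sigmoid network.

    W_hidden[j][i] is the weight from input i to hidden node j;
    W_output[j] is the weight from hidden node j to the single output.
    Loss is squared error: 1/2 * (y - out)^2.  Biases omitted for brevity.
    """
    # 1. forward pass: calculate output of all nodes
    hidden = [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w_row, x)))
              for w_row in W_hidden]
    out = sigmoid(sum(v_j * h_j for v_j, h_j in zip(W_output, hidden)))

    # 2. output layer: error scaled by the slope of the sigmoid
    delta_out = (y - out) * out * (1 - out)

    # 3. backpropagate: hidden node j's share of the error is W_output[j] * delta_out
    delta_hidden = [W_output[j] * delta_out * hidden[j] * (1 - hidden[j])
                    for j in range(len(hidden))]

    # gradient descent updates
    new_W_output = [W_output[j] + eta * delta_out * hidden[j]
                    for j in range(len(hidden))]
    new_W_hidden = [[W_hidden[j][i] + eta * delta_hidden[j] * x[i]
                     for i in range(len(x))]
                    for j in range(len(W_hidden))]
    return new_W_hidden, new_W_output

# tiny usage example: nudge the network's output for one xor example toward 1
random.seed(0)
W_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
W_o = [random.uniform(-1, 1) for _ in range(2)]
W_h, W_o = backprop_step([0, 1], 1, W_h, W_o)
```

Note how step 3 mirrors the intuition slide: each hidden node is assigned the portion of the output error carried by its outgoing weight, and then its incoming weights are updated "as normal" using that propagated error.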