Introduction to the Introduction to Artificial Neural Network

Introduction to the Introduction to Artificial Neural Network. Vuong Le, with Hao Tang's slides. Part of the content of the slides is from the Internet (possibly with modifications). The lecturer does not claim any ownership of any of the content of these slides.

Outline: Biological Inspirations; Applications & Properties of ANN; Perceptron; Multi-Layer Perceptrons; Error Backpropagation Algorithm; Remarks on ANN.

Biological Inspirations. Humans perform complex tasks like vision, motor control, or language understanding very well. One way to build intelligent machines is to try to imitate the (organizational principles of the) human brain.

Biological Neuron

Artificial Neural Networks. ANNs have been widely used in various domains for: pattern recognition, function approximation, etc.

Perceptron (Artificial Neuron). A perceptron takes a vector of real-valued inputs, calculates a linear combination of the inputs, and outputs 1 if the result is greater than some threshold and -1 (or 0) otherwise.

Perceptron. To simplify notation, assume an additional constant input x_0 = 1. We can write the perceptron function as o(x) = sgn(w · x) = sgn(w_0 x_0 + w_1 x_1 + ... + w_n x_n), where sgn(y) = 1 if y > 0 and -1 otherwise.
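A minimal Python sketch of this thresholded unit; the particular input and weight values are illustrative, not from the slides:

```python
import numpy as np

def perceptron(x, w):
    """Thresholded perceptron: output 1 if w . x > 0, else -1.
    Both x and w include the constant input x_0 = 1 and its weight w_0."""
    return 1 if np.dot(w, x) > 0 else -1

# Illustrative example with two real inputs plus the constant x_0 = 1.
x = np.array([1.0, 0.3, 0.8])   # [x_0, x_1, x_2]
w = np.array([-0.5, 1.0, 1.0])  # [w_0, w_1, w_2], assumed values
print(perceptron(x, w))         # -> 1, since -0.5 + 0.3 + 0.8 > 0
```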

Representational Power of the Perceptron. The perceptron represents a hyperplane decision surface in the n-dimensional space of instances. (Figure: a separating line defined by the weights <w_1, w_2> in the x_1-x_2 plane, with + examples on one side and - examples on the other: linearly separable data.)

Boolean Functions. A single perceptron can be used to represent many Boolean functions, using 1 (true) and 0 (false). Perceptrons can represent all of the primitive Boolean functions AND, OR, NOT.

Implementing AND. Inputs x_1 and x_2 with weights of 1 and bias weight W_0 = -1.5: o(x_1, x_2) = 1 if -1.5 + x_1 + x_2 > 0, and 0 otherwise.

Implementing OR. Inputs x_1 and x_2 with weights of 1 and bias weight W_0 = -0.5: o(x_1, x_2) = 1 if -0.5 + x_1 + x_2 > 0, and 0 otherwise.

Implementing NOT. Input x_1 with weight -1 and bias weight W_0 = 0.5: o(x_1) = 1 if 0.5 - x_1 > 0, and 0 otherwise.
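A small Python sketch of these three gates as thresholded units, using the bias weights from the slides; the input weights of 1 and -1 are read off the diagrams, so treat them as assumptions:

```python
def step(z):
    """0/1 threshold unit: fires when the weighted sum is positive."""
    return 1 if z > 0 else 0

def AND(x1, x2):
    return step(-1.5 + 1.0 * x1 + 1.0 * x2)

def OR(x1, x2):
    return step(-0.5 + 1.0 * x1 + 1.0 * x2)

def NOT(x1):
    return step(0.5 - 1.0 * x1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT:", NOT(0), NOT(1))
```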

The XOR Function. Unfortunately, some Boolean functions cannot be represented by a single perceptron. Truth table for XOR(x_1, x_2): o(0,0) = 0, o(0,1) = 1, o(1,0) = 1, o(1,1) = 0. A single perceptron with weights w_0, w_1, w_2 would have to satisfy w_0 ≤ 0, w_0 + w_2 > 0, w_0 + w_1 > 0, and w_0 + w_1 + w_2 ≤ 0. There is no assignment of values to w_0, w_1, and w_2 that satisfies the above inequalities. XOR cannot be represented!

Linear Separability. Boolean AND is linearly separable; Boolean XOR is not. Representation Theorem: perceptrons can only represent linearly separable functions. That is, the decision surface separating the output values has to be a plane. (Minsky & Papert, 1969)

Remarks on the perceptron. Perceptrons can represent all the primitive Boolean functions AND, OR, and NOT. Some Boolean functions cannot be represented by a single perceptron, such as the XOR function. Every Boolean function can be represented by some combination of AND, OR, and NOT. We want networks of perceptrons.

Implementing XOR with a Multi-Layer Perceptron (MLP). A hidden layer with an OR unit and a NAND unit, followed by an AND output unit: x_1 XOR x_2 = (x_1 OR x_2) AND (NOT(x_1 AND x_2)).
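Continuing the gate sketch above, the composition on the slide becomes a two-layer network directly (AND, OR, and NOT are the illustrative units defined earlier):

```python
def NAND(x1, x2):
    return NOT(AND(x1, x2))

def XOR(x1, x2):
    # Hidden layer: an OR unit and a NAND unit; output layer: AND of the two.
    return AND(OR(x1, x2), NAND(x1, x2))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, XOR(a, b))   # 0, 1, 1, 0
```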

Multi-Layer Perceptrons (MLP)

Representation Power of MLP: a conjunction of piece-wise hyperplanes.

Representation Power of ANN. Boolean functions: every Boolean function can be represented exactly by some network with two layers of units. Continuous functions: every bounded continuous function can be approximated with arbitrarily small error (under a finite norm) by a network with two layers of units. Arbitrary functions: any function can be approximated to arbitrary accuracy by a network with three layers of units.

Neural network design for non-closed-form problems. (Diagram: training data {x_i, y_i} goes through the training process to produce the neural network; testing data x* then goes through the testing process to produce the result y*.)

Definition of Training Error. Training error E(w): a function of the weight vector w over the training data set D, E(w) = (1/2) Σ_{d∈D} (t_d - o_d)², for an unthresholded perceptron (linear unit) with output o_d = w · x_d.
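A one-function sketch of this squared-error measure for a linear unit; the array names are illustrative:

```python
import numpy as np

def training_error(w, X, t):
    """E(w) = 1/2 * sum over the training set of (t_d - o_d)^2,
    with the linear-unit output o_d = w . x_d."""
    o = X @ w                        # one output per training example (rows of X)
    return 0.5 * np.sum((t - o) ** 2)
```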

Gradient Descent. To reduce the error, update the weight vector in the direction of steepest descent along the error surface: w ← w + Δw with Δw = -η ∇E(w), where ∇E(w) = (∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n).

Gradient Descent

Weight Update Rule. Δw_i = -η ∂E/∂w_i = η Σ_{d∈D} (t_d - o_d) x_{i,d}.

Gradient Descent Search Algorithm. repeat: initialize each Δw_i to 0; for each training example <x, t(x)>: compute o(x) = w · x, and for each w_i accumulate Δw_i ← Δw_i + η (t(x) - o(x)) x_i; then for each w_i update w_i ← w_i + Δw_i; until (termination condition).
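A runnable sketch of this batch update loop for a linear unit; the training data, learning rate, and fixed epoch count standing in for the termination condition are assumptions for illustration:

```python
import numpy as np

def train_linear_unit(X, t, eta=0.05, epochs=100):
    """Batch gradient descent on E(w) = 1/2 * sum_d (t_d - o_d)^2 with o = w . x."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):              # stands in for "until termination condition"
        delta_w = np.zeros_like(w)
        for x_d, t_d in zip(X, t):       # for each training example
            o_d = w @ x_d                # unthresholded output
            delta_w += eta * (t_d - o_d) * x_d
        w += delta_w
    return w

# Illustrative data: OR-like targets, with a constant input x_0 = 1 in each row.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)
print(train_linear_unit(X, t))
```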

Perceptron Learning Rule vs. Delta Rule. Perceptron learning rule: Δw_i = η (t - o) x_i, where o is the thresholded output. Delta rule: Δw_i = η (t - o) x_i, where o is the net (unthresholded) output. The perceptron learning rule uses the output of the threshold function (either -1 or 1) for learning. The delta rule uses the net output without further mapping into the output values -1 or 1.

Perceptron Learning Algorithm. Guaranteed to converge within a finite time if the training data are linearly separable and η is sufficiently small. If the data are not linearly separable, convergence is not assured.

Feed-forward Networks

Definition of Error for MLP. The error is again E(w) = (1/2) Σ_{d∈D} (t_d - o_d)², but the output now passes through the network: o = σ(Σ_j w_j h_j) with hidden-unit outputs h_j = σ(Σ_k w_{jk} x_k), so E is a function of all the weights w_j and w_{jk}. Each weight is still updated by gradient descent, Δw = -η ∂E/∂w.

Output Layer's Weight Update. With o = σ(Σ_m w_m h_m) and σ'(z) = σ(z)(1 - σ(z)): ∂E/∂w_j = ∂/∂w_j [(1/2)(t - o)²] = -(t - o) · o (1 - o) · h_j, so Δw_j = -η ∂E/∂w_j = η (t - o) o (1 - o) h_j.

Hidden Layer's Weight Update. Distribute the output error to an error for each h_j, in proportion to the weight w_j. Similar to the output layer: Δw_{jk} = η [(t - o) o (1 - o) w_j] · h_j (1 - h_j) · x_k. This is Error Back-propagation.

Error Back-Propagation Algorithm. Initialize all weights to small random numbers. repeat: for each training example <x, t(x)>: for each hidden node j, compute h_j = σ(Σ_k w_{jk} x_k); for the output node, compute o = σ(Σ_j w_j h_j); for each output-node weight, Δw_j = o (1 - o)(t - o) h_j; for each hidden-node weight, Δw_{jk} = [o (1 - o)(t - o) w_j] h_j (1 - h_j) x_k; for each hidden-node weight, w_{jk} ← w_{jk} + η Δw_{jk}; for each output-node weight, w_j ← w_j + η Δw_j; until (termination condition).
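A compact, runnable sketch of this algorithm for one sigmoid hidden layer and a single sigmoid output; the network size, data set, learning rate, epoch count, and the extra output-bias weight are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, t, n_hidden=2, eta=0.5, epochs=10000, seed=0):
    """Stochastic error back-propagation for a 1-hidden-layer, 1-output MLP."""
    rng = np.random.default_rng(seed)
    W_hid = rng.normal(0, 0.5, (n_hidden, X.shape[1]))   # w_jk: input k -> hidden j
    w_out = rng.normal(0, 0.5, n_hidden + 1)              # w_j, plus an output bias
    for _ in range(epochs):
        for x, t_d in zip(X, t):
            h = np.append(sigmoid(W_hid @ x), 1.0)   # hidden outputs h_j, constant 1
            o = sigmoid(w_out @ h)                   # forward pass: output o
            delta_o = o * (1 - o) * (t_d - o)                       # output error term
            delta_h = h[:-1] * (1 - h[:-1]) * w_out[:-1] * delta_o  # back-propagated
            w_out += eta * delta_o * h                              # output-layer update
            W_hid += eta * np.outer(delta_h, x)                     # hidden-layer update
    return W_hid, w_out

# Illustrative use: learn XOR, with a constant input x_0 = 1 in each example.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
t = np.array([0, 1, 1, 0], dtype=float)
W_hid, w_out = train_mlp(X, t)
h = np.hstack([sigmoid(X @ W_hid.T), np.ones((4, 1))])
print(np.round(sigmoid(h @ w_out), 2))   # should approach 0, 1, 1, 0 (init permitting)
```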

Generalization, Overfitting, etc. Artificial neural networks with a large number of weights tend to overfit the training data. To increase generalization accuracy, use a validation set: find the optimal number of perceptrons, find the optimal number of training iterations, and stop when overfitting happens (see the early-stopping sketch at the end of this section).

Generalization, Overfitting, etc.
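A minimal early-stopping sketch of that advice; one_epoch and error are hypothetical helpers standing in for a single pass of the back-propagation loop above and a validation-error measure, and the patience threshold is an assumption:

```python
def train_with_early_stopping(one_epoch, error, weights, val_data,
                              max_epochs=1000, patience=20):
    """Keep the weights that do best on the validation set; stop once the
    validation error has not improved for `patience` consecutive epochs."""
    best_weights, best_err, since_best = weights, float("inf"), 0
    for _ in range(max_epochs):
        weights = one_epoch(weights)           # one training pass over the data
        val_err = error(weights, val_data)     # error on the held-out validation set
        if val_err < best_err:
            best_weights, best_err, since_best = weights, val_err, 0
        else:
            since_best += 1
            if since_best >= patience:         # validation error rising: overfitting
                break
    return best_weights
```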