Motivation for Non-linear Classifiers: Neural Networks
CPS 271
Ron Parr

Linear methods are weak
- They make strong assumptions
- They can express only relatively simple functions of the inputs
- Coming up with good features can be hard
- Why not make the classifier do more work for us?
  - What does the space of hypotheses look like?
  - How do we navigate in this space?

Neural Network Motivation
- Human brains are the only known example of actual intelligence
- Individual neurons are slow and boring
- Brains succeed by using massive parallelism
- Idea: copy what works
- This raises many issues:
  - Is the computational metaphor suited to the computational hardware?
  - How do we know if we are copying the important part?
  - Are we aiming too low?

Why Neural Networks?
Maybe computers should be more brain-like:

                       Computers            Brains
  Computational units  10^9 gates/CPU       10^11 neurons
  Storage units        10^10 bits RAM,      10^11 neurons,
                       10^12 bits HD        10^14 synapses
  Cycle time           10^-9 s              10^-3 s
  Bandwidth            10^10 bits/s*        10^14 bits/s
  Compute power        10^10 ops/s          10^14 ops/s

Comments on Blue Gene
Blue Gene: the world's fastest supercomputer
- 360 teraflops
- Currently at 13,000 processors: 10^13 -> 10^14 ops/s (brain level?)
- 6 TB memory (10^13 bits)
- 1.4 megawatts of power ($1M/year in electricity)
- 2,500 sq ft in size (a nice-sized house)
- Pictures and other details: http://domino.research.ibm.com/comm/pr.nsf/pages/rsc.bluegene_2004.html

More Comments on Blue Gene
What is wrong with this picture?
- Weight
- Size
- Power consumption
What is missing?
- It still can't replicate human abilities (though it vastly exceeds human abilities in many areas)
- Are we running the wrong programs?
- Is the architecture well suited to the programs we might need to run?
Artificial Neural Networks
- Develop an abstraction of the function of actual neurons
- Simulate large, massively parallel artificial neural networks on conventional computers
- Some have tried to build the hardware too
- Try to approximate human learning, robustness to noise, robustness to damage, etc.

Uses of Neural Networks
- Trained to pronounce English
  - Training set: a sliding window over text, with sounds
  - 95% accuracy on the training set
  - 78% accuracy on the test set
- Trained to recognize handwritten digits: >99% accuracy
- Trained to drive (Pomerleau's "no hands across America")

Neural Network Lore
- Neural nets have been adopted with an almost religious fervor within the AI community - several times
- They are often ascribed near-magical powers, usually by the people who know the least about computation or brains
- For most AI people the magic is gone, but neural nets remain extremely interesting and useful mathematical objects

Artificial Neurons
- A node (neuron) $j$ computes a weighted sum of its inputs and passes it through a function $f$: $a_j = f\bigl(\sum_i w_{j,i}\, a_i\bigr)$
- $f$ can be any function, but is usually a smoothed step function $g$

Threshold Functions
[Plots of two threshold functions:]
- Smooth: $g(x) = \tanh(x)$ or $g(x) = 1/(1 + e^{-x})$ (logistic regression)
- Hard: $g(x) = \mathrm{sgn}(x)$ (perceptron)
(A code sketch of a single neuron with these threshold functions appears below.)

Network Architectures
- Cyclic vs. acyclic
  - Cyclic is tricky, but more biologically plausible:
    - Hard to analyze in general
    - May not be stable
    - Need to assume latches to avoid race conditions
    - Hopfield nets: a special type of cyclic net useful for associative memory
- Single layer (perceptron)
- Multiple layers
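The neuron model above is small enough to show directly. Here is a minimal sketch in Python, assuming NumPy (the function and variable names are ours, not from the lecture):

```python
import numpy as np

# Threshold functions from the slide above.
def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))      # smooth step into (0, 1)

def sign(x):
    return np.where(x >= 0, 1.0, -1.0)   # hard step (perceptron unit)

def neuron(w, a_in, f=np.tanh):
    """Compute a_j = f(sum_i w_ji * a_i) for a single node j."""
    return f(np.dot(w, a_in))

a_in = np.array([0.5, -1.0, 2.0])   # activations a_i of the input nodes
w = np.array([0.1, 0.4, -0.3])      # weights w_ji into node j
print(neuron(w, a_in))              # smooth unit (tanh)
print(neuron(w, a_in, f=sign))      # hard-threshold unit
```

Swapping $f$ is all that separates a perceptron unit from a sigmoid unit; the smooth choices matter later, once gradients are needed.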
Feedforward Networks
- We consider acyclic networks with one or more computational layers
- The entire network can be viewed as computing a complex non-linear function
- Typical uses in learning:
  - Classification (usually involving complex patterns)
  - General continuous function approximation

Perceptron
- A single node (neuron) with inputs $x$, weights $w$, and output $Y = f(w^T x)$, where $f$ is a simple step function (sgn)

Perceptron Learning Update Rule
- We are given a set of inputs $x^{(1)} \dots x^{(n)}$
- $t^{(1)} \dots t^{(n)}$ is a set of target outputs (boolean, in $\{-1, 1\}$)
- $w$ is our set of weights; the output of the perceptron is $\mathrm{net}(x, w) = w^T x$
- $\mathrm{perceptron\_error}(x^{(i)}, w) = -\mathrm{net}(x^{(i)}, w)\, t^{(i)}$
- Goal: pick $w$ to optimize
  $\min_w \sum_{i \in \mathrm{misclassified}} \mathrm{perceptron\_error}(x^{(i)}, w)$
- Repeat until convergence, for every misclassified sample $i$:
  $w_j \leftarrow w_j + \alpha\, x_j^{(i)} t^{(i)}$
  where $i$ iterates over samples, $j$ iterates over weights, and $\alpha$ is the learning rate (can be any constant)
- Demo: http://neuron.eng.wayne.edu/java/perceptron/new38.html
- (A code sketch of this update rule appears below.)

Observations
- Linear separability is a fairly weak property, but we have other tricks:
  - Functions that are not linearly separable in one space may be linearly separable in another space
  - If we engineer the inputs to our neural network, then we change the space in which we are constructing linear separators
  - Every function has a linear separator (in some space)
  - Perhaps other network architectures will help
- [Figures: Is red linearly separable from green? Are the circles linearly separable from the squares?]
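Here is a minimal sketch of the perceptron update rule above in Python, assuming NumPy and labels in $\{-1, 1\}$ (the function and variable names are ours):

```python
import numpy as np

def train_perceptron(X, t, alpha=0.1, max_epochs=100):
    """For each misclassified sample i: w_j <- w_j + alpha * x_j^(i) * t^(i)."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(X.shape[0]):
            if np.sign(w @ X[i]) != t[i]:   # misclassified sample
                w += alpha * X[i] * t[i]    # the perceptron update
                mistakes += 1
        if mistakes == 0:                   # converged: all samples classified correctly
            return w
    return w

# Toy linearly separable problem: logical AND with +/-1 labels.
# The first column is a constant 1, acting as a bias feature.
X = np.array([[1., 0., 0.], [1., 0., 1.], [1., 1., 0.], [1., 1., 1.]])
t = np.array([-1., -1., -1., 1.])
w = train_perceptron(X, t)
print(w, np.sign(X @ w))   # learned weights and their predictions
```

Because this toy dataset is linearly separable, the loop is guaranteed to converge; on non-separable data it would simply run until max_epochs.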
Multilayer Networks
- Once people realized how simple perceptrons were, they lost interest in neural networks for a while
- Multilayer networks turn out to be much more expressive (with a smoothed step function)
  - Use a sigmoid, e.g., $\tanh(w^T x)$
- With 2 layers, we can represent any continuous function
- With 3 layers, we can represent many discontinuous functions
- The tricky part: how to adjust the weights

Smoothing Things Out
- Idea: do gradient descent on a smooth error function
- The error function is the sum of squared errors
- Consider a single training example first:
  $E = 0.5\, \mathrm{error}(X, w)^2$
  where each node $j$ computes $a_j = \sum_i w_{ij} z_i$ and $z_j = f(a_j)$

Propagating Errors
- For output units (assuming no weights on the outputs): $\delta = y - t$
- For hidden units: $\delta_j = f'(a_j) \sum_k w_{jk}\, \delta_k$,
  where the sum ranges over all nodes $k$ upstream from $j$

Putting It Together
- Apply input $x$ to the network (sum for multiple inputs)
- Compute all activation levels
- Compute the final output
- Compute $\delta$ for the output units: $\delta = y - t$
- Backpropagate the $\delta$'s to the hidden units: $\delta_j = f'(a_j) \sum_k w_{jk}\, \delta_k$
- Compute the gradient update: $\partial E / \partial w_{ij} = \delta_j z_i$
- (These steps are sketched in code below.)

Summary of Gradient Update
- The gradient calculation and parameter update have a recursive formulation
- They decompose into local message passing
- No transcendentals: $g'(x) = 1 - g(x)^2$ for $g(x) = \tanh(x)$
- Highly parallelizable
- Biologically plausible(?)
- This is the celebrated backpropagation algorithm

Good News
- Can represent any continuous function with two layers (1 hidden)
- Can represent essentially any function with 3 layers
- (But how many hidden nodes?)
- Multilayer nets are a universal approximation architecture with a highly parallelizable training algorithm
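The "putting it together" steps above can be written out compactly for one tanh hidden layer with a linear output unit. This is a sketch under those assumptions (NumPy; all names ours), using the identity $g'(x) = 1 - g(x)^2$ noted in the summary:

```python
import numpy as np

def backprop_step(x, t, W1, W2, alpha=0.05):
    """One gradient-descent step on E = 0.5*(y - t)^2 for a network
    with a tanh hidden layer and a linear output unit."""
    # Forward pass: a_j = sum_i w_ij z_i, then z_j = f(a_j)
    z1 = np.tanh(W1 @ x)
    y = W2 @ z1                          # final output

    # Output delta: delta = y - t
    d2 = y - t
    # Hidden deltas: delta_j = f'(a_j) * sum_k w_jk delta_k,
    # using f'(a_j) = 1 - tanh(a_j)^2 = 1 - z_j^2 (no transcendentals)
    d1 = (1.0 - z1 ** 2) * (W2.T @ d2)

    # Gradient update: dE/dw_ij = delta_j * z_i (W1, W2 updated in place)
    W2 -= alpha * np.outer(d2, z1)
    W1 -= alpha * np.outer(d1, x)
    return 0.5 * float(d2 @ d2)

# Toy usage: fit sin(x) on random points from [-2, 2].
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(8, 1)), rng.normal(size=(1, 8))
for _ in range(5000):
    x = rng.uniform(-2.0, 2.0, size=1)
    err = backprop_step(x, np.sin(x), W1, W2)
print("squared error on the last sample:", err)
```

Note how the hidden deltas reuse z1 rather than evaluating tanh again; that is the "no transcendentals" point from the summary slide.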
Back-prop Issues
- Backprop = gradient descent on the error function
- The function is nonlinear (= powerful)
- The function is nonlinear (= local minima)
- Big nets:
  - Many parameters
  - Many optima
  - Slow gradient descent
  - Risk of overfitting
- Biological plausibility
- Electronic plausibility
- Many NN experts became experts in numerical analysis (by necessity)

Neural Network Tricks
- Many gradient descent acceleration tricks
- Early stopping (see the sketch below)
- Methods of enforcing transformation invariance:
  - Modify the error function
  - Transform/augment the training data
  - Weight sharing
- Handcrafted network architectures

Neural Nets in Practice
- Many applications in pattern recognition tasks
- A very powerful representation, but:
  - Can overfit
  - Can fail to fit with too many parameters or poor features
- A very widely deployed AI technology, but:
  - Few open research questions (the best way to get a machine learning paper rejected: "Neural Network" in the title)
  - The connection to biology is still uncertain
  - Results are hard to interpret
- The second-best way to solve any problem:
  - Can do just about anything with enough twiddling
  - Now third or fourth behind SVMs, boosting, and ???
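Of the tricks above, early stopping is easy to make concrete: train while monitoring error on a held-out validation set, and stop once that error has not improved for a while. A minimal sketch in Python (all names ours; step_fn and val_error_fn are hypothetical callbacks supplied by the caller):

```python
def early_stopping_train(step_fn, val_error_fn, max_epochs=1000, patience=20):
    """Run step_fn once per epoch; stop when val_error_fn() has not
    improved for `patience` consecutive epochs. Returns the best epoch."""
    best_err, best_epoch, stale = float('inf'), 0, 0
    for epoch in range(max_epochs):
        step_fn()                  # one pass of gradient updates on the training set
        err = val_error_fn()       # error on held-out validation data
        if err < best_err:
            best_err, best_epoch, stale = err, epoch, 0
            # in practice: snapshot the weights here
        else:
            stale += 1
            if stale >= patience:  # validation error has stopped improving
                break
    return best_epoch, best_err

# Toy usage: a V-shaped validation curve that bottoms out at epoch 30.
errs = iter(abs(e - 30) / 30.0 for e in range(1000))
print(early_stopping_train(lambda: None, lambda: next(errs), patience=5))
```

The point of the patience parameter is to tolerate a noisy validation curve: a single bad epoch should not end training, but a sustained rise (the signature of overfitting) should.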