To appear in the IEEE Transactions on Systems, Man, and Cybernetics

Bidirectional Backpropagation

Olaoluwa Adigun, Member, IEEE, and Bart Kosko, Fellow, IEEE

Abstract—We extend backpropagation learning from ordinary unidirectional training to bidirectional training of deep multilayer neural networks. This gives a form of backward chaining or inverse inference from an observed network output to a candidate input that produced the output. The trained network learns a bidirectional mapping and can apply to some inverse problems. A bidirectional multilayer neural network can exactly represent some invertible functions. We prove that a fixed three-layer network can always exactly represent any finite permutation function and its inverse. The forward pass computes the permutation function value. The backward pass computes the function's inverse value using the same hidden neurons and weights. A joint forward-backward error function allows backpropagation learning in both directions without overwriting learning in either direction. This applies to both classification and regression. The algorithm does not require that the underlying sampled function have an inverse. The trained network tends to map an output to the centroid of its pre-image set.

Index Terms—Backpropagation learning, backward chaining, inverse problems, bidirectional associative memory, function representation and approximation.

I. BIDIRECTIONAL BACKPROPAGATION

We extend the familiar unidirectional backpropagation (BP) algorithm [1]-[5] to the bidirectional case. Unidirectional BP maps an input vector to an output vector by passing the input vector forward through the network's visible and hidden neurons and its connection weights. Bidirectional BP (B-BP) combines this forward pass with a backward pass through the same neurons and weights. It does not use two separate feedforward or unidirectional networks.

B-BP training endows a multilayered neural network N: R^n -> R^p with a form of backward inference. The forward pass gives the usual predicted neural output N(x) given a vector input x. The output vector value y = N(x) answers the what-if question that x poses: What would we observe if x occurred? What would be the effect? The backward pass answers the why question that y poses: Why did y occur? What type of input would cause y? Feedback convergence to a resonating bidirectional fixed-point attractor [6], [7] gives a long-term or equilibrium answer to both the what-if and why questions. This paper does not address the global stability of multilayered bidirectional networks.

Bidirectional neural learning applies to large-scale problems and big data because the BP algorithm scales linearly with training data. BP has time complexity O(n) for n training samples because its forward pass has complexity O(1) while its backward pass has complexity O(n). So the B-BP algorithm still has O(n) complexity because O(n) + O(n) = O(n).

Olaoluwa Adigun and Bart Kosko are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089 USA (e-mail: kosko@usc.edu). Manuscript received November 9, 2016; revised January 26, 2017.

Fig. 1: Exact bidirectional representation of a permutation map. The 3-layer bidirectional threshold network exactly represents the invertible 3-bit bipolar permutation function f in Table I. The forward pass takes the input bipolar vector x at the input layer and passes it forward through the weighted edges and the hidden layer of threshold neurons to the output layer. The backward pass feeds the output bipolar vector y back through the same weighted edges and threshold neurons. All neurons are bipolar and use zero thresholds. The bidirectional network computes y = f(x) on the forward pass and the inverse value f^{-1}(y) on the backward pass.
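Figure 1's two passes share a single set of weights: the forward pass maps x through W and then U, while the backward pass maps y back through the transposes U^T and W^T. The short sketch below illustrates only this shared-weight pass structure with bipolar threshold neurons. The layer sizes and random weights are assumed for illustration and are not the Figure 1 values, so the backward pass will not in general invert the forward pass until the weights are constructed (Section II) or learned (Section III).

import numpy as np

def threshold(v):
    # Bipolar threshold neuron with zero threshold: +1 if positive, else -1.
    return np.where(v > 0, 1.0, -1.0)

def forward(x, W, U):
    # Forward pass x -> hidden -> y through W and then U.
    a_h = threshold(W.T @ x)
    return threshold(U.T @ a_h)

def backward(y, W, U):
    # Backward pass y -> hidden -> x through the SAME weights, transposed.
    a_h = threshold(U @ y)
    return threshold(W @ a_h)

# Toy example with assumed random bipolar weights (not the paper's Figure 1 weights).
rng = np.random.default_rng(0)
W = rng.choice([-1.0, 1.0], size=(3, 4))   # input-to-hidden
U = rng.choice([-1.0, 1.0], size=(4, 3))   # hidden-to-output
x = np.array([1.0, -1.0, 1.0])
print("forward:", forward(x, W, U), " backward:", backward(forward(x, W, U), W, U))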
This linear scaling does not hold for most machine-learning algorithms. An example is the quadratic complexity O(n^2) of support-vector kernel methods [8].

We first show that multilayer bidirectional networks have sufficient power to exactly represent permutation mappings. These mappings are invertible and discrete. Then we develop the B-BP algorithms that can approximate these and other mappings if the network has enough hidden neurons.

A neural network N exactly represents a function f just in case N(x) = f(x) for all input vectors x. Exact representation is a much stronger condition than the more familiar property of function approximation: N(x) ~ f(x). Feedforward multilayer neural networks can uniformly approximate continuous functions on compact sets [9], [10]. Additive fuzzy systems are also uniform function approximators [11]. But additive

fuzzy systems have the further property that they can exactly represent any real function if it is bounded [12]. This exact representation needs only two fuzzy rules because the rules absorb the function into their fuzzy sets. This holds more generally for generalized probability mixtures because the fuzzy rules define the mixed probability densities [13], [14].

Figure 1 shows a bidirectional 3-layer network of zero-threshold neurons that exactly represents the 3-bit permutation function f in Table I, where the bracket notation [- - +] denotes the bipolar vector (-1, -1, 1). So f is a self-bijection that rearranges the 8 vectors in the bipolar hypercube {-1, 1}^3. This f is just one of the 8! or 40,320 permutation maps or rearrangements on the bipolar hypercube {-1, 1}^3. The forward pass converts an input bipolar vector in the first column of Table I to the corresponding output bipolar vector in the second column. The backward pass converts that output vector back to the input vector over the same fixed synaptic connection weights. These same weights and neurons similarly convert each of the 8 input vectors in the first column of Table I to the corresponding output vector in the second column and vice versa.

Theorem 1 states that multilayer bidirectional networks can exactly represent all finite bipolar or binary permutation functions. This result requires a hidden layer with 2^n hidden neurons for a permutation function on the bipolar hypercube {-1, 1}^n. Using so many hidden neurons is neither practical nor necessary in most real-world cases. The exact bidirectional representation in Figure 1 uses only 4 hidden threshold neurons to represent the 3-bit permutation function. This was the smallest hidden layer that we found through guesswork. Many other bidirectional representations also use fewer than 8 hidden neurons. We seek instead a practical learning algorithm that can learn bidirectional approximations from sample data. Figure 2 shows a learned bidirectional representation of the same 3-bit permutation in Table I. It uses only 3 hidden neurons. The B-BP algorithm tuned the neurons' threshold values as well as their connection weights. All the learned threshold values were near zero. We rounded them to zero to achieve the bidirectional representation with just 3 hidden neurons.

The rest of the paper derives the B-BP algorithm for regression in both directions and for classification networks and mixed classification-regression networks. This takes some care because training the same weights in one direction tends to undo the BP training in the other direction. The B-BP algorithm solves this problem by minimizing a joint error function. The error function is cross entropy for classification. It is squared error for regression. See Figure 4. The learning approximation tends to improve if we add more hidden neurons. Figure 5 shows that the B-BP training cross-entropy error falls off as the number of hidden neurons grows when learning the 5-bit permutation in Table II. Figure 6 shows a deep 8-layer bidirectional approximation of the nonlinear function f(x) = 0.5 sigma(6x + 1) + 0.5 sigma(4x - 0.2) and its inverse. The network used 6 hidden layers of bipolar logistic neurons. A bipolar logistic activation sigma scales and translates an ordinary logistic. It has the form

sigma(x) = 2 / (1 + e^{-x}) - 1.    (1)

Fig. 2: Learned bidirectional representation of the 3-bit permutation in Table I. The bidirectional backpropagation algorithm found this representation using the double-classification learning laws of Section III. All the neurons were bipolar and had zero thresholds. The zero thresholding gave exact representation of the 3-bit permutation.

The final sections show that similar B-BP algorithms hold for training two-way classification networks and mixed classification-regression networks. B-BP learning also approximates non-invertible functions.
The algorithm tends to learn the centroid of many-to-one functions. Suppose that the target function f: R^n -> R^p is not one-to-one or injective. So it has no inverse f^{-1} point mapping. But it does have a set-valued inverse or pre-image pullback mapping f^{-1}: 2^{R^p} -> 2^{R^n} such that f^{-1}(B) = {x in R^n : f(x) in B} for any B subset of R^p. Suppose that the n input training samples x_1, ..., x_n map to the same output training sample y: f^{-1}({y}) = {x_1, ..., x_n}. Then B-BP learning tends to map y to the centroid of f^{-1}({y}) because the centroid minimizes the mean-squared error of regression.

Figure 7 shows such an approximation for the non-invertible target function f(x) = sin x. The forward regression approximates sin x. The backward regression approximates the average or centroid of the two points in the pre-image set of y = sin x. Then f^{-1}({y}) = {theta, pi - theta} for 0 < theta < pi/2 if 0 < y < 1. This gives the pullback's centroid as pi/2. The centroid equals -pi/2 if -1 < y < 0.

Bidirectional BP differs from earlier neural approaches to approximating inverses. Marks et al. developed an inverse algorithm for query-based learning in binary classification [15]. The BP-based algorithm is not bidirectional but it exploits the data-weight inner-product input to neurons. It holds the weights constant while it tunes the data for a given output. Wunsch et al. have applied this inverse algorithm to problems in aerospace and elsewhere [16], [17]. Bidirectional BP also

differs from the more recent bidirectional extreme-learning-machine algorithm that uses a two-stage learning process but in a unidirectional network [18].

II. BIDIRECTIONAL EXACT REPRESENTATION OF BIPOLAR PERMUTATIONS

This section proves that there exist multilayered neural networks that can exactly bidirectionally represent some invertible functions. We first define the network variables. The proof uses threshold neurons. The B-BP algorithms use soft-threshold logistic sigmoids for hidden neurons. They use identity activations for input and output neurons.

A bidirectional neural network is a multilayer network N: X -> Y that maps the input space X to the output space Y and conversely through the same set of weights. The backward pass uses the matrix transposes of the weight matrices that the forward pass uses. Such a network is a bidirectional associative memory or BAM [6], [7]. The forward pass sends the input vector x through the weight matrix W that connects the input layer to the hidden layer. The result passes on through matrix U to the output layer. The backward pass sends the output y from the output layer back through the hidden layer to the input layer.

Let I, J, and K denote the respective numbers of input, hidden, and output neurons. Then the I x J matrix W connects the input layer to the hidden layer. The J x K matrix U connects the hidden layer to the output layer.

TABLE I: 3-Bit Bipolar Permutation Function f.

  Input x      Output t
  [+ + +]      [- - +]
  [+ + -]      [- + +]
  [+ - +]      [+ + +]
  [+ - -]      [+ - +]
  [- + +]      [- + -]
  [- + -]      [- - -]
  [- - +]      [+ - -]
  [- - -]      [+ + -]

The hidden-neuron input o^h_j has the affine form

o^h_j = sum_i w_{ij} a^x_i(x_i) + b^h_j    (2)

where weight w_{ij} connects the i-th input neuron to the j-th hidden neuron, a^x_i is the activation of the i-th input neuron, and b^h_j is the bias term of the j-th hidden neuron. The activation a^h_j of the j-th hidden neuron is a bipolar threshold:

a^h_j(o^h_j) = -1 if o^h_j <= 0, and 1 if o^h_j > 0.    (3)

The B-BP algorithm in the next section uses soft-threshold bipolar logistic functions for the hidden activations because such sigmoid functions are differentiable. The proof below also modifies the hidden thresholds to take on binary values in (14) and to fire with a slightly different condition.

The input o^y_k to the k-th output neuron from the hidden layer is also affine:

o^y_k = sum_{j=1}^{J} u_{jk} a^h_j + b^y_k    (4)

where weight u_{jk} connects the j-th hidden neuron to the k-th output neuron. Term b^y_k is the additive bias of the k-th output neuron. The output activation vector a^y gives the predicted outcome or target on the forward pass. The k-th output neuron has bipolar threshold activation a^y_k:

a^y_k(o^y_k) = -1 if o^y_k <= 0, and 1 if o^y_k > 0.    (5)

The forward pass of an input bipolar vector x from Table I through the network in Figure 1 gives an output activation vector a^y that equals the table's corresponding target vector y. The backward pass feeds y from the output layer back through the hidden layer to the input layer. Then the backward-pass input o^{hb}_j to the j-th hidden neuron is

o^{hb}_j = sum_k u_{jk} a^y_k(y_k) + b^h_j    (6)

where y_k is the output of the k-th output neuron and a^y_k is the activation of the k-th output neuron. The backward-pass activation a^{hb}_j of the j-th hidden neuron is

a^{hb}_j(o^{hb}_j) = -1 if o^{hb}_j <= 0, and 1 if o^{hb}_j > 0.    (7)

The backward-pass input o^{xb}_i to the i-th input neuron is

o^{xb}_i = sum_{j=1}^{J} w_{ij} a^{hb}_j + b^x_i    (8)

where b^x_i is the bias for the i-th input neuron. The input-layer activation a^x gives the predicted value for the backward pass. The i-th input neuron has bipolar activation

a^{xb}_i(o^{xb}_i) = -1 if o^{xb}_i <= 0, and 1 if o^{xb}_i > 0.    (9)

We can now state and prove the bidirectional representation theorem for bipolar permutations. The theorem also applies to binary permutations because the input and output neurons have bipolar threshold activations.

Theorem 1 (Exact Bidirectional Representation of Bipolar Permutation Functions): Suppose that the invertible function f: {-1, 1}^n -> {-1, 1}^n is a permutation.
Then there exists a 3-layer bidirectional neural network N: {-1, 1}^n -> {-1, 1}^n that exactly represents f in the sense that N(x) = f(x) and N^{-1}(x) = f^{-1}(x) for all x. The hidden layer has 2^n threshold neurons.

Proof: The proof strategy picks the weight matrices W and U so that exactly one hidden neuron fires on both the forward and the backward pass. Figure 3 illustrates the proof technique for the special case of a 3-bit bipolar permutation. So we structure

the network such that any input vector x fires only one hidden neuron on the forward pass and such that the output vector y = N(x) fires only the same hidden neuron on the backward pass.

The bipolar permutation f is a bijective map of the bipolar hypercube {-1, 1}^n into itself. The bipolar hypercube contains the 2^n input bipolar column vectors x_1, x_2, ..., x_{2^n}. It likewise contains the 2^n output bipolar vectors y_1, y_2, ..., y_{2^n}. The network will use 2^n corresponding hidden threshold neurons. So J = 2^n. Matrix W connects the input layer to the hidden layer. Matrix U connects the hidden layer to the output layer. Define W so that its columns list all 2^n bipolar input vectors. Define U so that its rows list all 2^n transposed bipolar output vectors:

W = [ x_1  x_2  ...  x_{2^n} ]    and    U = [ y_1^T ; y_2^T ; ... ; y_{2^n}^T ].

We now show that this arrangement fires only one hidden neuron and that the forward pass of any input vector x_m gives the corresponding output vector y_m. Assume that every neuron has zero bias.

Pick a bipolar input vector x_m for the forward pass. Then the input activation vector a^x(x_m) = (a^x_1(x^1_m), ..., a^x_n(x^n_m)) equals the input bipolar vector x_m because the input activations (9) are bipolar threshold functions with zero threshold. So a^x equals x_m because the vector space is the bipolar hypercube {-1, 1}^n. The hidden-layer input o^h is the same as (2). It has the matrix-vector form

o^h = W^T a^x    (10)
    = W^T x_m    (11)
    = (o^h_1, o^h_2, ..., o^h_j, ..., o^h_{2^n})^T    (12)
    = (x_1^T x_m, x_2^T x_m, ..., x_j^T x_m, ..., x_{2^n}^T x_m)^T    (13)

from the definition of W since o^h_j is the inner product of x_j and x_m. The input o^h_j to the j-th neuron of the hidden layer obeys o^h_j = n when j = m and o^h_j < n when j differs from m. This holds because the vectors x_j are bipolar with scalar components in {-1, 1}. The squared magnitude of a bipolar vector in {-1, 1}^n is n. The inner product x_j^T x_m is a maximum when both vectors have the same direction. This occurs when j = m. The inner product is otherwise less than n.

Figure 3 shows a bidirectional neural network that fires just the sixth hidden neuron. The weight matrices W and U for the network in Figure 3 list the input and output columns of Table I in just this way.

Fig. 3: Bidirectional network structure for the proof of Theorem 1. The input and output layers have n threshold neurons while the hidden layer has 2^n neurons with threshold values of n. The 8 fan-in 3-vectors of weights in W from the input to the hidden layer list the 2^3 elements of the bipolar cube {-1, 1}^3 and thus the 8 vectors in the input column of Table I. The 8 fan-in 3-vectors of weights in U from the output to the hidden layer list the 8 bipolar vectors in the output column of Table I. The threshold value for the sixth and highlighted hidden neuron is 3. Passing the sixth input vector (-1, 1, -1) through W leads to the vector of thresholded hidden units (0, 0, 0, 0, 0, 1, 0, 0). Passing this 8-bit vector through U produces after thresholding the sixth output vector (-1, -1, -1) in Table I. Passing this output vector back through the transpose of U produces the same unit bit vector of thresholded hidden-unit values. Passing this vector back through the transpose of W produces the original bipolar vector (-1, 1, -1).

Now comes the key step in the proof. Define the hidden activation a^h_j as a binary (not bipolar) threshold function where n is the threshold value:

a^h_j(o^h_j) = 1 if o^h_j >= n, and 0 if o^h_j < n.    (14)

Then the hidden-layer activation a^h is the unit bit vector (0, 0, ..., 1, ..., 0)^T where a^h_j = 1 when j = m and where a^h_j = 0 when j differs from m. This holds because all 2^n bipolar vectors x_m in {-1, 1}^n are distinct. So exactly one of these 2^n vectors achieves the maximal inner-product value n = x_m^T x_m. See Figure 3 for details in the case of representing the 3-bit bipolar permutation in Table I.
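A short script can check this construction numerically for n = 3: stack all 2^n bipolar vectors as the columns of W, stack their permuted images as the rows of U, and use the binary hidden threshold of (14) with threshold value n. The permutation below is an assumed cyclic relabeling of the 8 cube vectors rather than the specific function of Table I, so only the construction itself is being tested.

import numpy as np
from itertools import product

n = 3
cube = np.array(list(product([-1, 1], repeat=n)), dtype=float)  # all 2^n bipolar vectors
perm = np.roll(np.arange(2 ** n), 1)          # assumed permutation: a cyclic relabeling
X = cube                                      # inputs x_j, one per row
Y = cube[perm]                                # outputs y_j = f(x_j), one per row

W = X.T                                       # n x 2^n: columns list all inputs
U = Y                                         # 2^n x n: rows list all outputs

def hidden(o):
    # Binary (not bipolar) threshold with threshold value n, as in (14).
    return (o >= n).astype(float)

def forward(x):
    a_h = hidden(W.T @ x)                     # exactly one hidden neuron fires
    return np.sign(U.T @ a_h)                 # bipolar output threshold

def backward(y):
    a_h = hidden(U @ y)                       # the same hidden neuron fires
    return np.sign(W @ a_h)

assert all(np.array_equal(forward(x), y) for x, y in zip(X, Y))
assert all(np.array_equal(backward(y), x) for x, y in zip(X, Y))
print("exact bidirectional representation verified for all", 2 ** n, "cube vectors")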

The input vector o^y to the output layer is

o^y = U^T a^h    (15)
    = sum_{j=1}^{J} y_j a^h_j    (16)
    = y_m    (17)

where a^h_j is the activation of the j-th hidden neuron. The activation a^y_k of the output layer is

a^y_k(o^y_k) = 1 if o^y_k >= 0, and -1 if o^y_k < 0.    (18)

The output-layer activation leaves o^y unchanged because o^y equals y_m and because y_m is a vector in {-1, 1}^n. So

a^y = y_m.    (19)

So the forward pass of an input vector x_m through the network yields the desired corresponding output vector y_m where y_m = f(x_m) for the bipolar permutation map f.

Consider next the backward pass over N. The backward pass propagates the output vector y_m from the output layer to the input layer through the hidden layer. The hidden-layer input o^{hb} has the form (6) and so

o^{hb} = U y_m    (20)

where o^{hb} = (y_1^T y_m, y_2^T y_m, ..., y_j^T y_m, ..., y_{2^n}^T y_m)^T. The input o^{hb}_j of the j-th neuron in the hidden layer equals the inner product of y_j and y_m. So o^{hb}_j = n when j = m and o^{hb}_j < n when j differs from m. This holds because again the squared magnitude of a bipolar vector in {-1, 1}^n is n. The inner product o^{hb}_j is a maximum when the vectors y_m and y_j lie in the same direction. The activation a^{hb} for the hidden layer has the same components as (14). So the hidden-layer activation a^{hb} again equals the unit bit vector (0, 0, ..., 1, ..., 0)^T where a^{hb}_j = 1 when j = m and a^{hb}_j = 0 when j differs from m. Then the input vector o^{xb} for the input layer is

o^{xb} = W a^{hb}    (21)
       = sum_{j=1}^{J} x_j a^{hb}_j    (22)
       = x_m.    (23)

The i-th input neuron has a threshold activation that is the same as (18):

a^{xb}_i(o^{xb}_i) = 1 if o^{xb}_i >= 0, and -1 if o^{xb}_i < 0    (24)

where o^{xb}_i is the input of the i-th neuron in the input layer. This activation leaves o^{xb} unchanged because o^{xb} equals x_m and because the vector x_m lies in {-1, 1}^n. So

a^{xb} = o^{xb}    (25)
       = x_m.    (26)

So the backward pass of any target vector y_m yields the desired input vector x_m where f^{-1}(y_m) = x_m. This completes the backward pass and the proof.

III. BIDIRECTIONAL BACKPROPAGATION ALGORITHMS

A. Double Regression

We now derive the first of three bidirectional BP learning algorithms. The first case is double regression where the network performs regression in both directions. Bidirectional BP training minimizes both the forward error E_f and the backward error E_b. B-BP alternates between backward training and forward training. Forward training minimizes E_f while holding E_b constant. Backward training minimizes E_b while holding E_f constant. E_f is the error at the output layer. E_b is the error at the input layer. Double regression uses squared error for both error functions.

The forward pass sends the input vector x through the hidden layer to the output layer. We use only one hidden layer for simplicity and without loss of generality. The B-BP double-regression algorithm applies to any number of hidden layers in a deep network. The hidden-layer input values o^h_j are the same as (2). The j-th hidden activation a^h_j is the ordinary binary logistic map:

a^h_j(o^h_j) = 1 / (1 + e^{-o^h_j})    (27)

and (4) gives the input o^y_k to the k-th output neuron. The hidden activations can be logistic or any other sigmoidal function so long as it is differentiable. The activation for an output neuron is the identity function:

a^y_k = o^y_k    (28)

where a^y_k is the activation of the k-th output neuron. The error function E_f for the forward pass is squared error:

E_f = (1/2) sum_k (y_k - a^y_k)^2    (29)

where y_k denotes the value of the k-th neuron in the output layer. Ordinary unidirectional BP updates the weights and other network parameters by propagating the error from the output layer back to the input layer.

The backward pass sends the output vector y from the output layer to the input layer through the hidden layer. The input o^{hb}_j to the j-th hidden neuron is the same as (6). The activation a^{hb}_j of the j-th hidden neuron is

a^{hb}_j = 1 / (1 + e^{-o^{hb}_j}).    (30)

The input o^{xb}_i for the i-th input neuron is the same as (8). The activation at the input layer is the identity function:
a^{xb}_i = o^{xb}_i.    (31)

A nonlinear sigmoid (or Gaussian) activation can replace the linear function. The backward-pass error E_b is similarly the squared error

E_b = (1/2) sum_i (x_i - a^{xb}_i)^2.    (32)
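The minimal sketch below wires these pieces together for one hidden layer: the forward pass of (2), (4), (27), and (28), the backward pass of (6), (8), (30), and (31), and the two squared errors (29) and (32). The layer sizes, random weights, and sample vectors are assumed placeholders.

import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward_pass(x, W, U, b_h, b_y):
    a_h = logistic(W.T @ x + b_h)      # (2), (27): hidden input and logistic activation
    a_y = U.T @ a_h + b_y              # (4), (28): output input and identity activation
    return a_h, a_y

def backward_pass(y, W, U, b_h, b_x):
    a_hb = logistic(U @ y + b_h)       # (6), (30): backward hidden input and activation
    a_xb = W @ a_hb + b_x              # (8), (31): backward input-layer identity value
    return a_hb, a_xb

rng = np.random.default_rng(1)
I, J, K = 4, 6, 2                              # assumed layer sizes
W, U = rng.normal(size=(I, J)), rng.normal(size=(J, K))
b_x, b_h, b_y = np.zeros(I), np.zeros(J), np.zeros(K)
x, y = rng.normal(size=I), rng.normal(size=K)  # one assumed training pair

_, a_y = forward_pass(x, W, U, b_h, b_y)
_, a_xb = backward_pass(y, W, U, b_h, b_x)
E_f = 0.5 * np.sum((y - a_y) ** 2)             # forward squared error (29)
E_b = 0.5 * np.sum((x - a_xb) ** 2)            # backward squared error (32)
print(E_f, E_b)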

The partial derivative of the hidden-layer activation in the forward direction is

da^h_j/do^h_j = d/do^h_j [ 1 / (1 + e^{-o^h_j}) ]    (33)
             = e^{-o^h_j} / (1 + e^{-o^h_j})^2    (34)
             = [ 1 / (1 + e^{-o^h_j}) ] [ 1 - 1 / (1 + e^{-o^h_j}) ]    (35)
             = a^h_j (1 - a^h_j).    (36)

Let a^h_j' denote the derivative of a^h_j with respect to the inner-product term o^h_j. We again use the superscript b to denote the backward pass. Then the partial derivative of E_f with respect to weight u_{jk} is

dE_f/du_{jk} = d/du_{jk} [ (1/2) sum_k (y_k - a^y_k)^2 ]    (37)
            = (dE_f/da^y_k) (da^y_k/do^y_k) (do^y_k/du_{jk})    (38)
            = (a^y_k - y_k) a^h_j.    (39)

The partial derivative of E_f with respect to w_{ij} is

dE_f/dw_{ij} = d/dw_{ij} [ (1/2) sum_{k=1}^{K} (y_k - a^y_k)^2 ]    (40)
            = (dE_f/da^h_j) (da^h_j/do^h_j) (do^h_j/dw_{ij})    (41)
            = [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j' x_i    (42)

where a^h_j' is the same as in (36). The partial derivative of E_f with respect to the bias b^y_k of the k-th output neuron is

dE_f/db^y_k = d/db^y_k [ (1/2) sum_k (y_k - a^y_k)^2 ]    (43)
           = (dE_f/da^y_k) (da^y_k/do^y_k) (do^y_k/db^y_k)    (44)
           = a^y_k - y_k.    (45)

The partial derivative of E_f with respect to the bias b^h_j of the j-th hidden neuron is

dE_f/db^h_j = d/db^h_j [ (1/2) sum_{k=1}^{K} (y_k - a^y_k)^2 ]    (46)
           = (dE_f/da^h_j) (da^h_j/do^h_j) (do^h_j/db^h_j)    (47)
           = [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j'    (48)

where a^h_j' is the same as (36).

The partial derivative of the hidden-layer activation in the backward direction is

da^{hb}_j/do^{hb}_j = d/do^{hb}_j [ 1 / (1 + e^{-o^{hb}_j}) ]    (49)
                   = e^{-o^{hb}_j} / (1 + e^{-o^{hb}_j})^2    (50)
                   = [ 1 / (1 + e^{-o^{hb}_j}) ] [ 1 - 1 / (1 + e^{-o^{hb}_j}) ]    (51)
                   = a^{hb}_j (1 - a^{hb}_j).    (52)

The partial derivative of E_b with respect to w_{ij} is

dE_b/dw_{ij} = d/dw_{ij} [ (1/2) sum_i (x_i - a^{xb}_i)^2 ]    (53)
            = (dE_b/da^{xb}_i) (da^{xb}_i/do^{xb}_i) (do^{xb}_i/dw_{ij})    (54)
            = (a^{xb}_i - x_i) a^{hb}_j.    (55)

The partial derivative of E_b with respect to u_{jk} is

dE_b/du_{jk} = d/du_{jk} [ (1/2) sum_i (x_i - a^{xb}_i)^2 ]    (56)
            = sum_i (dE_b/do^{xb}_i) (do^{xb}_i/da^{hb}_j) (da^{hb}_j/do^{hb}_j) (do^{hb}_j/du_{jk})    (57)
            = [ sum_i (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j' y_k    (58)

where a^{hb}_j' is the same as in (52). The partial derivative of E_b with respect to the bias b^x_i of the i-th input neuron is

dE_b/db^x_i = d/db^x_i [ (1/2) sum_i (x_i - a^{xb}_i)^2 ]    (59)
           = (dE_b/da^{xb}_i) (da^{xb}_i/do^{xb}_i) (do^{xb}_i/db^x_i)    (60)
           = a^{xb}_i - x_i.    (61)

The partial derivative of E_b with respect to the bias b^h_j of the j-th hidden neuron is

dE_b/db^h_j = d/db^h_j [ (1/2) sum_i (x_i - a^{xb}_i)^2 ]    (62)
           = sum_i (dE_b/do^{xb}_i) (do^{xb}_i/da^{hb}_j) (da^{hb}_j/do^{hb}_j) (do^{hb}_j/db^h_j)    (63)
           = [ sum_i (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j'    (64)

where a^{hb}_j' is the same as (52). The input-neuron bias b^x_i updates only with the backward error E_b because the forward pass clamps the input activation to x. The output-neuron bias b^y_k updates only with the forward error E_f because the backward pass clamps the output activation to y.
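A finite-difference check is a quick way to confirm these closed forms. The sketch below compares the analytic derivative of (39), dE_f/du_{jk} = (a^y_k - y_k) a^h_j, with a central-difference estimate on an assumed random network; the same style of check applies to (42) through (64).

import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def E_f(x, y, W, U, b_h, b_y):
    a_h = logistic(W.T @ x + b_h)      # (2), (27)
    a_y = U.T @ a_h + b_y              # (4), (28)
    return 0.5 * np.sum((y - a_y) ** 2), a_h, a_y

rng = np.random.default_rng(2)
I, J, K = 3, 5, 2                      # assumed layer sizes
W, U = rng.normal(size=(I, J)), rng.normal(size=(J, K))
b_h, b_y = rng.normal(size=J), rng.normal(size=K)
x, y = rng.normal(size=I), rng.normal(size=K)

_, a_h, a_y = E_f(x, y, W, U, b_h, b_y)
analytic = np.outer(a_h, a_y - y)      # (39): entry [j, k] is (a_y_k - y_k) * a_h_j

numeric = np.zeros_like(U)
eps = 1e-6
for j in range(J):
    for k in range(K):
        Up, Um = U.copy(), U.copy()
        Up[j, k] += eps
        Um[j, k] -= eps
        numeric[j, k] = (E_f(x, y, W, Up, b_h, b_y)[0]
                         - E_f(x, y, W, Um, b_h, b_y)[0]) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))   # should be near round-off scale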

The above update laws for forward regression have the final form:

u_{jk}^{(n+1)} = u_{jk}^{(n)} - eta (a^y_k - y_k) a^h_j    (65)
w_{ij}^{(n+1)} = w_{ij}^{(n)} - eta [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j' x_i    (66)
b^h_j^{(n+1)} = b^h_j^{(n)} - eta [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j'    (67)
b^y_k^{(n+1)} = b^y_k^{(n)} - eta (a^y_k - y_k).    (68)

The dual update laws for backward regression have the final form:

u_{jk}^{(n+1)} = u_{jk}^{(n)} - eta [ sum_i (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j' y_k    (69)
w_{ij}^{(n+1)} = w_{ij}^{(n)} - eta (a^{xb}_i - x_i) a^{hb}_j    (70)
b^x_i^{(n+1)} = b^x_i^{(n)} - eta (a^{xb}_i - x_i)    (71)
b^h_j^{(n+1)} = b^h_j^{(n)} - eta [ sum_i (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j'.    (72)

The B-BP training minimizes E_f while holding E_b constant. It then minimizes E_b while holding E_f constant. Equations (65)-(68) state the update rules for forward training. Equations (69)-(72) state the update rules for backward training. Forward training minimizes E_f while keeping E_b constant. Backward training minimizes E_b while keeping E_f constant. Each training iteration involves forward training and then backward training. Algorithm 1 summarizes the B-BP algorithm. The algorithm explains how to combine forward and backward training in B-BP. Figure 6 shows how double-regression B-BP approximates the invertible function f(x) = 0.5 sigma(6x + 1) + 0.5 sigma(4x - 0.2) where sigma(x) is the logistic function. The approximation used a deep 8-layer network with 6 layers of bipolar logistic neurons. The input and output layer each contained only a single identity neuron.

B. Double Classification

We now derive a B-BP algorithm where the network's forward pass acts as a classifier network and so does its backward pass. We call this double classification. We present the derivation in terms of cross entropy for the sake of simplicity. But our double-classification simulations used the slightly more general form of cross entropy in (114) that we call logistic cross entropy. The simpler cross-entropy derivation applies to softmax input neurons and output neurons (with implied 1-in-K coding). But logistic input and output neurons require logistic cross entropy for the same BP derivation.

The simplest double classification uses Gibbs or softmax neurons at both the input and output layer to create a winner-take-all structure at those layers. Then the k-th softmax neuron in the output layer codes for the k-th input pattern and represents it as a K-length unit bit vector with a 1 in the k-th slot and a 0 in the other K - 1 slots [3], [19]. The same 1-in-I binary encoding holds for the i-th neuron at the input layer. The softmax structure implies that the input and output fields each compute a discrete probability distribution for each input.

Classification networks differ from regression networks in another key aspect: They do not minimize squared error. They instead minimize the cross entropy of the given target vector and the softmax activation values of the output or input layer [3]. Equation (79) states the forward cross entropy at the output layer where y_k is the desired or target value of the k-th output neuron and where a^y_k is its actual softmax activation value. The entropy structure applies because both the target vector and the input/output vector are probability vectors. Minimizing the cross entropy minimizes the Kullback-Leibler divergence [20] and vice versa [19].

The classification BP algorithm depends on another optimization equivalence: Minimizing the cross entropy is equivalent to maximizing the network's likelihood or log-likelihood [19]. We will establish this equivalence because it implies that the BP learning laws have the same form for both classification and regression. We will prove the equivalence only for the forward direction but it applies equally in the backward direction. The result both unifies the BP learning laws and allows noise-boosts and other methods to enhance the network likelihood since BP is a special case [19], [21] of the Expectation-Maximization algorithm for iteratively maximizing a likelihood with missing data or hidden variables [22].

Denote the network's forward probability density function as p_f(y|x, Theta).
The input vector x passes through the network with total parameter vector Theta and produces the output vector y. Then the network's forward log-likelihood L_f(Theta) is the natural logarithm of the forward network probability: L_f(Theta) = ln p_f(y|x, Theta). We will show that p_f(y|x, Theta) = exp{-E_f(Theta)}. So BP's forward pass computes the forward cross entropy as it maximizes the likelihood [19]. The key assumption is that the output softmax neurons in a classifier network are independent because there are no intra-layer connections among them. Then the network probability density p_f(y|x, Theta) factors into a product of K-many marginals [3]: p_f(y|x, Theta) = prod_{k=1}^{K} p_f(y_k|x, Theta). This gives

L_f(Theta) = ln p_f(y|x, Theta)    (73)
           = sum_{k=1}^{K} ln p_f(y_k|x, Theta)    (74)
           = ln prod_{k=1}^{K} (a^y_k)^{y_k}    (75)
           = sum_k y_k ln a^y_k    (76)
           = -E_f(Theta)    (77)

from (79). Then exponentiation gives p_f(y|x, Theta) = exp{-E_f(Theta)}. Minimizing the forward cross entropy E_f is equivalent to maximizing the negative cross entropy -E_f. So minimizing E_f maximizes the forward network likelihood L_f and vice versa.
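This equivalence is easy to verify numerically. With a 1-in-K target the cross entropy (79) reduces to the negative log of the softmax probability of the correct class, so exp(-E_f) recovers p_f(y|x, Theta). The logits below are arbitrary assumed values.

import numpy as np

o_y = np.array([2.0, -1.0, 0.5])          # assumed output-layer inputs (logits)
a_y = np.exp(o_y) / np.sum(np.exp(o_y))   # softmax activations (78)
y = np.array([1.0, 0.0, 0.0])             # 1-in-K target: class 1 is correct

E_f = -np.sum(y * np.log(a_y))            # forward cross entropy (79)
log_likelihood = np.log(a_y[0])           # ln p_f(y | x, Theta) for the correct class

print(np.isclose(E_f, -log_likelihood))   # True: minimizing E_f maximizes the likelihood
print(np.isclose(np.exp(-E_f), a_y[0]))   # True: p_f(y | x, Theta) = exp(-E_f)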

The third equality (75) holds because the k-th marginal factor p_f(y_k|x, Theta) in a classifier network equals the exponentiated softmax activation (a^y_k)^{y_k} since y_k = 1 if k is the correct class label for the input pattern x and y_k = 0 otherwise. This defines an output categorical distribution or a single-sample multinomial.

We now derive the B-BP algorithm for double classification. The algorithm minimizes the error functions separately where E_f(Theta) is the forward cross entropy in (79) and E_b(Theta) is the backward cross entropy in (81). We first derive the forward B-BP classifier algorithm and then derive the backward B-BP algorithm.

The forward pass sends the input vector x through the hidden layer or layers to the output layer. The input activation vector a^x is the vector x. We assume only one hidden layer for simplicity. The derivation applies to deep networks with any number of hidden layers. The input o^h_j to the j-th hidden neuron has the same linear form as in (2). The j-th hidden activation a^h_j is the same ordinary unit-interval-valued logistic function in (27). The input o^y_k to the k-th output neuron is the same as in (4). The hidden activations can also be hyperbolic tangent or any other differentiable monotone nondecreasing sigmoid function. The forward classifier's output-layer neurons use Gibbs or softmax activations:

a^y_k = e^{o^y_k} / sum_{l=1}^{K} e^{o^y_l}    (78)

where a^y_k is the activation of the k-th output neuron. Then the forward error E_f is the cross entropy

E_f = - sum_k y_k ln a^y_k    (79)

between the binary target values y_k and the actual output activations a^y_k.

We next describe the backward pass through the classifier network. The backward pass sends the output target vector y through the hidden layer to the input layer. So the initial activation vector a^y equals the target vector y. The input o^{hb}_j to the j-th neuron of the hidden layer has the same linear form as (6). The activation of the j-th hidden neuron is the same as (30). The backward-pass input o^{xb}_i to the i-th input neuron is the same as (8). The input activation is Gibbs or softmax:

a^{xb}_i = e^{o^{xb}_i} / sum_{l=1}^{I} e^{o^{xb}_l}    (80)

where a^{xb}_i is the backward-pass activation of the i-th input neuron. Then the backward error E_b is the cross entropy

E_b = - sum_i x_i ln a^{xb}_i    (81)

where x_i is the target value of the i-th input neuron.

The partial derivatives of the hidden activations a^h_j and a^{hb}_j are the same as (33) and (49). The partial derivative of the output activation a^y_k for the forward classification pass is

da^y_k/do^y_k = d/do^y_k [ e^{o^y_k} / sum_{l=1}^{K} e^{o^y_l} ]    (82)
             = [ e^{o^y_k} sum_{l=1}^{K} e^{o^y_l} - e^{o^y_k} e^{o^y_k} ] / ( sum_{l=1}^{K} e^{o^y_l} )^2    (83)
             = [ e^{o^y_k} ( sum_{l=1}^{K} e^{o^y_l} - e^{o^y_k} ) ] / ( sum_{l=1}^{K} e^{o^y_l} )^2    (84)
             = a^y_k (1 - a^y_k).    (85)

The derivative when l differs from k is

da^y_k/do^y_l = d/do^y_l [ e^{o^y_k} / sum_{m=1}^{K} e^{o^y_m} ]    (86)
             = - e^{o^y_k} e^{o^y_l} / ( sum_{l=1}^{K} e^{o^y_l} )^2    (87)
             = - a^y_k a^y_l.    (88)

So the derivative of a^y_k with respect to o^y_l is

da^y_k/do^y_l = - a^y_k a^y_l if l differs from k, and a^y_k (1 - a^y_k) if l = k.    (89)

We denote this derivative as a^y_k'. The derivative a^{xb}_i' of the backward classification pass has the same form because both classifier layers use softmax activations.

The partial derivative of the forward cross entropy E_f with respect to u_{jk} is

dE_f/du_{jk} = d/du_{jk} [ - sum_l y_l ln a^y_l ]    (90)
            = - sum_l (y_l / a^y_l) (da^y_l/do^y_k) (do^y_k/du_{jk})    (91)
            = - [ y_k (1 - a^y_k) - sum_{l not k} y_l a^y_k ] a^h_j    (92)
            = (a^y_k - y_k) a^h_j    (93)

since the target probabilities y_l sum to one. The partial derivative of the forward cross entropy E_f with respect to the bias b^y_k of the k-th output neuron is

dE_f/db^y_k = d/db^y_k [ - sum_l y_l ln a^y_l ]    (94)
           = - sum_l (y_l / a^y_l) (da^y_l/do^y_k) (do^y_k/db^y_k)    (95)
           = - [ y_k (1 - a^y_k) - sum_{l not k} y_l a^y_k ]    (96)
           = a^y_k - y_k.    (97)

Equations (93) and (97) show that the derivatives of E_f with respect to u_{jk} and b^y_k for double classification are the same as for double regression in (39) and (45). The activations of the hidden neurons are the same as for double regression. So the

derivatives of E_f with respect to w_{ij} and b^h_j are the same as the respective ones in (42) and (48).

The partial derivative of E_b with respect to w_{ij} is

dE_b/dw_{ij} = d/dw_{ij} [ - sum_l x_l ln a^{xb}_l ]    (98)
            = - sum_l (x_l / a^{xb}_l) (da^{xb}_l/do^{xb}_i) (do^{xb}_i/dw_{ij})    (99)
            = - [ x_i (1 - a^{xb}_i) - sum_{l not i} x_l a^{xb}_i ] a^{hb}_j    (100)
            = (a^{xb}_i - x_i) a^{hb}_j.    (101)

The partial derivative of E_b with respect to the bias b^x_i of the i-th input neuron is

dE_b/db^x_i = d/db^x_i [ - sum_l x_l ln a^{xb}_l ]    (102)
           = - sum_l (x_l / a^{xb}_l) (da^{xb}_l/do^{xb}_i) (do^{xb}_i/db^x_i)    (103)
           = - [ x_i (1 - a^{xb}_i) - sum_{l not i} x_l a^{xb}_i ]    (104)
           = a^{xb}_i - x_i.    (105)

Equations (101) and (105) likewise show that the derivatives of E_b with respect to w_{ij} and b^x_i for double classification are the same as for double regression in (53) and (59). The activations of the hidden neurons are the same as for double regression. So the derivatives of E_b with respect to u_{jk} and b^h_j are the same as the respective ones in (56) and (62).

The bidirectional training of BP for double classification alternates between minimizing E_f while holding E_b constant and minimizing E_b while holding E_f constant. The forward and backward errors here are cross entropies. The update laws for forward classification have the final form:

u_{jk}^{(n+1)} = u_{jk}^{(n)} - eta (a^y_k - y_k) a^h_j    (106)
w_{ij}^{(n+1)} = w_{ij}^{(n)} - eta [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j' x_i    (107)
b^h_j^{(n+1)} = b^h_j^{(n)} - eta [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j'    (108)
b^y_k^{(n+1)} = b^y_k^{(n)} - eta (a^y_k - y_k).    (109)

The dual update laws for backward classification have the final form:

u_{jk}^{(n+1)} = u_{jk}^{(n)} - eta [ sum_i (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j' y_k    (110)
w_{ij}^{(n+1)} = w_{ij}^{(n)} - eta (a^{xb}_i - x_i) a^{hb}_j    (111)
b^x_i^{(n+1)} = b^x_i^{(n)} - eta (a^{xb}_i - x_i)    (112)
b^h_j^{(n+1)} = b^h_j^{(n)} - eta [ sum_i (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j'.    (113)

The derivation shows that the update rules for double classification are the same as the update rules for double regression. The B-BP training minimizes E_f while holding E_b constant and then minimizes E_b while holding E_f constant. Equations (106)-(109) are the update rules for forward training. Equations (110)-(113) are the update rules for backward training. Forward training minimizes E_f while keeping E_b constant. Backward training minimizes E_b while keeping E_f constant. Each training iteration involves running the forward training and then the backward training. Algorithm 1 summarizes the B-BP algorithm.

The more general case of double classification uses logistic neurons at the input and output layer. Then the BP derivation requires the slightly more general logistic-cross-entropy performance measure. We used logistic cross entropy E_log for double-classification training because the input and output neurons were logistic (rather than softmax):

E_log = - sum_{k=1}^{K} [ y_k ln a^y_k + (1 - y_k) ln(1 - a^y_k) ].    (114)

C. Mixed Case: Classification and Regression

We last derive the B-BP learning algorithm for the mixed case of a neural classifier network in the forward direction and a regression network in the backward direction. This mixed case describes the common case of neural image classification. The user need only add backward-regression training to allow the same classifier net to predict which image input produced a given output classification. Backward regression estimates this answer as the centroid of the inverse set-theoretic mapping or pre-image. The B-BP algorithm achieves this by alternating between minimizing E_f and minimizing E_b. The forward error E_f is the same as the cross entropy in the double-classification network above. The backward error E_b is the same as the squared error in the double-regression network of the previous section. The input space is likewise the I-dimensional real space R^I for regression. The output space uses 1-in-K binary encoding for classification. Regression networks use identity functions as activation functions. Classifier networks use softmax activations.

The forward pass sends the input vector x through the hidden layer to the output layer. The input activation vector equals x. We again consider only a single hidden layer for simplicity. The input o^h_j to the j-th hidden neuron is the same as (2). The activation a^h_j of the j-th hidden neuron is the ordinary logistic activation in (27). Equation (4) represents the input o^y_k to the k-th output neuron. The output activation is softmax.
So the output activation a^y_k is the same as in (78). The forward error E_f is the cross entropy in (79). The forward pass in this case is the same as the forward pass for double classification. So (93) and (97) give the derivatives of the forward error E_f with respect to u_{jk} and b^y_k, and (42) and (48) give them with respect to w_{ij} and b^h_j.

The backward pass propagates the 1-in-K vector y from the output layer through the hidden layer to the input layer. The output-layer activation vector a^y equals y. The input o^{hb}_j to the

j-th hidden neuron for the backward pass is the same as (6), and (30) gives the activation a^{hb}_j for the j-th hidden unit in the backward pass. Equation (8) gives the input o^{xb}_i for the i-th input neuron. The activation a^{xb}_i of the i-th input neuron for the backward pass is the same as (31). The backward error E_b is the squared error in (32). The backward pass in this case is the same as the backward pass for double regression. So (55), (58), (61), and (64) give the derivatives of the backward error E_b with respect to w_{ij}, u_{jk}, b^x_i, and b^h_j.

The update laws for forward classification training have the final form:

u_{jk}^{(n+1)} = u_{jk}^{(n)} - eta (a^y_k - y_k) a^h_j    (115)
w_{ij}^{(n+1)} = w_{ij}^{(n)} - eta [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j' x_i    (116)
b^h_j^{(n+1)} = b^h_j^{(n)} - eta [ sum_{k=1}^{K} (a^y_k - y_k) u_{jk} ] a^h_j'    (117)
b^y_k^{(n+1)} = b^y_k^{(n)} - eta (a^y_k - y_k).    (118)

The update laws for backward regression training have the final form:

u_{jk}^{(n+1)} = u_{jk}^{(n)} - eta [ sum_{i=1}^{I} (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j' y_k    (119)
w_{ij}^{(n+1)} = w_{ij}^{(n)} - eta (a^{xb}_i - x_i) a^{hb}_j    (120)
b^x_i^{(n+1)} = b^x_i^{(n)} - eta (a^{xb}_i - x_i)    (121)
b^h_j^{(n+1)} = b^h_j^{(n)} - eta [ sum_{i=1}^{I} (a^{xb}_i - x_i) w_{ij} ] a^{hb}_j'.    (122)

B-BP training minimizes E_f while holding E_b constant and then minimizes E_b while holding E_f constant. Equations (115)-(118) state the update rules for forward training. Equations (119)-(122) state the update rules for backward training. Algorithm 1 shows how forward learning combines with backward learning in B-BP.

IV. SIMULATION RESULTS

TABLE II: 5-Bit Bipolar Permutation Function on {-1, 1}^5. The two columns list the 32 input vectors x and their permuted output vectors t = f(x).

Fig. 4: Logistic-cross-entropy-based learning for double classification on a 3-layer network with forward BP training, backward BP training, and bidirectional BP training. The trained network represents the 5-bit permutation function in Table II. (a) Forward BP tuned the network with respect to the logistic cross entropy for the forward pass E_f only. (b) Backward BP training tuned the network with respect to the logistic cross entropy for the backward pass E_b only. (c) Bidirectional BP training summed the logistic cross entropy for both the forward pass E_f and the backward pass E_b to update the network parameters.

We tested the B-BP algorithm for double classification on a 5-bit permutation function. We used 3-layer networks with different numbers of hidden neurons. The neurons used bipolar logistic activations. The performance measure was logistic cross entropy.

Fig. 5: B-BP training error for the 5-bit permutation in Table II using different numbers of hidden neurons. Training used the double-classification algorithm. The two curves describe the logistic cross entropy for the forward and backward passes through the 3-layer network. Each test used 64 samples. The number of hidden neurons increased from 5, 10, 20, 50, to 100.

Fig. 7: Bidirectional backpropagation double-regression learning for the non-invertible target function f(x) = sin x. (a) The forward pass learns the function y = f(x) = sin x. (b) The backward pass approximates the centroid of the values in the set-theoretic pre-image f^{-1}({y}) for y values in (-1, 1). The two centroids are pi/2 and -pi/2.

Fig. 6: B-BP double-regression approximation of the invertible function f(x) = 0.5 sigma(6x + 1) + 0.5 sigma(4x - 0.2) using a deep 8-layer network with 6 hidden layers. The function sigma denotes the bipolar logistic function in (1). Each hidden layer contained bipolar logistic neurons. The input and output layers each used a single neuron with an identity activation function. The forward pass approximated the forward function f. The backward pass approximated the inverse function f^{-1}.

TABLE III: Forward-Pass Cross Entropy E_f. The columns give the forward, backward, and bidirectional forms of backpropagation training. The rows give the number of hidden neurons (5, 10, 20, 50, 100).

The forward pass of (standard) BP used logistic cross entropy as its error function. The backward pass did as well. Bidirectional BP summed the forward and backward errors for its joint error. We computed the test error for the forward and backward passes. Each plotted error value averaged 20 runs. The B-BP algorithm produced either an exact representation or an approximation. The permutation function bijectively mapped the 5-bit bipolar vector space {-1, 1}^5 onto itself. Table II displays the permutation test function. We compared the forward and backward forms of unidirectional BP with bidirectional BP. We also tested whether adding more hidden neurons improved network approximation accuracy.

Figure 4 shows the results of running the three types of BP learning for classification on a 3-layer network. The values of E_f and E_b decrease with an increase in the training iterations for bidirectional BP. This was not the case for the unidirectional cases of forward BP and backward BP training. Forward and backward training performed well only for function approximation in their respective training direction. Neither performed well in the opposite direction.

TABLE IV: Backward-Pass Cross Entropy E_b. The columns give the forward, backward, and bidirectional forms of backpropagation training. The rows give the number of hidden neurons (5, 10, 20, 50, 100).

Table III shows the forward-pass error E_f for learning 3-layer classification neural networks as the number of hidden neurons grows. We again compared the three forms of BP for the network training: two forms of unidirectional BP and bidirectional BP. The forward-pass error for forward BP fell substantially as the number of hidden neurons grew. The forward-pass error of backward BP decreased only slightly as the number of hidden neurons grew. It gave the worst performance. Bidirectional BP performed well on the test set. Its forward-pass error also fell substantially as the number of hidden neurons grew. Table IV shows similar error-versus-hidden-neuron results for the backward-pass error E_b. The two tables jointly show that the unidirectional forms of BP performed well only in one direction while the B-BP algorithm performed well in both directions.

We tested the B-BP algorithm for double regression with the invertible function f(x) = 0.5 sigma(6x + 1) + 0.5 sigma(4x - 0.2) for values of x in [-1.5, 1.5]. We used a deep 8-layer network with 6 hidden layers for this approximation. There was only a single identity neuron in the input and output layers. The error functions E_f and E_b were ordinary squared errors. Figure 6 compares the bidirectional BP approximation with the target function for both the forward pass and the backward pass.

We also tested the B-BP double-regression algorithm on the non-invertible function f(x) = sin x for x in [-pi, pi]. The forward mapping f(x) = sin x is a well-defined point function. The backward mapping sin^{-1}(y) is not. It defines instead a set-based pullback or pre-image f^{-1}(y) = f^{-1}({y}) = {x in R: f(x) = y} subset of R. The B-BP-trained neural network tends to map each output point y to the centroid of its pre-image f^{-1}({y}) on the backward pass because centroids minimize squared error and because backward-regression training uses squared error as its performance measure. Figure 7 shows that the forward regression learns the target function sin x while the backward regression approximates the centroids pi/2 and -pi/2 of the two pre-image sets.

Algorithm 1: The Bidirectional Backpropagation Algorithm
Data: T input vectors {x_1, x_2, ..., x_T} and T corresponding output vectors {y_1, y_2, ..., y_T} such that f(x_n) = y_n. Number of hidden neurons J. Batch size S and number of epochs R. Choose the learning rate eta.
Result: Bidirectional neural-network representation of the function f.
Initialize: Randomly select the initial weights W(0) and U(0). Randomly pick the bias weights {b^x(0), b^h(0), b^y(0)} for the input, hidden, and output neurons.
while epoch r = 0 : R do
  Select S random samples from the training dataset.
  Initialize: dW = 0, dU = 0, db^x = 0, db^h = 0, db^y = 0.
  FORWARD TRAINING
  while batch sample l = 1 : S do
    Randomly pick input vector x_l and its corresponding output vector y_l.
    Compute the hidden-layer input o^h and the corresponding hidden activation a^h.
    Compute the output-layer input o^y and the corresponding output activation a^y.
    Compute the forward error E_f.
    Compute the derivatives of E_f with respect to W, U, b^h, and b^y.
    Update: dW <- dW + grad_W E_f ; db^h <- db^h + grad_{b^h} E_f ; dU <- dU + grad_U E_f ; db^y <- db^y + grad_{b^y} E_f.
  end
  Update: W(r) <- W(r) - eta dW ; U(r) <- U(r) - eta dU ; b^h(r) <- b^h(r) - eta db^h ; b^y(r) <- b^y(r) - eta db^y.
  Initialize: dW = 0, dU = 0, db^x = 0, db^h = 0, db^y = 0.
  BACKWARD TRAINING
  while batch sample l = 1 : S do
    Pick input vector x_l and its corresponding output vector y_l.
    Compute the hidden-layer input o^{hb} and the hidden activation a^{hb}.
    Compute the input o^{xb} at the input layer and the input activation a^{xb}.
    Compute the backward error E_b.
    Compute the derivatives of E_b with respect to W, U, b^h, and b^x.
    Update: dW <- dW + grad_W E_b ; db^h <- db^h + grad_{b^h} E_b ; dU <- dU + grad_U E_b ; db^x <- db^x + grad_{b^x} E_b.
  end
  Update: W(r+1) <- W(r) - eta dW ; U(r+1) <- U(r) - eta dU ; b^x(r+1) <- b^x(r) - eta db^x ; b^h(r+1) <- b^h(r) - eta db^h ; b^y(r+1) <- b^y(r).
end
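The sketch below follows Algorithm 1 for the double-regression case: each epoch accumulates and applies the forward-error gradients (65)-(68) over a batch and then the backward-error gradients (69)-(72) over the same batch and the same weights. The toy linear target map, layer sizes, learning rate, batch size, and epoch count are all assumed placeholders rather than settings from the paper's experiments.

import numpy as np

rng = np.random.default_rng(3)
I, J, K = 2, 8, 2                     # assumed layer sizes
eta, R, S = 0.05, 2000, 16            # assumed learning rate, epochs, batch size

# Assumed invertible toy mapping f(x) = A x used only for this demonstration.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
X = rng.uniform(-1, 1, size=(200, I))
Y = X @ A.T

W = rng.normal(0, 0.5, size=(I, J))
U = rng.normal(0, 0.5, size=(J, K))
b_x, b_h, b_y = np.zeros(I), np.zeros(J), np.zeros(K)
sig = lambda v: 1.0 / (1.0 + np.exp(-v))

for r in range(R):
    batch = rng.choice(len(X), size=S, replace=False)
    # FORWARD TRAINING: accumulate gradients of E_f over the batch, then update.
    dW, dU = np.zeros_like(W), np.zeros_like(U)
    db_h, db_y = np.zeros_like(b_h), np.zeros_like(b_y)
    for l in batch:
        x, y = X[l], Y[l]
        a_h = sig(W.T @ x + b_h)
        a_y = U.T @ a_h + b_y
        err = a_y - y
        dU += np.outer(a_h, err)                   # (65)
        db_y += err                                # (68)
        delta_h = (U @ err) * a_h * (1 - a_h)      # hidden error term
        dW += np.outer(x, delta_h)                 # (66)
        db_h += delta_h                            # (67)
    W -= eta * dW; U -= eta * dU; b_h -= eta * db_h; b_y -= eta * db_y
    # BACKWARD TRAINING: accumulate gradients of E_b over the same batch, then update.
    dW, dU = np.zeros_like(W), np.zeros_like(U)
    db_h, db_x = np.zeros_like(b_h), np.zeros_like(b_x)
    for l in batch:
        x, y = X[l], Y[l]
        a_hb = sig(U @ y + b_h)
        a_xb = W @ a_hb + b_x
        errb = a_xb - x
        dW += np.outer(errb, a_hb)                 # (70)
        db_x += errb                               # (71)
        delta_hb = (W.T @ errb) * a_hb * (1 - a_hb)
        dU += np.outer(delta_hb, y)                # (69)
        db_h += delta_hb                           # (72)
    W -= eta * dW; U -= eta * dU; b_h -= eta * db_h; b_x -= eta * db_x

x_test = np.array([0.3, -0.7])
a_h = sig(W.T @ x_test + b_h); y_hat = U.T @ a_h + b_y
a_hb = sig(U @ y_hat + b_h); x_hat = W @ a_hb + b_x
print("forward:", y_hat, "target:", A @ x_test)
print("backward:", x_hat, "target:", x_test)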
V. CONCLUSION

Unidirectional backpropagation learning extends to bidirectional backpropagation learning if the algorithm uses the appropriate joint error function for both the forward and backward passes. This bidirectional extension applies to classification networks as well as to regression networks and to their combinations. Most classification networks in practice can easily acquire a backward-inference capability by adding a backward-regression step to their current training. So most networks simply ignore this property of their weight structure.

A bidirectional multilayer threshold network can exactly represent permutation mappings if the hidden layer contains an exponential number of hidden threshold neurons. An open question is whether these networks can represent or at least approximate an arbitrary invertible mapping with fewer hidden neurons. A related question is what specific classes of invertible functions these bidirectional networks can exactly represent and which classes they cannot represent. Another open question is the extent to which carefully injected noise can speed B-BP convergence and accuracy. This follows because of the statistical structure of BP as a special case of the expectation-maximization algorithm [19] and because the appropriate noise can boost such hill-climbing algorithms [21], [23].

REFERENCES

[1] P. J. Werbos, "Beyond regression: New tools for prediction and analysis in the behavioral sciences," Doctoral Dissertation, Applied Mathematics, Harvard University, MA, 1974.

[2] D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, pp. 533-536, 1986.
[3] C. M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
[4] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
[5] M. Jordan and T. Mitchell, "Machine learning: trends, perspectives, and prospects," Science, vol. 349, pp. 255-260, 2015.
[6] B. Kosko, "Bidirectional associative memories," IEEE Transactions on Systems, Man and Cybernetics, vol. 18, no. 1, pp. 49-60, 1988.
[7] B. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence. Prentice Hall, 1991.
[8] S. Y. Kung, Kernel Methods and Machine Learning. Cambridge University Press, 2014.
[9] G. Cybenko, "Approximation by superpositions of a sigmoidal function," Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303-314, 1989.
[10] K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Networks, vol. 2, no. 5, pp. 359-366, 1989.
[11] B. Kosko, "Fuzzy systems as universal approximators," IEEE Transactions on Computers, vol. 43, no. 11, pp. 1329-1333, 1994.
[12] F. Watkins, "The representation problem for additive fuzzy systems," in Proceedings of the International Conference on Fuzzy Systems (IEEE FUZZ-95), 1995.
[13] B. Kosko, "Generalized mixture representations and combinations for additive fuzzy systems," in 2017 International Joint Conference on Neural Networks (IJCNN). IEEE, 2017.
[14] B. Kosko, "Additive fuzzy systems: From generalized mixtures to rule continua," International Journal of Intelligent Systems, 2017.
[15] J.-N. Hwang, J. J. Choi, S. Oh, and R. Marks, "Query-based learning applied to partially trained multilayer perceptrons," IEEE Transactions on Neural Networks, vol. 2, no. 1, pp. 131-136, 1991.
[16] E. W. Saad, J. J. Choi, J. L. Vian, and D. Wunsch, "Query-based learning for aerospace applications," IEEE Transactions on Neural Networks, vol. 14, no. 6, 2003.
[17] E. W. Saad and D. C. Wunsch, "Neural network explanation using inversion," Neural Networks, vol. 20, no. 1, pp. 78-93, 2007.
[18] Y. Yang, Y. Wang, and X. Yuan, "Bidirectional extreme learning machine for regression problem and its learning effectiveness," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, 2012.
[19] K. Audhkhasi, O. Osoba, and B. Kosko, "Noise-enhanced convolutional neural networks," Neural Networks, vol. 78, pp. 15-23, 2016.
[20] S. Kullback and R. A. Leibler, "On information and sufficiency," The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79-86, 1951.
[21] O. Osoba and B. Kosko, "The noisy expectation-maximization algorithm for multiplicative noise injection," Fluctuation and Noise Letters, p. 1650007, 2016.
[22] R. V. Hogg, J. McKean, and A. T. Craig, Introduction to Mathematical Statistics. Pearson, 2013.
[23] O. Osoba, S. Mitaim, and B. Kosko, "The noisy expectation maximization algorithm," Fluctuation and Noise Letters, vol. 12, no. 3, p. 1350012, 2013.


More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification E395 - Pattern Recognton Solutons to Introducton to Pattern Recognton, Chapter : Bayesan pattern classfcaton Preface Ths document s a soluton manual for selected exercses from Introducton to Pattern Recognton

More information

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement

Markov Chain Monte Carlo (MCMC), Gibbs Sampling, Metropolis Algorithms, and Simulated Annealing Bioinformatics Course Supplement Markov Chan Monte Carlo MCMC, Gbbs Samplng, Metropols Algorthms, and Smulated Annealng 2001 Bonformatcs Course Supplement SNU Bontellgence Lab http://bsnuackr/ Outlne! Markov Chan Monte Carlo MCMC! Metropols-Hastngs

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 89 Fall 206 Introducton to Machne Learnng Fnal Do not open the exam before you are nstructed to do so The exam s closed book, closed notes except your one-page cheat sheet Usage of electronc devces

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

MATH 567: Mathematical Techniques in Data Science Lab 8

MATH 567: Mathematical Techniques in Data Science Lab 8 1/14 MATH 567: Mathematcal Technques n Data Scence Lab 8 Domnque Gullot Departments of Mathematcal Scences Unversty of Delaware Aprl 11, 2017 Recall We have: a (2) 1 = f(w (1) 11 x 1 + W (1) 12 x 2 + W

More information

MAXIMUM A POSTERIORI TRANSDUCTION

MAXIMUM A POSTERIORI TRANSDUCTION MAXIMUM A POSTERIORI TRANSDUCTION LI-WEI WANG, JU-FU FENG School of Mathematcal Scences, Peng Unversty, Bejng, 0087, Chna Center for Informaton Scences, Peng Unversty, Bejng, 0087, Chna E-MIAL: {wanglw,

More information

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018

INF 5860 Machine learning for image classification. Lecture 3 : Image classification and regression part II Anne Solberg January 31, 2018 INF 5860 Machne learnng for mage classfcaton Lecture 3 : Image classfcaton and regresson part II Anne Solberg January 3, 08 Today s topcs Multclass logstc regresson and softma Regularzaton Image classfcaton

More information

Multi-layer neural networks

Multi-layer neural networks Lecture 0 Mult-layer neural networks Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Lnear regresson w Lnear unts f () Logstc regresson T T = w = p( y =, w) = g( w ) w z f () = p ( y = ) w d w d Gradent

More information

CSC 411 / CSC D11 / CSC C11

CSC 411 / CSC D11 / CSC C11 18 Boostng s a general strategy for learnng classfers by combnng smpler ones. The dea of boostng s to take a weak classfer that s, any classfer that wll do at least slghtly better than chance and use t

More information

SDMML HT MSc Problem Sheet 4

SDMML HT MSc Problem Sheet 4 SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be

More information

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010

Parametric fractional imputation for missing data analysis. Jae Kwang Kim Survey Working Group Seminar March 29, 2010 Parametrc fractonal mputaton for mssng data analyss Jae Kwang Km Survey Workng Group Semnar March 29, 2010 1 Outlne Introducton Proposed method Fractonal mputaton Approxmaton Varance estmaton Multple mputaton

More information

VQ widely used in coding speech, image, and video

VQ widely used in coding speech, image, and video at Scalar quantzers are specal cases of vector quantzers (VQ): they are constraned to look at one sample at a tme (memoryless) VQ does not have such constrant better RD perfomance expected Source codng

More information

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING 1 ADVANCED ACHINE LEARNING ADVANCED ACHINE LEARNING Non-lnear regresson technques 2 ADVANCED ACHINE LEARNING Regresson: Prncple N ap N-dm. nput x to a contnuous output y. Learn a functon of the type: N

More information

CS286r Assign One. Answer Key

CS286r Assign One. Answer Key CS286r Assgn One Answer Key 1 Game theory 1.1 1.1.1 Let off-equlbrum strateges also be that people contnue to play n Nash equlbrum. Devatng from any Nash equlbrum s a weakly domnated strategy. That s,

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014

COS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #16 Scribe: Yannan Wang April 3, 2014 COS 511: Theoretcal Machne Learnng Lecturer: Rob Schapre Lecture #16 Scrbe: Yannan Wang Aprl 3, 014 1 Introducton The goal of our onlne learnng scenaro from last class s C comparng wth best expert and

More information

Evaluation of classifiers MLPs

Evaluation of classifiers MLPs Lecture Evaluaton of classfers MLPs Mlos Hausrecht mlos@cs.ptt.edu 539 Sennott Square Evaluaton For any data set e use to test the model e can buld a confuson matrx: Counts of examples th: class label

More information

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models

Maximum Likelihood Estimation of Binary Dependent Variables Models: Probit and Logit. 1. General Formulation of Binary Dependent Variables Models ECO 452 -- OE 4: Probt and Logt Models ECO 452 -- OE 4 Maxmum Lkelhood Estmaton of Bnary Dependent Varables Models: Probt and Logt hs note demonstrates how to formulate bnary dependent varables models

More information

On mutual information estimation for mixed-pair random variables

On mutual information estimation for mixed-pair random variables On mutual nformaton estmaton for mxed-par random varables November 3, 218 Aleksandr Beknazaryan, Xn Dang and Haln Sang 1 Department of Mathematcs, The Unversty of Msssspp, Unversty, MS 38677, USA. E-mal:

More information

Report on Image warping

Report on Image warping Report on Image warpng Xuan Ne, Dec. 20, 2004 Ths document summarzed the algorthms of our mage warpng soluton for further study, and there s a detaled descrpton about the mplementaton of these algorthms.

More information

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system

Transfer Functions. Convenient representation of a linear, dynamic model. A transfer function (TF) relates one input and one output: ( ) system Transfer Functons Convenent representaton of a lnear, dynamc model. A transfer functon (TF) relates one nput and one output: x t X s y t system Y s The followng termnology s used: x y nput output forcng

More information

Solving Nonlinear Differential Equations by a Neural Network Method

Solving Nonlinear Differential Equations by a Neural Network Method Solvng Nonlnear Dfferental Equatons by a Neural Network Method Luce P. Aarts and Peter Van der Veer Delft Unversty of Technology, Faculty of Cvlengneerng and Geoscences, Secton of Cvlengneerng Informatcs,

More information

Lecture 3: Dual problems and Kernels

Lecture 3: Dual problems and Kernels Lecture 3: Dual problems and Kernels C4B Machne Learnng Hlary 211 A. Zsserman Prmal and dual forms Lnear separablty revsted Feature mappng Kernels for SVMs Kernel trck requrements radal bass functons SVM

More information

Learning Theory: Lecture Notes

Learning Theory: Lecture Notes Learnng Theory: Lecture Notes Lecturer: Kamalka Chaudhur Scrbe: Qush Wang October 27, 2012 1 The Agnostc PAC Model Recall that one of the constrants of the PAC model s that the data dstrbuton has to be

More information

Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia

Using deep belief network modelling to characterize differences in brain morphometry in schizophrenia Usng deep belef network modellng to characterze dfferences n bran morphometry n schzophrena Walter H. L. Pnaya * a ; Ary Gadelha b ; Orla M. Doyle c ; Crstano Noto b ; André Zugman d ; Qurno Cordero b,

More information

Lossy Compression. Compromise accuracy of reconstruction for increased compression.

Lossy Compression. Compromise accuracy of reconstruction for increased compression. Lossy Compresson Compromse accuracy of reconstructon for ncreased compresson. The reconstructon s usually vsbly ndstngushable from the orgnal mage. Typcally, one can get up to 0:1 compresson wth almost

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

APPENDIX A Some Linear Algebra

APPENDIX A Some Linear Algebra APPENDIX A Some Lnear Algebra The collecton of m, n matrces A.1 Matrces a 1,1,..., a 1,n A = a m,1,..., a m,n wth real elements a,j s denoted by R m,n. If n = 1 then A s called a column vector. Smlarly,

More information

Singular Value Decomposition: Theory and Applications

Singular Value Decomposition: Theory and Applications Sngular Value Decomposton: Theory and Applcatons Danel Khashab Sprng 2015 Last Update: March 2, 2015 1 Introducton A = UDV where columns of U and V are orthonormal and matrx D s dagonal wth postve real

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

SL n (F ) Equals its Own Derived Group

SL n (F ) Equals its Own Derived Group Internatonal Journal of Algebra, Vol. 2, 2008, no. 12, 585-594 SL n (F ) Equals ts Own Derved Group Jorge Macel BMCC-The Cty Unversty of New York, CUNY 199 Chambers street, New York, NY 10007, USA macel@cms.nyu.edu

More information

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix Lectures - Week 4 Matrx norms, Condtonng, Vector Spaces, Lnear Independence, Spannng sets and Bass, Null space and Range of a Matrx Matrx Norms Now we turn to assocatng a number to each matrx. We could

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Why feed-forward networks are in a bad shape

Why feed-forward networks are in a bad shape Why feed-forward networks are n a bad shape Patrck van der Smagt, Gerd Hrznger Insttute of Robotcs and System Dynamcs German Aerospace Center (DLR Oberpfaffenhofen) 82230 Wesslng, GERMANY emal smagt@dlr.de

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Multilayer neural networks

Multilayer neural networks Lecture Multlayer neural networks Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Mdterm exam Mdterm Monday, March 2, 205 In-class (75 mnutes) closed book materal covered by February 25, 205 Multlayer

More information

Feature Selection: Part 1

Feature Selection: Part 1 CSE 546: Machne Learnng Lecture 5 Feature Selecton: Part 1 Instructor: Sham Kakade 1 Regresson n the hgh dmensonal settng How do we learn when the number of features d s greater than the sample sze n?

More information

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations

1 Derivation of Rate Equations from Single-Cell Conductance (Hodgkin-Huxley-like) Equations Physcs 171/271 -Davd Klenfeld - Fall 2005 (revsed Wnter 2011) 1 Dervaton of Rate Equatons from Sngle-Cell Conductance (Hodgkn-Huxley-lke) Equatons We consder a network of many neurons, each of whch obeys

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm The Expectaton-Maxmaton Algorthm Charles Elan elan@cs.ucsd.edu November 16, 2007 Ths chapter explans the EM algorthm at multple levels of generalty. Secton 1 gves the standard hgh-level verson of the algorthm.

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

Problem Set 9 Solutions

Problem Set 9 Solutions Desgn and Analyss of Algorthms May 4, 2015 Massachusetts Insttute of Technology 6.046J/18.410J Profs. Erk Demane, Srn Devadas, and Nancy Lynch Problem Set 9 Solutons Problem Set 9 Solutons Ths problem

More information

Neural Networks. Perceptrons and Backpropagation. Silke Bussen-Heyen. 5th of Novemeber Universität Bremen Fachbereich 3. Neural Networks 1 / 17

Neural Networks. Perceptrons and Backpropagation. Silke Bussen-Heyen. 5th of Novemeber Universität Bremen Fachbereich 3. Neural Networks 1 / 17 Neural Networks Perceptrons and Backpropagaton Slke Bussen-Heyen Unverstät Bremen Fachberech 3 5th of Novemeber 2012 Neural Networks 1 / 17 Contents 1 Introducton 2 Unts 3 Network structure 4 Snglelayer

More information

Relevance Vector Machines Explained

Relevance Vector Machines Explained October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

EM and Structure Learning

EM and Structure Learning EM and Structure Learnng Le Song Machne Learnng II: Advanced Topcs CSE 8803ML, Sprng 2012 Partally observed graphcal models Mxture Models N(μ 1, Σ 1 ) Z X N N(μ 2, Σ 2 ) 2 Gaussan mxture model Consder

More information

Grover s Algorithm + Quantum Zeno Effect + Vaidman

Grover s Algorithm + Quantum Zeno Effect + Vaidman Grover s Algorthm + Quantum Zeno Effect + Vadman CS 294-2 Bomb 10/12/04 Fall 2004 Lecture 11 Grover s algorthm Recall that Grover s algorthm for searchng over a space of sze wors as follows: consder the

More information

Errors for Linear Systems

Errors for Linear Systems Errors for Lnear Systems When we solve a lnear system Ax b we often do not know A and b exactly, but have only approxmatons  and ˆb avalable. Then the best thng we can do s to solve ˆx ˆb exactly whch

More information

Generative classification models

Generative classification models CS 675 Intro to Machne Learnng Lecture Generatve classfcaton models Mlos Hauskrecht mlos@cs.ptt.edu 539 Sennott Square Data: D { d, d,.., dn} d, Classfcaton represents a dscrete class value Goal: learn

More information

Neural Networks. Class 22: MLSP, Fall 2016 Instructor: Bhiksha Raj

Neural Networks. Class 22: MLSP, Fall 2016 Instructor: Bhiksha Raj Neural Networs Class 22: MLSP, Fall 2016 Instructor: Bhsha Raj IMPORTANT ADMINSTRIVIA Fnal wee. Project presentatons on 6th 18797/11755 2 Neural Networs are tang over! Neural networs have become one of

More information

Lecture 3. Ax x i a i. i i

Lecture 3. Ax x i a i. i i 18.409 The Behavor of Algorthms n Practce 2/14/2 Lecturer: Dan Spelman Lecture 3 Scrbe: Arvnd Sankar 1 Largest sngular value In order to bound the condton number, we need an upper bound on the largest

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

One-sided finite-difference approximations suitable for use with Richardson extrapolation

One-sided finite-difference approximations suitable for use with Richardson extrapolation Journal of Computatonal Physcs 219 (2006) 13 20 Short note One-sded fnte-dfference approxmatons sutable for use wth Rchardson extrapolaton Kumar Rahul, S.N. Bhattacharyya * Department of Mechancal Engneerng,

More information

The Basic Idea of EM

The Basic Idea of EM The Basc Idea of EM Janxn Wu LAMDA Group Natonal Key Lab for Novel Software Technology Nanjng Unversty, Chna wujx2001@gmal.com June 7, 2017 Contents 1 Introducton 1 2 GMM: A workng example 2 2.1 Gaussan

More information

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg

princeton univ. F 17 cos 521: Advanced Algorithm Design Lecture 7: LP Duality Lecturer: Matt Weinberg prnceton unv. F 17 cos 521: Advanced Algorthm Desgn Lecture 7: LP Dualty Lecturer: Matt Wenberg Scrbe: LP Dualty s an extremely useful tool for analyzng structural propertes of lnear programs. Whle there

More information

Kernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan

Kernels in Support Vector Machines. Based on lectures of Martin Law, University of Michigan Kernels n Support Vector Machnes Based on lectures of Martn Law, Unversty of Mchgan Non Lnear separable problems AND OR NOT() The XOR problem cannot be solved wth a perceptron. XOR Per Lug Martell - Systems

More information