arxiv: v1 [cs.cv] 9 Nov 2017


Feed Forward and Backward Run in Deep Convolution Neural Network

Pushparaja Murugan
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore
pushpara001@e.ntu.edu.sg

arXiv:1711.03278v1 [cs.CV] 9 Nov 2017

Abstract

Convolution Neural Networks (CNNs), also known as ConvNets, are widely used in visual imagery applications, object classification, and speech recognition. After the implementation and demonstration of a deep convolution neural network for ImageNet classification in 2012 by Krizhevsky, the architecture of deep Convolution Neural Networks attracted many researchers. This has led to major development of deep learning frameworks such as TensorFlow, Caffe, Keras, and Theano. Although deep learning can readily be implemented with these frameworks, the mathematical theory and concepts are harder to understand for new learners and practitioners. This article provides an overview of the ConvNet architecture and explains the mathematical theory behind it, including activation functions, loss functions, and feedforward and backward propagation. A grey scale image is taken as the input, ReLU and Sigmoid activation functions are considered for developing the architecture, and the cross-entropy loss function is used to compute the difference between the predicted value and the actual value. The architecture is developed so that it contains one convolution layer, one pooling layer, and multiple dense layers.

Keywords: Deep learning, ConvNets, Convolution Neural Network, Forward and backward propagation

Nomenclature

$\alpha$: Learning rate
$\hat{y}_i^{L+1}$: Predicted value

$L$: Loss or cost function
$\sigma$: Activation function
$\Sigma$: Summation
$a$: Non-linearly transformed net input
$b$: Bias parameter
$b^{L+1}$: Bias matrix of final layer in fully connected layer
$b_i^l$: Bias value of $i$th neuron at $l$th layer
$C$: Channel of image
$c$: Depth of convolution kernel
$D_1$: Depth of convolution layer
$D_2$: Depth of pooling layer
$D_n$: Number of pooling layer kernels
$Dim_c$: Dimension of convolution layer
$Dim_p$: Dimension of pooling layer
$e$: Exponential
$f'(x)$: First derivative
$f(x)$: Function
$H$: Width of image
$H_1$: Height of convolution layer
$H_2$: Height of pooling layer
$i, j$: Adjacent neurons in fully connected layer
$k$: Width and height of pooling layer kernel
$k^{p,q}$: Convolution kernel bank
$k_1$: Width of convolution kernel
$k_2$: Height of convolution kernel
$K_D$: Number of kernels
$L$: Final layer in fully connected layer
$l$: First layer in fully connected layer
$L+1$: Classification layer in fully connected layer
$l-1$: Vectorized pooling layer
$n$: Last neuron in fully connected layer
$p$: Number of convolution kernels
$P^{p,q}$: Pooling kernel bank
$q$: Number of convolution layers
$t$: Total number of training samples
$u, v$: Pixels of kernel
$W$: Height of image
$w$: Weight parameter
$W^l$: Weight matrix of first layer in fully connected layer
$W^{L+1}$: Weight matrix of final layer in fully connected layer
$W_1$: Width of convolution layer
$W_2$: Width of pooling layer
$w_i^l$: Weights of $i$th node at $l$th layer
$x$: Input signal
$y$: Matrix of actual labeled values of training set
$\hat{y}^{L+1}$: Matrix of predicted values
$y_i$: Actual value from labelled training set
$z$: Linearly transformed net inputs of fully connected layer
$Z_P$: Value of zero padding
$Z_S$: Value of stride

1 Introduction

The study of neural networks, human behavior, and perception started in the early 1950s. Over the decades, different types of neural networks, such as Elman, Hopfield, and Jordan networks, were developed for approximating complex functions and recognizing patterns in the late 1970s [1] [2] [3]. Recent developments in neural networks have shown remarkable results in object classification, pattern recognition, and natural language processing. With the advancement of computer vision, deep Convolution Neural Networks are widely used in many applications such as cancer cell classification, medical image processing, star cluster classification, self-driving cars, and number plate recognition. ConvNets are bio-inspired artificial neural networks, developed on a mathematical representation, that analyze visual imagery and perform pattern recognition and speech recognition. Unlike classical machine learning, ConvNets can be fed with raw image pixel values rather than feature vectors as input [4]. The basic design principle of ConvNets is to develop an architecture and learning algorithm that reduce the number of parameters without compromising the computational power of the learning algorithm [5]. As the name suggests, a ConvNet consists of the linear mathematical operation of convolution followed by non-linear activators, pooling layers, and a deep neural network classifier. The convolution processes act as feature detectors that can deal with a large amount of low-level information. A complete convolution layer has different feature detectors so that multiple features can be extracted from the same image. A single feature detector is small compared with the input image and is slid over the image for the convolution operation. Hence, all of the units in that feature detector share the same weight and bias. This helps to detect the same feature at all points in the image, which gives the properties of invariance to transformation and shift of the images [6]. Local connections between the pixels are used many times in an architecture. With local receptive fields, neurons can extract elementary features such as the orientation of edges, corners, and end points, so that more complex, higher-order features are detected when these are combined in the hidden layers. These properties of sparse connectivity between subsequent layers, parameter sharing of weights between adjacent pixels, and equivariant representation enable CNNs to be used efficiently in image recognition and image classification problems [7] [8].

2 Architecture

Figure 2.1: Architecture of Convolution Neural Network

2.1 Convolution layers

Convolution layers are sets of parallel feature maps, formed by sliding different kernels (feature detectors) over an input image and projecting the element-wise dot product as the feature maps [9]. This sliding process is known as stride, $Z_S$. The kernel bank is small compared with the input image and is overlapped on the input image, which leads to sharing of parameters such as weight and bias between adjacent pixels of the image and also controls the dimensions of the feature maps. Using small kernels, however, often results in imperfect overlays and limits the power of the learning algorithm. Hence, zero padding $Z_P$ is usually implemented to control the size of the input image. Zero padding controls the dimensions of the feature maps and kernels independently by adding zeros to the input symmetrically [10]. During training, a set of kernel filters, known as a filter bank, with dimension $(k_1, k_2, c)$ slides over the fixed-size $(H, W, C)$ input image. The stride and zero padding are the critical measures for controlling the dimension of the convolution layers. As a result, feature maps are produced, which are stacked together to form the convolution layers. The dimension of the convolution layer can be computed by Eq 2.1.

$$Dim_c(H_1, W_1, D_1) = \left(\frac{H + 2Z_P - k_1}{Z_S} + 1\right), \left(\frac{W + 2Z_P - k_2}{Z_S} + 1\right), K_D \tag{Eq 2.1}$$

2.2 Activation functions

An activation function defines the output of a neuron for a given set of inputs. The weighted sum of the linear net input is passed through an activation function for non-linear transformation. A typical activation function is based on conditional probability and returns the value one or zero as output, $op \in \{P(op = 1 \mid ip)\ \text{or}\ P(op = 0 \mid ip)\}$. When the net input $ip$ crosses the threshold value, the activation function returns one and passes the information to the next layer. If the net input $ip$ is below the threshold value, it returns zero and does not pass the information. Based on this segregation of relevant and irrelevant information, the activation function decides whether the neuron should activate or not. The higher the net input value, the greater the activation. Different types of activation functions have been developed for different applications. Some commonly used activation functions are given in Table 1.

2.3 Pooling layers

A pooling layer is a downsampling layer that combines the output of a cluster of neurons at one layer into a single neuron in the next layer. Pooling operations are carried out after the non-linear activation; the pooling layers help to reduce the number of data points and to avoid overfitting. Pooling also acts as a smoothing process by which unwanted noise can be eliminated. Max pooling is the most commonly used operation. In addition, average pooling and $L_2$ norm pooling are used in some cases. When $D_n$ kernel windows and a stride value of $Z_S$ are employed to develop the pooling layers, the dimension of the pooling layer can be computed by,

$$Dim_p(H_2, W_2, D_2) = \left(\frac{H_1 - k}{Z_S} + 1\right), \left(\frac{W_1 - k}{Z_S} + 1\right), D_n \tag{Eq 2.2}$$

2.4 Fully connected dense layers

After the pooling layers, the pixels of the pooling layers are stretched into a single column vector. These vectorized and concatenated data points are fed into dense layers, known as fully connected layers, for classification. The function of fully connected dense layers is similar to that of Deep Neural Networks. The architecture of ConvNets is given in Figure 2.1. This type of constrained architecture can surpass classical machine learning algorithms in image classification problems [11] [12].

2.5 Loss or cost function

A loss function maps an event of one or more variables onto a real number associated with some cost. The loss function is used to measure the performance of the model and the inconsistency between the actual value $y_i$ and the predicted value $\hat{y}_i^{L+1}$. The performance of the model increases as the value of the loss function decreases.
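The dimension formulas Eq 2.1 and Eq 2.2 can be checked with a short Python sketch. The function names and the 28 × 28 example are illustrative assumptions, not part of the article, and floor division is assumed for strides that do not divide evenly, a case the article does not address.

```python
def conv_output_dim(H, W, k1, k2, Z_P, Z_S, K_D):
    """Dimension of the convolution layer, following Eq 2.1."""
    H1 = (H + 2 * Z_P - k1) // Z_S + 1  # floor division is an assumption
    W1 = (W + 2 * Z_P - k2) // Z_S + 1
    return H1, W1, K_D


def pool_output_dim(H1, W1, k, Z_S, D_n):
    """Dimension of the pooling layer, following Eq 2.2."""
    H2 = (H1 - k) // Z_S + 1
    W2 = (W1 - k) // Z_S + 1
    return H2, W2, D_n


# Hypothetical example: 28 x 28 grey scale image, six 5 x 5 kernels,
# stride 1 and zero padding 0, followed by 2 x 2 pooling with stride 2.
conv_dim = conv_output_dim(28, 28, 5, 5, Z_P=0, Z_S=1, K_D=6)
pool_dim = pool_output_dim(conv_dim[0], conv_dim[1], k=2, Z_S=2, D_n=6)
print(conv_dim, pool_dim)
```

With a zero padding of 2 instead of 0, the same 5 × 5 kernel would keep the 28 × 28 spatial size, which shows how $Z_P$ controls the feature map dimension independently of the kernel size.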

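| Name | Function | Derivative |
|---|---|---|
| Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | $f'(x) = f(x)(1 - f(x))$ |
| tanh | $\sigma(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | $f'(x) = 1 - f(x)^2$ |
| ReLU | $f(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}$ | $f'(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$ |
| Leaky ReLU | $f(x) = \begin{cases} 0.01x & \text{if } x < 0 \\ x & \text{if } x \ge 0 \end{cases}$ | $f'(x) = \begin{cases} 0.01 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases}$ |
| Softmax | $f(x) = \frac{e^{x_i}}{\sum_j e^{x_j}}$ | $f'(x) = \frac{e^{x_i}}{\sum_j e^{x_j}} - \frac{(e^{x_i})^2}{(\sum_j e^{x_j})^2}$ |

Table 1: Non-linear activation functions

If the output vector of all possible outputs is $y_i = \{0, 1\}$ and an event $x$ has the set of input variables $x = (x_1, x_2, \dots, x_t)$, then the mapping of $x$ to $y_i$ is given by,

$$L(\hat{y}_i^{L+1}, y_i) = \frac{1}{t}\sum_{i=1}^{t}\left(y_i, (\sigma(x), w_i, b_i)\right) \tag{Eq 2.3}$$

where $L(\hat{y}_i^{L+1}, y_i)$ is the loss function. Many types of loss functions have been developed for various applications; some are given below.

2.5.1 Mean Squared Error

Mean Squared Error (MSE), also known as the quadratic loss function, is mostly used in linear regression models to measure performance. If $\hat{y}_i^{L+1}$ is the computed output value of $t$ training samples and $y_i$ is the corresponding labeled value, then the MSE is given by,

$$L(\hat{y}_i^{L+1}, y_i) = \frac{1}{t}\sum_{i=1}^{t}\left(y_i - \hat{y}_i^{L+1}\right)^2 \tag{Eq 2.4}$$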
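The entries of Table 1 and the MSE of Eq 2.4 can be written down directly. The following is a minimal sketch using only the Python standard library; the function names are illustrative assumptions, and the softmax subtracts the maximum input for numerical stability, a common practice that the article does not mention.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # f(x)(1 - f(x))


def tanh_prime(x):
    return 1.0 - math.tanh(x) ** 2


def relu(x):
    return x if x >= 0 else 0.0


def relu_prime(x):
    return 1.0 if x >= 0 else 0.0


def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x


def softmax(xs):
    m = max(xs)                   # stability shift (assumption, see lead-in)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mse(y_true, y_pred):
    """Mean Squared Error over t samples, Eq 2.4."""
    t = len(y_true)
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / t


print(sigmoid(0.0), sigmoid_prime(0.0), relu(-2.0), mse([1, 0], [0.5, 0.5]))
```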

The downside of the MSE is that it tends to suffer from slow learning speed (slow convergence) when it is combined with the Sigmoid activation function.

2.5.2 Mean Squared Logarithmic Error

Mean Squared Logarithmic Error (MSLE) is also used to measure the performance of the model.

$$L(\hat{y}_i^{L+1}, y_i) = \frac{1}{t}\sum_{i=1}^{t}\left(\log(y_i + 1) - \log(\hat{y}_i^{L+1} + 1)\right)^2 \tag{Eq 2.5}$$

2.5.3 $L_2$ Loss function

The $L_2$ loss function is the square of the $L_2$ norm of the difference between the actual labeled value and the value computed from the net input, and is given by,

$$L(\hat{y}_i^{L+1}, y_i) = \sum_{i=1}^{t}\left(y_i - \hat{y}_i^{L+1}\right)^2 \tag{Eq 2.6}$$

2.5.4 $L_1$ Loss function

The $L_1$ loss function is the sum of absolute errors of the difference between the actual labeled value and the value computed from the net input, and is expressed as,

$$L(\hat{y}_i^{L+1}, y_i) = \sum_{i=1}^{t}\left|y_i - \hat{y}_i^{L+1}\right| \tag{Eq 2.7}$$

2.5.5 Mean Absolute Error

Mean Absolute Error is used to measure the proximity of the predictions to the actual values, and is expressed by,

$$L(\hat{y}_i^{L+1}, y_i) = \frac{1}{t}\sum_{i=1}^{t}\left|y_i - \hat{y}_i^{L+1}\right| \tag{Eq 2.8}$$

2.5.6 Mean Absolute Percentage Error

Mean Absolute Percentage Error (MAPE) is given by,

$$L(\hat{y}_i^{L+1}, y_i) = \frac{1}{t}\sum_{i=1}^{t}\left|\frac{y_i - \hat{y}_i^{L+1}}{y_i}\right| \times 100 \tag{Eq 2.9}$$

The major downside of MAPE is its inability to perform when there are zero values.

2.5.7 Cross Entropy

The most commonly used loss function is the cross-entropy loss function, which is explained below. If the probability that the output $y_i$ is in the training set label is $P(y_i \mid z^{l-1}) = \hat{y}_i^{L+1} = 1$, and the probability that the output $y_i$ is not in the training set label is $P(y_i \mid z^{l-1}) = 1 - \hat{y}_i^{L+1} = 0$ [13], then, with $y_i$ as the expected label,

$$P(y_i \mid z^{l-1}) = \left(\hat{y}_i^{L+1}\right)^{y_i}\left(1 - \hat{y}_i^{L+1}\right)^{(1 - y_i)} \tag{Eq 2.10}$$

Hence,

$$\log P(y_i \mid z^{l-1}) = \log\left(\left(\hat{y}_i^{L+1}\right)^{y_i}\left(1 - \hat{y}_i^{L+1}\right)^{(1 - y_i)}\right) \tag{Eq 2.11}$$

$$= y_i \log\left(\hat{y}_i^{L+1}\right) + (1 - y_i)\log\left(1 - \hat{y}_i^{L+1}\right) \tag{Eq 2.12}$$

To minimize the cost function,

$$-\log P(y_i \mid z^{l-1}) = -\log\left(\left(\hat{y}_i^{L+1}\right)^{y_i}\left(1 - \hat{y}_i^{L+1}\right)^{(1 - y_i)}\right) \tag{Eq 2.13}$$

In the case of $t$ training samples, the cost function is,

$$L(\hat{y}_i^{L+1}, y_i) = -\frac{1}{t}\sum_{i=1}^{t}\left(y_i \log\left(\hat{y}_i^{L+1}\right) + (1 - y_i)\log\left(1 - \hat{y}_i^{L+1}\right)\right) \tag{Eq 2.14}$$

3 Learning of ConvNets

3.1 Feed-forward run

The feed-forward run, or forward propagation, multiplies the input values by randomly initialized weights, adds the randomly initialized bias value of each connection of every neuron, and sums the products over all the neurons. The net input value is then passed through a non-linear activation function. In a discrete color space, the image and the kernel can be represented as 3D tensors with dimensions $(H, W, C)$ and $(k_1, k_2, c)$, where $m, n, c$ denote the $m$th, $n$th pixel in the $c$th channel. The first two indices indicate the spatial coordinates and the last index indicates the color channel. If a kernel is slid over the color image, the multidimensional tensor convolution operation can be expressed as,

$$(I \otimes K)_{ij} = \sum_{m=1}^{k_1}\sum_{n=1}^{k_2}\sum_{c=1}^{C} K_{m,n,c} \cdot I_{i+m,\,j+n,\,c} \tag{Eq 3.1}$$

The convolution process is indicated by the symbol $\otimes$. For a grey scale image, the convolution process can be expressed as,

$$(I \otimes K)_{ij} = \sum_{m=1}^{k_1}\sum_{n=1}^{k_2} K_{m,n} \cdot I_{i+m,\,j+n} \tag{Eq 3.2}$$

A kernel bank $k_{u,v}^{p,q}$ is slid over the image $I_{m,n}$ with a stride value of 1 and a zero padding value of 0. The feature maps of the convolution layer $C_{m,n}^{p,q}$ can be computed by,

$$C_{m,n}^{p,q} = \sum_{u}\sum_{v} I_{(m-u,\,n-v)} \cdot K_{u,v}^{p,q} + b^{p,q} \tag{Eq 3.3}$$
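The sliding-window sum can be sketched in plain Python for a grey scale image with stride 1 and zero padding 0. This sketch uses the index convention of Eq 3.2 (kernel not flipped, zero-based indices) and adds the bias of Eq 3.3; the function name and the small example values are illustrative assumptions.

```python
def conv2d_valid(image, kernel, bias=0.0):
    """Slide `kernel` over `image` with stride 1 and no zero padding.

    Each output pixel is the sum of element-wise products of the kernel
    and the image patch under it (Eq 3.2 convention), plus a bias.
    """
    rows, cols = len(image), len(image[0])
    k1, k2 = len(kernel), len(kernel[0])
    feature_map = []
    for i in range(rows - k1 + 1):
        out_row = []
        for j in range(cols - k2 + 1):
            s = bias
            for m in range(k1):
                for n in range(k2):
                    s += kernel[m][n] * image[i + m][j + n]
            out_row.append(s)
        feature_map.append(out_row)
    return feature_map


# Hypothetical 3 x 3 image and 2 x 2 kernel.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel, bias=0.5)
print(feature_map)
```

The 3 × 3 input and 2 × 2 kernel give a 2 × 2 feature map, in agreement with Eq 2.1 for $Z_P = 0$ and $Z_S = 1$.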

Figure 3.1: Convolution Neural Network

These feature maps are passed through a non-linear activation function $\sigma$,

$$C_{m,n}^{p,q} = \sigma\left(\sum_{u}\sum_{v} I_{(m-u,\,n-v)} \cdot K_{u,v}^{p,q} + b^{p,q}\right) \tag{Eq 3.4}$$

where $\sigma$ is a ReLU activation function. The pooling layer $P_{m,n}^{p,q}$ is developed by taking the maximum valued pixels $m, n$ in the convolution layers. The pooling layer can be calculated by,

$$P_{m,n}^{p,q} = \max\left(C_{m,n}^{p,q}\right) \tag{Eq 3.5}$$

The pooling layer $P^{p,q}$ is concatenated to form a long vector of length $p \times q$ and is fed into the fully connected dense layers for classification. The vectorized data points $a^{l-1}$ in the $(l-1)$th layer are given by,

$$a^{l-1} = f(P^{p,q}) \tag{Eq 3.6}$$

This long vector is fed into the fully connected dense layers from the $l$th layer to the $(L+1)$th layer. If the fully connected dense layers are developed with $L$ layers and $n$ neurons, then $l$ is the first layer, $L$ is the last layer, and $(L+1)$ is the classification layer, as shown in Figure 3.2. The forward run between the layers is given by,

$$z_1^l = w_{11}^l a_1^{l-1} + w_{21}^l a_2^{l-1} + \dots + w_{i1}^l a_i^{l-1} + \dots + b_1^l \tag{Eq 3.7}$$

$$z_2^l = w_{12}^l a_1^{l-1} + w_{22}^l a_2^{l-1} + \dots + w_{i2}^l a_i^{l-1} + \dots + b_2^l \tag{Eq 3.8}$$

$$z_j^l = w_{1j}^l a_1^{l-1} + w_{2j}^l a_2^{l-1} + \dots + w_{ij}^l a_i^{l-1} + \dots + b_j^l \tag{Eq 3.9}$$
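The ReLU activation, max pooling, and vectorization steps of Eq 3.4 to Eq 3.6 can be sketched as follows. The function names, the 2 × 2 pooling window with stride 2, and the example feature map are illustrative assumptions.

```python
def relu_map(feature_map):
    """Apply ReLU element-wise to a feature map (the sigma of Eq 3.4)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, k, stride):
    """Keep the maximum pixel of every k x k window (Eq 3.5)."""
    rows = (len(feature_map) - k) // stride + 1
    cols = (len(feature_map[0]) - k) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(k)
                for b in range(k)
            )
            for j in range(cols)
        ]
        for i in range(rows)
    ]


def flatten(pooled):
    """Stretch the pooled map into one long vector (the f of Eq 3.6)."""
    return [v for row in pooled for v in row]


# Hypothetical 4 x 4 feature map before activation.
C = [[-1, 2, 0, 3],
     [4, -5, 6, -7],
     [0, 1, -2, 2],
     [3, -1, 0, -4]]
activated = relu_map(C)
pooled = max_pool(activated, k=2, stride=2)
a_prev = flatten(pooled)
print(pooled, a_prev)
```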

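Figure 3.2: Forward run in fully connected layer

$$\begin{bmatrix} z_1^l \\ z_2^l \\ \vdots \\ z_j^l \end{bmatrix} = \begin{bmatrix} w_{11}^l & w_{21}^l & w_{31}^l & \cdots & w_{n1}^l \\ w_{12}^l & w_{22}^l & w_{32}^l & \cdots & w_{n2}^l \\ \vdots & \vdots & \vdots & & \vdots \\ w_{1j}^l & w_{2j}^l & w_{3j}^l & \cdots & w_{nj}^l \end{bmatrix} \begin{bmatrix} a_1^{l-1} \\ a_2^{l-1} \\ \vdots \\ a_n^{l-1} \end{bmatrix} + \begin{bmatrix} b_1^l \\ b_2^l \\ \vdots \\ b_j^l \end{bmatrix} \tag{Eq 3.10}$$

Consider a single neuron $j$ in a fully connected layer at layer $l$, as given in Figure 3.3. The input values $a_i^{l-1}$ are multiplied by the weights $w_{ij}$, and the bias value $b_j^l$ is added. The final net input value $z_j^l$ is then passed through a non-linear activation function $\sigma$. The net input is computed by,

$$z_j^l = w_{1j}^l a_1^{l-1} + w_{2j}^l a_2^{l-1} + \dots + w_{ij}^l a_i^{l-1} + \dots + b_j^l \tag{Eq 3.11}$$

where $z_j^l$ is the input of the activation function for the neuron $j$ at layer $l$,

$$z_j^l = \sum_{i=1}^{n} w_{ij}^l a_i^{l-1} + b_j^l \tag{Eq 3.12}$$

Hence, the output of the $l$th layer is,

$$a_j^l = \sigma\left(\sum_{i=1}^{n} w_{ij}^l a_i^{l-1} + b_j^l\right) \tag{Eq 3.13}$$

$$a^l = \sigma\left((W^l)^T a^{l-1} + b^l\right) \tag{Eq 3.15}$$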

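Figure 3.3: Forward run in a neuron $j$ at the $l$th layer

$$a^l = \sigma(z^l) \tag{Eq 3.16}$$

where $a^l$ is,

$$a^l = \begin{bmatrix} a_1^l \\ \vdots \\ a_j^l \end{bmatrix} = \begin{bmatrix} \sigma(z_1^l) \\ \vdots \\ \sigma(z_j^l) \end{bmatrix} \tag{Eq 3.17}$$

and $W^l$ is,

$$W^l = \begin{bmatrix} w_{1j}^l \\ \vdots \\ w_{ij}^l \end{bmatrix} \tag{Eq 3.18}$$

In the same manner, the output value of the last layer $L$ is given by,

$$a^L = \sigma\left((W^L)^T a^{L-1} + b^L\right) \tag{Eq 3.19}$$

where,

$$a^L = \sigma(z^L) \tag{Eq 3.20}$$

$$a^L = \begin{bmatrix} a_1^L \\ \vdots \\ a_j^L \end{bmatrix} = \begin{bmatrix} \sigma(z_1^L) \\ \vdots \\ \sigma(z_j^L) \end{bmatrix} \tag{Eq 3.21}$$

Expanding this to the classification layer, the final predicted output value $\hat{y}_i^{L+1}$ of a neuron unit $i$ at the $(L+1)$th layer can be expressed as,

$$\hat{y}_i^{L+1} = \sigma\left(W^{L} \cdot \sigma\left(\cdots\, \sigma\left(W^{2} \cdot \sigma\left(W^{1} a^{1} + b^{1}\right) + b^{2}\right) \cdots\right) + b^{L}\right) \tag{Eq 3.22}$$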
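The layer-by-layer forward run of Eq 3.15 and its nested form in Eq 3.22 can be sketched with a single helper that is applied once per layer. The function names, the weight layout `W[i][j]` (input $i$ to neuron $j$), and the numeric values are illustrative assumptions; a real network would start from randomly initialized weights.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


def dense_forward(W, b, a_prev, activation):
    """One fully connected layer: a_j = sigma(sum_i W[i][j] * a_prev[i] + b[j])."""
    z = [
        sum(W[i][j] * a_prev[i] for i in range(len(a_prev))) + b[j]
        for j in range(len(b))
    ]
    return [activation(v) for v in z]


# Hypothetical two-layer example: ReLU hidden layer, sigmoid classification layer.
a_prev = [1.0, 2.0]
W_hidden = [[0.5, -1.0],
            [0.25, 0.5]]
b_hidden = [0.0, -1.0]
W_out = [[2.0],
         [3.0]]
b_out = [-2.0]

hidden = dense_forward(W_hidden, b_hidden, a_prev, relu)
y_hat = dense_forward(W_out, b_out, hidden, sigmoid)
print(hidden, y_hat)
```

Chaining `dense_forward` calls in this way is the code counterpart of the nested activations in Eq 3.22.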

If the predicted value is $\hat{y}_i^{L+1}$ and the actual labeled value is $y_i$, then the performance of the model can be computed by the following loss function. From Eq 2.14, the cross-entropy loss function is,

$$L(\hat{y}_i^{L+1}, y_i) = -\frac{1}{t}\sum_{i=1}^{t}\left(y_i \log\left(\hat{y}_i^{L+1}\right) + (1 - y_i)\log\left(1 - \hat{y}_i^{L+1}\right)\right) \tag{Eq 3.23}$$

3.2 Backward run

The backward run, also known as backward propagation, refers to the backward propagation of errors: it computes the gradient of the loss function with respect to parameters such as weight and bias, for use in gradient descent, as shown in Figure 3.4. During backward propagation, the gradient of the loss function with respect to the parameters of the final layer is computed first, and the gradient for the first layer is computed last. The partial derivatives of one layer are reused in computing the partial derivatives of another layer through the chain rule, which leads to efficient computation of the gradient at each layer. These gradients are used to minimize the loss function. The performance of the model increases as the loss function value decreases [14] [15] [16]. In back propagation, the parameters $W^{L+1}$, $b^{L+1}$, $W^{l}$, $b^{l}$, $k^{p,q}$, and $b^{p,q}$ need to be updated in order to minimize the cost function.

Figure 3.4: Back propagation in fully connected layer
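The cross-entropy loss of Eq 3.23 can be evaluated with a few lines of Python. The function name and the clipping constant `eps`, which keeps the logarithm finite when a prediction is exactly 0 or 1, are assumptions added for this sketch.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy averaged over t samples (Eq 3.23)."""
    t = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clipping is an assumption, see lead-in
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / t


# An uninformative prediction of 0.5 costs log(2) per sample;
# confident correct predictions cost less.
uninformed = cross_entropy([1, 0], [0.5, 0.5])
confident = cross_entropy([1, 0], [0.9, 0.1])
print(uninformed, confident)
```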

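The partial derivative of the loss function of the $i$th neuron at the classification layer $L+1$ with respect to the predicted value $\hat{y}_i^{L+1}$ is,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}} = -\frac{1}{t}\sum_{i=1}^{t}\frac{\partial\left(y_i \log\left(\hat{y}_i^{L+1}\right) + (1 - y_i)\log\left(1 - \hat{y}_i^{L+1}\right)\right)}{\partial \hat{y}_i^{L+1}} \tag{Eq 3.25}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}} = \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right) \tag{Eq 3.26}$$

In the case of a multiclass categorical classification problem, the gradient of the loss function at the classification layer $L+1$ is,

$$\begin{bmatrix} \frac{\partial L(\hat{y}_1^{L+1}, y_1)}{\partial \hat{y}_1^{L+1}} \\ \frac{\partial L(\hat{y}_2^{L+1}, y_2)}{\partial \hat{y}_2^{L+1}} \\ \vdots \\ \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}} \end{bmatrix} = \begin{bmatrix} \frac{1}{t}\left(-\frac{y_1}{\hat{y}_1^{L+1}} + \frac{1 - y_1}{1 - \hat{y}_1^{L+1}}\right) \\ \frac{1}{t}\left(-\frac{y_2}{\hat{y}_2^{L+1}} + \frac{1 - y_2}{1 - \hat{y}_2^{L+1}}\right) \\ \vdots \\ \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right) \end{bmatrix} \tag{Eq 3.27}$$

Next, consider the partial derivative of the cost function with respect to the weight $w_{i,j}^{L+1}$ of the $i$th neuron in the final layer. For convenience, the weight of the $L$th layer is denoted as $w^{L}$.

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{L+1}} = \frac{1}{t}\sum_{i=1}^{t}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}}\,\frac{\partial \hat{y}_i^{L+1}}{\partial w_{i,j}^{L+1}} \tag{Eq 3.28}$$

$$= \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\left(\frac{\partial \hat{y}_i^{L+1}}{\partial w_{i,j}^{L+1}}\right) \tag{Eq 3.29}$$

$$= \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\left(\frac{\partial a_i^{L+1}}{\partial w_{i,j}^{L+1}}\right) \tag{Eq 3.30}$$

$$= \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\left(\frac{\partial \sigma(z_i^{L+1})}{\partial w_{i,j}^{L+1}}\right) \tag{Eq 3.31}$$

$$= \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\sigma'(z_i^{L+1}) \tag{Eq 3.32}$$

$$= \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\sigma'\left(\sum_{i=1}^{n} w_{i,j}\, a_i^{L} + b^{L}\right) \tag{Eq 3.33}$$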

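In this final layer, the sigmoid activation function is used for the non-linear transformation. From Table 1, the sigmoid activation function is written as,

$$\sigma(z_i^{L+1}) = \frac{1}{1 + \exp(-z_i^{L+1})} \tag{Eq 3.34}$$

The derivative of the sigmoid function is expressed as,

$$\frac{\partial \sigma(z_i^{L+1})}{\partial z_i^{L+1}} = \frac{\partial}{\partial z_i^{L+1}}\left(\frac{1}{1 + \exp(-z_i^{L+1})}\right) \tag{Eq 3.35}$$

$$= \sigma(z_i^{L+1})\left(1 - \sigma(z_i^{L+1})\right) \tag{Eq 3.36}$$

Substituting Eq 3.36 into Eq 3.33,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{L}} = \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\sigma\left(\sum_{i=1}^{n} w_{i,j}\, a_i^{L} + b^{L}\right)\left(1 - \sigma\left(\sum_{i=1}^{n} w_{i,j}\, a_i^{L} + b^{L}\right)\right) \tag{Eq 3.37}$$

where $\hat{y}_i^{L+1} = a_i^{L+1} = \sigma(z_i^{L+1})$, so that

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{L}} = \frac{1}{t}\sum_{i=1}^{t}\left(-\frac{y_i}{\sigma(z_i^{L+1})} + \frac{1 - y_i}{1 - \sigma(z_i^{L+1})}\right)\sigma(z_i^{L+1})\left(1 - \sigma(z_i^{L+1})\right), \quad z_i^{L+1} = \sum_{i=1}^{n} w_{i,j}\, a_i^{L} + b^{L} \tag{Eq 3.38}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{L}} = \frac{1}{t}\sum_{i=1}^{t}\left(\sigma\left(\sum_{i=1}^{n} w_{i,j}\, a_i^{L} + b^{L}\right) - y_i\right) \tag{Eq 3.39}$$

Hence, the partial derivative of the loss function with respect to the weights of every neuron in the $L$th layer is expressed as,

$$\frac{\partial L}{\partial W^{L}} = \begin{bmatrix} \frac{\partial L(\hat{y}_1^{L+1}, y_1)}{\partial w_{1,0}^{L}} \\ \frac{\partial L(\hat{y}_2^{L+1}, y_2)}{\partial w_{2,1}^{L}} \\ \vdots \\ \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{L}} \end{bmatrix} = \begin{bmatrix} \frac{1}{t}\left(\sigma(z_1^{L+1}) - y_1\right) \\ \frac{1}{t}\left(\sigma(z_2^{L+1}) - y_2\right) \\ \vdots \\ \frac{1}{t}\left(\sigma(z_i^{L+1}) - y_i\right) \end{bmatrix} \tag{Eq 3.40}$$

The partial derivative of the cost function with respect to the bias $b_i^{L}$ of the $i$th neuron at the $L$th layer is,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b_i^{L}} = \frac{1}{t}\,\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}}\,\frac{\partial \hat{y}_i^{L+1}}{\partial b_i^{L}} \tag{Eq 3.41}$$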

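$$= \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\left(\frac{\partial \hat{y}_i^{L+1}}{\partial b_i^{L}}\right) \tag{Eq 3.42}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b_i^{L}} = \sigma(z_i^{L+1}) - y_i \tag{Eq 3.43}$$

The partial derivative of the cost function with respect to the bias of every neuron at the $L$th layer is written as,

$$\frac{\partial L}{\partial b^{L}} = \begin{bmatrix} \frac{\partial L(\hat{y}_1^{L+1}, y_1)}{\partial b_1^{L}} \\ \frac{\partial L(\hat{y}_2^{L+1}, y_2)}{\partial b_2^{L}} \\ \vdots \\ \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b_i^{L}} \end{bmatrix} = \begin{bmatrix} \sigma(z_1^{L+1}) - y_1 \\ \sigma(z_2^{L+1}) - y_2 \\ \vdots \\ \sigma(z_i^{L+1}) - y_i \end{bmatrix} \tag{Eq 3.44}$$

In the same way, the partial derivatives of the loss function with respect to all of the hidden neurons and hidden layers can be calculated. The ReLU non-linear activation function is used in all of the hidden layers from $l$ to $L$. The partial derivative of the loss function with respect to the weight of the $i$th neuron at the first layer $l$ of the fully connected dense layers is,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{l}} = \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}}\,\frac{\partial \hat{y}_i^{L+1}}{\partial w_{i,j}^{l}} \tag{Eq 3.45}$$

$$= \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\left(\frac{\partial \hat{y}_i^{L+1}}{\partial w_{i,j}^{l}}\right) \tag{Eq 3.46}$$

$$= \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\left(\frac{\partial a_i^{L+1}}{\partial w_{i,j}^{l}}\right) \tag{Eq 3.47}$$

$$= \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\frac{\partial \sigma(z_i^{L+1})}{\partial w_{i,j}^{l}} \tag{Eq 3.48}$$

$$= \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\sigma'(z_i^{l}) \tag{Eq 3.49}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{l}} = \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\sigma'(z_i^{l}) \tag{Eq 3.50}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{l}} = \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\sigma'\left(\sum_{i=1}^{n} w_{i,j}\, a_i^{l-1} + b^{l}\right) \tag{Eq 3.51}$$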
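The result of Eq 3.43, that the gradient at a sigmoid output neuron under cross-entropy loss reduces to $\sigma(z) - y$, can be checked numerically for a single training sample. The sketch below compares the analytic expression with a central finite difference; the function names, the step size `h`, and the test point are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(z, y):
    """Cross-entropy of one sample whose prediction is sigmoid(z)."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def analytic_grad(z, y):
    """Derivative of the loss with respect to the net input (Eq 3.43 form)."""
    return sigmoid(z) - y


def numeric_grad(z, y, h=1e-6):
    """Central finite-difference approximation of the same derivative."""
    return (loss(z + h, y) - loss(z - h, y)) / (2 * h)


z, y = 0.3, 1.0
print(analytic_grad(z, y), numeric_grad(z, y))
```

Because the bias enters the net input additively, the derivative with respect to the bias equals the derivative with respect to $z$, which is why the same expression appears in Eq 3.43.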

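Since the ReLU activation function is used, its derivative, from Table 1, is

$$\sigma'(z) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases} \tag{Eq 3.52}$$

If $z > 0$,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{l}} = \frac{z_i^{l} - y_i}{z_i^{l}\left(1 - z_i^{l}\right)} \tag{Eq 3.53}$$

Hence, the partial derivative of the loss function with respect to the weights of all neurons at the $l$th layer is,

$$\frac{\partial L}{\partial W^{l}} = \begin{bmatrix} \frac{\partial L(\hat{y}_1^{L+1}, y_1)}{\partial w_{1,0}^{l}} \\ \frac{\partial L(\hat{y}_2^{L+1}, y_2)}{\partial w_{2,1}^{l}} \\ \vdots \\ \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial w_{i,j}^{l}} \end{bmatrix} = \begin{bmatrix} \frac{z_1^{l} - y_1}{z_1^{l}(1 - z_1^{l})} \\ \frac{z_2^{l} - y_2}{z_2^{l}(1 - z_2^{l})} \\ \vdots \\ \frac{z_i^{l} - y_i}{z_i^{l}(1 - z_i^{l})} \end{bmatrix} \tag{Eq 3.54}$$

The partial derivative of the loss function with respect to the bias of the $i$th neuron at the $l$th layer is,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b_i^{l}} = \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}}\,\frac{\partial \hat{y}_i^{L+1}}{\partial b_i^{l}} \tag{Eq 3.55}$$

$$= \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\left(\frac{\partial \hat{y}_i^{L+1}}{\partial b_i^{l}}\right) \tag{Eq 3.56}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b_i^{l}} = \sigma(z_i^{l}) - y_i \tag{Eq 3.57}$$

where $\sigma$ is the ReLU non-linear activation function. Hence, if $z > 0$,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b_i^{l}} = z_i^{l} - y_i \tag{Eq 3.58}$$

Hence, the partial derivatives of the loss function with respect to the bias at the layer $l$ are,

$$\frac{\partial L}{\partial b^{l}} = \begin{bmatrix} \frac{\partial L(\hat{y}_1^{L+1}, y_1)}{\partial b_1^{l}} \\ \frac{\partial L(\hat{y}_2^{L+1}, y_2)}{\partial b_2^{l}} \\ \vdots \\ \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b_i^{l}} \end{bmatrix} = \begin{bmatrix} z_1^{l} - y_1 \\ z_2^{l} - y_2 \\ \vdots \\ z_i^{l} - y_i \end{bmatrix} \tag{Eq 3.59}$$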

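In order to perform the learning of ConvNets, it is also necessary to update the kernel bank weights and bias values in the convolution layers, which requires propagating the error back through the pooling layers. The partial derivative of the loss function with respect to the input value $a_i^{l-1}$ is, from Eq 3.11,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial a_i^{l-1}} = \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial \hat{y}_i^{L+1}}\,\frac{\partial \hat{y}_i^{L+1}}{\partial a_i^{l-1}} \tag{Eq 3.60}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial a_i^{l-1}} = \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\frac{\partial \hat{y}_i^{L+1}}{\partial a_i^{l-1}} \tag{Eq 3.61}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial a_i^{l-1}} = \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right)\frac{\partial\left(\sum_{i=1}^{n} w_{i,j}^{l}\, a_i^{l-1} + b^{l}\right)}{\partial a_i^{l-1}} \tag{Eq 3.62}$$

$$= \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right) w_{i,j}^{l} \tag{Eq 3.63}$$

For all input values $a^{l-1}$ at the $l$th layer,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial a^{l-1}} = \frac{1}{t}\left(-\frac{y_i}{\hat{y}_i^{L+1}} + \frac{1 - y_i}{1 - \hat{y}_i^{L+1}}\right) W^{l} \tag{Eq 3.64}$$

Reshaping the long vector $\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial a^{l-1}}$ back into the shape of the pooling layer,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial P^{p,q}} = f^{-1}\left(\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial a^{l-1}}\right) \tag{Eq 3.65}$$

The primary function of the pooling layer is to reduce the number of parameters and to control overfitting of the model. Hence, no learning takes place in the pooling layers. The pooling layer error is computed by acquiring the single-valued winning unit. Since no parameters need to be updated in the pooling layer, upsampling can be done to obtain $\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}}$,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}} = \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial P^{p,q}} \tag{Eq 3.66}$$

The partial derivative of the loss function with respect to the convolution kernel $k_{u,v}^{p,q}$ is,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial k_{u,v}^{p,q}} = \sum_{m}\sum_{n}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}}\,\frac{\partial C_{m,n}^{p,q}}{\partial k_{u,v}^{p,q}} \tag{Eq 3.67}$$

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial k_{u,v}^{p,q}} = \sum_{m}\sum_{n}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}}\,\frac{\partial\, \sigma\left(\sum_{u}\sum_{v} I_{m-u,\,n-v}\, k_{u,v}^{p,q} + b^{p,q}\right)}{\partial k_{u,v}^{p,q}} \tag{Eq 3.68}$$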

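$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial k_{u,v}^{p,q}} = \sum_{m}\sum_{n}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}}\, I_{m-u,\,n-v} \tag{Eq 3.69}$$

The updated weight of the kernel $k_{u,v}^{p,q}$ can be obtained by rotating the image by 180°,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial k_{u,v}^{p,q}} = \sum_{m}\sum_{n} rot_{180^\circ}\left\{I_{m-u,\,n-v}\right\}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}} \tag{Eq 3.70}$$

$$\Delta k^{p,q} = rot_{180^\circ}\{I\} \otimes \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}} \tag{Eq 3.71}$$

The partial derivative of the loss function with respect to the bias $b^{p,q}$ of the convolution kernel is,

$$\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b^{p,q}} = \sum_{m}\sum_{n}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}}\,\frac{\partial C_{m,n}^{p,q}}{\partial b^{p,q}} \tag{Eq 3.72}$$

$$= \sum_{m}\sum_{n}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}}\,\frac{\partial\, \sigma\left(\sum_{u}\sum_{v} I_{m-u,\,n-v}\, k_{u,v}^{p,q} + b^{p,q}\right)}{\partial b^{p,q}} \tag{Eq 3.73}$$

$$\Delta b^{p,q} = \frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial b^{p,q}} = \sum_{m}\sum_{n}\frac{\partial L(\hat{y}_i^{L+1}, y_i)}{\partial C_{m,n}^{p,q}} \tag{Eq 3.74}$$

3.3 Parameter updates

In order to minimize the loss function, it is necessary to update the learning parameters at every iteration on the basis of gradient descent. Although various optimization techniques have been developed to increase the learning speed, this article considers only gradient descent optimization. The weight and bias updates of the fully connected dense layer $L+1$ are given by,

$$W^{L+1} = W^{L+1} - \alpha\,\frac{\partial L(\hat{y}^{L+1}, y)}{\partial W^{L+1}} \tag{Eq 3.75}$$

$$b^{L+1} = b^{L+1} - \alpha\,\frac{\partial L(\hat{y}^{L+1}, y)}{\partial b^{L+1}} \tag{Eq 3.76}$$

The weight and bias updates of the fully connected dense layer $l$ are given by,

$$W^{l} = W^{l} - \alpha\,\frac{\partial L(\hat{y}^{L+1}, y)}{\partial W^{l}} \tag{Eq 3.77}$$

$$b^{l} = b^{l} - \alpha\,\frac{\partial L(\hat{y}^{L+1}, y)}{\partial b^{l}} \tag{Eq 3.78}$$

The weight and bias updates of the convolution kernel are given by,

$$k^{p,q} = k^{p,q} - \alpha\,\frac{\partial L(\hat{y}^{L+1}, y)}{\partial k_{u,v}^{p,q}} \tag{Eq 3.79}$$

$$b^{p,q} = b^{p,q} - \alpha\,\frac{\partial L(\hat{y}^{L+1}, y)}{\partial b^{p,q}} \tag{Eq 3.80}$$

where $\alpha$ is the learning rate.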
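Every update from Eq 3.75 to Eq 3.80 has the same form: the parameter minus the learning rate times its gradient. The sketch below applies that rule repeatedly to the bias of a single sigmoid output neuron, using the gradient of Eq 3.43. The reduction to one neuron whose net input is only its bias, the label, and the learning rate are simplifying assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_descent_step(param, grad, alpha):
    """One update of the form param = param - alpha * grad."""
    return param - alpha * grad


# Hypothetical setting: net input z = b (bias only), actual label y = 1.
y = 1.0
b = 0.0
alpha = 0.5
history = []
for _ in range(50):
    grad_b = sigmoid(b) - y                 # gradient from Eq 3.43
    b = gradient_descent_step(b, grad_b, alpha)
    history.append(-math.log(sigmoid(b)))   # cross-entropy loss for y = 1

print(b, history[0], history[-1])
```

The loss recorded in `history` shrinks from one iteration to the next because each step moves the bias against the sign of its gradient.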

4 Conclusion

In this article, an overview of a Convolution Neural Network architecture is given, including various activation functions and loss functions. The step-by-step procedure of feed forward and backward propagation is explained in detail. For mathematical simplicity, a grey scale image is taken as the input, the kernel stride value is taken as 1, the zero padding value is taken as 0, and the non-linear transformations of the intermediate layers and the final layer are carried out by the ReLU and sigmoid activation functions. The cross-entropy loss function is used as the performance measure of the model. There are numerous optimization and regularization procedures to minimize the loss function, to speed up learning, and to avoid overfitting of the model; this article considers only the formulation of a typical Convolution Neural Network architecture with gradient descent optimization.

References

[1] D. O. Hebb, The organization of behavior: A neuropsychological theory. Psychology Press, 2005.
[2] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proceedings of the National Academy of Sciences, vol. 79, no. 8, 1982.
[3] H. D. Simon, "Partitioning of unstructured problems for parallel processing," Computing Systems in Engineering, vol. 2, no. 2-3, pp. 135-148, 1991.
[4] Y. LeCun, Y. Bengio, et al., "Convolutional networks for images, speech, and time series," The handbook of brain theory and neural networks, vol. 3361, no. 10, p. 1995, 1995.
[5] Y. LeCun et al., "Generalization and network design strategies," Connectionism in perspective, pp. 143-155, 1989.
[6] Y. LeCun, P. Haffner, L. Bottou, and Y. Bengio, "Object recognition with gradient-based learning," Shape, contour and grouping in computer vision, 1999.
[7] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, 2015.
[8] C. M. Bishop, Neural networks for pattern recognition. Oxford University Press, 1995.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012.
[10] I. Goodfellow, Y. Bengio, and A. Courville, Deep learning. MIT Press, 2016.
[11] D. C. Ciresan, U. Meier, J. Masci, L. Maria Gambardella, and J. Schmidhuber, "Flexible, high performance convolutional neural networks for image classification," in IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, p. 1237, Barcelona, Spain, 2011.
[12] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural networks, vol. 61, pp. 85-117, 2015.
[13] P.-T. De Boer, D. P. Kroese, S. Mannor, and R. Y. Rubinstein, "A tutorial on the cross-entropy method," Annals of operations research, vol. 134, no. 1, pp. 19-67.
[14] D. E. Rumelhart, G. E. Hinton, R. J. Williams, et al., "Learning representations by back-propagating errors," Cognitive modeling, vol. 5, no. 3, p. 1, 1988.
[15] F. J. Pineda, "Generalization of back propagation to recurrent and higher order neural networks," in Neural information processing systems, pp. 602-611, 1988.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11.
