Feedforward Neural Networks


1 Feedforward Neural Networks
Yagmur Gizem Cinar, Eric Gaussier
AMA, LIG, Univ. Grenoble Alpes
17 March 2017

2 Reference Book
Deep Learning. Ian Goodfellow, Yoshua Bengio, and Aaron Courville. MIT Press, 2016.

3 Table of Contents
1 Feedforward Neural Networks - Multilayer Perceptrons
2 XOR Example
3 Gradient-Based Learning
4 Hidden Units
5 Architecture Design
6 Back-Propagation

4 Feedforward Neural Networks - Multilayer Perceptrons
Multilayer Perceptrons are also called Feedforward Neural Networks or Deep Feedforward Networks
Goal: approximate some function f*, y = f*(x) (1)
Classification: y ∈ {c_1, c_2, ..., c_K}
Regression: y ∈ R
A feedforward network defines a mapping y = f(x; θ) (2)
Feedforward: information flows through f from x and finally to y
No feedback connections, unlike a recurrent neural network

5 Feedforward Neural Networks - Multilayer Perceptrons
Network: a composition of different functions in a directed acyclic graph
e.g. three functions f^(1), f^(2), and f^(3) composed as f(x) = f^(3)(f^(2)(f^(1)(x)))
f^(1) is the 1st layer, f^(2) is the 2nd layer
The final layer is called the output layer; the other layers are called hidden layers
The length of the chain is the depth of the network; the width is the dimensionality of the hidden layers

6 Feedforward Neural Networks - Multilayer Perceptrons
Feedforward Neural Networks are loosely inspired by neuroscience: many units act at the same time, and each unit receives input from many other units and computes its own activation
MLPs are function approximation machines designed to generalize well
Linear models: + fit efficiently and reliably with convex optimization; - limited to linear functions
One way to obtain nonlinearity is a mapping φ, which can be learned with deep learning:
y = f(x; θ, w) = φ(x; θ)^T w (3)
θ: the parameters of φ; w ∈ R^n: the parameters of the desired map from φ(x) to y

7 Feedforward Neural Networks - Multilayer Perceptrons
Example: Learning XOR
XOR is not linearly separable in the original x space, but becomes separable in a learned feature space h.
[Figure 1: Solving the XOR problem by learning a representation: XOR in the original space (left) and in the learned space (right). Figure 6.1 in Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.]

8 XOR Example
Example: Learning XOR
X = {[0, 0]^T, [0, 1]^T, [1, 0]^T, [1, 1]^T}
XOR is not linearly separable
XOR target function: y = f*(x); model function: y = f(x; θ)
MSE loss function: J(θ) = (1/4) Σ_{x ∈ X} (f*(x) - f(x; θ))^2
Suppose the model is a linear single layer with one unit: f(x; θ) = x^T w + b

9 XOR Example
Example: Learning XOR
A single layer with one unit, also called a perceptron, f(x; θ) = x^T w + b, cannot separate XOR.
Linear separability example: N = 4 points in d = 2 dimensions give 2^4 = 16 dichotomies, of which 14 are linearly separable (everything but XOR and its complement).
[Figure 2: XOR is not linearly separable. Johan Suykens. Lecture notes in Artificial Neural Networks.]

10 XOR Example
Example: Learning XOR
Network diagrams: a single hidden layer with two hidden units h_1, h_2.
[Figure 3: Network diagrams. An example of a feedforward network, drawn in two different styles; it is the network used to solve the XOR example, with a single hidden layer containing two units. In one style every unit is a node in the graph, which is explicit and unambiguous but consumes much space for larger networks; in the other style a node represents an entire layer. Figure 6.2 in Goodfellow et al., Deep Learning, 2016.]

11 XOR Example
Example: Learning XOR
One hidden layer with two hidden units:
h = f^(1)(x; W, c)
y = f^(2)(h; w, b)
f(x; W, c, w, b) = f^(2)(f^(1)(x))
W and w are the weights of a linear transformation; b and c are biases
If both layers were linear, f^(1)(x) = W^T x and f^(2)(h) = h^T w, then f(x) = w^T W^T x (intercept/bias terms ignored) would still be linear in x

12 XOR Example
Example: Learning XOR
For nonlinearity: an activation function g, h = g(W^T x + c)
The rectified linear unit (ReLU), g(z) = max{0, z}, is the default activation function for many feedforward networks.
[Figure 4: The ReLU activation function g(z) = max{0, z}. Figure 6.3 in Goodfellow et al., Deep Learning, 2016.]

13 XOR Example
Example: Learning XOR
Complete network: f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
W = [1 1; 1 1]
c = [0, -1]^T
w = [1, -2]^T
b = 0

14 XOR Example
Example: Learning XOR
Design matrix X (one example per row):
X = [0 0; 0 1; 1 0; 1 1]
XW = [0 0; 1 1; 1 1; 2 2]
XW + c = [0 -1; 1 0; 1 0; 2 1] (c added to every row)

15 XOR Example
Example: Learning XOR
max{0, XW + c} = [0 0; 1 0; 1 0; 2 1]
max{0, XW + c} w + b = [0, 1, 1, 0]^T, which matches the XOR targets
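As a check (not part of the original slides), a minimal NumPy sketch that evaluates this network on the four XOR inputs:

```python
import numpy as np

# Evaluate f(x; W, c, w, b) = w^T max{0, W^T x + c} + b on the
# whole design matrix at once, using the parameter values above.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # one example per row
W = np.array([[1, 1], [1, 1]])
c = np.array([0, -1])
w = np.array([1, -2])
b = 0

H = np.maximum(0, X @ W + c)   # hidden activations: ReLU of the affine map
y_hat = H @ w + b              # network outputs

print(y_hat)  # [0 1 1 0], the XOR truth table
```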

16 Gradient-Based Learning
In real life: billions of model parameters
Gradient-based optimization algorithms provide solutions with little error
Nonlinearity leads to a nonconvex loss function
Networks are trained by iterative gradient-based optimizers
Global convergence is not guaranteed
Sensitive to the initialization of the parameters: initialize weights with small random values; biases can be 0 or small positive values, e.g. 0.1

17 Gradient-Based Learning
Cost function: J(w, b) = -E_{x,y ~ p̂_data} log p_model(y | x)
Mostly the negative log-likelihood is used as the cost function, so minimizing the cost leads to maximum likelihood estimation
Equivalently, the cross-entropy between the training data and the model's prediction serves as the cost function
Typically the total cost is composed of the cross-entropy and a regularization term

18 Gradient-Based Learning
Cost functions:
Mean squared error (MSE): f* = argmin_f E_{x,y ~ p_data} ||y - f(x)||^2
Mean absolute error (MAE): f* = argmin_f E_{x,y ~ p_data} ||y - f(x)||_1
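Where the slides state the MSE and MAE criteria as expectations over p_data, a minimal sketch of their empirical counterparts, averaged over a sample, might look like this (the function names are mine):

```python
import numpy as np

# Empirical MSE and MAE: average the per-example squared / absolute
# error over a finite sample instead of taking an expectation.
def mse(y, y_hat):
    return np.mean(np.sum((y - y_hat) ** 2, axis=-1))

def mae(y, y_hat):
    return np.mean(np.sum(np.abs(y - y_hat), axis=-1))

y = np.array([[0.0], [1.0], [1.0], [0.0]])
y_hat = np.array([[0.1], [0.8], [0.9], [0.2]])
print(mse(y, y_hat), mae(y, y_hat))  # 0.025 0.15
```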

19 Gradient-Based Learning
Output Types
The choice of output unit determines the form of the cross-entropy cost.

Output Type | Output Distribution  | Output Layer                 | Cost Function
Binary      | Bernoulli            | Sigmoid                      | Binary cross-entropy
Discrete    | Multinoulli          | Softmax                      | Discrete cross-entropy
Continuous  | Gaussian             | Linear                       | Gaussian cross-entropy (MSE)
Continuous  | Mixture of Gaussians | Mixture density              | Cross-entropy
Continuous  | Arbitrary            | See part III: GAN, VAE, FVBN | Various

[Figure 5: Output units. Goodfellow et al., Deep Learning, 2016.]
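As an illustration of the first two table rows, here is a minimal sketch (all names are mine, not from the slides) pairing each output layer with the cross-entropy of its distribution:

```python
import numpy as np

# Sigmoid + Bernoulli cross-entropy, and softmax + multinoulli
# cross-entropy: each output layer pairs with the negative
# log-likelihood of the corresponding distribution.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y, p):
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # shift for stability
    return e / e.sum(axis=-1, keepdims=True)

def discrete_cross_entropy(y_onehot, p):
    return -np.mean(np.sum(y_onehot * np.log(p), axis=-1))

print(binary_cross_entropy(np.array([1.0]), sigmoid(np.array([2.0]))))
print(discrete_cross_entropy(np.array([[1.0, 0.0, 0.0]]),
                             softmax(np.array([[2.0, -1.0, 0.5]]))))
```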

20 Hidden Units
A hidden unit takes an input vector x, computes an affine transformation z = W^T x + b, and applies an element-wise nonlinear function g(z)
Some activation functions g(z) are not differentiable at all points, e.g. ReLU is not differentiable at z = 0
But they still perform well in practice

21 Hidden Units
Activation Functions: why do they perform well in practice despite nondifferentiable points?
Training algorithms do not usually reach the global minimum of the cost (it is nonconvex) but only reduce it significantly, so they rarely arrive at a critical point (a point where every element of the gradient is equal to zero)
The activations are nondifferentiable at only a small number of points; implementations simply return one of the one-sided derivatives (e.g. 1 for ReLU at z = 0) for nondifferentiable inputs
[Figure 6: Approximate optimization. Optimization algorithms may fail to find a global minimum when there are multiple local minima or plateaus; in deep learning we generally accept such solutions, even though they are not truly minimal, as long as they correspond to significantly low values of the cost function. Figure 4.3 in Goodfellow et al., Deep Learning, 2016.]

22 Hidden Units
Rectified Linear Units: g(z) = max{0, z}
+ Easy to optimize, close to linear units
+ Derivatives through a ReLU remain large whenever the unit is active
+ The derivative is 1 when active
+ The second-order derivative is 0 almost everywhere
+ The derivative is more useful without second-order effects
- A ReLU cannot learn via gradient on examples for which its activation is 0
Typically applied to an affine transformation: h = g(W^T x + c)
A small positive bias, e.g. b = 0.1, makes units initially active and allows derivatives to pass
-> generalized ReLUs were introduced to have a gradient everywhere

23 Hidden Units
Generalizations of Rectified Linear Units
Three generalizations are based on a nonzero slope α_i when z_i < 0:
h_i = g(z, α)_i = max(0, z_i) + α_i min(0, z_i)
Absolute value rectification: α_i = -1, giving g(z) = |z|
Leaky ReLU: α_i fixed to a small value like 0.01
Parametric ReLU (PReLU): α_i is a learnable parameter
A comparison of the three variants is sketched below.
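A minimal sketch of the shared formula; the helper name is mine:

```python
import numpy as np

# All three generalizations share h_i = max(0, z_i) + alpha_i * min(0, z_i);
# they differ only in how alpha is chosen.
def generalized_relu(z, alpha):
    return np.maximum(0, z) + alpha * np.minimum(0, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(generalized_relu(z, alpha=-1.0))   # absolute value rectification: |z|
print(generalized_relu(z, alpha=0.01))   # leaky ReLU with a fixed small slope
# For PReLU, alpha would itself be a parameter updated by gradient descent.
```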

24 Hidden Units
Maxout Units
Maxout units divide z into groups of k values; each output is the maximum element of its group:
g(z)_i = max_{j ∈ G^(i)} z_j, where G^(i) is the set of indices of group i, {(i-1)k + 1, ..., ik}
Maxout can learn a piecewise linear convex function with up to k pieces, generalizing rectified units further
It requires more regularization than rectified units, although maxout with few elements per group and a large number of examples can work without regularization
The next layer can get k times smaller (see the sketch below)
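A minimal sketch of a maxout unit over groups of k consecutive values (the grouping convention follows the index sets G^(i) above):

```python
import numpy as np

# Reshape the last axis into (groups, k) and take the max within each
# group, so the output is k times smaller than z.
def maxout(z, k):
    assert z.shape[-1] % k == 0, "last axis must divide evenly into groups"
    return z.reshape(*z.shape[:-1], -1, k).max(axis=-1)

z = np.array([0.3, -1.2, 2.0, 0.5, 0.1, -0.4])
print(maxout(z, k=3))  # [2.0 0.5]: max of (0.3, -1.2, 2.0) and (0.5, 0.1, -0.4)
```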

25 Hidden Units
Sigmoidal Activation Functions
Logistic sigmoid: σ(z) = 1 / (1 + exp(-z))
Hyperbolic tangent: tanh(z) = (1 - exp(-2z)) / (1 + exp(-2z))
[Figure 7: The sigmoid σ and tanh activation functions. Johan Suykens. Lecture notes in Artificial Neural Networks.]

26 Hidden Units
Sigmoidal Units
tanh(z) = 2σ(2z) - 1
The sigmoid is used to predict the probability that a binary variable is 1
Sigmoidal units saturate for inputs of large magnitude, which makes gradient-based learning difficult; as hidden units, tanh is preferable to σ
As output units they remain compatible with gradient-based learning when paired with a cost function that can undo the saturation
Sigmoidal units are more common in recurrent networks, many probabilistic models, and autoencoders
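A quick numerical check (my own sketch) of the identity tanh(z) = 2σ(2z) - 1:

```python
import numpy as np

# Verify the relation between tanh and the logistic sigmoid on a grid.
def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-3, 3, 7)
print(np.allclose(np.tanh(z), 2 * sigma(2 * z) - 1))  # True
```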

27 Architecture Design
Architecture: the overall network structure
How many units, and how to connect them to each other
A layer is an organized group of units
Layers are mostly arranged in a chain structure:
h^(1) = g^(1)(W^(1)T x + b^(1))
h^(2) = g^(2)(W^(2)T h^(1) + b^(2))
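A minimal sketch of this two-layer chain with made-up shapes (three inputs, three hidden units, one output; tanh and a linear output are my choices for g^(1) and g^(2)):

```python
import numpy as np

# Two chained layers: each layer applies its activation to an affine
# transformation of the previous layer's output.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(3, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)

h1 = np.tanh(W1.T @ x + b1)   # h(1) = g(1)(W(1)^T x + b(1))
h2 = W2.T @ h1 + b2           # h(2) = g(2)(W(2)^T h(1) + b(2)), linear g(2)
print(h2.shape)               # (1,)
```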

28 Architecture Design
Main design choices: the depth of the network and the width of each layer
Deeper networks often use fewer units per layer and fewer parameters, but tend to be harder to optimize
The ideal network architecture is found via experimentation guided by monitoring the validation error

29 Architecture Design
Universal approximation theorem
A feedforward network with a linear output layer and at least one hidden layer with a squashing activation function and enough hidden units can approximate any continuous function on a closed and bounded subset of R^n with any desired (nonzero) error
So an MLP is able to represent the function of interest, though learning it is not guaranteed:
the training optimization algorithm might fail to find the corresponding parameter values
it might choose the wrong function due to overfitting

30 Architecture Design
Universal approximation theorem and depth
A feedforward network with one layer can represent any function, but the layer may be infeasibly large
Deeper models reduce the number of units required and can reduce the generalization error
The number of linear regions carved out by a deep rectifier network with input dimension d, depth l, and n units per hidden layer is
O((n/d)^{d(l-1)} n^d)
The number of linear regions for a maxout network with k filters per unit is O(k^{(l-1)+d})
(Guido Montúfar et al. "On the Number of Linear Regions of Deep Neural Networks". NIPS 2014.)
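For intuition, a small sketch evaluating the two bounds for some arbitrary settings of n, d, l, and k (the numbers are illustrative only):

```python
# The counts grow exponentially with depth l, which is the point of
# the Montufar et al. result.
def rectifier_regions_bound(n, d, l):
    return (n / d) ** (d * (l - 1)) * n ** d

def maxout_regions_bound(k, d, l):
    return k ** ((l - 1) + d)

for l in (2, 3, 4):
    print(l, rectifier_regions_bound(n=8, d=2, l=l),
          maxout_regions_bound(k=4, d=2, l=l))
```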

31 Architecture Design
Better generalization with greater depth
Empirically, deeper networks generalize better.
[Figure 8: Effect of depth: test accuracy (percent) increases with the number of layers. Figure 6.6 in Goodfellow et al., Deep Learning, 2016.]

32 Architecture Design
Large, shallow models overfit more
[Figure 9: Effect of the number of parameters: test accuracy (percent) versus number of parameters for networks with 3 convolutional layers, 3 fully connected layers, and 11 convolutional layers; depth helps more than parameter count alone. Figure 6.7 in Goodfellow et al., Deep Learning, 2016.]

33 Back-Propagation
Forward propagation is the flow from x to ŷ; during training, forward propagation continues onward until the cost J(θ) is computed
Back-propagation (backprop) flows backwards from the cost J(θ) through the network to compute the gradient
Numerically evaluating an analytical gradient expression is computationally expensive; backprop makes it simple and inexpensive
Backprop is a method for computing the gradient
Notation: ∇_x f(x, y) is the gradient of an arbitrary function f, where x is the set of variables whose derivatives are desired and y is an additional input whose derivatives are not desired; ∇_θ J(θ) is the gradient of the cost function

34 Back-Propagation
Simple back-prop example
Forward prop: compute the activations h_1, h_2, then the loss
Back-prop: compute the derivatives, flowing from the loss back through the network
[Figure 10: Back-propagation. Goodfellow et al., Deep Learning, 2016.]

35 Back-Propagation
Computational graphs
Each node is a variable; a variable may be a scalar, vector, matrix, or tensor
An operation is a simple function of one or more variables
More complex functions are composed of many operations
A directed edge from x to y indicates that x is used to calculate y

36 Back-Propagation
Examples of computational graphs
[Figure 11: Examples of computational graphs. (a) The graph using the × operation to compute z = x y. (b) The graph for the logistic regression prediction ŷ = σ(x^T w + b); intermediate expressions that have no names in the algebraic expression are named u^(i) in the graph. (c) The graph for H = max{0, XW + b}, which computes a design matrix of ReLU activations H given a design matrix containing a minibatch of inputs X. (d) A graph applying more than one operation to the weights w of a linear regression model, which are used both for the prediction ŷ and for the weight decay penalty λ Σ_i w_i^2. Figure 6.8 in Goodfellow et al., Deep Learning, 2016.]

37 Back-Propagation
Back-propagation is the chain rule of calculus, applied in a highly efficient way
If x is a real number, f, g : R -> R, y = g(x) and z = f(g(x)) = f(y), the chain rule gives
dz/dx = (dz/dy)(dy/dx)
For x ∈ R^m, y ∈ R^n, g : R^m -> R^n and f : R^n -> R:
∂z/∂x_i = Σ_j (∂z/∂y_j)(∂y_j/∂x_i), i.e. ∇_x z = (∂y/∂x)^T ∇_y z,
where ∂y/∂x is the n × m Jacobian matrix of g
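A minimal numerical sketch of the vector chain rule, using a made-up g : R^2 -> R^2 and f : R^2 -> R and checking ∇_x z = (∂y/∂x)^T ∇_y z against finite differences:

```python
import numpy as np

def g(x):            # y = g(x)
    return np.array([x[0] * x[1], x[0] + x[1]])

def f(y):            # z = f(y)
    return y[0] ** 2 + 3 * y[1]

x = np.array([2.0, -1.0])
y = g(x)
grad_y = np.array([2 * y[0], 3.0])     # dz/dy
J = np.array([[x[1], x[0]],            # dy/dx, the 2x2 Jacobian of g
              [1.0,  1.0]])
grad_x = J.T @ grad_y                  # chain rule: (dy/dx)^T grad_y z

# finite-difference check of grad_x
eps = 1e-6
num = np.array([(f(g(x + eps * e)) - f(g(x - eps * e))) / (2 * eps)
                for e in np.eye(2)])
print(np.allclose(grad_x, num))        # True
```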

38 Back-Propagation
Repeated subexpressions
Input w ∈ R, f : R -> R, x = f(w), y = f(x), z = f(y):
dz/dw = f'(y) f'(x) f'(w) = f'(f(f(w))) f'(f(w)) f'(w)
Back-prop stores the forward values x = f(w) and y = f(x), avoiding computing f(w) twice.
[Figure 12: Repeated subexpressions. Figure 6.9 in Goodfellow et al., Deep Learning, 2016.]
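A one-variable sketch of this idea, taking f = exp so that f' = f (my choice for illustration): the forward values x and y are stored once and reused in the derivative product.

```python
import math

f = math.exp        # example: f = exp, so f' = exp as well
f_prime = math.exp

w = 0.5
x = f(w)            # stored during the forward pass
y = f(x)            # stored during the forward pass
# dz/dw = f'(y) f'(x) f'(w), with each inner f evaluated only once,
# instead of re-deriving f(f(w)) and f(w) inside the expression.
dz_dw = f_prime(y) * f_prime(x) * f_prime(w)
print(dz_dw)
```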

39 Back-Propagation
Symbol-to-symbol derivatives
Algebraic and graph-based representations are symbolic representations; when we actually use or train a network, we assign specific numeric values to the symbols
Symbol-to-number differentiation: Torch, Caffe
Symbol-to-symbol differentiation: Theano, TensorFlow
[Figure 13: The symbol-to-symbol approach to computing derivatives. Back-propagation never needs to access actual numeric values; instead it adds nodes to the computational graph describing how to compute the derivatives, here constructing the graph for dz/dw from the graph for z = f(f(f(w))). A generic graph evaluation engine can later compute the derivatives for any specific numeric values. Figure 6.10 in Goodfellow et al., Deep Learning, 2016.]

40 Back-Propagation
Forward pass, fully connected MLP
Algorithm 6.3: forward propagation through a typical deep neural network and the computation of the cost function. The loss L(ŷ, y) depends on the output ŷ and on the target y; to obtain the total cost J, the loss may be added to a regularizer Ω(θ), where θ contains all the parameters (weights and biases). For simplicity, a single input example is used; practical applications should use a minibatch.
Require: network depth l
Require: W^(i), i ∈ {1, ..., l}, the weight matrices of the model
Require: b^(i), i ∈ {1, ..., l}, the bias parameters of the model
Require: x, the input to process
Require: y, the target output
h^(0) = x
for k = 1, ..., l do
  a^(k) = b^(k) + W^(k) h^(k-1)
  h^(k) = f(a^(k))
end for
ŷ = h^(l)
J = L(ŷ, y) + λΩ(θ)
[Figure 14: Forward pass algorithm for an MLP. Goodfellow et al., Deep Learning, 2016.]
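A minimal NumPy sketch of Algorithm 6.3 for a single example; ReLU for f, squared error for L, and an L2 weight penalty for Ω are my own stand-in choices, not prescribed by the slide:

```python
import numpy as np

def forward(Ws, bs, x, y, lam=0.0):
    h = x                                    # h(0) = x
    hs, activations = [h], []
    for W, b in zip(Ws, bs):                 # k = 1, ..., l
        a = b + W @ h                        # a(k) = b(k) + W(k) h(k-1)
        h = np.maximum(0, a)                 # h(k) = f(a(k)), ReLU here
        activations.append(a)
        hs.append(h)
    loss = 0.5 * np.sum((hs[-1] - y) ** 2)   # L(y_hat, y), y_hat = h(l)
    omega = sum(np.sum(W ** 2) for W in Ws)  # Omega(theta): sum of squared weights
    return loss + lam * omega, hs, activations

# usage with arbitrary shapes
rng = np.random.default_rng(0)
Ws = [rng.normal(size=(4, 2)), rng.normal(size=(1, 4))]
bs = [np.zeros(4), np.zeros(1)]
J, hs, activations = forward(Ws, bs, x=np.array([1.0, -1.0]), y=np.array([0.5]))
```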

41 Back-Propagation
Backward pass, fully connected MLP
Algorithm 6.4 uses, in addition to the input x, a target y. It yields the gradients on the activations a^(k) for each layer k, starting from the output layer and going backwards to the first hidden layer. These gradients can be interpreted as an indication of how each layer's output should change to reduce error; from them, one obtains the gradients on the parameters of each layer. The gradients on weights and biases can be used immediately as part of a stochastic gradient update (performing the update right after the gradients have been computed) or with other gradient-based optimization methods.
After the forward computation, compute the gradient on the output layer:
g <- ∇_ŷ J = ∇_ŷ L(ŷ, y)
for k = l, l-1, ..., 1 do
  Convert the gradient on the layer's output into a gradient on the pre-nonlinearity activation (element-wise multiplication if f is element-wise):
  g <- ∇_{a^(k)} J = g ⊙ f'(a^(k))
  Compute gradients on weights and biases (including the regularization term, where needed):
  ∇_{b^(k)} J = g + λ∇_{b^(k)} Ω(θ)
  ∇_{W^(k)} J = g h^{(k-1)T} + λ∇_{W^(k)} Ω(θ)
  Propagate the gradients w.r.t. the next lower-level hidden layer's activations:
  g <- ∇_{h^(k-1)} J = W^{(k)T} g
end for
[Figure 15: Backward pass algorithm for an MLP. Goodfellow et al., Deep Learning, 2016.]
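And a matching sketch of Algorithm 6.4, consistent with the forward() sketch above (same stand-in choices for f, L, and Ω):

```python
import numpy as np

def backward(Ws, hs, activations, y, lam=0.0):
    grads_W = [None] * len(Ws)
    grads_b = [None] * len(Ws)
    g = hs[-1] - y                       # grad of 0.5*||y_hat - y||^2 w.r.t. y_hat
    for k in reversed(range(len(Ws))):   # k = l, l-1, ..., 1 (0-based here)
        g = g * (activations[k] > 0)     # g <- g (*) f'(a(k)); ReLU derivative
        grads_b[k] = g                   # grad_{b(k)} J (Omega excludes biases here)
        grads_W[k] = np.outer(g, hs[k]) + 2 * lam * Ws[k]  # g h(k-1)^T + lam grad Omega
        g = Ws[k].T @ g                  # propagate to h(k-1)
    return grads_W, grads_b
```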

42 Back-Propagation
Next week: Recurrent Neural Networks. Questions?
References
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
Guido Montúfar et al. "On the Number of Linear Regions of Deep Neural Networks". In: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 2. NIPS'14. Montreal, Canada: MIT Press, 2014.
Johan Suykens. Lecture notes in Artificial Neural Networks.
