A MODIFIED HIGHER-ORDER FEED FORWARD NEURAL NETWORK WITH SMOOTHING REGULARIZATION


Kh.Sh. Mohamed, W. Wu, Y. Liu

Khidir Shaib Mohamed (corresponding author) and Wei Wu: Dalian University of Technology, School of Mathematical Sciences, Dalian 116024, China; khshm7@yahoo.com, wuweiw@dlut.edu.cn. Yan Liu: Dalian Polytechnic University, School of Information Science and Engineering, Dalian 116034, China; liuyan@dlpu.edu.cn.

Abstract: This paper proposes an offline gradient method with smoothing $L_{1/2}$ regularization for learning and pruning of pi-sigma neural networks (PSNNs). The original $L_{1/2}$ regularization term is not smooth at the origin, since it involves the absolute value function. This causes oscillation in the computation and difficulty in the convergence analysis. In this paper, we propose to use a smooth function to replace and approximate the absolute value function, ending up with a smoothing $L_{1/2}$ regularization method for PSNN. Numerical simulations show that the smoothing $L_{1/2}$ regularization method eliminates the oscillation in computation and achieves better learning accuracy. We are also able to prove a convergence theorem for the proposed learning method.

Key words: convergence, pi-sigma neural network, offline (batch) gradient method, smoothing $L_{1/2}$ regularization

Received: July 3, 2016. Revised and accepted: November 13, 2017.

DOI: /NNW

©CTU FTS

1. Introduction

Pi-sigma neural networks (PSNNs) [4, 10-12], as a kind of higher-order neural networks, can provide more powerful mapping capability than conventional feedforward neural networks, and they have been successfully applied to many problems such as the equalization of nonlinear satellite channels, the real-time classification of seafloor sediments, and image coding. The PSNN was introduced by Ghosh and Shin [11]. It computes the product of sums of the input layer, instead of the sum of products, and the weights connecting the product layer and the summation layer are fixed to 1.

There are two practical ways to carry out the gradient weight updating in the network learning process: the online gradient approach, in which the weights are updated immediately after each training sample is supplied to the network, and the offline (batch) gradient approach, in which the weights are updated after all the training samples have been presented to the network (cf. [15, 16]). We shall use the offline gradient approach in this paper.

Various kinds of regularization terms have been used in many applications, such as the method in [2] in terms of weight decay [5, 6], the methods based on Akaike's information criterion (AIC) [3, 13], and the method employing a model prior in the Bayesian framework [8]. These regularization terms are inserted into the standard cost function for network learning to improve the learning ability and to obtain a sparse network. Many regularization terms take the form of the $L_p$ norm of the weights, leading to the following new error function

$E(W) = \bar{E}(W) + \lambda\|W\|_p^p,$   (1)

where $\bar{E}(W)$ is a usual error function depending on the weights $W$ of the network, $\|W\|_p = \big(\sum_{k}|w_k|^p\big)^{1/p}$ is the $p$-norm of the weights of the network, and $\lambda$ is the regularization parameter. In particular, the $L_0$ regularizer is the earliest regularization method applied for variable selection, and its solution is the most sparse; but it leads to a combinatorial optimization problem and is shown to be NP-hard [14]. The popular $L_1$ regularizer is discussed in [9]. The $L_{1/2}$ regularizer is suggested in [17] and, among many favourable properties, it is shown to result in better sparsity than the $L_1$ regularizer. A smoothing $L_{1/2}$ regularizer is proposed in [1, 7], and in numerical experiments it behaves as well as, or even better than, the $L_1$ regularizer and the standard $L_{1/2}$ regularizer for neural networks. Therefore, in this paper, we use the smoothing $L_{1/2}$ regularizer for the network regularization.

The organization of this paper is as follows. In Section 2, we describe the PSNN and the offline gradient method with the smoothing $L_{1/2}$ regularizer. In Section 3, a convergence theorem is given. Section 4 contains some supporting simulation results. The proof of the convergence theorem is given in Section 5. Finally, a brief conclusion is provided in Section 6.

2. Offline gradient method with smoothing $L_{1/2}$ regularization

In this section, we describe the network structure of the PSNN and the offline gradient method with smoothing $L_{1/2}$ regularization.

2.1 Error function with $L_{1/2}$ regularization

Let $p$, $n$ and $1$ respectively be the dimensions of the input layer, the summation layer and the product layer of the PSNN. Denote by $\omega_j = (\omega_{j1}, \ldots, \omega_{jp})^T \in \mathbb{R}^p$ $(1 \le j \le n)$ the weight vector connecting the input layer and the $j$-th summing unit, and write $\omega = (\omega_1^T, \ldots, \omega_n^T) \in \mathbb{R}^{np}$. Note that the weights from the summing units to the product unit are fixed to 1. Let $g : \mathbb{R} \to \mathbb{R}$ be a given activation function. The topological structure of the PSNN is shown in Fig. 1.

Fig. 1 Pi-sigma neural network structure.

For an input data vector $x \in \mathbb{R}^p$, the network output is

$y = g\Big(\prod_{j=1}^{n}\Big(\sum_{i=1}^{p}\omega_{ji}x_i\Big)\Big) = g\Big(\prod_{j=1}^{n}(\omega_j \cdot x)\Big),$   (2)

where $\omega_j \cdot x$ is the usual inner product of $\omega_j$ and $x$. Suppose that we are supplied with a set of training samples $\{x^l, O^l\}_{l=1}^{L} \subset \mathbb{R}^p \times \mathbb{R}$, where $O^l$ is the desired ideal output for the input $x^l$. By adding an $L_{1/2}$ regularization term to the usual error function, the final error function takes the form

$E(\omega) = \frac{1}{2}\sum_{l=1}^{L}\Big(O^l - g\Big(\prod_{j=1}^{n}(\omega_j \cdot x^l)\Big)\Big)^2 + \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}|\omega_{ji}|^{1/2}.$   (3)

The partial derivative of the above error function with respect to $\omega_{ji}$ is

$E_{\omega_{ji}}(\omega) = -\sum_{l=1}^{L}\delta_l\Big(\prod_{k=1}^{n}(\omega_k \cdot x^l)\Big)\prod_{k \neq j}(\omega_k \cdot x^l)\,x_i^l + \frac{\lambda\,\mathrm{sgn}(\omega_{ji})}{2|\omega_{ji}|^{1/2}},$   (4)

$i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$; where $\lambda > 0$ is the regularization parameter, $E_{\omega_{ji}}(\omega) = \partial E(\omega)/\partial\omega_{ji}$, and $\delta_l(t) = (O^l - g(t))\,g'(t)$. Starting from an arbitrary initial value $\omega^0$, the offline gradient method with $L_{1/2}$ regularization term updates the weights $\omega^m$ iteratively by

$\omega_{ji}^{m+1} = \omega_{ji}^m + \Delta\omega_{ji}^m,$   (5)

with

$\Delta\omega_{ji}^m = -\eta E_{\omega_{ji}}(\omega^m) = \eta\sum_{l=1}^{L}\delta_l\Big(\prod_{k=1}^{n}(\omega_k^m \cdot x^l)\Big)\prod_{k \neq j}(\omega_k^m \cdot x^l)\,x_i^l - \frac{\eta\lambda\,\mathrm{sgn}(\omega_{ji}^m)}{2|\omega_{ji}^m|^{1/2}},$   (6)

$i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$; $m = 0, 1, \ldots$; where $\eta > 0$ is the learning rate.
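To make the update rule concrete, the following NumPy sketch implements the forward pass of Eq. (2) and one offline update of Eqs. (5)-(6) with the plain, non-smooth $L_{1/2}$ term. It is an illustration only, not the authors' code; the function names, the sigmoid activation and the small epsilon guard against division by zero are our own choices.

```python
# A minimal sketch of the PSNN forward pass, Eq. (2), and one batch update of
# Eqs. (5)-(6) with the non-smooth L_{1/2} penalty. Names and eps are illustrative.
import numpy as np

def psnn_output(W, x, g=lambda t: 1.0 / (1.0 + np.exp(-t))):
    """PSNN output y = g(prod_j (w_j . x)); W has shape (n, p), x has shape (p,)."""
    return g(np.prod(W @ x))

def l12_batch_step(W, X, O, eta, lam, eps=1e-12):
    """One offline (batch) update of Eqs. (5)-(6) with the plain L_{1/2} term."""
    g = lambda t: 1.0 / (1.0 + np.exp(-t))
    dg = lambda t: g(t) * (1.0 - g(t))
    grad = np.zeros_like(W)
    for x, o in zip(X, O):                      # accumulate over all L samples
        s = W @ x                               # summing-unit outputs, shape (n,)
        prod = np.prod(s)
        delta = (o - g(prod)) * dg(prod)        # delta_l(t) = (O^l - g(t)) g'(t)
        for j in range(W.shape[0]):
            grad[j] -= delta * np.prod(np.delete(s, j)) * x
    # derivative of lam * sum |w|^{1/2} is lam * sgn(w)/(2 sqrt|w|); eps guards w = 0
    grad += lam * np.sign(W) / (2.0 * np.sqrt(np.abs(W)) + eps)
    return W - eta * grad

# Tiny usage example with random data (L = 4 samples, p = 3 inputs, n = 2 summing units)
rng = np.random.default_rng(0)
W = rng.uniform(-0.5, 0.5, size=(2, 3))
X, O = rng.normal(size=(4, 3)), rng.normal(size=4)
W = l12_batch_step(W, X, O, eta=0.1, lam=0.01)
```

Note that the factor $\mathrm{sgn}(\omega_{ji})/(2|\omega_{ji}|^{1/2})$ becomes arbitrarily large as a weight approaches zero; this is exactly the source of the oscillation that the smoothing introduced in the next subsection removes.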

2.2 Error function with smoothing $L_{1/2}$ regularization

The usual $L_{1/2}$ regularization term is non-differentiable at the origin, since it involves the absolute value function. We replace the absolute value function by a smooth function $f(t)$ defined below:

$f(t) = \begin{cases} -t, & t \le -a, \\ -\dfrac{1}{8a^3}t^4 + \dfrac{3}{4a}t^2 + \dfrac{3}{8}a, & -a < t < a, \\ t, & t \ge a, \end{cases}$   (7)

where $a$ is a small positive constant. Then we have

$f'(t) = \begin{cases} -1, & t \le -a, \\ -\dfrac{1}{2a^3}t^3 + \dfrac{3}{2a}t, & -a < t < a, \\ 1, & t \ge a, \end{cases} \qquad f''(t) = \begin{cases} 0, & |t| \ge a, \\ -\dfrac{3}{2a^3}t^2 + \dfrac{3}{2a}, & |t| < a, \end{cases}$

and $f(t) \in \big[\tfrac{3}{8}a, \infty\big)$, $f'(t) \in [-1, 1]$, $f''(t) \in \big[0, \tfrac{3}{2a}\big]$. Now, the new error function with the smoothing $L_{1/2}$ regularization term is

$E(\omega) = \frac{1}{2}\sum_{l=1}^{L}\Big(O^l - g\Big(\prod_{j=1}^{n}(\omega_j \cdot x^l)\Big)\Big)^2 + \lambda\sum_{j=1}^{n}\sum_{i=1}^{p} f^{1/2}(\omega_{ji}).$   (8)

The partial derivative of the error function $E(\omega)$ in Eq. (8) with respect to $\omega_{ji}$ is

$E_{\omega_{ji}}(\omega) = -\sum_{l=1}^{L}\delta_l\Big(\prod_{k=1}^{n}(\omega_k \cdot x^l)\Big)\prod_{k \neq j}(\omega_k \cdot x^l)\,x_i^l + \frac{\lambda f'(\omega_{ji})}{2 f^{1/2}(\omega_{ji})},$   (9)

$i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$. Starting from an arbitrary initial value $\omega^0$, the offline gradient method with smoothing $L_{1/2}$ regularization term updates the weights $\omega^m$ iteratively by

$\omega_{ji}^{m+1} = \omega_{ji}^m + \Delta\omega_{ji}^m,$   (10)

with

$\Delta\omega_{ji}^m = -\eta E_{\omega_{ji}}(\omega^m) = \eta\sum_{l=1}^{L}\delta_l\Big(\prod_{k=1}^{n}(\omega_k^m \cdot x^l)\Big)\prod_{k \neq j}(\omega_k^m \cdot x^l)\,x_i^l - \frac{\eta\lambda f'(\omega_{ji}^m)}{2 f^{1/2}(\omega_{ji}^m)},$   (11)

$i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$; $m = 0, 1, \ldots$; where again $\eta > 0$ is the learning rate.
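The sketch below (ours, written under the reconstruction of Eq. (7) given above) implements $f$, $f'$ and the resulting smoothed penalty gradient that appears in Eqs. (9) and (11); the value $a = 0.05$ is only an illustrative choice.

```python
# A sketch (not the authors' code) of the piecewise smoothing function of Eq. (7),
# its derivative, and the smoothed penalty gradient lam * f'(w) / (2 f(w)^{1/2}).
import numpy as np

def f_smooth(t, a=0.05):
    """f(t) = |t| for |t| >= a, and a quartic polynomial on (-a, a); f >= 3a/8 > 0."""
    t = np.asarray(t, dtype=float)
    inner = -t**4 / (8.0 * a**3) + 3.0 * t**2 / (4.0 * a) + 3.0 * a / 8.0
    return np.where(np.abs(t) >= a, np.abs(t), inner)

def f_smooth_prime(t, a=0.05):
    """f'(t) = sign(t) outside [-a, a], a cubic inside; takes values in [-1, 1]."""
    t = np.asarray(t, dtype=float)
    inner = -t**3 / (2.0 * a**3) + 3.0 * t / (2.0 * a)
    return np.where(np.abs(t) >= a, np.sign(t), inner)

def smoothed_penalty_grad(W, lam, a=0.05):
    """Gradient of lam * sum f(w)^{1/2}, i.e. the last term of Eq. (9)."""
    return lam * f_smooth_prime(W, a) / (2.0 * np.sqrt(f_smooth(W, a)))

# Example: the gradient is finite even at w = 0, unlike sgn(w)/(2 sqrt|w|).
print(smoothed_penalty_grad(np.array([0.0, 1e-3, 0.2]), lam=0.01))
```

Because $f(t) \ge 3a/8 > 0$ and $|f'(t)| \le 1$, the smoothed penalty gradient is bounded in magnitude by $\lambda/(2\sqrt{3a/8})$, so the blow-up of the original $L_{1/2}$ term near zero weights cannot occur.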

3. Main results

The following conditions will be used later on to prove the convergence theorem.

Proposition 1. $|\delta_l(t)| \le C_0$, $|\delta_l'(t)| \le C_0$, and $|g(t)|, |g'(t)|, |g''(t)| \le C_1$ for all $t \in \mathbb{R}$, $1 \le l \le L$.

Proposition 2. $\max\{\|x^l\|, |\omega_j^m \cdot x^l|\} \le C_2$, $1 \le j \le n$, $1 \le l \le L$, $m = 0, 1, 2, \ldots$

Proposition 3. The parameters $\eta$ and $\lambda$ are chosen to satisfy $0 < \eta < \dfrac{1}{M\lambda + C}$, where $M = \sup_{t\in\mathbb{R}}|F''(t)|$ with $F(t) = (f(t))^{1/2}$ (a finite constant depending only on $a$), and $C = C_3 + C_5/2$ with $C_3$ and $C_5$ the constants introduced in Section 5.

Proposition 4. There exists a compact set $\Phi$ such that $\omega^m \in \Phi$ for all $m$, and the set $\Phi_0 = \{\omega \in \Phi : E_\omega(\omega) = 0\}$ contains only finitely many points.

Theorem 5. Let the error function $E(\omega)$ be defined by Eq. (8), and let the weight sequence $\{\omega^m\}$ be generated by the iteration algorithm Eq. (10) for an arbitrary initial value $\omega^0$. If Propositions 1-3 are valid, we have the following error estimates:

(i) $E(\omega^{m+1}) \le E(\omega^m)$, $m = 0, 1, 2, \ldots$;

(ii) there exists $E^* \ge 0$ such that $\lim_{m\to\infty} E(\omega^m) = E^*$;

(iii) $\lim_{m\to\infty} E_{\omega_{ji}}(\omega^m) = 0$, $i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, n$.

Furthermore, if Proposition 4 is also valid, we have the following strong convergence:

(iv) there exists a point $\omega^* \in \Phi$ such that $\lim_{m\to\infty} \omega^m = \omega^*$.

4. Numerical simulations

To illustrate the efficiency of the proposed learning method, we consider numerical simulations of two classification problems (the XOR and parity problems) and two function approximation problems (the Gabor and Mayas functions). We compare our batch gradient method with smoothing $L_{1/2}$ regularization (BGSL1/2) with the batch gradient method with $L_{1/2}$ regularization (BGL1/2) and the batch gradient method with $L_2$ regularization (BGL2).

4.1 Classification problems

The activation function is chosen as $g(x) = 1/(1 + e^{-x})$. Each of the three algorithms is run over repeated trials for the XOR and parity problems, respectively, and typical performances are shown in Figs. 2-5. For the XOR problem the network has 2 input nodes, 2 summation nodes, and 1 output node, with the learning rate $\eta = 0.6$ and the regularization parameter $\lambda = 0.1$. For the parity problem, the network has 4 input nodes, 5 summation nodes, and 1 output node, with the learning rate $\eta = 0.4$ and the regularization parameter $\lambda = 0.1$. In both cases, the initial weights are selected randomly within the interval $[-0.5, 0.5]$, and training is stopped after a fixed maximum number of iterations.

From Figs. 2-5, we see that the proposed learning method BGSL1/2 enjoys the best learning accuracy, while BGL2 is the worst. In particular, as predicted by Theorem 5, the error of BGSL1/2 decreases monotonically and the gradient of the error function goes to zero in the learning process.
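An end-to-end illustration of such a run on the XOR problem is sketched below. It is not the paper's experiment code: the bias input, the penalty weight, the smoothing parameter $a$ and the iteration count are assumptions made only to obtain a runnable example; the network size ($2$ inputs, $2$ summing units, $1$ output) follows the text.

```python
# Illustrative batch training of a PSNN on XOR with the smoothing L_{1/2} penalty.
# Hyperparameters, the bias input and the iteration count are assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0., 1.], [0., 1., 1.], [1., 0., 1.], [1., 1., 1.]])  # inputs + bias
O = np.array([0., 1., 1., 0.])                                          # XOR targets
n, p = 2, 3
W = rng.uniform(-0.5, 0.5, size=(n, p))          # initial weights in [-0.5, 0.5]
eta, lam, a = 0.6, 0.01, 0.05                    # illustrative hyperparameters

g = lambda t: 1.0 / (1.0 + np.exp(-t))
dg = lambda t: g(t) * (1.0 - g(t))
f = lambda t: np.where(np.abs(t) >= a, np.abs(t),
                       -t**4 / (8 * a**3) + 3 * t**2 / (4 * a) + 3 * a / 8)
fp = lambda t: np.where(np.abs(t) >= a, np.sign(t),
                        -t**3 / (2 * a**3) + 3 * t / (2 * a))

for m in range(3000):                            # offline (batch) iterations
    grad = lam * fp(W) / (2.0 * np.sqrt(f(W)))   # smoothed penalty part of Eq. (11)
    for x, o in zip(X, O):
        s = W @ x                                # summing-unit outputs
        prod = np.prod(s)
        delta = (o - g(prod)) * dg(prod)         # delta_l of Eq. (4)
        for j in range(n):
            grad[j] -= delta * np.prod(np.delete(s, j)) * x
    W -= eta * grad                              # Eq. (10): w <- w - eta * E_w(w)

E = 0.5 * sum((o - g(np.prod(W @ x)))**2 for x, o in zip(X, O)) \
    + lam * np.sqrt(f(W)).sum()                  # smoothed error of Eq. (8)
print("final error E(w):", float(E))
```

Logging $E(\omega^m)$ inside the loop should show a monotone decrease whenever $\eta$ satisfies Proposition 3; with too large a learning rate the monotonicity can fail, which is the behaviour the theorem rules out.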

Fig. 2 Learning errors of different algorithms for the XOR problem.

Fig. 3 Norms of gradient of different algorithms for the XOR problem.

Fig. 4 Learning errors of different algorithms for the parity problem.

Fig. 5 Norms of gradient of different algorithms for the parity problem.

Tabs. I and II show the average training errors and the average norms of the gradient over the trials for each learning algorithm. The Average Number of neurons Eliminated (ANE, in brief) by the pruning over the trials is also shown in Tabs. I and II. The comparison convincingly shows that BGSL1/2 is more efficient and has a better sparsity-promoting property than BGL1/2 and BGL2.

Algorithm    Average training error    Norm of gradient    ANE
BGSL1/2      1.336e-5                  1.3
BGL1/2       e-4                       9.
BGL2         1.776e-3                  7.8

Tab. I Numerical results for solving the XOR problem.

Algorithm    Average training error    Norm of gradient    ANE
BGSL1/2      5.98e                     5.59
BGL1/2       1.169e-1                  3.8
BGL2         .4495e-1                  5.6

Tab. II Numerical results for solving the parity problem.

4.2 Approximation of the Gabor function

Now, we try to approximate the following Gabor function (cf. Fig. 6):

$F(x, y) = \frac{1}{2\pi(0.5)^2}\exp\Big(-\frac{x^2 + y^2}{2(0.5)^2}\Big)\cos\big(2\pi(x + y)\big).$
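A small sketch of this target function and of the training/test grids described in the next paragraph is given below; the $16 \times 16$ shape of the test grid is an assumption (the text only gives 256 test points on the same square), and the helper names are ours.

```python
# Sketch of the Gabor target of Section 4.2 and the sampling grids on [-0.5, 0.5]^2.
import numpy as np

def gabor(x, y, s=0.5):
    """F(x, y) = 1/(2*pi*s^2) * exp(-(x^2 + y^2)/(2*s^2)) * cos(2*pi*(x + y))."""
    return (1.0 / (2.0 * np.pi * s**2)
            * np.exp(-(x**2 + y**2) / (2.0 * s**2))
            * np.cos(2.0 * np.pi * (x + y)))

def grid_samples(k):
    """k x k evenly spaced samples of the Gabor function on [-0.5, 0.5]^2."""
    ticks = np.linspace(-0.5, 0.5, k)
    xx, yy = np.meshgrid(ticks, ticks)
    X = np.column_stack([xx.ravel(), yy.ravel()])
    return X, gabor(X[:, 0], X[:, 1])

X_train, F_train = grid_samples(6)     # 36 training samples
X_test,  F_test  = grid_samples(16)    # 256 test samples (assumed 16 x 16 grid)
print(X_train.shape, X_test.shape)     # (36, 2) (256, 2)
```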

Fig. 6 Gabor function.

36 input training samples are selected from an evenly spaced 6 x 6 grid on $-0.5 \le x \le 0.5$ and $-0.5 \le y \le 0.5$. Similarly, the input test samples are 256 points selected from an evenly spaced grid on $-0.5 \le x \le 0.5$ and $-0.5 \le y \le 0.5$. The neural network has 3 input nodes, 6 summation nodes, and 1 output node. The initial weights are randomly chosen from $[-0.5, 0.5]$. The learning rate is $\eta = 0.6$, the regularization parameter is $\lambda = 0.1$, and training is stopped after a fixed maximum number of iterations. Typical network approximations obtained by the three algorithms are plotted in Figs. 7-9. We observe that our method BGSL1/2 gives the best approximation of the Gabor function. Tab. III presents the average training errors, the average test errors, and the ANE over the trials for the three learning algorithms. Again, we see that our method BGSL1/2 has the best accuracy and the best sparsity-promoting property.

Tab. III Numerical results for approximating the Gabor function (average training error, average test error and ANE for BGSL1/2, BGL1/2 and BGL2).

4.3 Approximation of the Mayas function

This example is to approximate the following Mayas function:

$F(x, y) = 0.26(x^2 + y^2) - 0.48xy.$

64 input training samples are selected from an evenly spaced 8 x 8 grid on $-0.5 \le x \le 0.5$ and $-0.5 \le y \le 0.5$. Similarly, the input test samples are 900 points selected from the 30 x 30 grid on $-0.5 \le x \le 0.5$ and $-0.5 \le y \le 0.5$. The neural network has 3 input nodes, 5 summation nodes, and 1 output node. The initial weights are randomly chosen from $[-0.5, 0.5]$. The learning rate is $\eta = 0.9$, the regularization parameter is $\lambda = 0.1$, and training is stopped after a fixed maximum number of iterations.
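The Mayas target and its training grid can be generated in the same way as for the Gabor example; the coefficients 0.26 and 0.48 follow the reconstruction of the formula above and should be treated with the same caution, and the helper names are ours.

```python
# Sketch of the Mayas target of Section 4.3 (coefficients as reconstructed above)
# sampled on the 8 x 8 training grid over [-0.5, 0.5]^2 described in the text.
import numpy as np

def mayas(x, y):
    return 0.26 * (x**2 + y**2) - 0.48 * x * y

ticks = np.linspace(-0.5, 0.5, 8)
xx, yy = np.meshgrid(ticks, ticks)
X_train = np.column_stack([xx.ravel(), yy.ravel()])   # 64 training inputs
F_train = mayas(X_train[:, 0], X_train[:, 1])
print(X_train.shape, F_train.shape)                   # (64, 2) (64,)
```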

Fig. 7 Gabor function approximation performance by BGSL1/2.

Fig. 8 Gabor function approximation performance by BGL1/2.

Fig. 9 Gabor function approximation performance by BGL2.

Tab. IV shows that, among the three algorithms, BGSL1/2 exhibits the best approximation accuracy and the best sparsity for the Mayas function.

Tab. IV Numerical results for approximating the Mayas function (average training error, average test error and ANE for BGSL1/2, BGL1/2 and BGL2).

5. Proofs

In this section, the convergence theorem for the smoothing $L_{1/2}$ regularization algorithm is proved. First, we give two lemmas.

Lemma 6. Suppose that Propositions 1 and 2 are satisfied and that $\{\omega^m\}$ is generated by Eq. (10). Then, for $1 \le l \le L$ and $m = 0, 1, 2, \ldots$,

$\Big|\prod_{j=1}^{n}(\omega_j^{m+1}\cdot x^l) - \prod_{j=1}^{n}(\omega_j^m\cdot x^l)\Big| \le C_2^n\sum_{j=1}^{n}\|\Delta\omega_j^m\|,$   (12)

and

$-\sum_{l=1}^{L}\delta_l\Big(\prod_{j=1}^{n}(\omega_j^m\cdot x^l)\Big)\Big[\prod_{j=1}^{n}(\omega_j^{m+1}\cdot x^l) - \prod_{j=1}^{n}(\omega_j^m\cdot x^l)\Big] \le \Big(-\frac{1}{\eta} + C_3\Big)\sum_{j=1}^{n}\sum_{i=1}^{p}(\Delta\omega_{ji}^m)^2 - \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}\frac{f'(\omega_{ji}^m)}{2f^{1/2}(\omega_{ji}^m)}\Delta\omega_{ji}^m,$   (13)

where $C_3 > 0$ is a constant independent of $m$.

Proof. Expanding the product to first and second order around $\omega^m$, we have

$\prod_{j=1}^{n}(\omega_j^{m+1}\cdot x^l) - \prod_{j=1}^{n}(\omega_j^m\cdot x^l) = \sum_{j=1}^{n}\prod_{k\neq j}(t_{j,k}^l)\,(\Delta\omega_j^m\cdot x^l)$   (14)

$= \sum_{j=1}^{n}\prod_{k\neq j}(\omega_k^m\cdot x^l)\,(\Delta\omega_j^m\cdot x^l) + \sigma^l,$   (15)

where each $t_{j,k}^l$ lies between $\omega_k^{m+1}\cdot x^l$ and $\omega_k^m\cdot x^l$, and $\sigma^l$ collects the resulting second-order terms, built from products $(\Delta\omega_j^m\cdot x^l)(\Delta\omega_s^m\cdot x^l)$ with $s \neq j$ and intermediate factors of the same kind as $t_{j,k}^l$. By Proposition 2 we see that

$|t_{j,k}^l| \le C_2, \qquad 1 \le j, k \le n,\ 1 \le l \le L.$   (16)

Thus, Eq. (12) can be easily obtained from Eq. (14), Eq. (16) and $|\Delta\omega_j^m\cdot x^l| \le C_2\|\Delta\omega_j^m\|$. By Eq. (16) we also have

$|\sigma^l| \le C_4\sum_{j=1}^{n}\sum_{i=1}^{p}(\Delta\omega_{ji}^m)^2, \qquad C_4 := nC_2^n.$   (17)

A combination of Proposition 1, Eq. (11), Eq. (15) and Eq. (17) leads to Eq. (13). Indeed, by Eq. (11),

$\sum_{l=1}^{L}\delta_l\Big(\prod_{k=1}^{n}(\omega_k^m\cdot x^l)\Big)\prod_{k\neq j}(\omega_k^m\cdot x^l)\,x_i^l = \frac{1}{\eta}\Delta\omega_{ji}^m + \frac{\lambda f'(\omega_{ji}^m)}{2f^{1/2}(\omega_{ji}^m)},$

so that, by Eq. (15) and Eq. (17),

$-\sum_{l=1}^{L}\delta_l\Big(\prod_{j=1}^{n}(\omega_j^m\cdot x^l)\Big)\Big[\prod_{j=1}^{n}(\omega_j^{m+1}\cdot x^l) - \prod_{j=1}^{n}(\omega_j^m\cdot x^l)\Big] = -\frac{1}{\eta}\sum_{j=1}^{n}\sum_{i=1}^{p}(\Delta\omega_{ji}^m)^2 - \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}\frac{f'(\omega_{ji}^m)}{2f^{1/2}(\omega_{ji}^m)}\Delta\omega_{ji}^m - \sum_{l=1}^{L}\delta_l\Big(\prod_{j=1}^{n}(\omega_j^m\cdot x^l)\Big)\sigma^l \le \Big(-\frac{1}{\eta} + C_3\Big)\sum_{j=1}^{n}\sum_{i=1}^{p}(\Delta\omega_{ji}^m)^2 - \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}\frac{f'(\omega_{ji}^m)}{2f^{1/2}(\omega_{ji}^m)}\Delta\omega_{ji}^m.$   (18)

Here $C_3 = LC_0C_4$. This completes the proof.

Lemma 7. Suppose that $H : \mathbb{R}^u \to \mathbb{R}$ is continuous and differentiable on a compact set $\check{D} \subset \mathbb{R}^u$ and that $\Omega = \{Z \in \check{D} : H_Z(Z) = 0\}$ contains only a finite number of points. If a sequence $\{Z^m\}_{m=1}^{\infty} \subset \check{D}$ satisfies $\lim_{m\to\infty}\|Z^{m+1} - Z^m\| = 0$ and $\lim_{m\to\infty}\|H_Z(Z^m)\| = 0$, then there exists a point $Z^* \in \Omega$ such that $\lim_{m\to\infty}Z^m = Z^*$.

Proof. Lemma 7 is almost the same as a theorem in [18], and the details of the proof are omitted.

The proof of Theorem 5 is divided into four parts, dealing with statements (i), (ii), (iii) and (iv), respectively.

Proof. For convenience, we use the notation

$\rho_m = \sum_{j=1}^{n}\sum_{i=1}^{p}(\Delta\omega_{ji}^m)^2.$   (19)

It follows from the error function $E(\omega)$ in Eq. (8) that

$E(\omega^{m+1}) = \frac{1}{2}\sum_{l=1}^{L}\Big(O^l - g\Big(\prod_{j=1}^{n}(\omega_j^{m+1}\cdot x^l)\Big)\Big)^2 + \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}f^{1/2}(\omega_{ji}^{m+1})$   (20)

and

$E(\omega^m) = \frac{1}{2}\sum_{l=1}^{L}\Big(O^l - g\Big(\prod_{j=1}^{n}(\omega_j^m\cdot x^l)\Big)\Big)^2 + \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}f^{1/2}(\omega_{ji}^m).$   (21)

Proof of (i) of Theorem 5. Write $\psi_l^m = \prod_{j=1}^{n}(\omega_j^m\cdot x^l)$ for brevity. By Eq. (20), Eq. (21) and a Taylor expansion of $\frac{1}{2}(O^l - g(t))^2$ at $t = \psi_l^m$, we have

$E(\omega^{m+1}) - E(\omega^m) = -\sum_{l=1}^{L}\delta_l(\psi_l^m)\big(\psi_l^{m+1} - \psi_l^m\big) - \frac{1}{2}\sum_{l=1}^{L}\delta_l'(t_l)\big(\psi_l^{m+1} - \psi_l^m\big)^2 + \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}\big[f^{1/2}(\omega_{ji}^{m+1}) - f^{1/2}(\omega_{ji}^m)\big],$   (22)

where $t_l$ lies between $\psi_l^{m+1}$ and $\psi_l^m$. Set $F(t) = (f(t))^{1/2}$. Then

$F'(t) = \frac{f'(t)}{2\sqrt{f(t)}}, \qquad F''(t) = \frac{f''(t)}{2\sqrt{f(t)}} - \frac{(f'(t))^2}{4(f(t))^{3/2}},$   (23)

and, since $f(t) \ge \frac{3}{8}a$, $|f'(t)| \le 1$ and $0 \le f''(t) \le \frac{3}{2a}$,

$M = \sup_{t\in\mathbb{R}}|F''(t)| < \infty.$   (24)

By a second-order Taylor expansion of $F$ together with Eq. (23) and Eq. (24),

$\lambda\sum_{j=1}^{n}\sum_{i=1}^{p}\big[f^{1/2}(\omega_{ji}^{m+1}) - f^{1/2}(\omega_{ji}^m)\big] \le \lambda\sum_{j=1}^{n}\sum_{i=1}^{p}\frac{f'(\omega_{ji}^m)}{2f^{1/2}(\omega_{ji}^m)}\Delta\omega_{ji}^m + \frac{\lambda M}{2}\rho_m.$

By Proposition 1 and Eq. (12),

$-\frac{1}{2}\sum_{l=1}^{L}\delta_l'(t_l)\big(\psi_l^{m+1} - \psi_l^m\big)^2 \le \frac{C_0}{2}\sum_{l=1}^{L}\Big(C_2^n\sum_{j=1}^{n}\|\Delta\omega_j^m\|\Big)^2 \le \frac{C_5}{2}\rho_m, \qquad C_5 := C_0LnC_2^{2n}.$

Inserting these two estimates and Eq. (13) of Lemma 6 into Eq. (22), the first-order terms $\pm\lambda\sum_{j,i}\frac{f'(\omega_{ji}^m)}{2f^{1/2}(\omega_{ji}^m)}\Delta\omega_{ji}^m$ cancel, and we obtain

$E(\omega^{m+1}) - E(\omega^m) \le -\Big(\frac{1}{\eta} - C_3 - \frac{C_5}{2} - \frac{\lambda M}{2}\Big)\rho_m \le -\Big(\frac{1}{\eta} - \lambda M - C\Big)\rho_m \le 0,$   (25)

where $C = C_3 + C_5/2$ and the last inequality follows from Proposition 3. This completes the proof of statement (i) of Theorem 5.

Proof of (ii) of Theorem 5. From conclusion (i), we know that the non-negative sequence $\{E(\omega^m)\}$ decreases monotonically, and it is bounded below by zero. Hence, there must exist $E^* \ge 0$ such that $\lim_{m\to\infty}E(\omega^m) = E^*$. The proof of (ii) is thus completed.

Proof of (iii) of Theorem 5. Write $\beta = \frac{1}{\eta} - \lambda M - C$ and $\rho_q = \sum_{j=1}^{n}\sum_{i=1}^{p}(\Delta\omega_{ji}^q)^2$. It follows from Proposition 3 that $\beta > 0$. By using Eq. (25) we get

$E(\omega^{m+1}) \le E(\omega^m) - \beta\rho_m \le \cdots \le E(\omega^0) - \beta\sum_{q=0}^{m}\rho_q.$

Since $E(\omega^{m+1}) \ge 0$, we have

$\sum_{q=0}^{m}\rho_q \le \frac{1}{\beta}E(\omega^0) < \infty.$

Letting $m \to \infty$, we obtain $\sum_{q=0}^{\infty}\rho_q \le \frac{1}{\beta}E(\omega^0) < \infty$. Thus

$\lim_{m\to\infty}\rho_m = 0,$

which leads to

$\lim_{m\to\infty}\Delta\omega_{ji}^m = 0, \qquad i = 1, 2, \ldots, p;\ j = 1, 2, \ldots, n.$   (26)

Since $\Delta\omega_{ji}^m = -\eta E_{\omega_{ji}}(\omega^m)$ with fixed $\eta > 0$, this gives $\lim_{m\to\infty}E_{\omega_{ji}}(\omega^m) = 0$ and completes the proof.

Proof of (iv) of Theorem 5. Note that the error function $E(\omega)$ defined in Eq. (8) is continuous and differentiable. According to Eq. (26), Proposition 4 and Lemma 7, we can easily get the desired result, i.e., there exists a point $\omega^* \in \Phi$ such that $\lim_{m\to\infty}\omega^m = \omega^*$. This completes the proof of (iv).

6. Conclusion

Some weak and strong convergence results are established for an offline gradient method with smoothing $L_{1/2}$ regularization for PSNN training. Criteria for choosing an appropriate learning rate and penalty parameter are given to guarantee the convergence of the learning process. Numerical experiments show that the sparsity and the accuracy of BGSL1/2 are better than those of BGL1/2 and BGL2.

Acknowledgements

This work is supported by the National Natural Science Foundation of China, the Natural Science Foundation Guidance Project of Liaoning Province (No. 165) and the Project of Dalian Young Technology Stars.

References

[1] FAN Q., ZURADA J.M., WU W. Convergence of online gradient method for feedforward neural networks with smoothing L1/2 regularization penalty. Neurocomputing. 2014, 131, pp. 208-216.

[2] FRIEDMAN J.H. An overview of predictive learning and function approximation. In: From Statistics to Neural Networks. Springer, Berlin, Heidelberg, 1994, pp. 1-61.

[3] HINTON G. Connectionist learning procedures. Artificial Intelligence. 1989, 40, doi.org/10.1016/0004-3702(89)90049-0.

[4] HUSSAIN A.J., LIATSIS P. Recurrent pi-sigma networks for DPCM image coding. Neurocomputing. 2003, 55(1-2), doi.org/10.1016/S0925-2312(02)00629-X.

[5] KROGH A., HERTZ J., PALMER R.G. Introduction to the Theory of Neural Computation. Redwood City, CA: Addison-Wesley, 1991.

[6] LE CUN Y., DENKER J., SOLLA S., HOWARD R.E., JACKEL L.D. Optimal Brain Damage. In: Advances in Neural Information Processing Systems II. Morgan Kaufmann, San Mateo, CA, 1990.

[7] LIU Y., LI Z.X., YAN D.K., MOHAMED KH.SH., WANG J., WU W. Convergence of batch gradient learning algorithm with smoothing L1/2 regularization for Sigma-Pi-Sigma neural networks. Neurocomputing. 2015, 151.

[8] MACKAY D.J.C. Probable networks and plausible predictions: a review of practical Bayesian methods for supervised neural networks. Network: Computation in Neural Systems. 1995, 6(3).

[9] MCLOONE S., IRWIN G. Improving neural network training solutions using regularisation. Neurocomputing. 2001, 37(1), pp. 71-90, doi.org/10.1016/S0925-2312(00)00314-3.

[10] PIAO J.L.J.X.F., WEN-PING S.C.D. Application of pi-sigma neural network to real-time classification of seafloor sediments. Applied Acoustics. 2005. Available from: http://en.cnki.com.cn/Journal_en/A-A5-yysn-5-6.htm

[11] SHIN Y., GHOSH J. The pi-sigma network: an efficient higher-order neural network for pattern classification and function approximation. International Joint Conference on Neural Networks. 1991, 1.

[12] SHIN Y., GHOSH J., SAMANI D. Computationally efficient invariant pattern recognition with higher order Pi-Sigma networks. The University of Texas at Austin.

[13] SRIVASTAVA N., HINTON G., KRIZHEVSKY A., SUTSKEVER I., SALAKHUTDINOV R. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research. 2014, 15(1). Available from: srivastava14a/srivastava14a.pdf

[14] WANG C., VENKATESH S.S., JUDD J.S. Optimal stopping and effective machine complexity in learning. In: Advances in Neural Information Processing Systems. 1994. Available from: optimal-stopping-and-effective-machine-complexity-in-learning.pdf

[15] WU W., FENG G., LI X. Training multilayer perceptrons via minimization of sum of ridge functions. Advances in Computational Mathematics. 2002, 17(4).

[16] WU W., XU Y. Deterministic convergence of an online gradient method for neural networks. Journal of Computational and Applied Mathematics. 2002, 144(1), doi.org/10.1016/S0377-0427(01)00571-4.

[17] XU Z.B., ZHANG H., WANG Y., CHANG X.Y., LIANG Y. L1/2 regularization. Science China Information Sciences. 2010, 53(6).

[18] YUAN Y.X., SUN W.Y. Optimization Theory and Methods. Science Press, Beijing.
