An Adaptively Constructing Multilayer Feedforward Neural Networks Using Hermite Polynomials


An Adaptively Constructing Multilayer Feedforward Neural Networks Using Hermite Polynomials

L. Ma 1 and K. Khorasani 2

1 Department of Applied Computer Science, Tokyo Polytechnic University, 1583 Iiyama, Atsugi, Kanagawa, Japan, maly@cs.t-kougei.ac.jp
2 Department of Electrical and Computer Engineering, Concordia University, Montreal, Quebec, Canada H3G 1M8, kash@ece.concordia.ca

Abstract. In this paper a new strategy is introduced for constructing a multi-hidden-layer feedforward neural network (FNN) in which each hidden unit employs a polynomial activation function that is different from those of the other units. The proposed scheme incorporates both a structure-level adaptation and a function-level adaptation methodology in constructing the desired network. The activation functions considered consist of orthonormal Hermite polynomials. Using this strategy, a FNN can be constructed having as many hidden layers and hidden units as dictated by the complexity of the problem being considered.

Keywords: Adaptive structure neural networks; Hermite polynomials; Multilayer feedforward neural networks.

1 Introduction

The design of a feedforward neural network (FNN) requires consideration of three important issues. The solutions to these will significantly influence the overall performance of the neural network (NN) as far as the following two considerations are concerned: (i) recognition rate on new patterns, and (ii) generalization performance on data sets that have not been presented during network training.

The first problem is the selection of data/patterns for network training. This is a problem that has practical implications and has received limited attention. The training data set selection can have considerable effects on the performance of the trained network. The second problem is the selection of an appropriate and efficient training algorithm from the large number of possible training algorithms that have been developed in the literature. Many new training algorithms with faster convergence properties and lower computational requirements are being developed by researchers in the NN community.

This research was supported in part by the NSERC (Natural Sciences and Engineering Research Council of Canada).

D.-S. Huang et al. (Eds.): ICIC 2008, LNAI 5227, pp. 642-653, © Springer-Verlag Berlin Heidelberg 2008

The third problem is the determination of the network size. This problem is more important from a practical point of view than the above two problems, and is generally more difficult to solve. The problem here is to find a network structure that is as small as possible while meeting certain desired performance specifications. What is usually done in practice is that the developer trains a number of networks of different sizes, and then the smallest network that can fulfill all or most of the required performance specifications is selected. This amounts to a tedious process of trial and error. The second and third problems are closely related to one another, in the sense that different training algorithms are suitable for different NN topologies. Therefore, the above three considerations are indeed critical when a NN is to be applied to a real-life problem.

This paper focuses on developing a systematic procedure for an automatic determination and/or adaptation of the network architecture of a FNN. Algorithms that can determine an appropriate network architecture automatically according to the complexity of the underlying function embedded in the data set are very cost-efficient, and thus highly desirable. Efforts toward network size determination have been made in the literature for many years, and many techniques have been developed [3], [6] (and the references therein).

2 Hermite Polynomials

In this section, the Hermite polynomials with their hierarchical nonlinearities are first introduced. These polynomials will be used subsequently as the activation functions of the hidden units of the proposed constructive feedforward neural network (FNN). The orthogonal Hermite polynomials are defined over the interval (-\infty, \infty) of the input space and are given formally as follows [11] (see also the references therein):

H_0(x) = 1,   (1)

H_1(x) = 2x,   (2)

H_n(x) = 2x H_{n-1}(x) - 2(n-1) H_{n-2}(x), \quad n \ge 2.   (3)

The definition of H_n(x) may be given alternatively by

H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} \big( e^{-x^2} \big), \quad n > 0, \quad \text{with } H_0(x) = 1.   (4)

The polynomials given in (1)-(3) are orthogonal to each other but not orthonormal. The orthonormal Hermite polynomials may then be defined according to

h_n(x) = \alpha_n H_n(x) \phi(x),   (5)

where

\alpha_n = (n!)^{-1/2} \pi^{1/4} 2^{-(n-1)/2},   (6)

\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.   (7)

The first-order derivative of h_n(x) can be easily obtained by virtue of the recursive nature of the polynomials defined in (3), that is,

\frac{dh_n(x)}{dx} = (2n)^{1/2} h_{n-1}(x) - x h_n(x), \quad n \ge 1,   (8)

\frac{dh_0(x)}{dx} = \alpha_0 \frac{d\phi(x)}{dx} = -x h_0(x), \quad n = 0.   (9)

In [11], the orthonormal Hermite polynomials are used as basis functions to model 1-D signals in the biomedical field for the purposes of signal analysis and detection. In [4], a selected combination of the orthonormal Hermite polynomials in a weighted-sum form is used as the fixed activation function of all the hidden units of a one-hidden-layer FNN (OHL-FNN). In [8], [9], Hermite coefficients are used as pre-processing filters or serve as features of the process, which are then fed to (fuzzy) neural networks for classification problems. A feedforward neural network is designed in [10] that uses a Hermite regression formula to approximate the hidden units' activation functions in order to obtain an improved generalization capability. In that FNN, a fixed number of Hermite polynomials is used.

It was indicated earlier that the Hermite polynomials are chosen due to their suitable properties. There are other polynomials that have similar properties, such as the Legendre, Tchebycheff, and Laguerre polynomials [5]. However, the input range of these polynomials does not meet the requirement for a hidden-unit activation function, which should accept values ranging from -\infty to +\infty. If one uses a polynomial with a limited input range as the activation function of a hidden unit, one must then restrict the input-side weight space in which an optimal input-side weight vector is to be searched for under a given training criterion. Restricting the input-side weight space is not desirable, as it would limit the representational capability of the network. The justification and rationale for choosing the Hermite polynomials are therefore further motivated by their restriction-free input range.

Alternatively, the above polynomials may still be considered as activation functions of the hidden units provided that the input to a hidden unit is normalized each time its input-side weight vector is updated. Although the weight vector is now unconstrained and only the input is normalized by a properly selected constant, this constant is actually a function of the weight vector. Therefore, the input to the hidden unit during the input-side training phase will no longer be linear with respect to the weight vector. Consequently, as far as the weight training is concerned, the input-side training will now involve two types of nonlinearities: one arising from the input to the hidden unit, and the other due to the activation function of the hidden unit. The corresponding optimization problem is clearly more complicated than the one with only the nonlinearity of the activation function. It is, therefore, our conclusion that the Hermite polynomials are expected to be the most suitable choice for the activation functions of the hidden units of a FNN.
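As a concrete illustration of the recursion (1)-(3), the normalization (5)-(7), and the derivative formulas (8)-(9), the following Python sketch evaluates the orthonormal Hermite activations and their first derivatives. It is not code from the paper; the function names (hermite_H, hermite_h, hermite_dh) are ours.

```python
import numpy as np
from math import factorial, pi, sqrt

def hermite_H(n, x):
    """Physicists' Hermite polynomial H_n(x) via the recursion (1)-(3)."""
    x = np.asarray(x, dtype=float)
    H_prev, H_curr = np.ones_like(x), 2.0 * x          # H_0(x), H_1(x)
    if n == 0:
        return H_prev
    for k in range(2, n + 1):                          # H_k = 2x H_{k-1} - 2(k-1) H_{k-2}
        H_prev, H_curr = H_curr, 2.0 * x * H_curr - 2.0 * (k - 1) * H_prev
    return H_curr

def hermite_h(n, x):
    """Orthonormal Hermite activation h_n(x) = alpha_n H_n(x) phi(x), eqs. (5)-(7)."""
    x = np.asarray(x, dtype=float)
    alpha_n = factorial(n) ** -0.5 * pi ** 0.25 * 2.0 ** (-(n - 1) / 2.0)
    phi = np.exp(-x ** 2 / 2.0) / sqrt(2.0 * pi)
    return alpha_n * hermite_H(n, x) * phi

def hermite_dh(n, x):
    """First derivative dh_n/dx from eq. (8) for n >= 1 and eq. (9) for n = 0."""
    x = np.asarray(x, dtype=float)
    if n == 0:
        return -x * hermite_h(0, x)
    return sqrt(2.0 * n) * hermite_h(n - 1, x) - x * hermite_h(n, x)

# Numerical sanity check of orthonormality: the integral of h_n(x)^2 over the
# real line should be approximately 1 for the modest orders used here.
xs = np.linspace(-10.0, 10.0, 20001)
for n in range(4):
    print(n, float(np.sum(hermite_h(n, xs) ** 2) * (xs[1] - xs[0])))
```

Because the Gaussian weighting \phi(x) decays as e^{-x^2/2}, these activations remain bounded over the entire real line, which is precisely the restriction-free input range argued for above.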

3 The Proposed Incremental Constructive Training Algorithm

Consider a typical constructive one-hidden-layer feedforward neural network (OHL-FNN) with a linear output layer and a polynomial-type hidden layer, as shown in Figure 1. The output layer can also be nonlinear, as will be shown in our simulation results. The hidden unit whose input and output connections are denoted by dotted lines is the neuron that is currently being trained before it is allowed to join the other existing hidden units in the network.

Suppose, without loss of generality, that a given regression problem has an M-dimensional input vector and a scalar (one-dimensional) output. The j-th input and output samples are denoted by X_j = (x_0^j, x_1^j, ..., x_M^j) (with x_0^j = 1 denoting the bias) and d^j (target), respectively, where j = 1, 2, ..., P (P is the number of training data samples). The OHL constructive algorithm starts from a null network without any hidden units. At any given point during the constructive learning process, suppose there are n-1 hidden units in the hidden layer, and the n-th hidden unit is being trained before it is added to the existing network (see Figure 1). The output of the n-th hidden unit for the j-th training sample is given by

f_n(s_n^j) = h_{n-1} \Big( \sum_{m=0}^{M} w_{n,m} x_m^j \Big)   (10)

Fig. 1. Structure of a constructive OHL-FNN that utilizes the orthonormal Hermite polynomials as its activation functions for the hidden units (the figure shows hidden units with activations h_0, ..., h_{n-1}, output-side weights v_1, ..., v_n, and indicates the input-side and output-side training stages)

where h_{n-1}(·) is the orthonormal Hermite polynomial of order n-1, whose derivative can be calculated recursively from equation (8), and s_n^j is the input to the n-th hidden unit, given by

s_n^j = \sum_{m=0}^{M} w_{n,m} x_m^j   (11)

where w_{n,0} is the bias weight of the node (with its input x_0^j = 1), and w_{n,m} (m \ne 0) is the weight from the m-th component of the input vector to the n-th hidden node. The output of the network may now be expressed as follows

y_{n-1}^j = v_0 + \sum_{i=1}^{n-1} v_i h_{i-1}(s_i^j) = v_0 + v_1 h_0(s_1^j) + v_2 h_1(s_2^j) + v_3 h_2(s_3^j) + \cdots + v_{n-1} h_{n-2}(s_{n-1}^j),   (12)

where v_0 is the bias of the output node, and v_1, v_2, ..., v_{n-1} are the weights from the hidden layer to the output node. Clearly, the above expression bears a very close resemblance to a functional series expansion that utilizes the orthonormal Hermite polynomials as its basis functions, where each additional term in the expansion contributes to improving the accuracy of the function being approximated. Through this series expansion, the approximation error becomes smaller as more terms, having higher-order nonlinearities, are included. This situation is quite similar to that of an incrementally constructed OHL-FNN, in the sense that the error is expected to become smaller as new hidden units having hierarchically higher-order nonlinearities are successively added to the network. In fact, it has been shown that under certain circumstances incorporating new hidden units in an OHL-FNN can monotonically decrease the training error [7]. Therefore, in this sense adding a new hidden unit to the network is somewhat equivalent to adding a higher-order term in a Hermite polynomial-based series expansion. It is in this sense that the present network is envisaged to be more suitable for the constructive learning paradigm than a fixed-structure FNN. Note that only if s_1^j = s_2^j = \cdots = s_{n-1}^j will y_{n-1}^j be an exact Hermite polynomial-based series expansion. Otherwise, as structured in the algorithm above, the expansion is only approximate, since the weights associated with each added neuron are adjusted separately, resulting in different inputs to the activation functions.
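The output expression (12) can be read as a Hermite-series-like evaluation over the installed units. The short sketch below is our own illustration (the helper names h and ohl_output are hypothetical), computing y_{n-1}^j for a batch of samples and using NumPy's physicists' Hermite evaluator hermval for H_n.

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi

def h(n, x):
    """Orthonormal Hermite h_n(x) = alpha_n H_n(x) phi(x); see eqs. (5)-(7)."""
    x = np.asarray(x, dtype=float)
    alpha_n = factorial(n) ** -0.5 * pi ** 0.25 * 2.0 ** (-(n - 1) / 2.0)
    return alpha_n * hermval(x, [0.0] * n + [1.0]) * np.exp(-x * x / 2.0) / np.sqrt(2.0 * pi)

def ohl_output(X, W, v):
    """
    Network output of eq. (12) for an OHL-FNN with n-1 installed hidden units.
    X : (P, M+1) training inputs with the bias component X[:, 0] = 1
    W : list of n-1 input-side weight vectors; W[i-1] belongs to the i-th unit
    v : length-n sequence of output-side weights, where v[0] is the output bias v_0
    """
    X = np.asarray(X, dtype=float)
    y = np.full(X.shape[0], float(v[0]))
    for i, w_i in enumerate(W, start=1):            # the i-th unit uses polynomial order i-1
        s_i = X @ np.asarray(w_i, dtype=float)      # s_i^j of eq. (11)
        y += v[i] * h(i - 1, s_i)
    return y
```

If all the inner products s_i^j happened to coincide, this loop would reduce to a truncated orthonormal Hermite series in that common variable, which is the exactness condition noted above.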

The problem addressed now is to determine how to train the n-th hidden unit that is to be added to the active network. There are many objective functions (see, for example, [2], [6] for more details) that can be considered for the input-side training of this hidden unit. A simple but general cost function that has been shown in the literature to work quite well is the following [2]:

J_{input} = \Big| \sum_{j=1}^{P} (e_{n-1}^j - \bar{e}_{n-1}) (f_n(s_n^j) - \bar{f}_n) \Big|   (13)

where

f_n(s_n^j) = h_{n-1}(s_n^j),   (14)

\bar{e}_{n-1} = \frac{1}{P} \sum_{j=1}^{P} e_{n-1}^j,   (15)

\bar{f}_n = \frac{1}{P} \sum_{j=1}^{P} h_{n-1}(s_n^j),   (16)

e_{n-1}^j = d^j - y_{n-1}^j,   (17)

and where f_n(s_n^j) (or h_{n-1}(s_n^j)) is the orthonormal Hermite polynomial of order n-1 used as the activation function of the n-th hidden unit, e_{n-1}^j is the network output error when the network has n-1 hidden units, and d^j is the j-th target output used for the network training. The derivative of J_{input} with respect to the weight w_{n,i} is calculated as

\frac{\partial J_{input}}{\partial w_{n,i}} = \mathrm{sgn}\Big( \sum_{j=1}^{P} (e_{n-1}^j - \bar{e}_{n-1}) (f_n(s_n^j) - \bar{f}_n) \Big) \sum_{j=1}^{P} (e_{n-1}^j - \bar{e}_{n-1}) \, h'_{n-1}(s_n^j) \, x_i^j   (18)

where sgn(·) is the sign function. The first-order derivative h'_{n-1}(s_n^j) in the above expression can be easily evaluated by using the recursive expression (8) for n \ge 2 and (9) for n = 1.

A candidate unit that maximizes the objective function (13) is incorporated into the network as the n-th hidden unit. Following this stage, the output-side training is performed by solving a least squares (LS) problem, given that the output layer has a linear activation function (see Figure 1). Specifically, once the input-side training is accomplished, the network output y^j with n hidden units may be expressed as follows:

y^j = \sum_{k=0}^{n} v_k f_k(s_k^j), \quad j = 1, 2, ..., P   (19)

where v_k (k = 1, 2, ..., n) are the output-side weights of the k-th hidden unit, and v_0 is the bias of the output unit with its input fixed to f_0 = 1. The corresponding output neuron tracking error is now given by

e^j = d^j - y^j = d^j - \sum_{k=0}^{n} v_k f_k(s_k^j), \quad j = 1, 2, ..., P.   (20)
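A minimal sketch of the input-side objective (13) and its gradient (18) for a single candidate unit is given below. It is our own illustration: the activation and its derivative are passed in as callables (for instance hermite_h and hermite_dh of order n-1 from the sketch in Section 2), and the helper name candidate_objective_and_grad is hypothetical. A plain gradient-ascent step on w would only be a stand-in for the quickprop algorithm [1] actually used in the paper.

```python
import numpy as np

def candidate_objective_and_grad(X, e_prev, w, f, df):
    """
    Correlation objective J_input of eq. (13) and its gradient (18) for one candidate.
    X      : (P, M+1) inputs with bias column X[:, 0] = 1
    e_prev : (P,) residual errors e^j_{n-1} of the current network, eq. (17)
    w      : (M+1,) input-side weight vector of the candidate unit
    f, df  : candidate activation h_{n-1} and its derivative h'_{n-1}, eqs. (8)-(9)
    """
    s = X @ w                                    # s_n^j of eq. (11)
    fs = f(s)
    fc = fs - fs.mean()                          # f_n(s_n^j) - f_bar_n, eq. (16)
    ec = e_prev - e_prev.mean()                  # e^j_{n-1} - e_bar_{n-1}, eq. (15)
    corr = ec @ fc
    J = abs(corr)                                # eq. (13)
    grad = np.sign(corr) * (X.T @ (ec * df(s)))  # eq. (18)
    return J, grad

# One hedged ascent step (quickprop [1] is used in the paper instead):
# w = w + learning_rate * grad
```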

Subsequently, the output-side training is performed by solving the following LS problem, given that the output layer has a linear activation function, that is,

J_{output} = \frac{1}{2} \sum_{j=1}^{P} (e^j)^2 = \frac{1}{2} \sum_{j=1}^{P} \Big\{ d^j - \sum_{k=0}^{n} v_k f_k(s_k^j) \Big\}^2.

After performing the output-side training, a new error signal e^j is calculated for the next cycle of input-side and output-side training. Our proposed constructive FNN algorithm may now be summarized according to the following steps:

Step 1: Initialization of the network. Start the network training process with an OHL-FNN having one hidden unit. Set n = 1, e^j = d^j, and the activation function to h_0(·).

Step 2: Input-side training for the n-th hidden unit. Train only the input-side weights associated with the n-th hidden unit h_{n-1}(·). The input-side weights of the existing hidden units, if any, are all frozen. One candidate for the n-th hidden unit, initialized with random weights, is trained using the objective function defined in (13), based on the quickprop algorithm [1]. If instead a pool of candidates is trained for the n-th unit, then the candidate that yields the maximum objective function is chosen as the n-th hidden unit to be added to the active network.

Step 3: Output-side training. Train the output-side weights of all the hidden units in the present network. When the output layer has a linear activation function, the output-side training may be performed by, for example, invoking the pseudo-inverse operation resulting from the least squares optimization criterion, as indicated earlier. When the activation function of the output layer is nonlinear, the quasi-Newton algorithm is used for training.

Step 4: Network performance evaluation and training control. Evaluate the network performance by monitoring the fraction of variance unexplained (FVU) [7] on the training data set. The FVU measure is defined according to

FVU = \frac{ \sum_{j=1}^{P} \big( \hat{g}(x^j) - g(x^j) \big)^2 }{ \sum_{j=1}^{P} \big( g(x^j) - \bar{g} \big)^2 }   (21)

where g(·) is the function being implemented by the FNN, \hat{g}(·) is an estimate of g(·), i.e., the output of the trained network, and \bar{g} is the mean value of g(·). The network training is terminated provided that certain stopping conditions are satisfied. These conditions may be specified formally, for example, in terms of a prespecified FVU threshold or a maximum number of permissible hidden units, among others. If the stopping conditions are not satisfied, then the network output y^j and the network output error e^j = d^j - y^j are computed, n is increased by 1 (i.e., n = n + 1), and we proceed to Step 2.
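The four steps above can be pieced together as in the following sketch. It is our own simplified reading of the procedure, not the authors' code: plain gradient ascent with a fixed learning rate replaces quickprop [1] for the input-side phase, np.linalg.lstsq plays the role of the pseudo-inverse for the output-side phase, and a single randomly initialized candidate is trained per unit.

```python
import numpy as np
from numpy.polynomial.hermite import hermval
from math import factorial, pi

def h(n, x):   # orthonormal Hermite activation, eqs. (5)-(7)
    a = factorial(n) ** -0.5 * pi ** 0.25 * 2.0 ** (-(n - 1) / 2.0)
    return a * hermval(x, [0.0] * n + [1.0]) * np.exp(-x * x / 2.0) / np.sqrt(2.0 * pi)

def dh(n, x):  # derivative of h_n, eqs. (8)-(9)
    lead = np.sqrt(2.0 * n) * h(n - 1, x) if n > 0 else 0.0
    return lead - x * h(n, x)

def fvu(y_hat, d):  # fraction of variance unexplained, eq. (21)
    return np.sum((y_hat - d) ** 2) / np.sum((d - d.mean()) ** 2)

def train_constructive_ohl(X, d, max_units=20, fvu_target=1e-2,
                           epochs=300, lr=1e-2, seed=0):
    rng = np.random.default_rng(seed)
    P = X.shape[0]
    W, e, v = [], d.copy(), None          # Step 1: start from the null network, e^j = d^j
    for n in range(1, max_units + 1):
        # Step 2: input-side training of the n-th candidate (activation h_{n-1})
        w = rng.normal(scale=0.1, size=X.shape[1])
        for _ in range(epochs):
            s = X @ w
            fs = h(n - 1, s)
            ec, fc = e - e.mean(), fs - fs.mean()
            grad = np.sign(ec @ fc) * (X.T @ (ec * dh(n - 1, s)))   # eq. (18)
            w += lr * grad / P
        W.append(w)
        # Step 3: output-side LS training of all units plus the output bias, eq. (19)
        F = np.column_stack([np.ones(P)] + [h(i, X @ W[i]) for i in range(len(W))])
        v, *_ = np.linalg.lstsq(F, d, rcond=None)
        y = F @ v
        e = d - y                                                   # eq. (20)
        # Step 4: stop once the training FVU falls below the prescribed threshold
        if fvu(y, d) < fvu_target:
            break
    return W, v
```

Here X is assumed to carry the bias component x_0 = 1 as its first column and d is the vector of targets d^j; the FVU threshold, learning rate, and epoch count are illustrative values only.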

It is well known in the constructive OHL-FNN literature [7] that the determination of a proper initial network size is a problem that needs careful attention. This initialization may significantly influence the effectiveness and efficiency of the network training that follows. In our proposed scheme, however, the network training starts from the smallest possible architecture (that is, the null network). This is an important advantage of our proposed algorithm over similar methods previously presented in the literature.

4 Proposed Multi-Hidden-Layer FNN Strategy

Construction and design of multi-hidden-layer networks are generally more difficult than those of one-hidden-layer (OHL) networks, since one has to determine not only the depth of the network in terms of the number of hidden layers, but also the number of neurons in each layer. In this section, a new strategy for constructing a multi-hidden-layer FNN with regular connections is proposed. The new constructive algorithm has the following features and advantages:

(1) As with OHL constructive algorithms, the number of hidden units in a given hidden layer is determined automatically by the algorithm.

(2) In addition, the number of hidden layers is also determined automatically by the algorithm.

(3) The algorithm adds hidden units and hidden layers one at a time, when and if they are needed. The input-side weights of all the hidden units in an installed hidden layer are fixed (frozen), whereas the input-side weights of a new hidden unit that is being added to the network are updated during the input-side training. A constructive OHL network may, in general, need to grow a large number of hidden units, or may even fail to represent a map that actually requires more than a single hidden layer. Therefore, it is expected that our proposed algorithm, which incrementally expands its hidden layers and units automatically as a function of the complexity of the problem, will converge very fast and be computationally more efficient when compared to other constructive OHL networks.

(4) Incentives are also introduced in the proposed algorithm to control and monitor the hidden layer creation process so that a network with fewer layers is constructed. This encourages the algorithm to solve problems using smaller networks, reducing the costs of network training and implementation.

4.1 Construction, Design and Training of the Proposed Algorithm

For the sake of simplicity, and without any loss of generality, suppose one desires to construct a FNN to solve a mapping or regression problem having a multi-dimensional input and a single scalar output. Extension to the case of a multi-dimensional output space is straightforward. It is also assumed that the output unit has a linear activation function for dealing with regression problems, although a nonlinear activation function has to be used for pattern recognition problems.

Fig. 2. Multi-hidden-layer network construction process for the proposed strategy using Hermite polynomials as the nonlinear activation functions of the neurons: (a) initial linear network; (b) creation of the first hidden layer; (c) addition of hidden units to the first hidden layer; (d) creation of the second hidden layer and its units

Specifically, in the pattern recognition case the output-side training of the weights would have to be performed by utilizing the delta rule, instead of the LS solution that is used here when the activation function is selected to be linear. The proposed constructive algorithm consists of the following steps:

Step 1: Initially train the network with a single output node and no hidden layers (refer to Figure 2(a)). The training of the weights, including the bias of the output node, is achieved according to an LS solution or some gradient-based algorithm. The training error FVU (as described below) is monitored; if it is not smaller than some prespecified threshold, one then proceeds to Step 2a, otherwise one goes to Step 2c. The performance of a network is measured by the fraction of variance unexplained (FVU) [7], which is defined as

FVU = \frac{ \sum_{j=1}^{P} \big( \hat{g}(x^j) - g(x^j) \big)^2 }{ \sum_{j=1}^{P} \big( g(x^j) - \bar{g} \big)^2 }   (22)

where g(·) is the function to be implemented by the FNN, \hat{g}(·) is an estimate of g(·) realized by the network, and \bar{g} is the mean value of g(·). The FVU is equivalently a ratio of the error variance to the variance of the function being analyzed by the network. Generally, the larger the variance of the function, the more difficult it is to perform the regression analysis. Therefore, the FVU may be viewed as a measure normalized by the complexity of the function. Note that the FVU is proportional to the mean square error (MSE). Furthermore, the function under study is likely to be contaminated by an additive noise \epsilon. In this case the signal-to-noise ratio (SNR) is defined by

SNR = 10 \log_{10} \frac{ \sum_{j=1}^{P} \big( g(x^j) - \bar{g} \big)^2 }{ \sum_{j=1}^{P} \big( \epsilon^j - \bar{\epsilon} \big)^2 }   (23)

where \bar{\epsilon} is the mean value of the additive noise, which is usually assumed to be zero.

Step 2a: Treat the output node as a hidden unit with unit output-side weight and zero bias (refer to Figure 2(b)). The input-side weights of this hidden unit are fixed as determined in the previous training step. The output error of the new linear output node will be identical to that of its hidden unit. Proceed to Step 2b.

Step 2b: New hidden units are added to the current layer one at a time, just as in a constructive OHL-FNN (refer to Figure 2(c)). Each time a new hidden unit is added, the output error (FVU) is monitored to see whether it is smaller than a certain prescribed threshold value. If so, one proceeds to Step 2c. Otherwise, new hidden units are added until the output error FVU stabilizes above the prescribed threshold. Once the error has stabilized, Step 2a is invoked again and a new hidden layer is introduced to possibly reduce the output error further (refer to Figure 2(d)). Steps 2a and 2b are repeated as many times as necessary until the output error FVU falls below the prescribed threshold, following which one then proceeds to Step 2c. Input-side training of a new hidden unit's weights is performed by maximizing the correlation-based objective function, whereas the output-side training of the weights is performed through an LS-type algorithm.
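For reference, the FVU of (22), the SNR of (23), and the stop-ratio used below in (24) to decide when a layer has stabilized can be written as the following small helpers; this is a sketch with names of our choosing, not code from the paper.

```python
import numpy as np

def fvu(g_hat, g):
    """Fraction of variance unexplained, eq. (22)."""
    g_hat, g = np.asarray(g_hat, dtype=float), np.asarray(g, dtype=float)
    return np.sum((g_hat - g) ** 2) / np.sum((g - g.mean()) ** 2)

def snr_db(g, eps):
    """Signal-to-noise ratio of eq. (23); eps is a sample of the additive noise."""
    g, eps = np.asarray(g, dtype=float), np.asarray(eps, dtype=float)
    return 10.0 * np.log10(np.sum((g - g.mean()) ** 2) /
                           np.sum((eps - eps.mean()) ** 2))

def stop_ratio(fvu_prev, fvu_curr):
    """Relative FVU improvement (in percent) from adding one more hidden unit, eq. (24)."""
    return (fvu_prev - fvu_curr) / fvu_prev * 100.0

# A layer is treated as stabilized when stop_ratio stays below a chosen percentage
# for stop_num consecutive unit additions; both thresholds are set by trial and error.
```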

Step 2c: The constructive training is terminated.

For implementing the above algorithm some further specifications are needed. First, a metric called the stop-ratio is introduced to determine whether the training error FVU has stabilized as a function of the added hidden units, namely

stop\text{-}ratio = \frac{ FVU(n-1) - FVU(n) }{ FVU(n-1) } \times 100\%   (24)

where FVU(n) denotes the training error FVU when the n-th hidden unit is added to a hidden layer of the network. Note that, according to [6], the addition of a proper hidden unit to an existing network should reduce the training error. The issue that needs addressing here is the selection of the magnitude of the stop-ratio. At the early stages of the constructive training, this ratio may be large. However, as more hidden units are added to the network, the ratio will monotonically decrease. When it has become sufficiently small, say a few percent, the inclusion of further hidden units will contribute little to the reduction of the training error and, as a matter of fact, may actually start giving rise to an increase in the generalization error, as the unnecessary hidden units may force the network to begin to memorize the data. Towards this end, when the stop-ratio is determined to be smaller than a prespecified percentage successively for a given number of times (denoted by the metric stop-num), the algorithm treats the error FVU as being stabilized, and hence proceeds to the next step to explore the possibility of further reducing the training error by extending the network with a new layer. It should be noted that the stop-ratio and the stop-num are generally to be determined by trial and error, and one may utilize prior experience derived from simulation results to assign them.

5 Conclusions

In this paper, a new strategy for constructing a multi-hidden-layer FNN was presented. The proposed algorithm extends the constructive algorithm developed for adding hidden units to an OHL network by generating additional hidden layers. The network obtained by using our proposed algorithm has regular connections that facilitate hardware implementation. The constructed network adds new hidden units and layers incrementally, only when they are needed, based on monitoring the residual error that cannot be reduced any further by the existing network. We have also proposed a new type of constructive OHL-FNN that adaptively assigns appropriate orthonormal Hermite polynomials to its generated neurons. The network generally learns as effectively as, but generalizes much better than, conventional constructive OHL networks.

References

1. Fahlman, S.E.: An Empirical Study of Learning Speed in Back-Propagation Networks. Tech. Rep. CMU-CS, Carnegie Mellon University (1988)
2. Fahlman, S.E., Lebiere, C.: The Cascade-Correlation Learning Architecture. Tech. Rep. CMU-CS, Carnegie Mellon University (1991)
3. Hush, D.R., Horne, B.G.: Progress in Supervised Neural Networks. IEEE Signal Processing Magazine, 8-39 (January 1993)
4. Hwang, J.N., Lay, S.R., Maechler, M., Martin, R.D., Schimert, J.: Regression Modeling in Back-Propagation and Projection Pursuit Learning. IEEE Trans. on Neural Networks 5 (1994)
5. Korn, G.A., Korn, T.M.: Mathematical Handbook for Scientists and Engineers, 2nd edn. McGraw-Hill, New York (1968)
6. Kwok, T.Y., Yeung, D.Y.: Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems. IEEE Trans. on Neural Networks 8(3) (1997)
7. Kwok, T.Y., Yeung, D.Y.: Objective Functions for Training New Hidden Units in Constructive Neural Networks. IEEE Trans. on Neural Networks 8(5) (1997)
8. Lau, B., Chao, T.H.: Aided Target Recognition Processing of MUDSS Sonar Data. Proceedings of the SPIE 3392 (1998)
9. Linh, T.H., Osowski, S., Stodolski, M.: On-line Heart Beat Recognition Using Hermite Polynomials and Neuro-Fuzzy Network. In: IMTC/2002, Proceedings of the 19th IEEE Instrumentation and Measurement Technology Conference, Anchorage, AK, USA, vol. 1 (2002)
10. Pilato, G., Sorbello, F., Vassallo, G.: Using the Hermite Regression Algorithm to Improve the Generalization Capability of a Neural Network. In: Neural Nets WIRN Vietri, Proceedings of the 11th Italian Workshop on Neural Nets, Salerno, Italy (1999)
11. Rasiah, A.I., Togneri, R., Attikiouzel, Y.: Modeling 1-D Signals Using Hermite Basis Functions. IEE Proc.-Vis. Image Signal Process. 144(6) (1997)
