Nonlinear system modeling with deep neural networks and autoencoders algorithm

Erick De la Rosa, Wen Yu
Departamento de Control Automatico, CINVESTAV-IPN, Mexico City, Mexico

Xiaoou Li
Departamento de Computacion, CINVESTAV-IPN, Mexico City, Mexico

Abstract: Deep learning techniques have been used successfully for pattern classification, but these methods have not yet been applied to nonlinear system identification. In this paper, the neural model has a deep architecture that is obtained by a random search method. The initial weights of this deep neural model are obtained from the denoising autoencoder model. We propose special unsupervised learning methods for this deep learning model with the input data; normal supervised learning is then used to train the weights with the output data. The deep learning identification algorithms are validated with three benchmark examples.

I. INTRODUCTION

System identification with neural networks consists of two tasks: structure identification and parameter identification. Structure identification often uses trial-and-error approaches [1]. However, these algorithms do not improve the identification accuracy significantly, because they only try to find the number of hidden neurons and do not deal with the number of hidden layers. In this paper, we use a deep structure for the neural model: fewer hidden neurons in each hidden layer and more hidden layers. This strategy does not increase the complexity of the neural model but improves its generalization capacity.

Parameter identification is usually addressed by gradient descent variants. These may converge very slowly and usually suffer from the local minima problem. Since the identification error surface is unknown, the neural model can easily settle into a local minimum if its initial weights are not suitable. There are techniques to escape local minima in the error surface and settle the neural model near the global minimum, such as noise-shaping modification [2] and nonlinear clustering [3], but they do not address the basic cause of the local minima problem: poor initial weights. In [4], the initial weights of a recurrent neural network are calculated by sensitivity ratio analysis; in [5], the initial weights are obtained by finding the support vectors of the input data. [6] has shown that deep learning has some capability to avoid local minima. In this paper, we use deep learning to find the best initial weights for the neural model.

The deep structure of a neural network usually requires three or more hidden layers [7], and the depth of a neural model is its number of hidden layers. A deep neural network has the same structure as an MLP; the function approximation and the learning procedure also do not change. However, a deep neural network usually needs fewer parameters (weights) than an MLP [8]. On the other hand, increasing the number of hidden nodes causes an exponential increase in the number of model parameters and also requires more training examples [9][10]. There is no general method to find the optimal structure of a deep model. [11] proposes some effective algorithms to find a suitable number of hidden layers and number of neurons in each layer. These methods can be classified into two categories: 1) grid search, which applies the learning algorithm over all possible combinations of the structure hyperparameters; 2) random search, which uses only a few combinations chosen by some sampling rule.
[12] has proven that random search can obtain a similar structure complexity to grid search for deep neural models, while its computational cost is much lower. In this paper we use random search to find the structure of the deep neural model.

The denoising autoencoder method [13] and restricted Boltzmann machines [14] are the main deep learning methods. The autoencoder method encodes the input and undoes the effect of an input corruption process; it needs to add a stochastic corruption to the input. Restricted Boltzmann machines use energy-based learning models. Both are unsupervised learning methods. The results of [15] show that unsupervised pretraining can drive the neural model away from local minima in classification problems.

Time series forecasting can also use deep learning techniques [16]. The output of the predictive model is the current value, while the input is formed by previous outputs. In [17], the denoising autoencoder is used as a pretraining stage, and the prediction results are better than those of methods without deep learning pretraining. [18] uses the RBM approach for pretraining; however, the hidden and visible units of the RBM model are binary, so the prediction results for continuous values are not so good. [19] points out that the denoising autoencoder method may not improve prediction results if the input from the time series is not sufficiently large. All of the above RBM-based time series forecasting methods use binary probability estimations, and their prediction results are not satisfactory for continuous time series.

Deep learning methods cannot be applied to system identification directly, because the input/output values are not binary as in classification problems; for example, the conditional probability transformation in the original restricted Boltzmann machines needs binary values [6].

In order to handle gray-level pixels, [7] uses an integral instead of a sum to calculate the conditional probability. To the best of our knowledge, there are no results that apply deep learning to nonlinear system identification. In this paper, we use these two deep learning methods on the input data sets to obtain the initial weights of deep neural models, and we extend the idea of [7] to non-positive values and to [0, ∞) so that system identification works. A deep neural model is first constructed for the unknown nonlinear system. Then we use the autoencoder model to design unsupervised learning with the input data; the structure of the autoencoder model is that of the deep neural model. We use the weights trained by the deep learning methods as the initial weights of the deep neural model, and the gradient descent supervised learning method is then applied to train the weights of the deep neural model. Finally, two benchmark examples are used to show the effectiveness of our deep learning methods for nonlinear system identification.

II. NONLINEAR SYSTEM MODELING WITH DEEP NEURAL NETWORKS

Consider the following unknown discrete-time nonlinear system

x(k+1) = f[x(k), u(k)],   y(k) = g[x(k)]   (1)

where u(k) ∈ R^u is the input vector, x(k) ∈ R^x is an internal state vector, and y(k) ∈ R^m is the output vector. f and g are general nonlinear smooth functions, f, g ∈ C^∞. Let us now recall the following definitions. Denote

Y(k) = [y^T(k), y^T(k+1), ..., y^T(k+n-1)]^T,   U(k) = [u^T(k), u^T(k+1), ..., u^T(k+n-2)]^T.

If ∂Y/∂x is non-singular at x = 0, U = 0, this leads to the NARMA model

y(k) = Φ[x(k)]   (2)

where x(k) = [y^T(k-1), y^T(k-2), ..., u^T(k), u^T(k-1), ...]^T, Φ(·) is an unknown nonlinear difference equation representing the plant dynamics, u(k) and y(k) are the measurable scalar input and output, and d is the time delay. The nonlinear system (2) is a NARMA model. We can also regard the input of the nonlinear system as x(k) = [x_1 ... x_n]^T ∈ R^n and the output as y(k) ∈ R^m.

Now we use the following multilayer neural network to identify the unknown nonlinear system (2):

ŷ(k) = φ_p(W_p φ_{p-1}(... W_3 φ_2(W_2 φ_1(W_1 x(k) + b_1) + b_2) ...) + b_p)   (3)

where ŷ(k) ∈ R^m is the output of the neural model, W_1 ∈ R^{l_1×n}, b_1 ∈ R^{l_1}, W_2 ∈ R^{l_2×l_1}, b_2 ∈ R^{l_2}, ..., W_p ∈ R^{m×l_{p-1}}, b_p ∈ R^m, p is the number of layers of the network, l_i (i = 1, ..., p-1) are the node numbers in each layer, and φ_i ∈ R^{l_i} (i = 1, ..., p) are the activation vector functions. We use a linear function for the output layer, φ_p: R^m → R^m, i.e., φ_p(ω) = [ω_1 ... ω_m]^T. The other layers use sigmoid functions,

φ_i(ω_j) = α_i / (1 + e^{-β_i^T ω_j}) - γ_i

where i = 1, ..., p-1, j = 1, ..., l_i, α_i, β_i and γ_i are predefined positive constants, and ω_j are the input variables to the sigmoid functions.

From the Stone-Weierstrass theorem, we know that if the node number of a one-hidden-layer neural network is large enough, the neural model can approximate the nonlinear function Φ to any degree of accuracy for all x(k). Instead of increasing the node numbers l_i, in this paper we increase the layer number p. We use a deep structure, i.e., p ≥ 3, for the multilayer neural model (3), so that existing deep learning techniques can be used for system identification.

The goal of the neural identification is to find a suitable structure (layer number p, node number l_i in each layer) and weights (W_1, ..., W_p) such that the identification error

e(k) = ŷ(k) - y(k)   (4)

is minimized. [12] has proven that random search can obtain a similar model structure to grid search for the deep neural model (p ≥ 3). In this paper, random search is used to find the hidden layer number p and the neuron number l_i in each layer (i = 1, ..., p). The next task is to find suitable weights (W_1, ..., W_p). Many supervised learning techniques, such as gradient descent and Hessian-based methods, can be applied to train these weights.
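As an illustration of the identification model (3), the following minimal sketch (not from the paper; the layer sizes, weight values and the scalar constants α, β, γ are placeholder assumptions) evaluates ŷ(k) for one regressor x(k) using sigmoid hidden layers and a linear output layer.

import numpy as np

def sigmoid_layer(w, b, x, alpha=1.0, beta=1.0, gamma=0.0):
    # phi_i(omega) = alpha / (1 + exp(-beta * omega)) - gamma, applied elementwise
    # (the paper allows layer-dependent alpha_i, beta_i, gamma_i; scalars are assumed here)
    return alpha / (1.0 + np.exp(-beta * (w @ x + b))) - gamma

def deep_model_forward(weights, biases, x):
    """Evaluate model (3): p-1 sigmoid hidden layers followed by a linear output layer."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):          # layers 1, ..., p-1
        h = sigmoid_layer(w, b, h)
    w_p, b_p = weights[-1], biases[-1]                   # layer p is linear
    return w_p @ h + b_p                                 # y_hat(k)

# Toy structure (hypothetical): n = 10 inputs, three hidden layers of l_i = 5 nodes, m = 1 output
rng = np.random.default_rng(0)
sizes = [10, 5, 5, 5, 1]
weights = [0.1 * rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
biases = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]

x_k = rng.standard_normal(10)                            # one regressor x(k)
print(deep_model_forward(weights, biases, x_k))          # model output y_hat(k)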
The gradient descent method and its modified versions always reach a local minimum, at a fast or slow rate, and which local minimum is reached depends entirely on the initial conditions of the weights W_1, ..., W_p. The identification model structure using the deep learning techniques is shown in Fig. 1. The following sections show how to use deep learning techniques to find the initial weights W_1(0), ..., W_p(0), and how to identify the nonlinear systems.

III. AUTOENCODERS METHOD FOR SYSTEM IDENTIFICATION

Although the autoencoder technique is designed for classification and de-noising, in this paper we modify it for system identification. Consider the input x(k) of the system (2). It is first mapped to a hidden representation h_1(k) by an encoder φ_1. We use the same weights and nonlinear activation function as the identification model (3),

h_1(k) = φ_1[W_1 x(k) + b_1]   (5)

Then the hidden representation, or code, h_1(k) is mapped back to a reconstruction z_1(k) by the decoder

z_1(k) = φ_1[V_1 h_1(k) + c_1]   (6)

Here we use the same activation function as in (5). The size of z_1(k) is the same as that of x(k); z_1(k) can be interpreted as a prediction of x(k) given the code h_1(k), i.e., an input reconstruction. The weight matrix V_1 associated with the reverse mapping is simplified to a transposed matrix: in the coding stage it is selected as V_1 = W_1^T. A short sketch of this tied-weight encoder/decoder pair follows.
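The following short sketch (an illustration only, not the authors' code; the logistic sigmoid and the dimensions n = 10, l_1 = 5 are assumptions) implements the encoder (5) and the tied-weight decoder (6) with V_1 = W_1^T.

import numpy as np

def encode(w1, b1, x):
    # (5): h_1(k) = phi_1(W_1 x(k) + b_1), with phi_1 taken as a logistic sigmoid
    return 1.0 / (1.0 + np.exp(-(w1 @ x + b1)))

def decode(w1, c1, h):
    # (6): z_1(k) = phi_1(V_1 h_1(k) + c_1), with tied weights V_1 = W_1^T
    return 1.0 / (1.0 + np.exp(-(w1.T @ h + c1)))

rng = np.random.default_rng(1)
n, l1 = 10, 5                           # input dimension and hidden layer size (assumed)
w1 = 0.1 * rng.standard_normal((l1, n))
b1, c1 = np.zeros(l1), np.zeros(n)

x = rng.standard_normal(n)              # one regressor x(k)
z = decode(w1, c1, encode(w1, b1, x))   # reconstruction z_1(k)
print(np.sum((x - z) ** 2))             # reconstruction error, cf. the cost (8) below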

Fig. 1. The identification structure using deep learning techniques.

Fig. 2. The autoencoders model (stacked encoder/decoder pairs: x(k) → h_1 → z_1, h_1 → h_2 → z_2, ...).

For deep neural models, an autoencoder described by (5) and (6) has to be implemented for each hidden layer in (3). The parameters W_1, b_1 and c_1 of the autoencoder associated with layer 1 are trained to minimize the error between z_1(k) and x(k). For classification tasks with binary or normalized inputs the following cross-entropy index is applied [7]:

J_1 = -Σ_{k=1}^{q} { x(k) log[z_1(k)] + [1 - x(k)] log[1 - z_1(k)] }   (7)

where q is the total number of training examples. For nonlinear system identification we use the following squared-error cost function instead:

J_1(k) = ||x(k) - z_1(k)||²   (8)

This change transforms the autoencoder model into a single-layer neural network. The weights W_1 are updated by the usual gradient descent method (backpropagation),

W_1(k+1) = W_1(k) - η_1 ∂J_1(k)/∂W_1(k)   (9)

where η_1 > 0 is the learning rate, k = 1, 2, ..., q, and q is the total number of training data. The thresholds b_1 and c_1 can be trained together with the weights W_1 by adding an extra entry of value 1 to the original input x(k).

In order to make the learning process of the autoencoder model robust, a small noise is added to the input. In this paper we add zero-mean white Gaussian noise,

x̄_1(k) = x_1(k) + ξ_1(k).

If the dimension of the input is large (n > 20), the above method does not work well. In that case, as stated in [7], some entries of the input vector are randomly forced to zero, so that the input reconstruction process is robust.

The unsupervised training of the stacked autoencoder model is as follows:
1) The input, the output and the hidden representation of the first autoencoder are x(k) ∈ R^n, z_1(k) ∈ R^n and h_1(k) ∈ R^{l_1}. We use q input data to train the weights of the first autoencoder, W_1 ∈ R^{l_1×n}, b_1 ∈ R^{l_1} and c_1 ∈ R^n.
2) After the first autoencoder is trained, its weights are fixed. As stated in (5) and (6), W_1 and b_1, once fixed by the autoencoder training, become the initial weights of the first layer in (3). The code, or hidden representation, of the first autoencoder is computed with these fixed weights, and h_1(k) is taken as the input of the second autoencoder, which is associated with layer 2.
3) The second autoencoder is then pretrained with the input h_1(k), reconstruction z_2(k) ∈ R^{l_1} and code h_2(k) ∈ R^{l_2}. Noise should also be added to its input, in order to follow the greedy layer-wise training of [7], which treats each layer as an independent entity.
4) Then we train the third autoencoder, and so on, until all p - 1 autoencoders (one for each hidden layer) are pretrained.

This training process is shown in Fig. 2. The autoencoder model in Fig. 2 has a similar structure to the identification model (3). The nonlinear system identification via deep learning thus includes two stages: 1) unsupervised learning with the input data, where the pretrained weights of the autoencoder model, W_1(q), ..., W_p(q), are used as the initial weights of the identification model (3); 2) supervised learning with the output data, where the weights are trained by a classical supervised learning method; in this paper we use the gradient descent method. A sketch of the layer-wise pretraining stage is given below.
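To make the pretraining stage concrete, here is a minimal sketch (under stated assumptions, not the authors' implementation): one denoising autoencoder per hidden layer of (3), trained with the squared-error cost (8), the gradient step (9) and tied decoder weights, with Gaussian input noise; the layer sizes, learning rate, noise level and epoch count are placeholders.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def pretrain_layer(inputs, hidden_size, lr=0.1, epochs=1, noise_std=0.05, seed=0):
    """Train one denoising autoencoder (5)-(6) with tied weights V = W^T,
    squared-error cost (8) and gradient descent (9). Returns W, b and the hidden codes."""
    rng = np.random.default_rng(seed)
    n = inputs.shape[1]
    w = 0.1 * rng.standard_normal((hidden_size, n))
    b, c = np.zeros(hidden_size), np.zeros(n)
    for _ in range(epochs):
        for x in inputs:
            x_noisy = x + noise_std * rng.standard_normal(n)     # corrupted input
            h = sigmoid(w @ x_noisy + b)                         # encoder (5)
            z = sigmoid(w.T @ h + c)                             # tied decoder (6)
            d2 = 2.0 * (z - x) * z * (1.0 - z)                   # dJ1 w.r.t. decoder pre-activation
            d1 = (w @ d2) * h * (1.0 - h)                        # backpropagated to the encoder
            w -= lr * (np.outer(d1, x_noisy) + np.outer(h, d2))  # both tied-weight contributions
            b -= lr * d1
            c -= lr * d2
    return w, b, sigmoid(inputs @ w.T + b)                       # codes feed the next layer

def pretrain_stack(x_data, hidden_sizes, **kw):
    """Greedy layer-wise pretraining, steps 1)-4): one autoencoder per hidden layer of (3)."""
    weights, biases, layer_input = [], [], x_data
    for l_i in hidden_sizes:
        w, b, layer_input = pretrain_layer(layer_input, l_i, **kw)
        weights.append(w)
        biases.append(b)
    return weights, biases                                       # initial W_i(0), b_i(0)

# toy usage: 200 regressors of dimension 10, two hidden layers of 5 nodes each (assumed)
x_data = np.random.default_rng(2).standard_normal((200, 10))
w_init, b_init = pretrain_stack(x_data, [5, 5])

The returned weights would then initialize the hidden layers of (3) before the supervised stage described next.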

For system identification, we use the following squared error:

J_2(k) = ||y(k) - ŷ(k)||²   (10)

where y(k) is the output of the unknown plant (2) and ŷ(k) is the output of the neural model (3). The weights W_i and biases b_i are updated by

W_i(k+1) = W_i(k) - η_2 ∂J_2(k)/∂W_i(k),   i = 1, ..., p   (11)

where η_2 > 0 is the learning rate of the supervised learning, k = 1, 2, ..., q, and q is the total number of training data. The biases b_i can also be trained by expanding the corresponding matrices W_i to contain them.

According to the deep learning literature, the supervised learning stage may be applied only to the output layer of the identification model, i.e., W_p(k+1) = W_p(k) - η_2 ∂J_2(k)/∂W_p(k), while the weights and biases of the other layers W_i, i = 1, ..., p-1, are kept fixed. From several simulations we found that this simple method is not effective for system identification; better generalization performance is obtained when all layers are updated.

We use the following algorithm to identify nonlinear systems via deep learning methods.

Algorithm 1:
1) Construct a deep neural network model (3) with p ≥ 3. The layer number p and the node numbers l_i (i = 1, ..., p-1) are chosen by the random search method.
2) Reconstruct the input with unsupervised learning. The final weights of the autoencoder model are the initial weights of the deep neural network model. This is a batch process with data size q.
3) Use the output data to train the weights of the deep neural model with supervised learning. This supervised learning can be on-line. To avoid overfitting, we use a stop criterion.

Noise (or disturbance) is an important issue in system identification. There are two types of disturbances: external and internal. An internal disturbance can be regarded as unmodeled dynamics; an external disturbance can be regarded as measurement noise, input noise, etc. From the deep learning point of view, input noise is fed forward through each layer, while measurement noise is amplified by the backpropagation of the identification error, so the weights of the neural identification model are affected by output noise. On the other hand, a small external disturbance can accelerate the convergence rate according to persistent excitation theory. For control theory, small disturbances in the control signal u(k) or in the output y(k) can enhance the information present in the signal x(k), which is good for parameter convergence.

IV. SIMULATIONS

Gas furnace system
The gas furnace dataset is a commonly used benchmark [22]. The input u(k) is the flow rate of the methane gas, while the output y(k) is the concentration of CO2 in the gas mixture under a steady air supply. The dataset has 296 samples at a fixed interval of 9 seconds. [22] used a time-series based approach to develop a linear model. In this paper, we use the same data structure as [22]: the recursive input data for the model is x(k) = [y(k-1), ..., y(k-4), u(k), ..., u(k-5)]^T and the model output is ŷ(k). 200 samples are used for training. In order to use the restricted Boltzmann machine (RBM), the training values of x(k) and y are normalized. The gas furnace dataset has the form of (2) with n = 10, m = 1.

We use three types of restricted Boltzmann machine (RBM) to train the hidden weights: binary input (DN BI), interval [0, 1] (DN RB), and interval [-1, 1] (DN NE). For the interval [0, 1], x(k) is normalized as

x̄(k) = (x(k) - min_k{x(k)}) / (max_k{x(k)} - min_k{x(k)}).

We use 200 data points to train the deep learning model. A short sketch of this regressor construction and normalization is given below.
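As an illustration of this data preparation (not code from the paper; the array names, toy sequences and the per-column min-max scaling are assumptions), the sketch below builds the regressor x(k) = [y(k-1), ..., y(k-4), u(k), ..., u(k-5)]^T from recorded input/output sequences and normalizes each component into [0, 1].

import numpy as np

def build_regressors(u, y, n_y=4, n_u=6):
    """Stack x(k) = [y(k-1),...,y(k-n_y), u(k),...,u(k-n_u+1)]^T for all valid k."""
    start = max(n_y, n_u - 1)
    rows = []
    for k in range(start, len(y)):
        past_y = [y[k - i] for i in range(1, n_y + 1)]        # y(k-1) ... y(k-4)
        past_u = [u[k - i] for i in range(0, n_u)]            # u(k) ... u(k-5)
        rows.append(past_y + past_u)
    return np.array(rows), y[start:]                          # regressors X, targets y(k)

def minmax_normalize(x):
    """Scale each column into [0, 1]: (x - min) / (max - min)."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min)

# toy sequences standing in for the gas furnace input/output records (296 samples)
rng = np.random.default_rng(3)
u_seq, y_seq = rng.standard_normal(296), rng.standard_normal(296)
X, targets = build_regressors(u_seq, y_seq)                   # X has 10 columns (n = 10)
X_train = minmax_normalize(X)[:200]                           # first 200 samples for training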
The structure parameters of the neural model, the layer number p and the node number l_i of each layer (i = 1, ..., p), are obtained by the random search method [12]. The result is 4 hidden layers (p = 5) with l_i = 50 (i = 1, 2, 3, 4), which yields the minimum test error when tested with a (DN NO) model, as seen in Fig. 3.

Fig. 3. The structure parameters of the gas furnace (test error versus number of layers and nodes per layer).

Our neural model (3) therefore has four hidden layers and one linear output layer, and each hidden layer has 50 nodes. The training rate for the restricted Boltzmann machine is η_3 = 0.1 and 1-step Gibbs sampling is used. In the supervised stage the learning rate was η_4 = . Only one learning epoch was applied for both the pretraining and supervised phases. We compare the RBM models with a deep neural model based on denoising autoencoders (η_1 = η_2 = 0.15, 1 epoch for supervised and unsupervised learning), denoted DN AM, and a neural model without a pretraining stage (DN BP) with learning rate η_2 = . Both DN AM and DN BP have the same structure as DN NO (4 layers with 50 nodes per layer). In the testing phase, we define the average error as (1/N) Σ_{k=1}^{N} ||ŷ(k) - y(k)||², with N = 91. The testing results of the three models are shown in Fig. 4. The average error (×10^-5) of DN AM is 4.239, while that of DN BP is . For the Boltzmann machine (DN RB), the binary case is 5.103, the [0, 1) case is 4.567, and the [-1, 1] case is .

Fig. 4. The testing results of the gas furnace (system output compared with DN_RB, DN_AM, MLP and DN_BP).

Fig. 5. The testing results of the Wiener-Hammerstein system (system output compared with DN_RB, DN_AM, MLP and DN_BP).

We can see that the best performance of DN RB is obtained by normalizing the visible units into [0, 1]. DN AM is better than DN RB, because RBMs need more examples, while this dataset has only 200. In the pretraining stage, DN BP obtains better initial weights.

Wiener-Hammerstein system
A Wiener-Hammerstein (W-H) system is a series connection of three parts: a linear system, a static nonlinearity and another linear system. The data of the Wiener-Hammerstein benchmark are generated from an electrical circuit which consists of three cascaded blocks [23]. There is no direct measurement of the static nonlinearity, because it is located between two unknown linear dynamic systems. The benchmark dataset consists of 188,000 input/output pairs. This dataset is divided into two parts [23]: 100,000 sample pairs for training and 88,000 samples for testing. Let u(k) be the input and y(k) be the output. We define the recursive input vector to the model as x(k) = [y(k-1), ..., y(k-4), u(k), ..., u(k-5)]^T. The W-H dataset also has the mathematical structure of (2) with n = 10, m = 1.

We use three types of RBMs: DN BI, DN RB, and x(k) in the interval [-3, 3] (DN NE). For the interval [0, 1], x(k) is normalized as before. As suggested by [23], the first 100,000 examples are used for the pretraining phase of the deep learning model. The hyperparameters are sampled using the random search method [12]. The best training results were found with a structure of 4 hidden layers (p = 5) and l_i = 80 (i = 1, 2, 3, 4), which yields the minimum test error when tested with a (DN NO) model, as seen in Fig. 5. The average error (×10^-3) of DN AM is 3.573 and that of DN BP is . For DN RB, the binary case is , the [0, 1) case is 2.534, and the [-3, 3] case is .

Finally, we compare these methods with the support vector machine (SVM) [23] and a multilayer perceptron with a gradient learning algorithm (MLP) [21]. The testing squared errors (×10^-3) for the MLP, the linear-kernel SVM, the polynomial-kernel SVM and the RBF-kernel SVM are 56.03, 43.01, 6.01 and 4.71, respectively; the value for DN BI is . The RBM model is better when the input data are positive and bounded, while the autoencoders model is better when the input data are not restricted. The computational cost of the RBM model is almost twice that of the autoencoders model.

V. CONCLUSIONS

In this paper, we use the input data to obtain the initial weights and the output data to train the weights. The deep learning algorithm of the denoising autoencoders is modified so that it is suitable for nonlinear system identification. As an alternative model for nonlinear system identification, deep neural networks have more hidden layers and fewer hidden nodes than MLPs, and the computational complexity does not increase.
REFERENCES

[1] I. Rivals and L. Personnaz, Neural-network construction and selection in nonlinear modeling, IEEE Transactions on Neural Networks, vol. 14, no. 4, 2003.
[2] S. Chakrabartty, R. K. Shaga, and K. Aono, Noise-shaping gradient descent-based online adaptation algorithms for digital calibration of analog circuits, IEEE Transactions on Neural Networks and Learning Systems, vol. 24, no. 4, 2013.
[3] Y. Liu, Y. Liu, K. Chan, and K. A. Hua, Hybrid manifold embedding, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 12, 2014.
[4] Q. Song, Robust initialization of a Jordan network with recurrent constrained learning, IEEE Transactions on Neural Networks, vol. 22, no. 12, 2011.
[5] W. Yu and X. Li, Automated nonlinear system modeling with multiple fuzzy neural networks and kernel smoothing, International Journal of Neural Systems, vol. 20, no. 5, 2010.
[6] G. E. Hinton, S. Osindero, and Y. Teh, A fast learning algorithm for deep belief nets, Neural Computation, vol. 18.
[7] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, Greedy layer-wise training of deep networks, Advances in Neural Information Processing Systems (NIPS'06), MIT Press, 2007.
[8] Y. Bengio and O. Delalleau, Justifying and generalizing contrastive divergence, Neural Computation, vol. 21, no. 6.
[9] R. Collobert and J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, 25th International Conference on Machine Learning (ICML'08), ACM.
[10] D. Erhan, P.-A. Manzagol, Y. Bengio, S. Bengio, and P. Vincent, The difficulty of training deep architectures and the effect of unsupervised pre-training, 12th International Conference on Artificial Intelligence and Statistics (AISTATS'09).
[11] J. Bergstra and Y. Bengio, Algorithms for hyper-parameter optimization, Journal of Machine Learning Research.
[12] J. Bergstra and Y. Bengio, Random search for hyper-parameter optimization, Journal of Machine Learning Research, 2011.
[13] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, 25th International Conference on Machine Learning (ICML'08), ACM.
[14] G. E. Hinton and T. J. Sejnowski, Learning and relearning in Boltzmann machines, in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, Cambridge, MA: MIT Press.
[15] D. Erhan, Y. Bengio, A. Courville, P.-A. Manzagol, and P. Vincent, Why does unsupervised pre-training help deep learning?, Journal of Machine Learning Research, vol. 11, 2010.
[16] M. Längkvist, L. Karlsson, and A. Loutfi, A review of unsupervised feature learning and deep learning for time-series modeling, Pattern Recognition Letters, vol. 42.
[17] P. Romeu et al., Time-series forecasting of indoor temperature using pre-trained deep neural networks, Artificial Neural Networks and Machine Learning (ICANN), Springer Berlin Heidelberg.
[18] L. Qiu, L. Zhang, Y. Ren, P. N. Suganthan, and G. Amaratunga, Ensemble deep learning for regression and time series forecasting, 2014 IEEE Symposium on Computational Intelligence in Ensemble Learning (CIEL), pp. 1-6, Orlando, FL, USA, 2014.
[19] E. Busseti, I. Osband, and S. Wong, Deep learning for time series modeling, Technical report, Stanford University.
[20] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, A learning algorithm for Boltzmann machines, Cognitive Science, vol. 9.
[21] K. S. Narendra and K. Parthasarathy, Gradient methods for optimization of dynamical systems containing neural networks, IEEE Transactions on Neural Networks, March.
[22] G. Box, G. Jenkins, and G. Reinsel, Time Series Analysis: Forecasting and Control, 4th ed., Wiley.
[23] K. De Brabanter, P. Dreesen, P. Karsmakers, K. Pelckmans, J. De Brabanter, J. A. K. Suykens, and B. De Moor, Fixed-size LS-SVM applied to the Wiener-Hammerstein benchmark, in Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, 2009.
